Regular Expressions For Parsing HTML (For Indenting)

jolinar
Forum Commoner
Posts: 61
Joined: Tue May 24, 2005 4:24 pm
Location: in front of computer

Post by jolinar »

I posted about this earlier, asking how to parse HTML, and you folks suggested regular expressions. Now I've run into a new problem: it isn't working. Here is the offending code:

Code:

<?php
/*
 * Created on 22-Aug-2005
 *
 * Author - Daniel Snowden
 */
 
class writer {
	
	var $content;
	var $indent;
	var $i;
	var $openTagPattern;
	var $closeTagPattern;
	
	function writer() {
		$this->indent = 0;
		$this->i = 0;
		$this->openTagPattern = "/^</";
		$this->closeTagPattern = "/^<\//";
	}
	
	function addTag($data) {
		$i = 0;
		for($i=0; $i<$this->indent; $i++) {
			$data = " ".$data;
		}
		$this->content[$this->i] = $data;
		$this->i++;
		if(preg_match($this->openTagPattern,$data)) {
			if(preg_match($this->closeTagPattern,$data)) {
				$this->indent--;
			}
			else {
				$this->indent++;
			}
		}
	}
	
	function output() {
		$i = 0;
		$n = count($this->content);
		
		for ($i=0; $i<$n; $i++) {
			print $this->content[$i]."\n";
		}
	}
}

?>

The output it produces (in the Eclipse console window, anyway) is this:

Code:

<html>
 <head>
 <title>
 Dans Gallery
 </title>
 </head>
 <body>

The tags were fed into the addTag function one at a time. Does anybody know what is going on (and have I made a n00bish error)?

anjanesh
DevNet Resident
Posts: 1679
Joined: Sat Dec 06, 2003 9:52 pm
Location: Mumbai, India

Post by anjanesh »

You can try Example 1 on the preg_replace manual page.
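
Going by the posted code, the output looks consistent with one thing in addTag(): the padding spaces are prepended to $data before preg_match runs, so once indent is above 0 the string starts with a space and /^</ never matches again. Below is a minimal sketch of one way around that, testing the raw tag first and only padding it afterwards. The class name IndentingWriter and the sample tag list are made up for illustration, and it assumes each call receives exactly one tag or text chunk with no leading whitespace.

```php
<?php
// Sketch only: IndentingWriter is a hypothetical name, not the original class.
class IndentingWriter
{
    private $lines = [];
    private $indent = 0;

    public function addTag($data)
    {
        // Test the raw input, before any padding is added.
        $isTag   = preg_match('/^</', $data) === 1;
        $isClose = preg_match('/^<\//', $data) === 1;

        // A closing tag steps back out before it is stored,
        // so it lines up with its opening tag.
        if ($isClose && $this->indent > 0) {
            $this->indent--;
        }

        $this->lines[] = str_repeat(' ', $this->indent) . $data;

        // An opening tag pushes everything after it one level deeper.
        if ($isTag && !$isClose) {
            $this->indent++;
        }
    }

    public function output()
    {
        return implode("\n", $this->lines) . "\n";
    }
}

// Assumed sample input, mirroring the tags in the posted output.
$w = new IndentingWriter();
foreach (['<html>', '<head>', '<title>', 'Dans Gallery', '</title>', '</head>', '<body>'] as $tag) {
    $w->addTag($tag);
}
print $w->output();
```

This treats every non-closing tag as an opener, so void elements such as <br> or <img>, comments, and a doctype would over-indent; those would need an extra check.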
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

Moved to Regex....