Regular Expressions For Parsing HTML (For Indenting)

Posted: Tue Aug 23, 2005 11:55 am
by jolinar
I posted about this earlier, asking how to parse HTML, and you folks suggested regular expressions. Now I've run into a new problem: it isn't working. Here is the offending code:

Code:

<?php
/*
 * Created on 22-Aug-2005
 *
 * Author - Daniel Snowden
 */
 
class writer {
	
	var $content;
	var $indent;
	var $i;
	var $openTagPattern;
	var $closeTagPattern;
	
	function writer() {
		$this->indent = 0;
		$this->i = 0;
		$this->openTagPattern = "/^</";
		$this->closeTagPattern = "/^<\//";
	}
	
	function addTag($data) {
		$i = 0;
		for($i=0; $i<$this->indent; $i++) {
			$data = " ".$data;
		}
		$this->content[$this->i] = $data;
		$this->i++;
		if(preg_match($this->openTagPattern,$data)) {
			if(preg_match($this->closeTagPattern,$data)) {
				$this->indent--;
			}
			else {
				$this->indent++;
			}
		}
	}
	
	function output() {
		$i = 0;
		$n = count($this->content);
		
		for ($i=0; $i<$n; $i++) {
			print $this->content[$i]."\n";
		}
	}
}

?>
The output it produces (in the Eclipse console window anyway) is this:

Code:

<html>
 <head>
 <title>
 Dans Gallery
 </title>
 </head>
 <body>
The tags were fed into the addTag function one at a time. Does anybody know what is going on (and have I made a n00bish error)?

Posted: Tue Aug 23, 2005 12:58 pm
by anjanesh
You can try Example 1 on the preg_replace manual page.
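
For what it's worth, a likely cause of the output in the first post: addTag pads $data with spaces before it runs preg_match, and both patterns are anchored with ^. Once the indent reaches 1, every line starts with a space, so /^</ no longer matches and the indent never changes again. Below is a minimal sketch of one way around that, not a definitive fix. It classifies the raw line first, steps back out before writing a closing tag, and steps in after writing an opening tag. The names IndentWriter and render are made up for illustration, and the sketch assumes one tag or text run per call; it does not handle void tags such as <br>, comments, or a doctype.

```php
<?php
// Hypothetical rewrite of the writer class from the first post.
// IndentWriter and render() are illustrative names, not from the thread.
class IndentWriter
{
    private $lines = array();
    private $indent = 0;

    public function addTag($data)
    {
        // Classify the raw line first, before any padding is added,
        // so the ^ anchor still sees the leading "<".
        $isClose = preg_match('/^<\//', $data) === 1;
        $isOpen  = !$isClose && preg_match('/^</', $data) === 1;

        // A closing tag steps back out before it is written,
        // so it lines up with its opening tag.
        if ($isClose && $this->indent > 0) {
            $this->indent--;
        }

        $this->lines[] = str_repeat(' ', $this->indent) . $data;

        // An opening tag indents everything that follows it.
        if ($isOpen) {
            $this->indent++;
        }
    }

    public function render()
    {
        return implode("\n", $this->lines) . "\n";
    }
}

// Same input as in the first post, fed in one line at a time.
$w = new IndentWriter();
$input = array('<html>', '<head>', '<title>', 'Dans Gallery', '</title>', '</head>', '<body>');
foreach ($input as $line) {
    $w->addTag($line);
}
echo $w->render();
```

With the input from the first post, each nested tag should come out one space deeper than its parent, and each closing tag should line up with its opener.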

Posted: Tue Aug 23, 2005 3:05 pm
by feyd
Moved to Regex....