HTML parser: full HTML parse and edit in place / add tags

Posted: Mon Jul 07, 2008 5:14 am
by tgkprog
Hello,

I need to inject links into existing HTML pages.

Since the links are dynamic, I don't want to do a static one-time update of the HTML.

Also, some of the pages are dynamic too.

So I added an output buffer callback:

Code:

 
 
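// Register bufferFilter() as the output callback; it receives the buffered page output.
$r = ob_start("bufferFilter"); // 4096

function bufferFilter($buffer)
{
    global $words;
    // addWords() adds the links for $words to the buffered HTML and returns the result.
    return addWords($words, $buffer);
}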
 
To parse the HTML I was searching for < and the matching >, but that's a very basic and not very robust approach.
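For reference, that basic approach looks roughly like the sketch below. The helper name splitTagsAndText is made up for illustration, and it assumes that no > ever appears inside an attribute value, comment or script, which is exactly why it is fragile.

```php
<?php
// Hypothetical helper: split HTML into tag chunks and text chunks.
function splitTagsAndText($html)
{
    // Everything from "<" up to the next ">" is treated as one tag;
    // whatever lies between two tags is treated as text.
    return preg_split('/(<[^>]*>)/', $html, -1,
                      PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
}

$parts = splitTagsAndText('<p>Hello <b>world</b></p>');
// $parts is array('<p>', 'Hello ', '<b>', 'world', '</b>', '</p>')
```

Chunks that start with < are tags and the rest is text, so only the text chunks would be candidates for new links.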

I also see that I can get all the tags using:

Code:

function get_tags($tag, $xml) {
    // $tag is only used by the commented-out per-tag variant below.
    $tag = preg_quote($tag);
    $matches = array();
    // Match any opening, closing or self-closing tag, with optional attributes.
    $regex = "/<\/?\w+((\s+(\w|\w[\w-]*\w)(\s*=\s*(?:\".*?\"|'.*?'|[^'\">\s]+))?)+\s*|\s*)\/?>/i";
    preg_match_all($regex, $xml, $matches);
    /* Alternative: capture only the contents of one tag name.
    preg_match_all('{<'.$tag.'[^>]*>(.*?)</'.$tag.'>}',
                    $xml,
                    $matches,
                    PREG_PATTERN_ORDER);
    */

    return $matches;
}

$tags = get_tags("", $html);

var_dump($tags);
 
Then I could search the HTML again for the actual content, but if there is a good, robust, free parser that can do the same, I would rather use that.

So do you have any recommendations?

Also, if there is nothing like this, then I will use this method. Right now I have listed the following tags to ignore when parsing (leave unaltered):
  * head
  * script
  * embed
  * object
Are there any other tags whose text should be left alone?
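To show what leaving those tags unaltered would mean in practice, here is a rough sketch. The function addWordsSketch and the word => URL layout of $words are assumptions made for illustration, not the real addWords(). It also assumes every ignored tag is explicitly closed and never nested inside itself.

```php
<?php
// Hypothetical sketch: link known words, but only in text outside ignored tags.
// $words is assumed to map each word to the URL it should link to.
function addWordsSketch($words, $html)
{
    if (empty($words)) {
        return $html;
    }
    $ignore = array('head', 'script', 'embed', 'object');
    $quoted = array_map(function ($w) { return preg_quote($w, '/'); },
                        array_keys($words));
    $pattern = '/\b(' . implode('|', $quoted) . ')\b/';
    $skipUntil = null; // name of the ignored tag we are currently inside
    $out = '';
    $parts = preg_split('/(<[^>]*>)/', $html, -1,
                        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
    foreach ($parts as $part) {
        if ($part[0] === '<') {
            // Tag chunk: track entering and leaving an ignored element.
            if (preg_match('/^<(\/?)(\w+)/', $part, $m)) {
                $name = strtolower($m[2]);
                if ($skipUntil === null && $m[1] === ''
                    && in_array($name, $ignore)
                    && substr($part, -2) !== '/>') {
                    $skipUntil = $name;
                } elseif ($skipUntil === $name && $m[1] === '/') {
                    $skipUntil = null;
                }
            }
            $out .= $part;
        } elseif ($skipUntil !== null) {
            $out .= $part; // text inside an ignored tag stays unaltered
        } else {
            // Plain text: wrap each known word in a link, in a single pass.
            $out .= preg_replace_callback($pattern, function ($m) use ($words) {
                return '<a href="' . htmlspecialchars($words[$m[1]]) . '">'
                     . $m[1] . '</a>';
            }, $part);
        }
    }
    return $out;
}

$demoWords = array('PHP' => 'http://example.com/php');
$demoHtml = '<html><head><title>PHP page</title></head><body>'
          . '<p>I like PHP.</p><script>var s = "PHP";</script></body></html>';
$demoResult = addWordsSketch($demoWords, $demoHtml);
```

In this example only the PHP inside the paragraph becomes a link, while the title inside head and the string inside script are passed through unchanged. An embed written without a closing tag would leave the sketch in skip mode for the rest of the page, so that case would need separate handling.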

{can't use Perl}