HTML parser: full HTML parse and edit in place / add tags
Posted: Mon Jul 07, 2008 5:14 am
Hello,
I need to inject links into existing HTML pages. To parse the HTML I was searching for < and the matching >, but that is a very basic and not very robust approach; if there is a good, robust freeware parser that can do this, I would rather use that.
Because the links are dynamic, I don't want to do a one-time static update of the HTML, and some of the pages are dynamic too. So I added an output buffer callback:
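Code:
// Route all page output through bufferFilter() before it is sent.
$r = ob_start("bufferFilter"); // a chunk size such as 4096 could be passed as a second argument
function bufferFilter($buffer)
{
    global $words; // words to link (not shown here)
    // addWords() (not shown here) injects the links and returns the modified HTML.
    return addWords($words, $buffer);
}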
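addWords() isn't shown here. As a rough sketch only, a function of that kind could split the buffer into tags and text on < and >, then replace keywords in the text pieces while leaving the tags alone. The name addWordsSketch() and the keyword => URL shape of $words are assumptions for illustration, not the actual code. The sketch does not skip text inside script or head, and it does not avoid text that is already inside a link.

```php
<?php
// Hypothetical stand-in for addWords(); assumes $words maps keyword => URL.
function addWordsSketch(array $words, $buffer)
{
    if (!$words) {
        return $buffer; // nothing to inject
    }
    // One pattern for all keywords, so inserted links are never re-scanned.
    $quoted = array_map(function ($w) {
        return preg_quote($w, '/');
    }, array_keys($words));
    $pattern = '/\b(' . implode('|', $quoted) . ')\b/';

    // Split into alternating pieces: even index = text, odd index = tag.
    $parts = preg_split('/(<[^>]*>)/', $buffer, -1, PREG_SPLIT_DELIM_CAPTURE);
    foreach ($parts as $i => $part) {
        if ($i % 2 === 1 || $part === '') {
            continue; // leave tags (and empty pieces) untouched
        }
        $parts[$i] = preg_replace_callback($pattern, function ($m) use ($words) {
            return '<a href="' . htmlspecialchars($words[$m[1]]) . '">' . $m[1] . '</a>';
        }, $part);
    }
    return implode('', $parts);
}
```

For example, addWordsSketch(array('PHP' => 'http://php.net/'), $buffer) would link the word PHP in text but not inside an attribute such as title="PHP".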
I also see that I can get all the tags using:
Code:
function get_tags( $tag, $xml ) {
    // $tag is only needed by the commented-out alternative below.
    $tag = preg_quote($tag);
    $matches = array();
    // Match any opening, closing, or self-closing tag, with optional attributes.
    $regex = "/<\/?\w+((\s+(\w|\w[\w-]*\w)(\s*=\s*(?:\".*?\"|'.*?'|[^'\">\s]+))?)+\s*|\s*)\/?>/i";
    preg_match_all($regex, $xml, $matches);
    /* Alternative: match <$tag ...>...</$tag> and capture its contents.
    preg_match_all('{<'.$tag.'[^>]*>(.*?)</'.$tag.'>}',
        $xml,
        $matches,
        PREG_PATTERN_ORDER);
    */
    return $matches;
}
$tags = get_tags("", $html);
var_dump($tags);
I could then search the HTML again for the actual content. Do you have any recommendations?
Also, if there is nothing like this, then I will use this method. Right now I have listed the following tags to ignore when parsing (leave unaltered):
* head
* script
* embed
* object
(I can't use Perl.)
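One option that avoids Perl and hand-written tag matching is PHP's bundled DOM extension. The following is only a sketch under assumptions: injectLinks() is a hypothetical name, $words is assumed to map keyword => URL, and 'a' is added to the skip list (it is not in the list above) to avoid nesting links. Note that DOMDocument::saveHTML() returns a full document and may normalize the markup, so the output will not be byte-for-byte the original page.

```php
<?php
// Sketch: inject links via DOMDocument, leaving the listed elements unaltered.
function injectLinks($html, array $words)
{
    if (!$words) {
        return $html;
    }
    // Tags from the list above, plus 'a' (an addition) to avoid nested links.
    $skip = array('head', 'script', 'embed', 'object', 'a');
    $quoted = array_map(function ($w) {
        return preg_quote($w, '/');
    }, array_keys($words));
    $pattern = '/\b(' . implode('|', $quoted) . ')\b/';

    libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    // Collect text nodes first, because the tree is modified below.
    $xpath = new DOMXPath($doc);
    $textNodes = array();
    foreach ($xpath->query('//text()') as $node) {
        $textNodes[] = $node;
    }
    foreach ($textNodes as $node) {
        // Leave the node alone if any ancestor is in the skip list.
        for ($p = $node->parentNode; $p !== null; $p = $p->parentNode) {
            if (in_array(strtolower($p->nodeName), $skip)) {
                continue 2;
            }
        }
        // Odd-indexed pieces are matched keywords, even-indexed are plain text.
        $pieces = preg_split($pattern, $node->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($pieces) < 2) {
            continue; // no keyword in this text node
        }
        $frag = $doc->createDocumentFragment();
        foreach ($pieces as $i => $piece) {
            if ($i % 2 === 1) {
                $a = $doc->createElement('a');
                $a->setAttribute('href', $words[$piece]);
                $a->appendChild($doc->createTextNode($piece));
                $frag->appendChild($a);
            } elseif ($piece !== '') {
                $frag->appendChild($doc->createTextNode($piece));
            }
        }
        $node->parentNode->replaceChild($frag, $node);
    }
    return $doc->saveHTML();
}
```

Inside an output buffer callback this could be called as return injectLinks($buffer, $words); although whether the parse-and-serialize round trip is acceptable for every page would need checking.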