HTML Works - mending faulty html
Posted: Fri Mar 27, 2009 6:38 am
by php_east
Sometimes I get faulty or incomplete HTML that users have cut and pasted in. Are there any PHP classes that mend it (as best as possible)?
An example would be a simple <a href ... tag with no closing tag, which turns all of the following text into a link. (I am looking for a PHP solution.)
Re: HTML Works - mending faulty html
Posted: Fri Mar 27, 2009 7:20 am
by mattpointblank
Re: HTML Works - mending faulty html
Posted: Fri Mar 27, 2009 7:24 am
by php_east
Thanks, yes, I was looking at that just now. I am not sure how well it can repair HTML, but it is certainly worth a try. It is the only solution I have found so far.
Re: HTML Works - mending faulty html
Posted: Fri Mar 27, 2009 7:56 am
by php_east
Is tidy normally installed on 99% of hosts? I would hate to build something that assumes the host has tidy, but I don't mind if 99% of them do.
Right now I am more inclined towards writing my own PHP solution.
Re: HTML Works - mending faulty html
Posted: Fri Mar 27, 2009 8:24 am
by mattpointblank
I don't think so... there is this though:
http://pecl.php.net/package/tidy - look in phpinfo()?
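Instead of reading through phpinfo() by hand, the same check can be done from code. This is only a minimal sketch: `tidy_is_available()` is a made-up helper name, while `extension_loaded()` and `function_exists()` are standard PHP functions.

```php
<?php
// Hypothetical helper (not part of PHP): report whether the tidy
// extension is loaded on this host.
function tidy_is_available(): bool
{
    // extension_loaded() checks the loaded extension list;
    // function_exists() double-checks that the function we want is callable.
    return extension_loaded('tidy') && function_exists('tidy_repair_string');
}

echo tidy_is_available() ? "tidy is available\n" : "tidy is not available\n";
```

A script could use a check like this to fall back to another repair method on hosts without tidy.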
Re: HTML Works - mending faulty html
Posted: Fri Mar 27, 2009 8:48 am
by php_east
OK, thanks. This is what I have tried.
I have a faulty test HTML which I ...
1. feed directly to the output
2. clean using tidy first, then output
3. feed into a DOM parser (DOMDocument) before output.
Much to my delight, DOMDocument does quite a good job of it.
Both DOMDocument and Tidy can repair the faulty HTML, but Tidy outputs a full HTML document (doctype, head, title), whereas DOMDocument only wraps the result in a bare <html><body>.
My input is a faulty, unclosed <a href=...
The output is formatted as follows (I left out the rest of the details for clarity).
RAW INPUT
DOMDocument
<html><body><a href...................</a></body></html>
TIDY
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title></title>
</head>
<body>
<a href...................
</a>
</body>
</html>
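If the full document wrapper from Tidy is unwanted, its `show-body-only` option may help. The sketch below is untested against the input above and assumes the tidy extension is loaded; `tidy_mend_fragment()` is a hypothetical helper name, while `tidy_repair_string()` and the `show-body-only` option belong to tidy itself.

```php
<?php
// Hypothetical helper: repair an HTML fragment with tidy and return
// only the body content, or null when tidy is missing or fails.
function tidy_mend_fragment(string $fragment): ?string
{
    if (!extension_loaded('tidy')) {
        return null; // assumption: caller falls back to another method
    }

    // show-body-only asks tidy to omit the doctype, <html> and <head> wrapper.
    $config = ['show-body-only' => true];

    $result = tidy_repair_string($fragment, $config, 'utf8');

    return $result === false ? null : $result;
}

$mended = tidy_mend_fragment('<a href="http://example.com/">an unclosed link');
echo $mended === null ? "tidy not available\n" : $mended . "\n";
```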
With the DOM extension enabled by default in PHP, DOMDocument would be my choice for auto-correcting HTML. I hope this saves someone else some time.
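For anyone who wants to try the DOMDocument route, here is a minimal sketch of one way it could be done. It is not the exact code used for the test above. `mend_html_fragment()` is a hypothetical helper name, the example URL is made up, and the meta tag is there only on the assumption that the input is UTF-8.

```php
<?php
// Hypothetical helper: let DOMDocument repair an HTML fragment and
// return only the repaired fragment, without the <html><body> wrapper.
function mend_html_fragment(string $fragment): string
{
    $doc = new DOMDocument();

    // Faulty markup triggers libxml warnings; collect them quietly.
    $previous = libxml_use_internal_errors(true);

    // Assumption: the input is UTF-8. Without a charset hint,
    // loadHTML() treats the input as ISO-8859-1.
    $doc->loadHTML(
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        . '<body>' . $fragment . '</body>'
    );

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    // Serialize each child of <body> so the wrapper is left out.
    $html = '';
    foreach ($body->childNodes as $child) {
        $html .= $doc->saveHTML($child);
    }

    return $html;
}

echo mend_html_fragment('<a href="http://example.com/">an unclosed link'), "\n";
```

Serializing only the children of the body is one way to drop the wrapper that DOMDocument adds; whether the repaired markup matches what the user intended still depends on how the parser guesses at the missing tags.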