HTML Works - mending faulty html

php_east
Forum Contributor
Posts: 453
Joined: Sun Feb 22, 2009 1:31 pm
Location: Far Far East.

HTML Works - mending faulty html

Post by php_east »

Sometimes I get faulty or incomplete HTML copy-pasted by users. Are there any PHP classes that can mend it (as best as possible)?

An example would be a simple <a href ... tag with no closing </a>, which turns all of the text after it into a link. (I am looking for a PHP solution.)
mattpointblank
Forum Contributor
Posts: 304
Joined: Tue Dec 23, 2008 6:29 am

Re: HTML Works - mending faulty html

Post by mattpointblank »

php_east
Forum Contributor
Posts: 453
Joined: Sun Feb 22, 2009 1:31 pm
Location: Far Far East.

Re: HTML Works - mending faulty html

Post by php_east »

Thanks, yes, I was looking at that just now. I'm not sure how well it can repair, but it's certainly worth a try. It's the only solution I have found so far.
php_east
Forum Contributor
Posts: 453
Joined: Sun Feb 22, 2009 1:31 pm
Location: Far Far East.

Re: HTML Works - mending faulty html

Post by php_east »

Is Tidy normally installed on 99% of hosts? I'd hate to build something that assumes the host has Tidy, but I don't mind if 99% of them do.

Right now I am more inclined towards writing my own PHP solution.
mattpointblank
Forum Contributor
Posts: 304
Joined: Tue Dec 23, 2008 6:29 am

Re: HTML Works - mending faulty html

Post by mattpointblank »

I don't think so... there is this though: http://pecl.php.net/package/tidy - look in phpinfo()?
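
Instead of reading through phpinfo() by eye, the check can also be done from code. This is only a rough sketch: `tidy_is_available` is a made-up helper name, not something from this thread, and the scalar type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical helper: report whether the tidy extension is loaded
// in the PHP build that is running this script.
function tidy_is_available(): bool
{
    return extension_loaded('tidy') && function_exists('tidy_repair_string');
}

echo tidy_is_available() ? "tidy is available\n" : "tidy is not available\n";
```

A script could call this once and fall back to another repair method when it returns false.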
php_east
Forum Contributor
Posts: 453
Joined: Sun Feb 22, 2009 1:31 pm
Location: Far Far East.

Re: HTML Works - mending faulty html

Post by php_east »

OK, thanks. This is what I have tried.

I have a faulty test HTML fragment, which I:
1. feed directly to the output
2. clean with Tidy first, then output
3. load into a DOM parser (DOMDocument) before output.

Much to my delight, DOMDocument does quite a good job of it.

Both DOMDocument and Tidy can repair the faulty HTML, but Tidy wraps the result in a full HTML document (doctype, head, title), whereas DOMDocument adds only a minimal <html><body> wrapper.

My input is a faulty, unclosed <a href=...
The output is as follows (I left out the rest of the details for clarity):

RAW INPUT

<a href...................

DOMDocument

<html><body><a href...................</a></body></html>

TIDY

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title></title>
</head>
<body>
<a href...................
</a>
</body>
</html>
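
If the full-document wrapper is the main objection to Tidy, its `show-body-only` option may be worth a try, since it asks Tidy to return just the body content. This is a sketch, not something tested in this thread: `tidy_repair_fragment` is a hypothetical name, the `utf8` encoding is an assumption about the input, and the nullable return type assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper: repair a fragment with tidy and return only the
// body content. Returns null when tidy is missing or the repair fails.
function tidy_repair_fragment(string $fragment): ?string
{
    if (!extension_loaded('tidy')) {
        return null; // tidy is not installed on this host
    }

    // 'show-body-only' tells tidy to leave out the doctype, <html>, <head>
    // and <body> wrapper and emit just the repaired body content.
    $config = ['show-body-only' => true];

    // Assumption: the input is UTF-8.
    $repaired = tidy_repair_string($fragment, $config, 'utf8');

    return $repaired === false ? null : $repaired;
}

$result = tidy_repair_fragment('<a href="http://example.com/">unclosed link');
echo $result === null ? "tidy not available\n" : $result . "\n";
```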
 
Since DOMDocument ships with PHP as standard, it would be my choice for auto-correcting HTML. Hope this saves someone else some time.
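
For anyone who wants to try the DOMDocument route, here is a minimal sketch of it. It also strips the <html><body> wrapper by serialising only the children of <body>. `repair_html_fragment` is a hypothetical name, the sample input is made up, and the code assumes PHP 7 or later with the DOM extension enabled. Character encoding is not handled here, so non-ASCII input may need extra care.

```php
<?php
// Hypothetical helper: let DOMDocument close unclosed tags in a fragment,
// then return the repaired markup without the <html><body> wrapper.
function repair_html_fragment(string $fragment): string
{
    if (trim($fragment) === '') {
        return ''; // nothing to parse
    }

    $doc = new DOMDocument();

    // Malformed markup makes libxml raise warnings; collect them internally
    // so they do not end up in the page output.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($fragment);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The parser wraps the fragment in <html><body>; keep only what is
    // inside <body>.
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    $repaired = '';
    foreach ($body->childNodes as $child) {
        $repaired .= $doc->saveHTML($child);
    }

    return $repaired;
}

echo repair_html_fragment('<a href="http://example.com/">unclosed link'), "\n";
```

Serialising node by node keeps the result usable as a fragment inside an existing page, which a full document would not be.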