
HTML validation

Posted: Tue Sep 02, 2008 7:25 am
by batfastad
Hi everyone

I'm designing a simple content management system for our website. It runs on our intranet Apache server and updates data stored on our host's MySQL server.
This is all working fine.

There will only be a couple of users that will have access to the system, but I'm looking for a way to make sure that they enter valid HTML into the CMS.

I already have some primitive checks on special characters:
- Comparing the number of & with the number of &amp; to make sure they match. That ensures that all & are properly entity-ised
- Making sure the number of " chars is even
- Making sure the number of < equals the number of >

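Obviously #2 is flawed because it can't tell attribute quotes from quote marks typed in the text. People usually enclose a quotation in a pair of "", which keeps the count even and tells you nothing, while a lone " in the text makes the count odd even though the HTML is fine.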
#1 and #3 seem to be fairly sound though, if not technically accurate/complete.
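To make the trade-offs concrete, here is a minimal Python sketch of the three count-based checks listed above (Python is used only for illustration; `primitive_checks` is a hypothetical name). The two calls at the bottom show the limits: valid markup containing a lone " fails check 2, while markup with an unclosed <b> passes all three.

```python
def primitive_checks(html: str) -> dict:
    """The three character-count heuristics. These are not real validation."""
    return {
        # 1: every '&' must be the start of an '&amp;' entity
        "ampersands_escaped": html.count("&") == html.count("&amp;"),
        # 2: double quotes must come in pairs
        "quotes_even": html.count('"') % 2 == 0,
        # 3: every '<' must have a matching '>'
        "brackets_balanced": html.count("<") == html.count(">"),
    }


# Valid markup that fails check 2: a lone " (inches) in the text.
print(primitive_checks('<p class="spec">A 12" screen</p>'))
# Broken markup that passes all three: the <b> is never closed.
print(primitive_checks('<p><b>Bold text</p>'))
```

Note that check 1 as written also rejects legitimate entities such as &lt; or &copy;, because their & is not followed by amp;.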

Anyone know of any methods/classes out there that can interface with the W3C validator and return me a true/false on whether the code is valid or not?
I was thinking about using curl to check a link to the page and scan the returned content for any text like "x Errors:" or something.

There must be a better way, surely?
Does the W3C offer an XML RPC interface? Or API or anything?
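For what it's worth, the W3C's current Nu Html Checker (validator.w3.org/nu/) has a machine-readable mode: you can POST the markup itself and ask for JSON with out=json, which avoids both scraping and the need for a public URL. A minimal Python sketch, assuming that endpoint and a JSON result with a top-level "messages" list whose entries carry a "type" field; the User-Agent string is a made-up placeholder.

```python
import json
import urllib.request

# W3C Nu Html Checker endpoint; out=json asks for machine-readable results.
NU_CHECKER = "https://validator.w3.org/nu/?out=json"


def error_messages(result: dict) -> list:
    """Return the error texts from a Nu checker JSON result.

    "non-document-error" (the checker could not process the input) is
    treated as a failure too, so a broken check never counts as "valid".
    """
    return [
        m.get("message", "")
        for m in result.get("messages", [])
        if m.get("type") in ("error", "non-document-error")
    ]


def validate_html(document: str) -> bool:
    """POST a complete HTML document to the checker; True means no errors reported."""
    request = urllib.request.Request(
        NU_CHECKER,
        data=document.encode("utf-8"),  # supplying data makes this a POST
        headers={
            "Content-Type": "text/html; charset=utf-8",
            "User-Agent": "cms-html-check/0.1",  # hypothetical placeholder name
        },
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        result = json.load(response)
    return not error_messages(result)
```

Send a complete document (doctype, <title> and so on); a bare fragment from the CMS will draw errors about the missing parts, so wrap it first.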

Thanks, B

Re: HTML validation

Posted: Tue Sep 02, 2008 9:20 am
by jayshields
I think HTMLPurifier might fulfill your needs. Check it out.

Re: HTML validation

Posted: Tue Sep 02, 2008 4:09 pm
by Mds
Well, I think htmlspecialchars will be helpful for you.
Check it out.
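Note that htmlspecialchars escapes markup rather than checking it. Python's html.escape does roughly the same job, so it stands in for the PHP function in this sketch: the user's tags come out as literal text, which is safe to display but is no longer HTML.

```python
import html

snippet = '<p class="intro">Fish & chips</p>'
escaped = html.escape(snippet)  # escapes &, <, >, " and ' by default
print(escaped)
# &lt;p class=&quot;intro&quot;&gt;Fish &amp; chips&lt;/p&gt;
```

That makes escaping useful for fields meant to hold plain text, but no help when the CMS users are supposed to type HTML.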

Re: HTML validation

Posted: Tue Sep 02, 2008 11:05 pm
by Ambush Commander
If you don't mind disallowing stuff like forms and scripts, HTML Purifier will do the job perfectly. Otherwise, you'll probably want to look at the following options:
  • Remotely requesting the w3c validator service to figure out if the page is valid
  • Running the input through HTML Tidy and seeing what happens
  • Parsing it with DOM and then running a DTD validation on it
  • Enabling HTML Purifier's trusted mode and seeing if that is good enough
There isn't really a good validator library for PHP yet, unfortunately.
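The DOM route can be approximated with Python's standard library, with one substitution stated plainly: the sketch below only checks that an XHTML fragment is well-formed XML using xml.etree.ElementTree. It catches unclosed tags, unquoted attributes and bare &, but it does not perform the DTD validation step (that needs a validating parser such as lxml). The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xhtml(fragment: str) -> tuple:
    """Return (ok, error_message) for an XHTML fragment.

    Well-formedness only, not DTD validity. HTML named entities such as
    &nbsp; are rejected, because plain XML does not define them.
    """
    try:
        # Wrap in one root element so fragments with several top-level tags parse.
        ET.fromstring(f"<div>{fragment}</div>")
        return True, ""
    except ET.ParseError as exc:
        return False, str(exc)
```

The error message includes a line and column, which could be shown back to the CMS user.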

Re: HTML validation

Posted: Wed Sep 03, 2008 3:01 am
by swiftouch
I applaud you for trying to do this yourself.

I would install tinyMCE or FCKeditor. So much easier in my opinion.

Re: HTML validation

Posted: Wed Sep 03, 2008 3:11 am
by JAB Creations
PHP + cURL + W3C validator. Export the (X)HTML data to a temporary file, create the URL for it, then validate that URL using cURL. It puts a lot less load on your own server, and you can let the W3C keep their validator up to date instead of doing it yourself.
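The export-then-validate steps might look like this in Python, under stated assumptions: PUBLIC_DIR and PUBLIC_BASE_URL are hypothetical (a folder your public web server serves, and its URL), and the checker URL uses the W3C Nu checker's doc and out=json parameters. The function only builds the URL; you would then fetch it with cURL or urllib.

```python
import os
import tempfile
from urllib.parse import urlencode

# Hypothetical locations: a folder your public web server serves, and its URL.
PUBLIC_DIR = "/var/www/html/validate-tmp"
PUBLIC_BASE_URL = "https://www.example.com/validate-tmp/"

NU_CHECKER = "https://validator.w3.org/nu/"


def export_for_validation(markup, public_dir=PUBLIC_DIR, base_url=PUBLIC_BASE_URL):
    """Write markup to a uniquely named file in a web-served folder.

    Returns (file_path, checker_url); fetch checker_url with cURL or urllib.
    """
    fd, path = tempfile.mkstemp(suffix=".html", dir=public_dir)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(markup)
    # mkstemp makes the file readable only by its owner; let the web server read it.
    os.chmod(path, 0o644)
    page_url = base_url + os.path.basename(path)
    checker_url = NU_CHECKER + "?" + urlencode({"doc": page_url, "out": "json"})
    return path, checker_url
```

The validator must be able to reach page_url from the public internet, and the file should be deleted with os.remove(path) once the check is done.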

Re: HTML validation

Posted: Wed Sep 03, 2008 3:24 pm
by batfastad
Yeah, I'm definitely thinking of going for some sort of cURL implementation for this.

Although the page lives on our intranet, there is a script on our website which will also output the page, so I will just use that as the link to give to the validator. Then I'll just scan through the text of the returned page looking for something like [Invalid] or xx Errors.
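The scan described here could be sketched as follows. Both patterns, "[Invalid]" and "N Errors", are assumptions about the result page's wording, not a documented format, and the function name is hypothetical.

```python
import re
from typing import Optional


def scrape_validity(page_text: str) -> Optional[bool]:
    """Guess validity from a validator result page.

    False = invalid, True = "0 Errors" reported, None = couldn't tell.
    """
    if "[Invalid]" in page_text:
        return False
    match = re.search(r"\b(\d+)\s+Errors?\b", page_text)
    if match:
        return int(match.group(1)) == 0
    return None
```

Keeping None separate from True matters: a timeout, an error page or a changed layout should not be read as "valid".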

Thanks for the info. I thought I'd check in case there was anything else I was missing.

Thanks, B