HTML validation

batfastad
Forum Contributor
Posts: 433
Joined: Tue Mar 30, 2004 4:24 am
Location: London, UK

HTML validation

Post by batfastad »

Hi everyone

I'm designing a simple content management system for our website running on our intranet Apache server, and updating data stored on our host's MySQL server.
This is all working fine.

There will only be a couple of users that will have access to the system, but I'm looking for a way to make sure that they enter valid HTML into the CMS.

I already have some primitive checks on special characters:
- Comparing the number of & with the number of &amp; to make sure they match. That ensures that all & are properly entity-ised
- Making sure the number of " chars is even
- Making sure the number of < equals the number of >

Obviously #2 is flawed because when people type quotes in text they tend to enclose the quote with "", which still results in an even count.
#1 and #3 seem to be fairly sound though, if not technically accurate/complete.
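
For illustration, the three counts described above could be sketched roughly as follows. The function name `primitiveHtmlChecks` and the array keys are made up for this example, and check #1 is slightly generalised to accept any entity (named or numeric) rather than only `&amp;`. The last call shows why the counts are not a real validity test: wrongly nested tags still pass all three.

```php
<?php
// Hypothetical helper: the three character-count checks from the post.
function primitiveHtmlChecks(string $html): array
{
    // Check 1: every & should start an entity such as &amp; or &#160;.
    $ampersands = substr_count($html, '&');
    $entities   = preg_match_all('/&(?:[a-zA-Z][a-zA-Z0-9]*|#[0-9]+|#x[0-9a-fA-F]+);/', $html);

    return [
        'ampersands_escaped' => $ampersands === $entities,
        // Check 2: an even number of double quotes.
        'quotes_even'        => substr_count($html, '"') % 2 === 0,
        // Check 3: as many < as >.
        'brackets_balanced'  => substr_count($html, '<') === substr_count($html, '>'),
    ];
}

$good   = primitiveHtmlChecks('<p>Fish &amp; chips</p>');
$bad    = primitiveHtmlChecks('<p>Fish & chips</p>');
// Wrongly nested tags: invalid HTML, yet every count still matches.
$nested = primitiveHtmlChecks('<b><i>text</b></i>');
```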

Anyone know of any methods/classes out there that can interface with the W3C validator and return me a true/false on whether the code is valid or not?
I was thinking about using curl to check a link to the page and scan the returned content for any text like "x Errors:" or something.

There must be a better way, surely?
Does the W3C offer an XML RPC interface? Or API or anything?

Thanks, B
jayshields
DevNet Resident
Posts: 1912
Joined: Mon Aug 22, 2005 12:11 pm
Location: Leeds/Manchester, England

Re: HTML validation

Post by jayshields »

I think HTMLPurifier might fulfill your needs. Check it out.
Mds
Forum Contributor
Posts: 110
Joined: Tue Apr 22, 2008 8:56 pm

Re: HTML validation

Post by Mds »

Well, I think htmlspecialchars will be helpful for you. Check it out.
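
For reference, a small sketch of what htmlspecialchars actually does. It escapes markup rather than checking it, so it only fits if the CMS users are not supposed to enter any tags at all. The sample string is made up for the example.

```php
<?php
// Sample input (made up): markup a CMS user might type.
$input = '<p class="intro">Fish & chips</p>';

// Every <, >, & and quote becomes an entity, so the tags are shown as text.
$escaped = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

echo $escaped, "\n";
// prints: &lt;p class=&quot;intro&quot;&gt;Fish &amp; chips&lt;/p&gt;
```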
Ambush Commander
DevNet Master
Posts: 3698
Joined: Mon Oct 25, 2004 9:29 pm
Location: New Jersey, US

Re: HTML validation

Post by Ambush Commander »

If you don't mind disallowing stuff like forms and scripts, HTML Purifier will do the job perfectly. Otherwise, you'll probably want to look at the following options:
  • Remotely requesting the w3c validator service to figure out if the page is valid
  • Running the input through HTML Tidy and seeing what happens
  • Parsing it with DOM and then running a DTD validation on it
  • Enabling HTML Purifier's trusted mode and seeing if that is good enough
There isn't really a good validator library for PHP yet, unfortunately.
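
As a rough sketch of the DOM option, the snippet below parses a fragment with PHP's DOM extension and collects whatever libxml complains about. It stops short of DTD validation, and `htmlParseErrors` is a hypothetical name. What libxml reports depends on the installed version (older releases, for instance, tend to flag HTML5 elements as unknown tags), so treat the messages as hints rather than a verdict.

```php
<?php
// Hypothetical helper: list the problems libxml reports while parsing a fragment.
function htmlParseErrors(string $fragment): array
{
    // Collect errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $doc = new DOMDocument();
    // Wrap the fragment so the parser sees a complete document.
    $doc->loadHTML(
        '<!DOCTYPE html><html><head><title>t</title></head><body>'
        . $fragment
        . '</body></html>'
    );

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }

    // Restore the previous error-handling mode.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $messages;
}

$clean  = htmlParseErrors('<p>Fish &amp; chips</p>');
$broken = htmlParseErrors('<b><i>text</b></i>'); // contents depend on the libxml version
```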
swiftouch
Forum Commoner
Posts: 80
Joined: Sun Dec 10, 2006 7:35 am
Location: Salt Lake City, Utah

Re: HTML validation

Post by swiftouch »

I applaud you for trying to do this yourself.

I would install tinyMCE or FCKeditor. So much easier in my opinion.
JAB Creations
DevNet Resident
Posts: 2341
Joined: Thu Jan 13, 2005 6:44 pm
Location: Sarasota Florida

Re: HTML validation

Post by JAB Creations »

PHP + cURL + W3C validator. Export the (X)HTML data to a temporary file, create the URL, validate the URL using cURL. It creates a lot less load and you can let the W3C update their validator instead of doing it yourself. :mrgreen:
batfastad
Forum Contributor
Posts: 433
Joined: Tue Mar 30, 2004 4:24 am
Location: London, UK

Re: HTML validation

Post by batfastad »

Yeah, I'm definitely thinking of going for some sort of cURL implementation on this.

Although the page lives on our intranet, there is a script on our website which will also output the page, so I will just use that as the link to give to the validator. Then I'll just scan through the text of the returned page looking for something like [Invalid] or xx Errors
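
A minimal sketch of that plan, with some assumptions: instead of scanning the HTML report for text, it asks the W3C Nu HTML Checker at https://validator.w3.org/nu/ for JSON output (`out=json`) and counts the messages whose type is `error` or `non-document-error`. The function names, the user-agent string and the timeout are made up for the example, and the endpoint and response shape should be checked against the checker's current documentation before relying on them.

```php
<?php
// Assumption: the W3C Nu HTML Checker endpoint and its JSON output format.
const CHECKER_ENDPOINT = 'https://validator.w3.org/nu/';

// Count the messages the checker reports as errors in its JSON response.
function countValidatorErrors(string $json): int
{
    $data = json_decode($json, true);
    if (!is_array($data) || !isset($data['messages']) || !is_array($data['messages'])) {
        throw new RuntimeException('Unexpected validator response');
    }

    $errors = 0;
    foreach ($data['messages'] as $message) {
        // "non-document-error" is assumed to cover cases such as a page that could not be fetched.
        if (in_array($message['type'] ?? '', ['error', 'non-document-error'], true)) {
            $errors++;
        }
    }
    return $errors;
}

// Ask the checker to fetch $pageUrl; true means it reported no errors.
function isPageValid(string $pageUrl): bool
{
    $query = http_build_query(['doc' => $pageUrl, 'out' => 'json']);
    $ch = curl_init(CHECKER_ENDPOINT . '?' . $query);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_USERAGENT      => 'cms-html-check/0.1', // hypothetical identifier
        CURLOPT_TIMEOUT        => 30,
    ]);
    $body   = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);

    if ($body === false || $status !== 200) {
        throw new RuntimeException('Validator request failed');
    }
    return countValidatorErrors($body) === 0;
}
```

Keeping the JSON counting separate from the HTTP call means the parsing can be tested without hitting the validator, and the public service should only be called when a page is saved, not on every page view.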

Thanks for the info, thought I'd check in case there was anything else I was missing

Thanks, B