
Convert value of class in tags to lowercase?

Posted: Sun Aug 07, 2011 10:20 pm
by erika
I would like to convert the value of every class name in an XHTML document to lowercase. I've researched this and found a few examples, but I keep running into memory issues, and I'm not sure they would do exactly what I want anyway. I'm trying to convert a 14,000-line file line by line. A class appears once on almost all of the last 13,000 lines.

Here's an example:

<p class=BadClassName>

There are no quotes around the original in most of the lines--it's a Word HTML file that I didn't create. I add the quotes in later with Tidy, but there doesn't seem to be an option in Tidy to change all classes to lowercase.

I don't want to convert all text within all tags to lowercase, as there may be links to external URLs over which I have no control that are case-sensitive.

Any suggestions would be appreciated.

Re: Convert value of class in tags to lowercase?

Posted: Mon Aug 08, 2011 12:36 am
by McInfo
There is an example on the preg_replace_callback() manual page that does almost what you need. You just need to change the search pattern and write the lines to an output file instead of echoing them.

This script showcases a pattern that might be useful.

Code:

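<?php
header('Content-Type: text/plain');
// Match the value that follows class= (case-insensitive): double-quoted,
// single-quoted, or unquoted. The lookbehind keeps "class=" out of the match,
// so only the value itself is replaced.
$pattern = '~(?<=\bclass=)("[^"]+|\'[^\']+|[^>\s]+)~i';
$subject = '<P CLASS=BIGBADCLASS><P TITLE=""CLASS=BIG ID=NONE><P CLASS="BIG STUFF"><P CLASS=A_B><P>';
// Lowercase the captured class value.
function callback ($matches) {
    return strtolower($matches[1]);
}
echo preg_replace_callback($pattern, 'callback', $subject);
# <P CLASS=bigbadclass><P TITLE=""CLASS=big ID=NONE><P CLASS="big stuff"><P CLASS=a_b><P>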
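
The "write the lines to an output file instead of echoing them" step could look roughly like the sketch below. It reuses the pattern from the script above and reads the input one line at a time, which should keep memory use low for a 14,000-line file. It assumes PHP 7 or later. The function names lowercase_classes_in_line() and lowercase_classes_in_file() are hypothetical, not part of PHP.

```php
<?php
// Hypothetical helper: lowercase the class values found in a single line.
function lowercase_classes_in_line(string $line): string
{
    // Same pattern as in the script above: double-quoted, single-quoted,
    // or unquoted value following class=.
    $pattern = '~(?<=\bclass=)("[^"]+|\'[^\']+|[^>\s]+)~i';
    $result = preg_replace_callback(
        $pattern,
        function (array $matches): string {
            return strtolower($matches[1]);
        },
        $line
    );
    // preg_replace_callback() returns null on error; fall back to the input.
    return $result ?? $line;
}

// Hypothetical helper: stream $inPath to $outPath one line at a time.
function lowercase_classes_in_file(string $inPath, string $outPath): void
{
    $in = fopen($inPath, 'r');
    if ($in === false) {
        throw new RuntimeException("Could not open $inPath for reading.");
    }
    $out = fopen($outPath, 'w');
    if ($out === false) {
        fclose($in);
        throw new RuntimeException("Could not open $outPath for writing.");
    }
    // fgets() returns false at end of file, so only one line is held at a time.
    while (($line = fgets($in)) !== false) {
        fwrite($out, lowercase_classes_in_line($line));
    }
    fclose($in);
    fclose($out);
}
```

Calling lowercase_classes_in_file('input.html', 'output.html') would then write the converted copy; both file names are placeholders. Note that the pattern also matches class= in ordinary text outside of tags, which may or may not matter for a given document.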

Re: Convert value of class in tags to lowercase?

Posted: Mon Aug 08, 2011 10:17 am
by tr0gd0rr
It will be more reliable if you use DOMDocument. Something like this would work great:

Code:

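// $html holds the document's markup as a string.
$doc = new DOMDocument();
$doc->loadHTML($html);
// Visit every element and lowercase its class attribute, if it has one.
foreach ($doc->getElementsByTagName('*') as $element) {
    if ($element->hasAttribute('class')) {
        $element->setAttribute('class', strtolower($element->getAttribute('class')));
    }
}
// Serialize the modified document back to a string.
$html = $doc->saveHTML();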

Re: Convert value of class in tags to lowercase?

Posted: Wed Aug 10, 2011 1:17 pm
by erika
Thank you. :)

McInfo, that worked for me perfectly.

Tr0gd0rr, keeping in mind your point that the previously mentioned solution might be less reliable, I tried yours as well. I had some trouble implementing it. I read the information at the link you included, but I must have missed something; I don't think I defined it right. All of my HTML at this point in my script is in one gigantic string. I tried several variations to use DOMDocument (it has to be implemented as a class first, right?) and kept getting various errors. If my variable name is $tidy (the output from when I ran it through Tidy), what do I need to do before I use the code sample you provided? Is there another line or two I need to have? If so, I can't figure out precisely what it (they) should be.

Thanks!

--Erika
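
On the question above: DOMDocument is a built-in class from PHP's DOM extension, so it does not need to be defined first, only instantiated. A minimal sketch of the full round trip from string to string might look like this, assuming PHP 7 or later, the DOM extension enabled, and a non-empty HTML string. The function name lowercase_classes_with_dom() is hypothetical. Suppressing parser warnings with libxml_use_internal_errors() is an assumption about where the errors came from, since Word-generated markup often triggers them.

```php
<?php
// Hypothetical helper: parse an HTML string, lowercase every class
// attribute, and return the modified markup as a string.
function lowercase_classes_with_dom(string $html): string
{
    // Built-in class; no definition or include is required.
    $doc = new DOMDocument();

    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Visit every element and lowercase its class attribute, if present.
    foreach ($doc->getElementsByTagName('*') as $element) {
        if ($element->hasAttribute('class')) {
            $element->setAttribute('class', strtolower($element->getAttribute('class')));
        }
    }

    // Serialize the modified tree back to a string.
    return $doc->saveHTML();
}
```

With the variable name from the post, usage would be along the lines of $tidy = lowercase_classes_with_dom($tidy); if $tidy is a tidy object rather than a plain string, casting it with (string) first may be needed. saveHTML() writes HTML syntax, so if XHTML output is required, saveXML() may be closer to what is wanted.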

Re: Convert value of class in tags to lowercase?

Posted: Wed Aug 10, 2011 2:17 pm
by erika
I already have Simple HTML DOM Parser working on this file, if that helps... but I just copied and pasted a solution; I don't know how to come up with one myself. :/