How To Use This Function On External File?

Posted: Tue Nov 23, 2010 2:31 am
by abrogard
I found this little function for cleaning Word-generated HTML code and I know a bloke who could really use it.

Unfortunately my php isn't good enough yet for ME to know how to use it:

function cleanHTML($html) {
/// <summary>
/// Removes all FONT and SPAN tags, and all Class and Style attributes.
/// Designed to get rid of non-standard Microsoft Word HTML tags.
/// </summary>
// start by completely removing all unwanted tags

$html = ereg_replace("<(/)?(font|span|del|ins)[^>]*>","",$html);

// then run another pass over the html (twice), removing unwanted attributes

$html = ereg_replace("<([^>]*)(class|lang|style|size|face)=(\"[^\"]*\"|'[^']*'|[^>]+)([^>]*)>","<\\1>",$html);
$html = ereg_replace("<([^>]*)(class|lang|style|size|face)=(\"[^\"]*\"|'[^']*'|[^>]+)([^>]*)>","<\\1>",$html);

// sample word html <p class="aaa" style="background:dot">abc</p> will return <p >abc</p>
}

I assume I put it in a php document like, maybe 'cleanhtml.php'

and then outside of the function declaration and definition I call the function with the file I want to clean as an argument.
Something like: cleanHTML( myfilenameonthehardrive.html )

And save that file, cleanhtml.php, in the same dir as the 'dirty' file I've named.

Then I open it with my browser and it should run and end and the output should be a cleaned file.

Yes?

Something like that?

But exactly how? Because I haven't made it work yet...

:)



And then I

Re: How To Use This Function On External File?

Posted: Tue Nov 23, 2010 9:03 am
by Celauran
I'd pass the full path to the file in as the argument, fopen() it inside the function, read to an array, clean out the cruft, then write back to the file.
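
Roughly, that could look like the sketch below. It's untested against real Word output, and the name cleanFileInPlace() and the $cleaner parameter are made up for illustration; they aren't part of the function you found. It joins the lines back together before cleaning so that tags broken across lines still get matched.

```php
<?php
// Hypothetical helper: cleanFileInPlace() and $cleaner are assumptions
// made for this sketch, not part of the original function.
function cleanFileInPlace($path, $cleaner) {
    // Open the file for reading; give up if it cannot be opened.
    $handle = @fopen($path, 'r');
    if ($handle === false) {
        return false;
    }

    // Read the file line by line into an array.
    $lines = array();
    while (($line = fgets($handle)) !== false) {
        $lines[] = $line;
    }
    fclose($handle);

    // Join the lines before cleaning so tags that span lines still match.
    $clean = call_user_func($cleaner, implode('', $lines));
    if (!is_string($clean)) {
        return false; // the cleaner returned nothing usable, leave the file alone
    }

    // Overwrite the original file with the cleaned markup.
    return file_put_contents($path, $clean) !== false;
}

// Usage, with a placeholder path:
// cleanFileInPlace('/path/to/myfilenameonthehardrive.html', 'cleanHTML');
```

Whatever you pass as the cleaner has to return the cleaned string, otherwise the helper returns false and the file is left untouched. Since this overwrites the original, keep a backup copy while you're trying it out.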

Re: How To Use This Function On External File?

Posted: Tue Nov 23, 2010 11:17 am
by AbraCadaver
I haven't tested it, but first, the ereg functions are deprecated as of PHP 5.3, so convert them to preg functions, which need delimiters around the pattern (# here). Next, the function has to return $html so you can use the result. Assuming the webserver has permission to write to the directory:

Code:

function cleanHTML($html) {
   $html = preg_replace("#<(/)?(font|span|del|ins)[^>]*>#","",$html);
   $html = preg_replace("#<([^>]*)(class|lang|style|size|face)=(\"[^\"]*\"|'[^']*'|[^>]+)([^>]*)>#","<\\1>",$html);
   $html = preg_replace("#<([^>]*)(class|lang|style|size|face)=(\"[^\"]*\"|'[^']*'|[^>]+)([^>]*)>#","<\\1>",$html);

   return $html;
}
//get html
$dirty = file_get_contents('/path/to/myfilenameonthehardrive.html');
//clean it
$clean = cleanHTML($dirty);
//save to new file
file_put_contents('/path/to/clean_myfilenameonthehardrive.html', $clean);
//display it
echo $clean;
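
If you want to see what the function does before pointing it at a real file, you can run it on a string first. The sketch below repeats the function above only so that it runs on its own. The first sample is the one from the comment in the original function; the second is a made-up Word-style fragment, so treat it as an assumption about what your file contains.

```php
<?php
// Same three preg_replace() passes as the function above, repeated here
// only so this snippet is self-contained.
function cleanHTML($html) {
    $html = preg_replace("#<(/)?(font|span|del|ins)[^>]*>#", "", $html);
    $html = preg_replace("#<([^>]*)(class|lang|style|size|face)=(\"[^\"]*\"|'[^']*'|[^>]+)([^>]*)>#", "<\\1>", $html);
    $html = preg_replace("#<([^>]*)(class|lang|style|size|face)=(\"[^\"]*\"|'[^']*'|[^>]+)([^>]*)>#", "<\\1>", $html);

    return $html;
}

// Sample markup from the comment in the original function.
$sample = '<p class="aaa" style="background:dot">abc</p>';
echo cleanHTML($sample), "\n"; // prints <p >abc</p>

// Hypothetical Word-style fragment with unquoted and single-quoted attributes.
$word = "<p class=MsoNormal><span lang=EN-US style='font-size:12.0pt'>Hello</span></p>";
echo cleanHTML($word), "\n"; // prints <p >Hello</p>
```

Each attribute pass removes one attribute per tag and also drops whatever follows it inside that tag, which is why the pass runs twice. The text between the tags is left alone.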