Hello everyone, I'm looking to write some regex/PHP to strip down some HTML that I've fetched with cURL from another site and that appears to have been written in MS Word. Here's a little bit of what I'm getting:
<p class=MsoNormal><b><u>Driver regulations and Safety:<o:p></o:p></u></b></p>
<ol style='margin-top:0in' start=1 type=1>
<li class=MsoNormal style='mso-list:l17 level1 lfo19;tab-stops:list .5in'>Must
be 16 years of age (NO EXCEPTIONS!!)</li>
You can see that there are three things I would like to take out. First, I would like to get rid of the <o:p> tags; I don't even know what those are. Second, I'd like to get rid of all class definitions, and third, the style definitions.
Can this be easily done?
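For reference, here is a minimal sketch of how the three removals could be attempted with PHP's preg_replace. The function name clean_word_html and the $sample string are hypothetical, made up for illustration. It assumes the attribute values never contain a ">" character and that the page text itself never contains something like " class=foo" outside a tag; regex is brittle on arbitrary HTML, so a DOM parser may be the safer route for messier input.

```php
<?php
// Hypothetical helper (not from any library): strips Word-specific markup.
function clean_word_html(string $html): string
{
    // 1. Remove <o:p> and </o:p> tags, keeping any text between them.
    $html = preg_replace('~</?o:p\b[^>]*>~i', '', $html);

    // 2 and 3. Remove class and style attributes, whether the value is
    // double-quoted, single-quoted, or unquoted (the sample uses the last two).
    $html = preg_replace(
        '~\s+(?:class|style)\s*=\s*(?:"[^"]*"|\'[^\']*\'|[^\s>]+)~i',
        '',
        $html
    );

    return $html;
}

// Sample input adapted from the snippet in the post.
$sample = "<p class=MsoNormal><b><u>Driver regulations and Safety:<o:p></o:p></u></b></p>\n"
    . "<ol style='margin-top:0in' start=1 type=1>\n"
    . "<li class=MsoNormal style='mso-list:l17 level1 lfo19;tab-stops:list .5in'>Must\n"
    . "be 16 years of age (NO EXCEPTIONS!!)</li>";

echo clean_word_html($sample), "\n";
```

Other attributes, such as start=1 and type=1 on the <ol>, are left alone, since only class and style are targeted.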