Strip bad MS Word tags from string?
Posted: Wed Feb 25, 2009 5:30 am
Everything works in my CMS, but there is a problem when the client copies topic content from his MS Word documents into the rich-text area (I'm using TinyMCE). The resulting posts contain Microsoft Word-specific tags.
These tags break the page layout in Internet Explorer 6 and 7 (ugh...). Firefox seems to ignore them gracefully.
TinyMCE has a button for pasting from an MS Word document that removes all these tags, but the client usually forgets to use it, which leaves the CMS pages broken. Is there a server-side way to remove those tags with PHP while keeping the rest of the rich HTML content?
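One possible server-side approach, sketched here in Python rather than PHP: instead of hunting down every Word-specific construct, parse the HTML and rebuild it from an allow-list of tags and attributes, dropping everything else. HTML comments (which is where Word's `<!--[if gte mso 9]>` blocks live) are simply never re-emitted, and the text inside `<style>` and `<xml>` is discarded. `ALLOWED_TAGS`, `WordCleaner` and `clean_word_html` are hypothetical names chosen for illustration, and the lists are assumptions to tune to what the CMS actually needs. PHP's built-in `strip_tags()` also accepts a list of allowed tags and is a rough analogue, although it keeps every attribute on the tags it allows.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allow-list: adjust to whatever markup the CMS should accept.
ALLOWED_TAGS = {"p", "br", "b", "strong", "i", "em", "u", "a",
                "ul", "ol", "li", "h1", "h2", "h3",
                "table", "tr", "td", "th"}
ALLOWED_ATTRS = {"a": {"href"}}          # every other attribute is dropped
DROP_CONTENT = {"style", "script", "xml", "mce:style"}  # drop tag and text
VOID_TAGS = {"br"}                        # never emit a closing tag for these


class WordCleaner(HTMLParser):
    """Rebuilds HTML keeping only allow-listed tags and attributes.

    Comments (including Word's conditional comments) are discarded because
    handle_comment is not overridden, so nothing is emitted for them.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a DROP_CONTENT element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return
        keep = ALLOWED_ATTRS.get(tag, set())
        kept = "".join(f' {name}="{escape(value or "", quote=True)}"'
                       for name, value in attrs if name in keep)
        self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag not in ALLOWED_TAGS or tag in VOID_TAGS:
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            # convert_charrefs=True decoded entities, so re-escape the text.
            self.out.append(escape(data, quote=False))


def clean_word_html(html: str) -> str:
    parser = WordCleaner()
    parser.feed(html)
    parser.close()
    return "".join(parser.out).strip()
```

This keeps the structure of the pasted text (paragraphs, bold, lists, links) while discarding `class="MsoNormal"`, inline `style` attributes, `<meta>`/`<link>` tags and comments. It is not a complete XSS filter: for example, `href` values are not checked for `javascript:` URLs.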
Here is an example of the markup that ends up in a post:
<meta http-equiv="\"Content-Type\"" content="\"text/html; charset=utf-8\"" />
<meta name="\"ProgId\"" content="\"Word.Document\"" />
<meta name="\"Generator\"" content="\"Microsoft" />
<meta name="\"Originator\"" content="\"Microsoft" />
<link rel="\"File-List\"" href="\" />
<!--[if gte mso 9]>
<xml> Normal 0 false false false MicrosoftInternetExplorer4 </xml><![endif]--><!--[if gte mso 9]>
<xml> </xml><![endif]-->
<style><!--
--></style>
<!--[if gte mso 10]>
<mce:style><! /* Style Definitions */ table.MsoNormalTable {mso-style-name:\"Table Normal\"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-parent:\"\"; mso-padding-alt:0cm 5.4pt 0cm 5.4pt; mso-para-margin:0cm; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:10.0pt; font-family:\"Times New Roman\"; mso-ansi-language:#0400; mso-fareast-language:#0400; mso-bidi-language:#0400;} --> <!--[endif]-->
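For the specific markup above, a narrower blacklist approach is another option: remove only the constructs Word adds and leave everything else untouched. Below is a minimal Python sketch, assuming these patterns cover what the client's Word version produces. They are derived only from this sample, and `WORD_PATTERNS` and `strip_word_markup` are hypothetical names. Regular expressions are a fragile way to process HTML, so this is best treated as a stopgap.

```python
import re

# Blacklist of Word-specific constructs, based on the pasted sample.
# Order matters: comments go first so their contents never reach later rules.
WORD_PATTERNS = [
    # Any HTML comment, which covers conditional comments such as
    # <!--[if gte mso 9]> ... <![endif]--> and <!--[endif]-->.
    re.compile(r"<!--.*?-->", re.S),
    # <meta ...> and <link ...> tags, with or without a trailing slash.
    re.compile(r"<(?:meta|link)\b[^>]*>", re.I),
    # Whole <style>, <xml> and <mce:style> elements, content included.
    re.compile(r"<(style|xml|mce:style)\b[^>]*>.*?</\1>", re.I | re.S),
    # class="MsoNormal" or class=MsoNormal (single Word class only;
    # a multi-class value is left alone rather than half-removed).
    re.compile(r'\s+class\s*=\s*("?)Mso\w*\1(?=[\s>/])', re.I),
    # mso-* declarations inside inline style attributes.
    re.compile(r"mso-[\w-]+\s*:[^;\"']*;?", re.I),
    # style="" attributes left empty by the previous rule.
    re.compile(r'\s+style\s*=\s*"\s*"', re.I),
]


def strip_word_markup(html: str) -> str:
    """Apply each blacklist pattern in order and trim surrounding whitespace."""
    for pattern in WORD_PATTERNS:
        html = pattern.sub("", html)
    return html.strip()
```

In PHP, the same patterns could be passed as an array to `preg_replace()` with an empty replacement string, adding delimiters and the `i`/`s` modifiers (for example `#<!--.*?-->#s`). The `\"` sequences in the sample look like backslash-escaped quotes (possibly added by `magic_quotes_gpc`); if that is the case, running `stripslashes()` on the input first lets the quote-aware rules match.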