Trying to strip MSWord tags

Sindarin
Forum Regular
Posts: 521
Joined: Tue Sep 25, 2007 8:36 am
Location: Greece

Trying to strip MSWord tags

Post by Sindarin »

I am trying to write a function that detects and removes the ugly tags MS Word leaves behind when content is copy-pasted into my rich text field:

Code: Select all

<?php

/* DETECT AND REMOVE MSTAGS */

function strip_mstags($str)
{
    // Literal fragments that MS Word leaves behind.
    // str_replace() processes the array in order, so longer fragments
    // must stay ahead of their shorter prefixes (e.g. '<!--[if').
    $mstags = array(
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />',
        '<meta name="ProgId" content="Word.Document" />',
        '<meta name="Generator" content="Microsoft Word 11" />',
        '<meta name="Originator" content="Microsoft Word 11" />',
        '<!--[if gte mso 9]><xml>',
        '</xml><![endif]-->',
        '<!--[if gte mso 10]>',
        '<mce:style>',
        '<p class="MsoNormal">',
        '<o:p>',
        '</o:p>',
        '<link rel="File-List" href="',
        '<!--[if',
        '<![endif]-->',
        '<w:WordDocument>',
    );

    // Every fragment is swapped for the same marker comment.
    return str_replace($mstags, '<!--stripped-->', $str);
}

?>
Can someone suggest a better way to do this, point me to a list of MS-specific tags, and show me how to remove the entire contents of a block like this:

Code: Select all

 
<!--
 /* Style Definitions */
 p.MsoNormal, li.MsoNormal, div.MsoNormal
    {mso-style-parent:"";
    margin:0cm;
    margin-bottom:.0001pt;
    mso-pagination:widow-orphan;
    font-size:12.0pt;
    font-family:"Times New Roman";
    mso-fareast-font-family:"Times New Roman";}
a:link, span.MsoHyperlink
    {color:blue;
    text-decoration:underline;
    text-underline:single;}
a:visited, span.MsoHyperlinkFollowed
    {color:purple;
    text-decoration:underline;
    text-underline:single;}
@page Section1
    {size:612.0pt 792.0pt;
    margin:72.0pt 90.0pt 72.0pt 90.0pt;
    mso-header-margin:36.0pt;
    mso-footer-margin:36.0pt;
    mso-paper-source:0;}
div.Section1
    {page:Section1;}
-->
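
One possible way to remove a block like the one above together with its contents is a non-greedy regular expression with the `s` modifier, so that `.` also matches newlines. The following is only a minimal sketch: the function name `strip_msblocks` is hypothetical, and it assumes the pasted markup never contains comments or style blocks that should be kept. It is regex-based and not a full HTML parser.

```php
<?php

// Hypothetical helper (not from the thread): removes whole blocks,
// including everything between the opening and closing delimiters.
function strip_msblocks($str)
{
    $patterns = array(
        // <style>...</style> and <mce:style>...</mce:style>, contents included
        '/<(?:mce:)?style\b[^>]*>.*?<\/(?:mce:)?style>/is',
        // Conditional comments: <!--[if ...]> ... <![endif]-->
        '/<!--\[if\b.*?<!\[endif\]-->/is',
        // Any remaining HTML comment, such as the "Style Definitions" block
        '/<!--.*?-->/s',
    );

    // Patterns are applied in order; each match is replaced with nothing.
    return preg_replace($patterns, '', $str);
}
```

The order matters in this sketch: style blocks go first so that a comment nested inside them cannot cut a conditional comment short, and the generic comment pattern runs last to catch whatever is left.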
 
mattpointblank
Forum Contributor
Posts: 304
Joined: Tue Dec 23, 2008 6:29 am

Re: Trying to strip MSWord tags

Post by mattpointblank »

This is a tricky one. I use TinyMCE for rich text editing online, and it has functions to strip out bad code like the above. Have you tried it?
Sindarin
Forum Regular
Posts: 521
Joined: Tue Sep 25, 2007 8:36 am
Location: Greece

Re: Trying to strip MSWord tags

Post by Sindarin »

I have, and it works; however, my clients are not savvy enough to use the Paste from Word button every time. That's why I want to handle this in PHP.
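
Since the cleanup has to run on the server whether or not the button was used, one option is to match Word's tags by pattern instead of listing each one literally. The sketch below is an illustration, not a tested production filter: the function name `strip_mstags_generic` is hypothetical, and it assumes the field never legitimately contains namespaced tags (such as `<o:p>`), `<meta>` or `<link>` tags, or `Mso*` class names.

```php
<?php

// Hypothetical helper (not from the thread): drops Word-specific tags and
// attributes but keeps the text between them.
function strip_mstags_generic($str)
{
    $patterns = array(
        // Any tag with a namespace prefix: <o:p>, </o:p>, <w:WordDocument>, <mce:style>
        '/<\/?[a-z][a-z0-9]*:[^>]*>/i',
        // <meta ...> and <link ...> tags, which do not belong in a pasted fragment
        '/<(?:meta|link)\b[^>]*>/i',
        // class="MsoNormal" and other Mso* class attributes
        '/\sclass="?Mso[a-z0-9]*"?/i',
    );

    // Each match is removed outright; surrounding text is left untouched.
    return preg_replace($patterns, '', $str);
}
```

Unlike the literal list, this removes opening and closing tags alike and turns `<p class="MsoNormal">` into a plain `<p>`, so the matching `</p>` is not left dangling. The class pattern is a plain text match, so it would also hit the same character sequence outside a tag.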