
replace <p> by \\n\\n

Posted: Mon May 10, 2010 3:18 am
by lclqt12
Hi all,

I have a string: str = "abc<p> cde</p>, <p>fasd";
The expected result is: "abc\n\n cde, \n\nfasd"

How can I do that (using a regular expression)?

Re: replace <p> by \\n\\n

Posted: Mon May 10, 2010 3:25 am
by garygay
hi,

// str_replace() matches plain substrings, so no regex delimiters; "\n" needs double quotes
$str = str_replace('<p>', "\n\n", $str);
$str = str_replace('</p>', '', $str);
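
A note on quoting, as a minimal sketch rather than a definitive answer: PHP expands \n into a linefeed only inside double quotes, and str_replace() looks for plain substrings, not /regex/ patterns. The variable names below are only for illustration.

```php
<?php
// Single quotes keep the backslash: two characters, '\' and 'n'.
$single = '\n';
// Double quotes expand the escape: one linefeed character.
$double = "\n";

var_dump(strlen($single)); // int(2)
var_dump(strlen($double)); // int(1)

// str_replace() takes the search text literally, so '<p>' is written without delimiters.
$str = 'abc<p> cde</p>, <p>fasd';
$out = str_replace('<p>', "\n\n", $str);
// $out still contains the closing </p>, which needs its own replacement.
```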

Re: replace <p> by \\n\\n

Posted: Mon May 10, 2010 4:19 am
by lclqt12
Thank you for your support.
But as I said, in my text the tag is not always a plain <p>. It could be <p font ....> or <p margin ... >
Therefore, I wonder how I could replace every <p ....> with \n\n

Re: replace <p> by \\n\\n

Posted: Mon May 10, 2010 9:20 am
by AbraCadaver

Code:

$str = preg_replace('#</?p[^>]*>#', "\n\n", $str);

Re: replace <p> by \\n\\n

Posted: Mon May 10, 2010 9:44 am
by ridgerunner
AbraCadaver, your regex has a problem: It replaces the end of paragraph tag </p> with \n\n (only the opening <p> should have this substitution). Also, you should probably specify the "i" ignore-case modifier.
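
To see the difference concretely, here is a minimal sketch that runs that one-pattern regex on the sample string from the first post. The variable name $both is just for illustration.

```php
<?php
$str = 'abc<p> cde</p>, <p>fasd';

// One pattern for both <p ...> and </p>: every match becomes "\n\n".
$both = preg_replace('#</?p[^>]*>#', "\n\n", $str);

// The closing tag was turned into a blank line as well,
// so the result differs from what the original post asked for.
var_dump($both === "abc\n\n cde\n\n, \n\nfasd"); // bool(true)
var_dump($both === "abc\n\n cde, \n\nfasd");     // bool(false)
```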

The opening and closing tags need to be handled separately like so:

Code:

// replace <p ...> opening tags with double linefeeds
// (\b keeps <pre>, <param> etc. from matching; "\n" needs double quotes)
$str = preg_replace('/<p\b[^>]*+>/i', "\n\n", $str);
// strip </p> closing tags, ignoring case
$str = str_ireplace('</p>', '', $str);
If you're curious, here is a more complex (but commented) regex which does it in one step:

Code:

$str = preg_replace('%
    <p\b[^>]*>      # match an opening paragraph tag with attributes
    (               # begin group 1 to capture paragraph contents
      [^<]*+        # consume everything up to next < tag start char
      (?:           # begin unrolling the loop...
        (?!</?p\b)  # at a position that is not a paragraph tag
        <           # match the beginning of a non-p tag
        [^<]*+      # consume everything up to next < tag start char
      )*+           # repeat the loop as many times as required
    )               # end group 1 capturing paragraph contents
    (?=</?p\b|$)    # stop matching on <p*>, </p> or end of string
    (?:</p>)?       # if there is a closing </p> match and discard it
    %ix', "\n\n$1", $str);
Note that neither of these solutions gets rid of other embedded HTML tags that may be between the <p> tags.
For example: "abc<p> <em>cde</em></p>, <p>fasd"
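
As a rough sketch of what that means in practice, the snippet below runs the two-step approach on that example string and then, as one possible follow-up that nobody in this thread proposed, passes the result through PHP's built-in strip_tags(). Whether removing every remaining tag is acceptable depends on your data, so treat that last step as an assumption.

```php
<?php
$str = 'abc<p> <em>cde</em></p>, <p>fasd';

// Two-step approach: opening <p ...> tags become blank lines, closing tags are dropped.
$out = preg_replace('/<p\b[^>]*+>/i', "\n\n", $str);
$out = str_ireplace('</p>', '', $out);
// The <em> pair is untouched: "abc\n\n <em>cde</em>, \n\nfasd"

// Assumption: all other markup may be discarded as well.
$plain = strip_tags($out);
// "abc\n\n cde, \n\nfasd"
```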

Re: replace <p> by \\n\\n

Posted: Mon May 10, 2010 10:01 am
by AbraCadaver
Ahh yes, I misread the original post and thought the closing tag should be replaced too. I didn't notice the second opening tag.