Strip double <br>'s?

Any questions involving matching text strings to patterns - the pattern is called a "regular expression."

Moderator: General Moderators

ykarmi
Forum Commoner
Posts: 35
Joined: Mon Oct 30, 2006 4:45 pm

Strip double <br>'s?

Post by ykarmi »

Hey Guys,
I have a document with different lines.
I have double line breaks (<br>) between some lines and just one line break between others, like this:

Code: Select all

word<br>
word<br><br>
hello<br>
anotherword<br><br><br>
goodword<br>
I'm trying to write a regex syntax to leave me with only one <br> per line but I can't figure out what that would be.
Could you please help?
I know this is pretty basic but I just heard of regex for the first time yesterday (I read most of this website - http://www.regular-expressions.info about it though)
Thank you!
Yuval

Edit: Tried

Code: Select all

$content = preg_replace('# +#',' ',$content);
but no luck... (space -plus sign- replaced with just one space)
Benjamin
Site Administrator
Posts: 6935
Joined: Sun May 19, 2002 10:24 pm

Post by Benjamin »

untested..

Code: Select all

while (preg_match('#<br\s{0,}/{0,}>\s{0,}<br\s{0,}/{0,}>#i', $content))
{
    $content = preg_replace('#<br\s{0,}/{0,}>\s{0,}<br\s{0,}/{0,}>#i', '<br />', $content);
}
Ollie Saunders
DevNet Master
Posts: 3179
Joined: Tue May 24, 2005 6:01 pm
Location: UK

Post by Ollie Saunders »

I'm sure you know this, astions, but {0,} can be written as *.
I would write it like this:

Code: Select all

$pattern = '#(<br\s*/?>\s*){2,}#i';
while (preg_match($pattern, $content)) {
    $content = preg_replace($pattern, '<br />', $content);
}
superdezign
DevNet Master
Posts: 4135
Joined: Sat Jan 20, 2007 11:06 pm

Post by superdezign »

Why bother with the preg_match call? It's unnecessary. preg_replace will replace all occurrences.
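A minimal sketch of that point, reusing the pattern Ollie posted above: the {2,} quantifier consumes a whole run of breaks in one match, and a single preg_replace call handles every run in the string, so no loop is needed. The helper name collapse_breaks and the sample string are only illustrative.

```php
<?php
// Hypothetical helper name; the pattern is the one posted earlier in the thread.
function collapse_breaks(string $content): string
{
    // {2,} matches an entire run of two or more breaks at once,
    // and preg_replace replaces every such run in a single call.
    return preg_replace('#(<br\s*/?>\s*){2,}#i', '<br />', $content);
}

// Made-up sample: one single break and one run of four.
$sample = "word<br>hello<br><br><br><br>goodword";
echo collapse_breaks($sample), "\n";
```

Running the function a second time on its own output should leave the string unchanged, which is a quick way to confirm that the loop adds nothing.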
GeertDD
Forum Contributor
Posts: 274
Joined: Sun Oct 22, 2006 1:47 am
Location: Belgium

Post by GeertDD »

Does it get simpler than this?

Code: Select all

preg_replace('/(?:<br>){2,}/', '<br>', $content);
Note that I'm not bothering about uppercased tags or xhtml breaks. You just have to know the context, and since your example text only shows <br>s, there is no point in making the regex match anything else.

Also note that if you put \s* after the <br> in your regex, as in ole's snippet, you're stripping out all newlines and will end up with one long string.
superdezign
DevNet Master
Posts: 4135
Joined: Sat Jan 20, 2007 11:06 pm

Post by superdezign »

GeertDD wrote:Note that I'm not bothering about uppercased tags or xhtml breaks.
Why not? PCRE is powerful enough that there's no reason not to.
GeertDD wrote:Also note that if you put \s* after the <br> in your regex, as in ole's snippet, you're stripping out all newlines and will end up with one long string.
You're correct. That could be easily solved by adding a '?' quantifier after the '*' quantifier.
Although, it wouldn't make a difference to the HTML display.
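A small sketch of the difference, assuming the '?' goes on the \s* that follows the tag (not the one inside it). The sample string is made up. With the greedy form, the newline after a collapsed run is part of the match and disappears; with the lazy form, whitespace is consumed only when it is needed to reach another <br>.

```php
<?php
// Made-up sample: a double break followed by a newline.
$content = "word<br><br>\nhello<br>";

// Greedy \s*: the newline after the run is swallowed by the match.
$greedy = preg_replace('#(<br\s*/?>\s*){2,}#i', '<br />', $content);

// Lazy \s*?: the trailing newline is left in place.
$lazy = preg_replace('#(<br\s*/?>\s*?){2,}#i', '<br />', $content);

// json_encode makes the newlines visible when printed.
echo json_encode($greedy), "\n";
echo json_encode($lazy), "\n";
```

The lazy form still collapses breaks that sit on separate lines, since the whitespace between them has to be consumed for the second <br> to match.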
GeertDD
Forum Contributor
Posts: 274
Joined: Sun Oct 22, 2006 1:47 am
Location: Belgium

Post by GeertDD »

superdezign wrote:
GeertDD wrote:Note that I'm not bothering about uppercased tags or xhtml breaks.
Why not? PCRE is powerful enough that there's no reason not to.
Of course PCRE is powerful enough, but is that a reason to start implementing stuff that you know you don't need? If you're working with html files, there's no need for xhtml compatible regex. All I'm trying to say is to know your data and keep things simple. Saves time as well.
GeertDD wrote:Also note that if you put \s* after the <br> in your regex, as in ole's snippet, you're stripping out all newlines and will end up with one long string.
You're correct. That could be easily solved by adding a '?' quantifier after the '*' quantifier.
Although, it wouldn't make a difference to the HTML display.
Nice fix, \s*? does the job indeed. No difference to html display, but I wouldn't like to start editing a single line html file. :wink:
Ollie Saunders
DevNet Master
Posts: 3179
Joined: Tue May 24, 2005 6:01 pm
Location: UK

Post by Ollie Saunders »

GeertDD wrote:Of course PCRE is powerful enough, but is that a reason to start implementing stuff that you know you don't need? If you're working with html files, there's no need for xhtml compatible regex. All I'm trying to say is to know your data and keep things simple. Saves time as well.
Whilst I agree that over-engineering is a bad thing, how can you say that you know you won't need it? What if you want to use it to process someone else's HTML? Also, if I'm writing something that will only work within certain parameters, I'll have to document that in the code, and sometimes this is more complicated than actually beefing up the code itself. Personally, I write my breaks as <br />, so there's always a chance I could add extra whitespace or omit the slash. If I can write something that copes with that without too much effort or code bloat, I'm going to.
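As a rough illustration of how little extra pattern that takes, here is a sketch that checks which spellings of the tag the <br\s*/?> fragment with the i flag accepts. The helper name is_plain_break and the list of variants are assumptions made for the example.

```php
<?php
// Hypothetical helper: true if $tag is a single break tag without attributes.
function is_plain_break(string $tag): bool
{
    // Anchored so the whole string has to be exactly one tag.
    return preg_match('#^<br\s*/?>$#i', $tag) === 1;
}

// Variants chosen for illustration; extend the list to suit your own markup.
$variants = ['<br>', '<br/>', '<br />', '<BR>', '<br   />', '<br style="clear:both">'];

foreach ($variants as $tag) {
    printf("%-26s %s\n", $tag, is_plain_break($tag) ? 'matches' : 'no match');
}
```

Breaks that carry attributes are not covered by this fragment, which is the trade-off against a broader pattern such as <br[^>]*>.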
GeertDD
Forum Contributor
Posts: 274
Joined: Sun Oct 22, 2006 1:47 am
Location: Belgium

Post by GeertDD »

Sure, as I said it all depends on the context of your data. I don't know where ykarmi's html is coming from, of course. Anyway, I've made my point and I get yours.
mrkite
Forum Contributor
Posts: 104
Joined: Tue Sep 11, 2007 4:19 am

Post by mrkite »

GeertDD wrote:Does it get simpler than this?

Code: Select all

preg_replace('/(?:<br>){2,}/', '<br>', $content);

Code: Select all

preg_replace('/(?:(<br[^>]*>)\s*){2,}/',"\\1\n",$content);
Not only gets <br /> and <br> but also preserves your choice.
GeertDD
Forum Contributor
Posts: 274
Joined: Sun Oct 22, 2006 1:47 am
Location: Belgium

Post by GeertDD »

mrkite wrote:

Code: Select all

preg_replace('/(?:(<br[^>]*>)\s*){2,}/',"\\1\n",$content);
Not only gets <br /> and <br> but also preserves your choice.
Okay, but what if your text contained <br style="clear:both">? ;-)
superdezign
DevNet Master
Posts: 4135
Joined: Sat Jan 20, 2007 11:06 pm

Post by superdezign »

GeertDD wrote:Okay, but what if your text contained <br style="clear:both">? ;-)
Is that comment in reference to the '\\1' replacement?
Jenk
DevNet Master
Posts: 3587
Joined: Mon Sep 19, 2005 6:24 am
Location: London

Post by Jenk »

GeertDD wrote:
mrkite wrote:

Code: Select all

preg_replace('/(?:(<br[^>]*>)\s*){2,}/',"\\1\n",$content);
Not only gets <br /> and <br> but also preserves your choice.
Okay, but what if your text contained <br style="clear:both">? ;-)
Then it will replace double entries of <br style="clear:both"> with one <br style="clear:both">.
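A quick sketch to check this, wrapping mrkite's pattern in a hypothetical helper called collapse_keep_tag. One detail worth knowing: when a capturing group sits inside a repeated group, \1 holds whatever the last repetition captured, so a run of mixed tags is replaced by the last tag in the run.

```php
<?php
// Hypothetical wrapper around mrkite's pattern from the posts above.
function collapse_keep_tag(string $content): string
{
    // \1 refers to the text captured by the final repetition of the group.
    return preg_replace('/(?:(<br[^>]*>)\s*){2,}/', "\\1\n", $content);
}

// Identical tags with attributes: one copy survives.
$same = collapse_keep_tag('a<br style="clear:both"><br style="clear:both">b');

// Mixed tags: the last one in the run is kept.
$mixed = collapse_keep_tag('a<br /><br>b');

// json_encode makes the appended newline visible.
echo json_encode($same), "\n";
echo json_encode($mixed), "\n";
```

Since this pattern has no i flag, uppercase <BR> tags are left alone unless the flag is added.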