Strip double <br>'s?
Posted: Sat Aug 11, 2007 10:48 pm
by ykarmi
Hey Guys,
I have a document with different lines.
Some lines have double line breaks (<br>) between them and some have just one line break, like this:
Code:
word<br>
word<br><br>
hello<br>
anotherword<br><br><br>
goodword<br>
I'm trying to write a regex that leaves me with only one <br> per line, but I can't figure out what it would be.
Could you please help?
I know this is pretty basic, but I just heard of regex for the first time yesterday (I read most of this website about it, though: http://www.regular-expressions.info)
Thank you!
Yuval
Edit: Tried
Code:
$content = preg_replace('# +#',' ',$content);
but no luck... (space -plus sign- replaced with just one space)
Posted: Sat Aug 11, 2007 11:39 pm
by Benjamin
untested..
Code:
while (preg_match('#<br\s{0,}/{0,}>\s{0,}<br\s{0,}/{0,}>#i', $content))
{
    $content = preg_replace('#<br\s{0,}/{0,}>\s{0,}<br\s{0,}/{0,}>#i', '<br />', $content);
}
Posted: Sun Aug 12, 2007 3:20 am
by Ollie Saunders
I'm sure you know this, astions, but {0,} can be written as *.
I would write it like this:
Code:
$pattern = '#(<br\s*/?>\s*){2,}#i';
while (preg_match($pattern, $content)) {
    $content = preg_replace($pattern, '<br />', $content);
}
Posted: Sun Aug 12, 2007 8:42 am
by superdezign
Why bother with the preg_match call? It's unnecessary. preg_replace will replace all occurrences.
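A minimal sketch to check that claim, assuming a sample string modelled on the original post (the string and the variable names $once and $looped are made up for illustration; the pattern is the one from Ollie Saunders' post):

```php
<?php
// Illustrative input, modelled on the example in the first post.
$content = "word<br>\nword<br><br>\nhello<br>\nanotherword<br><br><br>\ngoodword<br>\n";

// Pattern from the post above: a run of two or more breaks.
$pattern = '#(<br\s*/?>\s*){2,}#i';

// Single call: preg_replace() replaces every match in the subject.
$once = preg_replace($pattern, '<br />', $content);

// Loop version, for comparison.
$looped = $content;
while (preg_match($pattern, $looped)) {
    $looped = preg_replace($pattern, '<br />', $looped);
}

var_dump($once === $looped); // bool(true) for this input
```

Because {2,} is greedy and swallows the whole run in one match, a second pass finds nothing left to collapse on this input, so the loop body runs only once.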
Posted: Mon Aug 13, 2007 10:55 am
by GeertDD
Does it get simpler than this?
Code:
preg_replace('/(?:<br>){2,}/', '<br>', $content);
Note that I'm not bothering about uppercased tags or xhtml breaks. You just have to know the context, and since your example text only shows <br>s, there is no point in making the regex match anything else.
Also note that if you put \s* after the <br> in your regex, as in ole's snippet, you're stripping out all newlines and will end up with one long string.
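To see the newline effect concretely, here is a small sketch comparing the two patterns. The input string and the names $plain and $greedy are assumptions for illustration only:

```php
<?php
// Illustrative input: one single break, one double break, one single break.
$content = "word<br>\nword<br><br>\nhello<br>\n";

// Plain pattern: only back-to-back <br> tags are matched, newlines are untouched.
$plain = preg_replace('/(?:<br>){2,}/', '<br>', $content);
// $plain === "word<br>\nword<br>\nhello<br>\n"

// Pattern with a greedy \s* after each tag: the newline that follows
// the collapsed run is part of the match, so it disappears too.
$greedy = preg_replace('#(<br\s*/?>\s*){2,}#i', '<br />', $content);
// $greedy === "word<br>\nword<br />hello<br>\n"
```

On this input only the newline directly after the collapsed run is lost. Text with a newline between every pair of breaks would lose more of them, which is the "one long string" scenario.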
Posted: Mon Aug 13, 2007 11:08 am
by superdezign
GeertDD wrote:Note that I'm not bothering about uppercased tags or xhtml breaks.
Why not? PCRE is powerful enough that there's no reason not to.
GeertDD wrote:Also note that if you put \s* after the <br> in your regex, as in ole's snippet, you're stripping out all newlines and will end up with one long string.
You're correct. That could be easily solved by adding a '?' quantifier after the '*' quantifier.
Although, it wouldn't make a difference to the HTML display.
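A rough sketch of the lazy-quantifier variant next to the greedy one, assuming a made-up input that also contains breaks separated by a newline (the names $greedy and $lazy are illustrative):

```php
<?php
// Illustrative input: a double break, then two breaks split by a newline.
$content = "word<br>\nword<br><br>\nhello<br>\n<br>\nbye";

// Greedy \s*: whitespace after the last break of a run is consumed.
$greedy = preg_replace('#(<br\s*/?>\s*){2,}#i', '<br />', $content);
// $greedy === "word<br>\nword<br />hello<br />bye"

// Lazy \s*?: whitespace is only consumed when it is needed to reach
// the next break, so the newline after the run survives.
$lazy = preg_replace('#(<br\s*/?>\s*?){2,}#i', '<br />', $content);
// $lazy === "word<br>\nword<br />\nhello<br />\nbye"
```

Note that the lazy version still removes whitespace between the breaks inside a run; it only leaves the trailing whitespace alone.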
Posted: Wed Aug 15, 2007 3:15 am
by GeertDD
superdezign wrote:GeertDD wrote:Note that I'm not bothering about uppercased tags or xhtml breaks.
Why not? PCRE is powerful enough that there's no reason not to.
Of course PCRE is powerful enough, but is that a reason to start implementing stuff that you know you don't need? If you're working with html files, there's no need for xhtml compatible regex. All I'm trying to say is to know your data and keep things simple. Saves time as well.
GeertDD wrote:Also note that if you put \s* after the <br> in your regex, as in ole's snippet, you're stripping out all newlines and will end up with one long string.
You're correct. That could be easily solved by adding a '?' quantifier after the '*' quantifier.
Although, it wouldn't make a difference to the HTML display.
Nice fix; \s*? does the job indeed. It makes no difference to the HTML display, but I wouldn't like to start editing a single-line HTML file.

Posted: Wed Aug 15, 2007 4:00 am
by Ollie Saunders
GeertDD wrote:Of course PCRE is powerful enough, but is that a reason to start implementing stuff that you know you don't need? If you're working with html files, there's no need for xhtml compatible regex. All I'm trying to say is to know your data and keep things simple. Saves time as well.
Whilst I agree that over-engineering is a bad thing, how can you say that you know you won't need it? What if you want to use it to process someone else's HTML? Also, if I'm writing something that will only work within certain parameters, I'll have to document that in the code; sometimes this is more complicated than actually beefing up the code itself. Personally, I write my breaks as <br />, so there's always a chance I could add extra whitespace or omit the slash. If I can write something that can cope with that without too much effort or code bloat, I'm going to.
Posted: Wed Aug 15, 2007 11:00 am
by GeertDD
Sure, as I said it all depends on the context of your data. I don't know where ykarmi's html is coming from, of course. Anyway, I've made my point and I get yours.
Posted: Tue Sep 11, 2007 1:27 pm
by mrkite
GeertDD wrote:Does it get simpler than this?
Code:
preg_replace('/(?:<br>){2,}/', '<br>', $content);
Code:
preg_replace('/(?:(<br[^>]*>)\s*){2,}/',"\\1\n",$content);
Not only gets <br /> and <br> but also preserves your choice.
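A short sketch of how the \1 backreference keeps the tag style, assuming two made-up input strings (the names $xhtml and $html are illustrative). The capturing group sits inside the repeated group, so after the match it holds the last tag of the run:

```php
<?php
// Pattern from the post above.
$pattern = '/(?:(<br[^>]*>)\s*){2,}/';

// A run of XHTML-style breaks collapses to one XHTML-style break.
$xhtml = preg_replace($pattern, "\\1\n", "one<br /><br />\ntwo");
// $xhtml === "one<br />\ntwo"

// A run of HTML-style breaks collapses to one HTML-style break.
$html = preg_replace($pattern, "\\1\n", "one<br><br><br>two");
// $html === "one<br>\ntwo"
```

The replacement always appends a newline, so a collapsed run ends with exactly one line break in the source regardless of the whitespace that was there before.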
Posted: Wed Sep 12, 2007 7:20 am
by GeertDD
mrkite wrote:Code:
preg_replace('/(?:(<br[^>]*>)\s*){2,}/',"\\1\n",$content);
Not only gets <br /> and <br> but also preserves your choice.
Okay, but what if your text contained
<br style="clear:both">? ;-)
Posted: Wed Sep 12, 2007 8:07 am
by superdezign
GeertDD wrote:Okay, but what if your text contained
<br style="clear:both">?

Is that comment in reference to the '\\1' replacement?
Posted: Wed Sep 12, 2007 11:25 am
by Jenk
GeertDD wrote:mrkite wrote:Code:
preg_replace('/(?:(<br[^>]*>)\s*){2,}/',"\\1\n",$content);
Not only gets <br /> and <br> but also preserves your choice.
Okay, but what if your text contained
<br style="clear:both">?

Then it will replace double entries of <br style="clear:both"> with one <br style="clear:both">.
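A quick sketch of that behaviour, with made-up input strings (the names $styled and $mixed are illustrative). It also shows one case worth keeping in mind: when the tags in a run differ, \1 holds only the last one, so an attribute on an earlier tag is dropped:

```php
<?php
// Pattern from mrkite's post.
$pattern = '/(?:(<br[^>]*>)\s*){2,}/';

// Two identical styled breaks collapse to one styled break.
$styled = preg_replace(
    $pattern,
    "\\1\n",
    'a<br style="clear:both"><br style="clear:both">b'
);
// $styled === "a<br style=\"clear:both\">\nb"

// A styled break followed by a plain one: the last tag of the run wins.
$mixed = preg_replace($pattern, "\\1\n", 'a<br style="clear:both"><br>b');
// $mixed === "a<br>\nb"
```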