Hey Guys,
I have a document with different lines.
Some lines have double line breaks (<br>) between them and others have just one, like this:
I'm trying to write a regex that will leave only one <br> per line, but I can't figure out what it would be.
Could you please help?
I know this is pretty basic, but I only heard of regex for the first time yesterday (though I did read most of http://www.regular-expressions.info about it).
Thank you!
Yuval
Note that I'm not bothering about uppercased tags or xhtml breaks. You just have to know the context, and since your example text only shows <br>s, there is no point in making the regex match anything else.
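The regex GeertDD refers to didn't survive in the thread. The following is a minimal sketch of the same idea, written with Python's re module rather than PCRE (the syntax used here behaves the same in both). It assumes lowercase <br> tags only and assumes that the breaks in a run may be separated by newlines. collapse_breaks is a hypothetical helper name. Whitespace is only matched before each extra <br>, so the newline after the last break in a run is kept.

```python
import re

# A <br> followed by one or more further <br>s. Whitespace (including
# newlines) is allowed only *between* breaks, so the line ending after the
# last <br> of a run is left alone. Lowercase <br> only, by assumption.
BR_RUN = re.compile(r"<br>(?:\s*<br>)+")

def collapse_breaks(text: str) -> str:
    """Replace each run of consecutive <br> tags with a single <br>."""
    return BR_RUN.sub("<br>", text)

sample = "line one<br>\n<br>\nline two<br>\nline three<br><br>\nline four"
print(collapse_breaks(sample))
```

Each of the four lines comes out ending in exactly one <br>, and the line structure of the file is preserved.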
Also note that if you put \s* after the <br> in your regex, as in ole's snippet, you're stripping out all newlines and will end up with one long string.
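A quick way to see this in Python's re module. ole's snippet isn't reproduced in the thread, so the pattern below, (?:<br>\s*)+, is only an assumed pattern of that shape.

```python
import re

# Assumed shape of the snippet: every <br> followed by a greedy \s*.
# The greedy \s* also swallows the newline after the LAST <br> of a run,
# and even after a lone <br>, so each line gets glued to the next one.
greedy = re.compile(r"(?:<br>\s*)+")

text = "first<br>\n<br>\nsecond<br>\nthird"
result = greedy.sub("<br>", text)
print(repr(result))  # 'first<br>second<br>third'
```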
GeertDD wrote: Note that I'm not bothering about uppercased tags or xhtml breaks.
Why not? PCRE is powerful enough that there's no reason not to.
GeertDD wrote: Also note that if you put \s* after the <br> in your regex, as in ole's snippet, you're stripping out all newlines and will end up with one long string.
You're correct. That could easily be solved by adding a '?' after the '*', which makes the quantifier lazy (\s*?).
Then again, it wouldn't make any difference to how the HTML is displayed.
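Here is a small check of the lazy version in Python's re module. ole's snippet isn't shown, so both patterns are assumptions. Whether \s*? alone is enough depends on the rest of the pattern. The pattern has to require a second <br> (here via {2,}). Otherwise the engine is happy to stop after a single <br>, and the lazy \s*? never expands.

```python
import re

text = "first<br>\n<br>\nsecond<br>\nthird"

# At least two <br>s, each followed by a *lazy* \s*?. Because a second <br>
# is required, the first \s*? grows just far enough to reach it. The \s*?
# after the last <br> stays empty, so the following newline is kept.
lazy_pair = re.compile(r"(?:<br>\s*?){2,}")
pair_result = lazy_pair.sub("<br>", text)
print(repr(pair_result))  # 'first<br>\nsecond<br>\nthird'

# With + instead of {2,}, stopping after one <br> already succeeds, so the
# lazy \s*? stays empty and newline-separated breaks are never merged.
lazy_plus = re.compile(r"(?:<br>\s*?)+")
plus_result = lazy_plus.sub("<br>", text)
print(repr(plus_result))  # unchanged
```

This is enough for the doubled breaks in the question. However, a run of three newline-separated breaks would come out as two, because each match ends as soon as two breaks have been found.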
Of course PCRE is powerful enough, but is that a reason to start implementing stuff you know you don't need? If you're working with HTML files, there's no need for an XHTML-compatible regex. All I'm trying to say is: know your data and keep things simple. It saves time as well.
Nice fix, \s*? does indeed do the job. It makes no difference to the HTML display, but I wouldn't want to have to edit a single-line HTML file.
Whilst I agree that over-engineering is a bad thing, how can you be sure you won't need it? What if you want to use it to process someone else's HTML? Also, if I'm writing something that will only work within certain parameters, I'll have to document that in the code, and sometimes this is more complicated than actually beefing up the code itself. Personally, I write my breaks as <br />, so there's always a chance I could add extra whitespace or omit the slash. If I can write something that copes with that without too much effort or code bloat, I'm going to.
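The following sketch shows the more tolerant version described above, again using Python's re as a stand-in for PCRE. It accepts <br>, <BR>, <br/> and <br /> with any whitespace before the optional slash. re.IGNORECASE plays the role of PCRE's i modifier. As an assumption, it does not handle breaks that carry attributes. The first break of each run is kept exactly as written, so the author's own style survives. collapse_any_breaks is a hypothetical name.

```python
import re

# One break tag: "<br", optional whitespace, optional "/", then ">".
BR = r"<br\s*/?>"

# Group 1 keeps the first break of a run; the extra breaks (and only the
# whitespace *before* them) are dropped, so the line ending after the run
# is preserved. Matching is case-insensitive.
BR_ANY_RUN = re.compile(rf"({BR})(?:\s*{BR})+", re.IGNORECASE)

def collapse_any_breaks(text: str) -> str:
    """Collapse runs of <br>-style tags, whatever their case or slash style."""
    return BR_ANY_RUN.sub(r"\1", text)

messy = "one<br />\n<BR>\ntwo<br/>\nthree<Br  /><br>\nfour"
print(collapse_any_breaks(messy))
```

The cost over the lowercase-only version is one extra line of pattern, which is roughly the trade-off being argued about here.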
Sure, as I said, it all depends on the context of your data. Of course, I don't know where ykarmi's HTML is coming from. Anyway, I've made my point and I get yours.