Matching multiple pattern on single line???

Posted: Mon Feb 04, 2008 3:18 pm
by alex.barylski
Here is my code:

Code:

/<a.+href=["|\'](.[^"|\']+)/i
It is meant to match every 'href' attribute inside <a> tags. The only problem is that when there are two or more <a> tags on a single line (terminated by CRLF), only the last 'href' is matched.

How do I match multiple <a href> when they reside on a single line? Why is only the last one being matched?

P.S. I'm using preg_match_all with default arguments.

Cheers :)

Re: Matching multiple pattern on single line???

Posted: Mon Feb 04, 2008 4:35 pm
by arjan.top
This should work:

Code:

 
/<a(.+?)href=["|']([^"|']+?)/i
 

Re: Matching multiple pattern on single line???

Posted: Mon Feb 04, 2008 5:11 pm
by alex.barylski
Hey thanks for the reply... :)

Unfortunately, that regex doesn't seem to work...I've tried looking at yours and mine to try and figure out what was different:

Here is what I currently have constructed:

Code:

'/<a.+href=["|\'](.[^"|\']+)["|\']?/i'
Perhaps you can tell me why my regex seems to match only the last 'href' in a line....when there are two like so:

Code:

<a href="index.html">Test 1</a> <a href="about_ut.htm">Test 2</a>
When I match against something like the above, only about_ut.htm is returned, whereas if they are on separate lines:

Code:

<a href="index.html">Test 1</a>
<a href="about_ut.htm">Test 2</a>
Because of the newline, the regex works beautifully.

As a hack I just add a newline to each '>' and this seems to have corrected the problem, but I'd prefer to solve this in the regex itself if possible.

Cheers :)
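
A likely explanation for the behaviour described in this post: `.+` is greedy, and without the `s` modifier `.` does not match a newline. The sketch below only illustrates that; the helper name `extract_hrefs` and the non-greedy variant of the pattern are assumptions, not code from the thread.

```php
<?php
// Hypothetical helper (not from the thread): returns capture group 1
// of every match that preg_match_all() finds.
function extract_hrefs(string $pattern, string $html): array
{
    preg_match_all($pattern, $html, $matches);
    return $matches[1];
}

$greedy = '/<a.+href=["|\'](.[^"|\']+)["|\']?/i'; // pattern from the post above
$lazy   = '/<a.+?href=["\']([^"\']+)/i';          // assumed variant: non-greedy .+?

$oneLine  = '<a href="index.html">Test 1</a> <a href="about_ut.htm">Test 2</a>';
$twoLines = "<a href=\"index.html\">Test 1</a>\n<a href=\"about_ut.htm\">Test 2</a>";

// Greedy .+ runs to the end of the line, then backtracks to the LAST href.
print_r(extract_hrefs($greedy, $oneLine));

// Without the /s modifier "." stops at "\n", so each line matches on its own.
print_r(extract_hrefs($greedy, $twoLines));

// Non-greedy .+? stops at the first href that follows each <a.
print_r(extract_hrefs($lazy, $oneLine));
```

The first call should report only about_ut.htm, while the other two should report both values, which is consistent with the newline hack appearing to fix the problem.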

Re: Matching multiple pattern on single line???

Posted: Mon Feb 04, 2008 7:14 pm
by Chris Corbyn

Code:

/<a\s+[^>]*?\bhref=(["'])(.*?)\\1/i
The <a\s+ looks for <a followed by whitespace. The [^>]*? looks for anything except ">", zero or more times (non-greedy). The \bhref=(["']) looks for href=" or href=', provided the "href" part is not the end of another word (i.e. the \b is a word boundary). It captures the type of quote used into $1 (or \\1). The (.*?) is the value of the href, and the \\1 is the closing quote, which must be the same as the one captured before.
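
As a rough illustration of the pattern explained above, here is a minimal sketch in PHP. The sample markup (including the `class` attribute) is a made-up input, not something from the thread.

```php
<?php
// The pattern from the post above, written as a single-quoted PHP string:
// \' escapes the quote for PHP, and \\1 reaches PCRE as the backreference \1.
$pattern = '/<a\s+[^>]*?\bhref=(["\'])(.*?)\\1/i';

// Hypothetical sample input mixing both quote styles.
$html = '<a href="index.html">Test 1</a> '
      . '<a class=\'nav\' href=\'about_ut.htm\'>Test 2</a>';

preg_match_all($pattern, $html, $matches);

// $matches[1] holds the opening quote of each match,
// $matches[2] holds the href value between the matching quotes.
print_r($matches[2]);
```

Because the closing quote is a backreference, an href opened with " can only be closed by ", and likewise for '.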

However, with preg_match() this will only match the first href on a line. You need preg_match_all() to get both hrefs.
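
To make the difference between the two functions concrete, here is a small sketch. The variable names are arbitrary, and the use of the `PREG_SET_ORDER` flag is my choice for readability, not something the thread prescribes.

```php
<?php
$pattern = '/<a\s+[^>]*?\bhref=(["\'])(.*?)\\1/i';
$html    = '<a href="index.html">Test 1</a> <a href="about_ut.htm">Test 2</a>';

// preg_match() stops after the first match; $one[2] is that match's href.
$first = preg_match($pattern, $html, $one);

// preg_match_all() keeps scanning the subject and returns the match count.
// PREG_SET_ORDER groups the captures per match instead of per group.
$count = preg_match_all($pattern, $html, $all, PREG_SET_ORDER);

foreach ($all as $match) {
    echo $match[2], PHP_EOL; // href value of each <a> tag
}
```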