Matching multiple pattern on single line???

Any questions involving matching text strings to patterns - the pattern is called a "regular expression."

alex.barylski
DevNet Evangelist
Posts: 6267
Joined: Tue Dec 21, 2004 5:00 pm
Location: Winnipeg

Matching multiple pattern on single line???

Post by alex.barylski »

Here is my code:

Code: Select all

/<a.+href=["|\'](.[^"|\']+)/i
It basically matches all 'href' attributes inside <a> tags. The only problem is that when there are two or more <a> tags on a single line (no CRLF between them), only the last 'href' is matched.

How do I match multiple <a href> when they reside on a single line? Why is only the last one being matched?

P.S. I'm using preg_match_all with default arguments.

Cheers :)
arjan.top
Forum Contributor
Posts: 305
Joined: Sun Oct 14, 2007 4:36 am
Location: Hoče, Slovenia

Re: Matching multiple pattern on single line???

Post by arjan.top »

This should work:

Code: Select all

 
/<a(.+?)href=["|']([^"|']+?)/i
 
alex.barylski
DevNet Evangelist
Posts: 6267
Joined: Tue Dec 21, 2004 5:00 pm
Location: Winnipeg

Re: Matching multiple pattern on single line???

Post by alex.barylski »

Hey thanks for the reply... :)

Unfortunately, that regex doesn't seem to work. I've compared yours and mine to try to figure out what is different.

Here is what I currently have constructed:

Code: Select all

'/<a.+href=["|\'](.[^"|\']+)["|\']?/i'
Perhaps you can tell me why my regex seems to match only the last 'href' in a line when there are two, like so:

Code: Select all

<a href="index.html">Test 1</a> <a href="about_us.htm">Test 2</a>
When I match against something like the above, only about_us.htm is returned. Whereas if they are on separate lines:

Code: Select all

<a href="index.html">Test 1</a>
<a href="about_us.htm">Test 2</a>
Because of the newline, the regex works beautifully...

As a hack I just add a newline after each '>', and this seems to have corrected the problem, but I'd prefer to solve this in the regex itself if possible.

Cheers :)
Chris Corbyn
Breakbeat Nuttzer
Posts: 13098
Joined: Wed Mar 24, 2004 7:57 am
Location: Melbourne, Australia

Re: Matching multiple pattern on single line???

Post by Chris Corbyn »

Code: Select all

/<a\s+[^>]*?\bhref=(["'])(.*?)\\1/i
The <a\s+ looks for <a followed by whitespace. The [^>]*? looks for anything except ">", zero or more times (non-greedy), so the match can never run past the end of the current tag. The \bhref=(['"]) looks for href=" or href=', provided the "href" part is not the end of another word (i.e. the \b is a word boundary). It captures the type of quote used into $1 (or \\1). The (.*?) is the value of the href, captured into $2, and the \\1 is the closing quote, which must be the same character captured before.
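
To see why the original pattern only returned the last href, here is a rough sketch using Python's re module rather than PHP (an assumption made purely for illustration; the sample line and the variable names are made up). The greedy ".+" grabs as much of the line as it can and then backs up only as far as the last "href=", and since "." does not match a newline by default, putting each tag on its own line hides the problem.

```python
import re

line = '<a href="index.html">Test 1</a> <a href="about_us.htm">Test 2</a>'

# Pattern from the first post: the greedy ".+" stretches from the first "<a"
# to the last "href=" that still lets the rest of the pattern match.
greedy = re.compile(r'''<a.+href=["|'](.[^"|']+)''', re.IGNORECASE)

# Pattern from this reply: "[^>]*?" is confined to the current tag.
scoped = re.compile(r'''<a\s+[^>]*?\bhref=(["'])(.*?)\1''', re.IGNORECASE)

greedy_hrefs = greedy.findall(line)
scoped_hrefs = [m.group(2) for m in scoped.finditer(line)]

# "." stops at a newline, so the greedy pattern behaves once the tags
# sit on separate lines.
two_lines = line.replace('</a> <a', '</a>\n<a')
greedy_two_lines = greedy.findall(two_lines)

print(greedy_hrefs)      # ['about_us.htm']
print(scoped_hrefs)      # ['index.html', 'about_us.htm']
print(greedy_two_lines)  # ['index.html', 'about_us.htm']
```

The same reasoning should carry over to PCRE, where "." also skips newlines unless the s modifier is set.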

However, a single preg_match() call will only return the first href on a line. You need preg_match_all() to get both hrefs.
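
The first-match versus all-matches difference can be sketched the same way in Python, where re.search() plays roughly the role of preg_match() and re.finditer() that of preg_match_all(). The helper names first_href and all_hrefs and the sample string are hypothetical, not part of any library.

```python
import re

# Same pattern as above; group 1 is the quote character, group 2 the href value.
HREF = re.compile(r'''<a\s+[^>]*?\bhref=(["'])(.*?)\1''', re.IGNORECASE)

def first_href(html):
    """Return the first href value, or None if there is no link."""
    m = HREF.search(html)
    return m.group(2) if m else None

def all_hrefs(html):
    """Return every href value, in document order."""
    return [m.group(2) for m in HREF.finditer(html)]

sample = "<a href=\"index.html\">Test 1</a> <A class='nav' HREF='about_us.htm'>Test 2</A>"

print(first_href(sample))  # index.html
print(all_hrefs(sample))   # ['index.html', 'about_us.htm']
```

In PHP, with preg_match_all() and its default flags, the href values would be found in $matches[2], since $matches[1] holds the quote characters.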