[SOLVED] Regular expression pattern match

Chris Corbyn
Breakbeat Nuttzer
Posts: 13098
Joined: Wed Mar 24, 2004 7:57 am
Location: Melbourne, Australia

Post by Chris Corbyn »

Can anyone see a problem with this pattern for extracting link URLs from hyperlinks in HTML documents?
<a href="(.+)">
It doesn't seem to stop capturing at the second double quote of the first link on a page. It carries on until it reaches the second double quote of the second link on the page, and then works as expected.

Any better offers? (Whitespace, case etc. don't matter; just assume the format is strictly <a href="somelink.html">.)

Thanks in advance :-)
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

Your regex is greedy: .+ matches as much as it possibly can. Try this:

<a href="(.+?)">
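
To see the difference, here is a minimal sketch comparing the greedy and lazy versions. The sample string in $html is made up for illustration (two links on one line, in the strict format described above); it is not taken from the original page.

```php
<?php
// Hypothetical sample input: two links on the same line.
$html = '<a href="one.html">One</a> <a href="two.html">Two</a>';

// Greedy: .+ grabs as much as it can, so the capture runs on
// to the LAST "> it can find on the line.
preg_match('/<a href="(.+)">/', $html, $greedy);

// Lazy: .+? gives up as early as possible, so the capture stops
// at the FIRST "> after the opening quote.
preg_match('/<a href="(.+?)">/', $html, $lazy);

echo $greedy[1], "\n"; // one.html">One</a> <a href="two.html
echo $lazy[1], "\n";   // one.html
```

With one link per line the two patterns behave the same, because . does not match a newline by default, which would explain why the problem only shows up when two links share a line.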

Post by Chris Corbyn »

No, I've just realised my problem is that if any two links occur on the same line I get the problem described. Otherwise it works fine. I'm missing something vital in my pattern that tells it when it has reached the end of the link and to start again on the same line. Could it be \b? I'll test it...

Post by feyd »

Try my regex.

Post by Chris Corbyn »

Thanks feyd... it's improved, but now if there happen to be two links on the same line it only reads the first one. What do I use to make it continue along the line? I thought it was "g" for global?

In Perl it would be
/<a href="(.+?)">/gi
right, or am I wrong about the g? It seems to do the same thing in Perl :-(

Thanks again
Last edited by Chris Corbyn on Fri Aug 06, 2004 10:20 pm, edited 1 time in total.

Post by feyd »

preg_match_all
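
A rough sketch of how that might look with the lazy pattern from earlier in the thread. The sample string and the variable names ($html, $matches, $count) are assumptions for illustration only.

```php
<?php
// Hypothetical sample input: two links on the same line.
$html = '<a href="one.html">One</a> <a href="two.html">Two</a>';

// preg_match_all() keeps scanning after each match instead of stopping
// at the first one. The i flag makes the match case-insensitive.
$count = preg_match_all('/<a href="(.+?)">/i', $html, $matches);

// With the default PREG_PATTERN_ORDER, $matches[0] holds the full matches
// and $matches[1] holds every capture from the first group.
echo $count, "\n";      // 2
print_r($matches[1]);   // one.html, two.html
```

The return value is the number of full matches found, so it can also double as a quick check for whether any links were found at all.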

Post by Chris Corbyn »

Cheers! :-) You're a clever guy ;-)

Post by feyd »

For some reason the PCRE functions don't support g, which I find kinda silly.. but I guess, for the most part, you want all of them to operate globally anyway.. I dunno.. :?
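
In practice the choice of function stands in for the g flag. A small sketch, again using a made-up sample string and hypothetical variable names: preg_match() stops at the first match, while preg_replace() already works on every match unless you pass a limit.

```php
<?php
// Hypothetical sample input: two links on the same line.
$html = '<a href="one.html">One</a> <a href="two.html">Two</a>';
$pattern = '/<a href="(.+?)">/i';

// preg_match() stops after the first match and returns 1 (or 0 for no match).
$found = preg_match($pattern, $html, $m);

// preg_replace() replaces every match by default, like s///g in Perl.
$all = preg_replace($pattern, '<a href="#">', $html);

// The optional fourth argument caps the number of replacements.
$one = preg_replace($pattern, '<a href="#">', $html, 1);

echo $m[1], "\n"; // one.html
echo $all, "\n";  // <a href="#">One</a> <a href="#">Two</a>
echo $one, "\n";  // <a href="#">One</a> <a href="two.html">Two</a>
```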