regex to get linked images

Posted: Mon Aug 13, 2007 10:45 pm
by GeXus
I want to parse the page at a URL and, for each image that links to another image, get the URL it links to. For example:

Code:

// This would not be grabbed
<img src="image.jpg"/>

// This would not be grabbed
<a href="http://www.abc.com"><img src="image.jpg"/></a>

// This would be - it would return 'http://www.abc.com/image_full.jpg'
<a href="http://www.abc.com/image_full.jpg"><img src="image.jpg"/></a>

Would anyone be able to help me out with how I would do this? I really have no clue! :)

Thanks a lot!

Posted: Mon Aug 13, 2007 10:58 pm
by Benjamin

Code:

$link = '<a href="http://www.abc.com/image_full.jpg"><img src="image.jpg"/></a>';

if (preg_match_all('#<\s{0,2}a\s{1,3}.*?href\s{1,3}=\s{1,3}[\'"]{1}http:.*?[\'"]{1}.*?>\s{1,3}(<\s{1,3}img\s{1,3}.*?src\s{1,3}=\s{1,3}[\'"]{1}http.*?[\'"]{1}.*?>)\s{1,3}<\s{1,3}/a\s{1,3}>#im', $link, $matches))
{
    echo "<pre>" . print_r($matches, true) . "</pre>";
}
Totally untested.

Posted: Tue Aug 14, 2007 8:52 pm
by GeXus
Nice...

Doesn't quite work, though. I will try messing around with it. If you have any suggestions... :) I'm really BAD at regex.

I really appreciate this!

Posted: Tue Aug 14, 2007 9:00 pm
by Benjamin
What did/didn't it match? I'm sure it needs some work.

Posted: Tue Aug 14, 2007 9:02 pm
by GeXus
It just didn't match anything. I ran it exactly as you have it.
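
One reason the pattern cannot match the sample `$link`: `\s{1,3}` requires at least one whitespace character, so `href\s{1,3}=` never matches `href="..."`. The inner part also insists that the `src` start with `http`, which `image.jpg` does not. Below is a minimal sketch with those constraints relaxed. It is a variation for illustration, not the posted version: the `\s*` quantifiers, the `[^>]` and `[^'"]` classes, and the extra capture group around the href are assumptions added here.

```php
<?php
$link = '<a href="http://www.abc.com/image_full.jpg"><img src="image.jpg"/></a>';

// \s* allows zero whitespace where the original required one to three,
// and the src value no longer has to start with "http".
$pattern = '#<\s*a\s[^>]*?href\s*=\s*[\'"](http:[^\'"]*)[\'"][^>]*>\s*'
         . '(<\s*img\s[^>]*?src\s*=\s*[\'"][^\'"]*[\'"][^>]*>)\s*<\s*/a\s*>#i';

$count = preg_match_all($pattern, $link, $matches);

// $matches[1] holds the href values, $matches[2] the <img> tags.
echo "<pre>" . print_r($matches, true) . "</pre>";
```

Note that this still accepts any `http:` link wrapped around an image; it does not check that the href itself points at an image file.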

Posted: Tue Aug 14, 2007 9:04 pm
by Benjamin
Ok, I'll tweak it later.

Posted: Tue Aug 14, 2007 9:06 pm
by GeXus
astions wrote: Ok, I'll tweak it later.
Sweet... Thanks!

Posted: Tue Aug 14, 2007 10:23 pm
by GeXus
I've got an expression that seems to be doing the job!

Code:

#(?<=href=\x22)([\w:.]*/)+\w+\.jpg(?=\x22)#im
Thanks a lot for your help!

Posted: Tue Aug 14, 2007 10:38 pm
by Benjamin
Ok cool. One less thing on my todo list tonight :)

Posted: Tue Aug 14, 2007 10:43 pm
by GeXus
Just an update. The expression is now:

Code:

#(?<=href=\x22)(?:[\w:.]*)+\w+\.jpg(?=\x22)#im
The previous one was not matching <a href="img.jpg"><img src="img.jpg"/></a> where the HREF was relative.

Thanks again!

Posted: Wed Aug 15, 2007 3:03 am
by GeertDD
GeXus wrote:

Code:

#(?<=href=\x22)(?:[\w:.]*)+\w+\.jpg(?=\x22)#im
The previous one was not matching <a href="img.jpg"><img src="img.jpg"/></a> where the HREF was relative.
But that one doesn't match full image URLs anymore.
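
A quick way to see the difference is to run both expressions against both kinds of link. This is a sketch only: the two sample anchors are taken from the posts above, and the variable names are just for illustration.

```php
<?php
$absolute = '<a href="http://www.abc.com/image_full.jpg"><img src="image.jpg"/></a>';
$relative = '<a href="img.jpg"><img src="img.jpg"/></a>';

// First expression: needs at least one "/" before the file name.
$first  = '#(?<=href=\x22)([\w:.]*/)+\w+\.jpg(?=\x22)#im';
// Updated expression: the character class has no "/", so it cannot cross a path.
$second = '#(?<=href=\x22)(?:[\w:.]*)+\w+\.jpg(?=\x22)#im';

// preg_match() returns 1 on a match and 0 on no match.
$results = [
    'first, absolute href'  => preg_match($first, $absolute),
    'first, relative href'  => preg_match($first, $relative),
    'second, absolute href' => preg_match($second, $absolute),
    'second, relative href' => preg_match($second, $relative),
];

print_r($results);
```

The first expression matches only the absolute href and the second only the relative one.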

Try this one:

Code:

#(?<=href=\x22).+?\.jpg(?=\x22)#i
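
One thing to watch with `.+?`: it can run past the closing quote of the href, so on `<a href="http://www.abc.com"><img src="image.jpg"/></a>` it matches everything from `http://www.abc.com` through to the `src` value. Below is a rough sketch of a stricter variant that keeps the match inside the href and also requires an `<img>` directly inside the link. The function name `linked_image_urls` and the pattern are assumptions made for illustration, it only handles double-quoted attributes and `.jpg` files, and for real-world HTML a DOM parser is likely a safer choice.

```php
<?php
// Hypothetical helper: returns the href of every <a> that points at a .jpg
// and directly wraps an <img> tag.
function linked_image_urls(string $html): array
{
    // [^\x22]+? cannot cross the closing double quote of the href value.
    $pattern = '#<a\s[^>]*?href=\x22([^\x22]+?\.jpg)\x22[^>]*>\s*<img\s[^>]*>\s*</a>#i';
    preg_match_all($pattern, $html, $matches);
    return $matches[1];
}

$html = '<img src="image.jpg"/>'
      . '<a href="http://www.abc.com"><img src="image.jpg"/></a>'
      . '<a href="http://www.abc.com/image_full.jpg"><img src="image.jpg"/></a>'
      . '<a href="img.jpg"><img src="img.jpg"/></a>';

$urls = linked_image_urls($html);
print_r($urls);
```

With the four samples from this thread, only the last two links are returned, one absolute and one relative.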