Any questions involving matching text strings to patterns - the pattern is called a "regular expression."
GeXus
Forum Regular
Posts: 631 Joined: Sat Mar 11, 2006 8:59 am
by GeXus » Mon Aug 13, 2007 10:45 pm
I want to be able to parse the HTML at a URL and get the link URL only for images that are linked to another image. For example:
Code:
// This would not be grabbed
<img src="image.jpg"/>
// This would not be grabbed
<a href="http://www.abc.com"><img src="image.jpg"/></a>
// This would be grabbed; it would return 'http://www.abc.com/image_full.jpg'
<a href="http://www.abc.com/image_full.jpg"><img src="image.jpg"/></a>
Would anyone be able to help me out with how I would do this? I really have no clue!
Thanks a lot!
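A minimal sketch of the requirement, assuming quoted `href` attributes, an `<a>` that contains nothing but a single `<img>` tag, and that an "image" link is one ending in `.jpg`, `.jpeg`, `.png` or `.gif`. The extension list and the function name `linkedImageUrls` are illustrative assumptions, and a regex like this is not a real HTML parser, so unusual markup may slip through.

```php
<?php
// Hypothetical helper: return the href of every <a> that points at an image
// file (assumed extensions: jpg, jpeg, png, gif) and wraps exactly one <img>.
function linkedImageUrls(string $html): array
{
    // Group 1 captures the opening quote so \1 can require the same closing quote;
    // group 2 captures the href value itself.
    $pattern = '#<a\s[^>]*?href\s*=\s*(["\'])([^"\']+\.(?:jpe?g|png|gif))\1[^>]*>\s*<img\b[^>]*>\s*</a>#i';
    preg_match_all($pattern, $html, $matches);
    return $matches[2];
}

// The three cases from the question, concatenated into one string.
$html = '<img src="image.jpg"/>'
      . '<a href="http://www.abc.com"><img src="image.jpg"/></a>'
      . '<a href="http://www.abc.com/image_full.jpg"><img src="image.jpg"/></a>';

print_r(linkedImageUrls($html));
```

Only the third case should come back, as a one-element array holding `http://www.abc.com/image_full.jpg`; the bare `<img>` has no link, and `http://www.abc.com` does not end in an image extension.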
Benjamin
Site Administrator
Posts: 6935 Joined: Sun May 19, 2002 10:24 pm
by Benjamin » Mon Aug 13, 2007 10:58 pm
Code:
$link = '<a href="http://www.abc.com/image_full.jpg"><img src="image.jpg"/></a>';
if (preg_match_all('#<\s{0,2}a\s{1,3}.*?href\s{1,3}=\s{1,3}[\'"]{1}http:.*?[\'"]{1}.*?>\s{1,3}(<\s{1,3}img\s{1,3}.*?src\s{1,3}=\s{1,3}[\'"]{1}http.*?[\'"]{1}.*?>)\s{1,3}<\s{1,3}/a\s{1,3}>#im', $link, $matches))
{
echo "<pre>" . print_r($matches, true) . "</pre>";
}
totally untested.
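As written, every `\s{1,3}` in this pattern requires at least one whitespace character (for example between `href` and `=`, or between `<` and `img`), and the `src` value must begin with `http`, so the sample `$link` in the same post cannot match. The sketch below demonstrates the whitespace problem and tries a relaxed variant that uses `\s*` and accepts any `src`; `$relaxed` is an illustrative rewrite under those assumptions, not a tested drop-in replacement.

```php
<?php
$link = '<a href="http://www.abc.com/image_full.jpg"><img src="image.jpg"/></a>';

// \s{1,3} needs one to three whitespace characters; the sample has none before "=".
var_dump(preg_match('#href\s{1,3}=#', $link)); // int(0)
// \s* also allows zero whitespace characters.
var_dump(preg_match('#href\s*=#', $link));     // int(1)

// Relaxed sketch: \s* where whitespace is optional, \s+ only between a tag name
// and its attributes, and no "http" requirement on the img src.
$relaxed = '#<\s*a\s+[^>]*?href\s*=\s*[\'"](http:[^\'"]*)[\'"][^>]*>\s*(<\s*img\s+[^>]*?src\s*=\s*[\'"][^\'"]*[\'"][^>]*>)\s*<\s*/a\s*>#i';
if (preg_match_all($relaxed, $link, $matches)) {
    echo "<pre>" . print_r($matches, true) . "</pre>";
}
```

With the relaxed pattern, `$matches[1][0]` holds the link URL and `$matches[2][0]` holds the whole `<img>` tag.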
by GeXus » Tue Aug 14, 2007 8:52 pm
Nice...
It doesn't quite work, though. I'll try messing around with it. If you have any suggestions, please let me know.
I'm really BAD at regex.
I really appreciate this!
by Benjamin » Tue Aug 14, 2007 9:00 pm
What did/didn't it match? I'm sure it needs some work.
by GeXus » Tue Aug 14, 2007 9:02 pm
It just didn't match anything. I ran it exactly as you have it.
by Benjamin » Tue Aug 14, 2007 9:04 pm
Ok, I'll tweak it later.
by GeXus » Tue Aug 14, 2007 9:06 pm
astions wrote: Ok, I'll tweak it later.
Sweet... Thanks!
by GeXus » Tue Aug 14, 2007 10:23 pm
I've got an expression that seems to be doing the job!
Code:
#(?<=href=\x22)([\w:.]*/)+\w+\.jpg(?=\x22)#im
Thanks a lot for your help!
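Applied to the three examples from the original question, the pattern behaves as sketched below. It only inspects the quoted value that directly follows the exact text `href="` (that is what the lookbehind requires), so it does not check that the link actually wraps an `<img>`.

```php
<?php
// GeXus's lookbehind/lookahead pattern, unchanged; \x22 is a double quote.
$pattern = '#(?<=href=\x22)([\w:.]*/)+\w+\.jpg(?=\x22)#im';

$examples = [
    '<img src="image.jpg"/>',
    '<a href="http://www.abc.com"><img src="image.jpg"/></a>',
    '<a href="http://www.abc.com/image_full.jpg"><img src="image.jpg"/></a>',
];

foreach ($examples as $html) {
    if (preg_match($pattern, $html, $m)) {
        // The lookarounds are zero-width, so $m[0] is just the href value.
        echo $m[0], "\n";
    } else {
        echo "(no match)\n";
    }
}
```

This prints "(no match)" for the first two examples and `http://www.abc.com/image_full.jpg` for the third.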
by Benjamin » Tue Aug 14, 2007 10:38 pm
OK, cool. One less thing on my to-do list tonight.
by GeXus » Tue Aug 14, 2007 10:43 pm
Just a quick update. The expression is now:
Code:
#(?<=href=\x22)(?:[\w:.]*)+\w+\.jpg(?=\x22)#im
The previous one was not matching <a href="img.jpg"><img src="img.jpg"/></a>, where the HREF was relative.
Thanks again!
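A quick check of the updated pattern against the relative link above and the full URL from the first post. Because the character class `[\w:.]` does not include `/`, the updated pattern cannot step over the slashes in `http://www.abc.com/`. The `$both` pattern is only one possible fix, offered as a sketch: it restores the `/` from the previous version and changes the outer `+` to `*`, so the directory part becomes optional.

```php
<?php
$updated  = '#(?<=href=\x22)(?:[\w:.]*)+\w+\.jpg(?=\x22)#im';
$relative = '<a href="img.jpg"><img src="img.jpg"/></a>';
$absolute = '<a href="http://www.abc.com/image_full.jpg"><img src="image.jpg"/></a>';

var_dump(preg_match($updated, $relative)); // int(1)
var_dump(preg_match($updated, $absolute)); // int(0): no "/" allowed before the file name

// Sketch: zero or more "dir/" segments, then the file name.
$both = '#(?<=href=\x22)(?:[\w:.]*/)*\w+\.jpg(?=\x22)#im';
var_dump(preg_match($both, $relative)); // int(1)
var_dump(preg_match($both, $absolute)); // int(1)
```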
GeertDD
Forum Contributor
Posts: 274 Joined: Sun Oct 22, 2006 1:47 am
Location: Belgium
by GeertDD » Wed Aug 15, 2007 3:03 am
GeXus wrote: Code:
#(?<=href=\x22)(?:[\w:.]*)+\w+\.jpg(?=\x22)#im
The previous one was not matching <a href="img.jpg"><img src="img.jpg"/></a>, where the HREF was relative.
But that one doesn't match full image URLs anymore.
Try this one: