Matching Media Files and Image Sources

Any questions involving matching text strings to patterns - the pattern is called a "regular expression."

jasonx
Forum Newbie
Posts: 3
Joined: Sat Oct 22, 2005 3:40 am


Post by jasonx »

I am trying to write a regex which finds links to media files in a file, together with the image each link wraps.

The format of what I am trying to match may look like the following:

<a href="some.mpg"><img src="some.jpg"></a>

My problem is that other tags may be present, and the HTML may span multiple lines. I am only interested in the link to the media file and the source of the image tag.

My current regex is:

Code: Select all

preg_match_all("#<a.*href=[\"\'](.*)(.mpg|.mpeg|.mov|.avi|.wmv)[\"\'].*>.*<img.*src=[\"\'](.*)(.jpg|.jpeg)[\"\'].*>.*</a>#is", $fileContents, $matches, PREG_SET_ORDER);
What is happening is that my regex isn't stopping at the first occurrence of </a>. It keeps going until the last one.

Can anyone provide some advice?

Cheers
Jason
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

Your pattern is greedy. Either switch each .* to .*? or add the U pattern modifier.
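To illustrate the difference, here is a minimal sketch. The `$html` string and the simplified `<a.*</a>` pattern are made up for the demonstration and are not the pattern from the question.

```php
<?php
// Hypothetical sample input containing two separate links.
$html = '<a href="1.mpg">one</a> <a href="2.mpg">two</a>';

// Greedy: .* takes as much as possible, so the match runs to the LAST </a>.
preg_match('#<a.*</a>#is', $html, $greedy);

// Lazy: .*? takes as little as possible, so the match stops at the FIRST </a>.
preg_match('#<a.*?</a>#is', $html, $lazy);

// U modifier: inverts greediness for the whole pattern, so a plain .* becomes lazy.
preg_match('#<a.*</a>#isU', $html, $ungreedy);

echo $greedy[0], "\n";   // <a href="1.mpg">one</a> <a href="2.mpg">two</a>
echo $lazy[0], "\n";     // <a href="1.mpg">one</a>
echo $ungreedy[0], "\n"; // <a href="1.mpg">one</a>
```

Note that combining the U modifier with .*? flips that quantifier back to greedy, so it is probably best to pick one approach and stick with it.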
jasonx
Forum Newbie
Posts: 3
Joined: Sat Oct 22, 2005 3:40 am

Post by jasonx »

Cheers feyd, that solved my problem.

One other question I have is that my regex is matching things like:

<a href=link.html>more html</a><a href=something.mpg><img src=some.jpg></a>

The part of my regex that is causing this is

Code: Select all

<a.*href=[\"\'](.*)(.mpg|.mpeg|.mov|.avi|.wmv)[\"\'].*>
How would I make it so that if it didn't encounter any of those media extensions before reaching the tag's closing '>', it wouldn't match?

So for the above example it would only start matching on the second link tag.

Cheers
Jason
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

Code: Select all

preg_match_all("#<a[^>]+href=[\"\'](.*(?:\.mpe?g|\.mov|\.avi|\.wmv))[\"\'].*>.*<img.*src=[\"\'](.*\.jpe?g)[\"\'].*>.*</a>#isU", $fileContents, $matches, PREG_SET_ORDER);
may work..
jasonx
Forum Newbie
Posts: 3
Joined: Sat Oct 22, 2005 3:40 am

Post by jasonx »

feyd, that produces the same result as my original regex.

Some sample HTML I am testing with is below:

Code: Select all

<a href="2.mpg"><img src="2.jpg" border="0" class="thumbs"></a></div></td>
<span class="style4">  <a href="text.html">testing<br></a></span></div></td>
<div align="center"><a href="3.mpg"><img src="3.jpg" border="0" class="thumbs"></a></div></td>
		<td colspan="4" rowspan="2">
			<img src="images/md_31.gif" width="21" height="243" alt=""></td>
		<td colspan="6" background="images/md_32.gif" width="322" height="242" alt=""><div align="center"><a href="4.mpg"><img src="4.jpg" border="0" class="thumbs"></a></div></td>
The function I am writing is this:

Code: Select all

function matchMovies($fileContents)
{
    preg_match_all("#<a[^>]+href=[\"\'](.*(?:\.mpe?g|\.mov|\.avi|\.wmv))[\"\'].*>.*<img.*src=[\"\'](.*\.jpe?g)[\"\'].*>.*</a>#isU", $fileContents, $matches, PREG_SET_ORDER); 
    print_r($matches);
    return $matches;
}
Cheers
Jason
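A likely reason the result is unchanged: even when it is lazy, `.*` with the `s` modifier can still run past the closing quote and `>` of `<a href="text.html">` until it reaches the `.mpg` of a later link. One possible way around this, sketched below, is to use negated character classes such as `[^>]*` and `[^"'>]*` so each piece stays inside its own tag, plus a negative lookahead so the match cannot cross a `</a>`. The function name `matchMoviesStrict` and the exact pattern are assumptions made for illustration, not a tested drop-in replacement.

```php
<?php
// Hypothetical variant of matchMovies(); the pattern is an illustrative sketch.
function matchMoviesStrict(string $fileContents): array
{
    $pattern =
        // <a ...> whose href ends in a media extension; [^>]* cannot leave the tag.
        '#<a\b[^>]*\bhref=["\']([^"\'>]*\.(?:mpe?g|mov|avi|wmv))["\'][^>]*>'
        // Any content that does not contain </a>, so the match stays inside this link.
        . '(?:(?!</a>).)*?'
        // <img ...> whose src ends in .jpg or .jpeg.
        . '<img\b[^>]*\bsrc=["\']([^"\'>]*\.jpe?g)["\'][^>]*>'
        // Remainder of the link, up to the first </a>.
        . '.*?</a>#is';

    preg_match_all($pattern, $fileContents, $matches, PREG_SET_ORDER);
    return $matches;
}

// Shortened version of the sample HTML from the thread.
$html = <<<'HTML'
<a href="2.mpg"><img src="2.jpg" border="0" class="thumbs"></a></div></td>
<span class="style4">  <a href="text.html">testing<br></a></span></div></td>
<div align="center"><a href="3.mpg"><img src="3.jpg" border="0" class="thumbs"></a></div></td>
HTML;

foreach (matchMoviesStrict($html) as $m) {
    echo $m[1], ' -> ', $m[2], "\n";
}
// Prints:
// 2.mpg -> 2.jpg
// 3.mpg -> 3.jpg
```

The link to text.html is skipped because `[^"'>]*` cannot run past the closing quote of its href. This still requires quoted attribute values, and for arbitrary real-world HTML a parser such as DOMDocument is probably more robust than any regex.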