Matching Media Files and Image Sources

Posted: Sat Oct 22, 2005 4:36 am
by jasonx
I am trying to write a regex that finds links to media files in a file, together with the image each link wraps.

The format of what I am trying to match may look like the following:

<a href="some.mpg"><img src="some.jpg"></a>

My problem is that other tags may be present. Also, the HTML may span multiple lines. I am only interested in the link to the media file and the source of the image tag.

My current regex is:

Code:

preg_match_all("#<a.*href=[\"\'](.*)(.mpg|.mpeg|.mov|.avi|.wmv)[\"\'].*>.*<img.*src=[\"\'](.*)(.jpg|.jpeg)[\"\'].*>.*</a>#is", $fileContents, $matches, PREG_SET_ORDER);
What is happening is that my regex isn't stopping at the first </a> it encounters. It keeps going until the last one.

Can anyone provide some advice?

Cheers
Jason

Posted: Sat Oct 22, 2005 8:23 am
by feyd
Your pattern is greedy. Either switch each .* to .*? or add the U pattern modifier.
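
A minimal sketch of the difference, using a made-up two-link string ($html and the simplified patterns are illustrative assumptions, not the original poster's data):

```php
<?php
// Two links on one line: a greedy .* runs on to the LAST </a>.
$html = '<a href="1.mpg">one</a> <a href="2.mpg">two</a>';

// Greedy quantifiers (the default).
preg_match('#<a.*>.*</a>#is', $html, $greedy);

// Lazy quantifiers: .*? takes as little as possible.
preg_match('#<a.*?>.*?</a>#is', $html, $lazy);

// The U modifier inverts greediness, so a plain .* behaves like .*?
preg_match('#<a.*>.*</a>#isU', $html, $ungreedy);

echo $greedy[0], "\n";   // <a href="1.mpg">one</a> <a href="2.mpg">two</a>
echo $lazy[0], "\n";     // <a href="1.mpg">one</a>
echo $ungreedy[0], "\n"; // <a href="1.mpg">one</a>
```

Note that with U in effect, writing .*? flips the quantifier back to greedy, so it is probably best to pick one of the two approaches rather than combine them.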

Posted: Sat Oct 22, 2005 9:09 am
by jasonx
Cheers feyd, that solved my problem.

One other question I have is that my regex is matching things like:

<a href=link.html>more html</a><a href=something.mpg><img src=some.jpg></a>

The part of my regex that is causing this is

Code:

<a.*href=[\"\'](.*)(.mpg|.mpeg|.mov|.avi|.wmv)[\"\'].*>
How would I make it so that, if it reaches the closing '>' of the tag without having encountered one of those media extensions, it doesn't match?

So for the above example it would only start matching on the second link tag.

Cheers
Jason
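
One way to see why this happens, sketched under the assumption that the attributes are quoted (the sample string and the cut-down patterns below are illustrative only): a dot, lazy or not, is free to step over '>' and quotes, while a negated character class cannot leave the tag or the attribute value.

```php
<?php
$html = '<a href="link.html">more html</a>'
      . '<a href="something.mpg"><img src="some.jpg"></a>';

// Lazy dot: may still cross '>' and so starts at the first <a.
preg_match('#<a.*?href=["\'](.*?)\.mpg["\']#is', $html, $dot);

// Negated classes: [^>] stays inside the tag, [^"\'>] inside the value.
preg_match('#<a[^>]*href=["\']([^"\'>]*)\.mpg["\']#is', $html, $cls);

echo $dot[1], "\n"; // link.html">more html</a><a href="something
echo $cls[1], "\n"; // something
```

With the negated classes, the attempt that starts at the first link fails outright, and the engine moves on to the second <a on its own.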

Posted: Sat Oct 22, 2005 9:13 am
by feyd

Code:

preg_match_all("#<a[^>]+href=[\"\'](.*(?:\.mpe?g|\.mov|\.avi|\.wmv))[\"\'].*>.*<img.*src=[\"\'](.*\.jpe?g)[\"\'].*>.*</a>#isU", $fileContents, $matches, PREG_SET_ORDER);
This may work.

Posted: Sat Oct 22, 2005 9:30 am
by jasonx
feyd, that produces the same result as my original regex.

Some sample html I am testing with is below

Code:

<a href="2.mpg"><img src="2.jpg" border="0" class="thumbs"></a></div></td>
<span class="style4">  <a href="text.html">testing<br></a></span></div></td>
<div align="center"><a href="3.mpg"><img src="3.jpg" border="0" class="thumbs"></a></div></td>
		<td colspan="4" rowspan="2">
			<img src="images/md_31.gif" width="21" height="243" alt=""></td>
		<td colspan="6" background="images/md_32.gif" width="322" height="242" alt=""><div align="center"><a href="4.mpg"><img src="4.jpg" border="0" class="thumbs"></a></div></td>
The function I am writing is this:

Code:

function matchMovies($fileContents)
{
    preg_match_all("#<a[^>]+href=[\"\'](.*(?:\.mpe?g|\.mov|\.avi|\.wmv))[\"\'].*>.*<img.*src=[\"\'](.*\.jpe?g)[\"\'].*>.*</a>#isU", $fileContents, $matches, PREG_SET_ORDER); 
    print_r($matches);
    return $matches;
}
Cheers
Jason
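
A possible direction, offered as a sketch rather than a confirmed fix: the captures probably spill over because .* inside a tag or an attribute value can still run past '>' into the next link. The hypothetical matchMoviesStrict() below keeps each piece inside its tag with negated character classes and uses (?:(?!</a>).)*? so the text between tags cannot cross a closing </a>. It assumes quoted attribute values, and the function name and trimmed sample are not from the thread.

```php
<?php
// Hypothetical stricter variant of matchMovies(); assumes quoted attributes.
function matchMoviesStrict($fileContents)
{
    $pattern = '#<a\b[^>]*?\bhref=["\']([^"\'>]+\.(?:mpe?g|mov|avi|wmv))["\'][^>]*>'
             . '(?:(?!</a>).)*?'   // other tags or text, never past </a>
             . '<img\b[^>]*?\bsrc=["\']([^"\'>]+\.jpe?g)["\'][^>]*>'
             . '(?:(?!</a>).)*?</a>#is';

    preg_match_all($pattern, $fileContents, $matches, PREG_SET_ORDER);
    return $matches;
}

// Three lines taken from the sample HTML above.
$html = <<<'HTML'
<a href="2.mpg"><img src="2.jpg" border="0" class="thumbs"></a></div></td>
<span class="style4">  <a href="text.html">testing<br></a></span></div></td>
<div align="center"><a href="3.mpg"><img src="3.jpg" border="0" class="thumbs"></a></div></td>
HTML;

$found = matchMoviesStrict($html);
foreach ($found as $m) {
    echo "$m[1] -> $m[2]\n";
}
// 2.mpg -> 2.jpg
// 3.mpg -> 3.jpg
```

The text.html link is skipped because its href value cannot end in one of the media extensions without leaving the quotes. No U modifier is used here, since the lazy quantifiers are written out explicitly.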