extract href

Posted: Sun Sep 07, 2008 6:12 pm
by yacahuma
Hello,

I want to parse the page produced by a web server when directory browsing is enabled.
Here is an example:

Code:

 
<html><head><META http-equiv="Content-Type" content="text/html; charset=UTF-8"><title>mediatracker.selfip.com - /mtcanal2/</title></head><body><H1>test - /mtcanal2/</H1><hr> 
 
<pre><A HREF="/">[To Parent Directory]</A><br><br>     Tuesday, August 26, 2008  5:59 PM    147345966 <A HREF="/mtcanal2/canal_2_08_26_08_5pm.wmv">canal_2_08_26_08_5pm.wmv</A><br>    Thursday, August 28, 2008  2:39 PM   2569751096 <A HREF="/mtcanal2/canal_2_08_27_08_11pm.wmv">canal_2_08_27_08_11pm.wmv</A><br>   Wednesday, August 27, 2008  9:47 PM    260430522 <A HREF="/mtcanal2/canal_2_08_27_08_5pm.wmv">canal_2_08_27_08_5pm.wmv</A><br></pre><hr></body></html>
 
All I want at the end is an array of the links, like:
[0]=/mtcanal2/canal_2_08_26_08_5pm.wmv
[1]=/mtcanal2/canal_2_08_27_08_11pm.wmv
[2]=/mtcanal2/canal_2_08_27_08_5pm.wmv

I wrote this script:

Code:

 
$str = implode("",file('http://localhost/dir'));
$returnArray=array(); 
$regex_pattern = "/<A HREF=\"(.*)\">(.*)<\/A>/";
preg_match_all($regex_pattern,$str, $returnArray);
print_r($returnArray);
 
But it does not work.
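
A likely cause, assuming the listing arrives on a single line as in the sample above: `(.*)` is greedy, so the first capture runs from the first HREF to the last `">` on that line and the whole listing collapses into one match. Below is a minimal sketch of the difference. It inlines a shortened copy of the sample as `$html` (a name chosen here for illustration) instead of fetching it, and swaps the greedy capture for a negated character class.

```php
<?php
// Shortened copy of the sample listing, inlined so the sketch runs without a server.
$html = '<pre><A HREF="/">[To Parent Directory]</A><br><br>'
      . ' 147345966 <A HREF="/mtcanal2/canal_2_08_26_08_5pm.wmv">canal_2_08_26_08_5pm.wmv</A><br>'
      . ' 2569751096 <A HREF="/mtcanal2/canal_2_08_27_08_11pm.wmv">canal_2_08_27_08_11pm.wmv</A><br></pre>';

// Original pattern: the greedy (.*) swallows everything up to the last link.
preg_match_all('/<A HREF="(.*)">(.*)<\/A>/', $html, $greedy);

// [^"]* stops at the closing quote and (.*?) stops at the first </A>;
// the i flag also accepts lowercase <a href="...">.
preg_match_all('/<A HREF="([^"]*)">(.*?)<\/A>/i', $html, $lazy);

echo count($greedy[1]), "\n"; // 1: a single oversized match
print_r($lazy[1]);            // one entry per link, including the parent "/"
```

Index 1 of the result holds the HREF values and index 2 the link texts, so `$lazy[1]` is the array to filter further.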

Re: extract href

Posted: Mon Sep 08, 2008 9:01 am
by yacahuma
Hello,

I kept searching and finally found the answer on the web.

This is the code:

Code:

 
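// Returns the file names of all links on the page at $url whose
// extension equals $ext, e.g. getFileLinksFromUrl($url, 'wmv').
function getFileLinksFromUrl($url, $ext)
{
    // Read the whole page into one string.
    $str = implode("", file($url));

    $matches = array();

    // Group 1 captures the HREF value, group 2 the link text.
    $regex_pattern = "/A[\s]+[^>]*?HREF[\s]?=[\s\"\']+" .
                     "(.*?)[\"\']+.*?>" . "([^<]+|.*?)?<\/A>/";

    preg_match_all($regex_pattern, $str, $matches);

    // Keep only the captured HREF values.
    $matches = $matches[1];
    $files = array();
    foreach ($matches as $var) {
        $path_parts = pathinfo($var);
        // pathinfo() sets 'extension' only when the path has one.
        if (isset($path_parts['extension']) && $path_parts['extension'] == $ext) {
            // Stores the bare file name; use $var here to keep the full path.
            $files[] = $path_parts['filename'] . '.' . $path_parts['extension'];
        }
    }
    return $files;
}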
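
The function above returns bare file names, while the first post asked for full paths such as /mtcanal2/canal_2_08_26_08_5pm.wmv. As an alternative to the regex (not the method found in the post), here is a minimal sketch that uses PHP's DOM extension and keeps the full HREF. It assumes the extension is enabled, and `getHrefsByExtension` and `$html` are hypothetical names introduced for illustration.

```php
<?php
// Hypothetical helper, not part of the original post: returns the HREF of
// every <a> element in $html whose path ends in the extension $ext.
function getHrefsByExtension($html, $ext)
{
    $doc = new DOMDocument();
    // Directory listings are rarely valid HTML, so parser warnings are suppressed.
    @$doc->loadHTML($html);

    $links = array();
    // The HTML parser lowercases tag and attribute names, so 'a' also finds <A HREF=...>.
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = $anchor->getAttribute('href');
        // Case-insensitive comparison, so 'wmv' also matches '.WMV'.
        if (strcasecmp(pathinfo($href, PATHINFO_EXTENSION), $ext) === 0) {
            $links[] = $href;
        }
    }
    return $links;
}

// Shortened copy of the sample listing from the first post.
$html = '<html><body><pre><A HREF="/">[To Parent Directory]</A><br>'
      . '<A HREF="/mtcanal2/canal_2_08_26_08_5pm.wmv">canal_2_08_26_08_5pm.wmv</A><br>'
      . '<A HREF="/mtcanal2/canal_2_08_27_08_11pm.wmv">canal_2_08_27_08_11pm.wmv</A><br>'
      . '</pre></body></html>';

print_r(getHrefsByExtension($html, 'wmv'));
```

To run it against a live listing, the page could be read the same way as in the post, for example with `implode("", file($url))`, and the result passed in as `$html`.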