Posted: Thu Jan 27, 2005 2:21 pm
by mjseaden
feyd,
Really sorry about this, but it still doesn't seem to be working.
I'm still getting empty results when I try http://www.google.co.uk/index.html.
The code is
Code:
<?php
// URL finding code
// get contents of a file into a string
function fileContents($file)
{
    $fp = @fopen($file, 'rb');
    if(!$fp) return '';
    $contents = '';
    while(feof($fp) !== false)
        $contents .= fread($fp, 1024);
    fclose($fp);
    return $contents;
}
$filename = $_GET['url'];
$contents = fileContents( $filename );
// Retrieve all URLs from the HTML
$urls = array( 'href', 'src', 'action', 'background' ); // resolve these attributes from the text
$urls = implode( '|', $urls );
preg_match_all( '#\s+?(' . $urls . ')\s*?=\s*?([\'"]?)(.*?)\\2[\s\>]#is', $contents, $matches );
print_r($matches);
?>
and the output is
Code:
Array ( [0] => Array ( ) [1] => Array ( ) [2] => Array ( ) [3] => Array ( ) )
When I echo $contents, nothing is printed, so it looks like the fileContents function isn't working. I know http://www.google.co.uk/index.html exists.
Any idea what's going on?
Cheers
Mark
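A likely cause is the read loop: `feof()` returns false while there is still data to read, so `while(feof($fp) !== false)` is false on its very first check and the loop body never runs, leaving `$contents` empty. This sketch compares that loop with one that tests `!feof($fp)`. It assumes a temporary local file in place of the Google URL (opening a remote URL with `fopen()` also needs `allow_url_fopen` enabled), and `readBuggy()`/`readFixed()` are hypothetical names used only for the comparison.

```php
<?php
// Sketch: why the original read loop returns an empty string.
// readBuggy() and readFixed() are hypothetical names for illustration.

function readBuggy($file)
{
    $fp = fopen($file, 'rb');
    if (!$fp) return '';
    $contents = '';
    // feof() is false while data remains, so this condition is false
    // on the first check and the body never executes.
    while (feof($fp) !== false) {
        $contents .= fread($fp, 1024);
    }
    fclose($fp);
    return $contents;
}

function readFixed($file)
{
    $fp = fopen($file, 'rb');
    if (!$fp) return '';
    $contents = '';
    // Keep reading 1 KB chunks until end of file is reached.
    while (!feof($fp)) {
        $contents .= fread($fp, 1024);
    }
    fclose($fp);
    return $contents;
}

// Hypothetical local test file standing in for the remote page.
$tmp = tempnam(sys_get_temp_dir(), 'demo');
file_put_contents($tmp, '<a href="/x">x</a>');

$buggy = readBuggy($tmp);   // ''
$fixed = readFixed($tmp);   // '<a href="/x">x</a>'
unlink($tmp);

var_dump($buggy, $fixed);
```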
Posted: Thu Jan 27, 2005 2:38 pm
by mjseaden
Feyd!
I've fixed the file opening code using file().
I get the following output:
Code:
Array (
[0] => Array ( [0] => href="/imghp?hl=en&tab=wi&ie=UTF-8"> [1] => href="/grphp?hl=en&tab=wg&ie=UTF-8"> [2] => href="/nwshp?hl=en&tab=wn&ie=UTF-8"> [3] => href="/options/index.html" [4] => href=/advanced_search?hl=en> [5] => href=/preferences?hl=en> [6] => href=/language_tools?hl=en> [7] => href="/ads/"> [8] => href=/services/> [9] => href=/intl/en/about.html> [10] => href=http://www.google.co.uk/jobs/> [11] => href=http://www.google.com/ncr> )
[1] => Array ( [0] => href [1] => href [2] => href [3] => href [4] => href [5] => href [6] => href [7] => href [8] => href [9] => href [10] => href [11] => href )
[2] => Array ( [0] => " [1] => " [2] => " [3] => " [4] => [5] => [6] => [7] => " [8] => [9] => [10] => [11] => )
[3] => Array ( [0] => /imghp?hl=en&tab=wi&ie=UTF-8 [1] => /grphp?hl=en&tab=wg&ie=UTF-8 [2] => /nwshp?hl=en&tab=wn&ie=UTF-8 [3] => /options/index.html [4] => /advanced_search?hl=en [5] => /preferences?hl=en [6] => /language_tools?hl=en [7] => /ads/ [8] => /services/ [9] => /intl/en/about.html [10] => http://www.google.co.uk/jobs/ [11] => http://www.google.com/ncr )
)
With the following code:
Code:
<?php
$filename = $_GET['url'];
$contents = implode('', file($filename));
// Retrieve all URLs from the HTML
$urls = array( 'href' ); // resolve these attributes from the text
$urls = implode( '|', $urls );
preg_match_all( '#\s+?(' . $urls . ')\s*?=\s*?([\'"]?)(.*?)\\2[\s\>]#is', $contents, $matches );
print_r($matches);
?>
This looks right! However, it's a two-dimensional array, and some of the sub-arrays seem to store just 'href', and others only a " character.
Is there any way to get a plain one-dimensional array with just the <contents> part of HREF="<contents>" stored in each element?
I'd really appreciate your help on this, as then I'll be able to continue with my project.
Cheers
Mark
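The working version reads the page with `file()`, which returns it as an array of lines with their line endings kept, and `implode('', ...)` joins those lines back into one string. A sketch showing that this yields the same string as `file_get_contents()` (available from PHP 4.3.0), assuming a temporary local file in place of the remote URL; `file_put_contents()` (PHP 5+) is used only to create that test file.

```php
<?php
// Sketch: two ways to load a whole file into one string.
// A temporary local file stands in for the remote page (hypothetical content).
$tmp = tempnam(sys_get_temp_dir(), 'demo');
file_put_contents($tmp, "<a href=\"/a\">\n<a href=\"/b\">\n");  // test setup only

// file() returns an array of lines, each keeping its trailing "\n",
// so implode('') joins them back into the original text.
$viaFile = implode('', file($tmp));

// file_get_contents() reads the whole file in a single call.
$viaGet = file_get_contents($tmp);

unlink($tmp);
var_dump($viaFile === $viaGet);   // bool(true)
```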
Posted: Thu Jan 27, 2005 2:38 pm
by feyd
There were several problems, which I went through and fixed. I haven't tested it on much, though.
Code:
<?php
// URL finding code
// get contents of a file into a string
function fileContents($file)
{
    $fp = fopen($file, 'rb');
    if(!$fp) return '';
    $contents = '';
    while(!feof($fp))
    {
        $contents .= fread($fp, 1024);
    }
    fclose($fp);
    return $contents;
}
//$filename = $_SERVER['argv'][1];
$filename = $_GET['url'];
$contents = fileContents( $filename );
var_export($contents);
// Retrieve all URLs from the HTML
$urls = array( 'href', 'src', 'action', 'background' ); // resolve these attributes from the text
$urls = implode( '|', $urls );
preg_match_all( '#(?<![a-z0-9])(' . $urls . ')\s*?=\s*?([\'"]?)(.*?)\\2[\s>]#is', $contents, $matches );
print_r($matches);
?>
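In this pattern the lookbehind `(?<![a-z0-9])` only accepts an attribute name that is not preceded by a letter or digit, without consuming the preceding character, and the backreference `\2` makes the value end at the same quote it started with (or at whitespace/`>` when unquoted). A sketch running the pattern on a small hypothetical HTML snippet; the made-up `datasrc` attribute shows that `src` at the tail of a longer word is skipped.

```php
<?php
// Sketch: the pattern from the post above, run on hypothetical HTML.
$html = '<a href="/one">1</a> <img src=/two.png> <div datasrc="/skip">'
      . '<form action=\'/three\'>';

$attrs = implode('|', array('href', 'src', 'action', 'background'));
// (?<![a-z0-9]) : attribute name must not follow a letter/digit
// ([\'"]?)      : optional opening quote, captured as group 2
// (.*?)\\2      : lazy value up to the same quote (or nothing if unquoted)
// [\s>]         : value must be followed by whitespace or '>'
$pattern = '#(?<![a-z0-9])(' . $attrs . ')\s*?=\s*?([\'"]?)(.*?)\\2[\s>]#is';

preg_match_all($pattern, $html, $matches);
print_r($matches[3]);   // attribute values only
```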
Posted: Thu Jan 27, 2005 2:38 pm
by John Cartwright
you know there is an edit button...
no need for 4 posts in a row.
Posted: Thu Jan 27, 2005 2:41 pm
by feyd
getting a unidimensional version is just $matches[0]
Posted: Thu Jan 27, 2005 2:46 pm
by mjseaden
function 'var_export' is not recognised. Hmm.
Posted: Thu Jan 27, 2005 2:48 pm
by feyd
Just comment that line out; it was for debugging.
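Assuming the error means the server runs a PHP release older than 4.2.0 (the version that introduced `var_export()`), the debugging line can also be guarded rather than removed. A sketch with a hypothetical helper `debugDump()` that falls back to `print_r()` when `var_export()` is unavailable:

```php
<?php
// Sketch: use var_export() only when this PHP build provides it
// (PHP 4.2.0 and later); otherwise fall back to print_r().
// debugDump() is a hypothetical helper name, not part of PHP.
function debugDump($value)
{
    if (function_exists('var_export')) {
        var_export($value);
    } else {
        print_r($value);
    }
    echo "\n";
}

$contents = '<html>...</html>';   // hypothetical stand-in for the fetched page
debugDump($contents);
```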
Posted: Thu Jan 27, 2005 2:53 pm
by mjseaden
Hi feyd
When I
I don't get any output apart from the word 'Array'.
Is there any way to get the results as separate elements, with one URL in each element?
I hope that makes sense; then this whole issue will be resolved!
Many thanks
Mark
Posted: Thu Jan 27, 2005 2:55 pm
by feyd
$matches[0] is an array... print_r($matches[0]);
silly monkey..
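Echoing an array variable prints only the word `Array` (newer PHP versions also raise an "Array to string conversion" notice or warning), which matches the output described above. A sketch of the alternatives, using a few URLs taken from the earlier output as sample values: `print_r()` to inspect the structure, and `implode()` or `foreach` to print the elements themselves.

```php
<?php
// Sketch: printing an array's elements instead of echoing the array itself.
$urls = array('/ads/', '/services/', 'http://www.google.com/ncr');  // sample values

print_r($urls);                 // keys and values, for debugging

$list = implode("\n", $urls);   // one URL per line in a single string
echo $list, "\n";

foreach ($urls as $i => $url) { // or handle each URL individually
    echo $i, ': ', $url, "\n";
}
```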

Posted: Thu Jan 27, 2005 2:57 pm
by mjseaden
Got it! Needed print_r($matches[3])
Thanks a lot Feyd, I appreciate it!
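For reference, with its default `PREG_PATTERN_ORDER` flag `preg_match_all()` groups results by capture group: index 0 holds the whole matched text (hence strings like `href="/ads/">` in the earlier output), index 1 the attribute name, index 2 the quote character, and index 3 the attribute value, which is why `$matches[3]` gives the bare URLs. A small sketch on a hypothetical two-link snippet, using the `href`-only form of the pattern:

```php
<?php
// Sketch: what each index of $matches holds for this thread's pattern.
$html = '<a href="/imghp?hl=en">Images</a> <a href=/services/>Services</a>';
$pattern = '#(?<![a-z0-9])(href)\s*?=\s*?([\'"]?)(.*?)\\2[\s>]#is';

preg_match_all($pattern, $html, $matches);   // default: PREG_PATTERN_ORDER

// $matches[0]: full matched text, e.g. href="/imghp?hl=en">
// $matches[1]: group 1, the attribute name ("href")
// $matches[2]: group 2, the quote character, or '' if unquoted
// $matches[3]: group 3, the attribute value, i.e. the URL
$urls = $matches[3];

foreach ($urls as $url) {
    echo $url, "\n";
}
```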