Spidering a reciprocal link site for a link
Hi there,
I need some simple code to spider a site, given its URL, for a link to my page. The idea is to check automatically, over the three weeks after I request a link, whether the webmaster has added it to one of their pages.
I need it to spider the whole site because it is not always clear which page the link will be added to.
Any help would be greatly appreciated. It looks like a simple script using fopen() and a regular expression.
Many thanks
Mark
Script checker
Hi,
I appreciate the response, but I'm afraid the script's beyond me for the moment.
All I need is a script that, given a URL, will return all the links (<A HREF> tags) on the page, likely using fopen() and a regular expression. I can do the rest.
In fact, can anyone give me a regular expression that will return the URL in an <A HREF=""> tag?
Cheers
Mark
- feyd
- Neighborhood Spidermoddy
- Posts: 31559
- Joined: Mon Mar 29, 2004 3:24 pm
- Location: Bothell, Washington, USA
Code:
$urls = array( 'href', 'src', 'action', 'background' ); // resolve these attributes from the text
$urls = implode( '|', $urls );
preg_match_all( '#\s+?(' . $urls . ')\s*?=\s*?([\'"]?)(.*?)\\2[\s\>]#is', $data, $matches );
print_r($matches);

Hi feyd,
Thanks, but I'm getting these errors with the following script:
Code:
Warning: fopen("http://www.???.biz/index.php", "r") - Success in /home/XXX/quickcheck.php on line 5
Warning: stat failed for http://www.???.biz/index.php (errno=2 - No such file or directory) in /home/XXX/quickcheck.php on line 6
Warning: Supplied argument is not a valid File-Handle resource in /home/XXX/quickcheck.php on line 6
Array ( [0] => Array ( ) [1] => Array ( ) [2] => Array ( ) [3] => Array ( ) )
Warning: Supplied argument is not a valid File-Handle resource in /home/XXX/quickcheck.php on line 14
Code:
<?php
// URL finding code
// get contents of a file into a string
$filename = $_GET['url'];
$handle = fopen($filename, "r");
$contents = fread($handle, filesize($filename));
// Retrieve all URLs from the HTML
$urls = array( 'href', 'src', 'action', 'background' ); // resolve these attributes from the text
$urls = implode( '|', $urls );
preg_match_all( '#\s+?(' . $urls . ')\s*?=\s*?([\'"]?)(.*?)\\2[\s\>]#is', $content, $matches );
print_r($matches);
fclose($handle);
?>
Any idea what's wrong with this? It seems to be warning me that the file open was successful!?
Cheers
Mark
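A note on the warnings above, offered as a sketch rather than a diagnosis. filesize() depends on stat(), which PHP's HTTP wrapper does not support, so the "stat failed" warning appears for any http:// URL even when the page exists. The two "not a valid File-Handle" warnings follow from fopen() itself having failed, which may mean URL opening is disabled (allow_url_fopen) or the host could not be reached; the posted output does not say which. The helper below is hypothetical (fetchPage() is not from the thread): it checks the handle first and reads to the end of the stream instead of asking for a size.

```php
<?php
// Hypothetical helper (not from the thread): returns the body as a string,
// or false if the file or URL could not be opened.
function fetchPage($url)
{
    // Remote URLs only work when allow_url_fopen is enabled.
    $handle = @fopen($url, 'rb');
    if ($handle === false) {
        return false;
    }
    // Read until end of stream; no filesize() call, so no stat() is needed.
    $contents = stream_get_contents($handle);
    fclose($handle);
    return $contents;
}
```

Returning false rather than an empty string lets the caller tell "could not open" apart from "opened, but the page was empty".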
- feyd
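Code:
function fileContents($file)
{
    // Open the file or URL; @ hides the warning, failure is signalled by the empty return value
    $fp = @fopen($file, 'rb');
    if(!$fp) return '';
    $contents = '';
    // Read in 1 KB chunks until the end of the stream
    while(!feof($fp))
        $contents .= fread($fp, 1024);
    fclose($fp);
    return $contents;
}

Thanks feyd,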
It doesn't seem to be returning any contents. For example, in this case I used http://www.google.com/index.html:
Code:
Array ( [0] => Array ( ) [1] => Array ( ) [2] => Array ( ) [3] => Array ( ) )
Cheers
Mark
Feyd,
Using the following script
Code:
<?php
// URL finding code
// get contents of a file into a string
function fileContents($file)
{
    $fp = @fopen($file, 'rb');
    if(!$fp) return '';
    $contents = '';
    while(feof($fp) !== false)
        $contents .= fread($fp, 1024);
    fclose($fp);
    return $contents;
}
$filename = $_GET['url'];
$contents = fileContents( $filename );
// Retrieve all URLs from the HTML
$urls = array( 'href', 'src', 'action', 'background' ); // resolve these attributes from the text
$urls = implode( '|', $urls );
preg_match_all( '#\s+?(' . $urls . ')\s*?=\s*?([\'"]?)(.*?)\\2[\s\>]#is', $content, $matches );
print_r($matches);
?>
I'm getting the following output for ?url=http://www.google.com/index.html:
Code:
Array ( [0] => Array ( ) [1] => Array ( ) [2] => Array ( ) [3] => Array ( ) )
Any ideas? It returns the same for other URLs.
Cheers
Mark
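For what it's worth, two things in the script above would each account for the empty arrays, as far as the posted code shows. The loop condition feof($fp) !== false is false on a freshly opened stream, so nothing is ever read. And preg_match_all() is given $content, while the page was stored in $contents. A minimal sketch of the original goal, checking whether a fetched page links back to a given address, could look like this. The names extractLinks() and pageLinksTo() and the example URLs are made up for illustration; the pattern is feyd's, restricted to href.

```php
<?php
// Hypothetical helpers, not from the thread.

// Return every href value found in an HTML string.
function extractLinks($html)
{
    // feyd's pattern, limited to the href attribute:
    // group 1 = optional quote character, group 2 = the URL itself.
    preg_match_all('#\s+href\s*=\s*([\'"]?)(.*?)\1[\s>]#is', $html, $matches);
    return $matches[2];
}

// True if any link in $html points at $myUrl.
// The comparison is deliberately loose: it ignores case and a trailing slash.
function pageLinksTo($html, $myUrl)
{
    $wanted = rtrim(strtolower($myUrl), '/');
    foreach (extractLinks($html) as $link) {
        if (rtrim(strtolower(trim($link)), '/') === $wanted) {
            return true;
        }
    }
    return false;
}

$html = '<p><a href="http://www.example.com/">Home</a> '
      . '<A HREF=\'http://www.example.org/links.html\'>Links</A></p>';
var_dump(pageLinksTo($html, 'http://www.example.com'));   // this address is linked
var_dump(pageLinksTo($html, 'http://www.example.net'));   // this one is not
```

To cover a whole site, the same check would be run on each page fetched, following the extracted links that stay on the same host. That crawl loop is left out here.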