I have spent the last 2 days trying to find or build a script that will read a web page from a URL and run a preg_match for every kind of domain found on the page. I have had very little luck, so ANY help would be excellent.
Basically, here's what I'm trying to do.
Using a form, I want to enter a URL, submit it, and have the PHP script fopen the page, go through all the HTML or text, and build a list of the URLs and domains on that page. But it's been hard making it recognize URLs both with and without the http://.
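For the form side of this, a rough sketch might look like the following. It assumes the form posts a field named "url" (a made-up name) and that only http/https pages should be fetched. cleanSubmittedUrl is a hypothetical helper, not something from an existing library; filter_var and file_get_contents are standard PHP, and fetching a URL with file_get_contents needs allow_url_fopen turned on.

```php
<?php
// Hypothetical helper: returns a usable http(s) URL, or null if the input is not one.
function cleanSubmittedUrl($input)
{
    $input = trim((string) $input);
    if ($input === '') {
        return null;
    }
    // Let people type "example.com" without the scheme.
    if (!preg_match('!^https?://!i', $input)) {
        $input = 'http://' . $input;
    }
    // FILTER_VALIDATE_URL gives false for strings that are not valid URLs.
    return filter_var($input, FILTER_VALIDATE_URL) !== false ? $input : null;
}

// Assumption: the form posts a field named "url".
$pageUrl = cleanSubmittedUrl(isset($_POST['url']) ? $_POST['url'] : '');
if ($pageUrl !== null) {
    // Needs allow_url_fopen=On; $html is false if the fetch fails.
    $html = @file_get_contents($pageUrl);
}
```

Validating before fetching keeps empty or malformed input from ever reaching the fetch call.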
So if a page had, say, 6 URLs or domains written as plain text, all different, like:
http://www.domain.com
http://domain2.net
http://www.domain3.org
http://www.domain4.info
domain5.com
123domain.com
I want to list all those just like above... nothing fancy...
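To make the goal concrete, here is a minimal sketch of a pattern that accepts a URL with or without the scheme. Everything in it is an assumption for illustration: extractUrls is a hypothetical name, and the short list of endings (com, net, org, info) is only there to keep things like index.html from being mistaken for a domain.

```php
<?php
// Hypothetical helper: pull URLs and bare domains out of a block of text.
// $tlds is an assumed whitelist; extend it with whatever endings you care about.
function extractUrls($text, array $tlds = array('com', 'net', 'org', 'info'))
{
    $tldPattern = implode('|', array_map('preg_quote', $tlds));
    $pattern = '~(?<![@\w.-])'                           // not inside a word or e-mail address
             . '(?:(?:https?|ftp)://)?'                  // optional scheme
             . '(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+'  // one or more labels, each followed by a dot
             . '(?:' . $tldPattern . ')(?![\w-])'        // whitelisted ending, not followed by more word characters
             . '(?:/[^\s<>"\']*)?'                       // optional path
             . '~i';
    preg_match_all($pattern, $text, $matches);
    // Drop sentence punctuation that got swept into the path.
    return array_map(function ($m) {
        return rtrim($m, '.,;:!?)');
    }, $matches[0]);
}
```

Fed the six sample lines above as one string, this returns them as six separate entries, exactly as written. A whitelist will miss endings that are not on it, so it trades completeness for fewer false matches.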
Here is some "tryout code" I've tried... I'm not a pro.
This first one halfway does it: it does not convert all the domains and URLs, and it also returns the text and content surrounding them.
Code:
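<?php
// Assumption: the form field is named "url" (the script relied on register_globals for $url).
$url = isset($_REQUEST['url']) ? $_REQUEST['url'] : '';

// Wrap http://, ftp:// and www. addresses found in $text in <a> tags.
function makeLinks($text)
{
    // eregi_replace() was removed in PHP 7; preg_replace() with the i flag replaces it.
    $text = preg_replace('!(((f|ht)tp://)[-a-zA-Z0-9@:%_+.~#?&/=]+)!i', '<a href="$1">$1</a>', $text);
    $text = preg_replace('!([[:space:]()[{}])(www\.[-a-zA-Z0-9@:%_+.~#?&/=]+)!i', '$1<a href="http://$2">$2</a>', $text);
    return $text;
}

if ($url && ($the_page = fopen($url, "r")))
{
    while (($each_line = fgets($the_page, 80000)) !== false)
    {
        // fgetss() was removed in PHP 8; strip_tags() on each line does the same job.
        echo makeLinks(strip_tags($each_line));
    }
    fclose($the_page);
}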
?>
Now this one works better, but it still does not match all the domains and URLs, only the ones that have http in them. Like I mentioned above, I need more flexibility.
Code:
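<?php
// Assumption: $url and $find come from form fields named "url" and "find"
// (the script relied on register_globals for both).
$url  = isset($_REQUEST['url'])  ? $_REQUEST['url']  : '';
$find = isset($_REQUEST['find']) ? $_REQUEST['find'] : '';

// Returns true if $Find occurs anywhere in $String.
function instring($String, $Find, $CaseSensitive = false)
{
    if ($Find === '')
    {
        return false; // nothing to look for, so nothing gets filtered out
    }
    return $CaseSensitive
        ? strpos($String, $Find) !== false
        : stripos($String, $Find) !== false;
}

if ($url)
{
    $html = (string) @file_get_contents($url);
    // Only matches addresses that start with http:// or ftp://.
    preg_match_all('!((f|ht)tp://)[-a-zA-Z0-9@:%_+.~#?&/=]+!', $html, $matches);
    foreach ($matches[0] as $match)
    {
        // Skip any URL that contains $find; print the rest.
        if (!instring($match, $find))
        {
            echo htmlspecialchars($match) . "<br>";
        }
    }
}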
?>
So... I need to get these
http://www.domain.com
http://domain2.net
http://www.domain3.org
http://www.domain4.info
domain5.com
123domain.com
out of a web page and make them print out like this:
http://www.domain.com
http://domain2.net
http://www.domain3.org
http://www.domain4.info
http://domain5.com
http://123domain.com
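The step from the first list to the second is just "add http:// where it is missing". As a rough sketch, assuming the addresses have already been collected into an array (normalizeUrls is a hypothetical name):

```php
<?php
// Hypothetical helper: make every entry start with a scheme and drop duplicates.
function normalizeUrls(array $found)
{
    $out = array();
    foreach ($found as $item) {
        $item = trim($item);
        if ($item === '') {
            continue;
        }
        // Only prepend http:// when no scheme is present already.
        if (!preg_match('!^[a-z][a-z0-9+.-]*://!i', $item)) {
            $item = 'http://' . $item;
        }
        $out[$item] = true; // array keys de-duplicate for free
    }
    return array_keys($out);
}

$found = array('http://www.domain.com', 'domain5.com', '123domain.com', 'domain5.com');
foreach (normalizeUrls($found) as $u) {
    echo $u, "\n";
}
// prints:
// http://www.domain.com
// http://domain5.com
// http://123domain.com
```

Using the address as the array key means a domain that appears several times on a page is only listed once.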
In other words... I want to be able to scan all my pages from all my sites and create a list of urls that were found.
PLEASE
I sincerely appreciate any help you can give.
Thanks in advance.