
Fetch URLs from HTML page

Posted: Thu May 28, 2009 7:41 am
by usamaalam
Hello everybody,

I need help extracting URLs from an HTML page. I have the page's HTML in a string variable, and I need to get all the URLs of images, CSS, JavaScript, etc. A URL can be absolute, like "http://www.abc.com/images/myimage.jpg", or relative, like "images/myimage.jpg", "myimages/myimage.jpg" or "style/style.css".

Is there a way to do this with PHP?

Thanks.
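The question mixes absolute URLs ("http://www.abc.com/images/myimage.jpg") with relative ones ("images/myimage.jpg", "style/style.css"). To actually fetch a relative one, it has to be resolved against the URL of the page it came from. Below is a minimal sketch, assuming you know that page's URL. `resolve_url()` is a hypothetical helper, and it deliberately skips `../` normalisation, query-only or fragment-only references and `<base href>` tags, so it is not a full RFC 3986 resolver:

```php
<?php
// Hypothetical helper: turn a URL found in the page into an absolute URL,
// using the URL of the page itself as the base.
// Simplified on purpose: handles absolute, protocol-relative ("//host/..."),
// root-relative ("/path") and plain relative ("images/x.jpg") references only.
function resolve_url(string $base, string $ref): string
{
    // Already absolute: starts with a scheme such as "http:" or "https:"
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $ref)) {
        return $ref;
    }

    $parts  = parse_url($base) ?: [];
    $scheme = $parts['scheme'] ?? 'http';
    $host   = $parts['host'] ?? '';
    $port   = isset($parts['port']) ? ':' . $parts['port'] : '';

    // Protocol-relative: reuse the base scheme only
    if (substr($ref, 0, 2) === '//') {
        return $scheme . ':' . $ref;
    }

    $origin = $scheme . '://' . $host . $port;

    // Root-relative: attach directly to scheme + host
    if (substr($ref, 0, 1) === '/') {
        return $origin . $ref;
    }

    // Plain relative path: keep the base path up to and including its last "/"
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $ref;
}

echo resolve_url('http://www.abc.com/index.php', 'images/myimage.jpg'), "\n";
```

With that base, "images/myimage.jpg" resolves to "http://www.abc.com/images/myimage.jpg"; a page at "http://www.abc.com/dir/page.html" would instead resolve it under "/dir/".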

Re: Fetch URLs from HTML page

Posted: Thu May 28, 2009 8:56 am
by jaoudestudios
Depending on what you are trying to achieve, you might be able to use wget if you're on Linux.

Re: Fetch URLs from HTML page

Posted: Thu May 28, 2009 2:40 pm
by prometheuzz
usamaalam wrote:...
Is there a way to do this with PHP?

Thanks.
Sure, use an HTML parser.
This one looks simple: http://simplehtmldom.sourceforge.net/
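The linked library has its own API, which isn't shown here. As an alternative that needs no extra download, PHP's bundled DOM extension (`DOMDocument`) can do the same job. Below is a minimal sketch, assuming the HTML is already in a string as in the question. `extract_urls()` is a hypothetical name, and the tag/attribute list is an assumption meant to cover images, scripts, stylesheets and links:

```php
<?php
// Sketch using PHP's built-in DOMDocument (not simplehtmldom).
// Collects URL-bearing attributes: <img src>, <script src>, <link href>, <a href>.
function extract_urls(string $html): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; keep libxml warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Assumed set of tags and the attribute that holds the URL for each
    $targets = ['img' => 'src', 'script' => 'src', 'link' => 'href', 'a' => 'href'];

    $urls = [];
    foreach ($targets as $tag => $attr) {
        foreach ($doc->getElementsByTagName($tag) as $node) {
            $value = $node->getAttribute($attr); // '' when the attribute is absent
            if ($value !== '') {
                $urls[] = $value;
            }
        }
    }
    return array_values(array_unique($urls));
}

$html = '<html><head><link rel="stylesheet" href="style/style.css">'
      . '<script src="js/app.js"></script></head><body>'
      . '<img src="images/myimage.jpg"><img src = "http://www.abc.com/images/myimage.jpg">'
      . '<a href=page.html>link</a></body></html>';
print_r(extract_urls($html));
```

URLs come back grouped by tag in the order of `$targets`, not in document order, with duplicates removed. Because a real parser reads the attributes, spacing around `=` and unquoted values are handled too.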

A regex solution might look like this:


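<?php
$url = 'http://forums.devnetwork.net/viewtopic.php?f=38&t=100899&p=542419#p542419';
$content = file_get_contents($url); // needs allow_url_fopen; if you already have the HTML in a string, use that instead
// Capture the value of every href="..." / src="..." attribute (single or double quotes)
preg_match_all("/(?:(?<=href=['\"])|(?<=src=['\"]))[^'\"]+(?=['\"])/i", $content, $links);
print_r($links[0]); // $links[0] holds the full-pattern matches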
That said, I'd opt for an HTML parser.
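To see why a parser is the safer choice, here is the regex above wrapped in a hypothetical `regex_urls()` function and run against HTML that is already in a string (as in the original question) instead of a fetched page:

```php
<?php
// The pattern from the post, applied to an HTML string.
// The lookbehinds require the quote to follow "src=" / "href=" immediately.
function regex_urls(string $html): array
{
    preg_match_all("/(?:(?<=href=['\"])|(?<=src=['\"]))[^'\"]+(?=['\"])/i", $html, $m);
    return $m[0]; // full-pattern matches only
}

$html = '<img src="images/myimage.jpg"><link href=\'style/style.css\'>'
      . '<img src = "spaced.jpg"><img src=unquoted.jpg>';
print_r(regex_urls($html));
```

Only the first two URLs are found. `src = "spaced.jpg"` and the unquoted `src=unquoted.jpg` are both valid HTML, but they don't match the exact `src="` / `href="` sequence the lookbehinds expect, so they are silently skipped.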