Fetch URLs from HTML page

Any questions involving matching text strings to patterns - the pattern is called a "regular expression."

usamaalam
Forum Newbie
Posts: 6
Joined: Thu Oct 11, 2007 3:05 am

Fetch URLs from HTML page

Post by usamaalam »

Hello everybody,

I need your help to fetch URLs from an HTML page. I have the HTML of a page in a string variable and I need to extract the URLs of all images, CSS files, JavaScript files, etc. A URL can be absolute, like "http://www.abc.com/images/myimage.jpg", or relative, like "images/myimage.jpg", "myimages/myimage.jpg" or "style/style.css".

Is there a way to do this with PHP?

Thanks.
jaoudestudios
DevNet Resident
Posts: 1483
Joined: Wed Jun 18, 2008 8:32 am
Location: Surrey

Re: Fetch URLs from HTML page

Post by jaoudestudios »

Depending on what you are trying to achieve, you might be able to use wget, if you're on Linux.
prometheuzz
Forum Regular
Posts: 779
Joined: Fri Apr 04, 2008 5:51 am

Re: Fetch URLs from HTML page

Post by prometheuzz »

usamaalam wrote:...
Is there a way to do this with PHP?

Thanks.
Sure, use an HTML parser.
This looks like a simple one: http://simplehtmldom.sourceforge.net/

And a regex solution may look like:

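Code:

$url = 'http://forums.devnetwork.net/viewtopic.php?f=38&t=100899&p=542419#p542419';
$content = file_get_contents($url);
if ($content === false) {
    die("Could not fetch $url");
}
// Match the text between the quotes of every href="..." or src="..." attribute.
preg_match_all("/(?:(?<=href=['\"])|(?<=src=['\"]))[^'\"]+(?=['\"])/i", $content, $links);
print_r($links); // $links[0] holds the matched URLs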
Although I'd opt for an HTML parser.
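
For comparison, here is a rough sketch of the parser route using PHP's bundled DOM extension rather than the simplehtmldom library linked above. The function name extractUrls and the sample markup are made up for illustration, and it assumes the DOM extension is enabled (it usually is by default).

```php
<?php
// Hypothetical helper (not from this thread): collect every href and src value.
function extractUrls(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so buffer libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = [];
    $xpath = new DOMXPath($doc);
    // Select all href and src attribute nodes anywhere in the document.
    foreach ($xpath->query('//@href | //@src') as $attr) {
        $urls[] = $attr->nodeValue;
    }
    return $urls;
}

// Made-up sample page mixing absolute and relative URLs.
$html = '<html><head><link rel="stylesheet" href="style/style.css">'
      . '<script src="js/app.js"></script></head>'
      . '<body><img src="http://www.abc.com/images/myimage.jpg"></body></html>';

$urls = extractUrls($html);
print_r($urls);
```

Like the regex, this also picks up ordinary <a href="..."> links, so narrow the XPath (for example to //img/@src) if you only want certain resources. Relative URLs come back exactly as written in the page; resolving them against the page's address is a separate step.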