preg_match to find urls on page

Post by josh »

Uhh... the preg_match special characters confuse me.

I'm sure this has been asked before, but I couldn't find anything and I'm in a hurry to get this done.

(The show 'Lost' comes on soon, heh.)

Code:

<?php
$html=file_get_contents($url);
$html = preg_match_all("/(http:\/\/(.*))[\s]*/", $html, $matches);
// I know it's print_r, print_arr is a function I made that color codes the array
print_arr($matches);
?>
It's supposed to return an array of all the URLs on any given page, but for some reason it's going crazy... try it out and you'll see.

I'm bad with all these special characters, heh. Anyone know what I'm doing wrong?

EDIT: It seems to work on a few pages I try, but not in all cases. Try running it on http://www.google.com/search?hl=en&q=te ... gle+Search for example.

Post by rehfeld »

For starters, replace (.*) with (.*?)

Currently it's being "greedy": it tries to match as much text as possible.
You want it to match the shortest possible string instead.
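
To make the greedy versus lazy difference concrete, here is a minimal sketch. The sample HTML and the closing-quote anchor are assumptions added for illustration; they are not from the original post. The anchor matters: in the original pattern nothing mandatory follows the group (the trailing [\s]* may match zero characters), so a lazy (.*?) there would be free to match an empty string.

```php
<?php
// Hypothetical sample input, made up for this illustration.
$html = '<a href="http://example.com/a">A</a> <a href="http://example.org/b">B</a>';

// Greedy: .* grabs as much of the line as it can, then backtracks
// only as far as the LAST double quote, so both links end up in one match.
preg_match_all('#http://(.*)"#', $html, $greedy);

// Lazy: .*? grows one character at a time and stops at the FIRST
// double quote it reaches, so each link is matched separately.
preg_match_all('#http://(.*?)"#', $html, $lazy);

print_r($greedy[1]); // captured text for the greedy pattern
print_r($lazy[1]);   // captured text for the lazy pattern
```

With this input the greedy pattern produces a single oversized match, while the lazy one captures example.com/a and example.org/b separately.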

There are lots of things you will need to change though;
that pattern is far too simple to be effective.

This place helped me with regex immensely:

http://www.regular-expressions.info/tutorial.html
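
As one possible direction for a less simplistic pattern, the sketch below replaces the dot with a negated character class so that a match ends at whitespace, a quote, or an angle bracket. This is an assumption about what "more effective" could look like, not something spelled out above: extract_urls is a hypothetical helper name, and the character class is a rough heuristic, not a full URL grammar.

```php
<?php
// Hypothetical helper: collects http/https URLs from a chunk of HTML.
function extract_urls(string $html): array
{
    // Scheme, then one or more characters that are not whitespace,
    // quotes, or angle brackets (the usual things that end a URL in markup).
    preg_match_all('#https?://[^\s"\'<>]+#i', $html, $matches);

    // Remove duplicates and reindex from zero.
    return array_values(array_unique($matches[0]));
}

// Made-up sample input for illustration.
$sample = '<a href="http://example.com/a?x=1&amp;y=2">A</a> see https://example.org/b '
        . '<img src=\'http://example.com/a?x=1&amp;y=2\'>';

print_r(extract_urls($sample));
```

Note that HTML entities such as &amp; are left encoded in the results. If only links from attributes like href and src are wanted, parsing the page with PHP's DOMDocument may be more reliable than any regex.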

Post by josh »

Thank you