regex for html links

Extremest
Forum Commoner
Posts: 84
Joined: Mon Aug 29, 2005 12:39 pm

regex for html links

Post by Extremest »

I am working on a spider with multi_curl and am having trouble with the regex that finds the links in the fetched content. The regex I am currently using does not grab all of them for some reason. Could anyone please help me?

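function links($site) {
    // Pattern is built across multiple lines to avoid page distortion.
    // It matches CSS @import statements, CSS url(...) values, and
    // src/href/url attributes inside HTML tags.
    $pattern  = "/((@import\s+[\"'`]([\w:?=@&\/#._;-]+)[\"'`];)|";
    $pattern .= "(:\s*url\s*\([\s\"'`]*([\w:?=@&\/#._;-]+)";
    $pattern .= "([\s\"'`]*\))|<[^>]*\s+(src|href|url)\=[\s\"'`]*";
    $pattern .= "([\w:?=@&\/#._;-]+)[\s\"'`]*[^>]*>))/i";
    // End pattern building.
    // preg_match_all() always fills $matches with an array, so check its
    // return value (FALSE on error) rather than is_array($matches).
    $count = preg_match_all($pattern, $site, $matches);
    return ($count !== FALSE) ? $matches : FALSE;
}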
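
With the default PREG_PATTERN_ORDER flag, preg_match_all() returns one array per capture group. Counting the parentheses in the pattern above, the URL itself appears to land in group 3 (@import), group 5 (url(...)) or group 8 (src/href/url attributes). Below is a minimal sketch of flattening those groups into one list. extract_urls() is a hypothetical helper name, not part of the original code, and the group numbers are an assumption based on that count.

```php
<?php
// Hypothetical helper: flatten the result of links() into a list of URLs.
// Assumes the default PREG_PATTERN_ORDER layout and that groups 3, 5 and 8
// hold the URL for @import, url(...) and attribute matches respectively.
function extract_urls($matches) {
    if (!is_array($matches)) {
        return array(); // links() returned FALSE
    }
    $urls = array();
    foreach (array(3, 5, 8) as $group) {
        if (isset($matches[$group])) {
            $urls = array_merge($urls, $matches[$group]);
        }
    }
    // Groups that did not take part in a match come back as empty strings.
    $urls = array_filter($urls, function ($url) {
        return $url !== '';
    });
    // Drop duplicates and reindex from 0.
    return array_values(array_unique($urls));
}
```

It would be called as extract_urls(links($site)). The result is ordered by group rather than by position in the page, which should not matter for a spider queue.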
Extremest
Forum Commoner
Posts: 84
Joined: Mon Aug 29, 2005 12:39 pm

Post by Extremest »

I am sorry, that regex is fine; I have got it working. I am just having some problems removing the links that I don't want. For some reason the filter removes some links that are fine, even though nothing should match them.
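
The filtering code is not shown, so the cause can only be guessed at. Two common ways for good links to disappear are unescaped metacharacters in the filter patterns (an unescaped "." matches any character) and loose comparisons on the return value of preg_match() or strpos(). Below is a minimal sketch of a filter that avoids both. remove_unwanted() and the $unwanted list are hypothetical names for illustration, and the sketch assumes the unwanted entries are literal substrings, not regexes.

```php
<?php
// Hypothetical filter: drop every URL that contains one of the unwanted
// substrings. $unwanted is an assumed list of literal strings, not regexes.
function remove_unwanted(array $urls, array $unwanted) {
    $kept = array();
    foreach ($urls as $url) {
        $drop = false;
        foreach ($unwanted as $needle) {
            // preg_quote() escapes ".", "?" and other metacharacters so the
            // needle only matches literally; "#" is the pattern delimiter.
            $pattern = '#' . preg_quote($needle, '#') . '#i';
            // Compare strictly with 1: preg_match() returns 0 for no match
            // and FALSE on error.
            if (preg_match($pattern, $url) === 1) {
                $drop = true;
                break;
            }
        }
        if (!$drop) {
            $kept[] = $url;
        }
    }
    return $kept;
}
```

For example, with ".gif" in $unwanted, a link ending in "a.gif" is dropped while one ending in "agif" is kept. Without preg_quote() the "." would match the "a" and the second link would be removed as well.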