Url finding regex

dayyanb
Forum Commoner
Posts: 46
Joined: Wed Jan 23, 2008 12:34 am

Post by dayyanb

I wanted a regex to search a page for URLs that are inside an HTML attribute.

I wrote this one. Does it look good?

Is there any way to get around doubling the number of \?

 
<?php
// $pagedata holds the page's HTML; with PREG_SET_ORDER, each entry of $var is one match.
preg_match_all('/(?:(?:href|src|data|action)\s*=\s*(?:"(?:((?:\\\\.|[^\\\\])*?)")|\'(?:((?:\\\\.|[^\\\\])*?)\')|\s?(?:((?:\\\\.|[^\\\\])*?)(?:\s|>))))/is',$pagedata,$var,PREG_SET_ORDER);
?>
 
Last edited by dayyanb on Sat Feb 09, 2008 4:29 pm, edited 1 time in total.
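For reference, here is a minimal sketch of reading the URLs back out of `$var`. The sample HTML is made up for illustration, and the pattern is copied unchanged from the post. Which capture group holds the value depends on the quoting style, so the sketch takes the last element of each match set.

```php
<?php
// Hypothetical sample HTML; the pattern below is copied unchanged from the post.
$pagedata = '<a href="http://example.com/a">A</a> '
          . "<img src='/img/b.png'> "
          . '<form action=/submit method=post>';

preg_match_all('/(?:(?:href|src|data|action)\s*=\s*(?:"(?:((?:\\\\.|[^\\\\])*?)")|\'(?:((?:\\\\.|[^\\\\])*?)\')|\s?(?:((?:\\\\.|[^\\\\])*?)(?:\s|>))))/is', $pagedata, $var, PREG_SET_ORDER);

$urls = [];
foreach ($var as $m) {
    // Group 1 = double-quoted value, 2 = single-quoted, 3 = unquoted.
    // PHP omits unmatched groups at the end of a match set and fills unmatched
    // groups before the one that matched with '', so the value is the last element.
    $urls[] = $m[count($m) - 1];
}

// $urls is now ['http://example.com/a', '/img/b.png', '/submit']
print_r($urls);
```

Nothing ties the attribute name to a word boundary, so text like `userdata = "abc"` in an inline script also matches (through `data`). Adding `\b` in front of `(?:href|src|data|action)` is one possible way to tighten that.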
GeertDD
Forum Contributor
Posts: 274
Joined: Sun Oct 22, 2006 1:47 am
Location: Belgium

Re: Url finding regex

Post by GeertDD

dayyanb wrote: Is there any way to get around doubling the number of \?
Try this:

 
<?php
echo '\\s'; // output: \s
echo '\s'; // output: \s
 
Since \s is not an escape sequence in a PHP string, the backslash is kept as-is (second example). Between single quotes, the only escape sequences are \' for a literal single quote and \\ for a literal backslash. So whether you write \\s or \s makes no difference here.
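This sketch shows the rule from the reply in action and where it stops helping. `\s`, `\d` and similar can be written with one backslash. The `\\\\` runs in the original pattern cannot simply be halved, though, because each one has to reach PCRE as `\\` (one literal backslash). The string `'C:\dir'` is a made-up test value.

```php
<?php
// In single quotes, \s is not an escape sequence, so its backslash is kept:
// both literals below are the same two characters, a backslash and "s".
$plain   = '\s';
$doubled = '\\s';
var_dump($plain === $doubled);   // bool(true)

// Only \\ and \' are escapes between single quotes; each yields one character.
var_dump(strlen('\\'));          // int(1)
var_dump(strlen('\''));          // int(1)

// A regex that matches ONE literal backslash must contain \\ itself, so the
// single-quoted PHP literal needs four: '\\\\' becomes \\ in the pattern.
$pattern = '/\\\\/';
echo $pattern, "\n";                        // prints /\\/
var_dump(preg_match($pattern, 'C:\dir'));   // int(1): 'C:\dir' holds one backslash
```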
dayyanb
Forum Commoner
Posts: 46
Joined: Wed Jan 23, 2008 12:34 am

Re: Url finding regex

Post by dayyanb

OK, thanks. Maybe I can make my code more legible now.
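One option for legibility, not something suggested in the thread, is PCRE's `x` modifier. With it, unescaped whitespace in the pattern is ignored and `#` starts a comment, so the same pattern can be spread over several lines. The four-backslash runs stay as they are. The sketch below assumes that approach and uses a made-up sample string only to check that the one-line and multi-line forms return identical matches.

```php
<?php
// The original one-line pattern from the first post.
$original = '/(?:(?:href|src|data|action)\s*=\s*(?:"(?:((?:\\\\.|[^\\\\])*?)")|\'(?:((?:\\\\.|[^\\\\])*?)\')|\s?(?:((?:\\\\.|[^\\\\])*?)(?:\s|>))))/is';

// The same pattern with the x modifier: whitespace is ignored and # starts
// a comment. Comments must avoid / (the delimiter) and single quotes.
$readable = '/
    (?:href|src|data|action) \s* = \s*        # attribute name and equals sign
    (?:
        "   ( (?:\\\\.|[^\\\\])*? )  "        # 1: double-quoted value
      | \'  ( (?:\\\\.|[^\\\\])*? )  \'       # 2: single-quoted value
      | \s? ( (?:\\\\.|[^\\\\])*? )  (?:\s|>) # 3: unquoted, ends at space or >
    )
/isx';

// Hypothetical sample input used to compare the two forms.
$html = '<a href="http://example.com/a">A</a> '
      . "<img src='/img/b.png'> "
      . '<form action=/submit method=post>';

preg_match_all($original, $html, $a, PREG_SET_ORDER);
preg_match_all($readable, $html, $b, PREG_SET_ORDER);
var_dump($a === $b);   // bool(true)
```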