Url finding regex

Posted: Fri Jan 25, 2008 7:59 pm
by dayyanb
I wanted a regex to search a page for URLs that appear inside an HTML attribute.

I wrote this one. Does it look good?

Is there any way to get around doubling the number of \?

Code:

 
<?php
preg_match_all('/(?:(?:href|src|data|action)\s*=\s*(?:"(?:((?:\\\\.|[^\\\\])*?)")|\'(?:((?:\\\\.|[^\\\\])*?)\')|\s?(?:((?:\\\\.|[^\\\\])*?)(?:\s|>))))/is',$pagedata,$var,PREG_SET_ORDER);
?>
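
To make it easier to judge whether the pattern looks good, here is a small sketch that runs it against a sample string. The sample HTML and the helper name `first_nonempty_group` are assumptions for illustration; the post does not show `$pagedata`. The pattern has one capture group per quoting style (double-quoted, single-quoted, unquoted), so the URL is whichever group is non-empty.

```php
<?php
// Hypothetical sample input; the original post does not show $pagedata.
$pagedata = '<a href="http://example.com/a">x</a><img src=\'/img/b.png\'>';

// The pattern exactly as posted above.
$pattern = '/(?:(?:href|src|data|action)\s*=\s*(?:"(?:((?:\\\\.|[^\\\\])*?)")|\'(?:((?:\\\\.|[^\\\\])*?)\')|\s?(?:((?:\\\\.|[^\\\\])*?)(?:\s|>))))/is';

preg_match_all($pattern, $pagedata, $var, PREG_SET_ORDER);

// Hypothetical helper: return the first non-empty capture group of one match.
// Index 0 is the whole match, so it is skipped.
function first_nonempty_group(array $match): string
{
    foreach (array_slice($match, 1) as $group) {
        if ($group !== '') {
            return $group;
        }
    }
    return '';
}

$urls = array_map('first_nonempty_group', $var);
print_r($urls);
```

With the sample input above, `$urls` holds `http://example.com/a` and `/img/b.png`. Note that the pattern only looks at the attribute name, so it will also match text such as `data=` outside of a tag.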
 

Re: Url finding regex

Posted: Sat Feb 09, 2008 4:09 pm
by GeertDD
dayyanb wrote: Is there any way to get around doubling the number of \?
Try this:

Code:

 
echo '\\s'; // output: \s
echo '\s'; // output: \s
 
Since \s has no special meaning inside a single-quoted PHP string, the backslash is preserved (second example). The only escape sequences recognized between single quotes are \' for a literal single quote and \\ for a single backslash. So it doesn't matter here whether you write \\s or \s.
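
A short sketch of what this means for the pattern in the first post. The variable names and the test subject are assumptions for illustration. One caveat worth showing: the rule applies to sequences like \s, but a literal backslash in the regex itself is written \\ at the regex level, so the `\\\\.` parts of the original pattern cannot simply be shortened to `\\.`.

```php
<?php
// Inside single quotes, only \\ and \' are escape sequences.
$doubled = '\\s';
$single  = '\s';
var_dump($doubled === $single);  // bool(true): both are a backslash followed by s

// The regex engine needs \\ to match one literal backslash,
// so the PHP source keeps four backslashes for that case.
$literalBackslash = '/\\\\./';   // regex \\. : a backslash, then any character
$literalDot       = '/\\./';     // regex \.  : a literal dot

$subject = 'a\\"b';              // the four characters: a \ " b

var_dump(preg_match($literalBackslash, $subject)); // int(1)
var_dump(preg_match($literalDot, $subject));       // int(0)
```

So in the original pattern the \s sequences are already as short as they can be, and only the `\\\\` runs that stand for a literal backslash have to stay doubled.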

Re: Url finding regex

Posted: Sat Feb 09, 2008 4:25 pm
by dayyanb
OK, thanks. Maybe I can make my code more legible now.