
Improvements to my pattern?

Posted: Sun Feb 24, 2008 12:40 pm
by HCBen
Does anyone see a way I could streamline this pattern, or any issues I need to be aware of?

Code:

'/<[\w]+\s*[a-z_0-9;\-=:"\'\s]*[id|class]=["\']?f:([\w]+)["\']?\s*[a-z_0-9;\-=:"\'\s]*>(.+?)<\/[\w]+>/xius'
I'm using it to match any HTML tag whose id or class value is prefixed with "f:", so I can get the text between the opening and closing tags. I also needed to allow for any additional tag attributes.

It works fine and I haven't run into any problems so far. Just thought I'd throw it out there and see what others think.

Thanks,
Ben
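
One thing worth checking in a pattern like this: square brackets make a character class, so [id|class] matches a single character out of i, d, |, c, l, a, s rather than the word "id" or "class". Here is a minimal sketch of the difference. The sample markup, the made-up "foods" attribute, and the variable names are assumptions for illustration, and the patterns are simplified versions, not the one posted above.

```php
<?php
// Hypothetical sample markup: only the first tag has a real id attribute.
$html = '<p id="f:title">Hello</p> <p foods="f:menu">Lunch</p>';

// [id|class] is a character class: it matches ONE character out of i, d, |, c, l, a, s.
$charClass = '/<\w+[^>]*[id|class]=["\']?f:(\w+)["\']?[^>]*>(.+?)<\/\w+>/is';

// \s(?:id|class) is whitespace followed by a non-capturing group:
// it matches the whole attribute name "id" or "class".
$group = '/<\w+[^>]*\s(?:id|class)=["\']?f:(\w+)["\']?[^>]*>(.+?)<\/\w+>/is';

preg_match_all($charClass, $html, $a);
preg_match_all($group, $html, $b);

print_r($a[1]); // names captured by the character-class version
print_r($b[1]); // names captured by the grouped version
```

With this input, the character-class version also picks up "menu" from the foods attribute (the "s=" at its end is enough), while the grouped version should only return "title".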

Re: Improvements to my pattern?

Posted: Sun Feb 24, 2008 4:06 pm
by John Cartwright
I don't know if mine turned out much better:

Code:

$foo = '<textarea type="foo" id="f:foobarington" style=2"foo">input 1</textarea>
 
<textarea type="fee" class="f:feebarington">input 2</textarea>';
 
// (?:id|class) matches the whole attribute name "id" or "class".
preg_match_all('#<\w+.*?(?:id|class)=["\']f:([^"\']+)["\'].*?>(.*?)<[^>]+>#is', $foo, $matches);
 
echo '<pre>';
print_r($matches);

Code:

Array
(
    ....
 
    [1] => Array
        (
            [0] => foobarington
            [1] => feebarington
        )
 
    [2] => Array
        (
            [0] => input 1
            [1] => input 2
        )
 
)
//step in regex guru
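
A possible caveat with ending the match on <[^>]+>: that accepts the next tag of any kind, so content containing nested markup gets cut short. One option is to capture the tag name and require the matching closing tag with a backreference. This is only a sketch. The sample input and variable names are made up, and both patterns are simplified for illustration.

```php
<?php
// Hypothetical input with a nested <b> inside the marked element.
$html = '<div id="f:intro">Some <b>bold</b> text</div>';

// Ends at the first tag of any kind that follows the content.
$anyTag = '#<\w+[^>]*\s(?:id|class)=["\']?f:(\w+)["\']?[^>]*>(.*?)<[^>]+>#is';

// Captures the tag name in group 1 and requires the matching </name> via \1.
$sameTag = '#<(\w+)[^>]*\s(?:id|class)=["\']?f:(\w+)["\']?[^>]*>(.*?)</\1>#is';

preg_match($anyTag, $html, $m1);
preg_match($sameTag, $html, $m2);

echo $m1[2], "\n"; // content captured when any tag may end the match
echo $m2[3], "\n"; // content captured up to the matching closing tag
```

Here the first pattern stops at the nested <b>, while the second keeps going to </div>. The backreference still breaks when an element contains another element of the same name (a div inside a div); for that a real parser such as PHP's DOM extension is probably the safer choice.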

Re: Improvements to my pattern?

Posted: Mon Feb 25, 2008 12:47 am
by jmut
Just be very careful if allowing users to add attributes.

E.g. this will execute the JavaScript in IE, even though it's part of a style sheet. Don't ask why; I guess they just thought it was cool.

<div style="background:url('javascript:alert(1)')">
</div>

Re: Improvements to my pattern?

Posted: Mon Feb 25, 2008 3:46 pm
by HCBen
Jcart - Yours is better. I changed it slightly and had to allow for class/id values without quotes (as I don't have full control of the :( ):

Code:

#<\w+[^>]*(?:id|class)=["\']?f:(\w+)["\']?[^>]*>(.+?)<[^>]+>#
Also, I replaced .*? with [^>]* because it's safer, I believe: it can't run past the end of the opening tag. Other than that, I'm not sure how much more can be done to improve it now...

It works great!

Thanks,
Ben
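
To illustrate why [^>]* is the safer choice, here is a small sketch comparing the two on an unquoted class. The sample markup and variable names are assumptions for illustration. The second pattern is the one from this post; the first is the same pattern with .*? put back in and the s flag added.

```php
<?php
// Hypothetical markup: an unmarked <p> followed by a marked <span> with an unquoted class.
$html = '<p>plain</p><span class=f:note>marked</span>';

// .*? can cross tag boundaries, so a match may begin at an earlier, unrelated tag.
$dot = '#<\w+.*?(?:id|class)=["\']?f:(\w+)["\']?.*?>(.+?)<[^>]+>#s';

// [^>]* stops at the first ">", so the attribute must sit inside the same opening tag.
$negated = '#<\w+[^>]*(?:id|class)=["\']?f:(\w+)["\']?[^>]*>(.+?)<[^>]+>#';

preg_match($dot, $html, $dotMatch);
preg_match($negated, $html, $negMatch);

echo $dotMatch[0], "\n"; // full text matched by the .*? version
echo $negMatch[0], "\n"; // full text matched by the [^>]* version
echo $negMatch[1], ' => ', $negMatch[2], "\n"; // captured name and content
```

Both versions capture the same name and content here, but the .*? version starts its match at the unrelated <p>, which matters as soon as the full match is used, for example in preg_replace.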