Any questions involving matching text strings to patterns - the pattern is called a "regular expression."
by HCBen » Sun Feb 24, 2008 12:40 pm
Does anyone see a way I could streamline this pattern, or any issues I need to be aware of?
Code:
'/<[\w]+\s*[a-z_0-9;\-=:"\'\s]*[id|class]=["\']?f:([\w]+)["\']?\s*[a-z_0-9;\-=:"\'\s]*>(.+?)<\/[\w]+>/xius'
I'm using it to match any html tag with an id value or class name prefaced with "f:" to obtain the text within the open/close tags - and I needed to allow for any additional tag attributes.
It works fine and I haven't run into any problems so far. Just thought I'd throw it out there and see what others think.
Thanks,
Ben
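One issue worth checking in the pattern above: `[id|class]` is a character class, so it matches a single character from the set `i d | c l a s` rather than the word "id" or "class". The sketch below compares it with the non-capturing group `(?:id|class)`. The patterns are cut down to just the attribute part, and the sample tags are hypothetical, not taken from the post.

```php
<?php
// Hypothetical sample tags, for illustration only.
$samples = [
    '<a rel="f:nofollow">link</a>',  // has neither an id nor a class attribute
    '<p id="f:name">text</p>',       // has a genuine id attribute
];

// Character class: ONE character out of i, d, |, c, l, a, s before the "=".
$charClass   = '/[id|class]=["\']?f:(\w+)/i';
// Non-capturing alternation: the literal word "id" or "class" before the "=".
$alternation = '/(?:id|class)=["\']?f:(\w+)/i';

$results = [];
foreach ($samples as $html) {
    $results[] = [
        'charClass'   => preg_match($charClass, $html),
        'alternation' => preg_match($alternation, $html),
    ];
}

print_r($results);
```

The character-class version also accepts the `rel` attribute, because the `l` in `rel=` is in the set. The alternation accepts only the second tag. Neither version anchors the attribute name, so something like `data-id=` would still get through; requiring a `\s` in front of the group is one way to tighten that.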
by John Cartwright » Sun Feb 24, 2008 4:06 pm
I don't know if mine turned out much better:
Code:
$foo = '<textarea type="foo" id="f:foobarington" style=2"foo">input 1</textarea>
<textarea type="fee" class="f:feebarington">input 2</textarea>';
preg_match_all('#<\w+.*?(?:id|class)=["\']f:([^"\']+)["\'].*?>(.*?)<[^>]+>#is', $foo, $matches);
echo '<pre>';
print_r($matches);
Output:
Array
(
....
[1] => Array
(
[0] => foobarington
[1] => feebarington
)
[2] => Array
(
[0] => input 1
[1] => input 2
)
)
//step in regex guru
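One thing to watch with a closing part like `<[^>]+>`: it accepts any tag, so the captured text stops at the first nested tag. Below is a minimal sketch of that behaviour, next to one possible alternative that captures the tag name and requires the same name in the closing tag through a backreference. The sample markup and the variable names are hypothetical, and the backreference variant is only an assumption about what might be wanted here. It shifts the group numbers by one and still cannot cope with same-name nesting such as a `div` inside a `div`.

```php
<?php
// Hypothetical markup with a nested tag inside the element we want.
$html = '<div id="f:greeting">Hello <b>World</b></div>';

// Any tag terminates the capture.
$anyTag = '#<\w+.*?(?:id|class)=["\']f:([^"\']+)["\'].*?>(.*?)<[^>]+>#is';

// Sketch of an alternative: remember the tag name in group 1 and
// require a closing tag with the same name via the backreference \1.
$sameTag = '#<(\w+)[^>]*(?:id|class)=["\']f:([^"\']+)["\'][^>]*>(.*?)</\1>#is';

preg_match($anyTag, $html, $a);
preg_match($sameTag, $html, $b);

var_dump($a[2]); // text captured up to the first tag of any kind
var_dump($b[3]); // text captured up to the matching </div>
```

Here the first pattern captures only `Hello ` (cut off at `<b>`), while the second captures `Hello <b>World</b>`. For arbitrarily nested markup, a DOM parser is usually more robust than a single pattern.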
by jmut » Mon Feb 25, 2008 12:47 am
Just be very careful if you allow users to add attributes.
For example, this will execute the JavaScript in IE, even though it sits inside a style attribute. Don't ask why; they just thought it was cool, I guess.
<div style="background:url('javascript:alert(1)')">
</div>
by HCBen » Mon Feb 25, 2008 3:46 pm
Jcart - Yours is better. I changed it slightly and had to allow for class/id values without quotes (as I don't have full control of the HTML):
Code:
#<\w+[^>]*(?:id|class)=["\']?f:(\w+)["\']?[^>]*>(.+?)<[^>]+>#
Also, I replaced .*? with [^>]* because I believe it's safer: [^>]* cannot run past the end of the opening tag. Other than that, I'm not sure how much more can be done to improve it now...
It works great!
Thanks,
Ben
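To illustrate the swap from `.*?` to `[^>]*`, here is a minimal sketch that runs both forms against the same input. The sample markup and the variable names are hypothetical, and the `is` flags are an assumption, since the last pattern was posted without modifiers.

```php
<?php
// Hypothetical input: an unrelated tag comes before the one we want.
$html = '<p>intro</p><span id="f:name">Ben</span>';

// Lazy dot: free to cross ">" and wander into the following tags.
$lazyDot  = '#<\w+.*?(?:id|class)=["\']?f:(\w+)["\']?.*?>(.+?)<[^>]+>#is';
// Negated class: cannot cross ">", so it stays inside one opening tag.
$negClass = '#<\w+[^>]*(?:id|class)=["\']?f:(\w+)["\']?[^>]*>(.+?)<[^>]+>#is';

preg_match($lazyDot, $html, $a);
preg_match($negClass, $html, $b);

echo $a[0], "\n"; // full match of the lazy-dot version
echo $b[0], "\n"; // full match of the negated-class version
```

With the lazy dot, the match begins at `<p>` and swallows the whole string, because `.*?` is allowed to step over `>` while it looks for `id=`. With `[^>]*`, the match begins at `<span`. Both capture `name` and `Ben` in this case, but only the second is tied to the tag that actually carries the attribute.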