
regex and lookaround or backreferences

Posted: Fri Jan 07, 2005 1:21 am
by rehfeld
I'm not exactly sure whether I need lookaround or backreferences in this scenario; I'm not very familiar with either.

For example, I want to be able to match the attribute values for ALL of the tags in my example.

My problem is, I need to start the match with either
attribute="
or
attribute='

but then I need the regex to remember whether it started with a double or single quote, and use that as the ending delimiter.
I made a pattern, but obviously it won't work if the value is something like

attribute="foo's"

because I'm trying to capture foo's, not just foo

Code:

<?php

$subject = '

<tag attribute="value">
<tag attribute="foo\'s">
<tag attribute="a \'value\' with quotes">
<tag attribute=\'another "value" with """quotes\'>

';



$pattern = '/attribute=(\'|")([^\'"]*)(\'|")/i';

preg_match_all($pattern, $subject, $matches);

?>

So it's like I need to capture which character I matched in the first parentheses, then use that character again in the second and third parentheses. I just don't know how.

The results I'm trying to achieve from the above $subject are as follows:

Code:

value
foo's
a 'value' with quotes
another "value" with """quotes

Any direction you could give me is appreciated.

Posted: Fri Jan 07, 2005 1:26 am
by feyd
backreferences...

Code:

$pattern = '#attribute\s*=\s*(["\'])?[^\s]*?\\1#';
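
Applied to the sample $subject from the first post, the backreference idea can be sketched roughly as below. This is a minimal sketch, not the pattern above verbatim: it assumes the value should land in its own capture group and may contain spaces, so it makes the quote mandatory and uses (.*?) for the value. The \1 (written \\1 inside a single-quoted PHP string) only matches the same quote character that group 1 captured.

```php
<?php

$subject = '
<tag attribute="value">
<tag attribute="foo\'s">
<tag attribute="a \'value\' with quotes">
<tag attribute=\'another "value" with """quotes\'>
';

// Group 1: the opening quote, either " or '.
// Group 2: the attribute value, matched lazily.
// \1:      the closing delimiter, which must be the same character as group 1.
$pattern = '/attribute\s*=\s*(["\'])(.*?)\\1/i';

preg_match_all($pattern, $subject, $matches);

// $matches[2] holds the captured values, one entry per tag.
print_r($matches[2]);
```

Because the closing delimiter is tied to the opening one, a single quote inside a double-quoted value (and vice versa) no longer ends the match early, so $matches[2] should contain the four values listed in the first post.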

here's a tag stripper I wrote a while ago:
Simple REGEX needed... wrote:

Code:

<?php

$test = '<TD WIDTH="14%" BACKGROUND="images.jpg"><A HREF="http://something.xxx">
<IMG SRC="image.gif" BORDER="0" ONLOAD="if (this.width>50) this.border=1" ALT="Preview by Thumbshots"
WIDTH="45">testestets>blah</A></TD>';

echo htmlentities(preg_replace('#<.*?(\s+[\w\W]+?(\s*=\s*([\'"]?).*?\\3))*?>#s','',$test),ENT_QUOTES);

?>
outputs

Code:

testestets&gt;blah
you can probably adapt that to extract all the tag attributes as well.
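
One possible way to adapt the same backreference trick to pull out every attribute of a tag is sketched below. It is only an illustration under a few assumptions: the $tag string and the $attributes array are made up for the example, attribute names are assumed to be plain word characters, and unquoted attribute values are not handled at all.

```php
<?php

// Hypothetical sample tag; the ALT value is single-quoted and contains double quotes.
$tag = '<IMG SRC="image.gif" BORDER="0" ONLOAD="if (this.width>50) this.border=1" ALT=\'Preview by "Thumbshots"\' WIDTH="45">';

// Group 1: attribute name.
// Group 2: opening quote, either " or '.
// Group 3: value, matched lazily up to the same quote again (\2).
$pattern = '#(\w+)\s*=\s*(["\'])(.*?)\\2#s';

// PREG_SET_ORDER gives one array per match: [full match, name, quote, value].
preg_match_all($pattern, $tag, $matches, PREG_SET_ORDER);

$attributes = [];
foreach ($matches as $m) {
    // Lower-case the name so SRC and src end up under the same key.
    $attributes[strtolower($m[1])] = $m[3];
}

print_r($attributes);
```

The > inside the ONLOAD value is not a problem here, because the match for each value only ends at the quote character that opened it.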

nice post count, btw.. :twisted:

Posted: Sat Jan 08, 2005 6:18 pm
by rehfeld
ahh i see.


thanks, makes sense now. :)