regex and lookaround or backreferences

rehfeld
Forum Regular
Posts: 741
Joined: Mon Oct 18, 2004 8:14 pm

Post by rehfeld »

I'm not exactly sure whether I need lookaround or backreferences in this scenario; I'm not very familiar with either.

For example, I want to be able to match the attribute values for ALL of the tags in my example.

My problem is, I need to start the match with either
attribute="
or
attribute='

But then I need the regex to remember whether it started with a double or single quote, and use that as the ending delimiter.
I made a pattern, but obviously it won't work if the value is something like

attribute="foo's"

because I'm trying to capture foo's, not just foo.

Code: Select all

<?php

$subject = '

<tag attribute="value">
<tag attribute="foo\'s">
<tag attribute="a \'value\' with quotes">
<tag attribute=\'another "value" with """quotes\'>

';



$pattern = '/attribute=(\'|")([^\'"]*)(\'|")/i';

preg_match_all($pattern, $subject, $matches);

?>

So it's like I need to capture which character I matched in the first parenthesis, then use that character again in the second and third parentheses. I just don't know how.

The results I'm trying to achieve from the above $subject are as follows:

Code: Select all

value
foo's
a 'value' with quotes
another "value" with """quotes

Any direction you could give me is appreciated.
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

backreferences...

Code: Select all

$pattern = '#attribute\s*=\s*(["\'])?[^\s]*?\\1#';
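
As a rough sketch of how the backreference idea might be applied to the sample from the original question: the pattern above uses `[^\s]`, which stops at whitespace, so this variation assumes values may contain spaces and the other quote character. The second capture group and the `$values` variable are additions for illustration, not part of the pattern above.

```php
<?php

// Sample input from the original question.
$subject = '
<tag attribute="value">
<tag attribute="foo\'s">
<tag attribute="a \'value\' with quotes">
<tag attribute=\'another "value" with """quotes\'>
';

// Group 1 remembers the opening quote character.
// Group 2 lazily captures the value, which may contain the other quote type.
// \1 (written \\1 inside the PHP string) requires the same quote to close.
$pattern = '/attribute\s*=\s*(["\'])(.*?)\\1/i';

preg_match_all($pattern, $subject, $matches);

// $matches[2] holds only the attribute values.
$values = $matches[2];
print_r($values);
```

Because `.` does not match a newline by default, each match stays on one line here; a value that itself contains the same quote character it is delimited with would still be cut short.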

here's a tag stripper I wrote a while ago:
Simple REGEX needed... wrote:

Code: Select all

<?php

$test = '<TD WIDTH="14%" BACKGROUND="images.jpg"><A HREF="http://something.xxx">
<IMG SRC="image.gif" BORDER="0" ONLOAD="if (this.width>50) this.border=1" ALT="Preview by Thumbshots"
WIDTH="45">testestets>blah</A></TD>';

echo htmlentities(preg_replace('#<.*?(\s+[\w\W]+?(\s*=\s*([\'"]?).*?\\3))*?>#s','',$test),ENT_QUOTES);

?>
outputs

Code: Select all

testestets&gt;blah
you can probably adapt that to extract all the tag attributes as well..
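
One way that adaptation might look, as a minimal sketch rather than a drop-in solution: it assumes every attribute value is quoted and collects name/value pairs with the same backreference trick. The `$tag` sample (shortened from the example above), the `$attributes` array, and the `[\w-]+` name pattern are assumptions made for illustration.

```php
<?php

// Hypothetical sample tag, shortened from the example in the post above.
$tag = '<IMG SRC="image.gif" BORDER="0" ONLOAD="if (this.width>50) this.border=1" ALT="Preview by Thumbshots">';

// Group 1: attribute name. Group 2: opening quote. Group 3: value.
// \2 (written \\2 in the PHP string) forces the closing quote to match the opening one.
$pattern = '#([\w-]+)\s*=\s*(["\'])(.*?)\\2#s';

// PREG_SET_ORDER yields one array per match: [full match, name, quote, value].
preg_match_all($pattern, $tag, $matches, PREG_SET_ORDER);

$attributes = [];
foreach ($matches as $m) {
    $attributes[$m[1]] = $m[3];
}

print_r($attributes);
```

Unquoted attribute values are skipped by this sketch, and for arbitrary real-world HTML a parser such as PHP's DOMDocument is likely a safer choice than a regex.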

nice post count, btw.. :twisted:
rehfeld
Forum Regular
Posts: 741
Joined: Mon Oct 18, 2004 8:14 pm

Post by rehfeld »

ahh i see.


thanks, makes sense now. :)