preg_replace regex nightmare! Simple expression or no?
Posted: Wed Oct 27, 2004 12:33 pm
by Calipoop
I have a guestbook type feature with the following xml structure:
<entry id="12341234" published="true">
<text></text>
</entry>
Using php, I'm trying to manipulate each individual entry by the id# (i.e. replace or delete an entry). I can match the first tag easily using
Code: Select all
<?php
preg_replace("#(<$idEntry).*.entry>#U", "DELETED", $entry);
?>
where $idEntry is 'entry id="12341234"'
But chaos ensues if there are line breaks in the text. There's got to be a simple way to match the entire entry from <entry id...> to </entry> no matter what text is in the <text> field - multiple line breaks, question marks, blah blah blah. Right?
Any advice?
Posted: Wed Oct 27, 2004 1:46 pm
by redmonkey
Code: Select all
<?php
$entry = preg_replace('/^<' . $idEntry . '.*?^<\/entry>/ms', 'DELETED', $entry);
?>
Assumes of course that the actual text you are trying to replace matches that of your sample above.
Posted: Wed Oct 27, 2004 1:57 pm
by Calipoop
Thanks for the reply red.
I tested it out - no luck, it didn't match anything. I'm gonna take a look at your expression as a starting point.
First question is when you use '.' and the s modifier is set, do you need to escape the . for concat?
Also never really understood the ^
Posted: Wed Oct 27, 2004 2:06 pm
by redmonkey
Calipoop wrote:Thanks for the reply red.
I tested it out - no luck, it didn't match anything. I'm gonna take a look at your expression as a starting point.
As mentioned, the regex assumes that the text you are matching is the same as the sample you detailed above, or more specifically that the opening and closing 'entry' tags are at the very beginning of a new line.
e.g. the regex will match....
Code: Select all
<guestbook>
<entry id="12341234" published="true">
<text></text>
</entry>
</guestbook>
...but will not match....
Code: Select all
<guestbook>
  <entry id="12341234" published="true">
    <text></text>
  </entry>
</guestbook>
....nor will it match....
Code: Select all
<guestbook>
<entry id="12341234" published="true"><text></text></entry>
</guestbook>
Calipoop wrote:First question is when you use '.' and the s modifier is set, do you need to escape the . for concat?
No.
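The dots either side of $idEntry sit outside the quoted strings, so they are PHP's concatenation operator and never reach the regex engine. Only the dot in '.*?' is a regex dot, and the s modifier lets it match newlines as well. As for ^, with the m modifier it matches at the start of every line rather than only at the start of the whole string. A minimal sketch of both points, assuming made-up sample strings ($flush and $indented are hypothetical names, not from your file):

```php
<?php
// Hypothetical sample data: the same entry, once flush-left and once indented.
$flush = "<guestbook>\n<entry id=\"12341234\" published=\"true\">\n<text></text>\n</entry>\n</guestbook>";
$indented = "<guestbook>\n  <entry id=\"12341234\" published=\"true\">\n    <text></text>\n  </entry>\n</guestbook>";

$idEntry = 'entry id="12341234"';

// m: ^ matches at the start of every line, not just the start of the string.
// s: . also matches newline characters, so .*? can span several lines.
$pattern = '/^<' . $idEntry . '.*?^<\/entry>/ms';

$flushResult = preg_replace($pattern, 'DELETED', $flush);
$indentedResult = preg_replace($pattern, 'DELETED', $indented);

// The flush-left entry is replaced.
echo $flushResult, "\n";

// The indented entry is left untouched because '<' is not the first character on its line.
echo ($indentedResult === $indented) ? "indented: no match\n" : "indented: matched\n";
```

The anchored pattern only fires when both tags start a line, which is why it is the more restrictive of the two approaches.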
Posted: Wed Oct 27, 2004 2:14 pm
by Calipoop
Oh I see. Well that stinks, is there a way around that? There are lots of newlines and whitespace in my xml file - is there a way for regex to match
<$idEntry
then anything in between including new lines and whitespace;
then end with the first occurrence of </entry> on any line?
(where $idEntry is of the form: entry id="unique")
Posted: Wed Oct 27, 2004 2:32 pm
by redmonkey
Code: Select all
<?php
$entry = preg_replace('/<' . $idEntry . '.*?<\/entry>/s', 'DELETED', $entry);
?>
It doesn't stink, it is by design. Personally I feel it is better to create your regex to be as restrictive as possible and then open it up as and when needed, rather than using the 'match all' dot syntax and then running into an unforeseen pattern match and losing data.
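For illustration, here is a minimal sketch of the unanchored pattern run against messy input. The guestbook string is made up, and wrapping the id in preg_quote() is an extra precaution I am assuming you might want in case an id ever contains a regex metacharacter; it is not needed for purely numeric ids.

```php
<?php
// Hypothetical guestbook with inconsistent indentation and a multi-line text field.
$entry = "<guestbook>\n"
    . "  <entry id=\"1\" published=\"true\"><text>first</text></entry>\n"
    . "\t<entry id=\"2\" published=\"true\">\n"
    . "<text>line one\nline two?</text>\n"
    . "   </entry>\n"
    . "<entry id=\"3\" published=\"true\"><text>third</text></entry>\n"
    . "</guestbook>";

$idEntry = 'entry id="2"';

// preg_quote() escapes regex metacharacters in the id, plus the '/' delimiter.
// The s modifier lets .*? cross line breaks; the ? keeps it lazy so it stops
// at the first </entry> after the opening tag.
$pattern = '/<' . preg_quote($idEntry, '/') . '.*?<\/entry>/s';

$result = preg_replace($pattern, 'DELETED', $entry);
echo $result, "\n";
```

Because the closing double quote is part of $idEntry, an id of "2" cannot accidentally match an entry whose id merely starts with 2.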
Posted: Wed Oct 27, 2004 3:07 pm
by Calipoop
HAH! sorry bout the stink comment, I'm just frustrated. Anyway, the new expression you gave looks to be close, but your expression is doing something similar to what my latest expression is doing:
it's deleting everything before and after the one entry that I want deleted. I get a whole slew of repeating DELETED all 1 space away from each other. I had been getting a similar result as this except instead of spaces separating the DELETED it was a single letter, presumably from the previous entries.
Thanks again for your help - this is day two of my fight. I wonder if the deleting BEFORE and AFTER is a key into the error? I print out the $idEntry each time and it does in fact match exactly one entry in the fields. Don't understand why it's grabbing everything...!
Posted: Wed Oct 27, 2004 3:38 pm
by Calipoop
Red, I'm getting close thanks to your help, but not quite there.
This expression:
Code: Select all
<?php
$entry = preg_replace('/<entry.*?<\/entry>/s', 'DELETED', $entry,1);
?>
deletes only the entire first entry. The '1' parameter limits it to one match.
Again, I match the exact value for $idEntry to the value in the xml file, but the replace function is just not finding it. The $idEntry value in the xml file is definitely on a different line than from where the search starts. I think that is the problem...?
Posted: Wed Oct 27, 2004 3:43 pm
by redmonkey
I'm unsure what exactly the problem is. Perhaps you can supply some of the actual XML file you are working with?
The following works as I expect it to, but perhaps I'm not reading your requirements correctly.
Code: Select all
<?php
$entry = '<entry id="12341233" published="true">
<text>Some text1</text>
</entry>
<entry id="12341234" published="true">
<text>Some text2</text>
</entry>
<entry id="12341235" published="true">
<text>Some text3</text>
</entry>';
$idEntry = 'entry id="12341234"';
$entry = preg_replace('/<' . $idEntry . '.*?<\/entry>/ms', "DELETED", $entry);
echo $entry;
?>
outputs...
Code: Select all
<entry id="12341233" published="true">
<text>Some text1</text>
</entry>
DELETED
<entry id="12341235" published="true">
<text>Some text3</text>
</entry>
Posted: Wed Oct 27, 2004 3:44 pm
by Calipoop
Code: Select all
<?php
$tempstories = preg_replace('/<entry id="1098494923933".*?<\/entry>/s', 'DELETED', $tempstories,1);
?>
with the above expression, I've substituted the exact value of $idEntry in the match expression, and it works. So this would mean that this part of your expression:
' . $idEntry . '
is not translating right?
Posted: Wed Oct 27, 2004 3:49 pm
by redmonkey
I'd be more inclined to think that the value of $idEntry is not being set right at your end. Have you tried to echo this variable and confirmed its value is as expected?
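One thing echo will not show you is invisible whitespace. A minimal sketch, assuming (purely as a guess, I have not seen your data) that the variable picked up a trailing newline somewhere; the $xml string is a made-up one-line entry:

```php
<?php
// Hypothetical value with an invisible trailing newline, e.g. read from a file or a form.
$idEntry = "entry id=\"1098494923933\"\n";

// echo hides trailing whitespace; var_dump() also reports the string length.
var_dump($idEntry);

// trim() strips leading and trailing whitespace, including newlines.
$clean = trim($idEntry);
var_dump($clean);

// Hypothetical one-line entry to test both versions of the id against.
$xml = '<entry id="1098494923933" published="true"><text>hi</text></entry>';

$results = array();
foreach (array($idEntry, $clean) as $id) {
    $pattern = '/<' . preg_quote($id, '/') . '.*?<\/entry>/s';
    $results[] = preg_match($pattern, $xml);
    echo preg_match($pattern, $xml) ? "match\n" : "no match\n";
}
```

If the two var_dump() lengths differ, the stray character is the likely culprit rather than the regex itself.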
Posted: Wed Oct 27, 2004 4:03 pm
by Calipoop
yeah, I've definitely confirmed the entry id every time - it's being set correctly even without spaces and everything. printed out EXACTLY:
entry id="1098494923933"
could it be getting confused cause there are double quotes in the variable? Maybe the double quotes need escaping or something?
I can supply the xml if you think it will help, but God, I was avoiding it cause it's a complete mess. The xml code is functional, but the formatting is a disaster. I think the fact that I could extract the entry without the variable means it's a variable problem?
thanks for stickin with me!
Posted: Wed Oct 27, 2004 4:05 pm
by Calipoop
WAIT
HOLD UP I THINK I KNOW WHAT'S WRONG
Posted: Wed Oct 27, 2004 4:12 pm
by Calipoop
GOT IT. MAN THANK YOU SO MUCH.
LAST THING - the expression that worked is kind of a hybrid of yours and mine (more of yours than mine) I'm using:
Code: Select all
<?php
$entry = preg_replace('/<' . $idEntry . '.*?<\/entry>/s', 'DELETED', $entry,1);
?>
you're using:
Code: Select all
<?php
$entry = preg_replace('/<' . $idEntry . '.*?<\/entry>/ms', "DELETED", $entry);
?>
the difference in mine is there's no /m modifier and I've added the 1 parameter cause there will only be 1 unique entry. Are those differences likely to make a big difference? I must admit, (and as you can probably tell) I'm not very experienced with modifiers and regex in general...
Posted: Wed Oct 27, 2004 4:27 pm
by redmonkey
The 'm' modifier is not required for the regex you are using; my use of that modifier in my last offering was a mistake on my part (copying and pasting and not paying attention). The regex in my second example would be the one to use.
I am still confused as to why you are needing to specify the '1' parameter as this should not be the case. But then again without actually seeing the file you are working with it is difficult to know exactly what is going on.
If it works for you then perhaps you should just accept it, but to me it doesn't seem right, so I suspect there may be another underlying problem.
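One way to check is to count how many times the pattern matches. If the count is 1, the limit changes nothing; if it is higher, something else in the file is matching too. A minimal sketch, assuming made-up data in which the same id appears twice (the $count argument to preg_replace needs PHP 5.1 or later):

```php
<?php
// Hypothetical data in which the same id accidentally appears twice.
$entry = '<entry id="7"><text>a</text></entry>'
    . '<entry id="8"><text>b</text></entry>'
    . '<entry id="7"><text>c</text></entry>';

$idEntry = 'entry id="7"';
$pattern = '/<' . preg_quote($idEntry, '/') . '.*?<\/entry>/s';

// How many entries does the pattern match in total?
$matches = preg_match_all($pattern, $entry, $found);
echo "matches: $matches\n";

// Limit of 1: only the first match is replaced; $count reports the replacements made.
$limited = preg_replace($pattern, 'DELETED', $entry, 1, $count);
echo "replaced with limit 1: $count\n";

// Default limit (-1): every match is replaced.
$unlimited = preg_replace($pattern, 'DELETED', $entry);

echo $limited, "\n";
echo $unlimited, "\n";
```

With a truly unique id both calls give the same result, so a difference between them points at duplicate ids or a pattern that is matching more than intended.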