preg_replace problem

zenhop
Forum Newbie
Posts: 18
Joined: Mon Jan 19, 2009 2:18 pm

preg_replace problem

Post by zenhop »

Hello,
I want to replace tags using a user-defined function,
but only the first match gets replaced. I don't understand why, but I guess it's a regex problem...

template.html:

Code: Select all

<html xmlns:fb="http://www.blackwizzard.com/ftl/1.1">
    <fb:message title="Hello">world</fb:message>
    <fb:message title="another message" message="another content" />
    <fb:box title="this is the box's title" width="80%">
        box's content
        <fb:message title="hello">
            a message in a box
        </fb:message>
    </fb:box>
</html>
regex.php:

Code: Select all

<?php
 
    function freadFile($file) {
        if (!$handle = fopen ($file, "r")) {
            exit;
        }
        $contents = fread ($handle, filesize($file)+1);
        fclose($handle);
        return $contents;
    }
    
    $buffer = freadFile("template.html");
    // <([A-Z][A-Z0-9]*)[^>]*>(.*?)</\1[:]\2>
    // <\<fb:(.*?)\>(.*?)\</fb:message\>>
    $output = $buffer;
    $output = preg_replace("<\<fb:(.*?) (.*?)\>(.*?)\</fb:message\>>",replace_function('\\1','\\2','\\3'),$output,-1,$count);
    
    echo "<textarea cols=80 rows=20>$count DONE\n\n".$output."</textarea>";
    
    function replace_function($tagType, $tagArgs, $in) {
        return "[[type:$tagType args:($tagArgs) $in]]";
    }
?>
output:

Code: Select all

1 DONE
 
<html xmlns:fb="http://www.blackwizzard.com/ftl/1.1">
    [[type:message args:(title="Hello") world]]
    <fb:message title="another message" message="another content" />
    <fb:box title="this is the box's title" width="80%">
        box's content
        <fb:message title="hello">
            a message in a box
        </fb:message>
    </fb:box>
</html>
Do you understand why only the first <fb:message> gets replaced?

Thanks in advance!
mintedjo
Forum Contributor
Posts: 153
Joined: Wed Nov 19, 2008 6:23 am

Re: preg_replace problem

Post by mintedjo »

I'm mostly guessing, but maybe it will help.
1) It doesn't look like you have told it to look for the self-closing tags at all:

Code: Select all

<fb:message title="another message" message="another content" />

Code: Select all

preg_replace("<\<fb:(.*?) (.*?)\s*(\>(.*?)\</fb:message\>|/\>)>",replace_function('\\1','\\2','\\4'),$output,-1,$count);
That might help to recognise the self-closing tags, but might not if I've done it wrong. :-P Note that the extra group means the tag content is now group 4 rather than group 3.

2) Lazy matching is probably not doing you any favours.
When it analyses this tag...

Code: Select all

<fb:box title="this is the box's title" width="80%">
the first

Code: Select all

(.*?)
(bear in mind there is a space at the end of that) will match only box, stopping at the first space. The second (.*?) then runs up to the first >, which works for this tag but will break as soon as an attribute value contains a > character.

3) You are only looking for a closing </fb:message> tag and never for the closing </fb:box> tag, so there is a chance of a mismatch.

4) There's no way for your method to analyse nested elements properly. With this approach, even if you made some changes and got it almost working, there's a chance you would end up matching...

Code: Select all

<fb:box title="this is the box's title" width="80%">
box's content
<fb:message title="hello">
a message in a box
</fb:message>
as one element which I am guessing is not what you want.

5) Using the <> as the regex delimiters is confusing :-o. Try to use something that doesn't occur in the regex itself and it will make your regex more readable.

I hope one of those is some help at least - and I hope none of them are wrong and misleading xD
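
A quick way to check a couple of the points above is to run the pattern against a small sample. This is only a sketch: the two-tag sample string and the variable names are made up for illustration, and the pattern is the original one rewritten with # delimiters (point 5). It also shows something the list does not mention: without the s modifier, . does not match a newline, so a tag whose closing </fb:message> sits on a later line can never match.

```php
<?php
// Hypothetical sample standing in for template.html:
// one single-line tag and one tag that spans several lines.
$template = "<fb:message title=\"Hello\">world</fb:message>\n"
          . "<fb:message title=\"hello\">\n    a message in a box\n</fb:message>\n";

// Original pattern with # delimiters, so < and > need no escaping.
$withoutS = '#<fb:(.*?) (.*?)>(.*?)</fb:message>#';
// Same pattern with the s modifier, so . also matches newlines.
$withS    = '#<fb:(.*?) (.*?)>(.*?)</fb:message>#s';

$countWithoutS = preg_match_all($withoutS, $template, $m1);
$countWithS    = preg_match_all($withS, $template, $m2);
echo "without s: $countWithoutS, with s: $countWithS\n"; // without s: 1, with s: 2

// What the two lazy groups capture on the opening box tag.
preg_match('#<fb:(.*?) (.*?)>#', '<fb:box title="this is the box\'s title" width="80%">', $g);
echo $g[1], "\n"; // box
echo $g[2], "\n"; // title="this is the box's title" width="80%"
```

So on the original template, only the one tag that opens and closes on the same line can match, which would explain the "1 DONE" in the output.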
zenhop
Forum Newbie
Posts: 18
Joined: Mon Jan 19, 2009 2:18 pm

Re: preg_replace problem

Post by zenhop »

OK, let's just concentrate on the <fb:message></fb:message> tags.
How can I get them all, even the nested ones?
Is there a way? I really need to find one!
I've tried XML, domxml, simplexml, DOMDocument, and home-made parsing using strpos().
Only the last one works well: parsing the document the old way with strpos(). It's 655 lines of code with no bugs, and it handles nested tags, singleton tags, and everything else. But it uses so many resources that I got kicked out by my hosting company!
It's a good solution for a few thousand pageviews, but I'm getting a few million per month and it's killing the server, so I really need to rebuild my parser from scratch, and regex is the only way I can see!
That, or switching to XML templates so I can use domxml, but I really don't feel like rewriting all my templates and using XML+XSLT for display. I like regular HTML+CSS.

So how can I do that?
I'd just like an example for one tag; I can figure out the rest of the code.
Thanks!
mintedjo
Forum Contributor
Posts: 153
Joined: Wed Nov 19, 2008 6:23 am

Re: preg_replace problem

Post by mintedjo »

I don't know how to do it but somebody will :-)
zenhop
Forum Newbie
Posts: 18
Joined: Mon Jan 19, 2009 2:18 pm

Re: preg_replace problem

Post by zenhop »

OK, I kind of found a way, but it doesn't work for singleton tags. I don't know how to handle them...

So here is the code:

template.html:

Code: Select all

<html xmlns:fb="http://www.blackwizzard.com/ftl/1.1">
    <fb:message title="Hello">world</fb:message>
    <fb:message title="Hello2">world2</fb:message>
    <fb:message title="Hello3">
        world3
    </fb:message>
    <fb:box title="this is the box's title" width="80%">
        box's content
        <fb:message title="hello">
            a message in a box
        </fb:message>
    </fb:box>
</html>
regex.php:

Code: Select all

<?php
 
    function freadFile($file) {
        if (!$handle = fopen ($file, "r")) {
            exit;
        }
        $contents = fread ($handle, filesize($file)+1);
        fclose($handle);
        return $contents;
    }
    
    $buffer = freadFile("template.html");
    $output = $buffer;
    
    $output = search($output,0);
    
    echo "<textarea cols=80 rows=20>".$output."</textarea>";
    
    function replace_function($tagType, $tagArgs, $in) {
        return "[[BEGGIN:$tagType args:($tagArgs) $in END:$tagType]]";
    }
    
    function search($in, $count) {
        $reg = "#\<fb:(.*?)\s+(.*?)\>(.*?)\</fb:\\1\>#mis";
        if (preg_match($reg, $in)) {
            $in = preg_replace($reg,replace_function('\\1','\\2','\\3'),$in,-1,$count2);
            $count += $count2;
            return search($in, $count);
        } else {
            return $in;
        }
    }
?>
Output:

Code: Select all

<html xmlns:fb="http://www.blackwizzard.com/ftl/1.1">
    [[BEGGIN:message args:(title="Hello") world END:message]]
    [[BEGGIN:message args:(title="Hello2") world2 END:message]]
    [[BEGGIN:message args:(title="Hello3") 
        world3
     END:message]]
    [[BEGGIN:box args:(title="this is the box's title" width="80%") 
        box's content
        [[BEGGIN:message args:(title="hello") 
            a message in a box
         END:message]]
     END:box]]
</html>
If anybody knows how to also parse singleton tags...
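
For what it's worth, the recursion in search() can also be written as a plain loop that reuses the count preg_replace() already returns, which saves the extra preg_match() call on every pass. A minimal sketch, assuming the same pattern and output format as above; the function name replaceAllTags is made up for illustration, and the m modifier is dropped because the pattern has no ^ or $ anchors:

```php
<?php
// Hypothetical helper: repeats the replacement until nothing is left to replace.
function replaceAllTags(string $in): string
{
    // Same pattern as in search(): \1 forces the closing tag to match the opening one.
    $reg = '#<fb:(.*?)\s+(.*?)>(.*?)</fb:\1>#is';
    // Same output format as replace_function(), written as a replacement template.
    $replacement = '[[BEGGIN:$1 args:($2) $3 END:$1]]';

    do {
        $result = preg_replace($reg, $replacement, $in, -1, $count);
        if ($result === null) {
            break; // PCRE error: keep the last good string
        }
        $in = $result;
    } while ($count > 0); // an outer tag is replaced first, inner tags on later passes

    return $in;
}

echo replaceAllTags('<fb:box title="t"><fb:message title="m">x</fb:message></fb:box>'), "\n";
// [[BEGGIN:box args:(title="t") [[BEGGIN:message args:(title="m") x END:message]] END:box]]
```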
mintedjo
Forum Contributor
Posts: 153
Joined: Wed Nov 19, 2008 6:23 am

Re: preg_replace problem

Post by mintedjo »

Code: Select all

#<fb:(.*?)\s+(.*?)\s*(>(.*?)</fb:\\1>|/>)#mis
Note that if you do this, the group which was previously the 3rd becomes the 4th group, so you will need to change this line:

Code: Select all

$in = preg_replace($reg,replace_function('\\1','\\2','\\4'),$in,-1,$count2);
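
One thing worth knowing about the line above: replace_function() is called once, before preg_replace() runs, with the literal strings '\\1', '\\2' and '\\4', so all it does is build a replacement template. If the function should really run once per tag with the matched text, preg_replace_callback() is the usual tool. A rough sketch, assuming the singleton-aware pattern from this post; searchWithCallback is a made-up name, and tags with no attributes at all are not handled because the pattern requires whitespace after the tag name:

```php
<?php
// Same output format as replace_function() in the thread, but now it receives real matched text.
function replace_function(string $tagType, string $tagArgs, string $in): string
{
    return "[[BEGGIN:$tagType args:($tagArgs) $in END:$tagType]]";
}

// Hypothetical helper: loops until no tag is left, so nested tags get replaced on later passes.
function searchWithCallback(string $in): string
{
    // Group 3 is either ">body</fb:name>" or "/>"; group 4 is the body when there is one.
    $reg = '#<fb:(.*?)\s+(.*?)\s*(>(.*?)</fb:\1>|/>)#is';

    do {
        $result = preg_replace_callback(
            $reg,
            function (array $m): string {
                // Group 4 is missing for singleton tags, so fall back to an empty body.
                return replace_function($m[1], $m[2], $m[4] ?? '');
            },
            $in,
            -1,
            $count
        );
        if ($result === null) {
            break; // PCRE error: keep the last good string
        }
        $in = $result;
    } while ($count > 0);

    return $in;
}

echo searchWithCallback('<fb:message title="a" /><fb:box title="b"><fb:message title="c">x</fb:message></fb:box>'), "\n";
```

Because the callback gets the actual values, it can branch on the tag type or parse the attributes, which a replacement template cannot do.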
zenhop
Forum Newbie
Posts: 18
Joined: Mon Jan 19, 2009 2:18 pm

Re: preg_replace problem

Post by zenhop »

YEAH!
It's working perfectly :)

Thanks a lot!