need regular expression for preg_replace()
Posted: Sun Sep 11, 2005 3:18 pm
by mortz
feyd | Please use code tags where appropriate when posting code. Read: [url=http://forums.devnetwork.net/viewtopic.php?t=21171]Posting Code in the Forums[/url]
This one is difficult to explain.
I want to use preg_replace() to wrap every symbol that is not inside <> or [] in <span class="symbol">{symbol}</span>
For example:
[quote]Hello, do you use <a href="http://www.google.com">Google?</a>[/quote]
will be replaced with
[quote]Hello<span class="symbol">,</span> do you use <a href="http://www.google.com">Google<span class="symbol">?</span></a>[/quote]
Right now my code is:
Code: Select all
function replacesymbols($var)
{
    $hl = array("(",")","!","?","-","_","+","*","'",".",",",":",";","^","~","$","%","#","@");
    for ($i=0; $i<count($hl); $i++) $var = str_replace($hl[$i], "<span class=\"symbol\">".$hl[$i]."</span>", $var);
    return $var;
}
But if $var contains HTML tags or BBCode with links etc., my page gets screwed up.
Does anyone have a clue?
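To make the failure concrete, here is a minimal sketch of the same loop run on a short link. The function name `replacesymbols_naive` and the sample input are assumptions made for illustration, not part of the original post. It shows the dot inside the `href` attribute being wrapped, which is what breaks the markup.

```php
<?php
// Same idea as the loop in the post; the name replacesymbols_naive is hypothetical.
function replacesymbols_naive($var)
{
    $hl = array('(', ')', '!', '?', '-', '_', '+', '*', "'", '.', ',', ':', ';', '^', '~', '$', '%', '#', '@');
    foreach ($hl as $symbol) {
        // str_replace() has no notion of tags, so attribute values are rewritten too.
        $var = str_replace($symbol, '<span class="symbol">' . $symbol . '</span>', $var);
    }
    return $var;
}

$broken = replacesymbols_naive('<a href="a.html">Hi</a>');
echo $broken, "\n";
// prints: <a href="a<span class="symbol">.</span>html">Hi</a>
```

The span ends up inside the attribute value, so the browser no longer sees a valid link.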
Posted: Sun Sep 11, 2005 3:50 pm
by sweatje
I think this should work for you, expressed as a SimpleTest test:
Code: Select all
function testWrapSymbolsInSpans() {
    $str = 'Hello, do you use <a href="http://www.google.com/?query=">Google?</a>';
    $target = 'Hello<span class="symbol">,</span> do you use <a href="http://www.google.com/?query=">Google<span class="symbol">?</span></a>';
    $regex = '~((?:^|>|\]).*?)([-()!?_+*\'.,:;^\~$%#@])(?=.*(?:<|\[|$))~ms';
    $result = preg_replace($regex, '\\1<span class="symbol">\\2</span>', $str);
    $this->assertEqual($target, $result);
}
Posted: Tue Sep 13, 2005 4:00 am
by mortz
Thanks for the reply!
But I can't get the code to work properly.
I changed my function to this:
Code: Select all
function replacesymbols($var)
{
    /*$hl = array("(",")","!","?","-","_","+","*","'",".",",",":",";","^","~","$","%","#","@");
    for ($i=0; $i<count($hl); $i++) $var = str_replace($hl[$i], "<span class=\"symbol\">".$hl[$i]."</span>", $var);*/
    $regex = '~((?:^|>|\]).*?)([-()!?_+*\'.,:;^\~$%#@])(?=.*(?:<|\[|$))~ms';
    $var = preg_replace($regex, '\\1<span class="symbol">\\2</span>', $var);
    return $var;
}
Only the first symbol in the string gets replaced.
If I run preg_replace() in a for loop, more symbols get replaced.
But the symbols that were already replaced get replaced again.
How can I deal with this?
Posted: Tue Sep 13, 2005 6:13 am
by sweatje
Hello,
That regex should have worked on the entire string. As you can see in the test I wrote, two substitutions took place: the , after Hello and the ? after Google.
You could add a negative lookbehind assertion to exclude the <span class="symbol">, and then a loop should work, but I would first try to find out why your code is not behaving the same as the code I tested.
BTW, I tested on
$ php -v
PHP 4.4.0 (cli) (built: Jul 11 2005 16:13:16)
Copyright (c) 1997-2004 The PHP Group
Zend Engine v1.3.0, Copyright (c) 1998-2004 Zend Technologies
On a Windows platform.
HTH
Posted: Tue Sep 13, 2005 12:39 pm
by mortz
Hmm, there's something strange with your regex.
I have discovered that it only replaces the first symbol after a '>'.
If the string is:
Code: Select all
$str = '<br>hello...<br>wtf?! <>?? <;; >::<br><a href="http://www.google.com/" target="_blank">Google!</a>11';
The result is:
Code: Select all
$result = '<br>hello<span class="symbol">.</span>..<br>wtf<span class="symbol">?</span>! <><span class="symbol">?</span>? <;; ><span class="symbol">:</span>:<br><a href="http<span class="symbol>:</span>//www.google.com/" target="_blank">Google<span class="symbol">!</span></a>11<br>hello<span class="symbol">.</span>..<br>wtf<span class="symbol">?</span>! <><span class="symbol">?</span>? <;; ><span class="symbol">:</span>:<br><a href="http<span class="symbol>:</span>//www.google.com/" target="_blank">Google<span class="symbol">!</span></a>11'
Thanks for your help so far, anyway
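The behaviour described in this post can be reproduced with a shorter input. The sketch below reuses the regex from the earlier reply unchanged; the sample string and the variable name `$once` are assumptions chosen for illustration. Each match has to start at `^`, `>` or `]`, and the lazy `.*?` consumes the text up to the first symbol, so any further symbols in the same stretch of text have no anchor left to start from.

```php
<?php
// Regex copied from the earlier reply; only the input string is new.
$regex = '~((?:^|>|\]).*?)([-()!?_+*\'.,:;^\~$%#@])(?=.*(?:<|\[|$))~ms';

$once = preg_replace($regex, '\\1<span class="symbol">\\2</span>', 'one.two.three<br>four.five');
echo $once, "\n";
// prints: one<span class="symbol">.</span>two.three<br>four<span class="symbol">.</span>five
```

Only the first dot after the start of the string and the first dot after the `>` of `<br>` are wrapped; the dot between "two" and "three" is skipped.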

Posted: Tue Sep 13, 2005 6:07 pm
by sweatje
Yes, I see that problem now. Sometimes when I run into problems like this I do it in two steps. This should work for you now:
Code: Select all
function wrap_symbols($in) {
    // $in[1] is one run of text between tags; wrap each run of symbols in it
    return preg_replace('~([-()!?_+*\'.,:;^\~$%#@]+)~', '<span class="symbol">\\1</span>', $in[1]);
}
class MiscTestCase extends UnitTestCase {
    function testWrapSymbolsInSpans() {
        $str = 'Hello, WTF?! do you use <a href="http://www.google.com/?query=">Google?</a>';
        $target = 'Hello<span class="symbol">,</span> WTF<span class="symbol">?!</span> do you use <a href="http://www.google.com/?query=">Google<span class="symbol">?</span></a>';
        $regex = '~ # begin regex delimited by tilde
            ( # start capture 1
            (?:^|>|\]) # look for a start of line or end of regular or bb tag
            .*? # then grab anything, ungreedy
            (?:<|\[|$) # look for start of tag or bbtag or end of line
            ) # end of capture 1
            # m multi line
            # s . includes newline
            # x extended whitespace parsing, i.e. allow these comments
            ~msx';
        $result = preg_replace_callback($regex, 'wrap_symbols', $str);
        $this->assertEqual($target, $result);
    }
}
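For readers without SimpleTest, the two-step idea above can be sketched as a standalone script. The helper name `wrap_text_symbols` and the second sample string are assumptions made for illustration, and the anonymous function needs PHP 5.3 or later, which is newer than the PHP 4.4.0 mentioned earlier in the thread. The outer pattern picks out the text between tags, and the inner pattern wraps runs of symbols inside that text only.

```php
<?php
// Hypothetical helper name; the two patterns are the ones from the reply above.
function wrap_text_symbols($html)
{
    // Step 1: a stretch of text that starts at ^, > or ] and ends at <, [ or $.
    $segment = '~((?:^|>|\]).*?(?:<|\[|$))~ms';
    // Step 2: one or more consecutive symbols inside that stretch.
    $symbols = '~([-()!?_+*\'.,:;^\~$%#@]+)~';

    return preg_replace_callback($segment, function ($m) use ($symbols) {
        return preg_replace($symbols, '<span class="symbol">\\1</span>', $m[1]);
    }, $html);
}

$htmlResult = wrap_text_symbols('Hello, do you use <a href="http://www.google.com">Google?</a>');
$bbResult = wrap_text_symbols('[b]Hi![/b]');
echo $htmlResult, "\n";
echo $bbResult, "\n";
```

One caveat of this sketch: a literal < or [ in the visible text is treated as the start of a tag, so symbols that follow it stay unwrapped until the next > or ].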
Posted: Wed Sep 14, 2005 7:14 am
by mortz
LOL, now it works perfectly!
Thanks a lot!
Nice comments in your regex, btw =)
I had been studying regex documentation on the net to find out what your last one did, and tried to fix it myself.

But it was too complicated and I only made the errors worse.
I've used it in a CuteNews hack on this page:
http://nazareth.moo.no, if you are curious
