need regular expression for preg_replace()

mortz
Forum Newbie
Posts: 10
Joined: Wed May 19, 2004 4:23 am
Location: Norway

Post by mortz »

feyd | Please use code tags where appropriate when posting code. Read: [url=http://forums.devnetwork.net/viewtopic.php?t=21171]Posting Code in the Forums[/url]


This one is difficult to explain.

I want to use preg_replace() to wrap every symbol that is not inside <> or [] in <span class="symbol">{symbol}</span>.

For example:
[quote]Hello, do you use <a href="http://www.google.com">Google?</a>[/quote]
should become
[quote]Hello<span class="symbol">,</span> do you use <a href="http://www.google.com">Google<span class="symbol">?</span></a>[/quote]


Right now my code is:

Code: Select all

function replacesymbols($var)
{
	$hl = array("(",")","!","?","-","_","+","*","'",".",",",":",";","^","~","$","%","#","@");
	for ($i=0; $i<count($hl); $i++) $var = str_replace($hl[$i], "<span class=\"symbol\">".$hl[$i]."</span>", $var);
	return $var;
}

But if $var contains HTML tags or BBCode with links etc., my page gets screwed up :P
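
To make the failure concrete, here is a minimal sketch of what a plain str_replace() pass does to a link. The function name naive_wrap and the shortened symbol list are illustrative assumptions, not part of the code above.

```php
<?php
// Hypothetical reduced version of replacesymbols(): it wraps every listed
// symbol and has no awareness of HTML tags or BBCode.
function naive_wrap(string $text): string
{
    $symbols = array("?", ".", ",", ":");
    foreach ($symbols as $symbol) {
        $text = str_replace($symbol, '<span class="symbol">' . $symbol . '</span>', $text);
    }
    return $text;
}

$html = '<a href="http://www.google.com">Google?</a>';
$out = naive_wrap($html);

// The ':' and '.' inside the href attribute are wrapped as well, so the
// attribute value now contains markup and the link no longer works.
echo $out, "\n";
```

The symbols inside the href value are treated exactly like the ones in the visible text, which is why the replacement has to be limited to text outside <> and [].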


Anyone have a clue?


sweatje
Forum Contributor
Posts: 277
Joined: Wed Jun 29, 2005 10:04 pm
Location: Iowa, USA

Post by sweatje »

I think this should work for you, expressed as a SimpleTest test:

Code: Select all

function testWrapSymbolsInSpans() {

$str = 'Hello, do you use <a href="http://www.google.com/?query=">Google?</a>';
$target = 'Hello<span class="symbol">,</span> do you use <a href="http://www.google.com/?query=">Google<span class="symbol">?</span></a>';
$regex = '~((?:^|>|\]).*?)([-()!?_+*\'.,:;^\~$%#@])(?=.*(?:<|\[|$))~ms';

$result = preg_replace($regex, '\\1<span class="symbol">\\2</span>', $str);
$this->assertEqual($target, $result);

}

Post by mortz »

Thanks for the reply! :D

But I can't get the code to work properly.

I changed my function into this:

Code: Select all

function replacesymbols($var)
{
    /*$hl = array("(",")","!","?","-","_","+","*","'",".",",",":",";","^","~","$","%","#","@");
    for ($i=0; $i<count($hl); $i++) $var = str_replace($hl[$i], "<span class=\"symbol\">".$hl[$i]."</span>", $var);*/

    $regex = '~((?:^|>|\]).*?)([-()!?_+*\'.,:;^\~$%#@])(?=.*(?:<|\[|$))~ms';
    $var = preg_replace($regex, '\\1<span class="symbol">\\2</span>', $var); 

    return $var;
}

Only the first symbol in the string gets replaced.
If I run preg_replace() in a for loop, more symbols get replaced.
But then the symbols that were already replaced get replaced again. :P


How could I deal with this?

Post by sweatje »

Hello,

That regex should have worked on the entire string. As you can see in the test I wrote, two substitutions took place: the , after Hello and the ? after Google.

You could add a negative lookbehind assertion to skip symbols that are already preceded by <span class="symbol">, and then a loop should work, but I would first try to find out why your code is not behaving the same as the code I tested.
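
The lookbehind idea can be sketched in isolation, roughly as follows. The helper name wrap_once is an assumption for illustration, and this only shows how to avoid double wrapping on repeated passes; it does not by itself protect symbols inside tags.

```php
<?php
// Hypothetical helper: wrap a symbol only if it is not directly preceded
// by the opening span, so a second pass leaves wrapped symbols alone.
function wrap_once(string $text): string
{
    // (?<!...) is a negative lookbehind; PCRE needs it to be fixed length,
    // which a literal string like this one is.
    $regex = '~(?<!<span class="symbol">)([-()!?_+*\'.,:;^\~$%#@])~';
    return preg_replace($regex, '<span class="symbol">\\1</span>', $text);
}

$once  = wrap_once('Hello, world!');
$twice = wrap_once($once);

// Both passes give the same string, because the span markup itself
// contains none of the listed symbols.
echo $once, "\n", $twice, "\n";
```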

BTW, I tested on
$ php -v
PHP 4.4.0 (cli) (built: Jul 11 2005 16:13:16)
Copyright (c) 1997-2004 The PHP Group
Zend Engine v1.3.0, Copyright (c) 1998-2004 Zend Technologies

On a Windows platform.

HTH

Post by mortz »

Hmm, there's something strange with your regex.

I have discovered that it only replaces the first symbol after each '>'.

If string is:

Code: Select all

$str = '<br>hello...<br>wtf?! <>?? <;; >::<br><a href="http://www.google.com/" target="_blank">Google!</a>11';

The result is:

Code: Select all

$result = '<br>hello<span class="symbol">.</span>..<br>wtf<span class="symbol">?</span>! <><span class="symbol">?</span>? <;; ><span class="symbol">:</span>:<br><a href="http<span class="symbol">:</span>//www.google.com/" target="_blank">Google<span class="symbol">!</span></a>11';

Thanks for your help so far, anyway :)
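
One way to see why this happens: every match has to start at a line start, a '>' or a ']', and the lazy .*? stops at the first symbol it meets, so at most one symbol is captured per starting point. A small sketch with preg_match_all() lists what the pattern actually captures (the short sample string is made up for illustration):

```php
<?php
// Same pattern as in the earlier post.
$regex = '~((?:^|>|\]).*?)([-()!?_+*\'.,:;^\~$%#@])(?=.*(?:<|\[|$))~ms';

// Illustrative input: three symbols after the line start, two after the tag.
$str = 'hello...<br>wtf?!';

preg_match_all($regex, $str, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    // $m[1] is the text from the starting point up to the symbol,
    // $m[2] is the single symbol that was captured.
    printf("prefix=%s symbol=%s\n", var_export($m[1], true), var_export($m[2], true));
}
```

Only two matches are found here, one for the line start and one for the '>', even though the string contains five symbols.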

Post by sweatje »

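Yes, I see that problem now. Sometimes when I run into problems like this, I do it in two steps: first match each stretch of text that lies outside the tags, then wrap the symbols inside that stretch in a callback. This should work for you now: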

Code: Select all

// Step 2: wrap each run of symbols inside one matched text segment.
// $in is the match array passed in by preg_replace_callback(); $in[1] is capture 1.
function wrap_symbols($in) {
    return preg_replace('~([-()!?_+*\'.,:;^\~$%#@]+)~', '<span class="symbol">\\1</span>', $in[1]);
}

class MiscTestCase extends UnitTestCase {

    function testWrapSymbolsInSpans() {

        $str = 'Hello, WTF?! do you use <a href="http://www.google.com/?query=">Google?</a>';
        $target = 'Hello<span class="symbol">,</span> WTF<span class="symbol">?!</span> do you use <a href="http://www.google.com/?query=">Google<span class="symbol">?</span></a>';

        // Step 1: match each segment of text that lies outside the tags.
        $regex = '~  # begin regex delimited by tilde
            (   # start capture 1
            (?:^|>|\])  # look for a start of line or end of regular or bb tag
            .*?  # then grab anything, ungreedy
            (?:<|\[|$)   # look for start of tag or bbtag or end of line
            )  # end of capture 1
              # m multi line
              # s . includes newline
              # x extended whitespace parsing, i.e. allow these comments
            ~msx';

        $result = preg_replace_callback($regex, 'wrap_symbols', $str);

        $this->assertEqual($target, $result);

    }
}
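
For use outside SimpleTest, the same two-step idea could be packaged roughly like this. The function name wrap_text_symbols and the use of a closure are assumptions for illustration; both patterns are the ones from the post above, with the comments removed.

```php
<?php
// Hypothetical standalone wrapper around the two-step approach.
function wrap_text_symbols(string $html): string
{
    // Step 1: match each run of text between tags (or between BBCode brackets).
    $segment = '~((?:^|>|\]).*?(?:<|\[|$))~ms';

    return preg_replace_callback($segment, function (array $m): string {
        // Step 2: wrap every run of symbols, but only inside that segment.
        return preg_replace(
            '~([-()!?_+*\'.,:;^\~$%#@]+)~',
            '<span class="symbol">\\1</span>',
            $m[1]
        );
    }, $html);
}

$in = 'Hello, WTF?! do you use <a href="http://www.google.com/?query=">Google?</a>';
echo wrap_text_symbols($in), "\n";
```

This sketch inherits the limits of the approach: it assumes well-formed tags, and a literal '>' or ']' inside an attribute value would be mistaken for the end of a tag.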

Post by mortz »

LoL, now it works perfectly!

Thanks a lot! :D


Nice comments in your regex, btw =)
I've been studying regex documentation on the net to find out what your last one did, and tried to fix it myself :P
But it was too complicated and I only made the errors worse :P

I've used it in a CuteNews hack on this page, http://nazareth.moo.no, if you are curious :P