Regex help removing multiple commas

Posted: Sun Jul 02, 2006 3:07 am
by shriver
On my forum I have the following few functions that will remove extra ! . or ? and replace them with only 1.

$message_array[$x] = preg_replace("/([\!])+/", "\\1", $message_array[$x]);
$message_array[$x] = preg_replace("/([\?])+/", "\\1", $message_array[$x]);
$message_array[$x] = preg_replace("/([\.])+/", "\\1", $message_array[$x]);

For example the following text:

Hello!!! What.. is your name???? will become Hello! What. is your name?

Now I want to do the same with commas, so I added a 4th line

$message_array[$x] = preg_replace("/([\,])+/", "\\1", $message_array[$x]);

but this does not work as expected: the extra commas remain in the text. I wasn't sure whether the comma needed to be escaped, so I tried it without the \ and still no dice. I even tried setting a variable to chr(44) and passing that, but that didn't work either.

Can someone tell me what I'm doing wrong here? I'm guessing it has something to do with commas being used to separate parameters. Thanks for any help.

Posted: Sun Jul 02, 2006 4:07 am
by Robert Plank
Your regex worked fine for me (without the backslash). By the way, if you want to get that all on one line:

Code:

<?php

$string = "Hello!!! What.. is,,,,, your name????";
$string = preg_replace("/([!\?,\.])+/", "\\1", $string);
echo $string;

?>

Posted: Sun Jul 02, 2006 8:45 am
by sweatje
Robert Plank wrote: Your regex worked fine for me (without the backslash). By the way, if you want to get that all on one line:

Code:

<?php

$string = "Hello!!! What.. is,,,,, your name????";
$string = preg_replace("/([!\?,\.])+/", "\\1", $string);
echo $string;

?>
You don't need to escape the ? and the . as they are not PCRE metacharacters within a character class:

/([!?,.])+/

will work fine.
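
To illustrate the point, here is a minimal sketch (not from the original posts) that runs the escaped and unescaped character classes side by side. The variable names are made up for the example.

```php
<?php
// Inside [...] the characters ? and . are literal, so the backslashes
// in the first pattern are optional.
$text = "Hello!!! What.. is,,,,, your name????";

$escaped   = preg_replace('/([!\?,\.])+/', '\\1', $text);
$unescaped = preg_replace('/([!?,.])+/', '\\1', $text);

var_dump($escaped === $unescaped); // both patterns give the same string
echo $unescaped, "\n";             // Hello! What. is, your name?
```

Single-quoted PHP strings are used here so the backslashes reach PCRE unchanged.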

Posted: Sun Jul 02, 2006 3:15 pm
by shriver
Thanks you two, the code worked fine.

Posted: Mon Jul 03, 2006 12:20 pm
by shriver
Sorry to bug again but I just noticed something. It will only remove multiple commas if there also are ! . or ? in the text. If there are only commas then it doesn't remove the extra ones. If I try the same thing with just ! . or ? it will remove them as expected. It's certainly strange behavior.

I'll show some examples of what I tried and what the result was:

Example 1 (Bad Result)
Input: test,,,
Result: test,,,

Example 2
Input: test...
Result: test.

Example 3
Input: test!!!
Result: test!

Example 4
Input: test???
Result: test?

Example 5
Input: test,,, test???
Result: test, test?

Example 6 (Bad Result)
Input: test,,, test,,,
Result: test,,, test,,,

Example 7 (Bad Result)
Input: test test,,,
Result: test test,,,

Example 8
Input: test test!! test,,,
Result: test test! test,

Example 9
Input: ...test test,,,
Result: .test test,

Example 10
Input: test! test. test? test,,,
Result: test! test. test? test,

I'm at a loss as to why it doesn't remove the extras when there are only commas. I tried switching the order in which each gets replaced, but that didn't change anything.

Posted: Mon Jul 03, 2006 12:26 pm
by Robert Plank
They all worked for me.

Code:

<?php

function stripPunctuation($string) {
   return preg_replace("/([!?,.])+/", "\\1", $string);
}

$input = array(
   'test,,,',
   'test...',
   'test!!!',
   'test???',
   'test,,, test???',
   'test,,, test,,,',
   'test test,,,',
   'test test!! test,,,',
   '...test test,,,',
   'test! test. test? test,,,'
);

$output = array_map("stripPunctuation", $input);

echo "<xmp>";
print_r($output);
echo "</xmp>";

?>
My output:

Code:

Array
(
    [0] => test,
    [1] => test.
    [2] => test!
    [3] => test?
    [4] => test, test?
    [5] => test, test,
    [6] => test test,
    [7] => test test! test,
    [8] => .test test,
    [9] => test! test. test? test,
)

Posted: Mon Jul 03, 2006 12:30 pm
by sweatje
seems to work for me:

Code:

$ php -r 'echo preg_replace("/([!?,.])+/", "\\1", "test,,, test,,,");'
test, test,
$ php -r 'echo preg_replace("/([!?,.])+/", "\\1", "test test,,,");'
test test,
$ php -v
PHP 5.1.2 (cli) (built: Jan 11 2006 16:40:00)
Copyright (c) 1997-2006 The PHP Group
Zend Engine v2.1.0, Copyright (c) 1998-2006 Zend Technologies
And on php4 also:

Code:

$ php -r 'echo preg_replace("/([!?,.])+/", "\\1", "test,,, test,,,");'
test, test,
$ php -r 'echo preg_replace("/([!?,.])+/", "\\1", "test test,,,");'
test test,
$ php -v
PHP 4.4.2-pl2-gentoo (cli) (built: Jun 15 2006 04:45:06)
Copyright (c) 1997-2006 The PHP Group
Zend Engine v1.3.0, Copyright (c) 1998-2004 Zend Technologies

Posted: Mon Jul 03, 2006 12:32 pm
by shriver
Ehm, sorry. It was a mistake on my part further up in the script: I had if (preg_match("/([!?.])+/", $message)) and forgot to add the comma to that character class :oops:

Ignore my forgetfulness, ha.

Thanks anyways.
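
For anyone hitting the same symptom, the effect of that guard can be sketched as below. This is a hypothetical reconstruction: the function name collapsePunctuation and the way the guard is passed in are assumptions for illustration, not the real forum script.

```php
<?php
// Hypothetical reconstruction of the guarded replacement described above.
function collapsePunctuation($message, $guardPattern) {
    // The replacement only runs when the guard pattern matches.
    if (preg_match($guardPattern, $message)) {
        return preg_replace('/([!?,.])+/', '\\1', $message);
    }
    return $message;
}

$buggyGuard = '/([!?.])+/';   // comma missing, as in the post
$fixedGuard = '/([!?,.])+/';  // comma added

echo collapsePunctuation('test,,,', $buggyGuard), "\n";         // test,,,  (guard never matches)
echo collapsePunctuation('test,,, test???', $buggyGuard), "\n"; // test, test?  (??? satisfies the guard)
echo collapsePunctuation('test,,,', $fixedGuard), "\n";         // test,
```

This reproduces Examples 1 and 5 above. The guard is arguably unnecessary in the first place, since preg_replace returns the string unchanged when nothing matches.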

Posted: Mon Jul 03, 2006 3:47 pm
by bokehman
Robert Plank wrote:

Code:

"\\1"
This might work but it is wrong. \1 should be used for back references only. In this context you should be using $1.
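
A minimal sketch of the two replacement styles side by side (variable names are made up for the example). As far as the PHP manual goes, preg_replace accepts both forms in the replacement string and describes $n as the preferred one.

```php
<?php
$text = 'test,,, test???';

// Single quotes pass \1 and $1 through to preg_replace untouched.
$withBackslash = preg_replace('/([!?,.])+/', '\\1', $text);
$withDollar    = preg_replace('/([!?,.])+/', '$1', $text);

var_dump($withBackslash === $withDollar); // same result either way
echo $withDollar, "\n";                   // test, test?
```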

Posted: Mon Jul 03, 2006 5:46 pm
by shriver
I see, I wasn't sure if that existed in PHP. I've mostly used regex in mIRC, and there I would use either \1 or \t (for matched text). I wasn't sure what the equivalent of \t was in PHP. Thanks for pointing that out.

Posted: Mon Jul 03, 2006 8:29 pm
by sweatje
bokehman wrote:
Robert Plank wrote:

Code:

"\\1"
This might work but it is wrong.
Pretty strong wording, considering the manual shows examples using this style and it is more cross-language compatible.

The cautionary statement at the beginning of the page is for more than 9 captures, when the $number variable syntax is preferable.
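
A hedged sketch of the case where the syntax choice matters (not from the original post). As I read the manual, the caution is about a reference that is immediately followed by a literal digit, where \11 or $11 would be taken as group 11; the ${1}1 form makes the boundary explicit. The input string is made up for the example.

```php
<?php
$text = 'item,,,';

// Goal: collapse the run of commas, then append a literal digit 1.
// Writing '\\11' or '$11' here would refer to group 11 rather than
// group 1 followed by "1", so the braces mark where the reference ends.
$explicit = preg_replace('/([!?,.])+/', '${1}1', $text);

echo $explicit, "\n"; // item,1
```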

Posted: Tue Jul 04, 2006 2:40 am
by bokehman
sweatje wrote: it is more cross-language compatible.
That's not true. For example Apache's regex engine does not allow it.

Posted: Tue Jul 04, 2006 7:26 am
by sweatje
bokehman wrote:
sweatje wrote: it is more cross-language compatible.
That's not true. For example Apache's regex engine does not allow it.
My preferred Windows editor TextPad does:
textpad help wrote:\0 to \9 Substitute the text matching tagged expression 0 through 9.
as does sed
sed help wrote:\1 \2 ...\9 backreference, matches i-th memorized \(..\)
as does vim
vim docs wrote:3.5 Grouping and Backreferences

You can group parts of the pattern expression enclosing them with "\(" and "\)" and refer to them inside the replacement pattern by their special number \1, \2 ... \9.
Many regex engines implement the concept of numeric backreferences. Not all implementations have variables, let alone automatically bind variables to the grouping results.

Posted: Tue Jul 04, 2006 8:48 am
by bokehman
sweatje wrote:Many regex engines implement the concept of numeric backreferences.
You are confusing back references with replacements. I said \1 should only be used for back references; I didn't say it shouldn't be used for them. The context I was referring to was replacement, which is very different. Apache uses \1 for back references and $1 for replacement. Perl too!
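
To make the distinction concrete, here is a minimal PHP sketch (not from the original post; the sample text and variable names are assumptions). \1 inside the pattern is a back reference that must match the same text group 1 captured, while $1 in the replacement inserts that text.

```php
<?php
$text = 'Really?! Yes,,, really!!!';

// \1 in the pattern: only runs of the SAME character collapse.
$sameCharOnly = preg_replace('/([!?,.])\1+/', '$1', $text);

// The thread's pattern: any mix of the four characters counts as one
// run, and only the last character matched by the group is kept.
$anyMix = preg_replace('/([!?,.])+/', '$1', $text);

echo $sameCharOnly, "\n"; // Really?! Yes, really!
echo $anyMix, "\n";       // Really! Yes, really!
```

The first form may be preferable when mixed punctuation such as "?!" should survive.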

Posted: Tue Jul 04, 2006 9:09 am
by feyd
We don't need to debate variants of backreferencing in this thread, boys.