Regex help removing multiple commas

shriver
Forum Newbie
Posts: 5
Joined: Sun Jul 02, 2006 3:02 am

Post by shriver »

On my forum I have the following few lines that collapse repeated ! . or ? into a single character.

$message_array[$x] = preg_replace("/([\!])+/", "\\1", $message_array[$x]);
$message_array[$x] = preg_replace("/([\?])+/", "\\1", $message_array[$x]);
$message_array[$x] = preg_replace("/([\.])+/", "\\1", $message_array[$x]);

For example the following text:

Hello!!! What.. is your name???? will become Hello! What. is your name?

Now I want to do the same with commas, so I added a 4th line

$message_array[$x] = preg_replace("/([\,])+/", "\\1", $message_array[$x]);

but this does not work as expected. Any extra commas still remain in the text. I wasn't sure whether the comma needed to be escaped, so I tried it without the \ and still no dice. I even tried setting a variable equal to chr(44) and then passing the variable, but that didn't work either.

Can someone tell me what I'm doing wrong here? I'm guessing it has something to do with commas being used to separate parameters. Thanks for any help.
Robert Plank
Forum Contributor
Posts: 110
Joined: Sun Dec 26, 2004 9:04 pm
Contact:

Post by Robert Plank »

Your regex worked fine for me (without the backslash). By the way, if you want to get that all on one line:

Code:

<?php

$string = "Hello!!! What.. is,,,,, your name????";
$string = preg_replace("/([!\?,\.])+/", "\\1", $string);
echo $string;

?>
sweatje
Forum Contributor
Posts: 277
Joined: Wed Jun 29, 2005 10:04 pm
Location: Iowa, USA

Post by sweatje »

Robert Plank wrote: Your regex worked fine for me (without the backslash). By the way if you want to get that all on one line

Code:

<?php

$string = "Hello!!! What.. is,,,,, your name????";
$string = preg_replace("/([!\?,\.])+/", "\\1", $string);
echo $string;

?>
You don't need to escape the ? and the . because they are not PCRE metacharacters inside a character class:

/([!?,.])+/

will work fine.
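
For anyone who wants to check this, here is a minimal sketch comparing the escaped and unescaped character classes. The sample strings and variable names are just assumptions for illustration.

```php
<?php
// Inside a character class, ? and . are literal, so escaping them is optional.
$escaped   = '/([!\?,\.])+/';
$unescaped = '/([!?,.])+/';

$samples = array('Hello!!! What.. is,,,,, your name????', 'a?b.c', 'x..y??');

foreach ($samples as $s) {
    $a = preg_replace($escaped, '\\1', $s);
    $b = preg_replace($unescaped, '\\1', $s);
    // Both patterns should give identical results for every sample.
    echo $a, ($a === $b ? '  (same)' : '  (DIFFERENT)'), "\n";
}
```

The backslashes are harmless, so leaving them in is only a readability question.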
shriver
Forum Newbie
Posts: 5
Joined: Sun Jul 02, 2006 3:02 am

Post by shriver »

Thanks, you two, the code worked fine.
shriver
Forum Newbie
Posts: 5
Joined: Sun Jul 02, 2006 3:02 am

Post by shriver »

Sorry to bug again but I just noticed something. It will only remove multiple commas if there also are ! . or ? in the text. If there are only commas then it doesn't remove the extra ones. If I try the same thing with just ! . or ? it will remove them as expected. It's certainly strange behavior.

I'll show some examples of what I tried and what the result was:

Example 1 (Bad Result)
Input: test,,,
Result: test,,,

Example 2
Input: test...
Result: test.

Example 3
Input: test!!!
Result: test!

Example 4
Input: test???
Result: test?

Example 5
Input: test,,, test???
Result: test, test?

Example 6 (Bad Result)
Input: test,,, test,,,
Result: test,,, test,,,

Example 7 (Bad Result)
Input: test test,,,
Result: test test,,,

Example 8
Input: test test!! test,,,
Result: test test! test,

Example 9
Input: ...test test,,,
Result: .test test,

Example 10
Input: test! test. test? test,,,
Result: test! test. test? test,

I'm at a loss here as to why, when there are only commas, that it doesn't remove the extras. I tried switching the order that each gets replaced, but that didn't change anything.
Robert Plank
Forum Contributor
Posts: 110
Joined: Sun Dec 26, 2004 9:04 pm
Contact:

Post by Robert Plank »

They all worked for me.

Code:

<?php

function stripPunctuation($string) {
   return preg_replace("/([!?,.])+/", "\\1", $string);
}

$input = array(
   'test,,,',
   'test...',
   'test!!!',
   'test???',
   'test,,, test???',
   'test,,, test,,,',
   'test test,,,',
   'test test!! test,,,',
   '...test test,,,',
   'test! test. test? test,,,'
);

$output = array_map("stripPunctuation", $input);

// <pre> keeps the print_r layout readable in a browser (<xmp> is obsolete)
echo "<pre>";
print_r($output);
echo "</pre>";

?>
My output:

Array
(
    [0] => test,
    [1] => test.
    [2] => test!
    [3] => test?
    [4] => test, test?
    [5] => test, test,
    [6] => test test,
    [7] => test test! test,
    [8] => .test test,
    [9] => test! test. test? test,
)
sweatje
Forum Contributor
Posts: 277
Joined: Wed Jun 29, 2005 10:04 pm
Location: Iowa, USA

Post by sweatje »

Seems to work for me:

Code:

$ php -r 'echo preg_replace("/([!?,.])+/", "\\1", "test,,, test,,,");'
test, test,
$ php -r 'echo preg_replace("/([!?,.])+/", "\\1", "test test,,,");'
test test,
$ php -v
PHP 5.1.2 (cli) (built: Jan 11 2006 16:40:00)
Copyright (c) 1997-2006 The PHP Group
Zend Engine v2.1.0, Copyright (c) 1998-2006 Zend Technologies
And on PHP 4 also:

Code:

$ php -r 'echo preg_replace("/([!?,.])+/", "\\1", "test,,, test,,,");'
test, test,
$ php -r 'echo preg_replace("/([!?,.])+/", "\\1", "test test,,,");'
test test,
$ php -v
PHP 4.4.2-pl2-gentoo (cli) (built: Jun 15 2006 04:45:06)
Copyright (c) 1997-2006 The PHP Group
Zend Engine v1.3.0, Copyright (c) 1998-2004 Zend Technologies
shriver
Forum Newbie
Posts: 5
Joined: Sun Jul 02, 2006 3:02 am

Post by shriver »

Ehm, sorry. It was a mistake on my part further up in the script: I had if (preg_match("/([!?.])+/", $message)) and forgot to add the comma to that character class.

Ignore my forgetfulness, ha.

Thanks anyways.
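
In case it helps someone with the same symptom, here is a rough sketch of how a guard like that would produce the results listed earlier. The function names and the shared pattern constant are assumptions for illustration, not the actual forum script.

```php
<?php
// Hypothetical reconstruction: the guard's character class lacks the comma,
// so comma-only text never reaches preg_replace.
function collapse_with_stale_guard($message) {
    if (preg_match('/([!?.])+/', $message)) {
        return preg_replace('/([!?,.])+/', '\\1', $message);
    }
    return $message;
}

// One way to avoid the mismatch: keep a single pattern and drop the guard,
// since preg_replace returns the subject unchanged when nothing matches.
define('PUNCT_RUN', '/([!?,.])+/');

function collapse_punctuation($message) {
    return preg_replace(PUNCT_RUN, '\\1', $message);
}

echo collapse_with_stale_guard('test,,,'), "\n";          // guard fails, commas stay
echo collapse_with_stale_guard('test,,, test???'), "\n";  // guard passes, both runs collapse
echo collapse_punctuation('test,,,'), "\n";               // no guard, commas collapse
```

If the guard has to stay for some other reason, building both calls from the same constant keeps the two character classes from drifting apart.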
bokehman
Forum Regular
Posts: 509
Joined: Wed May 11, 2005 2:33 am
Location: Alicante (Spain)

Post by bokehman »

Robert Plank wrote:

Code:

"\\1"
This might work but it is wrong. \1 should be used for back references only. In this context you should be using $1.
shriver
Forum Newbie
Posts: 5
Joined: Sun Jul 02, 2006 3:02 am

Post by shriver »

I see, I wasn't sure if that existed in PHP. I've mostly used regex in mIRC, and there I would use either \1 or \t (for matched text). I wasn't sure what the equivalent of \t was in PHP. Thanks for pointing that out.
sweatje
Forum Contributor
Posts: 277
Joined: Wed Jun 29, 2005 10:04 pm
Location: Iowa, USA

Post by sweatje »

bokehman wrote:
Robert Plank wrote:

Code:

"\\1"
This might work but it is wrong.
Pretty strong wording, considering the manual shows examples using this style and it is more cross-language compatible.

The cautionary statement at the beginning of the page is for more than 9 captures, when the $number variable syntax is preferable.
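
For reference, a small sketch of the replacement forms being discussed. As far as I know, preg_replace accepts both \1 and $1 in the replacement string, and the ${1} form with braces covers the case where a literal digit follows the reference. The sample strings are made up.

```php
<?php
$pattern = '/([!?,.])+/';
$text    = 'test,,, test???';

// Both reference styles name the same capture group in the replacement.
$backslash = preg_replace($pattern, '\\1', $text);
$dollar    = preg_replace($pattern, '$1', $text);
echo $backslash, "\n", $dollar, "\n";

// With a literal digit right after the reference, braces remove the ambiguity:
// '${1}1' means "group 1, then the character 1".
echo preg_replace('/(\d)x/', '${1}1', '7x'), "\n";
```

Single quotes are used for the replacement strings so that PHP itself does not try to interpret $1 or ${1} before preg_replace sees them.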
bokehman
Forum Regular
Posts: 509
Joined: Wed May 11, 2005 2:33 am
Location: Alicante (Spain)

Post by bokehman »

sweatje wrote: it is more cross-language compatible.
That's not true. For example Apache's regex engine does not allow it.
sweatje
Forum Contributor
Posts: 277
Joined: Wed Jun 29, 2005 10:04 pm
Location: Iowa, USA

Post by sweatje »

bokehman wrote:
sweatje wrote: it is more cross-language compatible.
That's not true. For example Apache's regex engine does not allow it.
My preferred Windows editor, TextPad, does:
textpad help wrote:\0 to \9 Substitute the text matching tagged expression 0 through 9.
as does sed
sed help wrote:\1 \2 ...\9 backreference, matches i-th memorized \(..\)
as does vim
vim docs wrote:3.5 Grouping and Backreferences

You can group parts of the pattern expression enclosing them with "\(" and "\)" and refer to them inside the replacement pattern by their special number \1, \2 ... \9.
Many regex engines implement the concept of numeric backreferences. Not all implementations have variables, let alone automatically bind variables to the grouping results.
bokehman
Forum Regular
Posts: 509
Joined: Wed May 11, 2005 2:33 am
Location: Alicante (Spain)

Post by bokehman »

sweatje wrote:Many regex engines implement the concept of numeric backreferences.
You are confusing back references with replacements. I said \1 should only be used for back references. I didn't say it shouldn't be used for back references. The context I was referring to was replacement, which is very different. Apache uses \1 for back references and $1 for replacement. Perl too!
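
To illustrate the distinction with the pattern from this thread, here is a sketch (function names are made up) of a back reference used inside the pattern itself. One side effect worth knowing: ([!?,.])+ treats a mixed run as a single match and keeps only its last character, whereas ([!?,.])\1+ collapses only repeats of the same character.

```php
<?php
// Group repeated: any mix of the four characters is one match,
// and the group holds the last character matched.
function collapse_any_run($s) {
    return preg_replace('/([!?,.])+/', '$1', $s);
}

// \1 inside the pattern is a true back reference: it must match the same
// character that group 1 captured, so only identical repeats collapse.
function collapse_same_char_run($s) {
    return preg_replace('/([!?,.])\1+/', '$1', $s);
}

echo collapse_any_run('What?!?! Really...'), "\n";
echo collapse_same_char_run('What?!?! Really...'), "\n";
```

Which behaviour is wanted depends on whether something like "?!" should survive in forum posts. The original question only involved same-character runs, where both versions agree.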
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

We don't need to debate variants of backreferencing in this thread, boys.