
How to extract all words in a string

Posted: Fri Nov 18, 2005 1:30 pm
by walter78
Hi!

I'm a novice with regular expressions. Could you please show me how to extract all the words in a string? The string looks like this: “(word1 word2 word3 word4)” (including the surrounding parentheses). I want to get all the words in a named group. I know it’s not a complicated task, but I've wasted two evenings on it and it scares me :roll:.

Thank you very much :D

Posted: Fri Nov 18, 2005 1:32 pm
by hawleyjr

Posted: Fri Nov 18, 2005 7:30 pm
by Chris Corbyn
If your string follows exactly the pattern you showed, it's just a single metacharacter (\w, with a + quantifier).

Code:

preg_match_all('/\w+/', $string, $matches);

print_r($matches);
explode() would probably be quicker, but it takes more code, since you'd need substr() or another explode() to remove the parens.
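A minimal sketch of the explode() route described above, assuming the words are separated by single spaces and the string has exactly one "(" at the start and one ")" at the end:

```php
<?php
// explode() route: strip the surrounding parens first, then split.
// Assumes single spaces between words and exactly one paren at each end.
$string = '(word1 word2 word3 word4)';

// substr($s, 1, -1) drops the first and the last character, i.e. the two parens.
$inner = substr($string, 1, -1);

// explode() splits on a literal separator; no regex engine is involved.
$words = explode(' ', $inner);

print_r($words);
```

If the words can be separated by runs of spaces or tabs, explode() would produce empty elements, and a regex-based split becomes the simpler option.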

Posted: Sat Nov 19, 2005 5:22 am
by walter78
Thank you. The code

Code:

preg_match_all('/\w+/', $string, $matches);

works OK if the string contains no parentheses. But I need to parse strings like (word1 word2 word3 word4). If I put the parentheses in the pattern, preg_match_all finds no matches (it returns 0):

Code:

$str = '(word1 word2 word3 word4)';
$res = preg_match_all('/\(\w+\)/i', $str, $matches);

if ($res)
  print_r($matches);
I can't use explode() in this case, because I actually need to do more complicated parsing than just splitting out separate words. I just need a starting point.

Thank you again.
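The pattern \(\w+\) describes one word wrapped in its own pair of parens, so against this string preg_match_all() finds nothing and returns 0 (it returns false only when an error occurs). A small sketch contrasting the two patterns:

```php
<?php
$str = '(word1 word2 word3 word4)';

// \(\w+\) needs "(" + one word + ")" with nothing in between, so it never
// matches here: the "(" is followed by "word1 ", not "word1)".
$count = preg_match_all('/\(\w+\)/', $str, $matches);
var_dump($count);   // int(0): no matches, which is not the same as false

// The same pattern does match when each word sits in its own parens.
$single = preg_match_all('/\(\w+\)/', '(word1) (word2)', $m2);
var_dump($single);  // int(2)

// \w+ on its own simply skips the parens, since they are not word characters.
preg_match_all('/\w+/', $str, $m3);
print_r($m3[0]);    // word1 .. word4
```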

Posted: Sat Nov 19, 2005 5:34 am
by Chris Corbyn
Do you need to check that the parens are actually there? If not, just leave them out of the regex... \w+ alone is enough.

If you do need to check for their existence I'd do it in two stages... not sure you could extract the individual words otherwise.

Code:

$string = '(word1 word2 word3 word4 word5)';
if (preg_match('/^\(.*?\)$/', $string))
{
    preg_match_all('/\w+/', $string, $matches);
    print_r($matches[0]);
}
else echo 'This string doesn\'t look right.';
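For comparison, a single-pattern version is possible with PCRE's \G anchor, which the preg_* functions support: each match must start exactly where the previous one ended, so the words have to form an unbroken chain beginning right after "(". A sketch, assuming the words are separated only by whitespace; the two-stage check above is arguably easier to read and maintain:

```php
<?php
// First match: must start with "(" at the start of the string.
// Later matches: \G(?!^) forces them to continue where the previous match ended.
// The lookahead requires the remaining words to be followed by ")" at the end.
$pattern = '/(?:^\(|\G(?!^)\s+)(\w+)(?=(?:\s+\w+)*\)$)/';

$good = '(word1 word2 word3 word4 word5)';
$bad  = 'word1 word2 word3)';

preg_match_all($pattern, $good, $m);
print_r($m[1]);   // word1 .. word5 (group 1 holds the bare words)

$n = preg_match_all($pattern, $bad, $m2);
var_dump($n);     // int(0): no opening paren, so the chain never starts
```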

Posted: Sat Nov 19, 2005 6:32 am
by foobar

Code:

$words = preg_split('/[\s]+/', $string);

Posted: Sat Nov 19, 2005 6:35 am
by Chris Corbyn
foobar wrote:

Code:

$words = preg_split('/[\s]+/', $string);
The only problem with that is that the parens will stay stuck to the first and last words after the split. I'm not sure it can easily be done in one shot.
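The problem with the parens can be checked with preg_split(), the PCRE counterpart of the old POSIX split() (which was deprecated in PHP 5.3 and removed in PHP 7). A sketch that also shows one way around it, trimming the parens before splitting:

```php
<?php
$string = '(word1 word2 word3 word4)';

// Splitting the raw string leaves the parens glued to the outer words.
$raw = preg_split('/\s+/', $string);
// $raw[0] is "(word1" and $raw[3] is "word4)"

// trim()'s second argument is a list of characters to strip from both ends,
// so removing "(" and ")" first gives clean words.
$words = preg_split('/\s+/', trim($string, '()'));
print_r($words);
```

Note that this does not verify that the parens were there in the first place; it only removes them if present.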

Posted: Sat Nov 19, 2005 7:12 am
by walter78
OK, thank you, it works!

I was surprised that I can't check whether the parens exist in the string and capture the words in the same regexp. I'm trying to port some regular expressions from .NET to PHP, and in .NET it is possible to capture all the words in a single match call.
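The .NET behaviour most likely relies on a repeated capture group: .NET exposes every iteration of a group through Group.Captures, while PCRE (and therefore preg_match) keeps only the text from the group's last iteration. A sketch showing what the same idea gives in PHP:

```php
<?php
// (?:(\w+)\s*)+ matches every word, but group 1 is overwritten on each
// repetition, so only the last word survives in $m[1].
$str = '(word1 word2 word3 word4)';

preg_match('/^\((?:(\w+)\s*)+\)$/', $str, $m);

echo $m[0], "\n"; // (word1 word2 word3 word4): the whole string matched
echo $m[1], "\n"; // word4: only the last captured word is kept
```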

I have another issue now. As d11wtq said, I can use the

Code:

preg_match('/^\(.*?\)$/', $string)
code to check whether the string starts and ends with parens. Actually, I'm trying to capture not just words but key-value pairs enclosed in parens, like (key1="value1", key2="value2", key3="value3"). I wrote a regular expression for extracting these pairs and it works OK (thanks to your forum). The problem now is checking that the whole string matches this format: (key1="value1", key2="value2", key3="value3"), given that a value can contain parentheses enclosed in quotes: (key1=")"). Currently I'm using the following code for the first check:

Code:

if (preg_match('/\([^\)]*\)/', $string, $matches ))
	{ 
		$string = $matches[0];
		/* extracting key-value pairs from the $string */
	}
It works, but the match stops early if a closing paren appears in the value part of any key-value pair. Could you please give me some advice?

Thank you.

Posted: Sat Nov 19, 2005 7:37 am
by Chris Corbyn

Code:

$string = '(key1="value1" key2="values2" key3="value3" key4="value4" key5="value5")';
if (preg_match('/^\(.*?\)$/', $string)) //Check the opening and closing parens are there
{
    preg_match_all('/\b(\w+)="([^"]+)"/', $string, $matches);
    $key_value_pairs = array_combine($matches[1], $matches[2]); //Use backref 1 as keys and backref 2 as values
    print_r($key_value_pairs);
}
else echo 'This string doesn\'t look right.';
I'm not saying that you can't do it with a single regex, but nothing springs to mind. If you show me the regex you used in .NET, I'd be curious to see how it works. These are just standard Perl-style regexes... I don't know .NET, but I'd hazard a guess that it's POSIX regex or some proprietary MS stuff.
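Because ^\(.*?\)$ is anchored at both ends, it already accepts (key1=")"), but it accepts malformed pairs too. A stricter check validates the whole key="value" format; since "[^"]*" consumes a quoted value as one unit, a ")" inside the quotes cannot end the match early. A sketch, assuming \w+ keys, comma-separated pairs, and double-quoted values that contain no escaped quotes; parse_pairs() is a hypothetical helper name:

```php
<?php
// Hypothetical helper: returns key => value pairs, or null if the string
// does not have the shape (key="value", key="value", ...).
function parse_pairs(string $string): ?array
{
    $pair    = '\w+="[^"]*"'; // one key="value" pair; value has no double quotes
    $pattern = '/^\(\s*' . $pair . '(?:\s*,\s*' . $pair . ')*\s*\)$/';

    if (!preg_match($pattern, $string)) {
        return null; // wrong overall shape
    }

    // Same extraction as above, with [^"]* so empty values are allowed too.
    preg_match_all('/\b(\w+)="([^"]*)"/', $string, $matches);
    return array_combine($matches[1], $matches[2]);
}

print_r(parse_pairs('(key1="value1", key2="value2", key3="value3")'));
print_r(parse_pairs('(key1=")")'));              // paren inside quotes is fine
var_dump(parse_pairs('(key1="value1", key2=)')); // NULL: missing value
```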