How to extract all words in string
Posted: Fri Nov 18, 2005 1:30 pm
by walter78
Hi!
I'm a novice in regular expressions. Could you please show me how to extract all the words in a string? The string looks like this: "(word1 word2 word3 word4)" (including the surrounding parentheses). I want to capture all the words in a named group. I know it's not a complicated task, but I have wasted two evenings on it and it scares me.
Thank you very much

Posted: Fri Nov 18, 2005 1:32 pm
by hawleyjr
Posted: Fri Nov 18, 2005 7:30 pm
by Chris Corbyn
If your string always follows exactly the pattern you showed, a single metacharacter (\w) is all you need.
Code: Select all
preg_match_all('/\w+/', $string, $matches);
print_r($matches);
explode() would be quicker, but it takes more code, since you'd need substr(), trim() or another explode() to remove the parens.
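For comparison, here is a minimal sketch of the explode() route. It assumes the words are separated by single spaces and that the only things to strip are the outer parentheses and stray whitespace; the sample string is just the one from the question.

```php
<?php
// Sample input in the format from the question.
$string = '(word1 word2 word3 word4)';

// trim() with a character list removes the surrounding parentheses
// (and any leading/trailing whitespace) before splitting on spaces.
$words = explode(' ', trim($string, "() \t\n"));

print_r($words);
```

This avoids the regex engine entirely, but it breaks down as soon as the separators are not exactly one space each.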
Posted: Sat Nov 19, 2005 5:22 am
by walter78
Thank you. The code
Code: Select all
preg_match_all('/\w+/', $string, $matches);
works OK if the string contains no parentheses. But I need to parse a string like (word1 word2 word3 word4). If I put the parentheses in the pattern, preg_match_all finds no matches (it returns 0):
Code: Select all
$str = '(word1 word2 word3 word4)';
$res = preg_match_all('/\(\w+\)/i', $str, $matches);
if ($res) {
    print_r($matches);
}
I can't use the explode function in this case because I actually need to perform more complicated parsing than just searching for separate words. I just need a starting point.
Thank you again.
Posted: Sat Nov 19, 2005 5:34 am
by Chris Corbyn
Do you need to check that the parens are actually there? Just leave them out of the regex if not.... \w+ alone was enough.
If you do need to check for their existence I'd do it in two stages... not sure you could extract the individual words otherwise.
Code: Select all
$string = '(word1 word2 word3 word4 word5)';
if (preg_match('/^\(.*?\)$/', $string))
{
    preg_match_all('/\w+/', $string, $matches);
    print_r($matches[0]);
}
else echo 'This string doesn\'t look right.';
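Since the original question asked for a named group, the same two-stage idea can be sketched with one. This assumes a PCRE build that accepts the (?<name>...) syntax (older builds only accept (?P<name>...)); the group name "word" is just an illustrative choice.

```php
<?php
$string = '(word1 word2 word3 word4 word5)';

$words = [];
// Stage 1: make sure the string starts with "(" and ends with ")".
if (preg_match('/^\(.*\)$/', $string)) {
    // Stage 2: collect every run of word characters under the name "word".
    preg_match_all('/(?<word>\w+)/', $string, $matches);
    $words = $matches['word'];
}

print_r($words);
```

With the default flags, preg_match_all() stores the captures both under the name and under the numeric index 1, so $matches['word'] and $matches[1] hold the same list.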
Posted: Sat Nov 19, 2005 6:32 am
by foobar
Code: Select all
$words = preg_split('/\s+/', $string);
Posted: Sat Nov 19, 2005 6:35 am
by Chris Corbyn
foobar wrote:
Code: Select all
$words = preg_split('/\s+/', $string);
The only problem with that is that the parens will be stuck to the first and last word after the split. I'm not sure it can easily be done in one shot.
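One possible way around the stuck parens, as a rough sketch: treat the parentheses as separators too and drop the empty pieces. This assumes parentheses never appear inside a word, and it does not check that the parens are actually present.

```php
<?php
$string = '(word1 word2 word3 word4)';

// Split on runs of whitespace and/or parentheses.
// PREG_SPLIT_NO_EMPTY discards the empty pieces produced at the
// very start and end of the string by the leading "(" and trailing ")".
$words = preg_split('/[\s()]+/', $string, -1, PREG_SPLIT_NO_EMPTY);

print_r($words);
```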
Posted: Sat Nov 19, 2005 7:12 am
by walter78
OK, thank you, it works!
I was surprised that I can't check whether the parens exist in the string and catch the words in the same regexp. I'm trying to port some regular expressions from .NET to PHP. In .NET it is possible to catch all the words in a single match call.
I have another issue now. As d11wtq said, I can use the
Code: Select all
(preg_match('/^\(.*?\)$/', $string))
code to check whether the string contains the starting and ending parens. Actually I'm trying to catch not just words but key-value pairs enclosed in the parens, like (key1="value1", key2="value2", key3="value3"). I wrote a regular expression for extracting these pairs and it works OK (thanks to your forum). The problem now is checking that the whole string matches the format (key1="value1", key2="value2", key3="value3"), given that a value can itself contain a parenthesis inside the quotes: (key1=")"). Currently I'm using the following code for the first check:
Code: Select all
if (preg_match('/\([^\)]*\)/', $string, $matches))
{
    $string = $matches[0];
    /* extracting key-value pairs from the $string */
}
It works OK, but it stops early if a paren appears in the value part of any key-value pair. Could you please advise me?
Thank you.
Posted: Sat Nov 19, 2005 7:37 am
by Chris Corbyn
Code: Select all
$string = '(key1="value1" key2="values2" key3="value3" key4="value4" key5="value5")';
if (preg_match('/^\(.*?\)$/', $string)) //Check the opening and closing parens are there
{
    preg_match_all('/\b(\w+)="([^"]+)"/', $string, $matches);
    $key_value_pairs = array_combine($matches[1], $matches[2]); //Use backref 1 as keys and backref 2 as values
    print_r($key_value_pairs);
}
else echo 'This string doesn\'t look right.';
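The check above only looks at the first and last characters. If the whole string has to follow the key="value" format, a stricter first stage might look like the sketch below. It assumes values never contain a double quote and that pairs are separated by a comma or by whitespace; the variable names $pair and $format are made up for the example.

```php
<?php
// A value that contains a closing paren inside the quotes.
$string = '(key1=")", key2="value2", key3="value3")';

// One key="value" pair; [^"]* lets a value hold anything except a double quote,
// so a ")" inside the quotes is no longer a problem.
$pair = '\w+="[^"]*"';

// Whole string: "(", then pairs separated by a comma or whitespace, then ")".
$format = '/^\(\s*' . $pair . '(?:(?:\s*,\s*|\s+)' . $pair . ')*\s*\)$/';

$key_value_pairs = [];
if (preg_match($format, $string)) {
    // The format is known to be valid, so extract the pairs as before.
    preg_match_all('/(\w+)="([^"]*)"/', $string, $matches);
    $key_value_pairs = array_combine($matches[1], $matches[2]);
}

print_r($key_value_pairs);
```

Because the validation walks through the quoted values instead of stopping at the first ")", the pair key1=")" is accepted and extracted like any other.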
I'm not saying that you can't do it with a single regex, but nothing springs to mind. If you showed me the regex you used in .NET I'd be curious to see how it worked. These are just standard Perl-style regexes... I don't know .NET, but I'd hazard a guess that it uses POSIX regex or some proprietary MS stuff.
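For the simpler words-in-parens case, one candidate for a single-pattern version uses PCRE's \G anchor, which matches only at the position where the current match attempt starts. This is a sketch under the assumption that the string holds nothing but word characters and whitespace between the parens; it has not been compared against the .NET original.

```php
<?php
$string = '(word1 word2 word3 word4)';

// First word: must directly follow "(" at the very start of the string.
// Later words: must start where the previous match ended (\G), after whitespace;
//   (?!^) stops \G from matching at offset 0 when the "(" is missing.
// Lookahead: only more words and then ")" at the end of the string may follow.
$pattern = '/(?:^\(|\G(?!^)\s+)(?<word>\w+)(?=(?:\s+\w+)*\)$)/';

$count = preg_match_all($pattern, $string, $matches);
$words = $count ? $matches['word'] : [];

print_r($words);
```

If either paren is missing, the pattern finds nothing at all, so the check and the extraction happen in the same preg_match_all() call. The two-stage version is still easier to read and to debug.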