How to extract all words in string

walter78
Forum Newbie
Posts: 3
Joined: Fri Nov 18, 2005 12:38 pm

Post by walter78 »

Hi!

I'm a novice with regular expressions. Could you please show me how to extract all the words in a string? The string looks like this: “(word1 word2 word3 word4)” (including the surrounding parentheses). I want to capture all the words in a named group. I know it's not a complicated task, but I've wasted two evenings on it and it scares me :roll:.

Thank you very much :D
hawleyjr
BeerMod
Posts: 2170
Joined: Tue Jan 13, 2004 4:58 pm
Location: Jax FL & Spokane WA USA

Post by hawleyjr »

Chris Corbyn
Breakbeat Nuttzer
Posts: 13098
Joined: Wed Mar 24, 2004 7:57 am
Location: Melbourne, Australia

Post by Chris Corbyn »

If your string follows exactly the pattern you showed, it's just a single metacharacter.

Code: Select all

preg_match_all('/\w+/', $string, $matches);

print_r($matches);
explode() will be quicker, but it takes more code since you'll need substr(), trim() or another explode() to remove the parens.
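
A minimal sketch of the explode() route mentioned above. It assumes the words are separated by single spaces and that the parentheses appear only at the two ends of the string. trim() with a character list is used here to strip them, and the variable names are illustrative.

```php
<?php
// Assumed sample input, in the format from the question.
$string = '(word1 word2 word3 word4)';

// Remove any "(" or ")" characters from both ends of the string.
$inner = trim($string, '()');

// Split on single spaces; this assumes exactly one space between words.
$words = explode(' ', $inner);

print_r($words);
```

If the spacing can vary, a regex-based split is probably the safer choice.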
walter78
Forum Newbie
Posts: 3
Joined: Fri Nov 18, 2005 12:38 pm

Post by walter78 »

Thank you. The code

Code: Select all

preg_match_all('/\w+/', $string, $matches);
works OK if the string contains no parentheses. But I need to parse a string like (word1 word2 word3 word4). If I put the parentheses in the pattern, preg_match_all returns 0 (no matches):

Code: Select all

$str = '(word1 word2 word3 word4)';
$res = preg_match_all('/\(\w+\)/i', $str, $matches);

if ($res)
  print_r($matches);
I can't use the explode function in this case because actually I need to perform more complicated parsing than just searching for separated words. I just need some starting point.

Thank you again.
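
For reference, the pattern in the post above finds nothing because `\w` does not match a space, so `\(\w+\)` only accepts a single word between the parentheses. The sketch below shows that, plus one possible pattern (an assumption, not taken from the thread) that accepts several whitespace-separated words. It only validates the shape of the string; pulling out the individual words is still a separate step.

```php
<?php
$str = '(word1 word2 word3 word4)';

// "\(\w+\)" needs "(" + one run of word characters + ")" with nothing else in between.
$count = preg_match_all('/\(\w+\)/', $str, $matches);
var_dump($count); // number of full matches found

// Allow further words, each preceded by whitespace, before the closing paren.
$ok = preg_match('/^\(\w+(?:\s+\w+)*\)$/', $str);
var_dump($ok); // 1 when the whole string has the expected shape
```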
Chris Corbyn
Breakbeat Nuttzer
Posts: 13098
Joined: Wed Mar 24, 2004 7:57 am
Location: Melbourne, Australia

Post by Chris Corbyn »

Do you need to check that the parens are actually there? Just leave them out of the regex if not.... \w+ alone was enough.

If you do need to check for their existence I'd do it in two stages... not sure you could extract the individual words otherwise.

Code: Select all

$string = '(word1 word2 word3 word4 word5)';
if (preg_match('/^\(.*?\)$/', $string))
{
    preg_match_all('/\w+/', $string, $matches);
    print_r($matches[0]);
}
else echo 'This string doesn\'t look right.';
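
Since the original question asked for a named group, here is a hedged variant of the two-stage check above. The group name `word` and the stricter validation pattern are assumptions for illustration. PCRE's `(?P<name>...)` syntax makes the captures available under a string key in `$matches`.

```php
<?php
$string = '(word1 word2 word3 word4 word5)';
$words = array();

// Stage 1: confirm the string is a parenthesised list of whitespace-separated words.
if (preg_match('/^\(\s*\w+(?:\s+\w+)*\s*\)$/', $string)) {
    // Stage 2: collect every word; the named group is exposed as $matches['word'].
    preg_match_all('/(?P<word>\w+)/', $string, $matches);
    $words = $matches['word'];
}

print_r($words);
```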
foobar
Forum Regular
Posts: 613
Joined: Wed Sep 28, 2005 10:08 am

Post by foobar »

Code: Select all

$words = preg_split('/\s+/', $string);
Chris Corbyn
Breakbeat Nuttzer
Posts: 13098
Joined: Wed Mar 24, 2004 7:57 am
Location: Melbourne, Australia

Post by Chris Corbyn »

foobar wrote:

Code: Select all

$words = preg_split('/\s+/', $string);
The only problem with that is that the parens will be stuck to the first and last word after the split. I'm not sure it can easily be done in one shot..
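
One way around the stuck parens, sketched under the assumption that the parentheses only need to be discarded and not checked, is to treat them as separators too and ask preg_split() to drop empty pieces:

```php
<?php
$string = '(word1 word2 word3 word4)';

// Whitespace and parentheses all count as separators here.
// PREG_SPLIT_NO_EMPTY drops the empty pieces produced at the start and end.
$words = preg_split('/[\s()]+/', $string, -1, PREG_SPLIT_NO_EMPTY);

print_r($words);
```

This does not verify that the parentheses are present or balanced, so it suits the case where validation is not needed.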
walter78
Forum Newbie
Posts: 3
Joined: Fri Nov 18, 2005 12:38 pm

Post by walter78 »

OK, thank you, it works!

The fact that I can't check whether the parens exist in the string and capture the words in the same regexp was surprising to me. I'm trying to port some regular expressions from .NET to PHP. In .NET it is possible to capture all the words in a single match call.

I have another issue now. As d11wtq said, I can use the

Code: Select all

(preg_match('/^\(.*?\)$/', $string))
code to check whether the string has opening and closing parens. Actually I'm trying to capture not just words but key-value pairs enclosed in the parens, like (key1="value1", key2="value2", key3="value3"). I wrote a regular expression for extracting these pairs and it works OK (thanks to your forum). The problem now is checking that the whole string matches the format (key1="value1", key2="value2", key3="value3"), even when a value contains a parenthesis inside its quotes: (key1=")"). Currently I'm using the following code for the first check:

Code: Select all

if (preg_match('/\([^\)]*\)/', $string, $matches))
{
    $string = $matches[0];
    /* extracting key-value pairs from the $string */
}
It works OK, but it stops early if a paren appears in the value part of any key-value pair. Could you please give me some advice?

Thank you.
Chris Corbyn
Breakbeat Nuttzer
Posts: 13098
Joined: Wed Mar 24, 2004 7:57 am
Location: Melbourne, Australia

Post by Chris Corbyn »

Code: Select all

$string = '(key1="value1" key2="values2" key3="value3" key4="value4" key5="value5")';
if (preg_match('/^\(.*?\)$/', $string)) //Check the opening and closing parens are there
{
    preg_match_all('/\b(\w+)="([^"]+)"/', $string, $matches);
    $key_value_pairs = array_combine($matches[1], $matches[2]); //Use backref 1 as keys and backref 2 as values
    print_r($key_value_pairs);
}
else echo 'This string doesn\'t look right.';
I'm not saying that you can't do it with a single regex but it doesn't spring to mind for me. If you showed me the regex you used in .NET I'd be curious how it worked. These are just standard Perl Style regex... I don't know .NET but I'd hazard a guess that it's POSIX regex or some proprietary MS stuff.
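
For the stricter check asked about earlier (the whole string must be a parenthesised list of key="value" pairs, and a value may itself contain a paren), one possible sketch follows. The helper name `parse_pairs`, the separator rule (a comma or whitespace between pairs) and allowing empty values are all assumptions. It also assumes values never contain a double quote.

```php
<?php
// Hypothetical helper: returns key => value pairs, or null if the string is malformed.
function parse_pairs($string)
{
    $pair  = '\w+="[^"]*"';       // one key="value" unit; the value may hold ( or )
    $sep   = '(?:\s*,\s*|\s+)';   // pairs are separated by a comma or by whitespace
    $valid = '/^\(\s*' . $pair . '(?:' . $sep . $pair . ')*\s*\)$/';

    // Stage 1: the whole string must be "(" + one or more pairs + ")".
    if (!preg_match($valid, $string)) {
        return null;
    }

    // Stage 2: pull out each key and value; group 1 is the key, group 2 the value.
    preg_match_all('/(\w+)="([^"]*)"/', $string, $m);
    return array_combine($m[1], $m[2]);
}

print_r(parse_pairs('(key1=")", key2="value2", key3="(x)")'));
var_dump(parse_pairs('(key1="value1"')); // missing closing paren
```

Because the validation pattern consumes each quoted value as a unit, a paren inside the quotes is never mistaken for the closing one.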