problem when there are multi-match strings
Posted: Fri Jun 12, 2009 2:53 pm
by topace
Hi, I'm new to regex. I hope someone can help me out.
I have a string which might contain several of the substrings I want to search for, like this:
$mystring = '<PLAYER TYPE="smplayer" TITLE="some title" ARTIST="some one" URL="a link here"></PLAYER>
some text here
<PLAYER TYPE="smplayer" TITLE="another title" ARTIST="some one else" URL="soncond link here"></PLAYER>
some text here';
I want to get the values of TYPE, TITLE, ARTIST and URL from each <PLAYER...></PLAYER> element, then replace the element. I can loop, matching and replacing one element at a time, so that I end up with the number of players and the values for each. I used preg_match("/<PLAYER (.*)><\/PLAYER>/", $mystring, $matches), expecting to get the first <PLAYER..></PLAYER> pair. But I didn't. The returned result was the string between the first "<PLAYER..>" and the last closing "</PLAYER>".
I don't know why. What am I doing wrong?
Re: problem when there are multi-match strings
Posted: Sun Jun 14, 2009 3:25 pm
by Popcorn
patterns are usually what is known as greedy, which means they gobble up as much as possible when they try to match. you can turn this greediness off. you may also consider that inside your "(.*)", you do not want to capture just anything, but specifically you don't want anything after the next ">" it encounters.
[EDIT] @prometheuzz ... yes, greedy "quantifiers", brain couldn't find the word for some reason
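To make the two suggestions above concrete, here is a minimal sketch comparing the greedy pattern with a lazy one and with a negated character class. The sample string is a shortened, single-line stand-in for the one in the first post, and "#" is used as the delimiter only so that the "/" in "</PLAYER>" needs no escaping.

```php
<?php
// Shortened sample input: two PLAYER elements on one line (an assumption for illustration).
$s = '<PLAYER TYPE="a" URL="x"></PLAYER> text <PLAYER TYPE="b" URL="y"></PLAYER>';

// Greedy: .* runs to the end of the line, then backtracks to the LAST "></PLAYER>".
preg_match('#<PLAYER (.*)></PLAYER>#', $s, $greedy);

// Lazy: .*? grows one character at a time and stops at the FIRST "></PLAYER>".
preg_match('#<PLAYER (.*?)></PLAYER>#', $s, $lazy);

// Negated class: [^>]* can never cross the ">" that ends the opening tag.
preg_match('#<PLAYER ([^>]*)></PLAYER>#', $s, $negated);

echo $greedy[1], "\n";
echo $lazy[1], "\n";
echo $negated[1], "\n";
```

The lazy and negated-class versions both capture only the attributes of the first element; the negated class states the intent ("nothing past the next >") more directly.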

Re: problem when there are multi-match strings
Posted: Mon Jun 15, 2009 7:01 am
by prometheuzz
Something like this?
Code: Select all
$text = '<PLAYER TYPE="smplayer" TITLE="some title" ARTIST="some one" URL="a link here"></PLAYER>
some text here
<PLAYER TYPE="smplayer" TITLE="another title" ARTIST="some one else" URL="soncond link here"></PLAYER>
some text here';
preg_match_all('/<PLAYER(?=[^>]*(ARTIST="[^"]+"))(?=[^>]*(URL="[^"]+"))(?=[^>]*(TYPE="[^"]+"))/i',
$text, $matches, PREG_SET_ORDER);
print_r($matches);
Re: problem when there are multi-match strings
Posted: Mon Jun 15, 2009 7:05 am
by prometheuzz
Popcorn wrote: patterns are usually what is known as greedy, ...
You most probably meant the right thing, but it's not the patterns that are greedy, it's the quantifiers (+, *, ? and {a,b}, where 'a' and 'b' are numbers and 'a' <= 'b'). Besides that small remark, the rest of your post is sound advice!
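As a small illustration of that point, the sketch below runs each quantifier and its lazy counterpart (the same quantifier followed by "?") against the made-up subject string "aaaa". The subject and the variable names are only for demonstration.

```php
<?php
$subject = 'aaaa';

// Each greedy quantifier next to its lazy form.
$patterns = ['a*', 'a*?', 'a+', 'a+?', 'a?', 'a??', 'a{2,3}', 'a{2,3}?'];

$results = [];
foreach ($patterns as $p) {
    // Anchor at the start so every pattern is tried at the same position.
    preg_match('/^' . $p . '/', $subject, $m);
    $results[$p] = $m[0];
    printf("%-8s matched '%s'\n", $p, $m[0]);
}
```

The greedy forms take as many characters as the quantifier allows; the lazy forms take as few as the quantifier allows.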
Re: problem when there are multi-match strings
Posted: Mon Jun 15, 2009 5:45 pm
by topace
Thank you, Popcorn and prometheuzz, for the replies. They are very helpful.
@Popcorn:
I found out that adding a "?" after the "*" turns off the greediness.
Code: Select all
preg_match("/<PLAYER (.*?)><\/PLAYER>/", $mystring, $matches);
@prometheuzz:
Your code is very useful. The printed result is
Array ( [0] => Array ( [0] => ARTIST="some one" [2] => URL="a link here" [3] => TYPE="smplayer" ) [1] => Array ( [0] => ARTIST="some one else" [2] => URL="soncond link here" [3] => TYPE="smplayer" ) )
Any idea why Array[0][1] is missing?
One more question: what if there is an optional item, such as OPTION1 below?
Code: Select all
$text = '<PLAYER TYPE="smplayer" TITLE="some title" ARTIST="some one" URL="a link here" OPTION1="o1"></PLAYER>';
I tried this:
Code: Select all
preg_match('/<PLAYER(?=[^>]*(ARTIST="[^"]+"))(?=[^>]*(URL="[^"]+"))(?=[^>]*(TYPE="[^"]+"))(?=[^>]*(OPTION1="[^"]+")?)/i', $text, $matches);
It didn't work.
Re: problem when there are multi-match strings
Posted: Tue Jun 16, 2009 4:33 am
by prometheuzz
topace wrote:...
@prometheuzz:
Your code is very useful. The printed result is
Array ( [0] => Array ( [0] => ARTIST="some one" [2] => URL="a link here" [3] => TYPE="smplayer" ) [1] => Array ( [0] => ARTIST="some one else" [2] => URL="soncond link here" [3] => TYPE="smplayer" ) )
Any idea why Array[0][1] is missing?
No. When I run that code, it produces the following output:
Code: Select all
Array
(
[0] => Array
(
[0] => <PLAYER
[1] => ARTIST="some one"
[2] => URL="a link here"
[3] => TYPE="smplayer"
)
[1] => Array
(
[0] => <PLAYER
[1] => ARTIST="some one else"
[2] => URL="soncond link here"
[3] => TYPE="smplayer"
)
)
topace wrote: One more question: what if there is an optional item, such as OPTION1 below?
Code: Select all
$text = '<PLAYER TYPE="smplayer" TITLE="some title" ARTIST="some one" URL="a link here" OPTION1="o1"></PLAYER>';
I tried this:
Code: Select all
preg_match('/<PLAYER(?=[^>]*(ARTIST="[^"]+"))(?=[^>]*(URL="[^"]+"))(?=[^>]*(TYPE="[^"]+"))(?=[^>]*(OPTION1="[^"]+")?)/i', $text, $matches);
It didn't work.
No, then my "trick" doesn't work. You will have to do something like this:
Code: Select all
$text = '<PLAYER TYPE="smplayer" TITLE="some title" ARTIST="some one" URL="a link here" OPTION1="o1"></PLAYER>
<PLAYER TYPE="player" TITLE="title" ARTIST="one" URL="a link"></PLAYER>';
preg_match_all('/<PLAYER\s+(TYPE="[^"]+")\s+(TITLE="[^"]+")\s+(ARTIST="[^"]+")\s+(URL="[^"]+")\s*(OPTION1="[^"]+")?/i',
$text, $matches, PREG_SET_ORDER);
print_r($matches);
Note that the above will NOT work (unlike my first suggestion) if the attributes are not in the expected order. For example, it fails when "ARTIST" comes before "TITLE".
But it looks like you're parsing (s)html. Have you considered using an HTML parser?
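A rough sketch of what the parser route could look like, assuming PHP's bundled DOM extension is available. PLAYER is not a real HTML tag, so libxml's warnings are silenced, and because the HTML parser lowercases tag and attribute names, the lookup uses 'player'. The sample text and the $players array are made up for illustration.

```php
<?php
$text = '<PLAYER TYPE="smplayer" TITLE="some title" ARTIST="some one" URL="a link here" OPTION1="o1"></PLAYER>
some text here
<PLAYER TYPE="player" TITLE="title" URL="a link" ARTIST="one"></PLAYER>';

// Unknown tags trigger parser warnings; collect them internally instead of printing.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($text);
libxml_clear_errors();

$players = [];
// The HTML parser lowercases element and attribute names, hence 'player'.
foreach ($doc->getElementsByTagName('player') as $element) {
    $attrs = [];
    foreach ($element->attributes as $attr) {
        // Restore the upper-case names used in the source markup.
        $attrs[strtoupper($attr->nodeName)] = $attr->nodeValue;
    }
    $players[] = $attrs;
}

print_r($players);
```

With this approach neither the attribute order nor the presence of optional attributes matters, since each element simply reports whatever attributes it has.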
Re: problem when there are multi-match strings
Posted: Tue Jun 16, 2009 4:49 am
by Popcorn
what about a conditional?
Code: Select all
$text = '<PLAYER TYPE="smplayer" TITLE="some title" ARTIST="some one" URL="a link here"></PLAYER>
some text here
<PLAYER TYPE="smplayer" TITLE="another title" URL="soncond link here" ARTIST="some one else" OPTION1="o1"></PLAYER>
some text here';
preg_match_all('/<PLAYER(?=[^>]*(ARTIST="[^"]+"))(?=[^>]*(URL="[^"]+"))(?=[^>]*(TYPE="[^"]+"))(?(?=[^>]*OPTION1="[^"]+")(?=[^>]*(OPTION1="[^"]+"))|)/i', $text, $matches, PREG_SET_ORDER);
print_r($matches);
Code: Select all
Array(
[0] => Array (
[0] => <PLAYER
[1] => ARTIST="some one"
[2] => URL="a link here"
[3] => TYPE="smplayer"
)
[1] => Array (
[0] => <PLAYER
[1] => ARTIST="some one else"
[2] => URL="soncond link here"
[3] => TYPE="smplayer"
[4] => OPTION1="o1"
)
)
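For anyone wondering why the conditional captures OPTION1 while the plain optional group "(...)?" from the earlier attempt does not, here is a reduced sketch. The patterns are cut down to the OPTION1 part only and the sample tags are made up. In the optional version the greedy [^>]* first consumes everything up to ">", the optional group then matches zero times, and the lookahead succeeds without ever backtracking to find OPTION1.

```php
<?php
$withOption    = '<PLAYER TYPE="smplayer" OPTION1="o1"></PLAYER>';
$withoutOption = '<PLAYER TYPE="smplayer"></PLAYER>';

// Optional group inside the lookahead: succeeds with the group matching zero times.
$optional = '/<PLAYER(?=[^>]*(OPTION1="[^"]+")?)/i';

// Conditional: only if OPTION1 is ahead, run a lookahead in which the capture is mandatory.
$conditional = '/<PLAYER(?(?=[^>]*OPTION1="[^"]+")(?=[^>]*(OPTION1="[^"]+"))|)/i';

preg_match($optional, $withOption, $opt);
preg_match($conditional, $withOption, $cond);
preg_match($conditional, $withoutOption, $none);

print_r($opt);   // only the overall match, no group 1
print_r($cond);  // overall match plus the OPTION1 capture
print_r($none);  // only the overall match; the tag is still matched
```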
Re: problem when there are multi-match strings
Posted: Tue Jun 16, 2009 4:52 am
by prometheuzz
Popcorn wrote:what about a conditional?
...
Clever!
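For the replacement step mentioned in the first post, one possible sketch (not something proposed in the thread) is to match each element with preg_replace_callback and pull the attributes apart in a second, simpler regex, so that order and optional attributes do not matter. The replacement format and the $players array are hypothetical.

```php
<?php
$text = '<PLAYER TYPE="smplayer" TITLE="some title" ARTIST="some one" URL="a link here"></PLAYER>
some text here
<PLAYER TYPE="smplayer" TITLE="another title" URL="second link" ARTIST="some one else" OPTION1="o1"></PLAYER>';

$players = [];

$result = preg_replace_callback(
    '#<PLAYER\s+([^>]*)></PLAYER>#i',
    function (array $m) use (&$players): string {
        // Second pass: split the attribute list into NAME="value" pairs.
        preg_match_all('/(\w+)="([^"]*)"/', $m[1], $pairs, PREG_SET_ORDER);
        $attrs = [];
        foreach ($pairs as $pair) {
            $attrs[strtoupper($pair[1])] = $pair[2];
        }
        $players[] = $attrs;

        // Hypothetical replacement text; substitute whatever markup is needed.
        return sprintf('[%s] %s by %s', $attrs['TYPE'] ?? '?', $attrs['TITLE'] ?? '?', $attrs['ARTIST'] ?? '?');
    },
    $text
);

echo $result, "\n";
echo count($players), " players found\n";
```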