problem when there are multi-match strings

topace
Forum Newbie
Posts: 2
Joined: Fri Jun 12, 2009 12:06 pm

problem when there are multi-match strings

Post by topace »

Hi, I'm new to regex. I hope someone can help me out.

I have a string which may contain several of the substrings I want to search for, like this:

$mystring = '<PLAYER TYPE="smplayer" TITLE="some title" ARTIST="some one" URL="a link here"></PLAYER>
some text here
<PLAYER TYPE="smplayer" TITLE="another title" ARTIST="some one else" URL="soncond link here"></PLAYER>
some text here';

I want to get the values of TYPE, TITLE, ARTIST and URL from each <PLAYER...></PLAYER> element and then replace the element. I can loop to match and replace one element at a time, so that I end up with the number of players and the values for each. I used preg_match('/<PLAYER (.*)><\/PLAYER>/', $mystring, $matches), expecting to get the first <PLAYER..></PLAYER> pair. But I didn't: the returned result was everything between the first "<PLAYER..>" and the last closing "</PLAYER>".

I don't know why. What am I doing wrong?
Popcorn
Forum Commoner
Posts: 55
Joined: Fri Feb 21, 2003 5:19 am

Re: problem when there are multi-match strings

Post by Popcorn »

patterns are usually what is known as greedy, which means they gobble up as much as possible when they try to match. you can turn this greediness off. you may also consider that inside your "(.*)", you do not want to capture just anything, but specifically you don't want anything after the next ">" it encounters.
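
To make the difference concrete, here is a minimal sketch. It assumes both elements sit on one line, because by default "." does not match a newline, so the overrun only shows up when no line break separates them (or when the /s modifier is used). The sample string and variable names are made up for illustration, and the "/" in "</PLAYER>" is escaped because "/" is the pattern delimiter.

```php
<?php
// Hypothetical one-line sample with two PLAYER elements.
$line = '<PLAYER TYPE="a" URL="u1"></PLAYER> text <PLAYER TYPE="b" URL="u2"></PLAYER>';

// Greedy: (.*) grabs as much as it can and backs off only to the LAST "></PLAYER>".
preg_match('/<PLAYER (.*)><\/PLAYER>/', $line, $greedy);

// Lazy: (.*?) stops at the FIRST "></PLAYER>".
preg_match('/<PLAYER (.*?)><\/PLAYER>/', $line, $lazy);

// Negated class: ([^>]*) cannot run past the next ">" at all.
preg_match('/<PLAYER ([^>]*)><\/PLAYER>/', $line, $negated);

echo $greedy[1], "\n";  // TYPE="a" URL="u1"></PLAYER> text <PLAYER TYPE="b" URL="u2"
echo $lazy[1], "\n";    // TYPE="a" URL="u1"
echo $negated[1], "\n"; // TYPE="a" URL="u1"
```

The negated class is usually the safer of the two fixes here, since it states directly that the capture must stay inside one tag.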

[EDIT] @prometheuzz ... yes, greedy "quantifiers", brain couldn't find the word for some reason :)
Last edited by Popcorn on Tue Jun 16, 2009 3:13 am, edited 1 time in total.
prometheuzz
Forum Regular
Posts: 779
Joined: Fri Apr 04, 2008 5:51 am

Re: problem when there are multi-match strings

Post by prometheuzz »

Something like this?

Code: Select all

$text = '<PLAYER TYPE="smplayer" TITLE="some title" ARTIST="some one" URL="a link here"></PLAYER>
some text here
<PLAYER TYPE="smplayer" TITLE="another title" ARTIST="some one else" URL="soncond link here"></PLAYER>
some text here';
preg_match_all('/<PLAYER(?=[^>]*(ARTIST="[^"]+"))(?=[^>]*(URL="[^"]+"))(?=[^>]*(TYPE="[^"]+"))/i', 
        $text, $matches, PREG_SET_ORDER);
print_r($matches);
prometheuzz
Forum Regular
Posts: 779
Joined: Fri Apr 04, 2008 5:51 am

Re: problem when there are multi-match strings

Post by prometheuzz »

Popcorn wrote:patterns are usually what is known as greedy, ...
You most probably meant the right thing, but it's not the patterns that are greedy, it's the quantifiers (+, *, ? and {a,b}, where 'a' and 'b' are numbers and 'a' <= 'b'). Apart from that small remark, the rest of your post is sound advice!
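
A small sketch of that point, using a made-up subject string: each quantifier is greedy on its own, and appending "?" to it makes that one quantifier lazy.

```php
<?php
$subject = 'aaaa'; // hypothetical test string

// Greedy quantifiers take as many repetitions as they can.
preg_match('/a+/', $subject, $plus);          // matches "aaaa"
preg_match('/a{2,3}/', $subject, $range);     // matches "aaa"
preg_match('/a?/', $subject, $optional);      // matches "a"

// With "?" appended, the same quantifiers take as few as they can.
preg_match('/a+?/', $subject, $lazyPlus);         // matches "a"
preg_match('/a{2,3}?/', $subject, $lazyRange);    // matches "aa"
preg_match('/a??/', $subject, $lazyOptional);     // matches "" (empty string)

printf("greedy: %s %s %s\n", $plus[0], $range[0], $optional[0]);
printf("lazy:   %s %s '%s'\n", $lazyPlus[0], $lazyRange[0], $lazyOptional[0]);
```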
topace
Forum Newbie
Posts: 2
Joined: Fri Jun 12, 2009 12:06 pm

Re: problem when there are multi-match strings

Post by topace »

Thank you, Popcorn and prometheuzz, for the replies. They are very helpful.

@Popcorn:
I found out that adding a "?" after the "*" turns off the greediness.

Code: Select all

preg_match('/<PLAYER (.*?)><\/PLAYER>/', $mystring, $matches);
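
For the original goal (count the players, keep their attributes, replace each element), one possible approach is preg_replace_callback, which calls a function once per match so no manual loop is needed. This is only a sketch: the "[[player N]]" placeholder and the variable names are invented for illustration.

```php
<?php
$mystring = '<PLAYER TYPE="smplayer" TITLE="some title" ARTIST="some one" URL="a link here"></PLAYER>
some text here
<PLAYER TYPE="smplayer" TITLE="another title" ARTIST="some one else" URL="soncond link here"></PLAYER>
some text here';

$attributeStrings = []; // raw attribute text of each player, in document order

$result = preg_replace_callback(
    '/<PLAYER (.*?)><\/PLAYER>/',
    function (array $m) use (&$attributeStrings): string {
        $attributeStrings[] = $m[1];       // remember this player's attributes
        $n = count($attributeStrings);     // running player number
        return '[[player ' . $n . ']]';    // hypothetical replacement text
    },
    $mystring
);

echo count($attributeStrings), " players\n"; // 2 players
echo $result, "\n";
```
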
@prometheuzz:

Your code is very useful. The printed output is

Array ( [0] => Array ( [0] => ARTIST="some one" [2] => URL="a link here" [3] => TYPE="smplayer" ) [1] => Array ( [0] => ARTIST="some one else" [2] => URL="soncond link here" [3] => TYPE="smplayer" ) )

Any idea why the Array[0][1] is missing?

One more question: what if there is an optional attribute, like OPTION1 here?

Code: Select all

$text = '<PLAYER TYPE="smplayer" TITLE="some title" ARTIST="some one" URL="a link here" OPTION1="o1"></PLAYER>';
I tried this:

Code: Select all

preg_match('/<PLAYER(?=[^>]*(ARTIST="[^"]+"))(?=[^>]*(URL="[^"]+"))(?=[^>]*(TYPE="[^"]+"))(?=[^>]*(OPTION1="[^"]+")?)/i',  $text, $matches);
It didn't work.
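
A likely explanation, sketched with only the OPTION1 lookahead: "[^>]*" is greedy and swallows everything up to the ">", and since the group after it is optional, the engine is satisfied with zero matches and never backtracks to look for OPTION1. Making "[^>]*" and the capture optional as one unit is one possible variant that does capture it. The variable names are made up for illustration.

```php
<?php
$text = '<PLAYER TYPE="smplayer" TITLE="some title" ARTIST="some one" URL="a link here" OPTION1="o1"></PLAYER>';

// Original shape: [^>]* eats the whole tag, then the optional group matches
// zero times, so the lookahead succeeds without ever capturing OPTION1.
preg_match('/<PLAYER(?=[^>]*(OPTION1="[^"]+")?)/i', $text, $before);

// Variant: the optional part is "[^>]* followed by OPTION1" as ONE unit.
// The engine tries to find OPTION1 first and gives up only if that fails.
preg_match('/<PLAYER(?=(?:[^>]*(OPTION1="[^"]+"))?)/i', $text, $after);

var_dump(isset($before[1])); // bool(false)
echo $after[1], "\n";        // OPTION1="o1"
```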
prometheuzz
Forum Regular
Posts: 779
Joined: Fri Apr 04, 2008 5:51 am

Re: problem when there are multi-match strings

Post by prometheuzz »

topace wrote:...
@prometheuzz:

Your code is very useful. The printed output is

Array ( [0] => Array ( [0] => ARTIST="some one" [2] => URL="a link here" [3] => TYPE="smplayer" ) [1] => Array ( [0] => ARTIST="some one else" [2] => URL="soncond link here" [3] => TYPE="smplayer" ) )

Any idea why the Array[0][1] is missing?
No. When I run that code, it produces the following output:

Code: Select all

Array
(
    [0] => Array
        (
            [0] => <PLAYER
            [1] => ARTIST="some one"
            [2] => URL="a link here"
            [3] => TYPE="smplayer"
        )
 
    [1] => Array
        (
            [0] => <PLAYER
            [1] => ARTIST="some one else"
            [2] => URL="soncond link here"
            [3] => TYPE="smplayer"
        )
 
)
topace wrote:One more question: what if there is an optional attribute, like OPTION1 here?

Code: Select all

$text = '<PLAYER TYPE="smplayer" TITLE="some title" ARTIST="some one" URL="a link here" OPTION1="o1"></PLAYER>';
I tried this:

Code: Select all

preg_match('/<PLAYER(?=[^>]*(ARTIST="[^"]+"))(?=[^>]*(URL="[^"]+"))(?=[^>]*(TYPE="[^"]+"))(?=[^>]*(OPTION1="[^"]+")?)/i',  $text, $matches);
It didn't work.
No, then my "trick" doesn't work. You will have to do something like this:

Code: Select all

$text = '<PLAYER TYPE="smplayer" TITLE="some title" ARTIST="some one" URL="a link here" OPTION1="o1"></PLAYER>
<PLAYER TYPE="player" TITLE="title" ARTIST="one" URL="a link"></PLAYER>';
preg_match_all('/<PLAYER\s+(TYPE="[^"]+")\s+(TITLE="[^"]+")\s+(ARTIST="[^"]+")\s+(URL="[^"]+")\s*(OPTION1="[^"]+")?/i',
        $text, $matches, PREG_SET_ORDER);
print_r($matches);
Note that the above will NOT work (as opposed to my first suggestion) if the attributes are not in the expected order TYPE, TITLE, ARTIST, URL. For example: it fails when "ARTIST" comes before "TITLE".

But it looks like you're parsing (S)HTML. Have you considered using an HTML parser?
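
If the order cannot be relied on, one way around it is to work in two steps: first grab the raw attribute text of each tag, then split that text into name/value pairs. The sketch below assumes attribute values are always double-quoted and never contain ">"; the second sample tag is deliberately shuffled, and the variable names are made up.

```php
<?php
$text = '<PLAYER TYPE="smplayer" TITLE="some title" ARTIST="some one" URL="a link here" OPTION1="o1"></PLAYER>
<PLAYER ARTIST="one" TITLE="title" URL="a link" TYPE="player"></PLAYER>';

// Step 1: capture everything between "<PLAYER " and the next ">".
preg_match_all('/<PLAYER\s+([^>]*)>/i', $text, $tags);

$players = [];
foreach ($tags[1] as $attributeText) {
    // Step 2: collect NAME="value" pairs in whatever order they appear.
    preg_match_all('/(\w+)="([^"]*)"/', $attributeText, $pairs, PREG_SET_ORDER);
    $attrs = [];
    foreach ($pairs as $pair) {
        $attrs[strtoupper($pair[1])] = $pair[2];
    }
    $players[] = $attrs;
}

print_r($players);
```

Optional attributes such as OPTION1 simply show up as an extra key when present, so isset($players[0]['OPTION1']) is enough to test for them.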
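
A rough sketch of the parser route, using PHP's bundled DOMDocument. It assumes the DOM extension is available and that the markup is close enough to HTML for libxml to cope with the non-standard PLAYER tag. Tag and attribute names are compared in upper case because the HTML parser may change their case.

```php
<?php
$text = '<PLAYER TYPE="smplayer" TITLE="some title" ARTIST="some one" URL="a link here"></PLAYER>
some text here
<PLAYER TYPE="smplayer" TITLE="another title" URL="soncond link here" ARTIST="some one else" OPTION1="o1"></PLAYER>
some text here';

// PLAYER is not a standard HTML tag, so collect libxml's warnings quietly.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($text);
libxml_clear_errors();

$players = [];
foreach ($doc->getElementsByTagName('*') as $node) {
    // Skip the html/body wrappers the parser adds; keep only PLAYER elements.
    if (strtoupper($node->nodeName) !== 'PLAYER') {
        continue;
    }
    $attrs = [];
    foreach ($node->attributes as $attr) {
        $attrs[strtoupper($attr->name)] = $attr->value;
    }
    $players[] = $attrs;
}

print_r($players);
```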
Popcorn
Forum Commoner
Posts: 55
Joined: Fri Feb 21, 2003 5:19 am

Re: problem when there are multi-match strings

Post by Popcorn »

what about a conditional?

Code: Select all

$text = '<PLAYER TYPE="smplayer" TITLE="some title" ARTIST="some one" URL="a link here"></PLAYER>
some text here
<PLAYER TYPE="smplayer" TITLE="another title" URL="soncond link here" ARTIST="some one else" OPTION1="o1"></PLAYER>
some text here';
preg_match_all('/<PLAYER(?=[^>]*(ARTIST="[^"]+"))(?=[^>]*(URL="[^"]+"))(?=[^>]*(TYPE="[^"]+"))(?(?=[^>]*OPTION1="[^"]+")(?=[^>]*(OPTION1="[^"]+"))|)/i',  $text, $matches, PREG_SET_ORDER);
print_r($matches);

Code: Select all

Array(
    [0] => Array        (
            [0] => <PLAYER
            [1] => ARTIST="some one"
            [2] => URL="a link here"
            [3] => TYPE="smplayer"
        )
    [1] => Array        (
            [0] => <PLAYER
            [1] => ARTIST="some one else"
            [2] => URL="soncond link here"
            [3] => TYPE="smplayer"
            [4] => OPTION1="o1"
        )
)
prometheuzz
Forum Regular
Posts: 779
Joined: Fri Apr 04, 2008 5:51 am

Re: problem when there are multi-match strings

Post by prometheuzz »

Popcorn wrote:what about a conditional?
...
Clever!