problem with preg_match_all and <> characters

Posted: Fri Mar 27, 2009 12:59 pm
by rossati
Hello,

I am trying to extract the fragments enclosed in < and > with:

Code: Select all

preg_match_all('/<.*?>/',$line,$arrm);
This works in the Solmetra Regular Expression Test. The text

Code: Select all

<red><green><blue><magenta>
with the pattern

Code: Select all

/<.*?>/
correctly gives

Code: Select all

 
Array
(
    [0] => Array
        (
            [0] => <red>
            [1] => <green>
            [2] => <blue>
            [3] => <magenta>
        )
 
)
 
and the tool suggests this call:

Code: Select all

preg_match_all('/<.*?>/', '<red><green><blue><magenta>', $arr, PREG_PATTERN_ORDER);
The same code doesn't work in my PHP (5.2.0) or in 4.3.9.
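
For reference, the ? after .* makes the quantifier lazy, which is what lets the pattern stop at the first > instead of the last one. A minimal sketch of the difference follows; the variable names $text, $lazy and $greedy are illustrative and not from the post.

```php
<?php
$text = '<red><green><blue><magenta>';

// Lazy: .*? takes as few characters as possible, so each tag is its own match.
preg_match_all('/<.*?>/', $text, $lazy);

// Greedy: .* takes as many characters as possible, so the whole line is one match.
preg_match_all('/<.*>/', $text, $greedy);

print_r($lazy[0]);   // four separate matches
print_r($greedy[0]); // a single match spanning the whole string
```

So the lazy form in the post is the right choice for this input; the greedy form would not split the tags.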

best regards

Giovanni Rossati

Re: problem with preg_match_all and <> characters

Posted: Fri Mar 27, 2009 1:44 pm
by Christopher
Perhaps this:

Code: Select all

preg_match_all('/[^\<\>]*/', '<red><green><blue><magenta>', $arr);
$arr = array_filter($arr[0], 'strlen');
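
A short sketch of why the array_filter() step is there, assuming the same sample line: with * the character class may match zero characters, so empty strings show up next to the real matches. Using + avoids them. Note that both variants return the names without the surrounding < and >, unlike the pattern in the first post. The names $star, $plus and $filtered are illustrative.

```php
<?php
$text = '<red><green><blue><magenta>';

// With *, the class may match zero characters, so empty strings appear
// wherever the engine is positioned on a '<' or '>'.
preg_match_all('/[^<>]*/', $text, $star);

// array_filter() with 'strlen' drops the empty strings; array_values()
// renumbers the keys from 0.
$filtered = array_values(array_filter($star[0], 'strlen'));

// With +, at least one character is required, so no filtering is needed.
preg_match_all('/[^<>]+/', $text, $plus);

print_r($filtered);
print_r($plus[0]);
```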

Re: problem with preg_match_all and <> characters

Posted: Fri Mar 27, 2009 2:13 pm
by prometheuzz
arborint wrote:Perhaps this:

Code: Select all

preg_match_all('/[^\<\>]*/', '<red><green><blue><magenta>', $arr);
$arr = array_filter($arr[0], 'strlen');
Note that there is no need to escape the '<' and '>':

Code: Select all

'/[^\<\>]*/'
and

Code: Select all

'/[^<>]*/'
are equivalent.
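
A quick way to check this, as a sketch: run both patterns on the same line and compare the results. The variable names are illustrative.

```php
<?php
$text = '<red><green><blue><magenta>';

// Single-quoted PHP strings keep the backslashes, so PCRE sees \< and \>.
preg_match_all('/[^\<\>]*/', $text, $escaped);

// Same class without the backslashes.
preg_match_all('/[^<>]*/', $text, $plain);

// Compare the complete match arrays of both calls.
var_dump($escaped === $plain);
```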

Re: problem with preg_match_all and <> characters

Posted: Fri Mar 27, 2009 2:27 pm
by rossati
arborint wrote:Perhaps this:

Code: Select all

preg_match_all('/[^\<\>]*/', '<red><green><blue><magenta>', $arr);
$arr = array_filter($arr[0], 'strlen');
Thanks.

This also seems to work:

Code: Select all

preg_match_all('/[^<>]+/', '<red><green><blue><magenta>', $arr);
 
But what I can't understand is why this

Code: Select all

preg_match_all('/\[.*?\]/', '[red][green][blue][magenta]', $arr, PREG_PATTERN_ORDER);
 
works?

Re: problem with preg_match_all and <> characters

Posted: Fri Mar 27, 2009 3:22 pm
by prometheuzz
The pattern in your original post works as well.

Code: Select all

// PHP 5.2.6 (cli) (built: Nov 11 2008 21:47:45) 
// Copyright (c) 1997-2008 The PHP Group
// Zend Engine v2.2.0, Copyright (c) 1998-2008 Zend Technologies
 
if(preg_match_all('/<.*?>/', '<red><green><blue><magenta>', $arr)) {
  print_r($arr);
}
 
/* output:
Array
(
    [0] => Array
        (
            [0] => <red>
            [1] => <green>
            [2] => <blue>
            [3] => <magenta>
        )
 
)
*/
If it doesn't, then there's something seriously messed up with the regex engine of your PHP installation.
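
One other possibility, which is an assumption and not something confirmed in this thread: if the result is viewed in a browser rather than on the command line, the matches may be there but invisible, because <red>, <green> and so on are treated as unknown HTML tags. That would also fit the observation that the [red] variant appears to work. A sketch that escapes the output before printing:

```php
<?php
// Assumption: the script output is viewed in a browser, not on the command line.
preg_match_all('/<.*?>/', '<red><green><blue><magenta>', $arr);

// Raw dump: the matches are present in the string, but a browser would
// treat <red>, <green>, ... as tags and show nothing for them.
$raw = print_r($arr, true);

// htmlspecialchars() turns < and > into &lt; and &gt;, so the matches
// stay visible when the page is rendered as HTML.
$escaped = htmlspecialchars($raw);

echo '<pre>' . $escaped . '</pre>';
```

Checking count($arr[0]) or viewing the page source are other ways to tell whether the matches exist.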

Re: problem with preg_match_all and <> characters

Posted: Fri Mar 27, 2009 3:47 pm
by Christopher
prometheuzz wrote:Note that there is no need to escape the '<' and '>':
Just habit, though I think a good one. I got tired of having one character that needed escaping and having to go back and edit, so I just escape every non-alphanumeric character that I am not using as a meta-character. There are so many common characters used as meta-characters that I think being explicit is clearer.

Re: problem with preg_match_all and <> characters

Posted: Fri Mar 27, 2009 4:56 pm
by prometheuzz
arborint wrote:
prometheuzz wrote:Note that there is no need to escape the '<' and '>':
Just habit, though I think a good one.


It makes your regex overly verbose, IMO.
arborint wrote:I got tired of having one character that needed escaping and having to go back and edit, so I just escape every non-alphanumeric character that I am not using as a meta-character. There are so many common characters used as meta-characters that I think being explicit is clearer.
Perhaps for someone not familiar with regex. But someone who is familiar with them will most likely disagree with you (as I do).

Re: problem with preg_match_all and <> characters

Posted: Fri Mar 27, 2009 5:35 pm
by Christopher
prometheuzz wrote:Perhaps for someone not familiar with regex. But someone who is familiar with them will most likely disagree with you (as I do).
Well I am someone familiar with regex, so all we know is that half of people familiar with regex disagree. Unless you are authorized to speak for all regex users (I hadn't been informed). ;)

Re: problem with preg_match_all and <> characters

Posted: Fri Mar 27, 2009 5:45 pm
by prometheuzz
arborint wrote:
prometheuzz wrote:Perhaps for someone not familiar with regex. But someone who is familiar with them will most likely disagree with you (as I do).
Well I am someone familiar with regex, so all we know is that half of people familiar with regex disagree. Unless you are authorized to speak for all regex users (I hadn't been informed). ;)
Hence the "most likely" in my response.
Anyway, perhaps I meant to say "more familiar".
; )

Really, I don't mean to say this to put you down or something, but if you truly escape all characters other than alphanumerics, I really think you're overdoing it. Especially inside a character class, where most regex meta-characters don't have any special meaning to begin with.
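
To illustrate the character-class point with a small sketch (the sample string is made up for this example): inside [...] the characters . * + ? ( ) are plain literals and need no backslash.

```php
<?php
// Inside [...] the characters . * + ? ( ) are ordinary literals,
// so this class matches each of them without any escaping.
preg_match_all('/[.*+?()]/', 'a.b*c+d?(e)', $m);

print_r($m[0]);
```

The characters that can still be special inside a class are the backslash, the closing bracket, a leading ^ and a - between two characters.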

Re: problem with preg_match_all and <> characters

Posted: Fri Mar 27, 2009 9:57 pm
by Christopher
prometheuzz wrote:Really, I don't mean to say this to put you down or something, but if you truly escape all characters other than alphanumerics, I really think you're overdoing it. Especially inside a character class, where most regex meta-characters don't have any special meaning to begin with.
Yeah, for regex pros it is overdoing it, even annoying to some. My comment was meant for the original poster, because I have found that escaping symbols that you want to be literals reduces problems. It has been my experience that there are really very few people who truly understand regular expressions. Most of the questions here are from people who found an example somewhere, and when they try to change it, it explodes. So I think the consistency helps.
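
When the literal text comes from a variable, PHP's preg_quote() can do this kind of escaping automatically. A minimal sketch, with a hypothetical $literal value chosen to match the examples in this thread:

```php
<?php
// Hypothetical input: a literal piece of text that should be matched as-is.
$literal = '<red>';

// preg_quote() backslash-escapes regex meta-characters; the second argument
// makes it escape the chosen delimiter as well.
$pattern = '/' . preg_quote($literal, '/') . '/';

// Count how often the literal occurs in a sample line.
$count = preg_match_all($pattern, '<red><green><red>', $m);

echo $pattern, "\n";
echo $count, "\n";
```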