Please help me learn regex

Posted: Sat May 26, 2007 12:17 pm
by blackout
Hello, I found this forum when trying to get a solution for my regex problem.

My real problem is like this:

I have this string: abc <def> /*ghi<xyz>jkl*/ mno <pqr> stu
and I want to get <def> and <pqr> only.

After searching around, I found a starting point, but strangely it doesn't seem to work for my case:

Code: Select all

preg_match_all('/{[^{}]*}/', 'abc {def} mno {pqr} stu', $matches);
Using the above pattern, I can get {def} and {pqr}, but when I adapted it to my case:

Code: Select all

preg_match_all('/<[^<>]*>/', 'abc <def> mno <pqr> stu', $matches);
it didn't return the expected result. Great, I guess I have to learn more regex!

So... can anyone here tell me why that is, or maybe point me directly to the solution of my problem?

Thanks in advance!

Posted: Sat May 26, 2007 12:40 pm
by s.dot
Hi, I have run both regexes on my machine, and both returned the expected results.

Code: Select all

C:\Users\HP_Administrator>php -r "preg_match_all('/{[^{}]*}/', 'abc {def} mno {pqr} stu', $matches);  print_r($matches);"
Array
(
    [0] => Array
        (
            [0] => {def}
            [1] => {pqr}
        )

)

C:\Users\HP_Administrator>php -r "preg_match_all('/<[^<>]*>/', 'abc <def> mno <pqr> stu', $matches);  print_r($matches);"
Array
(
    [0] => Array
        (
            [0] => <def>
            [1] => <pqr>
        )

)

C:\Users\HP_Administrator>

Posted: Sat May 26, 2007 12:59 pm
by blackout
Oh sorry, I guess I'm so stressed with my problem :(
Yes, it returns the expected result, but because I run the script and send the output to a web page, the browser interprets the matches as HTML tags.
Okay, glad it works, so what remains is my real problem. Any advice is appreciated, thanks!
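
A minimal sketch of what is going on here, assuming the script's output ends up in an HTML page: the browser parses <def> and <pqr> as unknown tags and displays nothing. Escaping the output first makes the matches visible. The $escaped variable is only an illustrative name.

```php
<?php
// Same pattern and subject as in the first post.
preg_match_all('/<[^<>]*>/', 'abc <def> mno <pqr> stu', $matches);

// Convert <, > and & to HTML entities so the browser shows the
// matches as text instead of parsing them as tags.
$escaped = htmlspecialchars(print_r($matches, true));

echo '<pre>' . $escaped . '</pre>';
```

Viewing the page source instead of the rendered page would show the same thing without any code change.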

Posted: Sat May 26, 2007 1:49 pm
by stereofrog
Hi,

An expression that ignores comments would look like this:

Code: Select all

// match <xxx>'s if not within /* ... */
$re = '~
	<
		([^<>]+)
	>
	(?! 
		\*/
	)
	(?=
		(?:
			.
			(?!
				\*/
			)
		)
		+?
		(?:
			/\*
			|
			$
		)
	)
~x';
preg_match_all($re, $source, $m);
$result = $m[1];
If this seems a bit too complex for you ;) you can also use a much simpler one at the price of an extra function call:

Code: Select all

$re = '~ /\* .*? \*/ | <(.*?)> ~x'; // match comments OR <xxxx>'s
preg_match_all($re, $source, $m);
$result = array_filter($m[1]); // strip empty matches, i.e. comments
Hope this helps.
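
As a rough check, here is the simpler pattern applied to the string from the first post. The array_values() call is an addition of mine, not part of the code above; it only renumbers the keys that array_filter() leaves behind.

```php
<?php
$source = 'abc <def> /*ghi<xyz>jkl*/ mno <pqr> stu';

// Comments are consumed by the first alternative, so the <xyz>
// inside the comment is never tried as a tag.
$re = '~ /\* .*? \*/ | <(.*?)> ~x';
preg_match_all($re, $source, $m);

// $m[1] holds an empty string for every match that was a comment.
// array_filter() drops those but keeps the original keys,
// so array_values() renumbers the result from 0.
$result = array_values(array_filter($m[1]));

print_r($result);
```

One thing to keep in mind: array_filter() without a callback removes every falsy value, so a tag named "0" would be dropped as well.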

Posted: Sun May 27, 2007 12:42 am
by blackout
Could you explain how this part of the code works?

Code: Select all

(?!
      \*/
   )
   (?=
      (?:
         .
         (?!
            \*/
         )
      )
      +?
      (?:
         /\*
         |
         $
      )
   )
I do understand the meaning of (?! and (?= etc. (I can read that in the reference), but I can't follow the logic. For example, why does (?! \*/ ) come first when we're looking for /* ... */? Why do we need the . (period) in (?: . (?! \*/))? And so on.

Thanks.

Posted: Mon May 28, 2007 9:49 pm
by blackout
Anyone? Or maybe someone can tell me what the regex would be if the comment begins and ends with the same character, for example '&' (without quotes).

Posted: Tue May 29, 2007 7:08 am
by stereofrog
Hi, sorry for not responding, I didn't see your reply the first time.

Here's a more verbose version of the regexp above. I often use variable substitution like this as a general technique for understanding complex regexps:

Code: Select all

$tag = "< ([^<>]+) >";
$open_bracket = '/\*';
$close_bracket = '\*/';
$not_followed_by = '?!';
$followed_by = '?=';
$any_char = '.';
$or = '|';
$and = '';
$many_times = '+?';
$end = '$';

$re = "~
   $tag
   ($not_followed_by $close_bracket)
   $and
   ($followed_by
      ( $any_char ($not_followed_by $close_bracket) ) $many_times
      $and ( $open_bracket $or $end )
   )
~xs";


$source = "
	<tag> /* bbb*/
	<tag2> blah
	/* zzz <tag3> yyy */
	and /* <tag4>*/ and <tag5>
";

preg_match_all($re, $source, $m);
$result = $m[1];
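
To make the logic easier to follow, here is a sketch of the same pattern written out with x-mode comments and run against the sample source above. The comments are my reading of the expression, and the inner groups are made non-capturing for brevity.

```php
<?php
$re = '~
    < ([^<>]+) >            # the tag, name captured in group 1
    (?! \*/ )               # the tag is not immediately followed by */
    (?=                     # and, looking ahead from the end of the tag:
        (?: . (?! \*/ ) )+? #   step over characters, none of them followed by */
        (?: /\* | $ )       #   until the next /* or the end of the string
    )
~xs';

$source = "
    <tag> /* bbb*/
    <tag2> blah
    /* zzz <tag3> yyy */
    and /* <tag4>*/ and <tag5>
";

preg_match_all($re, $source, $m);
$result = $m[1];

print_r($result); // tag, tag2 and tag5
```

This also seems to answer the question about why (?! \*/ ) comes first: the inner (?! \*/ ) is only tested after each character the dot has consumed, so the position directly behind the tag (the `<tag4>*/` case) needs its own check. Note that, as far as I can tell, `+?` requires at least one character after the tag, so a tag that is the very last thing in the string would not be matched.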

Posted: Tue May 29, 2007 8:23 am
by superdezign
stereofrog wrote:

Code: Select all

$tag = "< ([^<>]+) >";
$open_bracket = '/\*';
$close_bracket = '\*/';
$not_followed_by = '?!';
$followed_by = '?=';
$any_char = '.';
$or = '|';
$and = '';
$many_times = '+?';
$end = '$';
I've never seen such a clear explanation before. Every time I write a complex regex and look at it again later, I get angry and confused, and end up redoing it.

Posted: Tue May 29, 2007 9:21 am
by blackout
stereofrog, you're my MAN!!! Thanks, it's far better than the regex tutorials around, which only cover /.*/, /a*b/ and /^abc/ :evil: Regex is powerful yet complicated (no wonder Perl went extinct :lol:).

Btw, I still haven't figured out how to handle the case from my previous question, where the opening and closing brackets are the same character. We can't just substitute those variables, can we?

Posted: Tue May 29, 2007 11:44 am
by stereofrog
For single-character comment delimiters, I think it should look like this:

Code: Select all

$re = "~
	< ([^<>]+) >        # tag
	(?=                 # followed by
		(
			& [^&]+ &   # comment
			|           # or
			[^&]+       # anything that is not a comment
		)*              # gimme some more
		$               # up to the end of the string
	)
~xs";
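
A quick sketch of this pattern in use, assuming a sample string modelled on the one from the first post with & as the comment delimiter. The idea, as I read it, is that a tag lies outside comments exactly when the rest of the string can be split into complete &...& comments and plain text. The inner group is written as non-capturing here, which is my change.

```php
<?php
// Hypothetical sample: the original string with & ... & as the comment.
$source = 'abc <def> &ghi<xyz>jkl& mno <pqr> stu';

$re = '~
    < ([^<>]+) >          # tag
    (?=                   # followed by
        (?:
            & [^&]+ &     # a complete comment
            |             # or
            [^&]+         # text outside comments
        )*                # repeated
        $                 # up to the end of the string
    )
~xs';

preg_match_all($re, $source, $m);
$result = $m[1];

print_r($result); // def and pqr
```

For <xyz> the rest of the string contains a single unpaired &, so the lookahead cannot reach the end and the tag is skipped.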

Posted: Wed May 30, 2007 11:56 am
by blackout
Thanks a lot for your help, stereofrog, I appreciate it.

My attempt so far was <([^<>]+)>(?=(&.*&)), or something like it, and it didn't work :oops:
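
A small sketch of why that attempt finds nothing, assuming the same kind of sample string with & as the comment delimiter: a lookahead is tested right where the tag ends, so (?=(&.*&)) only succeeds when the very next character after ">" is "&". The variable names are illustrative.

```php
<?php
$source = 'abc <def> &ghi<xyz>jkl& mno <pqr> stu';

// No tag in $source is directly followed by "&", so nothing matches.
preg_match_all('~<([^<>]+)>(?=(&.*&))~', $source, $attempt);

// For comparison: here a comment starts directly after the tag.
preg_match_all('~<([^<>]+)>(?=(&.*&))~', 'abc <def>&ghi& mno', $adjacent);

print_r($attempt[1]);  // empty
print_r($adjacent[1]); // def
```

In other words, the pattern asks for "a tag with a comment glued to it", which is a different condition from "a tag that is not inside a comment".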

Mmm... I think regex is pretty difficult for me; it's not something where we can just say 'take this condition except that condition' :roll: Okay, maybe I need to learn more...
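
For what it's worth, 'take this condition except that condition' can be approximated in two steps instead of one expression. This is a different technique from the lookahead patterns in this thread, sketched here under the assumption that the comments never need to be kept: remove them first, then match the tags in what is left.

```php
<?php
$source = 'abc <def> /*ghi<xyz>jkl*/ mno <pqr> stu';

// Step 1 ("except that condition"): delete every /* ... */ comment.
// The s flag lets the dot cross line breaks inside a comment.
$withoutComments = preg_replace('~/\*.*?\*/~s', '', $source);

// Step 2 ("take this condition"): collect the tags that remain.
preg_match_all('~<([^<>]+)>~', $withoutComments, $m);
$result = $m[1];

print_r($result); // def and pqr
```

The price is a second pass over the string, but each pattern stays short enough to read at a glance.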