Please help me learn regex

Any questions involving matching text strings to patterns (the pattern is called a "regular expression").

blackout
Forum Newbie
Posts: 10
Joined: Fri May 25, 2007 11:54 am

Post by blackout

Hello, I found this forum when trying to get a solution for my regex problem.

My real problem is like this:

I have this string: abc <def> /*ghi<xyz>jkl*/ mno <pqr> stu
and I want to get only <def> and <pqr>, not the <xyz> inside the /* ... */ comment.

After searching around I found a starting point, but strangely it doesn't work for my case:

Code:

preg_match_all('/{[^{}]*}/', 'abc {def} mno {pqr} stu', $matches);
Using the above pattern, I get {def} and {pqr}, but when I adapted it to my case:

Code:

preg_match_all('/<[^<>]*>/', 'abc <def> mno <pqr> stu', $matches);
It didn't return the expected result. Great, I guess I have to learn more regex!

So... can anyone tell me why, or point me directly to a solution for my problem?

Thanks in advance!
s.dot
Tranquility In Moderation
Posts: 5001
Joined: Sun Feb 06, 2005 7:18 pm
Location: Indiana

Post by s.dot

Hi, I ran both of the regexes on my machine and both returned the expected results.

Code:

C:\Users\HP_Administrator>php -r "preg_match_all('/{[^{}]*}/', 'abc {def} mno {pqr} stu', $matches);  print_r($matches);"
Array
(
    [0] => Array
        (
            [0] => {def}
            [1] => {pqr}
        )

)

C:\Users\HP_Administrator>php -r "preg_match_all('/<[^<>]*>/', 'abc <def> mno <pqr> stu', $matches);  print_r($matches);"
Array
(
    [0] => Array
        (
            [0] => <def>
            [1] => <pqr>
        )

)

C:\Users\HP_Administrator>
Post by blackout

Oh, sorry, I guess I'm just stressed about my problem :(
Yes, it returns what I expected. I was sending the output to a web page, so the browser treated <def> and <pqr> as HTML tags and didn't display them.
Okay, glad it works, so what remains is my real problem. Any advice is appreciated, thanks!
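A minimal sketch of the display fix, assuming the output goes straight to a browser: escaping each match with PHP's htmlspecialchars() makes the brackets visible as text. Run on the real string from the question, it also shows what the simple pattern gets wrong: the <xyz> inside the comment is returned too.

```php
<?php
// String from the original question; <xyz> sits inside a /* ... */ comment.
$source = 'abc <def> /*ghi<xyz>jkl*/ mno <pqr> stu';

// The simple pattern from the first post: any <...> without nested brackets.
preg_match_all('/<[^<>]*>/', $source, $matches);

// htmlspecialchars() turns < and > into &lt; and &gt;, so the browser
// displays the matches as text instead of parsing them as HTML tags.
$escaped = array_map('htmlspecialchars', $matches[0]);

echo '<pre>', implode("\n", $escaped), '</pre>', "\n";
```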
stereofrog
Forum Contributor
Posts: 386
Joined: Mon Dec 04, 2006 6:10 am

Post by stereofrog

Hi,

an expression that ignores tags inside comments would look like this:

Code:

// match <xxx>'s if not within /* ... */
$source = 'abc <def> /*ghi<xyz>jkl*/ mno <pqr> stu'; // sample input from the question
$re = '~
	<
		([^<>]+)
	>
	(?! 
		\*/
	)
	(?=
		(?:
			.
			(?!
				\*/
			)
		)
		+?
		(?:
			/\*
			|
			$
		)
	)
~x';
preg_match_all($re, $source, $m);
$result = $m[1];
If this seems a bit too complex ;) you can also use a much simpler one, at the price of an extra function call:

Code:

$re = '~ /\* .*? \*/ | <(.*?)> ~x'; // match comments OR <xxxx>'s
preg_match_all($re, $source, $m);
$result = array_values(array_filter($m[1], 'strlen')); // drop the empty entries left by comment matches, then reindex
Hope this helps.
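Both patterns can be checked side by side on the string from the question. This is just a quick test harness, not part of the original post; the lookahead pattern is the same one, written on a single line without the x flag.

```php
<?php
// Quick check of both patterns from the post above on the question's string.
$source = 'abc <def> /*ghi<xyz>jkl*/ mno <pqr> stu';

// 1) Lookahead version: keep a tag if */ does not start right after it, and
//    if scanning forward without stopping at a */ reaches a /* or the end.
$re_lookahead = '~<([^<>]+)>(?!\*/)(?=(?:.(?!\*/))+?(?:/\*|$))~';
preg_match_all($re_lookahead, $source, $m);
$lookahead_result = $m[1];

// 2) Alternation version: match whole comments OR tags. A comment match
//    leaves group 1 as an empty string, so those entries are filtered out.
$re_alternation = '~ /\* .*? \*/ | <(.*?)> ~x';
preg_match_all($re_alternation, $source, $m);
// array_filter() keeps the original keys (here 0 and 2); array_values() reindexes.
$alternation_result = array_values(array_filter($m[1], 'strlen'));

print_r($lookahead_result);   // def, pqr
print_r($alternation_result); // def, pqr
```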
Post by blackout

Could you explain how this part of the code works?

Code:

(?!
      \*/
   )
   (?=
      (?:
         .
         (?!
            \*/
         )
      )
      +?
      (?:
         /\*
         |
         $
      )
   )
I understand what (?! and (?= mean (I can read about them in the reference), but I can't follow the logic. For example, why does (?! \*/ ) come first when we're looking for /* ... */? And why do we use . (period) in (?: . (?! \*/))?

Thanks.
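One way to see why (?! \*/ ) comes first: the repeated group (?: . (?! \*/ )) consumes a character and only then checks that */ does not start at the next position. The position right after the > is therefore never checked by the loop, and the separate (?! \*/ ) covers it. The period is simply "any character"; combined with the lookahead after it, the group walks forward one character at a time and never stops where a */ begins. The sketch below uses a made-up test string and removes the guard to show the false positive that appears.

```php
<?php
// Made-up input: <tag4> sits inside a comment and is immediately
// followed by the closing */.
$s = 'x /* <tag4>*/ y';

// Pattern from the thread, compact form (no x flag).
$with_guard = '~<([^<>]+)>(?!\*/)(?=(?:.(?!\*/))+?(?:/\*|$))~';

// Same pattern with the leading (?!\*/) removed.
$without_guard = '~<([^<>]+)>(?=(?:.(?!\*/))+?(?:/\*|$))~';

preg_match_all($with_guard, $s, $m1);
preg_match_all($without_guard, $s, $m2);

// Without the guard, the loop consumes the "*" first and only checks the
// positions after it, so the */ right after the tag is never noticed.
var_dump($m1[1]); // empty: tag correctly rejected
var_dump($m2[1]); // contains "tag4": false positive
```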
Post by blackout

Anyone? Or maybe someone can show me what the regex would be if the comment begins and ends with the same character, for example '&' (without quotes).
Post by stereofrog

Hi, sorry for not responding; I didn't see your reply the first time.

Here's a more verbose version of the regexp above. I often use variable substitution like this as a general technique for understanding complex regexps:

Code:

$tag = "< ([^<>]+) >";
$open_bracket = '/\*';
$close_bracket = '\*/';
$not_followed_by = '?!';
$followed_by = '?=';
$any_char = '.';
$or = '|';
$and = '';
$many_times = '+?';
$end = '$';

$re = "~
	$tag
   ($not_followed_by $close_bracket)
   $and
   ($followed_by
      ( $any_char ($not_followed_by $close_bracket )) $many_times
	  $and ( $open_bracket $or $end )
   )
~xs";


$source = "
	<tag> /* bbb*/
	<tag2> blah
	/* zzz <tag3> yyy */
	and /* <tag4>*/ and <tag5>
";

preg_match_all($re, $source, $m);
$result = $m[1];
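One edge case worth testing with this pattern: because $many_times is +?, the lookahead has to consume at least one character before it can test for /* or the end of the string. A tag that is immediately followed by /*, or that is the very last thing in the input, is therefore rejected even though it is outside any comment. Changing +? to *? (a tweak not taken from the thread) appears to fix both cases while giving the same result on the sample $source above. The edge-case string below is made up for illustration.

```php
<?php
// Made-up input: <a> is directly followed by a comment and <b> ends
// the string. Neither tag is inside a comment.
$edge = '<a>/* c */ <b>';

// Compact form of the posted pattern ($many_times = '+?').
$posted = '~<([^<>]+)>(?!\*/)(?=(?:.(?!\*/))+?(?:/\*|$))~s';

// Same pattern with *? so the lookahead may test for /* or $ immediately.
$tweaked = '~<([^<>]+)>(?!\*/)(?=(?:.(?!\*/))*?(?:/\*|$))~s';

preg_match_all($posted, $edge, $m1);
preg_match_all($tweaked, $edge, $m2);
print_r($m1[1]); // empty: both tags missed
print_r($m2[1]); // a, b

// The sample from the post (tabs and newlines written as escapes).
$source = "\t<tag> /* bbb*/\n\t<tag2> blah\n\t/* zzz <tag3> yyy */\n\tand /* <tag4>*/ and <tag5>\n";
preg_match_all($posted, $source, $m3);
preg_match_all($tweaked, $source, $m4);
print_r($m4[1]); // tag, tag2, tag5 (same as the posted pattern here)
```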
superdezign
DevNet Master
Posts: 4135
Joined: Sat Jan 20, 2007 11:06 pm

Post by superdezign

stereofrog wrote:

Code:

$tag = "< ([^<>]+) >";
$open_bracket = '/\*';
$close_bracket = '\*/';
$not_followed_by = '?!';
$followed_by = '?=';
$any_char = '.';
$or = '|';
$and = '';
$many_times = '+?';
$end = '$';
I've never seen such a clear explanation before. Every time I write a complex regex and look at it again after I'm done, I get angry and confused, and end up redoing it.
Post by blackout

stereofrog, you're my MAN!!! Thanks, this is far better than the regex tutorials out there, which only show /.*/, /a*b/, /^abc/ :evil: Regex is powerful yet complicated (no wonder Perl went extinct :lol:).

Btw, I still haven't figured out the case from my previous question, where the opening and closing delimiters are the same character. We can't just substitute those variables, can we?
Post by stereofrog

For single-character comment delimiters, I think it should look like this:

Code:

$re = "~
	< ([^<>]+) >        # tag
	(?=                 # followed by
		(
			& [^&]+ &   # comment
			|           # or
			[^&]+       # anything that is not a comment
		)*              # gimme some more
		$               
	)
~xs";
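Here is that idea wrapped in a small function (the function name is made up) and run on a hypothetical & version of the original string. Two changes are not from the post: [^&]* inside the comment, so that an empty && also counts as a comment, and a possessive [^&]++ for the text between comments. With a plain [^&]+ inside (...)*, a tag that fails (one inside a comment) can be retried with every possible split of the text after it, which can become very slow on long input or make preg_match_all() return false once PCRE's limits (such as pcre.backtrack_limit) are hit. The possessive quantifier never gives characters back, so each run is matched in only one way.

```php
<?php
// Hypothetical helper: returns the contents of <...> tags that are not
// inside &...& comments. A tag is kept only if everything after it, up to
// the end of the string, splits into complete comments and non-& runs.
// A tag inside a comment leaves that comment's closing & unpaired.
function tags_outside_amp_comments(string $source): array
{
    $re = '~
        < ([^<>]+) >      # the tag; its contents are captured in group 1
        (?=               # followed by
            (?:
                & [^&]* & # a complete comment (may be empty)
              |
                [^&]++    # or a run of non-& characters (possessive)
            )*
            $             # all the way to the end of the string
        )
    ~x';
    preg_match_all($re, $source, $m);
    return $m[1];
}

print_r(tags_outside_amp_comments('abc <def> &ghi<xyz>jkl& mno <pqr> stu')); // def, pqr
```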
Post by blackout

Thanks a lot for your help, stereofrog, I appreciate it.

So far my attempt was <([^<>]+)>(?=(&.*&)), or something like it, and it didn't work :oops:

Mmm... regex is pretty difficult for me; it's not something where you can just say 'take this condition except that condition' :roll: Okay, maybe I need to learn more...
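A short sketch of why that attempt returns nothing. A lookahead tests what starts exactly at the current position, so (?=(&.*&)) only succeeds when an & comes directly after the >. Even after loosening that, the lookahead asks "is a comment coming up later?", which is a different question from "is this tag outside a comment?". The test string is an assumption (the question's string with & as the delimiter), since the actual input wasn't posted, and the second pattern is only for illustration.

```php
<?php
// Assumed test string: the original question with & as the comment delimiter.
$s = 'abc <def> &ghi<xyz>jkl& mno <pqr> stu';

// The attempt from the post above.
$attempt = '~<([^<>]+)>(?=(&.*&))~';
preg_match_all($attempt, $s, $m);
print_r($m[1]); // empty: no tag here is immediately followed by &

// Illustration only: allowing text before the first & just tests whether a
// &...& appears later. True for <def> (before the comment), false for <pqr>.
$attempt2 = '~<([^<>]+)>(?=.*&.*&)~';
preg_match_all($attempt2, $s, $m2);
print_r($m2[1]); // def only
```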