Extract PHP blocks?


alex.barylski
DevNet Evangelist
Posts: 6267
Joined: Tue Dec 21, 2004 5:00 pm
Location: Winnipeg

Extract PHP blocks?

Post by alex.barylski »

I am trying to build a regex to extract all the PHP blocks within a source file. I have something like:


 
$matches = array();
preg_match_all('#<\?php(.+)#', $source, $matches, PREG_SET_ORDER);
 
It doesn't work quite as expected...it seems to stop when it reaches the first linebreak and obviously is missing additional tests to be complete...

I would like it to be as robust as possible, taking into account the "last" closing PHP block is not required and that the starting PHP block can be either <? or <?php (<% is not required).

How do I make the <? required and the trailing 'php' optional without grouping the 'php' in a [] or ()?

Also, do I need to use $ to tell the regex to continue until the end of the source file or is this already greedy?

P.S. Extracting <script type="php"> is not required either -- this isn't for security but for source code metrics.

Cheers,
Alex
Weirdan
Moderator
Posts: 5978
Joined: Mon Nov 03, 2003 6:13 pm
Location: Odessa, Ukraine

Re: Extract PHP blocks?

Post by Weirdan »

You need the dot to match line breaks, which is what the 's' regexp modifier is for ('m' only changes where ^ and $ match). For the 'php' part to be optional but not included in the captured matches, you need non-capturing parentheses: (?:something)? makes 'something' optional without capturing it.
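
As a minimal sketch of the non-capturing group (the $source string below is a made-up sample, not from the original post), group 1 ends up holding only the block body whether or not 'php' is present:

```php
<?php
// Hypothetical sample input: one short-tag block and one long-tag block.
$source = "<? echo 1; ?>\n<?php echo 2; ?>";

// (?:php)? makes 'php' optional without creating a capture group,
// so group 1 is the block body in both cases. The s modifier lets
// the dot cross line breaks.
preg_match_all('#<\?(?:php)?(.+?)\?>#s', $source, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    echo trim($m[1]), "\n";
}
```

The lazy .+? is used here only to keep the two sample blocks apart; it is an illustration choice, not part of the advice above.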
semlar
Forum Commoner
Posts: 61
Joined: Fri Feb 20, 2009 10:45 pm

Re: Extract PHP blocks?

Post by semlar »

PCSpectra wrote:How do I make the <? required and the trailing 'php' optional without grouping the 'php' in a [] or ()
You can't do that. The only reason I can think of that you wouldn't want to group something is to avoid capturing it, which you can do like this: <\?(?:php)?

The reason your regex isn't matching the entire file is that the dot does not match line breaks by default. You can make the dot match newlines with the "s" flag, either by setting it after the closing delimiter or inline like this: (?s:.+)
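
A small sketch of the difference, assuming a hypothetical two-line $source. It also suggests that no $ anchor is needed: once the dot can match newlines, a greedy .+ already runs to the end of the string.

```php
<?php
// Hypothetical two-line input.
$source = "<?php\necho 'hi';";

// Without the s flag the dot cannot match the line break that follows
// the opening tag, so .+ has nothing to consume and the match fails.
$plain = preg_match('#<\?php(.+)#', $source, $m1);

// With the s flag the dot matches newlines and the greedy .+ runs to
// the end of the string.
$dotall = preg_match('#<\?php(.+)#s', $source, $m2);

// The inline form (?s:...) scopes the flag to a single group.
$inline = preg_match('#<\?php((?s:.+))#', $source, $m3);

var_dump($plain, $dotall, $inline); // int(0), int(1), int(1)
```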

I think the pattern I would personally use for this would be <\?(?i:php\s)?((?:[^?]|\?(?!>))+)\?>

Match <? literally
Optional "php" followed by a space
???
Profit
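
To make the middle of that pattern concrete, here is a rough sketch that runs it on a hypothetical sample ($source is invented for illustration). The group (?:[^?]|\?(?!>))+ consumes any character that is not a '?', or a '?' that is not followed by '>', so a ternary operator inside the block does not end the match early:

```php
<?php
// Hypothetical sample: the first block contains a '?' that is not
// part of a closing tag.
$source = "<?php \$a = \$b ? 1 : 2; ?>\n<p>html</p>\n<? echo \$a; ?>";

$pattern = '#<\?(?i:php\s)?((?:[^?]|\?(?!>))+)\?>#';
preg_match_all($pattern, $source, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    // $m[0] is the whole block, $m[1] the body between the tags.
    var_dump($m[1]);
}
```

Because [^?] is a negated character class, it matches line breaks on its own, so this pattern does not depend on the s flag. It does require a closing tag, so an unclosed final block is not matched.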
alex.barylski
DevNet Evangelist
Posts: 6267
Joined: Tue Dec 21, 2004 5:00 pm
Location: Winnipeg

Re: Extract PHP blocks?

Post by alex.barylski »

Thanks for the speedy replies...

Quick question before I go and use the above regex...does this regex work in preg_match_all() or is it POSIX regex?
Weirdan
Moderator
Posts: 5978
Joined: Mon Nov 03, 2003 6:13 pm
Location: Odessa, Ukraine

Re: Extract PHP blocks?

Post by Weirdan »

It's PCRE (preg_*). I don't think people use POSIX regexps nowadays :)
prometheuzz
Forum Regular
Posts: 779
Joined: Fri Apr 04, 2008 5:51 am

Re: Extract PHP blocks?

Post by prometheuzz »

PCSpectra wrote:I am trying to build a regex to extract all the PHP blocks within a source file. I have something like:


 
$matches = array();
preg_match_all('#<\?php(.+)#', $source, $matches, PREG_SET_ORDER);
 
It doesn't work quite as expected...
Be careful with those greedy DOT-PLUS thingies!

Try this:


preg_match_all('#<\?(?:php)?(?:(?!\?>).)*\?>#si', $source, $matches, PREG_SET_ORDER);
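
The original post also asks that the last closing tag be optional. One possible variation on the pattern above, offered as a sketch and not a tested solution, is to accept either ?> or the end of the string (\z) and to capture the body in group 1. The $source sample is hypothetical:

```php
<?php
// Hypothetical sample: the final block has no closing tag.
$source = "<?php echo 1; ?>\ntext\n<?php echo 2;\n";

// (?:(?!\?>).)* walks forward one character at a time and stops just
// before a closing tag. (?:\?>|\z) then accepts either the closing tag
// or the end of the input.
$pattern = '#<\?(?:php)?((?:(?!\?>).)*)(?:\?>|\z)#si';
preg_match_all($pattern, $source, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    var_dump($m[1]);
}
```

Like any regex approach, this treats a closing-tag sequence inside a PHP string literal as the end of the block, and it also matches <?xml and <?= openers, which may or may not matter for source code metrics.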