
Match words if there is NOT a specific character to the left

Posted: Wed Apr 15, 2009 8:57 am
by xkrja
I have the following regex:

string syntaxPattern = @"(\bif\b)|(\bwhile\b)|(\bclassdef\b)|(\bproperties\b)|(\bend\b)|(\bmethods\b)|(\bfunction\b)|(\belse\b)|(\bfor\b)";

It's for syntax highlighting. Now I want to add a restriction that says:

If there is a '%' character somewhere to the left of any of these words, then DON'T consider it as a match.

How can I add this restriction?

Thanks for help!

Re: Match words if there is NOT a specific character to the left

Posted: Wed Apr 15, 2009 10:11 am
by prometheuzz
xkrja wrote:...

If there is a '%' character somewhere to the left of any of these words, then DON'T consider it as a match.

How can I add this restriction?
You can't do that with a regex look-behind. From the position of a match, the engine can look back a fixed number of characters, but not a variable number. That feature, called "variable-length look-behind", is not supported by PHP's regex engine (PCRE).
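A minimal sketch of the limitation and one possible way around it, assuming the input is handled one line at a time and that any '%' starts a comment. The function name `findKeywords` is made up for this illustration and is not from the thread. A fixed-length look-behind only checks the single character directly to the left, so the sketch cuts the line at the first '%' and runs the keyword regex on what is left.

```php
<?php
// A fixed-length look-behind only inspects the character directly before the word.
// Here the '%' is two characters to the left, so 'end' is still matched.
$stillMatches = preg_match('/(?<!%)\bend\b/', '% end');

// Hypothetical helper (not from the thread): return the keywords in $line
// that have no '%' anywhere to their left, as [text, byte offset] pairs.
function findKeywords(string $line): array
{
    $pattern = '/\b(?:if|while|classdef|properties|end|methods|function|else|for)\b/';

    // Keep only the text before the first '%'; the rest is treated as a comment.
    $pos  = strpos($line, '%');
    $code = ($pos === false) ? $line : substr($line, 0, $pos);

    // Offsets in $code are also valid in $line, because $code is a prefix of it.
    preg_match_all($pattern, $code, $matches, PREG_OFFSET_CAPTURE);
    return $matches[0];
}

print_r(findKeywords('if x % end of comment'));
```

The returned offsets can be used directly to place highlighting in the original line. This does not handle a '%' inside a string literal.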
Note that parsing a programming language with regexes alone is practically impossible (depending on the language, of course).
Consider the following source:

Code:

class Foo {
 
  % this is a comment
 
  $valueA = "% this is not a comment"
 
  % "this is not a string"
 
  $valueB = "this \" is \\\" a string";
}
Do you see all the things that can go wrong?
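One way to deal with the string case is a small character scanner instead of a regex. This is only a sketch under assumptions taken from the sample above: strings are double-quoted, a backslash escapes the next character, and a '%' outside a string starts a comment that runs to the end of the line. The name `commentStart` is hypothetical.

```php
<?php
// Hypothetical helper: return the offset of the first '%' that is outside a
// double-quoted string, or null if the line contains no comment.
function commentStart(string $line): ?int
{
    $inString = false;
    $len = strlen($line);

    for ($i = 0; $i < $len; $i++) {
        $ch = $line[$i];
        if ($inString) {
            if ($ch === '\\') {
                $i++;               // skip the escaped character
            } elseif ($ch === '"') {
                $inString = false;  // closing quote
            }
        } elseif ($ch === '"') {
            $inString = true;       // opening quote
        } elseif ($ch === '%') {
            return $i;              // comment starts here
        }
    }
    return null;
}

var_dump(commentStart('$valueA = "% this is not a comment"'));
```

Keywords would then be searched only in the part of the line before the returned offset. A real language may have other string or comment forms that this sketch ignores.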

Re: Match words if there is NOT a specific character to the left

Posted: Wed Apr 15, 2009 10:54 am
by php_east
Did some research and found a way to cheat.

Code:

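// Matches any string that contains one of the keywords.
$regex1 = "#(\bif\b)|(\bwhile\b)|(\bclassdef\b)|(\bproperties\b)|(\bend\b)|(\bmethods\b)|(\bfunction\b)|(\belse\b)|(\bfor\b)#";
// Matches any string whose first character is not '%'.
$regex2 = "#^[^%]+#";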
 
$array  = array();
$array[] = 'if';
$array[] = 'methods';
$array[] = '%methods';
$array[] = 'else';
$array[] = 'end';
$array[] = '%end';
$array[] = 'not_highlghted';
$array[] = '%if % exists in beginning do not pick it up';
$array[] = 'if % exists elsewhere pick it up';
 
$grep1 = preg_grep($regex1,$array);
$grep2 = preg_grep($regex2,$array);
$result_array = array_intersect_assoc($grep1,$grep2);
 
print_r($grep1);
echo '<hr />';
print_r($grep2);
echo '<hr />';
print_r($result_array);
Results:

Code:

Array ( [0] => if [1] => methods [2] => %methods [3] => else [4] => end [5] => %end [7] => %if % exists in beginning do not pick it up [8] => if % exists elsewhere pick it up ) 
Array ( [0] => if [1] => methods [3] => else [4] => end [6] => not_highlghted [8] => if % exists elsewhere pick it up ) 
Array ( [0] => if [1] => methods [3] => else [4] => end [8] => if % exists elsewhere pick it up )
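As far as I can tell, `^[^%]+` only rejects strings that begin with '%', so a line such as 'x = 1 % if this were code' would still pass both filters. A possible single-pattern variant, offered as a sketch rather than a tested replacement: anchor at the start of the line and allow only non-'%' characters before the keyword. The variable names and sample lines below are made up for the illustration.

```php
<?php
$keywords = 'if|while|classdef|properties|end|methods|function|else|for';

// ^[^%]* can only consume characters that are not '%',
// so the keyword has to start before the first '%' on the line.
$regex = '#^[^%]*\b(?:' . $keywords . ')\b#';

$lines = [
    'if',
    '%methods',
    'x = 1 % if this were code',
    'if % exists elsewhere pick it up',
];

// preg_grep keeps the original array keys of the matching lines.
$result = preg_grep($regex, $lines);
print_r($result);
```

Like the original, this only tells you which lines contain a usable keyword, not where the keyword is, and it treats a '%' inside a string as a comment start.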