Match words if there is NOT a specific character to the left

Any questions involving matching text strings to patterns - the pattern is called a "regular expression."

xkrja
Forum Newbie
Posts: 1
Joined: Wed Apr 15, 2009 8:55 am


Post by xkrja »

I have the following regex:

string syntaxPattern = @"(\bif\b)|(\bwhile\b)|(\bclassdef\b)|(\bproperties\b)|(\bend\b)|(\bmethods\b)|(\bfunction\b)|(\belse\b)|(\bfor\b)";

It's for syntax highlighting. Now I want to add a restriction that says:

If there is a '%' character somewhere to the left of any of these words, then DON'T consider it as a match.

How can I add this restriction?

Thanks for help!
prometheuzz
Forum Regular
Posts: 779
Joined: Fri Apr 04, 2008 5:51 am

Re: Match words if there is NOT a specific character to the left

Post by prometheuzz »

xkrja wrote:...

If there is a '%' character somewhere to the left of any of these words, then DON'T consider it as a match.

How can I add this restriction?
You can't, at least not with PHP's regex engine. After matching a string, you can look back a fixed number of characters, but not a variable number. That feature, called "variable-length look-behind", is not supported in PHP's regex engine (a few other engines, .NET's for example, do allow it).
Note that parsing a programming language with regexes alone is practically impossible (depending on the language, of course).
Consider the following source:

Code:

class Foo {
 
  % this is a comment
 
  $valueA = "% this is not a comment"
 
  % "this is not a string"
 
  $valueB = "this \" is \\\" a string";
}
Do you see all the things that can go wrong?
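
One way around the missing variable-length look-behind is to skip the look-behind entirely: cut each line at the first '%' and only search the part before it. The following is a minimal sketch, not a full solution. The function name keywordsBeforeComment is made up for illustration, and it assumes every '%' starts a comment, so it still gets the "% inside a string" case from the example above wrong.

```php
<?php
// Hypothetical helper: returns the keywords found before the first '%' on a line.
// Assumption: '%' always starts a comment ('%' inside a string is NOT handled).
function keywordsBeforeComment(string $line): array
{
    $keywords = '/\b(?:if|while|classdef|properties|end|methods|function|else|for)\b/';

    // Keep only the text before the first '%'.
    // strstr() returns false when the line contains no '%' at all.
    $code = strstr($line, '%', true);
    if ($code === false) {
        $code = $line;
    }

    // Collect every keyword in the code part; $matches[0] holds the full matches.
    preg_match_all($keywords, $code, $matches);
    return $matches[0];
}

echo implode(',', keywordsBeforeComment('if x % end of the if block')), "\n"; // if
echo implode(',', keywordsBeforeComment('% while commented out')), "\n";      // (empty line)
```

Because the comment part is thrown away before the pattern runs, the keyword pattern itself stays as simple as the one in the first post.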
php_east
Forum Contributor
Posts: 453
Joined: Sun Feb 22, 2009 1:31 pm
Location: Far Far East.

Re: Match words if there is NOT a specific character to the left

Post by php_east »

I did some research and found a way to cheat: run two patterns and keep only the entries that match both.

Code:

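// Entries that contain one of the keywords to highlight.
$regex1 = "#(\bif\b)|(\bwhile\b)|(\bclassdef\b)|(\bproperties\b)|(\bend\b)|(\bmethods\b)|(\bfunction\b)|(\belse\b)|(\bfor\b)#";
// Entries that do not start with '%' (they begin with one or more non-'%' characters).
$regex2 = "#^[^%]+#";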
 
$array  = array();
$array[] = 'if';
$array[] = 'methods';
$array[] = '%methods';
$array[] = 'else';
$array[] = 'end';
$array[] = '%end';
$array[] = 'not_highlghted';
$array[] = '%if % exists in beginning do not pick it up';
$array[] = 'if % exists elsewhere pick it up';
 
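$grep1 = preg_grep($regex1,$array);
$grep2 = preg_grep($regex2,$array);
// preg_grep() preserves the array keys, so intersecting on key and value
// keeps exactly the entries that passed both tests.
$result_array = array_intersect_assoc($grep1,$grep2);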
 
print_r($grep1);
echo '<hr />';
print_r($grep2);
echo '<hr />';
print_r($result_array);
Results:

Code:

Array ( [0] => if [1] => methods [2] => %methods [3] => else [4] => end [5] => %end [7] => %if % exists in beginning do not pick it up [8] => if % exists elsewhere pick it up ) 
Array ( [0] => if [1] => methods [3] => else [4] => end [6] => not_highlghted [8] => if % exists elsewhere pick it up ) 
Array ( [0] => if [1] => methods [3] => else [4] => end [8] => if % exists elsewhere pick it up )
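
Note that the last entry is kept because the second pattern only checks whether the line starts with '%'. If the stricter reading from the first post is wanted ('%' anywhere to the left on the same line), one possible sketch uses PCRE's backtracking control verbs (*SKIP)(*FAIL): the first alternative swallows everything from '%' to the end of the line and then throws it away, so the keyword alternative is never tried inside a comment. The function name highlightKeywords and the <b> wrapper are made up for illustration, and it still assumes every '%' starts a comment.

```php
<?php
// Hypothetical helper: wraps keywords in <b>...</b> unless a '%' appears
// somewhere to their left on the same line.
// Assumption: '%' always starts a comment ('%' inside a string is NOT handled).
function highlightKeywords(string $source): string
{
    // Alternative 1: match '%' up to the end of the line, then (*SKIP)(*FAIL)
    //                discards that text and resumes the search after it.
    // Alternative 2: the actual keywords, matched only outside comments.
    $pattern = '/%.*(*SKIP)(*FAIL)|\b(?:if|while|classdef|properties|end|methods|function|else|for)\b/';

    return preg_replace($pattern, '<b>$0</b>', $source);
}

echo highlightKeywords('if % exists elsewhere do not pick up end'), "\n";
// <b>if</b> % exists elsewhere do not pick up end
```

Since '.' does not match a newline by default, this also works on a whole multi-line source string: each comment is skipped only up to the end of its own line.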