Delimiter woes


Maugrim_The_Reaper
DevNet Master
Posts: 2704
Joined: Tue Nov 02, 2004 5:43 am
Location: Ireland


Post by Maugrim_The_Reaper »

I'm currently working on a parser for the YAML format. Since the lexer needs to scan for tokens, it uses quite a bit of regex and this one has me stumped. Maybe after another few cups of coffee I'll break it ;).

The regex itself looks fine. It checks that there are no "special" characters at the start of a new line, and that the few special characters allowed there (-, ? and :) are not followed by a tab, space, null byte or other wonky character. So far so good. When I use it, though, the lexer coughs up an Exception and PHP sends out a Warning:
Warning: preg_match() [function.preg-match]: No ending delimiter '£' found in Yaml\Lexer.php on line 222
I switched to an unusual delimiter in case the culprit was a stray unescaped forward slash or something equally obvious that I'd missed, and I left the double quotes in place around the regex string:

Code:

"£^([^\0 \t\r\n\x85\-?:,\[\]{}#&*!|>'\"%@]|([\-?:][^\0 \t\r\n\x85]))£"
Anyone have a clue where it's gone wrong?

Post by Maugrim_The_Reaper »

I've figured out that it's the null byte "\0" truncating the string. Now I need to find PHP's alternative... ;)

Edit: \x00 doesn't work either... hmm.
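A small sketch of why swapping \0 for \x00 changes nothing: inside a double-quoted PHP string, PHP expands both escapes itself into a raw NUL byte before preg_match() ever receives the pattern. Keeping the backslash (single quotes, or a doubled backslash inside double quotes) hands the escape to PCRE instead, which decodes it on its own. The variable names below are only for illustration.

```php
<?php
// In a double-quoted string PHP expands both escapes itself, so each of these
// is a single raw NUL byte by the time preg_match() would see it.
$double    = "\0";
$doubleHex = "\x00";

// In a single-quoted string the backslash survives, so PCRE receives the
// escape sequence as text and decodes it on its own.
$single    = '\0';   // backslash + "0"
$singleHex = '\x00'; // backslash + "x00"

var_dump(strlen($double));    // int(1)
var_dump(strlen($doubleHex)); // int(1)
var_dump(strlen($single));    // int(2)
var_dump(strlen($singleHex)); // int(4)

// Doubling the backslash inside double quotes gives the same text as single quotes.
var_dump("\\x00" === $singleHex); // bool(true)

// Handed to PCRE this way, \x00 matches a NUL byte in the subject.
var_dump(preg_match('/' . $singleHex . '/', "a\0b")); // int(1)
```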

Post by Maugrim_The_Reaper »

This is likely a stupid question, but what is the regex for an empty string? Sure, I can fire off a $str == '' comparison, but I'm wondering whether PHP's regex has an equivalent.
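For the empty-string question, one option (a sketch, not the only way) is to anchor with \A and \z, which match only at the absolute start and end of the subject. isEmptyByRegex() is a hypothetical helper name. A plain $str === '' comparison is still the simpler choice; the regex form mainly helps when "empty" has to be one alternative inside a bigger pattern.

```php
<?php
// Hypothetical helper: \A and \z only match at the absolute start and end of
// the subject, so the pattern succeeds only when there is nothing in between.
function isEmptyByRegex(string $s): bool
{
    return preg_match('/\A\z/', $s) === 1;
}

var_dump(isEmptyByRegex(''));   // bool(true)
var_dump(isEmptyByRegex("\n")); // bool(false)
var_dump(isEmptyByRegex('a'));  // bool(false)

// ^$ looks like the obvious choice, but by default $ also matches just before
// a trailing newline, so it accepts "\n" as well as "".
var_dump(preg_match('/^$/', ''));   // int(1)
var_dump(preg_match('/^$/', "\n")); // int(1)
```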
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

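Because you're using a double-quoted string, several characters in your pattern need an extra level of escaping: PHP expands sequences such as \0 itself before preg_match() sees them, so the backslashes have to be doubled for PCRE to receive the escapes.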

The following compiled fine for me:

Code:

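<?php

// Doubled backslashes ("\\0", "\\t", ...) reach PCRE as escape sequences;
// a bare "\0" would become a NUL byte in PHP and cut the pattern short.
// "\x85" is still expanded by PHP into the raw byte 0x85, which PCRE accepts.
$p = "£^([^\\0 \\t\\r\\n\x85\\-?:,\\[\\]{}#&*!|>'\"%@]|([-?:][^\\0 \\t\\r\\n\x85]))£";

// Empty subject: this call just confirms the pattern compiles without a warning.
preg_match($p, '');

?>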
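For comparison, here is one way the same pattern might be written so that no double-quote escaping is needed at all: a single-quoted string (only \' has to be escaped), \x00 in place of \0, and ~ as the delimiter. The ASCII delimiter is my assumption rather than part of the fix above: if the script is saved as UTF-8, £ takes two bytes, while preg_match() expects a single-byte delimiter. lineStartIsAllowed() is a hypothetical wrapper name.

```php
<?php
// Hypothetical wrapper around the pattern from this thread. The single-quoted
// string means PHP leaves every backslash alone except \' (an escaped quote),
// so PCRE itself decodes \x00, \t, \r, \n and \x85.
function lineStartIsAllowed(string $line): bool
{
    $pattern = '~^([^\x00 \t\r\n\x85\-?:,\[\]{}#&*!|>\'"%@]|([-?:][^\x00 \t\r\n\x85]))~';
    return preg_match($pattern, $line) === 1;
}

// json_encode() makes the NUL byte and the empty string visible in the output.
foreach (['key: value', '-1', ':x', '- item', '#comment', "\0abc", ''] as $line) {
    printf("%-14s %s\n", json_encode($line), lineStartIsAllowed($line) ? 'ok' : 'rejected');
}
```

With these inputs, 'key: value', '-1' and ':x' come out as ok (-, ? and : are accepted when followed by a non-blank character), while '- item', '#comment', the NUL-prefixed line and the empty string are rejected.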