regex with pipes?

Posted: Mon Aug 05, 2013 10:32 pm
by Eric!
I ran across some old code using regex like this for allowing alphanumeric with dot, dash and underscores:

Code:

preg_match('|^[0-9.a-zA-Z_-]*$|', $value)

I was surprised to find that it actually seems to work, but I don't know why. The period isn't escaped, and what's up with the pipes? I've only ever seen patterns written as /pattern/. I've also found cases where a similar regex from the same coder fails, which makes me suspect they weren't properly tested. For example:

Code:

preg_match('|[a-zA-Z]|', $value)

This seems to pass anything that has at least one letter in it. It is missing the ^ and $ anchors, so I assume it looks for a single matching character anywhere in the string and then reports a match. The programmer was using this to validate alphabetic strings, which obviously isn't correct, so now I'm suspicious of all of the regex patterns.
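
A minimal sketch of the difference between the unanchored pattern and an anchored one. The + quantifier and the D modifier (which stops $ from matching before a trailing newline) are assumptions added for illustration; they are not in the original code.

```php
<?php
// Unanchored: succeeds if ANY single character in the string is a letter.
$loose = '|[a-zA-Z]|';
// Anchored: every character must be a letter, and at least one is required.
$strict = '|^[a-zA-Z]+$|D';

foreach (['abc', 'abc123', '!!x!!', ''] as $value) {
    printf(
        "%-10s loose=%d strict=%d\n",
        var_export($value, true),
        preg_match($loose, $value),
        preg_match($strict, $value)
    );
}
```

Only 'abc' passes the strict pattern, while the loose one also accepts 'abc123' and '!!x!!'.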

Re: regex with pipes?

Posted: Mon Aug 05, 2013 11:15 pm
by requinix
The delimiters, the slashes you're used to, don't actually have to be slashes. They can be pretty much any character you want, as long as it isn't alphanumeric, a backslash, or whitespace - just make sure you have one at the beginning and one at the end before any flags, and that you escape any uses of it inside the expression. / and # are the most common, but I've seen ! ~ | used too.
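
As a rough illustration, here is the pattern from the first post wrapped in several different delimiters. The test strings are made up for the example.

```php
<?php
// The same expression with different delimiters; each behaves identically.
$patterns = [
    '/^[0-9.a-zA-Z_-]*$/',
    '#^[0-9.a-zA-Z_-]*$#',
    '~^[0-9.a-zA-Z_-]*$~',
    '|^[0-9.a-zA-Z_-]*$|',
    '{^[0-9.a-zA-Z_-]*$}',   // bracket-style: opening and closing characters differ
    '/^[0-9.a-zA-Z_-]*$/i',  // flags go after the closing delimiter
];

$results = [];
foreach ($patterns as $pattern) {
    // First subject uses only allowed characters; second contains a space and "!".
    $results[$pattern] = [
        preg_match($pattern, 'file_name-1.txt'),
        preg_match($pattern, 'bad name!'),
    ];
}
print_r($results);
```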

As for the period, the rules inside character sets change a bit: many metacharacters lose their special meaning. Characters like . + * ( ) { } $ all become just regular literal characters, while ^ and - gain new/different meanings. So while you could escape that period if you wanted to, it's really not necessary because there's nothing to "escape".
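
A small sketch of those character-set rules. The variable names and sample strings are only for illustration.

```php
<?php
// Outside a class the dot is a wildcard; inside a class it is a literal dot.
$dotOutside = preg_match('~^.$~', 'x');
$dotInside  = preg_match('~^[.]$~', 'x');

// Characters that are special outside a class are plain literals inside one.
$literals = preg_match('~^[.+*(){}$]+$~', '.+*(){}$');

// ^ negates the class when it comes first.
$negated = preg_match('~^[^0-9]+$~', 'abc');

// - forms a range between two characters, but is a literal dash at the end.
$range   = preg_match('~^[a-c]+$~', 'b');
$dashEnd = preg_match('~^[ac-]+$~', 'b');

var_dump($dotOutside, $dotInside, $literals, $negated, $range, $dashEnd);
```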

Side note: if you want to validate a string containing only letters, ctype_alpha is better.
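
A quick comparison of ctype_alpha with an anchored regex, assuming the default C locale and string input. Both reject the empty string here; the regex and its D modifier are my own choice for the comparison.

```php
<?php
$checks = [];
foreach (['abc', 'abc123', ''] as $value) {
    $checks[$value] = [
        ctype_alpha($value),                         // true only if every character is a letter
        preg_match('~^[a-zA-Z]+$~D', $value) === 1,  // regex equivalent for ASCII letters
    ];
}
var_dump($checks);
```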

Re: regex with pipes?

Posted: Mon Aug 05, 2013 11:31 pm
by ragax
Couldn't have said it better than requinix!

Other delimiters I like are tildes and commas --- but I may be alone on that one.
I'll add that the forward slash is one of the worst you could choose as a standard delimiter because sooner or later you'll want to match URLs, which will give you this kind of soup:
$pattern = '/http:\/\/www\.you\.com\/pics\//';
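
For comparison, a sketch of the same URL match with a tilde delimiter, with the dots escaped so they only match literal dots. The preg_quote() variant is an extra suggestion, not something from this thread; its second argument tells it which delimiter to escape as well.

```php
<?php
$url = 'http://www.you.com/pics/';

// Slash delimiter: every slash in the URL has to be escaped.
$withSlashes = '/http:\/\/www\.you\.com\/pics\//';
// Tilde delimiter: the slashes can stay as they are.
$withTildes = '~http://www\.you\.com/pics/~';
// Let preg_quote() do the escaping of a literal string.
$quoted = '~' . preg_quote($url, '~') . '~';

var_dump(
    preg_match($withSlashes, $url),
    preg_match($withTildes, $url),
    preg_match($quoted, $url)
);
```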

In your character class, note that you have all the elements of \w: 0-9, a-z, A-Z, and underscore.
So you could streamline the pattern to '|^[-.\w]*$|'
Err, I meant, ',^[-.\w]*$,'
:wink:
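
A quick check that the long and short forms agree, assuming no u modifier and the default locale (under other settings \w can match additional characters). The sample strings are made up.

```php
<?php
$long  = ',^[0-9.a-zA-Z_-]*$,';
$short = ',^[-.\w]*$,';

$agree = true;
foreach (['file_name-1.txt', 'bad name!', '', 'a/b', 'A_Z.0-9'] as $value) {
    $a = preg_match($long, $value);
    $b = preg_match($short, $value);
    printf("%-18s long=%d short=%d\n", var_export($value, true), $a, $b);
    $agree = $agree && ($a === $b);
}
var_dump($agree);
```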

Re: regex with pipes?

Posted: Mon Aug 05, 2013 11:50 pm
by Eric!
Thanks. I couldn't find any specs on the delimiters (at least on the PHP side of the documentation) and I didn't know that the character rules changed. Like I said, this is some old code. I found a bug in the alpha filter, and when I looked deeper I saw these other odd regex patterns. Now I know. :)

Re: regex with pipes?

Posted: Tue Aug 06, 2013 12:29 am
by requinix
ragax wrote: Other delimiters I like are tildes and commas --- but I may be alone on that one.
Really? Commas? Commas? Yeah... :crazy:

On the subject of picking delimiters, besides avoiding characters you're using in the expression (because let's be honest: backslashes look pretty ugly) I'd suggest avoiding metacharacters too. Like pipes. People accustomed to reading regular expressions will see the pipes and do a double-take because their first thought will be "oh, alternation... but wait that doesn't fit...".
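
There is also a practical cost to pipe delimiters: an unescaped | inside the expression ends the pattern, so alternation is no longer available. A sketch, assuming PHP 8 behaviour, where the broken call emits a warning (suppressed here with @) and returns false:

```php
<?php
// With | as the delimiter the pattern is just "cat"; the rest, "dog|",
// is parsed as modifiers, and "d" is not a valid modifier.
$broken = @preg_match('|cat|dog|', 'dog');

// A delimiter that is not a metacharacter leaves | free for alternation.
$works = preg_match('~^(?:cat|dog)$~', 'dog');

var_dump($broken, $works);
```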

And the documentation? See the "Delimiters" page in the PCRE section of the PHP manual. But like with other complicated subjects, what you'll find on php.net is best suited just for those quick questions like "what's this symbol mean" and "what's the syntax for a negative lookbehind". Like Wikipedia: a good starting point and it definitely fills a niche, but if you want to truly learn about a subject then consider looking somewhere else for more in-depth explanations and guidance. Our sticky mentions a few things; regular-expressions.info is another good place.

Re: regex with pipes?

Posted: Tue Aug 06, 2013 12:52 am
by ragax
Yes, commas are kind of crazy, that's why I like to use them sometimes.
For everyday use, though, tildes: less risky, more elegant.
requinix wrote: consider looking somewhere else for more in-depth explanations and guidance. Our sticky mentions a few things; regular-expressions.info is another good place.
But for more advanced stuff, RexEgg.com is really the best. Hell, I ought to think so, I made it. :wink:
But really, it does go into a number of features that regular-expressions.info doesn't cover. (At least not yet. The main reason it doesn't cover them is that these features are not yet implemented in RegexBuddy, by the same author. But that's coming in RB4.)

And if you're looking for really in-depth information about PCRE (which PHP's preg functions are built on), then the PCRE manual is pretty great.

After that, if you're still hungry for more... Just ask Requinix. :D