Regex...

Steveo31
Forum Contributor
Posts: 416
Joined: Sun Nov 23, 2003 9:05 pm
Location: San Jose CA

Regex...

Post by Steveo31 »

Few questions on regex. One, which is technically "better", ereg (POSIX) or the preg_ family, Perl right?

Second, I can't find anything that tells me where the symbols should go. All the .* etc. So for instance, this

Code: Select all

ereg("^[a-zA-Z0-9_\.\-]+@[a-zA-Z0-9\-]+\.[a-zA-Z0-9\-\.]+$", $address)
Validates email address. My question is how do you determine where to put the +, ., etc?

I are confused.
Weirdan
Moderator
Posts: 5978
Joined: Mon Nov 03, 2003 6:13 pm
Location: Odessa, Ukraine

Re: Regex...

Post by Weirdan »

Steveo31 wrote:Few questions on regex. One, which is technically "better", ereg (POSIX) or the preg_ family, Perl right?
The preg_* family has more features and is claimed to be faster in the PHP manual. But the choice is up to you. Use the variant which you understand better.
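For anyone reading this later: the ereg functions were deprecated in PHP 5.3 and removed in PHP 7.0, so on a current PHP install the preg_* family is the only one of the two still available. Below is a minimal sketch of the same check written with preg_match, assuming the pattern from the question is the one you want. The function name looks_like_email is made up for illustration and is not from the thread.

```php
<?php
// Hypothetical helper; the name is an assumption, not something from the thread.
function looks_like_email(string $address): bool
{
    // Same pattern as the ereg() call, wrapped in / delimiters as PCRE requires.
    // Single quotes keep PHP itself from interpreting the backslashes or the $.
    $pattern = '/^[a-zA-Z0-9_.\-]+@[a-zA-Z0-9\-]+\.[a-zA-Z0-9\-.]+$/';

    // preg_match() returns 1 on a match, 0 on no match, false on error.
    return preg_match($pattern, $address) === 1;
}

var_dump(looks_like_email('steve@example.com')); // bool(true)
var_dump(looks_like_email('not an address'));    // bool(false)
```

The only structural differences from the ereg version are the delimiters around the pattern and the integer return value.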
Steveo31 wrote: Second, I can't find anything that tells me where the symbols should go. All the .* etc. So for instance, this

Code: Select all

ereg("^[a-zA-Z0-9_\.\-]+@[a-zA-Z0-9\-]+\.[a-zA-Z0-9\-\.]+$", $address)
Validates email address. My question is how do you determine where to put the +, ., etc?

I are confused.
Read the manual. The PHP manual has a really poor chapter on POSIX regexes; the preg_* chapter is much better. But anyway, Google is your friend =)
Steveo31
Forum Contributor
Posts: 416
Joined: Sun Nov 23, 2003 9:05 pm
Location: San Jose CA

Post by Steveo31 »

Ya I've googled this thing a lot. Just keep on keepin on I guess...

Thanks.
tim
DevNet Resident
Posts: 1165
Joined: Thu Feb 12, 2004 7:19 pm
Location: ohio

Post by tim »

the plus symbol in PCRE syntax indicates that the preceding pattern must occur one or more times.

the pattern being the [a-zA-Z0-9_\.\-] part.

=]
tim
DevNet Resident
Posts: 1165
Joined: Thu Feb 12, 2004 7:19 pm
Location: ohio

Post by tim »

and that pattern just means

[a-zA-Z0-9_\.\-]

any single character: a-z or A-Z (so either case), 0-9, a period, or _ or -

the \ escapes the characters, so \- just means the literal - symbol.
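To see the class and the plus working together, here is a small sketch using preg_match. The test strings are invented for illustration, and the ^ and $ anchors are added so the whole string has to consist of characters from the class.

```php
<?php
// [a-zA-Z0-9_.\-] matches ONE character from the set;
// the + after it means "one or more of those in a row".
$class = '/^[a-zA-Z0-9_.\-]+$/';

var_dump(preg_match($class, 'steve_o-31')); // int(1): every character is in the set
var_dump(preg_match($class, 'steve o'));    // int(0): the space is not in the set
var_dump(preg_match($class, ''));           // int(0): + needs at least one character
```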
redmonkey
Forum Regular
Posts: 836
Joined: Thu Dec 18, 2003 3:58 pm

Post by redmonkey »

I've never benchmarked them myself, but it does seem to be generally accepted that preg is faster than ereg. On top of that, preg is far more flexible and powerful than ereg. Most people seem to lean towards learning ereg first. I don't think this is because the basics are any easier than preg; rather, as ereg is less powerful, the tutorials and examples you find are less complex and therefore easier to understand.

The 'symbols', e.g. +, are quantifiers for the preceding character, character class or sub-pattern. The list is as follows...

Code: Select all

*        zero or more ("greedy"), similar to {0,}
+        one or more ("greedy"), similar to {1,}
?        zero or one ("greedy"), similar to {0,1}
{n}      exactly n times ("greedy")
{n,}     at least n times ("greedy")
{n,m}    at least n but not more than m times ("greedy")
*?       zero or more ("non-greedy"), similar to {0,}?
+?       one or more ("non-greedy"), similar to {1,}?
??       zero or one ("non-greedy"), similar to {0,1}?
{n}?     exactly n times ("non-greedy")
{n,}?    at least n times ("non-greedy")
{n,m}?   at least n but not more than m times ("non-greedy")

If a curly bracket occurs in any other context, it is treated as a regular character.
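The difference between the greedy and non-greedy forms is easiest to see side by side. This is a rough sketch with a made-up input string. It uses preg_match because the non-greedy variants are a Perl-style feature; POSIX extended regexes do not have them.

```php
<?php
$html = '<b>one</b> and <b>two</b>';

// Greedy: .* grabs as much as it can, so the match runs to the LAST </b>.
preg_match('/<b>.*<\/b>/', $html, $greedy);

// Non-greedy: .*? grabs as little as it can, so the match stops at the FIRST </b>.
preg_match('/<b>.*?<\/b>/', $html, $lazy);

echo $greedy[0], "\n"; // <b>one</b> and <b>two</b>
echo $lazy[0], "\n";   // <b>one</b>

// {n,m} bounds the repeat count: between 2 and 4 digits here.
var_dump(preg_match('/^[0-9]{2,4}$/', '123'));   // int(1)
var_dump(preg_match('/^[0-9]{2,4}$/', '12345')); // int(0)
```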
Almost all special characters within a regex lose their special meaning inside a character class (within square brackets), so there is no need to escape them, e.g...

Code: Select all

[a-zA-Z0-9_\.\-]
is the same as...

Code: Select all

[a-zA-Z0-9_.-]
Although it is worth mentioning that if you wish to specify a literal - character within a character class, you should place it at either the beginning or the end. If you need or want to include it anywhere else in the character class, you will need to escape it.
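A quick way to convince yourself of both points is to run the escaped and unescaped classes against the same inputs, then try a hyphen in the middle of a class. This is only a sketch with invented test strings, written for preg_match.

```php
<?php
$escaped   = '/^[a-zA-Z0-9_\.\-]+$/';
$unescaped = '/^[a-zA-Z0-9_.-]+$/'; // hyphen is last, so it is literal

foreach (['a.b-c_d', 'a b', 'x+y'] as $s) {
    // Both patterns give the same answer for each input.
    var_dump(preg_match($escaped, $s) === preg_match($unescaped, $s)); // bool(true)
}

// A hyphen in the middle of a class makes a range:
// [+-9] covers everything from + to 9 in ASCII, i.e. + , - . / and the digits.
var_dump(preg_match('/^[+-9]+$/', '1,5'));  // int(1): the comma falls inside the range

// Escaped, it is a literal hyphen again, so the class is just + - 9.
var_dump(preg_match('/^[+\-9]+$/', '1,5')); // int(0)
```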
Steveo31
Forum Contributor
Posts: 416
Joined: Sun Nov 23, 2003 9:05 pm
Location: San Jose CA

Post by Steveo31 »

I know the basics, Tim, but I just can't get the gist of multiple huge matching syntax like the email thing.

Oh well. Practice practice practice....
tim
DevNet Resident
Posts: 1165
Joined: Thu Feb 12, 2004 7:19 pm
Location: ohio

Post by tim »

i wasnt trying to call you stupid or reflect that theory in any way

sorry, was just trying to help
dave420
Forum Contributor
Posts: 106
Joined: Tue Feb 17, 2004 8:03 am

Post by dave420 »

Regular expressions are one of the most important things you can learn as a PHP developer. Stick with it, steveo, and you'll get the hang of it soon enough. Once you do, you'll realise their great benefits. I had a confusing time when I learned them, but they're second nature now.
patrikG
DevNet Master
Posts: 4235
Joined: Thu Aug 15, 2002 5:53 am
Location: Sussex, UK

Post by patrikG »

If you want to spend some money on a book on RegEx, I'd recommend Mastering Regular Expressions. Probably the best explained and most comprehensive book you can find on regular expressions - and as such, doesn't out-date. It's helped me a tremendous lot in understanding the bloody things and is always next to me when I work.
leenoble_uk
Forum Contributor
Posts: 108
Joined: Fri May 03, 2002 10:33 am
Location: Cheshire

Post by leenoble_uk »

I started off with ereg and I could never get to grips with some of the longer expressions. It's probably just me but I found ereg to be quite flaky as I could never get it to work with things like \W \b \s etc.
Two years I spent trying to get my regular expressions to work as I expected them to. Then about 4 months ago I tried out preg and it's been an absolute revelation.
Honestly it has aided my learning no end, as expressions now behave consistently and handle all the backslash characters properly. I've since come on in leaps and bounds in my understanding of them.
I don't mean to get involved in any ereg/preg flame war and I may be the only person in the world who could never get ereg to do what I wanted it to do but moving to preg has been nothing but good.
I think what initially put me off preg, and may be why most people opt for ereg at first, is that it says Perl Compatible... . It reads as if the functions were put there for people who are already familiar with Perl and can get stuck right in using what they are used to, which implies that they are more difficult than the ereg type.

Believe me, if you've no previous knowledge of regex in any form it won't make a difference which one you choose. Regexes have a fairly steep learning curve, and it took me 2 years to get comfortable working with them. I'm just speaking from personal experience with respect to ereg, so now I'm biased in favour of preg.

Good luck.
redmonkey
Forum Regular
Posts: 836
Joined: Thu Dec 18, 2003 3:58 pm

Post by redmonkey »

leenoble_uk wrote:It's probably just me but I found ereg to be quite flaky as I could never get it to work with things like \W \b \s etc.
I could be wrong as I don't use ereg but if I remember correctly \w \b \s etc are not recognised in ereg functions which probably explains why your regex didn't work as expected.
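For comparison, here is a short sketch of those escapes doing what you would expect in the preg functions. The sample text is invented for illustration.

```php
<?php
$text = 'cat concat cat-flap';

// \b is a word boundary, so this finds "cat" only where it stands as a whole word.
echo preg_match_all('/\bcat\b/', $text), "\n"; // 2

// \s matches whitespace, so splitting on \s+ breaks the text into its three chunks.
$parts = preg_split('/\s+/', $text);
print_r($parts); // cat, concat, cat-flap

// \w matches a letter, digit or underscore; the hyphen is not one of those.
var_dump(preg_match('/^\w+$/', 'cat-flap')); // int(0)
```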
Dave420 wrote:Regular expressions are one of the most important things you can learn as a PHP developer.
Depends on what sort of programming you do. Many people will only ever require basic email format validation and perhaps some regex to convert plain text URLs to HTML links. In that sort of case it is not time efficient to learn regex when they could probably find a useful pre-rolled regex already available.
Steveo31 wrote:I know the basics, Tim, but I just can't get the gist of multiple huge matching syntax like the email thing.
Try splitting it up into as many pieces as possible e.g....

Code: Select all

^[a-zA-Z0-9_\.\-]+
@
[a-zA-Z0-9\-]+
\.
[a-zA-Z0-9\-\.]+$
If you look at the above line by line you should see that it breaks down to three (what I would consider) very basic regex patterns. If you know the basics then you should be able to work this one out.
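With preg you can keep that line-by-line layout in the actual code by using the x modifier, which ignores whitespace in the pattern and allows # comments. The sketch below assumes the same pattern as above; the test addresses are made up. Note that this is a loose format check rather than full address validation, and PHP also ships filter_var() with FILTER_VALIDATE_EMAIL for that job.

```php
<?php
// The x modifier lets the pattern span several lines and carry # comments.
$pattern = '/
    ^[a-zA-Z0-9_.\-]+    # local part: one or more letters, digits, _ . -
    @                    # a literal @
    [a-zA-Z0-9\-]+       # first domain label: letters, digits, -
    \.                   # a literal dot
    [a-zA-Z0-9\-.]+$     # rest of the domain, up to the end of the string
/x';

var_dump(preg_match($pattern, 'steve@example.co.uk')); // int(1)
var_dump(preg_match($pattern, 'steve@example'));       // int(0): no dot after the domain label
```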