Regex...
Posted: Thu May 13, 2004 2:40 pm
by Steveo31
A few questions on regex. First, which is technically "better": ereg (POSIX) or the preg_ family (Perl, right?)
Second, I can't find anything that tells me where the symbols should go. All the .* etc. So for instance, this
Code:
ereg("^[a-zA-Z0-9_\.\-]+@[a-zA-Z0-9\-]+\.[a-zA-Z0-9\-\.]+$", $address)
It validates an email address. My question is: how do you determine where to put the +, ., etc.?
I are confused.
Re: Regex...
Posted: Thu May 13, 2004 3:00 pm
by Weirdan
Steveo31 wrote:Few questions on regex. One, which is technically "better", ereg (POSIX) or the preg_ family, Perl right?
The preg_* family has more features, and the PHP manual claims it is faster. But the choice is up to you. Use whichever variant you understand better.
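For comparison, here is a minimal sketch of the same check written with preg_match(), assuming PHP 7 or later (ereg() was deprecated in PHP 5.3 and removed in PHP 7.0). The pattern carries over unchanged; PCRE only needs delimiters around it, here `/`. The helper name is_simple_email() is hypothetical.

```php
<?php
// Hypothetical helper: the ereg() email check from the question, ported to preg_match().
function is_simple_email(string $address): bool
{
    // Same pattern as the ereg() version, wrapped in '/' delimiters.
    // Single quotes keep the backslashes exactly as written.
    $pattern = '/^[a-zA-Z0-9_\.\-]+@[a-zA-Z0-9\-]+\.[a-zA-Z0-9\-\.]+$/';

    // preg_match() returns 1 on a match, 0 on no match, false on error
    return preg_match($pattern, $address) === 1;
}

var_dump(is_simple_email('steve@example.com')); // bool(true)
var_dump(is_simple_email('not an address'));    // bool(false)
```

This is a simple format check, not full address validation.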
Steveo31 wrote:
Second, I can't find anything that tells me where the symbols should go. All the .* etc. So for instance, this
Code:
ereg("^[a-zA-Z0-9_\.\-]+@[a-zA-Z0-9\-]+\.[a-zA-Z0-9\-\.]+$", $address)
Validates email address. My question is how do you determine where to put the +, ., etc?
I are confused.
Read the manual. The PHP manual has a really poor chapter on POSIX regexes; the preg_* chapter is much better. But anyway, Google is your friend =)
Posted: Thu May 13, 2004 3:13 pm
by Steveo31
Ya I've googled this thing a lot. Just keep on keepin on I guess...
Thanks.
Posted: Thu May 13, 2004 4:24 pm
by tim
The plus symbol in PCRE syntax (and in the POSIX extended syntax that ereg uses) means the element right before it must match one or more times.
Here that element is the [a-zA-Z0-9_\.\-] part.
=]
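A small sketch of what the + adds, using the first character class of the email pattern; the sample strings are made up for illustration.

```php
<?php
// + applies to the element right before it: here, the character class.
$with_plus    = '/^[a-zA-Z0-9_\.\-]+$/'; // one or more allowed characters
$without_plus = '/^[a-zA-Z0-9_\.\-]$/';  // exactly one allowed character

var_dump(preg_match($with_plus, 'steve.o_31'));    // int(1): ten allowed characters
var_dump(preg_match($without_plus, 'steve.o_31')); // int(0): more than one character
var_dump(preg_match($without_plus, 's'));          // int(1): a single allowed character
var_dump(preg_match($with_plus, ''));              // int(0): + needs at least one
```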
Posted: Thu May 13, 2004 4:25 pm
by tim
And that pattern just means
[a-zA-Z0-9_\.\-]
any single character that is a letter a-z (upper or lower case), a digit 0-9, a period, an underscore, or a hyphen.
The \ escapes the characters, so \- just means the literal - symbol.
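To see which characters that class accepts, here is a quick sketch that tests single characters against it. The helper name in_email_class() is hypothetical.

```php
<?php
// Hypothetical helper: does one character belong to the email pattern's class?
function in_email_class(string $ch): bool
{
    // A character class matches exactly one character from the set
    return preg_match('/^[a-zA-Z0-9_\.\-]$/', $ch) === 1;
}

foreach (['a', 'Z', '7', '.', '_', '-', '@', ' ', '+'] as $ch) {
    echo "'$ch' => ", in_email_class($ch) ? 'in class' : 'not in class', "\n";
}
```

Letters, digits, '.', '_' and '-' come back "in class"; '@', the space and '+' do not.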
Posted: Thu May 13, 2004 5:36 pm
by redmonkey
I've never benchmarked them myself, but it does seem to be generally accepted that preg is faster than ereg. On top of that, preg is far more flexible and powerful than ereg. Most people seem to lean towards learning ereg first. I don't think this is because its basics are any easier than preg's; rather, because ereg is less powerful, the tutorials and examples you find for it are less complex and can therefore be easier to understand.
The 'symbols', e.g. +, are quantifiers for the preceding character, character class, or sub-pattern. The list is as follows:
Code:
* zero or more ("greedy"), similar to {0,}
+ one or more ("greedy"), similar to {1,}
? zero or one ("greedy"), similar to {0,1}
{n} exactly n times ("greedy")
{n,} at least n times ("greedy")
{n,m} at least n but not more than m times ("greedy")
*? zero or more ("non-greedy"), similar to {0,}?
+? one or more ("non-greedy"), similar to {1,}?
?? zero or one ("non-greedy"), similar to {0,1}?
{n}? exactly n times ("non-greedy")
{n,}? at least n times ("non-greedy")
{n,m}? at least n but not more than m times ("non-greedy")
If a curly bracket occurs in any other context, it is treated as a regular character.
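A short sketch of the greedy/non-greedy difference and a counted {n,m} quantifier in preg_match(); the sample strings are made up. The '#' delimiter is used so the '/' in '</b>' needs no escaping.

```php
<?php
$html = '<b>one</b> and <b>two</b>';

// Greedy .* grabs as much as it can, then backs off to the LAST </b>
preg_match('#<b>.*</b>#', $html, $greedy);
// Non-greedy .*? takes as little as it can, stopping at the FIRST </b>
preg_match('#<b>.*?</b>#', $html, $lazy);

echo $greedy[0], "\n"; // <b>one</b> and <b>two</b>
echo $lazy[0], "\n";   // <b>one</b>

// {3,5}: at least 3 but not more than 5 digits
var_dump(preg_match('/^[0-9]{3,5}$/', '1234'));   // int(1)
var_dump(preg_match('/^[0-9]{3,5}$/', '123456')); // int(0)
```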
Almost all special characters in a regex lose their special meaning inside a character class (within square brackets), so there is no need to escape them, e.g....
is the same as...
It is worth mentioning, though, that if you want a literal - character within a character class, you should place it either at the beginning or the end. If you need or want to include it anywhere else in the character class, you will have to escape it.
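A sketch illustrating both points: an escaped and an unescaped class that match the same characters, and how the position of '-' changes its meaning. The patterns are illustrative, not taken from the thread.

```php
<?php
// Inside [...] '.' is already literal, and a trailing '-' is literal too,
// so these two classes match exactly the same characters.
$escaped   = '/^[a-z_\.\-]+$/';
$unescaped = '/^[a-z_.-]+$/';

foreach (['a.b', 'a-b', 'a_b', 'a+b'] as $s) {
    // Both patterns accept the first three strings and reject 'a+b'
    printf("%-4s escaped=%d unescaped=%d\n", $s, preg_match($escaped, $s), preg_match($unescaped, $s));
}

// Where '-' sits matters: between two characters it forms a range.
var_dump(preg_match('/^[a-c]$/', 'b'));  // int(1): range a to c
var_dump(preg_match('/^[a\-c]$/', 'b')); // int(0): escaped, so only 'a', '-' or 'c'
var_dump(preg_match('/^[ac-]$/', '-'));  // int(1): trailing '-' is literal
```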
Posted: Thu May 13, 2004 6:11 pm
by Steveo31
I know the basics, Tim, but I just can't get the gist of long, multi-part patterns like the email one.
Oh well. Practice practice practice....
Posted: Thu May 13, 2004 7:46 pm
by tim
I wasn't trying to call you stupid or imply that in any way.
Sorry, I was just trying to help.
Posted: Fri May 14, 2004 3:56 am
by dave420
Regular expressions are one of the most important things you can learn as a PHP developer. Stick with it, steveo, and you'll get the hang of it soon enough. Once you do, you'll realise their great benefits. I had a confusing time when I learned them, but they're second nature now.
Posted: Fri May 14, 2004 4:32 am
by patrikG
If you want to spend some money on a book on regex, I'd recommend Mastering Regular Expressions. It's probably the best-explained and most comprehensive book you can find on regular expressions, and as such it doesn't go out of date. It's helped me tremendously in understanding the bloody things and is always next to me when I work.
Posted: Fri May 14, 2004 5:21 am
by leenoble_uk
I started off with ereg and I could never get to grips with some of the longer expressions. It's probably just me but I found ereg to be quite flaky as I could never get it to work with things like \W \b \s etc.
Two years I spent trying to get my regular expressions to work as I expected them to. Then about 4 months ago I tried out preg and it's been an absolute revelation.
Honestly, it has aided my learning no end, as expressions now behave consistently and handle all the backslash character classes properly. I've since come on in leaps and bounds in my understanding of them.
I don't mean to get involved in any ereg/preg flame war and I may be the only person in the world who could never get ereg to do what I wanted it to do but moving to preg has been nothing but good.
I think what initially put me off preg, and may be why most people opt for ereg at first, is that it says "Perl Compatible...". It reads as if these functions were put there for people who already know Perl, so they can get stuck right in with what they are used to, which implies that they are more difficult than the ereg type.
Believe me, if you've no previous knowledge of regex in any form, it won't make a difference which one you choose. Regexes have a fairly steep learning curve, and it took me 2 years to get working comfortably with them. I'm just speaking from personal experience with ereg, so I'm now biased in favour of preg.
Good luck.
Posted: Fri May 14, 2004 9:50 am
by redmonkey
leenoble_uk wrote:It's probably just me but I found ereg to be quite flaky as I could never get it to work with things like \W \b \s etc.
I could be wrong, as I don't use ereg, but if I remember correctly, \w, \b, \s etc. are not recognised by the ereg functions, which probably explains why your regex didn't work as expected.
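For reference, a sketch of those shorthands in preg, next to the POSIX bracket-class spellings (such as [[:alnum:]] and [[:space:]]) that PCRE also accepts. For plain ASCII text, \w behaves like [[:alnum:]_] and \s like [[:space:]]. The sample text is made up.

```php
<?php
$text = "Regex is fun\tto learn"; // contains a tab between "fun" and "to"

// \w+ = runs of word characters (letters, digits, underscore)
preg_match_all('/\w+/', $text, $words);
echo implode(',', $words[0]), "\n"; // Regex,is,fun,to,learn

// \b = word boundary: "fun" matches as a whole word, "fu" does not
var_dump(preg_match('/\bfun\b/', $text)); // int(1)
var_dump(preg_match('/\bfu\b/', $text));  // int(0)

// POSIX bracket-class spellings, accepted by PCRE inside [...]
preg_match_all('/[[:alnum:]_]+/', $text, $posix_words);
var_dump($posix_words[0] === $words[0]);      // bool(true)
var_dump(preg_match('/[[:space:]]/', $text)); // int(1): the spaces and the tab
```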
Dave420 wrote:Regular expressions are one of the most important things you can learn as a PHP developer.
It depends on what sort of programming you do. Many people will only ever need basic email format validation and perhaps a regex to convert plain-text URLs into HTML links. In that case it is not time-efficient to learn regex when they could probably find a useful pre-rolled regex already available.
Steveo31 wrote:I know the basics, Tim, but I just can't get the gist of multiple huge matching syntax like the email thing.
Try splitting it up into as many pieces as possible, e.g.:
Code:
^[a-zA-Z0-9_\.\-]+
@
[a-zA-Z0-9\-]+
\.
[a-zA-Z0-9\-\.]+$
If you look at the above line by line you should see that it breaks down to three (what I would consider) very basic regex patterns. If you know the basics then you should be able to work this one out.
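preg also lets you keep the pattern split up like that in your code. Here is a sketch using PCRE's x (extended) modifier, under which unescaped whitespace outside character classes is ignored and # starts a comment. The comments simply restate the pieces above.

```php
<?php
// The email pattern from this thread, one piece per line, in extended (x) mode.
$pattern = '/
    ^[a-zA-Z0-9_\.\-]+    # user name: one or more letters, digits, _ . -
    @                     # a literal @
    [a-zA-Z0-9\-]+        # first part of the domain: letters, digits, -
    \.                    # a literal dot
    [a-zA-Z0-9\-\.]+$     # rest of the domain: letters, digits, - .
/x';

var_dump(preg_match($pattern, 'steve@example.com')); // int(1)
var_dump(preg_match($pattern, 'steve@example'));     // int(0): no dot after the domain
```

Written this way, each line is one of the basic patterns, and the whole expression still behaves like the original one-liner.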