PHP Developers Network

A community of PHP developers offering assistance, advice, discussion, and friendship.
 

All times are UTC - 5 hours




PostPosted: Sat May 07, 2005 8:42 pm 
Breakbeat Nuttzer
Joined: Wed Mar 24, 2004 8:57 am
Posts: 13098
Location: Melbourne, Australia
First, when researching regex (Regular Expressions) you will notice a lot of references to Perl. Perl was one of the first languages to make heavy use of regex, after grep (the Unix tool), so you'll find the most complete documentation in the Perl docs. Second, regular-expressions.info is a great resource for beginners.

Hold on tight, this is going to be a fast-paced but to-the-point tutorial ;-)

Ready? Ok let's do it....

 
/[\w\s]+\d{1,3}\t\W/
 

This is what makes developers cry. Look at that mess! What's all this \d, \s \w etc etc???

Let's start with metacharacters. Those \d, \s etc. are what we refer to as "metacharacters". Metacharacters are characters which represent a particular group of real characters (with some exceptions, see below).

The metacharacters and what they stand for:

Character         Matching

. (dot)           ANY single character (except a newline by default)
\w                Any single alphanumeric character (a-z, A-Z, 0-9) or underscore
\d                Any single digit (0-9)
\s                Any single whitespace character

<<Uppercase negates the metacharacter>>

\W                Any single character that is not alphanumeric or an underscore
\S                Any single non-whitespace character
\D                Any single non-digit

<<Something else to note>>
[x-y]             Any single character in the range x to y (e.g. [A-Z])
[abc123]          Any single character from a, b, c, 1, 2 or 3
[a-z125-9]        Any single character from a to z, 1, 2 or 5 to 9

<<Negate these with caret "^" at the VERY start of the bracket>>
[^abc0-9]         Any single character EXCEPT a, b, c and 0 to 9
 

Regex are case sensitive unless you specify otherwise. See further down for more info.
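
To see the character classes in action, here is a minimal sketch that runs a handful of one-class patterns through preg_match(). The sample subjects are made up for illustration; preg_match() gives back 1 when the pattern is found and 0 when it isn't.

```php
<?php
// Sample subjects are made up purely to exercise each class.
$checks = [
    ['/\d/',        'ab3'],     // "3" is a digit
    ['/\d/',        'abc'],     // no digit at all
    ['/\w/',        '_'],       // underscore counts as a word character
    ['/\W/',        'a_1'],     // nothing here is a non-word character
    ['/\s/',        "a\tb"],    // a tab is whitespace
    ['/[a-z]/',     'ABC'],     // case sensitive, so no match
    ['/[^abc0-9]/', 'abc123'],  // every character is in the excluded set
    ['/[^abc0-9]/', 'abcd'],    // "d" is outside the excluded set
];

$results = [];
foreach ($checks as $check) {
    list($pattern, $subject) = $check;
    $results[] = preg_match($pattern, $subject);   // 1 on a match, 0 otherwise
    echo $pattern, ' against ', var_export($subject, true), ': ', end($results), "\n";
}
```

Note that each of these patterns only asks for ONE matching character somewhere in the string, which is why '/\d/' is happy with "ab3".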

You'll see some other metacharacters which don't actually match anything that's really there. They match invisible boundaries, so we call them "zero-width assertions".


Assertion        Matching

^ (caret)        The start of the string
$                The end of the string (or just before a trailing newline)
\b               A word boundary (the point between a \w character and a \W character, or the start/end of the string next to a \w character)
 

There are others, but you'll rarely need them at this stage. Read the Perl documentation if you want to know more.
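
A short sketch of the three assertions, assuming a made-up sample string. It uses preg_match_all() purely to count matches, so you can see how \b stops "cat" from being found inside a longer word.

```php
<?php
$subject = 'cat concatenate cat';   // made-up sample text

$startsWithCat    = preg_match('/^cat/', $subject);     // 1: "cat" is at the very start
$startsWithConcat = preg_match('/^concat/', $subject);  // 0: "concat" is not at the start
$endsWithCat      = preg_match('/cat$/', $subject);     // 1: "cat" is at the very end

// Without \b the "cat" inside "concatenate" is counted as well.
$anyCat   = preg_match_all('/cat/', $subject);      // 3
$wholeCat = preg_match_all('/\bcat\b/', $subject);  // 2

echo "all occurrences: $anyCat, whole words only: $wholeCat\n";
```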

Next, we can specify how many times a character should occur. We could do this to match a string of four digits:
/\d\d\d\d/

Or we could write this:
/\d{4}/

Let's cover the "quantifiers". A quantifier follows the character it applies to.


Quantifier         Meaning

+                  One or more times
*                  Zero or more times
?                  Zero or one time
{n,m}              Between n and m times
{n,}               n or more times
{n}                Exactly n times
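
Here is a rough sketch of each quantifier at work. The subjects are invented, and every pattern is wrapped in ^ and $ so the quantified part has to account for the whole string (otherwise \d{2,4} would happily match the first four digits of a longer number).

```php
<?php
// [pattern, subject, expected preg_match() result]
$tests = [
    ['/^\d+$/',     '2005',   1],  // one or more digits
    ['/^\d+$/',     '',       0],  // "+" needs at least one
    ['/^\d*$/',     '',       1],  // "*" is satisfied by none at all
    ['/^colou?r$/', 'color',  1],  // "?" makes the "u" optional
    ['/^colou?r$/', 'colour', 1],
    ['/^\d{2,4}$/', '12345',  0],  // five digits is over the maximum of four
    ['/^\d{3,}$/',  '12345',  1],  // three or more
    ['/^\d{4}$/',   '123',    0],  // exactly four required
];

$allAsExpected = true;
foreach ($tests as $test) {
    list($pattern, $subject, $expected) = $test;
    $actual = preg_match($pattern, $subject);
    printf("%-14s %-10s => %d\n", $pattern, var_export($subject, true), $actual);
    $allAsExpected = $allAsExpected && ($actual === $expected);
}
```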
 

One last thing before we build our first regex. A regex needs to be delimited if you're using Perl-style regular expressions (preg_match() and friends), which I strongly advise you do (Note: the ereg_...() functions are not Perl style).

To delimit a regex, start and end it with the EXACT same character. The two usual choices are shown below, but you can use most non-alphanumeric characters (not a backslash or whitespace):
 
/pattern/
#pattern#
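
Why pick one delimiter over another? Mostly to avoid escaping. A small sketch, assuming a hypothetical file path as the subject: with "/" as the delimiter every literal slash in the pattern needs a backslash, while "#" leaves the pattern readable.

```php
<?php
$path = '/var/www/html/index.php';   // hypothetical path, for illustration only

// "/" as delimiter: each literal slash has to be escaped.
$withSlash = preg_match('/^\/var\/www\//', $path);

// "#" as delimiter: the same pattern with no escaping needed.
$withHash = preg_match('#^/var/www/#', $path);

echo $withSlash === $withHash ? "Same result\n" : "Different result\n";
```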
 

Let's look at a regular expression before we move on further. We'll use preg_match() to execute the regex here (I'll explain after).

 
$string = "Hello, I'm d11wtq and I'm 22 years old!";
if (preg_match("/\w+\W I'm \w\d{2}wtq and I'm \d+ years old\W/", $string)) {
    echo "d11wtq is 22";
} else {
    echo "d11wtq didn't tell me his age";
}
 


I'll explain what it does. Each piece of the pattern is followed by the part of the string matched so far.
"\w+" matches an alphanumeric or underscore character one or more times
Hello
"\W" matches any single non-alphanumeric character
Hello,
" I'm " is just plain old string
Hello, I'm
"\w" is any single alphanumeric character
Hello, I'm d
"\d{2}" is two digits
Hello, I'm d11
"wtq and I'm " is just plain old string again
Hello, I'm d11wtq and I'm
"\d+" is one or more digits
Hello, I'm d11wtq and I'm 22
" years old" is plain old string
Hello, I'm d11wtq and I'm 22 years old
"\W" is any single non-alphanumeric character
Hello, I'm d11wtq and I'm 22 years old!

If you understand that then let's move onto some "modifiers". If not, then read it again, and if you still don't get it, read it again.....

Note: When starting out in regex don't try and jump in with both feet. Match a tiny part of the string, then test it. Then add some more to your regex to match more of the string and test again. Repeat until the regex works.
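
The build-it-up-gradually advice can be sketched like this, using the same string and pattern as the example above. The $steps array is just an illustration of how you might grow the pattern; each step prints what has been matched so far.

```php
<?php
$string = "Hello, I'm d11wtq and I'm 22 years old!";

// Grow the pattern one piece at a time, testing after every step.
$steps = [
    "/\\w+/",
    "/\\w+\\W I'm/",
    "/\\w+\\W I'm \\w\\d{2}wtq/",
    "/\\w+\\W I'm \\w\\d{2}wtq and I'm \\d+ years old\\W/",
];

$matched = [];
foreach ($steps as $pattern) {
    $found = preg_match($pattern, $string, $m);
    $matched[] = $found ? $m[0] : null;   // $m[0] is the text the whole pattern matched
    echo $pattern, "\n    => ", $found ? $m[0] : '(no match)', "\n";
}
```

If a step suddenly prints "(no match)", you know the piece you just added is the culprit.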

Regex modifiers:
/^pattern$/mis

"mis" here are all modifiers. They tell the regex how to behave.


Modifier         Effect

i                Case insensitive
s                Dot-all mode (the dot "." also matches newlines)
x                Extended mode (whitespace in the pattern is ignored, so you can space it out and comment it)
g                Global search (not valid in PHP [use preg_match_all()] but handy if you're using JS or Perl). Tells the regex to keep looking after it has matched once
m                Multi-line mode (^ and $ now match the start and end of each LINE, not just the start and end of the STRING)
 

Again, there are others, but you won't need them often.

Modifiers go on the right hand side of the closing delimiter.

Quick example:
 
$string = "Hello World!";
if (preg_match('/^[a-z]/i', $string)) {
    echo "Starts with a letter";
} else {
    echo "Doesn't start with a letter";
}
 


"^" means match the very start of the string (not a character itself)
"[a-z]" means match a lowercase a to z
On its own that would match nothing here, because "H" is uppercase - BUT
The "i" modifier makes the regex case insensitive - SO
"H" is matched. That is all the pattern asks for, so preg_match() reports a match.
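
The other modifiers are easiest to understand side by side. A minimal sketch, assuming a made-up two-line string: in PCRE (which the preg_ functions use) "m" changes what ^ and $ mean, "s" lets the dot match a newline, and "x" is the one that ignores whitespace inside the pattern.

```php
<?php
$text = "first line\nsecond line";   // made-up two-line sample

// Without "m", ^ only matches at the very start of the string.
$noM   = preg_match('/^second/', $text);    // 0
$withM = preg_match('/^second/m', $text);   // 1

// Without "s", the dot refuses to cross the newline.
$noS   = preg_match('/first.+second/', $text);    // 0
$withS = preg_match('/first.+second/s', $text);   // 1

// With "x", whitespace in the pattern is ignored and "#" starts a comment.
$withX = preg_match('/
    ^ \w+      # a word at the start of the string
    \s line    # then one whitespace character and the literal text line
/x', $text);   // 1

echo "m: $noM -> $withM, s: $noS -> $withS, x: $withX\n";
```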

There are some things you should remember when working with regular expressions.
1. Escape special characters with a backslash when you want to match them literally
2. Remember to use quantifiers to match multiple times
3. Remember that to match a dot "." you need to escape it "\." because dot "." is a metacharacter itself
4. Regex are case sensitive by default
5. "*" and "+" are what we call "greedy": they match as much as they can (read the follow-up to this tutorial to learn more)
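
A few of these points are easier to believe once you've seen them bite. The sketch below uses invented sample strings. preg_quote() is a standard PHP function that escapes special characters for you, which is handy when the text you want to match comes from somewhere else.

```php
<?php
// An unescaped dot matches ANY character, so this wrongly accepts "fooXbar".
$loose  = preg_match('/^foo.bar$/', 'fooXbar');    // 1
$strict = preg_match('/^foo\.bar$/', 'fooXbar');   // 0
$exact  = preg_match('/^foo\.bar$/', 'foo.bar');   // 1

// preg_quote() escapes every special character in a literal string.
// The second argument is the delimiter, so that gets escaped too.
$literal = 'price: $5.00 (approx.)';               // made-up text
$pattern = '/' . preg_quote($literal, '/') . '/';
$quoted  = preg_match($pattern, 'The price: $5.00 (approx.) today');   // 1

// Greedy "+" grabs as much as it can, so this runs to the LAST ">".
preg_match('/<.+>/', '<b>bold</b>', $m);
echo $m[0], "\n";   // <b>bold</b>
```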


Next... Parentheses have more than one use in regex. They:
a) Group characters together
b) Extract the characters they surround into memory (to match a parenthesis itself you must escape it "\(" )

Something useful:
 
//Check string represents a URL
$string = "http://www.foo.bar/";
if (preg_match("#^\w+://(www\.)?\w+\.\w+#i", $string)) {
    echo "String is a URL";
} else {
    echo "String isn't a URL";
}
 

This matches the "http://www.foo.bar" part of the URL above so it returns true. I'll let you break it down yourself and see how it works (remember the parentheses "(....)" group the characters together ).

A vertical bar character "|" is used to mean OR.

 
$string = "abcdefg123456";
//abcdefgh23456   OR   abcdefg123456
if (preg_match("/abcdefg(h|1)23456/", $string)) {
    //True
} else {
    //False
}
 

Ok we've nearly covered all the "basics" now. One last thing to cover in the scope of the crash course is extracting parts of the string into memory (then I'll finish up by briefly overviewing the PHP functions).

Sometimes you'll need to match part(s) of a string and extract them to use elsewhere. You do this using parentheses. Indexing starts at 1 and goes up by one for each opening parenthesis, counting from left to right. With nested parentheses the order follows this pattern:

 
( 1 ( 2 ) ( 3 ( 4 ) ) ( 5 ( 6 ( 7 ) ( 8 ) ) ) ) ( 9 )
 

Essentially, you go deeper into the nest before moving further to the right.
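
To check the numbering on a real pattern, here is a small sketch with one nested group. The date string is just sample input. The outer group opens first so it gets index 1, the two groups inside it get 2 and 3, and the last group gets 4.

```php
<?php
// Groups are numbered by the position of their opening parenthesis.
//            1 2         3          4
$pattern = '/((\d{4})-(\d{2}))-(\d{2})/';

preg_match($pattern, 'Posted on 2005-05-07', $m);
print_r($m);
/*
  Array (
      [0] => 2005-05-07     the whole match
      [1] => 2005-05        outer group
      [2] => 2005           first group nested inside it
      [3] => 05             second group nested inside it
      [4] => 07             the group after the nest
  )
*/
```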

The usual way to refer to an extracted part of a string is a dollar sign "$" followed by the index of the part you extracted (e.g. "$4").
However, PHP handles things slightly differently with the preg_match() function. Index zero holds the entire matched text, and the extracted parts follow from 1 as expected. preg_match() also needs a third parameter, an array into which it dumps "$1", "$2", "$3" etc.


$string = "There's a number in here 123456 somewhere but I don't know what it is!";
preg_match("/[a-z\s]+(\d+)[a-z\s]+/i", $string, $matches); //$matches[0] is "s a number in here 123456 somewhere but I don"
echo "The number in the string is " . $matches[1]; //The number in the string is 123456
 


PHP functions overview:

preg_match() - I guess I have that one covered. Tests if the pattern is matched in the string. Returns 1 if matched, 0 if not (and FALSE if an error occurred). If the optional third parameter is given, the function extracts the parts of the match enclosed in parentheses into the given array.

preg_match_all() - Same as preg_match() except that the regex doesn't stop when a match is found... it continues to find as many matches as exist in the string. The extracted array is a multi-dimensional array where all occurrences of $1 are placed in $array[1] and all occurrences of $2 in $array[2] etc...
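
A quick sketch of what that multi-dimensional array looks like, using a made-up string. By default the results are grouped per parenthesis; the PREG_SET_ORDER flag (a standard PHP constant) groups them per match instead, which some people find easier to loop over.

```php
<?php
$string = 'Joined 2004, posted 2005, edited 2005';   // made-up sample text

// Default ordering: one sub-array per group.
$count = preg_match_all('/(\w+) (\d{4})/', $string, $matches);
// $count      is 3
// $matches[0] is ['Joined 2004', 'posted 2005', 'edited 2005']  (full matches)
// $matches[1] is ['Joined', 'posted', 'edited']                 (all the $1s)
// $matches[2] is ['2004', '2005', '2005']                       (all the $2s)

// PREG_SET_ORDER: one sub-array per match.
preg_match_all('/(\w+) (\d{4})/', $string, $sets, PREG_SET_ORDER);
foreach ($sets as $set) {
    echo $set[1], ' in ', $set[2], "\n";
}
```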

preg_replace() - Like str_replace() except it takes regex patterns as arguments:
 
$string = "This is foo and that is bar";
$new_string = preg_replace('/f(\w+)/', "g$1", $string); //This is goo and that is bar
 


preg_split() - Like explode() except it takes a regex pattern as the point at which to split the string:

$string = "lots of *@><& symbols &^% in this £! string";
$parts = preg_split('/[^\s\w]+/', $string);
print_r($parts);
/*

  Array (
      [0] => lots of
      [1] =>  symbols
      [2] =>  in this
      [3] =>  string
  )

*/

 


ereg() - Like preg_match() without the advantages of Perl-style patterns and slightly slower (use preg_match() instead). The ereg functions were deprecated in PHP 5.3 and removed in PHP 7.

ereg_replace() - Like preg_replace() without the advantages of Perl-style patterns and slightly slower (use preg_replace() instead).

I guess that covers all the basics of using regex, but believe me, there's a lot more to learn once you have this under your belt.

[I'll follow this crash course up with an advanced regex tutorial given some time to write it]

Good luck and happy regex'ing! :D


Last edited by Chris Corbyn on Mon Nov 28, 2005 8:02 am, edited 7 times in total.

PostPosted: Sat May 07, 2005 9:18 pm
Spockulator
Joined: Wed Feb 04, 2004 9:15 pm
Posts: 4713
Location: Eden, Utah
awesome! thanks d11, this will help out a lot of peeps...myself included.

Burr


PostPosted: Sat May 07, 2005 10:43 pm
Site Admin
Joined: Tue Dec 23, 2003 3:10 am
Posts: 11470
Location: Toronto
ditto. thanks


PostPosted: Sun May 08, 2005 12:30 pm
Forum Regular
Joined: Sat Mar 12, 2005 8:13 pm
Posts: 703
Location: US
excellent work. The one thing I'd add is the bit about using /(?=meh)/. It's not used too much, but still nice to know.


PostPosted: Sun May 08, 2005 3:14 pm
Jedi Mod
Joined: Tue Dec 21, 2004 6:03 pm
Posts: 5263
Location: usrlab.com
Awesome stuff.


PostPosted: Mon May 09, 2005 1:54 am
Breakbeat Nuttzer
Joined: Wed Mar 24, 2004 8:57 am
Posts: 13098
Location: Melbourne, Australia
Skara wrote:
excellent work. The one thing I'd add is the bit about using /(?=meh)/. It's not used too much, but still nice to know.


Advanced tutorial will follow (time permitting). This was intended merely to allow people to get a hold of the basics ;-)


Post subject: Great tutorial
PostPosted: Fri Jul 15, 2005 5:52 pm
Forum Newbie
Joined: Fri Jul 15, 2005 7:23 am
Posts: 9
Location: Sydney Australia
That was the best regular expression tutorial I have seen. I wish I had seen this a long time ago.

Have you posted that advanced tutorial yet, if so where can I find it?

Keep up the good work :)


Post subject: Re: Great tutorial
PostPosted: Sat Jul 16, 2005 8:55 am
Breakbeat Nuttzer
Joined: Wed Mar 24, 2004 8:57 am
Posts: 13098
Location: Melbourne, Australia
thedamo wrote:
That was the best regular expression tutorial I have seen. I wish I had seen this a long time ago.

Have you posted that advanced tutorial yet, if so where can I find it?

Keep up the good work :)


Haven't really had time to knock one up (and yeah ok.... I forgot :P).

Hey it's saturday, I guess that's something I could do. I have a half written regex quiz too (40 questions so far) ;)


PostPosted: Sat Jul 16, 2005 10:36 am
DevNet Resident
Joined: Tue Jan 20, 2004 5:58 pm
Posts: 1537
Location: Minnesota
Nice work. Regex is a weakness of mine -- tutorial is appreciated.

Thanks


PostPosted: Fri Sep 09, 2005 7:21 am
DevNet Resident
Joined: Sat Dec 06, 2003 10:52 pm
Posts: 1679
Location: Mumbai, India
Excellent Tutorial.
It'd be cool if someone posted a tutorial for RegEx in Apache's mod_rewrite, or just the differences from normal RegEx (like mod_rewrite not supporting ?).


PostPosted: Fri Sep 09, 2005 8:58 am
Neighborhood Spidermoddy
Joined: Mon Mar 29, 2004 4:24 pm
Posts: 31559
Location: Bothell, Washington, USA
mod_rewrite is posix regex, last I checked.


PostPosted: Tue Nov 01, 2005 6:40 pm
Breakbeat Nuttzer
Joined: Wed Mar 24, 2004 8:57 am
Posts: 13098
Location: Melbourne, Australia
For those who've asked for the follow-up: I've knocked one up like I said. I forced myself into it tonight since I would never have gotten around to it otherwise :D

http://forums.devnetwork.net/viewtopic.php?t=40169


PostPosted: Wed Nov 02, 2005 7:29 am
Forum Regular
Joined: Wed Sep 28, 2005 10:08 am
Posts: 613
Bah! Real men use the PHP Manual as reference and nothing else. :wink:

Nice tutorial, pretty noob friendly compared to abovementioned manual entry... solid work.


PostPosted: Sat Nov 05, 2005 3:34 am
DevNet Resident
Joined: Fri Dec 24, 2004 3:59 am
Posts: 1452
Location: Lucknow, UP, India
foobar wrote:
Bah! Real men use the PHP Manual as reference and nothing else. :wink:

I second that :P


Post subject: The best
PostPosted: Thu Nov 17, 2005 3:15 am
Forum Commoner
Joined: Mon Sep 05, 2005 10:05 pm
Posts: 71
Hello friends

In fact, that was the best regular expression tutorial I have seen.

Thank you very much for that

GOOD LUCK!

