dab wrote: Aha, even after testing it myself, it works wonderfully. The words per array index was just something I assumed you had to do. This works great!
The main reason I'd hoped you'd explain it was so I could then fix it if it didn't quite work the way I needed.
Anywho, I'd love to see how this regex is broken down.
Good to hear it. Here are some details:
First, I used the
{ ... }x notation to construct the regex. The x modifier makes the regex
engine ignore white space and new line characters in the pattern (unless they
are escaped or sit inside a character class), which lets you spread a regex
over multiple lines. This is especially handy when creating larger regexes,
which would otherwise turn into one large and ugly monster!
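To illustrate, here is a minimal sketch with a made-up pattern and input (neither is part of the IRC regex). It compares a one-line pattern with the same pattern spread over several lines using x; both should capture the same groups.

```php
<?php
// Compact form: the whole pattern on one line.
$compact = '{(\w+)@(\w+)}';

// Extended form: with the x modifier, unescaped white space is skipped
// and '#' starts a comment that runs to the end of the line.
$extended = '{
    (\w+)   # user part
    @
    (\w+)   # host part
}x';

$input = 'dab@example';  // hypothetical input, for illustration only

preg_match($compact, $input, $a);
preg_match($extended, $input, $b);

// Both arrays hold the full match plus the two captured groups.
var_dump($a === $b);
```

One thing to watch for: because '#' starts a comment in this mode, a literal '#' in such a pattern has to be escaped or placed inside a character class.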
Also, I used quite a few possessive quantifiers in my regex for performance
reasons (and because otherwise Geert would become angry with me!). For
simplicity I will not go into them, but I encourage you to do some reading on
them yourself [1].
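If you want a quick taste before following the link, this small sketch (my own toy patterns, not taken from the IRC regex) shows the difference: a possessive quantifier such as \d++ never gives back what it has matched, while the normal greedy \d+ will backtrack.

```php
<?php
// Greedy: \d+ first takes "123", then gives back "3" so the final \d can match.
$greedy = preg_match('/^\d+\d$/', '123');

// Possessive: \d++ takes "123" and refuses to give anything back,
// so the final \d has nothing left to match and the whole match fails.
$possessive = preg_match('/^\d++\d$/', '123');

var_dump($greedy, $possessive);
```

That refusal to backtrack is where the performance gain comes from: the engine does not retry positions that cannot lead to a match anyway.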
As Geert pointed out: [^\s] (which means any character except a white space
character) can be replaced by the shorter \S.
So, the (slightly) simpler regex (without the possessive quantifiers and
\S instead of [^\s]) now looks like this:
Code: Select all
$regex = '{
:([^!]+)!
(\S+)\s+
(\S+)\s+
:?(\S+)\s*
(?:[:+-]+(.*))?
}x';
(test this new regex, you will see it produces the same output)
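If you don't have your test lines at hand, here is a minimal sketch for trying it out. The sample line is a hypothetical one I made up, so substitute your own.

```php
<?php
$regex = '{
    :([^!]+)!
    (\S+)\s+
    (\S+)\s+
    :?(\S+)\s*
    (?:[:+-]+(.*))?
}x';

// Hypothetical sample line; replace it with one of your own test lines.
$line = ':dab!dab@example.com PRIVMSG #php :hello world';

if (preg_match($regex, $line, $matches)) {
    // Index 0 is the entire match, indexes 1 to 5 are the captured groups.
    print_r($matches);
}
```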
You see me use quite a few parentheses. These are used to "group" characters
together and store them in the $matches array. After running the following
example:
Code: Select all
if(preg_match('/(.)(.)(.)/', 'abc', $matches)) {
print_r($matches);
}
you will see that the $matches array will hold 4 values: index 0 will hold the
entire match and index 1=a, index 2=b and index 3=c.
Now to make a group (the stuff between the parentheses) optional, you can add
a question mark after it like this:
Code: Select all
if(preg_match('/(.)(.)(.)?/', 'ab', $matches)) { // the third group is optional
print_r($matches);
}
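To see what the optional group does to $matches, this small sketch runs the same pattern against a three-character and a two-character string (the variable names $full and $short are mine). As far as I know, preg_match simply leaves out trailing groups that did not take part in the match, so check with isset() before using them.

```php
<?php
// All three groups take part in the match.
preg_match('/(.)(.)(.)?/', 'abc', $full);

// Only two characters: the optional third group matches nothing.
preg_match('/(.)(.)(.)?/', 'ab', $short);

echo count($full), "\n";    // entire match + 3 groups
echo count($short), "\n";   // entire match + 2 groups
var_dump(isset($short[3])); // the third group is missing from the array
```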
But in my IRC-regex I sometimes start a group with a question mark followed
by a colon, like (?: ... ). This will cause the regex engine to NOT add that
group to the $matches array. To understand what I mean by that, run the
following snippet:
Code: Select all
if(preg_match('/(.)(?:.)(.)/', 'abc', $matches)) {
print_r($matches);
}
As you will have noticed, this has caused the 2nd character to be left out of
the $matches array.
Now, to get back to the IRC-regex, here's a brief explanation. Note that I
used << and >> in my explanation to indicate the groups/matches.
Code: Select all
:([^!]+)! // a ':' followed by << one or more non-'!' chars >> followed by a '!'
(\S+)\s+ // << one or more non-white spaces >> followed by one or more white spaces
(\S+)\s+ // the same as the above
:?(\S+)\s* // an optional ':' followed by << one or more non-white spaces >> followed by
// zero or more white spaces
(?:[:+-]+(.*))? // see below
// The last group I'll explain over a couple of lines:
(?: // start group, but do not store in $matches!!!
[:+-]+ // one or more of the following chars: ':', '+' or '-'
// because of the preceding '?:' these will not be grouped
(.*) // << zero or more chars of any type >>
)? // end of the group, AND this group is optional!!!
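Because that last group is optional, index 5 will not always be there. Here is a minimal sketch using a hypothetical line without a message part (the line is my own invention), with the ?? operator supplying a default:

```php
<?php
$regex = '{
    :([^!]+)!
    (\S+)\s+
    (\S+)\s+
    :?(\S+)\s*
    (?:[:+-]+(.*))?
}x';

// Hypothetical line with nothing after the channel.
$line = ':dab!dab@example.com JOIN :#php';

preg_match($regex, $line, $matches);

$action  = $matches[3];
$channel = $matches[4];        // the optional ':' in front is not captured
$message = $matches[5] ?? '';  // fall back to '' when the last group did not match

print "$action $channel [$message]\n";
```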
dab wrote:Edit: Also, I read it's possible to make the arrays use names for the index. Indices such as ['nick'] ['host'] etc. Would you mind showing me how to do that? I haven't looked into it yet, as I wanted to get a working regex working before I worried about it

As Geert already pointed out, you can do that by grouping your matches like this:
(?P<name>group), which may look a bit confusing, so I'll give you a little demo:
Code: Select all
$regex = '{
: (?P<NICK> ([^!]+) ) !
(?P<HOST> (\S+) ) \s+
(?P<ACTION> (\S+) ) \s+
:? (?P<CHANNEL> (\S+) ) \s*
(?: [:+-]+ (?P<MESSAGE> (.*) ) )?
}x';
foreach($tests as $t) {
if(preg_match($regex, $t, $matches)) {
print "ACTION=" . $matches['ACTION'] . "\n";
}
}
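For a version you can paste and run on its own, here is a sketch with two hypothetical test lines standing in for your $tests array. I also left out the inner parentheses here: they are harmless, but each pair adds an extra numbered entry to $matches next to the named one. The $parsed array and its lowercase keys are just names I picked for the demo.

```php
<?php
$regex = '{
    :  (?P<NICK>    [^!]+ ) !
       (?P<HOST>    \S+   ) \s+
       (?P<ACTION>  \S+   ) \s+
    :? (?P<CHANNEL> \S+   ) \s*
    (?: [:+-]+ (?P<MESSAGE> .* ) )?
}x';

// Hypothetical sample lines; use your own $tests array instead.
$tests = [
    ':dab!dab@example.com PRIVMSG #php :hello world',
    ':dab!dab@example.com JOIN :#php',
];

$parsed = [];
foreach ($tests as $t) {
    if (preg_match($regex, $t, $matches)) {
        $parsed[] = [
            'nick'    => $matches['NICK'],
            'host'    => $matches['HOST'],
            'action'  => $matches['ACTION'],
            'channel' => $matches['CHANNEL'],
            // MESSAGE sits in the optional group, so give it a default.
            'message' => $matches['MESSAGE'] ?? '',
        ];
    }
}

print_r($parsed);
```

Note that $matches still contains the numeric indexes as well, so older code that reads $matches[1] and so on keeps working.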
Remember that new lines and white space are ignored when constructing a regex like
{ ... }x
HTH
[1] http://www.regular-expressions.info/possessive.html