Posted: Thu Jan 13, 2005 8:31 am
by feyd
the colon is, in this case, purely a character that must appear in the line. It has no special meaning.
As for
(.*?)\s*$...
parens are used to ask the regular expression engine to remember whatever they enclose, for later use, whether inside or outside the engine. In this case, outside, as we've used them to create the set of values denoted by $matches[2][0..n]
. - the period is a metacharacter that will match any character.
* - the asterisk is a metacharacter that tells the engine to look for zero or more of the preceding data. In this case, that's any character.
? - the question mark is a metacharacter that typically tells the engine to find zero or one of the preceding data. In the case where it follows a greedy pattern metacharacter like *, it attempts to find the shortest, or least greedy match that still works with the rest of the pattern.
\s* - as I said before, look for zero or more (greedy) whitespaces.
$ - this metacharacter, when placed at the end of the pattern denotes the end of line.
so basically, this pattern looks for any length of characters up to whitespaces that are followed by the end of line. So if you have any number of words/letters/characters, the whitespaces and end of line will be excluded from memory.
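A small sketch of the difference the ? makes after the *. The input line and variable names here are made up for illustration; only the (.*?)\s*$ part comes from the pattern discussed above.

```php
<?php
// Hypothetical input line: a label, a colon, then a value with trailing spaces.
$line = "Temperature: 72 F   ";

// Lazy capture: (.*?) takes as little as possible, so \s* absorbs the trailing spaces.
preg_match('/:\s*(.*?)\s*$/', $line, $lazy);

// Greedy capture: (.*) grabs everything first, leaving \s* nothing to match.
preg_match('/:\s*(.*)\s*$/', $line, $greedy);

var_dump($lazy[1]);   // string(4) "72 F"
var_dump($greedy[1]); // string(7) "72 F   "
```

With the lazy version the trailing whitespace stays out of the remembered value, which is the behaviour described above.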
Posted: Thu Jan 13, 2005 8:41 am
by louie55
Ok, just one more question. How did it know that the labels and values were separated by a colon? Thanks. Sorry for all of the questions, but I really want to understand how this function works.
Louie
Posted: Thu Jan 13, 2005 8:43 am
by feyd
where the colon appears in the pattern, is where I told it to expect a colon in the text, but not to remember it, as I don't want that information.
Posted: Thu Jan 13, 2005 9:36 am
by louie55
How did it know to separate the labels and values into different arrays?
Louie
Posted: Thu Jan 13, 2005 9:42 am
by feyd
that was done through the use of parens, which told the regex engine to remember each enclosed part as a separate component.
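To make that concrete, here is a rough sketch. The pattern and the sample text are assumptions for illustration (the original pattern is on the previous page of the thread); the point is only that each pair of parens gets its own array in $matches, and the colon outside the parens is matched but not remembered.

```php
<?php
// Hypothetical text block; the real data is not shown in this part of the thread.
$text = "Wind: NW 10 mph  \nHumidity: 45%\n";

// Group 1 captures the label, the colon is matched but not captured,
// group 2 captures the value without its trailing whitespace.
// The m modifier lets ^ and $ match at the start and end of every line.
preg_match_all('/^([^:\r\n]+):\s*(.*?)\s*$/m', $text, $matches);

print_r($matches[1]); // the labels: Wind, Humidity
print_r($matches[2]); // the values: NW 10 mph, 45%
```

$matches[0] holds the full matched lines, $matches[1] everything captured by the first pair of parens, and $matches[2] everything captured by the second.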
Posted: Thu Jan 13, 2005 9:56 am
by louie55
Thank you very much for all of your time. You have helped me immensely.
Louie
Posted: Thu Jan 13, 2005 4:59 pm
by louie55
Okay, another question. How do I tell it to only save the second line into a variable? It is the line with the date and time and looks like the following:
Jan 13, 2005 - 04:53 PM EST / 2005.01.13 2153 UTC
Since the line constantly changes you can't really search for any strings, except maybe "EST" or "UTC". Can you use the preg_match_all function for this? Remember, this is always the 2nd line in the text block, so is there any way to tell it "get the 2nd line"?? Thanks.
Louie
Posted: Thu Jan 13, 2005 5:14 pm
by feyd
you can use preg_split() to break apart the file's string value. Something like
Code:
$array = preg_split('#[\r\n]+#s', $textstring);
then access the resultant array's second element: $array[1]
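A minimal, self-contained sketch of the preg_split() approach. The surrounding lines of the text block are invented; only the date line comes from the question.

```php
<?php
// Hypothetical text block; only the second line is taken from the thread.
$textstring = "Current Conditions\r\nJan 13, 2005 - 04:53 PM EST / 2005.01.13 2153 UTC\r\nWind: NW 10 mph";

// Split on any run of CR/LF characters. The limit of 3 stops splitting
// once the second line has been isolated; the rest stays in one piece.
$lines = preg_split('#[\r\n]+#', $textstring, 3);

// Guard against input with fewer than two lines.
$secondLine = isset($lines[1]) ? $lines[1] : null;

echo $secondLine, "\n";
```

Because the pattern matches runs of line breaks, blank lines are collapsed, so "second line" here really means the second non-empty line.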
there is also a pattern to the line (untested):
Code:
preg_match('#[A-Z][a-z]{2}\s+[0-9]{1,2},\s+[0-9]{4}\s+-\s+[0-9]{2}:[0-9]{2}\s+[AP]M\s+[A-Z]{3}\s*/\s*[0-9]{4}\.[0-9]{2}\.[0-9]{2}\s+[0-9]{4}\s*[A-Z]{3}#', $textstring, $match);
the match is a bit complicated though..
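For reference, a sketch of how the result of that call might be checked and used. The text block is hypothetical apart from the sample date line, and $dateLine is just an illustrative name.

```php
<?php
// Hypothetical text block around the sample line from the thread.
$textstring = "Current Conditions\nJan 13, 2005 - 04:53 PM EST / 2005.01.13 2153 UTC\nWind: NW 10 mph";

$pattern = '#[A-Z][a-z]{2}\s+[0-9]{1,2},\s+[0-9]{4}\s+-\s+[0-9]{2}:[0-9]{2}\s+[AP]M\s+[A-Z]{3}\s*/\s*[0-9]{4}\.[0-9]{2}\.[0-9]{2}\s+[0-9]{4}\s*[A-Z]{3}#';

// preg_match() returns 1 on a match, 0 on no match, and false on error.
if (preg_match($pattern, $textstring, $match) === 1) {
    $dateLine = $match[0]; // index 0 holds the whole matched text
} else {
    $dateLine = null;
}

echo $dateLine, "\n";
```

Since the pattern has no parens, $match only has index 0; adding parens around the pieces you care about would give you the date and time parts separately.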

Posted: Thu Jan 13, 2005 5:29 pm
by louie55
Thank you! WOW. You know I hate to bother you and make you type again, but I sure would like you to tell me how you came to that preg_match_all statement so fast! I understand what most of the characters mean since you explained it the last time, but I don't know the methodology and the process you have to go through to create one of those statements.
Now, I know the statement you printed above is very long and complicated, so it would be fine and faster if you just came up with a shorter example if you like. Again, sorry for making you the teacher, but the preg_ statements look like they have so much power, I really would like to know how to create one for myself. Thanks.
P.S. If you don't want to, you don't have to.
Louie
Posted: Thu Jan 13, 2005 6:09 pm
by feyd
basically, I just start to look for the varying patterns in the rolling output. Since this line is a continually rolling output, it's easy to get overwhelmed, but it helps to break down what it'll actually output. Since this is a date display, there's actually a fairly simple pattern: lots of numbers, separated by various characters.
the long-short version is:
look for a capital letter, followed by two lower case letters, followed by one or more whitespaces, which is followed by one to two digits. Then a comma, one or more whitespaces, then four digits. One or more whitespaces, a dash, and some more whitespace. Two digits, a colon, and two digits. Some whitespace, A or P followed by M, some more whitespace, three capital letters, any whitespace, and a slash. Any whitespace, four digits, a period, two digits, another period, two more digits, some whitespace, four digits, any whitespace and finally, three capital letters.
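One way to keep a pattern like this readable while building it up is PCRE's extended mode (the x modifier), which ignores whitespace and # comments inside the pattern. The sketch below is the same pattern laid out piece by piece to mirror the description above; the ~ delimiter and the $sample / $found names are choices made for this example, not something from the earlier posts.

```php
<?php
// Extended mode (x): whitespace and # comments inside the pattern are ignored.
// The ~ delimiter is used because # starts a comment in this mode.
$pattern = '~
    [A-Z][a-z]{2}                     # capital letter, two lower case letters (month)
    \s+ [0-9]{1,2} ,                  # whitespace, one or two digits, a comma
    \s+ [0-9]{4}                      # whitespace, four digits (year)
    \s+ - \s+                         # a dash with whitespace on both sides
    [0-9]{2} : [0-9]{2}               # two digits, a colon, two digits (time)
    \s+ [AP]M                         # whitespace, A or P followed by M
    \s+ [A-Z]{3}                      # whitespace, three capital letters (zone)
    \s* / \s*                         # a slash with optional whitespace
    [0-9]{4} \. [0-9]{2} \. [0-9]{2}  # four digits, period, two digits, period, two digits
    \s+ [0-9]{4}                      # whitespace, four digits
    \s* [A-Z]{3}                      # optional whitespace, three capital letters
~x';

$sample = 'Jan 13, 2005 - 04:53 PM EST / 2005.01.13 2153 UTC';
$found = preg_match($pattern, $sample, $match);

var_dump($found === 1 && $match[0] === $sample); // bool(true)
```

Writing one line per piece makes it easier to test the pattern as it grows: start with the first line or two, check that it matches, then add the next piece.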