Working with Text File.


feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

the colon is, in this case, purely a character that must appear in the line. It has no special meaning. :)

As for (.*?)\s*$...
parens are used to ask the regular expression engine to remember (capture) whatever matched inside them, for later use, whether inside or outside of the engine. In this case, outside, as we've used them to create the set of values denoted by $matches[2][0..n]
. - the period is a metacharacter that will match any single character (by default, anything except a newline).
* - the asterisk is a metacharacter that tells the engine to look for zero or more of the preceding item. In this case, that's any character.
? - the question mark is a metacharacter that typically tells the engine to find zero or one of the preceding item. When it follows a greedy quantifier like *, it instead makes that quantifier lazy: the engine tries to find the shortest match that still works with the rest of the pattern.
\s* - as I said before, look for zero or more (greedy) whitespace characters.
$ - this metacharacter, when placed at the end of the pattern, denotes the end of the line when the m modifier is set, and otherwise the end of the whole string.

so basically, this pattern looks for any run of characters up to the whitespace that is followed by the end of the line. So if you have any number of words/letters/characters, the trailing whitespace and the end of line will be excluded from memory.
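
To see the difference the ? makes, here is a minimal sketch. The input line and the leading ":\s*" part of the pattern are made up for illustration; only the (.*?)\s*$ tail comes from the explanation above.

```php
<?php
// Hypothetical input line with three trailing spaces (not from the real file).
$line = "Temperature: 72 F   ";

// Lazy: (.*?) takes as little as it can, so \s* gets to eat the trailing spaces.
preg_match('#:\s*(.*?)\s*$#', $line, $lazy);

// Greedy: (.*) runs to the end of the string and leaves nothing for \s*.
preg_match('#:\s*(.*)\s*$#', $line, $greedy);

var_dump($lazy[1]);   // string(4) "72 F"
var_dump($greedy[1]); // string(7) "72 F   "
```

The lazy version is the one that keeps the trailing whitespace out of the remembered value.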
Last edited by feyd on Thu Jan 13, 2005 8:42 am, edited 1 time in total.
louie55
Forum Newbie
Posts: 15
Joined: Tue Jan 11, 2005 10:58 pm

Post by louie55 »

Ok, just one more question. How did it know that the labels and values were separated by a colon? Thanks. Sorry for all of the questions, but I really want to understand how this function works.

Louie

Post by feyd »

where the colon appears in the pattern is where I told it to expect a colon in the text, but not to remember it, as I don't want that information.

Post by louie55 »

How did it know to separate the labels and values into different arrays?

Louie

Post by feyd »

that was done through the use of parens, which told the regex engine to remember each enclosed part as a separate component.
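
A rough sketch of how that plays out with preg_match_all(). The sample text and this exact pattern are assumptions for illustration, not the pattern from earlier in the thread:

```php
<?php
// Made-up sample text; the real file is not shown here.
$text = "Wind: 10 mph  \nHumidity: 45%\n";

// Group 1 remembers the label. The colon is matched but sits outside any
// parens, so it is not remembered. Group 2 remembers the value without
// its trailing whitespace.
preg_match_all('#^([^:\r\n]+):\s*(.*?)\s*$#m', $text, $matches);

print_r($matches[1]); // the labels: Wind, Humidity
print_r($matches[2]); // the values: 10 mph, 45%
```

Each pair of parens gets its own numbered slot in $matches, and $matches[0] holds the full text of every match.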

Post by louie55 »

Thank you very much for all of your time. You have helped me immensely.

Louie

Post by louie55 »

Okay, another question. How do I tell it to only save the second line into a variable? It is the line with the date and time and looks like the following:

Jan 13, 2005 - 04:53 PM EST / 2005.01.13 2153 UTC

Since the line constantly changes you can't really search for any strings, except maybe "EST" or "UTC". Can you use the preg_match_all function for this? Remember, this is always the 2nd line in the text block, so is there any way to tell it "get the 2nd line"? Thanks.

Louie

Post by feyd »

you can use preg_split() to break apart the file's string value. Something like

Code:

$array = preg_split('#[\r\n]+#s', $textstring);
then access the resultant array's second element: $array[1]
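
A self-contained sketch of the split approach. The first and third lines of the sample text are made up; only the date line is the one quoted in the question. The PREG_SPLIT_NO_EMPTY flag is an addition here to drop the empty piece after the final line break.

```php
<?php
// Sample text block: lines 1 and 3 are hypothetical placeholders.
$textstring = "Current Conditions\r\n"
            . "Jan 13, 2005 - 04:53 PM EST / 2005.01.13 2153 UTC\r\n"
            . "Wind: 10 mph\r\n";

// Split on any run of CR/LF characters and discard empty pieces.
$lines = preg_split('#[\r\n]+#', $textstring, -1, PREG_SPLIT_NO_EMPTY);

// Second element, guarded in case the text has fewer than two lines.
$dateLine = isset($lines[1]) ? $lines[1] : null;

echo $dateLine, "\n";
```

One thing to keep in mind: because the pattern matches a run of line breaks, blank lines are swallowed, so $lines[1] is really the second non-empty line.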

there is also a pattern to the line (untested):

Code:

preg_match('#[A-Z][a-z]{2}\s+[0-9]{1,2},\s+[0-9]{4}\s+-\s+[0-9]{2}:[0-9]{2}\s+[AP]M\s+[A-Z]{3}\s*/\s*[0-9]{4}\.[0-9]{2}\.[0-9]{2}\s+[0-9]{4}\s*[A-Z]{3}#', $textstring, $match);
the match is a bit complicated though.. ;)
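
For reference, a quick check of that pattern against the sample line from the question. The surrounding lines in $textstring are invented for the example:

```php
<?php
// The date line is from the question; the lines around it are placeholders.
$textstring = "Some header line\n"
            . "Jan 13, 2005 - 04:53 PM EST / 2005.01.13 2153 UTC\n"
            . "More text\n";

$pattern = '#[A-Z][a-z]{2}\s+[0-9]{1,2},\s+[0-9]{4}\s+-\s+[0-9]{2}:[0-9]{2}\s+[AP]M'
         . '\s+[A-Z]{3}\s*/\s*[0-9]{4}\.[0-9]{2}\.[0-9]{2}\s+[0-9]{4}\s*[A-Z]{3}#';

// preg_match() returns 1 on a match; $match[0] then holds the whole matched text.
$found = preg_match($pattern, $textstring, $match);

if ($found === 1) {
    echo $match[0], "\n";
}
```

Since the pattern has no parens, $match only has the one element, $match[0].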

Post by louie55 »

Thank you! WOW. You know I hate to bother you and make you type again, but I sure would like you to tell me how you came to that preg_match statement so fast! I understand what most of the characters mean since you explained it the last time, but I don't know the methodology and the process you have to go through to create one of those statements.

Now, I know the statement you printed above is very long and complicated, so it would be fine and faster if you just came up with a shorter example if you like. Again, sorry for making you the teacher, but the preg_ statements look like they have so much power, I really would like to know how to create one for myself. Thanks.

P.S. If you don't want to, you don't have to. :wink:

Louie

Post by feyd »

basically, I just start by looking for the varying patterns in the rolling output. Since this line is a continually rolling output, it's easy to get overwhelmed. But if you break down what it'll actually output, it becomes manageable. Since this is a date display, there's actually a fairly simple pattern: lots of numbers, separated by various characters.

the long-short version is:
look for a capital letter, followed by two lower case letters, followed by one or more whitespaces, which is followed by one to two digits. Then a comma, one or more whitespaces, then four digits. One or more whitespaces, a dash, and some more whitespace. Two digits, a colon, and two digits. Some whitespace, A or P followed by M, more whitespace, three capital letters, any whitespace, and a slash. Any whitespace, four digits, a period, two digits, another period, two more digits, some whitespace, four digits, any whitespace and finally, three capital letters.
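
One way to make that walk-through concrete is to write the pattern as a list of small pieces, one per part of the line, and glue them together. This is only a sketch of the idea: the $parts array and its comments are an illustration, but the joined result is the same pattern as posted earlier.

```php
<?php
// Each piece covers one chunk of: Jan 13, 2005 - 04:53 PM EST / 2005.01.13 2153 UTC
$parts = array(
    '[A-Z][a-z]{2}',                // month abbreviation, e.g. Jan
    '\s+[0-9]{1,2},',               // day and comma, e.g. 13,
    '\s+[0-9]{4}',                  // year, e.g. 2005
    '\s+-\s+',                      // dash with whitespace on both sides
    '[0-9]{2}:[0-9]{2}',            // local time, e.g. 04:53
    '\s+[AP]M',                     // AM or PM
    '\s+[A-Z]{3}',                  // time zone, e.g. EST
    '\s*/\s*',                      // slash, whitespace optional
    '[0-9]{4}\.[0-9]{2}\.[0-9]{2}', // dotted date, e.g. 2005.01.13
    '\s+[0-9]{4}',                  // UTC time, e.g. 2153
    '\s*[A-Z]{3}',                  // UTC
);

// Join the pieces and wrap them in # delimiters.
$pattern = '#' . implode('', $parts) . '#';

$line  = 'Jan 13, 2005 - 04:53 PM EST / 2005.01.13 2153 UTC';
$found = preg_match($pattern, $line, $match);

var_dump($found); // int(1)
```

Building it piece by piece also makes it easy to test after each addition and see where a match stops working.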