
Help parsing a large group of text for certain values?

Posted: Tue Nov 30, 2010 4:20 pm
by Insyderznf
Hello,

This should have an easy solution, but I cannot figure out how to do it. I have a chunk of text that is output by a help call program. I want to find the values I am looking for and parse that text into a MySQL database. I am having a horrible time figuring out how to parse the text.

An example of the text I am talking about is below:

Help Number: 125086 ENTERED ON 11/29/10 AT 14:33 BY JOHN SMITH CURRENT ESTIMATED COMPLETION IS 12/06/10 AT 15:00 TYPE OF PROBLEM: T.P. BIG ISSUE PRIORITY: 5 WORKING DAYS ITEM SUBMITTED BEFORE? NO ------------ PROBLEM DESCRIPTION ------------ SOME LONG DESCRIPTION

So from the text above I need to parse out the items in bold. I can use strpos() to find the position, but I'm not sure how to actually extract the value. strtok() didn't do what I thought it would. Can someone help me out? Thank you.

-Nick

Re: Help parsing a large group of text for certain values?

Posted: Tue Nov 30, 2010 9:24 pm
by Jonah Bron

Code:

preg_match('/Help Number: ([0-9]+) ENTERED ON ([0-9\/]+) AT .*? BY (.*?) CURRENT ESTIMATED COMPLETION IS ([0-9\/]+) .*? TYPE OF PROBLEM: (.*?) PRIORITY: (.*?) ITEM SUBMITTED BEFORE\? (.*?) ------------ (.*?) ------------ (.*?)/', $text, $matches);

print_r($matches);
As you can see when you run this, all the data is in $matches.
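A minimal sketch of running a pattern like this against the single-line sample from the first post. Two changes are assumptions made for this sketch, not part of the pattern above: the section title between the dashes is skipped rather than captured, and the final group is greedy, `(.*)`, because a lazy `(.*?)` at the very end of a pattern matches the empty string. The variable names are only illustrative.

```php
<?php
// Single-line sample from the first post.
$text = 'Help Number: 125086 ENTERED ON 11/29/10 AT 14:33 BY JOHN SMITH '
      . 'CURRENT ESTIMATED COMPLETION IS 12/06/10 AT 15:00 '
      . 'TYPE OF PROBLEM: T.P. BIG ISSUE PRIORITY: 5 WORKING DAYS '
      . 'ITEM SUBMITTED BEFORE? NO '
      . '------------ PROBLEM DESCRIPTION ------------ SOME LONG DESCRIPTION';

// Eight capture groups; the last one is greedy so it takes the rest of the text.
$pattern = '/Help Number: ([0-9]+) ENTERED ON ([0-9\/]+) AT .*? BY (.*?) '
         . 'CURRENT ESTIMATED COMPLETION IS ([0-9\/]+) .*? '
         . 'TYPE OF PROBLEM: (.*?) PRIORITY: (.*?) '
         . 'ITEM SUBMITTED BEFORE\? (.*?) -+ .*? -+ (.*)/';

if (preg_match($pattern, $text, $matches)) {
    // $matches[0] is the whole match; the captured values start at index 1.
    $helpNumber  = $matches[1];
    $enteredBy   = $matches[3];
    $problemType = $matches[5];
    $description = $matches[8];
    echo "$helpNumber | $enteredBy | $problemType | $description\n";
}
```

Each value can then be handed to whatever database layer is in use.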

Re: Help parsing a large group of text for certain values?

Posted: Wed Dec 01, 2010 4:02 pm
by Insyderznf
Thanks Jonah,

I will definitely try that out. It looks much easier than the function I eventually wrote to try to get this working.

Code:

function parseValues($ParseString, $Start, $Length)
    {
    $AddLength = strlen($Start);
    $substringStart = strpos($ParseString,$Start) + $AddLength;
    $substringLength = strpos($ParseString,$Length) - $substringStart;
    return substr($ParseString, $substringStart,$substringLength);
    }
Which is still not working ;).
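One possible cause, offered as a guess since the failing input is not shown: strpos() looks for the end marker from the beginning of the string, so an end marker that also appears before the start marker produces a negative length, and a marker that is missing returns false, which is then treated as 0. A hedged sketch of a variant that searches for the end marker only after the start marker (`parseBetween` is a hypothetical name, not a PHP built-in):

```php
<?php
// Hypothetical variant of parseValues(): returns the text between $start and
// the first $end that follows it, or null if either marker is missing.
function parseBetween($text, $start, $end)
{
    $startPos = strpos($text, $start);
    if ($startPos === false) {
        return null;
    }
    $valueStart = $startPos + strlen($start);

    // Third argument: begin looking for the end marker after the start marker.
    $endPos = strpos($text, $end, $valueStart);
    if ($endPos === false) {
        return null;
    }
    return trim(substr($text, $valueStart, $endPos - $valueStart));
}

$sample = 'ENTERED ON 11/29/10 AT 14:33 BY JOHN SMITH '
        . 'CURRENT ESTIMATED COMPLETION IS 12/06/10 AT 15:00';

$enteredBy      = parseBetween($sample, ' BY ', ' CURRENT');
// " AT " also occurs earlier in the string; the offset skips that occurrence.
$completionDate = parseBetween($sample, 'COMPLETION IS ', ' AT ');
$missing        = parseBetween($sample, 'PRIORITY: ', ' ITEM');
```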

I also eventually tried sscanf(), which kind of worked, but I couldn't get it to capture text containing spaces.

Code:

sscanf($Qsi_file_info, "QSI Help Number: %s

ENTERED ON %s AT %s BY %s %s
CURRENT ESTIMATED COMPLETION IS %s AT %s

TYPE OF PROBLEM: T.P. FILTER REMOVE PRIORITY: 5 WORKING DAYS
ITEM SUBMITTED BEFORE? NO",
        $qsiNumber,$enteredOn,$enteredTime,$enteredByFirst,$enteredByLast,
        $estimatedCompletionDate,$estimatedCompletionTime, $problemDescription);

Re: Help parsing a large group of text for certain values?

Posted: Wed Dec 01, 2010 5:07 pm
by Jonah Bron
Isn't that funny, I didn't actually know about sscanf() :roll: I would like to know how to get it working with spaces...
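One way that may work for values with spaces is a scanset conversion: `%s` stops at the first whitespace, while `%[^\n]` reads everything up to the end of the line. The sketch below assumes the value is the last thing on its line and uses a single line from the sample rather than the whole message.

```php
<?php
$line = "ENTERED ON 11/29/10 AT 14:33 BY JOHN SMITH";

// %s reads one whitespace-delimited token; %[^\n] reads the rest of the line,
// spaces included. With no extra arguments, sscanf() returns the values as an array.
$result = sscanf($line, "ENTERED ON %s AT %s BY %[^\n]");

list($enteredOn, $enteredTime, $enteredBy) = $result;
echo "$enteredOn | $enteredTime | $enteredBy\n";
```

A field followed by more text on the same line, such as the problem type before PRIORITY, has no single delimiter character to stop at, which is where a regular expression is likely the easier tool.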

Re: Help parsing a large group of text for certain values?

Posted: Wed Dec 01, 2010 6:57 pm
by Insyderznf
Jonah,

Your code unfortunately did not work, and I am thoroughly confused as to why. Let me give you another example, and maybe you can check it out. I have something that looks like this:

------------ PROBLEM DESCRIPTION ------------
LOOKING AT A PATIENT TOOTH CHART AND APPLYING A FILTER ONCE A USER SWI
TCHES TO THE TREATMENT PLAN THE FILTER IS REMOVED. HOWEVER IF THEY REA
PPLY THE FILTER AND THEN SWITCH BACK TO THE TOOTH CHART THE FILTER REM
AINS. THIS SEEMS INCONSISTANT AND IT SEEMS IT SHOULD STAY THE SAME BOT
H WAYS.

--------------- XTS RESPONSE ----------------
11/29/10 14:50 From: JHS - JOHN H. SMITH
We'll check on this filter issue and update later this week.
New Completion Estimate: 12/06/10 15:00 pacific time

My thoughts were that the way the spacing is working is causing preg_match not to work so I used trim to create a new instance of the string:

$trimedInfo = trim($file_info);

Then I run the preg_match function as follows:

preg_match("/------------ (.*?) ------------/", $trimedInfo, $matches);

And I do receive the correct match of "PROBLEM DESCRIPTION". However, as soon as I add anything before or after the dashes in the pattern, I get no matches at all, no matter what I add or what spacing I use. By "anything" I mean either (.*?) as suggested or the literal text that comes next. Any ideas as to why? Does it have anything to do with a carriage return being there? I thought trim() stripped that out. Or is there a limit to the length of a preg_match() pattern?

Re: Help parsing a large group of text for certain values?

Posted: Wed Dec 01, 2010 7:14 pm
by Jonah Bron
Um, I'm confused. If this is a different block of text to parse, it will need a different regular expression. Is this something different? If so, could you more clearly define what exactly you're trying to parse?

Re: Help parsing a large group of text for certain values?

Posted: Thu Dec 02, 2010 10:16 am
by Insyderznf
Jonah,

Sorry, my fault. I was trying to give as minimal an example as possible in the first post, hoping that once I was on the right track the rest would be easy to parse. The top half of the message is in the first post, and I can get that to parse correctly. The bottom half is in this post, and this is the portion I'm having issues with. Do you think the easiest way to parse the bottom half would be a function using strpos() and strlen()?

Here is an example of the whole message, unedited, with what needs to be parsed in bold:

XTI Help Number: 125063

ENTERED ON 11/29/10 AT 10:53 BY JOHN SMITH
CURRENT ESTIMATED COMPLETION IS 12/02/10 AT 16:00

TYPE OF PROBLEM: ADDRESS FIELD PRIORITY: 2 WORKING DAYS
ITEM SUBMITTED BEFORE? NO

------------ PROBLEM DESCRIPTION ------------
IN OUR RSP SYSTEM, THE ADDRESS FIELD IS LIMITED TO 30 CHARACTERS. IN T
HE MULTI SYSTEM, IT SEEMS TO BE LIMITED TO A NUMBER SMALLER THAN 3
0 CHARACTERS (24 I BELIEVE). WHEN THERE IS A LARGE ADDRESS, IT WILL CU
T OFF SOME OF THE ADDRESS IN PROGRAM ,1 FIELD 4. HOW WILL THIS AFFECT
OUR REPORTING, AND IS THERE A WAY WE CAN FIX TH
IS?


--------------- XTI RESPONSE ----------------
12/01/10 10:54**From: SSM - STEVE SMITH
You are correct, your XTI system stores 24
characters of address in ,1. Entering only 24 characters in Multisystem is easiest.
New Completion Estimate: 12/02/10 16:00 pacific time

Re: Help parsing a large group of text for certain values?

Posted: Thu Dec 02, 2010 10:48 am
by klevis miho
Is this text in html?

Re: Help parsing a large group of text for certain values?

Posted: Thu Dec 02, 2010 10:51 am
by klevis miho
If this is html, then you can do a regular expression like this:

preg_match_all('#<b>(.+?)</b>#s', $text, $matches);

var_dump($matches);

This code gets everything that is inside the "b"(bold) tags.
Maybe you are using "strong" tags. If yes, replace the <b> with <strong> and the </b> with </strong>.
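A small sketch of what that call returns. The HTML fragment here is hypothetical, since the real markup of the help text is not shown in the thread.

```php
<?php
// Hypothetical fragment with the wanted values wrapped in <b> tags.
$html = 'XTI Help Number: <b>125063</b> ENTERED ON <b>11/29/10</b>';

preg_match_all('#<b>(.+?)</b>#s', $html, $matches);

// $matches[0] holds the full matches including the tags;
// $matches[1] holds only the text captured inside them.
$values = $matches[1];
print_r($values);
```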

Re: Help parsing a large group of text for certain values?

Posted: Thu Dec 02, 2010 11:15 am
by Insyderznf
I'm getting pretty close to finishing this up. This is what I have used:

Code:

function parseValues($ParseString, $Start, $End)
    {
    $trimParseString = trim($ParseString);
    
    $substringStart = strpos($trimParseString,$Start) + strlen($Start);
    $substringLength = strpos($trimParseString,$End) - $substringStart;
    return substr($trimParseString, $substringStart,$substringLength);
    }
I'm just working on getting the values out of the top half that can possibly contain spaces. Trying preg_match() for these.

Woot got it using this:

preg_match("/TYPE OF PROBLEM: (.*?) PRIORITY: (.*?)ITEM SUBMITTED BEFORE?/s",$file_info,$matches)

Now I just need to put it all together and cross my fingers...
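One detail worth noting about that pattern: the `?` after BEFORE is not escaped, so the regex engine reads it as "the final E is optional" rather than as a literal question mark. It still matches the sample, but it would also match text without the question mark. A short sketch of the difference, using made-up test strings:

```php
<?php
// Unescaped: "E?" means an optional E, so a string without "?" still matches.
$loose = preg_match('/ITEM SUBMITTED BEFORE?/', 'ITEM SUBMITTED BEFOR');

// Escaped: "\?" requires a literal question mark, so this does not match.
$strict = preg_match('/ITEM SUBMITTED BEFORE\?/', 'ITEM SUBMITTED BEFOR');

// With the escape in place, the answer after the question mark can be captured.
preg_match('/ITEM SUBMITTED BEFORE\? (\w+)/', 'ITEM SUBMITTED BEFORE? NO', $matches);
$submittedBefore = $matches[1];

echo "loose=$loose strict=$strict answer=$submittedBefore\n";
```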

Re: Help parsing a large group of text for certain values?

Posted: Thu Dec 02, 2010 11:47 am
by Jonah Bron
This regex matches the whole thing. I tested it.

Code:

preg_match("/XTI Help Number: ([0-9]+)\n\nENTERED ON ([0-9\/]+) AT .*? BY (.*?)\nCURRENT ESTIMATED COMPLETION IS ([0-9\/]+) AT .*?\n\nTYPE OF PROBLEM: (.*?) PRIORITY: (.*?)\nITEM SUBMITTED BEFORE\? (.*?)\n\n-+ PROBLEM DESCRIPTION -+\n(.*?)\n\n-+ XTI RESPONSE -+\n([0-9\/]+) .*?\*\*From: SSM - (.*?)\n(.*?)$/s", $string, $matches);
For a graphical explanation of how this regular expression works, paste it into this website: http://strfriend.com
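The two samples in this thread format the response header slightly differently ("14:50 From: JHS" in one, "10:54**From: SSM" in the other), so a pattern with fixed initials would only fit one of them. A hedged sketch of a looser pattern for just that header line, using named groups; the group names and the assumption that the initials are always uppercase letters are mine:

```php
<?php
// Response header lines copied from the two samples in this thread.
$responses = array(
    "11/29/10 14:50 From: JHS - JOHN H. SMITH",
    "12/01/10 10:54**From: SSM - STEVE SMITH",
);

// \W* absorbs the space or the "**" before "From:"; [A-Z]+ accepts any initials.
$pattern = '/^(?<date>[0-9\/]+) (?<time>[0-9:]+)\W*From: (?<initials>[A-Z]+) - (?<name>.*)$/';

$parsed = array();
foreach ($responses as $line) {
    if (preg_match($pattern, $line, $m)) {
        // Named groups are available by name as well as by number.
        $parsed[] = array($m['date'], $m['time'], $m['initials'], $m['name']);
    }
}
print_r($parsed);
```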

Re: Help parsing a large group of text for certain values?

Posted: Thu Dec 02, 2010 1:39 pm
by AbraCadaver
The s modifier should be the magic trick to Jonah's "original" pattern (not the new one with \n). It allows the . to match newlines. Without it a newline will end any of the .* matches.
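A quick sketch of that difference on a shortened, made-up description block (the section headers follow the sample, the two body lines are placeholders):

```php
<?php
$text = "------------ PROBLEM DESCRIPTION ------------\n"
      . "LINE ONE\n"
      . "LINE TWO\n"
      . "\n"
      . "--------------- XTS RESPONSE ----------------";

$body = 'PROBLEM DESCRIPTION -+\n(.*?)\n\n-+ XTS RESPONSE';

// Without s, "." cannot cross the line break between the two body lines.
$withoutS = preg_match('/' . $body . '/', $text, $a);

// With s, "." also matches newlines, so the whole description is captured.
$withS = preg_match('/' . $body . '/s', $text, $b);

echo "without s: $withoutS, with s: $withS\n";
echo $b[1], "\n";
```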

Re: Help parsing a large group of text for certain values?

Posted: Thu Dec 02, 2010 1:46 pm
by Jonah Bron
AbraCadaver wrote:The s modifier should be the magic trick to Jonah's "original" pattern (not the new one with \n). It allows the . to match newlines. Without it a newline will end any of the .* matches.
Yes, and as you can see I put the s on the most recent regex.

Re: Help parsing a large group of text for certain values?

Posted: Thu Dec 02, 2010 2:05 pm
by AbraCadaver
Jonah Bron wrote:
AbraCadaver wrote:The s modifier should be the magic trick to Jonah's "original" pattern (not the new one with \n). It allows the . to match newlines. Without it a newline will end any of the .* matches.
Yes, and as you can see I put the s on the most recent regex.
Yes, but you shouldn't need any of the \n now.

Re: Help parsing a large group of text for certain values?

Posted: Thu Dec 02, 2010 2:20 pm
by Jonah Bron
Don't I at least need some of them? For example:

Code:

PRIORITY: (.*?)\nITEM SUBMITTED BEFORE
If I removed that one, there would be a trailing \n in the captured value.

Code:

([0-9]+)\n\nENTERED ON
And these ones. Wouldn't it break the match to remove them?
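A sketch of the trade-off on one line pair from the sample. It assumes, purely for illustration, that the file uses Windows line endings (\r\n), since a carriage return was raised as a possible culprit earlier in the thread. `\s+` is shown as one alternative that keeps the capture clean without depending on the exact line ending.

```php
<?php
// Assumed Windows line ending between the two lines.
$text = "PRIORITY: 2 WORKING DAYS\r\nITEM SUBMITTED BEFORE? NO";

// No newline in the pattern: the line ending ends up inside the capture.
preg_match('/PRIORITY: (.*?)ITEM SUBMITTED BEFORE/s', $text, $a);

// Literal \n: the newline is consumed, but the carriage return is still captured.
preg_match('/PRIORITY: (.*?)\nITEM SUBMITTED BEFORE/s', $text, $b);

// \s+ consumes any run of whitespace, so the capture is clean either way.
preg_match('/PRIORITY: (.*?)\s+ITEM SUBMITTED BEFORE/s', $text, $c);

var_dump($a[1], $b[1], $c[1]);
```

Calling trim() on each captured value afterwards would be another way to get the same result.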