Help parsing a large group of text for certain values?
Moderator: General Moderators
- Insyderznf
- Forum Newbie
- Posts: 11
- Joined: Fri Sep 10, 2010 5:08 pm
Help parsing a large group of text for certain values?
Hello,
This should be an easy problem, but I cannot figure out how to solve it. I have a chunk of text that is output by a help call program. I want to find the values I am looking for and store them in a MySQL database, but I am having a horrible time figuring out how to parse the text.
An example of the text I am talking about is below:
Help Number: 125086 ENTERED ON 11/29/10 AT 14:33 BY JOHN SMITH CURRENT ESTIMATED COMPLETION IS 12/06/10 AT 15:00 TYPE OF PROBLEM: T.P. BIG ISSUE PRIORITY: 5 WORKING DAYS ITEM SUBMITTED BEFORE? NO ------------ PROBLEM DESCRIPTION ------------ SOME LONG DESCRIPTION
So from the text above I need to parse out the items in bold. I can use strpos() to find the position, but I'm not sure how to actually extract the value. strtok() didn't do what I thought it would. Can someone help me out? Thank you.
-Nick
- Jonah Bron
- DevNet Master
- Posts: 2764
- Joined: Thu Mar 15, 2007 6:28 pm
- Location: Redding, California
Re: Help parsing a large group of text for certain values?
Code: Select all
preg_match('/Help Number: ([0-9]+) ENTERED ON ([0-9\/]+) AT .*? BY (.*?) CURRENT ESTIMATED COMPLETION IS ([0-9\/]+) .*? TYPE OF PROBLEM: (.*?) PRIORITY: (.*?) ITEM SUBMITTED BEFORE\? (.*?) ------------ (.*?) ------------ (.*?)/', $text, $matches);
print_r($matches);
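A minimal sketch of running this pattern against the sample line from the first post. Two small changes are assumptions of this sketch, not part of the reply above: the last group is written as a greedy `(.*)` anchored with `$`, because a lazy `(.*?)` with nothing after it matches the empty string, and the `s` modifier is added so that `.` can cross line breaks if the real input contains any.

```php
<?php
// Sample text from the first post, on one line as posted.
$text = 'Help Number: 125086 ENTERED ON 11/29/10 AT 14:33 BY JOHN SMITH '
    . 'CURRENT ESTIMATED COMPLETION IS 12/06/10 AT 15:00 '
    . 'TYPE OF PROBLEM: T.P. BIG ISSUE PRIORITY: 5 WORKING DAYS '
    . 'ITEM SUBMITTED BEFORE? NO '
    . '------------ PROBLEM DESCRIPTION ------------ SOME LONG DESCRIPTION';

// Same structure as the pattern above; only the final group and the
// trailing modifier differ (see the note before this block).
$pattern = '/Help Number: ([0-9]+) ENTERED ON ([0-9\/]+) AT .*? BY (.*?) '
    . 'CURRENT ESTIMATED COMPLETION IS ([0-9\/]+) .*? TYPE OF PROBLEM: (.*?) '
    . 'PRIORITY: (.*?) ITEM SUBMITTED BEFORE\? (.*?) '
    . '------------ (.*?) ------------ (.*)$/s';

if (preg_match($pattern, $text, $matches)) {
    // $matches[0] is the whole match; the captured values start at index 1.
    print_r($matches);
}
```

With the sample above, index 1 holds the help number and index 9 holds the description text.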
- Insyderznf
- Forum Newbie
- Posts: 11
- Joined: Fri Sep 10, 2010 5:08 pm
Re: Help parsing a large group of text for certain values?
Thanks Jonah,
I will definitely try that out; it is much easier than the function I eventually wrote to get it to work.
Code: Select all
function parseValues($ParseString, $Start, $Length)
{
    $AddLength = strlen($Start);
    $substringStart = strpos($ParseString,$Start) + $AddLength;
    $substringLength = strpos($ParseString,$Length) - $substringStart;
    return substr($ParseString, $substringStart,$substringLength);
}
Which is still not working.
I also eventually tried a sscanf function as well that kind of worked, but I couldn't get it to receive text with spaces.
Code: Select all
sscanf($Qsi_file_info, "QSI Help Number: %s
ENTERED ON %s AT %s BY %s %s
CURRENT ESTIMATED COMPLETION IS %s AT %s
TYPE OF PROBLEM: T.P. FILTER REMOVE PRIORITY: 5 WORKING DAYS
ITEM SUBMITTED BEFORE? NO",
$qsiNumber,$enteredOn,$enteredTime,$enteredByFirst,$enteredByLast,
$estimatedCompletionDate,$estimatedCompletionTime, $problemDescription);
- Jonah Bron
- DevNet Master
- Posts: 2764
- Joined: Thu Mar 15, 2007 6:28 pm
- Location: Redding, California
Re: Help parsing a large group of text for certain values?
Isn't that funny, I didn't actually know about sscanf()
I would like to know how to get it working with spaces...
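One possible way to let sscanf() read a value that contains spaces is a scanset such as `%[^\n]`, which reads every character up to the next newline instead of stopping at the first space the way `%s` does. A minimal sketch on a single line of the message; the variable names are made up for illustration:

```php
<?php
$line = 'ENTERED ON 11/29/10 AT 14:33 BY JOHN SMITH';

// %s reads one whitespace-delimited token.
// %[^\n] reads everything that is not a newline, so "JOHN SMITH" stays whole.
// Called without output variables, sscanf() returns the parsed values as an array.
$fields = sscanf($line, "ENTERED ON %s AT %s BY %[^\n]");

list($enteredOn, $enteredTime, $enteredBy) = $fields;

echo $enteredOn, ' | ', $enteredTime, ' | ', $enteredBy, PHP_EOL;
```

This only helps for a value that runs to the end of a line. For values that sit in the middle of a line, such as the problem type before PRIORITY:, a regular expression is probably the easier tool.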
- Insyderznf
- Forum Newbie
- Posts: 11
- Joined: Fri Sep 10, 2010 5:08 pm
Re: Help parsing a large group of text for certain values?
Jonah,
Your code unfortunately did not work, and I am thoroughly confused as to why. Let me give you another example and maybe you can check this out. I have something that looks like this:
------------ PROBLEM DESCRIPTION ------------
LOOKING AT A PATIENT TOOTH CHART AND APPLYING A FILTER ONCE A USER SWI
TCHES TO THE TREATMENT PLAN THE FILTER IS REMOVED. HOWEVER IF THEY REA
PPLY THE FILTER AND THEN SWITCH BACK TO THE TOOTH CHART THE FILTER REM
AINS. THIS SEEMS INCONSISTANT AND IT SEEMS IT SHOULD STAY THE SAME BOT
H WAYS.
--------------- XTS RESPONSE ----------------
11/29/10 14:50 From: JHS - JOHN H. SMITH
We'll check on this filter issue and update later this week.
New Completion Estimate: 12/06/10 15:00 pacific time
My thoughts were that the way the spacing is working is causing preg_match not to work so I used trim to create a new instance of the string:
$trimedInfo = trim($file_info)
Then I run the preg_match function as follows:
preg_match("/------------ (.*?) ------------/", $trimedInfo, $matches)
And I do receive the correct match of Problem Description. However, as soon as I add anything before or after the dashes in the pattern, I get no matches in the array at all, no matter what I add or what spacing I use. By "anything" I mean either (.*?) as suggested or the actual value of what comes next. Any ideas as to why? Does it have anything to do with a carriage return being there? I thought trim() stripped that out. Or is there a limit to the length of a preg_match() pattern?
- Jonah Bron
- DevNet Master
- Posts: 2764
- Joined: Thu Mar 15, 2007 6:28 pm
- Location: Redding, California
Re: Help parsing a large group of text for certain values?
Um, I'm confused. If this is a different block of text to parse, it will need a different regular expression. Is this something different? If so, could you more clearly define what exactly you're trying to parse?
- Insyderznf
- Forum Newbie
- Posts: 11
- Joined: Fri Sep 10, 2010 5:08 pm
Re: Help parsing a large group of text for certain values?
Jonah,
Sorry, my fault. I was trying to give as minimal an example as possible in the first post, hoping that once I was on the right track the rest would be easy to parse. The top half of the message is in the first post, and I can get that to parse correctly. The bottom half is in this post, and this is the portion I'm having issues with. Do you think the easiest way to parse the bottom half would be a function using strpos() and strlen()?
Here is an example of the whole message, unedited, with what needs to be parsed in bold:
XTI Help Number: 125063
ENTERED ON 11/29/10 AT 10:53 BY JOHN SMITH
CURRENT ESTIMATED COMPLETION IS 12/02/10 AT 16:00
TYPE OF PROBLEM: ADDRESS FIELD PRIORITY: 2 WORKING DAYS
ITEM SUBMITTED BEFORE? NO
------------ PROBLEM DESCRIPTION ------------
IN OUR RSP SYSTEM, THE ADDRESS FIELD IS LIMITED TO 30 CHARACTERS. IN T
HE MULTI SYSTEM, IT SEEMS TO BE LIMITED TO A NUMBER SMALLER THAN 3
0 CHARACTERS (24 I BELIEVE). WHEN THERE IS A LARGE ADDRESS, IT WILL CU
T OFF SOME OF THE ADDRESS IN PROGRAM ,1 FIELD 4. HOW WILL THIS AFFECT
OUR REPORTING, AND IS THERE A WAY WE CAN FIX TH
IS?
--------------- XTI RESPONSE ----------------
12/01/10 10:54**From: SSM - STEVE SMITH
You are correct, your XTI system stores 24
characters of address in ,1. Entering only 24 characters in Multisystem is easiest.
New Completion Estimate: 12/02/10 16:00 pacific time
- klevis miho
- Forum Contributor
- Posts: 413
- Joined: Wed Oct 29, 2008 2:59 pm
- Location: Albania
- Contact:
Re: Help parsing a large group of text for certain values?
Is this text in html?
- klevis miho
- Forum Contributor
- Posts: 413
- Joined: Wed Oct 29, 2008 2:59 pm
- Location: Albania
- Contact:
Re: Help parsing a large group of text for certain values?
If this is html, then you can do a regular expression like this:
preg_match_all('#<b>(.+?)</b>#s', $text, $matches);
var_dump($matches);
This code gets everything that is inside the "b"(bold) tags.
Maybe you are using "strong" tags. If yes, replace the <b> with <strong> and the </b> with </strong>.
- Insyderznf
- Forum Newbie
- Posts: 11
- Joined: Fri Sep 10, 2010 5:08 pm
Re: Help parsing a large group of text for certain values?
I'm getting pretty close to finishing this up, this is what I have used:
Code: Select all
function parseValues($ParseString, $Start, $End)
{
    $trimParseString = trim($ParseString);
    $substringStart = strpos($trimParseString,$Start) + strlen($Start);
    $substringLength = strpos($trimParseString,$End) - $substringStart;
    return substr($trimParseString, $substringStart,$substringLength);
}
I'm just working on getting the values out of the top half that can possibly contain spaces. Trying preg_match() for these.
Woot got it using this:
preg_match("/TYPE OF PROBLEM: (.*?) PRIORITY: (.*?)ITEM SUBMITTED BEFORE?/s",$file_info,$matches)
Now I just need to put it all together and cross my fingers...
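A possible hardening of the parseValues() helper above, sketched under a few assumptions. The name extractBetween() and the null return value are made up for illustration. The idea is to search for the end marker only after the start marker, so an earlier occurrence of the end text cannot produce a negative length, and to report a missing marker explicitly instead of returning a wrong substring.

```php
<?php
/**
 * Hypothetical variant of parseValues(): returns the trimmed text between
 * $start and the first $end that follows it, or null if either is missing.
 */
function extractBetween(string $haystack, string $start, string $end): ?string
{
    $startPos = strpos($haystack, $start);
    if ($startPos === false) {
        return null; // start marker not found
    }
    $valueStart = $startPos + strlen($start);

    // The third argument makes strpos() begin searching after the start marker.
    $endPos = strpos($haystack, $end, $valueStart);
    if ($endPos === false) {
        return null; // end marker not found after the start marker
    }

    return trim(substr($haystack, $valueStart, $endPos - $valueStart));
}

$text = "TYPE OF PROBLEM: ADDRESS FIELD PRIORITY: 2 WORKING DAYS\n"
      . "ITEM SUBMITTED BEFORE? NO";

$type     = extractBetween($text, 'TYPE OF PROBLEM:', 'PRIORITY:');
$priority = extractBetween($text, 'PRIORITY:', 'ITEM SUBMITTED BEFORE?');

var_dump($type, $priority);
```

The strict `=== false` checks matter because strpos() can legitimately return 0 when the marker sits at the very start of the string.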
- Jonah Bron
- DevNet Master
- Posts: 2764
- Joined: Thu Mar 15, 2007 6:28 pm
- Location: Redding, California
Re: Help parsing a large group of text for certain values?
This regex matches the whole thing. I tested it.
Code: Select all
preg_match("/XTI Help Number: ([0-9]+)\n\nENTERED ON ([0-9\/]+) AT .*? BY (.*?)\nCURRENT ESTIMATED COMPLETION IS ([0-9\/]+) AT .*?\n\nTYPE OF PROBLEM: (.*?) PRIORITY: (.*?)\nITEM SUBMITTED BEFORE\? (.*?)\n\n-+ PROBLEM DESCRIPTION -+\n(.*?)\n\n-+ XTI RESPONSE -+\n([0-9\/]+) .*?\*\*From: SSM - (.*?)\n(.*?)$/s", $string, $matches);
For a graphical explanation of how this regular expression works, paste it into this website: http://strfriend.com
- AbraCadaver
- DevNet Master
- Posts: 2572
- Joined: Mon Feb 24, 2003 10:12 am
- Location: The Republic of Texas
- Contact:
Re: Help parsing a large group of text for certain values?
The s modifier should be the magic trick to Jonah's "original" pattern (not the new one with \n). It allows the . to match newlines. Without it a newline will end any of the .* matches.
mysql_function(): WARNING: This extension is deprecated as of PHP 5.5.0, and will be removed in the future. Instead, the MySQLi or PDO_MySQL extension should be used. See also MySQL: choosing an API guide and related FAQ for more information.
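The effect of the s modifier can be seen in a small sketch. The two-line description below is made-up sample text in the shape of the messages in this thread. Without the modifier, `.` cannot cross the line break inside the description, so the pattern fails as soon as it has to span more than one line.

```php
<?php
$text = "------------ PROBLEM DESCRIPTION ------------\n"
      . "FIRST LINE OF THE DESCRIPTION\n"
      . "SECOND LINE\n"
      . "--------------- XTS RESPONSE ----------------\n";

// Capture everything between the two divider lines.
$body = 'PROBLEM DESCRIPTION -+\s+(.*?)\s+-+ XTS RESPONSE';

// Without s: . stops at the newline after the first description line.
$withoutS = preg_match('/' . $body . '/', $text, $m1);

// With s: . also matches newlines, so the capture can span both lines.
$withS = preg_match('/' . $body . '/s', $text, $m2);

var_dump($withoutS, $withS, $m2[1] ?? null);
```

The first call finds nothing, while the second captures both description lines with the line break between them.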
- Jonah Bron
- DevNet Master
- Posts: 2764
- Joined: Thu Mar 15, 2007 6:28 pm
- Location: Redding, California
Re: Help parsing a large group of text for certain values?
AbraCadaver wrote: The s modifier should be the magic trick to Jonah's "original" pattern (not the new one with \n). It allows the . to match newlines. Without it a newline will end any of the .* matches.
Yes, and as you can see I put the s on the most recent regex.
- AbraCadaver
- DevNet Master
- Posts: 2572
- Joined: Mon Feb 24, 2003 10:12 am
- Location: The Republic of Texas
- Contact:
Re: Help parsing a large group of text for certain values?
AbraCadaver wrote: The s modifier should be the magic trick to Jonah's "original" pattern (not the new one with \n). It allows the . to match newlines. Without it a newline will end any of the .* matches.
Jonah Bron wrote: Yes, and as you can see I put the s on the most recent regex.
Yes, but you shouldn't need any of the \n now.
- Jonah Bron
- DevNet Master
- Posts: 2764
- Joined: Thu Mar 15, 2007 6:28 pm
- Location: Redding, California
Re: Help parsing a large group of text for certain values?
Don't I at least need some of them? For example:
Code: Select all
PRIORITY: (.*?)\nITEM SUBMITTED BEFORE
If I removed that one, there would be a trailing \n in the captured value.
And these ones. Wouldn't it break the match to remove it?
Code: Select all
([0-9]+)\n\nENTERED ON
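One way to sidestep the question of which \n are required is to match the whitespace between fields with \s+ instead of a fixed number of newlines. The sketch below is not the pattern posted earlier in the thread. It is a hypothetical variant that uses named groups, accepts any three-letter system name before RESPONSE, and keeps the whole response as one field. The sample message is abridged from the one posted above.

```php
<?php
// Abridged copy of the sample message from the thread.
$message = <<<TXT
XTI Help Number: 125063
ENTERED ON 11/29/10 AT 10:53 BY JOHN SMITH
CURRENT ESTIMATED COMPLETION IS 12/02/10 AT 16:00
TYPE OF PROBLEM: ADDRESS FIELD PRIORITY: 2 WORKING DAYS
ITEM SUBMITTED BEFORE? NO
------------ PROBLEM DESCRIPTION ------------
IN OUR RSP SYSTEM, THE ADDRESS FIELD IS LIMITED TO 30 CHARACTERS.
--------------- XTI RESPONSE ----------------
12/01/10 10:54**From: SSM - STEVE SMITH
You are correct, your XTI system stores 24 characters of address.
New Completion Estimate: 12/02/10 16:00 pacific time
TXT;

// \s+ matches spaces, \n and \r\n alike, so the pattern does not depend on
// how many line breaks separate the fields. The s modifier lets . span lines.
$pattern = '/Help Number:\s*(?<number>\d+)\s+'
    . 'ENTERED ON\s+(?<entered>[\d\/]+)\s+AT\s+\S+\s+BY\s+(?<by>.*?)\s+'
    . 'CURRENT ESTIMATED COMPLETION IS\s+(?<due>[\d\/]+)\s+AT\s+\S+\s+'
    . 'TYPE OF PROBLEM:\s*(?<type>.*?)\s+PRIORITY:\s*(?<priority>.*?)\s+'
    . 'ITEM SUBMITTED BEFORE\?\s*(?<before>.*?)\s+'
    . '-+ PROBLEM DESCRIPTION -+\s+(?<description>.*?)\s+'
    . '-+ \w+ RESPONSE -+\s+(?<response>.*)$/s';

if (preg_match($pattern, $message, $m)) {
    // Named groups are available by name as well as by number.
    echo $m['number'], ' | ', $m['by'], ' | ', $m['type'], ' | ', $m['priority'], PHP_EOL;
}
```

Because each lazy group is followed by \s+ and a literal label, the captured values come back without trailing line breaks, which addresses the trailing \n concern raised above.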