How to stop a preg_match_all from freaking out?



hydroxide
Forum Commoner
Posts: 77
Joined: Mon Jun 05, 2006 9:53 am

Post by hydroxide »

Okay, I have a problem with the function preg_match_all.

Code:

//The regexes that match the data in $string
		preg_match_all("/(<b>)([a-zA-Z0-9\s\-_\.\:\\\&]*)(<\/b>)/is", $string, $match['company_name'], PREG_SET_ORDER);
		preg_match_all("/(Client ID\:\s)([0-9\s\\\-]*)(<br>)/is", $string, $match['id'], PREG_SET_ORDER);
		preg_match_all("/(Processing Location\:\s)([A-Za-z\.\\0-9\s]*)/is", $string, $match['proc_location'], PREG_SET_ORDER);
		preg_match_all("/([\r]*)(;<br>)([\r]*)([\sA-Za-z0-9\,\\\.\s]*)(<br>)/is", $string, $match['client_location'], PREG_SET_ORDER);
		preg_match_all("/(Contact Name\:\s)([A-Za-z\,\/0-9\s\\\.]*)(<br>)/is", $string, $match['contact_name'], PREG_SET_ORDER);
		preg_match_all("/(Contact Phone\:\s)([A-Za-z0-9-\s\\\.]*)(<br>)/is", $string, $match['phone'], PREG_SET_ORDER);
		preg_match_all("/(Client Original Call In Date\:\s)([A-Za-z0-9\/\-\:\\\.\s]*)(<br>)/is", $string, $match['call_date'], PREG_SET_ORDER);
		preg_match_all("/(Client Original Period Begin Date\:\s)([A-Za-z0-9\/\-\:\\\.\s]*)(<br>)/is", $string, $match['per_begin_date'], PREG_SET_ORDER);
		preg_match_all("/(Client Orginal Period End Date\:\s)([A-Za-z0-9\/\-\:\\\.\s]*)(<br>)/is", $string, $match['per_end_date'], PREG_SET_ORDER);
		preg_match_all("/(Client Orginal Check Date\:\s)([A-Za-z0-9\/\-\:\\\.\s]*)(<br>)/is", $string, $match['check_date'], PREG_SET_ORDER);
		preg_match_all("/(Client Orginal Delivery Date\:\s)([A-Za-z0-9\/\-\:\\\.\s]*)(<br>)/is", $string, $match['delivery_date'], PREG_SET_ORDER);
		preg_match_all("/(Client New Call In Date\:\s)([A-Za-z0-9\/\-\:\\\.\s]*)(<br>)/is", $string, $match['new_call_date'], PREG_SET_ORDER);
		preg_match_all("/(Client New Period Begin Date\:\s)([A-Za-z0-9\/\-\:\\\.\s]*)(<br>)/is", $string, $match['new_per_begin_date'], PREG_SET_ORDER);
		preg_match_all("/(Client New Period End Date\:\s)([A-Za-z0-9\/\-\:\\\.\s]*)(<br>)/is", $string, $match['new_per_end_date'], PREG_SET_ORDER);
		preg_match_all("/(Client New Check Date\:\s)([A-Za-z0-9\/\-\:\\\.\s]*)(<br>)/is", $string, $match['new_check_date'], PREG_SET_ORDER);
		preg_match_all("/(Client New Delivery Date\:\s)([A-Za-z0-9\/\\\:\.\-\s]*)(<br>)/is", $string, $match['new_delivery_date'], PREG_SET_ORDER);
		preg_match_all("/([\r]*)(<u>)(Reason for false start\:)(<\/u><br>)([\r]*)([A-Za-z0-9\/,;\:\-\\'$&\\(\)\#.!\s\@]*)(<br>)/si", $string, $match['reason'], PREG_SET_ORDER);
		preg_match_all("/(Change date\:\s)([A-Za-z\,\:0-9\s\\\(\)\s]*)(<hr>)/is", $string, $match['date_added'], PREG_SET_ORDER);
	
	
	//for loop to print the matches found
	
	for($i=0;$i<count($match['company_name']);$i++){

		$n = $i + 1;
		print "<RECORD>\n";
  		print "<id>".$n."</id>\n";
  		print "<company_name>".$match['company_name'][$i][2]."</company_name>\n";
  		print "<client_id>".$match['id'][$i][2]."</client_id>\n";
  		print "<proc_location>".$match['proc_location'][$i][2]."</proc_location>\n";
		print "<client_location>".ltrim($match['client_location'][$i][4])."</client_location>\n";
		print "<contact_name>".$match['contact_name'][$i][2]."</contact_name>\n"; 
  		print "<phone>".$match['phone'][$i][2]."</phone>\n";
  		print "<call_date>".$match['call_date'][$i][2]."</call_date>\n";  
  		print "<per_begin_date>".$match['per_begin_date'][$i][2]."</per_begin_date>\n";  
   		print "<per_end_date>".$match['per_end_date'][$i][2]."</per_end_date>\n";
   		print "<check_date>".$match['check_date'][$i][2]."</check_date>\n";
   		print "<delivery_date>".$match['delivery_date'][$i][2]."</delivery_date>\n";
   		print "<new_call_date>".$match['new_call_date'][$i][2]."</new_call_date>\n";
   		print "<new_per_begin_date>".$match['new_per_begin_date'][$i][2]."</new_per_begin_date>\n";
   		print "<new_per_end_date>".$match['new_per_end_date'][$i][2]."</new_per_end_date>\n";
   		print "<new_check_date>".$match['new_check_date'][$i][2]."</new_check_date>\n";
   		print "<new_delivery_date>".$match['new_delivery_date'][$i][2]."</new_delivery_date>\n";
 		print "<reason>".ltrim($match['reason'][$i][6])."</reason>\n";
   		print "<date_added>".$match['date_added'][$i][2]."</date_added>\n";
  		print "</RECORD>\n\n";
  		print "$i";
	}
It runs fine until one of the regexes fails to match (either because the regex itself doesn't match or because the text it looks for is missing). When that happens, that field's results simply continue with the next record, so every record after that point gets the wrong data for that field, and toward the end of the output the tags come out empty. I can't figure out how to keep everything aligned and get the regex to print ' ' for a missing field instead of moving on through the document and mixing up the data.

For example, if one record has no match for call_date, the loop prints the next record's call_date in its place instead of printing ' ' and moving on to the next record.

How can I fix this? I have no idea. The data being searched follows a template like this:
<b>Random Company Name</b><br>
Client ID: 12-23-111<br>
Processing Location: ftlauderdale, <mlewis@mycompany.com><br>
<lrivera@mycompany.com><br>
Stuart, FL 34992<br>
Contact Name: Dorothy/George Johnson<br>
Contact Phone: 555 555-5555<br>
Client Original Call In Date: 05/31/06<br>
Client Original Period Begin Date: 05/24/06<br>
Client Orginal Period End Date: 05/30/06<br>
Client Orginal Check Date: 06/02/06<br>
Client Orginal Delivery Date: 06/02/06<br>
Client New Call In Date: 06/05/06<br>
Client New Period Begin Date: 05/29/06<br>
Client New Period End Date: 06/04/06<br>
Client New Check Date: 06/09/06<br>
Client New Delivery Date: 06/09/06<br>
<u>Reason for false start:</u><br>
1st False start: Client requested to change pay period from Wed 5/24- Tues 5/30 to new dates of Mon 5/29 to Sun 6/4. Also per Matt he was not aware that client's previous payroll company required a written 30 day notice prior to canceling their account.<br>
Change date: Thursday, June 01, 2006 at 16:23:12 (EDT)<hr>
But, as I said, the problem occurs when a whole line is missing for some reason, or when something in a line keeps the regex from matching.
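Since every record in this template ends with <hr>, another way to get the ' ' fallback (a different technique from the single-pattern approach suggested in the replies) is to split the input into records first and run each field's regex on one record at a time, so a missing line can only affect its own record. Below is a minimal sketch, assuming <hr> never appears inside a record; the sample records and the hypothetical $fields table are invented for illustration, and only three fields are shown.

```php
<?php
// Two invented records; the first one has no "Client ID" line.
$string = "<b>Acme Corp</b><br>\nContact Phone: 555 111-1111<br>\n<hr>\n"
        . "<b>Beta LLC</b><br>\nClient ID: 12-23-111<br>\nContact Phone: 555 222-2222<br>\n<hr>\n";

// Hypothetical field table: output tag => pattern whose group 1 is the value.
$fields = array(
    'company_name' => '/<b>([^<]*)<\/b>/i',
    'client_id'    => '/Client ID:\s*([^<]*)<br>/i',
    'phone'        => '/Contact Phone:\s*([^<]*)<br>/i',
);

$records = array();
// Split on <hr> so each regex only ever sees one record's text.
foreach (preg_split('/<hr>/i', $string) as $chunk) {
    if (trim($chunk) === '') {
        continue; // the text after the last <hr> is empty
    }
    $row = array();
    foreach ($fields as $tag => $regex) {
        // preg_match() returns 1 on a match; otherwise fall back to ' '.
        $row[$tag] = preg_match($regex, $chunk, $m) ? trim($m[1]) : ' ';
    }
    $records[] = $row;
}

// Print each record in the same XML-ish layout as the original loop.
foreach ($records as $i => $row) {
    print "<RECORD>\n<id>" . ($i + 1) . "</id>\n";
    foreach ($row as $tag => $value) {
        print "<$tag>$value</$tag>\n";
    }
    print "</RECORD>\n\n";
}
```

Because each record is searched on its own, Acme Corp gets `<client_id> </client_id>` and Beta LLC keeps its own ID.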
Last edited by hydroxide on Wed Jun 14, 2006 10:54 am, edited 1 time in total.
TheMoose
Forum Contributor
Posts: 351
Joined: Tue May 23, 2006 10:42 am

Post by TheMoose »

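It does that because your regular expressions are split into separate pieces. Each one matches only its own pattern across the whole string; it doesn't know which record it is in or what is around it. So when one record lacks a field, that field's result array is one entry shorter, and from that point on $match['field'][$i] belongs to a later record than $match['company_name'][$i].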
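A minimal sketch of that drift, using two invented records in the same template (the first one has no Client ID line) and simplified versions of the original patterns:

```php
<?php
// Two invented records; the first one is missing its "Client ID" line.
$string = "<b>Acme Corp</b><br>\n"
        . "Contact Phone: 555 111-1111<br>\n<hr>\n"
        . "<b>Beta LLC</b><br>\n"
        . "Client ID: 12-23-111<br>\n"
        . "Contact Phone: 555 222-2222<br>\n<hr>\n";

// Each field is searched across the WHOLE string, independently of the others.
preg_match_all('/<b>(.*?)<\/b>/i', $string, $names, PREG_SET_ORDER);
preg_match_all('/Client ID:\s*(.*?)<br>/i', $string, $ids, PREG_SET_ORDER);

// $names has 2 entries but $ids has only 1, so index 0 pairs Acme Corp
// with Beta LLC's client ID, and index 1 has no client ID at all.
for ($i = 0; $i < count($names); $i++) {
    $id = isset($ids[$i]) ? $ids[$i][1] : '(none)';
    echo $names[$i][1], " => ", $id, "\n";
}
// Acme Corp => 12-23-111
// Beta LLC => (none)
```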

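Combine it all into one large regular expression pattern, so each match covers a whole record and the fields can't drift apart. Any line that may be missing has to be wrapped in an optional group such as (?:...)?, or a record without that line is skipped entirely. If you have line breaks in the HTML code itself, the regexp I use to match them is [\s\n\r\t]*, which matches any whitespace, newline, carriage return, or tab character, any number of times (zero or more). Since \s already covers \n, \r and \t, plain \s* does the same job.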
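A minimal sketch of that single-pattern idea, cut down to three fields and using invented sample data. The optional group around the Client ID line is an assumption about which lines may be missing. Named groups like (?P<id>...) are standard PCRE syntax in PHP, used here so the fields can be read by name instead of by group number.

```php
<?php
// Two invented records; the first one has no "Client ID" line.
$string = "<b>Acme Corp</b><br>\n"
        . "Contact Phone: 555 111-1111<br>\n<hr>\n"
        . "<b>Beta LLC</b><br>\n"
        . "Client ID: 12-23-111<br>\n"
        . "Contact Phone: 555 222-2222<br>\n<hr>\n";

// One pattern per record: name, optional client ID, phone, then the <hr>.
// \s* between lines absorbs the newlines in the HTML source.
$pattern = '/<b>(?P<name>[^<]*)<\/b><br>\s*'
         . '(?:Client ID:\s*(?P<id>[^<]*)<br>\s*)?'
         . 'Contact Phone:\s*(?P<phone>[^<]*)<br>\s*'
         . '<hr>/i';

preg_match_all($pattern, $string, $records, PREG_SET_ORDER);

foreach ($records as $r) {
    // A skipped optional group comes back as '' here, so fall back to ' '.
    $id = (isset($r['id']) && $r['id'] !== '') ? $r['id'] : ' ';
    echo $r['name'], ' | ', $id, ' | ', $r['phone'], "\n";
}
```

Only the Client ID line is optional in this sketch; a record missing its phone line (or containing a character the value pattern rejects) would still be skipped, so every field that can be absent needs its own (?:...)? wrapper.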
hydroxide
Forum Commoner
Posts: 77
Joined: Mon Jun 05, 2006 9:53 am

Post by hydroxide »

Wouldn't that prevent me from wrapping the returned data into specific XML tags?
TheMoose
Forum Contributor
Posts: 351
Joined: Tue May 23, 2006 10:42 am

Post by TheMoose »

Nope. If you use one giant pattern (called $pattern in the example below) with preg_match_all's default PREG_PATTERN_ORDER, the first index of $matches is the number of the parenthesized group in the pattern, and the second index is which match (record) the text came from.

For example:

Code:

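$pattern = ""; // put the combined pattern here
preg_match_all($pattern, $string, $matches); // default flag: PREG_PATTERN_ORDER
for($i=0; $i<count($matches[0]); $i++) {
	// $matches[2] has all the results for the second parenthesis pattern in the regexp, in this case, all the text that matches the pattern ([a-zA-Z0-9\s\-_\.\:\\\&]*)
	$companyname = $matches[2][$i];
	// adjust the group numbers to match how your combined pattern is written
	$clientid = $matches[4][$i];
	//   etc etc
}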
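To show that one big pattern doesn't get in the way of the XML output, here is a sketch that uses the same $matches[group][record] indexing with two invented records and only two fields. The group numbers are 1 and 2 here only because this cut-down pattern has no extra parentheses around the labels; htmlspecialchars() is added on the assumption that the values may contain raw characters such as &.

```php
<?php
// Two invented records with company name and client ID only.
$string = "<b>Acme Corp</b><br>\nClient ID: 12-23-111<br>\n<hr>\n"
        . "<b>Beta & Sons</b><br>\nClient ID: 45-67-222<br>\n<hr>\n";

// Group 1 = company name, group 2 = client ID.
$pattern = '/<b>([^<]*)<\/b><br>\s*Client ID:\s*([^<]*)<br>\s*<hr>/i';

// Default flag is PREG_PATTERN_ORDER: $matches[group][record].
preg_match_all($pattern, $string, $matches);

$xml = '';
for ($i = 0; $i < count($matches[0]); $i++) {
    $xml .= "<RECORD>\n";
    $xml .= "<id>" . ($i + 1) . "</id>\n";
    // htmlspecialchars() escapes &, <, > and quotes so values stay valid inside the tags.
    $xml .= "<company_name>" . htmlspecialchars($matches[1][$i], ENT_QUOTES) . "</company_name>\n";
    $xml .= "<client_id>" . htmlspecialchars($matches[2][$i], ENT_QUOTES) . "</client_id>\n";
    $xml .= "</RECORD>\n\n";
}
echo $xml;
```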
hydroxide
Forum Commoner
Posts: 77
Joined: Mon Jun 05, 2006 9:53 am

Post by hydroxide »

Ahhh... perfect. Thanks a bunch!