Can somebody help out with RegExps?

visionmaster
Forum Contributor
Posts: 139
Joined: Wed Jul 14, 2004 4:06 am

Post by visionmaster »

Hello everyone,

I'm having serious difficulties using RegExps for a specific problem and would really appreciate any help. I hope somebody will read through my "long" post...

1.

<?php
// Find all blocks containing the postal code and city, with a minimum of 50
// and a maximum of 250 characters before and after.
// This should give me all blocks containing postal code and city.

$arrParsedBlocks = getDataUsingRegexp("'(.{50,250})".preg_quote($arrDaten['Plz'], "'")."\s+".preg_quote($arrDaten['Ort'], "'")."(.{50,250})'is", $content);

// Note: expects a pattern with two capture groups (text before, text after).
function getDataUsingRegexp($strRegexp, $string)
{
	global $arrDaten;

	preg_match_all($strRegexp, $string, $matches);

	$arrListe = array();

	for ($i = 0; $i < count($matches[0]); $i++)
	{
		// Rebuild the block: text before + postal code + city + text after.
		$strData = trim($matches[1][$i].$arrDaten['Plz']." ".$arrDaten['Ort'].$matches[2][$i]);

		$arrListe[] = $strData;
	}

	return $arrListe;
}
?>

Question:
-----------
* How can I extract 3 lines before and after postal code + city? (instead of a specific number of characters)


2.

<?php
$string = "Kontakt
      
         
<br>
Bill Jones
           
Dr. Bill 
Jones<br>
Internet & Webdesign<br>

Examplestreet 9<br>
87354 Munich<br>
Germany<br>
Tel. (0 8 9) 1234 <br>
Handy (0173) 111 <br>
Internet: http://www.foo.com<br>

E-Mail: info@foo.com";
	    
echo $string;

$output_array = getDataUsingRegexp('#Tel(.*?)<br>#m',$string);
var_dump($output_array);

$output_array = getDataUsingRegexp('#Handy(.*?)<br>#m',$string);
var_dump($output_array);
?>

Questions:
------------
* I want to extract the following data out of a string into an associative array (see the example above), e.g.

Array( [Name] => "Bill Jones Dr. Bill Jones" [Company Name] => "Internet & Webdesign" [Street] => "Examplestreet 9" [City] => "87354 Munich" [Country] => "Germany" [Tel] => "(0 8 9) 1234 <br>")

* As a basis I can use a postal code and the city name, with which I extracted the blocks containing these in step one.

Lines with a telephone number can be identified by words such as telefon, tel., fon or telephone.

Lines with a fax number can be identified by words such as fax or telefax.

Lines with a mobile phone number can be identified by words such as handy or mobile.
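
The label rules above could be sketched roughly like this. The helper name extractLabelledNumbers, the array keys and the character class used for the number are assumptions of this sketch, not something from the post; only the label words come from the rules. The label itself is left out of the result.

```php
<?php
// Hypothetical helper: returns the number found after each known label,
// keyed by kind, with the label itself left out.
function extractLabelledNumbers(string $text): array
{
    // Label words from the post. Longer words come first so that
    // "telefon" is tried before "tel".
    $labels = array(
        'Tel'    => 'telephone|telefon|tel|fon',
        'Fax'    => 'telefax|fax',
        'Mobile' => 'handy|mobile',
    );

    $result = array();
    foreach ($labels as $key => $words) {
        // (?<![a-z])  the label must not be the tail of another word
        //             (so "tel" inside "Hotel" is ignored)
        // \.?\s*:?\s* optional dot and colon after the label
        // (...)       assumed number format: digits, brackets, spaces,
        //             slashes, plus and hyphen, ending in a digit or ")"
        $pattern = '~(?<![a-z])(?:' . $words . ')\.?\s*:?\s*([0-9()/+ -]*[0-9)])~i';
        if (preg_match($pattern, $text, $m)) {
            $result[$key] = trim($m[1]);
        }
    }
    return $result;
}
```

Called on the sample block, this should give [Tel] => "(0 8 9) 1234" and [Mobile] => "(0173) 111". Only the first match per label group is kept, which is a simplification.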

The Tel and Handy patterns in my own code are very specific, designed for special cases, and not general at all.

The line above the line holding postal code and city is assumed to hold the street data.

The 2 lines above the line holding the street data are assumed to hold the company name.

Lines between postal code+city and tel. are assumed to hold the country name, although this is optional. Sometimes there may not be any country information available at all.
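
Once the block has been split into lines, the positional rules above may be easier to apply with array indexes than with one big pattern. A minimal sketch, assuming the lines are already trimmed; the function name parseAddress and the result keys are made up for illustration, and a fax or mobile line that comes before the telephone line would wrongly end up in the country field.

```php
<?php
// Hypothetical helper: applies the positional rules to an already split block.
// $lines is a list of trimmed lines; $plz and $ort are postal code and city.
function parseAddress(array $lines, string $plz, string $ort): array
{
    $cityPattern  = '/' . preg_quote($plz, '/') . '\s+' . preg_quote($ort, '/') . '/i';
    // Telephone labels from the post; \b keeps "Telefax" from matching "tel".
    $labelPattern = '/^(?:telephone|telefon|tel|fon)\b\.?\s*:?\s*(.+)$/i';

    $lines  = array_values($lines);
    $result = array();

    foreach ($lines as $i => $line) {
        if (!preg_match($cityPattern, $line)) {
            continue;
        }
        $result['City'] = $line;
        // Rule: the line above postal code + city holds the street.
        if ($i > 0) {
            $result['Street'] = $lines[$i - 1];
        }
        // Rule: lines between postal code + city and the telephone line
        // hold the country, which may be missing.
        $country = array();
        for ($j = $i + 1; $j < count($lines); $j++) {
            if (preg_match($labelPattern, $lines[$j], $m)) {
                $result['Tel'] = trim($m[1]); // the label itself is excluded
                break;
            }
            $country[] = $lines[$j];
        }
        // Without a telephone line there is no way to tell where the country ends.
        if (isset($result['Tel']) && $country) {
            $result['Country'] = implode(' ', $country);
        }
        break; // only the first postal code + city line is used
    }
    return $result;
}
```

The company name rule (the 2 lines above the street) could be added the same way with $lines[$i - 2] and $lines[$i - 3], after checking that those indexes exist.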

I treat not only a newline (\n) or <br> as a line separator, but also characters such as , or - or : or ; or |, since an address can also be written on one line, like

Bill Jones | Internet & Webdesign | Examplestreet 9 | 87354 Munich |
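
One way to get uniform "lines" out of both layouts is preg_split with all separators in a single pattern. This is only a sketch: the function name splitAddressLines is invented here, and it assumes that - and : only count as separators when surrounded by whitespace and that a comma only counts when followed by whitespace. Otherwise "E-Mail", "http://..." and similar strings would be torn apart.

```php
<?php
// Hypothetical helper: splits an address block into trimmed, non-empty "lines".
function splitAddressLines(string $block): array
{
    // <br> or <br/>, any newline (\R), | and ; always separate;
    // a comma only when followed by whitespace;
    // - and : only when surrounded by whitespace (assumption, see above).
    $pattern = '~<br\s*/?>|\R|[|;]|,\s+|\s+[-:]\s+~i';

    $parts = preg_split($pattern, $block, -1, PREG_SPLIT_NO_EMPTY);
    $parts = array_map('trim', $parts);

    // Drop pieces that were whitespace only and reindex from 0.
    return array_values(array_filter($parts, function ($s) {
        return $s !== '';
    }));
}
```

For the one-line address above this should return the four fields "Bill Jones", "Internet & Webdesign", "Examplestreet 9" and "87354 Munich".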

The expected order of the lines is:

1. Company Name
2. Company Name
3. Street Name
4. Postal Code + City name
5. Country Name (optional)
6. Tel.
7. Fax.
8. Handy

6. to 8. can of course differ in order

=> It all sounds simple somehow, but writing a regular expression pattern for it is another story... :(

Is there any RegExp expert out there who could help out? I would also appreciate detailed explanations, since I'm here to learn!

Thanks a lot!
Weirdan
Moderator
Posts: 5978
Joined: Mon Nov 03, 2003 6:13 pm
Location: Odessa, Ukraine

Post by Weirdan »

Question:
-----------
* How can I extract 3 lines before and after postal code + city? (instead of a specific number of characters)
Let's assume you need to find a 'needle' and get it as well as three lines of context around it (but no 'crap' surrounding these lines):

$txt = <<<EOF
crap
surrounding text
surrounding text2
surrounding text3
needle
surrounding text4
surrounding text5
surrounding text6
crap
EOF;

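// ^ with /m anchors at a line start. "needle\n" consumes the rest of the
// needle line, so the second group really covers the three lines after it
// (without it, the group would only reach surrounding text5).
preg_match('/^(.*\n){3}needle\n(.*\n){3}/m', $txt, $matches);
echo $matches[0];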
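
For the real data the needle would be postal code + city, and a block may have fewer than three lines on either side. A possible generalisation, as a sketch only: the function name linesAround and its parameters are assumptions, and the needle is taken as a literal string that gets escaped with preg_quote.

```php
<?php
// Hypothetical helper: returns the needle line with up to $n lines of context
// on each side, or null if the needle is not found.
function linesAround(string $text, string $needle, int $n = 3): ?string
{
    $quoted = preg_quote($needle, '/');

    // ^ with /m         anchor at the start of a line
    // (?:.*\n){0,n}     up to n whole lines before the needle line
    // .*needle.*        the needle line itself
    // (?:\n.*){0,n}     up to n whole lines after it
    $pattern = '/^(?:.*\n){0,' . $n . '}.*' . $quoted . '.*(?:\n.*){0,' . $n . '}/m';

    return preg_match($pattern, $text, $m) ? $m[0] : null;
}
```

With the data from the first post this could be called as linesAround($content, $arrDaten['Plz'] . ' ' . $arrDaten['Ort']), which assumes exactly one space between postal code and city. If the text uses <br> instead of real newlines, it would have to be normalised first.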
visionmaster
Forum Contributor
Posts: 139
Joined: Wed Jul 14, 2004 4:06 am

Post by visionmaster »

Thanks for your quick response!

Do you have any helpful tips regarding the second block of questions? :wink:

* Lines with a telephone number can be identified by words such as telefon, tel., fon or telephone. (I want the result array to hold just the telephone number; the identifying string, e.g. 'Tel.', should be excluded.)

* The line above the line holding postal code and city is assumed to hold the street data.

* Lines between postal code+city and tel. are assumed to hold the country name, although this is optional. Sometimes there may not be any country information available at all.

=> How can I translate that to RegExp?


As I described, a line may be ended not only by a newline (\n) but also by strings/characters such as <br> or , or - or : or ; or |

Thanks a lot for your help!