split() or reg_ex()

traffic
Forum Newbie
Posts: 17
Joined: Fri May 23, 2003 1:18 pm

split() or reg_ex()

Post by traffic »

Hello...

I am downloading some public information about my community neighbors from the county assessor's office website...

If I have:

Code: Select all

<span id="_ctl5_lblOwner">MAGUIRE TIM/CHERYL</span>
What is the best way to 'capture' (MAGUIRE TIM/CHERYL)...?

Also - the website HTML is loaded into a variable --> $htmlData

You can see 'sample output' at: http://myPVHOA.com/curl.php

I was thinking about running through it line by line looking for a match on '_ctl5_lblOwner', then using split() on the opening span tag, then splitting the second portion on the closing span tag, leaving myself with the middle portion that I was looking for...
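
For reference, the two-step split described here might look roughly like the sketch below. It uses explode() rather than split(), since no pattern is needed and split() itself was removed in PHP 7. The function name ownerBySplitting and the sample markup are made up for illustration.

```php
<?php
// Hypothetical sample; in this thread the page source lives in $htmlData.
$htmlData = '<td><span id="_ctl5_lblOwner">MAGUIRE TIM/CHERYL</span></td>';

// Hypothetical helper: split once on the opening tag, then once on the closing tag.
function ownerBySplitting(string $html): ?string
{
    $parts = explode('<span id="_ctl5_lblOwner">', $html, 2);
    if (count($parts) < 2) {
        return null; // opening tag not present
    }
    // Keep what comes before the first closing tag
    // (or the whole remainder if there is no closing tag).
    $rest = explode('</span>', $parts[1], 2);
    return $rest[0];
}

echo ownerBySplitting($htmlData), "\n";
```

Passing a limit of 2 to explode() keeps each split to a single cut, so later span tags on the page do not matter.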

I have about 25 of these little 'sections' that I want to download into a database and thought there might be a better way to capture the data without using several loops...


Thank you for any help you can give me...
m3mn0n
PHP Evangelist
Posts: 3548
Joined: Tue Aug 13, 2002 3:35 pm
Location: Calgary, Canada

Post by m3mn0n »

It's a tie, they both lose.

See: http://php.net/manual/en/function.preg-match.php
traffic
Forum Newbie
Posts: 17
Joined: Fri May 23, 2003 1:18 pm

Post by traffic »

What would you suggest Sami...?

Any help is very useful to me...


...
redmonkey
Forum Regular
Posts: 836
Joined: Thu Dec 18, 2003 3:58 pm

Post by redmonkey »

split() also uses a regex as its delimiter pattern, so there is no real benefit there.

TBH, data extraction from strings is best done via the method you best understand. This particular example seems quite simple, but in many cases your pattern matching requires tweaking, and if you don't fully understand your extraction code this could take some time.

If it were me, without seeing the exact data you are dealing with, I'd probably opt for a simple preg_match() solution. It doesn't sound like you have too much data to churn through, so I very much doubt it's worth looking at the performance of regex vs. the standard string functions.
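
A minimal sketch of what such a preg_match() call could look like, assuming the span appears in the page exactly as quoted in the first post. The sample string stands in for the real $htmlData, and $owner is just an illustrative variable name.

```php
<?php
// Hypothetical sample line; the real page source would be in $htmlData.
$htmlData = '<span id="_ctl5_lblOwner">MAGUIRE TIM/CHERYL</span>';

// '#' is used as the delimiter so the '/' in </span> needs no escaping.
// (.*?) is lazy, so it stops at the first closing tag after the id.
if (preg_match('#<span id="_ctl5_lblOwner">(.*?)</span>#s', $htmlData, $match)) {
    $owner = $match[1]; // $match[0] is the whole match, $match[1] the captured text
    echo $owner, "\n";
} else {
    $owner = null;
    echo "owner not found\n";
}
```

The s modifier lets the dot match newlines, which only matters if the value is ever wrapped across lines in the page source.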
traffic
Forum Newbie
Posts: 17
Joined: Fri May 23, 2003 1:18 pm

Post by traffic »

RedMonkey...

Thank you...

I am reading about preg_match() right now...

A sample of the data I am 'scraping' is found here: http://myPVHOA.com/curl.php

I am trying to get a firm grip on the preg_match --> http://www.php.net/manual/en/function.p ... ch-all.php

Would it be possible to see an easy example...?

Maybe something like this:

Code: Select all

<?php
// preg_match() stops at the first match: $out[0] is the full match, $out[1] the captured text.
preg_match('|<span id="_ctl5_lblOwner">(.*)</span>|U',
   '<span id="_ctl5_lblOwner">MAGUIRE TIM/CHERYL</span>',
   $out);
echo $out[0] . "\n";
echo $out[1] . "\n";
?>
And what would be the best way to 'loop' through the site data in the variable --> $htmlData to search in each line...?

Thank you again for your help...


...
yum-jelly
Forum Commoner
Posts: 98
Joined: Sat Oct 29, 2005 9:16 pm

Post by yum-jelly »

Maybe the preg_ functions would be faster on your server, but in my testing with your example data they were slower than a combination of the str* functions!

Code: Select all

<?php

$data = file_get_contents ( 'junk.txt' ); // string to look in

$new = 'lblOwner">';
$end = '</span>';

if ( ( $pos = strpos ( $data, $new ) ) !== false )
{
	$data = substr ( $data, ( $len = ( $pos + strlen ( $new ) ) ), ( strpos ( $data, $end, $len ) - $len ) );

	echo $data;
}
else
{
	echo 'substr was not found';
}

?>
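
Timings like this depend heavily on the PHP version and the data, so it may be worth measuring on your own server. A rough sketch of such a comparison follows. The filler data, the function names viaStr, viaPreg and timeIt, and the run count are all made up for illustration, and no particular result is implied.

```php
<?php
// Hypothetical test data: filler markup around the one span we care about.
$data = str_repeat("<tr><td>filler</td></tr>\n", 500)
      . '<span id="_ctl5_lblOwner">MAGUIRE TIM/CHERYL</span>'
      . str_repeat("<tr><td>filler</td></tr>\n", 500);

// String-function version, same idea as the strpos()/substr() code above.
function viaStr(string $data): ?string
{
    $new = 'lblOwner">';
    $end = '</span>';
    if (($pos = strpos($data, $new)) === false) {
        return null;
    }
    $start = $pos + strlen($new);
    $stop  = strpos($data, $end, $start);
    return $stop === false ? null : substr($data, $start, $stop - $start);
}

// Regex version of the same extraction.
function viaPreg(string $data): ?string
{
    return preg_match('#lblOwner">(.*?)</span>#s', $data, $m) ? $m[1] : null;
}

// Call $fn on $data $runs times and return the elapsed seconds.
function timeIt(callable $fn, string $data, int $runs = 2000): float
{
    $t0 = microtime(true);
    for ($i = 0; $i < $runs; $i++) {
        $fn($data);
    }
    return microtime(true) - $t0;
}

printf("str functions: %.4f s\n", timeIt('viaStr', $data));
printf("preg_match   : %.4f s\n", timeIt('viaPreg', $data));
```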

yj
traffic
Forum Newbie
Posts: 17
Joined: Fri May 23, 2003 1:18 pm

Post by traffic »

Thank you yum-jelly...

I wound up using:

Code: Select all

$search_string = array();
$search_string[0] = '_ctl5_lblOwner">';
$search_string[1] = '_ctl4_lblAddress">';

$numElements = count($search_string);

$end = '</span>';
$data = $htmlData;

for ($x = 0; $x < $numElements; $x++) {
  if ( ( $pos = strpos ( $data, $search_string[$x] ) ) !== false ) {
    $new_data = substr ( $data, ( $len = ( $pos + strlen ( $search_string[$x] ) ) ), ( strpos ( $data, $end, $len ) - $len ) );
    echo "search_string[$x] => $new_data<br>";
  } else {
    echo 'substr was not found';
  }
}

Of course, I will finish populating the $search_string array with the rest of my 'strings' to search by...
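
With about 25 fields headed for a database, one possible variation is to wrap the extraction in a small function and collect the results in an array keyed by field name. This is only a sketch: the helper name textAfter, the field names, and the sample address are invented for illustration, and it also guards against a missing closing tag.

```php
<?php
// Hypothetical sample; the real page source is in $htmlData.
$htmlData = '<span id="_ctl5_lblOwner">MAGUIRE TIM/CHERYL</span>'
          . '<span id="_ctl4_lblAddress">123 EXAMPLE ST</span>';

// Hypothetical helper: return the text between $marker and the next $end, or null.
function textAfter(string $data, string $marker, string $end = '</span>'): ?string
{
    $pos = strpos($data, $marker);
    if ($pos === false) {
        return null; // marker not on the page
    }
    $start = $pos + strlen($marker);
    $stop  = strpos($data, $end, $start);
    if ($stop === false) {
        return null; // marker found but no closing tag after it
    }
    return substr($data, $start, $stop - $start);
}

// Field name => search string; the names are placeholders for database columns.
$search = [
    'owner'   => '_ctl5_lblOwner">',
    'address' => '_ctl4_lblAddress">',
];

$row = [];
foreach ($search as $field => $marker) {
    $row[$field] = textAfter($htmlData, $marker);
}

print_r($row);
```

Fields that are missing from a page come back as null, which makes them easy to spot before anything is written to the database.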

Thank you again...


...
redmonkey
Forum Regular
Posts: 836
Joined: Thu Dec 18, 2003 3:58 pm

Post by redmonkey »

Off the top of my head I would go with something along the lines of....

Code: Select all

header('Content-Type: palin/text');
if (preg_match('/5_lblOwner">(.*?)<\/span.*?5_lblAddress">(.*?)<\/span/s', $string, $match))
{
  echo "Owner   : {$match[1]}\x0a";
  echo "Address : {$match[2]}\x0a";
}
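
If the list grows to 25 or so fields, another option is to let preg_match_all() collect every labelled span in one pass and key the results by id. The sketch below assumes each span carries only an id attribute and contains no nested spans. The sample markup and the address value are made up.

```php
<?php
// Hypothetical sample; the real page source would be in $htmlData.
$htmlData = '<span id="_ctl5_lblOwner">MAGUIRE TIM/CHERYL</span>' . "\n"
          . '<span id="_ctl4_lblAddress">123 EXAMPLE ST</span>';

// Capture the id and the text of every <span id="..."> in one pass.
// PREG_SET_ORDER gives one array per match: [full match, id, text].
preg_match_all('#<span id="([^"]+)">(.*?)</span>#s', $htmlData, $sets, PREG_SET_ORDER);

$fields = [];
foreach ($sets as $set) {
    $fields[$set[1]] = trim($set[2]);
}

print_r($fields);
```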
n00b Saibot
DevNet Resident
Posts: 1452
Joined: Fri Dec 24, 2004 2:59 am
Location: Lucknow, UP, India

Post by n00b Saibot »

redmonkey wrote: header('Content-Type: palin/text');
:lol: