split() or reg_ex()
Posted: Tue Nov 08, 2005 11:19 pm
by traffic
Hello...
I am downloading some public information about my community neighbors from the county assessor's office website...
If I have:
Code:
<span id="_ctl5_lblOwner">MAGUIRE TIM/CHERYL</span>
What is the best way to 'capture' (MAGUIRE TIM/CHERYL)...?
Also, the website's HTML is loaded into a variable: $htmlData
You can see 'sample output' at:
http://myPVHOA.com/curl.php
I was thinking about running through it line by line looking for a match on '_ctl5_lblOwner', then using split() on the opening span tag, then splitting the second portion on the closing span tag, leaving myself with the middle portion I was looking for...
I have about 25 of these little 'sections' that I want to download into a database, and thought there might be a better way to capture the data without using several loops...
Thank you for any help you can give me...
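The split-then-split plan can run on the whole $htmlData string at once, with no line-by-line loop. The sketch below uses explode(), which splits on a literal string rather than a regex (split() was deprecated in PHP 5.3 and removed in PHP 7). The helper name extract_by_split() and the sample HTML are hypothetical, and it assumes the tag appears exactly as `<span id="...">`.

```php
<?php
// Hypothetical helper: returns the text between <span id="$id"> and the next </span>,
// or null if either tag is missing. explode() splits on a literal string, not a regex.
function extract_by_split(string $html, string $id): ?string
{
    // Split once on the opening tag; $parts[1] is everything after it
    $parts = explode('<span id="' . $id . '">', $html, 2);
    if (count($parts) < 2) {
        return null; // opening tag not found
    }
    // Split once on the closing tag; $inner[0] is the value we want
    $inner = explode('</span>', $parts[1], 2);
    return count($inner) === 2 ? $inner[0] : null;
}

// Hypothetical sample standing in for the downloaded page
$htmlData = '<td><span id="_ctl5_lblOwner">MAGUIRE TIM/CHERYL</span></td>';
echo extract_by_split($htmlData, '_ctl5_lblOwner'), "\n";
```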
Posted: Tue Nov 08, 2005 11:32 pm
by m3mn0n
Posted: Tue Nov 08, 2005 11:37 pm
by traffic
What would you suggest Sami...?
Any help is very useful to me...
...
Posted: Tue Nov 08, 2005 11:52 pm
by redmonkey
split() also uses a regex as its delimiter pattern, so there is no real benefit there.
TBH, data extraction from strings is best done via the method you understand best. This particular example seems quite simple, but in many cases your pattern matching will need tweaking, and if you don't fully understand your extraction code this could take some time.
If it were me, without seeing the exact data you are dealing with, I'd probably opt for a simple preg_match() solution. It doesn't sound like you have too much data to churn through, so I very much doubt it's worth looking at the performance of regex vs. standard string functions.
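A minimal preg_match() sketch along those lines, assuming the owner span appears exactly as in the sample from the first post (the one-line $htmlData here is a stand-in for the downloaded page):

```php
<?php
// Stand-in for the real page HTML held in $htmlData
$htmlData = '<span id="_ctl5_lblOwner">MAGUIRE TIM/CHERYL</span>';

// (.*?) is a lazy capture, so it stops at the first </span> after the opening tag
if (preg_match('~<span id="_ctl5_lblOwner">(.*?)</span>~', $htmlData, $match)) {
    $owner = $match[1]; // group 1 holds only the text between the tags
    echo $owner, "\n";
} else {
    $owner = null;
    echo "owner not found\n";
}
```

Using `~` as the delimiter means the `/` in `</span>` does not need escaping.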
Posted: Wed Nov 09, 2005 12:01 am
by traffic
RedMonkey...
Thank you...
I am reading about preg_match() right now...
A sample of the data I am 'scraping' is found here:
http://myPVHOA.com/curl.php
I am trying to get a firm grip on the preg_match -->
http://www.php.net/manual/en/function.p ... ch-all.php
Would it be possible to see an easy example...?
Maybe something like this:
Code:
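<?php
// Single quotes around the subject avoid clashing with the double quotes in the HTML;
// the U modifier makes (.*) stop at the first </span>.
preg_match('|<span id="_ctl5_lblOwner">(.*)</span>|U',
           '<span id="_ctl5_lblOwner">MAGUIRE TIM/CHERYL</span>',
           $out);
echo $out[0] . "\n"; // the full match, tags included
echo $out[1] . "\n"; // group 1: MAGUIRE TIM/CHERYL
?>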
And what would be the best way to 'loop' through the site data in the variable --> $htmlData to search in each line...?
Thank you again for your help...
...
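On looping through $htmlData: one option, assuming every field follows the same `<span id="_ctlN_lblName">value</span>` pattern as the owner span, is a single preg_match_all() call over the whole string instead of a line-by-line loop. The sample HTML below, including the address value, is made up for illustration, and the `_lbl(\w+)` naming is an assumption based on the two ids seen in this thread.

```php
<?php
// Hypothetical sample; the address value is invented for illustration
$htmlData = '<span id="_ctl5_lblOwner">MAGUIRE TIM/CHERYL</span>'
          . "\n<td>\n"
          . '<span id="_ctl4_lblAddress">123 EXAMPLE ST</span>';

// Group 1: the label name after "_lbl"; group 2: the text inside the span.
// The s modifier lets . match newlines in case a value spans lines.
preg_match_all('~<span id="_ctl\d+_lbl(\w+)">(.*?)</span>~s', $htmlData, $matches, PREG_SET_ORDER);

$fields = array();
foreach ($matches as $m) {
    $fields[$m[1]] = $m[2]; // e.g. $fields['Owner'] = 'MAGUIRE TIM/CHERYL'
}
print_r($fields);
```

PREG_SET_ORDER keeps each match's captures together, which makes building the name => value array simple. If a value could contain a nested `</span>`, the lazy `(.*?)` would cut it short.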
Posted: Wed Nov 09, 2005 12:04 am
by yum-jelly
Maybe preg_ would be faster on your server, but in my testing it was slower than using several str_ functions on your example data!
Code:
<?php
$data = file_get_contents('junk.txt'); // string to look in
$new = 'lblOwner">';                   // marker just before the value
$end = '</span>';                      // marker just after the value
if (($pos = strpos($data, $new)) !== false)
{
    $start = $pos + strlen($new);          // first character of the value
    $stop  = strpos($data, $end, $start);  // position of the closing tag
    if ($stop !== false)
    {
        echo substr($data, $start, $stop - $start);
    }
}
else
{
    echo 'substr was not found';
}
?>
yj
Posted: Wed Nov 09, 2005 12:45 am
by traffic
Thank you yum-jelly...
I wound up using:
Code:
// Markers that appear immediately before each value we want
$search_string = array();
$search_string[0] = '_ctl5_lblOwner">';
$search_string[1] = '_ctl4_lblAddress">';
$numElements = count($search_string);
$end = '</span>';
$data = $htmlData;
for ($x = 0; $x < $numElements; $x++) {
    if (($pos = strpos($data, $search_string[$x])) !== false) {
        // The value starts right after the marker and ends at the next </span>
        $len = $pos + strlen($search_string[$x]);
        $new_data = substr($data, $len, strpos($data, $end, $len) - $len);
        echo "search_string[$x] => $new_data<br>";
    } else {
        echo "search_string[$x] was not found<br>";
    }
}
Of course, I will finish populating the $search_string array with the rest of the 'strings' to search by...
Thank you again...
...
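For the remaining ~25 markers, one way to keep it to a single loop is to key the markers by the database column they belong to, so the result is already shaped like a row. This is a sketch: the function name extract_fields(), the column names, and the null-when-missing behavior are assumptions, and unlike the loop above it also checks that the closing </span> exists.

```php
<?php
// Hypothetical column names mapped to the markers that precede each value
$markers = array(
    'owner'   => '_ctl5_lblOwner">',
    'address' => '_ctl4_lblAddress">',
    // the rest of the ~25 markers would go here
);

// Returns one value per column, or null when the marker or closing tag is missing
function extract_fields(string $html, array $markers, string $end = '</span>'): array
{
    $row = array();
    foreach ($markers as $column => $marker) {
        $row[$column] = null;
        $pos = strpos($html, $marker);
        if ($pos === false) {
            continue; // marker not on this page
        }
        $start = $pos + strlen($marker);
        $stop = strpos($html, $end, $start);
        if ($stop !== false) {
            $row[$column] = substr($html, $start, $stop - $start);
        }
    }
    return $row;
}

// Hypothetical sample standing in for the downloaded page
$htmlData = '<span id="_ctl5_lblOwner">MAGUIRE TIM/CHERYL</span>';
$row = extract_fields($htmlData, $markers);
print_r($row);
```

The resulting $row could then be bound to a prepared INSERT statement (for example with PDO) rather than concatenated into SQL.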
Posted: Wed Nov 09, 2005 12:52 am
by redmonkey
Off the top of my head I would go with something along the lines of....
Code:
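header('Content-Type: text/plain');
// $string holds the page HTML; (.*?) is lazy and the s modifier lets . match newlines
if (preg_match('/5_lblOwner">(.*?)<\/span.*?5_lblAddress">(.*?)<\/span/s', $string, $match))
{
    echo "Owner : {$match[1]}\x0a";   // \x0a is a newline
    echo "Address : {$match[2]}\x0a";
}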
Posted: Wed Nov 09, 2005 6:04 am
by n00b Saibot
redmonkey wrote:
header('Content-Type: palin/text');
