Extracting data from an HTML string

Any questions involving matching text strings to patterns - the pattern is called a "regular expression."


welsh_sponger
Forum Newbie
Posts: 14
Joined: Fri Feb 03, 2006 7:46 am


Post by welsh_sponger »

Hello. I have just read through the helpful tutorial, but am still a little unsure...

If I extract some HTML to a string, I can match what I'm looking for using something like this:

Code:

// Fetch the page HTML into a string
$page = file_get_contents("http://www.somesite...");

// Escape the dot so it matches a literal "." rather than any character
if (preg_match('/13\.1/', $page)) {
    print("yes");
} else {
    print("no");
}
That works fine.

The problem I have is that the value 13.1 keeps changing, say every 30 minutes. How do I find it in the haze of HTML code and extract the new value each time? I'm assuming that its position on the page doesn't change, only the value.
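The usual way to pull out a value that changes is to anchor the pattern on the text around it that stays the same, and wrap the changing part in parentheses (a capture group). preg_match() then fills its optional third argument with the matches. A minimal sketch, assuming a made-up page fragment whose label `Current reading:` is hypothetical and stands in for whatever fixed text surrounds the number on the real page:

```php
<?php
// Hypothetical page fragment: the label stays the same, only the number changes.
$page = '<p>Current reading: 13.1 kts</p>';

// Anchor on the unchanging label, then capture the number.
// \d+ matches one or more digits; (?:\.\d+)? optionally matches a decimal part.
if (preg_match('/Current reading:\s*(\d+(?:\.\d+)?)/', $page, $matches)) {
    // $matches[0] is the whole match; $matches[1] is the first capture group.
    $value = $matches[1];
    print("found: $value\n");
} else {
    $value = null;
    print("no\n");
}
```

Unlike the yes/no test, this returns the text that matched, so it keeps working when 13.1 becomes some other number.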

Thanks in advance
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

Post more of the HTML around it so we can help you create a pattern to find it.

Post by welsh_sponger »

OK, it's something like this:

Code:

<TR><TD><A HREF="/show_plot.php?station=62303&meas=wspd&uom=E"><img alt="24-hour plot - Wind Speed" border=0 src="/images/graph04.gif"></A></TD><TD>Wind Speed (WSPD):</TD><TD>   20.0 kts</TD></TR>
So I'd like to get the 20.0 value.

Thanks

Post by feyd »

If you're getting this from a weather site, most of them have XML/RSS feeds of this data these days, making it easier to extract the information.
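A minimal sketch of the feed route using PHP's SimpleXML extension. The feed layout here is hypothetical: the `<observation>` and `<wind_speed>` element names and the `units` attribute are invented for illustration, and a real weather feed will define its own. With a live feed you would load it with simplexml_load_file(), or pass the result of file_get_contents() to simplexml_load_string().

```php
<?php
// Hypothetical feed layout; a real weather feed will use its own element names.
$xml = <<<XML
<observation>
  <station>62303</station>
  <wind_speed units="kts">20.0</wind_speed>
</observation>
XML;

// simplexml_load_string() returns a SimpleXMLElement, or false on malformed XML.
$obs = simplexml_load_string($xml);
if ($obs === false) {
    die("could not parse feed\n");
}

// Child elements are read as properties and attributes as array keys;
// cast them to get plain PHP values.
$speed = (float) $obs->wind_speed;
$units = (string) $obs->wind_speed['units'];
printf("%.1f %s\n", $speed, $units);
```

Because the value sits in its own named element, there is no pattern to keep in sync with the page's table markup.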

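Code:

// $html holds the fetched page, e.g. from file_get_contents()
$pattern = '#<\s*td[^>]*>\s*wind\s+speed[^<]*<\s*/\s*td[^>]*>\s*<\s*td[^>]*>\s*(\d+(?:\.\d+)?)[^\d<]*<\s*/\s*td[^>]*>#si';
preg_match($pattern, $html, $matches);
var_export($matches); // on success, $matches[1] holds the number, e.g. '20.0'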
That should work, but it is untested.
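Wrapped in a function and run against the table row posted earlier in the thread, the pattern can be sketched as follows. It assumes the quantifiers read `wind\s+speed` and `(\d+(?:\.\d+)?)`; `extract_wind_speed()` is a hypothetical helper name, and returning `null` when the row is missing is a choice made for this sketch.

```php
<?php
/**
 * Extract the wind speed from a table row like the one posted above.
 * Returns the value as a float, or null when the row is not found.
 * (extract_wind_speed is a hypothetical helper name for this sketch.)
 */
function extract_wind_speed(string $html): ?float
{
    $pattern = '#<\s*td[^>]*>\s*wind\s+speed[^<]*<\s*/\s*td[^>]*>'           // label cell
             . '\s*<\s*td[^>]*>\s*(\d+(?:\.\d+)?)[^\d<]*<\s*/\s*td[^>]*>#si'; // value cell
    if (preg_match($pattern, $html, $matches) !== 1) {
        return null;
    }
    // $matches[1] is the captured number, e.g. "20.0"
    return (float) $matches[1];
}

$html = '<TR><TD><A HREF="/show_plot.php?station=62303&meas=wspd&uom=E">'
      . '<img alt="24-hour plot - Wind Speed" border=0 src="/images/graph04.gif"></A></TD>'
      . '<TD>Wind Speed (WSPD):</TD><TD>   20.0 kts</TD></TR>';

var_dump(extract_wind_speed($html));
```

Check the result with `=== null`, since a calm reading of 0.0 compares loosely equal to null. If the live page writes the gap before the number as `&nbsp;` entities rather than plain spaces, `\s*` will not skip them, and that spot would need something like `(?:\s|&nbsp;)*` instead.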