my first try at regex, and I'm stuck :-P

afbase
Forum Contributor
Posts: 113
Joined: Tue Aug 15, 2006 1:29 pm
Location: SoCAL!!!!

my first try at regex, and I'm stuck :-P

Post by afbase »

OK, I'm cURL'ing MSN Money pages and using regex to collect data off the pages. An example of the HTML source I'm selecting from:

Code: Select all

P/E</td><td class="cl1">6.20
I want my regex to select the number "6.20" from the string... the proper syntax for this would be (I think):

Code: Select all

\d+\.\d*
My regex code to select this from the MSN page is:

Code: Select all

$result = curl_exec($ch);
curl_close($ch);
$pattern='#\w\W\w\W+\w+\W+\w+\s\w+\W+\w+\W+(\d+\W\d*)#i';
preg_match($pattern,$result,$match);
print_r($match);
afbase
Forum Contributor
Posts: 113
Joined: Tue Aug 15, 2006 1:29 pm
Location: SoCAL!!!!

oops my returned code

Post by afbase »

This is what the code spits out

Code: Select all

Array ( [0] => 0 s 0 v 0 l 0) [1] => 0) )
volka
DevNet Evangelist
Posts: 8391
Joined: Tue May 07, 2002 9:48 am
Location: Berlin, ger

Post by volka »

\w\W\w\W+\w+\W+\why are there so many \w\W in your pattern?

Code: Select all

$subject = 'P/E</td><td class="cl1">6.20';
$pattern = '/<td class="cl1">([\d+.]+)/';

preg_match($pattern, $subject, $matches);
echo $matches[1];
kaisellgren
DevNet Resident
Posts: 1675
Joined: Sat Jan 07, 2006 5:52 am
Location: Lahti, Finland.

Post by kaisellgren »

Not tested...

Code: Select all

preg_match("/>(\d+(\.\d+)?)/i",$str,$matches);
echo $matches[1];
afbase
Forum Contributor
Posts: 113
Joined: Tue Aug 15, 2006 1:29 pm
Location: SoCAL!!!!

P/E

Post by afbase »

The "P/E" is the critical identifier, not so much the "<td class='cl1'>". If I try either code, it just returns incorrect data: instead of the P/E ratio, it returns the previous day's close.

I just realized that some stocks do not have a number. How can I get preg_match() to select the P/E ratio as either a number or "NA"?
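
One possible way to handle both cases, sketched here under the assumption that the cell holds either a decimal number or the literal text NA: anchor the pattern on the P/E label itself and put an alternation inside the capture group. The helper name `pe_from_html` and the sample rows are made up for illustration, and the type hints assume PHP 7.1 or later.

```php
<?php
// Hypothetical helper: returns the P/E cell as a string ("6.20" or "NA"),
// or null when the label/cell pair is not found in the HTML.
function pe_from_html(string $html): ?string
{
    // Literal 'P/E</td><td class="cl1">' followed by a number or NA.
    $pattern = '#P/E</td><td class="cl1">(\d+(?:\.\d+)?|NA)#';
    if (preg_match($pattern, $html, $match) === 1) {
        return $match[1];
    }
    return null;
}

// Made-up sample rows modeled on the snippet quoted earlier in the thread.
$samples = [
    'P/E</td><td class="cl1">6.20',
    'P/E</td><td class="cl1">NA',
    'Previous Close</td><td class="cl1">28.60',
];
foreach ($samples as $html) {
    var_dump(pe_from_html($html));
}
```

Because the label is part of the pattern, a row such as the previous day's close is skipped rather than matched by accident.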
afbase
Forum Contributor
Posts: 113
Joined: Tue Aug 15, 2006 1:29 pm
Location: SoCAL!!!!

Almost Solved!!!!

Post by afbase »

feyd | Please use [code], [php] and [syntax="..."] tags where appropriate when posting code. Your post has been edited to reflect how we'd like it posted. Please read: Posting Code in the Forums (http://forums.devnetwork.net/viewtopic.php?t=21171) to learn how to do it too.


here is my code now:

Code: Select all

$result = curl_exec($ch);
curl_close($ch);
$pattern='#[P\E]</td><td class="cl1">([\d+.]+|\w+)#';
preg_match($pattern,$result,$match);
print_r($match);


It properly selects the P/E ratio, but it isn't a very clean selection. This is what it returns for an "NA" value:

Code: Select all

Array ( [0] => ENA [1] => NA )
And for a number:

Code: Select all

Array ( [0] => E28.60 [1] => 28.60 )


These outputs are just examples. I could keep using this regex, but I really, really don't want $match[0]!!! I'm going to put this into a function that will loop over 1000 times, and carrying extra data through that many iterations doesn't seem like healthy coding.
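
preg_match() always stores the full match at index 0, so the simplest option is to read `$match[1]` and ignore the rest; the extra element is unlikely to matter over a thousand iterations. If the array really should hold only the value, here is a sketch using `\K`, which resets the start of the reported match. It assumes a PCRE version that supports `\K` (7.2 or later), and the sample string is made up.

```php
<?php
// Made-up sample modeled on the HTML quoted in the thread.
$result = '<tr><td>P/E</td><td class="cl1">28.60</td></tr>';

// \K discards everything matched so far from the reported match,
// so index 0 contains only the cell text and no capture group is needed.
$pattern = '#P/E</td><td class="cl1">\K[^<]+#';

$pe = preg_match($pattern, $result, $match) === 1 ? $match[0] : null;
print_r($match);   // the array has a single element: the cell text
echo $pe, "\n";
```

A lookbehind would work in a similar way, but `\K` avoids repeating the prefix inside an assertion.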


volka
DevNet Evangelist
Posts: 8391
Joined: Tue May 07, 2002 9:48 am
Location: Berlin, ger

Re: P/E

Post by volka »

afbase wrote: The "P/E" is the critical identifier, not so much the "<td class='cl1'>". If I try either code, it just returns incorrect data: instead of the P/E ratio, it returns the previous day's close.
Then give us more data. I don't know "MSN money pages"; do they have, e.g., a URL? Something to test on?
afbase
Forum Contributor
Posts: 113
Joined: Tue Aug 15, 2006 1:29 pm
Location: SoCAL!!!!

MSN Money

Post by afbase »

The following two links are examples of the pages that I'm cURL'ing:
http://moneycentral.msn.com/detail/stoc ... Symbol=wgo
http://moneycentral.msn.com/detail/stoc ... ymbol=zoom

I am trying to capture the P/E ratio displayed on the far right of the main table of information. It will either display NA or a number and I've given you links to these two types of examples.

The specific piece that I'm looking for is on line 86, column 3918 of the source code.

This is my code so far:

Code: Select all

$url = "http://moneycentral.msn.com/detail/stock_quote?ipage=qd&Symbol=US%3A".$_GET['ticker'];
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,$url);
curl_setopt($ch, CURLOPT_FAILONERROR, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch, CURLOPT_TIMEOUT, 3);
$result = curl_exec($ch);
curl_close($ch);
$pattern='#[P\][E]</td><td class="cl1">([\d+.]+|\w+)#i';
preg_match($pattern,$result,$match);
print_r($match);
volka
DevNet Evangelist
Posts: 8391
Joined: Tue May 07, 2002 9:48 am
Location: Berlin, ger

Post by volka »

Code: Select all

$testdata = array(
		'http://moneycentral.msn.com/detail/stock_quote?Symbol=wgo',
		'http://moneycentral.msn.com/detail/stock_quote?Symbol=zoom'
	);
$pattern = '!<tr><td>P/E</td><td class="cl1">([^<]*)</td></tr>!';


foreach($testdata as $url) {
	$subject = file_get_contents($url);
	// preg_match() returns 1 only when the pattern matched; fall back to 'n/a' otherwise
	$pe = preg_match($pattern, $subject, $matches) ? $matches[1] : 'n/a';
	echo $url, ' -> ', $pe, "<br />\n";
}
Works fine for me.
Whether you use URL wrappers or cURL doesn't matter; they both return a string.
afbase
Forum Contributor
Posts: 113
Joined: Tue Aug 15, 2006 1:29 pm
Location: SoCAL!!!!

volka thanks!!

Post by afbase »

Thanks for your help!!!!! The pattern you gave me actually shed some light on how to write regex patterns better. I had to modify your pattern a little bit, though: I forgot that special stocks that have gone through bankruptcies or have pro forma earnings, like Lockheed Martin (LMT), have FYI/pseudo P/E ratios displayed on MSN. So here is the final script, if you are curious:

Code: Select all

<?php
$url = "http://moneycentral.msn.com/detail/stock_quote?ipage=qd&Symbol=US%3A".$_GET['ticker'];
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,$url);
curl_setopt($ch, CURLOPT_FAILONERROR, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch, CURLOPT_TIMEOUT, 3);
$result = curl_exec($ch);
curl_close($ch);
$pattern='!P/E</td><td class="cl1">([^<]*)</td></tr>!';
preg_match($pattern,$result,$match);
print_r($match);
?>

I'm going to stick with the cURL script and borrow your pattern. According to some posts on php.net, cURL retrieves pages faster when looping, though I'm not exactly sure why. Eventually I'm going to run this script in a loop.
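
A rough sketch of how that loop might be organized, assuming PHP 7.1 or later: the parsing is split from the fetching so the pattern can be checked without network access, and one cURL handle is reused for every request. The function names `parse_pe` and `fetch_pe_ratios` are hypothetical; the URL, pattern, and cURL options are the ones from the script above.

```php
<?php
// Hypothetical helper: pull the P/E cell text out of a page, or null if absent.
function parse_pe(string $html): ?string
{
    $pattern = '!P/E</td><td class="cl1">([^<]*)</td></tr>!';
    return preg_match($pattern, $html, $match) === 1 ? trim($match[1]) : null;
}

// Hypothetical helper: fetch each ticker's page with one reused cURL handle.
function fetch_pe_ratios(array $tickers): array
{
    $base = 'http://moneycentral.msn.com/detail/stock_quote?ipage=qd&Symbol=US%3A';
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_FAILONERROR, 1);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_TIMEOUT, 3);

    $ratios = [];
    foreach ($tickers as $ticker) {
        // urlencode() keeps unexpected characters in a ticker out of the URL.
        curl_setopt($ch, CURLOPT_URL, $base . urlencode($ticker));
        $html = curl_exec($ch);
        // curl_exec() returns false on failure; record null for that ticker.
        $ratios[$ticker] = is_string($html) ? parse_pe($html) : null;
    }
    curl_close($ch);
    return $ratios;
}

// Offline check of the parser on a made-up row; no request is sent here.
var_dump(parse_pe('<tr><td>P/E</td><td class="cl1">NA</td></tr>'));
```

Calling `fetch_pe_ratios(['wgo', 'zoom'])` would then return an array keyed by ticker, with null for any page that failed to load or had no P/E row.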