HTML regex problem

Post by afbase »

I am trying to parse the value of "GDP (current US$)" for the years 2000 and 2005 from the examples here and here.


The HTML I'm writing a regex for is:

Code:

GDP (current US$)                                                                                                                                                                                       </a></font></td>
				
				<td bgcolor="#ffffff" width="88" align="right"><font size="-2" face="Verdana,Tahoma,Arial,Helvetica">.. </font></td>
				
				<td bgcolor="#ffffff" width="88" align="right"><font size="-2" face="Verdana,Tahoma,Arial,Helvetica">7.3 billion </font></td>
And this is the pattern I made:

Code:

$pattern='&\(current US$\)\s*</a></font></td>\s*<td bgcolor="#ffffff" width="88" align="right">';
$pattern.='<font size="-2" face="Verdana,Tahoma,Arial,Helvetica">([^<]*)</font></td>\s*';
$pattern.='<td bgcolor="#ffffff" width="88" align="right"><font size="-2" face="Verdana,Tahoma,Arial,Helvetica">([^<]*)</font></td>&';

I get this array:

Code:

Array
(
    [0] => Array
        (
        )

    [1] => Array
        (
        )

    [2] => Array
        (
        )

)


Any ideas where I went wrong?

Post by feyd »

Your pattern needs a bit more escaping.

I do hope you have their permission to extract this information from their pages.
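
One way to spot which characters in a literal need escaping is to run the literal through preg_quote() and compare the result with what was typed by hand. This is only a sketch: the variable names are made up for illustration, and '&' is passed as the delimiter because that is what the pattern in the first post uses.

```php
<?php
// The literal label we want to match exactly (taken from the first post).
$literal = 'GDP (current US$)';

// preg_quote() backslash-escapes every regex metacharacter in the string.
// The second argument is the pattern delimiter, '&' in the original pattern.
$quoted = preg_quote($literal, '&');

echo $quoted, "\n"; // GDP \(current US\$\)
```

Comparing that output with the hand-written `\(current US$\)` shows the parentheses were escaped but the dollar sign was not.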

Post by afbase »

I found it! The dollar sign needed escaping; I missed that one! According to their terms and conditions, what I'm doing is fine.
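
For anyone who lands here later, this is roughly what the corrected pattern looks like. Treat it as a sketch rather than the exact final script: $html holds only the snippet from the first post (with the run of spaces shortened), and $cell, $firstCell and $secondCell are names introduced here for readability. An unescaped $ is an end-of-subject anchor in a regex, so `US$\)` could never match; `US\$` matches a literal dollar sign.

```php
<?php
// Sample markup from the first post (the long run of spaces is shortened).
$html = <<<'HTML'
GDP (current US$)          </a></font></td>

				<td bgcolor="#ffffff" width="88" align="right"><font size="-2" face="Verdana,Tahoma,Arial,Helvetica">.. </font></td>

				<td bgcolor="#ffffff" width="88" align="right"><font size="-2" face="Verdana,Tahoma,Arial,Helvetica">7.3 billion </font></td>
HTML;

// One table cell; the capture group grabs the text inside <font>...</font>.
$cell = '<td bgcolor="#ffffff" width="88" align="right">'
      . '<font size="-2" face="Verdana,Tahoma,Arial,Helvetica">([^<]*)</font></td>';

// Same pattern as before, except the dollar sign is now escaped (US\$).
$pattern = '&\(current US\$\)\s*</a></font></td>\s*' . $cell . '\s*' . $cell . '&';

preg_match_all($pattern, $html, $matches);

// Each captured value carries a trailing space, so trim it.
$firstCell  = trim($matches[1][0]);
$secondCell = trim($matches[2][0]);

echo $firstCell, "\n";  // ..
echo $secondCell, "\n"; // 7.3 billion
```

Which year each cell belongs to depends on the column order of the page, so check that against the real table before labelling the values.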