HTML regex problem

Posted: Sat Jun 16, 2007 7:35 pm
by afbase
I am trying to parse the value "GDP (current US$)" for the years 2000 and 2005 in the examples here and here.


The HTML I'm writing a regex for is:

Code:

GDP (current US$)                                                                                                                                                                                       </a></font></td>
				
				<td bgcolor="#ffffff" width="88" align="right"><font size="-2" face="Verdana,Tahoma,Arial,Helvetica">.. </font></td>
				
				<td bgcolor="#ffffff" width="88" align="right"><font size="-2" face="Verdana,Tahoma,Arial,Helvetica">7.3 billion </font></td>
And this is the pattern I made:

Code:

$pattern='&\(current US$\)\s*</a></font></td>\s*<td bgcolor="#ffffff" width="88" align="right">';
$pattern.='<font size="-2" face="Verdana,Tahoma,Arial,Helvetica">([^<]*)</font></td>\s*';
$pattern.='<td bgcolor="#ffffff" width="88" align="right"><font size="-2" face="Verdana,Tahoma,Arial,Helvetica">([^<]*)</font></td>&';

I get this array:

Code:

Array
(
    [0] => Array
        (
        )

    [1] => Array
        (
        )

    [2] => Array
        (
        )

)


Any ideas where I went wrong?

Posted: Sat Jun 16, 2007 7:55 pm
by feyd
Your pattern needs a bit more escaping.
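
To illustrate the escaping point, here is a minimal sketch (not necessarily what the poster ended up using) that lets preg_quote() escape the literal label. The variable names $label and $quoted are made up for this example; the '&' delimiter is the one used in the pattern above.

```php
<?php
// Literal text that has to appear verbatim in the page.
$label = '(current US$)';

// preg_quote() escapes regex metacharacters such as ( ) and $.
// The second argument also escapes the delimiter, '&' in this thread.
$quoted = preg_quote($label, '&');

echo $quoted, "\n"; // \(current US\$\)

// An unescaped '$' asserts end of subject, so this can never match here.
$unescaped = preg_match('&\(current US$\)&', 'GDP (current US$)');

// With the '$' escaped, the literal text matches.
$escaped = preg_match('&' . $quoted . '&', 'GDP (current US$)');

var_dump($unescaped); // int(0)
var_dump($escaped);   // int(1)
```

Running the literal parts of a pattern through preg_quote() avoids having to spot each metacharacter by hand.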

I do hope you have their permission to extract this information from their pages.

Posted: Sat Jun 16, 2007 10:21 pm
by afbase
I found it! The dollar sign needed escaping; I missed that one! According to their terms and conditions, what I'm doing is fine.
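
For anyone landing here later, a sketch of the pattern with the dollar sign escaped, run against the HTML from the first post (whitespace after the label shortened). The names $cell, $gdp2000 and $gdp2005 are invented for this example, and it assumes the first value column is 2000 and the second is 2005.

```php
<?php
// Sample markup from the first post; trailing spaces after the label shortened.
$html = <<<'HTML'
GDP (current US$)    </a></font></td>

				<td bgcolor="#ffffff" width="88" align="right"><font size="-2" face="Verdana,Tahoma,Arial,Helvetica">.. </font></td>

				<td bgcolor="#ffffff" width="88" align="right"><font size="-2" face="Verdana,Tahoma,Arial,Helvetica">7.3 billion </font></td>
HTML;

// One value cell; the same sub-pattern is reused for both columns.
$cell = '<td bgcolor="#ffffff" width="88" align="right">'
      . '<font size="-2" face="Verdana,Tahoma,Arial,Helvetica">([^<]*)</font></td>';

// Same pattern as in the first post, except '$' is now written as '\$'.
$pattern = '&\(current US\$\)\s*</a></font></td>\s*' . $cell . '\s*' . $cell . '&';

preg_match_all($pattern, $html, $matches);

// Assumption: first captured column is 2000, second is 2005.
$gdp2000 = trim($matches[1][0]);
$gdp2005 = trim($matches[2][0]);

echo $gdp2000, "\n"; // ..
echo $gdp2005, "\n"; // 7.3 billion
```

A pattern tied this closely to the markup will break as soon as the site changes an attribute, so it is worth checking that $matches[0] is non-empty before reading the captures.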