
parsing content from html

Posted: Fri May 02, 2008 12:40 am
by SidewinderX
I am trying to parse the contents of a URL. I first use cURL to connect to the URL and store the content returned by curl_exec in a variable ($content). Then, using sscanf, I try to parse a number from a section of the HTML. My current code is below:

<?php
$ch = curl_init();
 
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_URL, $url); //$url is defined elsewhere
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
 
$content = curl_exec($ch);
curl_close($ch);
 
sscanf($content, '<span id="Stats_lbl16">%s</span>', $xp);
 
echo $xp;
 
?>
The current code displays nothing. I know the sscanf statement is correct, because the following code works:

$content = '<span id="Stats_lbl16">3244893</span>'; //This line is in the source code of the website I am trying to parse.
sscanf($content, '<span id="Stats_lbl16">%s</span>', $xp);
echo $xp;
Moreover, I am fairly confident the cURL part works too, because when I simply echo $content, it displays the page.
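To check the cURL step on its own, it helps to test the return value of curl_exec, which is false on failure (curl_error then says why). A minimal sketch, assuming a hypothetical helper name fetch_page; unlike the code above, it leaves SSL peer verification at its default (enabled).

```php
<?php
// Hypothetical helper: fetch a URL with cURL and fail loudly, instead of
// passing false or an empty string on to the parsing step.
function fetch_page(string $url): string
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body as a string

    $content = curl_exec($ch);
    if ($content === false) {
        // curl_error() describes the most recent failure on this handle.
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }

    // The handle is released when $ch goes out of scope.
    return $content;
}
```

When viewing the result in a browser, echo htmlspecialchars($content); shows the markup as literal text, which makes it easy to see whether the tags arrive as <span ...> or as &lt;span ...&gt;.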

I believe the problem is with $content, something along the lines of > being converted to &gt;. I'm not exactly sure; does anyone have any insight on this?

Thank you
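A likely explanation that does not involve entity conversion: sscanf does not search its input. It matches the format against the string starting at the first character and stops at the first literal that differs, so on a full page (which starts with a doctype or <html>, not with the span) nothing is assigned. Separately, %s reads up to the next whitespace, so in the working test the variable actually holds 3244893</span>; the browser just hides the tag. A small sketch with a made-up stand-in page shows both effects:

```php
<?php
// Same format string as in the question.
$format = '<span id="Stats_lbl16">%s</span>';

// Hypothetical stand-in for the fetched page: the span is NOT at the start.
$page = "<html><body>\n<p>Stats</p>\n<span id=\"Stats_lbl16\">3244893</span>\n</body></html>";

// sscanf matches from the first character: '<' matches, then "span" vs "html"
// differs, so scanning stops and $fromPage is never assigned.
$fromPage = null;
sscanf($page, $format, $fromPage);
var_dump($fromPage); // NULL

// When the string starts with the span, the match succeeds, but %s reads up
// to the next whitespace, so the closing tag is swallowed into the value.
$fromSnippet = null;
sscanf('<span id="Stats_lbl16">3244893</span>', $format, $fromSnippet);
var_dump($fromSnippet); // string(14) "3244893</span>"
```

Echoing $fromSnippet in a browser renders only 3244893 because </span> is treated as markup, which is why the standalone test appeared to work.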

Re: parsing content from html

Posted: Fri May 02, 2008 4:25 am
by youscript
You can try adding

$content = htmlspecialchars_decode($content);
after

$content = curl_exec($ch);
curl_close($ch);
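htmlspecialchars_decode only turns entities such as &lt; and &gt; back into characters; markup that is already raw passes through unchanged. So it can only help if the fetched source literally contains &lt;span ...&gt;, whereas a page fetched with cURL normally contains the tags themselves. A quick check, using made-up sample strings:

```php
<?php
$raw     = '<span id="Stats_lbl16">3244893</span>';
$escaped = '&lt;span id=&quot;Stats_lbl16&quot;&gt;3244893&lt;/span&gt;';

// Entities are converted back to the characters they stand for...
var_dump(htmlspecialchars_decode($escaped) === $raw); // bool(true)

// ...while a string that already contains real tags is left as it was.
var_dump(htmlspecialchars_decode($raw) === $raw);     // bool(true)
```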

Re: parsing content from html

Posted: Fri May 02, 2008 12:24 pm
by SidewinderX
Nope, that doesn't appear to work.

Re: parsing content from html

Posted: Fri May 02, 2008 6:03 pm
by SidewinderX
Solution: sscanf only matches from the start of the string, whereas preg_match searches the whole page:

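preg_match('/<span id="Stats_lbl16">(\d+)<\/span>/', $content, $xp); // (\d+) captures the digits
echo $xp[1]; // $xp[0] is the whole <span> tag, $xp[1] is the number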
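Wrapped as a function with a capture group, the same preg_match approach can return just the number, or null when the span is missing. A minimal sketch, assuming a hypothetical helper named extract_span_number and that the span has exactly this attribute layout and contains only digits:

```php
<?php
// Hypothetical helper: return the digits inside the span with the given id,
// or null if no such span is found. preg_match searches the whole string,
// unlike sscanf, which anchors at the first character.
function extract_span_number(string $html, string $id): ?string
{
    // preg_quote escapes regex metacharacters in the id; '/' is the delimiter.
    $pattern = '/<span id="' . preg_quote($id, '/') . '">(\d+)<\/span>/';

    if (preg_match($pattern, $html, $matches) === 1) {
        return $matches[1]; // [0] is the whole tag, [1] the captured digits
    }
    return null;
}

// Made-up sample page for illustration.
$content = "<html><body>\n<span id=\"Stats_lbl16\">3244893</span>\n</body></html>";
var_dump(extract_span_number($content, 'Stats_lbl16')); // string(7) "3244893"
var_dump(extract_span_number($content, 'Stats_lbl99')); // NULL
```

A regex like this breaks if the site reorders attributes or adds whitespace inside the tag; an HTML parser (DOMDocument with DOMXPath) is the more robust option in that case.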