<?php
$q = "Who is the richest man in the world";
// urlencode() turns spaces into "+" and also escapes any other special characters
$url = "http://www.google.com/search?hl=en&q=" . urlencode($q);
echo $url.'<br>';
$str = file_get_contents($url);
preg_match("/(Results.*?<b>1<\/b>.*?- <b>.*?<\/b> of about <b>)(.*?)(<\/b> for <b>)/is", $str, $matches);
// each() was removed in PHP 8; foreach walks the match groups the same way
foreach ($matches as $v)
{
echo htmlentities($v).'<br>';
}
?>
http://www.google.com/search?hl=en&q=Who+is+the+richest+man+in+the+world
Results <b>1</b> - <b>10</b> of about <b>1,370,000</b> for <b>
Results <b>1</b> - <b>10</b> of about <b>
1,370,000
</b> for <b>
Type http://www.google.com/search?hl=en&q=Wh ... +the+world into your browser and you'll see "Results 1 - 10 of about 800,000".
I searched for 1,370,000 in the HTML of the Google result page in my browser and found nothing. I also searched for 800,000 in $str and found nothing.
Does anyone know how this is happening?
Thanks
http://www.google.com/search?hl=en&q=Who+is+the+richest+man+in+the+world
Results <b>1</b> - <b>10</b> of about <b>779,000</b> for <b>
Results <b>1</b> - <b>10</b> of about <b>
779,000
</b> for <b>
Am I misunderstanding what you're after? Those figures are correct.
Why are you outputting the htmlentities of it by the way?
All I want is the total number of results, nothing more. But when I cross-check it, the number is totally different.
Now I'm getting 769,000.
I'm doing the same thing with MSN and it always gives the correct number when checked manually. It's only Google that keeps giving me different results.
And BTW, the PHP code on my web host showed 1,370,000, and when I type the URL into my browser manually I get 800,000. I couldn't try it on my localhost because the script exceeds 30 seconds there.
I'm outputting htmlentities just to show the raw data matched by the regexp.
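If the 30-second cutoff on localhost is PHP's max_execution_time (an assumption; 30 is that setting's default), one option is to cap how long the HTTP request may wait so the script fails fast instead of being killed. A minimal sketch; make_fetch_context and fetch_page are hypothetical names, not part of the code posted above:

```php
<?php
// Hypothetical helper: builds a stream context whose HTTP reads
// give up after $seconds.
function make_fetch_context(float $seconds) {
    return stream_context_create([
        'http' => [
            'timeout' => $seconds, // read timeout, in seconds
        ],
    ]);
}

// Hypothetical helper: returns the page body, or null if the fetch failed
// or timed out. The @ hides the warning file_get_contents() raises on failure.
function fetch_page(string $url, float $seconds = 10.0): ?string {
    $body = @file_get_contents($url, false, make_fetch_context($seconds));
    return $body === false ? null : $body;
}
```

Calling set_time_limit() at the top of the script is the other knob, since it raises PHP's own execution limit rather than shortening the request.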
<?php
$q="Who is the richest man in the world";
$url="http://www.google.com/search?hl=en&q=".str_replace(" ","+",$q);
echo '<b>'.$url.'</b> returns<br>';
$str=file_get_contents($url);
//preg_match("/(Results.*?<b>1<\/b>.*?- <b>.*?<\/b> of about <b>)(.*?)(<\/b> for <b>)/is",$str,$matches); //Original regexp
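// Only print the count when the pattern matched; otherwise $matches[1] is undefined
if (preg_match('/Results <b>\d+<\/b> - <b>\d+<\/b> of about <b>((\d|\,)+?)<\/b> for <b>/is', $str, $matches)) {
echo $matches[1].' results.';
} else {
echo 'No result count found.';
}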
?>
This IS the correct regexp, although, as you mention, it does sometimes return a wonky number. The only explanation I can think of is that Google itself sometimes returns a wonky number.
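To compare counts between runs it helps to have them as integers rather than strings like "1,370,000". A minimal sketch, assuming the result line keeps the format shown in the output earlier in the thread; parse_result_count is a hypothetical helper name, and the character class [\d,] is a simplified stand-in for the (\d|\,) group:

```php
<?php
// Hypothetical helper, not from the thread: returns the result count as an
// int, or null when the "Results ... of about" line is not found.
function parse_result_count(string $html): ?int {
    $pattern = '/Results <b>\d+<\/b> - <b>\d+<\/b> of about <b>([\d,]+)<\/b> for <b>/i';
    if (!preg_match($pattern, $html, $m)) {
        return null;
    }
    // Strip the thousands separators before casting: "1,370,000" -> 1370000
    return (int) str_replace(',', '', $m[1]);
}

// Sample line copied from the output posted earlier in the thread
$sample = 'Results <b>1</b> - <b>10</b> of about <b>1,370,000</b> for <b>';
var_dump(parse_result_count($sample));                       // int(1370000)
var_dump(parse_result_count('<html>no count here</html>'));  // NULL
```

Returning null instead of echoing an empty string makes a failed match easy to tell apart from a real count.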
Last edited by Chris Corbyn on Sun Mar 13, 2005 8:44 am, edited 1 time in total.
My own PC returns 779,000 70% of the time and 1,440,000 the rest of the time. I'm thinking about it, but I really don't see why PHP code, or a regexp for that matter, could be so inconsistent. I'm going to keep testing it on Google itself, without this script, and if Google throws a wobbler on me I'll put it down to that.
I'm refreshing the google page over and over and the only result count I can ever get is 1,440,000. I'm completely mystified by this. I don't know how google works so the only thing I can think is that somehow the PHP script is reading the data midway through a results count, but I can't see how it's possible since google should parse this info on the server.
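Rather than refreshing by hand, the "70% of the time" observation can be measured by running the query several times and tallying each distinct count. A rough sketch; tally_counts is a hypothetical helper, and the canned values below only replay numbers quoted in this thread so the example runs without hitting Google:

```php
<?php
// Hypothetical helper: calls $fetchCount $runs times and tallies how often
// each distinct value comes back. $fetchCount is any callable that returns
// a count as an int, or null when no count could be extracted.
function tally_counts(callable $fetchCount, int $runs): array {
    $tally = [];
    for ($i = 0; $i < $runs; $i++) {
        $count = $fetchCount();
        $key = $count === null ? 'no match' : (string) $count;
        $tally[$key] = ($tally[$key] ?? 0) + 1;
    }
    arsort($tally); // most frequent value first
    return $tally;
}

// Stand-in for a real fetch: replays canned values so the sketch runs offline.
$canned = [779000, 779000, 1440000, 779000];
$fake = function () use (&$canned) {
    return array_shift($canned);
};
print_r(tally_counts($fake, 4));
```

In real use the callable would fetch the page and extract the count, ideally with a pause between requests.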
Maybe Google is doing its monthly crawl? That does affect results quite a bit.
EDIT: Spent the past 5 minutes repeatedly running the query in Google itself and via the script. Both are now returning 1,440,000 100% of the time. I guess this was a minor glitch in the Google system (probably due to the bot doing its crawl).
d11wtq wrote: but I can't see how it's possible since google should parse this info on the server.
Exactly - I'm not able to see any problem with the code so far. After all, the number of results is the same for MSN, Yahoo and Altavista - I checked. It's just Google that's googling around.
But Google may show different results based on location. They have a separate search for each country, like google.co.in, so maybe it sometimes checks for local results too, even when given .com?