Getting Total No. of Results from a Google Query

Posted: Sun Mar 13, 2005 6:48 am
by anjanesh
This code gets the total number of results for a query from Google.

Code:

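<?php
$q="Who is the richest man in the world";
// urlencode() turns spaces into "+" and also escapes any other special characters
$url="http://www.google.com/search?hl=en&q=".urlencode($q);
echo $url.'<br>';
$str=file_get_contents($url);
// $matches[0] is the whole match; [1], [2] and [3] are the three groups
preg_match("/(Results.*?<b>1<\/b>.*?- <b>.*?<\/b> of about <b>)(.*?)(<\/b> for <b>)/is",$str,$matches);
// foreach instead of each(), which was removed in PHP 8
foreach ($matches as $v)
 {
 	echo htmlentities($v).'<br>';
 }
?>
Output: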

Code:

http://www.google.com/search?hl=en&amp;q=Who+is+the+richest+man+in+the+world
Results &lt;b&gt;1&lt;/b&gt; - &lt;b&gt;10&lt;/b&gt; of about &lt;b&gt;1,370,000&lt;/b&gt; for &lt;b&gt;
Results &lt;b&gt;1&lt;/b&gt; - &lt;b&gt;10&lt;/b&gt; of about &lt;b&gt;
1,370,000
&lt;/b&gt; for &lt;b&gt;
Type http://www.google.com/search?hl=en&q=Wh ... +the+world into your browser and you'll see "Results 1 - 10 of about 800,000".
I searched for 1,370,000 in the HTML source of the Google results page but found nothing. I also searched for 800,000 in $str and found nothing.
Does anyone know why this is happening?
Thanks

Posted: Sun Mar 13, 2005 7:28 am
by Chris Corbyn
No, I get:

Code:

http://www.google.com/search?hl=en&q=Who+is+the+richest+man+in+the+world
Results <b>1</b> - <b>10</b> of about <b>779,000</b> for <b>
Results <b>1</b> - <b>10</b> of about <b>
779,000
</b> for <b>
Am I misunderstanding what you're after? Those figures are correct.

Why are you outputting the htmlentities of it, by the way?

Posted: Sun Mar 13, 2005 7:43 am
by anjanesh
All I want is the total number of results, nothing more. But when I cross-check, it's totally different.
Now I'm getting 769,000.
I'm doing the same thing with MSN and it always gives the correct number when checked against a manual search. It's only Google that keeps giving me different results.
And BTW, the PHP code on my web host showed 1,370,000, and when I type the query in manually here I get 800,000. I couldn't try it on my localhost because the script runs past 30 seconds.
I'm outputting htmlentities just to show the real data matched by the regexp.
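To illustrate the htmlentities point, here is a minimal sketch. The sample string is copied from the output posted earlier in this thread rather than fetched live, and the variable names are only for illustration.

```php
<?php
// Sample match text taken from the output posted earlier in this thread.
$match = '<b>779,000</b>';

// Echoed as-is, a browser interprets the tags and shows only a bold 779,000.
echo $match . "\n";

// Escaped, the browser displays the literal markup that the regexp matched.
$escaped = htmlentities($match);
echo $escaped . "\n"; // &lt;b&gt;779,000&lt;/b&gt;
```

So the escaped version is only useful for debugging what the regexp captured; it isn't needed once only the number is extracted.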

Posted: Sun Mar 13, 2005 8:07 am
by Chris Corbyn
So you want your regexp to just extract the number? Nothing else?

Posted: Sun Mar 13, 2005 8:30 am
by Chris Corbyn
Try this...

Code:

<?php
$q="Who is the richest man in the world";
$url="http://www.google.com/search?hl=en&q=".urlencode($q);
echo '<b>'.$url.'</b> returns<br>';
$str=file_get_contents($url);
//preg_match("/(Results.*?<b>1<\/b>.*?- <b>.*?<\/b> of about <b>)(.*?)(<\/b> for <b>)/is",$str,$matches); //Original regexp
// Capture only the digits and commas of the total
if (preg_match('/Results <b>\d+<\/b> - <b>\d+<\/b> of about <b>([\d,]+)<\/b> for <b>/is', $str, $matches))
 {
 	echo $matches[1].' results.';
 }
else
 {
 	echo 'No result count found.';
 }
?>
This IS the correct regexp, although, as you mention, it does sometimes return a wonky number. The only thing I can think of is that Google itself sometimes returns a wonky number.
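For anyone who wants to test the pattern without hitting Google each time, here is a minimal sketch that runs a pattern along the lines of the one in this post against a sample line copied from the output earlier in the thread. The function name extractResultCount is hypothetical, and converting the count to an integer is an extra step that was not in the original code.

```php
<?php
// Sample line copied from output posted earlier in this thread; not a live response.
$sample = 'Results <b>1</b> - <b>10</b> of about <b>1,370,000</b> for <b>';

/**
 * Hypothetical helper: returns the result count as an int,
 * or null when the pattern is not found in $html.
 */
function extractResultCount(string $html): ?int
{
    $pattern = '/Results <b>\d+<\/b> - <b>\d+<\/b> of about <b>([\d,]+)<\/b> for <b>/i';
    if (preg_match($pattern, $html, $m) !== 1) {
        return null;
    }
    // Drop the thousands separators so the value can be compared numerically.
    return (int) str_replace(',', '', $m[1]);
}

var_dump(extractResultCount($sample));              // int(1370000)
var_dump(extractResultCount('no count in here'));   // NULL
```

Returning null on a failed match makes it easier to tell "Google changed its markup" apart from "Google returned a different number".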

Posted: Sun Mar 13, 2005 8:43 am
by anjanesh
d11 - there's something wrong with the host, I think.
Your code's output on my host:

Code:

http://www.google.com/search?hl=en&q=Who+is+the+richest+man+in+the+world returns
1,370,000 results.
On localhost:

Code:

http://www.google.com/search?hl=en&q=Who+is+the+richest+man+in+the+world returns
800,000 results.

Posted: Sun Mar 13, 2005 8:47 am
by Chris Corbyn
It's not your host, I think it's Google.

My own PC returns 779,000 about 70% of the time and 1,440,000 the rest of the time. I'm thinking about it, but I really don't see why PHP code, or a regexp for that matter, could be so inconsistent. I'm gonna keep testing it on Google itself, without this script, and if Google throws a wobbler on me I'll put it down to that.

I'm refreshing the Google page over and over and the only result count I can ever get is 1,440,000. I'm completely mystified by this. I don't know how Google works, so the only thing I can think of is that somehow the PHP script is reading the data midway through a results count, but I can't see how that's possible since Google should generate this info on the server.

Posted: Sun Mar 13, 2005 8:54 am
by Chris Corbyn
Maybe Google is doing its monthly crawl? That does affect results quite a bit.

EDIT: Spent the past 5 mins repeatedly running the query in Google itself and via the script. Both are now returning 1,440,000 100% of the time. I guess this was a minor glitch in the Google system (probably due to the bot doing its crawl).
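The repeated-run check can be sketched roughly as below. This is only an illustration: tallyCounts is a hypothetical helper, and the stub stands in for the real fetch-and-regexp step so the sketch runs without contacting Google. The two values come from this thread, but the order in which the stub returns them is made up.

```php
<?php
/**
 * Hypothetical helper: calls $fetchCount $runs times and tallies
 * how often each result count comes back.
 */
function tallyCounts(callable $fetchCount, int $runs): array
{
    $tally = [];
    for ($i = 0; $i < $runs; $i++) {
        $count = $fetchCount();
        // Format with thousands separators, e.g. 779000 becomes "779,000".
        $key = $count === null ? 'no match' : number_format($count);
        $tally[$key] = ($tally[$key] ?? 0) + 1;
    }
    arsort($tally); // most frequent count first
    return $tally;
}

// Stub in place of the real request: cycles through a made-up sequence.
$canned = [779000, 779000, 1440000, 779000];
$calls = 0;
$stub = function () use (&$calls, $canned) {
    return $canned[$calls++ % count($canned)];
};

print_r(tallyCounts($stub, 4));
```

Swapping the stub for a function that does the real file_get_contents and preg_match would show how the counts are distributed over a run of requests.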

Posted: Sun Mar 13, 2005 9:08 am
by anjanesh
d11wtq wrote: but I can't see how it's possible since google should parse this info on the server.
Exactly. I'm not able to see any problem with the code so far. After all, the number of results is the same for MSN, Yahoo and Altavista; I checked. It's just Google that's googling around.
But Google may show different results based on location. They have a separate search for each country, like google.co.in, so maybe it sometimes checks for results locally too, even when given .com?

Posted: Sun Mar 13, 2005 9:11 am
by Chris Corbyn
Hmm... well, whatever was causing it, it's not the code.

Pretty weird, however.