Getting Total No Of Results from Google Query

anjanesh
DevNet Resident
Posts: 1679
Joined: Sat Dec 06, 2003 9:52 pm
Location: Mumbai, India

Post by anjanesh »

This code gets the total number of results for a query from Google.

Code:

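<?php
$q = "Who is the richest man in the world";
// urlencode() escapes every reserved character, not only spaces
$url = "http://www.google.com/search?hl=en&q=" . urlencode($q);
echo htmlspecialchars($url) . '<br>';
$str = file_get_contents($url);
if ($str === false) {
    die('Could not fetch ' . htmlspecialchars($url));
}
preg_match("/(Results.*?<b>1<\/b>.*?- <b>.*?<\/b> of about <b>)(.*?)(<\/b> for <b>)/is", $str, $matches);
// each() is deprecated (removed in PHP 8), so walk the groups with foreach
foreach ($matches as $v) {
    echo htmlentities($v) . '<br>';
}
?>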
Output:

Code:

http://www.google.com/search?hl=en&amp;q=Who+is+the+richest+man+in+the+world
Results &lt;b&gt;1&lt;/b&gt; - &lt;b&gt;10&lt;/b&gt; of about &lt;b&gt;1,370,000&lt;/b&gt; for &lt;b&gt;
Results &lt;b&gt;1&lt;/b&gt; - &lt;b&gt;10&lt;/b&gt; of about &lt;b&gt;
1,370,000
&lt;/b&gt; for &lt;b&gt;
Type http://www.google.com/search?hl=en&q=Wh ... +the+world in your browser and you'll find Results 1 - 10 of about 800,000.
I searched for 1,370,000 in the Google HTML results page but found nothing, and I searched for 800,000 in $str but found nothing there either.
Does anyone know why this is happening?
Thanks
Chris Corbyn
Breakbeat Nuttzer
Posts: 13098
Joined: Wed Mar 24, 2004 7:57 am
Location: Melbourne, Australia

Post by Chris Corbyn »

No, I get:

Code:

http://www.google.com/search?hl=en&q=Who+is+the+richest+man+in+the+world
Results <b>1</b> - <b>10</b> of about <b>779,000</b> for <b>
Results <b>1</b> - <b>10</b> of about <b>
779,000
</b> for <b>
Am I misunderstanding what you're after? Those figures are correct.

Why are you outputting the htmlentities of it by the way? :?:

Post by anjanesh »

All I want is the total number of results, nothing more. But when I cross-check it, it's totally different.
Now I'm getting 769,000.
I'm doing the same thing with MSN and it always gives the correct number when I check it manually. It's only Google that keeps giving me different results.
And BTW, the PHP code on my web host showed 1,370,000, but when I type the query manually here I get 800,000. I couldn't try it on my localhost because the script runs past the 30-second limit.
I'm outputting htmlentities just to show the raw data the regexp matched.
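If the 30-second limit mentioned here is PHP's max_execution_time and the time goes into waiting on Google's response, one option is to give the request its own, shorter read timeout so the script fails with a clear result instead of being killed. A minimal sketch using the `timeout` option of PHP's `http` stream context; `make_http_context()` and `fetch_page()` are hypothetical helper names, not code from this thread:

```php
<?php
// Sketch: build a stream context with an explicit read timeout (in seconds)
// so file_get_contents() gives up on a slow response instead of hanging.
function make_http_context(float $timeoutSeconds)
{
    return stream_context_create([
        'http' => [
            'timeout' => $timeoutSeconds, // read timeout for the HTTP stream
        ],
    ]);
}

// Hypothetical wrapper: returns the page body, or null if the request
// failed or timed out (the @ suppresses the warning file_get_contents emits).
function fetch_page(string $url, float $timeoutSeconds = 10.0): ?string
{
    $body = @file_get_contents($url, false, make_http_context($timeoutSeconds));
    return $body === false ? null : $body;
}
```

With this, `$str = fetch_page($url, 10.0);` plus an `if ($str === null)` check would replace the bare `file_get_contents($url)` call. Note that outside Windows, max_execution_time does not count time spent blocked on stream operations, so whether this helps depends on where the 30 seconds actually go.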

Post by Chris Corbyn »

So you want your regexp to just extract the number? Nothing else?

Post by Chris Corbyn »

Try this...

Code:

<?php
$q = "Who is the richest man in the world";
$url = "http://www.google.com/search?hl=en&q=" . urlencode($q);
echo '<b>' . htmlspecialchars($url) . '</b> returns<br>';
$str = file_get_contents($url);
//preg_match("/(Results.*?<b>1<\/b>.*?- <b>.*?<\/b> of about <b>)(.*?)(<\/b> for <b>)/is",$str,$matches); //Original regexp
// Capture only the digits and commas of the total count
if ($str !== false && preg_match('/Results <b>\d+<\/b> - <b>\d+<\/b> of about <b>([\d,]+)<\/b> for <b>/is', $str, $matches)) {
    echo $matches[1] . ' results.';
} else {
    echo 'No result count found.';
}
?>
This IS the correct regexp, although, as you mention, it does sometimes return a wonky number. The only explanation I can think of is that Google itself sometimes returns a wonky number :?
Last edited by Chris Corbyn on Sun Mar 13, 2005 8:44 am, edited 1 time in total.
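To check the regexp without depending on what Google serves at any given moment, it can be run against a saved fragment of the results page. A minimal sketch wrapping the same pattern in a hypothetical `extract_result_count()` helper that also strips the thousands separators, so counts from different runs can be compared as integers. It assumes the `Results <b>1</b> - <b>10</b> of about <b>N</b> for <b>` markup shown in this thread; Google's result pages have changed since, so the pattern would need updating against current HTML.

```php
<?php
// Hypothetical helper: pulls the "of about <b>N</b>" total out of a results
// page and returns it as an integer, or null if the pattern is absent.
function extract_result_count(string $html): ?int
{
    // Same pattern as the regexp in the post, count captured as digits/commas.
    $pattern = '/Results <b>\d+<\/b> - <b>\d+<\/b> of about <b>([\d,]+)<\/b> for <b>/is';
    if (!preg_match($pattern, $html, $m)) {
        return null;
    }
    // Remove thousands separators before converting to an integer.
    return (int) str_replace(',', '', $m[1]);
}

// Offline check against a saved fragment instead of a live request.
$sample = 'Results <b>1</b> - <b>10</b> of about <b>779,000</b> for <b>richest</b>';
var_dump(extract_result_count($sample));
var_dump(extract_result_count('<html>no count here</html>'));
```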

Post by anjanesh »

d11, there's something wrong with the host, I think.
Your code's output on my host:

Code:

http://www.google.com/search?hl=en&q=Who+is+the+richest+man+in+the+world returns
1,370,000 results.
On localhost:

Code:

http://www.google.com/search?hl=en&q=Who+is+the+richest+man+in+the+world returns
800,000 results.

Post by Chris Corbyn »

It's not your host, I think it's Google.

My own PC returns 779,000 70% of the time and 1,440,000 the rest of the time. I'm thinking about it, but I really don't see why PHP code, or a regexp for that matter, could be so inconsistent. I'm gonna keep testing it on Google itself, without this script, and if Google throws a wobbler on me I'll put it down to that :lol:

I'm refreshing the google page over and over and the only result count I can ever get is 1,440,000. I'm completely mystified by this. I don't know how google works so the only thing I can think is that somehow the PHP script is reading the data midway through a results count, but I can't see how it's possible since google should parse this info on the server.
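One way to put a number on the "779,000 70% of the time" observation is to repeat the lookup and tally how often each total comes back. A sketch, assuming a hypothetical `$fetchCount` callable that performs one request and returns the parsed count as an integer (or null when the pattern is not found); the usage lines feed it a canned sequence instead of live requests:

```php
<?php
// Sketch: call $fetchCount $runs times and count how often each total appears.
function tally_counts(callable $fetchCount, int $runs): array
{
    $seen = [];
    for ($i = 0; $i < $runs; $i++) {
        $count = $fetchCount();
        // Group failed lookups under one label instead of dropping them.
        $key = $count === null ? 'not found' : (string) $count;
        $seen[$key] = ($seen[$key] ?? 0) + 1;
    }
    arsort($seen); // most frequent total first, keys preserved
    return $seen;
}

// Usage with a canned sequence in place of real requests.
$fake = [779000, 779000, 1440000, 779000];
$i = 0;
print_r(tally_counts(function () use ($fake, &$i) {
    return $fake[$i++ % count($fake)];
}, 4));
```

Keeping each total as the array key means the tally doubles as a list of every distinct count seen across the runs.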

Post by Chris Corbyn »

Maybe Google is doing its monthly crawl? That does affect results quite a bit.

EDIT: I've spent the past 5 mins repeatedly running the query in Google itself and via the script. Both are now returning 1,440,000 100% of the time. I guess this was a minor glitch in Google's system (probably due to the bot doing its crawl).

Post by anjanesh »

d11wtq wrote: but I can't see how it's possible since google should parse this info on the server.
Exactly. I'm not able to see any problem with the code so far. After all, the number of results matches for MSN, Yahoo and Altavista; I checked. It's just Google that's googling around.
But Google may show different results based on location: they have a separate search for each country, like google.co.in, so maybe it sometimes counts local results too, even when queried through .com?
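If location is a factor, one way to test it is to pin the locale explicitly instead of relying on whatever Google infers from the requesting server, then compare .com against a country domain such as google.co.in. A sketch using PHP's `http_build_query()`, which also URL-encodes the query; treating `gl` as Google's country parameter is an assumption here, and `build_search_url()` is a hypothetical helper:

```php
<?php
// Sketch: build a search URL with the language and (assumed) country pinned,
// so runs from different hosts ask Google for the same locale.
function build_search_url(string $query, string $host = 'www.google.com', string $country = 'us'): string
{
    $params = [
        'hl' => 'en',     // interface language, as in the original URL
        'gl' => $country, // assumed country parameter
        'q'  => $query,
    ];
    // http_build_query() encodes each value; spaces become "+".
    return 'http://' . $host . '/search?' . http_build_query($params);
}

echo build_search_url('Who is the richest man in the world'), "\n";
echo build_search_url('Who is the richest man in the world', 'www.google.co.in', 'in'), "\n";
```

Running the count lookup against both URLs from the same machine would show whether the host's location accounts for the 1,370,000 vs 800,000 difference.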

Post by Chris Corbyn »

Hmm... well, whatever was causing it, it's not the code.

Pretty weird however. :roll: