I have written a script that I pipe my Apache access logs through in order to import them into a Postgres database. This is done on the fly, and occasionally it fails to log some entries. So far, I am only capturing about 97% of the hits to my site. It appears that only some Googlebot entries fail to come through stdin.
Here is an excerpt from my httpd.conf:
[text]ErrorLog "/var/log/apache/httpd-error.log"
CustomLog "/var/log/apache/httpd-access.log" combined
CustomLog |/www/plog/log combined[/text]
Below is the stripped-down part of the script that I am having problems with:
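// Apache writes each access-log line to this script's stdin
$stdin = fopen('php://stdin', 'r');
// Every line received is also appended here so I can see what arrived
$dbg_log = fopen('/www/plog/debug_log', 'a');
ob_implicit_flush(true);
while ($line = fgets($stdin)) {
    fwrite($dbg_log, $line);
    /*
    regex and database importing
    */
}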
These two hits, for example, were not logged:
[text]66.249.65.108 - - [15/May/2010:07:43:54 -0400] "GET /robots.txt HTTP/1.1" 200 70 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.108 - - [15/May/2010:07:43:54 -0400] "GET /user/login?destination=comment%2Freply%2F7%23comment-form HTTP/1.1" 404 208 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"[/text]
But the next two hits are logged:
[text]66.249.65.119 - - [15/May/2010:08:46:08 -0400] "GET /robots.txt HTTP/1.1" 200 70 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.119 - - [15/May/2010:08:46:08 -0400] "GET /whois.php HTTP/1.1" 200 1202 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"[/text]
At this point, I have narrowed the problem down to either Apache or the PHP code, and I'm leaning towards it being a problem with my code. Any ideas or insights?
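One thing I could try in order to rule out the read loop itself is a stricter version of it. This is only a minimal sketch, not a confirmed fix: `copy_lines` is a hypothetical helper name, and I am assuming the only changes needed are comparing the result of `fgets()` against `false` explicitly and flushing the debug log after every line. The explicit comparison matters because a final line consisting of just `0` with no trailing newline would otherwise evaluate as false and end the loop early. I have no evidence that this is what drops the Googlebot hits.

```php
<?php
// Hypothetical helper: copy every line from $in to $out, return the line count.
function copy_lines($in, $out)
{
    $count = 0;
    // Only EOF or a read error (fgets() returning false) ends the loop,
    // never a line whose content happens to evaluate as falsy.
    while (($line = fgets($in)) !== false) {
        fwrite($out, $line);
        fflush($out); // write each line out immediately instead of buffering
        $count++;
    }
    return $count;
}
```

In my script this would be called as `copy_lines($stdin, $dbg_log)`, with the regex and database import placed inside the loop as before.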
Thanks,
James