
Log file details required! Help

Posted: Tue Mar 25, 2003 8:09 pm
by nainil
Hi,

I need to get some details from a log file, in particular which
search keywords were used...

I have pasted the log file below.

I need the following details:

1. IP address
2. Date
3. The page that was requested
4. Keywords used in the search engine query (Google and Yahoo separately), ordered by date, in a tabular format
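For item 4, one way to get the keywords is to read them from the referrer URL that the search engine leaves in the log. A minimal sketch, assuming Google puts the search terms in the `q` query parameter and Yahoo in `p` (as the google.nl and search.yahoo.com referrers in the log suggest); the function name `search_keywords` and the example URLs are hypothetical:

```php
<?php
// Sketch: read search terms from a referrer URL.
// Assumption: Google uses the "q" parameter and Yahoo uses "p".

function search_keywords(string $referrer): ?array
{
    $host = parse_url($referrer, PHP_URL_HOST);
    $query = parse_url($referrer, PHP_URL_QUERY);
    if (!is_string($host) || !is_string($query)) {
        return null; // e.g. "-" (no referrer) or a URL without a query string
    }
    parse_str($query, $params); // decodes "+" and %XX escapes
    if (preg_match('/(^|\.)google\./', $host) && isset($params['q'])) {
        return ['engine' => 'Google', 'keywords' => $params['q']];
    }
    if (preg_match('/(^|\.)yahoo\./', $host) && isset($params['p'])) {
        return ['engine' => 'Yahoo', 'keywords' => $params['p']];
    }
    return null; // not a Google or Yahoo search referrer
}

// Hypothetical, complete referrer URLs (the ones pasted in the log are truncated).
print_r(search_keywords('http://www.google.com/search?q=example+keywords'));
print_r(search_keywords('http://search.yahoo.com/bin/search?p=other+keywords'));
var_dump(search_keywords('-'));
```

Using `parse_url` and `parse_str` rather than a hand-written regex means URL decoding (`+` to space, `%20`, and so on) is handled for you.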

I am using the following code, but it doesn't work.
Am I missing something?



<?php

$flag = "false";

if ($fp = fopen("web.log", "r")) {
    while (!feof($fp)) {
        $line = fgets($fp, 1024);

        if (ereg("\$", $line)) {
            $flag = "false";
        }
        if ($flag == "true") {
            $linearray = explode(" ", $line);
            $ip = $linearray[0];
            $date = $linearray[1];
            $keywords = $linearray[2];
            echo "$ip $date $keywords";
        }
        if (ereg("\^", $line)) {
            $flag = "true";
        }
    }
    fclose($fp);
} else {
    echo "Could not open file";
}

?>



Could anyone please help me?

Thanks...

LOG FILE
===========

202.63.171.3 - - [14/Mar/2003:16:58:59 +0000] "GET /web-hosting-mumbai/index.php HTTP/1.1" 200 12300 "http://www.google.com/search?sourceid=n ... +in+mumbai" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)"
202.63.171.3 - - [14/Mar/2003:17:10:57 +0000] "GET / HTTP/1.1" 200 26176 "http://dmoz.org/Regional/Asia/India/Mah ... /Internet/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)"
216.239.46.105 - - [14/Mar/2003:19:46:04 +0000] "GET /payment-gateway-mumbai/ HTTP/1.0" 302 221 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
195.121.244.222 - - [14/Mar/2003:21:35:14 +0000] "GET /paymentmode.php HTTP/1.1" 200 9547 "http://www.google.nl/search?q=mode+indi ... rt=10&sa=N" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90)"
202.63.171.3 - - [15/Mar/2003:03:01:56 +0000] "GET /webstats/awstats.pl?framename=mainright HTTP/1.1" 200 24612 "http://services.eliteral.com/webstats/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)"
66.196.65.11 - - [15/Mar/2003:04:53:00 +0000] "GET /robots.txt HTTP/1.0" 200 0 "-" "Mozilla/5.0 (Slurp/si; slurp@inktomi.com; http://www.inktomi.com/slurp.html)"
205.245.191.186 - - [14/Mar/2003:17:10:17 +0000] "GET /web-hosting-mumbai/colocation.php HTTP/1.1" 200 9654 "http://search.yahoo.com/bin/search?p=mu ... ta+centers" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; YComp 5.0.0.0)"
216.239.46.105 - - [14/Mar/2003:19:46:04 +0000] "GET /payment-gateway-mumbai/ HTTP/1.0" 302 221 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
195.121.244.222 - - [14/Mar/2003:21:35:14 +0000] "GET /paymentmode.php HTTP/1.1" 200 9547 "http://www.google.nl/search?q=mode+indi ... rt=10&sa=N" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90)"
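With lines like these (this looks like Apache's "combined" log format), `explode(" ", $line)` puts the `-` identity and user fields at indices 1 and 2, so `$linearray[1]` is not the date. The sketch below shows which indices hold the IP, date and page, assuming every line has the same layout as the sample (no extra spaces before the request line). `split_fields` is a hypothetical helper name; it deliberately reads only fields that never contain spaces, since the pasted referrers contain forum-truncated ` ... ` gaps.

```php
<?php
// Sketch: field positions when a sample log line is split on single spaces.
// Assumption: the IP, ident, user, timestamp and request never contain extra spaces.

function split_fields(string $line): array
{
    $f = explode(' ', trim($line));
    // $f[1] and $f[2] are the "-" ident/user fields, not the date
    return [
        'ip'   => $f[0],
        'date' => trim($f[3] . ' ' . $f[4], '[]'), // "[14/Mar/..." + "+0000]"
        'page' => $f[6],                            // $f[5] is '"GET'
    ];
}

$line = '216.239.46.105 - - [14/Mar/2003:19:46:04 +0000] "GET /payment-gateway-mumbai/ HTTP/1.0" 302 221 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"';
print_r(split_fields($line));
```

Splitting on spaces breaks as soon as a field contains a space (the user agent always does), so a regular expression is the more robust choice for the referrer and anything after it.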

Best Regards,
Nainil Chheda.
http://services.eliteral.com/

Posted: Wed Mar 26, 2003 7:19 am
by volka
It might be easier to use regular expressions (or it might not ;) )


<html><body><pre>
<?php
$source = ...; // path to the log file
$fp = fopen($source, 'rb') or die('file not found');
$pattern = '!';
$pattern .= '(\d+\.\d+\.\d+\.\d+)'; // match and fetch the IP address (dots escaped)
$pattern .= '[^[]*';                // skip ' - - '
$pattern .= '\[([^]]+)\]';          // fetch everything inside [ ], the date
$pattern .= '\s+';
$pattern .= '"\S+\s(\S*)[^"]+"';    // match "<method> <doc> <proto>", fetch <doc>
$pattern .= '[^"]+';                // skip everything until the next "
$pattern .= '"([^"]+)"';            // fetch the next double-quoted string, the referrer
$pattern .= '!';

while ($line = fgets($fp, 512))
{
	// only print lines that actually match the pattern
	if (preg_match($pattern, $line, $res)) {
		print_r($res);
	}
}
fclose($fp);
?></pre></body></html>
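Putting the pieces together for the table asked for in the original question: a sketch, not a tested tool, that reuses the pattern above in one piece (dots escaped), takes keywords from Google (`q`) and Yahoo (`p`) referrers, and sorts each engine's hits by timestamp. The parameter names, the helper names and the `web.log` path are assumptions. The referrers pasted in the question are truncated with `...`, so complete log lines are needed for meaningful output.

```php
<?php
// Sketch only. Assumptions: lines look like the sample log; Google keeps the
// search terms in "q" and Yahoo in "p"; 'web.log' below is a hypothetical path.

// The pattern above in one piece, with the dots in the IP address escaped.
const LINE_PATTERN = '!^(\d+\.\d+\.\d+\.\d+)[^[]*\[([^]]+)\]\s+"\S+\s(\S*)[^"]*"[^"]+"([^"]*)"!';

function parse_line(string $line): ?array
{
    if (!preg_match(LINE_PATTERN, $line, $m)) {
        return null; // not a log line in the expected format
    }
    return ['ip' => $m[1], 'date' => $m[2], 'page' => $m[3], 'referrer' => $m[4]];
}

// Returns [engine, keywords] for Google/Yahoo search referrers, else null.
function engine_and_keywords(string $referrer): ?array
{
    $host = parse_url($referrer, PHP_URL_HOST);
    $query = parse_url($referrer, PHP_URL_QUERY);
    if (!is_string($host) || !is_string($query)) {
        return null;
    }
    parse_str($query, $params); // also decodes "+" to a space
    if (preg_match('/(^|\.)google\./', $host) && isset($params['q'])) {
        return ['Google', $params['q']];
    }
    if (preg_match('/(^|\.)yahoo\./', $host) && isset($params['p'])) {
        return ['Yahoo', $params['p']];
    }
    return null;
}

// Collects search hits per engine, each list sorted by request time.
function search_hits(iterable $lines): array
{
    $hits = ['Google' => [], 'Yahoo' => []];
    foreach ($lines as $line) {
        $rec = parse_line($line);
        if ($rec === null) {
            continue;
        }
        $found = engine_and_keywords($rec['referrer']);
        if ($found === null) {
            continue;
        }
        $rec['keywords'] = $found[1];
        $t = DateTime::createFromFormat('d/M/Y:H:i:s O', $rec['date']);
        $rec['time'] = $t ? $t->getTimestamp() : 0;
        $hits[$found[0]][] = $rec;
    }
    foreach ($hits as &$rows) {
        usort($rows, fn($a, $b) => $a['time'] <=> $b['time']);
    }
    unset($rows);
    return $hits;
}

// Prints one plain-text table per search engine.
function print_table(array $hits): void
{
    foreach ($hits as $engine => $rows) {
        echo "== $engine ==\n";
        printf("%-16s %-27s %-36s %s\n", 'IP', 'Date', 'Page', 'Keywords');
        foreach ($rows as $r) {
            printf("%-16s %-27s %-36s %s\n", $r['ip'], $r['date'], $r['page'], $r['keywords']);
        }
    }
}

// Usage, assuming the log is readable at this hypothetical path:
// print_table(search_hits(file('web.log', FILE_IGNORE_NEW_LINES)));
```

Sorting after grouping keeps each table in date order even when the log itself is not, as with the 14/Mar entry that appears after the 15/Mar ones in the sample.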