parsing log for a specific condition and removing duplicates

PHP programming forum. Ask questions or help people concerning PHP code. Don't understand a function? Need help implementing a class? Don't understand a class? Here is where to ask. Remember to do your homework!

Moderator: General Moderators

Post Reply
maksymyuk
Forum Newbie
Posts: 2
Joined: Sat Feb 18, 2006 1:40 am

parsing log for a specific condition and removing duplicates

Post by maksymyuk »

feyd | Please use [code] and [/code] tags where appropriate when posting code. Read: [url=http://forums.devnetwork.net/viewtopic.php?t=21171]Posting Code in the Forums[/url]


Hi,

I'm new here and have also just started learning PHP, and I already have a big problem (at least it's big for me).
Please help.

What I need is to parse the log file for a specific condition: I need to get every URL from the log that appears in double quotes (for example: [b]http://forum.novd.ru/index.php?showtopic=56816&st=15[/b])
and remove the duplicates.

Here are 3 sample records from the access log (lines 2 and 3 contain the same URL, so I need only one of them):

Code: Select all

213.148.171.244 - - [15/Feb/2006:22:49:52 -0800] "GET /index.jsp HTTP/1.1" 200 98238 "http://forum.novd.ru/index.php?showtopic=56816&st=15" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; ru) Opera 8.50"
68.105.89.121 - - [15/Feb/2006:22:50:05 -0800] "GET /index.jsp HTTP/1.1" 200 98161 "http://forums.us.comp.com/supportforums/board/message?board.id=dim_upghw&message.id=78270" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.1) Gecko/20060111 Firefox/1.5.0.1"
68.105.89.121 - - [15/Feb/2006:22:50:07 -0800] "GET /index.jsp HTTP/1.1" 200 98161 "http://forums.us.comp.com/supportforums/board/message?board.id=dim_upghw&message.id=78270" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.1) Gecko/20060111 Firefox/1.5.0.1"

So far I've got to the point where I can grab all the lines from the log file that contain the word http...

Code: Select all

<?php

$in  = "/usr/local/apache/domlogs/my_log.log";
$out = "final_log.txt";

$fp_in  = fopen($in, 'r')  or die("No such file!\n");
$fp_out = fopen($out, 'w') or die("Cannot open a new file!\n");

// fgets() returns false at end of file, so test its return value
// directly (=== avoids treating an empty line as end-of-file).
while (($data = fgets($fp_in, 1024)) !== false) {
	if (preg_match('/http/', $data)) {
	    fwrite($fp_out, $data) or die("Cannot write!\n");
	}
}

fclose($fp_out);
fclose($fp_in);

?>
I'd appreciate any suggestion!
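
A minimal sketch of one possible approach, assuming the standard Apache combined log format where the referer and the user agent are the last two double-quoted fields of each line. The `extract_referer` helper is a hypothetical name, and the paths are the ones from the script above:

```php
<?php
// Minimal sketch (assumes Apache combined log format): pull the
// referer, i.e. the second-to-last double-quoted field of a line,
// and write each distinct URL to the output file only once.

// Return the referer URL from one access-log line, or null.
function extract_referer($line) {
    // The referer and user agent are the last two quoted fields.
    if (preg_match('/"([^"]*)" "[^"]*"\s*$/', $line, $m)) {
        return ($m[1] !== '-' && $m[1] !== '') ? $m[1] : null;
    }
    return null;
}

$in  = "/usr/local/apache/domlogs/my_log.log";
$out = "final_log.txt";

if (is_readable($in)) {                    // guard so the sketch runs anywhere
    $fp_in  = fopen($in, 'r')  or die("No such file!\n");
    $fp_out = fopen($out, 'w') or die("Cannot open a new file!\n");

    $seen = array();                       // URLs written so far
    while (($data = fgets($fp_in, 4096)) !== false) {
        $url = extract_referer($data);
        if ($url !== null && !isset($seen[$url])) {
            $seen[$url] = true;            // skip later duplicates
            fwrite($fp_out, $url . "\n") or die("Cannot write!\n");
        }
    }

    fclose($fp_in);
    fclose($fp_out);
}
```

Keeping the seen URLs as array keys (`isset($seen[$url])`) makes the duplicate check a hash lookup rather than a scan of the whole list, which matters for large logs.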


User avatar
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

viewtopic.php?t=36790 may be of interest.
maksymyuk
Forum Newbie
Posts: 2
Joined: Sat Feb 18, 2006 1:40 am

Post by maksymyuk »

feyd wrote:viewtopic.php?t=36790 may be of interest.
At first look it's a bit hard for a beginner, but I like to dig. I'll probably spend a few days on this :) but thanks anyway feyd, I appreciate it!
MinDFreeZ
Forum Commoner
Posts: 58
Joined: Tue Feb 14, 2006 12:28 pm
Location: Lake Mary, FL

Post by MinDFreeZ »

I'm slowly figuring things out... but I kinda need something like this, except I want to prevent duplicates, not just kill them. As things get logged to my file, I want it to first check whether the $email is already in there; if so, don't log it. Probably easy, but I don't know how =X
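
A minimal sketch of that check-before-logging idea, assuming one address per line in the log file. The names `already_logged`, `$logfile`, and `$email` are hypothetical, not from any code in this thread:

```php
<?php
// Minimal sketch: refuse to append a value that is already in the
// log, one entry per line. already_logged() is a hypothetical helper.

// True if $needle already appears as a full line in $logfile.
function already_logged($logfile, $needle) {
    if (!file_exists($logfile)) {
        return false;                          // nothing logged yet
    }
    // file() reads the whole log into an array of lines;
    // FILE_IGNORE_NEW_LINES strips the trailing newlines.
    $lines = file($logfile, FILE_IGNORE_NEW_LINES);
    return in_array($needle, $lines);
}

$logfile = 'emails.log';
$email   = 'someone@example.com';

if (!already_logged($logfile, $email)) {
    // append only when the address is not in the file yet
    file_put_contents($logfile, $email . "\n", FILE_APPEND);
}
```

Note that rereading the whole file on every write is fine for a small log, but for anything large you would want to keep the seen entries in memory or in a database instead.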
User avatar
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

You sure you're in the right topic, MinDFreeZ?
MinDFreeZ
Forum Commoner
Posts: 58
Joined: Tue Feb 14, 2006 12:28 pm
Location: Lake Mary, FL

Post by MinDFreeZ »

heh, not really... lol

but it seems like he's trying to find certain words in a file and kill the duplicates...
I'm trying to find certain words in a file before logging to it, and if it's already there, not log at all...
so I'm not killing dupes, I'm preventing them...

similar? or no...? lol
mnemonik23
Forum Newbie
Posts: 3
Joined: Sat Feb 18, 2006 1:39 am

Post by mnemonik23 »

OK, let's forget about the duplicates. Just get the URLs that are in "".
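
For just that simpler step, a minimal sketch using preg_match_all to pull every double-quoted http URL out of a line (the `quoted_urls` helper is a hypothetical name):

```php
<?php
// Minimal sketch: collect only the double-quoted http(s) URLs from
// one line, skipping the de-duplication entirely. Matching on the
// http:// prefix leaves the quoted request and user-agent fields out.
function quoted_urls($line) {
    preg_match_all('/"(https?:\/\/[^"]*)"/', $line, $m);
    return isset($m[1]) ? $m[1] : array();   // captured URLs, in order
}

// Usage on the first sample record from the access log:
$line = '213.148.171.244 - - [15/Feb/2006:22:49:52 -0800] '
      . '"GET /index.jsp HTTP/1.1" 200 98238 '
      . '"http://forum.novd.ru/index.php?showtopic=56816&st=15" '
      . '"Mozilla/4.0 (compatible; MSIE 6.0)"';
print_r(quoted_urls($line));   // only the referer URL matches
```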
Post Reply