Statistical summary script

midnite
Forum Newbie
Posts: 14
Joined: Sun Nov 06, 2011 2:58 pm

Statistical summary script

Post by midnite

Hi there,
I am trying to write a PHP script that reads a log file and produces a statistical summary of its contents: the total number of file requests in the month, the number of file requests from the articles directory, the TOTAL bandwidth consumed by the file requests over the month, and the number of requests that resulted in 404 status errors, along with a list of the filenames that produced those 404 errors.
I've managed to read the file and work out the total bandwidth consumed over the month using the following code:

Code:

<?php
// Opens the file "april.txt" in read-only mode
$fileLog = fopen("april.txt", "r");

// Variable to count the total bytes used during the month
$totalBytes = 0;

// Reads the file line by line; fgets() returns false at the end of the file
while (($line = fgets($fileLog)) !== false) {
    // Explodes the line on spaces; index 8 holds the response size in bytes
    $details = explode(' ', $line);

    // Adds this request's bytes to $totalBytes (skips blank or short lines)
    if (isset($details[8])) {
        $totalBytes += (int)$details[8];
    }
}
fclose($fileLog);

// Converts to megabytes (1 MB = 1,000,000 bytes) before formatting
$totalMB = number_format($totalBytes / 1000000, 2);

// Adds commas every 3 digits
$totalBytes = number_format($totalBytes);
echo "<h3>April Statistics</h3>";

// Echoes the total bandwidth
echo "<p>The TOTAL bandwidth consumed: {$totalMB}MB ($totalBytes Bytes)</p>";
?>
But now I'm a bit stuck on how to extract the rest of the contents.
Any suggestions or examples on how to achieve this, please?
Many thanks
Celauran
Moderator
Posts: 6427
Joined: Tue Nov 09, 2010 2:39 pm
Location: Montreal, Canada

Re: Statistical summary script

Post by Celauran

Without seeing at least a sample of the file, it's hard for us to know what 'the rest of the contents' is. What else is in your $details array?
midnite

Re: Statistical summary script

Post by midnite

You're right, I completely forgot, sorry about that. Here is a bit of the file:

103.239.234.105 -- [2007-04-01 00:42:21] "GET articles/learn_PHP_basics HTTP/1.0" 200 12729 "Mozilla/4.0"
207.3.35.52 -- [2007-04-01 01:24:42] "GET index.php HTTP/1.0" 200 11411 "Mozilla/4.0"
51.4.190.113 -- [2007-04-01 02:07:04] "GET articles/php_classes_and_oop HTTP/1.0" 200 7674 "MSIE 7.0"
216.134.52.171 -- [2007-04-01 02:49:25] "GET articles/learn_PHP_basics HTTP/1.0" 200 12729 "MSIE 7.0"
97.212.128.181 -- [2007-04-01 03:31:46] "GET articles/using_regex_with_php HTTP/1.0" 200 12127 "Mozilla/4.0"
49.174.77.138 -- [2007-04-01 04:14:07] "GET about/contact.php HTTP/1.0" 200 7554 "Mozilla/4.0"
174.118.145.203 -- [2007-04-01 10:35:18] "GET not/available HTTP/1.0" 404 0 "Mozilla/4.0"
210.172.255.245 -- [2007-04-02 04:56:28] "GET articles/not/a/page HTTP/1.0" 404 0 "MSIE 7.0"
189.110.162.205 -- [2007-04-02 06:21:11] "GET typo/in/path HTTP/1.0" 404 0 "MSIE 7.0"
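Because the user agent can contain a space ("MSIE 7.0"), a regular expression is one alternative to explode() for splitting these lines into named fields. A minimal sketch, assuming every line follows exactly the layout of the sample above; the function name parseLogLine() and the field names are hypothetical, not part of the scripts in this thread:

```php
<?php
// Hypothetical helper: splits one log line into named fields with a regex.
// Assumes the sample layout: IP -- [date time] "METHOD path PROTOCOL" status bytes "agent"
function parseLogLine(string $line): ?array
{
    $pattern = '/^(?<ip>\S+) -- \[(?<timestamp>[^\]]+)\] "(?<method>\S+) (?<path>\S+) (?<protocol>[^"]+)" (?<status>\d{3}) (?<bytes>\d+) "(?<agent>[^"]*)"$/';

    if (preg_match($pattern, trim($line), $m) !== 1) {
        return null; // blank or malformed line
    }

    return [
        'ip'        => $m['ip'],
        'timestamp' => $m['timestamp'],
        'path'      => $m['path'],
        'status'    => (int)$m['status'],
        'bytes'     => (int)$m['bytes'],
        'agent'     => $m['agent'], // keeps "MSIE 7.0" in one piece
    ];
}

$entry = parseLogLine('51.4.190.113 -- [2007-04-01 02:07:04] "GET articles/php_classes_and_oop HTTP/1.0" 200 7674 "MSIE 7.0"');
echo $entry['path']; // articles/php_classes_and_oop
```

The trade-off is that a line which does not match the pattern is skipped entirely (null), whereas explode() always returns something, even for a malformed line.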
Celauran

Re: Statistical summary script

Post by Celauran

What did you mean about extracting the rest of the contents? Everything in each line should already be in your $details array thanks to your explode() call. The IP address would be $details[0], the user agent would start at $details[9] (an agent containing a space, such as "MSIE 7.0", spills over into $details[10]), etc.
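To see exactly which index each piece ends up in, you can print the exploded array for one of the sample lines. A small sketch run on a single hard-coded line; when lines are read with fgets(), the last element would also carry the trailing newline:

```php
<?php
// Shows which index each space-separated piece lands in after explode()
$line = '210.172.255.245 -- [2007-04-02 04:56:28] "GET articles/not/a/page HTTP/1.0" 404 0 "MSIE 7.0"';
$details = explode(' ', $line);

foreach ($details as $index => $value) {
    echo "[$index] => $value\n";
}
// [0] => 210.172.255.245
// [1] => --
// [2] => [2007-04-02
// [3] => 04:56:28]
// [4] => "GET
// [5] => articles/not/a/page
// [6] => HTTP/1.0"
// [7] => 404
// [8] => 0
// [9] => "MSIE
// [10] => 7.0"
```

So the requested path is index 5, the status code index 7 and the byte count index 8, while the quotes stay attached to the neighbouring pieces.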
midnite

Re: Statistical summary script

Post by midnite

Right, so I've managed to calculate the total bandwidth and the total number of files requested with the following code:

Code:

<?php
// Opens the file "april.txt" in read-only mode
$fileLog = fopen("april.txt", "r");

// Variables to count the total bytes and the number of requests (lines)
$totalBytes = 0;
$linecount = 0;

// Reads the file line by line; fgets() returns false at the end of the file
while (($line = fgets($fileLog)) !== false) {
    // Each complete log line ends with a newline, so count those
    $linecount = $linecount + substr_count($line, "\n");

    // Explodes the data with a space
    $details = explode(' ', $line);

    // Adds this request's bytes to $totalBytes (0 for blank or short lines)
    $totalBytes = $totalBytes + (isset($details[8]) ? intval($details[8]) : 0);
}
fclose($fileLog);

// Converts to megabytes (1 MB = 1,000,000 bytes) before formatting
$totalMB = number_format($totalBytes / 1000000, 2);

// Adds commas every 3 digits
$totalBytes = number_format($totalBytes);
echo "<h3>April Statistics</h3>";
echo "<p>The Total files requested: $linecount</p>";

// Echoes the total bandwidth
echo "<p>The Total bandwidth used: {$totalMB}MB ($totalBytes Bytes)</p>";
?>
But I still have to extract the number of file requests from the articles directory, the number of requests that resulted in 404 status errors, and a list of the filenames that produced those 404 errors. This is where I'm getting really stuck, so any ideas, help or examples would be deeply appreciated.

And I have to extract all of this from a file containing 1000 lines in the same format as the sample above.
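The articles-directory count can follow the same explode() approach: the requested path is at index 5, so a request belongs to the directory when that path starts with "articles/". A minimal sketch; the function name countArticleRequests() is hypothetical, and it assumes that failed requests such as articles/not/a/page should still count as requests to the directory:

```php
<?php
// Hypothetical helper: counts requests whose path starts with "articles/"
function countArticleRequests(array $lines): int
{
    $count = 0;
    foreach ($lines as $line) {
        $details = explode(' ', trim($line));
        // strpos() === 0 means the path begins with "articles/"
        if (isset($details[5]) && strpos($details[5], 'articles/') === 0) {
            $count++;
        }
    }
    return $count;
}

// The nine sample lines from this thread
$sample = [
    '103.239.234.105 -- [2007-04-01 00:42:21] "GET articles/learn_PHP_basics HTTP/1.0" 200 12729 "Mozilla/4.0"',
    '207.3.35.52 -- [2007-04-01 01:24:42] "GET index.php HTTP/1.0" 200 11411 "Mozilla/4.0"',
    '51.4.190.113 -- [2007-04-01 02:07:04] "GET articles/php_classes_and_oop HTTP/1.0" 200 7674 "MSIE 7.0"',
    '216.134.52.171 -- [2007-04-01 02:49:25] "GET articles/learn_PHP_basics HTTP/1.0" 200 12729 "MSIE 7.0"',
    '97.212.128.181 -- [2007-04-01 03:31:46] "GET articles/using_regex_with_php HTTP/1.0" 200 12127 "Mozilla/4.0"',
    '49.174.77.138 -- [2007-04-01 04:14:07] "GET about/contact.php HTTP/1.0" 200 7554 "Mozilla/4.0"',
    '174.118.145.203 -- [2007-04-01 10:35:18] "GET not/available HTTP/1.0" 404 0 "Mozilla/4.0"',
    '210.172.255.245 -- [2007-04-02 04:56:28] "GET articles/not/a/page HTTP/1.0" 404 0 "MSIE 7.0"',
    '189.110.162.205 -- [2007-04-02 06:21:11] "GET typo/in/path HTTP/1.0" 404 0 "MSIE 7.0"',
];

echo countArticleRequests($sample); // 5
```

With the real file, countArticleRequests(file('april.txt')) would pass every line at once; file() keeps each line's newline, which trim() removes. Inside the existing while loop, the same strpos() check on $details[5] with a counter does the job without a separate function.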
Celauran

Re: Statistical summary script

Post by Celauran

Code:

<?php

// Opens the file "april.txt" in read-only mode
$fileLog = fopen("april.txt", "r");

// Counters and collections for the monthly summary
$totalBytes = 0;
$linecount = 0;
$file_not_found = array();

// Reads the file line by line; fgets() returns false at the end of the file
while (($line = fgets($fileLog)) !== false)
{
    // fgets returns on newline, so you could just use $linecount++
    $linecount = $linecount + substr_count($line, "\n");

    // Explodes the data with a space
    $details = explode(' ', $line);

    // Adds all the bytes and stores them in $totalBytes.
    // The parentheses around the ternary matter: + binds tighter than ?:
    $totalBytes = $totalBytes + (isset($details[8])
                        ? intval($details[8])
                        : 0);

    // Index 7 is the status code; index 5 is the requested path
    if (isset($details[7]) && $details[7] == "404")
    {
        $file_not_found[] = $details[5];
    }
}
fclose($fileLog);

// Converts to megabytes (1 MB = 1,000,000 bytes) before formatting
$totalMB = number_format($totalBytes / 1000000, 2);

// Adds commas every 3 digits
$totalBytes = number_format($totalBytes);
echo "<h3>April Statistics</h3>";
echo "<p>The Total files requested: $linecount</p>";

// Echoes the total bandwidth
echo "<p>The Total bandwidth used: {$totalMB}MB ($totalBytes Bytes)</p>";

echo "<p>Total 404s: " . count($file_not_found) . "</p>";
echo "<ul>";
foreach ($file_not_found as $error)
{
    // Escapes the path before putting it into HTML
    echo "<li>" . htmlspecialchars($error) . "</li>";
}
echo "</ul>";

?>
midnite

Re: Statistical summary script

Post by midnite

Thank you so much, Celauran, that really helped. I'm still going to try to improve it, since it's echoing a list of 102 errors and I just need it to echo 3. Nevertheless, thank you so much, I really appreciate it.
Celauran

Re: Statistical summary script

Post by Celauran

Do you mean you just want it to echo the first three 404s?
midnite

Re: Statistical summary script

Post by midnite

Right now I'm trying a for loop to echo only the 3 distinct errors in the file, which are:
not/available
articles/not/a/page
typo/in/path
instead of echoing all 102 entries, with those same three paths repeated over and over.
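Instead of a for loop, one option is to keep every 404 path in the array as before and remove the repeats afterwards with array_unique(). A minimal sketch on a short hand-made list (not taken from the real log):

```php
<?php
// Hand-made list of 404 paths with repeats, for illustration only
$notFound = ['not/available', 'articles/not/a/page', 'typo/in/path',
             'not/available', 'typo/in/path', 'articles/not/a/page'];

// array_unique() keeps the first occurrence of each value but preserves the
// original keys, so array_values() renumbers the result 0, 1, 2, ...
$distinct = array_values(array_unique($notFound));

echo count($notFound) . " 404s, " . count($distinct) . " distinct paths\n"; // 6 404s, 3 distinct paths
foreach ($distinct as $path) {
    echo $path . "\n";
}
```

count($notFound) still gives the total number of 404s, while $distinct holds the list to print.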
Celauran

Re: Statistical summary script

Post by Celauran

I've modified it slightly. It now counts the total number of 404 entries, but stores each path that produced a 404 in a separate array, without duplicates.

Code:

<?php

// Opens the file "april.txt" in read-only mode
$fileLog = fopen("april.txt", "r");

// Counters and collections for the monthly summary
$totalBytes = 0;
$linecount = 0;
$file_not_found = 0;
$errors = array();

// Reads the file line by line; fgets() returns false at the end of the file
while (($line = fgets($fileLog)) !== false)
{
    // fgets returns on newline, so you could just use $linecount++
    $linecount = $linecount + substr_count($line, "\n");

    // Explodes the data with a space
    $details = explode(' ', $line);

    // Adds all the bytes and stores them in $totalBytes.
    // The parentheses around the ternary matter: + binds tighter than ?:
    $totalBytes = $totalBytes + (isset($details[8])
                        ? intval($details[8])
                        : 0);

    // Index 7 is the status code; index 5 is the requested path
    if (isset($details[7]) && $details[7] == "404")
    {
        // Count every 404, but list each path only once
        $file_not_found++;
        if (!in_array($details[5], $errors))
        {
            $errors[] = $details[5];
        }
    }
}
fclose($fileLog);

// Converts to megabytes (1 MB = 1,000,000 bytes) before formatting
$totalMB = number_format($totalBytes / 1000000, 2);

// Adds commas every 3 digits
$totalBytes = number_format($totalBytes);
echo "<h3>April Statistics</h3>";
echo "<p>The Total files requested: $linecount</p>";

// Echoes the total bandwidth
echo "<p>The Total bandwidth used: {$totalMB}MB ($totalBytes Bytes)</p>";

echo "<p>Total 404s: {$file_not_found}</p>";
echo "<ul>";
foreach ($errors as $error)
{
    // Escapes the path before putting it into HTML
    echo "<li>" . htmlspecialchars($error) . "</li>";
}
echo "</ul>";

?>
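If you also want to know how many times each path failed, a variation on the same idea is to use the path itself as the array key and store a count as the value. A minimal sketch; the function name count404sByPath() is hypothetical, it assumes the sample layout (path at index 5, status at index 7), and the last sample line is a repeat added only to show a count above 1:

```php
<?php
// Hypothetical helper: counts how many 404s each path produced
function count404sByPath(array $lines): array
{
    $counts = array();
    foreach ($lines as $line) {
        $details = explode(' ', trim($line));
        if (isset($details[7]) && $details[7] === '404') {
            $path = $details[5];
            // The path is the array key, so spotting a repeat is a direct key
            // lookup instead of an in_array() scan over every stored path
            $counts[$path] = isset($counts[$path]) ? $counts[$path] + 1 : 1;
        }
    }
    arsort($counts); // most frequent path first
    return $counts;
}

$sample = array(
    '174.118.145.203 -- [2007-04-01 10:35:18] "GET not/available HTTP/1.0" 404 0 "Mozilla/4.0"',
    '210.172.255.245 -- [2007-04-02 04:56:28] "GET articles/not/a/page HTTP/1.0" 404 0 "MSIE 7.0"',
    '189.110.162.205 -- [2007-04-02 06:21:11] "GET typo/in/path HTTP/1.0" 404 0 "MSIE 7.0"',
    '207.3.35.52 -- [2007-04-01 01:24:42] "GET index.php HTTP/1.0" 200 11411 "Mozilla/4.0"',
    // Repeat of the first line, added only for illustration
    '174.118.145.203 -- [2007-04-01 10:35:18] "GET not/available HTTP/1.0" 404 0 "Mozilla/4.0"',
);

$counts = count404sByPath($sample);
echo "<p>Total 404s: " . array_sum($counts) . "</p>";
echo "<ul>";
foreach ($counts as $path => $n) {
    echo "<li>" . htmlspecialchars($path) . " ({$n})</li>";
}
echo "</ul>";
```

array_sum() of the values gives the same total that $file_not_found tracks, and count($counts) gives the number of distinct paths.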
midnite

Re: Statistical summary script

Post by midnite

Right, got it. I've also changed the array index to 5, so now it's outputting this:

April Statistics

The Total files requested: 1000

The Total bandwidth used: 8.43MB (8,430,877 Bytes)

Total 404 Errors: 102

The following files produced errors:

not/available
articles/not/a/page
typo/in/path
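The 8.43MB figure matches the byte total only if a megabyte is taken as 1,000,000 bytes; with binary units (1 MiB = 1,048,576 bytes) the same total is smaller. A worked example using the 8,430,877 bytes from the output above:

```php
<?php
// Byte total taken from the output above
$totalBytes = 8430877;

// Decimal megabytes: 8430877 / 1000000 = 8.430877
$decimalMB = number_format($totalBytes / 1000000, 2);  // "8.43"

// Binary mebibytes: 8430877 / 1048576 = 8.0403...
$binaryMiB = number_format($totalBytes / 1048576, 2);  // "8.04"

echo "{$decimalMB} MB / {$binaryMiB} MiB\n"; // 8.43 MB / 8.04 MiB
```

Computing the figure from $totalBytes before it is passed through number_format() for the comma version avoids hard-coding it in the echo.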

Just one last thing: could you explain to me roughly what you did here?

Code:

if ($details[7] == "404") {
    $file_not_found++;
    if (!in_array($details[5], $errors)) {
        $errors[] = $details[5];
    }
}
Otherwise there is no point asking for help if I don't learn.

To finish, I do appreciate all the help. I probably would've got there, but not today, probably next week lol. Thank you so much.
Celauran

Re: Statistical summary script

Post by Celauran

midnite wrote: Just one last thing could you explain to me sort of what you did here:

Code:

if ($details[7] == "404") {
    $file_not_found++;
    if (!in_array($details[5], $errors)) {
        $errors[] = $details[5];
    }
}
otherwise there is no point asking for help if I don't learn.
Based on the structure of these log lines, we know that the eighth "item" in each line ($details[7], since array indexes start at 0) will be the status code, so we check whether it equals 404. If it does, we increment the counter to record one more 404 error. Next, we take the page that generated the 404 error, which is the sixth "item" in our list ($details[5]), and use in_array() to check whether the error array ($errors) already contains that page. If it does, there's nothing else to be done. If not, we add it to the list of pages resulting in 404s.
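The same check can be traced on a handful of lines. A small sketch using three lines from the sample plus a repeat of the first one (added only for illustration), so you can see the counter grow on every 404 while the list only grows on a new path:

```php
<?php
$file_not_found = 0;
$errors = array();

// Already-exploded sample lines; the fourth repeats the first on purpose
$rows = array(
    explode(' ', '174.118.145.203 -- [2007-04-01 10:35:18] "GET not/available HTTP/1.0" 404 0 "Mozilla/4.0"'),
    explode(' ', '207.3.35.52 -- [2007-04-01 01:24:42] "GET index.php HTTP/1.0" 200 11411 "Mozilla/4.0"'),
    explode(' ', '189.110.162.205 -- [2007-04-02 06:21:11] "GET typo/in/path HTTP/1.0" 404 0 "MSIE 7.0"'),
    explode(' ', '174.118.145.203 -- [2007-04-01 10:35:18] "GET not/available HTTP/1.0" 404 0 "Mozilla/4.0"'),
);

foreach ($rows as $details) {
    if ($details[7] == "404") {                // status code is the 8th item
        $file_not_found++;                     // every 404 is counted...
        if (!in_array($details[5], $errors)) { // ...but each path is listed once
            $errors[] = $details[5];
        }
    }
}

echo $file_not_found . "\n";        // 3
echo implode(', ', $errors) . "\n"; // not/available, typo/in/path
```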
midnite

Re: Statistical summary script

Post by midnite

Got it, it makes more sense now. Thank you, Celauran.