phpSoCo, for counting php lines of code

Posted: Tue Nov 27, 2007 3:14 am
by s.dot
Requires PHP >= 5

phpSoCo is a library dedicated to the counting of source lines of code for php files. Its extensive set of features allows you to count the lines in the scripts you want, the way you want, and capture or display the results the way you want.

Features:

* Free. Released under the GNU GPL.
* Get source lines of code stats for a single file, a directory of files, or an entire directory tree.
* For those of you who embed HTML into your PHP scripts, you have a choice. You can count lines with the HTML included, or count only PHP lines of code.
* Return your stats as a multi-dimensional array or as HTML (defaults to returning an array).
* If you choose to return HTML, you can get a full, standards-compliant web page or just a chunk of HTML to embed in other pages.

All posts in this topic have been taken into account, and this is the freshest code. However, your input is still welcome, and I actually invite it.

It'd be cool if you'd download the library and give it a test run and report your experiences with it, too!

Webpage: http://www.scottayy.com/phpsoco (download from there to avoid copy/paste errors)

The code:

Code:

<?php

/**
 * phpSoCo - for counting php source lines of code
 *
 * Copyright (C) 2007 Scott Martin <smp_info[at]yahoo[dot]com>
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
 *
 *
 *
 * phpsoco will allow you to evaluate any php script to determine the number of
 * source lines of code and characters in that script.  You can decide whether
 * or not to count HTML not encapsulated in php blocks as code (defaults to not
 * count it.. counts php only).
 *
 * Use one of the counting modes to determine how you want to count lines of
 * code in your scripts.
 *
 * The available counting modes are as follows:
 *
 * COUNT_SOFT
 *  - Lines are counted the way they appear in the code.  All comments, blank
 *    lines, and curly braces on a single line are counted.
 *
 * COUNT_NO_COMMENT
 *  - Lines are counted after all comments are removed.  Blank lines and curly
 *    braces on a single line are included in the line count.
 *
 * COUNT_NO_BLANK
 *  - Lines are counted after all blank lines are removed.  Comments and curly
 *    braces on a single line are included in the line count.
 *
 * COUNT_NO_BRACE
 *  - Lines are counted after all lines containing a single curly brace are
 *    removed.  Comments and blank lines are included in the line count.
 *
 * COUNT_HARD
 *  - Lines are counted after all comments, blank lines, and curly braces on a
 *    single line are removed.
 *
 * @Version 1.0.0 Alpha
 *     - Initial Release
 *
 * @Version 1.0.0 Beta
 *     - Added # style comments to be removed
 *     - If _PHPOnly is set to false, HTML style comments will be removed
 *     - replaced php opening/closing tags with str_ireplace() to avoid
 *       potential casing issues.
 *     - Added a _returnHTML class member to return HTML instead of passing it
 *       as a parameter to the getStats() method.
 *     - Added a setter method setReturnHTML() to set the desired output style
 *
 * @Version 1.0.0 Beta 2
 *     - Broke the main class up into 4 classes, phpsoco() base class,
 *       phpsoco_file(), phpsoco_directory() and phpsoco_directoryTree()
 *     - Allows for single directory, and a directory tree parsing for files
 *     - Base class phpsoco() method getStats() has different parameters and
 *       instantiates a new object based on the $input parameter.
 *
 * @Version 1.0.0 Beta 3
 *     - Moved methods _getPHPCode(), _stripWhiteSpace(), _stripComments(), and
 *       _getCharacterCount() from class phpsoco() to class phpsoco_file().
 *     - Got rid of the method _formatLineLength()
 *     - Got rid of phpsoco() properties _lineLength, _allowSingleSpaces, and
 *       _allowPHPTags and their corresponding setter methods.
 *     - Used the php tokenizer functions for parsing PHP code instead of
 *       regexes.
 *     - Got rid of the counting lines by linelengths of hard code.  That was
 *       ugly.
 *     - Introduced counting modes COUNT_SOFT, COUNT_NO_COMMENT, COUNT_NO_BLANK
 *       , COUNT_NO_BRACE, and COUNT_HARD, and setCountMode() method to set the
 *       counting mode.
 *
 *
 * @author Scott Martin <smp_info[at]yahoo[dot]com>
 * @date started November 27th, 2007
 * @last updated November 30th, 2007
 */
class phpsoco
{
	/**
	 * Container for each files code source
	 */
	protected $_code;

	/**
	 * Evaluate PHP only code
	 */
	protected $_PHPOnly = true;

	/**
	 * Whether to return HTML or not.  The default return is an array
	 * Set to true to return HTML.
	 */
	protected $_returnHTML = false;

	/**
	 * If _returnHTML is set to true, this boolean determines whether to send a
	 * full HTML page with headers or not.
	 */
	protected $_returnHTMLFull = true;
	
	/**
	 * How many decimal places averaged line and character counts are rounded
	 * to.
	 */
	protected $_lineCountPrecision = 2;

	/**
	 * An array of the counting modes available, and their description
	 */
	protected $_countModes = array(
		'COUNT_SOFT' => 'Lines are counted the way they appear in the code.  All
			comments, blank lines, and curly braces on a single line are counted.',

		'COUNT_NO_COMMENT' => 'Lines are counted after all comments are removed.
			Blank lines and curly braces on a single line are included in the line
			count.',

		'COUNT_NO_BLANK' => 'Lines are counted after all blank lines are removed.
			Comments and curly braces on a single line are included in the line count.',

		'COUNT_NO_BRACE' => 'Lines are counted after all lines containing a single curly
			brace are removed.  Comments and blank lines are included in the line
			count.',

		'COUNT_HARD' => 'Lines are counted after all comments, blank lines, and curly
			braces on a single line are removed.'
	);

	/**
	 * Holds the count mode used
	 */
	protected $_countMode;

	/**
	 * Set to true to only include php code, false to include all code in file
	 * contents (html, css, js, etc)
	 * @param boolean $bool - true to only parse php code, false to parse all code
	 * @access public
	 */
	public function setPHPOnly($bool)
	{
		$this->_PHPOnly = (bool) $bool;
	}

	/**
	 * Set the number of decimal points to round the lines of code to
	 * @param integer $int
	 * @access public
	 */
	public function setLineCountPrecision($int)
	{
		$this->_lineCountPrecision = (int) $int;
	}

	/**
	 * Set to return HTML instead of the default array returned
	 * @param boolean $bool - true to return html, false to return array
	 * @access public
	 */
	public function setReturnHTML($bool)
	{
		$this->_returnHTML = (bool) $bool;
	}

	/**
	 * Set to false to return only the body of the html generated, or set to true
	 * to return a full HTML page with headers.
	 * @param boolean $bool
	 * @access public
	 */
	public function setReturnHTMLFull($bool)
	{
		$this->_returnHTMLFull = (bool) $bool;
	}

	/**
	 * Sets the count mode to use for counting
	 * @param string $countMode
	 * @access public
	 */
	public function setCountMode($countMode)
	{
		if (in_array(strtoupper($countMode), array_keys($this->_countModes)))
		{
			$this->_countMode = strtoupper($countMode);
		} else
		{
			trigger_error(
				'phpsoco: Invalid count mode.  Valid count modes are COUNT_SOFT,
				COUNT_NO_COMMENT, COUNT_NO_BLANK, COUNT_NO_BRACE, and COUNT_HARD.',
				E_USER_ERROR
			);
		}
	}

	/**
	 * Instantiates a new object, generating data, then returns it
	 * @param string $input - the file or directory to be evaluated
	 * @param boolean $recurse - if $input is a directory, whether or not to
	 * recurse the directory tree.
	 * @return mixed - array when _returnHTML is false, string when true
	 * @access public
	 */
	public function getStats($input, $recurse=false)
	{
		if ($this->_countMode == NULL)
		{
			trigger_error(
				'phpsoco: No counting mode found.  Use setCountMode() with a parameter of
				COUNT_SOFT, COUNT_NO_COMMENT, COUNT_NO_BLANK, COUNT_NO_BRACE, or
				COUNT_HARD.',
				E_USER_ERROR
			);
		}

		if (is_file($input))
		{
			$ret = new phpsoco_file($input);
		} elseif (is_dir($input) && !$recurse)
		{
			$ret = new phpsoco_directory($input);
		} elseif (is_dir($input) && $recurse)
		{
			$ret = new phpsoco_directoryTree($input);
		} else
		{
			trigger_error(
				'phpsoco: Could not evaluate input file or directory',
				E_USER_ERROR
			);
		}

		//set object properties
		$ret->_returnHTML = $this->_returnHTML;
		$ret->_returnHTMLFull = $this->_returnHTMLFull;
		$ret->_PHPOnly = $this->_PHPOnly;
		$ret->_countMode = $this->_countMode;
		$ret->_lineCountPrecision = $this->_lineCountPrecision;
		return $ret->_getStat();
	}

	/**
	 * Generates HTML for output
	 * @param array $arr - array of generated stats
	 * @return string
	 * @access protected
	 */
	protected function _generateHTML($arr)
	{
		$htmlOutput = '';
		foreach ($arr AS $k => $v)
		{
			//$htmlOutput .= '';
			if (is_array($v))
			{
				$htmlOutput .= '<p><strong>' . str_replace('_', ' ', $k) . '</strong></p>';
				$htmlOutput .=  $this->_generateHTML($v);
			} else
			{
				$htmlOutput .=  !is_numeric($k) ?
					str_replace('_', ' ', $k) . ': <strong>' . $v . '</strong><br>'
					:
					'<strong>' . $v . '</strong><br>';
			}
		}
		return $htmlOutput;
	}

	/**
	 * Generates a standards compliant HTML header for HTML output.
	 * @param string $type - the type of evaluation being done
	 * @access protected
	 */
	protected function _HTMLHeader($type)
	{
		return '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
		   "http://www.w3.org/TR/html4/loose.dtd">
		<html>
		<head>
		<title>phpSoCo ' . $type . ' Evaluation</title>
		<meta http-equiv="Content-type" content="text/html;charset=UTF-8">
		<style type="text/css">
		body
		{
			background-color: #fff;
			color: #000;
			font-family: "courier new", courier, verdana, arial;
			font-size: 13px;
		}
		</style>
		</head>
		<body>';
	}

	/**
	 * Closes standards compliant full HTML output
	 * @access protected
	 */
	protected function _HTMLFooter()
	{
		return '</body>
		</html>';
	}

}





/**
 * This is the class that deals with a single file.  It will evaluate only a
 * single file.  It can be used on single file input, directory input, or
 * directory tree input.
 */
class phpsoco_file extends phpsoco
{
	/**
	 * Holds the line count of the file after this class has prepared it
	 */
	protected $_lineCount;

	/**
	 * Holds the character count of the file's code
	 */
	protected $_characterCount;
	
	/**
	 * The name of the file to be evaluated
	 */
	private $_file;

	/**
	 * Constructor to set this class's $_file property, and parent class $_code
	 * property
	 * @param $file - string of file name
	 * @access protected
	 */
	protected function __construct($file)
	{
		$this->_file = $file;
		$this->_code = file_get_contents($file);
	}

	/**
	 * Runs through class methods generating stats and returns them.  Either as
	 * HTML or an array.
	 * @access protected
	 */
	protected function _getStat()
	{
		$this->_getPHPCode();
		
		//perform methods based on count mode
		switch ($this->_countMode)
		{
			case 'COUNT_SOFT':
			//do not modify the source code
			break;
			
			case 'COUNT_NO_COMMENT':
			$this->_stripComments();
			break;
			
			case 'COUNT_NO_BLANK':
			$this->_stripWhiteSpace();
			break;
			
			case 'COUNT_NO_BRACE':
			$this->_stripBraceOnly();
			break;
			
			case 'COUNT_HARD':
			$this->_stripComments();
			$this->_stripWhiteSpace();
			$this->_stripBraceOnly();
			break;
		}
		
		$this->_getCharacterCount();
		$this->_getLineCount();
		
		//write the return array
		$ret = array(
			'phpSoCo_Configuration' => array(
				'PHP_Code_Only' => $this->_PHPOnly ? 'Yes' : 'No',
				'Line_Count_Float_Precision' => $this->_lineCountPrecision,
				'Mode_Used' => 'Single File',
				'Count_Mode_Used' => $this->_countMode,
				'Count_Mode_Description' => $this->_countModes[$this->_countMode]
			),

			'Code_Stats' => array(
				'File_Evaluated' => $this->_file,
				'Lines_Of_Code' => $this->_lineCount,
				'Characters_In_Code' => $this->_characterCount
			)
		);

		//if HTML is the preferred method of return, return it
		if ($this->_returnHTML)
		{
			if ($this->_returnHTMLFull)
			{
				return
					$this->_HTMLHeader('File') .
					$this->_generateHTML($ret) .
					$this->_HTMLFooter();
			} else
			{
				return $this->_generateHTML($ret);
			}
		}

		//just return the array
		return $ret;
	}

	/**
	 * Gathers php blocks from code
	 * @access private
	 */
	private function _getPHPCode()
	{
		//target file may have different line
		//endings - replace them all with a unified \n
		$this->_code = str_replace(array("\r\n", "\r"), "\n", $this->_code);
		
		//if user wants php only, rebuild the source code from tokens
		//we'll never miss any php code this way
		if ($this->_PHPOnly)
		{
			//suppress errors in case the code has a parse error
			$tokens = @token_get_all($this->_code);

			//initialize blocks array
			$blocks = array();

			//loop through each token
			foreach ($tokens AS $token)
			{
				if (!is_string($token))
				{
					//token id and text
					list($id, $text) = $token;

					//if it's not HTML, capture it
					if ($id != T_INLINE_HTML)
					{
						$blocks[] = $text;
					}
				} else
				{
					//capture string
					$blocks[] = $token;
				}
			}

			//get each block of php into a string
			$final = array();
			$i = 0;
			foreach ($blocks AS $blockLine)
			{
				//would love to use PHP_EOL here, but files come from different systems
				//however, when rebuilding the code, this library will just use \n
				//to get some unification going on
				if (($blockLine != "\r\n") && ($blockLine != "\r"))
				{
					if (isset($final[$i]))
					{
						$final[$i] .= $blockLine;
					} else
					{
						$final[$i] = $blockLine;
					}
				} else
				{
					$i++;
					$final[$i] = "\n";
				}
			}

			//get all blocks into a single string
			$this->_code = '';
			foreach ($final AS $f)
			{
				$this->_code .= $f;
			}
		}
	}

	/**
	 * Strips the files contents of lines containing only white space
	 * @access private
	 */
	private function _stripWhiteSpace()
	{
		//get each line
		$lines = explode("\n", $this->_code);

		//output container
		$output = array();

		//loop
		foreach ($lines AS $line)
		{
			//add trimmed line to output
			$output[] = trim($line);
		}

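		//set code, dropping only the lines that are empty after trimming
		//(a bare array_filter() would also drop a line consisting of "0")
		$this->_code = implode("\n", array_filter($output, 'strlen'));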
	}

	/**
	 * Strips the files contents of comments
	 * @todo make sure the comment isn't inside of a string
	 * @access private
	 */
	private function _stripComments()
	{
		$tokens = token_get_all($this->_code);
		$this->_code = '';

		foreach ($tokens AS $token)
		{
			if (is_string($token))
			{
				$this->_code .= $token;
			} else
			{
				list($id, $text) = $token;

				switch ($id)
				{
					case T_COMMENT:
					case T_DOC_COMMENT:
					break;

					default:
					$this->_code .= $text;
					break;
				}
			}
		}
	}

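	/**
	 * Strips the files contents of lines containing only a single curly brace
	 * @access private
	 */
	private function _stripBraceOnly()
	{
		$lines = explode("\n", $this->_code);
		$output = array();

		foreach ($lines AS $line)
		{
			if (trim($line) !== '{' && trim($line) !== '}')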
			{
				$output[] = $line;
			}
		}

		$this->_code = implode("\n", $output);
	}

	/**
	 * Counts the characters in the phpsoco formatted code
	 * @access private
	 */
	private function _getCharacterCount()
	{
		$this->_characterCount = strlen($this->_code);
	}

	/**
	 * Counts the number of lines in the phpsoco formatted code
	 * @access private
	 */
	private function _getLineCount()
	{
		$this->_lineCount = count(explode("\n", $this->_code));
	}
}





/**
 * This class evaluates a single directory, and is also used in recursive
 * directory trees.  Each file found in the directory is passed to
 * phpsoco_file() for evaluating each individual file.
 */
class phpsoco_directory extends phpsoco_file
{
	/**
	 * Holds the directory to be evaluated
	 */
	private $_directory;

	/**
	 * Holds the array of php files found in the directory
	 */
	private $_foundFiles;

	/**
	 * Constructor method sets the directory to be used
	 * @param string $directory
	 * @access protected
	 */
	protected function __construct($directory)
	{
		if (substr($directory, -1) == DIRECTORY_SEPARATOR)
		{
			$this->_directory = substr($directory, 0, -1);
		} else
		{
			$this->_directory = $directory;
		}
	}

	/**
	 * Runs through methods in this class, ultimately returning stats.
	 * @access protected
	 * @return mixed - array or string (depending on settings)
	 */
	protected function _getStat()
	{
		//find the files
		$this->_findFiles();

		//if HTML is the preferred return method, return it
		if ($this->_returnHTML)
		{
			if ($this->_returnHTMLFull)
			{
				//return html with headers
				return $this->_HTMLHeader('Directory') . $this->_generateHTML(
						$this->_compoundSingleStats(
							$this->_evaluateSingleFiles()
						)
					) . $this->_HTMLFooter();
			} else
			{
				//return html without headers
				return $this->_generateHTML(
						$this->_compoundSingleStats(
							$this->_evaluateSingleFiles()
						)
					);
			}
		}

		//return the array
		return $this->_compoundSingleStats($this->_evaluateSingleFiles());
	}

	/**
	 * Grabs all of the php files found in this directory and stores the array in
	 * class member.
	 * @access private
	 */
	private function _findFiles()
	{
		$found = array();
		if ($handle = opendir($this->_directory))
		{
			while (($file = readdir($handle)) !== false)
			{
				if (($file != '.') && ($file != '..'))
				{
					if (is_file($this->_directory . DIRECTORY_SEPARATOR . $file) && 
						(strtolower(substr($file, -4)) == '.php')
					)
					{
						$found[] = $this->_directory . DIRECTORY_SEPARATOR . $file;
					}
				}
			}
			closedir($handle);
		} else
		{
			trigger_error(
				'phpSoCo: Could not open directory (' . $this->_directory . ')',
				E_USER_WARNING
			);
		}
		
		$this->_foundFiles = $found;
	}

	/**
	 * Loops through each found file and creates a new phpsoco_file() object.
	 * Stores the returned array of stats in return value, then returns it.
	 * @return array
	 * @access private
	 */
	private function _evaluateSingleFiles()
	{
		//if we have files
		if (!empty($this->_foundFiles))
		{
			//loop through, gather single stats
			$ret = array();
			foreach ($this->_foundFiles AS $file)
			{
				$single = new phpsoco_file($file);
				$single->_returnHTML = false;
				$single->_PHPOnly = $this->_PHPOnly;
				$single->_lineCountPrecision = $this->_lineCountPrecision;
				$single->_countMode = $this->_countMode;
				$ret[] = $single->_getStat();
			}
		} else
		{
			//no files, return empty array
			$ret = array();
		}

		//return array of found file stats
		return $ret;
	}

	/**
	 * Compounds each files single stats into an array of stats for the
	 * directory.  This is really ugly at the moment.
	 * @param array $stats
	 * @return array
	 */
	private function _compoundSingleStats($stats)
	{
		//if we have stats
		if (!empty($stats))
		{
			//set up return array
			$ret['phpSoCo_Configuration'] = $stats[0]['phpSoCo_Configuration'];
			$ret['phpSoCo_Configuration']['Mode_Used'] = 'Directory';
			$ret['Code_Stats']['Directory_Evaluated'] = '';
			$ret['Code_Stats']['Files_Evaluated'] = array();
			$ret['Code_Stats']['Summary'] = array();
			$ret['Code_Stats']['Summary']['Lines_Of_Code'] = 0;
			$ret['Code_Stats']['Summary']['Characters_In_Code'] = 0;
			$ret['Code_Stats']['Average'] = array();
			$ret['Code_Stats']['Average']['Lines_Of_Code'] = 0;
			$ret['Code_Stats']['Average']['Characters_In_Code'] = 0;
			$ret['Code_Stats']['Single_File_Stats'] = array();

			//loop through each, setting and adding stats
			$i = 0;
			foreach ($stats AS $stat)
			{
				if ($i == 0)
				{
					$ret['Code_Stats']['Directory_Evaluated'] =
						dirname($stat['Code_Stats']['File_Evaluated']);
				}

				$ret['Code_Stats']['Files_Evaluated'][] =
					$stat['Code_Stats']['File_Evaluated'];

				$ret['Code_Stats']['Summary']['Lines_Of_Code'] +=
					$stat['Code_Stats']['Lines_Of_Code'];

				$ret['Code_Stats']['Summary']['Characters_In_Code'] +=
					$stat['Code_Stats']['Characters_In_Code'];

				$ret['Code_Stats']['Single_File_Stats'][] = array(
					'File' => $stat['Code_Stats']['File_Evaluated'],
					'Lines_Of_Code' => $stat['Code_Stats']['Lines_Of_Code'],
					'Characters_In_Code' => $stat['Code_Stats']['Characters_In_Code']
				);

				$i++;
			}

			//here we will get directory averages
			$ret['Code_Stats']['Average']['Lines_Of_Code'] =
				round(
					$ret['Code_Stats']['Summary']['Lines_Of_Code']
					/
					count($stats), $this->_lineCountPrecision
				);

			$ret['Code_Stats']['Average']['Characters_In_Code']
				= round(
					$ret['Code_Stats']['Summary']['Characters_In_Code']
					/
					count($stats), $this->_lineCountPrecision
				);
		} else
		{
			//we have nothing
			return array();
		}

		//return
		return $ret;
	}
}





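/**
 * This class evaluates an entire directory tree.  Every directory found under
 * the root directory is passed to phpsoco_directory() for evaluation.
 */
class phpsoco_directoryTree extends phpsoco_directory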
{
	/**
	 * Holds the root directory of the directory tree
	 */
	private $_directory;

	/**
	 * Sets the root directory of the directory tree
	 * @param string $directory
	 * @access protected
	 */
	protected function __construct($directory)
	{
		if (substr($directory, -1) == DIRECTORY_SEPARATOR)
		{
			$this->_directory = substr($directory, 0, -1);
		} else
		{
			$this->_directory = $directory;
		}
	}

	/**
	 * Generates stats and returns them
	 * @access protected
	 */
	protected function _getStat()
	{
		//find the directories
		$directories = $this->_findDirectories($this->_directory);

		//unshift root directory onto the beginning
		array_unshift($directories, $this->_directory);

		//get the stats
		$ret = $this->_compoundDirectories($this->_evaluateDirectories($directories));

		//return
		if ($this->_returnHTML)
		{
			if ($this->_returnHTMLFull)
			{
				return
					$this->_HTMLHeader('Directory Tree') .
					$this->_generateHTML($ret) .
					$this->_HTMLFooter();
			}

			return $this->_generateHTML($ret);
		}

		return $ret;
	}

	/**
	 * Compounds directory stats into a single array
	 * @param array $stats
	 * @access private
	 */
	private function _compoundDirectories($stats)
	{
		if (!empty($stats))
		{
			//set up return array
			$ret['phpSoCo_Configuration'] = $stats[0]['phpSoCo_Configuration'];
			$ret['phpSoCo_Configuration']['Mode_Used'] = 'Directory Tree';
			$ret['Code_Stats']['Directory_Tree_Evaluated'] = $this->_directory;
			$ret['Code_Stats']['Directories_Evaluated'] = array();
			$ret['Code_Stats']['Files_Evaluated'] = array();
			$ret['Code_Stats']['Summary'] = array();
			$ret['Code_Stats']['Summary']['Lines_Of_Code'] = 0;
			$ret['Code_Stats']['Summary']['Characters_In_Code'] = 0;
			$ret['Code_Stats']['Average_Per_Directory'] = array();
			$ret['Code_Stats']['Average_Per_Directory']['Lines_Of_Code'] = 0;
			$ret['Code_Stats']['Average_Per_Directory']['Characters_In_Code'] = 0;
			$ret['Code_Stats']['Average_Per_File'] = array();
			$ret['Code_Stats']['Average_Per_File']['Lines_Of_Code'] = 0;
			$ret['Code_Stats']['Average_Per_File']['Characters_In_Code'] = 0;
			$ret['Code_Stats']['Single_File_Stats'] = array();

			foreach ($stats AS $stat)
			{
				$ret['Code_Stats']['Directories_Evaluated'][] =
					$stat['Code_Stats']['Directory_Evaluated'];

				$ret['Code_Stats']['Files_Evaluated'] =
					array_merge(
						$ret['Code_Stats']['Files_Evaluated'],
						$stat['Code_Stats']['Files_Evaluated']
					);

				$ret['Code_Stats']['Summary']['Lines_Of_Code'] +=
					$stat['Code_Stats']['Summary']['Lines_Of_Code'];

				$ret['Code_Stats']['Summary']['Characters_In_Code'] +=
					$stat['Code_Stats']['Summary']['Characters_In_Code'];

				$ret['Code_Stats']['Single_File_Stats'] =
					array_merge(
						$ret['Code_Stats']['Single_File_Stats'],
						$stat['Code_Stats']['Single_File_Stats']
					);
			}

			//here we will get directory averages
			$ret['Code_Stats']['Average_Per_Directory']['Lines_Of_Code'] =
				round(
					$ret['Code_Stats']['Summary']['Lines_Of_Code']
					/
					count($stats), $this->_lineCountPrecision
				);

			$ret['Code_Stats']['Average_Per_Directory']['Characters_In_Code']
				= round(
					$ret['Code_Stats']['Summary']['Characters_In_Code']
					/
					count($stats),
					$this->_lineCountPrecision
				);

			//here we will get single file averages
			$ret['Code_Stats']['Average_Per_File']['Lines_Of_Code'] =
				round(
					$ret['Code_Stats']['Summary']['Lines_Of_Code']
					/
					count($ret['Code_Stats']['Files_Evaluated']),
					$this->_lineCountPrecision
				);

			$ret['Code_Stats']['Average_Per_File']['Characters_In_Code']
				= round(
					$ret['Code_Stats']['Summary']['Characters_In_Code']
					/
					count($ret['Code_Stats']['Files_Evaluated']),
					$this->_lineCountPrecision
				);
		} else
		{
			//we have nothing
			return array();
		}

		return $ret;
	}

	/**
	 * Evaluate each single directory
	 * @param array $directories
	 */
	private function _evaluateDirectories($directories)
	{
		//echo '<pre>';print_r($directories);echo'</pre>';
		$ret = array();
		foreach ($directories AS $directory)
		{
			//instantiate new directory object
			$dirObj = new phpsoco_directory($directory);

			//set object properties
			$dirObj->_returnHTML = false;
			$dirObj->_PHPOnly = $this->_PHPOnly;
			$dirObj->_lineCountPrecision = $this->_lineCountPrecision;
			$dirObj->_countMode = $this->_countMode;

			//add to ret array if not empty
			$stat = $dirObj->_getStat();
			if (!empty($stat))
			{
				$ret[] = $stat;
			}
		}

		return $ret;
	}

	/**
	 * Recursively find directories
	 * @param $start
	 * @access private
	 */
	private function _findDirectories($start)
	{
		$ret = array();
		$handle = opendir($start);
		while (($file = readdir($handle)) !== false)
		{
			$file = $start . DIRECTORY_SEPARATOR . $file;
			if (
				($file != $start . DIRECTORY_SEPARATOR . '.') &&
				($file != $start . DIRECTORY_SEPARATOR . '..')
			)
			{
				if (is_dir($file))
				{
					array_push($ret, $file);
					$ret = array_merge($ret, $this->_findDirectories($file));
				}
			}
		}
		closedir($handle);

		return $ret;
	}
}
Evaluating a file

Code:

<?php
require_once 'phpsoco/phpsoco.php';
$phpsoco = new phpsoco();

$phpsoco->setCountMode('COUNT_SOFT');
print_r($phpsoco->getStats('c:\apache2\apache2\htdocs\phpsoco.php'));
Results:

Code:

Array
(
    [phpSoCo_Configuration] => Array
        (
            [PHP_Code_Only] => Yes
            [Line_Count_Float_Precision] => 2
            [Mode_Used] => Single File
            [Count_Mode_Used] => COUNT_SOFT
            [Count_Mode_Description] => Lines are counted the way they appear in the code.  All
			comments, blank lines, and curly braces on a single line are counted.
        )

    [Code_Stats] => Array
        (
            [File_Evaluated] => phpsoco-version-1.0.0.php
            [Lines_Of_Code] => 1041
            [Characters_In_Code] => 26123
        )

)

Posted: Tue Nov 27, 2007 3:48 am
by Christopher
Add a directory iterator that will go through a directory tree of files and give individual counts, maybe directory totals, and grand totals...
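A rough sketch of the directory-tree idea suggested above, using PHP's SPL iterators. The function name `sketch_count_tree` and the shape of the returned array are made up for illustration and are not part of phpSoCo. It only counts raw lines per file (no comment or brace stripping) and assumes PHP 5.3.6 or later for `getExtension()`.

```php
<?php
// Hypothetical helper, not part of phpSoCo: walks a directory tree and
// returns per-file counts, per-directory totals, and a grand total.
function sketch_count_tree($root)
{
    $result = array('files' => array(), 'directories' => array(), 'total' => 0);

    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );

    foreach ($iterator as $fileInfo) {
        // only regular files ending in .php (case-insensitive)
        if (!$fileInfo->isFile() || strtolower($fileInfo->getExtension()) !== 'php') {
            continue;
        }

        // unify line endings before counting, as the library does
        $path  = $fileInfo->getPathname();
        $code  = str_replace(array("\r\n", "\r"), "\n", file_get_contents($path));
        $lines = count(explode("\n", $code));

        $dir = $fileInfo->getPath();
        if (!isset($result['directories'][$dir])) {
            $result['directories'][$dir] = 0;
        }

        $result['files'][$path] = $lines;
        $result['directories'][$dir] += $lines;
        $result['total'] += $lines;
    }

    return $result;
}
```

Calling `sketch_count_tree('/path/to/project')` (a placeholder path) would give individual counts under `files`, directory totals under `directories`, and the grand total under `total`.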

Posted: Tue Nov 27, 2007 3:59 am
by s.dot
hecks yeahhh, that would be tight.

I like how I handle the lines of code: letting the evaluator decide what is and isn't code. It takes care of { on a single line, spaces consuming lots of characters, and lots of other stuff.

Posted: Tue Nov 27, 2007 7:38 am
by aaronhall
I love the directory-wide and file-by-file stats idea... would be great to extend this and have a whitelist of file extensions (.js, .css, .html, etc.)

Great script! I'll be using this at some point
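The extension whitelist mentioned above could look something like this. It is only a sketch: `sketch_has_allowed_extension` and the `$whitelist` array are hypothetical names, and the posted library itself only checks for `.php`.

```php
<?php
// Hypothetical helper, not part of phpSoCo: returns true when the file name
// ends in one of the whitelisted extensions (compared case-insensitively).
function sketch_has_allowed_extension($file, array $allowed)
{
    $extension = strtolower(pathinfo($file, PATHINFO_EXTENSION));

    return in_array($extension, array_map('strtolower', $allowed), true);
}

// example whitelist; extensions are given without the leading dot
$whitelist = array('php', 'js', 'css', 'html');
```

A check like this could stand in for the hard-coded `.php` comparison when collecting files from a directory.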

Posted: Tue Nov 27, 2007 10:44 am
by Jenk
I'd prefer if the count was based on the use of ';' and function/class/if/loops open and closing tags as one line.

Well, if I'm completely honest, I don't care about LOC (Lines Of Code) :) but thought it would be more sensible to count based on actual commands/lines than to count arbitrarily on length/number of characters. :)
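One way to approximate the statement-based count described above is with the tokenizer rather than string length. This is a sketch under assumptions, not anything from the library: `sketch_count_logical_lines` is a made-up name, and the rule it applies (one count per statement-ending `;` outside parentheses, plus one per opening `{`) is just one possible reading of the suggestion.

```php
<?php
// Hypothetical sketch: counts "logical" lines as semicolons that are not
// inside parentheses, plus one for every opening curly brace.
function sketch_count_logical_lines($code)
{
    $count = 0;
    $parenDepth = 0;

    foreach (token_get_all($code) as $token) {
        // multi-character tokens (strings, comments, keywords) are arrays;
        // only single characters such as ; ( ) { arrive as plain strings
        if (!is_string($token)) {
            continue;
        }

        if ($token === '(') {
            $parenDepth++;
        } elseif ($token === ')') {
            $parenDepth--;
        } elseif ($token === ';' && $parenDepth === 0) {
            $count++;
        } elseif ($token === '{') {
            $count++;
        }
    }

    return $count;
}
```

Because the semicolons in a `for (...; ...; ...)` header sit inside parentheses, the whole header counts once (through its opening brace). A known limitation of this sketch: statements inside a closure passed as a function argument are also inside parentheses and are not counted.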

Posted: Tue Nov 27, 2007 11:04 am
by John Cartwright
A couple of comments I have,

Code:

private function _formatLineLength()
   {
      $this->_code = chunk_split($this->_code, $this->_lineLength);
   }
Can potentially create parse errors since it might chop php right in the middle of a php command, or similar. This probably requires a smart enough regex to know not to chop in certain places.

Code:

private function _handleTags()
   {
      if (!$this->_allowPHPTags)
      {
         $this->_code = str_replace(array('<?php', '<?', '?>'), '', $this->_code);
      }
   }
Should probably be using str_ireplace for case insensitivity also.

Otherwise, cool library.
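To illustrate the casing point, here is a small comparison of `str_replace()` and `str_ireplace()` on an upper-case opening tag. The sample string is invented for the example.

```php
<?php
// Sample input with an upper-case opening tag (made up for illustration).
$code = "<?PHP echo 'hi'; ?>";
$tags = array('<?php', '<?', '?>');

// Case-sensitive: '<?php' does not match '<?PHP', so only '<?' and '?>'
// are removed and the letters "PHP" are left behind.
$caseSensitive = str_replace($tags, '', $code);

// Case-insensitive: '<?php' matches '<?PHP' and the whole tag is removed.
$caseInsensitive = str_ireplace($tags, '', $code);

var_dump($caseSensitive);   // string(15) "PHP echo 'hi'; "
var_dump($caseInsensitive); // string(12) " echo 'hi'; "
```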

Posted: Tue Nov 27, 2007 5:29 pm
by s.dot
Jcart wrote:A couple of comments I have,

Code:

private function _formatLineLength()
   {
      $this->_code = chunk_split($this->_code, $this->_lineLength);
   }
Can potentially create parse errors since it might chop php right in the middle of a php command, or similar. This probably requires a smart enough regex to know not to chop in certain places.
How so? The code is just being evaluated in a string (never executed). I don't see how hard chops would matter, and chopping right at the xth character will give a "full" line of code.
Jcart wrote:

Code:

private function _handleTags()
   {
      if (!$this->_allowPHPTags)
      {
         $this->_code = str_replace(array('<?php', '<?', '?>'), '', $this->_code);
      }
   }
Should probably be using str_ireplace for case insensitivity also.
Absolutely!

Right now I'm working on a base class, and extending it into a single file class (for evaluating a single php file, like above), a single directory class (all php files in one directory), and a directory iterator class (all php files in a directory tree starting at the root directory given). Boredom is bliss!

I'm also adding support for asp style tags (ew.. but I suppose some people use it).

<script language="php">...</script> will be tough, because </script> can denote the ending of javascript if _PHPOnly is set to false.. so I'll save that one for later.

Posted: Tue Nov 27, 2007 7:42 pm
by John Cartwright
$code = 'echo somefunction($foobar);';

Applying chunk split on this will produce invalid code if it splits it in the middle of a function name..
Maybe I'm not understanding why you even need to chunk_split in the first place, since you are never outputting the code anyway.
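For reference, this is what `chunk_split()` does to the example string above when the chunk length is 10 (the length is chosen here only to make the split visible):

```php
<?php
// The example string from the post above.
$code = 'echo somefunction($foobar);';

// chunk_split() appends "\r\n" after every 10 characters by default,
// regardless of where identifiers begin or end.
$chunks = explode("\r\n", rtrim(chunk_split($code, 10)));

print_r($chunks);
// Array
// (
//     [0] => echo somef
//     [1] => unction($f
//     [2] => oobar);
// )
```

The function name is cut in two, which is harmless for counting but would not run if the chunked string were ever executed.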

Posted: Tue Nov 27, 2007 8:33 pm
by s.dot
It's my way of evaluating a full line of source code. In my library, one full line = 80 characters of hard code.

This eliminates guessing of white space, one character lines ({ and }), and some other stuff. I like it because it breaks it down into the absolutely most compact full line of code you can possibly have :) (even if it would be broken when evaluated).
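The fixed-width idea described above can be sketched as follows. This is a reconstruction, not the library's code (the `_formatLineLength()` method was later dropped): `sketch_fixed_width_line_count` is a hypothetical name, and it assumes that all whitespace is stripped before dividing by the line length.

```php
<?php
// Hypothetical sketch: one "full" line is $lineLength characters of code
// with all whitespace removed; a partial last line shows up as a fraction.
function sketch_fixed_width_line_count($code, $lineLength = 80, $precision = 2)
{
    // drop every whitespace character so only "hard" code remains
    $hard = preg_replace('/\s+/', '', $code);

    return round(strlen($hard) / $lineLength, $precision);
}
```

With these assumptions, 120 non-whitespace characters count as 1.5 lines, no matter how they are laid out in the file.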

Posted: Tue Nov 27, 2007 10:27 pm
by John Cartwright
s.dot wrote:It's my way of evaluating a full line of source code. In my library, one full line = 80 characters of hard code.

This eliminates guessing of white space, one character lines ({ and }), and some other stuff. I like it because it breaks it down into the absolutely most compact full line of code you can possibly have :) (even if it would be broken when evaluated).
Ah, gotcha! I see where you're coming from. I had just assumed that the snippet below would be reported as only 1 line of code. Obviously this is a bit of an exaggeration, since I'm sure nobody would set it to 10 characters per chunk:

Code: Select all

private function _formatLineLength()
   {      
      //$this->_code = chunk_split($this->_code, $this->_lineLength);
      $this->_code = chunk_split('echo foobar($foo);', 10);
   }
When determining line count I wouldn't necessarily strip all whitespace, only blank lines. It just seems natural to do it this way.. as in, if I'm in my editor and I'm looking at 100 lines of code, I'd expect your library to display ~100 lines (depending on blank lines) and not how many lines of code fill 80 characters. For instance,

Code: Select all

$foo = 'echo 1;
echo 2;

echo 3;

echo blehblah(4)';

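//remove blank lines from code (\R matches any line ending; a bare array_filter() would also drop a line containing only "0")
$lines = array_filter(preg_split('/\R/', $foo), function ($line) { return trim($line) !== ''; });
echo 'Line count: '. count($lines); //prints "Line count: 4"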

//convert to string to perform regex
$source = implode(PHP_EOL, $lines);
Apologies if I'm not understanding, perhaps just a difference in opinion :)

Posted: Tue Nov 27, 2007 10:30 pm
by John Cartwright
Jenk wrote:I'd prefer if the count was based on the use of ';' and function/class/if/loops open and closing tags as one line.
Excellent idea! Just noticed this post after re-reading the thread :) Might take a bit of regex redbull to be as accurate with it as possible, though. For instance, for loops shouldn't be considered multiple lines of code.

Posted: Wed Nov 28, 2007 9:43 am
by feyd
Jcart wrote:
Jenk wrote:I'd prefer if the count was based on the use of ';' and function/class/if/loops open and closing tags as one line.
Excellent idea! Just noticed this post after re-reading the thread :) Might take a bit of regex redbull to be as accurate with it as possible, though. For instance, for loops shouldn't be considered multiple lines of code.
One should also take into account that the last line of a code block (ending in ?>) does not require a semicolon terminator.
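
The points above (count ';' terminators, don't count the two semicolons in a for header, and count a final statement that ends at ?> without a semicolon) can be sketched roughly as follows. This is only an illustration: it uses PHP's tokenizer extension instead of a regex, the function name countTerminatedStatements is invented, and it counts statements only, not the opening and closing of functions, classes, or blocks.

```php
<?php

// Hypothetical sketch of a semicolon-based statement count.
function countTerminatedStatements($source)
{
    $count = 0;
    $parens = array(0); // open-parenthesis depth, one entry per open brace level
    $last = null;       // last significant token seen
    $skip = array(T_WHITESPACE, T_COMMENT, T_DOC_COMMENT, T_INLINE_HTML);

    foreach (token_get_all($source) as $token) {
        // Array tokens carry an integer id; single characters come as strings.
        $id = is_array($token) ? $token[0] : $token;
        if (in_array($id, $skip, true)) {
            continue;
        }
        $top = count($parens) - 1;

        if ($id === '{' || $id === T_CURLY_OPEN || $id === T_DOLLAR_OPEN_CURLY_BRACES) {
            $parens[] = 0;               // new block: parentheses start at depth 0
        } elseif ($id === '}') {
            if ($top > 0) {
                array_pop($parens);
            }
        } elseif ($id === '(') {
            $parens[$top]++;
        } elseif ($id === ')') {
            if ($parens[$top] > 0) {
                $parens[$top]--;
            }
        } elseif ($id === ';') {
            // Semicolons inside parentheses belong to a for (...) header.
            if ($parens[$top] === 0) {
                $count++;
            }
        } elseif ($id === T_CLOSE_TAG) {
            // A statement directly before ?> needs no semicolon.
            $noStatement = array(';', '{', '}', ':', T_OPEN_TAG, null);
            if ($parens[$top] === 0 && !in_array($last, $noStatement, true)) {
                $count++;
            }
        }
        $last = $id;
    }

    return $count;
}

echo countTerminatedStatements('<?php for ($i = 0; $i < 3; $i++) { echo $i; } echo "done" ?>'); // prints 2
```

Because semicolons inside string literals arrive as part of a string token, they are never mistaken for terminators, which is the main advantage over a plain regex.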

Posted: Wed Nov 28, 2007 2:39 pm
by s.dot
Yeah, I'm working on an updated version (1.1.1) that splits into different classes. A base class, one for single file, one for directory, and one for directory tree recursion. Right now they'll only use the line count I have implemented into the example above.

However, if there's enough interest after I show that updated version.. I've been thinking about allowing the user to decide how to count the lines for the next version update (1.2.1).

* COUNT_HARD
- The way I've already implemented it. Breaks the code down into a single string of code and hard chops at a full 80 characters per line. I call this the hard count because it doesn't really get any more compact than this.

* COUNT_REAL
- The way the programmer programmed it. All lines (minus comments) including blank lines.

* COUNT_REAL_NO_BLANK
- The way the programmer programmed it. All lines (minus comments) not including blank lines.

* COUNT_TERMINATION
- Semicolon-terminated statements count as 1 line each. The semicolons inside a for loop header are not counted, and the optional missing ; before the closing ?> is accounted for.

* COUNT_ALL
- An average of the above four counts.
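
As a rough illustration of the two "real" modes and the average, under the assumption that comments have already been removed from the code: the COUNT_* names come from the list above, but the functions below (countReal, countRealNoBlank, countAll) are hypothetical and not part of any released version.

```php
<?php

// Hypothetical helpers; both line counters assume comments are already stripped.

function countReal($code)
{
    if ($code === '') {
        return 0;
    }
    // \R matches \n, \r\n and \r, so the result does not depend on the platform.
    return count(preg_split('/\R/', $code));
}

function countRealNoBlank($code)
{
    $count = 0;
    foreach (preg_split('/\R/', $code) as $line) {
        if (trim($line) !== '') {
            $count++;
        }
    }
    return $count;
}

function countAll(array $counts)
{
    // Plain arithmetic mean of whichever counts are passed in.
    if (count($counts) === 0) {
        return 0;
    }
    return array_sum($counts) / count($counts);
}

$code = "echo 1;\necho 2;\n\necho 3;\n\necho blehblah(4);";
echo countReal($code), ' ', countRealNoBlank($code); // prints "6 4"
```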

I believe with all of those options, you couldn't ask for a more suitable source lines of code library. :) And perhaps the COUNT_ALL average could become some sort of source line counting standard. :P hehehe

Posted: Wed Nov 28, 2007 3:37 pm
by Mordred
Parsing PHP code with regexps is [s]hopelessly[/s] seriously flawed.
Use proper parsing via the tokenizer functions.

count_hard (and therefore count_all) are useless stats, might as well give character count or i dunno, measure whitespace to text ratio :)

and btw your db code sucks :)
just kidding (literally, got my kid in the other hand, so excuse tha lame typing :P )

(but seriously, why the thin wrapper, I doubt you ever used even half the methods)

Posted: Wed Nov 28, 2007 4:03 pm
by s.dot
Mordred wrote:Parsing PHP code with regexps is [s]hopelessly[/s] seriously flawed.
Use proper parsing via the tokenizer functions.
That's a good idea. So far I'm having success with regexes, though.
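
For reference, the tokenizer approach suggested above could look roughly like this. It is a sketch, not phpSoCo's implementation: countCodeLines is a made-up name, and treating comments, inline HTML, and lines holding only an open or close tag as non-code is an assumption.

```php
<?php

// Hypothetical sketch: count the lines that hold at least one real PHP token.
function countCodeLines($source)
{
    // Token types that do not make a line count as code (an assumption).
    $ignored = array(T_WHITESPACE, T_COMMENT, T_DOC_COMMENT,
                     T_OPEN_TAG, T_CLOSE_TAG, T_INLINE_HTML);
    $linesWithCode = array();
    $line = 1;

    foreach (token_get_all($source) as $token) {
        // Single-character tokens such as ';' are plain strings without an id.
        $id   = is_array($token) ? $token[0] : null;
        $text = is_array($token) ? $token[1] : $token;
        $span = substr_count($text, "\n");

        if (!in_array($id, $ignored, true)) {
            // Mark every line the token touches (multi-line strings span several).
            for ($i = 0; $i <= $span; $i++) {
                $linesWithCode[$line + $i] = true;
            }
        }
        $line += $span;
    }

    return count($linesWithCode);
}

echo countCodeLines("<?php\n// comment only\necho 1; /* trailing */\n\necho 2;\n"); // prints 2
```

Since comments and strings arrive as whole tokens, a '//' inside a string literal is never mistaken for a comment.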
Mordred wrote:count_hard (and therefore count_all) are useless stats, might as well give character count or i dunno, measure whitespace to text ratio :)
You've got to be kidding? I enjoy count_hard stats because it's the most compact. Gives me a MINIMUM boundary, so I can definitely say my code is at least X lines by X characters long.

And I don't think the count_all would be useless, in fact, it'd be the most useful.
Mordred wrote:and btw your db code sucks :)
just kidding (literally, got my kid in the other hand, so excuse tha lame typing :P )

(but seriously, why the thin wrapper, I doubt you ever used even half the methods)
I've never even used it. It was a boredom thing, much like this. :)