phpSoCo, for counting php lines of code

s.dot
Tranquility In Moderation
Posts: 5001
Joined: Sun Feb 06, 2005 7:18 pm
Location: Indiana

Post by s.dot »

Requires PHP >= 5

phpSoCo is a library dedicated to counting source lines of code in php files. Its extensive set of features allows you to count the lines in the scripts you want, the way you want, and capture or display the results the way you want.

Features:

* Free. Released under the GNU GPL.
* Get source lines of code stats for a single file, a directory of files, or an entire directory tree.
* For those of you who embed HTML into your PHP scripts, you have a choice. You can count lines with the HTML included, or count only PHP lines of code.
* Return your stats as a multi-dimensional array or as HTML (defaults to return an array)
* If you choose to return HTML, you can get a full standards compliant web page, or just a chunk of HTML to embed in other pages.

All posts in this topic have been taken into account and this is the freshest code. However your input is still welcome and I actually invite it.

It'd be cool if you'd download the library and give it a test run and report your experiences with it, too!

Webpage: http://www.scottayy.com/phpsoco (download from there to avoid copy/paste errors)

The code:

Code: Select all

<?php

/**
 * phpSoCo - for counting php source lines of code
 *
 * Copyright (C) 2007 Scott Martin <smp_info[at]yahoo[dot]com>
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
 *
 *
 *
 * phpsoco will allow you to evaluate any php script to determine the number of
 * source lines of code and characters in that script.  You can decide whether
 * or not to count HTML not encapsulated in php blocks as code (defaults to not
 * count it.. counts php only).
 *
 * Use one of the counting modes to determine how you want to count lines of
 * code in your scripts.
 *
 * The available counting modes are as follows:
 *
 * COUNT_SOFT
 *  - Lines are counted the way they appear in the code.  All comments, blank
 *    lines, and curly braces on a single line are counted.
 *
 * COUNT_NO_COMMENT
 *  - Lines are counted after all comments are removed.  Blank lines and curly
 *    braces on a single line are included in the line count.
 *
 * COUNT_NO_BLANK
 *  - Lines are counted after all blank lines are removed.  Comments and curly
 *    braces on a single line are included in the line count.
 *
 * COUNT_NO_BRACE
 *  - Lines are counted after all lines containing a single curly brace are
 *    removed.  Comments and blank lines are included in the line count.
 *
 * COUNT_HARD
 *  - Lines are counted after all comments, blank lines, and curly braces on a
 *    single line are removed.
 *
 * @Version 1.0.0 Alpha
 *     - Initial Release
 *
 * @Version 1.0.0 Beta
 *     - Added # style comments to be removed
 *     - If _PHPOnly is set to false, HTML style comments will be removed
 *     - replaced php opening/closing tags with str_ireplace() to avoid
 *       potential casing issues.
 *     - Added a _returnHTML class member to return HTML instead of passing it
 *       as a parameter to the getStats() method.
 *     - Added a setter method setReturnHTML() to set the desired output style
 *
 * @Version 1.0.0 Beta 2
 *     - Broke the main class up into 4 classes, phpsoco() base class,
 *       phpsoco_file(), phpsoco_directory() and phpsoco_directoryTree()
 *     - Allows for single directory, and a directory tree parsing for files
 *     - Base class phpsoco() method getStats() has different parameters and
 *       instantiates a new object based on the $input parameter.
 *
 * @Version 1.0.0 Beta 3
 *     - Moved methods _getPHPCode(), _stripWhiteSpace(), _stripComments(), and
 *       _getCharacterCount() from class phpsoco() to class phpsoco_file().
 *     - Got rid of the method _formatLineLength()
 *     - Got rid of phpsoco() properties _lineLength, _allowSingleSpaces, and
 *       _allowPHPTags and their corresponding setter methods.
 *     - Used the php tokenizer functions for parsing PHP code instead of
 *       regexes.
 *     - Got rid of the counting lines by linelengths of hard code.  That was
 *       ugly.
 *     - Introduced counting modes COUNT_SOFT, COUNT_NO_COMMENT,
 *       COUNT_NO_BLANK, COUNT_NO_BRACE, and COUNT_HARD, and setCountMode()
 *       method to set the counting mode.
 *
 *
 * @author Scott Martin <smp_info[at]yahoo[dot]com>
 * @date started November 27th, 2007
 * @last updated November 30th, 2007
 */
class phpsoco
{
	/**
	 * Container for each file's source code
	 */
	protected $_code;

	/**
	 * Evaluate PHP only code
	 */
	protected $_PHPOnly = true;

	/**
	 * Whether to return HTML or not.  The default return is an array
	 * Set to true to return HTML.
	 */
	protected $_returnHTML = false;

	/**
	 * If _returnHTML is set to true, this boolean determines whether to send a
	 * full HTML page with headers or not.
	 */
	protected $_returnHTMLFull = true;
	
	/**
	 * Number of decimal places to round averaged line and character counts to
	 */
	protected $_lineCountPrecision = 2;

	/**
	 * An array of the counting modes available, and their description
	 */
	protected $_countModes = array(
		'COUNT_SOFT' => 'Lines are counted the way they appear in the code.  All
			comments, blank lines, and curly braces on a single line are counted.',

		'COUNT_NO_COMMENT' => 'Lines are counted after all comments are removed.
			Blank lines and curly braces on a single line are included in the line
			count.',

		'COUNT_NO_BLANK' => 'Lines are counted after all blank lines are removed.
			Comments and curly braces on a single line are included in the line count.',

		'COUNT_NO_BRACE' => 'Lines are counted after all lines containing a single curly
			brace are removed.  Comments and blank lines are included in the line
			count.',

		'COUNT_HARD' => 'Lines are counted after all comments, blank lines, and curly
			braces on a single line are removed.'
	);

	/**
	 * Holds the count mode used
	 */
	protected $_countMode;

	/**
	 * Set to true to only include php code, false to include all code in file
	 * contents (html, css, js, etc)
	 * @param boolean $bool - true to only parse php code, false to parse all code
	 * @access public
	 */
	public function setPHPOnly($bool)
	{
		$this->_PHPOnly = (bool) $bool;
	}

	/**
	 * Set the number of decimal places to round the lines of code to
	 * @param integer $int
	 * @access public
	 */
	public function setLineCountPrecision($int)
	{
		$this->_lineCountPrecision = (int) $int;
	}

	/**
	 * Set to return HTML instead of the default array returned
	 * @param boolean $bool - true to return html, false to return array
	 * @access public
	 */
	public function setReturnHTML($bool)
	{
		$this->_returnHTML = (bool) $bool;
	}

	/**
	 * Set to false to return only the body of the html generated, or set to true
	 * to return a full HTML page with headers.
	 * @param boolean $bool
	 * @access public
	 */
	public function setReturnHTMLFull($bool)
	{
		$this->_returnHTMLFull = (bool) $bool;
	}

	/**
	 * Sets the count mode to use for counting
	 * @param string $countMode
	 * @access public
	 */
	public function setCountMode($countMode)
	{
		if (in_array(strtoupper($countMode), array_keys($this->_countModes)))
		{
			$this->_countMode = strtoupper($countMode);
		} else
		{
			trigger_error(
				'phpsoco: Invalid count mode.  Valid count modes are COUNT_SOFT,
				COUNT_NO_COMMENT, COUNT_NO_BLANK, COUNT_NO_BRACE, and COUNT_HARD.',
				E_USER_ERROR
			);
		}
	}

	/**
	 * Instantiates a new object, generating data, then returns it
	 * @param string $input - the file or directory to be evaluated
	 * @param boolean $recurse - if $input is a directory, whether or not to
	 * recurse the directory tree.
	 * @return mixed - array when _returnHTML is false, string when true
	 * @access public
	 */
	public function getStats($input, $recurse=false)
	{
		if ($this->_countMode == NULL)
		{
			trigger_error(
				'phpsoco: No counting mode found.  Use setCountMode() with a parameter of
				COUNT_SOFT, COUNT_NO_COMMENT, COUNT_NO_BLANK, COUNT_NO_BRACE, or
				COUNT_HARD.',
				E_USER_ERROR
			);
		}

		if (is_file($input))
		{
			$ret = new phpsoco_file($input);
		} elseif (is_dir($input) && !$recurse)
		{
			$ret = new phpsoco_directory($input);
		} elseif (is_dir($input) && $recurse)
		{
			$ret = new phpsoco_directoryTree($input);
		} else
		{
			trigger_error(
				'phpsoco: Could not evaluate input file or directory',
				E_USER_ERROR
			);
		}

		//set object properties
		$ret->_returnHTML = $this->_returnHTML;
		$ret->_returnHTMLFull = $this->_returnHTMLFull;
		$ret->_PHPOnly = $this->_PHPOnly;
		$ret->_countMode = $this->_countMode;
		$ret->_lineCountPrecision = $this->_lineCountPrecision;
		return $ret->_getStat();
	}

	/**
	 * Generates HTML for output
	 * @param array $arr - array of generated stats
	 * @return string
	 * @access protected
	 */
	protected function _generateHTML($arr)
	{
		$htmlOutput = '';
		foreach ($arr AS $k => $v)
		{
			//$htmlOutput .= '';
			if (is_array($v))
			{
				$htmlOutput .= '<p><strong>' . str_replace('_', ' ', $k) . '</strong></p>';
				$htmlOutput .=  $this->_generateHTML($v);
			} else
			{
				$htmlOutput .=  !is_numeric($k) ?
					str_replace('_', ' ', $k) . ': <strong>' . $v . '</strong><br>'
					:
					'<strong>' . $v . '</strong><br>';
			}
		}
		return $htmlOutput;
	}

	/**
	 * Generates a standards compliant HTML header for HTML output.
	 * @param string $type - the type of evaluation being done
	 * @access protected
	 */
	protected function _HTMLHeader($type)
	{
		return '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
		   "http://www.w3.org/TR/html4/loose.dtd">
		<html>
		<head>
		<title>phpSoCo ' . $type . ' Evaluation</title>
		<meta http-equiv="Content-type" content="text/html;charset=UTF-8">
		<style type="text/css">
		body
		{
			background-color: #fff;
			color: #000;
			font-family: "courier new", courier, verdana, arial;
			font-size: 13px;
		}
		</style>
		</head>
		<body>';
	}

	/**
	 * Closes standards compliant full HTML output
	 * @access protected
	 */
	protected function _HTMLFooter()
	{
		return '</body>
		</html>';
	}

}





/**
 * This is the class that deals with a single file.  It will evaluate only a
 * single file.  It can be used on single file input, directory input, or
 * directory tree input.
 */
class phpsoco_file extends phpsoco
{
	/**
	 * Holds the line count of the file after this class has prepared it
	 */
	protected $_lineCount;

	/**
	 * Holds the character count of the file's code
	 */
	protected $_characterCount;
	
	/**
	 * The name of the file to be evaluated
	 */
	private $_file;

	/**
	 * Constructor to set this class's $_file property, and parent class $_code
	 * property
	 * @param $file - string of file name
	 * @access protected
	 */
	protected function __construct($file)
	{
		$this->_file = $file;
		$this->_code = file_get_contents($file);
	}

	/**
	 * Runs through class methods generating stats and returns them.  Either as
	 * HTML or an array.
	 * @access protected
	 */
	protected function _getStat()
	{
		$this->_getPHPCode();
		
		//perform methods based on count mode
		switch ($this->_countMode)
		{
			case 'COUNT_SOFT':
			//do not modify the source code
			break;
			
			case 'COUNT_NO_COMMENT':
			$this->_stripComments();
			break;
			
			case 'COUNT_NO_BLANK':
			$this->_stripWhiteSpace();
			break;
			
			case 'COUNT_NO_BRACE':
			$this->_stripBraceOnly();
			break;
			
			case 'COUNT_HARD':
			$this->_stripComments();
			$this->_stripWhiteSpace();
			$this->_stripBraceOnly();
			break;
		}
		
		$this->_getCharacterCount();
		$this->_getLineCount();
		
		//write the return array
		$ret = array(
			'phpSoCo_Configuration' => array(
				'PHP_Code_Only' => $this->_PHPOnly ? 'Yes' : 'No',
				'Line_Count_Float_Precision' => $this->_lineCountPrecision,
				'Mode_Used' => 'Single File',
				'Count_Mode_Used' => $this->_countMode,
				'Count_Mode_Description' => $this->_countModes[$this->_countMode]
			),

			'Code_Stats' => array(
				'File_Evaluated' => $this->_file,
				'Lines_Of_Code' => $this->_lineCount,
				'Characters_In_Code' => $this->_characterCount
			)
		);

		//if HTML is the preferred method of return, return it
		if ($this->_returnHTML)
		{
			if ($this->_returnHTMLFull)
			{
				return print
					$this->_HTMLHeader('File') .
					$this->_generateHTML($ret) .
					$this->_HTMLFooter();
			} else
			{
				return print $this->_generateHTML($ret);
			}
		}

		//just return the array
		return $ret;
	}

	/**
	 * Gathers php blocks from code
	 * @access private
	 */
	private function _getPHPCode()
	{
		//target file may have different line
		//endings - replace them all with a unified \n
		$this->_code = str_replace(array("\r\n", "\r"), "\n", $this->_code);
		
		//if user wants php only, rebuild the source code from tokens
		//we'll never miss any php code this way
		if ($this->_PHPOnly)
		{
			//suppress errors in case the code has a parse error
			$tokens = @token_get_all($this->_code);

			//initialize blocks array
			$blocks = array();

			//loop through each token
			foreach ($tokens AS $token)
			{
				if (!is_string($token))
				{
					//token id and text
					list($id, $text) = $token;

					//if it's not HTML, capture it
					if ($id != T_INLINE_HTML)
					{
						$blocks[] = $text;
					}
				} else
				{
					//capture string
					$blocks[] = $token;
				}
			}

			//get each block of php into a string
			$final = array();
			$i = 0;
			foreach ($blocks AS $blockLine)
			{
				//would love to use PHP_EOL here, but files come from different systems
				//however, when rebuilding the code, this library will just use \n
				//to get some unification going on
				if (($blockLine != "\r\n") && ($blockLine != "\r"))
				{
					if (isset($final[$i]))
					{
						$final[$i] .= $blockLine;
					} else
					{
						$final[$i] = $blockLine;
					}
				} else
				{
					$i++;
					$final[$i] = "\n";
				}
			}

			//get all blocks into a single string
			$this->_code = '';
			foreach ($final AS $f)
			{
				$this->_code .= $f;
			}
		}
	}

	/**
	 * Strips the file's contents of lines containing only white space
	 * @access private
	 */
	private function _stripWhiteSpace()
	{
		//get each line
		$lines = explode("\n", $this->_code);

		//output container
		$output = array();

		//loop
		foreach ($lines AS $line)
		{
			//add trimmed line to output
			$output[] = trim($line);
		}

		//set code, dropping empty lines ('strlen' keeps a line that is just "0")
		$this->_code = implode("\n", array_filter($output, 'strlen'));
	}

	/**
	 * Strips the file's contents of comments
	 * @todo make sure the comment isn't inside of a string
	 * @access private
	 */
	private function _stripComments()
	{
		$tokens = token_get_all($this->_code);
		$this->_code = '';

		foreach ($tokens AS $token)
		{
			if (is_string($token))
			{
				$this->_code .= $token;
			} else
			{
				list($id, $text) = $token;

				switch ($id)
				{
					case T_COMMENT:
					case T_DOC_COMMENT:
					break;

					default:
					$this->_code .= $text;
					break;
				}
			}
		}
	}

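	/**
	 * Strips the file's contents of lines containing only a single curly brace
	 * @access private
	 */
	private function _stripBraceOnly()
	{
		$lines = explode("\n", $this->_code);
		$output = array();

		foreach ($lines AS $line)
		{
			if (trim($line) !== '{' && trim($line) !== '}')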
			{
				$output[] = $line;
			}
		}

		$this->_code = implode("\n", $output);
	}

	/**
	 * Counts the characters in the phpsoco formatted code
	 * @access private
	 */
	private function _getCharacterCount()
	{
		$this->_characterCount = strlen($this->_code);
	}

	/**
	 * Counts the number of lines in the phpsoco formatted code by splitting
	 * it on \n.
	 * @access private
	 */
	private function _getLineCount()
	{
		$this->_lineCount = count(explode("\n", $this->_code));
	}
}





/**
 * This class evaluates a single directory, and is also used in recursive
 * directory trees.  Each file found in the directory is passed to
 * phpsoco_file() for evaluating each individual file.
 */
class phpsoco_directory extends phpsoco_file
{
	/**
	 * Holds the directory to be evaluated
	 */
	private $_directory;

	/**
	 * Holds the array of php files found in the directory
	 */
	private $_foundFiles;

	/**
	 * Constructor method sets the directory to be used
	 * @param string $directory
	 * @access protected
	 */
	protected function __construct($directory)
	{
		if (substr($directory, -1) == DIRECTORY_SEPARATOR)
		{
			//drop the trailing directory separator
			$this->_directory = substr($directory, 0, -1);
		} else
		{
			$this->_directory = $directory;
		}
	}

	/**
	 * Runs through methods in this class, ultimately returning stats.
	 * @access protected
	 * @return mixed - array or string (depending on settings)
	 */
	protected function _getStat()
	{
		//find the files
		$this->_findFiles();

		//if HTML is the preferred return method, return it
		if ($this->_returnHTML)
		{
			if ($this->_returnHTMLFull)
			{
				//return html with headers
				return print $this->_HTMLHeader('Directory') . $this->_generateHTML(
						$this->_compoundSingleStats(
							$this->_evaluateSingleFiles()
						)
					) . $this->_HTMLFooter();
			} else
			{
				//return html without headers
				return print $this->_generateHTML(
						$this->_compoundSingleStats(
							$this->_evaluateSingleFiles()
						)
					);
			}
		}

		//return the array
		return $this->_compoundSingleStats($this->_evaluateSingleFiles());
	}

	/**
	 * Grabs all of the php files found in this directory and stores the array in
	 * class member.
	 * @access private
	 */
	private function _findFiles()
	{
		$found = array();
		if ($handle = opendir($this->_directory))
		{
			while (($file = readdir($handle)) !== false)
			{
				if (($file != '.') && ($file != '..'))
				{
					if (is_file($this->_directory . DIRECTORY_SEPARATOR . $file) && 
						(strtolower(substr($file, -4)) == '.php')
					)
					{
						$found[] = $this->_directory . DIRECTORY_SEPARATOR . $file;
					}
				}
			}
			closedir($handle);
		} else
		{
			trigger_error(
				'phpSoCo: Could not open directory (' . $this->_directory . ')',
				E_USER_WARNING
			);
		}
		
		$this->_foundFiles = $found;
	}

	/**
	 * Loops through each found file and creates a new phpsoco_file() object.
	 * Stores the returned array of stats in return value, then returns it.
	 * @return array
	 * @access private
	 */
	private function _evaluateSingleFiles()
	{
		//if we have files
		if (!empty($this->_foundFiles))
		{
			//loop through, gather single stats
			$ret = array();
			foreach ($this->_foundFiles AS $file)
			{
				$single = new phpsoco_file($file);
				$single->_returnHTML = false;
				$single->_PHPOnly = $this->_PHPOnly;
				$single->_lineCountPrecision = $this->_lineCountPrecision;
				$single->_countMode = $this->_countMode;
				$ret[] = $single->_getStat();
			}
		} else
		{
			//no files, return empty array
			$ret = array();
		}

		//return array of found file stats
		return $ret;
	}

	/**
	 * Compounds each file's single stats into an array of stats for the
	 * directory.  This is really ugly at the moment.
	 * @param array $stats
	 * @return array
	 */
	private function _compoundSingleStats($stats)
	{
		//if we have stats
		if (!empty($stats))
		{
			//set up return array
			$ret['phpSoCo_Configuration'] = $stats[0]['phpSoCo_Configuration'];
			$ret['phpSoCo_Configuration']['Mode_Used'] = 'Directory';
			$ret['Code_Stats']['Directory_Evaluated'] = '';
			$ret['Code_Stats']['Files_Evaluated'] = array();
			$ret['Code_Stats']['Summary'] = array();
			$ret['Code_Stats']['Summary']['Lines_Of_Code'] = 0;
			$ret['Code_Stats']['Summary']['Characters_In_Code'] = 0;
			$ret['Code_Stats']['Average'] = array();
			$ret['Code_Stats']['Average']['Lines_Of_Code'] = 0;
			$ret['Code_Stats']['Average']['Characters_In_Code'] = 0;
			$ret['Code_Stats']['Single_File_Stats'] = array();

			//loop through each, setting and adding stats
			$i = 0;
			foreach ($stats AS $stat)
			{
				if ($i == 0)
				{
					//the directory is the file's path minus the file name
					$ret['Code_Stats']['Directory_Evaluated'] =
						dirname($stat['Code_Stats']['File_Evaluated']);
				}

				$ret['Code_Stats']['Files_Evaluated'][] =
					$stat['Code_Stats']['File_Evaluated'];

				$ret['Code_Stats']['Summary']['Lines_Of_Code'] +=
					$stat['Code_Stats']['Lines_Of_Code'];

				$ret['Code_Stats']['Summary']['Characters_In_Code'] +=
					$stat['Code_Stats']['Characters_In_Code'];

				$ret['Code_Stats']['Single_File_Stats'][] = array(
					'File' => $stat['Code_Stats']['File_Evaluated'],
					'Lines_Of_Code' => $stat['Code_Stats']['Lines_Of_Code'],
					'Characters_In_Code' => $stat['Code_Stats']['Characters_In_Code']
				);

				$i++;
			}

			//here we will get directory averages
			$ret['Code_Stats']['Average']['Lines_Of_Code'] =
				round(
					$ret['Code_Stats']['Summary']['Lines_Of_Code']
					/
					count($stats), $this->_lineCountPrecision
				);

			$ret['Code_Stats']['Average']['Characters_In_Code']
				= round(
					$ret['Code_Stats']['Summary']['Characters_In_Code']
					/
					count($stats), $this->_lineCountPrecision
				);
		} else
		{
			//we have nothing
			return array();
		}

		//return
		return $ret;
	}
}





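/**
 * This class evaluates an entire directory tree.  Each directory found under
 * the root is passed to phpsoco_directory() for evaluation, and the results
 * are compounded into a single set of stats.
 */
class phpsoco_directoryTree extends phpsoco_directory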
{
	/**
	 * Holds the root directory of the directory tree
	 */
	private $_directory;

	/**
	 * Sets the root directory of the directory tree
	 * @param string $directory
	 * @access protected
	 */
	protected function __construct($directory)
	{
		if (substr($directory, -1) == DIRECTORY_SEPARATOR)
		{
			//drop the trailing directory separator
			$this->_directory = substr($directory, 0, -1);
		} else
		{
			$this->_directory = $directory;
		}
	}

	/**
	 * Generates stats and returns them
	 * @access protected
	 */
	protected function _getStat()
	{
		//find the directories
		$directories = $this->_findDirectories($this->_directory);

		//unshift root directory onto the beginning
		array_unshift($directories, $this->_directory);

		//get the stats
		$ret = $this->_compoundDirectories($this->_evaluateDirectories($directories));

		//return
		if ($this->_returnHTML)
		{
			if ($this->_returnHTMLFull)
			{
				return print
					$this->_HTMLHeader('Directory Tree') .
					$this->_generateHTML($ret) .
					$this->_HTMLFooter();
			}

			return print $this->_generateHTML($ret);
		}

		return $ret;
	}

	/**
	 * Compounds directory stats into a single array
	 * @param array $stats
	 * @access private
	 */
	private function _compoundDirectories($stats)
	{
		if (!empty($stats))
		{
			//set up return array
			$ret['phpSoCo_Configuration'] = $stats[0]['phpSoCo_Configuration'];
			$ret['phpSoCo_Configuration']['Mode_Used'] = 'Directory Tree';
			$ret['Code_Stats']['Directory_Tree_Evaluated'] = $this->_directory;
			$ret['Code_Stats']['Directories_Evaluated'] = array();
			$ret['Code_Stats']['Files_Evaluated'] = array();
			$ret['Code_Stats']['Summary'] = array();
			$ret['Code_Stats']['Summary']['Lines_Of_Code'] = 0;
			$ret['Code_Stats']['Summary']['Characters_In_Code'] = 0;
			$ret['Code_Stats']['Average_Per_Directory'] = array();
			$ret['Code_Stats']['Average_Per_Directory']['Lines_Of_Code'] = 0;
			$ret['Code_Stats']['Average_Per_Directory']['Characters_In_Code'] = 0;
			$ret['Code_Stats']['Average_Per_File'] = array();
			$ret['Code_Stats']['Average_Per_File']['Lines_Of_Code'] = 0;
			$ret['Code_Stats']['Average_Per_File']['Characters_In_Code'] = 0;
			$ret['Code_Stats']['Single_File_Stats'] = array();

			foreach ($stats AS $stat)
			{
				$ret['Code_Stats']['Directories_Evaluated'][] =
					$stat['Code_Stats']['Directory_Evaluated'];

				$ret['Code_Stats']['Files_Evaluated'] =
					array_merge(
						$ret['Code_Stats']['Files_Evaluated'],
						$stat['Code_Stats']['Files_Evaluated']
					);

				$ret['Code_Stats']['Summary']['Lines_Of_Code'] +=
					$stat['Code_Stats']['Summary']['Lines_Of_Code'];

				$ret['Code_Stats']['Summary']['Characters_In_Code'] +=
					$stat['Code_Stats']['Summary']['Characters_In_Code'];

				$ret['Code_Stats']['Single_File_Stats'] =
					array_merge(
						$ret['Code_Stats']['Single_File_Stats'],
						$stat['Code_Stats']['Single_File_Stats']
					);
			}

			//here we will get directory averages
			$ret['Code_Stats']['Average_Per_Directory']['Lines_Of_Code'] =
				round(
					$ret['Code_Stats']['Summary']['Lines_Of_Code']
					/
					count($stats), $this->_lineCountPrecision
				);

			$ret['Code_Stats']['Average_Per_Directory']['Characters_In_Code']
				= round(
					$ret['Code_Stats']['Summary']['Characters_In_Code']
					/
					count($stats),
					$this->_lineCountPrecision
				);

			//here we will get single file averages
			$ret['Code_Stats']['Average_Per_File']['Lines_Of_Code'] =
				round(
					$ret['Code_Stats']['Summary']['Lines_Of_Code']
					/
					count($ret['Code_Stats']['Files_Evaluated']),
					$this->_lineCountPrecision
				);

			$ret['Code_Stats']['Average_Per_File']['Characters_In_Code']
				= round(
					$ret['Code_Stats']['Summary']['Characters_In_Code']
					/
					count($ret['Code_Stats']['Files_Evaluated']),
					$this->_lineCountPrecision
				);
		} else
		{
			//we have nothing
			return array();
		}

		return $ret;
	}

	/**
	 * Evaluate each single directory
	 * @param array $directories
	 */
	private function _evaluateDirectories($directories)
	{
		//echo '<pre>';print_r($directories);echo'</pre>';
		$ret = array();
		foreach ($directories AS $directory)
		{
			//instantiate new directory object
			$dirObj = new phpsoco_directory($directory);

			//set object properties
			$dirObj->_returnHTML = false;
			$dirObj->_PHPOnly = $this->_PHPOnly;
			$dirObj->_lineCountPrecision = $this->_lineCountPrecision;
			$dirObj->_countMode = $this->_countMode;

			//add to ret array if not empty
			$stat = $dirObj->_getStat();
			if (!empty($stat))
			{
				$ret[] = $stat;
			}
		}

		return $ret;
	}

	/**
	 * Recursively find directories
	 * @param $start
	 * @access private
	 */
	private function _findDirectories($start)
	{
		$ret = array();
		$handle = opendir($start);

		//bail out if the directory could not be opened
		if ($handle === false)
		{
			return $ret;
		}

		while (($file = readdir($handle)) !== false)
		{
			$file = $start . DIRECTORY_SEPARATOR . $file;
			if (
				($file != $start . DIRECTORY_SEPARATOR . '.') &&
				($file != $start . DIRECTORY_SEPARATOR . '..')
			)
			{
				if (is_dir($file))
				{
					array_push($ret, $file);
					$ret = array_merge($ret, $this->_findDirectories($file));
				}
			}
		}

		closedir($handle);

		return $ret;
	}
}
Evaluating a file

Code: Select all

<?php
require_once 'phpsoco/phpsoco.php';
$phpsoco = new phpsoco();

$phpsoco->setCountMode('COUNT_SOFT');
print_r($phpsoco->getStats('c:\apache2\apache2\htdocs\phpsoco.php'));
Results:

Code: Select all

Array
(
    [phpSoCo_Configuration] => Array
        (
            [PHP_Code_Only] => Yes
            [Line_Count_Float_Precision] => 2
            [Mode_Used] => Single File
            [Count_Mode_Used] => COUNT_SOFT
            [Count_Mode_Description] => Lines are counted the way they appear in the code.  All
			comments, blank lines, and curly braces on a single line are counted.
        )

    [Code_Stats] => Array
        (
            [File_Evaluated] => phpsoco-version-1.0.0.php
            [Lines_Of_Code] => 1041
            [Characters_In_Code] => 26123
        )

)
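To see how the five counting modes from the class docblock differ, here is a rough standalone sketch that does not use the library. The function name `sketch_count_lines()` and the sample string are made up for illustration, and the logic is a simplified assumption of what each mode does, not the library's exact code path.

```php
<?php
// Hypothetical helper, not part of phpSoCo: counts the lines of a PHP source
// string under a simplified version of the five counting modes.
function sketch_count_lines($code, $mode)
{
    // Normalize line endings so explode("\n") sees every line.
    $code = str_replace(array("\r\n", "\r"), "\n", $code);

    if ($mode === 'COUNT_NO_COMMENT' || $mode === 'COUNT_HARD') {
        // Rebuild the source from tokens, skipping comment tokens.
        $rebuilt = '';
        foreach (token_get_all($code) as $token) {
            if (is_array($token)) {
                if ($token[0] === T_COMMENT || $token[0] === T_DOC_COMMENT) {
                    continue;
                }
                $rebuilt .= $token[1];
            } else {
                $rebuilt .= $token;
            }
        }
        $code = $rebuilt;
    }

    $kept = 0;
    foreach (explode("\n", $code) as $line) {
        $trimmed = trim($line);
        $dropBlank = ($mode === 'COUNT_NO_BLANK' || $mode === 'COUNT_HARD');
        $dropBrace = ($mode === 'COUNT_NO_BRACE' || $mode === 'COUNT_HARD');

        if ($dropBlank && $trimmed === '') {
            continue;
        }
        if ($dropBrace && ($trimmed === '{' || $trimmed === '}')) {
            continue;
        }
        $kept++;
    }

    return $kept;
}

// Seven physical lines: open tag, comment, signature, brace, blank, echo, brace.
$sample = "<?php\n/* greet */\nfunction hi()\n{\n\n    echo 'hi';\n}";

foreach (array('COUNT_SOFT', 'COUNT_NO_COMMENT', 'COUNT_NO_BLANK', 'COUNT_NO_BRACE', 'COUNT_HARD') as $mode) {
    echo $mode . ': ' . sketch_count_lines($sample, $mode) . "\n";
}
```

Note that removing a comment leaves its (now blank) line behind, so COUNT_NO_COMMENT only lowers the count for comments that span several lines; COUNT_HARD then drops those leftover blank lines as well.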
Last edited by s.dot on Sat Dec 01, 2007 3:15 am, edited 1 time in total.
Set Search Time - A google chrome extension. When you search only results from the past year (or set time period) are displayed. Helps tremendously when using new technologies to avoid outdated results.
Christopher
Site Administrator
Posts: 13596
Joined: Wed Aug 25, 2004 7:54 pm
Location: New York, NY, US

Post by Christopher »

Add a directory iterator that will go through a directory tree of files and give individual counts, maybe directory totals, and grand totals...
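A minimal sketch of that idea using SPL's recursive iterators, assuming PHP 5.3 or later for `FilesystemIterator::SKIP_DOTS`. The function name `sketch_tree_line_counts()` and the shape of the returned array are hypothetical, and it counts raw physical lines rather than applying any count mode.

```php
<?php
// Hypothetical helper, not part of phpSoCo. Walks a directory tree and
// returns per-file counts, per-directory totals, and a grand total.
function sketch_tree_line_counts($root)
{
    $result = array('files' => array(), 'directories' => array(), 'total' => 0);

    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );

    foreach ($iterator as $fileInfo) {
        // Only look at regular files ending in .php (case-insensitive).
        if (!$fileInfo->isFile()
            || strtolower(substr($fileInfo->getFilename(), -4)) !== '.php') {
            continue;
        }

        $path = $fileInfo->getPathname();
        $directory = dirname($path);

        // Raw physical line count; a fuller version would apply a count mode.
        $lines = count(file($path));

        $result['files'][$path] = $lines;
        if (!isset($result['directories'][$directory])) {
            $result['directories'][$directory] = 0;
        }
        $result['directories'][$directory] += $lines;
        $result['total'] += $lines;
    }

    return $result;
}
```

Calling `sketch_tree_line_counts('/path/to/project')` would give individual counts under `files`, directory totals under `directories`, and the grand total under `total`.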
(#10850)
s.dot
Tranquility In Moderation
Posts: 5001
Joined: Sun Feb 06, 2005 7:18 pm
Location: Indiana

Post by s.dot »

hecks yeahhh, that would be tight.

I like how I handle the lines of code. Letting the evaluator decide what is and isn't code. Takes away from { on a single line, spaces consuming lots of characters, and lots of other stuff.
aaronhall
DevNet Resident
Posts: 1040
Joined: Tue Aug 13, 2002 5:10 pm
Location: Back in Phoenix, missing the microbrews
Contact:

Post by aaronhall »

I love the directory-wide and file-by-file stats idea... would be great to extend this and have a whitelist of file extensions (.js, .css, .html, etc.)

Great script! I'll be using this at some point
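A rough sketch of what such an extension whitelist check could look like. The function name `sketch_has_allowed_extension()` is hypothetical; the posted library currently hard-codes the `.php` check in `_findFiles()`.

```php
<?php
// Hypothetical helper, not part of phpSoCo: returns true when the file's
// extension appears in the whitelist, ignoring case.
function sketch_has_allowed_extension($path, array $whitelist)
{
    $extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));

    return in_array($extension, array_map('strtolower', $whitelist), true);
}

$whitelist = array('php', 'js', 'css', 'html');

var_dump(sketch_has_allowed_extension('lib/Script.PHP', $whitelist));
var_dump(sketch_has_allowed_extension('README', $whitelist));
```

Extensions are listed without the leading dot because that is how `pathinfo()` reports them.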
Jenk
DevNet Master
Posts: 3587
Joined: Mon Sep 19, 2005 6:24 am
Location: London

Post by Jenk »

I'd prefer if the count was based on the use of ';' and function/class/if/loops open and closing tags as one line.

Well, if I'm completely honest, I don't care about LOC (Lines Of Code) :) but thought it would be more sensible to count based on actual commands/lines than to count arbitrarily on length/number of characters. :)
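One way to approximate that kind of "logical line" count is with the tokenizer. The sketch below is only an assumption about what such a count might look like: it counts each `;` outside parentheses as one statement and each function/class/if/for/foreach/while/switch keyword as one line. The name `sketch_count_logical_lines()` is made up, and edge cases such as closures passed as arguments or `do ... while` are not handled.

```php
<?php
// Hypothetical helper, not part of phpSoCo: a rough logical line count.
function sketch_count_logical_lines($code)
{
    $structures = array(
        T_FUNCTION, T_CLASS, T_IF, T_FOR, T_FOREACH, T_WHILE, T_SWITCH
    );

    $count = 0;
    $parenDepth = 0;

    foreach (token_get_all($code) as $token) {
        if (is_array($token)) {
            // Each structure keyword counts as one logical line.
            if (in_array($token[0], $structures, true)) {
                $count++;
            }
            continue;
        }

        if ($token === '(') {
            $parenDepth++;
        } elseif ($token === ')') {
            $parenDepth--;
        } elseif ($token === ';' && $parenDepth === 0) {
            // Skips the semicolons inside a for (...) header.
            $count++;
        }
    }

    return $count;
}

// One physical line, but several statements and structures.
$sample = '<?php function f($n) { for ($i = 0; $i < $n; $i++) { echo $i; } return $n; }';

echo sketch_count_logical_lines($sample) . "\n";
```

The appeal of this approach is that formatting no longer matters: the same code gives the same count whether it is written on one line or twenty.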
John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto
Contact:

Post by John Cartwright »

A couple of comments I have,

Code: Select all

private function _formatLineLength()
   {
      $this->_code = chunk_split($this->_code, $this->_lineLength);
   }
Can potentially create parse errors since it might chop php right in the middle of a php command, or similar. This probably requires a smart enough regex to know not to chop in certain places.

Code: Select all

private function _handleTags()
   {
      if (!$this->_allowPHPTags)
      {
         $this->_code = str_replace(array('<?php', '<?', '?>'), '', $this->_code);
      }
   }
Should probably be using str_ireplace for case insensitivity also.

Otherwise, cool library.
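A quick illustration of the casing issue, using a made-up input string. With `str_replace()` an upper-case `<?PHP` tag is not matched as a whole, so only the `<?` part is stripped and a stray `PHP` is left behind.

```php
<?php
// Sample input with an upper-case opening tag (made up for illustration).
$source = '<?PHP echo 1; ?>';
$tags = array('<?php', '<?', '?>');

// Case-sensitive: '<?php' does not match, '<?' does, leaving "PHP" behind.
$caseSensitive = str_replace($tags, '', $source);

// Case-insensitive: the whole '<?PHP' tag is removed.
$caseInsensitive = str_ireplace($tags, '', $source);

var_dump($caseSensitive);
var_dump($caseInsensitive);
```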
s.dot
Tranquility In Moderation
Posts: 5001
Joined: Sun Feb 06, 2005 7:18 pm
Location: Indiana

Post by s.dot »

Jcart wrote: A couple of comments I have,

Code: Select all

private function _formatLineLength()
   {
      $this->_code = chunk_split($this->_code, $this->_lineLength);
   }
Can potentially create parse errors since it might chop php right in the middle of a php command, or similar. This probably requires a smart enough regex to know not to chop in certain places.
How so? The code is just being evaluated in a string (never executed). I don't see how hard chops would matter, and chopping right at the xth character will give a "full" line of code.
Jcart wrote:

Code: Select all

private function _handleTags()
   {
      if (!$this->_allowPHPTags)
      {
         $this->_code = str_replace(array('<?php', '<?', '?>'), '', $this->_code);
      }
   }
Should probably be using str_ireplace for case insensitivity also.
Absolutely!

Right now I'm working on a base class, and extending it into a single file class (for evaluating a single php file, like above), a single directory class (all php files in one directory), and a directory iterator class (all php files in a directory tree starting at the root directory given). Boredom is bliss!

I'm also adding support for asp style tags (ew.. but I suppose some people use it).

<script language="php">...</script> will be tough, because </script> can denote the ending of javascript if _PHPOnly is set to false.. so I'll save that one for later.
Set Search Time - A google chrome extension. When you search only results from the past year (or set time period) are displayed. Helps tremendously when using new technologies to avoid outdated results.
John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto

Post by John Cartwright »

$code = 'echo somefunction($foobar);';

Applying chunk split on this will produce invalid code if it splits it in the middle of a function name..
Maybe I'm not understanding why you even need to chunk_split in the first place, since you are never outputting the code anyways.
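
For reference, a quick sketch of what chunk_split actually does to that string. The chunk length of 10 is chosen only to make the split visible; it is not the library's setting.

```php
<?php
$code = 'echo somefunction($foobar);';

// chunk_split() inserts "\r\n" after every 10 characters (and after the
// final, shorter chunk), with no regard for token boundaries.
$chunked = chunk_split($code, 10);

// Split back on the separator, dropping the trailing separator first.
$chunks = explode("\r\n", rtrim($chunked, "\r\n"));

print_r($chunks); // "echo somef", "unction($f", "oobar);"
```

The function name and the variable are both cut in half, so the chunks are only useful for counting, not for running.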
s.dot
Tranquility In Moderation
Posts: 5001
Joined: Sun Feb 06, 2005 7:18 pm
Location: Indiana

Post by s.dot »

It's my way of evaluating a full line of source code. In my library, one full line = 80 characters of hard code.

This eliminates guessing of white space, one character lines ({ and }), and some other stuff. I like it because it breaks it down into the absolutely most compact full line of code you can possibly have :) (even if it would be broken when evaluated).
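
A minimal sketch of that idea, assuming the count is simply "strip all whitespace, then see how many 80-character lines the result fills". The helper name hardLineCount is hypothetical and this is not the library's own code; it does not remove comments or PHP tags, and it also strips whitespace inside string literals.

```php
<?php
/**
 * Hypothetical helper, not the library's own code: estimate a "hard" line
 * count by removing all whitespace and dividing into fixed-width lines.
 */
function hardLineCount($code, $lineLength = 80)
{
    // Collapse the code into one unbroken string of characters.
    $compact = preg_replace('/\s+/', '', $code);

    // A partially filled last line still counts as a line.
    return (int) ceil(strlen($compact) / $lineLength);
}

// 17 non-whitespace characters at 10 per line fill 2 lines.
echo hardLineCount('echo foobar($foo);', 10), "\n"; // 2
```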
John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto

Post by John Cartwright »

s.dot wrote:It's my way of evaluating a full line of source code. In my library, one full line = 80 characters of hard code.

This eliminates guessing of white space, one character lines ({ and }), and some other stuff. I like it because it breaks it down into the absolutely most compact full line of code you can possibly have :) (even if it would be broken when evaluated).
Ah, gotcha! I see where you're coming from, I just assumed that the snippet below would report it as only 1 line of code. Obviously this is a bit of an exaggeration, since I'm sure nobody would set it to 10 characters per chunk,

Code: Select all

private function _formatLineLength()
   {      
      //$this->_code = chunk_split($this->_code, $this->_lineLength);
      $this->_code = chunk_split('echo foobar($foo);', 10);
   }
When determining line count I wouldn't necessarily strip all whitespace, only blank lines. It just seems natural to do it this way.. as in if I'm in my editor and I'm looking at 100 lines of code, I'd expect your library to display ~100 lines (depending on blank lines) and not how many 80-character lines the code would fill. For instance,

Code: Select all

$foo = 'echo 1;
echo 2;

echo 3;

echo blehblah(4)';

//remove blank lines from code
$lines = array_filter(explode(PHP_EOL, $foo));
echo 'Line count: '. count($lines); //returns 4

//convert to string to perform regex
$source = implode(PHP_EOL, $lines);
Apologies if I'm not understanding, perhaps just a difference in opinion :)
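
Two caveats with the snippet above, for what it's worth: explode(PHP_EOL, ...) only works when the file's line endings match the platform constant, and array_filter with no callback keeps whitespace-only lines while dropping a line that is just "0". A possible variant that avoids both is sketched below; the function name countNonBlankLines is made up for the example.

```php
<?php
/**
 * Hypothetical helper: count lines that contain something other than
 * whitespace, whatever line endings the file uses.
 */
function countNonBlankLines($code)
{
    $count = 0;

    // \R matches \n, \r\n and \r, so the file's line endings do not matter.
    foreach (preg_split('/\R/', $code) as $line) {
        if (trim($line) !== '') {
            $count++;
        }
    }

    return $count;
}

// Windows line endings, a whitespace-only line, and a line that is just "0".
$sample = "echo 1;\r\necho 2;\r\n   \r\necho 3;\r\n0\r\n";
echo countNonBlankLines($sample), "\n"; // 4
```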
John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto

Post by John Cartwright »

Jenk wrote:I'd prefer if the count was based on the use of ';' and function/class/if/loops open and closing tags as one line.
Excellent idea! Just noticed this post after re-reading it :) Might take a bit of regex redbull to be as accurate with it as possible though, for instance the semicolons in a for loop header shouldn't make it count as multiple lines of code
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

Jcart wrote:
Jenk wrote:I'd prefer if the count was based on the use of ';' and function/class/if/loops open and closing tags as one line.
Excellent idea! Just noticed this post after re-reading it :) Might take a bit of regex redbull to be as accurate with it as possible though, for instance the semicolons in a for loop header shouldn't make it count as multiple lines of code
One should also take into account that the last line of a code block (ending in ?>) does not require a semicolon terminator.
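
A rough sketch of how those rules might be combined, using PHP's tokenizer (token_get_all) instead of a regex. The function name countStatements is hypothetical. It counts semicolons outside parentheses, so a for header adds nothing, and it counts a closing tag when a statement is still pending. One known gap: semicolons inside a closure passed as a function argument are skipped, because they sit inside parentheses.

```php
<?php
/**
 * Hypothetical helper: count statements by their terminators.
 */
function countStatements($source)
{
    $count = 0;
    $parenDepth = 0;
    $last = null; // last token seen that was not whitespace or a comment

    foreach (token_get_all($source) as $token) {
        if (is_array($token)) {
            $id = $token[0];
            if ($id === T_WHITESPACE || $id === T_COMMENT || $id === T_DOC_COMMENT) {
                continue;
            }
            // A close tag terminates a statement only if one is still open,
            // i.e. the previous token did not already end or start a block.
            if ($id === T_CLOSE_TAG
                && !in_array($last, array(null, T_OPEN_TAG, ';', '{', '}', ':'), true)) {
                $count++;
            }
            $last = $id;
            continue;
        }

        // Plain strings are single-character tokens such as ( ) ; { }.
        if ($token === '(') {
            $parenDepth++;
        } elseif ($token === ')') {
            $parenDepth = max(0, $parenDepth - 1);
        } elseif ($token === ';' && $parenDepth === 0) {
            $count++;
        }
        $last = $token;
    }

    return $count;
}

// The for header adds nothing; "echo $i;" and the unterminated echo count.
echo countStatements('<?php for ($i = 0; $i < 3; $i++) { echo $i; } echo "done" ?>'), "\n"; // 2
```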
s.dot
Tranquility In Moderation
Posts: 5001
Joined: Sun Feb 06, 2005 7:18 pm
Location: Indiana

Post by s.dot »

Yeah, I'm working on an updated version (1.1.1) that splits into different classes. A base class, one for single file, one for directory, and one for directory tree recursion. Right now they'll only use the line count I have implemented into the example above.

However, if there's enough interest after I show that updated version.. I've been thinking about allowing the user to decide how to count the lines for the next version update (1.2.1).

* COUNT_HARD
- The way I've already implemented it. Breaks the code down into a single string of code and hard chops at a full 80 characters per line. I call this the hard count because it doesn't really get any more compact than this.

* COUNT_REAL
- The way the programmer programmed it. All lines (minus comments) including blank lines.

* COUNT_REAL_NO_BLANK
- The way the programmer programmed it. All lines (minus comments) not including blank lines.

* COUNT_TERMINATION
- Each semicolon-terminated command counts as 1 line, except that the semicolons inside a for loop header are not counted, and a final command with the optional ; omitted before the closing ?> still counts.

* COUNT_ALL
- An average of the above four counts.

I believe with all of those options, you couldn't ask for a more suitable source lines of code library. :) And perhaps the COUNT_ALL average could become some sort of source line counting standard. :P hehehe
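
As a sketch of how the proposed modes might fit together: the constant names come from the list above, but the class, the helper withAverage, and the sample figures are all made up for illustration and are not part of any released version.

```php
<?php
// Hypothetical container for the proposed mode names.
class LineCountMode
{
    const COUNT_HARD          = 'hard';
    const COUNT_REAL          = 'real';
    const COUNT_REAL_NO_BLANK = 'real_no_blank';
    const COUNT_TERMINATION   = 'termination';
    const COUNT_ALL           = 'all';
}

/**
 * Given the individual counts keyed by mode, add the COUNT_ALL entry
 * as their arithmetic mean.
 */
function withAverage(array $counts)
{
    if (count($counts) > 0) {
        $counts[LineCountMode::COUNT_ALL] = array_sum($counts) / count($counts);
    }
    return $counts;
}

// Made-up figures, only to show the shape of the result.
$counts = withAverage(array(
    LineCountMode::COUNT_HARD          => 10,
    LineCountMode::COUNT_REAL          => 30,
    LineCountMode::COUNT_REAL_NO_BLANK => 24,
    LineCountMode::COUNT_TERMINATION   => 16,
));

print_r($counts);
```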
Mordred
DevNet Resident
Posts: 1579
Joined: Sun Sep 03, 2006 5:19 am
Location: Sofia, Bulgaria

Post by Mordred »

Parsing PHP code with regexps is [s]hopelessly[/s] seriously flawed.
Use proper parsing via the tokenizer functions.
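
Something along these lines, as a rough sketch only. The function name countCodeLines is hypothetical, and the choice to ignore open/close tags and inline HTML is an assumption (a PHP-only count). It uses token_get_all and tracks line numbers by counting newlines in each token's text.

```php
<?php
/**
 * Hypothetical helper: count the physical lines that hold at least one
 * real code token, ignoring comments, whitespace, tags and inline HTML.
 */
function countCodeLines($source)
{
    $ignored = array(T_WHITESPACE, T_COMMENT, T_DOC_COMMENT,
                     T_INLINE_HTML, T_OPEN_TAG, T_CLOSE_TAG);
    $line = 1;
    $codeLines = array();

    foreach (token_get_all($source) as $token) {
        // Single-character tokens are plain strings with no token id.
        $id    = is_array($token) ? $token[0] : null;
        $text  = is_array($token) ? $token[1] : $token;
        $parts = explode("\n", $text);

        if (!in_array($id, $ignored, true)) {
            // A token such as a multi-line string can cover several lines.
            foreach ($parts as $offset => $part) {
                if (trim($part) !== '') {
                    $codeLines[$line + $offset] = true;
                }
            }
        }

        // Advance by the number of newlines this token contained.
        $line += count($parts) - 1;
    }

    return count($codeLines);
}

// Only the two assignment lines hold code.
echo countCodeLines("<?php\n// note\n\$a = 1;\n\n\$b = 2; /* x */\n?>\n"), "\n"; // 2
```

Because the tokenizer already knows what is a comment and what is a string, a "//" or ";" inside a string literal cannot confuse the count the way it can with a regex.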

count_hard (and therefore count_all) are useless stats, might as well give character count or i dunno, measure whitespace to text ratio :)

and btw your db code sucks :)
just kidding (literally, got my kid in the other hand, so excuse tha lame typing :P )

(but seriously, why the thin wrapper, I doubt you ever used even half the methods)
s.dot
Tranquility In Moderation
Posts: 5001
Joined: Sun Feb 06, 2005 7:18 pm
Location: Indiana

Post by s.dot »

Mordred wrote:Parsing PHP code with regexps is [s]hopelessly[/s] seriously flawed.
Use proper parsing via the tokenizer functions.
That's a good idea. So far I'm having success with regexes, though.
Mordred wrote:count_hard (and therefore count_all) are useless stats, might as well give character count or i dunno, measure whitespace to text ratio :)
You've got to be kidding? I enjoy count_hard stats because it's the most compact. Gives me a MINIMUM boundary, so that I can definitely say my code is at least X lines by X characters long.

And I don't think the count_all would be useless, in fact, it'd be the most useful.
Mordred wrote:and btw your db code sucks :)
just kidding (literally, got my kid in the other hand, so excuse tha lame typing :P )

(but seriously, why the thin wrapper, I doubt you ever used even half the methods)
I've never even used it. It was a boredom thing, much like this. :)