
Checking for complexity with regex

Posted: Thu Aug 31, 2006 5:49 am
by rami
I am trying to develop a software metrics tool with PHP regular expressions, but I have run into some problems.
Scenario: it reads a program written in C and analyzes it for complexity by counting the number of if, for, and while statements, the number of lines, and so on.
Here is my code:

Code:

// Split the file contents into lines
$line=explode("\r",$filestring);
$no_lines=count($line);
$count_line=0;
$cond_count=0;
$loop_count=0;
$case_count=0;
// Use < rather than <= so the loop does not read past the last line
for($i=0;$i<$no_lines;$i++)
{
	// A line containing a semicolon is counted as a line of code
	if (eregi (";", $line[$i]))
	{
		$count_line++;
	}
	// Any line containing the text "if" is counted as a condition
	if (eregi ("if", $line[$i]))
	{
		$cond_count+=2;
	}
	// Any line containing "for", "while" or "do" is counted as a loop
	if (eregi ("for", $line[$i])||eregi ("while", $line[$i])||eregi ("do", $line[$i]))
	{
		$loop_count++;
	}
}
echo"<br>";
echo"No of lines=".$count_line;
echo"<br>No of conditions=".$cond_count/2;
echo"<br>No of loops=".($loop_count-1);
$cc=($cond_count+($loop_count-1)+$case_count)+1;
echo "<br>Cyclomatic complexity=".$cc;
echo"<br>-------------------------";
OK, here are my problems.
This program counts keywords, for example
"for"
but it also counts "for" inside double quotes,
i.e. "this is for merry";
and "for" as part of another word, e.g. form.
It also counts "for" inside comments:
/*...for */
None of these should be counted, because they are not keywords but ordinary text.
How can I ignore these and count only real keywords,
e.g.
for(i=0;i<n;i++)

The same goes for if, while, etc.

And how can I count the lines of code without including comment lines and blank lines?

Any help is appreciated.
A new program can be written, or I would be grateful if anybody could amend this program for that purpose.

Posted: Thu Aug 31, 2006 6:47 am
by sweatje
I think you are approaching it using the wrong design from the start.

See http://us3.php.net/manual/en/function.token-get-all.php
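
A rough sketch of the tokenizer idea, with a big caveat: token_get_all() tokenizes PHP source, so feeding it C code is an assumption that only holds where the two syntaxes overlap (keywords, /* */ and // comments, quoted strings). The function name and the sample input are made up for illustration.

```php
<?php
// Hypothetical helper: count control-flow keywords using PHP's tokenizer.
// Assumption: the C source is close enough to PHP syntax to be tokenized.
function count_keywords_with_tokenizer(string $source): array
{
    $counts = ['if' => 0, 'for' => 0, 'while' => 0, 'do' => 0];
    $wanted = [T_IF => 'if', T_FOR => 'for', T_WHILE => 'while', T_DO => 'do'];

    // Prefix an open tag so the tokenizer treats the text as code.
    foreach (token_get_all("<?php\n" . $source) as $token) {
        // Comments and string literals come back as their own token types,
        // so a "for" inside them never shows up as T_FOR.
        if (is_array($token) && isset($wanted[$token[0]])) {
            $counts[$wanted[$token[0]]]++;
        }
    }
    return $counts;
}

// Usage with a made-up C snippet
$sample = "/* for */\nprintf(\"this is for merry\");\nint form = 0;\n"
        . "for (i = 0; i < n; i++) { if (i) form++; }\n";
print_r(count_keywords_with_tokenizer($sample));
```

Anything C-specific that PHP lexes differently (preprocessor lines, for instance, which PHP treats as # comments) would need separate handling.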

If you want to stay with regex, change to preg_* (ereg is or should be deprecated),
look at \b (zero-width word boundary assertion),
look at the | operator, and put the comment patterns before your other checks.
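
To make those three hints concrete, here is a minimal sketch, not a complete C lexer. The alternation tries comments and quoted literals first, so they swallow any keyword inside them, and \b keeps "for" from matching inside "form". The constant name, function name and keyword list are my own choices for illustration.

```php
<?php
// Alternatives are tried left to right at each position, so comments and
// literals are consumed before the keyword branch gets a chance.
const KEYWORD_PATTERN = <<<'REGEX'
~
    /\*.*?\*/              # block comment
  | //[^\n]*               # line comment
  | "(?:\\.|[^"\\])*"      # string literal
  | '(?:\\.|[^'\\])*'      # character literal
  | \b(if|for|while|do)\b  # keyword, captured as group 1
~xs
REGEX;

// Hypothetical helper: returns how often each keyword appears as real code.
function count_keywords_with_regex(string $source): array
{
    $counts = ['if' => 0, 'for' => 0, 'while' => 0, 'do' => 0];
    preg_match_all(KEYWORD_PATTERN, $source, $matches, PREG_SET_ORDER);
    foreach ($matches as $match) {
        // Group 1 exists only when the keyword branch matched.
        if (isset($match[1])) {
            $counts[$match[1]]++;
        }
    }
    return $counts;
}

// Usage with a made-up C snippet
$sample = "/* for */\nprintf(\"this is for merry\"); // while\nint form = 0;\n"
        . "for (i = 0; i < n; i++) { if (i) form++; }\ndo { n--; } while (n);\n";
print_r(count_keywords_with_regex($sample));
```

Note that a do ... while loop contributes both a "do" and a "while", so the loop total needs adjusting if each loop should count once.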

Posted: Thu Aug 31, 2006 7:49 am
by volka
see viewtopic.php?p=304839
I think you overestimate the power of regular expressions.
http://en.wikipedia.org/wiki/Chomsky%E2%80%93Sch%C3%BCtzenberger_hierarchy wrote: Type-2 grammars (context-free grammars) [...] Context free languages are the theoretical basis for the syntax of most programming languages.
Regular expressions are type 3, which is the weakest level of that hierarchy.
Search the web for the combination of flex/bison for more information about parsing scripts (and similar).
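
To give a feel for what a scanner does, here is a small hand-written state machine in PHP. It is not flex output and not a full C lexer, just a sketch that tracks whether it is inside code, a comment or a quoted literal, and uses that to count lines of code without comment-only and blank lines. The function name and state names are invented for illustration.

```php
<?php
// Hypothetical helper: counts lines that contain at least one character of
// code, skipping blank lines and lines that hold only comments.
function count_code_lines(string $source): int
{
    // Normalise line endings so "\n" is the only line separator.
    $source = str_replace(["\r\n", "\r"], "\n", $source);

    $state = 'code';        // one of: code, block, line, string, char
    $lineHasCode = false;
    $codeLines = 0;
    $length = strlen($source);

    for ($i = 0; $i < $length; $i++) {
        $ch = $source[$i];
        $next = $i + 1 < $length ? $source[$i + 1] : '';

        if ($ch === "\n") {
            if ($lineHasCode) {
                $codeLines++;
            }
            $lineHasCode = false;
            if ($state === 'line') {
                $state = 'code';    // a line comment ends at the newline
            }
            continue;
        }

        switch ($state) {
            case 'code':
                if ($ch === '/' && $next === '*') {
                    $state = 'block';
                    $i++;
                } elseif ($ch === '/' && $next === '/') {
                    $state = 'line';
                    $i++;
                } elseif ($ch === '"' || $ch === "'") {
                    $state = $ch === '"' ? 'string' : 'char';
                    $lineHasCode = true;
                } elseif (trim($ch) !== '') {
                    $lineHasCode = true;    // any non-whitespace is code
                }
                break;
            case 'block':
                if ($ch === '*' && $next === '/') {
                    $state = 'code';
                    $i++;
                }
                break;
            case 'string':
            case 'char':
                $lineHasCode = true;
                if ($ch === '\\') {
                    $i++;           // skip the escaped character
                } elseif ($ch === ($state === 'string' ? '"' : "'")) {
                    $state = 'code';
                }
                break;
            // In state 'line' everything is ignored until the newline.
        }
    }

    // Count a final line that has no trailing newline.
    return $codeLines + ($lineHasCode ? 1 : 0);
}
```

The same state tracking could drive the keyword counts as well, since keywords only matter while the scanner is in the code state.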