how to extract data from my files

Posted: Wed Jan 05, 2005 6:50 am
by jasongr
Hello people

I would like to run the following check on my PHP files.
I have 2 PHP functions called iterate and display.
Here are the function signatures:

Code:

public function iterate($param1, $param2, $param3 = '');
function display($param1, $param2, $param3 = '');
$param1, $param2 and $param3 are all strings
I would like to know exactly where I am using these functions in the code and what parameters are being used.

I would like to write a script that will iterate over all the PHP files (.php or .inc.php extension) in my project and use a regular expression to look for usages of these functions.
Whenever a usage is encountered, it should be written to a log file like so:
<$param1>\t<$param2>\t<path to file>\t<line #>\n

For example, if file a.php contains the code:

Code:

echo display('test', 'Hello');
and file b.php contains the line:

Code:

if ($obj->iterate('run', $val) == true) {
The log file would read:

Code:

test   Hello   a.php   14
run    $val    b.php   20
where 14 is the line in a.php where the function was found and 20 is the line in b.php where the second function was found

I was wondering if someone could help me with this as my main problem is how to formulate the regular expression

regards

Posted: Wed Jan 05, 2005 10:48 am
by feyd

Code:

<?php

	// sample input: the third call concatenates a string and a variable
	$text = 'if ($obj->iterate(\'run\', $val) == true)) {
	if ($obj->iterate(\'run\' . \'test\', $val) == true)) {
	if ($obj->iterate(\'run\' . $not_work, $val) == true)) {';
	
	// PREG_OFFSET_CAPTURE records the byte offset of every captured group
	preg_match_all('#(iterate|display)\s*\(((\s*(([\'"]).*?\\5)|\$.*?)\s*,){1,2}\s*((([\'"]).*?\\8)|\$.*?)\s*\)#s', $text, $matches, PREG_SET_ORDER | PREG_OFFSET_CAPTURE);
	
	var_export($matches);

?>

Code:

array (
  0 =>
  array (
    0 =>
    array (
      0 => 'iterate(\'run\', $val)',
      1 => 10,
    ),
    1 =>
    array (
      0 => 'iterate',
      1 => 10,
    ),
    2 =>
    array (
      0 => '\'run\',',
      1 => 18,
    ),
    3 =>
    array (
      0 => '\'run\'',
      1 => 18,
    ),
    4 =>
    array (
      0 => '\'run\'',
      1 => 18,
    ),
    5 =>
    array (
      0 => '\'',
      1 => 18,
    ),
    6 =>
    array (
      0 => '$val',
      1 => 25,
    ),
  ),
  1 =>
  array (
    0 =>
    array (
      0 => 'iterate(\'run\' . \'test\', $val)',
      1 => 55,
    ),
    1 =>
    array (
      0 => 'iterate',
      1 => 55,
    ),
    2 =>
    array (
      0 => '\'run\' . \'test\',',
      1 => 63,
    ),
    3 =>
    array (
      0 => '\'run\' . \'test\'',
      1 => 63,
    ),
    4 =>
    array (
      0 => '\'run\' . \'test\'',
      1 => 63,
    ),
    5 =>
    array (
      0 => '\'',
      1 => 63,
    ),
    6 =>
    array (
      0 => '$val',
      1 => 79,
    ),
  ),
)
note how the third line isn't found, due to the more complex nature of the expression involved and my lack of time to fiddle with it more.
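
For the script described in the first post (walk the project, log each call with its file and line), a minimal sketch could look like the following. It is not feyd's pattern: it swaps in a deliberately simpler one that treats each argument as "anything without a comma or parenthesis", so it will miss arguments that contain those characters. The names find_calls and scan_tree are hypothetical, made up for this illustration.

```php
<?php
// Sketch only: find_calls() and scan_tree() are illustrative names, not from the thread.

// Return [param1, param2, line] for each call to iterate()/display() found in $source.
function find_calls(string $source): array
{
    // Simplified pattern: each argument is "anything without a comma or parenthesis".
    // The lookbehind skips declarations such as "function display(...)".
    $pattern = '#\b(?<!function )(iterate|display)\s*\(\s*([^,()]+?)\s*,\s*([^,()]+?)\s*[,)]#';
    preg_match_all($pattern, $source, $matches, PREG_SET_ORDER | PREG_OFFSET_CAPTURE);

    $calls = [];
    foreach ($matches as $m) {
        // With PREG_OFFSET_CAPTURE every group is [text, byte offset].
        $offset = $m[0][1];
        // Line number = number of newlines before the match, plus one.
        $line = substr_count(substr($source, 0, $offset), "\n") + 1;
        // Drop the surrounding quotes from string literals; variables stay as written.
        $calls[] = [trim($m[2][0], '\'"'), trim($m[3][0], '\'"'), $line];
    }
    return $calls;
}

// Walk $root recursively and append one tab-separated line per call to $logFile.
function scan_tree(string $root, string $logFile): void
{
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        // getExtension() is "php" for both foo.php and foo.inc.php.
        if (!$file->isFile() || $file->getExtension() !== 'php') {
            continue;
        }
        $path = $file->getPathname();
        foreach (find_calls(file_get_contents($path)) as [$p1, $p2, $line]) {
            file_put_contents($logFile, "$p1\t$p2\t$path\t$line\n", FILE_APPEND);
        }
    }
}
```

Calling scan_tree('/path/to/project', 'calls.log') would then produce one line per call in the requested format; both arguments here are placeholders.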

Posted: Wed Jan 05, 2005 11:07 am
by timvw
your question has already been answered, but you might want to check out this project: http://ctags.sourceforge.net too....

as far as i understand it, it does (much) more than what you asked for :)

Posted: Wed Jan 05, 2005 5:32 pm
by jasongr
thanks feyd

could you give me a short explanation of this complex regular expression?
I am very new to regular expressions and I tried to understand how you solved it, but I am lost

Posted: Wed Jan 05, 2005 5:43 pm
by feyd
basics:
(iterate|display) = look for iterate or display
\s*\( = any amount of whitespace followed by an opening paren.
(([\'"]).*?\\5)|\$.*? = look for a quoted string, or a loose approximation of a variable ($ followed by anything).
{1,2} = expect 1 to 2 of them, each followed by a comma.
(([\'"]).*?\\8)|\$.*? = the same again for the last argument: a quoted string, or a loose approximation of a variable.
\s*\) = any amount of whitespace followed by the closing paren of the function call.
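
To see which capture group ends up holding which piece, it can help to run the pattern against a single call and print a few groups. The sketch below reuses the pattern from the thread unchanged; the sample input string is an assumption chosen for illustration. Groups are numbered by the position of their opening parenthesis, counting from the left.

```php
<?php
// The pattern from the thread, written as a single-quoted PHP string.
$pattern = '#(iterate|display)\s*\(((\s*(([\'"]).*?\\5)|\$.*?)\s*,){1,2}\s*((([\'"]).*?\\8)|\$.*?)\s*\)#s';

// Illustrative input: one call with a single-quoted and a double-quoted argument.
preg_match($pattern, 'echo display(\'test\', "Hello");', $m);

echo $m[1], "\n"; // group 1: the function name
echo $m[4], "\n"; // group 4: the quoted argument matched inside the {1,2} repetition
echo $m[5], "\n"; // group 5: the quote character that opened it (what \5 refers to)
echo $m[6], "\n"; // group 6: the final argument
echo $m[8], "\n"; // group 8: the quote character that opened the final argument (what \8 refers to)
```

Because the {1,2} part repeats, groups 2 to 5 only keep what the last successful repetition matched, which is worth remembering for calls with three arguments.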

Posted: Wed Jan 05, 2005 5:47 pm
by jasongr
I have 2 questions:
what do the numbers 5 and 8 mean in (([\'"]).*?\\5)|\$.*? and (([\'"]).*?\\8)|\$.*?

and why do you precede them with \\

I notice that you surround your expression with #. Is that customary?
I have also seen expressions that are surrounded by '
is there a difference?
must I surround my regular expressions with # ?

thanks

Posted: Wed Jan 05, 2005 8:37 pm
by feyd
you have to surround your pattern with a delimiter character... any non-alphanumeric, non-backslash, non-whitespace character will work. However, some characters are metacharacters inside patterns. Although you can use them as delimiters, it's often suggested to pick a symbol you like that isn't one of them. ^+-*()[]{}=?|!$ are all metacharacters in some form or another. / @ # are the ones I've seen used most often as pattern start and end markers (surrounding a pattern)
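
A small sketch of why the choice of delimiter matters in practice: whichever character delimits the pattern has to be escaped wherever it appears inside the pattern. The subject string and variable names below are made up for the example.

```php
<?php
// Illustrative subject; any text containing slashes would do.
$subject = 'see http://ctags.sourceforge.net for details';

// With "/" as the delimiter, every literal slash in the pattern must be escaped.
preg_match('/http:\/\/(\S+)/', $subject, $a);

// With "#" as the delimiter, the slashes can be written as-is.
preg_match('#http://(\S+)#', $subject, $b);

// Both patterns capture the same host name.
var_dump($a[1] === $b[1]);
```

The two calls are equivalent; the second is simply easier to read when the pattern itself contains slashes.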

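the 5 and 8 are back references to marked segments of the pattern. In both cases they refer to [\'"], but from differing positions. \\5 references the 5th subpattern that's marked for remembering (the 5th opening parenthesis, counting from the left), so the closing quote has to be the same character as the opening one.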
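
On the doubled backslash: inside a single-quoted PHP string, \\ produces one literal backslash, so the regex engine receives \5. In a single-quoted string a lone \5 would also survive, but in a double-quoted string a backslash followed by digits is read as an octal escape, so doubling is the safer habit. A minimal sketch, using a shortened pattern with a single group (so the back reference is \1 rather than \5) and a made-up subject string:

```php
<?php
// In a single-quoted PHP string, "\\" is one backslash and "\'" is one quote.
$pattern = '#([\'"]).*?\\1#';

// Print the pattern exactly as the regex engine receives it.
echo $pattern, "\n";

// The back reference forces the closing quote to match the opening one,
// so the mismatched "mixed' pair is not reported.
preg_match_all($pattern, 'a "double" and a \'single\' but not a "mixed\' one', $m);
print_r($m[0]);
```

Only the two properly paired strings are matched; the third starts with a double quote and never finds a matching double quote to close it.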