looking for a string in a file?...

Posted: Fri Dec 17, 2004 11:17 pm
by sebnewyork
Hi all

I have been using the fgetcsv() function to detect a character while opening and reading a file. Now I need to look for not just one character but a string (for example "<table>" or "<table width=\"100\">", or any other string within the file).

How can I do that?

The code I was using before is

Code: Select all

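<?php
foreach ($pages as $file) {
	$fp = fopen($file, 'rb');
	// fgetcsv() splits each line on the delimiter "n",
	// so more than one field means an "n" was present
	while (($line = fgetcsv($fp, 200, "n")) !== false) {
		if (count($line) > 1) {
			echo "one n found";
			break;
		}
	}
	fclose($fp);
}

?>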
What should I use if I wanted to achieve the same thing with a string rather than a single character? I know the following code is not valid, but this is what I'd like to do:

Code: Select all

<?php
foreach($pages as $file){
	$fp = fopen ("$file", 'rb');
	while ($line = fgetcsv ($fp, 200, "<table>")) {
		foreach('<table>') {
			echo "one table tag found";
			break;
		}
	}
	fclose ($fp);
}
?>

Thanks for your help!

Posted: Sat Dec 18, 2004 1:05 am
by rehfeld

Code: Select all

<?php

$haystack = file_get_contents($filename);

// faster if you just need to see if needle exists
if (false !== strpos($haystack, $needle)) {
    echo 'needle found';
}

// or

$occurances = substr_count($haystack, $needle);

echo "$needle was found $occurances times";

?>

If you need a case-insensitive search, call strtolower() on both $haystack and $needle before using either of the methods above, or, if you are on PHP 5, use stripos().
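
A minimal sketch of the two case-insensitive approaches mentioned above. The sample strings are made up for illustration, and stripos() assumes PHP 5 or later.

```php
<?php
// Hypothetical sample data, for illustration only.
$haystack = '<TABLE width="100"><tr><td>cell</td></tr></TABLE>';
$needle   = '<table';

// PHP 5+: stripos() is the case-insensitive counterpart of strpos().
// Compare against false with !== because a match at offset 0 is falsy.
$foundWithStripos = (false !== stripos($haystack, $needle));

// Any version: lowercase both strings, then use the case-sensitive functions.
$lowerHaystack  = strtolower($haystack);
$lowerNeedle    = strtolower($needle);
$foundWithLower = (false !== strpos($lowerHaystack, $lowerNeedle));
$occurrences    = substr_count($lowerHaystack, $lowerNeedle);

echo $foundWithStripos ? "needle found\n" : "needle not found\n";
echo "$needle was found $occurrences times\n";
```

Lowercasing once and reusing the result avoids repeating the conversion when both the existence check and the count are needed.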

Posted: Sat Dec 18, 2004 9:16 am
by sebnewyork
thanks rehfeld

I assume the "$needle" can be any string, for example I could have:

$occurances = substr_count($myPage, "<? include('path_to_file.html') ?>");

right?

Basically my goal is to return a list of all the included files in a page, like in my example: "path_to_file.html"
So I'd need somehow to get PHP to look for the opening include tag up to the actual file path

"<? include('"

and the next closing tag

"') ?>"

so that it can return just the actual file path, between those two strings.

Can I use the haystack and needle system for that, or is there a special function just for that (returning any content between any occurrence of 2 specific strings)?

Thanks a lot for your help

Posted: Sat Dec 18, 2004 11:04 am
by rehfeld
You could use strpos() and substr() in a while loop, but it would become extremely difficult.

You should use regular expressions for this; it's what they are made for.
Take a look at preg_match_all().

This is a tough one though, because there are so many variations in the syntax for include and require.

I'm not that great at regex, but this should get them all, I believe.
It doesn't isolate the filename, though; it just grabs the whole include statement. Isolating the filename is where it gets difficult.

Code: Select all

<pre>
<?php

// Grabs each include/require statement up to the next semicolon
$pattern = '/(include|require(_once)?)([^;]+)/i';

preg_match_all($pattern, $subject, $matches);

print_r($matches);

?>
</pre>
You can learn more about regex on Google, or here's a nice tutorial:
http://www.regular-expressions.info/tutorial.html
It takes a long time to learn the seemingly cryptic syntax, but it's well worth it.
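
For the simplest case, where the path is a quoted string literal directly after the keyword, a capture group can isolate the filename. This is only a sketch: the function name, the pattern, and the sample page are assumptions for illustration, and it will miss paths built from variables, constants, or concatenation.

```php
<?php
// Hypothetical helper name; returns the quoted paths found in include/require statements.
function find_included_paths($source)
{
    // Keyword, optional "_once", optional "(", then a quoted string.
    // Group 1 is the quote character, group 2 the path between the quotes.
    $pattern = '/\b(?:include|require)(?:_once)?\s*\(?\s*([\'"])(.+?)\1/i';
    preg_match_all($pattern, $source, $matches);
    return $matches[2];
}

// Hypothetical page content, for illustration only.
$page = implode("\n", array(
    "<? include('header.html') ?>",
    "<p>Some text</p>",
    "<?php require_once \"lib/functions.php\"; ?>",
    "<? include ( 'footer.html' ); ?>",
));

print_r(find_included_paths($page));
```

The backreference \1 makes the closing quote match the opening one, so a path wrapped in single quotes may itself contain a double quote and vice versa.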

Posted: Sat Dec 18, 2004 1:36 pm
by sebnewyork
thank you.
Do I need to define the variables $subject and $matches?
If yes, how do I do that? Is $subject the page I want to open and read through?

And does the preg_match_all() function open the pages, or do I need to use the
file_get_contents() or fopen() function?

Sorry I'm lost.

Posted: Sat Dec 18, 2004 1:47 pm
by rehfeld
Yes:
$subject = file_get_contents('some_file.php');

If you're using an old version of PHP,
you may need to use fopen() and fread() instead of file_get_contents().

$matches will be filled by preg_match_all(),
so no, you do not need to give it a value beforehand.
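
A rough sketch of the fopen() and fread() fallback, for installations without file_get_contents(). The function name read_whole_file() and the 8192-byte chunk size are assumptions made for this example.

```php
<?php
// Hypothetical helper name; reads a whole file without file_get_contents().
function read_whole_file($path)
{
    $fp = fopen($path, 'rb');
    if ($fp === false) {
        return false; // the file could not be opened
    }

    $contents = '';
    // Read in chunks until the end of the file is reached.
    while (!feof($fp)) {
        $chunk = fread($fp, 8192);
        if ($chunk === false) {
            break;
        }
        $contents .= $chunk;
    }

    fclose($fp);
    return $contents;
}
```

It would be called the same way, e.g. $subject = read_whole_file('some_file.php');, and it returns false when the file cannot be opened.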

Posted: Sat Dec 18, 2004 1:48 pm
by John Cartwright

Code: Select all

<?php
$subject = file_get_contents($file);
$pattern = '/(include|require(_once)?)([^;]+)/i';
preg_match_all($pattern, $subject, $matches);
print_r($matches);
?>
$subject is your content; $matches is an array of all the matches your regex has found.

Posted: Sat Dec 18, 2004 3:35 pm
by sebnewyork
Ah, thanks, it seems to work... sort of!
But not really.
Here's the code I'm using now:

Code: Select all

<?php
$subject = file_get_contents ($file); 
$pattern ='/(include|require(_once)?)([^;]+)/i';
if (preg_match_all ($pattern, $subject, $matches)) {
	foreach ($matches as $match) {
		echo "<p>" . $match ."</p>"; 
	}
}
?>
and depending on what $file is, it echoes a different number of matches, in the form of

Array

Array

Array

etc.
And it returns 4 of them for a page that doesn't have any includes!... So there must be something wrong with the regular expression, I'll have to look into it.

But instead of "Array" I'd like to have the matching string returned, i.e.
"include('blabla.htm')"
How would I do that?

Posted: Sat Dec 18, 2004 3:44 pm
by markl999

Code: Select all

foreach ($matches[0] as $match) {
 echo "<p>" . $match ."</p>";
}

Posted: Sat Dec 18, 2004 5:10 pm
by John Cartwright
Take a look at http://ca.php.net/preg_match_all to understand how matches are returned.
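
To illustrate how the matches come back, here is a small sketch using the pattern from this thread on a made-up input string. With the default ordering, $matches holds one sub-array for the full matches plus one per capture group, which would explain why looping over $matches itself printed "Array" four times.

```php
<?php
// Hypothetical input, for illustration only.
$subject = "include('blabla.htm'); require_once 'lib.php';";
$pattern = '/(include|require(_once)?)([^;]+)/i';

// Default order (PREG_PATTERN_ORDER): one sub-array per capture group.
$count = preg_match_all($pattern, $subject, $matches);

// $matches[0] holds the full matches, $matches[1] the keyword,
// $matches[2] the optional "_once", $matches[3] the rest up to the ";".
foreach ($matches[0] as $match) {
    echo $match . "\n";
}

// PREG_SET_ORDER: one sub-array per match instead.
preg_match_all($pattern, $subject, $sets, PREG_SET_ORDER);
foreach ($sets as $set) {
    echo $set[1] . ' => ' . trim($set[3]) . "\n";
}
```

PREG_SET_ORDER is often easier to loop over when each match needs its own capture groups together.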