searching text in html page
Posted: Sun May 08, 2005 7:42 pm
by cisrudlow
I want to search for a phrase in an HTML file and highlight it (like Google does when I click on "cached"). Does anybody know of a class that can help me? Or does anybody know how I could do this myself?
Thx
Posted: Sun May 08, 2005 8:28 pm
by php_wiz_kid
You could use str_replace().
Just open a file, put its contents into a variable and use str_replace() to replace a string with the desired string.
Code:
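$file_open = fopen($file, 'r');
$file_read = fread($file_open, filesize($file)); // Contents of $file
fclose($file_open);
// str_replace() returns the modified string; it does not change $file_read in place
$result = str_replace($to_replace, $replace_with, $file_read);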
Give that a whirl. The contents of $file_read don't have to come from a file; the subject can be any string, or an array of strings.
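To illustrate that last point, here is a minimal sketch of str_replace() working on a plain string and on an array of strings. The sample text and variable names are made up for illustration.

```php
<?php
// Sample data, invented for this example.
$text  = 'The quick brown fox';
$lines = array('red fox', 'grey fox', 'brown bear');

// str_replace() returns a new value; the original variable is left untouched.
$new_text  = str_replace('fox', 'dog', $text);
// With an array as the subject, the replacement is applied to every element.
$new_lines = str_replace('fox', 'dog', $lines);

echo $new_text, "\n";
print_r($new_lines);
```

In both cases the result has to be assigned to a variable (or echoed), otherwise it is simply thrown away.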
Posted: Sun May 08, 2005 10:18 pm
by John Cartwright
Or you can even shorten that to:
Code:
$result = str_replace($to_replace, $replace_with, file_get_contents($filename));
I would recommend using file_get_contents() instead of fopen() followed by fread().
Posted: Mon May 09, 2005 12:04 am
by cisrudlow
OK, but if I want to find "body" or "table" etc., it will match the HTML tags too.
Posted: Mon May 09, 2005 12:27 am
by php_wiz_kid
Yes, it should. You could do this:
Code:
$to_replace = "<table>";
$replace_with = "<blahblah>";
$result = str_replace($to_replace, $replace_with, file_get_contents($filename));
It would turn this:
Code:
...
<body>
<table>
<tr>
<td>BLAH</td>
</tr>
</table>
</body>
...
to:
Code:
...
<body>
<blahblah>
<tr>
<td>BLAH</td>
</tr>
</table>
</body>
...
In fact, I made a template object that uses str_replace() to find placeholders like {U_THING} inside an HTML/text/template file and replace them with XHTML.
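A placeholder replacement along those lines might look like the sketch below. This is not the actual template object mentioned above; the function name renderTemplate() and the {L_THING} placeholder are hypothetical, and only {U_THING} comes from the post.

```php
<?php
// Hypothetical helper: replaces {PLACEHOLDER} markers in a template string.
function renderTemplate($template, $values) {
    $search  = array();
    $replace = array();
    foreach ($values as $name => $value) {
        $search[]  = '{' . $name . '}';
        $replace[] = $value;
    }
    // str_replace() accepts parallel arrays of search and replacement strings.
    return str_replace($search, $replace, $template);
}

// Invented sample template and values.
$template = '<p><a href="{U_THING}">{L_THING}</a></p>';
echo renderTemplate($template, array(
    'U_THING' => 'thing.php',
    'L_THING' => 'The thing',
));
```

Note that str_replace() applies the pairs one after another, so a value that itself contains a later placeholder would be replaced again; strtr() with an array of pairs avoids that.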
Posted: Mon May 09, 2005 4:35 am
by Chris Corbyn

Moving to regex....
Posted: Mon May 09, 2005 4:44 am
by Chris Corbyn
Here's a nice little trick
Code:
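function getBlock($source, $tag) {
    $tag = preg_quote($tag, '#'); // Treat the tag name literally
    // \b stops a short tag name like "b" from also matching <body>
    $re = '#<\s*'.$tag.'\b[^>]*>(.*?)<\s*/\s*'.$tag.'\s*>#is';
    if (preg_match($re, $source, $matches)) {
        $block = $matches[1]; // The bit you need
        return $block;
    } else {
        return false;
    }
}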
/*** EXAMPLE ****/
$google_source = file_get_contents('http://www.google.com/');
$googles_body = getBlock($google_source, 'body');
echo '<pre>';
echo htmlspecialchars($googles_body);
echo '</pre>';
Change preg_match() to preg_match_all() if you're looking for numerous items (e.g. <b>text</b> tags)...
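As a rough sketch of that change, the variant below collects every match instead of only the first. The name getBlocks() and the sample markup are hypothetical; the preg_quote() call and the \b word boundary are small additions so that a tag name like "b" is taken literally and does not also match longer tag names.

```php
<?php
// Hypothetical variant of getBlock() that returns every match, not just the first.
function getBlocks($source, $tag) {
    $tag = preg_quote($tag, '#');
    $re  = '#<\s*' . $tag . '\b[^>]*>(.*?)<\s*/\s*' . $tag . '\s*>#is';
    // $matches[1] holds the text captured by the first group, one entry per match.
    preg_match_all($re, $source, $matches);
    return $matches[1];
}

// Invented sample markup.
$html = '<p>Some <b>bold</b> text and <b class="x">more bold</b> text.</p>';
print_r(getBlocks($html, 'b'));
```

When nothing matches, the function returns an empty array rather than false, which makes it easy to loop over the result directly.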
Hope that helps

Posted: Mon May 09, 2005 5:36 am
by Chris Corbyn
Misunderstood question... apologies.
To avoid replacing HTML tags like you suggested try this...
It will have to be a regex for this anyway.
Untested (can you let me know how this goes please - curious on this one but too busy to test).
Code:
function highlightWords ($source, $word, $color) {
    // Escape the search word so characters like "." or "#" are taken literally
    $word = preg_quote($word, '#');
    // The lookahead uses [^<>]* so that only text inside a tag is skipped
    $re = '#(?<!<)(\s*'.$word.')(?![^<>]*>)#is';
    $replace = '<span style="background-color:'.$color.'">$1</span>';
    $highlighted = preg_replace($re, $replace, $source);
    return $highlighted;
}
/*** EXAMPLE ***/
$regex_info_source = file_get_contents('http://www.regular-expressions.info/');
$re_highlighted = highlightWords($regex_info_source, 'regex', '#FFEE00');
echo $re_highlighted;
At first sight the regular expression looks quite scary (and it unavoidably matches the whitespace preceding the word, but nobody sees that).
The (?<!...) is a negative lookbehind (in other words, the word must NOT follow "<"). Equally, the (?!...) is a negative lookahead (in other words, the word must not come before a ">" that closes a tag). The \s* and [^<>]* just allow other permitted characters to be in the source code without causing a problem; leaving "<" out of the class stops the lookahead from scanning past the start of the next tag, so ordinary text between tags can still match.
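The lookahead idea can be tried on a small inline string before pointing it at a live page. This is a minimal sketch, assuming a lookahead class of [^<>]* (so the scan stops at the next tag) and dropping the lookbehind and \s* for brevity; highlightOutsideTags() and the sample markup are hypothetical.

```php
<?php
// Hypothetical helper: wraps $word in a highlighting <span> unless it sits inside a tag.
function highlightOutsideTags($source, $word, $color) {
    // (?![^<>]*>) rejects a match when the next angle bracket ahead is ">",
    // i.e. when the word is inside a tag such as <table class="...">.
    $re = '#(' . preg_quote($word, '#') . ')(?![^<>]*>)#i';
    $replace = '<span style="background-color:' . $color . '">$1</span>';
    return preg_replace($re, $replace, $source);
}

// Invented sample: "table" appears as a tag name, an attribute value and as plain text.
$html = '<table class="table"><tr><td>A table cell</td></tr></table>';
echo highlightOutsideTags($html, 'table', '#FFEE00');
```

Only the "table" in the cell text gets wrapped; the tag names and the attribute value are left alone. A regex like this is still a heuristic, so text containing a literal ">" (inside a script block, for example) can confuse it.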
Good luck
