searching text in html page
I want to search for a phrase in an HTML file and highlight it, the way Google does when you click on "cached". Does anybody know of a class that can help me? Or does anybody know how I could do this myself?
Thx
-
php_wiz_kid
- Forum Contributor
- Posts: 181
- Joined: Tue Jun 24, 2003 7:33 pm
You could use str_replace().
Just open a file, read its contents into a variable and use str_replace() to replace a string with the desired string.
Code: Select all
$file_open = fopen($file, 'r');
$file_read = fread($file_open, filesize($file)); // Contents of $file
fclose($file_open);
// str_replace() returns the new string; $file_read itself is left unchanged
$result = str_replace($to_replace, $replace_with, $file_read);

Give that a whirl. The contents of $file_read don't have to come from a file. The subject can be any string or an array of strings.
- John Cartwright
- Site Admin
- Posts: 11470
- Joined: Tue Dec 23, 2003 2:10 am
- Location: Toronto
- Contact:
Or you can shorten that even further. I would recommend using file_get_contents() instead of fopen() followed by fread():
Code: Select all
$result = str_replace($to_replace, $replace_with, file_get_contents($filename));
-
php_wiz_kid
- Forum Contributor
- Posts: 181
- Joined: Tue Jun 24, 2003 7:33 pm
Yes, it should. You could do this:
Code: Select all
$to_replace = "<table>";
$replace_with = "<blahblah>";
$result = str_replace($to_replace, $replace_with, file_get_contents($filename));

It would turn this:
Code: Select all
...
<body>
<table>
<tr>
<td>BLAH</td>
</tr>
</table>
</body>
...

to this (the closing </table> is untouched because only the exact string <table> is replaced):
Code: Select all
...
<body>
<blahblah>
<tr>
<td>BLAH</td>
</tr>
</table>
</body>
...

In fact, I made a template object that uses this (str_replace()) to find strings like {U_THING} inside an HTML/text/template file and change them into XHTML.
- Chris Corbyn
- Breakbeat Nuttzer
- Posts: 13098
- Joined: Wed Mar 24, 2004 7:57 am
- Location: Melbourne, Australia
Here's a nice little trick
Code: Select all
function getBlock($source, $tag) {
    // Escape the tag name; \b keeps 'b' from also matching <body>
    $tag = preg_quote($tag, '#');
    $re = '#<\s*'.$tag.'\b[^>]*>(.*?)<\s*/\s*'.$tag.'\s*>#is';
    if (preg_match($re, $source, $matches)) {
        $block = $matches[1]; // The bit you need
        return $block;
    } else {
        return false;
    }
}
/*** EXAMPLE ****/
// Fetching a URL this way requires allow_url_fopen to be enabled
$google_source = file_get_contents('http://www.google.com/');
$googles_body = getBlock($google_source, 'body');
echo '<pre>';
echo htmlspecialchars($googles_body);
echo '</pre>';

Change preg_match() to preg_match_all() if you're looking for numerous items (e.g. <b>text</b> tags)...
Hope that helps
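For the preg_match_all() variant mentioned in the post above, a minimal sketch might look like the following. The function name getAllBlocks() and the sample markup are made up for illustration and are not from the thread; the pattern is the same idea as getBlock(), with preg_quote() and a \b word boundary added as assumptions so that 'b' does not also match <body>.

```php
<?php
// Hypothetical helper (not from the thread): collects the inner text of
// every <tag>...</tag> pair instead of only the first one.
function getAllBlocks(string $source, string $tag): array {
    // preg_quote() escapes regex metacharacters in the tag name;
    // \b stops 'b' from also matching <body> or <br>.
    $tag = preg_quote($tag, '#');
    $re = '#<\s*' . $tag . '\b[^>]*>(.*?)<\s*/\s*' . $tag . '\s*>#is';
    // preg_match_all() fills $matches[1] with the first capture group
    // of every match, in document order.
    preg_match_all($re, $source, $matches);
    return $matches[1];
}

// Inline sample markup, made up for illustration.
$html = '<body><p>Some <b>bold</b> text and <B class="x">more bold</B>.</p></body>';
$bold = getAllBlocks($html, 'b');
print_r($bold);
```

Like the original, this is a regex over raw markup, so nested tags of the same name will not be paired up correctly.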
- Chris Corbyn
- Breakbeat Nuttzer
- Posts: 13098
- Joined: Wed Mar 24, 2004 7:57 am
- Location: Melbourne, Australia
Misunderstood question... apologies.
To avoid replacing HTML tags like you suggested, try this...
It will have to be a regex for this anyway.
Untested (can you let me know how this goes please? I'm curious about this one but too busy to test).
Code: Select all
function highlightWords($source, $word, $color) {
    // preg_quote() keeps regex metacharacters in the word literal
    $word = preg_quote($word, '#');
    $re = '#(?<!<)(\s*'.$word.')(?![^<>]*>)#is';
    $replace = '<span style="background-color:'.$color.'">$1</span>';
    $highlighted = preg_replace($re, $replace, $source);
    return $highlighted;
}
/*** EXAMPLE ***/
$regex_info_source = file_get_contents('http://www.regular-expressions.info/');
$re_highlighted = highlightWords($regex_info_source, 'regex', '#FFEE00');
echo $re_highlighted;

At first sight the regular expression looks quite scary (and it unavoidably matches the whitespace preceding the word, but nobody sees that).
The (?<!...) is a negative lookbehind (in other words, the word must NOT directly follow "<"). Equally, the (?!...) is a negative lookahead (in other words, the word must not be followed by a ">" before the next "<", which would mean it sits inside a tag). The \s* and [^<>]* just allow other permitted characters to be in the source code without causing a problem.
Good luck
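To try the lookahead idea without fetching a remote page, here is a small sketch that runs on an inline string. markPhrase() and the sample markup are hypothetical, and the pattern is one possible variation rather than the poster's exact regex: it assumes that a ">" appearing before the next "<" means the phrase sits inside a tag.

```php
<?php
// Hypothetical helper, not the poster's exact function: wraps each
// occurrence of $phrase that appears outside of a tag in a <span>.
function markPhrase(string $source, string $phrase, string $color): string {
    // preg_quote() keeps characters such as "." or "?" in the phrase literal.
    // (?![^<>]*>) rejects a match when a ">" follows before any "<",
    // which is the case when the phrase sits inside a tag.
    $re = '#(' . preg_quote($phrase, '#') . ')(?![^<>]*>)#i';
    $replace = '<span style="background-color:' . $color . '">$1</span>';
    return preg_replace($re, $replace, $source);
}

// Sample markup made up for illustration: the phrase occurs twice inside
// attribute values and once in the visible link text.
$html = '<a href="regex.html" title="about regex">Regex basics</a>';
echo markPhrase($html, 'regex', '#FFEE00'), "\n";
```

Only the visible "Regex" in the link text gets wrapped; the two occurrences inside the attributes are left alone. A literal ">" in visible text or inside script blocks would confuse this check, so treat it as a starting point only.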