How? Replace text -- but not tags -- within an HTML page

Posted: Wed Mar 10, 2004 1:52 pm
by TheBentinel.com
I have a need to read an entire HTML page, replace bits of text with other bits of text, then spit it back out. preg_replace handles the replace handily, but it replaces all occurrences. I only want to change the text of the page, not the html tags and attributes.

As an example, if a page returned this html: (using brackets for angle brackets)

Code:

[font face=arial] arial is a good face [/font]
and the user indicated that they want to change "face" to "FONT", I only want the text to change, like this:

Code:

[font face=arial] arial is a good FONT [/font]
I can't seem to come up with a way to find and change just the text. strip_tags doesn't help, since I *want* the tags.

Is there something obvious I'm missing, or do I need to write some sort of character-by-character walk through the HTML?

Thanks for any ideas!

Posted: Wed Mar 10, 2004 3:33 pm
by The Monkey
substr() might work. Is it just ONE piece of text you want replaced, or can the user change any piece of text or characters or whatever?

For instance, in your script could I change "good" to "bad"? Or just "face" to whatever I want?

Because another solution might be letting them just update the whole database entry and have it add the tags automatically.

Elaborate, please! :)

Posted: Thu Mar 11, 2004 7:52 am
by TheBentinel.com
The Monkey wrote: Elaborate, please! :)
The user will name a URL and a series of text pairs. The PHP will retrieve the html from the URL, substitute the "find" text with the "replace" text, then display the result to the user. You can imagine pulling up a CNN article about "John Kerry" but substituting "Bozo The Clown" for "Kerry". Hilarity ensues!

But, if there's a script in the HTML that uses the word "kerry", or a .js file named "kerry", or any other structural item called "kerry" I want to leave it alone.

I thought this morning that I might do a strpos to find my target string, then do some sort of reverse strpos from that point to find the previous ">" and "<". I could use that to determine if I'm within a tag or not. I'd have to check for <script and <style, too. Any others?
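
A rough sketch of that idea, for what it's worth: rather than searching backwards from each match, split the page into tag chunks and text chunks first, then only touch the text chunks. The function name replace_text_only and the $pairs array are made up for illustration, and the type hints assume PHP 7 or later. The tag regex is naive, so it will misjudge comments or attribute values that contain ">".

```php
<?php
// Hypothetical helper: apply find => replace pairs to page text only,
// leaving tags and the bodies of <script>/<style> elements untouched.
function replace_text_only(string $html, array $pairs): string
{
    // With PREG_SPLIT_DELIM_CAPTURE the result alternates:
    // even index = text between tags, odd index = the tag itself.
    $chunks = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);

    $skip = null; // 'script' or 'style' while inside one of those elements
    $out  = '';

    foreach ($chunks as $i => $chunk) {
        if ($i % 2 === 1) {
            // A tag: copy it verbatim, but track entry to and exit from script/style.
            if ($skip === null && preg_match('/^<(script|style)\b/i', $chunk, $m)) {
                $skip = strtolower($m[1]);
            } elseif ($skip !== null && preg_match('/^<\/' . $skip . '\b/i', $chunk)) {
                $skip = null;
            }
            $out .= $chunk;
        } elseif ($skip !== null) {
            // Script or style body: leave alone.
            $out .= $chunk;
        } else {
            // Ordinary page text: apply the replacements.
            $out .= strtr($chunk, $pairs);
        }
    }

    return $out;
}

$html = '<font face=arial> arial is a good face </font>'
      . '<script src="face.js">var face = 1;</script>';

echo replace_text_only($html, array('face' => 'FONT')), "\n";
// <font face=arial> arial is a good FONT </font><script src="face.js">var face = 1;</script>
```

Other spots that would probably need the same treatment as script and style are comments and textarea contents, which this sketch does not handle.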

I guess I was hoping somebody would pipe up and say, "Why don't you just use the strReplaceOnlyTextButIgnoreTags function?"

Oh well, it was worth a whack. Thanks for your help.

Posted: Thu Mar 11, 2004 11:28 am
by Illusionist
How about adding all the find strings to one array and all the replace strings to another, then using str_replace()?

Code:

// $html is the incoming HTML fetched from the URL.
$search  = array('one', 'two', 'three', 'Kerry');
$replace = array('1', '2', '3', 'Bozo the Clown');
// Each entry in $search is replaced by the entry at the same position in $replace.
$text = str_replace($search, $replace, $html);
// You can then either echo the HTML or save it to a file.
// This replaces every occurrence of each string, so it might not work as you want.
// Alternatively, load the HTML into a textarea, let the user edit it as they like, then save it to a file or echo it out.
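
One thing to watch with the array form of str_replace(): the pairs are applied one after another, so a later pair can rewrite the output of an earlier one. A small illustration follows. The $pairs array is invented for the example, and strtr() is shown only as one possible alternative, not as a fix for the tag problem.

```php
<?php
// Invented example pairs: the replacement for the first pair
// happens to be the search string of the second pair.
$pairs = array('table' => 'chair', 'chair' => 'stool');

// str_replace applies the pairs in order, so the output of the first
// replacement is fed into the second one.
$chained = str_replace(array_keys($pairs), array_values($pairs), 'a table');

// strtr with an array scans the input once and does not re-replace
// text it has already substituted.
$single = strtr('a table', $pairs);

echo $chained, "\n"; // a stool
echo $single, "\n";  // a chair
```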

Posted: Thu Mar 11, 2004 11:35 am
by TheBentinel.com
Illusionist wrote: How about adding all the find strings to one array and all the replace strings to another, then using str_replace()?
That would also destroy scripts, styles, even tags within the document. Imagine somebody thinking how funny it would be to replace "table" with "chair". All the table tags in the HTML would be destroyed.

I need to do the replace only within the text of the document.

Thanks for the help, though. I didn't realize you could use arrays in the replace function like that. Pretty cool.

Posted: Thu Mar 11, 2004 11:38 am
by Illusionist
Yeah, that's why I said you wouldn't want to do that... So what's wrong with loading it into a textarea and letting the user edit it themselves?

Posted: Thu Mar 11, 2004 11:43 am
by TheBentinel.com
Illusionist wrote: Yeah, that's why I said you wouldn't want to do that... So what's wrong with loading it into a textarea and letting the user edit it themselves?
It's just supposed to be for fun. Some idiot user that doesn't know HTML from HGB just wants to take a CNN article and make it sound funny. The CDC recommends getting a vodka shot, instead of a flu shot, for instance. So it has to be automatic or they won't fool with it.

And you're absolutely right, I don't want to do it! :-) But I think I'm going to have to.
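
If the walk through the HTML does turn out to be necessary, one option is to let an HTML parser do the walking and only edit the text nodes it finds. The sketch below assumes a PHP build with the DOM extension (DOMDocument and DOMXPath) and PHP 7 or later for the type hints. The function name replace_in_text_nodes is made up for illustration. Note that the parser re-serializes the page, so the output markup will not be byte-for-byte identical to the input, and pages that do not declare a character set may need extra care.

```php
<?php
// Hypothetical helper: apply find => replace pairs to DOM text nodes only,
// skipping anything inside <script> or <style>.
function replace_in_text_nodes(string $html, array $pairs): string
{
    $doc = new DOMDocument();

    // Real-world pages are rarely valid; collect parse warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Select every text node that has no script or style ancestor.
    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//text()[not(ancestor::script) and not(ancestor::style)]');

    foreach ($nodes as $node) {
        // Attribute values and tag names are not text nodes, so they are never touched.
        $node->data = strtr($node->data, $pairs);
    }

    return $doc->saveHTML();
}

$page = '<html><body><font face="arial"> arial is a good face </font>'
      . '<script>var face = 1;</script></body></html>';

$out = replace_in_text_nodes($page, array('face' => 'FONT'));
echo $out;
```

Here the face attribute and the script body keep the word "face", while the visible sentence ends in "FONT".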