How? Replace text -- but not tags -- within an HTML page

TheBentinel.com
Forum Contributor
Posts: 282
Joined: Wed Mar 10, 2004 1:52 pm
Location: Columbus, Ohio

Post by TheBentinel.com »

I have a need to read an entire HTML page, replace bits of text with other bits of text, then spit it back out. preg_replace handles the replace handily, but it replaces all occurrences. I only want to change the text of the page, not the html tags and attributes.

As an example, if a page returned this html: (using brackets for angle brackets)

Code:

[font face=arial] arial is a good face [/font]

and the user indicated that they want to change "face" to "FONT", I only want the text to change, like this:

Code:

[font face=arial] arial is a good FONT [/font]

I can't seem to come up with a way to find and change just the text. strip_tags doesn't help, since I *want* the tags.

Is there something obvious I'm missing, or do I need to write some sort of character-by-character walk through the HTML?
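
One commonly suggested shortcut, sketched here under assumptions rather than as a tested solution: keep preg_replace-style matching, but add a negative lookahead so a match is skipped when the next angle bracket after it is a closing ">" (meaning the match sits inside a tag). The function name replace_text_only is made up for this example. It does not protect the contents of script or style blocks, and it can be fooled by a ">" inside an attribute value.

```php
<?php
// Hypothetical helper (the name is invented for this sketch).
// Replaces $find with $replace only where the match is not inside a tag,
// i.e. where the text after the match does not reach a ">" before any "<".
function replace_text_only(string $html, string $find, string $replace): string
{
    // preg_quote() keeps user input from being read as regex syntax.
    $pattern = '/' . preg_quote($find, '/') . '(?![^<>]*>)/';

    // A callback is used so "$1" or "\1" in $replace is inserted literally
    // instead of being treated as a backreference.
    return preg_replace_callback(
        $pattern,
        function (array $match) use ($replace) {
            return $replace;
        },
        $html
    );
}

// The example from the post, with real angle brackets.
echo replace_text_only('<font face=arial> arial is a good face </font>', 'face', 'FONT');
// prints: <font face=arial> arial is a good FONT </font>
```

The attribute face=arial is left alone because the characters after it run straight into ">", while the second "face" is followed by "<" first and so counts as page text.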

Thanks for any ideas!
The Monkey
Forum Contributor
Posts: 168
Joined: Tue Mar 09, 2004 9:05 am
Location: Arkansas, USA

Post by The Monkey »

substr() might work. Is it just ONE piece of text you want replaced, or can the user change any piece of text or characters or whatever?


For instance, in your script, could I change "good" to "bad"? Or just "face" to whatever I want?

Because another solution might be letting them just update the whole database entry and have it add the tags automatically.

Elaborate, please! :)
TheBentinel.com
Forum Contributor
Posts: 282
Joined: Wed Mar 10, 2004 1:52 pm
Location: Columbus, Ohio

Post by TheBentinel.com »

The Monkey wrote: Elaborate, please! :)
The user will name a URL and a series of text pairs. The PHP will retrieve the html from the URL, substitute the "find" text with the "replace" text, then display the result to the user. You can imagine pulling up a CNN article about "John Kerry" but substituting "Bozo The Clown" for "Kerry". Hilarity ensues!

But, if there's a script in the HTML that uses the word "kerry", or a .js file named "kerry", or any other structural item called "kerry" I want to leave it alone.

I thought this morning that I might do a strpos to find my target string, then do some sort of reverse strpos from that point to find the previous ">" and "<". I could use that to determine if I'm within a tag or not. I'd have to check for <script and <style, too. Any others?
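
A rough sketch of that "am I inside a tag?" idea, using a different mechanism than the reverse strpos described above: split the page into alternating text and tag chunks with preg_split, then replace only in the text chunks, and pass everything between script or style tags through untouched. The function name replace_outside_tags is made up for this example, and the tag pattern is an assumption that will misfire on a ">" inside an attribute value or on HTML comments containing tags.

```php
<?php
// Hypothetical helper (the name is invented for this sketch).
// $pairs maps each "find" string to its "replace" string.
function replace_outside_tags(string $html, array $pairs): string
{
    // With one capture group and PREG_SPLIT_DELIM_CAPTURE the result
    // alternates: even indexes are text, odd indexes are tags.
    $chunks = preg_split('/(<[^<>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);

    $skipUntil = null; // "script" or "style" while inside such a block
    $out = '';

    foreach ($chunks as $i => $chunk) {
        if ($i % 2 === 1) {
            // A tag: copy it unchanged, but track script/style blocks.
            if ($skipUntil === null
                && preg_match('/^<(script|style)\b/i', $chunk, $m)) {
                $skipUntil = strtolower($m[1]);
            } elseif ($skipUntil !== null
                && preg_match('/^<\/(script|style)\b/i', $chunk, $m)
                && strtolower($m[1]) === $skipUntil) {
                $skipUntil = null;
            }
            $out .= $chunk;
        } elseif ($skipUntil !== null) {
            // Script or style content: leave it alone.
            $out .= $chunk;
        } else {
            // Ordinary page text: apply every find/replace pair.
            $out .= str_replace(array_keys($pairs), array_values($pairs), $chunk);
        }
    }

    return $out;
}

$page = '<script src="kerry.js">var kerry = 1;</script><p>kerry spoke</p>';
echo replace_outside_tags($page, array('kerry' => 'bozo'));
// prints: <script src="kerry.js">var kerry = 1;</script><p>bozo spoke</p>
```

Other places where raw text should probably be skipped include textarea and title elements and HTML comments; the two-element list in the patterns would need extending for those.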

I guess I was hoping somebody would pipe up and say, "Why don't you just use the strReplaceOnlyTextButIgnoreTags function?"

Oh well, it was worth a whack. Thanks for your help.
Illusionist
Forum Regular
Posts: 903
Joined: Mon Jan 12, 2004 9:32 pm

Post by Illusionist »

How about adding all the "find" strings to one array and all the "replace" strings to another, then using str_replace()?

Code:

$search  = array('one', 'two', 'three', 'Kerry');
$replace = array('1', '2', '3', 'Bozo the Clown');
// $html is the incoming HTML fetched from the URL
$text = str_replace($search, $replace, $html);
// Then you can either echo out the result or save it to a file.
// This replaces every occurrence of each string, tags included, so it might not work the way you want...
// You may just want to load the HTML into a textarea, let the user edit it however they like, then save it to a file or echo it all out!
TheBentinel.com
Forum Contributor
Posts: 282
Joined: Wed Mar 10, 2004 1:52 pm
Location: Columbus, Ohio

Post by TheBentinel.com »

Illusionist wrote: How about adding all the "find" strings to one array and all the "replace" strings to another, then using str_replace()?
That would also destroy scripts, styles, even tags within the document. Imagine somebody thinking how funny it would be to replace "table" with "chair". All the table tags in the HTML would be destroyed.

I need to do the replace only within the text of the document.
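
One way to get "text only" without hand-parsing, sketched under assumptions: load the page into PHP's DOM extension, select only the text nodes that are not inside script or style elements, and run the array form of str_replace on those. The function name replace_in_text_nodes is made up for this example. The sketch assumes the DOM extension is installed, and note that saveHTML() returns a normalized document (it may add a doctype and html/body wrappers, and it can mangle non-ASCII pages whose charset is not declared), so the output is not byte-for-byte the original page.

```php
<?php
// Hypothetical helper (the name is invented for this sketch).
// $pairs maps each "find" string to its "replace" string.
function replace_in_text_nodes(string $html, array $pairs): string
{
    $doc = new DOMDocument();

    // Real-world pages are rarely valid HTML; collect parser warnings
    // internally instead of emitting them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Every text node except those inside <script> or <style>.
    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//text()[not(ancestor::script) and not(ancestor::style)]');

    foreach ($nodes as $node) {
        // Tag names and attributes are not text nodes, so they are never touched.
        $node->data = str_replace(array_keys($pairs), array_values($pairs), $node->data);
    }

    return $doc->saveHTML();
}

$page = '<table><tr><td class="table">a table</td></tr></table>'
      . '<script>var table = 1;</script>';
echo replace_in_text_nodes($page, array('table' => 'chair'));
```

With "table" mapped to "chair", the table tags, the class attribute, and the script body should all survive, and only the visible cell text changes.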

Thanks for the help, though. I didn't realize you could use arrays in the replace function like that. Pretty cool.
Illusionist
Forum Regular
Posts: 903
Joined: Mon Jan 12, 2004 9:32 pm

Post by Illusionist »

Yeah, that's why I said you wouldn't want to do that... So what's wrong with loading it into a textarea and letting the user edit it themselves?
TheBentinel.com
Forum Contributor
Posts: 282
Joined: Wed Mar 10, 2004 1:52 pm
Location: Columbus, Ohio

Post by TheBentinel.com »

Illusionist wrote: Yeah, that's why I said you wouldn't want to do that... So what's wrong with loading it into a textarea and letting the user edit it themselves?
It's just supposed to be for fun. Some idiot user that doesn't know HTML from HGB just wants to take a CNN article and make it sound funny. The CDC recommends getting a vodka shot, instead of a flu shot, for instance. So it has to be automatic or they won't fool with it.

And you're absolutely right, I don't want to do it! :-) But I think I'm going to have to.