A regex which the whole PHP world would appreciate.

Walid
Forum Commoner
Posts: 33
Joined: Mon Mar 17, 2008 8:43 am

A regex which the whole PHP world would appreciate.

Post by Walid »

Does anyone know of a regexp that can be used with preg_replace() to replace every ampersand that is not part of an HTML entity with the &amp;amp; entity?

So, in the following, the & between PHP and ASP would be replaced by the &amp;amp; HTML entity, and the &amp;copy; would be left untouched.

Code:

$string = "PHP & ASP went up a tree. My poem is &copy; 2008.";
$string = preg_replace($regexp, '&amp;', $string);
This regex would probably be valued by every PHP developer. (Especially me, of course.)
onion2k
Jedi Mod
Posts: 5263
Joined: Tue Dec 21, 2004 5:03 pm
Location: usrlab.com

Re: A regex which the whole PHP world would appreciate.

Post by onion2k »

What cases would it need to match? Wouldn't it just be any ampersand followed by whitespace, or by an alphanumeric string that doesn't have a semi-colon at the end? I can't think of anything else it'd need to match...
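For what it's worth, here is a minimal sketch of that rule as a pattern. It assumes "entity" means a named (&amp;copy;), decimal (&amp;#169;) or hex (&amp;#xA9;) reference ending in a semicolon; the function name escape_bare_ampersands() is made up for illustration, and the string type hints assume PHP 7 or later.

```php
<?php
// Hypothetical helper: escapes every "&" that does not already start an entity.
function escape_bare_ampersands(string $text): string
{
    // &                     a literal ampersand
    // (?! ... )             negative lookahead: match only if what follows is NOT
    //   [A-Za-z][A-Za-z0-9]*  a named entity such as copy or amp
    //   #[0-9]+               or a decimal reference such as #169
    //   #[xX][0-9A-Fa-f]+     or a hex reference such as #xA9
    //   ;                     ... terminated by a semicolon
    $pattern = '/&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)/';

    return preg_replace($pattern, '&amp;', $text);
}

$string = "PHP & ASP went up a tree. My poem is &copy; 2008.";
echo escape_bare_ampersands($string), "\n";
// prints: PHP &amp; ASP went up a tree. My poem is &copy; 2008.
```

Note that the pattern only checks the shape of what follows the ampersand, not whether the name is a real entity, so something like "&foo;" is left alone too.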

But..

Why can't you use html_entity_decode() followed by htmlentities()? Wouldn't that achieve the same result?
Walid
Forum Commoner
Posts: 33
Joined: Mon Mar 17, 2008 8:43 am

Re: A regex which the whole PHP world would appreciate.

Post by Walid »

onion2k wrote:
What cases would it need to match? Wouldn't it just be any ampersand followed by whitespace, or by an alphanumeric string that doesn't have a semi-colon at the end?
I can't think of any either. But, being regex-illiterate, I don't even know how to put that together. If you could do the honours and also add an explanation for each part of the pattern, that would be fantastic.
onion2k wrote:
Why can't you use html_entity_decode() followed by htmlentities()? Wouldn't that achieve the same result?
Actually, running html_entity_decode() alone seems to have done the trick. I am a bit confused right now as to what's going on.
Walid
Forum Commoner
Posts: 33
Joined: Mon Mar 17, 2008 8:43 am

Re: A regex which the whole PHP world would appreciate.

Post by Walid »

Walid wrote:
Why can't you use html_entity_decode() followed by htmlentities()? Wouldn't that achieve the same result?
Actually, running html_entity_decode() alone seems to have done the trick. I am a bit confused right now as to what's going on.
My mistake... it has not.

Running the two functions one after the other simply does and then undoes the conversion. So how would that work?
onion2k
Jedi Mod
Posts: 5263
Joined: Tue Dec 21, 2004 5:03 pm
Location: usrlab.com

Re: A regex which the whole PHP world would appreciate.

Post by onion2k »

html_entity_decode() turns html entities in a string into their character equivalents ... so it should turn "PHP & ASP went up a tree. My poem is &amp;copy; 2008." into "PHP & ASP went up a tree. My poem is © 2008.". Running that resulting string through htmlentities() will turn it into "PHP &amp;amp; ASP went up a tree. My poem is &amp;copy; 2008.". Which is what you want, right?

If it doesn't then there's something wrong with your code, or possibly with the character encoding of the initial string.
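Roughly, the round trip looks like the sketch below. Passing ENT_QUOTES and 'UTF-8' explicitly to both calls is my assumption, made so the two functions agree with each other whatever the PHP version's defaults are; your string may well be in a different encoding.

```php
<?php
$string = "PHP & ASP went up a tree. My poem is &copy; 2008.";

// Step 1: turn existing entities into real characters.
// The bare "&" is not an entity, so it passes through unchanged.
$decoded = html_entity_decode($string, ENT_QUOTES, 'UTF-8');

// Step 2: encode everything that has an entity, including the bare "&".
$encoded = htmlentities($decoded, ENT_QUOTES, 'UTF-8');

echo $encoded, "\n";
// prints: PHP &amp; ASP went up a tree. My poem is &copy; 2008.
```

One caveat: htmlentities() also escapes <, > and quotes, so this only suits plain text. If the string contains HTML tags you want to keep, they will be escaped as well.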