A regex which the whole PHP world would appreciate.

Posted: Wed Mar 26, 2008 2:16 am
by Walid
Does anyone know of a regexp that can be used with preg_replace() to replace every ampersand that is not part of an HTML entity with the &amp; entity?

So, in the following, the & between PHP and ASP would be replaced by the &amp; HTML entity, and the &copy; would be left untouched.

Code:

$string = "PHP & ASP went up a tree. My poem is &copy; 2008.";
$string = preg_replace($regexp, '&amp;', $string);
This regex would probably be valued by each and every single PHP developer. (Especially me, of course)

Re: A regex which the whole PHP world would appreciate.

Posted: Wed Mar 26, 2008 2:51 am
by onion2k
What cases would it need to match? Wouldn't it just be any ampersand followed by whitespace, or by an alphanumeric string that doesn't have a semi-colon at the end? I can't think of anything else it'd need to match...
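
The rule described above can be sketched as a negative lookahead: match an ampersand only when it is not followed by something that looks like the rest of an entity. This is a minimal sketch, not a definitive pattern. The function name escape_bare_ampersands() is hypothetical, and the pattern only checks the shape of an entity (a name or a numeric reference ending in a semicolon), not whether the name is a real one, so something like &foo; would be left alone.

```php
<?php
// Hypothetical helper (name not from the thread): escapes ampersands
// that do not already start an HTML entity.
function escape_bare_ampersands(string $text): string
{
    $pattern = '~
        &                             # a literal ampersand
        (?!                           # not followed by the rest of an entity:
            (?:
                [a-zA-Z][a-zA-Z0-9]*  # a named entity such as copy or amp
              | \#[0-9]+              # a decimal reference such as #169
              | \#[xX][0-9a-fA-F]+    # a hex reference such as #xA9
            )
            ;                         # entities end with a semicolon
        )
    ~x';

    // preg_replace() returns null on error; fall back to the input then.
    return preg_replace($pattern, '&amp;', $text) ?? $text;
}

$string = "PHP & ASP went up a tree. My poem is &copy; 2008.";
echo escape_bare_ampersands($string), "\n";
// PHP &amp; ASP went up a tree. My poem is &copy; 2008.
```

The x modifier lets the pattern carry its own per-part explanation; because # starts a comment in that mode, the literal # of numeric references is written as \#.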

But..

Why can't you use html_entity_decode() followed by htmlentities()? Wouldn't that achieve the same result?

Re: A regex which the whole PHP world would appreciate.

Posted: Thu Mar 27, 2008 12:07 am
by Walid
onion2k wrote:
What cases would it need to match? Wouldn't it just be any ampersand followed by whitespace, or by an alphanumeric string that doesn't have a semi-colon at the end?
I can't think of any either. But, being regex-illiterate, I don't even know how to put that together. If you could do the honours and also add an explanation for each part of the pattern, that would be fantastic.
onion2k wrote:
Why can't you use html_entity_decode() followed by htmlentities()? Wouldn't that achieve the same result?
Actually, running html_entity_decode() alone seems to have done the trick. I am a bit confused right now as to what's going on.

Re: A regex which the whole PHP world would appreciate.

Posted: Thu Mar 27, 2008 12:10 am
by Walid
Walid wrote:
Why can't you use html_entity_decode() followed by htmlentities()? Wouldn't that achieve the same result?
Actually, running html_entity_decode() alone seems to have done the trick. I am a bit confused right now as to what's going on.
My mistake... it has not.

Running the two functions one after the other simply does and then undoes the same conversion. So how would that work?

Re: A regex which the whole PHP world would appreciate.

Posted: Thu Mar 27, 2008 2:57 am
by onion2k
html_entity_decode() turns HTML entities in a string into their character equivalents, so it should turn "PHP & ASP went up a tree. My poem is &copy; 2008." into "PHP & ASP went up a tree. My poem is © 2008.". Running that resulting string through htmlentities() will turn it into "PHP &amp; ASP went up a tree. My poem is &copy; 2008.". Which is what you want, right?

If it doesn't then there's something wrong with your code, or possibly with the character encoding of the initial string.
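
The two-step round trip described above can be sketched as follows. It is not a plain do-and-undo: the decode step leaves the bare & alone because it is not an entity, and the encode step then converts every special character, including that &. The flags and the UTF-8 charset are assumptions, passed explicitly so that the result does not depend on the defaults of a particular PHP version.

```php
<?php
$string = "PHP & ASP went up a tree. My poem is &copy; 2008.";

// Step 1: entities become characters; the bare & is left as it is.
$decoded = html_entity_decode($string, ENT_QUOTES, 'UTF-8');

// Step 2: every character with an entity equivalent is encoded again.
$encoded = htmlentities($decoded, ENT_QUOTES, 'UTF-8');

echo $decoded, "\n"; // PHP & ASP went up a tree. My poem is © 2008.
echo $encoded, "\n"; // PHP &amp; ASP went up a tree. My poem is &copy; 2008.
```

Note that htmlentities() also encodes <, > and quotes, so this approach only suits strings that contain text rather than HTML markup.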