Check if string contains HTML entities

cjkeane
Forum Contributor
Posts: 217
Joined: Fri Jun 11, 2010 1:17 pm

Post by cjkeane

Hi everyone.
I've started running into an issue that I'm not sure how to resolve.
I have a database that is set up entirely in UTF-8.
Until now, I ran htmlspecialchars on all content when inserting it into the DB, so I always had to use html_entity_decode to display it properly.
I recently changed my script and no longer convert any text to HTML entities when it's submitted to the DB. I leave it up to UTF-8 to display correctly, which it does for all new records.
My issue is that previously entered data may contain HTML entities that I need to decode to display properly, but if I run html_entity_decode on everything, it mangles the display of some newly entered data.

My idea is something like this: if HTML entities are detected in the string, run html_entity_decode on it; otherwise, do nothing.
I'm just not sure how to code that. Any help would be appreciated. Thanks.
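A minimal sketch of the "detect, then decode" idea, assuming that "contains HTML entities" can be approximated by a regular expression matching named (&amp;), decimal (&#39;) and hexadecimal (&#x27;) character references. The function names contains_html_entities and decode_if_encoded are hypothetical, not part of PHP. Treat it as a heuristic only: a new record that legitimately contains an entity would be decoded as well.

```php
<?php
// Hypothetical helper: true if $text contains something shaped like an HTML
// character reference: named (&amp;), decimal (&#39;) or hexadecimal (&#x27;).
function contains_html_entities(string $text): bool
{
    return preg_match('/&(?:[a-zA-Z][a-zA-Z0-9]*|#[0-9]+|#[xX][0-9a-fA-F]+);/', $text) === 1;
}

// Hypothetical wrapper: decode only when an entity-like sequence is present,
// otherwise return the (already raw UTF-8) text untouched.
function decode_if_encoded(string $text): string
{
    if (!contains_html_entities($text)) {
        return $text;
    }
    // Pass the charset explicitly so decoded entities come out as UTF-8.
    return html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

echo decode_if_encoded('&lt;strong&gt;testing&lt;/strong&gt;'), "\n"; // <strong>testing</strong>
echo decode_if_encoded('<strong>测试</strong>'), "\n";                // <strong>测试</strong>
```

A bare "&" followed by a space (as in "Tom & Jerry") does not match the pattern, so ordinary ampersands in new records are left alone.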
requinix
Spammer :|
Posts: 6617
Joined: Wed Oct 15, 2008 2:35 am
Location: WA, USA

Post by requinix

How do you tell the difference between HTML entities that are supposed to be there and those that are not?
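The point behind this question can be made concrete with a small sketch. The "new record" below is hypothetical: a post whose author typed entities on purpose so the page shows a tag as text. It ends up byte-for-byte identical to a legacy record that went through htmlspecialchars.

```php
<?php
// Legacy record: the old insert path ran htmlspecialchars() on the user's input.
$legacy = htmlspecialchars('<b>hi</b>', ENT_QUOTES, 'UTF-8');

// Hypothetical new record: stored raw, but the author deliberately typed the
// entities so the page displays "<b>hi</b>" as text instead of rendering bold.
$newRaw = '&lt;b&gt;hi&lt;/b&gt;';

// Both rows hold exactly the same bytes, so no check on the string alone can
// tell which one is supposed to be decoded.
var_dump($legacy === $newRaw); // bool(true)
```

Any content-based check is therefore a guess; a reliable fix needs information from outside the string, such as when or under which script version the row was written.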
cjkeane
Forum Contributor
Posts: 217
Joined: Fri Jun 11, 2010 1:17 pm

Post by cjkeane

I can only identify the entities by looking at either the HTML source or the actual DB to see why they don't display correctly on the website.
For example:
1. A previous entry was stored in the DB as &lt;strong&gt;testing&lt;/strong&gt;
When viewed on the website, it displays literally as <strong>testing</strong>
2. A new entry is saved to the DB as <strong>testing</strong> and displays on the website as the word "testing" in bold.
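A short sketch of why the two example records render differently, assuming the old insert path called htmlspecialchars with default-style flags. The variable names are illustrative only.

```php
<?php
$input = '<strong>testing</strong>';

// Old insert path: special characters become entities before saving,
// so the browser later shows the tags as literal text.
$storedOld = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');
echo $storedOld, "\n"; // &lt;strong&gt;testing&lt;/strong&gt;

// New insert path: the markup is saved unchanged and the browser renders it.
$storedNew = $input;
echo $storedNew, "\n"; // <strong>testing</strong>

// Decoding the old value once restores the original markup.
echo htmlspecialchars_decode($storedOld, ENT_QUOTES), "\n"; // <strong>testing</strong>
```

htmlspecialchars_decode is the direct inverse of htmlspecialchars; html_entity_decode, which the thread uses, additionally decodes every named entity (for example &eacute;).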
Celauran
Moderator
Posts: 6427
Joined: Tue Nov 09, 2010 2:39 pm
Location: Montreal, Canada

Post by Celauran

I guess the obvious takeaways here are, first:
cjkeane wrote: "all data up until now, I have run htmlspecialchars on content when it inserted into the db"
Don't do that. And second:
cjkeane wrote: "i changed my script recently and i no longer convert any text to html entities when its submitted to the db."
Don't do that either. Worry about escaping your HTML when it comes out, not when it goes in, but be consistent.
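A minimal sketch of "escape when it comes out": the raw value stays in the DB and is escaped only at the moment it is printed. The helper name e() is hypothetical. This assumes the field holds plain text; a field that is meant to render stored markup (like the bold example in this thread) would need a different approach, such as an HTML sanitizer.

```php
<?php
// Hypothetical output helper: escape a raw UTF-8 value for an HTML context.
function e(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

// Value as stored: raw text, no entities.
$fromDb = 'Tom & Jerry <b>中文</b>';

// Escaped only when written into the page; the Chinese characters pass through.
echo e($fromDb), "\n"; // Tom &amp; Jerry &lt;b&gt;中文&lt;/b&gt;
```

Because every value is escaped exactly once, at output, the stored data never has to be guessed at later.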

That said, how bad is the damage? Can you fix it to be consistent either one way or the other, or is there just too much?
cjkeane
Forum Contributor
Posts: 217
Joined: Fri Jun 11, 2010 1:17 pm

Post by cjkeane

I changed my script just yesterday to use only mysql_real_escape_string to get the data in.
Up until then, I was using a function that applied htmlspecialchars. I then found I had to decode the data when viewing it on the site, which worked for the most part.
Occasionally there was an issue decoding it, but yesterday I got an inquiry about whether Chinese characters could be saved into the DB. When I tested it (and because I had enabled html_entity_decode), the Chinese characters were mangled.
That's why I asked whether it was possible to detect if htmlspecialchars had been applied to a string: if it had, decode it; otherwise, do nothing.
There are close to 300,000 records, but as far as I can tell only about 2,500 have had htmlspecialchars applied.

What's the best way to accommodate both issues?
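The thread doesn't establish why the Chinese text was mangled, but one thing worth ruling out is the charset argument. Before PHP 5.4, html_entity_decode defaulted to ISO-8859-1, so passing 'UTF-8' explicitly avoids depending on the PHP version's default. A small sketch, with made-up sample strings:

```php
<?php
// Legacy-style value: entities mixed with raw UTF-8 text.
$stored = '&lt;p&gt;中文 &amp; caf&eacute;&lt;/p&gt;';

// Decode with an explicit UTF-8 charset; raw multibyte characters pass through.
echo html_entity_decode($stored, ENT_QUOTES | ENT_HTML5, 'UTF-8'), "\n"; // <p>中文 & café</p>

// Numeric references are decoded to UTF-8 characters as well.
echo html_entity_decode('&#20013;&#25991;', ENT_QUOTES | ENT_HTML5, 'UTF-8'), "\n"; // 中文

// Text without any entities comes back unchanged.
echo html_entity_decode('中文', ENT_QUOTES | ENT_HTML5, 'UTF-8'), "\n"; // 中文
```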
Celauran
Moderator
Posts: 6427
Joined: Tue Nov 09, 2010 2:39 pm
Location: Montreal, Canada

Post by Celauran

So 298,000 new records since yesterday? Wow. The good news is there are only 2,500 or so that need fixing. Are you using auto-incrementing primary keys? Can you easily identify the cutoff point and update only those records?
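A sketch of the "cutoff point" approach, assuming an auto-incrementing id and that every row at or below some known id was written by the old htmlspecialchars-based script. The function name legacy_fixes, the table posts and the columns id and body are all hypothetical. The function only computes the fixes; writing them back is shown as a commented usage sketch.

```php
<?php
/**
 * Hypothetical one-off migration helper.
 * $rows: list of ['id' => int|string, 'body' => string] fetched from the table.
 * Returns [id => decoded body] for rows at or below $cutoffId whose text
 * actually changes when decoded.
 */
function legacy_fixes(array $rows, int $cutoffId): array
{
    $fixes = [];
    foreach ($rows as $row) {
        if ((int) $row['id'] > $cutoffId) {
            continue; // written after the script change: stored raw, leave alone
        }
        $decoded = html_entity_decode($row['body'], ENT_QUOTES | ENT_HTML5, 'UTF-8');
        if ($decoded !== $row['body']) {
            $fixes[(int) $row['id']] = $decoded;
        }
    }
    return $fixes;
}

// Usage sketch (hypothetical table, columns and cutoff id):
// $rows = $pdo->query('SELECT id, body FROM posts WHERE id <= 123456')->fetchAll(PDO::FETCH_ASSOC);
// $stmt = $pdo->prepare('UPDATE posts SET body = ? WHERE id = ?');
// foreach (legacy_fixes($rows, 123456) as $id => $body) {
//     $stmt->execute([$body, $id]);
// }
```

Running the update once, and keeping a backup of the affected rows, avoids decoding any record twice.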
cjkeane
Forum Contributor
Posts: 217
Joined: Fri Jun 11, 2010 1:17 pm

Post by cjkeane

No, no. 300,000 records over the last two years; that's approximately the number of records in the entire DB.
I just did a quick search, and of the 2,500 records that had some variant of htmlspecialchars applied, some involved accented characters and others involved HTML formatting. That being said, after a closer search of the DB I'm down to 14 records that I actually need to fix, so it's not as bad as I thought.