Escape Welsh Characters



peredur

Escape Welsh Characters

Post by peredur »

Hi,

I have a similar question to Tolga's (viewtopic.php?f=34&t=122869&sid=f4c1b72 ... e989cfa28e), but it's probably sufficiently different for a topic on its own. If that's not the case I'd be more than happy to merge the threads.

The application that contains the problem is one that I've been contracted to completely redevelop. It stores data in text form in both English and Welsh. Here is an example of some of the Welsh data: http://www.wales-legislation.org.uk/en/acts/1084. You will see that it contains, for example, a 'w' character with a circumflex. You will also see from the state of the page why it needs redeveloping! But that's a different matter. At the moment, the site does not escape the data it gets from the database.

The data is entered by a Site Editor, using TinyMCE in an administration application. It is then stored in a MySQL database. To present the data on the public pages (and, indeed, to represent it in the administration pages when it is retrieved for editing), I wanted to use htmlentities(), but doing so means that each 'w' with a circumflex, for example, appears like this after escaping: ŵ. It should appear like this: ŵ. The same problem arises with the other uniquely Welsh diacritics. As far as I can tell, the problem does not arise with the more common accented characters like â.

On examining the database, I find that the ŵ character is represented there as ŵ, but that it renders correctly in display pages when it is not escaped in my PHP code. So, temporarily, I've had to remove the calls I added to htmlentities(). I'm not happy about that, since it clearly leaves the site open to XSS attacks.
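If the database holds ŵ as an HTML character reference such as &#373; rather than as the raw character (an assumption, not confirmed by the post), then any escaping function will re-encode the leading & and the browser will show the reference as literal text. A minimal sketch of that effect, and of the double_encode parameter of htmlspecialchars() (available since PHP 5.2.3), which leaves existing references alone. The sample string is made up:

```php
<?php
// ASSUMPTION: ŵ (U+0175) is stored as the numeric reference "&#373;",
// not as the character itself. The sample text is hypothetical.
$stored = 'Cymru &#373; <b>bold</b>';

// Default behaviour (double_encode = true): the & of "&#373;" is escaped too,
// so the browser shows the literal text "&#373;" instead of ŵ.
$doubled = htmlspecialchars($stored, ENT_QUOTES, 'UTF-8');
echo $doubled, "\n"; // Cymru &amp;#373; &lt;b&gt;bold&lt;/b&gt;

// double_encode = false leaves existing character references intact while
// still escaping <, > and quotes, so ŵ displays and the markup stays inert.
$single = htmlspecialchars($stored, ENT_QUOTES, 'UTF-8', false);
echo $single, "\n"; // Cymru &#373; &lt;b&gt;bold&lt;/b&gt;
```

Whether this applies depends on what is actually in the table; if ŵ is stored as the raw character, the default double_encode behaviour is fine.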

I note that PHP has some validate filters (http://php.net/manual/en/filter.filters.validate.php), but I can't work out from that page which filter is the one I want or how to use it. I just want to make that text safe for display as HTML in the way that htmlentities() does, but to allow Welsh diacritics like ŵ and ŷ along with the other more familiar ones you'll have met in French, Spanish and other European languages.
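For the filter question: the filters on the linked page are validate filters, so filter_var() with one of them either returns the value or false, and none of them escapes anything. The escaping filters belong to the separate sanitize family. A minimal sketch with FILTER_SANITIZE_SPECIAL_CHARS, assuming UTF-8 input; the sample string is hypothetical:

```php
<?php
// Hypothetical editor input, UTF-8 encoded: "\xC5\xB5" is ŵ and "\xC5\xB7" is ŷ.
$input = "Cymru \xC5\xB5 \xC5\xB7 <script>alert(1)</script>";

// FILTER_SANITIZE_SPECIAL_CHARS HTML-encodes ' " < > & and bytes below 32.
// Bytes of 128 and above (such as the UTF-8 bytes of ŵ and ŷ) pass through
// unchanged unless FILTER_FLAG_ENCODE_HIGH is also given.
$safe = filter_var($input, FILTER_SANITIZE_SPECIAL_CHARS);
echo $safe, "\n";
```

It behaves much like htmlspecialchars() with ENT_QUOTES, except that it emits numeric references (for example &#60; for <) rather than named ones.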

Thanks in advance


Peredur ab Efrog
jrgp

Re: Escape Welsh Characters

Post by jrgp »

Use htmlspecialchars(), not htmlentities(). The former converts only the characters that have special meaning in HTML (&, <, > and quotes) and leaves accented and other non-ASCII characters such as ŵ intact. Win/win.
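A minimal sketch of that advice wrapped in a small helper. The name escape_html() is hypothetical, and the code assumes both the stored text and the pages are UTF-8; if the site is still served as ISO-8859-1, pass that charset instead. Naming the charset explicitly matters on PHP before 5.4, where htmlspecialchars() otherwise assumes ISO-8859-1.

```php
<?php
// Hypothetical helper; ASSUMES UTF-8 data and UTF-8 pages.
function escape_html($text)
{
    // ENT_QUOTES escapes both " and ', so the result is also safe inside
    // quoted attribute values. Non-ASCII characters such as ŵ and ŷ are untouched.
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

$title = "Deddf \xC5\xB5 & \xC5\xB7 <b>";
echo escape_html($title), "\n"; // Deddf ŵ &amp; ŷ &lt;b&gt;
```

Two caveats: htmlspecialchars() returns an empty string for input that is not valid in the given charset unless ENT_SUBSTITUTE or ENT_IGNORE is added to the flags; and text saved from TinyMCE is itself HTML, so escaping it will display the markup literally rather than render it.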
peredur

Re: Escape Welsh Characters

Post by peredur »

Sheesh! Sometimes I can be a real thickhead.

Thank you for your patient response. I'm off work now until Monday, but I'll be sure to sort it first thing on Monday morning (bosses permitting).

Cheers


Peter