Escape Welsh Characters
Posted: Mon Nov 01, 2010 6:42 am
Hi,
I have a similar question to Tolga's (viewtopic.php?f=34&t=122869&sid=f4c1b72 ... e989cfa28e), but it's probably sufficiently different for a topic on its own. If that's not the case I'd be more than happy to merge the threads.
The application that contains the problem is one that I've been contracted to completely redevelop. It stores data in text form in both English and Welsh. Here is an example of some of the Welsh data: http://www.wales-legislation.org.uk/en/acts/1084. You will see that it contains, for example, a 'w' character with a circumflex. You will also see from the state of the page why it needs redeveloping! But that's a different matter. At the moment, the site does not escape the data it gets from the database.
The data is entered by a Site Editor, using TinyMCE in an administration application. It is then stored in a MySQL database. To present the data on the public pages (and, indeed, to represent it in the administration pages when it is retrieved for editing), I wanted to use htmlentities(), but doing so means that each 'w' with a circumflex, for example, appears like this after escaping: ŵ. It should appear like this: ŵ. The same problem arises with the other uniquely Welsh diacritics. As far as I can tell, the problem does not arise with the more common accented characters like â.
On examining the database, I find that the ŵ character is represented there as ŵ, but that it renders correctly in display pages when it is not escaped in my PHP code. So, temporarily, I've had to remove the calls I added to htmlentities(). I'm not happy about that, since it clearly leaves the site open to XSS attacks.
I note that PHP has some validate filters (http://php.net/manual/en/filter.filters.validate.php), but I can't work out from that page which filter is the one I want or how to use it. I just want to make that text safe for display as HTML in the way that htmlentities() does, but to allow Welsh diacritics like ŵ and ŷ along with the other more familiar ones you'll have met in French, Spanish and other European languages.
Thanks in advance
Peredur ab Efrog
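P.S. To make the question more concrete, here is a minimal sketch of the kind of helper I'm after. It is only a guess, not something I have working on the site. It assumes that the text is UTF-8 and that the database may already hold numeric entities, and I haven't confirmed either. The function name escape_html() is made up for the example. htmlspecialchars() and its double_encode argument are standard PHP.

```php
<?php
// Hypothetical helper, not part of the existing site: escape text for HTML
// output while leaving UTF-8 letters such as ŵ and ŷ untouched.
// This file must itself be saved as UTF-8 for the literals below to work.
function escape_html(string $text): string
{
    // ENT_QUOTES escapes both single and double quotes.
    // 'UTF-8' is an assumption about how the stored text is encoded.
    // The final false turns double_encode off, so an entity that is
    // already in the text (for example &#373;) is not escaped again.
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8', false);
}

// Markup characters are escaped, the Welsh letters pass through unchanged.
echo escape_html('<b>Tŷ dŵr</b>'), "\n";

// An existing numeric entity is left alone rather than becoming &amp;#373;.
echo escape_html('d&#373;r'), "\n";
```

If I understand the manual correctly, htmlspecialchars() only touches &, <, >, and the two quote characters, which may be all that is needed for XSS protection, but I'd welcome correction on that.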