
Converting extended characters to html numeric entities

Posted: Tue Aug 21, 2007 3:48 am
by h4ppy
Summary of the problem:
I'm trying to find a way to convert extended characters (such as Russian or Japanese characters submitted via a search form that uses GET) into their corresponding HTML numeric entities.

Example:
If you enter into the search box
Россия

The get string sent is
?search=%D0%A0%D0%BE%D1%81%D1%81%D0%B8%D1%8F

Within PHP it reports the string (after doing stripslashes, but that shouldn't make a difference) as:
Россия

I would like to convert that string to the respective HTML numeric entities, namely:
& #1056;& #1086;& #1089;& #1089;& #1080;& #1103;
(but without the spaces between the & and # - if I don't put the spaces in, this BB converts them to the actual letters)

...but I cannot figure out how.

Any help would be gratefully received!

Thanks,

C

PS. This BB seems to do exactly what I need when I post the message (hence the spaces in the entities above)!

Posted: Tue Aug 21, 2007 5:35 am
by volka
You might be interested in http://de2.php.net/mbstring
h4ppy wrote:
> The get string sent is
> ?search=%D0%A0%D0%BE%D1%81%D1%81%D0%B8%D1%8F
That's the URL-encoded version of the UTF-8 representation of your string.
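You can confirm that the percent-escapes are just the UTF-8 bytes by decoding them by hand. This is a minimal sketch, assuming PHP 7.0 or later (for the \u{...} string escape). PHP already does this decoding before it fills $_GET, so in a real script you would simply read $_GET['search'].

```php
<?php
// The query-string value from the question, still percent-encoded.
$raw = '%D0%A0%D0%BE%D1%81%D1%81%D0%B8%D1%8F';

// urldecode() turns each %XX escape back into the single byte 0xXX.
$s = urldecode($raw);

echo strlen($s), "\n";   // 12: each of the six Cyrillic letters takes 2 bytes in UTF-8
echo bin2hex($s), "\n";  // d0a0d0bed181d181d0b8d18f

// The same string written with code-point escapes (U+0420 U+043E U+0441 U+0441 U+0438 U+044F).
var_dump($s === "\u{0420}\u{043E}\u{0441}\u{0441}\u{0438}\u{044F}"); // bool(true)
```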

After you have enabled the mbstring extension, try


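<?php ini_set('default_charset', 'UTF-8'); ?>
<html>
	<body>
<?php
// The twelve UTF-8 bytes of "Россия", i.e. what $_GET['search'] holds after PHP URL-decodes the query string
$s = join('', array_map('chr', array(0xD0, 0xA0, 0xD0, 0xBE, 0xD1, 0x81, 0xD1, 0x81, 0xD0, 0xB8, 0xD1, 0x8F)));
echo 'utf-8: ', $s, "<br />\n";
// The 'HTML-ENTITIES' target turns non-ASCII characters into HTML entities (deprecated as of PHP 8.2)
echo 'htmlentities: ', mb_convert_encoding($s, 'HTML-ENTITIES', 'UTF-8');
?>
	</body>
</html>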
and take a look at your browser's source view.
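To see what the conversion does for these characters, here is a sketch that builds the entities one character at a time: each &#N; is simply the character's Unicode code point written in decimal. to_numeric_entities() is a hypothetical helper written for illustration only; it assumes PHP 7.2+ with mbstring (for mb_ord()) and leaves ASCII untouched, so it does not escape &, < or > the way htmlspecialchars() would.

```php
<?php
// Hypothetical helper: replace every non-ASCII character of a UTF-8 string
// with a decimal numeric character reference (&#N;). ASCII passes through unchanged.
function to_numeric_entities(string $s): string
{
    $out = '';
    // The /u modifier makes PCRE split on UTF-8 characters rather than bytes.
    foreach (preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY) as $ch) {
        $cp = mb_ord($ch, 'UTF-8');           // Unicode code point of this character
        $out .= $cp < 0x80 ? $ch : '&#' . $cp . ';';
    }
    return $out;
}

echo to_numeric_entities("\u{0420}\u{043E}\u{0441}\u{0441}\u{0438}\u{044F}");
// &#1056;&#1086;&#1089;&#1089;&#1080;&#1103;
```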

Posted: Tue Aug 21, 2007 7:55 am
by h4ppy
That works perfectly - thank you.

I tried using mb_encode_numericentity() and got horribly lost; it seems I should have been barking up a different tree: mb_convert_encoding()!
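For the record, mb_encode_numericentity() can do the same job; the part that is easy to get lost in is its $convmap argument. A minimal sketch, assuming PHP 7.0+ and a UTF-8 input string: the map below encodes every code point from U+0080 upward as a decimal entity and leaves ASCII alone. Unlike the 'HTML-ENTITIES' target of mb_convert_encoding(), this route is not deprecated in PHP 8.2.

```php
<?php
// "Россия" as UTF-8, written with code-point escapes.
$s = "\u{0420}\u{043E}\u{0441}\u{0441}\u{0438}\u{044F}";

// $convmap holds groups of four values: [start, end, offset, mask].
// Each code point c with start <= c <= end is written as &#N; where N = (c + offset) & mask.
// Offset 0 and mask 0x1FFFFF keep N equal to c; code points below 0x80 (ASCII) are copied unchanged.
$convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];

echo mb_encode_numericentity($s, $convmap, 'UTF-8');
// &#1056;&#1086;&#1089;&#1089;&#1080;&#1103;
```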

Thanks again for taking the time to help me out on this.

C