ANSI to UTF-8 - how to preserve special characters?

anivad
Forum Commoner
Posts: 80
Joined: Thu Apr 09, 2009 11:16 pm

ANSI to UTF-8 - how to preserve special characters?

Post by anivad »

I've got about 3000 text files encoded in ANSI (windows-1252) that I'd like to convert to UTF-8. I used UTFCast to do that, but now all my special characters are turning up wonky: smart quotes, em dashes, accented letters, and so on appear either as question marks or as weird characters.

Is there any way for me to preserve those special characters while doing the conversion? There are way too many files for me to do this with a manual search and replace, especially since I won't know if I've fixed every instance of this unless I go through each file to check. Even if I stick to just the foreign-language files, there are probably at least a hundred.

Any help would be greatly appreciated, thanks!
Christopher
Site Administrator
Posts: 13596
Joined: Wed Aug 25, 2004 7:54 pm
Location: New York, NY, US

Re: ANSI to UTF-8 - how to preserve special characters?

Post by Christopher »

From the Unix command line, or from any language with access to the standard C library, you can try iconv.

Command line: http://www.gnu.org/savannah-checkouts/g ... onv.1.html
PHP docs: http://www.php.net/manual/en/book.iconv.php
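If iconv isn't handy, the same kind of re-encoding can be sketched in Python. Note this is a swapped-in technique: it uses Python's built-in codecs rather than iconv itself, and it assumes every file really is windows-1252 (Python calls that codec "cp1252"). The function names, the "*.txt" pattern, and the separate output folder are all hypothetical choices for illustration. Decoding is strict, so a file containing bytes that windows-1252 doesn't define gets reported instead of being silently turned into question marks.

```python
from pathlib import Path
from typing import List


def convert_file(src: Path, dst: Path, source_encoding: str = "cp1252") -> None:
    """Re-encode one text file from source_encoding to UTF-8."""
    raw = src.read_bytes()
    # Strict decoding: raises UnicodeDecodeError on bytes the codec doesn't define.
    text = raw.decode(source_encoding)
    # Write bytes directly so line endings are left exactly as they were.
    dst.write_bytes(text.encode("utf-8"))


def convert_tree(src_dir: Path, dst_dir: Path, pattern: str = "*.txt") -> List[Path]:
    """Convert every matching file under src_dir into dst_dir.

    Returns the list of source files that could not be decoded.
    dst_dir is assumed to be outside src_dir, so originals stay untouched.
    """
    failed = []
    for src in sorted(src_dir.rglob(pattern)):
        dst = dst_dir / src.relative_to(src_dir)
        dst.parent.mkdir(parents=True, exist_ok=True)
        try:
            convert_file(src, dst)
        except UnicodeDecodeError:
            failed.append(src)
    return failed
```

Something like convert_tree(Path("ansi_files"), Path("utf8_files")) would then write the converted copies to a second folder (both folder names are made up here) and return the files that need a closer look, which saves opening each one by hand.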