I've got about 3000 text files encoded in ANSI (Windows-1252) that I'd like to convert to UTF-8. I used UTFCast to do that, but now all my special characters are turning up wonky: smart quotes, em dashes, accented letters, and so on appear as either question marks or weird characters.
Is there any way for me to preserve those special characters while doing the conversion? There are way too many files for me to do this with a manual search and replace, especially since I won't know if I've fixed every instance of this unless I go through each file to check. Even if I stick to just the foreign-language files, there are probably at least a hundred.
Any help would be greatly appreciated, thanks!
ANSI to UTF-8 - how to preserve special characters?
- Christopher
- Site Administrator
- Posts: 13596
- Joined: Wed Aug 25, 2004 7:54 pm
- Location: New York, NY, US
Re: ANSI to UTF-8 - how to preserve special characters?
From the Unix command line, or from any language with access to the standard C library, you can try iconv.
Command line: http://www.gnu.org/savannah-checkouts/g ... onv.1.html
PHP docs: http://www.php.net/manual/en/book.iconv.php
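For a batch of this size, the same re-encoding can also be scripted. Below is a minimal sketch in Python rather than iconv or PHP. It assumes the original, unconverted Windows-1252 files are still available and that they end in .txt. The function names convert_file and convert_tree and the directory names are made up for illustration.

```python
from __future__ import annotations

from pathlib import Path


def convert_file(src: Path, dst: Path) -> None:
    """Re-encode one file from Windows-1252 to UTF-8."""
    # Read raw bytes so no implicit platform decoding takes place.
    raw = src.read_bytes()
    # "cp1252" is Python's codec name for Windows-1252. Decoding raises
    # UnicodeDecodeError on the few byte values that code page leaves undefined.
    text = raw.decode("cp1252")
    dst.parent.mkdir(parents=True, exist_ok=True)
    dst.write_bytes(text.encode("utf-8"))


def convert_tree(src_dir: Path, dst_dir: Path, pattern: str = "*.txt") -> list[Path]:
    """Convert every matching file under src_dir into dst_dir.

    Returns the source files that could not be decoded, so they can be
    checked by hand instead of being silently mangled.
    Assumption: dst_dir is not located inside src_dir.
    """
    failed = []
    for src in sorted(src_dir.rglob(pattern)):
        dst = dst_dir / src.relative_to(src_dir)
        try:
            convert_file(src, dst)
        except UnicodeDecodeError:
            failed.append(src)
    return failed
```

Calling convert_tree(Path("ansi_files"), Path("utf8_files")) writes the converted copies to a separate folder and leaves the originals untouched. A file that is already UTF-8 would get double-encoded by this, so it is worth running only on files known to be Windows-1252.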