converting Excel sheets to UTF CSV
Posted: Tue Aug 18, 2009 4:06 pm
by Eran
This is not completely related to PHP, though I'm hoping for a PHP-based solution. Does anyone know how to convert an Excel file that has international characters into a UTF-8 CSV file? Unfortunately, the default encoding in Excel is not UTF-8 (I believe it's ISO-8859), and on export to CSV the special characters are lost.
Perhaps Mark Baker can shed some light on this?
Re: converting Excel sheets to UTF CSV
Posted: Tue Aug 18, 2009 4:37 pm
by dejvos
Hello,
I'm Czech, so I have faced this problem many times. Excel uses Microsoft's Windows codepages, so for Czech it's Windows-1250. I've solved the problem with iconv().
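A minimal sketch of the iconv() approach, assuming the CSV was saved by Excel in Windows-1250 (CP1250) and that you know this in advance, since a plain CSV file does not record its own encoding. The function name convertCsvToUtf8() and the file names are hypothetical.

```php
<?php
// Hypothetical helper: re-encode a CSV file that Excel saved in a Windows
// codepage (CP1250 by default, as used for Czech) into UTF-8 with iconv().
function convertCsvToUtf8(string $inFile, string $outFile, string $fromCharset = 'CP1250'): void
{
    $raw = file_get_contents($inFile);
    if ($raw === false) {
        throw new RuntimeException("Cannot read $inFile");
    }
    // iconv() returns false if the input contains bytes that are invalid in $fromCharset
    $utf8 = iconv($fromCharset, 'UTF-8', $raw);
    if ($utf8 === false) {
        throw new RuntimeException("Conversion from $fromCharset failed");
    }
    if (file_put_contents($outFile, $utf8) === false) {
        throw new RuntimeException("Cannot write $outFile");
    }
}
```

Usage would be something like `convertCsvToUtf8('export.csv', 'export-utf8.csv');`, after which the output can be read with fgetcsv() as usual. For other languages, pass the matching codepage (for example `'CP1251'` for Cyrillic).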
Re: converting Excel sheets to UTF CSV
Posted: Tue Aug 18, 2009 4:42 pm
by Eran
But how do you export the data without losing the special characters? I don't want to simply copy and paste, since that would lose all the table formatting.
Re: converting Excel sheets to UTF CSV
Posted: Tue Aug 18, 2009 4:51 pm
by dejvos
I don't understand; CSV can't hold formatting anyway.
First I read the data from Excel using the Spreadsheet_Excel_Reader class (I think it's a PEAR library).
Re: converting Excel sheets to UTF CSV
Posted: Tue Aug 18, 2009 4:54 pm
by Eran
Thanks, I'll look into that class.
CSV retains the table structure (the columns), which is important because these are huge files and I need that separation.
Re: converting Excel sheets to UTF CSV
Posted: Tue Aug 18, 2009 5:02 pm
by dejvos
Well, yes. I thought you wanted to keep the wide borders.
Re: converting Excel sheets to UTF CSV
Posted: Tue Aug 18, 2009 5:13 pm
by Eran
Re: converting Excel sheets to UTF CSV
Posted: Wed Aug 19, 2009 3:39 am
by Mark Baker
pytrin wrote: This is not completely related to PHP, though I'm hoping for a PHP-based solution. Does anyone know how to convert an Excel file that has international characters into a UTF-8 CSV file? Unfortunately, the default encoding in Excel is not UTF-8 (I believe it's ISO-8859), and on export to CSV the special characters are lost.
Perhaps Mark Baker can shed some light on this?
Sorry, I was away from a network connection for most of yesterday.
I'm assuming you're talking about xls rather than xlsx.
When reading an xls, we read the codepage value from the workbook, and convert all content from that codepage to UTF-8.
Possible codepage values are:
367 ASCII (ASCII)
437 OEM US (CP437)
720 OEM Arabic // currently not supported by libiconv
737 OEM Greek (CP737)
775 OEM Baltic (CP775)
850 OEM Latin I (CP850)
852 OEM Latin II Central European (CP852)
855 OEM Cyrillic (CP855)
857 OEM Turkish (CP857)
858 OEM Multilingual Latin I with Euro (CP858)
860 OEM Portuguese (CP860)
861 OEM Icelandic (CP861)
862 OEM Hebrew (CP862)
863 OEM Canadian French (CP863)
864 OEM Arabic (CP864)
865 OEM Nordic (CP865)
866 OEM Cyrillic Russian (CP866)
869 OEM Greek Modern (CP869)
874 ANSI Thai (CP874)
932 ANSI Japanese Shift-JIS (CP932)
936 ANSI Chinese Simplified GBK (CP936)
949 ANSI Korean Wansung (CP949)
950 ANSI Chinese Traditional BIG5 (CP950)
1200 UTF-16 (UTF-16LE)
1250 ANSI Latin II Central European (CP1250)
1251 ANSI Cyrillic (CP1251)
1252 ANSI Latin I (CP1252)
1253 ANSI Greek (CP1253)
1254 ANSI Turkish (CP1254)
1255 ANSI Hebrew (CP1255)
1256 ANSI Arabic (CP1256)
1257 ANSI Baltic (CP1257)
1258 ANSI Vietnamese (CP1258)
1361 ANSI Korean Johab (CP1361)
10000 Apple Roman (MAC)
32768 Apple Roman (MAC)
32769 ANSI Latin I // currently not supported by libiconv
Conversion uses iconv() if it is available, otherwise mb_convert_encoding(), so internally we try to hold everything as UTF-8.
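To make the reader-side logic concrete, here is a rough sketch of "look up the workbook codepage, then convert to UTF-8 with iconv(), falling back to mb_convert_encoding()". It is not PHPExcel's actual code: the constant and both function names are hypothetical, and only a handful of entries from the table above are included.

```php
<?php
// Partial map from .xls codepage numbers to charset names (subset of the table above).
const CODEPAGE_CHARSETS = [
    367  => 'ASCII',
    437  => 'CP437',
    1200 => 'UTF-16LE',
    1250 => 'CP1250',
    1251 => 'CP1251',
    1252 => 'CP1252',
];

// Hypothetical helper: return the charset name for a workbook codepage.
function codepageToCharset(int $codepage): string
{
    if (!array_key_exists($codepage, CODEPAGE_CHARSETS)) {
        throw new InvalidArgumentException("Unsupported codepage $codepage");
    }
    return CODEPAGE_CHARSETS[$codepage];
}

// Hypothetical helper: convert one cell value to UTF-8, preferring iconv()
// and only using mb_convert_encoding() when iconv is not installed
// (mbstring recognises fewer codepages than iconv does).
function cellToUtf8(string $value, int $codepage): string
{
    $charset = codepageToCharset($codepage);
    if (function_exists('iconv')) {
        $converted = iconv($charset, 'UTF-8', $value);
        if ($converted === false) {
            throw new RuntimeException("iconv() could not convert from $charset");
        }
        return $converted;
    }
    if (function_exists('mb_convert_encoding')) {
        return mb_convert_encoding($value, 'UTF-8', $charset);
    }
    throw new RuntimeException('Neither iconv nor mbstring is available');
}
```

Once every cell has passed through a step like this, the rest of the pipeline can assume UTF-8 throughout.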
In the PHPExcel CSV writer, we have a setUseBOM() method (it defaults to false). If it's set to true, the writer outputs a UTF-8 BOM (byte order mark) before writing the file data... but we do assume that the actual data is already UTF-8 at the point when the writer's save() method is called. Data is written "as is", on the assumption that any reader conversions to UTF-8 have already been performed, or that the developer has set the data as UTF-8.
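As a rough illustration of what a "use BOM" option amounts to, this standalone sketch (not PHPExcel's implementation) optionally writes the three UTF-8 BOM bytes EF BB BF and then writes the rows unchanged with fputcsv(). As with the writer described above, the rows are assumed to be UTF-8 already. writeUtf8Csv() is a hypothetical name.

```php
<?php
// Hypothetical helper: write rows (assumed to already be UTF-8) to a CSV file,
// optionally preceded by the UTF-8 byte order mark.
function writeUtf8Csv(string $path, array $rows, bool $useBOM = false): void
{
    $fh = fopen($path, 'wb');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path for writing");
    }
    if ($useBOM) {
        fwrite($fh, "\xEF\xBB\xBF"); // UTF-8 BOM; helps some programs detect the encoding
    }
    foreach ($rows as $row) {
        // No re-encoding happens here: bytes are written as given
        fputcsv($fh, $row, ',', '"', '\\');
    }
    fclose($fh);
}
```

The BOM is mainly useful when the CSV will be opened in desktop Excel, which otherwise tends to guess a Windows codepage; other consumers may treat the BOM as part of the first field, which is presumably why the option is off by default.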
Re: converting Excel sheets to UTF CSV
Posted: Wed Aug 19, 2009 5:01 am
by Eran
Thanks for your comments, Mark. I tried out PHPExcel, and after some minor issues I got it working; it was exactly what I needed. Good job!
Some minor comments:
- Your documentation in the Word files is pretty good, but is there any chance of an (online) HTML version? It's not very comfortable to navigate without hyperlinks.
- The subject of the different Excel readers is not completely clear. I have Excel files from Office 2003, so I first tried the Excel2007 reader. I had no idea what Excel5 is or what it reads. Of course I ran into some issues with it. Maybe you could perform some basic file type detection? Or explain in the documentation which Office version corresponds to which reader/writer?
Again, very nice work!
Re: converting Excel sheets to UTF CSV
Posted: Wed Aug 19, 2009 5:14 am
by Mark Baker
pytrin wrote: Thanks for your comments, Mark. I tried out PHPExcel, and after some minor issues I got it working; it was exactly what I needed. Good job!
Glad to know it was useful.
pytrin wrote: - Your documentation in the Word files is pretty good, but is there any chance of an (online) HTML version? It's not very comfortable to navigate without hyperlinks.
I'll look at what we can do. As always, documentation is the bugbear of all coders, and while we try to keep it useful (I'm currently rewriting the Function reference), it's a lot of manual work when we'd far rather be coding... but I'll see what we can do about other formats such as an HTML version.
There are the API docs, but they're not really the same type of documentation.
pytrin wrote: - The subject of the different Excel readers is not completely clear. I have Excel files from Office 2003, so I first tried the Excel2007 reader. I had no idea what Excel5 is or what it reads. Of course I ran into some issues with it. Maybe you could perform some basic file type detection? Or explain in the documentation which Office version corresponds to which reader/writer?
Quick tip: use
$objPHPExcel = PHPExcel_IOFactory::load("05featuredemo.xlsx");
which does perform basic file type detection (albeit based only on the file extension).
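Since detection by extension can be fooled by a renamed file, here is a rough sketch of checking the file contents instead. It relies on two well-known signatures: .xls files (the format Office 2003 saves, handled by the Excel5 reader) are OLE2 compound documents starting with the bytes D0 CF 11 E0 A1 B1 1A E1, while .xlsx files (Excel2007 reader) are ZIP archives starting with "PK\x03\x04". Other OLE2 or ZIP files, such as .doc or .docx, will match too, so this is only a first check. guessReaderType() is a hypothetical helper, not part of PHPExcel.

```php
<?php
// Hypothetical helper: guess the PHPExcel reader type from a file's first bytes.
// Returns 'Excel5', 'Excel2007', or null if neither signature matches.
function guessReaderType(string $path): ?string
{
    $fh = fopen($path, 'rb');
    if ($fh === false) {
        return null;
    }
    $header = fread($fh, 8);
    fclose($fh);
    if ($header === false) {
        return null;
    }
    // OLE2 compound document signature (BIFF .xls)
    if ($header === "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1") {
        return 'Excel5';
    }
    // ZIP local file header signature (Office Open XML .xlsx)
    if (strncmp($header, "PK\x03\x04", 4) === 0) {
        return 'Excel2007';
    }
    return null;
}
```

The returned name could then be passed to PHPExcel_IOFactory::createReader() before calling load() on the file.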
Re: converting Excel sheets to UTF CSV
Posted: Wed Aug 19, 2009 6:27 am
by Eran
That last tip is a good one. Put it in your documentation!
Or if it's already there, I missed it.