converting Excel sheets to UTF CSV

Eran
DevNet Master
Posts: 3549
Joined: Fri Jan 18, 2008 12:36 am
Location: Israel, ME

Post by Eran »

This is not strictly a PHP question, though I'm hoping for a PHP-based solution. Does anyone know how to convert an Excel file that contains international characters into a UTF-8 CSV file? Unfortunately, the default encoding in Excel is not UTF (I believe it's ISO-8859), and exporting to CSV loses the special characters.

Perhaps Mark Baker can shed some light on this?
dejvos
Forum Contributor
Posts: 122
Joined: Tue Mar 10, 2009 8:40 am

Post by dejvos »

Hello,

I'm Czech, so I have faced this problem many times. Excel uses Microsoft's own encodings, so for Czech it is Windows-1250. I've solved the problem with iconv().
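
As a rough illustration of the iconv() approach (a sketch only, not the code actually used here), the snippet below converts a Windows-1250 string to UTF-8. The sample bytes and variable names are made up for the example, and it assumes the iconv extension is loaded.

```php
<?php
// "čeština" encoded as Windows-1250 bytes (hypothetical sample data,
// e.g. one cell from a CSV that Excel saved on a Czech system).
$cp1250 = "\xE8e\x9Atina";

// iconv(from, to, string) returns the converted string, or false on failure.
$utf8 = iconv('CP1250', 'UTF-8', $cp1250);

if ($utf8 === false) {
    throw new RuntimeException('Conversion from CP1250 failed');
}

echo $utf8, PHP_EOL;
```

The same call works line by line on a whole exported CSV file, as long as you know which codepage Excel used when saving it.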
Eran
DevNet Master
Posts: 3549
Joined: Fri Jan 18, 2008 12:36 am
Location: Israel, ME

Post by Eran »

But how do you export the data without losing the special characters? I don't want to simply copy and paste, since that would lose all the table formatting.
dejvos
Forum Contributor
Posts: 122
Joined: Tue Mar 10, 2009 8:40 am

Post by dejvos »

I don't understand; you can't format CSV.

The first time, I got the data out of Excel using the Spreadsheet_Excel_Reader class (I think it is a PEAR library).
Eran
DevNet Master
Posts: 3549
Joined: Fri Jan 18, 2008 12:36 am
Location: Israel, ME

Post by Eran »

Thanks, I'll look into that class.

CSV retains the table structure (columns), which is important since these are huge files and I need that separation.
dejvos
Forum Contributor
Posts: 122
Joined: Tue Mar 10, 2009 8:40 am

Post by dejvos »

Well,

yes, I thought you wanted to keep the wide borders ;).
Eran
DevNet Master
Posts: 3549
Joined: Fri Jan 18, 2008 12:36 am
Location: Israel, ME

Post by Eran »

:P
Mark Baker
Forum Regular
Posts: 710
Joined: Thu Oct 30, 2008 6:24 pm

Post by Mark Baker »

pytrin wrote:This is not strictly a PHP question, though I'm hoping for a PHP-based solution. Does anyone know how to convert an Excel file that contains international characters into a UTF-8 CSV file? Unfortunately, the default encoding in Excel is not UTF (I believe it's ISO-8859), and exporting to CSV loses the special characters.

Perhaps Mark Baker can shed some light on this?

Sorry, I was away from a network connection for most of yesterday.

I'm assuming you're talking about xls rather than xlsx.

When reading an xls file, we read the codepage value from the workbook and convert all content from that codepage to UTF-8.
Possible codepage values are:

367    ASCII (ASCII)
437    OEM US (CP437)
720    OEM Arabic    // currently not supported by libiconv
737    OEM Greek (CP737)
775    OEM Baltic (CP775)
850    OEM Latin I (CP850)
852    OEM Latin II Central European (CP852)
855    OEM Cyrillic (CP855)
857    OEM Turkish (CP857)
858    OEM Multilingual Latin I with Euro (CP858)
860    OEM Portuguese (CP860)
861    OEM Icelandic (CP861)
862    OEM Hebrew (CP862)
863    OEM Canadian French (CP863)
864    OEM Arabic (CP864)
865    OEM Nordic (CP865)
866    OEM Cyrillic Russian (CP866)
869    OEM Greek Modern (CP869)
874    ANSI Thai (CP874)
932    ANSI Japanese Shift-JIS (CP932)
936    ANSI Chinese Simplified GBK (CP936)
949    ANSI Korean Wansung (CP949)
950    ANSI Chinese Traditional BIG5 (CP950)
1200   UTF-16 (UTF-16LE)
1250   ANSI Latin II Central European (CP1250)
1251   ANSI Cyrillic (CP1251)
1252   ANSI Latin I (CP1252)
1253   ANSI Greek (CP1253)
1254   ANSI Turkish (CP1254)
1255   ANSI Hebrew (CP1255)
1256   ANSI Arabic (CP1256)
1257   ANSI Baltic (CP1257)
1258   ANSI Vietnamese (CP1258)
1361   ANSI Korean Johab (CP1361)
10000  Apple Roman (MAC)
32768  Apple Roman (MAC)
32769  ANSI Latin I   // currently not supported by libiconv
 
Conversion uses iconv() if available, otherwise mb_convert_encoding() if that is available, so internally we try to hold everything as UTF-8.
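
The fallback order described here could look roughly like the sketch below. This is not PHPExcel's actual code: the function name convertToUtf8() and the two-entry codepage map are assumptions made up for illustration, covering only a small subset of the table.

```php
<?php
/**
 * Convert $value from an Excel codepage number to UTF-8.
 * Hypothetical helper: the map is a tiny subset of the codepage table.
 */
function convertToUtf8(string $value, int $codepage): string
{
    $charsets = [
        1251 => 'CP1251', // ANSI Cyrillic
        1252 => 'CP1252', // ANSI Latin I
    ];
    if (!isset($charsets[$codepage])) {
        throw new InvalidArgumentException("Unsupported codepage: $codepage");
    }
    $from = $charsets[$codepage];

    // Prefer iconv() when the extension is loaded.
    if (function_exists('iconv')) {
        $converted = iconv($from, 'UTF-8', $value);
        if ($converted !== false) {
            return $converted;
        }
    }

    // Otherwise fall back to mbstring.
    if (function_exists('mb_convert_encoding')) {
        return mb_convert_encoding($value, 'UTF-8', $from);
    }

    // Neither extension is available: return the input unchanged.
    return $value;
}

// 0xE9 is "é" in CP1252.
echo convertToUtf8("caf\xE9", 1252), PHP_EOL;
```

Note that iconv and mbstring do not accept exactly the same set of charset names, so a real map would need checking against both.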

In the PHPExcel CSV writer, we have a setUseBOM() method (defaults to false). If true, then when writing the CSV data, we write a UTF-8 BOM before the file data... but we do assume that the actual data is UTF-8 at the point when the writer's save() method is called. Data is written "as is", assuming that any reader conversions to UTF-8 have already been performed, or that the data has been set as UTF-8 by the developer.
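
To make the BOM behaviour concrete, here is a plain-PHP sketch of the same idea. It does not use PHPExcel and is not how the writer is implemented internally; writeCsv() and the sample rows are hypothetical. Like the writer, it assumes the rows are already UTF-8.

```php
<?php
/**
 * Write rows of UTF-8 strings as CSV, optionally prefixed with a UTF-8 BOM.
 * Hypothetical helper; the data is written as is, with no conversion.
 */
function writeCsv(string $path, array $rows, bool $useBom = false): void
{
    $handle = fopen($path, 'wb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path for writing");
    }
    if ($useBom) {
        // The UTF-8 byte order mark is the three bytes EF BB BF.
        fwrite($handle, "\xEF\xBB\xBF");
    }
    foreach ($rows as $row) {
        // Delimiter, enclosure and escape character are passed explicitly.
        fputcsv($handle, $row, ',', '"', '\\');
    }
    fclose($handle);
}

// Write a small sample file into the system temp directory.
$file = tempnam(sys_get_temp_dir(), 'csv');
writeCsv($file, [['name', 'city'], ['Žofie', 'Plzeň']], true);
```

The BOM matters mainly when the CSV is opened in Excel again, since Excel uses it as a hint that the file is UTF-8.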
Eran
DevNet Master
Posts: 3549
Joined: Fri Jan 18, 2008 12:36 am
Location: Israel, ME

Post by Eran »

Thanks for your comments, Mark. I tried out PHPExcel, and after some minor issues I got it working and it was exactly what I needed. Good job!

Some minor comments:
- Your documentation in the Word files is pretty good, but any chance of an (online) HTML version? It's not very comfortable to navigate without hyperlinks.
- The subject of the different Excel readers is not completely clear. I have Excel files from Office 2003, so I first tried the Excel2007 reader. I had no idea what Excel5 is or what it reads. Of course I ran into some issues with it. Maybe you could perform some basic file type detection, or explain in the documentation which Office version corresponds to which reader/writer?

Again, very nice work!
Mark Baker
Forum Regular
Posts: 710
Joined: Thu Oct 30, 2008 6:24 pm

Post by Mark Baker »

pytrin wrote:Thanks for your comments, Mark. I tried out PHPExcel, and after some minor issues I got it working and it was exactly what I needed. Good job!
Glad to know it was useful.
pytrin wrote:- Your documentation in the Word files is pretty good, but any chance of an (online) HTML version? It's not very comfortable to navigate without hyperlinks.
I'll look at what we can do. As always, documentation is the bugbear of all coders, and while we try to keep it useful (I'm currently rewriting the Function reference), it's a lot of manual work when we'd far rather be coding :D ... but I'll see what we can do about "other formats" such as an HTML version.
There are the API docs, but that's not really the same type of documentation.
pytrin wrote:- The subject of the different Excel readers is not completely clear. I have Excel files from Office 2003, so I first tried the Excel2007 reader. I had no idea what Excel5 is or what it reads. Of course I ran into some issues with it. Maybe you could perform some basic file type detection, or explain in the documentation which Office version corresponds to which reader/writer?
Quick tip: use

$objPHPExcel = PHPExcel_IOFactory::load("05featuredemo.xlsx");
which does perform basic file type detection (albeit based only on the file extension).
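
Putting the pieces from this thread together, an xls-to-CSV conversion might be sketched like this. It assumes PHPExcel is installed and already included; the wrapper function excelToUtf8Csv() and the file names in the usage comment are hypothetical.

```php
<?php
/**
 * Load a workbook with whichever reader IOFactory selects, then save the
 * data as CSV with a UTF-8 BOM.
 * Hypothetical wrapper; requires the PHPExcel classes to be loaded.
 */
function excelToUtf8Csv(string $inputPath, string $outputPath): void
{
    // load() chooses a reader for the file, as described above.
    $workbook = PHPExcel_IOFactory::load($inputPath);

    // The CSV writer writes the data as is and assumes it is already UTF-8.
    $writer = PHPExcel_IOFactory::createWriter($workbook, 'CSV');
    $writer->setUseBOM(true);
    $writer->save($outputPath);
}

// Example call (needs PHPExcel on the include path):
// require_once 'PHPExcel/IOFactory.php';
// excelToUtf8Csv('report.xls', 'report.csv');
```

Using load() rather than a specific reader class avoids having to know whether a given file needs the Excel5 or Excel2007 reader.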
Eran
DevNet Master
Posts: 3549
Joined: Fri Jan 18, 2008 12:36 am
Location: Israel, ME

Post by Eran »

That last tip is a good one. Put it in your documentation! :) Or, if it's already there, I missed it.