Converting UTF-8 to ANSI for CSV Export

PHP programming forum. Ask questions or help people concerning PHP code. Don't understand a function? Need help implementing a class? Don't understand a class? Here is where to ask. Remember to do your homework!

Moderator: General Moderators

Post Reply
User avatar
batfastad
Forum Contributor
Posts: 433
Joined: Tue Mar 30, 2004 4:24 am
Location: London, UK

Converting UTF-8 to ANSI for CSV Export

Post by batfastad »

Hi everyone

I'm converting a Filemaker database into an intranet PHP/MySQL system.
With the new php/MySQL it's all set to use utf-8 everywhere, database tables (utf8_general_ci), html output, mysql collations, and mysql connection collations.
So that's all great... Apart from when I want to export a CSV to read in Excel :roll:

When I'm exporting the CSV using PHP, this is what I'm doing:

Code: Select all

header('Content-type: text/csv; charset=ISO-8859-1');
header('Content-Disposition: attachment; filename="MailingData.csv"');
echo iconv('UTF-8', 'ISO-8859-1', $var);
And this almost works. However I'm getting stuck on certain eastern european characters... at the moment the CSV output stalls on a Z with an inverted circumflex hat thing on top.
When I view the CSV with a text editor (Notepad++), the CSV output just stops just before that character.

However when I export the same data from Filemaker, the CSV output works fine and opening up the file in Excel that character displays perfectly as well.
So excel seems to be capable of displaying that character, but I just need to get PHP to output past that point.
When I open both the output CSV files in Notepad++, they both show Dos\Windows and ANSI as the format.

Am I using the correct destination charset in my iconv function?
Or should that be a windows one or CP12... or whatever it is?

I understand that converting from UTF-8 to anything else is usually a lossy process when it comes to these characters.
Is there anything I can do to get PHP outputting past that point?
And any way to minimise the potential loss in char information?

Thanks, B
dml
Forum Contributor
Posts: 133
Joined: Sat Jan 26, 2008 2:20 pm

Re: Converting UTF-8 to ANSI for CSV Export

Post by dml »

Here's a list of the character sets that support Ž.

If you want non-lossy output, and anything other than non-lossy output shouldn't really be an option (an accent on a character isn't a scribal flourish, it encodes a distinction in sound and meaning that might be very important), then you're obviously constrained to picking a character set that supports all the characters in your data, and if you have characters from a varied-enough set of languages, then the only character sets that will work are things like UTF8.

I'm guessing, correct me if I'm wrong, that the reason you're doing the conversion is that Excel is a bit thick about opening utf8 files. But if I understand that problem correctly, that's because Excel assumes all CSV files are in the default encoding of the system, so it's likely to garble ISO-8859-XXX files as well.
User avatar
Kieran Huggins
DevNet Master
Posts: 3635
Joined: Wed Dec 06, 2006 4:14 pm
Location: Toronto, Canada
Contact:

Re: Converting UTF-8 to ANSI for CSV Export

Post by Kieran Huggins »

Maybe take a look at the phpMyAdmin "export to Excel" code - I have a feeling they've probably had enough eyes on the problem to have a solution ready-made for you.
User avatar
batfastad
Forum Contributor
Posts: 433
Joined: Tue Mar 30, 2004 4:24 am
Location: London, UK

Re: Converting UTF-8 to ANSI for CSV Export

Post by batfastad »

I agree, non-lossy is absolutely the way forward. I also agree that Excel is being stupid and annoying over this.
I guess the reason PHP stops outputting at that particular character, is because that character doesn't exist in that charset!
I'll try some of the other charsets mentioned on that page in my iconv script step, and see if that helps.

I guess I could also benefit from finding out exactly what charset the filemaker output is in. Are there any utilities that can give me that info?
Because Excel definitely works ok on those CSV files.

Good call on the phpMyAdmin. I'll have a dig through that. If anyone's been through this pain it will be those guys.

Cheers, B
dml
Forum Contributor
Posts: 133
Joined: Sat Jan 26, 2008 2:20 pm

Re: Converting UTF-8 to ANSI for CSV Export

Post by dml »

Look at a hex dump of the line with the Ž in it and compare the way the Ž is encoded with the list at http://www.fileformat.info/info/unicode ... upport.htm . That'll narrow it down to one or two possible character sets.

Another thing to check for is a byte order mark at the very start of the file.
User avatar
batfastad
Forum Contributor
Posts: 433
Joined: Tue Mar 30, 2004 4:24 am
Location: London, UK

Re: Converting UTF-8 to ANSI for CSV Export

Post by batfastad »

Ok this is sorted!
PHP was stopping at that character because I was trying to force the outputting of a charset which didn't support that character.
By looking at the charset of the Filemaker CSV output in JEdit, the encoding was cp1252 and not ISO. So I thought I'd just make sure I was outputting Windows-1252
I tried using iconv but I encountered some strange problems.
For the time being this seems to work ok:

Code: Select all

function csv_encode_conv($var, $enc='Windows-1252') {
    $var = htmlentities($var, ENT_QUOTES, 'utf-8');
    $var = html_entity_decode($var, ENT_QUOTES , $enc);
    return $var;
}
I'll be keeping an eye on it though

Hope this helps someone out, thanks for the help!
I'll be having a dig through phpmyadmin to try and track down how they've done it

Cheers, B
wireless2007
Forum Newbie
Posts: 1
Joined: Mon Mar 09, 2009 4:50 am

Re: Converting UTF-8 to ANSI for CSV Export

Post by wireless2007 »

Using this function:
$vv = csv_encode_conv($vv,'Windows-1252');

function csv_encode_conv($var, $enc='Windows-1252') {
$var = htmlentities($var, ENT_QUOTES, 'utf-8');
$var = html_entity_decode($var, ENT_QUOTES , $enc);
return $var;
}


This is a great work!
Thanks batfastad , you are a genius! :lol:
User avatar
batfastad
Forum Contributor
Posts: 433
Joined: Tue Mar 30, 2004 4:24 am
Location: London, UK

Re: Converting UTF-8 to ANSI for CSV Export

Post by batfastad »

Wow, glad I was able to help out! :lol:

I modified my function slightly to deal with different input encodings as well

Code: Select all

function encoding_conv($var, $enc_out, $enc_in='utf-8') {
    $var = htmlentities($var, ENT_QUOTES, $enc_in);
    return html_entity_decode($var, ENT_QUOTES, $enc_out);
}
So the input encoding (3rd parameter) defaults to utf-8 unless you tell it otherwise.

And a final bit of working out I had to do, involved making Excel compatible CSV files... so Excel doesn't trim the leading zero from numeric fields. OpenOffice did this as well in my tests.

Code: Select all

function output_csv($var, $raw=FALSE) {
 
    if($raw === FALSE and is_numeric($var) and mb_strpos($var, 0) === 0) {
        // CSV LEADING ZERO - ESCAPE LIKE THIS FOR EXCEL "=""08075"""
        $var = '"=""'.$var.'"""';
 
    } elseif ( strlen($var) > 0 and !is_numeric($var)) {
        // CSV STRING COLUMN
        $var = '"'.str_replace('"', '""', $var).'"';
    }
    return $var;
}
So this correctly escapes strings and quote marks within the output data, and if you leave out the 2nd parameter of the function ($raw) then it will give you Excel compatible number columns, preserving the leading zero!

Hope this helps someone out.
You have no idea how long it took me to figure this out!!
Obviously let me know if there's any problems with my code, I've only been doing PHP for a couple of years so it could probably be improved.

Cheers, B
Post Reply