[solved] Character encoding set for mac text files

batfastad
Forum Contributor
Posts: 433
Joined: Tue Mar 30, 2004 4:24 am
Location: London, UK

Post by batfastad »

Hi guys

I'm working on a project at the moment using PHP to export data from a FileMaker database (although this applies to MySQL as well).

I'm exporting data in XPressTags - which is a pseudo-XML format devised by Quark allowing you to import text that's already formatted with colours, fonts and everything.
Useful for publishing directories and stuff!

The problem is that the data we're dealing with has many accented and special characters, and they all get converted to HTML entities on export from the database.

The code I'm using to generate the header and replace the characters is...

Code:

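// Decode named and numeric HTML entities into UTF-8 characters.
$xtg_output = html_entity_decode($xtg_output, ENT_COMPAT, "UTF-8");
// Catch any numeric entities still left. The /e modifier was removed in PHP 7, so use
// preg_replace_callback(); mb_chr() (PHP 7.2+, mbstring) returns the UTF-8 character.
$xtg_output = preg_replace_callback('/&#(\d+);/', function ($m) { return mb_chr((int) $m[1], 'UTF-8'); }, $xtg_output);
$xtg_output = preg_replace_callback('/&#x([a-f0-9]+);/i', function ($m) { return mb_chr(hexdec($m[1]), 'UTF-8'); }, $xtg_output);

// The charset is a parameter of Content-Type; Content-Encoding is for compression.
header('Content-Type: text/xml; charset=utf-8');
header('Content-Disposition: attachment; filename=directoryexport.xtg');

So I'm outputting an XML file whose contents are my $xtg_output variable.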

And when I import this file into Quark 6.0 Passport on a PC all the characters come through fine and correct.

I got the preg_replace code above from the PHP manual page on html_entity_decode
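
A quick way to check what the decode step actually produces is to dump the bytes. This is only a sketch: the sample string is made up for illustration, and it assumes a PHP version where html_entity_decode() supports a UTF-8 target (PHP 5 and later).

```php
<?php
// Hypothetical sample input: the same character as a named, decimal and hex entity.
$sample = 'Caf&eacute; caf&#233; caf&#xE9;';

// html_entity_decode() handles named and numeric entities when given a UTF-8 target.
$decoded = html_entity_decode($sample, ENT_COMPAT, 'UTF-8');

echo $decoded, "\n";
// Each accented "e" should show up as the two-byte UTF-8 sequence c3 a9.
echo bin2hex($decoded), "\n";
```

If the hex dump shows c3a9 for each accented character, the PHP side is emitting valid UTF-8 and the scrambling is happening when the file is read.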

But when I try to import this file on a Mac (downloaded from the server again), the characters with accents all come through completely different.

The version on the Mac is Quark 6.5 Passport, but the version doesn't matter, because even when I open the downloaded export file in TextEdit on a Mac, the characters are already messed up before the file gets to Quark.

So I'm guessing it's a charset issue.
As you can see above, I've used UTF-8.
I've also tried ISO-8859-1 and ISO-8859-15, as suggested on this page:
http://uk.php.net/manual/en/function.ht ... decode.php

They all give slightly different results with the characters being scrambled in different ways.

Has anyone found a way round this problem on OS X?

Many of the characters we're dealing with are Scandinavian, French and Eastern European accented characters.
In the downloaded file the rest of the text appears fine on the Mac; it's just those particular characters that are mangled.
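
This pattern (plain text fine, accented characters mangled) is what you would expect if UTF-8 bytes are being read as a single-byte charset. A minimal sketch of that effect, assuming the local iconv library recognises the charset name MACINTOSH for Mac OS Roman (glibc and GNU libiconv do, other builds may not):

```php
<?php
// "é" encoded as UTF-8 is the two bytes 0xC3 0xA9.
$utf8 = "\xC3\xA9";

// Misread those bytes as Mac OS Roman, then convert to UTF-8 for display.
// Assumption: this iconv build knows the charset name "MACINTOSH".
$asMacRoman = iconv('MACINTOSH', 'UTF-8', $utf8);

// Misread the same bytes as ISO-8859-1.
$asLatin1 = iconv('ISO-8859-1', 'UTF-8', $utf8);

echo bin2hex($utf8), "\n";   // c3a9
echo $asMacRoman, "\n";      // √©
echo $asLatin1, "\n";        // Ã©

// Plain ASCII is the same in all three charsets, so it survives untouched.
echo iconv('MACINTOSH', 'UTF-8', 'Quark'), "\n"; // Quark
```

Each wrong guess turns one accented character into two different wrong characters, which would explain why every charset tried gives a differently scrambled result.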

Any ideas on what I need to do to get this working on a Mac?

Thanks

Ben
Last edited by batfastad on Wed Sep 13, 2006 11:36 am, edited 1 time in total.

Post by batfastad »

OK, solved it.

The problem was that I needed to open the text file in TextEdit on the Mac as 'Unicode UTF-8'. Then the characters appear fine in TextEdit.

Then just save the text file out as 'Western (Mac OS Roman)'.

I was relying on TextEdit automatically detecting the charset.
So it wasn't a PHP / headers issue, but a stupid Mac error (as usual :roll:).
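
For anyone who would rather skip the manual TextEdit step, the same conversion could in principle be done in PHP before the download. A minimal sketch, assuming the server's iconv build recognises the charset name MACINTOSH (Mac OS Roman); the sample string is made up, and this was not part of the original fix:

```php
<?php
// Hypothetical stand-in for the decoded export text (UTF-8).
$xtg_output = "Malm\xC3\xB6, Troms\xC3\xB8, Orl\xC3\xA9ans";

// Assumption: the server's iconv recognises "MACINTOSH" (Mac OS Roman).
// Without //TRANSLIT or //IGNORE, iconv() returns false if a character
// has no Mac OS Roman equivalent.
$mac_output = iconv('UTF-8', 'MACINTOSH', $xtg_output);

if ($mac_output === false) {
    // Fall back to the UTF-8 text rather than sending an empty file.
    $mac_output = $xtg_output;
}

// Show the resulting single-byte encoding.
echo bin2hex($mac_output), "\n";
```

Mac OS Roman covers the Scandinavian and French letters but not many Eastern European ones, so a conversion like this may fail on those and fall back to UTF-8.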

Hope this helps someone


Thanks

Ben