[Solved] Parsing incoming mail -- charsets
Posted: Tue Apr 04, 2006 6:46 pm
I am trying to use two public license classes to receive and parse incoming mail through a POP3 server. It works great for us-ascii messages. However, as soon as any special characters are present (that is, as soon as the charset changes), I get broken text.
The class that I am using for retrieving and reading the messages is pop3class from http://www.phpclasses.org/browse/package/2.html. It does a handy job of getting the content and headers etc.
I read the content type out of the header and then, as pop3class reads the body of the message, I use the charset identified in the header to attempt to convert from ASCII using ConvertCharset from http://www.hotscripts.com/Detailed/37274.html.
In short it isn't working.
I am a Java programmer who is in the process of cross-training in PHP for its flexibility. In Java, when declaring a string you can specify the character set. This doesn't appear to be possible in PHP4. Are strings stored as raw bytes and then by default assumed to be ASCII? If you do a character conversion, for example:
Code: Select all
$my_utf_string = ConvertCharset::Convert($my_raw_variable, "us-ascii", "utf8");
does the variable now contain a UTF-8 string? Does the server know to treat it that way?
Does anyone know if fgets() returns us-ascii?
Thanks in advance.
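For reference, the header-driven conversion described in this post could be sketched roughly as below. This is only a minimal sketch under stated assumptions, not code from pop3class or ConvertCharset: the helper names extract_charset() and body_to_utf8() are hypothetical, the sample header and body are made up for illustration, the iconv extension is assumed to be available, and the body is assumed to have already been decoded from its Content-Transfer-Encoding (quoted-printable or base64).

```php
<?php
// Hypothetical helper: pull the charset parameter out of a Content-Type value.
// Falls back to us-ascii when no charset parameter is present.
function extract_charset($contentType)
{
    if (preg_match('/charset\s*=\s*"?([^";\s]+)"?/i', $contentType, $m)) {
        return strtolower($m[1]);
    }
    return 'us-ascii';
}

// Hypothetical helper: convert raw body bytes from the declared charset to UTF-8.
// If iconv cannot convert (for example an unknown charset), the raw bytes are
// returned unchanged.
function body_to_utf8($rawBody, $charset)
{
    $converted = iconv($charset, 'UTF-8', $rawBody);
    return $converted === false ? $rawBody : $converted;
}

// Made-up sample data for illustration only.
$header = 'text/plain; charset="iso-8859-1"';
$raw    = "caf\xE9"; // "café" as ISO-8859-1 bytes

$utf8 = body_to_utf8($raw, extract_charset($header));
echo bin2hex($utf8), "\n"; // prints 636166c3a9
```

The sketch passes the charset declared in the header as the source encoding rather than "us-ascii". A PHP string is a plain sequence of bytes with no charset attached, so functions such as fgets() return whatever bytes were on the wire, and the conversion routine has to be told what those bytes actually represent.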