
Parsing Raw Email Data

Posted: Sat Aug 07, 2010 12:12 pm
by chrisatnetronix
I currently have email piped to a PHP script, and I need to split the raw email into these variables:

$from
$subject
$message


I then need to take the above variables and insert them into a MySQL database.
I have already set up the pipe-to-script feature on the host.
I have written a PHP parser script that does OK, but it only seems to work with:

Thunderbird email client
Comcast webmail

If I use Outlook, $message comes with a bunch of what looks like encoded data in it.

Yahoo and Gmail make the message include a bunch of things like this:

--000e0cd24e8a6d9440048d2e4f27
Content-Type: text/plain; charset=ISO-8859-1

messtest

--000e0cd24e8a6d9440048d2e4f27
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable


If $message is "messtest", that's all I need, not the other stuff in the message.

The same goes for $from. I need just:

test@gmail.com (example output)

Instead I get:

tester <test@gmail.com>

It seems the contents of each variable vary between mail servers.

I need a universal parser so I can get the variables I need no matter what mail server they use.

Here is the code I have so far. It parses the email and, for testing, writes the message to a text file so I can view it.

Code:

#!/usr/bin/php -q
<?php
// read the raw email from stdin
$fp = fopen("php://stdin", "r");
$email = "";
while (!feof($fp)) {
    $email .= fgets($fp, 1024);
}
fclose($fp);

// split the email into lines
$lines = explode("\n", $email);

// empty vars
$from = "";
$subject = "";
$headers = "";
$message = "";
$splittingheaders = true;

for ($i = 0; $i < count($lines); $i++) {
    if ($splittingheaders) {
        // this is a header
        $headers .= $lines[$i]."\n";
        // look out for special headers
        // (trim drops the trailing \r left over from CRLF line endings)
        if (preg_match("/^Subject: (.*)/", $lines[$i], $matches)) {
            $subject = trim($matches[1]);
        }
        if (preg_match("/^From: (.*)/", $lines[$i], $matches)) {
            $from = trim($matches[1]);
        }
    } else {
        // not a header, but message
        $message .= $lines[$i]."\n";
    }
    if (trim($lines[$i]) == "") {
        // empty line, header section has ended
        $splittingheaders = false;
    }
}

// write mail to file
// emails.txt is chmod 777
$out = fopen("emails.txt", "a+");
fwrite($out, $message);
fclose($out);
?>
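
For the $from problem described above, one option is to keep only what sits inside the angle brackets. This is a minimal sketch, assuming the header value is either in the form tester <test@gmail.com> or already a bare address. The name extract_address is made up for illustration and is not part of the script above.

```php
<?php
// Hypothetical helper (not from the original post): pull the bare address
// out of a From header value such as 'tester <test@gmail.com>'.
function extract_address(string $from): string
{
    $from = trim($from);
    // Prefer whatever sits inside the trailing pair of angle brackets.
    if (preg_match('/<([^<>]+)>\s*$/', $from, $m)) {
        return trim($m[1]);
    }
    // Otherwise assume the value is already a bare address.
    return $from;
}

echo extract_address('tester <test@gmail.com>'), "\n";
echo extract_address('test@gmail.com'), "\n";
```

Other legal From forms, such as a comment in parentheses after a bare address, are not handled by this sketch.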

Re: Parsing Raw Email Data

Posted: Sat Aug 07, 2010 5:07 pm
by chrisatnetronix
Here is a fix that resolves about 99 percent of the parsing issues.

It works and has been tested with AOL, Yahoo, MSN, Gmail, Outlook, and Thunderbird.

The only one that still leaves some raw code is Gmail, and it only shows this in the message:

--0016367658309708d3048d428b34
Content-Type: text/plain; charset=ISO-8859-1

No others show anything extra.

Simply add this code below the parsing loop, but before the text file write part, of the code I posted above.

Code:

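// find the quoted MIME boundary in the headers, if there is one
$boundaryfulltext = "";
if (preg_match("/boundary=\".*?\"/i", $headers, $boundary)) {
    $boundaryfulltext = $boundary[0];
}

if ($boundaryfulltext != "") {
    // strip boundary=" and the closing quote to leave the bare boundary string
    $find = array("/boundary=\"/i", "/\"/i");
    $boundarytext = preg_replace($find, "", $boundaryfulltext);
    // take the first MIME part after the opening boundary
    $splitmessage = explode("--" . $boundarytext, $message);
    $fullmessage = ltrim($splitmessage[1]);
    // keep everything from the first blank line on (skips the part's own headers)
    preg_match('/\n\n(.*)/is', $fullmessage, $splitmore);

    if (substr(ltrim($splitmore[0]), 0, 2) == "--") {
        $actualmessage = $splitmore[0];
    } else {
        $actualmessage = ltrim($splitmore[0]);
    }
} else {
    // no boundary found, so treat the whole body as the message
    $actualmessage = ltrim($message);
}

// cut everything from the next line that starts with "--",
// and everything from an "=3D" that ends a line
$clean = array("/\n--.*/is", "/=3D\n.*/s");
$cleanmessage = trim(preg_replace($clean, "", $actualmessage));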
After that, you can add your MySQL insert code or whatever you like.
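
For the insert step, a prepared statement keeps the sender-controlled text out of the SQL string. The sketch below uses PDO. The table name emails, its columns, the function name save_email, and the connection details are all placeholders, not anything from the script above.

```php
<?php
// Sketch only: the table "emails" and its columns are hypothetical.
function save_email(PDO $db, string $from, string $subject, string $message): void
{
    // Named placeholders mean the values are never spliced into the SQL text.
    $stmt = $db->prepare(
        'INSERT INTO emails (sender, subject, body) VALUES (:sender, :subject, :body)'
    );
    $stmt->execute([
        ':sender'  => $from,
        ':subject' => $subject,
        ':body'    => $message,
    ]);
}

// Example wiring; host, database name, and credentials are placeholders.
// $db = new PDO('mysql:host=localhost;dbname=mail;charset=utf8mb4', 'user', 'password',
//               [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
// save_email($db, $from, $subject, $cleanmessage);
```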


I must admit parsing raw email universally ain't easy...
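
One possible reason some mail still slips through is that the boundary regex above only matches a quoted boundary, while the MIME rules also allow an unquoted one. Here is a minimal sketch that accepts both forms, picks the text/plain part, and decodes quoted-printable or base64 content. The function names find_boundary and plain_text_part are made up for illustration. Nested multipart messages and parts without their own headers are not handled.

```php
<?php
// Hypothetical helpers, not part of the original script.

// Return the multipart boundary from the raw header block, or null.
// Accepts both boundary="abc" and boundary=abc.
function find_boundary(string $headers): ?string
{
    if (preg_match('/boundary\s*=\s*"([^"]+)"/i', $headers, $m)
        || preg_match('/boundary\s*=\s*([^\s;"]+)/i', $headers, $m)) {
        return $m[1];
    }
    return null;
}

// Return the decoded text/plain part of a multipart body, or the trimmed
// body itself when there is no boundary or no text/plain part.
function plain_text_part(string $body, ?string $boundary): string
{
    if ($boundary === null) {
        return trim($body);
    }
    foreach (explode('--' . $boundary, $body) as $part) {
        // Each part is: part headers, blank line, content.
        $pieces = preg_split('/\r?\n\r?\n/', ltrim($part), 2);
        if (count($pieces) < 2 || stripos($pieces[0], 'text/plain') === false) {
            continue;
        }
        $content = $pieces[1];
        if (preg_match('/Content-Transfer-Encoding:\s*([\w-]+)/i', $pieces[0], $m)) {
            $encoding = strtolower($m[1]);
            if ($encoding === 'quoted-printable') {
                $content = quoted_printable_decode($content);
            } elseif ($encoding === 'base64') {
                $content = base64_decode($content);
            }
        }
        return trim($content);
    }
    return trim($body);
}

// Usage with the variables from the script above:
// $cleanmessage = plain_text_part($message, find_boundary($headers));
```

For anything beyond simple two-part messages, a dedicated MIME parsing library is likely to be more reliable than hand-rolled splitting.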