seeking optimal way to parse an email sent to my PHP script
Posted: Thu Feb 02, 2006 11:18 pm
by bitt3n
I am currently piping mail to a script that does a bit of parsing as per this article:
http://www.devarticles.com/c/a/PHP/Inco ... and-PHP/1/
Email piped to the script gets recorded as $email. The script then parses it like this:
http://www.filefarmer.com/2/bitt3n/example.html
I want to parse $email more completely. I want to isolate fully the message body and email addresses, and ideally also isolate the message from any text that comprises a message to which the message body is a reply. (Ultimately I hope to handle attachments.)
In a perfect world, some function would take $email as its argument and return the variables $to, $from, $subject, $message, $num_attachments, and $attachments_array, with the array giving the name, type, size and filepath of each attachment saved by the function (which would save each attachment to a specified directory if it meets the size, type and number criteria).
Is there some library I can install that has such a function? Searching around I found some software called ripMIME that looks interesting (http://www.pldaniels.com/ripmime/), but I am not sure that is what I want and it doesn’t appear to come with any documentation. Also I’ve never needed to install a library before, and I believe I would have to get a ripMIME binary to use it, since it is written in C and will need to be compiled. Another option I am considering is http://pear.php.net/manual/en/package.m ... l-mime.php. I've never used any PEAR modules before and I am not sure that is what I want either.
Thanks for your advice.
Posted: Fri Feb 03, 2006 4:57 am
by raghavan20
I have not read the whole post, but as the title says, you want to parse an entire email and split it into parts. Please post a complete sample email that has everything in it, and specify which sections of the email you want isolated.
Posted: Fri Feb 03, 2006 9:46 am
by bitt3n
sure, this is an example of what the parsed sections look like after I send an email to my present script (variable names in bold):
http://www.filefarmer.com/2/bitt3n/example.html
I want to isolate the e-mail address in the $from and $to variables (the recipient address is isolated in this example), and the text "this is the message body" in $message. Ideally I want to be able to handle attachments properly, but at least I must be able to isolate them from $message. Presently when I send attachments they get attached to the bottom of $message in this form:
------=_Part_5869_28984144.1138981175181
Content-Type: image/jpeg; name="render-icon.cgi.jpg"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="render-icon.cgi.jpg"
X-Attachment-Id: f_ej8nz36e
/9j/4AAQSkZJRgABAQEASABIAAD/2wBDAAEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEB
AQEBAQICAQECAQEBAgICAgICAgICAQICAgICAgICAgL/2wBDAQEBAQEBAQEBAQECAQEBAgICAgIC
AgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgL/wAARCABgAGADAREA
AhEBAxEB/8QAHAAAAgIDAQEAAAAAAAAAAAAABwgGCQQFCgED/8QAMRAAAgICAgEDBAEDBAIDAQAA
AQIDBAURBhIHABMhCBQiMSMJMkEVFlFhM3EkNEJS/8QAHQEAAgIDAQEBAAAAAAAAAAAABQYEBwID
etc. etc. etc.
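For a part shaped like the one above, a rough sketch of turning it into the name/type/size record described in the first post might look like this. The function name `decode_attachment_part` and the `$maxBytes` limit are made up for illustration, the syntax assumes PHP 7.1 or later, and it assumes the part has already been cut out of the message with its headers, a blank line, and then the base64 data.

```php
<?php
// Hypothetical helper (not from any library): decodes one MIME part that
// consists of part headers, a blank line, and a base64-encoded body.
// Returns null when the part is not a usable base64 attachment.
function decode_attachment_part(string $part, int $maxBytes = 1048576): ?array
{
    $part = str_replace("\r\n", "\n", $part);
    $pieces = explode("\n\n", $part, 2);
    if (count($pieces) < 2) {
        return null; // no blank line, so no body to decode
    }
    [$rawHeaders, $encoded] = $pieces;

    if (!preg_match('/^Content-Transfer-Encoding:\s*base64\s*$/mi', $rawHeaders)) {
        return null; // this sketch only handles base64 parts
    }
    if (!preg_match('/filename="?([^";\n]+)"?/i', $rawHeaders, $name)) {
        return null; // no filename, so treat it as not an attachment
    }
    preg_match('/^Content-Type:\s*([^;\s]+)/mi', $rawHeaders, $type);

    // Remove line breaks from the payload, then decode in strict mode so
    // invalid input yields false instead of garbage.
    $data = base64_decode(preg_replace('/\s+/', '', $encoded), true);
    if ($data === false || strlen($data) > $maxBytes) {
        return null;
    }

    return [
        'name' => basename($name[1]), // basename() drops any path components
        'type' => $type[1] ?? 'application/octet-stream',
        'size' => strlen($data),
        'data' => $data,
    ];
}
```

The caller could then check `type` and `size` against its own rules and write `data` to disk with `file_put_contents()`. Nothing is saved inside the helper, so the filtering policy stays in one place.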
Posted: Fri Feb 03, 2006 9:58 am
by josh
Split on a double line break, limiting to one split, to isolate the header data. Then explode the header data on newlines, and explode each header line on the colon to get all the header values into an array. After that, use a regex to find the MIME boundary and explode the email's body on it to isolate each attachment.
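A minimal sketch of that approach, under a few assumptions: the function names `split_headers_and_body` and `split_on_boundary` are invented here, the syntax needs PHP 7.1 or later, folded header lines (continuations that start with a space or tab) are joined before exploding, and each header is split on the first colon only, since values such as dates contain colons themselves.

```php
<?php
// Hypothetical helper: returns [headers array, raw body string].
function split_headers_and_body(string $email): array
{
    // Normalise line endings so a single separator works.
    $email = str_replace("\r\n", "\n", $email);

    // A limit of 2 means one split: headers before the first blank line,
    // everything else after it.
    $parts = explode("\n\n", $email, 2);
    $rawHeaders = $parts[0];
    $body = $parts[1] ?? '';

    // Unfold continuation lines into the header they belong to.
    $rawHeaders = preg_replace("/\n[ \t]+/", ' ', $rawHeaders);

    $headers = [];
    foreach (explode("\n", $rawHeaders) as $line) {
        $pair = explode(':', $line, 2); // first colon only
        if (count($pair) === 2) {
            // Repeated headers (e.g. Received) overwrite each other here.
            $headers[strtolower(trim($pair[0]))] = trim($pair[1]);
        }
    }
    return [$headers, $body];
}

// Hypothetical helper: splits a multipart body into its raw parts.
function split_on_boundary(array $headers, string $body): array
{
    $contentType = $headers['content-type'] ?? '';
    if (!preg_match('/boundary="?([^";]+)"?/i', $contentType, $m)) {
        return [$body]; // not multipart, so the body is the only part
    }

    // Each part is introduced by "--" followed by the boundary string.
    $chunks = explode('--' . $m[1], $body);
    array_shift($chunks); // drop the preamble before the first boundary

    // The closing delimiter is the boundary followed by "--"; drop what
    // follows it, but keep the last chunk if the message was cut short.
    $last = end($chunks);
    if ($last !== false && strncmp($last, '--', 2) === 0) {
        array_pop($chunks);
    }
    return array_map('trim', $chunks);
}
```

Each returned part has its own small header block, so the same header-splitting step can be applied to it again. A nested multipart section carries its own boundary in its Content-Type header and can be split the same way.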
Posted: Fri Feb 03, 2006 12:14 pm
by raghavan20
This is the regex to get the from email address:
Code:
<pre>
<?php
$from = " Test User <bitt3n@myurl.com>";
echo "Retrieving from email address:<br />";
// Capture whatever sits between the angle brackets.
preg_match("#.*?<(.*?)>#si", $from, $matches);
// Escape the dump so the browser does not swallow <...> as an HTML tag.
echo htmlspecialchars(print_r($matches, true));
?>
</pre>
output:
Code:
Retrieving from email address:
Array
(
[0] =>  Test User <bitt3n@myurl.com>
[1] => bitt3n@myurl.com
)
What I need from you is one complete, unparsed email. It should have the From, To, Date and Subject headers, a body and a few attachments. Post it here so I can see the structure and write regexes to split it into parts.
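One caveat with matching on angle brackets is that a header may also hold a bare address with no display name and no brackets. A small sketch that accepts both forms follows. The helper name `extract_address` is hypothetical, the syntax assumes PHP 7.1 or later, and the pattern is deliberately loose rather than a full RFC 2822 address parser.

```php
<?php
// Hypothetical helper: pulls the bare address out of a From/To header value.
// Returns null when nothing that looks like an address is found.
function extract_address(string $headerValue): ?string
{
    $value = trim($headerValue);

    // "Display Name <user@host>" form: take what is inside the brackets.
    if (preg_match('/<([^<>\s]+@[^<>\s]+)>/', $value, $m)) {
        return $m[1];
    }

    // Bare "user@host" form: the whole value is the address.
    if (preg_match('/^[^\s<>@]+@[^\s<>@]+$/', $value)) {
        return $value;
    }

    return null;
}
```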
Posted: Fri Feb 03, 2006 1:03 pm
by bitt3n
sounds good. here is an example of an unparsed email:
http://findmoby.com/unparsed_mail.htm
Everything after "MESSAGE UNPARSED: " is the $email variable that the script emailed back to my address.
I attached 3 jpegs with this message, but I see only 2 in the unparsed message. I tried this more than once and got the same result. I don't know why the third gets cut off.
I also set up my script so that you can receive unparsed messages if you want to experiment. Send a message to bitt3n@findmoby.com. If the subject of your message is "unparsed" (no quotation marks), the script will reply with the unparsed $email variable. Any other subject will return the message (incompletely) parsed as per the example in my first post.
Thanks again for your help. Let me know what additional info I can provide.
Posted: Fri Feb 03, 2006 2:36 pm
by raghavan20
Why are there two bodies? There are two parts with Content-Disposition set to inline:
Code:
------=_Part_12417_32294437.1138992797880
Content-Type: multipart/alternative;
    boundary="----=_Part_12418_29668200.1138992797880"

------=_Part_12418_29668200.1138992797880
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline

this is the message body
------=_Part_12418_29668200.1138992797880
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline

this is the message body<br>
------=_Part_12418_29668200.1138992797880--
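Anyway, this is the code to retrieve the body of the message:
Code:
<pre>
<?php
// Fetch the sample unparsed email posted above.
$message = file_get_contents("http://findmoby.com/unparsed_mail.htm");
// Capture everything between each "Content-Disposition: inline" header
// and the next "---...=_Part" boundary marker.
preg_match_all("#Content-Disposition:\s+inline(.*?)-+=_Part#is", $message, $matches);
print_r($matches);
?>
</pre>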
output:
Code:
Array
(
[0] => Array
(
[0] => Content-Disposition: inlinethis is the message body------=_Part
[1] => Content-Disposition: inlinethis is the message body<br>------=_Part
)
[1] => Array
(
[0] => this is the message body
[1] => this is the message body<br>
)
)
I am now having a small problem retrieving the To, From, Subject and Date information in a single pass. Hopefully I will sort it out soon.
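One possible way to collect those four headers in a single pass is a multi-line regex with `preg_match_all`, sketched below. The function name `extract_basic_headers` is invented for this example, the syntax assumes PHP 7 or later, and only the first occurrence of each header is kept.

```php
<?php
// Hypothetical helper: returns the To, From, Subject and Date header values
// (or null for any that are missing) using one regex pass.
function extract_basic_headers(string $email): array
{
    $email = str_replace("\r\n", "\n", $email);

    // Only look at the header block, so body text such as "To: ..." is ignored.
    $headerBlock = explode("\n\n", $email, 2)[0];

    // Join folded continuation lines onto the header they belong to.
    $headerBlock = preg_replace("/\n[ \t]+/", ' ', $headerBlock);

    $found = ['to' => null, 'from' => null, 'subject' => null, 'date' => null];

    // The "m" flag makes ^ and $ match at line boundaries, so "Reply-To:"
    // does not count as "To:".
    preg_match_all(
        '/^(To|From|Subject|Date):[ \t]*(.*)$/mi',
        $headerBlock,
        $matches,
        PREG_SET_ORDER
    );

    foreach ($matches as $m) {
        $key = strtolower($m[1]);
        if ($found[$key] === null) {
            $found[$key] = trim($m[2]);
        }
    }
    return $found;
}
```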
Posted: Fri Feb 03, 2006 2:47 pm
by bitt3n
raghavan20 wrote: Why are there two bodies?
I wondered that myself. However I noticed that they are not exactly the same. The first has:
Content-Type: text/plain;
and the second has:
Content-Type: text/html;
I don't care about the html version, for what it's worth.
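Since only the plain-text version matters here, one option is to walk the parts of the multipart/alternative section and keep the first one whose Content-Type is text/plain, decoding it according to its Content-Transfer-Encoding. The sketch below assumes the parts have already been separated into an array of raw strings (headers, blank line, body). The function name `pick_plain_text` is hypothetical and the syntax assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper: returns the decoded text/plain body from a list of
// raw MIME parts, or null when no text/plain part is present.
function pick_plain_text(array $parts): ?string
{
    foreach ($parts as $part) {
        $part = str_replace("\r\n", "\n", $part);
        $pieces = explode("\n\n", $part, 2);
        if (count($pieces) < 2) {
            continue; // no blank line, so no body
        }
        [$rawHeaders, $body] = $pieces;

        if (!preg_match('/^Content-Type:\s*text\/plain\b/mi', $rawHeaders)) {
            continue; // skip text/html and anything else
        }

        // Undo the transfer encoding named in the part headers.
        if (preg_match('/^Content-Transfer-Encoding:\s*quoted-printable/mi', $rawHeaders)) {
            $body = quoted_printable_decode($body);
        } elseif (preg_match('/^Content-Transfer-Encoding:\s*base64/mi', $rawHeaders)) {
            $body = base64_decode($body);
        }
        return trim($body);
    }
    return null;
}
```

The result is still in the charset named in the part's Content-Type header (ISO-8859-1 in the sample), so a conversion step may be needed before storing or displaying it.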