seeking optimal way to parse an email sent to my PHP script
I am currently piping mail to a script that does a bit of parsing as per this article:
http://www.devarticles.com/c/a/PHP/Inco ... and-PHP/1/
Email piped to the script gets recorded as $email. The script then parses it like this:
http://www.filefarmer.com/2/bitt3n/example.html
I want to parse $email more completely. I want to isolate fully the message body and email addresses, and ideally also isolate the message from any text that comprises a message to which the message body is a reply. (Ultimately I hope to handle attachments.)
In a perfect world, some function would take $email as its argument and return the variables $to, $from, $subject, $message, $num_attachments, and $attachments_array, with the array indicating the name, type, size and filepath of each attachment saved by the function (which would save each attachment to a specified directory if it meets the size, type and number criteria).
Is there some library I can install that has such a function? Searching around I found some software called ripMIME that looks interesting (http://www.pldaniels.com/ripmime/), but I am not sure that is what I want and it doesn’t appear to come with any documentation. Also I’ve never needed to install a library before, and I believe I would have to get a ripMIME binary to use it, since it is written in C and will need to be compiled. Another option I am considering is http://pear.php.net/manual/en/package.m ... l-mime.php. I've never used any PEAR modules before and I am not sure that is what I want either.
Thanks for your advice.
- raghavan20
- DevNet Resident
- Posts: 1451
- Joined: Sat Jun 11, 2005 6:57 am
- Location: London, UK
- Contact:
sure, this is an example of what the parsed sections look like after I send an email to my present script (variable names in bold):
http://www.filefarmer.com/2/bitt3n/example.html
I want to isolate the e-mail address in the $from and $to variables (the recipient address is isolated in this example), and the text "this is the message body" in $message. Ideally I want to be able to handle attachments properly, but at least I must be able to isolate them from $message. Presently when I send attachments they get attached to the bottom of $message in this form:
------=_Part_5869_28984144.1138981175181
Content-Type: image/jpeg; name="render-icon.cgi.jpg"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="render-icon.cgi.jpg"
X-Attachment-Id: f_ej8nz36e

/9j/4AAQSkZJRgABAQEASABIAAD/2wBDAAEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEB
AQEBAQICAQECAQEBAgICAgICAgICAQICAgICAgICAgL/2wBDAQEBAQEBAQEBAQECAQEBAgICAgIC
AgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgL/wAARCABgAGADAREA
AhEBAxEB/8QAHAAAAgIDAQEAAAAAAAAAAAAABwgGCQQFCgED/8QAMRAAAgICAgEDBAEDBAIDAQAA
AQIDBAURBhIHABMhCBQiMSMJMkEVFlFhM3EkNEJS/8QAHQEAAgIDAQEBAAAAAAAAAAAABQYEBwID
etc. etc. etc.
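A part shaped like the one above can be pulled out of $message and decoded roughly as follows. This is only a minimal sketch: the function name extract_attachment() and the sample $part (with "hello" standing in for the JPEG data) are made up for illustration, and it assumes the part has already been cut out at its boundary lines.

```php
<?php
// Hypothetical part in the shape shown above; the payload is "hello"
// in base64, split over two lines the way mail clients wrap it.
$part = "Content-Type: image/jpeg; name=\"render-icon.cgi.jpg\"\n"
      . "Content-Transfer-Encoding: base64\n"
      . "Content-Disposition: attachment; filename=\"render-icon.cgi.jpg\"\n"
      . "\n"
      . "aGVs\n"
      . "bG8=\n";

// Returns filename, size and decoded data for an attachment part,
// or null if the part is not marked as an attachment.
function extract_attachment(string $part): ?array
{
    // The part's headers end at the first blank line.
    $pieces = explode("\n\n", str_replace("\r\n", "\n", $part), 2);
    if (count($pieces) < 2) {
        return null;
    }
    [$head, $body] = $pieces;

    if (!preg_match('/^Content-Disposition:\s*attachment/mi', $head)) {
        return null;
    }

    $name = preg_match('/filename="?([^"\r\n;]+)"?/i', $head, $m) ? $m[1] : 'unnamed';
    // basename() drops any directory components in a sender-supplied name.
    $name = basename($name);

    $data = $body;
    if (preg_match('/^Content-Transfer-Encoding:\s*base64/mi', $head)) {
        // In its default non-strict mode base64_decode() discards the line breaks.
        $data = base64_decode($body);
    }

    return ['filename' => $name, 'size' => strlen($data), 'data' => $data];
}

$attachment = extract_attachment($part);
echo $attachment['filename'], ' (', $attachment['size'], " bytes)\n";
```

Once the size and type checks pass, the decoded data could be written out with file_put_contents() to whatever directory the script is allowed to use.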
Split on the first double line break (limit the split to two pieces) to isolate the header data. Then explode the header data on newlines, and explode each header line on the first colon to get all the header values into an array. Finally, use a regex to find the MIME boundary and explode the email's body on it to isolate each part, including each attachment.
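The steps above can be sketched roughly like this. The function names split_message() and split_parts() and the sample $email are invented for the example, and the sketch assumes a single level of multipart nesting; repeated headers such as Received simply overwrite each other in the array.

```php
<?php
// Hypothetical sample message; a real one would arrive on STDIN.
$email = "From: Test User <bitt3n@myurl.com>\r\n"
       . "To: someone@example.com\r\n"
       . "Subject: test\r\n"
       . "Content-Type: multipart/mixed;\r\n"
       . "\tboundary=\"XYZ\"\r\n"
       . "\r\n"
       . "--XYZ\r\n"
       . "Content-Type: text/plain\r\n"
       . "\r\n"
       . "this is the message body\r\n"
       . "--XYZ\r\n"
       . "Content-Type: image/jpeg; name=\"a.jpg\"\r\n"
       . "Content-Transfer-Encoding: base64\r\n"
       . "\r\n"
       . "/9j/4AAQ\r\n"
       . "--XYZ--\r\n";

// Split a raw message (or a single MIME part) into [headers, body].
function split_message(string $raw): array
{
    // Normalise line endings, then split once on the first blank line.
    $raw = str_replace("\r\n", "\n", $raw);
    $pieces = explode("\n\n", $raw, 2);
    $head = $pieces[0];
    $body = $pieces[1] ?? '';

    // Unfold continuation lines (a newline followed by a space or tab).
    $head = preg_replace('/\n[ \t]+/', ' ', $head);

    $headers = [];
    foreach (explode("\n", $head) as $line) {
        // Limit of 2 keeps any colons inside the value intact.
        $pair = explode(':', $line, 2);
        if (count($pair) === 2) {
            $headers[strtolower(trim($pair[0]))] = trim($pair[1]);
        }
    }
    return [$headers, $body];
}

// Split a multipart body on the boundary named in Content-Type.
function split_parts(array $headers, string $body): array
{
    $type = $headers['content-type'] ?? '';
    if (!preg_match('/boundary="?([^";]+)"?/i', $type, $m)) {
        return []; // not a multipart message
    }
    $chunks = explode('--' . $m[1], $body);
    array_shift($chunks); // preamble before the first boundary
    array_pop($chunks);   // "--" marker after the closing boundary
    return array_map('trim', $chunks);
}

[$headers, $body] = split_message($email);
$parts = split_parts($headers, $body);
echo $headers['from'], "\n", count($parts), " parts\n";
```

Each element of $parts has its own small header block, so it can be fed back through split_message() to read its Content-Type and Content-Disposition.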
- raghavan20
This is the regex to get the from email address:
Code: Select all
<pre>
<?php
$from = " Test User <bitt3n@myurl.com>";
echo "Retrieving from email address:<br />";
// Capture whatever sits between the angle brackets.
preg_match("#.*?<(.*?)>#si", $from, $matches);
// Escape the output so the browser does not swallow <...> as a tag.
echo htmlspecialchars(print_r($matches, true));
?>
</pre>
Output:
Code: Select all
Retrieving from email address:
Array
(
    [0] =>  Test User <bitt3n@myurl.com>
    [1] => bitt3n@myurl.com
)
What I want from you is one email that has not been parsed. That email should have all of from, to, date, subject, body and a few attachments. Post such an email here; I want to see the structure so that I can write regexes to split it into parts. Hope you get me.
Sounds good. Here is an example of an unparsed email:
http://findmoby.com/unparsed_mail.htm
Everything after "MESSAGE UNPARSED: " is the $email variable that the script emailed back to my address.
I attached 3 jpegs with this message, but I see only 2 in the unparsed message. I tried this more than once and got the same result. I don't know why the third gets cut off.
I also set up my script so that you can receive unparsed messages if you want to experiment. Send a message to bitt3n@findmoby.com. If the subject of your message is "unparsed" (no quotation marks), the script will reply with the unparsed $email variable. Any other subject will return the message (incompletely) parsed as per the example in my first post.
Thanks again for your help. Let me know what additional info I can provide.
- raghavan20
Why are there two bodies? There are two parts with Content-Disposition set to inline:
Code: Select all
------=_Part_12417_32294437.1138992797880
Content-Type: multipart/alternative;
	boundary="----=_Part_12418_29668200.1138992797880"

------=_Part_12418_29668200.1138992797880
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline

this is the message body
------=_Part_12418_29668200.1138992797880
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline

this is the message body<br>
------=_Part_12418_29668200.1138992797880--
Anyway, this is the code to retrieve the body of the message:
Code: Select all
<pre>
<?php
// Fetch the sample unparsed message.
$message = file_get_contents("http://findmoby.com/unparsed_mail.htm");
// Grab everything between each "Content-Disposition: inline" header
// and the next boundary line.
preg_match_all("#Content-Disposition:\s+inline(.*?)[-]+=_Part#is", $message, $matches);
print_r($matches);
?>
</pre>
Output:
Code: Select all
Array
(
    [0] => Array
        (
            [0] => Content-Disposition: inlinethis is the message body------=_Part
            [1] => Content-Disposition: inlinethis is the message body<br>------=_Part
        )
    [1] => Array
        (
            [0] => this is the message body
            [1] => this is the message body<br>
        )
)
I am now facing a small problem retrieving the to, from, subject and date information in a single go. Hopefully I will sort it out soon.
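On the two bodies: the enclosing part in the sample is multipart/alternative, so the two inline parts are the plain-text and HTML versions of the same message, and only one of them is needed. A rough sketch of picking the text/plain version and undoing the quoted-printable encoding follows. The function name plain_text_body() and the "1+1=3D2" test text are invented for the example, and the boundary is assumed to have been read from the Content-Type header already.

```php
<?php
// Hypothetical multipart/alternative body modelled on the sample above;
// "=3D" is the quoted-printable encoding of "=".
$boundary = '----=_Part_12418_29668200.1138992797880';
$body = "--{$boundary}\n"
      . "Content-Type: text/plain; charset=ISO-8859-1\n"
      . "Content-Transfer-Encoding: quoted-printable\n"
      . "Content-Disposition: inline\n"
      . "\n"
      . "this is the message body, 1+1=3D2\n"
      . "--{$boundary}\n"
      . "Content-Type: text/html; charset=ISO-8859-1\n"
      . "Content-Transfer-Encoding: quoted-printable\n"
      . "Content-Disposition: inline\n"
      . "\n"
      . "this is the message body, 1+1=3D2<br>\n"
      . "--{$boundary}--\n";

// Return the decoded text/plain alternative, or null if there is none.
function plain_text_body(string $body, string $boundary): ?string
{
    foreach (explode('--' . $boundary, $body) as $chunk) {
        // Each part's headers end at the first blank line.
        $pieces = explode("\n\n", str_replace("\r\n", "\n", $chunk), 2);
        if (count($pieces) < 2) {
            continue; // preamble or the closing "--" marker
        }
        [$head, $text] = $pieces;

        if (!preg_match('/^Content-Type:\s*text\/plain/mi', $head)) {
            continue; // skip the HTML alternative
        }
        if (preg_match('/^Content-Transfer-Encoding:\s*quoted-printable/mi', $head)) {
            $text = quoted_printable_decode($text);
        }
        return trim($text);
    }
    return null;
}

echo plain_text_body($body, $boundary), "\n";
```

The result is still in the charset named in the part's Content-Type header (ISO-8859-1 in the sample), so it may need converting before it is stored or displayed.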