Email Interpretation

PHP programming forum. Ask questions or help people concerning PHP code. Don't understand a function? Need help implementing a class? Don't understand a class? Here is where to ask. Remember to do your homework!


Parody
Forum Contributor
Posts: 252
Joined: Fri May 06, 2005 7:06 pm
Location: Great Britain

Email Interpretation

Post by Parody »

I know I could use a pre-built mail handler, but I need to be able to modify everything and understand how it works so I can connect it to other things.

I have a script that tries to interpret the raw email source. For the most part it works well, but it can't work out which part of the source is the body. Here is the source of a test email:

Code:

Date: Mon, 24 Mar 2008 07:40:13 -0000
MIME-Version: 1.0
Content-Type: multipart/alternative;
    boundary="----=_NextPart_000_019D_01C88D82.4BE36DC0"
X-Priority: 3
X-MSMail-Priority: Normal
Importance: Normal
X-Mailer: Microsoft Windows Live Mail 12.0.1606
X-MimeOLE: Produced By Microsoft MimeOLE V12.0.1606
X-OriginalArrivalTime: 24 Mar 2008 07:40:16.0900 (UTC) FILETIME=[4DB48440:01C88D82]
 
This is a multi-part message in MIME format.
 
------=_NextPart_000_019D_01C88D82.4BE36DC0
Content-Type: text/plain;
    charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
 
 
 
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Fusce =
ultrices, felis ac rhoncus pharetra, augue turpis condimentum augue, sed =
cursus est sem scelerisque nulla. Class aptent taciti sociosqu ad litora =
torquent per conubia nostra, per inceptos hymenaeos. Donec erat. Donec =
laoreet. In ut nibh et velit fringilla pharetra. Donec nisl nulla, =
adipiscing sed, lacinia ac, lobortis quis, pede. Suspendisse nec neque =
at nibh lobortis laoreet. Maecenas fringilla, metus aliquet lobortis =
condimentum, sapien massa fermentum eros, in fermentum leo est vel urna. =
Etiam dui. In dui urna, semper sed, aliquet non, tristique in, lorem. =
Nam elementum.=20
 
Nulla gravida. Duis condimentum, nisi non mollis ornare, dolor nibh =
bibendum pede, ut elementum est metus vitae risus. Mauris nec erat. =
Proin et erat. Etiam rhoncus pede quis tortor. Maecenas dolor diam, =
tempus eget, sodales sit amet, tincidunt vitae, est. Etiam a est non est =
venenatis sollicitudin. Aliquam erat volutpat. Lorem ipsum dolor sit =
amet, consectetuer adipiscing elit. Fusce at felis.=20
=20
=20
 
------=_NextPart_000_019D_01C88D82.4BE36DC0
Content-Type: text/html;
    charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
 
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=3DContent-Type =
content=3Dtext/html;charset=3Diso-8859-1>
<META content=3D"MSHTML 6.00.6000.16609" name=3DGENERATOR></HEAD>
<BODY id=3DMailContainerBody=20
style=3D"PADDING-RIGHT: 10px; PADDING-LEFT: 10px; PADDING-TOP: 15px"=20
bgColor=3D#ffffff leftMargin=3D0 topMargin=3D0 CanvasTabStop=3D"true"=20
name=3D"Compose message area">
<DIV><FONT face=3DArial size=3D2></FONT>&nbsp;</DIV>
<DIV>&nbsp;</DIV>
<DIV><FONT face=3DArial size=3D2>Lorem ipsum dolor sit amet, =
consectetuer adipiscing=20
elit. Fusce ultrices, felis ac rhoncus pharetra, augue turpis =
condimentum augue,=20
sed cursus est sem scelerisque nulla. Class aptent taciti sociosqu ad =
litora=20
torquent per conubia nostra, per inceptos hymenaeos. Donec erat. Donec =
laoreet.=20
In ut nibh et velit fringilla pharetra. Donec nisl nulla, adipiscing =
sed,=20
lacinia ac, lobortis quis, pede. Suspendisse nec neque at nibh lobortis =
laoreet.=20
Maecenas fringilla, metus aliquet lobortis condimentum, sapien massa =
fermentum=20
eros, in fermentum leo est vel urna. Etiam dui. In dui urna, semper sed, =
aliquet=20
non, tristique in, lorem. Nam elementum. </FONT></DIV>
<DIV>&nbsp;</DIV>
<DIV><FONT face=3DArial size=3D2>Nulla gravida. Duis condimentum, nisi =
non mollis=20
ornare, dolor nibh bibendum pede, ut elementum est metus vitae risus. =
Mauris nec=20
erat. Proin et erat. Etiam rhoncus pede quis tortor. Maecenas dolor =
diam, tempus=20
eget, sodales sit amet, tincidunt vitae, est. Etiam a est non est =
venenatis=20
sollicitudin. Aliquam erat volutpat. Lorem ipsum dolor sit amet, =
consectetuer=20
adipiscing elit. Fusce at felis.=20
<BR>&nbsp;<BR>&nbsp;<BR></FONT></DIV></BODY></HTML>
 
------=_NextPart_000_019D_01C88D82.4BE36DC0--
 
 
The subject is 'Lorem Ipsum' and the message is two paragraphs of lorem ipsum which should be easy to spot.

How do I get PHP to 'grab' just the body of the message? It would be easy if I could be sure every email used exactly the same format, but I know they don't. :(
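Whatever the structure, a first step is separating the top-level headers from the body. Per RFC 2822, the header section ends at the first empty line, and any line that starts with a space or tab continues the header above it (which is how boundary= ends up on its own line under Content-Type: in the source above). Here is a minimal PHP sketch of that step; split_message() is a hypothetical name, and it assumes the raw, unescaped message is already in a string.

```php
<?php
// Minimal sketch (hypothetical function name): split a raw message into its
// top-level headers and its body, following RFC 2822.
function split_message(string $raw): array
{
    // Normalise CRLF to LF so the same code handles both line-ending styles.
    $raw = str_replace("\r\n", "\n", $raw);

    // The header section ends at the first empty line.
    $pos = strpos($raw, "\n\n");
    $headerText = $pos === false ? $raw : substr($raw, 0, $pos);
    $body = $pos === false ? '' : substr($raw, $pos + 2);

    // Unfold: a line break followed by a space or tab continues the previous header.
    $headerText = preg_replace("/\n(?=[ \t])/", '', $headerText);

    $headers = [];
    foreach (explode("\n", $headerText) as $line) {
        $colon = strpos($line, ':');
        if ($colon === false) {
            continue; // not a "Name: value" line
        }
        // Names are lower-cased; a repeated header (e.g. Received) keeps only its last value here.
        $name = strtolower(trim(substr($line, 0, $colon)));
        $headers[$name] = trim(substr($line, $colon + 1));
    }
    return [$headers, $body];
}
```

The body returned here is still the whole multipart body, preamble included; for a multipart message it still has to be split on the boundary before you reach the text.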

I'm sure this would be useful to a lot of people interested in using PHP to handle and send email.

Thanks to all who reply :D
Last edited by Parody on Mon Mar 24, 2008 2:44 am, edited 1 time in total.
Chris Corbyn
Breakbeat Nuttzer
Posts: 13098
Joined: Wed Mar 24, 2004 7:57 am
Location: Melbourne, Australia

Re: Email Interpretation

Post by Chris Corbyn »

There appear to be some extra backslashes in that email before all the " marks.

Code:

boundary=\"----=_NextPart_000_0181_01C88D2C.38D06CC0\"

There's also a stray comma in single quotes in the gap before the MIME boundary.

Code:

', 'This is a multi-part message in MIME format.

It looks like you need to grab RFC 2822, RFC 2045 and RFC 2046 and start working through the email structure. Work out where the boundaries are, then read through the parts until you reach the "best" format you know how to handle (i.e. the last "text/xxx" part you know how to use).
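A minimal PHP sketch of that approach, assuming the top-level Content-Type value and body have already been separated. The function names are hypothetical, nested multiparts (e.g. a multipart/mixed containing a multipart/alternative) are not handled, and the parameter parsing is simplified compared with RFC 2045.

```php
<?php
// Minimal sketch (hypothetical function names): find the boundary, split the
// multipart body on it, and keep the last part whose type is in $handled.
// Nested multiparts are not handled.

function get_boundary(string $contentType): ?string
{
    // Quoted form first (boundary="..."), then the bare token form (boundary=...).
    if (preg_match('/boundary\s*=\s*"([^"]+)"/i', $contentType, $m)
        || preg_match('/boundary\s*=\s*([^;\s]+)/i', $contentType, $m)) {
        return $m[1];
    }
    return null;
}

function split_parts(string $body, string $boundary): array
{
    $parts = [];
    $current = null; // stays null while we are in the preamble
    foreach (explode("\n", str_replace("\r\n", "\n", $body)) as $line) {
        $trimmed = rtrim($line, " \t"); // delimiter lines may carry trailing whitespace
        if ($trimmed === '--' . $boundary . '--') {
            break; // closing delimiter; anything after it is the epilogue
        }
        if ($trimmed === '--' . $boundary) {
            if ($current !== null) {
                $parts[] = $current;
            }
            $current = [];
        } elseif ($current !== null) {
            $current[] = $line;
        }
    }
    if ($current !== null) {
        $parts[] = $current;
    }
    return $parts; // each part is an array of lines
}

function parse_part(array $lines): array
{
    $headers = [];
    $name = null;
    while ($lines) {
        $line = array_shift($lines);
        if ($line === '') {
            break; // the first empty line ends the part's headers
        }
        if (($line[0] === ' ' || $line[0] === "\t") && $name !== null) {
            $headers[$name] .= ' ' . trim($line); // folded continuation line
        } elseif (($colon = strpos($line, ':')) !== false) {
            $name = strtolower(trim(substr($line, 0, $colon)));
            $headers[$name] = trim(substr($line, $colon + 1));
        }
    }
    return [$headers, implode("\n", $lines)];
}

function best_text_part(string $contentType, string $body,
                        array $handled = ['text/plain', 'text/html']): ?array
{
    $boundary = get_boundary($contentType);
    if ($boundary === null) {
        return null; // not a multipart message
    }
    $best = null;
    foreach (split_parts($body, $boundary) as $lines) {
        [$headers, $content] = parse_part($lines);
        // A part without Content-Type defaults to text/plain (RFC 2045 section 5.2).
        $type = strtolower(trim(explode(';', $headers['content-type'] ?? 'text/plain')[0]));
        if (in_array($type, $handled, true)) {
            $best = ['type' => $type, 'headers' => $headers, 'body' => $content];
        }
    }
    return $best;
}
```

The parts of a multipart/alternative message are ordered from plainest to richest (RFC 2046 section 5.1.4), so keeping the last handled part gives the richest version you can use; pass ['text/plain'] as $handled if you only want the plain-text version.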
Parody
Forum Contributor
Posts: 252
Joined: Fri May 06, 2005 7:06 pm
Location: Great Britain

Re: Email Interpretation

Post by Parody »

Ah, yes. Those commas are due to me putting it in a query and then printing the query. I'll edit the post and use the actual source instead of something PHP has had its greasy paws on. I'll have to use a new message, though.
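For reference, backslashes before double quotes are exactly what PHP's addslashes() produces; that addslashes() (rather than some other escaping function) was used here is an assumption. The takeaway is to parse the raw source and only escape it when building the query. If only the escaped copy is available, stripslashes() reverses addslashes().

```php
<?php
// Illustration, assuming the source was passed through addslashes()
// before being embedded in the SQL string.
$raw = 'boundary="----=_NextPart_000_0181_01C88D2C.38D06CC0"';
$escaped = addslashes($raw);
echo $escaped, "\n"; // prints boundary=\"----=_NextPart_000_0181_01C88D2C.38D06CC0\"

// Parse $raw, not $escaped; stripslashes() undoes addslashes().
var_dump(stripslashes($escaped) === $raw); // bool(true)
```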
Chris Corbyn
Breakbeat Nuttzer
Posts: 13098
Joined: Wed Mar 24, 2004 7:57 am
Location: Melbourne, Australia

Re: Email Interpretation

Post by Chris Corbyn »

Parody wrote: Ah, yes. Those commas are due to me putting it in a query and then printing the query. I'll edit the post and use the actual source instead of something PHP has had its greasy paws on. I'll have to use a new message, though.
It won't help anything ;) I've already offered my answer ;)

http://www.ietf.org/rfc/rfc2822.txt
http://www.ietf.org/rfc/rfc2045.txt
http://www.ietf.org/rfc/rfc2046.txt
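RFC 2045 also defines the Content-Transfer-Encoding header. Both parts in the sample are quoted-printable (hence the =20, =3D and the trailing = soft line breaks), so the part you pick still has to be decoded before it reads like the original text. A minimal sketch using PHP's built-in quoted_printable_decode() and base64_decode(); decode_part_body() is a hypothetical name.

```php
<?php
// Minimal sketch (hypothetical function name): decode a part body according
// to its Content-Transfer-Encoding value (RFC 2045 section 6).
function decode_part_body(string $body, string $encoding): string
{
    switch (strtolower(trim($encoding))) {
        case 'quoted-printable':
            // "=XX" is a hex-encoded byte; "=" at the end of a line is a soft line break.
            return quoted_printable_decode($body);
        case 'base64':
            // Non-strict mode skips the line breaks inside base64 bodies.
            return (string) base64_decode($body);
        default:
            // 7bit, 8bit and binary need no decoding.
            return $body;
    }
}
```

The result is still in the part's declared charset (iso-8859-1 in the sample); convert it with something like iconv('ISO-8859-1', 'UTF-8', $text) if the rest of your application works in UTF-8.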