Extract Substring from complex text

Posted: Thu Apr 06, 2006 3:32 am
by ed209
I am trying to receive emails and parse the raw source to extract the information I need: the attachments, subject, message body and To address. From what I have read about the email format, a multipart message is divided into sections.

Those sections are defined by:

Code: Select all

Content-Type: multipart/mixed; boundary="----=_NextPart_000_58b1_26fb_1faf"
where 'boundary' defines the delimiter between sections: each section starts with a line made of two hyphens followed by the boundary value. A section may look like this (this would be the message section):

Code: Select all

------=_NextPart_000_58b1_26fb_1faf

Content-Type: text/plain; format=flowed



this is the message

------=_NextPart_000_58b1_26fb_1faf
So I can find the relevant section by extracting the information between the boundary lines.
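
Before the sections can be located, the boundary value itself has to be read from the Content-Type header. A minimal sketch of that step follows; find_boundary() is a hypothetical helper name, not a built-in, and the code assumes PHP 7.1 or later and that the header has already been unfolded onto one line.

```php
<?php
// Hypothetical helper: pull the boundary parameter out of a raw header block.
// Returns null when no boundary parameter is present.
function find_boundary(string $headers): ?string
{
    // Accepts boundary="quoted value" as well as boundary=bare-token.
    if (preg_match('/boundary\s*=\s*(?:"([^"]+)"|([^\s;]+))/i', $headers, $m)) {
        // Group 1 holds a quoted value, group 2 an unquoted one.
        return $m[1] !== '' ? $m[1] : $m[2];
    }
    return null;
}

$header = 'Content-Type: multipart/mixed; boundary="----=_NextPart_000_58b1_26fb_1faf"';
echo find_boundary($header), "\n"; // prints ----=_NextPart_000_58b1_26fb_1faf
```

Note that the delimiter lines in the body are this value with two extra hyphens in front, which is why the section above starts with six hyphens rather than four.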

My question is: how should I search through the source and extract (then work with) these sections? Emails are likely to be a few MB in size because of attachments.

Should I:

Code: Select all

explode($boundary, $email_source);
or

find the occurrences of 'boundary' with strpos() and substr() them

or

use some sort of preg_match()



??????
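
For reference, the explode() option could be sketched roughly as below. split_parts() is a hypothetical helper name, the code assumes PHP 7.1 or later, and it assumes a simple, non-nested multipart message whose boundary string never occurs inside a part's content.

```php
<?php
// Hypothetical helper: split a multipart source into its sections.
// Each section is returned as ['headers' => string, 'body' => string].
function split_parts(string $source, string $boundary): array
{
    $parts = [];
    // Delimiter lines are "--" followed by the boundary value.
    $chunks = explode('--' . $boundary, $source);
    array_shift($chunks); // drop everything before the first delimiter
    foreach ($chunks as $chunk) {
        if (strncmp($chunk, '--', 2) === 0) {
            break; // "--boundary--" marks the end of the last section
        }
        // Normalise line endings so the blank-line search below is simple.
        $chunk = str_replace("\r\n", "\n", $chunk);
        // The first blank line separates the section headers from its content.
        $pieces = explode("\n\n", $chunk, 2);
        $parts[] = [
            'headers' => ltrim($pieces[0], "\n"),
            'body'    => rtrim($pieces[1] ?? '', "\n"), // drops trailing line breaks
        ];
    }
    return $parts;
}

$source = "--XYZ\r\nContent-Type: text/plain; format=flowed\r\n\r\nthis is the message\r\n--XYZ--\r\n";
$parts = split_parts($source, 'XYZ');
echo $parts[0]['body'], "\n"; // prints: this is the message
```

explode() holds a second copy of the source in memory, so whether this is acceptable for messages of a few MB depends on the configured memory_limit. Attachment sections are usually still encoded at this point (for example base64) and would need decoding according to their Content-Transfer-Encoding header.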

Posted: Thu Apr 06, 2006 4:21 am
by ed209
In case anyone stumbles across this post in the future, it may be easier to use PHP's IMAP function family.

Code: Select all

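// you can also connect to pop3
$mbox = imap_open("{localhost:110/pop3}INBOX", "user", "password");
if ($mbox === false) {
    die("Connection failed: " . imap_last_error());
}

// $msgNum is the message's sequence number; 1 (the first message) is an assumed example value
$msgNum = 1;
$structure = imap_fetchstructure($mbox, $msgNum);

// this gives you a breakdown of everything in the email.
print_r($structure);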
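
The object printed above is nested, so a small walker can turn it into a flat list of part numbers. The following is only a sketch: list_parts() is a hypothetical helper name, it assumes PHP 7.2 or later, and it relies on the parts, subtype, ifdisposition and disposition properties that imap_fetchstructure() documents. Messages that embed other messages (message/rfc822) follow more involved numbering rules and are not handled here.

```php
<?php
// Hypothetical helper: flatten an imap_fetchstructure() result into a list of
// ['number' => part number, 'subtype' => MIME subtype, 'attachment' => bool].
function list_parts(object $structure, string $prefix = ''): array
{
    // Treat a message without sub-parts as a single part numbered "1".
    $children = empty($structure->parts) ? [$structure] : $structure->parts;
    $found = [];
    foreach ($children as $i => $part) {
        $number = $prefix . ($i + 1);
        if (!empty($part->parts)) {
            // Nested multipart: recurse, numbering its children "2.1", "2.2", ...
            $found = array_merge($found, list_parts($part, $number . '.'));
            continue;
        }
        $found[] = [
            'number'     => $number,
            'subtype'    => $part->subtype ?? '',
            'attachment' => !empty($part->ifdisposition)
                && strtolower($part->disposition) === 'attachment',
        ];
    }
    return $found;
}
```

Each 'number' can then be passed as the third argument to imap_fetchbody($mbox, $msgNum, $number) to get that section's raw content, which still has to be decoded according to the part's encoding property (for example with base64_decode() when the value is 3, or quoted_printable_decode() when it is 4).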