Extract Substring from complex text
Posted: Thu Apr 06, 2006 3:32 am
I am attempting to receive emails and sort through the raw source to extract the information I require. The information I need is the attachments, subject, message and To field. I have looked into how emails are made up and it seems that they are divided into various sections.
Those sections are defined by:
Code:
Content-Type: multipart/mixed; boundary="----=_NextPart_000_58b1_26fb_1faf"
where 'boundary' defines the start of a new section. A section may look like this (this would be the message section):
Code:
------=_NextPart_000_58b1_26fb_1faf
Content-Type: text/plain; format=flowed
this is the message
------=_NextPart_000_58b1_26fb_1faf
So I can find the relevant section by extracting the information between the 'boundary' lines.
My question is, how should I search through the source and extract (then work with) these sections? Emails are likely to be a few MB in size due to attachments.
Should I:
Code:
explode($boundary, $email_source);
or
find the occurrences of 'boundary' and substr() them
or
use some sort of preg_match()?
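To make the first option concrete, here is a minimal sketch of the explode() approach. The function name split_mime_sections() is made up for illustration. It assumes the boundary is declared once in the top-level Content-Type header, that each delimiter line is the boundary prefixed with "--" (as in the sample above), that the last delimiter is followed by "--", and that headers and body inside a section are separated by a blank line. It does not handle nested multipart sections or decode attachments.

```php
<?php
// Hypothetical helper, not part of PHP or any library: splits a raw
// multipart email into sections using the boundary from the header.
function split_mime_sections(string $email_source): array
{
    // Pull the boundary value out of the Content-Type header.
    if (!preg_match('/boundary="?([^";\r\n]+)"?/i', $email_source, $m)) {
        return []; // no boundary declared, so nothing to split
    }

    // A delimiter is "--" plus the boundary, at the start of a line.
    $delimiter = "\n--" . $m[1];

    $chunks = explode($delimiter, $email_source);
    array_shift($chunks); // drop the top-level headers before the first delimiter

    $sections = [];
    foreach ($chunks as $chunk) {
        // The closing delimiter ends with "--", so the chunk after it starts with "--".
        if (strncmp($chunk, '--', 2) === 0) {
            break;
        }
        // Split the section at its first blank line: headers, then body.
        $parts = preg_split('/\r?\n\r?\n/', ltrim($chunk, "\r\n"), 2);
        $sections[] = [
            'headers' => trim($parts[0]),
            'body'    => isset($parts[1]) ? rtrim($parts[1], "\r\n") : '',
        ];
    }
    return $sections;
}

// Usage with a message shaped like the sample in this post.
$raw = "Subject: test\r\n"
     . "Content-Type: multipart/mixed; boundary=\"----=_NextPart_000_58b1_26fb_1faf\"\r\n"
     . "\r\n"
     . "------=_NextPart_000_58b1_26fb_1faf\r\n"
     . "Content-Type: text/plain; format=flowed\r\n"
     . "\r\n"
     . "this is the message\r\n"
     . "------=_NextPart_000_58b1_26fb_1faf--\r\n";

$sections = split_mime_sections($raw);
echo $sections[0]['body'], "\n"; // prints: this is the message
```

Because explode() copies every piece, a message of a few MB ends up in memory roughly twice. A strpos()/substr() loop over the same delimiter would avoid building all the pieces at once, at the cost of more bookkeeping.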