Trying to read all mail headers in a single preg_match_all

Posted: Fri Feb 03, 2006 2:27 pm
by raghavan20
I have an unparsed mail and I am trying to read the From, To, Date, and Subject headers from it in a single statement.
The code below fetches the mail and tries to match them:

Code: Select all

<?php
$message  = file_get_contents("http://findmoby.com/unparsed_mail.htm");
//print_r($message);
preg_match_all("#(Date|From|To|Subject):(.*?)$#im", $message, $matches);
print_r($matches);

?>
What happens is that the match seems to be greedy: it does not stop at newlines even though I have specified the m modifier.
Output:

Code: Select all

Array
(
    [0] => Array
        (
            [0] => To:
            [1] => to:godshalk@gmail.com">godshalk@gmail.com Fri Feb 03 12:53:16 2006Received: from [66.249.92.193] (helo=uproxy.gmail.com)        by athena.pronameservice.net with esmtp (Exim 4.52)        id 1F563W-00021Z-Qv        for bitt3n@findmoby.com; Fri, 03 Feb 2006 12:53:16 -0600Received: by uproxy.gmail.com with SMTP id u2so29148uge        for <bitt3n@findmoby.com>; Fri, 03 Feb 2006 10:53:19 -0800 (PST)DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws;        s=beta; d=gmail.com;        h=received:message-id:date:from:to:subject:mime-version:content-type;        b=dyk5dmqU7VHivuZ69z96Nvr7Ln15ZNcqjWopqsMvmfm+f9szF9eCFoLpsCbt09ba2qngHfnP5ouug6/59hCBKhhBccAQbUYo1vZ8saqtjJn6uwhYIxNlKZC0yTbHrqKvzswmR/+vm30ArWVkC/vmkVESFFmHHfbJESwWFyvolnQ=Received: by 10.49.8.5 with SMTP id l5mr606158nfi;        Fri, 03 Feb 2006 10:53:18 -0800 (PST)Received: by 10.48.254.3 with HTTP; Fri, 3 Feb 2006 10:53:17 -0800 (PST)Message-ID: <e0632b600602031053hfa1455agdf3520a1605a12d7@mail.gmail.com>Date: Fri, 3 Feb 2006 13:53:17 -0500From: Dirk Godshalk <godshalk@gmail.com>To: bitt3n@findmoby.comSubject: unparsedMIME-Version: 1.0Content-Type: multipart/mixed;        boundary="----=_Part_12417_32294437.1138992797880"------=_Part_12417_32294437.1138992797880Content-Type: multipart/alternative;        boundary="----=_Part_12418_29668200.1138992797880"
...
...
...
...
...
...
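
A minimal sketch of how the `m` modifier behaves, for comparison with the output above. Both sample strings are hypothetical stand-ins (the header values are copied from the output, but the `<br>` separators are only an assumption about how the page is marked up). With real line breaks, `$` stops at each line end; with everything on one physical line, `$` can only match at the very end, which looks like "greedy" behaviour. Note also that the original pattern has no `^` anchor, so with the `i` modifier it can match `to:` inside `mailto:`, which is consistent with the second match shown above.

```php
<?php
// Hypothetical sample with real line breaks (not the thread's actual file).
$raw = "Date: Fri, 3 Feb 2006 13:53:17 -0500\n"
     . "From: Dirk Godshalk <godshalk@gmail.com>\n"
     . "To: bitt3n@findmoby.com\n"
     . "Subject: unparsed\n";

// The same headers squeezed onto one physical line, separated by <br> tags.
// This layout is an assumption about the HTML page, used only for illustration.
$html = str_replace("\n", "<br>", $raw);

// ^ anchors each match to the start of a line; m lets ^ and $ match at line breaks.
$pattern = '#^(Date|From|To|Subject):[ \t]*(.*?)$#im';

$onText = preg_match_all($pattern, $raw, $a);   // one match per header line
$onHtml = preg_match_all($pattern, $html, $b);  // a single match that swallows the rest

echo "matches on text: $onText\n";
echo "matches on html: $onHtml\n";
echo "first capture on html: ", $b[2][0], "\n";
```

If the second count is lower than the first on the real file, the input rather than the modifier is the likely cause.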

Posted: Fri Feb 03, 2006 4:19 pm
by timvw
It doesn't work because there is a newline between To: and user@example.com. Matching the file line by line shows this:

Code: Select all

<?php
$message  = file_get_contents("unparsed_mail.htm");
preg_match_all("#^(.*)$#im", $message, $matches);
print_r($matches);
?>

Code: Select all

[42] => <div>
            [43] => To:
            [44] => Dirk Godshalk <godshalk@gmail.com>
            [45] => </div>

Posted: Fri Feb 03, 2006 5:36 pm
by feyd
The email being rendered as HTML, instead of served as its original source, makes it difficult to actually parse.
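
One possible workaround, sketched under assumptions: turn the rendered page back into something close to plain text before matching. The `$html` fragment below is hypothetical (the real page's markup may differ, in particular the `<br>` separators and the `&lt;`/`&gt;` entities are guesses). The order matters: tags are stripped before entities are decoded, otherwise the decoded `<godshalk@gmail.com>` would itself be removed as a tag.

```php
<?php
// Hypothetical fragment of a rendered mail page; the real markup may differ.
$html = 'Date: Fri, 3 Feb 2006 13:53:17 -0500<br>'
      . 'From: Dirk Godshalk &lt;<a href="mailto:godshalk@gmail.com">godshalk@gmail.com</a>&gt;<br>'
      . 'To: bitt3n@findmoby.com<br>'
      . 'Subject: unparsed<br>';

// 1. Turn <br> tags into real newlines so each header sits on its own line.
$text = preg_replace('#<br\s*/?>#i', "\n", $html);
// 2. Drop the remaining tags (the <a ...> wrapper around the address).
$text = strip_tags($text);
// 3. Decode entities such as &lt; and &gt; back into characters.
$text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');

// With real line breaks in place, an anchored multi-line pattern reads one header per line.
preg_match_all('#^(Date|From|To|Subject):[ \t]*(.*)$#im', $text, $m, PREG_SET_ORDER);

$headers = array();
foreach ($m as $set) {
    $headers[strtolower($set[1])] = trim($set[2]);
}
print_r($headers);
```

Reading the raw message source, where available, would avoid this cleanup step entirely.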

Posted: Sat Feb 04, 2006 7:00 am
by raghavan20
I have come up with this, but when I try to retrieve the From address it returns JavaScript that I do not understand. Run the script to see what is happening:

Code: Select all

<pre>
<?php
$message  = file_get_contents("http://findmoby.com/unparsed_mail.htm");
preg_match_all("#Message-ID.*(Date.*?)MIME-Version#im", $message, $matches);
print_r($matches);
$inputString = $matches[1][0];
preg_match("#Date:(.*)From#s", $inputString, $matches);
//print_r($matches);
$date = $matches[1];
preg_match("#From:.*?<(.*?)>#", $inputString, $matches);
print_r($matches);
$from = $matches[1];
preg_match("#To:\s(.*?)Subject#", $inputString, $matches);
//print_r($matches);
$to = $matches[1];
preg_match("#Subject:\s(.*?)$#", $inputString, $matches);
//print_r($matches);
$subject = $matches[1];
?>
</pre>
Output:

Code: Select all

Array
(
    [0] => Array
        (
            [0] => message-id:date:from:to:subject:mime-version:content-type;        b=dyk5dmqU7VHivuZ69z96Nvr7Ln15ZNcqjWopqsMvmfm+f9szF9eCFoLpsCbt09ba2qngHfnP5ouug6/59hCBKhhBccAQbUYo1vZ8saqtjJn6uwhYIxNlKZC0yTbHrqKvzswmR/+vm30ArWVkC/vmkVESFFmHHfbJESwWFyvolnQ=Received: by 10.49.8.5 with SMTP id l5mr606158nfi;        Fri, 03 Feb 2006 10:53:18 -0800 (PST)Received: by 10.48.254.3 with HTTP; Fri, 3 Feb 2006 10:53:17 -0800 (PST)Message-ID: <e0632b600602031053hfa1455agdf3520a1605a12d7@mail.gmail.com>Date: Fri, 3 Feb 2006 13:53:17 -0500From: Dirk Godshalk <godshalk@gmail.com>To: bitt3n@findmoby.comSubject: unparsedMIME-Version
        )

    [1] => Array
        (
            [0] => Date: Fri, 3 Feb 2006 13:53:17 -0500From: Dirk Godshalk <godshalk@gmail.com>To: bitt3n@findmoby.comSubject: unparsed
        )

)
Array
(
    [0] => From: Dirk Godshalk <
    [1] => a onclick="return top.js.OpenExtLink(window,event,this)" href="mailto:godshalk@gmail.com"
)
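
The last result suggests that the first `<...>` after `From:` in the fetched page is an HTML `<a ...>` tag rather than the address, so `<(.*?)>` captures the tag's attributes (including the `onclick` JavaScript). A hedged sketch of one way around it is to capture the `mailto:` target instead. The `$htmlFrom` string is a reconstruction: the tag attributes are taken from the output above, while the surrounding `&lt;`/`&gt;` entities and the link text are assumptions.

```php
<?php
// Reconstructed From line; attributes come from the output above, the rest is assumed.
$htmlFrom = 'From: Dirk Godshalk &lt;'
          . '<a onclick="return top.js.OpenExtLink(window,event,this)" '
          . 'href="mailto:godshalk@gmail.com">godshalk@gmail.com</a>&gt;';

// Capture the address from the href instead of whatever sits between < and >.
if (preg_match('#From:.*?href="mailto:([^"]+)"#i', $htmlFrom, $m)) {
    $address = $m[1];
} else {
    $address = null; // no mailto link found on this line
}

var_dump($address);
```

Checking the return value of preg_match() before reading `$m[1]` also avoids undefined-index notices when a header is missing.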