Preg_Match to find valid email addresses in Mail Log

james007
Forum Newbie
Posts: 2
Joined: Sat Sep 04, 2010 11:15 am

Preg_Match to find valid email addresses in Mail Log

Post by james007 »

Morning folks - been out of the devnetwork community for some time now, but glad to be back.

I've fallen a BIT out of touch with regular expressions and PHP in general, but am now back on the bandwagon. I need what I *hope* will be simple help with a problem I can't get my head around. I'm trying to find particular lines in a mail log text file. I've already got the file opened and read into an array, but now need to find lines formatted in the following manner and check them by valid email address:

Code:

Sep  1 09:57:55 gpmail postfix/smtp[12622]: 093A116B10B: to=<randomtext82@hotmail.com>, relay=mx2.hotmail.com[65.54.188.110]:25, delay=0.85, delays=0.08/0.03/0.28/0.46, dsn=2.0.0, status=sent (250  <dc57235831ca0d82c17d2fe292982ad2@randomname> Queued mail for delivery)
I'd like to validate each line by the email address in the to= field (if it's empty, I'd like to unset that line's entry in the array). I'd also like to find a way to validate the "dc57235831ca0d82c17d2fe292982ad2@randomname" address, which you'll note has no TLD.

Can someone help me out a bit?

Thanks for your time!
Jonah Bron
DevNet Master
Posts: 2764
Joined: Thu Mar 15, 2007 6:28 pm
Location: Redding, California

Re: Preg_Match to find valid email addresses in Mail Log

Post by Jonah Bron »

I'm blurry on what you're trying to do... could you clarify?
McInfo
DevNet Resident
Posts: 1532
Joined: Wed Apr 01, 2009 1:31 pm

Re: Preg_Match to find valid email addresses in Mail Log

Post by McInfo »

According to filter_var(), the email address ending with "@randomname" is valid.

Indeed, "randomname" is a valid host name, at least syntactically.
RFC 2821: Simple Mail Transfer Protocol wrote: "A domain (or domain name) consists of one or more dot-separated components."
The regex pattern in this script is less restrictive than the pattern used by filter_var() to validate email addresses, so it should pull something out of the input string from the places you expect to find email addresses.

Code:

<?php
header('Content-Type: text/plain');
$pattern = '/<((?:"[^"]+"@|[^<]+@)[^>]+)>/';
$subject = 'Sep  1 09:57:55 gpmail postfix/smtp[12622]: 093A116B10B: to=<randomtext82@hotmail.com>, relay=mx2.hotmail.com[65.54.188.110]:25, delay=0.85, delays=0.08/0.03/0.28/0.46, dsn=2.0.0, status=sent (250  <dc57235831ca0d82c17d2fe292982ad2@randomname> Queued mail for delivery)';
$matches = null;
// Dump the match count, the captured groups, and filter_var()'s verdict on each captured address.
var_dump(
    preg_match_all($pattern, $subject, $matches),
    $matches,
    filter_var($matches[1][0], FILTER_VALIDATE_EMAIL),
    filter_var($matches[1][1], FILTER_VALIDATE_EMAIL)
);
Pattern explanation:

Code:

'/<((?:"[^"]+"@|[^<]+@)[^>]+)>/' $pattern
'                              ' string bounds
 /                            /  regex bounds
  <                          >   literal characters (email bounds)
   (                        )    capturing subpattern (the email address)
    (?:               )          non-capturing subpattern (the local part)
               |                 "or" branch within nearest subpattern
       "     "@                  literal chars in case of quoted local part
                     @           literal char in case of usual, unquoted local part
        [  ]                     character range bounds
        [^"]                     any character not a quote mark
        [^"]+                    one or more any char not a quote mark
                [^<]+            one or more any char not a less-than sign
                       [^>]+     one or more any char not a greater-than sign
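To tie the pattern back to the original question (dropping array entries whose to= field is empty), here is a minimal sketch. The function name filterLogLines() is made up for illustration, and the pattern is a slightly tightened variant of the one above: it is anchored to "to=<" and the unquoted local part is not allowed to run past a ">". Treat it as a starting point, not a tested solution for every Postfix log format.

```php
<?php
// Hypothetical helper: keep only the log lines whose to=<...> field
// contains something shaped like local@host. Array keys are preserved.
function filterLogLines(array $lines): array
{
    // Variant of the pattern above, anchored to the to= field.
    // \b keeps "orig_to=<...>" from being mistaken for "to=<...>".
    $pattern = '/\bto=<((?:"[^"]+"@|[^<>]+@)[^>]+)>/';

    foreach ($lines as $index => $line) {
        if (!preg_match($pattern, $line)) {
            // Missing or empty to= address: drop the entry.
            unset($lines[$index]);
        }
    }

    return $lines;
}
```

Because unset() leaves gaps in the keys, you may want to pass the result through array_values() if the rest of your script expects a consecutively indexed array.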
After validating the matched strings with filter_var(), you can apply additional restrictions, such as requiring a dot nested in the host part or a verified top-level domain. (Wikipedia: List of TLDs) Ask if you need tips.
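As one example of such an additional restriction, the sketch below checks for a dot nested in the host part (not at the start or end). The name hostHasNestedDot() is hypothetical. Whether filter_var() itself accepts a dotless host may depend on your PHP version, so this check looks at the host explicitly instead of relying on that.

```php
<?php
// Hypothetical helper: true when the part after the last "@" consists of
// two or more non-empty labels separated by dots (e.g. "hotmail.com").
function hostHasNestedDot(string $address): bool
{
    $at = strrpos($address, '@');
    if ($at === false) {
        return false; // no host part at all
    }

    $host = (string) substr($address, $at + 1);

    // At least one dot, with non-empty text on both sides of every dot.
    return preg_match('/^[^.]+(?:\.[^.]+)+$/', $host) === 1;
}
```

You could combine it with the earlier check, for example: filter_var($address, FILTER_VALIDATE_EMAIL) !== false && hostHasNestedDot($address). This only looks at the shape of the host; it does not check the last label against a list of real TLDs.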

Beyond that, it is possible to verify the domain with DNS validation and maybe verify the account with SMTP validation, but I have not yet explored that in depth. There are some sites that do this kind of validation for you, but I wouldn't trust them to keep the email addresses private, so you should probably come up with a solution yourself or find an open-source solution that you can examine before implementing it.
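For the DNS part, a rough sketch using PHP's built-in checkdnsrr() might look like this. The function name domainAcceptsMail() and the optional $lookup parameter are my own assumptions for illustration; $lookup only exists so the DNS call can be swapped out when you try the function without network access. A passing result says the domain resolves, not that the mailbox exists.

```php
<?php
// Hypothetical helper: does the host part of $address have DNS records
// that could receive mail? $lookup defaults to PHP's checkdnsrr().
function domainAcceptsMail(string $address, ?callable $lookup = null): bool
{
    $lookup = $lookup ?? 'checkdnsrr';

    $at = strrpos($address, '@');
    if ($at === false || $at === strlen($address) - 1) {
        return false; // no "@" or nothing after it
    }

    $host = substr($address, $at + 1);

    // Prefer an MX record; fall back to an A record, since mail can be
    // delivered directly to a host that has no MX entry.
    return $lookup($host, 'MX') || $lookup($host, 'A');
}
```

Called as domainAcceptsMail($address) it performs real DNS lookups, which can be slow on a large log, so caching the result per host is probably worth it.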

Update: I found something that might be useful. (I haven't tried it.)
james007
Forum Newbie
Posts: 2
Joined: Sat Sep 04, 2010 11:15 am

Re: Preg_Match to find valid email addresses in Mail Log

Post by james007 »

Excellent. Thank you both for your help. As I research further, I'll update this post.

Cheers!