[SOLVED] RFC 2822 with stupid circular references :(
Posted: Wed Jan 09, 2008 12:37 am
And I was moving along so well with this, too. I'm going for 100% compliance with a whole bunch of RFCs (RFC 2822 providing the bulk of it).
In my quest to validate and parse parts of email headers, I'm turning the ABNF syntax (i.e. rules like the one below) into PCRE groups which I can "glue together" to make the tokens described in the RFC:
Code:
comment = "(" *([FWS] ccontent) [FWS] ")"
However, I've hit a big hurdle (a brick wall??):
Code:
ccontent = ctext / quoted-pair / comment
comment = "(" *([FWS] ccontent) [FWS] ")"
I was defining tokens in *exactly* the same way the RFC refers to them until this point:
Code:
//Refer to RFC 2822 for ABNF
$noWsCtl = '[\x01-\x08\x0B\x0C\x0E-\x1F\x7F]'; //%d1-8 / %d11 / %d12 / %d14-31 / %d127
$text = '[\x01-\x09\x0B\x0C\x0E-\x7F]'; //%d1-9 / %d11 / %d12 / %d14-127 (obs-text ignored)
$quotedPair = '\\\\' . $text; //a literal backslash followed by any text character
$atext = '[a-zA-Z0-9!#\$%&\'\*\+\-\/=\?\^_`\{\}\|~]';
$dotAtomText = $atext . '+' . '(?:\.' . $atext . '+)*?'; //non-capturing, so gluing doesn't shift group numbers
$qtext = '(?:' . $noWsCtl . '|[\x21\x23-\x5B\x5D-\x7E])';
$noFoldQuote = '"(?:' . $qtext . '|' . $quotedPair . ')*?"';
$dtext = '(?:' . $noWsCtl . '|[\x21-\x5A\x5E-\x7E])';
$noFoldLiteral = '\[(?:' . $dtext . '|' . $quotedPair . ')*?\]';
$idLeft = '(?:' . $dotAtomText . '|' . $noFoldQuote . ')';
$idRight = '(?:' . $dotAtomText . '|' . $noFoldLiteral . ')';
$WSP = '[ \t]';
$CRLF = '\r\n';
$FWS = '(?:(?:' . $WSP . '*' . $CRLF . ')?' . $WSP . '+)'; //1*WSP; wrapped so a trailing '?' makes all of FWS optional
$ctext = '(?:' . $noWsCtl . '|[\x21-\x27\x2A-\x5B\x5D-\x7E])';
//AGRRRAAAGGGGHHHH!!!!!
//$ccontent is still undefined here: PHP warns about an undefined variable and concatenates an empty string
$comment = '\((?:' . $FWS . '|' . $ccontent. ')*?' . $FWS . '?\)';
$ccontent = '(?:' . $ctext . '|' . $quotedPair . '|' . $comment . ')';
ccontent refers to comment, and comment refers to ccontent, so I can't see a way to write a regex that matches this.
Any ideas? Maybe I'll just have to get as close as reasonably possible here...
//Hmm... do I remember someone mentioning a 'recurse' flag in PCRE?
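On the 'recurse' question: PCRE does have this, although it is pattern syntax rather than a flag. (?R) recurses into the whole pattern, and (?N) or (?&name) calls a numbered or named group, including the group the call sits inside. That is exactly what a self-referencing rule like comment needs. The sketch below is a minimal illustration, not a drop-in replacement: isRfc2822Comment() is a hypothetical helper name, the obs-* forms are ignored, and it assumes a PCRE build that understands the Perl-style (?<name>...) and (?&name) forms (older builds used the (?P>name) spelling). It also follows the RFC's *([FWS] ccontent) [FWS] shape a little more closely than the (FWS|ccontent)* loop.

```php
<?php
// Sketch: match one RFC 2822 comment, including nested comments, by letting
// the named group "comment" call itself with (?&comment).
// isRfc2822Comment() is a hypothetical helper; obs-* syntax is not handled.

function isRfc2822Comment(string $s): bool
{
    $noWsCtl    = '[\x01-\x08\x0B\x0C\x0E-\x1F\x7F]';
    $text       = '[\x01-\x09\x0B\x0C\x0E-\x7F]';
    $quotedPair = '\\\\' . $text;                            // "\" followed by any text char
    $WSP        = '[ \t]';
    $FWS        = '(?:(?:' . $WSP . '*\r\n)?' . $WSP . '+)'; // grouped so "?" applies to all of it
    $ctext      = '(?:' . $noWsCtl . '|[\x21-\x27\x2A-\x5B\x5D-\x7E])';

    // comment  = "(" *([FWS] ccontent) [FWS] ")"
    // ccontent = ctext / quoted-pair / comment   (the comment branch is (?&comment))
    $comment = '(?<comment>\((?:' . $FWS . '?(?:' . $ctext . '|' . $quotedPair
             . '|(?&comment)))*' . $FWS . '?\))';

    // \A and \z anchor the match, so the whole string must be a single comment.
    return preg_match('/\A' . $comment . '\z/', $s) === 1;
}

var_dump(isRfc2822Comment('(a (nested \) comment) here)')); // bool(true)
var_dump(isRfc2822Comment('(unbalanced'));                  // bool(false)
```

Because the recursive call is wrapped in a named group, the same (?&comment) trick keeps working when this piece is glued into a bigger pattern (e.g. CFWS), as long as the group name stays unique.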
EDIT | I'll mull this over and continue... for now, a handy TODO
Code:
//TODO: Make this RFC2822 compliant (support comment nesting -- e.g. add |comment)
$ccontent = '(?:' . $ctext . '|' . $quotedPair . ')';
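If recursion isn't available, or a flat regex is preferred, a common "as close as reasonably possible" workaround is to unroll the rule to a fixed nesting depth, starting from the TODO version above (depth 1, no nesting) and letting each pass wrap the previous comment pattern inside ccontent. This is a sketch under stated assumptions: buildCommentPattern() and $maxDepth are hypothetical names, the obs-* forms are ignored, and anything nested deeper than $maxDepth is rejected.

```php
<?php
// Sketch of a non-recursive approximation: unroll comment/ccontent a fixed
// number of times, so comments may nest at most $maxDepth levels deep.
// buildCommentPattern() and $maxDepth are hypothetical names for illustration.

function buildCommentPattern(int $maxDepth): string
{
    $noWsCtl    = '[\x01-\x08\x0B\x0C\x0E-\x1F\x7F]';
    $text       = '[\x01-\x09\x0B\x0C\x0E-\x7F]';
    $quotedPair = '\\\\' . $text;
    $WSP        = '[ \t]';
    $FWS        = '(?:(?:' . $WSP . '*\r\n)?' . $WSP . '+)';
    $ctext      = '(?:' . $noWsCtl . '|[\x21-\x27\x2A-\x5B\x5D-\x7E])';

    // Depth 1: the TODO version, where ccontent cannot contain a comment.
    $ccontent = '(?:' . $ctext . '|' . $quotedPair . ')';
    $comment  = '\((?:' . $FWS . '?' . $ccontent . ')*' . $FWS . '?\)';

    // Each pass lets ccontent contain the previous (one level shallower) comment.
    for ($depth = 2; $depth <= $maxDepth; $depth++) {
        $ccontent = '(?:' . $ctext . '|' . $quotedPair . '|' . $comment . ')';
        $comment  = '\((?:' . $FWS . '?' . $ccontent . ')*' . $FWS . '?\)';
    }
    return $comment;
}

$pattern = '/\A' . buildCommentPattern(3) . '\z/';
var_dump(preg_match($pattern, '(a (b (c)))'));     // int(1): three levels deep
var_dump(preg_match($pattern, '(a (b (c (d))))')); // int(0): four levels exceeds $maxDepth
```

Each extra level adds just one more copy of the previous pattern, so the regex grows linearly with $maxDepth; pick a depth that covers the comments you actually expect to see.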