In my quest to validate and parse some parts of email headers, I'm turning the ABNF syntax, i.e. this stuff:
comment = "(" *([FWS] ccontent) [FWS] ")"
into PCRE groups which I can "glue together" to make the tokens described in the RFC.
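To make the "gluing" idea concrete, here is a minimal sketch using just two RFC 2822 rules (atext and dot-atom-text). The helper name isDotAtomText is made up for illustration, and anchoring with \A and \z is my own assumption about how a finished token would be tested.

```php
<?php
// Sketch only: two RFC 2822 rules as PCRE fragments, then glued together.
// atext = ALPHA / DIGIT / "!" / "#" / "$" / ... / "~"
$atext = '[a-zA-Z0-9!#$%&\'*+\-\/=?^_`{|}~]';
// dot-atom-text = 1*atext *("." 1*atext)
$dotAtomText = $atext . '+(?:\.' . $atext . '+)*';

// Hypothetical helper: true if $s is exactly one dot-atom-text.
function isDotAtomText(string $s, string $dotAtomText): bool
{
    // \A and \z anchor the fragment to the whole subject string.
    return preg_match('/\A' . $dotAtomText . '\z/', $s) === 1;
}

var_dump(isDotAtomText('foo.bar', $dotAtomText));   // bool(true)
var_dump(isDotAtomText('foo..bar', $dotAtomText));  // bool(false)
```

Each fragment is kept unanchored and wrapped in a non-capturing group where needed, so it can be concatenated into a larger pattern without disturbing capture numbering.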
However, I've hit a big hurdle (a brick wall??): ccontent and comment are each defined in terms of the other:
ccontent = ctext / quoted-pair / comment
comment = "(" *([FWS] ccontent) [FWS] ")"

//Refer to RFC 2822 for ABNF
$noWsCtl = '[\x01-\x08\x0B\x0C\x0E-\x1F\x7F]'; //NO-WS-CTL = %d1-8 / %d11 / %d12 / %d14-31 / %d127
$text = '[\x01-\x09\x0B\x0C\x0E-\x7F]'; //text = %d1-9 / %d11 / %d12 / %d14-127
$quotedPair = '\\\\' . $text;
$atext = '[a-zA-Z0-9!#\$%&\'\*\+\-\/=\?\^_`\{\}\|~]';
$dotAtomText = $atext . '+' . '(?:\.' . $atext . '+)*'; //non-capturing, so gluing doesn't shift group numbers
$qtext = '(?:' . $noWsCtl . '|[\x21\x23-\x5B\x5D-\x7E])';
$noFoldQuote = '"(?:' . $qtext . '|' . $quotedPair . ')*?"';
$dtext = '(?:' . $noWsCtl . '|[\x21-\x5A\x5E-\x7E])';
$noFoldLiteral = '\[(?:' . $dtext . '|' . $quotedPair . ')*?\]';
$idLeft = '(?:' . $dotAtomText . '|' . $noFoldQuote . ')';
$idRight = '(?:' . $dotAtomText . '|' . $noFoldLiteral . ')';
$WSP = '[ \t]';
$CRLF = '\r\n';
$FWS = '(?:(?:' . $WSP . '*' . $CRLF . ')?' . $WSP . '+)'; //FWS = [*WSP CRLF] 1*WSP, grouped so a trailing ? applies to the whole token
$ctext = '(?:' . $noWsCtl . '|[\x21-\x27\x2A-\x5B\x5D-\x7E])';
//AGRRRAAAGGGGHHHH!!!!!
$comment = '\((?:' . $FWS . '|' . $ccontent. ')*?' . $FWS . '?\)';
$ccontent = '(?:' . $ctext . '|' . $quotedPair . '|' . $comment . ')';
//Hmm... do I remember someone mentioning a 'recurse' flag in PCRE?
EDIT | I'll mull this over and continue... for now, a handy TODO
//TODO: Make this RFC2822 compliant (support comment nesting -- e.g. add |comment)
$ccontent = '(?:' . $ctext . '|' . $quotedPair . ')';
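For the record, PCRE handles this through pattern syntax rather than a flag: a named group can call itself with (?&name), and (?(DEFINE)...) declares a group without matching anything. Below is a rough sketch of nested comments built that way. The names $define, $commentBody and isComment are hypothetical, the fragments are redefined so the block runs on its own, and the obs-* rules from the RFC are ignored.

```php
<?php
// Sketch only: nested RFC 2822 comments via PCRE subroutine recursion.
$noWsCtl    = '[\x01-\x08\x0B\x0C\x0E-\x1F\x7F]';
$text       = '[\x01-\x09\x0B\x0C\x0E-\x7F]';
$quotedPair = '\\\\' . $text;
$FWS        = '(?:(?:[ \t]*\r\n)?[ \t]+)';
$ctext      = '(?:' . $noWsCtl . '|[\x21-\x27\x2A-\x5B\x5D-\x7E])';

// ccontent calls the named group "comment" instead of embedding $comment,
// which breaks the circular dependency between the two PHP variables.
$ccontent = '(?:' . $ctext . '|' . $quotedPair . '|(?&comment))';
// comment = "(" *([FWS] ccontent) [FWS] ")"
$commentBody = '\((?:' . $FWS . '?' . $ccontent . ')*' . $FWS . '?\)';
// (?(DEFINE)...) declares the group once; it matches nothing by itself.
$define = '(?(DEFINE)(?<comment>' . $commentBody . '))';

// Hypothetical helper: true if $s is exactly one (possibly nested) comment.
function isComment(string $s, string $define): bool
{
    return preg_match('/' . $define . '\A(?&comment)\z/', $s) === 1;
}

var_dump(isComment('(a (nested) comment)', $define));  // bool(true)
var_dump(isComment('(unbalanced (comment)', $define)); // bool(false)
```

To glue this into a larger pattern, the idea would be to prepend $define once at the start and write (?&comment) wherever the grammar allows a comment, since a named group can only be defined once per pattern.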