SpamAssassin - Summary breakdown expression.

Posted: Wed Apr 25, 2007 5:16 am
by php_dev
I've got the following SpamAssassin summary that I'm trying to break down into three parts (score, rule and description):

1.8 SUBJECT_DRUG_GAP_VIA Subject contains a gappy version of '<span style='color:red;text-decoration:blink' title='Alert a moderator!'>grilled spam</span>'
-0.0 NO_RELAYS Informational: message was not relayed via SMTP
0.4 SUBJ_ALL_CAPS Subject is all capitals
1.5 DRUG_ED_CAPS BODY: Mentions an E.D. drug
0.8 BODY_ENHANCEMENT2 BODY: Information on getting larger body parts
1.5 URIBL_JP_SURBL Contains an URL listed in the JP SURBL blocklist
[URIs: hitem.hk]
0.5 URIBL_WS_SURBL Contains an URL listed in the WS SURBL blocklist
[URIs: hitem.hk]
2.0 URIBL_OB_SURBL Contains an URL listed in the OB SURBL blocklist
[URIs: hitem.hk]
0.0 DRUGS_ERECTILE Refers to an erectile drug

So far I've come up with:

[\n]*[ |-]?([0-9]*\.[0-9]*)[ ]*([a-z0-9_]*)[ ]*([a-z0-9 \':\(\)\.\%\-]*)

This matches the score, rule and description for all lines, except for the blocklist URI results, which it matches as a separate result. I need the URI result to be added to the end of the relevant description. I can't for the life of me get this to work without breaking the whole expression. Can anyone help?

I need to store the score, rule and description in a database. I've been using preg_match.
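
A minimal sketch of the folding described above, assuming the summary is available as one string and that any non-empty line that does not start with a score belongs to the previous rule. The helper name fold_summary_lines is hypothetical, and walking the text line by line is an alternative to a single expression, not something SpamAssassin or PHP prescribes:

```php
<?php
// Hypothetical helper (not from the thread): walk the summary line by line and
// fold any line that does not start a new rule into the previous description.
function fold_summary_lines(string $summary): array
{
    $rows = [];
    foreach (preg_split('/\R/', $summary) as $line) {
        // A rule line: optional sign, score, rule name, then the description.
        if (preg_match('/^\s*(-?\d+\.\d+)\s+([A-Z0-9_]+)\s+(.*)$/', $line, $m)) {
            $rows[] = ['score' => $m[1], 'rule' => $m[2], 'description' => $m[3]];
        } elseif ($rows && trim($line) !== '') {
            // Continuation such as "[URIs: hitem.hk]": append it to the last row.
            $rows[count($rows) - 1]['description'] .= ' ' . trim($line);
        }
    }
    return $rows;
}

// Two rules taken from the summary above, one with a continuation line.
$summary = "1.5 URIBL_JP_SURBL Contains an URL listed in the JP SURBL blocklist\n"
         . "[URIs: hitem.hk]\n"
         . "0.0 DRUGS_ERECTILE Refers to an erectile drug\n";

$rows = fold_summary_lines($summary);
print_r($rows);
```

Each element of $rows then carries one score, one rule and one description, with the URI line already attached to the rule it belongs to.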

Remember I'm a newbie, please be patient with me!

Thanks in advance.

Posted: Sat Apr 28, 2007 10:04 am
by feyd
This seems to work

Code:

<?php

ob_start();

?>
 1.8 SUBJECT_DRUG_GAP_VIA   Subject contains a gappy version of '<span style='color:red;text-decoration:blink' title='Alert a moderator!'>grilled spam</span>'
-0.0 NO_RELAYS              Informational: message was not relayed via SMTP
 0.4 SUBJ_ALL_CAPS          Subject is all capitals
 1.5 DRUG_ED_CAPS           BODY: Mentions an E.D. drug
 0.8 BODY_ENHANCEMENT2      BODY: Information on getting larger body parts
 1.5 URIBL_JP_SURBL         Contains an URL listed in the JP SURBL blocklist
                            [URIs: hitem.hk]
 0.5 URIBL_WS_SURBL         Contains an URL listed in the WS SURBL blocklist
                            [URIs: hitem.hk]
 2.0 URIBL_OB_SURBL         Contains an URL listed in the OB SURBL blocklist
                            [URIs: hitem.hk]
 0.0 DRUGS_ERECTILE         Refers to an erectile drug
<?php

$text = ob_get_clean();

// Capture score, rule name (digits allowed, as in BODY_ENHANCEMENT2) and description;
// lines that do not start a new "score RULE" entry are folded into the description.
preg_match_all('#^([\s-]?\d+\.\d+)\s+([A-Z0-9_]+)\s+(.*?$(?:(?:\n(?![\s-]\d+\.\d+\s+[A-Z0-9_]+).*?$)*)+)#m', $text, $matches);

print_r($matches);

?>
outputs

Code:

Array
(
    [0] => Array
        (
            [0] =>  1.8 SUBJECT_DRUG_GAP_VIA   Subject contains a gappy version of '<span style='color:red;text-decoration:blink' title='Alert a moderator!'>grilled spam</span>'
            [1] => -0.0 NO_RELAYS              Informational: message was not relayed via SMTP
            [2] =>  0.4 SUBJ_ALL_CAPS          Subject is all capitals
            [3] =>  1.5 DRUG_ED_CAPS           BODY: Mentions an E.D. drug
            [4] =>  0.8 BODY_ENHANCEMENT2      BODY: Information on getting larger body parts
            [5] =>  1.5 URIBL_JP_SURBL         Contains an URL listed in the JP SURBL blocklist
                            [URIs: hitem.hk]
            [6] =>  0.5 URIBL_WS_SURBL         Contains an URL listed in the WS SURBL blocklist
                            [URIs: hitem.hk]
            [7] =>  2.0 URIBL_OB_SURBL         Contains an URL listed in the OB SURBL blocklist
                            [URIs: hitem.hk]
            [8] =>  0.0 DRUGS_ERECTILE         Refers to an erectile drug

        )

    [1] => Array
        (
            [0] =>  1.8
            [1] => -0.0
            [2] =>  0.4
            [3] =>  1.5
            [4] =>  0.8
            [5] =>  1.5
            [6] =>  0.5
            [7] =>  2.0
            [8] =>  0.0
        )

    [2] => Array
        (
            [0] => SUBJECT_DRUG_GAP_VIA
            [1] => NO_RELAYS
            [2] => SUBJ_ALL_CAPS
            [3] => DRUG_ED_CAPS
            [4] => BODY_ENHANCEMENT2
            [5] => URIBL_JP_SURBL
            [6] => URIBL_WS_SURBL
            [7] => URIBL_OB_SURBL
            [8] => DRUGS_ERECTILE
        )

    [3] => Array
        (
            [0] => Subject contains a gappy version of '<span style='color:red;text-decoration:blink' title='Alert a moderator!'>grilled spam</span>'
            [1] => Informational: message was not relayed via SMTP
            [2] => Subject is all capitals
            [3] => BODY: Mentions an E.D. drug
            [4] => BODY: Information on getting larger body parts
            [5] => Contains an URL listed in the JP SURBL blocklist
                            [URIs: hitem.hk]
            [6] => Contains an URL listed in the WS SURBL blocklist
                            [URIs: hitem.hk]
            [7] => Contains an URL listed in the OB SURBL blocklist
                            [URIs: hitem.hk]
            [8] => Refers to an erectile drug

        )

)
Jcart | Edited post to remove smilies in regex
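
Since the goal stated in the question is to store score, rule and description in a database, it may help to reshape the result into one row per rule. The following is only a sketch: it uses a slightly simplified variant of the pattern above, assumes rule names may contain digits (as BODY_ENHANCEMENT2 does), and the $rows layout with its key names is hypothetical.

```php
<?php
// Sample input trimmed to three rules; the column padding mirrors the listing above.
$text = " 0.8 BODY_ENHANCEMENT2      BODY: Information on getting larger body parts\n"
      . " 1.5 URIBL_JP_SURBL         Contains an URL listed in the JP SURBL blocklist\n"
      . "                            [URIs: hitem.hk]\n"
      . "-0.0 NO_RELAYS              Informational: message was not relayed via SMTP\n";

// Assumption: rule names may contain digits, so the class is [A-Z0-9_].
$pattern = '#^([\s-]?\d+\.\d+)\s+([A-Z0-9_]+)\s+(.*(?:\n(?![\s-]\d+\.\d+\s+[A-Z0-9_]+).*)*)#m';

// PREG_SET_ORDER returns one array per rule instead of one array per capture group.
preg_match_all($pattern, $text, $sets, PREG_SET_ORDER);

$rows = [];
foreach ($sets as $set) {
    $rows[] = [
        'score'       => (float) trim($set[1]),
        'rule'        => $set[2],
        // Collapse the newline and indentation of folded lines into one space.
        'description' => trim(preg_replace('/\s+/', ' ', $set[3])),
    ];
}
print_r($rows);
```

Each row could then be bound to a prepared INSERT statement (for example through PDO). The table and column names are left open here because the thread does not give them.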