August 21, 2006
Spam Filtering: Making it Better
Short version:
Spammers adjust their tactics to get around spam filters. Save any spam that evades the filters in a folder called Spambox-learn so that SpamAssassin on mathserv can use the messages to update its Bayesian spam database each night.
Long version:
A program called SpamAssassin on the departmental mail server attempts to determine whether each incoming message is spam, using a combination of rules and Bayesian methods. As spammers adjust their methods and content, the database used by the Bayesian algorithms becomes out of date. One can feed SpamAssassin missed spam, allowing it to "learn" from these mistakes and so improve its ability to recognize new spam.
Each night, mathserv will now look for a folder called mail/Spambox-learn in each home directory and use its contents as input for SpamAssassin's sa-learn program.
Since sa-learn processes a given mail message only once, you don't need to delete messages from Spambox-learn.
Note that you don't need to put any already-filtered spam into Spambox-learn since SpamAssassin is already being trained on captured spam.
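For the curious, the nightly pass described above can be sketched roughly as follows. This is only an illustration, not the actual script running on mathserv: the function name build_learn_commands and the home_root argument are made up for the example, and it assumes each folder is stored as a single mbox file (hence sa-learn's --mbox option). The sketch only builds the commands; it does not run them.

```python
from pathlib import Path


def build_learn_commands(home_root, folder="mail/Spambox-learn"):
    """Return one sa-learn command (as an argument list) per user folder found.

    Hypothetical helper for illustration; the folder is assumed to be an
    mbox file, which may not match the real mail store layout.
    """
    commands = []
    for home in sorted(Path(home_root).iterdir()):
        if not home.is_dir():
            continue
        spam_folder = home / folder
        # Skip users who have not created the folder, or whose folder is empty.
        if spam_folder.is_file() and spam_folder.stat().st_size > 0:
            # --spam: learn the messages as spam; --mbox: input is an mbox file.
            commands.append(["sa-learn", "--spam", "--mbox", str(spam_folder)])
    return commands
```

Each returned list could be handed to subprocess.run by a nightly job. Because sa-learn skips messages it has already learned, running the same command again on an unchanged folder does no harm.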
And, finally, a reminder: mathserv uses SpamAssassin to flag all messages as spam or not-spam, but it does not filter spam from your inbox unless you turn filtering on. To turn mail filtering on, see Spam Filtering on the Departmental Mail Server.
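To make "flagged" concrete: in its default configuration, SpamAssassin marks a message it considers spam by adding an X-Spam-Flag: YES header. The small sketch below checks for that header. It assumes mathserv keeps the default header name, which this post does not confirm, and is_flagged_spam is a made-up name for the example.

```python
from email import message_from_string


def is_flagged_spam(raw_message):
    """Return True if the message carries SpamAssassin's default spam flag.

    Assumes the default "X-Spam-Flag: YES" header; a site can configure
    different headers, so treat this as an illustration only.
    """
    msg = message_from_string(raw_message)
    flag = msg.get("X-Spam-Flag", "")
    return flag.strip().upper() == "YES"
```

A mail filter does essentially this test and then files the message somewhere other than your inbox; until filtering is turned on, the header is present but nothing acts on it.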