[clue-talk] Triaging spam

Wed Dec 22 15:28:45 MST 2004

On 12-15 12:57, Matt Gushee wrote:
> I've been thinking about how to more effectively deal with the flood of 
> spam I get, and it seems to me that SpamAssassin's yes-or-no judgment is 
> a rather crude mechanism, and a triage approach would be better. I mean:
> 
>   Some messages are definitely spam. Send them straight to /dev/null.
> 
>   Some messages are definitely not spam. Send them to the Inbox.
> 
>   Some messages might be spam. Send them to the maybe-spam folder.

I do exactly this with procmail and SpamAssassin. I think my user_prefs sets
the spam threshold to 6.0. Then I put this in my .procmailrc:

:0:
* ^X-Spam-Flag: YES
$HOME/Mail/junk

:0:
* ^X-Spam-Level: \*\*\*\*\*
$HOME/Mail/maybejunk

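Because the first recipe already files anything at or above the threshold,
the second one only catches the grey area: messages scoring at least 5 but
below 6 (X-Spam-Level carries one asterisk per whole point). You can widen
or narrow that band by changing the number of asterisks in the second
recipe or by moving the threshold.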
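For anyone who wants the same three-way decision outside procmail, here is
a minimal Python sketch that reads the two SpamAssassin headers and picks a
folder. The function name, the folder names, and the default of five
asterisks are assumptions chosen to mirror the recipes above; procmail's own
matching rules are only approximated.

```python
from email import message_from_string

# Hypothetical folder names mirroring $HOME/Mail/junk and $HOME/Mail/maybejunk.
JUNK = "junk"
MAYBE = "maybejunk"
INBOX = "inbox"


def triage(raw_message: str, grey_stars: int = 5) -> str:
    """Choose a folder from SpamAssassin's X-Spam-Flag and X-Spam-Level headers."""
    msg = message_from_string(raw_message)

    # Recipe 1: SpamAssassin scored the message at or above its threshold.
    if msg.get("X-Spam-Flag", "").strip().upper() == "YES":
        return JUNK

    # Recipe 2: X-Spam-Level carries one '*' per whole point of score,
    # so grey_stars or more asterisks puts the message in the grey area.
    level = msg.get("X-Spam-Level", "").strip()
    if level.startswith("*" * grey_stars):
        return MAYBE

    # Everything else is treated as ham.
    return INBOX


raw = "X-Spam-Flag: NO\nX-Spam-Level: *****\n\nHello"
print(triage(raw))  # maybejunk
```

As in the .procmailrc, the flag check runs first, so the maybe-spam folder
only ever sees messages that scored below the spam threshold.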

As a side note: how often do you train SpamAssassin? I keep a large
collection of ham and spam around so I can retrain every few weeks. I
haven't had many problems with spam since I started using SpamAssassin,
especially since it added the Bayesian classifier.
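To show what "training" amounts to, here is a minimal sketch of the
bookkeeping a Bayesian filter does with saved ham and spam: for each corpus
it counts how many messages contain each token. This is not SpamAssassin's
code (SpamAssassin is trained with its own sa-learn tool); the tokenizer
regex and the names tokens and train are assumptions for illustration.

```python
import re
from collections import Counter
from typing import Iterable

# Assumption: a simple tokenizer that treats letters, digits, apostrophes,
# dashes and dollar signs as word characters, so mangled spellings such as
# "v1agra" survive as tokens of their own.
TOKEN_RE = re.compile(r"[A-Za-z0-9'$-]+")


def tokens(text: str) -> set:
    """Return the distinct tokens in one message."""
    return set(TOKEN_RE.findall(text))


def train(ham: Iterable[str], spam: Iterable[str]):
    """Count, for each corpus, how many messages contain each token.

    Returns (good_counts, bad_counts, n_good, n_bad).
    """
    good, bad = Counter(), Counter()
    n_good = n_bad = 0
    for message in ham:
        good.update(tokens(message))  # each token counted once per message
        n_good += 1
    for message in spam:
        bad.update(tokens(message))
        n_bad += 1
    return good, bad, n_good, n_bad


good, bad, n_good, n_bad = train(
    ["lunch at noon?", "meeting notes attached"],
    ["buy v1agra now", "cheap v1agra pills"],
)
print(bad["v1agra"], good["v1agra"], n_good, n_bad)  # 2 0 2 2
```

Retraining every few weeks just means rerunning this kind of counting over
the updated ham and spam piles.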

On another side note, has everyone read Paul Graham's essay "A Plan for
Spam"? It was reprinted in his book _Hackers & Painters_.

http://www.paulgraham.com/spam.html

I like his observation that statistical analysis reveals far more about
"words" like "c0ck" than he could ever get from keyword matching. Even with
his holier-than-thou attitude toward Java, I have a lot of respect for the
guy...

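I think the very fact that spammers deliberately mangle words makes a
Bayesian filter work better: a token like "c0ck" almost never appears in
legitimate mail, so it becomes a strong spam indicator. I don't know how
well these filters will cope with ASCII art, though, the newest trend from
the scum-sucking bottom feeders.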

-- 
Sean LeBlanc:seanleblanc at comcast.net  
To be poor without murmuring is difficult. To be rich without being proud is 
easy. 
-Confucius