[clue-tech] Me vs. Spam and Spamassassin

Sun Sep 21 22:26:39 MDT 2008

Jed S. Baer wrote:
> Hi Folks.
>
> Well, I'm once again looking for advice on combatting spam. Been working
> on it most of the day, and made some real headway. But ...
>
> Some really ugly sample messages are coming out with really low scores.
>
> So here's some info.
>
> Spamassassin 3.1.7
> CentOS 5
> Postfix 2.3.3-2
> procmail 3.22
>
> I doubt the versions of Postfix, procmail, or CentOS are relevant.
>
> I'm invoking SpamAssassin via procmail, directed from a .forward file.
> Seemed the easiest thing to do, rather than config postfix to run it. And
> that all works fine. When I look at mail, I see the relevant SpamAssassin
> headers, so mail is getting piped through SA just fine, and afterwards
> procmail is delivering it as specified. Hey, I even got IMAP running.
> Whheeeeeeeeee! Also, I'm not using Bayesian filtering, as the various
> docs indicate that our spam/ham ratio is too large to have that be
> useful.
>
> I've read Schwartz's SpamAssassin book, and poked around the official SA
> wiki online. Based on what I've read, the claim is that even without
> Bayesian filtering, SA should be detecting spam pretty well, using just
> its various processing rules. Without going into gory detail, I've sent a
> variety of crap through it, and the highest score I've seen is 1.4. Given
> the default for "this looks like spam" is 5, I'm surprised.
>
> Anyways, I've done almost zero mucking about with the SA local.cf file --
> just enough to keep it simple. IOW, I haven't modified any scoring
> factors.
>
> So, what are other folks' experience with SA? Does it mostly just work
> out of the box, or do you have to muck with it significantly?
>   

I haven't set it up since 2003/2004.  When I did, I remember thinking
that without Bayesian filtering I had to set the threshold pretty low to
ensure that there were no false positives.  Personally, I think you
should use Bayesian filtering so that SA learns what kind of spam you
get.  If I remember correctly, you can train it on a global basis or on
a per-user basis.  Maybe you should do both, but at different
thresholds.  What I WOULDN'T do is automatically send things SA has
already detected as spam through the learning filter.  (But I'm no
expert.)  One thing I would try is always putting spam that contains
URLs through the learner.
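
To make that training idea concrete, here is a rough Python sketch of
one possible reading of it.  Everything in it is an assumption for
illustration: the function names are made up, "confirmed_by_human" just
means somebody looked at the message and agreed it is spam, and the URL
pattern is deliberately crude.  The only SA-specific thing it relies on
is the X-Spam-Flag header that SA adds to mail it tags as spam.

```python
import re
from email import message_from_string
from email.message import Message

# Crude URL detector; an assumption for illustration, not SA's own URI parsing.
URL_RE = re.compile(r"https?://\S+", re.IGNORECASE)


def flagged_by_sa(msg: Message) -> bool:
    """True if SpamAssassin already tagged this message as spam."""
    return msg.get("X-Spam-Flag", "").strip().upper() == "YES"


def body_text(msg: Message) -> str:
    """Concatenate all text/* parts of the message into one string."""
    parts = []
    for part in msg.walk():
        if part.get_content_maintype() == "text":
            payload = part.get_payload(decode=True) or b""
            parts.append(payload.decode("latin-1", errors="replace"))
    return "\n".join(parts)


def should_train_as_spam(msg: Message, confirmed_by_human: bool) -> bool:
    """Hypothetical policy: decide whether to hand a message to the learner.

    - Never train on SA's verdict alone (needs human confirmation).
    - Confirmed spam containing a URL is always trained.
    - Other confirmed spam is trained only if SA missed it.
    """
    if not confirmed_by_human:
        return False
    if URL_RE.search(body_text(msg)):
        return True
    return not flagged_by_sa(msg)


if __name__ == "__main__":
    demo = message_from_string(
        "Subject: cheap pills\nX-Spam-Flag: YES\n\nBuy at http://example.com/x\n"
    )
    print(should_train_as_spam(demo, confirmed_by_human=True))
```

Messages this selects could then be fed to "sa-learn --spam" by hand or
from a script; I have left that step out since it depends on where your
mail is stored.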

On a side note, since I wasn't using the Bayesian filter at the time, I
also used the procmail sanitizer.  The combination of dumb SA and the
sanitizer made for an acceptable system at the time.
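
Coming back to the low scores: before changing any score values, it
might help to see which rules actually fired on the ugly samples.  SA
records that in the X-Spam-Status header.  Below is a small Python
sketch that pulls the header apart.  The sample message and the
function name are made up for illustration, and I am assuming the
usual "score=... required=... tests=..." layout of that header (older
releases wrote "hits=" instead of "score=", so it accepts both).

```python
import re
from email import message_from_string
from email.message import Message


def spam_status(msg: Message) -> dict:
    """Parse X-Spam-Status into verdict, score, threshold and rule names."""
    raw = msg.get("X-Spam-Status", "")
    # The header is usually folded over several lines; collapse whitespace.
    flat = " ".join(str(raw).split())
    is_spam = flat.split(",", 1)[0].strip().lower() == "yes"
    score = re.search(r"\b(?:score|hits)=(-?[\d.]+)", flat)
    required = re.search(r"\brequired=(-?[\d.]+)", flat)
    # Rule names run until the next "key=" field or the end of the header.
    tests = re.search(r"\btests=(.*?)(?=\s+\w+=|$)", flat)
    names = []
    if tests:
        names = [t.strip() for t in tests.group(1).split(",")]
        names = [t for t in names if t and t.lower() != "none"]
    return {
        "is_spam": is_spam,
        "score": float(score.group(1)) if score else None,
        "required": float(required.group(1)) if required else None,
        "tests": names,
    }


# Made-up sample message, only to show the header layout assumed above.
SAMPLE = (
    "From: someone@example.com\n"
    "Subject: test\n"
    "X-Spam-Status: No, score=1.4 required=5.0 tests=HTML_MESSAGE,\n"
    "\tMIME_HTML_ONLY autolearn=no version=3.1.7\n"
    "\n"
    "body\n"
)

if __name__ == "__main__":
    info = spam_status(message_from_string(SAMPLE))
    print(info["score"], info["required"], info["tests"])
```

If the list of tests is nearly empty on obvious spam, the rules are not
matching much at all, and that is a different problem from the
threshold being too high.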

Angelo