Dan Jones's Blog
Fighting Spam with SpamAssassin

By Dan Jones in Reader

Posted in Spam, Networking, Email on August 14, 2008 at 8:08 am


Well, after many years with no anti-spam measures at all (and manually deleting ~200 items a day), I decided it was time to move my mail host and put some anti-spam technology in place.

I already have a home Samba server running Debian, which also acts as a mini desktop. I decided to use it for mail, as my volume isn’t huge: I get ~20 valid emails a day and ~200-500 spams, depending on the day of the week.

SpamAssassin looked to be the premier anti-spam solution for Linux, so I chose a Debian Exim integration. Exim took a while to learn, but I’m now mostly impressed with its configuration. I’ve used Dovecot as the IMAP server. All of these are the standard Debian stable packages.

My basic procedure was to install the packages, follow this guide to get a basic system up and running, and then point a “test” domain’s inbound SMTP at the box so I could fully test all the options and tune the anti-spam setup.

Tricks the above guide missed:

Using CPAN (perl -MCPAN -e shell) to install Net::DNS. Without this vital step SpamAssassin skipped ALL of its DNS-based tests, which are very useful for scoring.
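
One way to confirm that Net::DNS is actually visible to Perl, and therefore to SpamAssassin, is to try loading it: perl -MNet::DNS -e 1 exits with status 0 only if the module loads. The minimal Python sketch below wraps that check. The function name perl_module_available is hypothetical, and the sketch assumes perl is on the PATH (it reports False if it isn’t). It only shows that the module loads, not that SpamAssassin’s DNS tests are switched on.

```python
import subprocess


def perl_module_available(module: str) -> bool:
    """Return True if `perl -M<module> -e 1` exits successfully.

    Hypothetical helper: it only shows that Perl can load the module,
    not that SpamAssassin's DNS tests are enabled or working.
    """
    try:
        result = subprocess.run(
            ["perl", f"-M{module}", "-e", "1"],
            capture_output=True,  # hide perl's "Can't locate ..." message
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        # perl is missing from PATH, could not be started, or hung.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print("Net::DNS available:", perl_module_available("Net::DNS"))
```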

Bayesian filtering.

  • Set this up to use a system-wide database (bayes_path), in a folder you control with world read/write access (bayes_file_mode). The default, a per-user database under each user’s ~/.spamassassin, isn’t right for this setup.
  • You may wish to increase the default size of the Bayes database (bayes_expiry_max_db_size). I increased mine tenfold.
  • It seems to need 200 spams and 200 non-spams to be learnt before it becomes active; at first I did not realise this. I fed Bayes a folder of 2000 spams and let it read my (already spam-filtered) archive of personal mail as non-spam (3400 items). This trained the filter quite well. I used a variation of this script.
  • If you run sa-learn with -D (debug), it tends to reveal faults in your SA config.
  • Increasing the score of BAYES_99 gave better results, for me at least.
  • I’ve set up a learn-as-spam folder in my mailbox, which is learnt and then emptied every 6 hours (i.e. any mail that makes it through SA, I drag to this folder).
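
The training routine above can be scripted. What follows is a minimal Python sketch, not the script mentioned above. It assumes sa-learn’s --spam, --ham and -D (debug) options, that sa-learn accepts a Maildir folder path directly (if yours doesn’t, point it at the folder’s cur and new subdirectories), and that sa-learn --dump magic prints lines ending in “non-token data: nspam” and “non-token data: nham” with the count in the third column. The 200/200 threshold mirrors the minimums described above. All function names are hypothetical; check the options against your installed version before relying on this.

```python
import mailbox
import subprocess
from typing import Callable, Dict, List

# Learning minimums as described in the post (200 spam, 200 ham);
# treated here as an assumption matching SpamAssassin's usual defaults.
MIN_SPAM = 200
MIN_HAM = 200


def build_learn_command(kind: str, path: str, debug: bool = False) -> List[str]:
    """Build an sa-learn command that learns `path` as 'spam' or 'ham'."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    cmd = ["sa-learn"]
    if debug:
        cmd.append("-D")  # debug output tends to reveal config faults
    cmd += [f"--{kind}", path]
    return cmd


def parse_magic_counts(dump: str) -> Dict[str, int]:
    """Extract nspam/nham from `sa-learn --dump magic` output (assumed format)."""
    counts: Dict[str, int] = {}
    for line in dump.splitlines():
        fields = line.split()
        # Assumed line shape: "0.000  0  2000  0  non-token data: nspam"
        if len(fields) >= 3 and fields[-1] in ("nspam", "nham"):
            counts[fields[-1]] = int(fields[2])
    return counts


def bayes_ready(counts: Dict[str, int]) -> bool:
    """True once enough spam and ham have been learnt for Bayes to score."""
    return counts.get("nspam", 0) >= MIN_SPAM and counts.get("nham", 0) >= MIN_HAM


def _run(cmd: List[str]) -> int:
    """Run a command and return its exit status."""
    return subprocess.run(cmd).returncode


def learn_and_purge(maildir_path: str,
                    run: Callable[[List[str]], int] = _run) -> int:
    """Learn a learn-as-spam Maildir folder as spam, then empty it.

    Intended to be run periodically (e.g. from cron every 6 hours).
    Messages are deleted only if sa-learn exits successfully.
    Returns the number of messages deleted.
    """
    if run(build_learn_command("spam", maildir_path)) != 0:
        return 0
    box = mailbox.Maildir(maildir_path, create=False)
    keys = list(box.keys())
    for key in keys:
        box.remove(key)
    return len(keys)
```

Deleting only after sa-learn succeeds means a failed run leaves the folder intact for the next attempt. One caveat: mail dragged into the folder while sa-learn is running would be removed without being learnt.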

Setting SpamAssassin up is NOT easy, and it takes a lot of tinkering to get it running the way you want (hence my playing with a test domain). Once it’s done, however, it’s a brilliant system, in my opinion at least.

Now that it’s up and running, only 4 spams have reached my mailbox. I’m still storing all spam for now; the aim is to stop storing very high-scoring spam in future and keep only the “uncertain” results. But right now, with ~5000 spams not hitting my mailbox, I’m a happy bunny.
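
Keeping only the “uncertain” results amounts to banding messages by the score SpamAssassin assigns. SpamAssassin 3.x records that score in its X-Spam-Status header (e.g. “Yes, score=12.3 required=5.0 tests=...”); an Exim exiscan setup may instead add its own headers (for example built from $spam_score), in which case the parsing would need adjusting. The minimal Python sketch below parses the header and picks a destination. The 15.0 discard threshold and the destination names are hypothetical values for illustration, and where the decision is actually enforced is left open.

```python
import re
from typing import Optional, Tuple

# Hypothetical threshold: scores at or above this are not stored at all.
DISCARD_AT = 15.0

# Matches e.g. "score=12.3 required=5.0" inside an X-Spam-Status header.
_SCORE_RE = re.compile(r"score=(-?\d+(?:\.\d+)?)\s+required=(-?\d+(?:\.\d+)?)")


def parse_spam_status(header: str) -> Optional[Tuple[float, float]]:
    """Return (score, required) from an X-Spam-Status value, or None."""
    match = _SCORE_RE.search(header)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


def route(header: str, discard_at: float = DISCARD_AT) -> str:
    """Pick 'inbox', 'quarantine' or 'discard' from the spam score."""
    parsed = parse_spam_status(header)
    if parsed is None:
        return "inbox"       # no score found: deliver normally
    score, required = parsed
    if score >= discard_at:
        return "discard"     # very high scorer: don't keep it
    if score >= required:
        return "quarantine"  # spam, but "uncertain": keep for review
    return "inbox"
```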

I believe SpamAssassin is also available in a Windows version. For Exchange users with nothing else, it may be worth a look.


Comments

Comment by Philip - August 14, 2008 on 3:43 pm

I look forward to trying SA. Nice to see it has Bayesian filtering like SpamBully, which I have been using on the desktop for a few years.

Comment by Matt - August 15, 2008 on 2:47 pm

SpamAssassin is also available as a Windows POP3 proxy: http://sawin32.sourceforge.net/

SpamPal is another possibility (http://www.spampal.org/), though development has rather stalled.

There seems to be a split in approach among spam filters: some, such as SpamPal, place high importance on DNSBLs, while others leave DNSBLs out altogether unless you choose to add them.

One warning about DNSBLs: they also divide into roughly 3 categories:
1. Automatic, spamtrap-based, easy-on/easy-off. The CBL (or the Spamhaus XBL), http://cbl.abuseat.org/faq.html, is a good example.
2. Actively managed lists. SpamCop is a prime example.
3. Lists intended for personal use, but made publicly available.

The problem with using a type 3 list is that its single administrator will often be making value judgements about quite large netblocks, e.g. “x” is a spammy ISP, or freemail providers (including Gmail) are undesirable.
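
For context on how a DNSBL is consulted: the receiving side reverses the octets of the connecting IPv4 address, appends the list’s DNS zone, and looks up the result. An answer, conventionally an address in 127.0.0.0/8, means “listed”; NXDOMAIN means “not listed”. The minimal Python sketch below shows that lookup with the standard-library resolver. The zone dnsbl.example.org used in the test is hypothetical, lookup failures are simply treated as “not listed”, and each real list’s usage terms should be checked before querying it.

```python
import ipaddress
import socket
from typing import Callable, Optional


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNSBL lookup name: reversed IPv4 octets plus the list's zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for non-IPv4 input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def dnsbl_listed(ip: str, zone: str,
                 resolve: Callable[[str], str] = socket.gethostbyname) -> Optional[str]:
    """Return the list's answer (e.g. '127.0.0.2') if listed, else None."""
    try:
        answer = resolve(dnsbl_query_name(ip, zone))
    except OSError:
        return None  # NXDOMAIN or any lookup failure: treat as not listed
    # Only answers inside 127.0.0.0/8 are treated as listings.
    if ipaddress.IPv4Address(answer) in ipaddress.IPv4Network("127.0.0.0/8"):
        return answer
    return None
```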

Country blocking is another moderately effective slicing method, if you expect never to have genuine correspondence from the common offenders.

I’m not a great fan of Bayes - anti-bayes junk is too common.

For body filtering, another quite useful filter is a Chinese IP filter: if the website linked in the body is hosted on a Chinese IP (and you have no correspondents with websites there), the message is pretty much 100% spam. Chinese hosts don’t seem to be very fussy about who they host.
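
The body-URL check described above comes down to three steps: extract the hostnames of links in the body, resolve them, and test whether any address falls inside networks you have chosen to distrust. The minimal Python sketch below shows that shape only. The network list is a hypothetical placeholder using documentation ranges, not real country allocations; a working filter would need a maintained country-to-IP database, and it costs one DNS lookup per linked host.

```python
import ipaddress
import re
import socket
from typing import Callable, Iterable, List, Set
from urllib.parse import urlsplit

# Hypothetical placeholders (documentation ranges), standing in for the
# country allocations you have decided to distrust.
DISTRUSTED = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

# Rough matcher for http(s) links in a plain-text body.
_URL_RE = re.compile(r"https?://[^\s\"'<>()]+", re.IGNORECASE)


def body_hosts(body: str) -> Set[str]:
    """Collect the hostnames of http(s) links found in a message body."""
    hosts: Set[str] = set()
    for url in _URL_RE.findall(body):
        try:
            host = urlsplit(url).hostname
        except ValueError:  # e.g. a malformed "[...]" IPv6 literal
            continue
        if host:
            hosts.add(host)
    return hosts


def resolve_ipv4(host: str) -> List[str]:
    """Resolve a hostname to its IPv4 addresses; empty list on failure."""
    try:
        return socket.gethostbyname_ex(host)[2]
    except OSError:
        return []


def links_to_distrusted(body: str,
                        networks: Iterable = DISTRUSTED,
                        resolve: Callable[[str], List[str]] = resolve_ipv4) -> bool:
    """True if any linked host resolves into one of the given networks."""
    nets = list(networks)
    for host in body_hosts(body):
        for ip in resolve(host):
            addr = ipaddress.ip_address(ip)
            if any(addr in net for net in nets):
                return True
    return False
```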

Pingback by dig, dig and dig » Fighting spam with spamassassin - August 21, 2009 on 3:51 pm

[…] http://www.itpro.co.uk/blogs/danj/2008/08/14/fighting-spam-with-spamassassin/ […]

