Fighting Spam with Spamassassin
Posted in Spam, Networking, Email on August 14, 2008 at 8:08 am
After many years with no anti-spam measures at all (and manually deleting ~200 items a day), I decided it was time to move my mail host and set up proper spam filtering.
I already have a home Samba server running Debian, which also acts as a mini desktop. I decided to use it because my mail volume isn’t huge: I get ~20 valid emails a day and ~200-500 spams, depending on the day of the week.
SpamAssassin looked to be the premier anti-spam solution for Linux, and I chose to integrate it with Exim on Debian. It took a while to learn Exim, but I’m now mostly impressed with its configuration. I’ve used Dovecot as an IMAP server. All of these are the standard Debian stable packages.
The basic procedure was: I installed the packages, then followed this guide to get a basic system up and running, and moved a “test” domain name to point inbound SMTP at the box so I could fully test all the options and tune the anti-spam.
Tricks the above guide missed:
Using CPAN (perl -MCPAN -e shell) to install Net::DNS. Without this vital step SpamAssassin skipped ALL of its DNS tests, which are quite good for scoring.
Bayesian filtering.
- Set this up to use a system-wide database, in a folder you control with world read/write access. The default (a per-user database under ~/.spamassassin) isn’t right for a setup like this.
- You may wish to increase the default size of the Bayes database. I increased mine tenfold.
- It seems to require 200 spams and 200 non-spams to be learnt before it’s operational; at first I did not realise this. I fed Bayes a folder of 2000 spams, and let it read my (already filtered of spam) archive of personal mails as non-spam (3400 items). This trained the spam filter quite well. I used a variation of this script.
- Running sa-learn with -D (debug) tends to show up faults in your SA config.
- Increasing the score of BAYES_99 gives better results, for me at least.
- I’ve set up a learn-as-spam folder in my mailbox, which is learnt and emptied every 6 hours (i.e. mails that make it through SA I drag to this folder).
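To make the training step a bit more concrete, here is a minimal Python sketch that builds the sa-learn command lines for a spam folder and a non-spam archive. It is not the script I used; the function name and both paths are made up for illustration, and it only relies on the standard sa-learn options --spam, --ham, --mbox and -D.

```python
import shlex


def sa_learn_command(kind, path, mbox=False, debug=False):
    """Build an sa-learn command line for training Bayes on one folder.

    kind  -- "spam" or "ham" (non-spam)
    path  -- folder or mbox file to learn from (hypothetical paths below)
    mbox  -- pass --mbox when the path is a single mbox file
    debug -- pass -D to get debug output, handy for spotting config faults
    """
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    cmd = ["sa-learn"]
    if debug:
        cmd.append("-D")
    cmd.append("--" + kind)
    if mbox:
        cmd.append("--mbox")
    cmd.append(path)
    return cmd


# Hypothetical locations: substitute your own spam folder and mail archive.
commands = [
    sa_learn_command("spam", "/path/to/spam-folder"),
    sa_learn_command("ham", "/path/to/archive.mbox", mbox=True),
]

for cmd in commands:
    # Print rather than execute, so the sketch is safe to run anywhere.
    print(shlex.join(cmd))
```

To actually run the training, each list could be handed to subprocess.run(cmd, check=True) as a user with write access to the Bayes database; a cron job every 6 hours would cover the learn-as-spam folder.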
Setting SpamAssassin up is NOT easy, and requires a lot of tinkering to get running as you want (hence my playing with a test domain). Once complete, however, it’s a brilliant system, in my opinion at least.
Now it’s up and running, only 4 spams have hit my mailbox (though I’m still storing all spam; the aim is to stop storing very high scoring spams in future and keep only “uncertain” results). Right now, with ~5000 spams not hitting my mailbox, I’m a happy bunny.
SpamAssassin is also available in a Windows version, I believe. For Exchange users with nothing else it may be worth a look.
Comment by Philip - August 14, 2008 on 3:43 pm
I look forward to trying SA. Nice to see it has Bayesian filtering like SpamBully which I have been using on the desktop for a few years.
Comment by - August 15, 2008 on 2:47 pm
SpamAssassin is also available as a Windows POP3 proxy - http://sawin32.sourceforge.net/
Spampal is another possibility - http://www.spampal.org/ - though development did rather stall.
There seems to be a split in approach between spam filters: some, such as Spampal, place high importance on DNSBLs, while others leave DNSBLs out altogether unless you choose to add them.
One warning about DNSBLs: they divide into about 3 categories:
1. Automatic, spamtrap-based, easy-on / easy-off. The CBL (or the Spamhaus XBL) - http://cbl.abuseat.org/faq.html - is a good example.
2. Actively managed lists, Spamcop is a prime example.
3. Lists intended for personal use, but made available.
The problem with using a type 3 list is that its single administrator will often be making value judgements about quite large netblocks, e.g. “x” is a spammy ISP, or freemails (including Gmail) are undesirable.
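For anyone unsure what a DNSBL check actually involves, here is a minimal Python sketch. The sender's IPv4 octets are reversed and prepended to the list's zone, and the resulting name is looked up in DNS: an answer (conventionally in 127.0.0.0/8) means listed, no such name means not listed. The zone dnsbl.example.org and both function names are placeholders, not a real list.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Return the DNS name to query for an IPv4 address on a DNSBL zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip, zone):
    """Look the address up on the list (needs network access).

    A successful lookup means the address is listed; a resolver
    error is treated here as "not listed".
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False


# 192.0.2.99 is a documentation address; the zone is hypothetical.
print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
```

Whichever category of list you pick, the mechanics are the same; only the policy behind what gets listed differs.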
Country blocking is another moderately effective slicing method, if you expect never to have genuine correspondence from the common offenders.
I’m not a great fan of Bayes - anti-bayes junk is too common.
For body filtering, another quite useful filter is the Chinese IP filter: if the website in the body has a Chinese IP then (unless you have correspondents with websites there) it is pretty much 100% spam. Chinese hosts don’t seem to be very fussy.

