2006-11-09

a real plan for spam

Spam sucks. It especially sucks for me because:


  • I can't punt by using gmail or hosted gmail--I need to host phauna.org email myself for reasons not related to spam.


  • As a bit of an idealist, I like to think I shouldn't have to hide my email address from the world, so it's posted in a number of places around the internet.



Fortunately, there are decent solutions for those willing to spend the effort. I've recently spent the effort, and decided to let the rest of the world know how I did it.

I've divided the report into big-picture theory and the specifics of my implementation on a 64-bit Gentoo Linux server running exim.



Theory
The basic theory is that it's better to reject spam at SMTP time. There are some excellent arguments (and excellent information) here, and my take is this: if you accept a message that's spam, you have the following two choices after the fact.


  • Store it in a spam mailbox and periodically scan it by hand to make sure nothing was misclassified. This is error-prone, tedious, and can introduce long delays in legitimate mail if you're lazy like me. It becomes more or less impossible once you get north of 1000 spams a day. I get approximately 300 spams a day, and until very recently I took the manual scan approach. A few weeks ago it started to fail: I was losing legitimate emails that were misclassified first by my spam checker, then by me.


  • Delete it. This obviously can't work unless you trust your spam filter to have a 0% false positive rate. The only filter I know of that has that rate is cat.



If, however, you are wise in the ways of spam and reject the message at SMTP time, you can send a cunning reject message that allows a human to get through to you if their mail was misclassified. The reject message simply contains a url to a web page where they can prove their human-ness and then add their email address to a whitelist.

The biggest drawback with this approach is legitimate automated email, such as your Delta ticket confirmation or your Amazon receipt. If one of these is misclassified as spam, you are probably going to lose it, because Amazon and Delta aren't going to read any reject messages. Whitelisting could help a little here, but that's tedious and error-prone. I currently don't have a solution for this part of the problem, so if you do, please let me know.

Practice: installation and configuration on 64-bit gentoo

(I assume you already have exim installed, configured, and running.)

step 1: install sa-exim
For me this was a bit of a nuisance to get on gentoo, because at the time of this writing, there's no ebuild in portage. Fortunately, there's one on the bug tracker. Download this puppy to your portage overlay and try to emerge it. I'm running an em64t (Intel's amd64 clone) CPU, and I had to add -fpic to the CFLAGS in the ebuild.

step 2: install spamassassin
easy: emerge spamassassin

step 3: configure spamassassin
Changing spamassassin's configuration at /etc/spamassassin/local.cf is strictly optional.

For sa-exim to work properly, you need to run spamd as mail instead of the default, which is root. To do this:


  • add -u mail to SPAMD_OPTS in /etc/conf.d/spamd


  • change PIDFILE to /var/run/spamd/spamd.pid in /etc/conf.d/spamd


  • chown mail /var/run/spamd


  • mkdir /var/spool/mail/.spamassassin


  • chown mail:mail /var/spool/mail/.spamassassin
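
Taken together, the two edits to /etc/conf.d/spamd might end up looking roughly like the fragment below. It is only a sketch. The file is plain shell variable assignments read by the init script, and the options shown before -u mail are an assumption about what a stock file contains; keep whatever your file already has and just append -u mail.

```shell
# /etc/conf.d/spamd -- sketch of the two settings changed above.
# The options before "-u mail" are assumed defaults; keep your own.
SPAMD_OPTS="-m 5 -c -H -u mail"

# Put the pid file in a directory the mail user can write to.
PIDFILE="/var/run/spamd/spamd.pid"
```

The chown and mkdir steps from the list still have to be run by hand as root; they are not part of this file.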


You should also start spamd on bootup:
rc-update add spamd default

step 4: configure sa-exim

(Note: from here on out, nothing is gentoo-specific.)

The minimum configuration changes to make in /etc/exim/sa-exim.conf are:

  • comment out SAEximRunCond: 0

  • add your custom reject message (including the URL of your whitelist page) to SAmsgpermrej
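
As a rough illustration, the two changes could look like the fragment below. The wording of the reject message and the example.org URL are placeholders invented for this sketch; substitute your own text and the real address of your whitelist page.

```
# /etc/exim/sa-exim.conf (fragment)

# Commented out so that sa-exim actually runs:
#SAEximRunCond: 0

# Text returned to the sender when a message is rejected as spam.
# The URL is a placeholder, not a real page.
SAmsgpermrej: Your message was classified as spam. If you are a human, please add your address at http://example.org/whitelist and send it again.
```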



step 5: configure exim
There are a number of cases where you will want to be sure you accept mail regardless of its spamminess. One such case is your whitelist. Another is the postmaster and abuse addresses, which you're not allowed to reject according to the RFCs. To tell sa-exim not to reject a message, you add X-SA-Do-Not-Rej: Yes to the headers. Here are the relevant parts of /etc/exim/exim.conf:


# to enable sa-exim:
local_scan_path = /usr/lib/exim/local_scan/sa-exim.so

# define the whitelist
addresslist whitelist_senders = lsearch:/etc/exim/whitelist

...

acl_check_rcpt:
  # accept if the source is local SMTP (didn't come from the internet)
  accept  hosts = :

  # never reject postmaster or abuse:
  warn    message = X-SA-Do-Not-Rej: Yes
          local_parts = postmaster:abuse

  # mark mail from whitelisted senders so sa-exim won't reject it
  warn    message = X-SA-Do-Not-Rej: Yes
          senders = +whitelist_senders

...


step 6: the whitelist
All you need to do now is choose a method of getting email addresses into your whitelist. I made a little web page that uses Gigoit's human authentication. After ensuring the user is a human, it adds the specified email address to the whitelist via some simple PHP scripting.
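
The last step of such a page, appending an address to the whitelist file, can be sketched in shell rather than PHP. This is not the script behind my page: the function name add_to_whitelist and the loose sanity check are made up for illustration, and the only thing taken from the setup above is the file path. An lsearch file is just one key per line, so a bare address on its own line is enough.

```shell
#!/bin/sh
# Hypothetical helper: append one address to an lsearch-style whitelist file.
# WHITELIST defaults to the path used in exim.conf above; override it to test.
WHITELIST="${WHITELIST:-/etc/exim/whitelist}"

add_to_whitelist() {
    addr=$1
    # Loose sanity check only; this is not a full address validator.
    case "$addr" in
        *[!A-Za-z0-9@._+-]*) echo "rejected: $addr" >&2; return 1 ;;  # odd character
        *@*@*)               echo "rejected: $addr" >&2; return 1 ;;  # more than one @
        ?*@?*.?*)            ;;                                       # local@domain.tld
        *)                   echo "rejected: $addr" >&2; return 1 ;;
    esac
    # Append only if the exact line is not already there
    # (-x whole line, -F fixed string, -q quiet).
    if ! grep -qxF -- "$addr" "$WHITELIST" 2>/dev/null; then
        printf '%s\n' "$addr" >> "$WHITELIST"
    fi
}
```

Calling add_to_whitelist alice@example.org twice leaves a single line in the file. Whatever runs this needs write access to the whitelist file, and the check should only happen after the human test has passed.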

That's basically it. You should also consider using sa-learn to teach spamassassin which messages are good and which ones are spam; otherwise you'll probably get a lot of false negatives. Also, for testing, I found the following information extremely helpful: modern spamassassin installations recognize GTUBE (the Generic Test for Unsolicited Bulk Email), a test string that forces your spam filter to treat a message as spam. Finally, you probably want to periodically clean out your whitelist file.
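
For the GTUBE test, a small shell function that prints a complete test message is handy. The sketch below assumes nothing beyond a POSIX shell; the function name make_gtube_message and the example.com and example.org addresses are placeholders, while the 68-character line in the body is the standard GTUBE string.

```shell
#!/bin/sh
# Hypothetical helper: print a minimal message whose body is the GTUBE string.
# The quoted 'EOF' keeps the shell from touching anything inside the heredoc.
make_gtube_message() {
    cat <<'EOF'
From: sender@example.com
To: you@example.org
Subject: GTUBE test

XJS*C4JDBQADN1.NSBN3*2IDNEN*GTUBE-STANDARD-ANTI-UBE-TEST-EMAIL*C.34X
EOF
}

# Possible use, if spamassassin is installed (left commented out):
#   make_gtube_message | spamassassin -t
```

Piping the output into spamassassin -t should show the GTUBE rule firing in the report. Sending the same body from an outside host is what exercises the whole SMTP-time reject path, including your custom reject message.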
