SpamAssassin explained


SpamAssassin is a computer program, released under the Apache License 2.0, that is used for e-mail spam filtering. It is based on content-matching rules and also supports DNS-based, checksum-based and statistical filtering, which rely on external programs and online databases.

SpamAssassin is generally regarded as one of the most effective spam filters, especially when used in combination with spam databases. While simple text matching alone may be sufficient for most users to classify the majority of incoming mail correctly, combining comparisons of words and symbols with information about the sources of spam is complex enough that it may far exceed the average user's capability. For instance, graphic-only spam messages have no text to compare against, so checking the sender's originating mail server and any included links against various databases of known e-mail abusers helps prevent unnecessary or non-personal mail from getting through to the end user.

History


SpamAssassin was created by Justin Mason, who had maintained a number of patches against an earlier program named filter.plx by Mark Jeftovic, which was itself begun in August 1997. Mason rewrote all of Jeftovic's code from scratch and uploaded the resulting codebase to SourceForge.net on April 20, 2001.

Methods of usage


SpamAssassin is a Perl-based application (Mail::SpamAssassin in CPAN) which is usually used to filter all incoming mail for one or several users. It can be run as a standalone application or as a client (spamc) that communicates with a daemon (spamd). The latter mode of operation has performance benefits, but under certain circumstances may introduce additional security risks.

Typically, either variant of the application is set up in a generic mail filter program, or it is called directly from a mail user agent that supports this, whenever new mail arrives. Mail filter programs such as procmail can be made to pipe all incoming mail through SpamAssassin with an adjustment to the user's .procmailrc file.

Anti-spam techniques


SpamAssassin (usually hosted on Linux servers) comes with a large set of rules that are applied to determine whether an email is spam. Typically, specific fields in the email header and the email body are searched for certain regular expressions. If an expression matches, the email is assigned a score that depends on the test, and several (customizable) headers are added to the mail. The total score resulting from all tests and other criteria can then be used by the end user or by the ISP to set the conditions under which email is moved to a separate spam folder, deleted, flagged, and so on.

Each test has a label and a description. The label is usually an all-upper-case identifier with words separated by underscores, such as "LIMITED_TIME_ONLY", whose description is "Offers a limited time offer". A mail that fails that test (in this case, one that contains certain variants of the "limited time only" phrase) might be assigned a score of +0.3. With a spam threshold of 5 (the default as of SpamAssassin version 2.55), several other tests would usually have to fail as well before the mail is classified as spam. On the other hand, some tests, such as those for invalid message IDs or years, assign a very high score, so that even a single test can almost put a mail "over the edge".
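
The additive scoring described above can be sketched in a few lines of Python. This is a simplified model, not SpamAssassin's actual rule engine: only the LIMITED_TIME_ONLY label, its description, the +0.3 score and the threshold of 5 come from the text; the two EXAMPLE_ rules, all regular expressions, and the choice to treat a score that reaches the threshold as spam are assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    label: str
    description: str
    pattern: re.Pattern
    score: float


# Only LIMITED_TIME_ONLY is taken from the article; its regex and the two
# EXAMPLE_ rules are hypothetical and exist only to make the sketch runnable.
RULES = [
    Rule("LIMITED_TIME_ONLY", "Offers a limited time offer",
         re.compile(r"limited[\s-]+time[\s-]+only", re.IGNORECASE), 0.3),
    Rule("EXAMPLE_FREE_MONEY", "Hypothetical: mentions free money",
         re.compile(r"free\s+money", re.IGNORECASE), 2.5),
    Rule("EXAMPLE_SHOUTING_SUBJECT", "Hypothetical: subject is all upper case",
         re.compile(r"^Subject: [A-Z !]{10,}$", re.MULTILINE), 2.4),
]

SPAM_THRESHOLD = 5.0


def evaluate(message, rules=RULES, threshold=SPAM_THRESHOLD):
    """Return (total score, labels of matching rules, spam verdict)."""
    hits = [rule for rule in rules if rule.pattern.search(message)]
    total = round(sum(rule.score for rule in hits), 2)
    # Assumption: a total that reaches the threshold counts as spam.
    return total, [rule.label for rule in hits], total >= threshold


mild = "Subject: hello\n\nThis offer is available for a limited time only."
loud = "Subject: ACT NOW WINNER!!\n\nFree money, for a limited time only!"
mild_result = evaluate(mild)
loud_result = evaluate(loud)
```

In this toy rule set, a single low-scoring match leaves the message well under the threshold, while several matches together push it over, which mirrors the behaviour the paragraph describes.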

When a mail's total score is higher than the "required_score" setting in SpamAssassin's configuration, the mail is treated as spam and rewritten according to several options. In the default configuration, the content of the mail is attached as a MIME attachment, and the message body contains a brief excerpt together with a description of the tests that resulted in the mail being classified as spam. If the score is lower than the threshold, information about the matched tests and the total score is, by default, still added to the email headers and can be used in post-processing for less severe actions, such as tagging the mail as suspicious.
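
As an illustration of such post-processing, the following Python sketch reads the status header from a filtered message and flags mail that is below the spam threshold but still scores noticeably. It assumes a header of the form "X-Spam-Status: No, score=3.1 required=5.0 tests=..." as written by recent SpamAssassin versions in their default configuration; header names and fields are customizable and differ between versions, so the layout is an assumption. The function name, the "suspicious_at" cutoff and the EXAMPLE_RULE label are hypothetical.

```python
import re
from email import message_from_string


def parse_spam_status(raw_message, suspicious_at=3.0):
    """Extract verdict, score and test names from an X-Spam-Status header.

    Assumes the 'Yes|No, score=N required=N tests=A,B' layout; returns None
    when the header is absent. suspicious_at is a hypothetical local cutoff.
    """
    msg = message_from_string(raw_message)
    status = msg.get("X-Spam-Status")
    if status is None:
        return None
    # Folded headers may break the test list after a comma; rejoin it.
    status = re.sub(r",\s+", ",", status)
    is_spam = status.split(",", 1)[0].strip().lower() == "yes"
    fields = dict(re.findall(r"(\w+)=(\S+)", status))
    score = float(fields["score"]) if "score" in fields else None
    tests = [t for t in fields.get("tests", "").split(",") if t and t != "none"]
    suspicious = (not is_spam) and score is not None and score >= suspicious_at
    return {"is_spam": is_spam, "score": score, "tests": tests,
            "suspicious": suspicious}


sample = (
    "From: sender@example.com\n"
    "X-Spam-Status: No, score=3.1 required=5.0 tests=LIMITED_TIME_ONLY,\n"
    "\tEXAMPLE_RULE autolearn=no\n"
    "Subject: hi\n"
    "\n"
    "body\n"
)
info = parse_spam_status(sample)
```

A delivery script could use the "suspicious" flag to tag the subject line or file the message in a review folder instead of the inbox.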

The user can customize these filters using a file "user_prefs" in their home directory. Within this file, they can specify individuals whose emails are never considered spam, or change the scores for certain rules. The user can also define a list of languages which they want to receive mail in, and SpamAssassin then assigns a higher score to all mails that appear to be written in another language. This can be very useful to users receiving a lot of foreign spam but never actually corresponding with people in that language.

Network-based filtering methods


To help tell 'ham' from 'spam', SpamAssassin also supports:
  • DNS-based blackhole lists
  • URI blacklists such as SURBL or URIBL.com, which track spam websites
  • checksum-based filters such as the Distributed Checksum Clearinghouse, Vipul's Razor and the Cloudmark Authority plug-in (commercial)
  • Hashcash
  • Sender Policy Framework
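
To make the first item concrete: a DNS-based blackhole list is conventionally queried by reversing the octets of the sending server's IPv4 address and looking the result up as a hostname under the list's DNS zone. The Python sketch below only builds that query name and performs no network access. The function name and the zone "dnsbl.example.org" are hypothetical, and this is not SpamAssassin's own lookup code.

```python
import ipaddress


def dnsbl_query_name(ip, zone):
    """Build the hostname used to ask a DNSBL zone about an IPv4 address."""
    addr = ipaddress.ip_address(ip)  # raises ValueError on malformed input
    if addr.version != 4:
        raise ValueError("this sketch handles IPv4 addresses only")
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


# 192.0.2.99 is a documentation address; the zone is a placeholder.
query = dnsbl_query_name("192.0.2.99", "dnsbl.example.org")
```

By convention, a listed address resolves to an answer in the 127.0.0.0/8 range, while an unlisted one produces a "no such name" response; the meaning of individual answer values depends on the list operator.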

More methods can be added reasonably easily by writing a Perl plug-in for SpamAssassin.

Bayesian filtering


SpamAssassin by default tries to reinforce its own rules through Bayesian filtering, but Bayesian learning is most effective with actual user input. Typically, the user is expected to "feed" example spam mails and example "ham" (useful) mails to the filter, which can then learn the difference between the two. For this purpose, SpamAssassin provides the command-line tool sa-learn, which can be instructed to learn a single mail or an entire mailbox as either ham or spam.

Typically, the user will move unrecognized spam to a separate folder for a while, and then run sa-learn on the folder of non-spam and on the folder of spam separately. Alternatively, if the mail user agent supports it, sa-learn can be called for individual emails. Regardless of the method used to perform the learning, SpamAssassin's Bayesian test will subsequently assign a higher score to e-mails that are similar to previously received spam (or, more precisely, to those emails that are different from non-spam in ways similar to previously received spam e-mails).
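
The learning step can be illustrated with a toy naive Bayes classifier in Python. This is a teaching sketch only: it is not the algorithm SpamAssassin's Bayesian test uses, and the class name, tokenizer and smoothing constants are all assumptions. It does show the idea from the paragraph above, namely that tokens seen mostly in learned spam push a new message's score up, while tokens seen mostly in learned ham pull it down.

```python
import math
import re
from collections import Counter


class ToyBayes:
    """Minimal naive Bayes over token presence (illustrative only)."""

    def __init__(self):
        self.token_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        # Each distinct token counts once per message.
        return set(re.findall(r"[a-z0-9']+", text.lower()))

    def learn(self, text, label):
        """Record one example message as 'spam' or 'ham'."""
        self.message_counts[label] += 1
        self.token_counts[label].update(self.tokenize(text))

    def spam_probability(self, text):
        n_spam = self.message_counts["spam"]
        n_ham = self.message_counts["ham"]
        # Work in log-odds; add-one smoothing avoids division by zero.
        log_odds = math.log((n_spam + 1) / (n_ham + 1))
        for token in self.tokenize(text):
            p_spam = (self.token_counts["spam"][token] + 1) / (n_spam + 2)
            p_ham = (self.token_counts["ham"][token] + 1) / (n_ham + 2)
            log_odds += math.log(p_spam / p_ham)
        return 1.0 / (1.0 + math.exp(-log_odds))


classifier = ToyBayes()
classifier.learn("cheap pills buy now", "spam")
classifier.learn("buy cheap watches now", "spam")
classifier.learn("meeting notes attached", "ham")
classifier.learn("lunch at noon tomorrow", "ham")
spammy = classifier.spam_probability("buy cheap pills")
hammy = classifier.spam_probability("meeting at noon")
```

With these four training messages, the first probe lands well above 0.5 and the second well below it. A real filter needs far more training mail than this before its estimates are useful.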

Licensing


SpamAssassin is free/open source software, licensed under the Apache License 2.0. Versions prior to 3.0 are dual-licensed under the Artistic License and the GNU General Public License.

sa-compile


sa-compile is a utility distributed with SpamAssassin as of version 3.2.0. It compiles a SpamAssassin ruleset into a deterministic finite automaton that allows SpamAssassin to use processor power more efficiently.

Testing SpamAssassin


Most implementations of SpamAssassin will trigger on the GTUBE, a 68-byte string not unlike the antivirus EICAR test file. If this string is inserted into an RFC 2822 formatted message and passed through the SpamAssassin engine, SpamAssassin will trigger with a weight of 1000.
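
A test message of this kind can be assembled with Python's standard email package, as in the sketch below. The GTUBE string is the published test pattern; the function name, addresses and subject line are placeholders chosen for illustration. The sketch only builds the message text and does not invoke SpamAssassin.

```python
from email.message import EmailMessage

# The 68-byte GTUBE test pattern; it must appear unmodified on a single line.
GTUBE = "XJS*C4JDBQADN1.NSBN3*2IDNEN*GTUBE-STANDARD-ANTI-UBE-TEST-EMAIL*C.34X"


def build_gtube_message(sender, recipient):
    """Return a minimal RFC 2822 style message whose body contains GTUBE."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "GTUBE test message"
    msg.set_content("This is a spam filter test.\n" + GTUBE + "\n")
    return msg


# Placeholder addresses on reserved example domains.
raw = build_gtube_message("sender@example.com", "recipient@example.net").as_string()
```

The resulting text in "raw" can then be fed to the filter in whatever way the local installation accepts mail, for example by piping it to the standalone application or to the spamc client; the exact report produced depends on the installed version and configuration.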

See also


The following free/open source applications have support for SpamAssassin:

  • Citadel, an email/groupware server with built-in support for SpamAssassin integration
  • MailScanner, "A Free Anti-Virus and Anti-Spam Filter"
  • KMail, which supports SpamAssassin and other spam filters through modular filters
  • MIMEDefang
  • SmarterMail 4.x (Free edition)

SpamAssassin has also been used in many commercial products, including:

  • Atmail, which uses the SpamAssassin engine and includes custom rulesets, filters and a Web interface for users to modify SA runtime preferences
  • AntibodyMX, which incorporates SpamAssassin
  • McAfee, which uses SpamAssassin in its anti-spam tool SpamKiller
  • Spamnix, which is also based on SpamAssassin
  • Kerio MailServer, which uses SpamEliminator, based on SpamAssassin, for heuristic spam filtering
  • MailLaunder, a hosted spam and virus solution that uses SpamAssassin as part of the filtering process
  • SmarterMail Enterprise (Enterprise version)
  • Mail Them Pro, a mailer with built-in SpamAssassin that lets users check a message before mailing so that it is not classified as spam

Other free/open-source applications that have the same goal:

  • DSPAM and CRM114, which are statistical spam filters

