Junk Mail Filters in Mozilla

Recent builds of Mozilla include junk mail filtering. This page should tell you how it works and what to do with it.

So, how does it work, then?

The short answer is that it analyses the words in a message and gives each word a score rating how "spammy" it is. A word's score is based on how often it occurs in messages you have marked as junk, compared with messages you haven't. When a new message arrives, the scores of its words are combined into a score for the message, and the message is marked as junk if that score is high enough.
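To make the idea concrete, here is a minimal sketch of this kind of word scoring in Python. It is not Mozilla's actual code: the function names, the 0.4 default for unseen words, and the clamping of scores to between 0.01 and 0.99 are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Split a message into lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def train(messages):
    """Count, for each word, how many of the given messages contain it."""
    counts = Counter()
    for message in messages:
        counts.update(set(tokenize(message)))
    return counts

def word_score(word, junk_counts, good_counts, n_junk, n_good):
    """Rate how "spammy" a word is, from 0.01 (good) to 0.99 (junk)."""
    junk_rate = junk_counts[word] / max(n_junk, 1)
    good_rate = good_counts[word] / max(n_good, 1)
    if junk_rate + good_rate == 0:
        return 0.4  # assumed default for a word never seen before
    score = junk_rate / (junk_rate + good_rate)
    return min(0.99, max(0.01, score))

def message_score(text, junk_counts, good_counts, n_junk, n_good):
    """Combine the word scores into one junk probability for the message."""
    scores = [word_score(w, junk_counts, good_counts, n_junk, n_good)
              for w in set(tokenize(text))]
    # Sum logarithms rather than multiplying, so long messages do not underflow.
    log_junk = sum(math.log(s) for s in scores)
    log_good = sum(math.log(1 - s) for s in scores)
    diff = max(-700.0, min(700.0, log_good - log_junk))
    return 1 / (1 + math.exp(diff))

# Tiny, made-up training set standing in for the messages you have marked.
junk = ["buy cheap pills now", "cheap pills online"]
good = ["lunch on friday?", "see you at lunch"]
junk_counts, good_counts = train(junk), train(good)

new_score = message_score("cheap pills", junk_counts, good_counts,
                          len(junk), len(good))
is_junk = new_score > 0.9  # the 0.9 cut-off is an assumption
```

Marking more messages simply means retraining the counts, which is why the filter keeps improving as you correct it.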

Paul Graham's essay A Plan for Spam is the inspiration behind Bayesian classification of junk messages, and has a good explanation of the workings. Wikipedia has a maths-heavy explanation of the statistical techniques behind the classification.

Can't the spammers find a way round this?

They can try changing their messages, but you can simply mark the new messages as junk, and the new words they use will soon be rated as likely spam as well. The only way spammers can avoid detection is by emulating the way your contacts write to you. As they don't know who your contacts are or how they write, they can't do this.

The real problem for the spammers is that they have to include a pitch of some sort. They need to grab the attention of the 0.0001% who actually reply to this stuff. It is possible to remain undetected by removing all marketing terms and sending a very plain message. With any luck, messages like this simply won't bring in the results. So bulk mailers are forced to include a detectable message, and the junk mail controls can detect it.

As a response to Bayesian controls, spammers have started stuffing their messages with random words, in an attempt to confuse the filters. Most of these words are insignificant and will not affect the calculation. The spammers may get lucky and hit a "good" keyword, but the messages should still contain enough "bad" words to trigger the filter. To defeat Bayesian and keyword filters, spammers are also resorting to l33t spellings, e.g. writing v|agr@. As it's highly unlikely a normal correspondent would use these spellings, they will quickly end up on the junk heap.
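The sketch below shows why padding a message with random words tends not to help. It assumes the design described in A Plan for Spam, where only the words whose scores lie furthest from neutral are counted. Whether Mozilla's filter does exactly this is not stated here, and the scores, the limit of 15 and the 0.9 cut-off are illustrative assumptions.

```python
import math

def most_interesting(word_scores, limit=15):
    """Keep the scores that lie furthest from the neutral value 0.5."""
    return sorted(word_scores, key=lambda s: abs(s - 0.5), reverse=True)[:limit]

def combine(word_scores):
    """Combine per-word junk scores into one junk probability."""
    log_junk = sum(math.log(s) for s in word_scores)
    log_good = sum(math.log(1 - s) for s in word_scores)
    diff = max(-700.0, min(700.0, log_good - log_junk))
    return 1 / (1 + math.exp(diff))

# Hypothetical scores for the words of a spammer's pitch.
pitch = [0.99, 0.98, 0.97, 0.95, 0.9]
# Random padding words score close to neutral, so they carry little weight.
padding = [0.5, 0.48, 0.52] * 10
# One padding word happens to be a "good" keyword for this user.
lucky = [0.05]

plain = combine(most_interesting(pitch))
stuffed = combine(most_interesting(pitch + padding + lucky))

# Both versions stay above the assumed 0.9 junk cut-off.
still_junk = plain > 0.9 and stuffed > 0.9
```

The pitch words dominate the result in both cases, which is the point made above: the message still has to contain its "bad" words.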

How can I tweak the settings?

Bayes Junk Tool lets you do this. You can import a starter file, or tweak words to make the filter learn a little faster. There isn't an official starter file, because if everyone had the same set of known "good" words, messages could be filled with them and pass through. Word tweaks shouldn't usually be needed either, as the filter learns from its mistakes.

Can I filter based on the junk status?

In newer builds you can. Bear in mind that as the filter learns, it will make mistakes, so you may not want to switch the filters on straight away. If you are confident it is not making any false positives, go ahead and do it. Select Tools > Junk Mail Controls, select the account you want, and tick "Move incoming messages...". Select a different folder if you want one.

What files are used by the filter?

The main file is training.dat. This contains the analysed words, and their various scores. If you keep a junk mail log, this will be in junklog.html, within the mail account's folder. The filter files may also be used, should you have created some.

You haven't answered my question! I have something to add!

Use the "contact" link on the left to pester me.
