Bayesian Content Filtering and the Art of Statistical Language Classification
by Jonathan A. Zdziarski
July 2005, 312 pp. ISBN 1-593270-52-6
No Starch Press - www.nostarch.com/endingspam.htm
Zdziarski is the creator of the open source spam filter DSPAM. As such, he is a vocal proponent of the statistical school of filtering, as popularized by Bayesian filtering. No surprise, then, that his book focuses on statistical filtering and paints it in its most positive light.
The book's structure is well thought out. If a chapter becomes too heavy-going -- and some chapters do go into hair-raising mathematical detail -- the reader can simply skip forward without much trouble.
However, Zdziarski makes little or no effort to tackle the issue of false positives. He generally glosses over the problems caused by legitimate mail being filtered as spam, without acknowledging that such errors are much more expensive than letting a spam message through unfiltered. There were also several places in the book where I'd have preferred the editors and proofreader to have done a better job. It was as if they sometimes misunderstood the point that Zdziarski was making and thus obscured it.
Overall, this book is an excellent primer on spam, spammers, and spam fighting, but the casual reader might get intimidated. It shouldn't be relied on to give a complete and balanced look at spam-fighting techniques.