Friday 24 June 2005

Yet more on Hotmail's move

If you've not read them yet, you might first want to look at my previous posts on this subject.

At the risk of perpetuating a good ol' USENET-style flamewar (ad hominem attacks and all), here are a few thoughts about Curt's latest post.
"blacklists will inevitably include people who shouldn't actually be on them"
Yep. If we look at the sometimes-sorry example of DNS-based RBLs, blacklists (or blocklists, if you must) can cause collateral damage by including too broad an IP netblock, or by including IP addresses that have been reassigned from spammers to legitimate senders. This is one reason why reputation services based solely on IP addresses are a problem. SPF, SenderID and DKIM provide a basis for building reputation services that track senders' behaviour by domain, rather than by IP address.
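To make the netblock point concrete, here's a minimal Python sketch, assuming a hypothetical DNS-based list (`dnsbl.example.org`) that has listed an entire /16 because of one spamming host. It shows how an RBL query name is built (the IPv4 octets reversed, then the list's zone) and, with a local lookup standing in for the real DNS query, that an innocent neighbour in the same netblock gets caught too. The addresses, zone and blocklist entry are all made up for illustration.

```python
import ipaddress

# Hypothetical blocklist entry: a whole /16 listed because of one spamming host.
# Private-range addresses are used purely for illustration.
BLOCKLIST = [ipaddress.ip_network("10.20.0.0/16")]


def dnsbl_query_name(ip: str, zone: str = "dnsbl.example.org") -> str:
    """Build the name a DNS-based RBL is queried with: the IPv4 octets
    reversed, followed by the list's zone. An A record in the answer
    conventionally means 'listed'; no such name means 'not listed'."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip: str) -> bool:
    """Local stand-in for the DNS lookup: is the IP inside any listed netblock?"""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKLIST)


if __name__ == "__main__":
    spammer = "10.20.5.7"
    legit_sender = "10.20.200.9"  # same /16, never sent any spam
    print(dnsbl_query_name(spammer))                    # 7.5.20.10.dnsbl.example.org
    print(is_listed(spammer), is_listed(legit_sender))  # True True
```

A domain-keyed reputation service sidesteps this: once SPF, SenderID or DKIM has tied a message to a domain, the spammer's bad record follows the domain, not whoever inherits the IP address.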
"A good antispam system has 50,000+ rules. To say that there's one rule which is merely a contributing factor like the other 50,000 isn't worthy of an AP story or a press release or an entire Ferris Research implementation report"
Microsoft has spent a bunch of time and effort talking about this recently precisely because they want people to know that the ruleset for Hotmail will be changing. I've talked to Craig Spiezle twice about this over the past month. BTW, the "entire" report, authored by Josh, is definitely one of the shorter reports that Ferris Research has published and, I understand, has received good feedback from IT customers.

Yes, Microsoft wants people to publish SPF records. That doesn't constitute "forcing SenderID down people's throats." Curt believes they're wasting their time and ours. He's entitled. I disagree.
"I believe that antispam filters focusing entirely on the "call to action" can and do get most of the job done with negligible false positives ... I must confess that my opinions are based mainly on research that's slightly over a year old"
My take on this: I agree that CTA filtering did indeed seem like a compelling content-filtering technique about a year ago. Several vendors made a "big thing" of how it was going to simplify life enormously. It's notable that it's just not talked about now. Perhaps the industry found a gap between "research" and "real life"? Certainly, gathering and acting on CTA data is very resource-intensive and time-sensitive. A common theme in spam filtering is that there's no single silver bullet for this problem: not CTA, not Bayesian filtering, and certainly not challenge/response.
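For anyone who hasn't met it, CTA filtering boils down to pulling the links out of a message body and checking the domains they point at against a list of known spam-advertised domains. Here's a minimal Python sketch of that idea; `CTA_BLOCKLIST` and its domains are hypothetical, and the regular expression is deliberately simple. The genuinely hard part, keeping such a list accurate and current, is exactly what this sketch leaves out.

```python
import re
from urllib.parse import urlsplit

# Hypothetical list of domains advertised in spam "calls to action".
CTA_BLOCKLIST = {"cheap-pills.example", "win-big.example"}

# Deliberately simple: http(s) URLs up to whitespace, angle brackets or quotes.
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def cta_domains(body: str) -> set[str]:
    """Return the hostnames of all http(s) links found in a message body."""
    hosts = set()
    for url in URL_RE.findall(body):
        host = urlsplit(url).hostname  # lower-cased, port stripped
        if host:
            hosts.add(host)
    return hosts


def is_cta_spam(body: str) -> bool:
    """True if any linked host is a listed domain or a subdomain of one."""
    for host in cta_domains(body):
        labels = host.split(".")
        # Try "www.cheap-pills.example", then "cheap-pills.example";
        # stop before the bare top-level label.
        for i in range(len(labels) - 1):
            if ".".join(labels[i:]) in CTA_BLOCKLIST:
                return True
    return False
```

Note that everything here happens after the message body has been received, which is why CTA is a second-stage, content-filtering technique.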

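Increasingly, filters split the work into two stages: a cheap first stage at the connection level, before the SMTP DATA transaction starts, and a second, more expensive content-filtering stage for whatever the first stage can't decide. Examples of the techniques employed at the first stage: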
  • Valid HELO or EHLO?
  • Valid PTR or RDNS?
  • Greylisting/tempfailing
  • Throttling (prevents illegal pipelining)
  • IP reputation/blacklists
  • SPF/SenderID/DKIM
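Greylisting is probably the least obvious item in that list, so here's a minimal Python sketch of the usual triplet-based approach: temp-fail (4xx) the first attempt from an unseen (client IP, envelope sender, envelope recipient) combination and accept retries after a delay, on the theory that legitimate MTAs retry and much spamware doesn't. The `Greylister` class and its 300-second delay are assumptions for illustration; real implementations also expire old entries and whitelist known-good senders.

```python
import time
from typing import Callable, Dict, Tuple

# (client IP, envelope sender, envelope recipient)
Triplet = Tuple[str, str, str]


class Greylister:
    """Temp-fail the first delivery attempt for an unseen triplet and
    accept retries that arrive at least `delay` seconds later."""

    def __init__(self, delay: float = 300.0,
                 clock: Callable[[], float] = time.time):
        self.delay = delay
        self.clock = clock  # injectable so the logic can be tested
        self.first_seen: Dict[Triplet, float] = {}

    def check(self, ip: str, mail_from: str, rcpt_to: str) -> str:
        # Lower-casing addresses is a common simplification.
        key = (ip, mail_from.lower(), rcpt_to.lower())
        now = self.clock()
        seen = self.first_seen.setdefault(key, now)  # record first attempt
        if now - seen < self.delay:
            return "451 4.7.1 Greylisted, please try again later"
        return "250 2.1.5 OK"
```

The 4xx reply matters: a temporary failure tells a well-behaved sending MTA to queue and retry, so legitimate mail is delayed rather than lost.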
More broadly, there's a general problem with content filtering: it's expensive. In a world where 70-90% of port 25 connection attempts are unwanted, we don't want to waste MTA horsepower on receiving the message and performing complex analysis on its content. Moreover, we want to be able to reject the message with an SMTP 5xx code: this avoids the "backscatter" of bounce messages sent to innocent, forged sender addresses, which is collateral damage in its own right. This means that we need to keep the connection open while we run the rules, which isn't pleasant.
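Here's a rough Python sketch of what "keeping the connection open while we run the rules" means at the end of the DATA transaction: the content rules run under a time budget, and the verdict goes back as the SMTP reply itself, so a spam verdict becomes an in-session 550 rather than a later bounce. `content_score`, the threshold and the 30-second budget are all hypothetical; for scale, RFC 2821 suggests a client wait up to 10 minutes for the reply that follows the end of DATA.

```python
import concurrent.futures
from typing import Callable

# Hypothetical limits, for illustration only.
CONTENT_BUDGET_SECONDS = 30.0
SPAM_THRESHOLD = 5.0

_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def end_of_data_reply(message: bytes,
                      content_score: Callable[[bytes], float],
                      budget: float = CONTENT_BUDGET_SECONDS) -> str:
    """Run the content rules while the SMTP connection is still open and
    turn the verdict into the reply to the end of DATA."""
    future = _pool.submit(content_score, message)
    try:
        score = future.result(timeout=budget)
    except concurrent.futures.TimeoutError:
        # Out of time: ask the sender to retry rather than guess.
        # (The worker thread keeps running; a real filter would need
        # a way to abandon it.)
        return "451 4.3.0 Temporary failure, please try again later"
    if score >= SPAM_THRESHOLD:
        # Rejected in-session: the *sending* MTA must deal with it,
        # so we never generate a bounce to a possibly forged address.
        return "550 5.7.1 Message rejected as spam"
    return "250 2.0.0 OK: queued"
```

Every second spent inside `end_of_data_reply` is a second this connection, and its thread, is tied up, which is why it pays to reject as much as possible before DATA.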

OK, this post is way too long already, and I'm not being paid to write this ;-) To sum up: spam filters are increasingly running an initial set of anti-spam rules at the connection level, before the SMTP DATA transaction even starts. If these rules generate a high enough score, it's 5xx, no spam for you, and goodnight Vienna. Only if the filter's unsure will the message make it to the second, content-filtering stage. Adding SPF presence checks to the existing SPF rule allows Hotmail and others to reject more spam without expensive content filtering. This shouldn't cause any additional false positives, unless Hotmail does something dumb with the score weights.
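Putting that summary into code: a minimal Python sketch of a first-stage, connection-level scorer with an SPF presence check added alongside the existing SPF rule. Every weight and threshold here is invented for illustration (nobody outside Hotmail knows theirs), and `ConnectionFacts` assumes the individual checks (HELO, PTR, blacklist, SPF evaluation) have already been done elsewhere.

```python
from dataclasses import dataclass

# Hypothetical weights and thresholds, for illustration only.
WEIGHTS = {
    "bad_helo": 2.0,
    "no_ptr": 2.0,
    "ip_blacklisted": 7.0,
    "spf_fail": 6.0,
    "spf_none": 1.0,  # SPF *presence* check: the domain publishes no record
}
REJECT_AT = 10.0    # score at or above this: 5xx before DATA
ACCEPT_BELOW = 2.0  # score below this: no content filtering needed


@dataclass
class ConnectionFacts:
    helo_valid: bool
    has_ptr: bool
    ip_blacklisted: bool
    spf_result: str  # e.g. "pass", "fail", "softfail", "neutral", "none"


def connection_score(c: ConnectionFacts) -> float:
    score = 0.0
    if not c.helo_valid:
        score += WEIGHTS["bad_helo"]
    if not c.has_ptr:
        score += WEIGHTS["no_ptr"]
    if c.ip_blacklisted:
        score += WEIGHTS["ip_blacklisted"]
    if c.spf_result == "fail":
        score += WEIGHTS["spf_fail"]
    elif c.spf_result == "none":
        score += WEIGHTS["spf_none"]
    return score


def first_stage(c: ConnectionFacts) -> str:
    score = connection_score(c)
    if score >= REJECT_AT:
        return "reject"  # e.g. a 550 before DATA; the body is never transferred
    if score < ACCEPT_BELOW:
        return "accept"
    return "content-filter"  # unsure: receive the message, run stage two
```

With these made-up weights, a missing SPF record on its own scores 1.0 and can't get anywhere near the reject threshold. It only tips connections that were already suspicious (say, a blacklisted IP with no PTR record, which scores 9.0 without it) over into a cheap 5xx. Crank `spf_none` up far enough, though, and you'd start rejecting legitimate senders who simply haven't published a record: that's the "something dumb with the score weights" case.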

2 comments:

Anonymous said...

I just sent an email from my domain (say myowndomain.com) to my hotmail email address. It ended up in my hotmail junk folder.

- I use a free email forwarding service that forwards all myowndomain.com mail to a personal comcast.net broadband email account. I retrieve my email from comcast using POP3.

- When I reply, my reply-to address is myself@myowndomain.com. I use smtp.comcast.net as my outgoing email server.

Hotmail likely does an SPF check on myowndomain.com and the returned address is not smtp.comcast.net. So my email is junked.

How does Hotmail solve for this?

Anonymous said...

It's even worse now. My email NEVER arrives at Hotmail. It's silently rejected because my server has no SPF!
