Saturday 22 October 2005

No, I'm still not sending you spam

Hi. If you've come here to find out who's sending you spam ... it's not me. This happened last month as well. For more info, click here.


Fast, Full-Text Search can be Serendipitous, but Should be Global

The benefit usually touted for indexed search (or fast, full-text search) is that it's lightning fast. That's true: compare a vanilla Outlook search for text in the body of thousands of emails with the same search using Google Desktop or the new version of Eudora. The Outlook search will take a minute or two with a large mailbox, whereas the indexed search will take seconds.

However, speed isn't the only benefit. When it's easy and quick to search everything, one can often find lucky results that one wasn't expecting. For example, when I was searching for a message I received from a new client yesterday, I realized that I'd also corresponded with him in a previous job, several years ago. I'd forgotten about that and I suspect he had too. I was also presented with some web pages that mentioned his name, a saved PowerPoint deck authored by him, and his full contact details. All information that I had on my PC, but that I didn't know was there -- that's the point: global, indexed search allows one to find things one didn't know existed.

For the full benefit, the searching should be global -- i.e. the search tool should index everything on your PC, not just your email. Unfortunately, Eudora 7 only searches its own store, and therefore misses out on this serendipitous benefit.
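To make the "global" point concrete, here's a minimal sketch of an inverted index that covers several kinds of item at once. It's a toy under stated assumptions, not how Google Desktop or Eudora actually work: the GlobalIndex class, the source labels ("email", "document", "contact") and the sample items are all made up for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class GlobalIndex:
    """Toy inverted index: maps each word to the items that contain it."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> set of item ids
        self.items = {}                   # item id -> (source, title)

    def add(self, item_id, source, title, body):
        """Index one item; the work is done once, up front."""
        self.items[item_id] = (source, title)
        for word in tokenize(title + " " + body):
            self.postings[word].add(item_id)

    def search(self, query):
        """Return (source, title) for items containing every query word."""
        words = tokenize(query)
        if not words:
            return []
        # Each lookup is a dictionary access, not a scan of every item.
        hits = set.intersection(*(self.postings.get(w, set()) for w in words))
        return sorted(self.items[i] for i in hits)


# Hypothetical sample data spanning several stores, not just email.
index = GlobalIndex()
index.add(1, "email", "Project kickoff", "Message from Alex Example about the new project")
index.add(2, "email", "Old thread", "Reply to Alex Example at a previous job")
index.add(3, "document", "Roadmap deck", "Slides authored by Alex Example")
index.add(4, "contact", "Alex Example", "Phone and address details")
index.add(5, "email", "Lunch", "Unrelated message")

results = index.search("alex example")
for source, title in results:
    print(source, "-", title)
```

A search for the client's name returns the old email thread, the slide deck and the contact entry alongside the message I was actually looking for. An index built over the email store alone could only ever return the two emails.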

Statistical Spam Filters are Too Hard to Use

Statistical spam filters use powerful mathematics to decide if a message is spam or not. They classify email as spam or ham, using Bayesian analysis and other statistical methods. Examples of such filters are SpamBayes, POPfile, DSPAM, and CRM114.
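For a feel of what "Bayesian analysis" means here, this is a minimal sketch of a naive Bayes classifier with add-one smoothing. It's a textbook simplification, not the algorithm used by SpamBayes, POPfile, DSPAM, or CRM114, which are considerably more sophisticated. The NaiveBayesFilter class and the four training messages are hypothetical, and a four-message corpus is far too small for real use.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesFilter:
    """Toy naive Bayes spam filter (illustrative only)."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record one message the user has labelled 'spam' or 'ham'."""
        self.message_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def spam_probability(self, text):
        """Estimate P(spam | text); needs at least one message per label."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            # Prior: fraction of training messages with this label.
            score = math.log(self.message_counts[label] / total_messages)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                # Add-one smoothing so unseen words don't zero the score.
                count = self.word_counts[label][word] + 1
                score += math.log(count / (total_words + len(vocab)))
            log_scores[label] = score
        # Normalise the two log scores into a probability.
        top = max(log_scores.values())
        weights = {k: math.exp(v - top) for k, v in log_scores.items()}
        return weights["spam"] / (weights["spam"] + weights["ham"])


# Hypothetical, tiny training corpus.
spam_filter = NaiveBayesFilter()
spam_filter.train("Cheap pills buy now", "spam")
spam_filter.train("Win money now click here", "spam")
spam_filter.train("Meeting agenda for Monday", "ham")
spam_filter.train("Lunch with the new client on Friday", "ham")

p_spam = spam_filter.spam_probability("buy cheap pills now")
p_ham = spam_filter.spam_probability("agenda for the client meeting")
print(p_spam > 0.5, p_ham < 0.5)
```

Everything the filter knows comes from the train() calls, so its verdicts are only as good as the labelled messages it has been fed.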

State-of-the-art statistical filters can achieve accuracy as good as or better than a user manually filtering spam with the Delete button. However, such filters require several months of training before they reach that accuracy. Filters that rely on end-users to train them aren't suitable for the majority of users.

This training can be done by feeding the filter a "corpus" of spam and legitimate messages (i.e. an archive of several months of spam and ham). However, the initial and ongoing training requirements are onerous and error-prone. When users complain that a good statistical spam filter isn't accurate, it's usually because they haven't trained it properly; but that's hardly fair -- users just want their filter to work.


Friday 21 October 2005

Switching to an abbreviated feed

Up until now, I've always run a full feed on Richi'Blog. That is, the text you get in your reader or aggregator is the full monty, with no need to click through to the web page. It's with a heavy heart that I'm changing to an abbreviated feed. I'm doing this for several reasons, chief among them that I'm fed up with my writing being ripped off by sploggers. Grrr.

Tuesday 18 October 2005

Ajax in the enterprise

InfoWorld has a nice roundup of how Ajax is making inroads into business IT departments. It includes coverage of Scalix and NetSuite, and also mentions dev tools: Backbase, JackBe, TibcoGI, and Ruby on Rails. And the obligatory quote from Jesse James Garrett ;-)
