By now the story that AOL has released the search phrases used by over half a million of its users has gone round the web a hundred times. AOL themselves have admitted that they screwed up in letting this data out into the public domain in what was, apparently, an honestly-meant attempt to reach out to the academic community. But late to the party as I am on this (remember: this blog is your first stop for last week’s news), I could hardly run a blog about data and analytics and not make some sort of comment.
AOL originally claimed that, because the data was anonymised, it couldn't be linked back to individuals. But an article in the New York Times [requires registration] disproved that by tracking down one of the users from her search history alone.
A number of sites have sprung up to help you explore and chuckle at the data. But there’s some pretty alarming stuff there: searches like “how to kill your wife”, for example.
Working as I do for MSN, there's very much a sense of "there but for the grace of God…" about this, though I have to say that whoever thought this would be a good idea at AOL needs their head examined. But the data that organisations like AOL, MSN and Google hold on searches is tremendously valuable – and dangerous in the wrong hands.
Update (22/8/06): AOL has fired two of the people responsible for the leak, and their CTO has resigned. And the Electronic Frontier Foundation has filed a complaint against AOL with the FTC.