Should have blogged about this last week, but other demands on my time prevailed.
There's an article on TechCrunch (brought to my attention by my colleague Justin) about the launch of Swivel, which founders Dmitry Dimov and Brian Mulloy describe as the "YouTube of data". What they mean by this is that they've created a place where users can upload interesting data sets and then plot them against data sets from other users to look for correlations, such as the interesting one below:
Unfortunately I don't have much interesting data to upload (and the interesting data that I do have is confidential), so I wasn't able to try this with my own data. Apparently when the site launches, you will be able to upload data and keep it private, though I don't know how many people will be happy to trust their precious data to a relatively unknown third party (not to mention the legal aspects).
If Swivel can overcome this obstacle, however (and they need to, since charging for private data is apparently their main revenue source), then they could be onto something. They're building out significant data center capacity to compute correlations behind the scenes and suggest data sets that you might want to compare. But it will be interesting to see whether the correlations they come up with are anything more than 'happy coincidences' (for example, the rising plot of oil prices in the chart above could appear to correlate nicely with the usage of World of Warcraft, if you're careful to pick the right range). So perhaps Swivel should put a little tutorial on their home page explaining that correlation does not imply causation.
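To make the 'happy coincidence' point concrete, here is a minimal sketch in Python. The two series are synthetic (they are not real oil price or World of Warcraft figures), and the names `series_a`, `series_b`, `pearson` and `changes` are mine, purely for illustration. Both series simply drift upward with independent random noise, so nothing links them except the passage of time.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def changes(series):
    """Period-over-period differences, which strip out a steady trend."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


# Synthetic data (an assumption for this sketch): two unrelated quantities
# that both happen to rise over the same 200 periods.
rng = random.Random(42)
periods = range(200)
series_a = [50 + 2.0 * t + rng.gauss(0, 5) for t in periods]
series_b = [1 + 0.3 * t + rng.gauss(0, 1) for t in periods]

# Correlating the raw levels mostly measures the shared upward trend.
levels_r = pearson(series_a, series_b)

# Correlating the changes asks whether the two actually move together.
changes_r = pearson(changes(series_a), changes(series_b))

print(f"correlation of levels:  {levels_r:.2f}")
print(f"correlation of changes: {changes_r:.2f}")
```

The raw levels should come out very strongly correlated, while the period-over-period changes should show little or no relationship. Differencing is only one rough sanity check, not a test of causation, but it is the sort of thing a site that suggests correlations automatically might want to show alongside the chart.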
The site's other challenge is the cleanliness of the data. Even when I tried to compare data sets that were date-based, the site choked several times (doubtless these are problems that the team is working out). But there is a larger issue: the 'standardization' of axes or segments. Date is (relatively) easy, since you can make some assumptions about the date range that a particular data point relates to, but other ranges/segments are harder, such as:
- Country (problems with old vs new names, regions, etc)
- Age (lots of data is grouped into age ranges, e.g. 16-24, 25-34, but these are not consistent)
- Income (same problem as above, plus currency fluctuations thrown into the mix)
And that's just the axes/segments for humans - other entities like companies have their own characteristics which are not measured in a standard way, especially not internationally.
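As a rough illustration of the age-range problem, here is a sketch in Python of one way mismatched bands could be mapped onto a common set. I have no idea whether Swivel does anything like this; the function name `rebin`, the sample counts and the target bands are all made up for the example. It assumes people are spread evenly across each year of age within a band, which is a crude assumption that real data will often violate.

```python
def rebin(counts, target_bands):
    """Re-allocate counts from source age bands to target age bands.

    counts maps (low, high) inclusive age bands to a count.
    Each count is split in proportion to the number of years of age
    that the source band shares with each target band (uniform assumption).
    """
    result = {band: 0.0 for band in target_bands}
    for (low, high), count in counts.items():
        width = high - low + 1
        for target_low, target_high in target_bands:
            overlap = min(high, target_high) - max(low, target_low) + 1
            if overlap > 0:
                result[(target_low, target_high)] += count * overlap / width
    return result


# Hypothetical survey using the 16-24 / 25-34 bands mentioned above.
survey = {(16, 24): 900, (25, 34): 1000}

# A different (also hypothetical) banding that another data set might use.
common_bands = [(16, 19), (20, 29), (30, 34)]

rebinned = rebin(survey, common_bands)
for band, value in rebinned.items():
    print(band, round(value))
```

The total is preserved as long as the target bands cover the source bands, but the split within each band is only an estimate, so any correlation built on re-binned data inherits that uncertainty. Country names and incomes would need their own mappings (and, for income, a choice of exchange rate and date), which is a large part of why this is harder than dates.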
It'll be interesting to come back to Swivel in a few months when there's some more data in there (and when they have their private data service up and running). I wish them well.