“I've customer looking for low latency data injection to hadoop . Customer wants to inject 1million records per/sec. Can someone guide me which tools or technology can be used for this kind of data injection to hadoop.”
The question itself is interesting, given its assumption that Hadoop is part of the answer – Hadoop really is the new black in data storage & management these days – but the answers were even more interesting. Among the eleven or so people who responded to the question, there was almost no consensus. No single product (or even shortlist of products) emerged, but more importantly, the actual interpretation of the question (or what the question was getting at) differed widely, spinning off a moderately impassioned debate about the true meaning of “latency”, the merits of solid-state storage vs HD storage, and whether to clean/dedupe the data at load time or once the data is in Hadoop.
I wouldn’t class myself as a Hadoop expert (I’m more of a Cosmos guy), much less a data storage architect, so I may be unfairly mischaracterizing the discussion, but the message that jumped out of the thread at me was this: This Big Data stuff really is not mature yet.
I was very much put in mind of the early days of the Web Analytics industry, where so many aspects of the industry and the way customers interacted with it had yet to mature. Not only was there still a plethora of widely differing solutions available, with heated debates about tags vs logs, hosted vs on-premise, and flexible vs affordable, but customers themselves didn’t even know how to articulate their needs. Much of the time I spent with customers at WebAbacus in those days was taken up by translating the customer’s requirements (which often had been ghost-written by another vendor who took a radically different approach to web analytics) into terms that we could respond to.
This question thread felt a lot like that – there didn’t seem to be a very mature common language or frame of reference which united the asker of the question and the various folk who answered it. As I read the answers, I found myself feeling mightily sorry for the question-poser, because she now has a list as long as her arm of vendors and technologies to investigate, each of which approaches the problem in a different way, so it’ll be hard going to choose a winner.
If this sounds like a grumble, it’s really not – the opposite, in fact. It’s very exciting to be involved in another industry that is forming before my very eyes. Buy most seasoned Web Analytics professionals enough drinks and they’ll admit to you that the industry was actually a bit more interesting before it was carved up between Omniture and Google (yes, I know there are other players still – as Craig Ferguson would say, I look forward to your letters). So I’m going to enjoy the childhood and adolescence of Big Data while I can.