The Big Facts about Big Data
Love or hate the term, Big Data is here to stay. We run down the key facts and debunk the myths...
Myth 3: Big Data equals Hadoop
Much of the conversation around Big Data has centred on Hadoop. The Apache project is certainly the best known, and it was the first tool for storing and analysing unstructured data to achieve prominence. But it's not the only one.
"People believe that if they start using Hadoop then they needn't do anything else, but there's still room for traditional data warehousing. People need to keep to their existing IT infrastructure," according to Priestley.
He says the appeal of Hadoop is that organisations can get so much information for a comparatively modest outlay. "You could go and download Hadoop from Apache, cost of software is zero, and run it on standard servers," Priestley adds. "The alternative was going to a company like Oracle or Teradata and buying their integrated solutions. For many companies this may not be a viable option without fully understanding the benefits they can achieve through analysing the data they have available."
Myth 4: It's hard to get a quantifiable ROI
Businesses love hard numbers. The perfect scenario for the CIO would be to say that it costs X to move to Big Data and that the move will save Y over the course of three years. Big Data doesn't work like that. It's very hard to get any meaningful ROI figure out of a Big Data initiative. As Priestley points out, a lot of Big Data implementation is "what-if", and that's not something that's easily definable.
Compare that with something like CRM, where the impact on a business is more immediately measurable. Businesses moving to Big Data will have to appreciate the difference. There also seems to be a changing mindset when it comes to ROI for major projects: businesses are recognising that it's not always an easily measured tangible and that costs can be outweighed by business benefits.
Recently, Claranet carried out a survey looking at how organisations moved to the cloud. It found that more than a quarter of respondents considered ROI to be a factor in making their decision and 79 per cent thought ROI calculations did not give a true reflection of business benefits.
While that survey was primarily about cloud adoption, it's not unreasonable to suppose that the figures for a move to Big Data would not be hugely different: in both cases it's a technological leap into the future.
Myth 5: You can't guarantee the answer
Big Data is a big unknown. What you're doing is analysing some imponderables and hard-to-ascertain numbers. By their very nature, these are not easily accessible or intuitive; if they were, you wouldn't need Big Data techniques.
Therefore, companies have to realise that they can't guarantee the answer. It's no good coming up with an idea of what a result should be and then finding the figures that support that hypothesis. In the example cited earlier, an airline company might like it if aircraft need only be serviced every 500,000 flying hours, but that's not very helpful if planes are dropping from the sky every 200,000 hours.
If there are certain myths and misconceptions about Big Data, there are also certain key truths that companies going down the Big Data route need to grasp.
Key truth 1: There's a need for a different skillset
There's one thing most observers are agreed on and that's the shortage of data scientists. McKinsey estimates that, by 2019, there'll be a global shortage of 190,000 scientists ready to tackle Big Data.
It's not difficult to see why. Handling a Big Data project entails a completely different set of skills from existing data warehousing implementations. And it's not just about handling the data; it also has to be translated in a way that allows it to be processed and acted on.
"For example, in Hadoop, there's a tool called MapReduce. It requires Java-writing application skills, which is not a common skillset for many data analysts today," Priestley says.
But there's more to it than that. The ideal person for handling Big Data will need to understand business processes, Java and statistics (and possibly some SQL too). That's a big ask, which is one of the reasons why so many people are saying that the shortage of data scientists is going to be a big inhibitor in the take-up of Big Data techniques.
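To give a flavour of the kind of Java that Priestley is referring to, here is a minimal sketch of the map-and-reduce pattern applied to a word count. It is an illustration only: it uses nothing but the standard Java library and runs on a single machine, so it is not Hadoop's own API, and the class and method names (WordCountSketch, map, shuffle, reduce, wordCount) are hypothetical ones chosen for this example.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative single-machine sketch of the map/shuffle/reduce pattern.
// Not Hadoop code: all names here are assumptions made for the example.
public class WordCountSketch {

    // Map phase: turn one line of text into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by their key (the word).
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: collapse the list of values for one key into a total.
    static int reduce(List<Integer> counts) {
        int total = 0;
        for (int count : counts) {
            total += count;
        }
        return total;
    }

    // Run the three phases in sequence over a small in-memory data set.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : shuffle(pairs).entrySet()) {
            result.put(entry.getKey(), reduce(entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Big Data is big", "data about data");
        System.out.println(wordCount(lines)); // prints {about=1, big=2, data=3, is=1}
    }
}
```

In a real cluster the map and reduce steps would run in parallel across many servers and the framework would handle the grouping in between. Even this toy version shows why the skillset differs from writing a SQL query against a data warehouse.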