What Is Big Data?
Is Big Data yet another buzz term? Steve Cassidy ponders what it really means for the IT industry...
Big Data, as the industry defines it, fixes this problem by throwing in enormous quantities of the cheapest part of the computing equation, AKA disk space. The theory says that if you reverse the effect of database optimisation (which, as an approach, dates from an era when disks were far from cheap), and unravel all those relational-database records into an undifferentiated atomic soup, then you can start asking helicopter-grade questions with rollerskate expenditure. Better still, if the atomic soup in question can take information from sources which otherwise don't co-operate (say the Baked Bean query can't be extended to include Mars Bars from the garage beside the superstore, because garages use their own database layout), then a whole new class of queries becomes possible.
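For the curious, here is a toy sketch of that "atomic soup" idea in Python. Everything in it is an assumption made up for illustration: the two row layouts, the field names and the helper functions `to_soup` and `total_units` are hypothetical, and no real Big Data product works at this scale or this crudely. It only shows how unravelling records into schema-free facts lets one question span two layouts that don't co-operate.

```python
# Hypothetical data: two retailers whose databases use different layouts.
superstore_rows = [
    {"product": "Baked Beans", "qty": 120},
    {"product": "Mars Bar", "qty": 40},
]
garage_rows = [
    {"item_name": "Mars Bar", "units_sold": 75},
    {"item_name": "Screenwash", "units_sold": 12},
]


def to_soup(source, rows):
    """Unravel each record into (source, record_id, field, value) facts."""
    soup = []
    for record_id, row in enumerate(rows):
        for field, value in row.items():
            soup.append((source, record_id, field, value))
    return soup


# One undifferentiated pile of facts, with no shared schema required.
soup = to_soup("superstore", superstore_rows) + to_soup("garage", garage_rows)


def total_units(soup, product_name):
    """Sum the integer values of every record that mentions product_name.

    Assumption: any integer in a matching record is a sales count.
    """
    # Records that mention the product in any field, whatever it is called.
    matching = {(src, rec) for src, rec, _, value in soup if value == product_name}
    return sum(
        value
        for src, rec, _, value in soup
        if (src, rec) in matching and isinstance(value, int)
    )


print(total_units(soup, "Mars Bar"))
```

Note that the query never asks whether the column is called `qty` or `units_sold`. The price is that it has to wade through every fact in the soup to find its answer.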
The industry wouldn't be pushing this idea if it was cheap to do. Above a certain, pretty close, high-water mark, you can't just stack up USB drives from the corner electronics store for atomic-soup, object-based Big Data storage. For Big Data to really deliver, you need very specific and very high-performance storage systems, able to take the by-definition exceedingly random access load that a massive, de-optimised query will impose on the kit. Those are reassuringly expensive, at least if you are selling storage and looking for a justification that aligns well with budget-holders inside companies with cash to burn.
What's worse still for taxonomical obsessives like me is that this whole idea has hit the public (and the BBC) just as a more commonly perceived definition of Big Data has landed with an elephantine thud in most people's perceptions. Even in efficient storage formats, there's simply a hell of a lot of stuff being accumulated. An idle question on sister site PC Pro revealed a user who has 220GB of cloud storage for his photos. Most people worrying about Big Data, I would suggest, are trying to solve problems like that, rather than trying to indulge in big corporate data mining exercises. I lost a good couple of hours to a semi-hidden repository of 11GB of someone's scuba diving holiday pictures only last month: not that large in the great scheme of things, but certainly larger than the rest of his team's common or garden working document set.
So that's three different meanings now. The industry think it's a term reserved for the immense disk farms that arise from trying to run queries that slice across pre-existing, incompatible data models. The public think it means struggling with backup processes or machine migration. And now the BBC have hijacked it to stick a hot-topic badge on an issue which, while relevant, does not line up well with the kind of white papers and deep-dive case studies that tend to define the IT business's method of communicating these hard subjects.
Blame SEO for their choice of title, is my suggestion, and then make use of their unfortunate effort as a starting point for a conversation with your decision-makers. After all, if there's one thing that data mining tells us about buzzwords and communicating IT concepts, it is that there's no point getting into a fight with the BBC.