What Is Big Data?

Is Big Data yet another buzz term? Steve Cassidy ponders what it really means for the IT industry...

Big Data, as the industry defines it, fixes this problem by throwing in enormous quantities of the cheapest part of the computing equation, aka disk space. The theory says that if you reverse the effect of database optimisation (which, as an approach, dates from an era when disks were far from cheap) and unravel all those relational-database records into an undifferentiated atomic soup, then you can start asking helicopter-grade questions with rollerskate expenditure. Better still, if the atomic soup in question can take information from sources which otherwise don't co-operate (say the Baked Bean query can't be extended to include Mars Bars from the garage beside the superstore, because garages use their own database layout), then a whole new class of queries becomes possible.
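To make the "atomic soup" idea a little more concrete, here is a minimal Python sketch. It assumes two hypothetical till systems, a superstore and the garage next door, that record sales in different layouts; every field name, product and quantity is invented for illustration. Once both layouts are unravelled into one flat list of facts, a single query can count Mars Bars across both sources.

```python
# Hypothetical data: the superstore stores sales as dicts with named fields,
# while the garage's own layout is a bare (product, quantity) tuple.
superstore_sales = [
    {"product": "Baked Beans", "qty": 3, "store": "Superstore"},
    {"product": "Mars Bar", "qty": 1, "store": "Superstore"},
]
garage_sales = [
    ("Mars Bar", 2),
    ("Baked Beans", 1),
    ("Mars Bar", 4),
]


def flatten(superstore, garage):
    """Unravel both layouts into one flat list of (source, product, qty) facts."""
    soup = []
    for row in superstore:
        soup.append({"source": row["store"], "product": row["product"], "qty": row["qty"]})
    for product, qty in garage:
        soup.append({"source": "Garage", "product": product, "qty": qty})
    return soup


def total_sold(soup, product):
    """A query that slices across both original sources at once."""
    return sum(fact["qty"] for fact in soup if fact["product"] == product)


soup = flatten(superstore_sales, garage_sales)
print(total_sold(soup, "Mars Bar"))     # 7
print(total_sold(soup, "Baked Beans"))  # 4
```

The sketch only shows the shape of the idea. At real Big Data scale the flattened facts number in the billions and are scanned in no particular order, which is where the expensive storage comes in.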

The industry wouldn't be pushing this idea if it were cheap to do. Beyond a high-water mark that arrives surprisingly soon, you can't just stack up USB drives from the corner electronics store to hold your atomic soup. For object-based Big Data storage to really deliver, you need very specific, very high-performance storage systems, able to take the by-definition exceedingly random access load that a massive, de-optimised query will impose on the kit. Those are reassuringly expensive, at least if you are selling storage and looking for a justification that aligns well with budget-holders inside companies with cash to burn.

What's worse still for taxonomical obsessives like me is that this whole idea has hit the public (and the BBC) just as a more commonly understood definition of Big Data has landed with an elephantine thud in most people's minds. Even in efficient storage formats, there's simply a hell of a lot of stuff being accumulated. An idle question on sister site PC Pro revealed a user who has 220GB of cloud storage for his photos. Most people worrying about Big Data, I would suggest, are trying to solve problems like that, rather than indulging in big corporate data-mining exercises. I lost a good couple of hours to a semi-hidden repository of 11GB of someone's scuba-diving holiday pictures only last month: not that large in the great scheme of things, but certainly larger than the rest of his team's common-or-garden working document set.

So that's three different meanings now. The industry think it's a term reserved for the immense disk farms that arise from trying to run queries that slice across pre-existing, incompatible data models. The public think it means struggling with backup processes or machine migration. And now the BBC have hijacked it to stick a hot-topic badge on an issue which, while relevant, does not line up well with the kind of white papers and deep-dive case studies that tend to define the IT business's way of communicating these hard subjects.

Blame SEO for their choice of title, is my suggestion, and then make use of their unfortunate effort as a starting point for a conversation with your decision-makers. After all, if there's one thing that data mining tells us about buzzwords and communicating IT concepts, it is that there's no point getting into a fight with the BBC.
