Who’s afraid of the big data?

2012 is touted as the year when big data becomes mainstream, moving from something used only in science and some technology firms to something required in many enterprise applications.

There has been an explosion in the amount of data available, from web server logs, tweet streams and online transaction records to sensor readings and government data: a potential new treasure trove of unstructured or semi-structured data. An IDC study predicts that 2.7 zettabytes (2.7 x 10^21 bytes) of data will be created and replicated this year alone.

Research analyst firm Gartner predicts that by 2015 "organisations integrating high value, diverse new information sources and types into a coherent information management infrastructure will outperform industry peers financially by more than 20 per cent". Another study, from the Centre for Economics and Business Research, reckons that organisations investing in big data could help generate £216 billion for the UK over the next five years and create 58,000 new jobs.

A McKinsey report published last May also predicts big things for big data, suggesting it will "become a key basis of competition, underpinning new waves of productivity growth, innovation, and consumer surplus ... From the standpoint of competitiveness and the potential capture of value, all companies need to take big data seriously."

Gaining business advantage is clearly key to driving the adoption of big data, but so are compliance and regulation, which continually require much more granular data and different views of "joined up" data sets to fulfil reporting demands. Financial services organisations will address regulatory requirements such as Basel III with the aid of new applications that integrate data and dynamically calculate liquidity and capital metrics.

Insurers in the UK will need to comply with the Retail Distribution Review programme, overseen by the Financial Services Authority, and European firms will need to deal with Solvency II. These regulations underline the prominence of data and the challenges connected with compliant access and availability.

A sound business case for big data exists where lots of low-value data can be turned into highly valuable insights. Unearthing hidden relationships can greatly influence how businesses run and can even help them identify and develop new products or services, creating and increasing revenue streams. But what tools and resources do organisations need to handle and process this amount of data?

According to David Rajan, director of technology at Oracle, an infrastructure for big data has unique requirements.

"In considering the components of a big data platform, it is important to remember that it exists within an enterprise data architecture and as such a key end goal is to easily integrate the storage and processing of unstructured data with tools allowing you to conduct deep analytics on the combined output of both structured and unstructured data sets," he says.

He adds that a big data infrastructure is best described as spanning data acquisition and organisation plus data analysis and decision-making. Two types of infrastructure exist to support the acquisition and organisation of big data.

The first must deliver low latency both in capturing data and in executing short, simple transactions or queries, handle very high transaction volumes, and support flexible, dynamic data structures. NoSQL databases are frequently used to acquire and store big data because they are well suited to dynamic data structures and are highly scalable.
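
To make "flexible, dynamic data structures" concrete, here is a minimal sketch in Python of a schemaless key-value store. It is a toy held in memory, not the API of any real NoSQL product; the `DocumentStore` class, the keys and the sample records are all hypothetical and chosen only for illustration.

```python
class DocumentStore:
    """Toy in-memory key-value store of schemaless records (hypothetical)."""

    def __init__(self):
        self._records = {}

    def put(self, key, document):
        # Store a copy so later changes by the caller do not alter the record.
        self._records[key] = dict(document)

    def get(self, key):
        # Return the stored record, or None if the key is unknown.
        return self._records.get(key)


store = DocumentStore()

# Two records with entirely different fields sit side by side:
# no table definition or schema migration is needed first.
store.put("event:1", {"type": "tweet", "user": "alice", "text": "hello"})
store.put("event:2", {"type": "sensor", "device_id": 17, "temperature_c": 21.5})

fields = sorted(store.get("event:2"))
print(fields)
```

The point of the sketch is only that a tweet and a sensor reading can be captured by the same simple write path, which is the property that suits this kind of store to fast acquisition of varied data.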

The second is focused on the batch processing of a potentially enormous volume of data. Here CIOs require infrastructure capable of storing and processing low-value data effectively and at low cost. "This is achieved by creating a large cluster of servers where the internal compute and storage processes data locally in situ," says Rajan. Apache Hadoop enables this model for the storage and processing of large volumes and varieties of data formats, turning unstructured data into structured output.
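
The batch model Hadoop popularised is usually described as map, shuffle and reduce. The sketch below imitates those three steps in plain Python on a single machine, purely to show how raw log lines can become structured counts. It does not use Hadoop's API; the log format, the sample lines and the function names are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical raw web server log lines (illustrative data only).
LOG_LINES = [
    "10.0.0.1 GET /home 200",
    "10.0.0.2 GET /basket 500",
    "10.0.0.1 GET /home 200",
    "malformed entry",
    "10.0.0.3 POST /checkout 200",
]


def map_line(line):
    """Map step: emit (path, 1) for each well-formed line, skip the rest."""
    parts = line.split()
    if len(parts) == 4:
        yield parts[2], 1


def shuffle(pairs):
    """Shuffle step: group the emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups):
    """Reduce step: collapse each key's values into a single total."""
    return {key: sum(values) for key, values in groups.items()}


def run_job(lines):
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_line(line))
    return reduce_counts(shuffle(pairs))


hits_per_path = run_job(LOG_LINES)
print(hits_per_path)  # {'/home': 2, '/basket': 1, '/checkout': 1}
```

In a real cluster the map step would run on each server against the data stored locally on that server, which is the "in situ" processing Rajan describes; here everything runs in one process for clarity.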

With the need for low-latency access to relevant very large datasets, powerful analytical tools to crunch data and intelligent applications to decipher the information, how quickly can such solutions be implemented? Consultants reckon not that long for most.

"Big Data solutions are now easily accessible and can be implemented at great speed," says AJ Thompson, director at IT consultancy and solutions provider Northdoor.

"For instance, IBM's Netezza solution is one of the most powerful data warehousing tools available and takes just days to set up. The benefits from adopting such a strategy are also instantaneous and lead to dramatic time savings."

Cloud could also help organisations get to grips with big data more quickly. The value of cloud in big data projects lies in its ability to provision storage, network capacity and raw computing power at a moment's notice, with organisations paying only for the resources they use, when they use them.

Some organisations opt to combine cloud and existing solutions. These hybrid models can provide several options and advantages, according to Jamie Graham, head of marketing for IBM's systems and technology group. "For example, cloud versions can be a first step to creating big data solutions rapidly. They also work well as a test and development environment, or to add additional capacity for occasional usage spikes."

He adds that retailers, for example, might have in-house systems that are running at maximum capacity during the festive buying period. They could use cloud to expand their analytics capabilities for the short term.

For businesses, gaining valuable insight from this flood of data is essential, and a big data policy combined with solid infrastructure is vital.

Big data could be the major trend that delivers huge business benefits, although concealed overheads and complications present obstacles that many companies will struggle with. With rapid innovation in the space, the business benefits of big data should far outweigh the IT investment required. By how much remains the big question for big data.

Rene Millman is a freelance writer and broadcaster who covers cybersecurity, AI, IoT, and the cloud. He also works as a contributing analyst at GigaOm and has previously worked as an analyst for Gartner covering the infrastructure market. He has made numerous television appearances to give his views and expertise on technology trends and companies that affect and shape our lives.