The best big data technologies

Here are the best big data storage, data mining, analysis and visualisation tools

We live in a data-driven world, and the amount of data is growing exponentially, so much so that it is rapidly changing our lives. Organisations around the world are having to adjust and adapt to this vast amount of information.

From innovative storage technologies to IoT deployment and the EU's new GDPR legislation, big data is driving change in the industry. Big data is a challenge for even the largest of organisations, who can no longer afford to ignore the huge potential it has to improve business decisions, reach customers with greater accuracy, and streamline business processes.

To use big data to its full potential, companies need the right tools to store, process and analyse the vital information they produce and collect every day, and to deliver results in real time.

The four main elements of any big data project are data storage, data mining, data analysis and data visualisation, and businesses can choose from a range of innovative, high-tech tools for each.

Below we have listed the best tools for your big data projects.

Data storage

For big data projects, cloud-based storage tools are vital to maximising the amount of information you can store. Cloud storage lets you keep data secure and accessible, so it is easy to work with. Here are our top three:

Hadoop

Hadoop is an open-source platform designed specifically to store very large datasets across clusters of machines. It supports both structured and unstructured data and scales out by adding nodes to the cluster, so it suits organisations that are likely to need extra capacity at short notice. It can also run a huge number of tasks in parallel. This is a great option for organisations that have Java developers in-house, but it does take some effort to get up and running.
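Hadoop's classic processing model, MapReduce, splits a job into a map step that runs in parallel over each block of data, a shuffle that groups the intermediate results by key, and a reduce step that combines each group. The following is a minimal single-process Python sketch of that model, assuming a simple word count as the job. It does not use Hadoop's own (Java) API, and the helper names `map_phase`, `shuffle_phase`, `reduce_phase` and `word_count` are hypothetical; on a real cluster each input split would be mapped on a different node.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step (hypothetical helper): emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle step: group every intermediate value under its key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(splits):
    """Run map -> shuffle -> reduce over a list of input splits in a single process."""
    # On a Hadoop cluster each split would be mapped on a different node in parallel.
    intermediate = chain.from_iterable(map_phase(line) for line in splits)
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big clusters", "Data on clusters"]))
    # {'big': 2, 'data': 2, 'clusters': 2, 'on': 1}
```

Keeping map and reduce as pure functions is what lets the framework run many copies of them side by side without coordination.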

MongoDB

MongoDB is very useful for organisations that use a combination of semi-structured and unstructured data. This could be, for example, organisations that develop mobile apps, those that need to store data relating to product catalogues, or data used for real-time personalisation.

RainStor

Rather than simply storing big data, RainStor compresses and de-duplicates it, providing storage savings of up to 40:1. The compression is lossless, so no data is lost in the process, making it a great option for organisations that want to take advantage of these savings. RainStor is available natively for Hadoop and uses SQL to manage data.
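RainStor's own compression method isn't described here, so the sketch below uses a generic substitute to show how de-duplication and lossless compression combine: split the data into fixed-size chunks, keep one copy of each distinct chunk (identified by its SHA-256 hash), compress the unique chunks with zlib, and record the chunk order so the original can be rebuilt byte for byte. The function names and the 4 KB chunk size are assumptions, and the ratio it reports depends entirely on how repetitive the input is; it is not a reproduction of the 40:1 figure.

```python
import hashlib
import zlib

CHUNK_SIZE = 4096  # assumed chunk size for this sketch; real systems tune this


def dedupe_and_compress(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Keep one compressed copy of each distinct chunk plus the ordered list of chunk hashes."""
    store: dict[str, bytes] = {}
    recipe: list[str] = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:  # de-duplication: a repeated chunk is stored only once
            store[digest] = zlib.compress(chunk, 9)
        recipe.append(digest)
    return store, recipe


def restore(store: dict[str, bytes], recipe: list[str]) -> bytes:
    """Rebuild the original bytes exactly (lossless) from the chunk store and recipe."""
    return b"".join(zlib.decompress(store[digest]) for digest in recipe)


def saving_ratio(original: bytes, store: dict[str, bytes]) -> float:
    """Original size divided by stored chunk size (ignores the recipe's own overhead)."""
    stored = sum(len(blob) for blob in store.values())
    return len(original) / stored if stored else 0.0


if __name__ == "__main__":
    data = b"2019-11-28,order,GBP,42.00\n" * 10_000  # highly repetitive sample records
    store, recipe = dedupe_and_compress(data)
    assert restore(store, recipe) == data
    print(f"{len(recipe)} chunks, {len(store)} unique, ratio {saving_ratio(data, store):.0f}:1")
```

The `restore` round trip is the important check: whatever the savings, the original data must come back unchanged.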

Data mining

Once your data is stored, you'll need tools to find the information you want to analyse or visualise. Our top three tools will help you extract the data you need without the hassle of trawling through it manually (a task that's impossible for humans anyway once you hold thousands of records or more).

IBM SPSS Modeler

IBM's SPSS Modeler can be used to build predictive models using its visual interface rather than via programming. It covers text analytics, entity analytics, decision management and optimisation and allows for the mining of both structured and unstructured data across an entire dataset.

KNIME

KNIME is a scalable open source solution with more than 1,000 modules to help data scientists mine for new insights, make predictions and uncover key points from data. Text files, databases, documents, images, networks and even Hadoop-based data can all be read, making it a perfect solution if the data types are mixed. It features a huge range of algorithms and community contributions to offer a full suite of data mining and analysis tools.

RapidMiner

RapidMiner is an open source data mining tool that lets customers use templates rather than having to write code. This makes it an attractive option for organisations without dedicated developer resource, or for those simply looking for a tool to start mining data. A free version is also available, although it's limited to one logical processor and 10,000 data rows. The tool also provides environments for machine learning, text mining, predictive analytics and business analytics to help with the entire process.

Data analysis

Got the data you need? Now it's time to find the most powerful tools to help you analyse it in order to glean key insights into your business, your customers or the wider world. Here, we round up our favourite data analysis tools.

Apache Spark

Apache Spark is perhaps one of the best-known big data analysis tools, built with big data at the forefront of everything it does. It's open source, fast and effective, and works with all the major big data languages, including Java, Scala, Python, R and SQL.

It's also one of the most widely used data analysis tools, adopted by companies of all sizes, from small businesses to public sector organisations and tech giants such as Apple, Facebook, IBM and Microsoft.

Apache Spark takes analysis one step further, allowing developers to combine large-scale SQL, batch processing, stream processing, machine learning and graph processing in one place.

Apache Spark is super-flexible too, running on Hadoop, Apache Mesos, Kubernetes, by itself as a standalone platform, or in the cloud, making it suitable for businesses of all sizes and in all sectors.

Presto

Like Apache Spark, Presto is an open source tool. It is a distributed SQL query engine, designed to run interactive analytic queries against data at scale. It supports both non-relational sources, such as the Hadoop Distributed File System (HDFS), Amazon S3, Cassandra, MongoDB and HBase, and relational sources such as MySQL, PostgreSQL, Amazon Redshift, Microsoft SQL Server and Teradata, making it a useful tool for businesses that operate both types of database.

It's also used by huge corporations such as Facebook. In fact, the social network was a major contributor to its development, with Netflix, Airbnb and Groupon also involved in making it one of the most powerful data analysis tools around.
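Presto's central idea is that a single SQL statement can span several separate data sources. As a rough, hypothetical analogue, the Python sketch below uses the standard library's `sqlite3` module: an orders table lives in one in-memory database, customer records live in a second database attached under the name `crm`, and one query joins them. The table names, data and helper functions are invented for illustration. Presto itself reaches each source through a connector configured as a catalog and spreads the query across worker nodes, which SQLite does not do.

```python
import sqlite3


def build_sources(conn: sqlite3.Connection) -> None:
    """Create two hypothetical, separate data sources inside one SQLite connection."""
    # Attach the second database first: SQLite refuses ATTACH inside an open transaction.
    conn.execute("ATTACH DATABASE ':memory:' AS crm")
    # Source 1: the main database holds order records.
    conn.execute("CREATE TABLE main.orders (customer_id INTEGER, amount REAL)")
    # Source 2: the attached 'crm' database holds customer records.
    conn.execute("CREATE TABLE crm.customers (id INTEGER, region TEXT)")
    conn.executemany("INSERT INTO main.orders VALUES (?, ?)", [(1, 20.0), (1, 5.0), (2, 12.5)])
    conn.executemany("INSERT INTO crm.customers VALUES (?, ?)", [(1, "UK"), (2, "US")])
    conn.commit()


def revenue_by_region(conn: sqlite3.Connection) -> list[tuple[str, float]]:
    """A single SQL statement that joins tables from both sources."""
    return conn.execute(
        """
        SELECT c.region, SUM(o.amount) AS revenue
        FROM main.orders AS o
        JOIN crm.customers AS c ON c.id = o.customer_id
        GROUP BY c.region
        ORDER BY c.region
        """
    ).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_sources(connection)
    print(revenue_by_region(connection))  # [('UK', 25.0), ('US', 12.5)]
```

Qualifying each table with the name of its source (`main.orders`, `crm.customers`) is what lets one query address both; Presto uses the same principle with catalog, schema and table names.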

SAP HANA

Data analytics is just one aspect of SAP's HANA platform, but it's a feature it does exceptionally well. Supporting text, spatial, graph and series data from one place, SAP HANA integrates with Hadoop, R and SAS to help businesses make fast decisions based on invaluable data insights.

Tableau

Tableau combines data analysis and visualisation tools and can be used on a desktop, via a server or online. The online version has a big focus on collaboration, meaning you can easily share your discoveries with anyone else in your organisation. Interactive visualisations make it easy for everyone to make sense of the information and with Tableau Cloud's fully hosted option, you won't need any resource to configure servers, manage software upgrades, or scale hardware capacity.

Splunk Hunk

Designed to run on top of the Apache Hadoop framework, Splunk's Hunk is a fully equipped data analytics tool that can generate graphs and visual representations of the data it is fed, all manageable through a dashboard. Users can query raw data through Hunk's interface and quickly create and share graphs, charts and dashboards. It also works with other Hadoop distributions and services, including Amazon EMR, Cloudera CDH and Hortonworks Data Platform, among others.

Data visualisation

Not everyone is adept at taking key insights from a list of data points or understanding what they mean. The best way to present your data is by turning it into data visualisations so everyone can understand what it means. Here are our top data visualisation tools.

Plotly

Plotly supports the creation of charts, presentations and dashboards from data analysed using JavaScript, Python, R, Matlab, Jupyter or Excel. A huge visualisation library and online chart creation tool makes it super-simple to create great looking graphics using a highly effective import and analysis GUI.

DataHero

DataHero is a simple-to-use visualisation tool that can pull data from a variety of cloud services into charts and dashboards, making it easier for the entire business to understand insights. Because no coding is required, it's suitable for organisations without data scientists in residence.

QlikView

With a suite of capabilities on offer, QlikView lets users create data visualisations from all manner of data sources, using self-service tools that remove the need for complex data models to be in place. QlikView runs on top of the company's own analytics platform and serves up straightforward visualisations that can be shared with others, so decisions based on the trends the data reveals can be made collaboratively. More advanced capabilities allow QlikView's visual analytics to be embedded into apps, while dashboards can guide people through producing analytics reports without needing an understanding of data science.
