How to measure data quality

Perfect data is unlikely, but there are some fundamentals you must still measure

With data, as with most things in life, you tend to get out what you put in. This means the quality of insights that you or your business should expect to extract from data correlates with the quality of the underlying data. Gather and manage high-quality data, and you should expect to glean equally outstanding insights.

The best way to make sure that your business is maximising the potential of the data it gathers is to establish quality controls that catch errors and inconsistencies. This is much easier said than done, however.

Ideally, data is stored in a highly structured manner, in rows and columns, but you may encounter challenges once you broaden your search to gather different types of information. This could be data gathered from social media sites, for example, or from other online resources. The variety in the types of data you’re collecting might make it difficult to clean the data and structure it in such a way that you can easily derive insights.

In light of the complications that can arise from adding more data to that which you already hold, it’s important to remember that quality normally trumps quantity. You should also only collect data if you’re certain it adds value.

More data is being created as time passes, so it becomes harder to ascertain what's actually useful and what isn't. Organisations looking for the highest quality data may need to dedicate resources to processing the data that comes in, and cleaning it so it’s ready to present and analyse for insights. This might not always be so easy to do, however. If, for instance, a business is hoping to quickly get an understanding of customers’ views on social media, it may need to sacrifice quality in the interests of speed.

All of this means that, in practice, perfect data quality is an aim that's nigh on impossible. Much of the data you collect from various sources will be unstructured, and cleaning it costs time and money. However, that doesn't mean you shouldn't value the quality of the data you hold. While it won't be perfect, you want to ensure it's as clean as possible, so that it remains useful.

Once equipped with the key metrics for measuring data quality, enterprises know where they stand. The next step is to deploy a data quality management strategy, a process that improves on those measurements by combining the right people, processes and technologies.

So, how do I measure data quality?

There are a variety of definitions, but data quality is generally measured against a set of criteria called 'data quality dimensions' that assess the health of the data, such as completeness, or uniqueness.

In an ideal world, all these criteria would hold equal weight, but depending on what you intend to use your data for, or its primary function, you may want to give some criteria more weight than others.
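One way to express that prioritisation is a weighted average of per-dimension scores. The sketch below is illustrative only: the function name `weighted_quality`, the scores and the weights are all assumptions made up for this example, not part of any standard.

```python
def weighted_quality(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Combine per-dimension scores (0.0 to 1.0) into one weighted average."""
    total_weight = sum(weights[name] for name in scores)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(scores[name] * weights[name] for name in scores) / total_weight


# Hypothetical scores: the data is mostly complete but has many duplicates.
scores = {"completeness": 0.9, "uniqueness": 0.5}

# Hypothetical priorities: uniqueness matters three times as much here.
weights = {"completeness": 1.0, "uniqueness": 3.0}

overall = weighted_quality(scores, weights)
print(f"{overall:.2f}")
```

With equal weights the same scores would average 0.70. Weighting uniqueness more heavily pulls the overall figure down towards the weaker dimension.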

Although many industries have devised their own metrics for assessing data quality, DAMA International, the not-for-profit data resource management body, has set out six key criteria that it considers the standard against which any database can be measured.

Completeness

Completeness is defined by DAMA as how much of a data set is populated, as opposed to being left blank. For instance, a survey would be 70% complete if 70% of the people asked completed it. To ensure completeness, all data sets and data items must be recorded.
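As a rough illustration, completeness can be computed as the share of expected fields that actually hold a value. This is a minimal sketch: the `completeness` function, the field names and the sample records are assumptions for the example, and it treats `None` and empty strings as blank.

```python
from typing import Any, Iterable, Mapping


def completeness(records: Iterable[Mapping[str, Any]], fields: list[str]) -> float:
    """Return the share of expected cells that are populated (0.0 to 1.0)."""
    rows = list(records)
    total = len(rows) * len(fields)
    if total == 0:
        return 0.0
    filled = sum(
        1
        for row in rows
        for field in fields
        if row.get(field) not in (None, "")  # None and "" count as blank
    )
    return filled / total


# Hypothetical survey responses with some blanks.
survey = [
    {"name": "Ann", "email": "ann@example.com"},
    {"name": "Bob", "email": ""},
    {"name": None, "email": "cy@example.com"},
    {"name": "Dee", "email": "dee@example.com"},
    {"name": "Eve", "email": None},
]

print(f"{completeness(survey, ['name', 'email']):.0%}")  # 70%
```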

Uniqueness

This metric assesses whether a data entry is unique, or whether it is duplicated anywhere else within your database. Uniqueness is ensured when each piece of data has been recorded only once. If the same entity appears in several records rather than in a single view, you may have to deduplicate the data.
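A simple duplicate check might look like the sketch below. It assumes, purely for illustration, that an email address identifies a record and that differences in letter case and surrounding whitespace should be ignored. The helper names are hypothetical.

```python
from collections import Counter


def normalise(email: str) -> str:
    """Strip whitespace and lower-case so trivial variants compare equal."""
    return email.strip().lower()


def uniqueness(values: list[str]) -> float:
    """Return distinct values divided by total values (1.0 means no duplicates)."""
    keys = [normalise(value) for value in values]
    if not keys:
        return 1.0
    return len(set(keys)) / len(keys)


def duplicates(values: list[str]) -> dict[str, int]:
    """Return each normalised value that occurs more than once, with its count."""
    counts = Counter(normalise(value) for value in values)
    return {key: count for key, count in counts.items() if count > 1}


# Hypothetical contact list: the first two entries are the same address.
emails = ["ann@example.com", "Ann@Example.com ", "bob@example.com", "cy@example.com"]

print(uniqueness(emails))   # 0.75
print(duplicates(emails))   # {'ann@example.com': 2}
```

Real deduplication usually needs fuzzier matching (names, addresses, typos), so treat exact matching on a normalised key as a starting point only.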

Timeliness

How recent is your data? This essential aspect of the DAMA criteria assesses how useful or relevant your data may be based on its age. Naturally, if an entry is 12 months old, for instance, the scope for dramatic changes in the interim may render the data useless. Car mileage, which changes frequently, is a good example.
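Timeliness can be checked by comparing each record's last-updated date with a maximum acceptable age. The sketch below assumes every record carries an `updated` date and uses a 365-day threshold to mirror the 12-month example. Both the field name and the threshold are assumptions you would tune to your own data.

```python
from datetime import date, timedelta


def stale_records(records: list[dict], today: date,
                  max_age: timedelta = timedelta(days=365)) -> list[dict]:
    """Return the records last updated more than max_age before today."""
    return [record for record in records if today - record["updated"] > max_age]


# Hypothetical mileage readings.
readings = [
    {"car": "A", "mileage": 42_000, "updated": date(2021, 3, 1)},
    {"car": "B", "mileage": 18_500, "updated": date(2020, 1, 15)},
]

stale = stale_records(readings, today=date(2021, 6, 1))
print([record["car"] for record in stale])  # ['B']
```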

Validity

Simply put, does the data you've recorded reflect what type of data you set out to record? So if you ask for somebody to enter their phone number into a form, and they type 'sjdhsjdshsj', that data isn't valid, because it isn't a phone number - the data doesn't match the description of the type of data it should be.
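The phone number example can be sketched as a format check. The pattern below is a deliberately loose assumption (an optional leading plus sign, then 7 to 20 digits, spaces, brackets or hyphens, starting and ending with a digit). It is not a standard, and real phone validation depends on the countries you serve.

```python
import re

# Loose, illustrative pattern: optional "+", a digit, 5 to 18 digits or
# separators, then a final digit. This is an assumption, not a standard.
PHONE_PATTERN = re.compile(r"\+?[0-9][0-9 ()\-]{5,18}[0-9]")


def is_valid_phone(value: str) -> bool:
    """Return True if the whole value looks like a phone number."""
    return PHONE_PATTERN.fullmatch(value.strip()) is not None


print(is_valid_phone("sjdhsjdshsj"))       # False
print(is_valid_phone("+44 20 7946 0958"))  # True
```

Note that this only tests whether the value has the right shape. It says nothing about whether the number actually belongs to the person who entered it.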

Accuracy

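Accuracy determines whether the information you hold is correct or not, and isn't to be confused with validity, a measure of whether the data is actually the type you wanted. A correctly formatted phone number that belongs to somebody else, for example, is valid but not accurate.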
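Accuracy can only be measured against something you trust. The sketch below assumes you have a verified reference for at least some records (for instance, details confirmed directly with customers) and reports the share of held values that match it. The function name, customer IDs and values are hypothetical.

```python
def accuracy(held: dict[str, str], reference: dict[str, str]) -> float:
    """Return the share of held values that match a trusted reference.

    Only keys present in both mappings can be checked.
    """
    checked = [key for key in held if key in reference]
    if not checked:
        raise ValueError("no records could be checked against the reference")
    correct = sum(1 for key in checked if held[key] == reference[key])
    return correct / len(checked)


# Hypothetical phone numbers held in a database, keyed by customer ID.
held = {
    "c1": "020 7946 0001",
    "c2": "020 7946 0002",  # well formed, but not this customer's number
    "c3": "020 7946 0003",
    "c4": "020 7946 0004",
}

# Hypothetical verified values for the same customers.
reference = {
    "c1": "020 7946 0001",
    "c2": "020 7946 0999",
    "c3": "020 7946 0003",
    "c4": "020 7946 0004",
}

print(accuracy(held, reference))  # 0.75
```

Every value in `held` would pass a format check, which is why accuracy has to be measured separately from validity.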

Consistency

For anyone trying to analyse data, consistency is a fundamental consideration. Basically, you need to ensure you can compare data across data sets and media (whether it's on paper, on a computer file, or in a database) - is it all recorded in the same way, allowing you to compare the data and treat it as a whole?
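A common consistency problem is the same fact recorded in different formats in different places. The sketch below assumes two hypothetical sources that store a sign-up date in different numeric formats. It converts both to a common representation before comparing them. The list of accepted formats is an assumption and would need to match your own sources.

```python
from datetime import date, datetime

# Formats assumed to appear in the sources; extend to suit your own data.
KNOWN_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y")


def to_date(raw: str) -> date:
    """Parse a date string written in any of the known formats."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date format: {raw!r}")


def inconsistent_keys(a: dict[str, str], b: dict[str, str]) -> list[str]:
    """Return keys found in both sources whose dates disagree once normalised."""
    shared = sorted(set(a) & set(b))
    return [key for key in shared if to_date(a[key]) != to_date(b[key])]


# Hypothetical sign-up dates from a database and from digitised paper forms.
database = {"c1": "2021-06-14", "c2": "2021-01-05"}
paper_forms = {"c1": "14/06/2021", "c2": "06/01/2021"}

print(inconsistent_keys(database, paper_forms))  # ['c2']
```

Customer c1 looks different in the two sources but is consistent once both values are parsed. Customer c2 genuinely disagrees and needs investigating.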

Remember that your data is rarely going to be perfect, and that you have to juggle managing your data quality with actually using the data - spend too long on ensuring quality, and there'll soon be no point analysing it, because it'll be far past its sell-by date.

However, you should perform regular data quality audits - especially as you're probably regularly collecting new data sets - to ensure your data is as clean and useful as you can make it. Without good data, you can't produce useful business insights or make well-informed decisions.

Why measuring data quality is important

Quality data can be the difference between enterprises keeping their heads above water, and sinking. This is particularly apparent when considering competitive markets, which are typically flooded with SMBs struggling to steal slivers from giant corporations. With rivals taking advantage of data and budgets already stretched to breaking point, organisations that aren't capitalising on the opportunities strong data can provide risk being left behind.

From a purely economic perspective, as data quality is optimised so are company finances. That's because poor data takes more resources to transform into insight. Research conducted by Gartner found that organisations believe poor quality data costs them an average of $15 million per year. Having a data strategy in place helps to maintain a certain level of quality, reducing these outlays.

Accurate data also allows enterprises to better understand their customers' needs. This makes for more effective marketing, with targeted campaigns reaching the desired demographics. Internal processes should improve too: when decision-makers can fully trust the data they rely upon, they can make better decisions faster.

Companies also need to be aware of compliance regulations. In many industries, storing data is subject to data-protection laws. The data must be protected to a set standard, and must not be used for purposes that the rules prohibit. With a better understanding of the data you possess, there is less chance of accidentally using data in ways that are restricted.
