
Data & Analytics

3 minute read

4 Common Data Quality Issues and How to Solve Them

Dec 5, 2018

Written by: Spinnaker Team

Data is at the heart of nearly everything we do in business. From hiring new people to forecasting future revenue to deciding on the right new product offering, how we use data to make decisions drives business value. But what about the data itself? Decision-making is only as strong as the data you’re using. Data quality is essential.

In my career, I’ve heard these phrases many times: “We don’t have that data.” “We have that data, but it will take us a few weeks to get it.” “We don’t trust that data, so I recommend not making decisions with it.” Have you heard—or said—the same?

There are 4 root causes of bad data. Let’s take a look at each one and how to solve it:

4 COMMON DATA QUALITY ISSUES AND HOW TO SOLVE THEM

Data quality issues most often arise from these 4 situations:

Data is not true-sourced.

The vast majority of data in use today is not true-sourced, meaning it isn’t taken directly from its original location. When the data you’re using is not true-sourced, it has passed through numerous intermediate systems before reaching you. Just as in the old-fashioned telephone game, every hop is a chance for errors to creep in. One example: data warehouses store data in ways that save space or enable faster access, and that very design changes the data.

Solution: Shift your data analytics and reporting infrastructure back to querying raw source-system data directly. Many forward-thinking companies are already investing in making this happen.
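To make the idea concrete, here’s a minimal sketch in Python. The SQLite files, table name, and row-count comparison are all hypothetical stand-ins; the point is simply that any downstream copy should be reconciled against the true source, and ideally your queries should hit the source itself.

```python
import sqlite3

# Minimal sketch, assuming two hypothetical SQLite files: "source.db" as the
# raw source system and "warehouse.db" as a downstream copy. If you must
# report off a copy, reconcile it against the true source; better yet, point
# your reporting queries at the source directly.

def row_count(db_path: str, table: str) -> int:
    """Count rows in a table; a crude but useful reconciliation check."""
    with sqlite3.connect(db_path) as conn:
        (count,) = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
    return count

source_total = row_count("source.db", "orders")        # true source
warehouse_total = row_count("warehouse.db", "orders")  # downstream copy

if source_total != warehouse_total:
    print(f"Drift detected: source={source_total}, warehouse={warehouse_total}")
```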

Data isn’t what you think it is.

If a field isn’t well described, can you be sure the data is what you’re looking for? I once worked at a company whose data included hundreds of “risk scores,” which weren’t consistent and weren’t usable. Over the course of decades, no one kept track of what the data meant, how it was calculated, or where it originated. No one really knew what each “risk score” was. Today that company is investing substantial time and money in data quality, developing metadata that defines in detail what every data element is, where it came from, what it means, and how it’s calculated. If those details can’t be determined for an element, that element can’t be used going forward.

Solution: Invest in stringent metadata policies and practices from the get-go. If you haven’t, make it a priority ASAP.
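As a rough illustration of what such a practice captures, here’s a minimal Python sketch of a metadata registry. The field names and the example “risk score” entry are hypothetical, not a standard schema.

```python
from dataclasses import dataclass

# Minimal sketch of the metadata worth capturing for every data element.
# The fields and the example entry below are illustrative only.

@dataclass(frozen=True)
class DataElement:
    name: str        # canonical field name
    source: str      # system of record it originates from
    definition: str  # what the value means in business terms
    derivation: str  # how it is calculated, if derived

registry = {
    "risk_score_v2": DataElement(
        name="risk_score_v2",
        source="underwriting_system",
        definition="Probability of default over 12 months, scaled 0-100",
        derivation="Scoring model v2.3 on bureau and application attributes",
    ),
}

def lookup(field: str) -> DataElement:
    """Fail loudly when a field has no documented lineage."""
    if field not in registry:
        raise KeyError(f"{field} has no metadata; document it before using it")
    return registry[field]
```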

Data originates from rogue datamarts.

Datamarts are commonly created for temporary analysis. Unfortunately, they tend to outlive their useful lives. While official data repositories are constantly changing, most user-maintained datamarts don’t keep up with those changes. Despite this, many people continue to rely on temporary datamarts for reporting, even when they’re no longer maintained or updated. The result is, at best, inefficiency and, at worst, outdated or misleading data. And misleading data, of course, leads to misleading results.

Solution: Ensure official data repositories are the only source for data analytics and reporting.
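One cheap guardrail, sketched below in Python, is to flag datamarts that have outlived their refresh schedule. The refresh log here is a hypothetical stand-in for wherever your ETL process records last-load timestamps.

```python
from datetime import datetime, timedelta

# Minimal sketch: flag tables that have outlived their refresh schedule.
# The refresh log and threshold are illustrative.
last_refreshed = {
    "official.orders": datetime(2018, 12, 4),
    "temp_analysis.orders_copy": datetime(2018, 3, 17),  # a rogue datamart
}

MAX_AGE = timedelta(days=7)
today = datetime(2018, 12, 5)  # fixed "today" to keep the sketch deterministic

for table, refreshed in last_refreshed.items():
    age = today - refreshed
    if age > MAX_AGE:
        print(f"{table} is {age.days} days stale; do not use it for reporting")
```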

Data or code is just wrong.

As much as we define processes and policies, people make mistakes. Was data miscalculated? Was it miskeyed? Was it coded inaccurately? In my experience, when something seems odd, assume the data is wrong until proven otherwise. Too many people ignore the signs. They don’t want to pull on that thread to see if there’s a bigger issue to unravel, because they don’t know how, they don’t have time, or they’re just lazy. In most cases, it’s nothing. But sometimes a tiny clue points to a much larger data problem.

Solution: Though we can’t eliminate mistakes, we can minimize them by always challenging the data we use. Assume nothing. Question everything!
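What does “question everything” look like in practice? Here’s a minimal Python sketch of the kind of cheap sanity checks worth running on every data pull; the records and the rules are illustrative.

```python
# Minimal sketch of the "assume nothing" habit: cheap sanity checks run on
# every data pull. The records and the rules below are illustrative.
records = [
    {"order_id": 1, "amount": 120.00},
    {"order_id": 2, "amount": -45.00},  # odd: negative amount
    {"order_id": 2, "amount": 60.00},   # odd: duplicate key
]

def sanity_check(rows):
    """Return a list of oddities worth investigating before trusting the data."""
    issues, seen = [], set()
    for row in rows:
        if row["order_id"] in seen:
            issues.append(f"duplicate order_id {row['order_id']}")
        seen.add(row["order_id"])
        if row["amount"] < 0:
            issues.append(f"negative amount on order {row['order_id']}")
    return issues

for issue in sanity_check(records):
    print("Investigate:", issue)  # pull the thread instead of ignoring it
```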

How can you do more to address these data quality issues in your organization? Empower data quality czars to own data quality.

EMPOWER DATA QUALITY CZARS TO OWN DATA QUALITY

Regardless of my role, I have always viewed myself as a self-declared data quality czar. You should, too. In fact, while you’re at it, anoint everyone! Empowering everyone who uses data to be their own data quality czar will pay big dividends.

Everyone should “own” the data they need and run to ground any issues they experience. Everyone should challenge data and resolve problems when they find them. Don’t just ask the person who pulled the data and consider the matter settled. After all, that person naturally believes it’s right, since they provided it to you. Everyone should dig in to find explanations for data phenomena and anomalies. My bottom line: Data needs to be right 100 percent of the time.

Data quality czars need to care THE MOST about data quality and be acutely aware of data quality risks. It’s their job—and, they should assume, no one else’s—to challenge the data and make sure that, to the extent possible, the information is accurate, the data elements are well documented and defined, and upstream systems and policies are being faithfully executed.

Especially as organizations continue to pursue automation, AI, and the Internet of Things, data quality will determine success or failure.

Filling your organization with data quality czars will ensure you are primed to make the right business decisions now—and fully leverage technological opportunities in the future.