Whether you’re investing in artificial intelligence (AI), analytics, or digital transformation, the foundation of each of these large-scale projects is data. However, unless the data you use is of high quality, these projects will not succeed.
The simple fact is, if you put garbage in, you get garbage out, and millions of dollars of investment will go to waste. Research by IBM found that in the US alone, $3.1 trillion is lost on an annual basis due to poor data quality. High-quality data is essential not just for your next innovation effort, but for the overall prosperity of your company.
Data quality improvement is one of the key functions of data governance, and the main challenge data governance managers face is how to improve the quality of data. Too often, teams take a tactical rather than a strategic approach to fixing data quality problems.
The trouble is, independent users tend to focus their energy on areas they are most affected by. For example, a data governance manager will concentrate on widespread data quality issues, a project manager might be more concerned with inefficiencies in the IT asset management process, while a CFO might present a report to the board, or shareholders, and find an important piece of data is missing.
In this blog, we will present not only a strategic approach to improving data quality but also a complete execution strategy.
The quality of data can be determined using several interconnected parameters. These parameters include the consistency of the data, its timeliness or relevance, accuracy, and completeness.
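Of these parameters, completeness is the easiest to measure directly. The following is a minimal sketch, assuming records are held as dictionaries and that an empty string or `None` means a value is missing. The sample records and the `completeness` helper are hypothetical and only illustrate the idea.

```python
# Hypothetical sample records; None or "" marks a missing value (an assumption).
records = [
    {"name": "Ana", "email": "ana@example.com", "phone": "555-0100"},
    {"name": "Ben", "email": "", "phone": "555-0101"},
    {"name": "Cy", "email": "cy@example.com", "phone": None},
    {"name": "Di", "email": "di@example.com", "phone": "555-0103"},
]


def completeness(rows, field):
    """Return the share of rows (0.0 to 1.0) in which `field` is populated."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(field) not in (None, ""))
    return filled / len(rows)


for field in ("name", "email", "phone"):
    print(f"{field}: {completeness(records, field):.0%}")
```

A per-field score like this gives a simple baseline to track over time. Consistency, timeliness, and accuracy need their own checks.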
There are two key reasons for bad quality data. The first is related to source systems and the second occurs at the analysis phase.
When organizations collect data with no proper controls or standardization processes in place, issues can arise. These issues occur in four core areas:
During capture: Data capture is an important part of the quality control process. This initial step can set the course for a bad quality data set.
For example, if a telephone number is entered incorrectly at this stage, it could later conflict with records in other systems, making it very difficult to confirm the customer’s identity.
During transformation: As data passes from user to user and system to system, it is transformed. For example, when a process isn’t documented correctly, it’s impossible to track the lineage of this data efficiently, and as a result, the quality of the data suffers.
Imagine a scenario where an accounting record passes from one staff member in the finance department to another. If the first staff member fails to update the record before transferring it, they could inadvertently enable a customer to skip a due payment.
Due to timeliness: Even if the data capture stage produces high-quality data, that quality may diminish over time. For example, someone might provide the correct address or job title when the data is captured, but if the same individual changes their job or address, these fields must be updated.
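One way to act on timeliness is to flag records that have not been verified recently. This is a rough sketch, assuming each record carries a `last_verified` date and that anything older than 365 days should be reviewed. Both the field name and the threshold are assumptions, not a recommendation.

```python
from datetime import date, timedelta

MAX_AGE = timedelta(days=365)  # assumed review threshold


def stale_records(rows, today, max_age=MAX_AGE):
    """Return the rows whose last verification is older than max_age."""
    return [row for row in rows if today - row["last_verified"] > max_age]


# Hypothetical contact records.
contacts = [
    {"id": 1, "job_title": "Analyst", "last_verified": date(2023, 1, 10)},
    {"id": 2, "job_title": "Manager", "last_verified": date(2024, 5, 2)},
]

needs_review = stale_records(contacts, today=date(2024, 6, 1))
print([row["id"] for row in needs_review])  # prints [1]
```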
Due to inconsistent processes and standards: This occurs when you capture data from different systems using different standards. For example, when you capture a unit of measurement in one system, you might be using codes like EA or LB. In another system, different standards might be used, like EACH or POUND.
Let’s take country codes as an example to explain some of these issues in greater detail. Many systems require users to enter a country code in order to complete registration documents, make bookings, and more. In some cases, users are required to enter these codes manually instead of selecting an option from a pre-established list.
The trouble is, there is no guarantee that each user will enter the same information. In fact, it’s almost impossible. When you ask people to type this information independently, you will inadvertently create many codes for the same country, and the system will be full of conflicting data points.
User three: UNITED STATES
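A quick way to see how fragmented a free-text field has become is to count its distinct spellings. The sketch below uses illustrative entries for a single country, and the `profile` helper is a hypothetical name.

```python
from collections import Counter

# Illustrative free-text entries for one country (not real data).
entries = ["US", "USA", "UNITED STATES", "usa ", "US"]


def profile(values):
    """Count each distinct value after trimming whitespace and upper-casing."""
    return Counter(value.strip().upper() for value in values)


counts = profile(entries)
print(counts)
print(len(counts), "distinct spellings for one country")
```

Even after trimming and upper-casing, three spellings remain for the same country, which is exactly the kind of conflict a standard code list is meant to prevent.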
A number of processes must be in place for data analysts to understand the quality of the data at their disposal. When they are not in place, it makes interpreting data quality very difficult.
This lack of coherence and absence of standards also affects digital transformation and company mergers, where bad data quality makes combining systems difficult. When there are no standards or common definitions in place, data quality becomes a big issue.
When data quality isn’t up to standard, the data becomes untrustworthy, making it difficult to convince employees to use it for data-driven initiatives.
As we mentioned at the start of this blog, data quality is a core outcome of a data governance initiative. As a result, a key concern for data governance teams, groups, and departments is to improve the overall quality of data. But there is a problem: coordination.
If you talk to different people from different departments about data quality, you will always get different responses. For example, if you ask an ETL developer how they measure data quality, they will probably rely on a certain set of parameters or rules that ensure that the data they enter is up to scratch.
If the quality at the source is bad, they are unlikely to flag it up, or even see it as their concern. Alternatively, if you talk to someone who deals with a CRM system, their focus will be on the consistency of data because they are unable to match conflicting terms in the system. In short, every individual sees data quality from a different perspective.
As most data quality problems occur because of issues with integrations and data transformation across multiple applications, it’s important to have an independent data quality manager, or data governance manager, to take charge of improving data quality across an organization.
Because there are so many conflicting opinions, you need an independent body to mediate and implement data quality improvement efforts company-wide, without bias, and based on a hierarchy of importance. This body can be a data governance manager, or group.
Every data quality problem is highly important to the person it affects, but prioritizing issues requires a framework. This framework is used to determine which data quality issue has the most business impact, because business impact guides the prioritization process. The following is a tried and tested strategy for improving data quality: the data quality improvement lifecycle.
The first step is to define data quality standards. This is the benchmark that you will aim to work towards. This step enables you to set goals and targets and build a vision of how improving the quality of your data will ultimately grow your business and make it more successful.
For example, every time you capture a social security number, you should capture nine digits. Or, every time you collect an email address, ensure it is entered twice as a secondary confirmation step.
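Standards like these can be written down as small validation checks. The following is a minimal sketch, assuming that hyphens and spaces in a social security number are acceptable separators and that email comparison ignores case. Both are assumptions you would settle when defining your own standard, and the function names are hypothetical.

```python
import re

NINE_DIGITS = re.compile(r"\d{9}")


def valid_ssn(value):
    """True when the value has exactly nine digits once hyphens and spaces are removed."""
    digits = value.replace("-", "").replace(" ", "")
    return NINE_DIGITS.fullmatch(digits) is not None


def emails_match(first, second):
    """True when both entries are non-empty and equal, ignoring case and outer whitespace."""
    a, b = first.strip().lower(), second.strip().lower()
    return a != "" and a == b


print(valid_ssn("123-45-6789"))                              # True
print(valid_ssn("12345678"))                                 # False
print(emails_match("Jane@example.com", "jane@example.com"))  # True
```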
Next, you need to record all the data quality issues in an organization using a framework to locate the data quality problems. There are two ways to successfully do so. The first is to create a data literacy program within the company.
Once you create widespread data literacy within an organization, you can put in place a reporting mechanism where users can communicate their data quality issues. The only objective of this step is to collect the data quality issues from all sources so the data governance group has a list of issues that must be addressed.
When capturing data quality issues, you must record the following information:
The next part is to develop a mechanism that helps you to understand the business impact of these data quality issues. This is the most important task data governance managers are required to do. They must consider the following in their evaluation:
This process enables the governance team to prioritize the issues efficiently. This prioritization process usually creates a bottleneck as it can be difficult to come to a unanimous decision.
Using the country code example, different systems could have different options, say US and USA, that are so deeply ingrained it is difficult to choose one over the other. To reach a decision, there needs to be a framework, and at the heart of this framework is a data governance committee. This committee should be made up of leaders from all the different business units in an organization.
When a data governance manager presents a problem, it needs to be taken to the committee for appraisal. They will weigh up the problem based on many factors including cost/benefit ratio, and business impact.
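To make the committee discussion more concrete, issues can be ranked with a simple score. The sketch below assumes each issue has been rated on a 1 to 5 scale for business impact and for the cost of fixing it. The scales, the sample issues, and the impact-per-cost formula are all hypothetical, and a real committee would weigh more factors than these two.

```python
# Hypothetical issues rated on assumed 1 (low) to 5 (high) scales.
issues = [
    {"name": "Missing customer job titles", "business_impact": 2, "fix_cost": 2},
    {"name": "Conflicting country codes", "business_impact": 4, "fix_cost": 1},
    {"name": "Duplicate customer records", "business_impact": 5, "fix_cost": 3},
]


def priority(issue):
    """Score an issue as business impact per unit of fix cost (higher is more urgent)."""
    return issue["business_impact"] / issue["fix_cost"]


ranked = sorted(issues, key=priority, reverse=True)
for issue in ranked:
    print(f"{priority(issue):.2f}  {issue['name']}")
```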
When critical data quality decisions are made, some sort of change to the business process is required. This essentially results in extra work and expenditure, so it needs to be decided at a cross-departmental, impartial, committee level.
Once issues are identified and prioritized, the person responsible for approving and fixing the problem needs to conduct a further root cause analysis. This process involves asking questions such as: Where does each individual problem stem from? What is the real cause of the problem?
Using the country code example, you’d need to determine how this ineffective field was causing data quality problems. Is the source of the issues the fact that the user is typing the code manually, or could it be because the company is buying in the data and has no control over it?
There are four key ways to fix data quality issues:
You can make changes in the ETL pipeline. For this, you are required to develop code that decides how the data is being processed through the integrations you have installed, otherwise known as ETL logic. Using the country code example again, the United States and the USA are converted into the US.
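As a rough illustration of this kind of ETL logic, the step below maps known variants to a standard code and sets aside anything it does not recognize. The mapping table, the `country` field name, and the function names are assumptions made for the sketch.

```python
# Assumed mapping from known variants to the standard code.
COUNTRY_CODE_MAP = {"UNITED STATES": "US", "USA": "US", "US": "US"}


def normalize_country(raw):
    """Return the standard code, or None when the value is not in the mapping."""
    return COUNTRY_CODE_MAP.get(raw.strip().upper())


def transform(rows):
    """Split rows into cleaned rows and rows rejected for manual review."""
    clean, rejected = [], []
    for row in rows:
        code = normalize_country(row["country"])
        if code is None:
            rejected.append(row)
        else:
            clean.append({**row, "country": code})
    return clean, rejected


clean, rejected = transform([
    {"id": 1, "country": "United States"},
    {"id": 2, "country": "usa"},
    {"id": 3, "country": "Atlantis"},
])
print(clean)     # rows 1 and 2, both with country "US"
print(rejected)  # row 3, left unchanged for review
```

Routing unknown values to a review list, rather than guessing, keeps the pipeline from silently introducing new errors.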
Another option is to change a particular process, for example, the way data is entered in a country code field. Instead of requiring users to enter country codes manually, you can add a dropdown menu so that users can only choose a valid code for the country they select.
The fourth method is called master data and reference data management. Many data quality issues become evident when master data is missing. For example, you may need to enter a customer name manually because the correct master data isn’t there.
In another example, if a customer is coming to you from two separate systems, they will be registered in two different places. The email could match, but everything else could be wrong because of issues, such as misspellings.
One common master data management solution is to create a single place where all the master data is stored that other systems can reference using keys. Master data management requires a lot of funding and can be rather complex, but it is very efficient.
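The idea of a single master store referenced by keys can be sketched in a few lines. Here, the `customer_id` key, the sample systems, and the `resolve` helper are all hypothetical. A real master data management platform involves far more than a lookup.

```python
# Single master store, keyed by a hypothetical customer_id.
customer_master = {
    "C001": {"name": "Jane Doe", "email": "jane@example.com"},
}

# Other systems store only the key instead of their own copy of the customer.
invoices = [
    {"invoice": 902, "customer_id": "C001"},
    {"invoice": 903, "customer_id": "C999"},
]


def resolve(records, master):
    """Attach master attributes to each record; collect records with unknown keys."""
    resolved, orphans = [], []
    for record in records:
        entry = master.get(record["customer_id"])
        if entry is None:
            orphans.append(record)
        else:
            resolved.append({**record, **entry})
    return resolved, orphans


resolved, orphans = resolve(invoices, customer_master)
print(resolved)  # invoice 902 with Jane Doe's name and email attached
print(orphans)   # invoice 903, whose key is missing from the master store
```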
Reference data usually consists of lists that master data can reference. Unlike master data, these lists tend to be relatively static. Taking measures to manage reference data, such as access controls and relationship mapping, will also help improve the quality of your data.
The final step in the process is to write a set of data quality rules. These will ensure that if this issue arises again, a notification or ticket is created to address the problem.
A notification like this makes it much easier to deal with the problem quickly, rather than having to consult multiple people and conduct complex analysis.
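A data quality rule of this kind can be as simple as a named check that produces a ticket when it fails. The following is a minimal sketch, assuming a short list of approved country codes and a ticket represented as a plain dictionary. The rule name, the approved list, and the ticket fields are assumptions, and in practice the ticket would be sent to whatever tracking tool you use.

```python
APPROVED_COUNTRY_CODES = {"US", "CA", "GB"}  # assumed approved list


def rule_country_code(row):
    """Rule: the country field must hold one of the approved codes."""
    return row.get("country") in APPROVED_COUNTRY_CODES


RULES = {"country_code_is_standard": rule_country_code}


def run_rules(rows, rules=RULES):
    """Return one ticket (a plain dict) for every rule a row fails."""
    tickets = []
    for row in rows:
        for name, check in rules.items():
            if not check(row):
                tickets.append(
                    {"rule": name, "record_id": row["id"], "value": row.get("country")}
                )
    return tickets


tickets = run_rules([
    {"id": 1, "country": "US"},
    {"id": 2, "country": "USA"},
])
print(tickets)  # one ticket, for record 2
```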