What it is, why it matters, and best practices. This guide provides definitions, frameworks, and practical advice to help you understand and perform modern data governance.
Data governance refers to the set of roles, processes, policies and tools which ensure proper data quality throughout the data lifecycle and proper data usage across an organization. Data governance allows users to more easily find, prepare, use and share trusted datasets on their own, without relying on IT.
The primary benefit of data governance is providing the high-quality data necessary for data analytics and BI tools. The insights gained from these tools result in better business decisions and improved performance. Additional benefits include:
In addition, one of the top 10 BI and data trends this year is that regulations are now combining data management, security, privacy, and identity and access management. So, security and governance have become a top priority, especially as you share APIs and data with partners.
The three main components of a data governance framework are people, process, and technology.
PEOPLE: For your governance program, you should consider including the following roles:
PROCESS: You’ll also need formal processes (or activities) to ensure consistent execution and enforcement of the usage policies and data standards set by the steering committee. These processes can be described in flowcharts that make the inputs and tasks clear for each use case.
TECHNOLOGY: As the name suggests, this component refers to the tools and techniques used to efficiently maintain and manage the security, integrity, lineage, usability, and availability of data. Modern tools can automate most aspects of managing a governance program. For example, a governed data catalog profiles and documents every data source and defines who in an organization can take which actions on which data.
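To make the idea of "who can take which actions on which data" concrete, here is a minimal Python sketch of a single catalog entry. It is an illustration only, not how any particular catalog product works; the class name, role names, action names, and the sales.orders dataset are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One governed data source plus the actions each role may take on it."""
    name: str
    owner: str
    # Maps a role to its permitted actions, e.g. {"analyst": {"read"}}.
    permissions: dict[str, set[str]] = field(default_factory=dict)

    def allows(self, role: str, action: str) -> bool:
        # Roles that are not listed get no access at all (deny by default).
        return action in self.permissions.get(role, set())


# Hypothetical entry: engineers can change the data, analysts can only read it.
orders = CatalogEntry(
    name="sales.orders",
    owner="data-steward@example.com",
    permissions={
        "data_engineer": {"read", "write", "profile"},
        "analyst": {"read"},
    },
)

print(orders.allows("analyst", "read"))
print(orders.allows("analyst", "write"))
```

The deny-by-default lookup is a design choice in this sketch: a role that nobody has explicitly granted access to cannot do anything with the dataset.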
This 2-minute video describes how data engineers, data stewards, and data consumers work with a data catalog as part of a robust data governance process.
While you set up the framework described above, keep in mind these three best practices to ensure you’re successful right out of the gate.
Developing a data glossary (or dictionary) which defines the business terms and concepts you use in your organization will give you consistent business context across multiple tools. For example, everyone should be clear on what qualifies as a “Marketing Qualified Lead” or an “Inactive Customer”.
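A glossary can start as something very simple. The Python sketch below assumes a small in-memory store in which each term has exactly one definition and one steward; the class names, the steward value, and the wording of the "Inactive Customer" definition are made up for illustration and are not taken from any real glossary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlossaryTerm:
    name: str
    definition: str
    steward: str  # who is accountable for keeping the definition current


class Glossary:
    def __init__(self) -> None:
        self._terms: dict[str, GlossaryTerm] = {}

    @staticmethod
    def _key(name: str) -> str:
        # Ignore case and extra spaces so every tool resolves the same term.
        return " ".join(name.lower().split())

    def define(self, term: GlossaryTerm) -> None:
        key = self._key(term.name)
        if key in self._terms:
            # A second, competing definition is exactly what a glossary prevents.
            raise ValueError(f"{term.name!r} is already defined")
        self._terms[key] = term

    def lookup(self, name: str) -> GlossaryTerm:
        return self._terms[self._key(name)]


glossary = Glossary()
glossary.define(GlossaryTerm(
    name="Inactive Customer",
    definition="Hypothetical: a customer with no purchase in the last 12 months.",
    steward="sales-ops",
))

print(glossary.lookup("inactive customer").definition)
```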
Mapping where your data resides will help you know which system it’s in and how it flows through your organization. Classifying your datasets based on considerations like privacy or sensitivity determines how your policies are applied to each dataset.
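As a rough sketch of how a classification can drive policy, the Python example below assigns a sensitivity level to a dataset based on its column names and then looks up the matching policy. The sensitivity levels, the column-name rules, and the policy settings are assumptions chosen for illustration; a real program would define its own.

```python
from enum import Enum


class Sensitivity(Enum):
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3


# Hypothetical column names that signal personal or business-sensitive data.
PERSONAL_COLUMNS = {"email", "phone", "date_of_birth"}
BUSINESS_COLUMNS = {"revenue", "cost", "margin"}

# Hypothetical policy applied at each level.
POLICIES = {
    Sensitivity.PUBLIC: {"mask_values": False, "requires_approval": False},
    Sensitivity.INTERNAL: {"mask_values": False, "requires_approval": True},
    Sensitivity.CONFIDENTIAL: {"mask_values": True, "requires_approval": True},
}


def classify(columns: set[str]) -> Sensitivity:
    """Return the strictest level triggered by any column in the dataset."""
    if columns & PERSONAL_COLUMNS:
        return Sensitivity.CONFIDENTIAL
    if columns & BUSINESS_COLUMNS:
        return Sensitivity.INTERNAL
    return Sensitivity.PUBLIC


def policy_for(columns: set[str]) -> dict[str, bool]:
    return POLICIES[classify(columns)]


print(classify({"customer_id", "email"}))
print(policy_for({"region", "revenue"}))
```

Classifying by column name is only a starting point. In practice you would likely combine it with data profiling and manual review by a data steward.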
Building a clear, use case-based data catalog gives you the ability to make different kinds of data available to different kinds of users quickly, without increasing risk. Data catalogs provide data lineage information, search functions, and collaboration tools, along with an indexed inventory of available data assets.
Data lineage refers to the process of tracking all changes made to data on its journey from source to current location. Data lineage tools help you understand and visualize these changes and data flows so you can know where any specific piece of data came from, how it split and merged with other data, and what transformations have been applied.
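One simple way to picture lineage is as a graph in which each dataset points back to the inputs and transformations that produced it. The Python sketch below assumes a small, acyclic lineage map with made-up dataset names and transformation labels; it shows how you could find the original sources of a dataset and list every step along the way.

```python
# Each dataset maps to the (input, transformation) pairs that produced it.
# Datasets with an empty list are original sources. All names are hypothetical.
LINEAGE = {
    "crm.contacts": [],
    "erp.orders": [],
    "staging.orders_clean": [("erp.orders", "deduplicate")],
    "mart.customer_orders": [
        ("staging.orders_clean", "join on customer_id"),
        ("crm.contacts", "join on customer_id"),
    ],
}


def root_sources(dataset: str, lineage: dict = LINEAGE) -> set[str]:
    """Return the original sources that ultimately feed `dataset`."""
    inputs = lineage[dataset]
    if not inputs:
        return {dataset}
    roots: set[str] = set()
    for parent, _transformation in inputs:
        roots |= root_sources(parent, lineage)
    return roots


def upstream_steps(dataset: str, lineage: dict = LINEAGE) -> list[tuple[str, str, str]]:
    """List (input, transformation, output) steps upstream of `dataset`, inputs first."""
    steps: list[tuple[str, str, str]] = []
    seen: set[str] = set()

    def visit(node: str) -> None:
        if node in seen:
            return
        seen.add(node)
        for parent, transformation in lineage[node]:
            visit(parent)  # record the parent's own history before this step
            steps.append((parent, transformation, node))

    visit(dataset)
    return steps


print(root_sources("mart.customer_orders"))
for step in upstream_steps("mart.customer_orders"):
    print(step)
```

If a number in mart.customer_orders looks wrong, walking these steps in reverse shows which transformation or source to inspect first.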
So, in a data governance framework, a data steward or data engineer would use a lineage visualization like the example below to confirm that they can trust the data and to trace any errors back to their root cause.
Governance has traditionally focused on the management of finished data such as financial close metrics, regulatory submissions, and key performance indicators. This type of data requires formal definitions and high data quality.
But today’s advanced data science and data analytics often use raw and semi-finished data. And this creates a tension between data providers and data consumers. Providers work hard to provision data responsibly, to everyone, without putting the business at risk. Consumers want data for their projects immediately.
The tiered system shown below offers a solution to this challenge. The funnel addresses different user needs with different types of data, applying increasing scrutiny and quality standards as the data works its way through the system.
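The idea of increasing scrutiny can be sketched in a few lines of Python. The example assumes three tiers, each adding requirements on top of the tier before it; the tier names and the requirement flags are hypothetical stand-ins rather than the labels used in any specific framework.

```python
# Tiers in order of increasing scrutiny. Each tier lists the extra
# requirements a dataset must meet beyond those of the tiers before it.
TIERS = [
    ("raw", []),
    ("curated", ["has_owner", "has_schema"]),
    ("certified", ["has_formal_definition", "passed_quality_checks"]),
]


def highest_tier(dataset: dict[str, bool]) -> str:
    """Return the highest tier whose requirements, and all earlier ones, are met."""
    reached = TIERS[0][0]
    for name, required in TIERS:
        if not all(dataset.get(flag, False) for flag in required):
            break  # later tiers build on this one, so stop here
        reached = name
    return reached


sandbox_extract = {}
sales_table = {"has_owner": True, "has_schema": True}
finance_kpi = {
    "has_owner": True,
    "has_schema": True,
    "has_formal_definition": True,
    "passed_quality_checks": True,
}

print(highest_tier(sandbox_extract))
print(highest_tier(sales_table))
print(highest_tier(finance_kpi))
```

Because the requirements are cumulative, a dataset cannot skip a tier: one that passes quality checks but has no owner still counts as raw in this sketch.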
This system helps the enterprise governance function balance a breadth of understanding across the enterprise, including restricting access to sensitive data, with a depth of understanding for a smaller number of critical data assets.