Articles
AI in Data Governance
Assist in defining business terms and identifying weak ones
Assist in developing data quality rules
Assist in data labeling and information classification (sensitive data)
Identify gaps in data lineage
Identify objects a policy applies to
Future of Data Governance in 2024
Implementation of cloud-based services
Amazon has a growing number of services that help with governing data in the cloud. As a result, Data Governance teams will need to learn how to use them and determine which services
Implementation of AI
Currently, few if any Data Governance tools take full advantage of AI to help with governance. Examples of where AI could help include assistive business definition creation, data labeling,
Must-Know Data Governance Vocabulary
Data Catalog
Metadata
Business Glossary
Business Term
Data Lineage
Data Model
Policy
Standard
Data Custodian
Data Governance Success Measures
Enterprise use of data governance application
Difference in number of data quality incidents
Difference in number of downstream impact incidents
Enterprise business term fluency
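The two "difference in number of incidents" measures above come down to comparing counts between two reporting periods. A minimal sketch, assuming incidents are already counted per period; the IncidentCounts structure and the numbers are hypothetical and for illustration only:

```python
from dataclasses import dataclass


@dataclass
class IncidentCounts:
    """Incident counts for one reporting period (hypothetical structure)."""
    data_quality: int
    downstream_impact: int


def incident_difference(before: IncidentCounts, after: IncidentCounts) -> dict:
    """Return the change in each count; a negative value means fewer incidents."""
    return {
        "data_quality": after.data_quality - before.data_quality,
        "downstream_impact": after.downstream_impact - before.downstream_impact,
    }


# Made-up numbers, not real measurements
before = IncidentCounts(data_quality=40, downstream_impact=12)
after = IncidentCounts(data_quality=25, downstream_impact=9)
change = incident_difference(before, after)
```

Which periods to compare (for example, before and after a governance tool rollout) is a choice the organization would need to make.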
Data Quality Rules - Can AI Help?
Yes, but since general AI doesn't exist yet, human input will still be needed. It's not far-fetched for a data governance application to analyze a dataset and suggest data quality rules. I'm hoping data governance applications begin to implement a feature like this; at the least, it would reduce the amount of human involvement. There could even be an ML model for specific dataset contents: for example, a database field "SSN" could be read and identified as a Social Security number field. The model would then suggest data quality rules and ask for human input, making use of the data labeling concept.
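The label, suggest, and review flow described above can be sketched as follows. This is only an illustration: a simple name and pattern heuristic stands in for the ML model, and the function names, the 0.9 threshold, and the suggested rule texts are all assumptions rather than features of any real product.

```python
import re
from typing import Callable, List, Optional

# Assumed SSN shape: 9 digits, with or without dashes (123-45-6789 or 123456789)
SSN_PATTERN = re.compile(r"^\d{3}-?\d{2}-?\d{4}$")


def label_column(name: str, sample_values: List[str]) -> Optional[str]:
    """Guess a label from the column name and sample values (heuristic, not ML)."""
    name_hint = "ssn" in name.lower()
    matches = sum(1 for value in sample_values if SSN_PATTERN.match(value))
    match_ratio = matches / len(sample_values) if sample_values else 0.0
    if name_hint or match_ratio >= 0.9:
        return "SSN"
    return None


def suggest_rules(label: Optional[str]) -> List[str]:
    """Map a label to candidate data quality rules (illustrative rule texts)."""
    if label == "SSN":
        return [
            "Allowed characters: digits and dashes only",
            "Data length: 9 digits (11 characters with dashes)",
            "Classification: treat as sensitive data",
        ]
    return []


def review(suggestions: List[str], approve: Callable[[str], bool]) -> List[str]:
    """Keep only the suggestions a human reviewer approves."""
    return [rule for rule in suggestions if approve(rule)]


label = label_column("SSN", ["123-45-6789", "987654321"])
suggested = suggest_rules(label)
# The lambda stands in for a human decision made in the application's UI
accepted = review(suggested, approve=lambda rule: not rule.startswith("Classification"))
```

The point of the review step is that the application only proposes rules; a person still decides which ones become active.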
Data Governance Policies
What is a Data Governance Policy?
Describes the guidelines the enterprise wants to follow for managing its data
Why are policies needed?
Communicates which guidelines need to be followed and the positive outcome that following them will produce
What are some examples?
Data Retention
Classification
Data Integrity
Data Classification
Confidential
Highly Confidential
Public
Restricted
Internal Only
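The classification levels above can be represented in code and used to identify which data objects a policy applies to. A minimal sketch, assuming a simple in-memory catalog; the object names and the objects_in_scope helper are hypothetical, and no ordering between the levels is implied:

```python
from enum import Enum


class Classification(Enum):
    """The classification levels listed above."""
    CONFIDENTIAL = "Confidential"
    HIGHLY_CONFIDENTIAL = "Highly Confidential"
    PUBLIC = "Public"
    RESTRICTED = "Restricted"
    INTERNAL_ONLY = "Internal Only"


# Hypothetical catalog: data object name -> assigned classification
catalog = {
    "hr.employees.ssn": Classification.HIGHLY_CONFIDENTIAL,
    "sales.orders.total": Classification.INTERNAL_ONLY,
    "marketing.press_releases": Classification.PUBLIC,
}


def objects_in_scope(catalog: dict, levels: set) -> list:
    """Return the data objects whose classification is one of the given levels."""
    return sorted(name for name, level in catalog.items() if level in levels)


# Example: a policy that covers Confidential and Highly Confidential data
in_scope = objects_in_scope(
    catalog, {Classification.CONFIDENTIAL, Classification.HIGHLY_CONFIDENTIAL}
)
```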
Data Quality
Rules
Value range
Data length, precision, scale
Data type
Distinct values
Allowed characters
Allow present/future dates
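Each rule type above can be expressed as a small check. The sketch below is one possible interpretation and not the behavior of any particular tool: function names are hypothetical, "distinct values" is read as "the column only contains values from an expected set", and precision/scale follow the usual SQL NUMERIC(precision, scale) meaning.

```python
import re
from datetime import date


def check_value_range(value, low, high) -> bool:
    """Value range: value must fall between low and high, inclusive."""
    return low <= value <= high


def check_length(value: str, max_length: int) -> bool:
    """Data length: text must not exceed max_length characters."""
    return len(value) <= max_length


def check_precision_scale(value: str, precision: int, scale: int) -> bool:
    """Precision/scale: at most `scale` fraction digits and
    `precision - scale` integer digits, as in NUMERIC(precision, scale)."""
    match = re.fullmatch(r"-?(\d+)(?:\.(\d+))?", value)
    if match is None:
        return False
    integer_digits = match.group(1).lstrip("0")
    fraction_digits = match.group(2) or ""
    return len(fraction_digits) <= scale and len(integer_digits) <= precision - scale


def check_data_type(value, expected_type: type) -> bool:
    """Data type: value must be an instance of the expected type."""
    return isinstance(value, expected_type)


def check_distinct_values(values, expected) -> bool:
    """Distinct values: the column may only contain values from the expected set."""
    return set(values) <= set(expected)


def check_allowed_characters(value: str, allowed_pattern: str) -> bool:
    """Allowed characters: the whole value must match the given regex."""
    return re.fullmatch(allowed_pattern, value) is not None


def check_date(value: date, today: date, allow_future: bool = False) -> bool:
    """Dates: reject dates after `today` unless future dates are allowed."""
    return allow_future or value <= today
```

Passing `today` in explicitly, rather than calling `date.today()` inside the check, keeps the rule repeatable when a monitoring run is re-executed later.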
Monitoring
A process or application will be needed to monitor data quality (typically on a daily basis)
It will need to group the data objects that have data quality issues by domain; the organization will need to define these domains
Responsibility
Someone will need to be identified who will take responsibility for addressing the data quality issues that are discovered
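The monitoring and responsibility points above can be sketched as grouping detected issues by domain and routing each group to a responsible person. This is a minimal illustration; the Issue structure, the domain names, and the email addresses are hypothetical placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Issue:
    """One data quality issue found by a monitoring run (hypothetical structure)."""
    data_object: str
    domain: str
    rule: str


# Hypothetical mapping defined by the organization: domain -> responsible person
owners = {
    "Finance": "finance.steward@example.com",
    "HR": "hr.steward@example.com",
}


def group_by_domain(issues: List[Issue]) -> Dict[str, List[Issue]]:
    """Separate issues by the domain their data object belongs to."""
    grouped = defaultdict(list)
    for issue in issues:
        grouped[issue.domain].append(issue)
    return dict(grouped)


def assign_owners(grouped: Dict[str, List[Issue]], owners: Dict[str, str],
                  fallback: str = "unassigned") -> Dict[str, str]:
    """Look up who is responsible for each domain that has issues."""
    return {domain: owners.get(domain, fallback) for domain in grouped}


# Example results from one daily run (made-up data)
issues = [
    Issue("gl.accounts", "Finance", "Value range"),
    Issue("hr.employees", "HR", "Allowed characters"),
    Issue("gl.journal", "Finance", "Data type"),
    Issue("crm.leads", "Sales", "Data length"),
]
grouped = group_by_domain(issues)
assignments = assign_owners(grouped, owners)
```

A domain that maps to "unassigned" signals a gap: nobody has yet been identified to take responsibility for that domain's issues.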
Data Governance Titles
Data Governance Analyst
Data Governance Specialist
Data Governance Engineer
Data Governance Developer
Data Governance Director
Collibra
Overview
Training - Training is abundant and stands out from the other products on the market.
GUI - Updated interface
Data Catalog
What's good about it?
Each dataset has a description which will help the user understand the content
An owner can be assigned to a dataset, along with an email address for contacting them.
What's bad about it?
For scheduled recurring scans of databases/data assets, how do we reduce cost? How do we measure the cost of running them?
Data Quality & Observability
What's good about it?
Data classes allow data quality rules to be reused across a dataset
Data Quality dashboard provides valuable metrics that help data stewards easily identify issues with their data assets and understand those issues quantitatively
What's bad about it?
There are a lot of features, so the learning curve is steeper than with other products.
Data Connections
Supports connections to popular data storage services such as Amazon Web Services (AWS), Google Cloud Platform, Microsoft Azure, and more.
Data Governance Applications
Collibra
Positives
Most advanced features in the industry
Negatives
Expensive
Informatica
DataHub
About
This is an open source data catalog
Positives
No licensing costs
Negatives
No guarantee the developers will continue work and provide support
SAP Information Steward
Negatives
Customer support is poor, with long resolution timelines
Product hasn't kept pace with the most up-to-date industry features
Application page URLs can't be shared; a shared link takes the user to the home page instead of the intended page
Data Governance Tool Migration
Migrating from one data governance tool to another requires four main things: planning, testing, communication, and analysis. Attending to each of these will help lead to a successful migration.
Planning
Determine timeline for implementation
Organize the migration into phases
Develop requirements based on the vision for the design
Testing
Verify planned methods work as expected
Communication
Assign migration work to team members and discuss priorities
Discuss expectations and vision for the new tool
Communicate the timeline for migrating each part of the
Inform users about the new tool and train them
Analysis
Identify data that will be migrated
Understand the architecture of the new tool