Data Catalog Features
Collects and Organizes All Metadata
The first step in building a data catalog is collecting metadata. A data catalog uses metadata to identify data tables, files, and databases. It crawls the company’s databases and brings the metadata (not the actual data) into the catalog.
A data catalog can typically crawl data management platforms such as:
- Relational Databases – Oracle, SQL Server, MySQL, DB2, etc.
- Data Warehouses – Teradata, Vertica, etc.
- Object Storage
- Cloud Platforms – Google BigQuery, Microsoft Azure Data Lake, Amazon Athena and Amazon Redshift on AWS
- Non-Relational (NoSQL) Databases – Cassandra, MongoDB
- Hadoop Distributions
It can also crawl analytics and business intelligence platforms such as:
- Modern Business Intelligence Platforms
- Analytic Applications
Custom applications can be crawled as well.
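As a minimal sketch of the crawling step, assume a SQLite database stands in for one of the company's systems. The Python below reads only the schema (table names and declared column types) through SQLite's `sqlite_master` table and `PRAGMA table_info`, never the row values. `TableMetadata` and `crawl_sqlite` are hypothetical names. A real catalog would use a separate connector for each platform it supports.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class TableMetadata:
    """Catalog entry for one table: names and types only, no row values."""
    source: str
    table: str
    columns: dict[str, str] = field(default_factory=dict)  # column name -> declared type


def crawl_sqlite(conn: sqlite3.Connection, source: str) -> list[TableMetadata]:
    """Collect table and column metadata from a SQLite database without reading rows."""
    catalog = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table_name,) in tables:
        if table_name.startswith("sqlite_"):
            continue  # skip SQLite's internal bookkeeping tables
        entry = TableMetadata(source=source, table=table_name)
        # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
        for _cid, col_name, col_type, *_rest in conn.execute(
            f'PRAGMA table_info("{table_name}")'
        ):
            entry.columns[col_name] = col_type
        catalog.append(entry)
    return catalog


# Usage: a throwaway in-memory database stands in for a company system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER, total REAL)")
conn.execute("INSERT INTO customers VALUES (1, 'Ada', 'ada@example.com')")
for entry in crawl_sqlite(conn, source="crm_db"):
    print(entry.table, entry.columns)
# customers {'id': 'INTEGER', 'name': 'TEXT', 'email': 'TEXT'}
# orders {'id': 'INTEGER', 'customer_id': 'INTEGER', 'total': 'REAL'}
```

The inserted customer row never reaches the catalog. Only the structure that describes it is recorded.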
Shows Data Profile
By looking at a data profile, consumers can understand the data quickly. Profiles are informative summaries of the data. For example, the profile of a database often includes the number of tables and files, row counts, and similar statistics. For a table, the profile may include column descriptions and, for each column, its top values, null count, distinct count, maximum value, and minimum value, among other statistics.
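The column-level statistics above can be computed in a few lines of Python. This is a minimal sketch that assumes the table fits in memory as a list of row dictionaries and that each column holds mutually comparable values (needed for min and max). `profile_column` and `profile_table` are hypothetical helpers. A production catalog would more likely push these aggregations down to the database as SQL queries.

```python
from collections import Counter
from typing import Any


def profile_column(values: list[Any], top_n: int = 3) -> dict[str, Any]:
    """Summarize one column: row and null counts, distinct count, top values, min and max."""
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    return {
        "row_count": len(values),
        "null_count": len(values) - len(non_null),
        "distinct_count": len(counts),
        "top_values": counts.most_common(top_n),  # (value, frequency) pairs
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
    }


def profile_table(rows: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Profile every column of a table supplied as a list of row dictionaries."""
    columns = list(rows[0].keys()) if rows else []
    return {col: profile_column([row.get(col) for row in rows]) for col in columns}


rows = [
    {"country": "US", "age": 34},
    {"country": "DE", "age": None},
    {"country": "US", "age": 29},
    {"country": None, "age": 41},
]
profile = profile_table(rows)
print(profile["country"])
# {'row_count': 4, 'null_count': 1, 'distinct_count': 2, 'top_values': [('US', 2), ('DE', 1)], 'min': 'DE', 'max': 'US'}
```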
Builds Data Lineage
Data lineage is a visual representation of where data comes from, where it moves, and what transformations it undergoes over time. It makes it possible to track, manage, and view data transformations along the path from source to destination. As a result, analysts can trace errors in their analytics back to the root cause.
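Behind the visual, lineage can be modeled as a directed graph of datasets whose edges carry the transformation applied. As a minimal sketch, assume the graph is acyclic and stored in a plain dictionary. The dataset names and transformations in `LINEAGE` are invented for illustration, and `upstream_paths` and `root_sources` are hypothetical helpers. The code walks upstream from a suspect dataset and lists every path, plus the source datasets an error could have originated from.

```python
# Hypothetical lineage graph: dataset -> list of (upstream dataset, transformation applied).
LINEAGE: dict[str, list[tuple[str, str]]] = {
    "revenue_dashboard": [("monthly_revenue", "aggregate by month")],
    "monthly_revenue": [("orders_clean", "sum totals"), ("fx_rates", "convert currency")],
    "orders_clean": [("orders_raw", "drop duplicates")],
    "orders_raw": [],
    "fx_rates": [],
}


def upstream_paths(dataset: str, lineage: dict[str, list[tuple[str, str]]]) -> list[str]:
    """Describe every path from `dataset` back to a source dataset (one with no inputs).

    Assumes the lineage graph is acyclic; a cycle would recurse forever.
    """
    parents = lineage.get(dataset, [])
    if not parents:
        return [dataset]
    paths = []
    for parent, transformation in parents:
        for rest in upstream_paths(parent, lineage):
            paths.append(f"{dataset} <-[{transformation}]- {rest}")
    return paths


def root_sources(dataset: str, lineage: dict[str, list[tuple[str, str]]]) -> set[str]:
    """Return the source datasets that an error in `dataset` could trace back to."""
    parents = lineage.get(dataset, [])
    if not parents:
        return {dataset}
    return set().union(*(root_sources(parent, lineage) for parent, _ in parents))


for path in upstream_paths("revenue_dashboard", LINEAGE):
    print(path)
# revenue_dashboard <-[aggregate by month]- monthly_revenue <-[sum totals]- orders_clean <-[drop duplicates]- orders_raw
# revenue_dashboard <-[aggregate by month]- monthly_revenue <-[convert currency]- fx_rates
print(sorted(root_sources("revenue_dashboard", LINEAGE)))
# ['fx_rates', 'orders_raw']
```

If the dashboard shows a wrong figure, the analyst only needs to inspect the datasets and transformations on these two paths.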
Marks Relationships Amongst Data
This feature lets data consumers discover related data across multiple databases. For example, an analyst who needs consolidated customer information may find, through the data catalog, that five files in five different systems contain customer data.
With the data catalog and help from IT, she can then set up an experimental area in which to join and clean all of that data, and use the consolidated customer data to achieve her business goals.
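One simple way a catalog could suggest such relationships is to match column names across the metadata it has already collected. The sketch below assumes exactly that heuristic (exact column-name matching). Commercial catalogs may use richer signals, which are not shown here. The systems and tables in `CATALOG` and the `shared_columns` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical catalog metadata: (system, table) -> column names collected by crawling.
CATALOG: dict[tuple[str, str], set[str]] = {
    ("crm", "customers"): {"customer_id", "name", "email"},
    ("billing", "invoices"): {"invoice_id", "customer_id", "amount"},
    ("support", "tickets"): {"ticket_id", "customer_id", "status"},
    ("web", "sessions"): {"session_id", "email", "started_at"},
    ("hr", "employees"): {"employee_id", "name"},
}


def shared_columns(
    catalog: dict[tuple[str, str], set[str]], min_systems: int = 2
) -> dict[str, list[str]]:
    """Map each column name to the tables containing it, keeping names seen in at least `min_systems` systems."""
    tables_by_column: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for (system, table), columns in catalog.items():
        for column in columns:
            tables_by_column[column].append((system, table))
    return {
        column: sorted(f"{system}.{table}" for system, table in tables)
        for column, tables in tables_by_column.items()
        if len({system for system, _ in tables}) >= min_systems
    }


for column, tables in sorted(shared_columns(CATALOG).items()):
    print(column, tables)
# customer_id ['billing.invoices', 'crm.customers', 'support.tickets']
# email ['crm.customers', 'web.sessions']
# name ['crm.customers', 'hr.employees']
```

The `name` match between `crm.customers` and `hr.employees` is a false positive. Name-based suggestions like these are a starting point for the analyst and IT to review, not a finished join plan.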
Houses a Business Glossary
A data catalog is an apt platform for hosting a business glossary and making it available across an organization. A business glossary is a document that enables data stewards to build and manage a common business vocabulary. This vocabulary can be linked to the underlying technical metadata to create a direct association between business terms and technical objects such as tables and columns.
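The link between business terms and technical metadata can be pictured as a two-way mapping. This is a minimal sketch that assumes columns are identified by fully qualified `system.table.column` strings. `GlossaryTerm`, `BusinessGlossary`, and the sample term, definition, and steward are hypothetical and do not represent any catalog product's API.

```python
from dataclasses import dataclass, field


@dataclass
class GlossaryTerm:
    """A business term, its agreed definition, and the technical columns it maps to."""
    name: str
    definition: str
    steward: str
    linked_columns: set[str] = field(default_factory=set)  # "system.table.column" identifiers


class BusinessGlossary:
    """Stores terms by case-insensitive name and links them to technical metadata."""

    def __init__(self) -> None:
        self._terms: dict[str, GlossaryTerm] = {}

    def add_term(self, term: GlossaryTerm) -> None:
        self._terms[term.name.lower()] = term

    def link(self, term_name: str, column: str) -> None:
        """Associate a business term with a fully qualified column."""
        self._terms[term_name.lower()].linked_columns.add(column)

    def columns_for(self, term_name: str) -> list[str]:
        """Business term -> technical columns."""
        return sorted(self._terms[term_name.lower()].linked_columns)

    def terms_for(self, column: str) -> list[str]:
        """Technical column -> business terms that describe it."""
        return sorted(t.name for t in self._terms.values() if column in t.linked_columns)


glossary = BusinessGlossary()
glossary.add_term(GlossaryTerm(
    name="Active Customer",
    definition="A customer with at least one order in the last 12 months.",
    steward="sales-ops",
))
glossary.link("active customer", "crm.customers.customer_id")
glossary.link("Active Customer", "billing.invoices.customer_id")
print(glossary.columns_for("Active Customer"))
# ['billing.invoices.customer_id', 'crm.customers.customer_id']
print(glossary.terms_for("crm.customers.customer_id"))
# ['Active Customer']
```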
Tags Data Through AI
This feature uses AI to find personally identifiable information (PII) quickly and mask it automatically. As a result, assuring privacy compliance can shrink from a process that takes several weeks, or even months, to a few days.
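The AI models that catalogs use for this are product-specific. The sketch below therefore substitutes a much simpler technique, regular-expression rules, to illustrate the overall flow: sample a column's values, tag the column if most samples look like a given PII type, and mask the values. The patterns, the 0.8 threshold, and the helper names `tag_column` and `mask_value` are all assumptions for illustration.

```python
import re

# Hypothetical regex rules standing in for an AI classifier; real PII detection covers far more types.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\d{3}-\d{2}-\d{4}"),
}


def tag_column(sample_values: list[str], threshold: float = 0.8) -> set[str]:
    """Tag a column with each PII type that fully matches at least `threshold` of its non-empty samples."""
    values = [v for v in sample_values if v]
    if not values:
        return set()
    tags = set()
    for tag, pattern in PII_PATTERNS.items():
        hits = sum(1 for v in values if pattern.fullmatch(v))
        if hits / len(values) >= threshold:
            tags.add(tag)
    return tags


def mask_value(value: str, keep_last: int = 4) -> str:
    """Replace all but the last `keep_last` characters with '*'."""
    if len(value) <= keep_last:
        return "*" * len(value)
    return "*" * (len(value) - keep_last) + value[len(value) - keep_last:]


emails = ["ada@example.com", "grace@example.org", "linus@example.net", "n/a", "alan@example.com"]
ssns = ["123-45-6789", "987-65-4321"]
print(tag_column(emails))  # {'email'}  (4 of 5 samples match, meeting the 0.8 threshold)
print(tag_column(ssns))    # {'us_ssn'}
print([mask_value(s) for s in ssns])  # ['*******6789', '*******4321']
```

The threshold lets a column with a few placeholder values such as `n/a` still be tagged. A learned classifier would replace `tag_column` while the tag-then-mask flow stays the same.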