We discussed Azure Data Catalog briefly in a previous post about Azure Data Services. In this post, let’s have a look at a recently launched, similar service: Azure Purview.
Azure Purview is Microsoft’s new unified data governance and data discovery platform. It allows users to manage an organization-wide map of data sources, including data stores that are on-premises, in external clouds or multi-cloud setups, and on the Azure platform. Though Microsoft has not published clear official information on a migration path from Azure Data Catalog to Azure Purview, some experts are calling Purview “Data Catalog v2”.
Let’s have a look at the main features of Azure Purview:
1. Data Source Crawling: Once proper access has been established and data sources have been registered, Azure Purview can scan the data sources and capture the schema and metadata. Purview also allows users to schedule data source scans on a weekly or monthly frequency.
2. Data Classification: After capturing the metadata, Purview can classify the captured data automatically using the default rules or based on user-defined custom data classification rules.
3. Metadata Lineage: This is an extremely handy feature of Azure Purview. It allows users to view a map of the data lifecycle; for example, for an ETL system like Data Factory, users can see how the data flows from the source to the final sink.
4. Data Discovery: The biggest advantage of having all your data mapped in a data catalog is the ability to search for and discover data when required. Purview provides extensive search and browsing capabilities, which enable better collaboration and reuse of existing data assets in the organization. Users can also navigate easily to related data assets.
5. Data Asset Insights: Purview comes with a built-in Insights report that provides a high-level view of the data estate, organized by Purview feature, such as scan insights, classification insights and file extension insights.
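
To make the idea of a user-defined classification rule (feature 2 above) a little more concrete, here is a minimal Python sketch of how such a rule could work in principle: a column gets a classification label when a large enough share of its sampled values match a regular expression. This is only an illustration of the concept, not Purview's API or its actual implementation. The rule name `CONTOSO.EMPLOYEE_ID`, the `EMP-` pattern, the 60% threshold and the `classify_column` helper are all assumptions made up for this example.

```python
import re
from typing import List, Optional

# Hypothetical rule definition (illustrative only, not a Purview object):
# a label, a regex for the values, and the minimum share of values that must match.
EMPLOYEE_ID_RULE = {
    "classification": "CONTOSO.EMPLOYEE_ID",
    "data_pattern": re.compile(r"EMP-\d{6}"),
    "min_match_ratio": 0.6,
}


def classify_column(values: List[str], rule: dict) -> Optional[str]:
    """Return the rule's label if enough non-empty values match its pattern."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return None  # nothing to sample, so no classification is assigned
    matches = sum(1 for v in non_empty if rule["data_pattern"].fullmatch(v))
    ratio = matches / len(non_empty)
    return rule["classification"] if ratio >= rule["min_match_ratio"] else None


if __name__ == "__main__":
    sample = ["EMP-000123", "EMP-004567", "n/a", "EMP-998877"]
    # 3 of the 4 sampled values match, which is above the 60% threshold.
    print(classify_column(sample, EMPLOYEE_ID_RULE))
```

Using a match ratio rather than requiring every value to match keeps the rule tolerant of the odd placeholder or dirty value, which is common in real data sources.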