9 Data Catalog Capabilities You Should Know

A look under the hood of today’s data catalogs shows a lot more muscle and versatility than the data catalogs of old. Today’s data catalogs give you transparency into the nature and path of your data as it flows and is delivered, and help you manage your data and metadata more effectively. In this article, we highlight nine data catalog capabilities of Microsoft Purview that you might not know about.

Data catalogs have come a long way from their vintage spreadsheet roots. I started working with Microsoft’s data catalog – Purview – when it was still in Beta and rising from its Apache Atlas roots. I wasn’t looking for a “unifying” solution at the time, just something that would be a thousand times better than working in a spreadsheet. Purview turned out to be a nice data catalog with great lineage.

Today, data catalogs do so much more than that – from helping users discover data, to reducing the risk of data use, to proactively helping to manage compliance. A data catalog isn’t just a catalog anymore. It’s a critical component of an analytics ecosystem that helps to democratize your data lake, accelerate digital transformation and governance, and reduce time to insight.

In this article, we focus on these 9 data catalog capabilities in Microsoft Purview. Other data catalogs may have similar functionality.

  • Manage Data Use
  • Protect Sensitive Information
  • Prevent Data Loss
  • Manage the Data Lifecycle
  • Contain Insider Risk
  • Detect Regulatory or Conduct Violations
  • Audit User Activity and High-Bandwidth Data Access
  • Preserve Content for Investigations
  • Manage Compliance Initiatives

1 - Manage Data Use

Within Purview’s data source registration, you can choose from multiple Data Policy Enforcement options. Your selection applies specific policies for accessing, moving, or sharing data across various data sources. For example, you can apply different policies to a particular data resource based on who is accessing it – whether that’s operations, the data owners themselves, or other users looking for self-service access.

Data use management includes three types of policies:

  • Data Owner Access: Allow data owners to read or modify assets in your data estate from within the governance portal
  • DevOps Access: Grant or revoke IT and DevOps personnel access to system metadata efficiently and at scale
  • Self-Service Access: Allow data consumers to request access to data when browsing or searching for data
Purview example of a SQL Server Data Owner Access Policy
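
To make the three policy types more concrete, here is a minimal Python sketch of how a check might pick a rule based on who is asking. It is not Purview's API or its implementation: the `Policy` class, the role names, the action names, and the `is_allowed` helper are all hypothetical and exist only to illustrate the idea.

```python
from dataclasses import dataclass

# Hypothetical model of the three policy types described above.
# None of these names come from Purview; they only illustrate the idea.
@dataclass(frozen=True)
class Policy:
    role: str            # who the policy applies to
    actions: frozenset   # what that role may do on the resource

POLICIES = [
    Policy("data_owner", frozenset({"read", "modify"})),
    Policy("devops", frozenset({"read_system_metadata"})),
    Policy("self_service", frozenset({"request_access"})),
]

def is_allowed(role: str, action: str) -> bool:
    """Return True if any policy for this role permits the action."""
    return any(p.role == role and action in p.actions for p in POLICIES)
```

In this sketch, a data owner can modify an asset, while a DevOps user can only read system metadata and a self-service user can only ask for access.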

2 - Protect Sensitive Information

You can discover, classify, and protect sensitive data wherever it lives, so you can manage it more effectively and reduce overall risk. In Purview’s Information Protection options, you can automatically classify any type of data based on common patterns for personally identifiable information, such as Social Security numbers and email addresses. Alternatively, you can manually configure your sensitive data classifications for a specific use case. You can then establish security policies based on these classifications, using the Data Loss Prevention tools covered in the next section.
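
As a rough illustration of pattern-based classification, the Python sketch below tags text with a label whenever a regular expression matches. The two patterns and the `classify` helper are assumptions made for this example; they are far simpler than what a production classifier would use and are not Purview's built-in definitions.

```python
import re

# Illustrative patterns only (assumed for this example). Real classifiers
# typically add checksums, stricter formats, and nearby-keyword evidence.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}

def classify(text: str) -> set:
    """Return the labels whose pattern appears at least once in the text."""
    return {label for label, pattern in PATTERNS.items() if pattern.search(text)}
```

Calling `classify("Contact jane.doe@example.com, SSN 123-45-6789")` would return both labels, which a policy could then act on.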

The screenshots below show how you can monitor the usage of classified information in the Purview compliance portal.

Purview example of a data classification dashboard
Purview example of Activity Explorer

3 - Prevent Data Loss

Purview’s Data Loss Prevention section enables you to protect sensitive data from exfiltration across applications, services, and devices. This helps users take the right actions when using sensitive data, ultimately helping your organization balance security and productivity. With a Data Loss Prevention policy, you can identify, monitor, and automatically protect sensitive items across multiple platforms:

  • Microsoft 365 services such as Teams, Exchange, SharePoint, and OneDrive accounts
  • Office applications such as Word, Excel, and PowerPoint
  • Windows 10, Windows 11, and macOS (three latest released versions) endpoints
  • Non-Microsoft cloud apps
  • On-premises file shares and on-premises SharePoint
  • Power BI
Purview example of a warning on using a Social Security number within Excel

4 - Manage the Data Lifecycle

Manage the lifecycle of your data by retaining the content you need to keep, and deleting the content you don’t. In the Data Lifecycle Management section, you can set retention policies for Microsoft 365 workloads including Exchange, SharePoint, OneDrive, Teams, and Viva Engage. You can configure whether content for these services needs to be retained indefinitely or for a specific period.
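
The retain-or-delete decision can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `should_retain` function and its `retention_days` parameter are hypothetical, with `None` standing in for "retain indefinitely", and nothing here reflects how Purview evaluates its policies internally.

```python
from datetime import date, timedelta
from typing import Optional

def should_retain(created: date, today: date, retention_days: Optional[int]) -> bool:
    """Decide whether an item is still inside its retention period.

    retention_days=None is assumed to mean "retain indefinitely".
    Otherwise the item is kept through created + retention_days.
    """
    if retention_days is None:
        return True
    return today <= created + timedelta(days=retention_days)
```

An item created on January 1 with a 30-day period would be kept through January 31 and become eligible for deletion on February 1.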

A subcomponent of Data Lifecycle Management tools is Records Management, which provides a reliable system to manage and store regulatory, legal, and business-critical records, retaining them in unalterable, compliant formats. This tool also supports versioning and retention policies.

Purview example of viewing the file plan through the compliance portal

5 - Contain Insider Risk

Purview’s Insider Risk Management tool uses machine learning to correlate various signals and identify potential malicious or inadvertent insider risks – such as IP theft, data leakage, and security violations. From there, you can create policies to manage your security and compliance.

Purview example of setting policies for insider risk management
Purview example of the dashboard showing all the insider risk alerts raised

6 - Detect Regulatory or Conduct Violations

Purview uses machine learning models and keyword matching to help you detect regulatory compliance issues (e.g., SEC or FINRA) and business conduct violations, such as sharing sensitive or confidential information, using harassing or threatening language, and sharing adult content. The Communication Compliance tools work with:

  • Microsoft Teams
  • Exchange Online
  • Microsoft Copilot for Microsoft 365
  • Viva Engage
  • Third-party sources like Instant Bloomberg, WhatsApp, Slack, Zoom, and SMS
Purview overview of Communication Compliance tools

7 - Audit User Activity and High-Bandwidth Data Access

Auditing tools make forensic investigations more productive and efficient. They provide insights on user activity, high-bandwidth data access, and reporting across dozens of Microsoft 365 services and solutions. You can track data captured, recorded, and retained in your organization’s unified audit log, and preserve the audit logs to meet regulatory requirements for up to 10 years.

8 - Preserve Content for Investigations

You can preserve, collect, analyze, review, and export content for both internal and external investigations. Purview’s eDiscovery tool provides an end-to-end workflow and uses intelligent machine learning capabilities such as deep indexing, email threading, and near-duplicate detection to help you narrow down large volumes of data.
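
For readers curious about what near-duplicate detection involves, one common textbook approach is to compare overlapping word windows ("shingles") with Jaccard similarity. The Python sketch below shows that approach; it is an assumption chosen for illustration, not a description of how Purview's eDiscovery tool works, and the function names and the 0.5 threshold are hypothetical.

```python
def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = text.lower().split()
    if len(words) < k:
        # Short texts become a single shingle; empty text has none.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of the two shingle sets: shared / total distinct."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

def is_near_duplicate(a: str, b: str, threshold: float = 0.5) -> bool:
    """Treat two texts as near-duplicates when similarity meets the threshold."""
    return jaccard(a, b) >= threshold
```

Two emails that differ only in their last word share most of their shingles and would be grouped together, which lets a reviewer look at one representative instead of every copy.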

9 - Manage Compliance Initiatives

Continuously assess and track the effectiveness of your compliance efforts with Purview’s Compliance Manager. It has many capabilities that can simplify compliance and reduce risk, including pre-built assessments for common standards (e.g., those set by industry or governmental bodies). You also get step-by-step guidance on suggested improvements, as well as a risk-based compliance score to help you understand your compliance posture.
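
One way to think about a risk-based score is as the share of risk-weighted points you have earned out of all the points available. The Python sketch below works through that arithmetic. The point values and the `compliance_score` function are assumptions for illustration and should not be read as Compliance Manager's actual formula.

```python
def compliance_score(actions: list) -> float:
    """Percentage of available risk-weighted points that have been achieved.

    Each action is a (points, completed) pair, where higher points are
    assumed to mean the action addresses a higher risk.
    """
    total = sum(points for points, _ in actions)
    if total == 0:
        return 0.0
    achieved = sum(points for points, done in actions if done)
    return 100.0 * achieved / total
```

With four actions worth 27, 9, 3, and 1 points, completing the first and third earns 30 of 40 points, or 75%. Weighting by risk means finishing one high-risk action moves the score more than finishing several low-risk ones.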

Being cognizant of modern data catalog capabilities can help you address the diversity, granularity, and dynamic nature of your data and metadata. Today’s data catalog capabilities allow you to reinforce data trust with cloud data governance.