About Data Catalog

Data Catalog helps you understand the data in your data lake (Private Cloud Base cluster). You can search for relevant data based on various parameters. Using Data Catalog, you can understand how data is interpreted for use, how it is created and modified, and how data access is secured and protected.

Data Catalog is a service within Cloudera Data Platform that enables you to understand, manage, secure, and govern data assets across the enterprise.

Data Catalog enables data stewards across the enterprise to work with data assets in the following ways:
  • Organize and curate data globally

      • Organize data based on business classifications, purpose, required protections, and so on

      • Promote responsible collaboration across enterprise data workers

  • Understand where relevant data is located

      • Catalog and search to locate relevant data of interest (for example, sensitive, commonly used, or high-risk data)

      • Understand what types of sensitive personal data exist and where they are located

  • Understand how data is interpreted for use

      • View basic descriptions: schema, classifications (business cataloging), and encodings

      • View statistical models and parameters

      • View user annotations, wrangling scripts, view definitions, and so on

  • Understand how data is created and modified

      • Visualize upstream lineage and downstream impact

      • Understand how schemas and data evolve

      • View and understand the data supply chain (pipelines, versioning, and evolution)

  • Understand how data access is secured, protected, and audited

      • Understand who can see which data and metadata (for example, based on business classifications) and under what conditions (security policies, data protection, anonymization)

      • View who has accessed what data from a forensic audit or compliance perspective

      • Visualize access patterns and identify anomalies