About the Data Catalog Profiler

The Data Catalog profiler employs Kubernetes-enabled job scheduling and runs profiler jobs on demand.

These profilers create metadata annotations that summarize the content and shape characteristics of the data assets.

Profiler Name                 Description
Cluster Sensitivity Profiler  Profiles sensitive data (PII, PCI, HIPAA, and others).
Ranger Audit Profiler         Summarizes Ranger audit logs.
Hive Column Profiler          Provides summary statistics such as maximum, minimum, mean, unique, and null values at the Hive column level.

For example, data profilers can summarize the contents of an asset and also provide annotations that indicate its shape (such as the distribution of values in a box plot or histogram).
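
To make the idea of column-level summaries and shape annotations concrete, the following is a minimal sketch in plain Python. It is not the Data Catalog implementation. The function name profile_column, the bins parameter, and the reading of "unique" as a count of distinct non-null values are all assumptions made for illustration.

```python
from statistics import mean


def profile_column(values, bins=4):
    """Hypothetical helper: summarize one numeric column.

    None stands in for a null value. The returned keys mirror the
    statistics named in the table above; the layout is an assumption.
    """
    non_null = [v for v in values if v is not None]

    summary = {
        "max": max(non_null) if non_null else None,
        "min": min(non_null) if non_null else None,
        "mean": mean(non_null) if non_null else None,
        "unique": len(set(non_null)),            # distinct non-null values
        "null": len(values) - len(non_null),     # count of null values
    }

    # Shape annotation: an equal-width histogram over [min, max].
    counts = [0] * bins
    if non_null:
        lo, hi = summary["min"], summary["max"]
        width = (hi - lo) / bins or 1            # avoid zero width when all values are equal
        for v in non_null:
            idx = min(int((v - lo) / width), bins - 1)  # clamp the maximum into the last bin
            counts[idx] += 1
    summary["histogram"] = counts
    return summary


if __name__ == "__main__":
    print(profile_column([1, 2, 2, None, 9]))
```

For the sample column above, the summary reports one null value and three distinct values, and the histogram places the three small values in the first bin and the outlier in the last. A real profiler would compute comparable statistics over the table data at scale rather than over an in-memory list.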