About the Data Catalog Profiler
The Data Catalog profiler uses Kubernetes-enabled job scheduling and runs profiler jobs on demand.
These profilers create metadata annotations that summarize the content and shape characteristics of the data assets.
Profiler Name | Description |
Cluster Sensitivity Profiler | A sensitive data profiler for PII, PCI, HIPAA, and other categories. |
Ranger Audit Profiler | A Ranger audit log summarizer. |
Hive Column Profiler | Provides summary statistics at the Hive column level, such as the maximum, minimum, mean, and the counts of unique and null values. |
For example, data profilers can create summarized information about the contents of an asset and also provide annotations that indicate its shape (such as the distribution of values in a box plot or histogram).
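To make the idea of content and shape summaries concrete, the following is a minimal sketch of the kind of statistics a column-level profiler might compute. It is not the Data Catalog implementation: the function names `profile_column` and `histogram` are hypothetical, `None` stands in for a NULL value, and the equal-width binning is an assumption chosen only for illustration.

```python
from statistics import mean


def profile_column(values):
    """Summarize one column (hypothetical helper; None represents NULL)."""
    non_null = [v for v in values if v is not None]
    return {
        "max": max(non_null) if non_null else None,
        "min": min(non_null) if non_null else None,
        "mean": mean(non_null) if non_null else None,
        "unique": len(set(non_null)),          # distinct non-null values
        "nulls": len(values) - len(non_null),  # count of NULL entries
    }


def histogram(values, bins=4):
    """Count non-null values in equal-width bins (assumed binning scheme)."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return [0] * bins
    lo, hi = min(non_null), max(non_null)
    width = (hi - lo) / bins or 1  # fall back to 1 when all values are equal
    counts = [0] * bins
    for v in non_null:
        # The maximum value would land one past the end, so clamp it
        # into the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


column = [10, 20, 20, None, 40, 50]
stats = profile_column(column)   # max 50, min 10, mean 28, unique 4, nulls 1
shape = histogram(column)        # [1, 2, 0, 2]
```

The first function corresponds to a content summary and the second to a shape annotation. A real profiler would compute these over the full table on the cluster instead of over an in-memory list.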