Cloudera Data Catalog Profilers
Profilers create metadata annotations that summarize the content and shape characteristics of the data assets (such as distribution of values in a box plot or histogram).
The Cloudera Data Catalog profiler uses Kubernetes-enabled job scheduling and runs profiler jobs on demand.
| Profiler Name | Description |
|---|---|
| Cluster Sensitivity Profiler | Automatically classifies your data with preconfigured tags such as PII, PCI, and HIPAA. |
| Ranger Audit Profiler | A Ranger audit log summarizer. |
| Hive Column Profiler | Provides summary statistics like Maximum, Minimum, Mean, Unique, and Null values at the Hive column level. |
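
To illustrate the kind of column-level summary the Hive Column Profiler description refers to, here is a minimal sketch in plain Python. It is not the profiler's implementation or API: the function name `profile_column` and the dictionary keys are hypothetical, and `None` stands in for a SQL NULL.

```python
from statistics import mean


def profile_column(values):
    """Summarize one column of values (hypothetical helper, not a Cloudera API).

    None is treated as a NULL and excluded from max, min, mean, and unique.
    """
    non_null = [v for v in values if v is not None]
    return {
        "max": max(non_null) if non_null else None,
        "min": min(non_null) if non_null else None,
        "mean": mean(non_null) if non_null else None,
        "unique": len(set(non_null)),            # count of distinct non-null values
        "nulls": len(values) - len(non_null),    # count of NULL entries
    }


sample = [3, 7, None, 7, 10]
summary = profile_column(sample)
print(summary)
```

For the sample column, the sketch reports one null and three distinct values; a real profiler would compute comparable statistics over the whole Hive column.
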
Limitations
- Cloudera Data Catalog on premises 1.5.5 SP1 or lower does not support Iceberg tables. In Cloudera Data Catalog on premises 1.5.5 SP2 or higher, Iceberg tables can be profiled.
- In Compute Cluster enabled environments, profilers only support tables that are stored on AWS S3.
- Supported file formats:
  - Statistics Collector profilers and Data Compliance profilers
    - CSV
    - Parquet
    - Iceberg tables
    - ORC
    - Avro
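
The format and storage limitations above can be sketched as a simple pre-check. This is an illustrative helper only, not part of Cloudera Data Catalog: the function name `can_profile`, the lowercase format labels, and the assumption that S3 locations begin with `s3://` or `s3a://` are all hypothetical.

```python
# Formats listed above for Statistics Collector and Data Compliance profilers.
# "iceberg" is an assumed label for Iceberg tables.
SUPPORTED_FORMATS = {"csv", "parquet", "iceberg", "orc", "avro"}

# Assumption: tables on AWS S3 are identified by one of these URI prefixes.
S3_PREFIXES = ("s3://", "s3a://")


def can_profile(file_format, location, compute_cluster_enabled):
    """Return True if a table passes the listed limitations (hypothetical check)."""
    if file_format.lower() not in SUPPORTED_FORMATS:
        return False
    # In Compute Cluster enabled environments, only S3-backed tables qualify.
    if compute_cluster_enabled and not location.lower().startswith(S3_PREFIXES):
        return False
    return True


print(can_profile("Parquet", "s3a://example-bucket/sales", True))
print(can_profile("ORC", "abfs://container@account/sales", True))
```

The version-dependent Iceberg limitation is not modeled here; a fuller check would also take the Cloudera Data Catalog release into account.
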
