Cloudera Data Catalog Profilers

Profilers create metadata annotations that summarize the content and shape characteristics of data assets (such as the distribution of values shown in a box plot or histogram).

The Cloudera Data Catalog profiler uses Kubernetes-enabled job scheduling and runs profiler jobs on demand.


Profiler Name	Description
Cluster Sensitivity Profiler	Automatically classifies your data with preconfigured tags such as PII, PCI, HIPAA, and others.
Ranger Audit Profiler	A Ranger audit log summarizer.
Hive Column Profiler	Provides summary statistics like Maximum, Minimum, Mean, Unique, and Null values at the Hive column level.
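
To make the column-level statistics concrete, the following minimal Python sketch computes the same kinds of values (maximum, minimum, mean, unique count, and null count) for a single in-memory column. It is an illustration only and not the profiler's actual implementation. The function name `profile_column` and the sample data are hypothetical, and `None` is assumed to stand in for a NULL value.

```python
from statistics import mean


def profile_column(values):
    """Summarize one column of numeric values; None represents NULL.

    Hypothetical helper for illustration, not part of Cloudera Data Catalog.
    """
    non_null = [v for v in values if v is not None]
    return {
        "max": max(non_null) if non_null else None,
        "min": min(non_null) if non_null else None,
        "mean": mean(non_null) if non_null else None,
        "unique": len(set(non_null)),          # distinct non-null values
        "nulls": len(values) - len(non_null),  # count of NULL entries
    }


# Sample column with one NULL and one repeated value (made-up data).
ages = [34, 51, None, 34, 27]
stats = profile_column(ages)
print(stats)
```

NULL values are excluded from the maximum, minimum, mean, and unique count in this sketch and are reported separately. Whether the real profiler treats NULLs the same way is not stated here.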

Limitations

Cloudera Data Catalog on premises 1.5.5 SP1 and lower versions do not support Iceberg tables. In Cloudera Data Catalog on premises 1.5.5 SP2 and higher versions, Iceberg tables can be profiled.
In Compute Cluster enabled environments, profilers support only tables that are stored on AWS S3.
Supported file formats:
- Statistics Collector profilers and Data Compliance profilers
  - CSV
  - Parquet
  - Iceberg tables
  - ORC
  - Avro