Understanding the Data Catalog Profiler
Data Catalog includes a profiler engine that can run data profiling operations as a pipeline on data located in multiple data lakes. You can install the profiler agent in a data lake and set up a specific schedule to generate various types of data profiles. Data profilers generate metadata annotations on the assets for various purposes.
|Profiler|Description|
|Cluster Sensitivity Profiler|Detects sensitive data, such as PII and data covered by PCI or HIPAA, among others.|
|Ranger Audit Profiler|Summarizes Ranger audit logs.|
|Hive Column Profiler|Provides summary statistics such as maximum, minimum, mean, unique values, and null values at the Hive column level.|
For example, data profilers can create summarized information about the contents of an asset and also provide annotations that indicate its shape (such as the distribution of values in a box plot or histogram).
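To make the idea of column-level summary statistics and shape annotations more concrete, here is a minimal sketch in plain Python. It is not the Data Catalog profiler's implementation or API. The function names `profile_column` and `histogram` are hypothetical, `None` stands in for a NULL value, and the equal-width binning is an assumption chosen only for illustration.

```python
from statistics import mean


def profile_column(values):
    """Return summary statistics for one numeric column (hypothetical helper).

    None is treated as NULL and excluded from max, min, mean, and unique.
    """
    non_null = [v for v in values if v is not None]
    return {
        "max": max(non_null) if non_null else None,
        "min": min(non_null) if non_null else None,
        "mean": mean(non_null) if non_null else None,
        "unique": len(set(non_null)),
        "nulls": len(values) - len(non_null),
    }


def histogram(values, bins):
    """Count non-null values in equal-width bins (assumed binning scheme)."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return [0] * bins
    low, high = min(non_null), max(non_null)
    # Fall back to a width of 1 when every value is identical.
    width = (high - low) / bins or 1
    counts = [0] * bins
    for v in non_null:
        # Clamp the maximum value into the last bin.
        idx = min(int((v - low) / width), bins - 1)
        counts[idx] += 1
    return counts


if __name__ == "__main__":
    column = [3, 7, None, 7, 10, None, 1]
    print(profile_column(column))
    print(histogram(column, bins=3))
```

A profiler would typically attach results like these to the asset as metadata annotations rather than print them. The sketch only shows what kind of information such an annotation might carry.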