Managing Profilers
The profiler engine runs data profiling operations as a pipeline on data located in multiple data lakes. These profilers create metadata annotations that summarize the content and shape characteristics of the data assets.
Name | Profiler | Description |
---|---|---|
Hive Column | tablestats hivecolumn | A Hive column univariate statistical profiler. |
Hive Metastore | hive_metastore_profiler | Retrieves the number of Hive tables added each day. |
Sensitive | sensitiveinfo | A profiler for sensitive data such as PII, PCI, and HIPAA data. |
Ranger Audit | audit | A Ranger audit log summarizer. |
You can edit some of the profiler configurations in Ambari through the Datalake Profiler component. Currently, only pre-built profilers are available, and profilers can be scheduled only during installation.
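
To give a sense of what a univariate column profiler such as Hive Column might summarize, the following is a minimal Python sketch. It is an illustration only, not the profiler's actual implementation: the function name `profile_column` and the output field names are hypothetical, and the real profiler may compute different or additional statistics.

```python
import statistics
from typing import Any, Dict, List, Optional


def profile_column(values: List[Optional[float]]) -> Dict[str, Any]:
    """Summarize one numeric column; None stands in for a NULL cell.

    Hypothetical helper for illustration; not part of the profiler engine.
    """
    non_null = [v for v in values if v is not None]
    summary: Dict[str, Any] = {
        "count": len(values),
        "null_count": len(values) - len(non_null),
        "distinct_count": len(set(non_null)),
    }
    # Range and mean are only defined when at least one value is present.
    if non_null:
        summary["min"] = min(non_null)
        summary["max"] = max(non_null)
        summary["mean"] = statistics.fmean(non_null)
    return summary
```

Calling `profile_column` on the values of a single column returns a small dictionary of counts and summary statistics, which is the kind of metadata annotation that describes the shape of a data asset.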