Managing Profilers

Profiler jobs run on Kubernetes, which schedules them and runs them either on demand or on a recurring schedule.

The Profiler Launcher Service (PLS) launches the Data Catalog profilers. PLS is deployed in the Control Plane during stack installation, and the Management Console application (DC-API) makes an HTTP call to it to schedule the jobs. PLS is authorized to schedule and run Kubernetes jobs in the targeted cluster. You must install a PLS in each Kubernetes or OCP cluster, plus a single control plane application that manages all the profiler jobs.
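
To make the difference between on-demand and scheduled runs concrete, the following is a minimal sketch of how a launcher could express the two cases as standard Kubernetes batch/v1 Job and CronJob manifests. It is not the PLS implementation: the function name, the image reference, and the namespace are hypothetical placeholders, and the actual request format between DC-API and PLS is not described here.

```python
import json


def build_profiler_manifest(profiler, image, namespace, schedule=None):
    """Build a Kubernetes manifest for a profiler run.

    Returns a batch/v1 Job when no schedule is given (on-demand run),
    or a batch/v1 CronJob when a cron expression is supplied.
    All argument values are illustrative assumptions.
    """
    # Kubernetes object names must be lowercase and contain no spaces.
    name = profiler.lower().replace(" ", "-")
    pod_spec = {
        "restartPolicy": "Never",
        "containers": [{"name": name, "image": image}],
    }
    job_spec = {"template": {"spec": pod_spec}}
    metadata = {"name": name, "namespace": namespace}

    if schedule is None:
        # On-demand: a one-off Job.
        return {
            "apiVersion": "batch/v1",
            "kind": "Job",
            "metadata": metadata,
            "spec": job_spec,
        }

    # Scheduled: a CronJob that wraps the same job spec.
    return {
        "apiVersion": "batch/v1",
        "kind": "CronJob",
        "metadata": metadata,
        "spec": {"schedule": schedule, "jobTemplate": {"spec": job_spec}},
    }


if __name__ == "__main__":
    # Hypothetical image and namespace, used only for illustration.
    manifest = build_profiler_manifest(
        "Hive Column Profiler",
        "example.invalid/profiler:latest",
        "profilers",
        schedule="0 2 * * *",
    )
    print(json.dumps(manifest, indent=2))
```

Calling the function without a schedule yields a Job for an immediate run; passing a cron expression such as "0 2 * * *" yields a CronJob that wraps the same pod specification.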

Table 1. List of built-in profilers
Profiler Name                  Description
Cluster Sensitivity Profiler   Profiles sensitive data such as PII, PCI, and HIPAA.
Ranger Audit Profiler          Summarizes Ranger audit logs.
Hive Column Profiler           Provides summary statistics such as maximum, minimum, mean, unique, and null values at the Hive column level.
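
As an illustration of the kind of column-level summary the Hive Column Profiler reports, here is a small sketch that computes the same statistics for one in-memory column. It is an assumption-laden example, not the profiler's code: the function name is hypothetical, nulls are modeled as None, and "unique" is interpreted as the number of distinct non-null values.

```python
from statistics import mean


def summarize_column(values):
    """Compute max, min, mean, distinct count, and null count for one column.

    Nulls are represented as None. This is an illustrative sketch only.
    """
    non_null = [v for v in values if v is not None]
    return {
        "max": max(non_null) if non_null else None,
        "min": min(non_null) if non_null else None,
        "mean": mean(non_null) if non_null else None,
        # Assumption: "unique" means the count of distinct non-null values.
        "unique": len(set(non_null)),
        "nulls": len(values) - len(non_null),
    }


if __name__ == "__main__":
    print(summarize_column([3, 1, None, 3, 5]))
```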