Additional Configuration for Ranger Audit Profiler

In addition to the generic configuration, the Ranger Audit Profiler has additional parameters that you can optionally edit.

  1. Click Profilers in the main navigation menu on the left.
  2. Click Configs to view all of the configured profilers.
  3. Select Ranger Audit Profiler to edit its profiler configuration.

    You can use the toggle button to enable or disable the Ranger Audit Profiler.

    The Ranger Audit Profiler detail page is displayed. It contains the following sections:

    • Profiler Configurations
    • Executor Configurations

    Profiler Configurations

    Sampling configurations let you regulate the sampling behaviour of the profilers. When an asset (table) is profiled, the profiler does not scan the whole table; instead, it selects a sample of records as it encounters them. The following sampling parameters are available:

    • Sample Count: Indicates the number of times a table must be sampled for profiling. A value lower than 3 or higher than 30 is not recommended.
    • Sample Factor: Controls the randomisation of records. Lower values promote better random samples, and higher values result in poorer samples. For example, with a value of 0.001, a new random number is generated for each record that is retrieved from Hive. If the random number is less than or equal to the provided proportion (0.001), the record is included in the result set; if it is greater, the record is ignored.
    • Sample Records: Indicates the number of records to be retrieved in a given sample. Think of this as the LIMIT clause of the SQL query.
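
    To make the interaction of the three sampling parameters concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the profiler's actual implementation: a Python list stands in for a Hive table, and `sample_table` and its arguments are hypothetical names. Each of the `sample_count` samples walks the records, draws a new random number per record, keeps the record when that number is less than or equal to `sample_factor`, and stops once `sample_records` records are kept (the LIMIT behaviour). A Sample Count outside the recommended 3 to 30 range only triggers a warning, because the documentation says such values are not recommended rather than forbidden.

```python
import random
import warnings
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def sample_table(
    rows: Sequence[T],
    sample_count: int = 3,
    sample_factor: float = 0.001,
    sample_records: int = 100,
    seed: Optional[int] = None,
) -> List[List[T]]:
    """Hypothetical sketch: draw `sample_count` random samples from `rows`.

    A record is kept when a fresh random number in [0, 1) is less than or
    equal to `sample_factor`; each sample stops at `sample_records` records.
    """
    if not 3 <= sample_count <= 30:
        # The documentation recommends a Sample Count between 3 and 30.
        warnings.warn("sample_count is outside the recommended range of 3 to 30")
    rng = random.Random(seed)
    samples: List[List[T]] = []
    for _ in range(sample_count):
        sample: List[T] = []
        for row in rows:
            # A new random number is generated for every retrieved record.
            if rng.random() <= sample_factor:
                sample.append(row)
                # Sample Records acts like a LIMIT clause for this sample.
                if len(sample) >= sample_records:
                    break
        samples.append(sample)
    return samples
```

    For example, `sample_table(list(range(1_000_000)), sample_count=3, sample_factor=0.001, sample_records=100, seed=42)` returns three lists of at most 100 records each. Conceptually, each sample resembles a Hive query that filters on `rand() <= 0.001` and ends with `LIMIT 100`; that mapping is an assumption for illustration, not the profiler's documented query.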

    Executor Configurations

    Executor configurations are the runtime configurations. Change these configurations if you change the Pod configuration or when additional compute power is required.

    • Number of workers: Indicates the number of processes that are used by the distributed computing framework.

    • Number of threads per worker: Indicates the number of threads used by each worker to complete the job.

    • Worker Memory limit in GB: To avoid overutilization of memory, this parameter sets an upper threshold on memory usage for a given worker. For example, if you have an 8 GB Pod and 4 threads, set this parameter to 2 GB.
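
    The memory example above is simple arithmetic, and the following Python sketch reproduces it. It divides the Pod memory by the thread count, mirroring the documented 8 GB / 4 threads = 2 GB example, and adds an assumed sanity check that the combined limits of all workers fit within the Pod. `ExecutorConfig`, `suggested_worker_memory_gb`, and `fits_in_pod` are hypothetical names for illustration only; they are not part of the product or any API, and the values themselves are entered on the configuration page.

```python
from dataclasses import dataclass


@dataclass
class ExecutorConfig:
    """Hypothetical container for the three Executor Configurations fields."""

    number_of_workers: int
    threads_per_worker: int
    worker_memory_limit_gb: float


def suggested_worker_memory_gb(pod_memory_gb: float, threads: int) -> float:
    """Split Pod memory evenly, mirroring the 8 GB / 4 threads = 2 GB example."""
    if pod_memory_gb <= 0 or threads <= 0:
        raise ValueError("pod_memory_gb and threads must be positive")
    return pod_memory_gb / threads


def fits_in_pod(config: ExecutorConfig, pod_memory_gb: float) -> bool:
    """Assumed check: the limits of all workers together must not exceed the Pod."""
    return config.number_of_workers * config.worker_memory_limit_gb <= pod_memory_gb
```

    With these helpers, `suggested_worker_memory_gb(8, 4)` gives the 2 GB from the example. The `fits_in_pod` check is a conservative assumption: it flags settings where the workers could together claim more memory than the Pod provides.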