Configuring the Statistics Collector profiler

Configure additional parameters for the Statistics Collector to optimize profiling tasks. Adjust settings such as scheduling, incremental profiling, and resource allocation to enhance performance and accuracy.

  1. Go to Profilers and select your data lake.
  2. Go to Profilers > Statistics Collector > Profiler Details > Configuration > All Configurations.
  3. Configure the settings described in the following steps. When you are done, click Save to apply the configuration changes to the selected profiler.
  4. Select a schedule for running the profiler, using either a UNIX cron expression or the Basic scheduler.
    Figure 1. Profiler schedule with cron expression
    Figure 2. Profiler schedule with natural language
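As an illustration of what a cron schedule encodes, the following minimal sketch splits an expression into named fields. It assumes the classic five-field UNIX layout (minute, hour, day of month, month, day of week); the profiler UI may accept a different variant, so check the field count it expects. The function name `describe_cron` is hypothetical and not part of the product.

```python
# Field names of a classic five-field UNIX cron expression (assumed layout).
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def describe_cron(expression: str) -> dict:
    """Map each whitespace-separated cron field to its name."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(
            f"expected {len(CRON_FIELDS)} fields, got {len(parts)}: {expression!r}"
        )
    return dict(zip(CRON_FIELDS, parts))


# "0 2 * * *" means: at minute 0 of hour 2, every day of every month.
daily_2am = describe_cron("0 2 * * *")
print(daily_2am)
```

Labeling the fields this way makes it easier to review a schedule before entering it in the cron expression box.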
  5. Select Incremental Profiling when needed.

    Incremental Profiling can reduce the compute resources and time needed for a profiling job by processing only the data that was updated or added since the previous job. It applies only to Iceberg tables.

    Using Incremental Profiling, you can refine the results from the Last Run Check. Incremental Profiling checks the data (rows) in assets, while Last Run Check filters complete assets.

  6. Select Last Run Check and set a period in Day Range if needed.
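The difference between the two options can be sketched as follows: Last Run Check decides whether a whole asset is profiled at all, and Incremental Profiling then narrows the work to the rows changed since the previous job. This is a conceptual model only, not the profiler's implementation. The `Asset` class and both functions are hypothetical, and the sketch assumes that Day Range means "skip assets profiled within the last N days".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Asset:
    name: str
    last_profiled: Optional[datetime]  # None means the asset was never profiled
    row_timestamps: List[datetime]     # when each row was added or updated


def passes_last_run_check(asset: Asset, now: datetime, day_range: int) -> bool:
    """Asset-level filter (assumed semantics): skip recently profiled assets."""
    if asset.last_profiled is None:
        return True
    return now - asset.last_profiled >= timedelta(days=day_range)


def rows_to_profile(asset: Asset, incremental: bool) -> List[datetime]:
    """Row-level filter: with incremental profiling, keep only newer rows."""
    if not incremental or asset.last_profiled is None:
        return asset.row_timestamps
    return [ts for ts in asset.row_timestamps if ts > asset.last_profiled]


now = datetime(2024, 1, 10)
orders = Asset(
    name="orders",
    last_profiled=datetime(2024, 1, 8),
    row_timestamps=[datetime(2024, 1, 5), datetime(2024, 1, 9)],
)

if passes_last_run_check(orders, now, day_range=1):
    # Only the row changed after the previous job is selected.
    print(rows_to_profile(orders, incremental=True))
```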
  7. Continue with the resource settings:
    1. Set the Maximum number of executors

      Indicates the number of workers that are used by the distributed computing framework. The recommended value is at least 10 executors.

    2. Set the Maximum cores per executor

      Indicates the maximum number of cores that can be allocated to an executor.

    3. Set the Executor memory limit in GBs
    4. Set the Number of driver cores

      Indicates the maximum number of driver cores. Increase the number of cores to improve the speed of profiler job scheduling.

    5. Set the Maximum driver memory in GBs

      Indicates the maximum amount of memory that can be allocated to the driver. Increasing the available memory accelerates the profiling of larger and more complex tables and helps prevent out-of-memory errors.
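To reason about how the resource settings combine, the following sketch models them as a small record and derives the upper bounds they imply. The class and field names are hypothetical and only mirror the labels above. The only check taken from this procedure is the recommendation of at least 10 executors; the totals are plain arithmetic, not values reported by the product.

```python
from dataclasses import dataclass
from typing import List

RECOMMENDED_MIN_EXECUTORS = 10  # recommendation stated in the executor setting


@dataclass(frozen=True)
class ProfilerResources:
    max_executors: int
    max_cores_per_executor: int
    executor_memory_gb: int
    driver_cores: int
    driver_memory_gb: int

    def peak_executor_cores(self) -> int:
        """Upper bound on cores used by executors at full scale-out."""
        return self.max_executors * self.max_cores_per_executor

    def peak_memory_gb(self) -> int:
        """Upper bound on memory: all executors plus the driver."""
        return self.max_executors * self.executor_memory_gb + self.driver_memory_gb

    def warnings(self) -> List[str]:
        """Return human-readable notes about settings below the recommendation."""
        notes = []
        if self.max_executors < RECOMMENDED_MIN_EXECUTORS:
            notes.append(
                f"max_executors={self.max_executors} is below the recommended "
                f"minimum of {RECOMMENDED_MIN_EXECUTORS}"
            )
        return notes


settings = ProfilerResources(
    max_executors=10,
    max_cores_per_executor=2,
    executor_memory_gb=4,
    driver_cores=2,
    driver_memory_gb=8,
)
print(settings.peak_executor_cores(), settings.peak_memory_gb(), settings.warnings())
```

Estimating the peak footprint before saving helps confirm that the cluster can actually supply the resources the profiler may request.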

  8. Add Asset Filtering Rules as needed to control which assets the profiler includes or skips.
    1. Set your Deny List and Allow List.
      The profiler skips assets that meet any criterion in the Deny List and includes assets that meet any criterion in the Allow List.
      1. Click Add New Rule to define new rules.
      2. Use the radio buttons to define your new rule for the Allow or Deny List.
      3. Select the key from the drop-down list and the relevant operator. You can select from the following:
        Key                 Operators
        Database name       equals, starts with, ends with
        Name (of asset)     equals, contains, starts with, ends with
        Owner (of asset)
        Creation date (1)   greater than, less than

        (1) For Creation date, "greater than 7 days" matches assets older than seven days, and "less than 7 days" matches assets newer than seven days.
      4. Enter the value corresponding to the key. For example, enter a string for a name-based key or a number of days for Creation date.
      5. Click Add Rule. A new rule is enabled by default; you can toggle it to disable or re-enable it as needed.
    Figure 3. Affected Assets in Asset Filtering Rules configuration
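The filtering rules described above can be sketched as a small rule evaluator. This is an illustrative model only. The `Asset` and `Rule` types are hypothetical, and three behaviors are assumptions that this procedure does not confirm: a Deny List match wins over an Allow List match, an empty Allow List allows everything, and an asset that matches no rule in a non-empty Allow List is skipped.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Union


@dataclass(frozen=True)
class Asset:
    database: str
    name: str
    owner: str
    created: date


@dataclass(frozen=True)
class Rule:
    key: str                 # "database", "name", "owner", or "created"
    operator: str            # for example "equals" or "greater than"
    value: Union[str, int]   # a string, or a number of days for "created"
    enabled: bool = True     # new rules are enabled by default


STRING_OPERATORS = {
    "equals": lambda actual, expected: actual == expected,
    "contains": lambda actual, expected: expected in actual,
    "starts with": str.startswith,
    "ends with": str.endswith,
}


def rule_matches(rule: Rule, asset: Asset, today: date) -> bool:
    if not rule.enabled:
        return False
    if rule.key == "created":
        age_days = (today - asset.created).days
        if rule.operator == "greater than":   # older than N days
            return age_days > int(rule.value)
        if rule.operator == "less than":      # newer than N days
            return age_days < int(rule.value)
        raise ValueError(f"unsupported date operator: {rule.operator}")
    return STRING_OPERATORS[rule.operator](getattr(asset, rule.key), str(rule.value))


def should_profile(asset: Asset, allow: List[Rule], deny: List[Rule], today: date) -> bool:
    # Assumption: a Deny List match takes precedence over the Allow List.
    if any(rule_matches(rule, asset, today) for rule in deny):
        return False
    active_allow = [rule for rule in allow if rule.enabled]
    # Assumption: with no active Allow List rules, every remaining asset is profiled.
    if not active_allow:
        return True
    return any(rule_matches(rule, asset, today) for rule in active_allow)


today = date(2024, 1, 11)
allow = [Rule("database", "equals", "sales")]
deny = [Rule("name", "ends with", "_tmp")]
orders = Asset("sales", "orders", "etl", date(2024, 1, 1))
scratch = Asset("sales", "orders_tmp", "etl", date(2024, 1, 9))

print(should_profile(orders, allow, deny, today))
print(should_profile(scratch, allow, deny, today))
```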
    Job Summary shows the asset filtering rules applied for the particular profiling job: