Recommendations for bulk operations in EFM

Learn about guidelines for configuring Edge Flow Manager (EFM) properties to optimize bulk operations.

Cloudera recommends the following EFM property configurations:
  • efm.operation.monitoring.rollingBatchOperationsSize: Set to 10-20% of the total number of agents, but ensure it does not exceed 1000.
  • efm.operation.monitoring.rollingOperationsSize.update.asset,
    efm.operation.monitoring.rollingOperationsSize.update.configuration,
    efm.operation.monitoring.rollingOperationsSize.update.properties,
    efm.operation.monitoring.rollingOperationsSize.sync.resource:

    Use these properties to fine-tune the maximum number of simultaneous operations for each operation type, and keep their values consistent with efm.operation.monitoring.rollingBatchOperationsSize: if you increase the batch size, increase these limits accordingly. The batch size acts as a hard ceiling. The total number of simultaneous operations never exceeds the value set in efm.operation.monitoring.rollingBatchOperationsSize, even if higher values are defined for the individual operation type properties.

    For the efm.operation.monitoring.rollingOperationsSize.update.asset and efm.operation.monitoring.rollingOperationsSize.sync.resource properties, the optimal value depends on the size of the files being transferred.

    • For small files (in the kilobyte range), you can raise the limit to values similar to those of efm.operation.monitoring.rollingOperationsSize.update.properties.
    • For larger files (in the megabyte range), keep the limit low, preferably 10 or less, to avoid performance issues.
  • efm.operation.monitoring.rollingBatchOperationsFrequency: Based on previous execution times, choose a frequency at which no more than 25% of the rolling batch frees up in a single iteration.
  • efm.monitor.maxHeartbeatInterval in combination with efm.operation.monitoring.inQueuedStateTimeoutHeartbeatRate: Set maxHeartbeatInterval close to the 75th percentile of your agents' heartbeat intervals so that you can keep inQueuedStateTimeoutHeartbeatRate at 3 or lower. If a higher rate is needed, investigate why the affected agents do not meet these criteria.
  • efm.operation.monitoring.inDeployedStateTimeout in combination with efm.operation.monitoring.inDeployedStateCheckFrequency: Set the deployed state timeout to 120% of the longest expected operation execution time. Set the state check frequency so that EFM checks the state no more than 4 to 10 times during operation execution.
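
The sizing guidelines for efm.operation.monitoring.rollingBatchOperationsSize and the per-operation-type limits can be expressed as a small calculation. The following Python sketch is illustrative only: the function names are hypothetical and not part of EFM, and it assumes that effective concurrency is bounded by the smaller of the batch size and the sum of the per-type limits.

```python
# Illustrative sketch only: these helpers are hypothetical and not part of EFM.
MAX_BATCH_SIZE = 1000  # guideline: rollingBatchOperationsSize should not exceed 1000


def suggest_batch_size(agent_count: int, percent: int = 15) -> int:
    """Suggest rollingBatchOperationsSize as 10-20% of the agents, capped at 1000."""
    if not 10 <= percent <= 20:
        raise ValueError("percent should be between 10 and 20")
    size = -(-agent_count * percent // 100)  # integer ceiling division
    return max(1, min(MAX_BATCH_SIZE, size))


def suggest_transfer_limit(properties_limit: int, large_files: bool) -> int:
    """Suggest a limit for update.asset / sync.resource.

    Small (KB-range) files can use the same limit as update.properties;
    large (MB-range) files are kept at 10 or lower.
    """
    return min(properties_limit, 10) if large_files else properties_limit


def effective_concurrency(batch_size: int, per_type_limits: dict) -> int:
    """Assumed upper bound on simultaneous operations: per-type limits add up,
    but the total never exceeds the rolling batch size."""
    return min(batch_size, sum(per_type_limits.values()))


batch = suggest_batch_size(2000)  # 15% of 2000 agents
limits = {
    "update.asset": suggest_transfer_limit(200, large_files=True),
    "update.configuration": 200,
    "update.properties": 200,
    "sync.resource": suggest_transfer_limit(200, large_files=False),
}
print(batch, effective_concurrency(batch, limits))
```

With 2,000 agents, the suggested batch size of 300 is the binding limit here, so raising the per-type limits further would not increase concurrency.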
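
One way to apply the rollingBatchOperationsFrequency guideline is to work from a sample of previous execution times. The Python sketch below assumes, as a simplification, that the operations in a batch start together, so the share of the batch that frees up within one iteration equals the share of operations that finish within the iteration interval. The helper name and the use of seconds are assumptions; check which unit and value format the property expects in your EFM version.

```python
import bisect
from typing import List, Optional


def suggest_batch_frequency(execution_times_s: List[float],
                            max_fraction: float = 0.25) -> Optional[float]:
    """Return the longest observed execution time t such that at most
    max_fraction of the sampled operations finish within t seconds.

    Hypothetical helper; assumes all operations in a batch start together.
    Returns None if even the fastest operation exceeds the allowed share
    (for example, with fewer than four samples).
    """
    times = sorted(execution_times_s)
    best = None
    for t in times:
        finished = bisect.bisect_right(times, t)  # operations done within t
        if finished / len(times) > max_fraction:
            break  # a larger t can only free up more of the batch
        best = t
    return best


past_runs = [30, 40, 45, 50, 60, 60, 75, 90]  # example execution times, seconds
print(suggest_batch_frequency(past_runs))  # 2 of 8 runs finish within 40 s
```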
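
The heartbeat guideline can be checked against observed agent heartbeat intervals. The Python sketch below takes efm.monitor.maxHeartbeatInterval as the 75th percentile (nearest-rank method) of those intervals and flags agents whose interval exceeds maxHeartbeatInterval multiplied by the heartbeat rate. Treating the queued-state timeout as that product is an assumption made for illustration, and the helper names and sample data are hypothetical.

```python
from typing import Dict, List, Tuple


def percentile_nearest_rank(values: List[float], pct: int) -> float:
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = -(-pct * len(ordered) // 100)  # ceil(pct / 100 * n) with integers
    return ordered[max(rank, 1) - 1]


def heartbeat_settings(intervals_by_agent: Dict[str, float],
                       rate: int = 3) -> Tuple[float, List[str]]:
    """Return (suggested maxHeartbeatInterval, agents to investigate).

    ASSUMPTION: an agent risks a queued-state timeout when its heartbeat
    interval exceeds maxHeartbeatInterval * inQueuedStateTimeoutHeartbeatRate.
    """
    max_interval = percentile_nearest_rank(list(intervals_by_agent.values()), 75)
    queued_limit = max_interval * rate
    outliers = sorted(a for a, i in intervals_by_agent.items() if i > queued_limit)
    return max_interval, outliers


observed = {  # hypothetical agent IDs and heartbeat intervals in seconds
    "agent-1": 10, "agent-2": 10, "agent-3": 12, "agent-4": 15,
    "agent-5": 15, "agent-6": 20, "agent-7": 30, "agent-8": 120,
}
print(heartbeat_settings(observed))
```

Rather than raising the rate above 3 to accommodate agent-8, the guideline suggests investigating why that agent sends heartbeats so much less often than the rest.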
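
The deployed-state guideline reduces to simple arithmetic on the longest expected operation execution time. In the Python sketch below, the helper and field names are hypothetical, inDeployedStateCheckFrequency is assumed to be the interval between checks in seconds, and the number of checks is approximated as the execution time divided by that interval.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeployedStateSettings:
    timeout_s: float             # candidate inDeployedStateTimeout
    min_check_interval_s: float  # shortest interval: about 10 checks per run
    max_check_interval_s: float  # longest interval: about 4 checks per run


def deployed_state_settings(longest_execution_s: float) -> DeployedStateSettings:
    """Hypothetical helper applying the 120% timeout and 4-10 checks guideline."""
    if longest_execution_s <= 0:
        raise ValueError("execution time must be positive")
    return DeployedStateSettings(
        timeout_s=longest_execution_s * 6 / 5,          # 120% of the longest time
        min_check_interval_s=longest_execution_s / 10,  # no more than 10 checks
        max_check_interval_s=longest_execution_s / 4,   # at least 4 checks
    )


print(deployed_state_settings(300))
```

For an operation expected to take at most 300 seconds, this gives a 360-second timeout and a check interval between 30 and 75 seconds.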

If you combine all the configurations explained in the scenarios, you can derive an expected execution time formula: