Tuning DLM Engine

You can tune the DLM Engine for tasks such as running multiple concurrent policies and handling a large number of files.

Run Multiple Concurrent Policies

Perform the following steps to run multiple concurrent policies in DLM:

1. Log in to Ambari.
2. Set the beacon_quartz_thread_pool property to a value greater than the number of policies that you need to run concurrently.
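Before you change the value in Ambari, you can express the sizing rule in step 2 as a small Python check. The function names are hypothetical and are not part of DLM, Beacon, or Ambari. They only encode the "strictly greater than" condition described above.

```python
def min_quartz_thread_pool(concurrent_policies: int) -> int:
    """Smallest beacon_quartz_thread_pool value that is strictly greater
    than the number of policies expected to run at the same time.

    Hypothetical helper: it encodes the documented rule and nothing more.
    """
    if concurrent_policies < 0:
        raise ValueError("concurrent_policies must be non-negative")
    return concurrent_policies + 1


def pool_is_sufficient(thread_pool: int, concurrent_policies: int) -> bool:
    """True if the configured pool size exceeds the concurrent policy count."""
    return thread_pool > concurrent_policies


if __name__ == "__main__":
    # Example: 10 policies scheduled to run concurrently.
    print(min_quartz_thread_pool(10))   # 11
    print(pool_is_sufficient(10, 10))   # False: equal is not enough
```

The check is strict because the step asks for a value *greater than* the number of concurrent policies, so a pool equal to the policy count fails.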

Handle Multiple Files

To handle a large number of listed files, the DLM Engine needs sufficient heap memory.

In /etc/beacon/conf/beacon_env.ini, set the BEACON_SERVER_HEAP parameter to the heap size you need. The default value is -Xmx2048m (2 GB). For example, to increase the memory to 4 GB, set BEACON_SERVER_HEAP=-Xmx4096m.
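The GB-to-megabyte conversion in the example above can be scripted. The following is a minimal Python sketch with hypothetical helper names. It assumes the value in beacon_env.ini is a standard JVM -Xmx flag, and it handles only the m and g size suffixes.

```python
import re

# Matches a JVM max-heap flag such as -Xmx2048m or -Xmx4g.
# Only the m/M and g/G suffixes are handled in this sketch.
_XMX = re.compile(r"-Xmx(\d+)([mMgG])")


def heap_setting(gigabytes: int) -> str:
    """Build a BEACON_SERVER_HEAP line for the given heap size in GB."""
    return f"BEACON_SERVER_HEAP=-Xmx{gigabytes * 1024}m"


def heap_in_mb(value: str) -> int:
    """Return the -Xmx size in MB from a value such as '-Xmx2048m'.

    re.search is used, so leading spaces (e.g. after '=') are tolerated.
    """
    match = _XMX.search(value)
    if match is None:
        raise ValueError(f"no -Xmx flag in {value!r}")
    size, unit = int(match.group(1)), match.group(2).lower()
    return size * 1024 if unit == "g" else size


if __name__ == "__main__":
    print(heap_setting(4))          # BEACON_SERVER_HEAP=-Xmx4096m
    print(heap_in_mb("-Xmx2048m"))  # 2048
```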

The default value is sufficient to handle one million files in the source dataset. If the source dataset contains more files, increase the heap value accordingly.
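The documentation does not give a formula for "accordingly". As a rough starting point only, the following Python sketch assumes that memory needs grow linearly with the file count, using the documented baseline of 2048 MB for one million files. The result is rounded up to a whole GB. The linear model and the function name are assumptions, not documented DLM behavior, so validate any value in your own environment.

```python
import math

DEFAULT_HEAP_MB = 2048          # documented default (-Xmx2048m)
FILES_PER_DEFAULT = 1_000_000   # documented capacity of the default heap


def estimate_heap_mb(file_count: int) -> int:
    """Rough heap estimate (MB) for a source dataset with file_count files.

    ASSUMPTION: memory needs scale linearly with the number of files.
    The result is rounded up to a whole GB and never drops below the default.
    """
    if file_count <= FILES_PER_DEFAULT:
        return DEFAULT_HEAP_MB
    scaled = DEFAULT_HEAP_MB * file_count / FILES_PER_DEFAULT
    return math.ceil(scaled / 1024) * 1024


if __name__ == "__main__":
    # 3 million files -> 6144 MB under the linear assumption.
    print(f"BEACON_SERVER_HEAP=-Xmx{estimate_heap_mb(3_000_000)}m")
```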

HDFS replication fails with "connection refused" error on an HA cluster

In HA-enabled clusters, two parameters control how clients connect to the ResourceManager (RM): yarn.resourcemanager.connect.max-wait.ms and yarn.resourcemanager.connect.retry-interval.ms. Their default values are -1 and 30 seconds, respectively. A value of -1 for the first parameter means that the client waits indefinitely to connect to the RM. The second parameter means that the client waits up to 30 seconds when connecting to each RM.

To avoid these connectivity problems, Beacon provides the yarn_rm_connect_timeout parameter, which overrides the value of yarn.resourcemanager.connect.max-wait.ms. In Beacon, the default value of yarn_rm_connect_timeout is 120 seconds. At up to 30 seconds per RM, this ensures that all four RMs are tried. You can tune this parameter to suit your setup.
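The relationship between the timeout and the number of RMs tried is simple arithmetic: 120 seconds divided by 30 seconds per RM gives four attempts. The Python sketch below makes that explicit. It is a simplified model that assumes each RM attempt uses the full 30-second wait. It does not reproduce YARN's actual retry or failover policy, and the function names are hypothetical.

```python
from typing import Optional


def rms_attempted(timeout_s: int, per_rm_wait_s: int) -> Optional[int]:
    """How many RM connection attempts fit within the overall timeout.

    A timeout of -1 means "wait indefinitely" (the YARN default described
    above), so there is no upper bound and None is returned.
    SIMPLIFICATION: every attempt is assumed to take the full per-RM wait.
    """
    if per_rm_wait_s <= 0:
        raise ValueError("per_rm_wait_s must be positive")
    if timeout_s == -1:
        return None
    return timeout_s // per_rm_wait_s


def timeout_for(rm_count: int, per_rm_wait_s: int) -> int:
    """Smallest timeout (seconds) that lets every RM be tried once."""
    return rm_count * per_rm_wait_s


if __name__ == "__main__":
    print(rms_attempted(120, 30))  # 4: the Beacon default covers four RMs
    print(rms_attempted(-1, 30))   # None: unbounded wait
    print(timeout_for(4, 30))      # 120
```

Under this model, if your clusters have a different number of RMs or a different retry interval, timeout_for gives a starting value for yarn_rm_connect_timeout.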