Tuning DLM Engine
You can tune the DLM Engine for tasks such as running multiple concurrent policies and handling a large number of files.
Run Multiple Concurrent Policies
- Log in to Ambari.
- Set the beacon_quartz_thread_pool property to a value greater than the number of policies required to run concurrently.
- In Ambari, under DLM Engine > Configs, set the value of Beacon Store Max Connections to the number of parallel replication policies + 10.
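Both settings above are simple arithmetic on the number of policies you expect to run at once. The following Python sketch computes them. The function name `concurrency_settings` is hypothetical. The sketch assumes that "policies required to run concurrently" and "parallel replication policies" refer to the same count, and it picks policies + 1 as the smallest thread pool value that satisfies "greater than". You still enter the resulting values in Ambari yourself.

```python
def concurrency_settings(concurrent_policies: int) -> dict:
    """Suggest values for the two Ambari settings, given the number of
    replication policies expected to run at the same time."""
    if concurrent_policies < 1:
        raise ValueError("concurrent_policies must be at least 1")
    return {
        # Must be greater than the number of concurrent policies.
        # ASSUMPTION: use the smallest such value (policies + 1).
        "beacon_quartz_thread_pool": concurrent_policies + 1,
        # Documented formula: number of parallel replication policies + 10.
        "Beacon Store Max Connections": concurrent_policies + 10,
    }


if __name__ == "__main__":
    print(concurrency_settings(20))
```

For 20 concurrent policies, this yields a thread pool size of 21 and 30 store connections.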
Handle Multiple Files
For the DLM Engine to handle a large number of listed files, ensure that it has sufficient memory. In /etc/beacon/conf/beacon_env.ini, set the heap value as applicable for the BEACON_SERVER_HEAP parameter. The default value is -Xmx2048m. For example, to increase the memory to 4 GB, set BEACON_SERVER_HEAP=-Xmx4096m.
The default value is sufficient to handle one million files in the source dataset. If the source dataset contains more files, increase the heap value accordingly.
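The heap setting is an -Xmx value in megabytes, so 4 GB becomes -Xmx4096m (4 × 1024). The Python sketch below formats that line and, for datasets above one million files, estimates a larger heap. The linear scaling from the documented baseline (-Xmx2048m for one million files) is an assumption, not a documented sizing rule, and the helper names `heap_line` and `estimate_heap_mb` are hypothetical. Validate any estimate against the actual memory use of your DLM Engine.

```python
DEFAULT_HEAP_MB = 2048             # documented default: -Xmx2048m
FILES_AT_DEFAULT_HEAP = 1_000_000  # documented: default handles one million files


def heap_line(heap_mb: int) -> str:
    """Format the BEACON_SERVER_HEAP entry for beacon_env.ini."""
    return f"BEACON_SERVER_HEAP=-Xmx{heap_mb}m"


def estimate_heap_mb(file_count: int) -> int:
    """Estimate a heap size in MB for the given number of source files.

    ASSUMPTION: memory needs grow linearly with the file count. The result
    never drops below the default and is rounded up to a whole gigabyte.
    """
    if file_count <= FILES_AT_DEFAULT_HEAP:
        return DEFAULT_HEAP_MB
    numerator = DEFAULT_HEAP_MB * file_count
    denominator = FILES_AT_DEFAULT_HEAP * 1024
    whole_gb = -(-numerator // denominator)  # integer ceiling division
    return whole_gb * 1024


if __name__ == "__main__":
    print(heap_line(4 * 1024))                     # the 4 GB example from the text
    print(heap_line(estimate_heap_mb(3_000_000)))  # hypothetical 3 million files
```

The first line printed is BEACON_SERVER_HEAP=-Xmx4096m, matching the example above; the second is BEACON_SERVER_HEAP=-Xmx6144m under the linear-scaling assumption.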
HDFS replication fails with a connection refused error on an HA cluster
In HA-enabled clusters, two parameters, yarn.resourcemanager.connect.max-wait.ms and yarn.resourcemanager.connect.retry-interval.ms, control how clients connect to the Resource Manager (RM). Their default values are -1 and 30 seconds, respectively. With the first parameter at -1, the client waits indefinitely to connect to the RM. With the second parameter, YARN waits up to 30 seconds to connect to each RM.
To overcome these connectivity problems, Beacon adds a new parameter, yarn_rm_connect_timeout, which overrides the yarn.resourcemanager.connect.max-wait.ms value. In Beacon, the default value of yarn_rm_connect_timeout is 120 seconds, which ensures that all four RMs are tried. You can tune this parameter based on your setup.
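One way to read the 120-second default is as a budget of connection attempts: with the 30-second retry interval described above, 120 seconds allows four attempts, one per RM. The Python sketch below works through that arithmetic. It is an interpretation, not a description of Beacon's internal retry logic; the function names `rm_attempts` and `min_timeout_s` are hypothetical, and the negative-timeout case mirrors the "wait indefinitely" default (-1) of yarn.resourcemanager.connect.max-wait.ms.

```python
from typing import Optional

RETRY_INTERVAL_S = 30  # default yarn.resourcemanager.connect.retry-interval.ms, in seconds


def rm_attempts(timeout_s: int, retry_interval_s: int = RETRY_INTERVAL_S) -> Optional[int]:
    """Number of RM connection attempts that fit within the timeout.

    ASSUMPTION: one attempt is made per retry interval.
    Returns None for a negative timeout (-1 means wait indefinitely).
    """
    if timeout_s < 0:
        return None
    return timeout_s // retry_interval_s


def min_timeout_s(rm_count: int, retry_interval_s: int = RETRY_INTERVAL_S) -> int:
    """Smallest timeout, in seconds, that gives every RM one attempt."""
    return rm_count * retry_interval_s


if __name__ == "__main__":
    print(rm_attempts(120))  # Beacon's default yarn_rm_connect_timeout
    print(min_timeout_s(4))  # four RMs at the default retry interval
```

If your setup has a different number of RMs or a different retry interval, min_timeout_s gives a starting point for tuning yarn_rm_connect_timeout under the same assumption.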