Memory Considerations for Running One Component on a Node
You can adjust the amount of memory allocated to a component to achieve mutual exclusion of components on a node, depending on the NodeManager configuration of each node. Typically, all nodes are configured with the same value, regardless of their actual physical memory.
Assuming the memory capacity of each NodeManager is known (yarn.nodemanager.resource.memory-mb), you can configure the component to request 51% (just over half) of the maximum capacity. You must also ensure that the maximum possible memory allocation (yarn.scheduler.maximum-allocation-mb) permits that value.
For example, if yarn.nodemanager.resource.memory-mb = yarn.scheduler.maximum-allocation-mb = 2048, set yarn.memory = 1280 for the ACCUMULO_MASTER and ACCUMULO_TSERVER properties in the resources.json file.
Then, in the appConfig.json file, set the ACCUMULO_MASTER and ACCUMULO_TSERVER heap sizes (including the ACCUMULO_TSERVER off-heap memory properties, if native maps are enabled) to be 256 MB less than the memory requested for the YARN containers, to allow for the agent's memory consumption. The agent should not use more than 100 MB, but it is safer to assume that it consumes ~256 MB. So you can set the ACCUMULO_MASTER and ACCUMULO_TSERVER memory limit variables to fit within 1024 MB.
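The arithmetic above can be sketched as a quick shell check; the 1280 MB container size and the conservative 256 MB agent allowance are the example values from this section:

```shell
# Memory requested for each YARN container in resources.json (example value)
CONTAINER_MB=1280
# Conservative allowance for the Slider agent running inside the container
AGENT_MB=256
# Remaining memory available for the Accumulo process (heap plus any off-heap maps)
HEAP_MB=$((CONTAINER_MB - AGENT_MB))
echo "$HEAP_MB"   # prints 1024
```

This is why the heap-related variables should be sized to fit within 1024 MB in this example.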
Log Aggregation
This feature is backed by https://issues.apache.org/jira/browse/YARN-2468.
Log aggregation is specified in the yarn-site.xml file.
The yarn.log-aggregation-enable property enables log aggregation for running applications. If a monitoring interval is also set, logs are aggregated while an application is running, at the specified interval. The minimum interval is 3600 seconds.
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds</name>
  <value>3600</value>
</property>
The log include and exclude patterns are specified in the global section of resources.json:
"global": { "yarn.log.include.patterns": "", "yarn.log.exclude.patterns": "" },
If yarn.log.include.patterns is empty, all container logs are included. You can specify the name(s) of log files (for example, agent.log) that you do not want to aggregate using yarn.log.exclude.patterns.
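For example, to aggregate all container logs except the agent log, the global section might look like the following sketch (using the agent.log file name mentioned above):

```json
"global": {
    "yarn.log.include.patterns": "",
    "yarn.log.exclude.patterns": "agent.log"
},
```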
The aggregated logs are stored in the HDFS /app-logs/ directory. The following command can be used to retrieve the logs:
yarn logs -applicationId <app_id>
Reserving Nodes for Accumulo
You can use YARN node labels to reserve cluster nodes for applications and their components. For example, reserving cluster nodes for Accumulo ensures that ACCUMULO_MASTER and ACCUMULO_TSERVER provide a consistent level of performance.
Node labels are specified with the yarn.label.expression property. If no label is specified, only non-labeled nodes are used when allocating containers for component instances.
In brief: you could label particular YARN nodes for Accumulo, say with labels "accumulo1" and "accumulo1_master", and create a separate queue for assigning containers to these nodes. To use these labeled nodes, you would add yarn.label.expression parameters to the Accumulo components in your resources.json file (including the slider-appmaster), e.g. "yarn.label.expression": "accumulo1_master". When you run the slider create command for your Accumulo cluster, you would add the parameter "--queue <queue_name>".
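Putting these pieces together, a resources.json fragment for labeled nodes might look like the following sketch; the label names match the example above, but the exact component layout of your resources.json may differ:

```json
"components": {
    "slider-appmaster": { "yarn.label.expression": "accumulo1" },
    "ACCUMULO_MASTER": { "yarn.label.expression": "accumulo1_master" },
    "ACCUMULO_TSERVER": { "yarn.label.expression": "accumulo1" }
},
```

The queue is then selected when the cluster is created, e.g. slider create <cluster name> --template appConfig.json --resources resources.json --queue <queue_name>.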
Configuring Accumulo for SSL
Accumulo can be configured to use SSL (Secure Sockets Layer) for its internal RPC communications, and its monitor web UI can also be configured to use SSL. The Slider Accumulo application package is set up to use the same SSL certs for both, although the features can be enabled independently.
The SSL certificates must be provided. To distribute the certificates in HDFS, upload them to the directory specified in "site.global.ssl_cert_dir" (by default, /user/<user name>/ssl). There should be a truststore.jks file and a .jks file for each host, named <hostname>.jks. You must add the passwords for the certificates to the credential provider by adding them to the list in the "credentials" section of the appConfig.json file, as shown below. To turn on SSL, set the Accumulo SSL properties below to true. To turn on HTTPS for the monitor UI, change monitor_protocol to https.
The properties are as follows:
"global": { "site.global.ssl_cert_dir": "ssl", "site.global.monitor_protocol": "http", "site.accumulo-site.instance.rpc.ssl.enabled": "false", "site.accumulo-site.instance.rpc.ssl.clientAuth": "false", }, "credentials": { "jceks://hdfs/user/${USER}/accumulo-${CLUSTER_NAME}.jceks": ["root.initial.password", "instance.secret", "trace.token.property.password"] },
Change these to:
"global": { "site.global.ssl_cert_dir": "ssl", "site.global.monitor_protocol": "https", "site.accumulo-site.instance.rpc.ssl.enabled": "true", "site.accumulo-site.instance.rpc.ssl.clientAuth": "true", }, "credentials": { "jceks://hdfs/user/${USER}/accumulo-${CLUSTER_NAME}.jceks": ["root.initial.password", "instance.secret", "trace.token.property.password", "rpc.javax.net.ssl.keyStorePassword", "rpc.javax.net.ssl.trustStorePassword", "monitor.ssl.keyStorePassword", "monitor.ssl.trustStorePassword"] },
If you would like to distribute the certs yourself rather than through HDFS, simply set the following properties to the locations of the .jks files in the local file system (the keystore file should have the same name on all hosts for this configuration).
"global": { "site.accumulo-site.rpc.javax.net.ssl.keyStore": "<keystore file>", "site.accumulo-site.rpc.javax.net.ssl.trustStore": "<truststore file>", },
If clientAuth is enabled, you must have a client.conf file in your Accumulo client conf directory, or a .accumulo/config file in your home directory. Your keystore.jks and truststore.jks SSL certs for the client can be placed in an ssl directory in the Accumulo conf directory (or their locations can be specified by also setting the rpc.javax.net.ssl.keyStore and rpc.javax.net.ssl.trustStore properties). If the client user is the same user that created the Accumulo cluster, it can use the same credential provider as the app, jceks://hdfs/user/<user name>/accumulo-<cluster name>.jceks; otherwise, the client user will have to create their own credential provider using the hadoop credential command. The user must set the credential provider in their client.conf file and make sure the specified credential provider contains entries for rpc.javax.net.ssl.keyStorePassword and rpc.javax.net.ssl.trustStorePassword.
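As a sketch, the client-side setup might look like the following. The provider path client-creds.jceks is a placeholder, and general.security.credential.provider.paths is assumed here as the property that points Accumulo at a Hadoop credential provider; verify the exact property name against your Accumulo version's client.conf documentation.

```shell
# Create the client's own credential provider entries (each command prompts for the password):
hadoop credential create rpc.javax.net.ssl.keyStorePassword \
    -provider jceks://hdfs/user/<user name>/client-creds.jceks
hadoop credential create rpc.javax.net.ssl.trustStorePassword \
    -provider jceks://hdfs/user/<user name>/client-creds.jceks

# Then reference the provider in client.conf:
#   general.security.credential.provider.paths=jceks://hdfs/user/<user name>/client-creds.jceks
#   instance.rpc.ssl.enabled=true
#   instance.rpc.ssl.clientAuth=true
```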
A client.conf file for the Accumulo instance can be retrieved with the following command:
slider registry --getconf client --name <cluster name> --format properties --dest <path to local client.conf>
Building Accumulo Native Libraries
Accumulo performs better when it uses a native in-memory map for newly written data. To build the native libraries for your system, perform the following steps to add them to your application package. Then set the "site.accumulo-site.tserver.memory.maps.native.enabled" property to true in your appConfig.json file, and be sure to adjust the ACCUMULO_TSERVER heapsize parameter so that it no longer includes the tserver.memory.maps.max memory.
# Extract the Accumulo tarball from the application package
unzip <app package name>.zip package/files/accumulo*gz
cd package/files/
gunzip accumulo-<version>-bin.tar.gz
tar xvf accumulo-<version>-bin.tar
# Build the native libraries for this system
accumulo-<version>/bin/build_native_library.sh
# Update the tarball with the built libraries and repackage
tar uvf accumulo-<version>-bin.tar accumulo-<version>
rm -rf accumulo-<version>
gzip accumulo-<version>-bin.tar
cd ../../
zip <app package name>.zip -r package
rm -rf package