Deployment Considerations
Memory Considerations for Running One Component on a Node
You can adjust the amount of memory given to a component to achieve mutual exclusion of components, depending upon the NodeManager configuration on each node. Typically, all nodes are configured with the same value, regardless of their actual physical memory.
Assuming the memory capacity of each NodeManager is known (yarn.nodemanager.resource.memory-mb), you can configure the component to request 51% (that is, just over half) of the maximum capacity. Because two such containers cannot fit on one node, at most one instance of the component runs on each node. You must also ensure that the maximum possible memory allocation (yarn.scheduler.maximum-allocation-mb) allows that value.
For example, if yarn.nodemanager.resource.memory-mb = yarn.scheduler.maximum-allocation-mb = 2048, set yarn.memory = 1280 for the ACCUMULO_MASTER and ACCUMULO_TSERVER components in the resources.json file.
Then, in the appConfig.json file, set the ACCUMULO_MASTER and ACCUMULO_TSERVER heap sizes (including the ACCUMULO_TSERVER off-heap memory properties, if native maps are enabled) to be 256 MB less than the memory requested for the YARN containers, to allow for the agent's memory consumption. The agent should not use more than 100 MB, but you can assume that it consumes about 256 MB. In this example, the ACCUMULO_MASTER and ACCUMULO_TSERVER memory limits must therefore fit within 1024 MB (1280 MB minus 256 MB).
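The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name and the rounding_mb parameter are hypothetical, and the rounding step is an assumption chosen so that the result matches the 1280 MB figure in the example. It is not a YARN or Slider setting.

```python
def size_component_memory(node_memory_mb, max_allocation_mb,
                          agent_overhead_mb=256, rounding_mb=256):
    """Return (yarn_memory_mb, heap_limit_mb) for one component per node.

    rounding_mb is an illustrative assumption used to round the request up
    to a convenient size; it is not a YARN or Slider property.
    """
    # Ask for strictly more than half of the node capacity, so that two
    # such containers can never be placed on the same NodeManager.
    request = node_memory_mb // 2 + 1
    # Round up to the next multiple of rounding_mb (ceiling division).
    request = -(-request // rounding_mb) * rounding_mb
    # The request must fit both the node capacity and the scheduler maximum.
    if request > min(node_memory_mb, max_allocation_mb):
        raise ValueError("requested memory exceeds the allowed allocation")
    # Leave room for the agent inside the container.
    heap_limit = request - agent_overhead_mb
    if heap_limit <= 0:
        raise ValueError("no memory left for the component after agent overhead")
    return request, heap_limit


if __name__ == "__main__":
    # Values from the example: both YARN properties set to 2048.
    print(size_component_memory(2048, 2048))
```

With both YARN properties set to 2048, the sketch yields a yarn.memory value of 1280 and a limit of 1024 MB for the component memory settings in appConfig.json.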
Log Aggregation
Log aggregation is specified in the yarn-site.xml file. The yarn.log-aggregation-enable property enables log aggregation for running applications. If a monitoring interval is also set, logs are aggregated while an application is running, at the specified interval. The minimum interval is 3600 seconds.
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds</name>
  <value>3600</value>
</property>
Which log files are aggregated is specified in the global section of resources.json:
"global": {
  "yarn.log.include.patterns": "",
  "yarn.log.exclude.patterns": ""
},
If yarn.log.include.patterns is empty, all container logs are included. You can use yarn.log.exclude.patterns to specify the names of log files (for example, agent.log) that you do not want to aggregate.
The aggregated logs are stored in the HDFS /app-logs/ directory. Use the following command to retrieve the logs:
yarn logs -applicationId <app_id>
Reserving Nodes for Accumulo
You can use YARN node labels to reserve cluster nodes for applications and their components. For example, you could reserve cluster nodes for Accumulo to ensure that ACCUMULO_MASTER and ACCUMULO_TSERVER provide a consistent level of performance.
Node labels are specified with the yarn.label.expression property. If no label is specified, only non-labeled nodes are used when allocating containers for component instances.
In brief, you could label particular YARN nodes for Accumulo, for example with the labels "accumulo1" and "accumulo1_master", and create a separate queue for assigning containers to these nodes. To use these labeled nodes, add yarn.label.expression parameters to the Accumulo components in your resources.json file (including the slider-appmaster), for example "yarn.label.expression": "accumulo1_master". When you run the slider create command for your Accumulo cluster, add the parameter --queue <queue_name>.
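As a rough illustration of the resources.json change described above, the following Python sketch adds a yarn.label.expression entry to each named component. The function name is hypothetical, the sketch assumes the components are listed under a top-level "components" section, and the assignment of the two example labels to particular components is an assumption made only for the example.

```python
import copy
import json


def apply_node_labels(resources, labels):
    """Return a copy of a resources.json dict with a label set per component.

    labels maps a component name to the node label expression it should use.
    """
    updated = copy.deepcopy(resources)
    components = updated.setdefault("components", {})
    for name, label in labels.items():
        # Create the component entry if it is missing, then set its label.
        components.setdefault(name, {})["yarn.label.expression"] = label
    return updated


if __name__ == "__main__":
    resources = {
        "components": {
            "slider-appmaster": {},
            "ACCUMULO_MASTER": {"yarn.memory": "1280"},
            "ACCUMULO_TSERVER": {"yarn.memory": "1280"},
        }
    }
    # Assumed mapping of the example labels to components.
    labels = {
        "slider-appmaster": "accumulo1_master",
        "ACCUMULO_MASTER": "accumulo1_master",
        "ACCUMULO_TSERVER": "accumulo1",
    }
    print(json.dumps(apply_node_labels(resources, labels), indent=2))
```

The input dictionary is left unchanged, so the labeled and unlabeled versions of the file content can be compared before anything is written back.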
Configuring Accumulo for SSL
Accumulo can be configured to use SSL (Secure Sockets Layer) for its internal RPC communications, and its monitor web UI can also be configured to use SSL. The Slider Accumulo application package is set up to use the same SSL certs for both, although the features can be enabled independently.
The SSL certificates must be provided. To distribute the certificates in HDFS, upload them to the directory specified in site.global.ssl_cert_dir (by default, /user/<user name>/ssl). The directory should contain a truststore.jks file and one .jks file for each host, named <hostname>.jks. You must add the passwords for the certificates to the credential provider by adding them to the list in the "credentials" section of the appConfig.json file, as shown below. To turn on SSL, set the Accumulo SSL properties below to true. To turn on https for the monitor UI, change the monitor_protocol to https.
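The appConfig.json changes described above (https for the monitor, the two RPC SSL flags, and the four keystore and truststore password aliases) can be sketched as a small Python transformation. The function name and the way the configuration is held in a dictionary are assumptions made for illustration. The property and alias names are the ones used in this section.

```python
import copy

# Password aliases that the credential list needs once SSL is enabled.
SSL_PASSWORD_ALIASES = [
    "rpc.javax.net.ssl.keyStorePassword",
    "rpc.javax.net.ssl.trustStorePassword",
    "monitor.ssl.keyStorePassword",
    "monitor.ssl.trustStorePassword",
]


def enable_ssl(app_config):
    """Return a copy of an appConfig dict with RPC SSL and https monitor on."""
    updated = copy.deepcopy(app_config)
    settings = updated.setdefault("global", {})
    settings["site.global.monitor_protocol"] = "https"
    settings["site.accumulo-site.instance.rpc.ssl.enabled"] = "true"
    settings["site.accumulo-site.instance.rpc.ssl.clientAuth"] = "true"
    # Add the SSL password aliases to every credential provider entry,
    # skipping any alias that is already listed.
    for aliases in updated.get("credentials", {}).values():
        for alias in SSL_PASSWORD_ALIASES:
            if alias not in aliases:
                aliases.append(alias)
    return updated


if __name__ == "__main__":
    config = {
        "global": {"site.global.monitor_protocol": "http"},
        "credentials": {
            "jceks://hdfs/user/${USER}/accumulo-${CLUSTER_NAME}.jceks": [
                "root.initial.password",
                "instance.secret",
                "trace.token.property.password",
            ]
        },
    }
    print(enable_ssl(config))
```

Applying the function a second time leaves the result unchanged, because aliases that are already present are not added again.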
The properties are as follows:
"global": {
  "site.global.ssl_cert_dir": "ssl",
  "site.global.monitor_protocol": "http",
  "site.accumulo-site.instance.rpc.ssl.enabled": "false",
  "site.accumulo-site.instance.rpc.ssl.clientAuth": "false"
},
"credentials": {
  "jceks://hdfs/user/${USER}/accumulo-${CLUSTER_NAME}.jceks": ["root.initial.password", "instance.secret", "trace.token.property.password"]
},
Change these to:
"global": {
  "site.global.ssl_cert_dir": "ssl",
  "site.global.monitor_protocol": "https",
  "site.accumulo-site.instance.rpc.ssl.enabled": "true",
  "site.accumulo-site.instance.rpc.ssl.clientAuth": "true"
},
"credentials": {
  "jceks://hdfs/user/${USER}/accumulo-${CLUSTER_NAME}.jceks": ["root.initial.password", "instance.secret", "trace.token.property.password", "rpc.javax.net.ssl.keyStorePassword", "rpc.javax.net.ssl.trustStorePassword", "monitor.ssl.keyStorePassword", "monitor.ssl.trustStorePassword"]
},
If you would like to distribute the certificates yourself rather than through HDFS, set the following properties to the locations of the .jks files in the local file system (for this configuration, the keystore file must have the same name on all hosts).
"global": {
  "site.accumulo-site.rpc.javax.net.ssl.keyStore": "<keystore file>",
  "site.accumulo-site.rpc.javax.net.ssl.trustStore": "<truststore file>"
},
If clientAuth is enabled, you must have a client.conf file in your client Accumulo conf directory, or a .accumulo/config file in your home directory. The keystore.jks and truststore.jks SSL certificates for the client can be placed in an ssl directory in the Accumulo conf directory (or their locations can be specified by also setting the rpc.javax.net.ssl.keyStore and rpc.javax.net.ssl.trustStore properties). If the client user is the same user that created the Accumulo cluster, the client can use the same credential provider as the application, jceks://hdfs/user/<user name>/accumulo-<cluster name>.jceks. Otherwise, the client user must create their own credential provider using the hadoop credential command. The user must set the credential provider in their client.conf file and make sure that the specified credential provider contains the rpc.javax.net.ssl.keyStorePassword and rpc.javax.net.ssl.trustStorePassword entries.
A client.conf file for the Accumulo instance can be retrieved with the following command:
slider registry --getconf client --name <cluster name> --format properties --dest <path to local client.conf>
Building Accumulo Native Libraries
Accumulo performs better when it uses a native in-memory map for newly written data. To build the native libraries for your system and add them to your application package, perform the following steps. Then set the site.accumulo-site.tserver.memory.maps.native.enabled property to true in your appConfig.json file, and be sure to adjust the ACCUMULO_TSERVER heap size parameter so that it no longer includes the tserver.memory.maps.max memory.
unzip <app package name>.zip package/files/accumulo*gz
cd package/files/
gunzip accumulo-<version>-bin.tar.gz
tar xvf accumulo-<version>-bin.tar
accumulo-<version>/bin/build_native_library.sh
tar uvf accumulo-<version>-bin.tar accumulo-<version>
rm -rf accumulo-<version>
gzip accumulo-<version>-bin.tar
cd ../../
zip <app package name>.zip -r package
rm -rf package