Configuring Accumulo on YARN
Accessing the Accumulo Configuration Files
The Accumulo application package includes default application and resource specification files. The package includes both non-secure (appConfig-default.json) and secure (appConfig-secured-default.json) versions of the application specification. You can save these files under another name, and then edit the copies to customize the Accumulo configuration.
You can use the unzip command to extract the Accumulo application and resource specification files from the Accumulo-on-Slider application package. For example, the following commands extract the files from an Accumulo application package stored under /usr/work/app-packages/accumulo into that directory:
unzip /usr/work/app-packages/accumulo/slider-app-packages/accumulo/slider-accumulo-app-package-1.7.0.2.3.2.0-2950.zip appConfig-default.json -d /usr/work/app-packages/accumulo
unzip /usr/work/app-packages/accumulo/slider-app-packages/accumulo/slider-accumulo-app-package-1.7.0.2.3.2.0-2950.zip resources-default.json -d /usr/work/app-packages/accumulo
You can use the following commands to copy the default Accumulo application and resource specification files to new names in the /usr/work/app-packages/accumulo directory:
cp /usr/work/app-packages/accumulo/appConfig-default.json /usr/work/app-packages/accumulo/appConfig.json
cp /usr/work/app-packages/accumulo/resources-default.json /usr/work/app-packages/accumulo/resources.json
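Because appConfig.json and resources.json are plain JSON, a stray comma or missing brace introduced while editing them is likely to cause problems when the application is created. As a hedged sketch, the hypothetical check_json function below (not part of Slider) runs each file through the json.tool module from the Python standard library, assuming python3 is available on the PATH.

```shell
# check_json FILE...
# Hypothetical helper (not part of Slider): reports whether each file parses
# as JSON, using the json.tool module from the Python standard library.
# Assumes python3 is on the PATH. Returns non-zero if any file is invalid.
check_json() {
  rc=0
  for f in "$@"; do
    # json.tool prints the parsed document; only the exit status matters here.
    if python3 -m json.tool "$f" > /dev/null 2>&1; then
      echo "OK:      $f"
    else
      echo "INVALID: $f"
      rc=1
    fi
  done
  return "$rc"
}
```

For example, after editing the copies: check_json /usr/work/app-packages/accumulo/appConfig.json /usr/work/app-packages/accumulo/resources.json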
Application Configuration for Accumulo on YARN
The following is an example of an appConfig.json file for Accumulo on YARN via Slider. The basic properties to adjust for your system are the heap sizes (the site.accumulo-env.*_heapsize properties), the Accumulo memory properties (such as site.accumulo-site.tserver.memory.maps.max and the tserver cache sizes), and the location of JAVA_HOME. The directories and classpaths are configured properly for HDP in the default appConfig-default.json file, but you must set the java_home value in the "global" section of the appConfig.json file to match the JAVA_HOME setting on your system.
{
  "schema": "http://example.org/specification/v2.0.0",
  "metadata": { },
  "global": {
    "application.def": ".slider/package/ACCUMULO/slider-accumulo-app-package-1.7.0.2.3.0.0-2557.zip",
    "java_home": "/usr/hadoop-jdk1.6.0_31",
    "site.global.app_root": "${AGENT_WORK_ROOT}/app/install/accumulo-1.7.0.2.3.0.0-2557",
    "site.global.app_user": "${USER}",
    "site.global.user_group": "hadoop",
    "site.accumulo-env.java_home": "${JAVA_HOME}",
    "site.accumulo-env.tserver_heapsize": "256m",
    "site.accumulo-env.master_heapsize": "128m",
    "site.accumulo-env.monitor_heapsize": "64m",
    "site.accumulo-env.gc_heapsize": "64m",
    "site.accumulo-env.other_heapsize": "128m",
    "site.accumulo-env.hadoop_prefix": "/usr/hdp/current/hadoop-client",
    "site.accumulo-env.hadoop_conf_dir": "/etc/hadoop/conf",
    "site.accumulo-env.zookeeper_home": "${zk.dir}",
    "site.client.instance.name": "${USER}-${CLUSTER_NAME}",
    "site.global.accumulo_root_password": "NOT_USED",
    "site.global.ssl_cert_dir": "ssl",
    "site.global.monitor_protocol": "http",
    "site.accumulo-site.instance.volumes": "${DEFAULT_DATA_DIR}/data",
    "site.accumulo-site.instance.zookeeper.host": "${ZK_HOST}",
    "site.accumulo-site.instance.security.authenticator": "org.apache.slider.accumulo.CustomAuthenticator",
    "site.accumulo-site.general.security.credential.provider.paths": "jceks://hdfs/user/${USER}/accumulo-${CLUSTER_NAME}.jceks",
    "site.accumulo-site.instance.rpc.ssl.enabled": "false",
    "site.accumulo-site.instance.rpc.ssl.clientAuth": "false",
    "site.accumulo-site.general.kerberos.keytab": "",
    "site.accumulo-site.general.kerberos.principal": "",
    "site.accumulo-site.tserver.memory.maps.native.enabled": "false",
    "site.accumulo-site.tserver.memory.maps.max": "80M",
    "site.accumulo-site.tserver.cache.data.size": "7M",
    "site.accumulo-site.tserver.cache.index.size": "20M",
    "site.accumulo-site.tserver.sort.buffer.size": "50M",
    "site.accumulo-site.tserver.walog.max.size": "40M",
    "site.accumulo-site.trace.user": "root",
    "site.accumulo-site.master.port.client": "0",
    "site.accumulo-site.trace.port.client": "0",
    "site.accumulo-site.tserver.port.client": "0",
    "site.accumulo-site.gc.port.client": "0",
    "site.accumulo-site.monitor.port.client": "${ACCUMULO_MONITOR.ALLOCATED_PORT}",
    "site.accumulo-site.monitor.port.log4j": "0",
    "site.accumulo-site.master.replication.coordinator.port": "0",
    "site.accumulo-site.replication.receipt.service.port": "0",
    "site.accumulo-site.general.classpaths": "$ACCUMULO_HOME/lib/accumulo-server.jar,\n$ACCUMULO_HOME/lib/accumulo-core.jar,\n$ACCUMULO_HOME/lib/accumulo-start.jar,\n$ACCUMULO_HOME/lib/accumulo-fate.jar,\n$ACCUMULO_HOME/lib/accumulo-proxy.jar,\n$ACCUMULO_HOME/lib/[^.].*.jar,\n$ZOOKEEPER_HOME/zookeeper[^.].*.jar,\n$HADOOP_CONF_DIR,\n$HADOOP_PREFIX/[^.].*.jar,\n$HADOOP_PREFIX/lib/[^.].*.jar,\n$HADOOP_PREFIX/share/hadoop/common/.*.jar,\n$HADOOP_PREFIX/share/hadoop/common/lib/.*.jar,\n$HADOOP_PREFIX/share/hadoop/hdfs/.*.jar,\n$HADOOP_PREFIX/share/hadoop/mapreduce/.*.jar,\n$HADOOP_PREFIX/share/hadoop/yarn/.*.jar,\n/usr/hdp/current/hadoop-client/.*.jar,\n/usr/hdp/current/hadoop-client/lib/.*.jar,\n/usr/hdp/current/hadoop-hdfs-client/.*.jar,\n/usr/hdp/current/hadoop-mapreduce-client/.*.jar,\n/usr/hdp/current/hadoop-yarn-client/.*.jar,"
  },
  "credentials": {
    "jceks://hdfs/user/${USER}/accumulo-${CLUSTER_NAME}.jceks": ["root.initial.password", "instance.secret", "trace.token.property.password"]
  },
  "components": {
    "slider-appmaster": {
      "jvm.heapsize": "256M",
      "slider.am.keytab.local.path": "",
      "slider.keytab.principal.name": ""
    }
  }
}
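If you prefer to script the java_home change, a helper along these lines could work. It is a sketch under stated assumptions: the function name set_java_home and the JDK path in the usage line are hypothetical, and it assumes the key appears as "java_home": "..." in your copy. Because the pattern starts with a double quote, it rewrites only the global java_home key and leaves site.accumulo-env.java_home (whose value is ${JAVA_HOME}) unchanged.

```shell
# set_java_home FILE JDK_PATH
# Hypothetical helper: rewrites the value of the "java_home" key in an
# appConfig.json copy. The leading quote in the pattern means the key
# "site.accumulo-env.java_home" is not matched.
# Assumes JDK_PATH contains no '|' or '&' characters (they are special to sed).
set_java_home() {
  if [ "$#" -ne 2 ]; then
    echo "usage: set_java_home FILE JDK_PATH" >&2
    return 2
  fi
  file=$1
  jdk=$2
  tmp=$(mktemp) || return 1
  # Write the edited text to a temporary file, then copy it back so the
  # original file keeps its ownership and permissions.
  sed 's|"java_home": *"[^"]*"|"java_home": "'"$jdk"'"|' "$file" > "$tmp" \
    && cat "$tmp" > "$file"
  rc=$?
  rm -f "$tmp"
  return "$rc"
}
```

For example (substitute the JDK path on your system): set_java_home /usr/work/app-packages/accumulo/appConfig.json /usr/jdk64/jdk1.8.0_60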
Resource Components in Accumulo on YARN
You can specify the following components (also referred to as "roles") when deploying Accumulo on YARN via Slider:
ACCUMULO_MASTER: Accumulo master process.
ACCUMULO_TSERVER: Accumulo tablet server process.
ACCUMULO_MONITOR: Accumulo monitor web UI.
ACCUMULO_GC: Accumulo garbage collector process.
ACCUMULO_TRACER: Accumulo trace collector process.
The following is an example of an Accumulo resources.json file with these roles configured:
{
  "schema": "http://example.org/specification/v2.0.0",
  "metadata": { },
  "global": {
    "yarn.log.include.patterns": "",
    "yarn.log.exclude.patterns": ""
  },
  "components": {
    "ACCUMULO_MASTER": {
      "yarn.role.priority": "1",
      "yarn.component.instances": "1",
      "yarn.memory": "256"
    },
    "slider-appmaster": { },
    "ACCUMULO_TSERVER": {
      "yarn.role.priority": "2",
      "yarn.component.instances": "1",
      "yarn.memory": "512"
    },
    "ACCUMULO_MONITOR": {
      "yarn.role.priority": "3",
      "yarn.component.instances": "1",
      "yarn.memory": "128"
    },
    "ACCUMULO_GC": {
      "yarn.role.priority": "4",
      "yarn.component.instances": "1",
      "yarn.memory": "128"
    },
    "ACCUMULO_TRACER": {
      "yarn.role.priority": "5",
      "yarn.component.instances": "1",
      "yarn.memory": "256"
    }
  }
}
The memory and number of instances of each component should be adjusted for your system and the desired size of the application instance. You typically need only one instance each of the ACCUMULO_MONITOR, ACCUMULO_GC, and ACCUMULO_TRACER processes.
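As a rough check when sizing the application, the memory requested from YARN for the Accumulo roles is the sum, over components, of yarn.component.instances (n_c below) times yarn.memory in MB (m_c below). With the values in the example resources.json file, it works out as follows. This figure excludes the Slider application master's own container, and YARN may round each request up to its configured minimum allocation, so treat it as a lower bound rather than an exact total.

```latex
M_{\text{total}} = \sum_{c} n_c \, m_c
  = \underbrace{1 \cdot 256}_{\text{MASTER}}
  + \underbrace{1 \cdot 512}_{\text{TSERVER}}
  + \underbrace{1 \cdot 128}_{\text{MONITOR}}
  + \underbrace{1 \cdot 128}_{\text{GC}}
  + \underbrace{1 \cdot 256}_{\text{TRACER}}
  = 1280\ \text{MB}
```

Running a second ACCUMULO_MASTER instance, for example, adds another 256 MB, for a total of 1536 MB.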
For high availability (HA), you will generally want two instances of ACCUMULO_MASTER, plus enough instances of ACCUMULO_TSERVER to handle your application's workload.
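To raise the ACCUMULO_MASTER count to two (or change the ACCUMULO_TSERVER count) in resources.json from a script, something like the following could be used. It is a hypothetical helper, not a Slider command, and it assumes the file is pretty-printed with one property per line and each component's closing brace on its own line; it will not work reliably on a file collapsed onto a single line.

```shell
# set_instances FILE COMPONENT COUNT
# Hypothetical helper: sets "yarn.component.instances" for one component in
# a resources.json copy. Assumes one property per line, with the component's
# closing brace on its own line (a pretty-printed file).
set_instances() {
  if [ "$#" -ne 3 ]; then
    echo "usage: set_instances FILE COMPONENT COUNT" >&2
    return 2
  fi
  file=$1
  comp=$2
  count=$3
  # Reject anything that is not a plain non-negative integer.
  case $count in
    ''|*[!0-9]*) echo "COUNT must be an integer: $count" >&2; return 2 ;;
  esac
  tmp=$(mktemp) || return 1
  # The sed address range runs from the line naming the component to the next
  # line containing '}', so only that component's instance count changes.
  sed "/\"$comp\"/,/}/ s/\"yarn.component.instances\": *\"[0-9]*\"/\"yarn.component.instances\": \"$count\"/" \
    "$file" > "$tmp" && cat "$tmp" > "$file"
  rc=$?
  rm -f "$tmp"
  return "$rc"
}
```

For example: set_instances /usr/work/app-packages/accumulo/resources.json ACCUMULO_MASTER 2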