Installing LLAP on a Secured Cluster
Prerequisites
Cluster is available and is already secured with Kerberos.
Slider and ZooKeeper are installed on the cluster.
The Hadoop directory is in the same location on each node so the native binary can be accessed, which supports secure I/O.
Installing LLAP on a Secured Cluster
Review the prerequisites for installing LLAP on a secured cluster before you begin.
Ensure that user hive exists on each node, and configure the following:
Create local directories that are similar to those set up for the yarn.nodemanager.local-dirs property:
mkdir -p /grid/0/hadoop/llap/local
chown -R hive /grid/0/hadoop/llap
On the "setup" node, ensure that user hive has access to its HDFS home directory:
hadoop fs -mkdir -p /user/hive
hadoop fs -chown -R hive /user/hive
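The local-directory step above can be sketched for nodes with more than one disk. This is a minimal dry-run sketch, not part of the documented procedure: it assumes that the yarn.nodemanager.local-dirs entries end in /yarn/local (for example, /grid/0/hadoop/yarn/local), and the helper name llap_dirs_from_yarn is hypothetical. It only prints the commands so that they can be reviewed before anything is created.

```shell
#!/bin/sh
# Hypothetical helper: map each YARN local dir to a sibling LLAP dir.
# Assumption: every YARN dir ends in "/yarn/local".
llap_dirs_from_yarn() {
  # $1 = comma-separated value of yarn.nodemanager.local-dirs
  echo "$1" | tr ',' '\n' | sed 's|/yarn/local$|/llap/local|'
}

# Dry run: print the commands instead of executing them.
for d in $(llap_dirs_from_yarn "/grid/0/hadoop/yarn/local,/grid/1/hadoop/yarn/local"); do
  echo "mkdir -p $d"
  echo "chown -R hive ${d%/local}"
done
```

If the YARN directories on your nodes follow a different layout, adjust the sed expression or list the LLAP directories by hand.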
Set up keytabs.
You can perform this step on the "setup" machine and distribute the result to the cluster, or you can perform this step on each node. The following example shows how to perform this step on each node. Use kadmin.local if you are running as root; otherwise, use kadmin.
On each node (specified by their fully qualified domain names), create the host and headless principals, and a keytab with each:
kadmin.local -q 'addprinc -randkey hive@EXAMPLE.COM'
kadmin.local -q "addprinc -randkey hive/<fqdn>@EXAMPLE.COM"
kadmin.local -q 'cpw -pw hive hive'
kadmin.local -q "xst -norandkey -k /etc/security/keytabs/hive.keytab hive/<fqdn>@EXAMPLE.COM"
kadmin.local -q "xst -norandkey -k /etc/security/keytabs/hive.keytab hive@EXAMPLE.COM"
chown hive /etc/security/keytabs/hive.keytab
On the "setup" node, create and install, as user hive, the headless keytab for Slider:
kadmin.local -q "xst -norandkey -k hive.headless.keytab hive@EXAMPLE.COM"
chown hive hive.headless.keytab
kinit -kt /etc/security/keytabs/hive.keytab hive@EXAMPLE.COM
slider install-keytab --keytab hive.headless.keytab --folder hive --overwrite
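Because the host principal differs on every node, it can help to generate the per-node commands from a list of fully qualified domain names and review them before running each pair on its node. The following is a minimal dry-run sketch under stated assumptions: the node names are hypothetical, the realm and keytab path are the example values used in this section, and the function host_principal_cmds is illustrative only. It prints commands and does not contact the KDC.

```shell
#!/bin/sh
REALM="EXAMPLE.COM"                          # example realm from this section
KEYTAB="/etc/security/keytabs/hive.keytab"   # example keytab path from this section

# Print the kadmin.local commands that create and export the host
# principal for one node. Nothing is executed.
host_principal_cmds() {
  fqdn="$1"
  printf '%s\n' \
    "kadmin.local -q \"addprinc -randkey hive/${fqdn}@${REALM}\"" \
    "kadmin.local -q \"xst -norandkey -k ${KEYTAB} hive/${fqdn}@${REALM}\""
}

# Hypothetical node list; replace with the FQDNs of your cluster nodes.
for node in node1.example.com node2.example.com; do
  host_principal_cmds "$node"
done
```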
If you want to use web UI SSL, set up the keystore for SSL.
Note that a keystore is often already set up for other web UIs, for example HiveServer2. If the keystore is not already set up, perform the following steps:
Create the Certificate Authority (CA).
On the setup node, create the CA parameters:
cat > /tmp/cainput << EOF
US
California
Palo Alto
Example Certificate Authority
Certificate Authority
example.com
.
EOF
Create the CA:
Note: JAVA_HOME must be set. The default Java truststore password must be changed.
mkdir -p /etc/security/certs/
openssl genrsa -out /etc/security/certs/ca.key 4096
cat /tmp/cainput | openssl req -new -x509 -days 36525 -key /etc/security/certs/ca.key \
  -out /etc/security/certs/ca.crt
echo 01 > /etc/security/certs/ca.srl
echo 01 > /etc/security/certs/ca.ser
keytool -importcert -noprompt -alias example-ca -keystore \
  $JAVA_HOME/jre/lib/security/cacerts -storepass changeit -file \
  /etc/security/certs/ca.crt
rm /tmp/cainput
Create the certificate.
On the "setup" node, create the keystore parameters. In the following example, llap00 is the password specified for the new keystore:
hostname -f > /tmp/keyinput
hostname -d >> /tmp/keyinput
cat >> /tmp/keyinput << EOF
Example Corp
Palo Alto
CA
US
yes
llap00
llap00
EOF
Generate a keystore, a certificate request, and a certificate, and then import the certificate into the keystore:
cat /tmp/keyinput | keytool -genkey -alias hive -keyalg RSA -keystore \
  /etc/security/certs/keystore.jks -keysize 4096 -validity 36525 -storepass llap00
keytool -certreq -alias hive -keystore /etc/security/certs/keystore.jks \
  -storepass llap00 -file /etc/security/certs/server.csr
openssl x509 -req -days 36525 -in /etc/security/certs/server.csr \
  -CA /etc/security/certs/ca.crt -CAkey /etc/security/certs/ca.key \
  -CAserial /etc/security/certs/ca.ser -out /etc/security/certs/server.crt
keytool -import -alias hive -keystore /etc/security/certs/keystore.jks \
  -storepass llap00 -trustcacerts -file /etc/security/certs/server.crt
chown hive:hadoop /etc/security/certs/keystore.jks /etc/security/certs/server.crt
chmod 640 /etc/security/certs/keystore.jks /etc/security/certs/server.crt
rm /tmp/keyinput
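The keystore answers file contains the keystore password in clear text, so one option is to write it to a private temporary file and take the password from a variable instead of hard-coding it. The sketch below is an illustration under stated assumptions, not the documented procedure: write_keyinput is a hypothetical helper, and the organization, city, state, and country values are the example values used above. The nine answers are written in the same order as in the keystore parameters example: host name, domain, organization, city, state, country, confirmation, and the password twice.

```shell
#!/bin/sh
# Hypothetical helper: write the keytool answers to a file that only the
# current user can read.
# Arguments: <fqdn> <domain> <keystore_password> <output_file>
write_keyinput() {
  ( umask 077
    printf '%s\n' "$1" "$2" "Example Corp" "Palo Alto" "CA" "US" "yes" "$3" "$3" > "$4" )
}

# Demo with literal values; on a real node you would pass the output of
# "hostname -f" and "hostname -d" instead.
answers=$(mktemp)
write_keyinput "node1.example.com" "example.com" "llap00" "$answers"
wc -l < "$answers"    # number of answer lines written
rm -f "$answers"
```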
Distribute the keystore and certificate to each node:
On each node, create the directory:
mkdir -p /etc/security/certs
Upload the files from the "setup" node:
scp … /etc/security/certs/* …@node:/etc/security/certs/
Import the CA:
chown hive:hadoop /etc/security/certs/*
chmod 640 /etc/security/certs/keystore.jks /etc/security/certs/server.crt
keytool -importcert -noprompt -alias example-ca -keystore \
  $JAVA_HOME/jre/lib/security/cacerts -storepass changeit -file \
  /etc/security/certs/ca.crt
Configure LLAP and generate the package.
Specify the following properties in the /etc/hive/conf/hive-site.xml file:
Table 9.6. Properties to Set in hive-site.xml for Secured Clusters
Property | Values
---|---
hive.llap.daemon.work.dirs | Local LLAP work directories: for example, /grid/0/hadoop/llap/local
hive.llap.daemon.keytab.file | Path to the Hive keytab: for example, /etc/security/keytabs/hive.keytab
hive.llap.daemon.service.principal | Hive service principal: for example, hive/_HOST@EXAMPLE.COM
hive.llap.daemon.service.ssl | true
hive.llap.zk.sm.principal | hive@EXAMPLE.COM
hive.llap.zk.sm.keytab.file | /etc/security/keytabs/hive.keytab
hive.llap.zk.sm.connectionString | ZooKeeper connection string: for example, <machine:port,machine:port, ...>
hadoop.security.authentication | kerberos
hadoop.security.authorization | true
Following is an example of these properties set in the llap-daemon-site.xml file:
<property>
  <name>hive.llap.daemon.work.dirs</name>
  <value>/grid/0/hadoop/llap/local</value>
</property>
<property>
  <name>hive.llap.daemon.keytab.file</name>
  <value>/etc/security/keytabs/hive.keytab</value>
</property>
<property>
  <name>hive.llap.daemon.service.principal</name>
  <value>hive/_HOST@EXAMPLE.COM</value>
</property>
<property>
  <name>hive.llap.daemon.service.ssl</name>
  <value>true</value>
</property>
<property>
  <name>hive.llap.zk.sm.principal</name>
  <value>hive@EXAMPLE.COM</value>
</property>
<property>
  <name>hive.llap.zk.sm.keytab.file</name>
  <value>/etc/security/keytabs/hive.keytab</value>
</property>
<property>
  <name>hive.llap.zk.sm.connectionString</name>
  <value>127.0.0.1:2181,128.0.0.1:2181,129.0.0.1:2181</value>
</property>
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
<property>
  <name>hadoop.security.authorization</name>
  <value>true</value>
</property>
Optionally, you can also use hive.llap.daemon.acl and hive.llap.management.acl to restrict access to the LLAP daemon protocols.
The Hive user must have access to both.
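Before generating the package, a quick text search can confirm that none of the nine properties listed above was left out of the configuration file. The following is a minimal sketch, assuming each property name appears in a `<name>` element on a single line; missing_props is a hypothetical helper, and the check looks only for the presence of a name, not for a correct value.

```shell
#!/bin/sh
# Property names taken from the table above.
REQUIRED_PROPS="hive.llap.daemon.work.dirs
hive.llap.daemon.keytab.file
hive.llap.daemon.service.principal
hive.llap.daemon.service.ssl
hive.llap.zk.sm.principal
hive.llap.zk.sm.keytab.file
hive.llap.zk.sm.connectionString
hadoop.security.authentication
hadoop.security.authorization"

# Hypothetical helper: print every required property that has no
# <name> element in the given file. Prints nothing if all are present.
missing_props() {
  for p in $REQUIRED_PROPS; do
    grep -qF "<name>${p}</name>" "$1" || echo "$p"
  done
}

# Example: missing_props /etc/hive/conf/hive-site.xml
```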
Specify the following properties in the ssl-server.xml file. Ensure that you perform this step before you create the LLAP package.
Table 9.7. Properties to Set in ssl-server.xml for LLAP on Secured Clusters
Property | Values
---|---
ssl.server.truststore.location | Path to Java truststore: for example, /jre/lib/security/cacerts
ssl.server.keystore.location | /etc/security/certs/keystore.jks
ssl.server.truststore.password | changeit (Note: This is the default password.)
ssl.server.keystore.password | llap00
ssl.server.keystore.keypassword | llap00
Following is an example of these properties set in the ssl-server.xml file:
<property>
  <name>ssl.server.truststore.location</name>
  <value>/jre/lib/security/cacerts</value>
</property>
<property>
  <name>ssl.server.keystore.location</name>
  <value>/etc/security/certs/keystore.jks</value>
</property>
<property>
  <name>ssl.server.truststore.password</name>
  <value>strong_password</value>
</property>
<property>
  <name>ssl.server.keystore.password</name>
  <value>llap00</value>
</property>
<property>
  <name>ssl.server.keystore.keypassword</name>
  <value>llap00</value>
</property>
Generate the LLAP package.
Important: Ensure that JAVA_HOME and HADOOP_HOME are set before generating the package. JAVA_HOME and the site.global.library_path property in the appConfig.json configuration file are set using JAVA_HOME and HADOOP_HOME. If you see problems such as a missing native library, check the appConfig.json configuration file.
Make sure that LLAP package generation is done under user hive because some HDFS paths in the configuration are user-specific. You can modify the paths after package generation.
To generate the LLAP package, run the following command, setting parameters as described in the LLAP Package Parameters table:
hive --service llap --name <llap_svc_name> --instances <number_of_cluster_nodes> \
  --cache <cache_size>m --xmx <heap_size>m --size ((<cache_size>+<heap_size>)*1.05)m \
  --executors <number_of_cores> --loglevel <WARN|INFO> \
  --args " -XX:+UseG1GC -XX:+ResizeTLAB -XX:+UseNUMA -XX:-ResizePLAB"
Table 9.8. LLAP Package Parameters
Parameter | Recommended Value (based on daemons using all available node resources)
---|---
--instances <number_of_cluster_nodes> | Set to the number of cluster nodes that you want to use for LLAP.
--cache <cache_size> | <YARN_maximum_container_size> - (<hive.tez.container.size> * <number_of_cores>), where <hive.tez.container.size> is the setting for this property found in the hive-site.xml file. Depending on the size of the node, specify a minimum of 1-4 GB.
--xmx <heap_size> | For medium-sized nodes: <hive.tez.container.size> * <number_of_cores> * (0.8 to 0.95), where <hive.tez.container.size> is the setting for this property found in the hive-site.xml file. Ensure that the setting for --xmx is 1 GB less than (<hive.tez.container.size> * <number_of_cores>). For smaller nodes: use the same formula as for medium-sized nodes, but multiply by 0.8.
--executors <number_of_cores> | Set to the number of CPU cores available on nodes running NodeManager. Set this value even if CPU scheduling is enabled in YARN.
Set the --loglevel parameter to INFO when you are troubleshooting or testing. The INFO option provides verbose output. In a production environment, set the --loglevel parameter to WARN, which writes a message to the logs only if there is a warning or error. This makes the logs easier to read and reduces load on the node.
Note: The recommended values listed in the LLAP Package Parameters table represent a sample configuration. LLAP can also be configured to use a fraction of node resources.
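The sizing formulas from the LLAP Package Parameters table can be worked through with shell integer arithmetic. The sketch below is illustrative only: the function name llap_sizes and the input numbers (a 16384 MB maximum YARN container, a 2048 MB hive.tez.container.size, and 4 cores) are assumptions chosen for the example, not recommendations, and it uses the 0.8 multiplier for --xmx. It does not check the rule that --xmx must be 1 GB less than the container size multiplied by the number of cores, so verify that separately.

```shell
#!/bin/sh
# All sizes are in MB. Integer arithmetic truncates fractions.
# Arguments: <yarn_max_container> <hive.tez.container.size> <number_of_cores>
llap_sizes() {
  yarn_max="$1"; tez_container="$2"; cores="$3"
  cache=$(( yarn_max - tez_container * cores ))
  xmx=$(( tez_container * cores * 80 / 100 ))     # 0.8 multiplier
  size=$(( (cache + xmx) * 105 / 100 ))           # (cache + heap) * 1.05
  echo "--cache ${cache}m --xmx ${xmx}m --size ${size}m --executors ${cores}"
}

llap_sizes 16384 2048 4
# prints: --cache 8192m --xmx 6553m --size 15482m --executors 4
```

With these inputs the heap (6553 MB) is below the 7168 MB ceiling implied by the 1 GB rule, and the total size stays under the assumed YARN maximum container size.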
Specify the keytab settings in the slider-appmaster section of the appConfig.json configuration file if they have not already been specified:
"components": {
  "slider-appmaster": {
    … existing settings …
    "slider.hdfs.keytab.dir": ".slider/keytabs/llap",
    "slider.am.login.keytab.name": "hive.headless.keytab",
    "slider.keytab.principal.name": "hive@EXAMPLE.COM"
Validating the Installation on a Secured Cluster
Make sure that you are logged in as user hive.
Verify that the following properties are set as shown in the hive-site.xml file that is being used by HiveServer2:
hive.execution.mode = llap
hive.llap.execution.mode = all
hive.llap.daemon.service.hosts = @<llap_service_name>
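The three settings above can be read back from the file with a small text-based lookup. This is a rough sketch rather than a real XML parser: get_prop is a hypothetical helper, it assumes that the `<value>` element directly follows the `<name>` element, and the dots in the property name are treated as regular-expression wildcards, which is harmless for these names.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one property from a
# Hadoop-style XML configuration file.
# Arguments: <file> <property_name>
get_prop() {
  tr -d '\n' < "$1" \
    | grep -o "<name>$2</name>[[:space:]]*<value>[^<]*</value>" \
    | sed -e 's|.*<value>||' -e 's|</value>||'
}

# Example:
#   [ "$(get_prop /etc/hive/conf/hive-site.xml hive.execution.mode)" = "llap" ] \
#     || echo "hive.execution.mode is not set to llap"
```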
From the hive user home directory, start the LLAP service:
cd ~
./llap-slider-<date>/run.sh
<date> is the date that you generated the LLAP package. To verify that you have the correct <date>, on the node where you generated the LLAP package, make sure you are in the hive user home directory and view the subdirectories:
cd ~
ls
There is a subdirectory named llap-slider-<date>. This subdirectory contains the run.sh script you use to start the LLAP service.
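If the package was generated more than once, several llap-slider-<date> subdirectories can exist side by side. One way to pick the newest is to sort by modification time rather than by name, because the format of <date> is not specified here. The sketch below assumes directory names without newlines; latest_llap_dir is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: print the most recently modified llap-slider-*
# directory under the given base directory (with a trailing slash).
# Prints nothing if no such directory exists.
latest_llap_dir() {
  ls -dt "$1"/llap-slider-*/ 2>/dev/null | head -n 1
}

# Example, run as user hive:
#   "$(latest_llap_dir "$HOME")run.sh"
```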
As user hive, run the Hive CLI and HiveServer2 to run test queries.
If you are using the Hive CLI, you must first run kinit.
After running test queries, check the following:
Check the logs on YARN for the Slider application that is running LLAP.
Look for changes that indicate that LLAP is processing the test queries.
Using the ResourceManager UI, monitor the Tez AM (session) to make sure that it does not launch new containers to run the query.