6. Setting up Wire Encryption in Hadoop

To set up Wire Encryption for Hadoop, complete the following steps.

  1. Create HTTPS certificates and keystore/truststore files.

    1. For each host in the cluster, create a directory for storing the keystore and truststore, for example, $SERVER_KEY_LOCATION. Also create a directory to store the public certificates, for example, $CLIENT_KEY_LOCATION.

      mkdir -p $SERVER_KEY_LOCATION ; mkdir -p $CLIENT_KEY_LOCATION
      E.g.: ssh host1.hwx.com "mkdir -p /etc/security/serverKeys ; mkdir -p /etc/security/clientKeys"

    2. For each host, create a keystore file.

      cd $SERVER_KEY_LOCATION ; keytool -genkey -alias $hostname -keyalg RSA -keysize 1024 -dname "CN=$hostname,OU=hw,O=hw,L=paloalto,ST=ca,C=us" -keypass $SERVER_KEYPASS_PASSWORD -keystore $KEYSTORE_FILE -storepass $SERVER_STOREPASS_PASSWORD
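
      For example, assuming host1.hwx.com as the hostname, keystore.jks as the keystore file name, and placeholder passwords (all illustrative values to adjust for your environment), the command might look like:

      cd /etc/security/serverKeys ; keytool -genkey -alias host1.hwx.com -keyalg RSA -keysize 1024 -dname "CN=host1.hwx.com,OU=hw,O=hw,L=paloalto,ST=ca,C=us" -keypass myKeyPass -keystore keystore.jks -storepass myStorePass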

    3. For each host, export the certificate public key to a certificate file.

      cd $SERVER_KEY_LOCATION ; keytool -export -alias $hostname -keystore $KEYSTORE_FILE -rfc -file $CERTIFICATE_NAME -storepass $SERVER_STOREPASS_PASSWORD

    4. For each host, import the certificate into truststore file.

      cd $SERVER_KEY_LOCATION ; keytool -import -noprompt -alias $hostname -file $CERTIFICATE_NAME -keystore $TRUSTSTORE_FILE -storepass $SERVER_TRUSTSTORE_PASSWORD
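
      For example, on host1.hwx.com, assuming certificate.pem as the certificate file, truststore.jks as the truststore file, and placeholder passwords (all illustrative values), steps 3 and 4 together might look like:

      cd /etc/security/serverKeys
      keytool -export -alias host1.hwx.com -keystore keystore.jks -rfc -file certificate.pem -storepass myStorePass
      keytool -import -noprompt -alias host1.hwx.com -file certificate.pem -keystore truststore.jks -storepass myTrustStorePass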

    5. Create a single truststore file containing the public keys from all of the certificates. Log in to host1 and import the certificate for host1 into the common truststore file ($ALL_JKS).

      keytool -import -noprompt -alias $host -file $CERTIFICATE_NAME -keystore $ALL_JKS -storepass $CLIENT_TRUSTSTORE_PASSWORD

      Copy $ALL_JKS from host1 to the other hosts, and repeat the above command on each host. For example, for a 2-node cluster with host1 and host2 (a scripted sketch of this cycle follows the list below):

      1. Create $ALL_JKS on host1.

        keytool -import -noprompt -alias $host -file $CERTIFICATE_NAME -keystore $ALL_JKS -storepass $CLIENT_TRUSTSTORE_PASSWORD

      2. Copy over $ALL_JKS from host1 to host2. $ALL_JKS already has the certificate entry of host1.

      3. Import the certificate entry of host2 into $ALL_JKS using the same command as before:

        keytool -import -noprompt -alias $host -file $CERTIFICATE_NAME -keystore $ALL_JKS -storepass $CLIENT_TRUSTSTORE_PASSWORD
      4. Copy over the updated $ALL_JKS from host2 to host1.

        Note

        Repeat these steps for each node in the cluster. When you are finished, the $ALL_JKS file on host1 will have the certificates of all nodes.

      5. Copy over the $ALL_JKS file from host1 to all the nodes.
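
      As an alternative to shuttling $ALL_JKS between hosts, you can pull every host's certificate to host1, import them there, and then distribute the finished file. A minimal sketch, assuming passwordless scp between hosts, that each host has already exported its certificate (step 3), and that the host names are illustrative:

      # Illustrative host list; replace with the hosts in your cluster.
      HOSTS="host1.hwx.com host2.hwx.com"
      for host in $HOSTS; do
        # Copy each host's exported certificate to the local machine.
        scp $host:$SERVER_KEY_LOCATION/$CERTIFICATE_NAME /tmp/$host.crt
        # Add it to the common truststore under an alias named after the host.
        keytool -import -noprompt -alias $host -file /tmp/$host.crt -keystore $ALL_JKS -storepass $CLIENT_TRUSTSTORE_PASSWORD
      done
      # Distribute the completed truststore to every node.
      for host in $HOSTS; do
        scp $ALL_JKS $host:$CLIENT_KEY_LOCATION/
      done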

    6. Validate the common truststore file on all hosts.

      keytool -list -v -keystore $ALL_JKS -storepass $CLIENT_TRUSTSTORE_PASSWORD

    7. Set permissions and ownership on the keys:

      chown -R $YARN_USER:hadoop $SERVER_KEY_LOCATION
      chown -R $YARN_USER:hadoop $CLIENT_KEY_LOCATION
      chmod 755 $SERVER_KEY_LOCATION
      chmod 755 $CLIENT_KEY_LOCATION
      chmod 440 $KEYSTORE_FILE
      chmod 440 $TRUSTSTORE_FILE
      chmod 440 $CERTIFICATE_NAME
      chmod 444 $ALL_JKS

      Note

      The complete paths of $SERVER_KEY_LOCATION and $CLIENT_KEY_LOCATION under /etc must be owned by the $YARN_USER user and the hadoop group.
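
      For example, assuming the yarn user, the directories created in step 1, and the illustrative file names used above:

      chown -R yarn:hadoop /etc/security/serverKeys /etc/security/clientKeys
      chmod 755 /etc/security/serverKeys /etc/security/clientKeys
      chmod 440 /etc/security/serverKeys/keystore.jks /etc/security/serverKeys/truststore.jks /etc/security/serverKeys/certificate.pem
      chmod 444 /etc/security/clientKeys/all.jks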

  2. Enable HTTPS by setting the following properties.

    1. Set the following properties in core-site.xml. For example, if you are using Ambari, set the properties as follows:

      hadoop.ssl.require.client.cert=false
      hadoop.ssl.hostname.verifier=DEFAULT
      hadoop.ssl.keystores.factory.class=org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory
      hadoop.ssl.server.conf=ssl-server.xml
      hadoop.ssl.client.conf=ssl-client.xml

    2. Set the following properties in ssl-server.xml. For example, if you are using Ambari, set the properties as follows:

      ssl.server.truststore.location=/etc/security/serverKeys/truststore.jks
      ssl.server.truststore.password=serverTrustStorePassword
      ssl.server.truststore.type=jks
      ssl.server.keystore.location=/etc/security/serverKeys/keystore.jks
      ssl.server.keystore.password=serverStorePassPassword
      ssl.server.keystore.type=jks
      ssl.server.keystore.keypassword=serverKeyPassPassword

    3. Set the following properties in ssl-client.xml. For example, if you are using Ambari, set the properties as follows:

      ssl.client.truststore.location=/etc/security/clientKeys/all.jks
      ssl.client.truststore.password=clientTrustStorePassword
      ssl.client.truststore.type=jks
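
      To confirm that these locations and passwords match the files created in step 1, you can list each store with keytool. A quick check, assuming the example paths and passwords shown above:

      keytool -list -keystore /etc/security/serverKeys/keystore.jks -storepass serverStorePassPassword
      keytool -list -keystore /etc/security/serverKeys/truststore.jks -storepass serverTrustStorePassword
      keytool -list -keystore /etc/security/clientKeys/all.jks -storepass clientTrustStorePassword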

    4. Set the following properties in hdfs-site.xml. For example, if you are using Ambari, set the properties as follows:

      dfs.https.enable=true
      dfs.datanode.https.address=0.0.0.0:<DN_HTTPS_PORT>
      dfs.https.port=<NN_HTTPS_PORT>
      dfs.namenode.https-address=<NN>:<NN_HTTPS_PORT>

    5. Set the following properties in mapred-site.xml. For example, if you are using Ambari, set the properties as follows:

      mapreduce.jobhistory.http.policy=HTTPS_ONLY
      mapreduce.jobhistory.webapp.https.address=<JHS>:<JHS_HTTPS_PORT>

    6. Set the following properties in yarn-site.xml. For example, if you are using Ambari, set the properties as follows:

      yarn.http.policy=HTTPS_ONLY
      yarn.log.server.url=https://<JHS>:<JHS_HTTPS_PORT>/jobhistory/logs
      yarn.resourcemanager.webapp.https.address=<RM>:<RM_HTTPS_PORT> 
      yarn.nodemanager.webapp.https.address=0.0.0.0:<NM_HTTPS_PORT>
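
      After the HTTPS address properties above are in place and the affected services have been restarted, you can spot-check the web UIs with curl. A minimal sketch; replace the placeholders with your actual hostnames and ports, and note that -k skips certificate verification, which is acceptable only for a first smoke test with self-signed certificates:

      curl -k https://<NN>:<NN_HTTPS_PORT>/
      curl -k https://<RM>:<RM_HTTPS_PORT>/
      curl -k https://<JHS>:<JHS_HTTPS_PORT>/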

  3. Enable Encrypted Shuffle by setting the following properties in mapred-site.xml. For example, if you are using Ambari, set the properties as follows:

    mapreduce.shuffle.ssl.enabled=true
    mapreduce.shuffle.ssl.file.buffer.size=65536

    (The default buffer size is 65536.)
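
    To verify that the shuffle handler now serves TLS, you can open a raw TLS connection to a NodeManager's shuffle port after restarting the NodeManagers. A minimal check, assuming an illustrative NodeManager host name and the default shuffle port of 13562 (see mapreduce.shuffle.port if you have changed it):

    openssl s_client -connect nm-host.hwx.com:13562 </dev/null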

  4. Enable Encrypted RPC by setting the following property in core-site.xml. For example, if you are using Ambari, set the property as follows:

    hadoop.rpc.protection=privacy

    (The 'authentication' and 'integrity' settings are also supported.)
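
    To confirm the value that client configurations resolve to, you can query it with the getconf tool:

    hdfs getconf -confKey hadoop.rpc.protection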

  5. Enable Encrypted DTP by setting the following properties in hdfs-site.xml. For example, if you are using Ambari, set the properties as follows:

    dfs.encrypt.data.transfer=true
    dfs.encrypt.data.transfer.algorithm=3des

    ('rc4' is also supported.)

    Note

    The Secondary NameNode is not supported over the HTTPS port; it can only be accessed at http://<SNN>:50090. WebHDFS, hsftp, and short-circuit read are not supported with SSL enabled.
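
    To confirm the settings and exercise an encrypted transfer on the data path (not just RPC), you can query the keys and run a small round-trip copy. A minimal sketch; the HDFS paths are illustrative:

    hdfs getconf -confKey dfs.encrypt.data.transfer
    hdfs dfs -put /etc/hosts /tmp/encryption-smoke-test
    hdfs dfs -cat /tmp/encryption-smoke-test
    hdfs dfs -rm /tmp/encryption-smoke-test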

  6. Integrate Oozie with HCatalog by adding the following property to the oozie-hcatalog job.properties file. For example, if you are using Ambari, set the property as follows:

    hadoop.rpc.protection=privacy

    Note

    This property is in addition to any properties you must set for secure clusters.
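
    For example, from the directory containing your workflow's job.properties file (the path is illustrative), you could append the property directly:

    echo "hadoop.rpc.protection=privacy" >> job.properties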