Configuring the Key Management Server (KMS)

Hadoop KMS is a cryptographic key management server based on Hadoop's KeyProvider API. It provides a client, itself a KeyProvider implementation, that interacts with the KMS over an HTTP REST API. Both the KMS and its client support HTTP SPNEGO Kerberos authentication and SSL-secured communication. The KMS is a Java web application that runs in a pre-configured Tomcat server bundled with the Hadoop distribution.

For instructions on securing the KMS, see Securing the Key Management Server (KMS).

Setup Configuration

KeyProvider Configuration

Configure the KMS backing KeyProvider properties in the etc/hadoop/kms-site.xml configuration file:
<property>
    <name>hadoop.kms.key.provider.uri</name>
    <value>jceks://file@/${user.home}/kms.keystore</value>
</property>

<property>
    <name>hadoop.security.keystore.java-keystore-provider.password-file</name>
    <value>kms.keystore.password</value>
</property>

The password file is looked up via the classpath, in Hadoop's configuration directory.

Restart the KMS for configuration changes to take effect.
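As a sketch, the password file can be created alongside the configuration; the file name must match the configured hadoop.security.keystore.java-keystore-provider.password-file value, and the password value below is purely illustrative:

```shell
# Illustrative only: create the password file that protects the JCEKS
# keystore. The file must live in Hadoop's configuration directory so
# that it is found on the classpath.
mkdir -p etc/hadoop
echo 'illustrative-keystore-password' > etc/hadoop/kms.keystore.password
chmod 400 etc/hadoop/kms.keystore.password
```

Restricting the file's permissions, as above, keeps the keystore password readable only by the user running the KMS.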

KMS Cache

KMS caches keys for short periods of time to avoid excessive hits to the underlying key provider. The cache is enabled by default and can be disabled by setting the hadoop.kms.cache.enable property to false.

The cache is used with the following methods only: getCurrentKey(), getKeyVersion() and getMetadata().

For the getCurrentKey() method, cached entries are kept for a maximum of 30000 milliseconds (30 seconds), regardless of the number of times the key is accessed. This prevents stale keys from being considered current.

For the getKeyVersion() method, cached entries are kept with a default inactivity timeout of 600000 milliseconds (10 minutes). The cache and its timeout values are configurable using the following properties in the etc/hadoop/kms-site.xml configuration file:
<property>
    <name>hadoop.kms.cache.enable</name>
    <value>true</value>
</property>

<property>
    <name>hadoop.kms.cache.timeout.ms</name>
    <value>600000</value>
</property>

<property>
    <name>hadoop.kms.current.key.cache.timeout.ms</name>
    <value>30000</value>
</property>

KMS Client Configuration

The KMS client KeyProvider uses the kms scheme, and the embedded URL must be the URL of the KMS.

For example, for a KMS running on http://localhost:16000/kms, the KeyProvider URI is kms://http@localhost:16000/kms; for a KMS running on https://localhost:16000/kms, it is kms://https@localhost:16000/kms.
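On the client side, the KMS KeyProvider URI is typically wired into core-site.xml. A hedged sketch follows; the host and port are illustrative, and the exact property name may vary between Hadoop releases (some releases use dfs.encryption.key.provider.uri in hdfs-site.xml instead):

```xml
<property>
    <name>hadoop.security.key.provider.path</name>
    <value>kms://http@localhost:16000/kms</value>
</property>
```

With this in place, key commands such as hadoop key list should resolve keys through the KMS rather than a local provider.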

Starting/Stopping the KMS

To start or stop the KMS, use the sbin/kms.sh script. For example, to start the KMS:
hadoop-3.0.0 $ sbin/kms.sh start

Invoking the script without any parameters will list all possible parameters.

KMS Aggregated Audit logs

Audit logs are aggregated for API accesses to the GET_KEY_VERSION, GET_CURRENT_KEY, DECRYPT_EEK, and GENERATE_EEK operations.

Entries are grouped by the <user,key,operation> combination for a configurable aggregation interval, after which the number of accesses to the specified end-point by the user for a given key is flushed to the audit log.

The aggregation interval is configured using the following property:
<property>
    <name>hadoop.kms.aggregation.delay.ms</name>
    <value>10000</value>
</property>

Configuring the Embedded Tomcat Server

The embedded Tomcat server can be configured using the files in the share/hadoop/kms/tomcat/conf directory. KMS pre-configures the HTTP and Admin ports in Tomcat's server.xml to 16000 and 16001, respectively. Tomcat logs are also preconfigured to go to Hadoop's logs/ directory.

The following environment variables can be set in KMS's etc/hadoop/kms-env.sh script and can be used to alter the default ports and log directory:

  • KMS_HTTP_PORT
  • KMS_ADMIN_PORT
  • KMS_LOG

Restart the KMS for the configuration changes to take effect.
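A minimal kms-env.sh sketch using these variables; all port and directory values below are illustrative, not defaults you must use:

```shell
# Example etc/hadoop/kms-env.sh overrides; values are illustrative.
export KMS_HTTP_PORT=16500    # overrides the default HTTP port (16000)
export KMS_ADMIN_PORT=16501   # overrides the default Admin port (16001)
export KMS_LOG=/tmp/kms-logs  # overrides the default logs/ directory
```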

Configuring KMS High Availability/Multiple KMSs

KMS supports multiple KMS instances behind a load balancer or VIP for scalability and HA purposes. These instances must be specially configured to work properly as a single logical service. When using multiple KMS instances, requests from the same user may be handled by different KMS instances.

HTTP Kerberos Principals Configuration

When KMS instances are behind a load balancer or VIP, clients will use the hostname of the VIP. For Kerberos SPNEGO authentication, the VIP hostname is used to construct the Kerberos principal for the server, HTTP/<FQDN-VIP>. This means for client communication, all KMS instances must have the load balancer or VIP's principal.

However, in order to allow clients to directly access a specific KMS instance, the KMS instance must also have a Kerberos principal with its own hostname.

Both Kerberos service principals (for the load balancer/VIP and for the actual KMS host) must be present in the keytab file, and the principal name specified in the configuration must be '*', as follows:
<property>
    <name>hadoop.kms.authentication.kerberos.principal</name>
    <value>*</value>
</property>

If using HTTPS, the SSL certificate used by the KMS instance must be configured to support multiple hostnames (see Java 7 keytool SAN extension support for details on how to do this).

HTTP Authentication Signature

KMS uses Hadoop Authentication for HTTP authentication. Hadoop Authentication issues a signed HTTP Cookie once a client has been authenticated successfully. This HTTP Cookie has an expiration time, after which it triggers a new authentication sequence. This is done to avoid requiring authentication for every HTTP request of a client.

A KMS instance must be able to verify HTTP Cookie signatures produced by the other KMS instances. For this to work, all KMS instances must share the signing secret, which can be configured via the hadoop.kms.authentication.signer.secret.provider property.

This secret can be shared using a ZooKeeper service, which must be configured in kms-site.xml:
<property>
    <name>hadoop.kms.authentication.signer.secret.provider</name>
    <value>zookeeper</value>
    <description>
      Indicates how the secret to sign the authentication cookies will be
      stored. Options are 'random' (default), 'string' and 'zookeeper'.
      If using a setup with multiple KMS instances, 'zookeeper' should be used.
    </description>
</property>

<property>
    <name>hadoop.kms.authentication.signer.secret.provider.zookeeper.path</name>
    <value>/hadoop-kms/hadoop-auth-signature-secret</value>
    <description>
      The Zookeeper ZNode path where the KMS instances will store and retrieve
      the secret from.
    </description>
</property>

<property>
    <name>hadoop.kms.authentication.signer.secret.provider.zookeeper.connection.string</name>
    <value>#HOSTNAME#:#PORT#,...</value>
    <description>
      The Zookeeper connection string, a list of hostnames and port comma
      separated.
    </description>
</property>

<property>
    <name>hadoop.kms.authentication.signer.secret.provider.zookeeper.auth.type</name>
    <value>sasl</value>
    <description>
      The Zookeeper authentication type, 'none' or 'sasl' (Kerberos).
    </description>
</property>

<property>
    <name>hadoop.kms.authentication.signer.secret.provider.zookeeper.kerberos.keytab</name>
    <value>/etc/hadoop/conf/kms.keytab</value>
    <description>
      The absolute path for the Kerberos keytab with the credentials to
      connect to Zookeeper.
    </description>
</property>
  
<property>
    <name>hadoop.kms.authentication.signer.secret.provider.zookeeper.kerberos.principal</name>
    <value>kms/#HOSTNAME#</value>
    <description>
      The Kerberos service principal used to connect to Zookeeper.
    </description>
</property>