Troubleshooting Kerberos Issues

This topic describes the steps you can take to investigate problems with Kerberos authentication. It contains some sample KDC configuration scripts that you can use to make sure your cluster was configured correctly. The following sections also have instructions on using the Kerberos command-line tools, kinit and klist, to investigate the KDC and cluster setup. Finally, you can use the instructions described below to enable debugging for Kerberos using either the command-line or Cloudera Manager.

Verifying Kerberos Configuration

When you're faced with a Kerberos-related issue, first try to pinpoint the cause of failure. A Kerberized deployment has several potential points of failure. These include the KDC itself, missing Kerberos or OS packages, incorrect mapping of Kerberos realms, among others. For example, you could start by investigating whether the issue is with a user with faulty credentials, or with the service that is failing to authenticate users. Another good starting point is to make sure that the Kerberos configuration files have been configured correctly and are being deployed consistently across all cluster hosts.

If the issue you are diagnosing is not already obvious to you, Cloudera recommends you begin with auditing your Kerberos deployment. Use this audit to confirm that you have followed the Kerberos steps as listed in the Cloudera Security Guide, and that your cluster has been configured correctly. Make sure you perform the following configuration checks:
  • Confirm that your /etc/hosts file conforms to Cloudera Manager's installation requirements. Verify forward and reverse name resolution for all cluster hosts, including the KDC hosts, MIT or AD.
  • Ensure the required Kerberos server and workstation packages based on the version of the OS you are using.
  • Check whether the hadoop.security.auth_to_local property in core-site.xml has the proper mappings for all trusted Kerberos realms, especially the HDFS trusted realms. Do this for every service that is using Kerberos.
  • Verify your Kerberos configuration using the sample krb5.conf and kdc.conf files provided below.
  • Review the configuration of all the KDC, REALM, and domain hosts referenced in the krb5.conf and kdc.conf files. The KDC host in particular, is a common point-of-failure and you may have to begin troubleshooting there. Ensure that the REALM set in krb5.conf has the correct hostname listed for the KDC. If you are using cross-realm authentication, see Reviewing Service Ticket Credentials in Cross Realm Deployments.
  • Check whether the services using Kerberos are running and responding properly with kinit/klist.
  • Attempt to authenticate to Cloudera Manager using cluster service credentials specific to the issue or affected service. Examine the issued credentials if you are able to successfully authenticate with the service keytab.
  • Use klist to list the principals present within a service keytab to ensure each service has one.
  • Enabling debugging using either the command line or Cloudera Manager.

Sample Kerberos Configuration Files

/etc/krb5.conf

The /etc/krb5.conf file is the configuration a client uses to access a realm through its configured KDC. The krb5.conf maps the realm to the available servers supporting those realms. It also defines the host-specific configuration rules for how tickets are requested and granted.

[logging]
 default = FILE:/var/log/krb5libs.log
 kdc = FILE:/var/log/krb5kdc.log
 admin_server = FILE:/var/log/kadmind.log

[libdefaults]
 default_realm = EXAMPLE.COM
 dns_lookup_realm = false
 dns_lookup_kdc = false
 ticket_lifetime = 24h
 renew_lifetime = 7d
 forwardable = true
# udp_preference_limit = 1

# set udp_preference_limit = 1 when TCP only should be
# used. Consider using in complex network environments when
# troubleshooting or when dealing with inconsistent
# client behavior or GSS (63) messages.

# uncomment the following if AD cross realm auth is ONLY providing DES encrypted tickets
# allow-weak-crypto = true

[realms]
 AD-REALM.EXAMPLE.COM = {
  kdc = AD1.ad-realm.example.com:88
  kdc = AD2.ad-realm.example.com:88
  admin_server = AD1.ad-realm.example.com:749
  admin_server = AD2.ad-realm.example.com:749
  default_domain = ad-realm.example.com
 }
 EXAMPLE.COM = {
  kdc = kdc1.example.com:88
  admin_server = kdc1.example.com:749
  default_domain = example.com
 }

# The domain_realm is critical for mapping your host domain names to the kerberos realms
# that are servicing them. Make sure the lowercase left hand portion indicates any domains or subdomains
# that will be related to the kerberos REALM on the right hand side of the expression. REALMs will 
# always be UPPERCASE. For example, if your actual DNS domain was test.com but your kerberos REALM is
# EXAMPLE.COM then you would have,

[domain_realm]
test.com = EXAMPLE.COM
#AD domains and realms are usually the same
ad-domain.example.com = AD-REALM.EXAMPLE.COM  
ad-realm.example.com = AD-REALM.EXAMPLE.COM

/var/kerberos/krb5kdc

The kdc.conf file only needs to be configured on the actual cluster-dedicated KDC, and should be located at /var/kerberos/krb5kdc. Only primary and secondary KDCs need access to this configuration file. The contents of this file establish the configuration rules which are enforced for all client hosts in the REALM.

[kdcdefaults]
 kdc_ports = 88
 kdc_tcp_ports = 88

[realms]
  EXAMPLE.COM = {
  #master_key_type = aes256-cts
  max_renewable_life = 7d 0h 0m 0s
  acl_file = /var/kerberos/krb5kdc/kadm5.acl
  dict_file = /usr/share/dict/words
  admin_keytab = /var/kerberos/krb5kdc/kadm5.keytab
# note that aes256 is ONLY supported in Active Directory in a domain / forrest operating at a 2008 or greater functional level.
# aes256 requires that you download and deploy the JCE Policy files for your JDK release level to provide
# strong java encryption extension levels like AES256. Make sure to match based on the encryption configured within AD for
# cross realm auth, note that RC4 = arcfour when comparing windows and linux enctypes
  supported_enctypes = aes256-cts:normal aes128-cts:normal arcfour-hmac:normal
  default_principal_flags = +renewable, +forwardable
 }

kadm5.acl

*/admin@HADOOP.COM *
cloudera-scm@HADOOP.COM * flume/*@HADOOP.COM
cloudera-scm@HADOOP.COM * hbase/*@HADOOP.COM
cloudera-scm@HADOOP.COM * hdfs/*@HADOOP.COM
cloudera-scm@HADOOP.COM * hive/*@HADOOP.COM
cloudera-scm@HADOOP.COM * httpfs/*@HADOOP.COM
cloudera-scm@HADOOP.COM * HTTP/*@HADOOP.COM
cloudera-scm@HADOOP.COM * hue/*@HADOOP.COM
cloudera-scm@HADOOP.COM * impala/*@HADOOP.COM
cloudera-scm@HADOOP.COM * mapred/*@HADOOP.COM
cloudera-scm@HADOOP.COM * oozie/*@HADOOP.COM
cloudera-scm@HADOOP.COM * solr/*@HADOOP.COM
cloudera-scm@HADOOP.COM * sqoop/*@HADOOP.COM
cloudera-scm@HADOOP.COM * yarn/*@HADOOP.COM
cloudera-scm@HADOOP.COM * zookeeper/*@HADOOP.COM

Authenticate to Kerberos using the kinit command line tool

The kinit command line tool is used to authenticate a user, service, system, or device to a KDC. The most basic example is a user authenticating to Kerberos with a username (principal) and password. In the following example, the first attempt uses a wrong password, followed by a second successful attempt.

[alice@host1 ~]$ kinit alice@TEST.ORG.LAB
Password for alice@TEST.ORG.LAB: (wrong password)
kinit: Preauthentication failed while getting initial credentials

[alice@host1 ~]$ kinit alice@TEST.ORG.LAB
Password for alice@TEST.ORG.LAB: (correct password)
(note silent return on successful auth)
[alice@host1 ~]$ klist
Ticket cache: FILE:/tmp/krb5cc_10001
Default principal: alice@TEST.ORG.LAB

Valid starting     Expires            Service principal
03/11/14 11:55:39  03/11/14 21:54:55  krbtgt/TEST.ORG.LAB@TEST.ORG.LAB
renew until 03/18/14 11:55:39

Another method of authentication is using keytabs with the kinit command. You can verify whether authentication was successful by using the klist command to show the credentials issued by the KDC. The following example attempts to authenticate the hdfs service to the KDC by using the hdfs keytab file.

[root@host1 312-hdfs-DATANODE]# kinit -kt hdfs.keytab hdfs/host1.test.lab@TEST.LAB
[root@host1 312-hdfs-DATANODE]# klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: hdfs/host1.test.lab@TEST.LAB

Valid starting      Expires      Service principal
03/11/14 11:18:34 03/12/14 11:18:34 krbtgt/TEST.LAB@TEST.LAB
renew until 03/18/14 11:18:34

Troubleshooting using service keytabs maintained by Cloudera Manager

Every service managed by Cloudera Manager has a keytab file that is provided at startup by the Cloudera Manager Agent. The most recent keytab files can be examined by navigating to the path, /var/run/cloudera-scm-agent/process, with an ls -ltr command.

As you can see in the example below, Cloudera Manager service directory names have the form: ###-service-ROLE. Therefore, if you are troubleshooting the HDFS service, the service directory may be called, 326-hdfs-NAMENODE.
[root@cehd1 ~]# cd /var/run/cloudera-scm-agent/process/
[root@cehd1 process]# ls -ltr | grep NAMENODE | tail -3
drwxr-x--x 3 hdfs         hdfs         4096 Mar  3 23:43 313-hdfs-NAMENODE
drwxr-x--x 3 hdfs         hdfs         4096 Mar  4 00:07 326-hdfs-NAMENODE
drwxr-x--x 3 hdfs         hdfs         4096 Mar  4 00:07 328-hdfs-NAMENODE-nnRpcWait

[root@cehd1 process]# cd 326-hdfs-NAMENODE

[root@cehd1 326-hdfs-NAMENODE]# ls
cloudera_manager_agent_fencer.py              dfs_hosts_allow.txt         hdfs.keytab                 log4j.properties             topology.py
cloudera_manager_agent_fencer_secret_key.txt  dfs_hosts_exclude.txt       hdfs-site.xml               logs
cloudera-monitor.properties                   event-filter-rules.json     http-auth-signature-secret  navigator.client.properties
core-site.xml                                 hadoop-metrics2.properties  krb5cc_494                  topology.map

If you have root access to the /var/run/cloudera-scm-agent/process path, you can use any service's keytab file to log in as root or a sudo user to verify whether basic Kerberos authentication is working.

Once you have located a service keytab file, examine its contents using the klist command (more on this, later). The klist command can show you the credentials stored in a keytab file. For example, to list the credentials stored in the hdfs.keytab file, use the following command:

[root@host1 326-hdfs-DATANODE]# klist -kt hdfs.keytab

Keytab name: WRFILE:hdfs.keytab
KVNO Timestamp         Principal
---- ----------------- --------------------------------------------------------
  4 02/17/14 19:09:17 HTTP/host1.test.lab@TEST.LAB
  4 02/17/14 19:09:17 HTTP/host1.test.lab@TEST.LAB
  4 02/17/14 19:09:17 HTTP/host1.test.lab@TEST.LAB
  4 02/17/14 19:09:17 HTTP/host1.test.lab@TEST.LAB
  4 02/17/14 19:09:17 HTTP/host1.test.lab@TEST.LAB
  4 02/17/14 19:09:17 HTTP/host1.test.lab@TEST.LAB
  4 02/17/14 19:09:17 hdfs/host1.test.lab@TEST.LAB
  4 02/17/14 19:09:17 hdfs/host1.test.lab@TEST.LAB
  4 02/17/14 19:09:17 hdfs/host1.test.lab@TEST.LAB
  4 02/17/14 19:09:17 hdfs/host1.test.lab@TEST.LAB
  4 02/17/14 19:09:17 hdfs/host1.test.lab@TEST.LAB
  4 02/17/14 19:09:17 hdfs/host1.test.lab@TEST.LAB

Now, attempt to authenticate using the keytab file and a principal within it. In this case, we use the hdfs.keytab file with the hdfs/host1.test.lab@TEST.LAB principal. Then use the klist command without any arguments to see the current user session's credentials.

root@host1 312-hdfs-DATANODE]# kinit -kt hdfs.keytab hdfs/host1.test.lab@TEST.LAB
[root@host1 312-hdfs-DATANODE]# klist

Ticket cache: FILE:/tmp/krb5cc_0
Default principal: hdfs/host1.test.lab@TEST.LAB

Valid starting     Expires            Service principal
03/11/14 11:18:34  03/12/14 11:18:34  krbtgt/TEST.LAB@TEST.LAB
renew until 03/18/14 11:18:34

Note that Kerberos credentials have an expiry date and time. This means, to make sure Kerberos credentials are valid uniformly over a cluster, all hosts and clients within the cluster should be using NTP and must never drift more than 5 minutes apart from each other. Kerberos session tickets have a limited lifespan, but can be renewed (as indicated in the sample krb5.conf and kdc.conf). CDH requires renewable tickets for cluster principals. Check whether renewable tickets have been enabled by using a klist command with the -e (list key encryption types) and -f (list flags set) switches when examining Kerberos sessions and credentials.

Examining Kerberos credentials with klist

So far we've only seen basic usage examples of the klist command to list the contents of a keytab file, or to examine a user's credentials. To get more information from the klist command, such as the encryption types being negotiated, or the flags being set for credentials being issued by the KDC, use the klist -ef command. The output for this command will show you the negotiated encryption types for a user or service principal. This is useful information because you may troubleshoot errors caused (especially in cross-realm trust deployments) because an AD or MIT KDC server may not support a particular encryption type. Look for the encryption types under the "Etype" section of the output.

Flags indicate options supported by Kerberos that extend the features of a set of issued credentials. As discussed previously, CDH requires renewable as well as forwardable tickets for successful authentication, especially in cross realm environments. Look for these settings in the "Flags:" section of the klist -ef output shown below where, F = Forwardable, and, R = renewable.

For example, if you use the klist -ef command in an ongoing user session:

[alice@host1 ~]$ klist -ef
Ticket cache: FILE:/tmp/krb5cc_10001
Default principal: alice@TEST.ORG.LAB 
Valid starting Expires Service principal
03/11/14 11:55:39 03/11/14 21:54:55 krbtgt/TEST.ORG.LAB@TEST.ORG.LAB
renew until 03/18/14 11:55:39, Flags: FRIA 
Etype (skey, tkt): aes256-cts-hmac-sha1-96,
    aes256-cts-hmac-sha1-96

Reviewing Service Ticket Credentials in Cross Realm Deployments

When you examine your cluster configuration, make sure you haven't violated any of following the integration rules:

  • When negotiating encryption types, follow the realm with the most specific limitations on supported encryption types.
  • All realms should be known to one another through the /etc/krb5.conf file deployed on the cluster.
  • When you make configuration decisions for Active Directory environments, you must evaluate the Domain Functional Level or Forrest Functional Level that is present.

Kerberos will typically negotiate the strongest form of encryption possible between a client and server for authentication into the realm. However, the encryption types for TGTs may sometimes end up being negotiated downward towards the weaker encryption types, which is not desirable. To investigate such issues, check the kvno of the cross-realm trust principal (krbtgt) as described in the following steps. Replace CLUSTER.REALM and AD.REALM (or MIT.REALM) with the appropriate values for your configured realm. This scenario assumes cross-realm authentication with Active Directory.

  1. Once trust has been configured (see sample files in previous section), kinit as a system user by authenticating to the AD Kerberos realm.
  2. From the command line, perform a kvno check of the local and cross-realm krbtgt entry. The local representation of this special REALM service principal is in the form, krbtgt/CLUSTER.REALM@CLUSTER.REALM. The cross-realm principal is named after the trusted realm in the form, krbtgt/AD.REALM.

    If the kvno check fails, this means cross-realm trust was not set up correctly. Once again review the encryption types in use to make sure there are no incompatibilities or unsupported encryption types being used across realms.

Enabling Debugging in Cloudera Manager for CDH Services

The following instructions are specific to a Cloudera Manager managed-HDFS service and must be modified based on the Kerberized service you are troubleshooting.

  1. Go to the Cloudera Manager Admin Console and navigate to the HDFS service.
  2. Click Configuration.
  3. Search for properties specific to the different role types for which you want to enable debugging. For example, if you want to enable debugging for the HDFS NameNode, search for the NameNode Logging Threshold property and select at least DEBUG level logging.
  4. Enable Kerberos debugging by using the HDFS service's Advanced Configuration Snippet. Once again, this may be different for each specific role type or service. For the HDFS NameNode, add the following properties to the HDFS Service Environment Safety Valve:
    HADOOP_JAAS_DEBUG=true
    HADOOP_OPTS="-Dsun.security.krb5.debug=true"
  5. Click Save Changes.
  6. Restart the HDFS service.

The output will be seen in the process logs: stdout.log and stderr.log. These can be found in the runtime path of the instance: /var/run/cloudera-scm-agent/process/###-service-ROLE. Once Cloudera Manager services have been restarted, the most recent instance of the ###-service-ROLE directory will have debug logs. Use ls -ltr in the /var/run/cloudera-scm-agent/process path to determine the most current path.

Enabling Debugging for Command Line Troubleshooting

Set the following properties in your environment to produce detailed debugging output of the Kerberos authentication process.

# export HADOOP_ROOT_LOGGER=TRACE,console; export HADOOP_JAAS_DEBUG=true; export HADOOP_OPTS="-Dsun.security.krb5.debug=true"

You can then use the following command to copy the console output to the user (with the debugging), along with all output from STDOUT and STDERR to a file.

# hadoop fs -ls / > >(tee fsls-logfile.txt) 2>&1