Authentication and Kerberos Issues
Common causes of authentication failures in a Kerberos-enabled cluster include:
- Failure of the Key Distribution Center (KDC)
- Missing Kerberos or OS packages or libraries
- Incorrect mapping of Kerberos REALMs for cross-realm authentication
To narrow down the problem, start by determining its scope:
- Is the issue local or global? That is, are all users failing to authenticate, or is the issue specific to a single user?
- Is the issue specific to a single service, or are all services affected?
If all users and multiple services are affected—and if the cluster has not worked at all after integrating with Kerberos for authentication—step through all settings for the Kerberos configuration files, as outlined in Auditing the Kerberos Configuration.
Continue reading:
- Auditing the Kerberos Configuration
- Kerberos Command-Line Tools
- Enabling Debugging for Authentication Issues
- Kerberos Credential-Generation Issues
- Hadoop commands fail after enabling Kerberos security
- Using the UserGroupInformation class to authenticate Oozie
- Certain Java versions cannot read credentials cache
- Resolving Cloudera Manager Service keytab Issues
- Reviewing Service Ticket Credentials in Cross-Realm Deployments
- Sample Kerberos Configuration Files
Auditing the Kerberos Configuration
- Verify that all /etc/hosts files conform to Cloudera Manager's installation requirements.
- Verify forward and reverse name resolution for all cluster hosts and for the MIT KDC or Active Directory KDC hosts.
- Verify that all required Kerberos server and workstation packages have been installed and are the correct versions for the OS running on the host systems.
- Verify that the hadoop.security.auth_to_local property in the core-site.xml has proper mappings for all trusted Kerberos realms, including HDFS trusted realms, for all services on the cluster that use Kerberos.
- Verify your Kerberos configuration by comparing to the Sample Kerberos Configuration Files shown below (see krb5.conf and kdc.conf).
- Review the configuration of all the KDC, REALM, and domain hosts referenced in the krb5.conf and kdc.conf files. The KDC host in particular is a common point of failure, and you may have to begin troubleshooting there. Ensure that the REALM set in krb5.conf has the correct hostname listed for the KDC. For cross-realm authentication, see Reviewing Service Ticket Credentials in Cross-Realm Deployments.
- Use kinit and klist to check whether the services using Kerberos are running and responding properly.
- Attempt to authenticate to Cloudera Manager using cluster service credentials specific to the issue or affected service. Examine the issued credentials if you are able to successfully authenticate with the service keytab.
- Use klist to list the principals present within a service keytab to ensure each service has one.
- Enable debugging using either the command line or Cloudera Manager.
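The forward and reverse name resolution check from the list above can be scripted. The following is a minimal sketch, not a Cloudera-provided tool: the function name check_name_resolution is hypothetical, and it assumes that getent is available and that each host should resolve back to exactly the name you pass in.

```shell
# check_name_resolution: verify that a hostname resolves to an address and that
# the address resolves back to the same name (compared case-insensitively).
# Hypothetical helper; relies on `getent hosts`, which consults /etc/hosts and DNS.
check_name_resolution() {
    local host="$1" ip name
    # Forward lookup: first address returned for the name.
    ip=$(getent hosts "$host" | awk 'NR == 1 { print $1 }') || true
    if [ -z "$ip" ]; then
        echo "FAIL $host: no forward resolution"
        return 1
    fi
    # Reverse lookup: canonical name returned for that address.
    name=$(getent hosts "$ip" | awk 'NR == 1 { print $2 }') || true
    if [ "$(echo "$name" | tr 'A-Z' 'a-z')" != "$(echo "$host" | tr 'A-Z' 'a-z')" ]; then
        echo "FAIL $host: $ip resolves back to '${name:-nothing}'"
        return 1
    fi
    echo "OK   $host <-> $ip"
}
```

Run it once per cluster host and KDC host, passing the fully qualified name, for example: check_name_resolution host1.test.lab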
Kerberos Command-Line Tools
User Authentication with and Without Keytab
The kinit command line tool is used to authenticate a user, service, system, or device to a KDC. The most basic example is a user authenticating to Kerberos with a username (principal) and password. In the following example, the first attempt uses a wrong password, followed by a second successful attempt.
[alice@host1 ~]$ kinit alice@TEST.ORG.LAB
Password for alice@TEST.ORG.LAB: (wrong password)
kinit: Preauthentication failed while getting initial credentials
[alice@host1 ~]$ kinit alice@TEST.ORG.LAB
Password for alice@TEST.ORG.LAB: (correct password)
(note silent return on successful auth)
[alice@host1 ~]$ klist
Ticket cache: FILE:/tmp/krb5cc_10001
Default principal: alice@TEST.ORG.LAB
Valid starting     Expires            Service principal
03/11/14 11:55:39  03/11/14 21:54:55  krbtgt/TEST.ORG.LAB@TEST.ORG.LAB
        renew until 03/18/14 11:55:39
Another method of authentication is using keytabs with the kinit command. You can verify whether authentication was successful by using the klist command to show the credentials issued by the KDC. The following example attempts to authenticate the hdfs service to the KDC by using the hdfs keytab file.
[root@host1 312-hdfs-DATANODE]# kinit -kt hdfs.keytab hdfs/host1.test.lab@TEST.LAB
[root@host1 312-hdfs-DATANODE]# klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: hdfs/host1.test.lab@TEST.LAB
Valid starting     Expires            Service principal
03/11/14 11:18:34  03/12/14 11:18:34  krbtgt/TEST.LAB@TEST.LAB
        renew until 03/18/14 11:18:34
Examining Kerberos credentials with klist
So far, the examples have shown only basic usage of the klist command: listing the contents of a keytab file or examining a user's credentials. To get more information from klist, such as the encryption types being negotiated or the flags set on credentials issued by the KDC, use the klist -ef command. The output shows the negotiated encryption types for a user or service principal. This is useful when troubleshooting errors (especially in cross-realm trust deployments) that occur because an AD or MIT KDC server does not support a particular encryption type. Look for the encryption types under the "Etype" section of the output.
Flags indicate options supported by Kerberos that extend the features of a set of issued credentials. As discussed previously, CDH requires renewable as well as forwardable tickets for successful authentication, especially in cross-realm environments. Look for these settings in the "Flags:" section of the klist -ef output shown below, where F = forwardable and R = renewable.
For example, if you use the klist -ef command in an ongoing user session:
[alice@host1 ~]$ klist -ef
Ticket cache: FILE:/tmp/krb5cc_10001
Default principal: alice@TEST.ORG.LAB
Valid starting     Expires            Service principal
03/11/14 11:55:39  03/11/14 21:54:55  krbtgt/TEST.ORG.LAB@TEST.ORG.LAB
        renew until 03/18/14 11:55:39, Flags: FRIA
        Etype (skey, tkt): aes256-cts-hmac-sha1-96, aes256-cts-hmac-sha1-96
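The flag check can be automated when many principals need to be inspected. The following is a minimal sketch that assumes the "Flags:" layout shown in the klist -ef output above; the function name check_ticket_flags is hypothetical, and the function only looks at the first ticket in the cache.

```shell
# check_ticket_flags: read `klist -ef` output on stdin and report whether the
# first ticket carries the F (forwardable) and R (renewable) flags.
# Hypothetical helper; assumes the "Flags: XYZ" layout of klist -ef.
check_ticket_flags() {
    local flags
    # Take the flag letters that follow the first "Flags:" label.
    flags=$(sed -n 's/.*Flags: *\([A-Za-z]*\).*/\1/p' | head -n 1)
    if [ -z "$flags" ]; then
        echo "no flags found"
        return 2
    fi
    case "$flags" in
        *F*) ;;
        *) echo "not forwardable (flags: $flags)"; return 1 ;;
    esac
    case "$flags" in
        *R*) ;;
        *) echo "not renewable (flags: $flags)"; return 1 ;;
    esac
    echo "forwardable and renewable (flags: $flags)"
}
```

A typical invocation pipes the live output into the function: klist -ef | check_ticket_flags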
Enabling Debugging for Authentication Issues
Using Cloudera Manager for Debugging
To obtain additional information in the logs and facilitate troubleshooting, administrators can set debug levels for any of the services managed by Cloudera Manager Server. Typically, the settings are added using the Advanced Configuration Snippet (Safety Valve) settings for the specific service; the property names are specific to each service. For HDFS, the steps are as follows:
- Log in to the Cloudera Manager Admin Console.
- Select the HDFS service.
- Click the Configuration tab.
- Search for properties specific to the different role types for which you want to enable debugging. For example, if you want to enable debugging for the HDFS NameNode, search for the NameNode Logging Threshold property and select at least DEBUG level logging.
- Enable Kerberos debugging by using the HDFS service's Advanced Configuration Snippet. Once again, this may be different for each specific role type or service. For the HDFS NameNode, add the following properties to the HDFS Service Environment Safety Valve:
HADOOP_JAAS_DEBUG=true
HADOOP_OPTS="-Dsun.security.krb5.debug=true"
- Click Save Changes.
- Restart the HDFS service.
The output will be seen in the process logs: stdout.log and stderr.log. These can be found in the runtime path of the instance:
/var/run/cloudera-scm-agent/process/###-service-ROLE
After restarting Cloudera Manager Service, the most recent instance of the ###-service-ROLE directory will have debug logs. Use ls -ltr in the /var/run/cloudera-scm-agent/process path to determine the most current path.
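Finding the newest process directory for a role can be wrapped in a small helper. This is a sketch under the assumption that directories follow the ###-service-ROLE naming shown above; the function name latest_process_dir is hypothetical.

```shell
# latest_process_dir: print the most recently modified process directory whose
# name ends in -<service>-<ROLE>, for example hdfs-NAMENODE.
# Hypothetical helper; the default base path is the one named in this section.
latest_process_dir() {
    local suffix="$1" base="${2:-/var/run/cloudera-scm-agent/process}"
    # ls -dt lists the matching directories themselves, newest first.
    ls -dt "$base"/*-"$suffix" 2>/dev/null | head -n 1
}
```

For example, the following lists the debug output for the current NameNode instance: ls "$(latest_process_dir hdfs-NAMENODE)/logs"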
Enabling Debugging for the Authentication Process
Set the following properties on the cluster to obtain debugging information from the Kerberos authentication process.
# export HADOOP_ROOT_LOGGER=TRACE,console;
# export HADOOP_JAAS_DEBUG=true;
# export HADOOP_OPTS="-Dsun.security.krb5.debug=true"
The following command then displays the console output (including debugging messages) to the user and also copies all output from STDOUT and STDERR to a file.
# hadoop fs -ls / > >(tee fsls-logfile.txt) 2>&1
Kerberos Credential-Generation Issues
Cloudera Manager creates accounts needed by CDH services using an internal command (Generate Credentials) that is triggered automatically by the Kerberos configuration wizard or when changes are made to the cluster that require new Kerberos principals.
- Log in to the Cloudera Manager Admin Console. Any error messages display on the Home page, in the Status area near the top of the page. The following Status message indicates that the Generate Credentials command failed:
Role is missing Kerberos keytab
- To display the output of the command, click the All Recent Commands tab.
Active Directory Credential-Generation Errors
Error: ldap_sasl_interactive_bind_s: Can't contact LDAP server (-1)
Possible cause: The Domain Controller specified is incorrect or LDAPS has not been enabled for it.
Steps to resolve:
- Log in to the Cloudera Manager Admin Console.
- Select .
- Select Kerberos for the Category filter.
Error: ldap_add: Insufficient access (50)
Possible cause: The Active Directory account you are using for Cloudera Manager does not have permissions to create other accounts.
Steps to resolve: Use the Delegate Control wizard to grant permission to the Cloudera Manager account to create other accounts. You can also log in to Active Directory as the Cloudera Manager user to check that it can create other accounts in your Organizational Unit.
MIT Kerberos Credential-Generation Errors
Error: kadmin: Cannot resolve network address for admin server in requested realm while initializing kadmin interface.
Possible cause: The hostname for the KDC server is incorrect.
Steps to resolve: Check the kdc field for your default realm in krb5.conf and make sure the hostname is correct.
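To review the kdc entries quickly, you can extract them from krb5.conf and test whether each hostname resolves. The following is a rough sketch, not part of any Kerberos distribution: the function names list_kdc_hosts and check_kdc_hosts are hypothetical, and the parsing assumes plain "kdc = host[:port]" lines inside the [realms] section (no IPv6 literals or included files).

```shell
# list_kdc_hosts: print the host part of every "kdc = host[:port]" entry found
# in the [realms] section of a krb5.conf-style file (default /etc/krb5.conf).
list_kdc_hosts() {
    local conf="${1:-/etc/krb5.conf}"
    awk '
        /^[ \t]*[#;]/ { next }                              # skip comment lines
        /^[ \t]*\[/   { in_realms = ($0 ~ /\[realms\]/); next }
        in_realms && /^[ \t]*kdc[ \t]*=/ {
            sub(/^[ \t]*kdc[ \t]*=[ \t]*/, "")              # drop the "kdc =" prefix
            sub(/[: \t].*$/, "")                            # drop the port and trailing text
            print
        }' "$conf"
}

# check_kdc_hosts: report which of those hostnames resolve on this host.
check_kdc_hosts() {
    local host
    list_kdc_hosts "$1" | while read -r host; do
        if getent hosts "$host" > /dev/null; then
            echo "OK   $host"
        else
            echo "FAIL $host (does not resolve)"
        fi
    done
}
```

Restricting the match to [realms] matters because the [logging] section also contains a kdc key, which names a log file rather than a host.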
Hadoop commands fail after enabling Kerberos security
Users need to obtain valid Kerberos tickets to interact with a secure cluster, that is, a cluster that has been configured to use Kerberos for authentication. Running any Hadoop command (such as hadoop fs -ls) will fail if you do not have a valid Kerberos ticket in your credentials cache. If you do not have a valid ticket, you will receive an error such as:
11/01/04 12:08:12 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
Bad connection to FS. command aborted. exception: Call to nn-host/10.0.0.2:8020 failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
Steps to resolve: Examine the Kerberos tickets currently in your credentials cache by running the klist command. You can obtain a ticket by running the kinit command and either specifying a keytab file containing credentials, or entering the password for your principal.
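Scripts that run Hadoop commands can perform this check before they start. The following is a minimal sketch that relies on the -s (silent) option of MIT klist, which exits with a nonzero status when the cache is missing or expired; the function name ensure_ticket is hypothetical.

```shell
# ensure_ticket: return 0 if the credentials cache holds a valid (unexpired)
# ticket; otherwise try to obtain one with kinit.
# Usage: ensure_ticket PRINCIPAL [KEYTAB]
ensure_ticket() {
    local principal="$1" keytab="${2:-}"
    if klist -s; then
        return 0                             # a valid ticket is already cached
    fi
    if [ -n "$keytab" ]; then
        kinit -kt "$keytab" "$principal"     # non-interactive, keytab-based login
    else
        kinit "$principal"                   # prompts for the password
    fi
}
```

For example: ensure_ticket alice@TEST.ORG.LAB && hadoop fs -ls /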
Using the UserGroupInformation class to authenticate Oozie
Secured CDH services mainly use Kerberos to authenticate RPC communication. RPCs are one of the primary means of communication between nodes in a Hadoop cluster. For example, RPCs are used by the YARN NodeManager to communicate with the ResourceManager, or by the HDFS client to communicate with the NameNode.
CDH services handle Kerberos authentication by calling the UserGroupInformation (UGI) login method, loginUserFromKeytab(), once every time the service starts up. Since Kerberos ticket expiration times are typically short, repeated logins are required to keep the application secure. Long-running CDH applications have to be implemented accordingly to accommodate these repeated logins. If an application is only going to communicate with HDFS, YARN, MRv1, and HBase, then you only need to call the UserGroupInformation.loginUserFromKeytab() method at startup, before any actual API activity occurs. The HDFS, YARN, MRv1 and HBase services' RPC clients have their own built-in mechanisms to automatically re-login when a keytab's Ticket-Granting Ticket (TGT) expires. Therefore, such applications do not need to include calls to the UGI re-login method because their RPC client layer performs the re-login task for them.
However, some applications may include other service clients that do not involve the generic Hadoop RPC framework, such as Hive or Oozie clients. Such applications must explicitly call the UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab() method before every attempt to connect with a Hive or Oozie client. This is because these clients do not have the logic required for automatic re-logins.
This is an example of an infinitely polling Oozie client application:
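import java.security.PrivilegedAction;
import java.util.List;

import org.apache.hadoop.security.UserGroupInformation;
import org.apache.oozie.client.OozieClient;
import org.apache.oozie.client.WorkflowJob;

public class OoziePollingApp {
    // Placeholder values; replace them with the values for your deployment.
    private static final String PRINCIPAL_STRING = "user@EXAMPLE.COM";
    private static final String KEYTAB_PATH = "/path/to/user.keytab";
    private static final String OOZIE_SERVER_URI = "http://oozie-host.example.com:11000/oozie";

    public static void main(String[] args) throws Exception {
        // App startup: log in once from the keytab (principal first, then keytab path)
        UserGroupInformation.loginUserFromKeytab(PRINCIPAL_STRING, KEYTAB_PATH);
        UserGroupInformation loginUser = UserGroupInformation.getLoginUser();

        final OozieClient client = loginUser.doAs(new PrivilegedAction<OozieClient>() {
            public OozieClient run() {
                try {
                    return new OozieClient(OOZIE_SERVER_URI);
                } catch (Exception e) {
                    e.printStackTrace();
                    return null;
                }
            }
        });

        while (client != null) { // Application's long-running loop
            // Every time, complete the TGT check first
            loginUser = UserGroupInformation.getLoginUser();
            loginUser.checkTGTAndReloginFromKeytab();

            // Perform Oozie client work within the context of the login user object
            loginUser.doAs(new PrivilegedAction<Void>() {
                public Void run() {
                    try {
                        List<WorkflowJob> list = client.getJobsInfo("");
                        for (WorkflowJob wfJob : list) {
                            System.out.println(wfJob.getId());
                        }
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                    return null;
                } // End of function block
            }); // End of doAs
        } // End of loop
    }
}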
Certain Java versions cannot read credentials cache
Running Hadoop commands fails with the following error even after you obtain a ticket with kinit:
11/01/04 12:08:12 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
Bad connection to FS. command aborted. exception: Call to nn-host/10.0.0.2:8020 failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
Possible cause:
Beginning with release 1.8.1 of MIT Kerberos, the credentials cache format changed in a way that conflicts with Oracle JDK 6 Update 26 and earlier JDKs, rendering Java unable to read credentials caches created by MIT Kerberos 1.8.1 or higher. Kerberos 1.8.1 is the default in Ubuntu Lucid and higher releases and Debian Squeeze and higher releases. On RHEL and CentOS, the default is an older version of MIT Kerberos that does not have this issue.
Workaround: Use the -R (renew) option with kinit after initially obtaining credentials with kinit. This sequence causes the ticket to be renewed and credentials are cached using a format that Java can read. However, the initial ticket must be renewable.
For example:
$ klist
klist: No credentials cache found (ticket cache FILE:/tmp/krb5cc_1000)
$ hadoop fs -ls
11/01/04 13:15:51 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
Bad connection to FS. command aborted. exception: Call to nn-host/10.0.0.2:8020 failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
$ kinit
Password for username@REALM-NAME.EXAMPLE.COM:
$ klist
Ticket cache: FILE:/tmp/krb5cc_1000
Default principal: username@REALM-NAME.EXAMPLE.COM
Valid starting     Expires            Service principal
01/04/11 13:19:31  01/04/11 23:19:31  krbtgt/REALM-NAME.EXAMPLE.COM@REALM-NAME.EXAMPLE.COM
        renew until 01/05/11 13:19:30
$ hadoop fs -ls
11/01/04 13:15:59 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
Bad connection to FS. command aborted. exception: Call to nn-host/10.0.0.2:8020 failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
$ kinit -R
$ hadoop fs -ls
Found 6 items
drwx------   - user user          0 2011-01-02 16:16 /user/user/.staging
If the initial ticket is not renewable, the attempt to renew it fails with the following error:
kinit: Ticket expired while renewing credentials
Resolving Cloudera Manager Service keytab Issues
Every service managed by Cloudera Manager has a keytab file that is provided at startup by the Cloudera Manager Agent. The most recent keytab files can be examined by navigating to the path, /var/run/cloudera-scm-agent/process, with an ls -ltr command.
[root@cehd1 ~]# cd /var/run/cloudera-scm-agent/process/
[root@cehd1 process]# ls -ltr | grep NAMENODE | tail -3
drwxr-x--x 3 hdfs hdfs 4096 Mar  3 23:43 313-hdfs-NAMENODE
drwxr-x--x 3 hdfs hdfs 4096 Mar  4 00:07 326-hdfs-NAMENODE
drwxr-x--x 3 hdfs hdfs 4096 Mar  4 00:07 328-hdfs-NAMENODE-nnRpcWait
[root@cehd1 process]# cd 326-hdfs-NAMENODE
[root@cehd1 326-hdfs-NAMENODE]# ls
cloudera_manager_agent_fencer.py              dfs_hosts_allow.txt         hdfs.keytab                 log4j.properties              topology.py
cloudera_manager_agent_fencer_secret_key.txt  dfs_hosts_exclude.txt       hdfs-site.xml               logs
cloudera-monitor.properties                   event-filter-rules.json     http-auth-signature-secret  navigator.client.properties
core-site.xml                                 hadoop-metrics2.properties  krb5cc_494                  topology.map
If you have root access to the /var/run/cloudera-scm-agent/process path, you can use any service's keytab file to log in as root or a sudo user to verify whether basic Kerberos authentication is working.
[root@host1 326-hdfs-DATANODE]# klist -kt hdfs.keytab
Keytab name: WRFILE:hdfs.keytab
KVNO Timestamp         Principal
---- ----------------- --------------------------------------------------------
   4 02/17/14 19:09:17 HTTP/host1.test.lab@TEST.LAB
   4 02/17/14 19:09:17 HTTP/host1.test.lab@TEST.LAB
   4 02/17/14 19:09:17 HTTP/host1.test.lab@TEST.LAB
   4 02/17/14 19:09:17 HTTP/host1.test.lab@TEST.LAB
   4 02/17/14 19:09:17 HTTP/host1.test.lab@TEST.LAB
   4 02/17/14 19:09:17 HTTP/host1.test.lab@TEST.LAB
   4 02/17/14 19:09:17 hdfs/host1.test.lab@TEST.LAB
   4 02/17/14 19:09:17 hdfs/host1.test.lab@TEST.LAB
   4 02/17/14 19:09:17 hdfs/host1.test.lab@TEST.LAB
   4 02/17/14 19:09:17 hdfs/host1.test.lab@TEST.LAB
   4 02/17/14 19:09:17 hdfs/host1.test.lab@TEST.LAB
   4 02/17/14 19:09:17 hdfs/host1.test.lab@TEST.LAB
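Because a keytab holds one entry per encryption type, the same principal appears several times in the klist -kt listing. The following sketch condenses the listing to the distinct principals; the function name keytab_principals is hypothetical, and it assumes the column layout of klist -kt (KVNO, date, time, principal).

```shell
# keytab_principals: read `klist -kt <keytab>` output on stdin and print each
# distinct principal once, in order of first appearance.
keytab_principals() {
    # Entry lines start with a numeric KVNO; the principal is the last field.
    awk '$1 ~ /^[0-9]+$/ && NF >= 4 && !seen[$NF]++ { print $NF }'
}
```

For example: klist -kt hdfs.keytab | keytab_principals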
Now, attempt to authenticate using the keytab file and a principal within it. In this case, we use the hdfs.keytab file with the hdfs/host1.test.lab@TEST.LAB principal. Then use the klist command without any arguments to see the current user session's credentials.
[root@host1 312-hdfs-DATANODE]# kinit -kt hdfs.keytab hdfs/host1.test.lab@TEST.LAB
[root@host1 312-hdfs-DATANODE]# klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: hdfs/host1.test.lab@TEST.LAB
Valid starting     Expires            Service principal
03/11/14 11:18:34  03/12/14 11:18:34  krbtgt/TEST.LAB@TEST.LAB
        renew until 03/18/14 11:18:34
Note that Kerberos credentials have an expiry date and time. This means, to make sure Kerberos credentials are valid uniformly over a cluster, all hosts and clients within the cluster should be using NTP and must never drift more than 5 minutes apart from each other. Kerberos session tickets have a limited lifespan, but can be renewed (as indicated in the sample krb5.conf and kdc.conf). CDH requires renewable tickets for cluster principals. Check whether renewable tickets have been enabled by using a klist command with the -e (list key encryption types) and -f (list flags set) switches when examining Kerberos sessions and credentials.
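The 5-minute drift limit can be checked by comparing epoch timestamps from two hosts. This is an illustrative sketch only: the function name clock_skew_ok is hypothetical, and how you collect the remote timestamp (for example over ssh) depends on your environment.

```shell
# clock_skew_ok: succeed if two epoch timestamps (in seconds) differ by at
# most the given limit. The default limit of 300 seconds is the 5-minute
# drift limit mentioned above.
clock_skew_ok() {
    local local_ts="$1" remote_ts="$2" limit="${3:-300}"
    local diff=$(( local_ts - remote_ts ))
    if [ "$diff" -lt 0 ]; then
        diff=$(( -diff ))          # absolute value of the difference
    fi
    [ "$diff" -le "$limit" ]
}
```

For example, assuming ssh access to a host named host1.test.lab: clock_skew_ok "$(date +%s)" "$(ssh host1.test.lab date +%s)" || echo "clock drift exceeds 5 minutes"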
Reviewing Service Ticket Credentials in Cross-Realm Deployments
When you examine your cluster configuration, make sure you haven't violated any of the following integration rules:
- When negotiating encryption types, follow the realm with the most specific limitations on supported encryption types.
- All realms should be known to one another through the /etc/krb5.conf file deployed on the cluster.
- When you make configuration decisions for Active Directory environments, you must evaluate the Domain Functional Level or Forest Functional Level that is present.
Kerberos typically negotiates and uses the strongest form of encryption possible between a client and server for authentication into the realm. However, the encryption types for TGTs may sometimes end up being negotiated downward towards the weaker encryption types, which is not desirable. To investigate such issues, check the kvno of the cross-realm trust principal (krbtgt) as described in the following steps. Replace CLUSTER.REALM and AD.REALM (or MIT.REALM) with the appropriate values for your configured realm. This scenario assumes cross-realm authentication with Active Directory.
- Once trust has been configured (see Sample Kerberos Configuration Files), kinit as a system user by authenticating to the AD Kerberos realm.
- From the command line, perform a kvno check of the local and cross-realm krbtgt entry. The local representation of this special REALM service principal is in the form, krbtgt/CLUSTER.REALM@CLUSTER.REALM. The cross-realm principal is named after the trusted realm in the form, krbtgt/AD.REALM.
- Failure of the kvno check indicates incorrect cross-realm trust configuration. Review encryption types again, looking for incompatibilities or unsupported encryption types configured between realms.
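The kvno check in the steps above can be wrapped so that several principals are tested in one pass. The following is a sketch, assuming the MIT kvno utility is installed and that you have already run kinit; the function name kvno_check is hypothetical, and the realm names are the placeholders used in this section.

```shell
# kvno_check: request a service ticket for each principal given as an argument
# and report whether the request succeeded. Returns 1 if any request failed.
kvno_check() {
    local principal failed=0
    for principal in "$@"; do
        if kvno "$principal" > /dev/null 2>&1; then
            echo "OK   $principal"
        else
            echo "FAIL $principal"
            failed=1
        fi
    done
    return "$failed"
}
```

For example: kvno_check krbtgt/CLUSTER.REALM@CLUSTER.REALM. After a successful check, klist -e shows the encryption types of the tickets that were issued.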
Sample Kerberos Configuration Files
This section contains several example Kerberos configuration files.
/etc/krb5.conf
The /etc/krb5.conf file is the configuration a client uses to access a realm through its configured KDC. The krb5.conf maps the realm to the available servers supporting those realms. It also defines the host-specific configuration rules for how tickets are requested and granted.
[logging]
 default = FILE:/var/log/krb5libs.log
 kdc = FILE:/var/log/krb5kdc.log
 admin_server = FILE:/var/log/kadmind.log
[libdefaults]
 default_realm = EXAMPLE.COM
 dns_lookup_realm = false
 dns_lookup_kdc = false
 ticket_lifetime = 24h
 renew_lifetime = 7d
 forwardable = true
 # udp_preference_limit = 1
 # set udp_preference_limit = 1 when TCP only should be
 # used. Consider using in complex network environments when
 # troubleshooting or when dealing with inconsistent
 # client behavior or GSS (63) messages.
 # uncomment the following if AD cross realm auth is ONLY providing DES encrypted tickets
 # allow_weak_crypto = true
[realms]
 AD-REALM.EXAMPLE.COM = {
  kdc = AD1.ad-realm.example.com:88
  kdc = AD2.ad-realm.example.com:88
  admin_server = AD1.ad-realm.example.com:749
  admin_server = AD2.ad-realm.example.com:749
  default_domain = ad-realm.example.com
 }
 EXAMPLE.COM = {
  kdc = kdc1.example.com:88
  admin_server = kdc1.example.com:749
  default_domain = example.com
 }
# The domain_realm is critical for mapping your host domain names to the kerberos realms
# that are servicing them. Make sure the lowercase left hand portion indicates any domains or subdomains
# that will be related to the kerberos REALM on the right hand side of the expression. REALMs will
# always be UPPERCASE. For example, if your actual DNS domain was test.com but your kerberos REALM is
# EXAMPLE.COM then you would have,
[domain_realm]
 test.com = EXAMPLE.COM
 #AD domains and realms are usually the same
 ad-domain.example.com = AD-REALM.EXAMPLE.COM
 ad-realm.example.com = AD-REALM.EXAMPLE.COM
/var/kerberos/krb5kdc/kdc.conf
The kdc.conf file only needs to be configured on the actual cluster-dedicated KDC, and should be located at /var/kerberos/krb5kdc. Only primary and secondary KDCs need access to this configuration file. The contents of this file establish the configuration rules which are enforced for all client hosts in the REALM.
[kdcdefaults]
 kdc_ports = 88
 kdc_tcp_ports = 88
[realms]
 EXAMPLE.COM = {
  #master_key_type = aes256-cts
  max_renewable_life = 7d 0h 0m 0s
  acl_file = /var/kerberos/krb5kdc/kadm5.acl
  dict_file = /usr/share/dict/words
  admin_keytab = /var/kerberos/krb5kdc/kadm5.keytab
  # note that aes256 is ONLY supported in Active Directory in a domain / forest operating at a 2008 or greater functional level.
  # aes256 requires that you download and deploy the JCE Policy files for your JDK release level to provide
  # strong java encryption extension levels like AES256. Make sure to match based on the encryption configured within AD for
  # cross realm auth, note that RC4 = arcfour when comparing windows and linux enctypes
  supported_enctypes = aes256-cts:normal aes128-cts:normal arcfour-hmac:normal
  default_principal_flags = +renewable, +forwardable
 }
[logging]
 kdc = FILE:/var/log/krb5kdc.log
 admin_server = FILE:/var/log/kadmin.log
kadm5.acl
*/admin@HADOOP.COM *
cloudera-scm@HADOOP.COM * flume/*@HADOOP.COM
cloudera-scm@HADOOP.COM * hbase/*@HADOOP.COM
cloudera-scm@HADOOP.COM * hdfs/*@HADOOP.COM
cloudera-scm@HADOOP.COM * hive/*@HADOOP.COM
cloudera-scm@HADOOP.COM * httpfs/*@HADOOP.COM
cloudera-scm@HADOOP.COM * HTTP/*@HADOOP.COM
cloudera-scm@HADOOP.COM * hue/*@HADOOP.COM
cloudera-scm@HADOOP.COM * impala/*@HADOOP.COM
cloudera-scm@HADOOP.COM * mapred/*@HADOOP.COM
cloudera-scm@HADOOP.COM * oozie/*@HADOOP.COM
cloudera-scm@HADOOP.COM * solr/*@HADOOP.COM
cloudera-scm@HADOOP.COM * sqoop/*@HADOOP.COM
cloudera-scm@HADOOP.COM * yarn/*@HADOOP.COM
cloudera-scm@HADOOP.COM * zookeeper/*@HADOOP.COM