Hadoop Authentication with Kerberos for Cloudera Data Science Workbench

Cloudera Data Science Workbench users can authenticate themselves using Kerberos against the cluster KDC defined in the host's /etc/krb5.conf file. Cloudera Data Science Workbench does not assume that your Kerberos principal is always the same as your login information. Therefore, you will need to make sure Cloudera Data Science Workbench knows your Kerberos identity when you sign in.

To authenticate against your cluster’s Kerberos KDC, open the top-right dropdown menu, click Account settings > Hadoop Authentication, and enter your Kerberos principal and password. Once you are successfully authenticated, Cloudera Data Science Workbench uses your stored keytab to authenticate your workloads against the cluster.

After you authenticate with Kerberos, Cloudera Data Science Workbench will store your keytab. This keytab is then injected into any running engines so that you are automatically authenticated against the CDH cluster when using an engine. To see your Kerberos principal, type klist at the engine terminal. You should now be able to connect to Spark, Hive, and Impala without manually running kinit.
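The klist check described above can also be done from code in a session. The following is a minimal sketch, assuming MIT-style klist output that contains a "Default principal:" line; the function names are hypothetical and are not part of Cloudera Data Science Workbench.

```python
import shutil
import subprocess
from typing import Optional


def parse_default_principal(klist_output: str) -> Optional[str]:
    """Return the principal from the 'Default principal:' line, or None.

    Assumes MIT-style klist output; other Kerberos implementations may
    format this line differently.
    """
    for line in klist_output.splitlines():
        if line.strip().lower().startswith("default principal:"):
            return line.split(":", 1)[1].strip()
    return None


def current_principal() -> Optional[str]:
    """Run klist if it is installed and return the cached principal."""
    if shutil.which("klist") is None:
        return None  # klist is not on the PATH
    result = subprocess.run(["klist"], capture_output=True, text=True)
    if result.returncode != 0:
        return None  # typically means there is no credentials cache
    return parse_default_principal(result.stdout)
```

Calling current_principal() from an engine returns None when no ticket is available, which can serve as a quick check before connecting to Spark, Hive, or Impala.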

UI Behavior for Non-Kerberized Clusters

The contents of the Hadoop Authentication tab change depending on whether the cluster is kerberized. For a secure cluster with Kerberos enabled, the Hadoop Authentication tab displays a Kerberos section with fields to enter your Kerberos principal and password. However, if Cloudera Data Science Workbench cannot detect a krb5.conf file on the host, it assumes the cluster is not kerberized, and the Hadoop Authentication tab displays the Hadoop Username Override configuration instead.

For a non-kerberized cluster, by default, your Hadoop username will be set to your Cloudera Data Science Workbench login username. To override this default and set an alternative HADOOP_USER_NAME, go to the Hadoop Username Override setting at Account settings > Hadoop Authentication.
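The default-and-override behavior can be summarized as a small sketch. This only illustrates the rule stated above; the function name is hypothetical, and treating a blank override as "not set" is an assumption rather than documented behavior.

```python
from typing import Optional


def effective_hadoop_username(login_username: str,
                              override: Optional[str] = None) -> str:
    """Return the Hadoop username for a non-kerberized cluster.

    The override wins when it is set; otherwise the login username is used.
    Assumption: a blank override is treated the same as no override.
    """
    if override is not None and override.strip():
        return override.strip()
    return login_username
```

For example, effective_hadoop_username("jdoe") yields the login username, while effective_hadoop_username("jdoe", "etl_user") yields the override.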

If the Hadoop Authentication tab is incorrectly displaying Kerberos configuration fields for a non-kerberized cluster, make sure the krb5.conf file is not present on the host running Cloudera Data Science Workbench. If you do find any instances of krb5.conf on the host, run cdsw stop, remove the krb5.conf file(s), and run cdsw start. You should now see the expected Hadoop Username Override configuration field.
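To look for stray krb5.conf files before restarting, a short script along these lines may help. It is a sketch only: the function name is hypothetical, and the directories to search (for example /etc) are an assumption that you should adjust for your host.

```python
import os
from typing import Iterable, List


def find_krb5_conf(search_roots: Iterable[str]) -> List[str]:
    """Return the sorted paths of every krb5.conf found under the given roots.

    Roots that do not exist or cannot be read are skipped silently,
    which is the default behavior of os.walk.
    """
    found = []
    for root in search_roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            if "krb5.conf" in filenames:
                found.append(os.path.join(dirpath, "krb5.conf"))
    return sorted(found)
```

Running find_krb5_conf(["/etc"]) on the host lists the files to remove between cdsw stop and cdsw start; an empty list means nothing was found under the searched directories.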

Limitations

  • Cloudera Data Science Workbench only supports Active Directory and MIT KDCs. PowerBroker-equipped Active Directory is not supported.

  • Kerberos principals are case sensitive when stored as keytabs, even though interactive kinit is case-insensitive. Always use your full Kerberos principal, such as username@EDH.COMPANY.COM. If your deployment has a case mismatch between the Kerberos principals and the Linux usernames, you will need further configuration to successfully authenticate to the KDC. See Case Mismatch Between Kerberos Principals and Linux Usernames.

  • Cloudera Data Science Workbench does not support the use of Kerberos plugin modules in krb5.conf.

Case Mismatch Between Kerberos Principals and Linux Usernames

Cloudera Data Science Workbench typically only accepts Kerberos principals whose primary precisely matches the user principal name (UPN) in Active Directory. This is because Kerberos keytabs require principals to have the correct case. For example, if the user's UPN is UPPERCASE@edh.company.com, the principal must be UPPERCASE@EDH.COMPANY.COM.
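The example above can be expressed as a short sketch: the primary keeps its exact case, and the realm is written in upper case. The function names are hypothetical, and the sketch assumes the realm is simply the upper-cased domain part of the UPN, as in the example; your Active Directory deployment may differ.

```python
def principal_from_upn(upn: str) -> str:
    """Build the Kerberos principal expected for a UPN.

    Assumption: the realm is the upper-cased domain part of the UPN.
    The primary (the part before '@') is left exactly as it is.
    """
    primary, sep, domain = upn.rpartition("@")
    if not sep or not primary or not domain:
        raise ValueError(f"not a full UPN: {upn!r}")
    return f"{primary}@{domain.upper()}"


def keytab_principal_matches(entered: str, upn: str) -> bool:
    """Case-sensitive comparison, mirroring how keytabs treat principals."""
    return entered == principal_from_upn(upn)
```

With these definitions, UPPERCASE@EDH.COMPANY.COM matches the UPN UPPERCASE@edh.company.com, while uppercase@EDH.COMPANY.COM does not.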

In contrast, password-based kinit is not case-sensitive and will accept even uppercase@EDH.COMPANY.COM as a valid principal. This difference becomes problematic when the Linux usernames corresponding to upper-cased Kerberos principals are lower-cased; for example, if the Kerberos principal is UPPERCASE@EDH.COMPANY.COM, but the Linux username is uppercase, CDH services will not be able to correctly infer the Linux username from a Kerberos TGT.

Depending on the encryption type, you can use one of the following ways to ensure CDH services are able to infer the Linux username from a Kerberos TGT:
  • The krb5.conf files on the CDH nodes can be modified to include only rc4-hmac or arcfour-hmac in the permitted_enctypes field. These encryption types cause Cloudera Data Science Workbench to be case-insensitive with respect to the principal's primary when used with Active Directory.

    AES encryption types cause the principal to become case-sensitive when used with keytabs but not with passwords. While this is not a bug, some users might be unaware of the actual case of their principal because kinit with password is not case-sensitive.

  • If you want to use AES encryption types that are stronger than rc4-hmac, use auth_to_local to map your Kerberos principal to a differently-cased Linux username. For example, if your Kerberos principals are of the form UPPERCASE@EDH.COMPANY.COM, but the corresponding Linux usernames are lower-cased (uppercase), you can configure an auth_to_local mapping in Cloudera Manager as follows:
    1. Go to the Cloudera Manager Admin console. You can do this by clicking > Cloudera Manager in the upper right hand corner of the Cloudera Data Science Workbench web application.
    2. Go to the service you want to configure.
    3. Click Configuration.
    4. Search for the Advanced Configuration Snippet (Safety Valve) for core-site.xml property and add an auth_to_local rule to map the service's principal name to the Linux username. For example, the following rule maps a principal's primary to its lower-cased version.
      <property>
        <name>hadoop.security.auth_to_local</name>
        <value>
          RULE:[1:$1]/L
          DEFAULT
        </value>
      </property>
    5. Click Save Changes.
    6. Repeat steps 2-5 for each CDH service that requires the mapping.
    7. Deploy client configuration and restart the relevant services. You should now be able to authenticate with your correctly-cased Kerberos principal, and CDH services will be able to associate your uppercase principals with the lowercase Linux usernames.
      You can test the auth_to_local rule on a Cloudera Data Science Workbench host or from a session terminal as follows:
      hadoop org.apache.hadoop.security.HadoopKerberosName <username@EDH.COMPANY.COM>

    For more details on this method, see Configuring the Mapping from Kerberos Principals to Short Names.
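To make the effect of the rule above concrete, the following sketch imitates what RULE:[1:$1]/L followed by DEFAULT does to a principal. It is a simplified illustration, not Hadoop's actual rule parser: the function name and the default realm value are hypothetical, and only these two rules are modeled.

```python
def short_name(principal: str, default_realm: str = "EDH.COMPANY.COM") -> str:
    """Simplified imitation of 'RULE:[1:$1]/L' followed by 'DEFAULT'.

    Expects a full principal of the form primary[/instance]@REALM.
    """
    name, sep, realm = principal.rpartition("@")
    if not sep or not name or not realm:
        raise ValueError(f"expected a full principal, got {principal!r}")
    components = name.split("/")

    # RULE:[1:$1]/L applies to principals with exactly one component:
    # take the first component ($1) and lower-case the result (/L).
    if len(components) == 1:
        return components[0].lower()

    # DEFAULT: keep the first component when the realm is the default realm.
    if realm == default_realm:
        return components[0]

    raise ValueError(f"no rule applied to {principal!r}")
```

Under these assumptions, short_name("UPPERCASE@EDH.COMPANY.COM") gives the lower-cased Linux username, while a two-component service principal such as hdfs/host1.edh.company.com@EDH.COMPANY.COM falls through to DEFAULT. Use the HadoopKerberosName command shown in step 7 to confirm the behavior of the real rule on your cluster.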