Hadoop Authentication with Kerberos for Cloudera Data Science Workbench

Cloudera Data Science Workbench users can authenticate themselves using Kerberos against the cluster KDC defined in the host's /etc/krb5.conf file. Cloudera Data Science Workbench does not assume that your Kerberos principal is always the same as your login information. Therefore, you will need to make sure Cloudera Data Science Workbench knows your Kerberos identity when you sign in.

To authenticate against your cluster’s Kerberos KDC, go to the top-right dropdown menu, click Account settings > Hadoop Authentication, and enter your Kerberos principal. To authenticate, either enter your password or click Upload Keytab to upload the keytab file directly to Cloudera Data Science Workbench. Once successfully authenticated, Cloudera Data Science Workbench uses your stored credentials to ensure that you are secure when running your workloads.

When you authenticate with Kerberos, Cloudera Data Science Workbench will store your keytab in an internal database. When you subsequently run an engine, the keytab is used by a Cloudera Data Science Workbench sidecar container to generate ticket-granting tickets for use by your code. Ticket-granting tickets allow you to access resources such as Spark, Hive, and Impala, on Kerberized CDH clusters.

While you can view your current ticket-granting ticket by typing klist in an engine terminal, there is no way for you or your code to view your keytab. This prevents malicious code and users from stealing your keytab.

UI Behavior for Non-Kerberized Clusters

The contents of the Hadoop Authentication tab change depending on whether the cluster is kerberized. For a secure cluster with Kerberos enabled, the Hadoop Authentication tab displays a Kerberos section with fields to enter your Kerberos principal and username. However, if Cloudera Data Science Workbench cannot detect a krb5.conf file on the host, it will assume the cluster is not kerberized, and the Hadoop Authentication tab will display Hadoop Username Override configuration instead.

For a non-kerberized cluster, by default, your Hadoop username will be set to your Cloudera Data Science Workbench username. To override this default and set an alternative HADOOP_USER_NAME, go to the Hadoop Username Override setting at Account settings > Hadoop Authentication.

If the Hadoop Authentication tab is incorrectly displaying Kerberos configuration fields for a non-kerberized cluster, make sure the krb5.conf file is not present on the host running Cloudera Data Science Workbench. If you do find any instances of krb5.conf on the host, depending on your deployment, perform one of the following sets of actions:

  • On CSD deployments, go to Cloudera Manager and stop the CDSW service. Remove the krb5.conf file(s) from the Cloudera Data Science Workbench gateway host, and then start the CDSW in Cloudera Manager.

    OR

  • On RPM deployments, run cdsw stop, remove the krb5.conf file(s) from the Cloudera Data Science Workbench gateway host, and run cdsw start.

You should now see the expected Hadoop Username Override configuration field.

Limitations