Customizing Kerberos Principals and System Users

How to configure custom service principals in Cloudera Manager.

Custom Kerberos principals are a non-default cluster configuration option for customers who have specific security or identity management requirements. For example, a customer may have a pre-existing identity management (LDAP/AD/IPA) configuration whose identities are incompatible with Hadoop or Linux username syntax restrictions, or are otherwise undesirable as the identifier of an end user within the Hadoop environment.

In this case, the Kerberos principal differs from the username understood by the CDP platform and the operating system, and the two are related through a set of mapping rules, currently implemented by the ‘auth_to_local’ feature of Apache Hadoop. This typically occurs when customers have long-standing corporate naming conventions and identity management policies that cannot be altered for their Cloudera deployment.
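
As an illustration of the kind of mapping described above, the following sketch emulates the effect of a single auth_to_local rule with sed. It is not how Hadoop evaluates rules. The realm EXAMPLE.COM, the principal corp_hdfs_svc, the short name hdfs, and the helper map_principal are all hypothetical names chosen for this example.

```shell
#!/bin/sh
# Illustrative only: emulate the effect of an auth_to_local rule with sed.
# A hypothetical rule in hadoop.security.auth_to_local syntax could be:
#   RULE:[2:$1@$0](corp_hdfs_svc@EXAMPLE\.COM)s/.*/hdfs/
# The realm, principal, and short name below are assumptions for this sketch.

map_principal() {
  # Step 1: strip the optional /host component and the realm.
  # Step 2: map the custom service name to the local system user.
  # Any other name in this realm passes through with only the realm removed.
  printf '%s\n' "$1" |
    sed -E -e 's#(/[^@]*)?@EXAMPLE\.COM$##' \
           -e 's/^corp_hdfs_svc$/hdfs/'
}

map_principal 'corp_hdfs_svc/node1.example.com@EXAMPLE.COM'
```

Here the custom service principal resolves to the short name hdfs, while a principal such as alice@EXAMPLE.COM would only have its realm removed.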

To accommodate this, custom Kerberos principals are often used in conjunction with custom service accounts at the local (OS) and Hadoop levels. These are known to Cloudera Manager as “System users”.

Preparation of System Users/Groups

Cluster administrators must ensure that all required custom system users have been created on the hosts that will join the cluster. Cloudera Manager creates only the default system users. It is highly recommended that IT/Ops create the system users with standard tools that help manage a large number of hosts, e.g. Ansible. The procedure below is a manual alternative to using such tools.
  1. Ensure the CM Agent is installed on the hosts.
  2. Select a host and SSH into it.
  3. Copy service_list.csv and group_list.csv to a directory of your choice, e.g. /tmp
  4. Edit your copied service_list.csv file to include the chosen custom names for system users and/or the respective groups for each Cloudera service component.
  5. For each one of the hosts managed by CM:
    1. Copy the modified service_list.csv and group_list.csv to a directory on the host, e.g. /tmp
    2. Log into the host as root or a sudo-capable user account.
    3. Change to the inituids directory: $ cd /opt/cloudera/cm-agent/service/inituids
    4. Assuming the files are in /tmp, run this command: $ ./set-service-uids.py -s /tmp/service_list.csv -g /tmp/group_list.csv
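
The per-host steps above could be scripted along the following lines. This is a minimal sketch under several assumptions: hosts.txt is a hypothetical file listing one hostname per line, root SSH access to each host is available, and the edited CSV files are in /tmp on the machine running the script. By default it only prints the commands (DRY_RUN=1), and the helpers run and apply_to_host are names invented for this example.

```shell
#!/bin/sh
# Sketch: apply the custom user/group lists to several hosts.
# Assumptions: hosts.txt (hypothetical) lists one hostname per line, root SSH
# access is available, and both CSV files are in /tmp on the local machine.
# DRY_RUN=1 (the default) prints the commands instead of executing them.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

apply_to_host() {
  host="$1"
  # Copy the edited lists to /tmp on the target host.
  run scp /tmp/service_list.csv /tmp/group_list.csv "root@${host}:/tmp/"
  # Run the script from its directory; -n keeps ssh from reading the
  # host list on stdin when called inside the loop below.
  run ssh -n "root@${host}" "cd /opt/cloudera/cm-agent/service/inituids && ./set-service-uids.py -s /tmp/service_list.csv -g /tmp/group_list.csv"
}

if [ -f hosts.txt ]; then
  while read -r host; do
    if [ -n "$host" ]; then
      apply_to_host "$host"
    fi
  done < hosts.txt
fi
```

Reviewing the printed commands before setting DRY_RUN=0 gives a chance to confirm the host list and file paths. For larger clusters, a configuration management tool remains the recommended approach.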