Overview of Cloudera Manager authentication.
Authentication is a basic security requirement for any computing environment. In simple
terms, users and services must prove their identity (authenticate) to the system before they
can use system features to the degree they are authorized. Authentication and authorization
work hand in hand to protect system resources. Authorization is handled in several ways,
from access control lists (ACLs), to HDFS extended ACLs, to role-based access control (RBAC)
using Apache Ranger.
Several different mechanisms work together to authenticate users and services in a cluster.
These vary depending on the services configured on the cluster. Most CDP components, including
Apache Hive, Hue, and Apache Impala, can use Kerberos for authentication. Both the MIT and
Microsoft Active Directory Kerberos implementations can be integrated with Cloudera
clusters.
In addition, Kerberos credentials can be stored and managed in an
LDAP-compliant identity service such as OpenLDAP or Microsoft Active
Directory (a core component of Windows Server).
This section provides a brief overview, with a special focus on the
deployment models available when using Microsoft Active Directory for
Kerberos authentication or when integrating MIT Kerberos with Microsoft
Active Directory.
Cloudera does not provide a Kerberos implementation. Cloudera clusters
can be configured to use an existing Kerberos Key Distribution Center
(KDC) for authentication, either MIT Kerberos or Microsoft Active
Directory Kerberos. The Kerberos instance must be set up and
operational before you can configure the cluster to use it.
Gathering all the configuration details about the KDC (or having the
Kerberos administrator available to help during the setup process) is an
important preliminary task for integrating the cluster with Kerberos,
regardless of the deployment model.
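One simple preliminary check is confirming that the KDC host answers on the network at all. The following Python sketch is a hypothetical helper, not part of Cloudera Manager: `KdcDetails` and `kdc_reachable` are illustrative names, and the fields shown are only a few of the details worth collecting from the Kerberos administrator. It assumes the KDC accepts TCP connections on the standard Kerberos port 88. A successful connection shows only that something is listening there, not that the KDC is correctly configured, and a KDC that serves only UDP would report False.

```python
import socket
from dataclasses import dataclass


@dataclass
class KdcDetails:
    # Hypothetical checklist of KDC details gathered before cluster setup.
    realm: str              # e.g. "EXAMPLE.COM"
    kdc_host: str           # e.g. "kdc1.example.com"
    kdc_port: int = 88      # standard Kerberos KDC port
    admin_server: str = ""  # admin (kadmin) host, if different from the KDC


def kdc_reachable(details: KdcDetails, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the KDC host and port can be opened."""
    try:
        with socket.create_connection((details.kdc_host, details.kdc_port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


if __name__ == "__main__":
    details = KdcDetails(realm="EXAMPLE.COM", kdc_host="kdc1.example.com")
    print(f"KDC {details.kdc_host}:{details.kdc_port} reachable: {kdc_reachable(details)}")
```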
Kerberos Overview
In simple terms, Kerberos is an authentication
protocol that relies on cryptographic mechanisms to handle interactions
between a requesting client and a server, greatly reducing the risk of
impersonation. Passwords are neither stored locally nor sent over the
network in the clear. Instead, the password users enter when logging in
to their systems is used locally to derive a secret key. That key is then
used in a subsequent exchange with a trusted third party, which grants the
user a ticket (with a limited lifetime) for authenticating to requested
services. After the client and server processes prove their respective
identities to each other, communications can be encrypted to ensure
privacy and data integrity.
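To make the "password never crosses the network" idea concrete, here is a toy Python model of that first exchange under a drastically simplified protocol. The client turns the password into a key locally; the KDC, which stores only that derived key, returns a session key sealed under it, and only a client holding the right password can open the reply. Everything here is hypothetical: the realm, the principal, `derive_key`, `seal`/`unseal`, and the HMAC-based stream cipher are stand-ins chosen because they exist in the Python standard library. They are not the encryption types or message formats real Kerberos uses, and the cipher is a sketch, not a usable one.

```python
import hashlib
import hmac
import os

REALM = "EXAMPLE.COM"         # hypothetical realm
PRINCIPAL = "alice"           # hypothetical user principal
KRBTGT_KEY = os.urandom(32)   # stands in for the KDC's own secret key


def derive_key(password: str, principal: str, realm: str) -> bytes:
    # Toy string-to-key: PBKDF2-HMAC-SHA256 salted with realm + principal.
    # Real Kerberos encryption types define their own string-to-key functions.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), (realm + principal).encode(), 4096)


def _subkey(key: bytes, label: bytes) -> bytes:
    # Separate encryption and MAC keys derived from one base key.
    return hmac.new(key, label, hashlib.sha256).digest()


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy counter-mode keystream built from HMAC-SHA256 (illustration only).
    blocks = [
        hmac.new(key, nonce + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        for counter in range((length + 31) // 32)
    ]
    return b"".join(blocks)[:length]


def seal(key: bytes, plaintext: bytes) -> bytes:
    # Encrypt-then-MAC; output layout is nonce || ciphertext || tag.
    nonce = os.urandom(16)
    stream = _keystream(_subkey(key, b"enc"), nonce, len(plaintext))
    ct = bytes(p ^ s for p, s in zip(plaintext, stream))
    tag = hmac.new(_subkey(key, b"mac"), nonce + ct, hashlib.sha256).digest()
    return nonce + ct + tag


def unseal(key: bytes, blob: bytes) -> bytes:
    # Verify the tag first; a wrong key fails here and releases no plaintext.
    nonce, ct, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(_subkey(key, b"mac"), nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("integrity check failed (wrong key?)")
    stream = _keystream(_subkey(key, b"enc"), nonce, len(ct))
    return bytes(c ^ s for c, s in zip(ct, stream))


# The KDC database holds the derived key, never the password itself.
KDC_DB = {PRINCIPAL: derive_key("correct horse", PRINCIPAL, REALM)}


def kdc_as_reply(principal: str):
    # Fresh session key, sealed once for the client and once inside the TGT.
    session_key = os.urandom(32)
    reply = seal(KDC_DB[principal], session_key)
    tgt = seal(KRBTGT_KEY, principal.encode() + b"|" + session_key)
    return reply, tgt


if __name__ == "__main__":
    reply, tgt = kdc_as_reply(PRINCIPAL)
    # Client side: the typed password becomes a key locally; only the sealed
    # reply crosses the network.
    session_key = unseal(derive_key("correct horse", PRINCIPAL, REALM), reply)
    print(session_key == unseal(KRBTGT_KEY, tgt).split(b"|", 1)[1])  # True
    try:
        unseal(derive_key("wrong guess", PRINCIPAL, REALM), reply)
    except ValueError as err:
        print("rejected:", err)
```

A wrong password yields a different key, so the integrity check fails before anything is decrypted, and the password itself never appears in any message.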
The trusted third party is the Kerberos Key Distribution Center (KDC),
the focal point for Kerberos operations, which provides both the
Authentication Service (AS) and the Ticket Granting Service (TGS) for the
system. Briefly, the TGS issues a ticket to the requesting user or
service; that ticket is then presented to the requested service as proof
of the user's (or service's) identity for the lifetime of the ticket (10
hours by default). Kerberos has many nuances, including defining the
principals that identify users and services, ticket renewal, and
delegation token handling, to name a few. See Keytab principals.
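The ticket lifetime and principal naming mentioned above can be sketched in Python as follows. The model is illustrative only: `Ticket`, `issue_ticket`, and `parse_principal` are hypothetical names, the 10-hour default comes from the text, and the `[start, end)` validity window is a simplification (real KDCs also account for clock skew and renewal). Principal strings follow the usual Kerberos `primary[/instance]@REALM` form; the example realm and hosts are made up.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

# Default ticket lifetime as stated in the text; real deployments set this
# in their Kerberos configuration.
DEFAULT_LIFETIME = timedelta(hours=10)


def parse_principal(name: str) -> Tuple[str, Optional[str], str]:
    """Split a principal of the form primary[/instance]@REALM."""
    if "@" not in name:
        raise ValueError(f"principal has no realm: {name!r}")
    base, _, realm = name.rpartition("@")
    primary, _, instance = base.partition("/")
    return primary, (instance or None), realm


@dataclass(frozen=True)
class Ticket:
    client: str    # user principal, e.g. "alice@EXAMPLE.COM"
    service: str   # service principal, e.g. "hive/host1.example.com@EXAMPLE.COM"
    start: datetime
    end: datetime

    def is_valid(self, now: datetime) -> bool:
        # In this sketch a ticket proves identity only inside [start, end).
        return self.start <= now < self.end


def issue_ticket(client: str, service: str, now: datetime,
                 lifetime: timedelta = DEFAULT_LIFETIME) -> Ticket:
    # Hypothetical TGS helper: check principal syntax, then stamp the lifetime.
    parse_principal(client)
    parse_principal(service)
    return Ticket(client, service, now, now + lifetime)


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 8, 0, tzinfo=timezone.utc)
    t = issue_ticket("alice@EXAMPLE.COM", "hive/host1.example.com@EXAMPLE.COM", t0)
    print(t.is_valid(t0 + timedelta(hours=9)))   # True
    print(t.is_valid(t0 + timedelta(hours=10)))  # False: the ticket has expired
```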
Furthermore, these processes are, for the most part, completely
transparent. For example, business users of the cluster simply enter
their password when they log in, and the ticket handling, encryption,
and other details take place automatically, behind the scenes.
Additionally, thanks to the tickets and other mechanisms at work in the
Kerberos infrastructure, users are authenticated not just to a single
target service but to the network as a whole.
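The single sign-on behavior described above can be illustrated with a toy Python model: the password is used once to obtain a ticket-granting ticket (TGT), and that TGT is then exchanged for tickets to several services without prompting again. All names here (`ToyKDC`, `Client`, `service_accepts`, the service principals) are hypothetical, and tickets are only MAC-protected tuples. Real Kerberos encrypts tickets, uses session keys and authenticators, and protects against replay, none of which this sketch attempts.

```python
import hashlib
import hmac
import os


def _tag(key: bytes, data: bytes) -> bytes:
    return hmac.new(key, data, hashlib.sha256).digest()


def user_key(realm: str, principal: str, password: str) -> bytes:
    # Toy password-to-key step (plain SHA-256; not a real Kerberos string-to-key).
    return hashlib.sha256((realm + principal + password).encode()).digest()


class ToyKDC:
    """Heavily simplified KDC: tickets are MAC-protected tuples, not encrypted."""

    def __init__(self, realm: str):
        self.realm = realm
        self.tgs_key = os.urandom(32)  # stands in for the krbtgt key
        self.user_keys = {}            # user principal -> password-derived key
        self.service_keys = {}         # service principal -> long-term key

    def add_user(self, principal: str, password: str) -> None:
        self.user_keys[principal] = user_key(self.realm, principal, password)

    def add_service(self, service: str) -> bytes:
        key = os.urandom(32)
        self.service_keys[service] = key
        return key  # a real service would hold this in a keytab

    def as_exchange(self, principal: str, proof: bytes):
        # Authentication Service: check the client's proof of its key (no replay
        # protection in this sketch), then issue a ticket-granting ticket.
        expected = _tag(self.user_keys[principal], b"preauth:" + principal.encode())
        if not hmac.compare_digest(proof, expected):
            raise PermissionError("pre-authentication failed")
        data = principal.encode()
        return data, _tag(self.tgs_key, data)

    def tgs_exchange(self, tgt, service: str):
        # Ticket Granting Service: a valid TGT buys a ticket for a known service.
        data, tag = tgt
        if not hmac.compare_digest(tag, _tag(self.tgs_key, data)):
            raise PermissionError("invalid TGT")
        ticket = data + b" -> " + service.encode()
        return ticket, _tag(self.service_keys[service], ticket)


def service_accepts(service_key: bytes, ticket) -> bool:
    # Each service checks the ticket with its own long-term key.
    data, tag = ticket
    return hmac.compare_digest(tag, _tag(service_key, data))


class Client:
    def __init__(self, principal: str, realm: str):
        self.principal, self.realm = principal, realm
        self.password_prompts = 0
        self.tgt = None

    def login(self, kdc: ToyKDC, password: str) -> None:
        # The only step that uses the password; the derived key never leaves here.
        self.password_prompts += 1
        key = user_key(self.realm, self.principal, password)
        proof = _tag(key, b"preauth:" + self.principal.encode())
        self.tgt = kdc.as_exchange(self.principal, proof)

    def ticket_for(self, kdc: ToyKDC, service: str):
        # Reuses the cached TGT; no password needed.
        return kdc.tgs_exchange(self.tgt, service)


if __name__ == "__main__":
    kdc = ToyKDC("EXAMPLE.COM")
    kdc.add_user("alice", "s3cret")
    hive_key = kdc.add_service("hive/host1.example.com")
    impala_key = kdc.add_service("impala/host2.example.com")
    alice = Client("alice", "EXAMPLE.COM")
    alice.login(kdc, "s3cret")
    print(service_accepts(hive_key, alice.ticket_for(kdc, "hive/host1.example.com")))      # True
    print(service_accepts(impala_key, alice.ticket_for(kdc, "impala/host2.example.com")))  # True
    print(alice.password_prompts)  # 1
```

Because each service verifies tickets with its own key, a ticket issued for the Hive service is rejected by the Impala service, while `password_prompts` stays at 1 however many services the client contacts.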