Establishing identity with strong authentication is the basis for secure access in Hadoop. Users need to be able to reliably identify themselves and then have that identity propagated throughout the Hadoop cluster. Once this is done, those users can access resources (such as files or directories) or interact with the cluster (for example, by executing MapReduce jobs). Hadoop cluster resources themselves (such as hosts and services) also need to authenticate with each other, so that malicious systems cannot pose as part of the cluster to gain access to data.
To secure communication among its various components, Hadoop uses Kerberos. Kerberos is a third-party authentication mechanism, in which users and the services that they wish to access rely on a third party, the Kerberos server, to authenticate each to the other. The Kerberos server itself is known as the Key Distribution Center, or KDC. At a high level, it has three parts:
- A database of the users and services (known as principals) that it knows about and their respective Kerberos passwords.
- An Authentication Server (AS) that performs the initial authentication and issues a Ticket Granting Ticket (TGT).
- A Ticket Granting Server (TGS) that issues subsequent service tickets based on the initial TGT.
A user principal requests authentication from the AS, and the AS, in turn, returns a TGT together with a session key. (The session key is encrypted with a key derived from the user principal's Kerberos password, which is known only to the user principal and the KDC. The TGT itself is encrypted with a key known only to the KDC.)
The user principal decrypts the session key locally using its Kerberos password, and from that point forward, until the TGT expires, the user principal can use the TGT and the session key to get service tickets from the TGS.
The service tickets allow the principal to access various services. A service principal, which cannot type a password interactively, instead authenticates using a special file, called a keytab, that contains its authentication credentials.
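
The exchange described above can be sketched as a small, self-contained simulation. This is a toy model only, not the Kerberos wire protocol: the `ToyKDC` class, the `seal`/`unseal` helpers, and the principal names are all hypothetical, the "encryption" is an improvised XOR-plus-HMAC scheme rather than a real Kerberos encryption type, and authenticators, timestamps, and replay protection are left out. In this sketch the TGT is sealed with a key only the KDC holds, while the reply that accompanies it is sealed with the user's password-derived key.

```python
import hashlib
import hmac
import json
import os
import time


def derive_key(password: str, salt: str) -> bytes:
    """Turn a password into a long-term key (toy stand-in for string-to-key)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt.encode(), 10_000)


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    blocks = (hashlib.sha256(key + nonce + i.to_bytes(4, "big")).digest()
              for i in range(length // 32 + 1))
    return b"".join(blocks)[:length]


def seal(key: bytes, message: dict) -> bytes:
    """Toy sealing: nonce + HMAC tag + XOR-ed JSON. Not real Kerberos crypto."""
    plaintext = json.dumps(message).encode()
    nonce = os.urandom(16)
    cipher = bytes(a ^ b for a, b in zip(plaintext, _keystream(key, nonce, len(plaintext))))
    tag = hmac.new(key, nonce + cipher, hashlib.sha256).digest()
    return nonce + tag + cipher


def unseal(key: bytes, blob: bytes) -> dict:
    """Reverse seal(); raises ValueError if the key is wrong or the blob was altered."""
    nonce, tag, cipher = blob[:16], blob[16:48], blob[48:]
    expected = hmac.new(key, nonce + cipher, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong key or tampered message")
    plaintext = bytes(a ^ b for a, b in zip(cipher, _keystream(key, nonce, len(cipher))))
    return json.loads(plaintext)


class ToyKDC:
    """Hypothetical KDC with its three parts: principal database, AS, and TGS."""

    def __init__(self, realm: str):
        self.realm = realm
        self.database = {}             # principal name -> long-term key
        self.tgs_key = os.urandom(32)  # known only to the KDC

    def add_user(self, principal: str, password: str) -> None:
        self.database[principal] = derive_key(password, principal)

    def add_service(self, principal: str) -> dict:
        key = os.urandom(32)
        self.database[principal] = key
        return {principal: key}        # stands in for the service's keytab

    def authenticate(self, client: str):
        """AS step: return (reply sealed with the user's key, TGT sealed with the TGS key)."""
        session_key = os.urandom(32).hex()
        tgt = seal(self.tgs_key, {"client": client, "session_key": session_key,
                                  "expires": time.time() + 600})
        return seal(self.database[client], {"session_key": session_key}), tgt

    def grant_service_ticket(self, tgt: bytes, service: str):
        """TGS step: validate the TGT, then issue a ticket sealed with the service's key."""
        claims = unseal(self.tgs_key, tgt)
        if claims["expires"] < time.time():
            raise ValueError("TGT expired")
        session_key = os.urandom(32).hex()
        ticket = seal(self.database[service], {"client": claims["client"],
                                               "session_key": session_key})
        reply = seal(bytes.fromhex(claims["session_key"]), {"session_key": session_key})
        return reply, ticket


# Hypothetical principals in a hypothetical realm.
USER = "alice@EXAMPLE.COM"
SERVICE = "hdfs/nn1.example.com@EXAMPLE.COM"

kdc = ToyKDC("EXAMPLE.COM")
kdc.add_user(USER, "alice-password")
keytab = kdc.add_service(SERVICE)

# 1. The user asks the AS for a TGT and opens the reply with her password.
as_reply, tgt = kdc.authenticate(USER)
tgt_session = unseal(derive_key("alice-password", USER), as_reply)["session_key"]

# 2. The user presents the TGT to the TGS to obtain a service ticket.
tgs_reply, service_ticket = kdc.grant_service_ticket(tgt, SERVICE)
service_session = unseal(bytes.fromhex(tgt_session), tgs_reply)["session_key"]

# 3. The service opens the ticket with the key from its keytab; no password is typed.
claims = unseal(keytab[SERVICE], service_ticket)
print(claims["client"])  # alice@EXAMPLE.COM
```

The user's password never leaves the client, and the service never contacts the KDC to validate the ticket: being able to open it with the keytab key is what shows the KDC issued it. On a real cluster these steps are carried out by the Kerberos client libraries and tools rather than by application code.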
The set of hosts, users, and services over which the Kerberos server has control is called a realm.
Table 15.1. Kerberos terminology
Term | Description |
---|---|
Key Distribution Center, or KDC | The trusted source for authentication in a Kerberos-enabled environment. |
Kerberos KDC Server | The machine, or server, that serves as the Key Distribution Center. |
Kerberos Client | Any machine in the cluster that authenticates against the KDC. |
Principal | The unique name of a user or service that authenticates against the KDC. |
Keytab | A file that includes one or more principals and their keys. |
Realm | The Kerberos network that includes a KDC and a number of Clients. |
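
To make the Principal and Realm terms concrete: Kerberos principal names conventionally take the form `primary/instance@REALM`, where the instance part is optional for users and is typically a host name for services. The following minimal sketch splits such a name into its parts. The `parse_principal` helper and the example names are hypothetical, and the sketch ignores the escaping rules that real Kerberos libraries apply to special characters.

```python
from typing import NamedTuple, Optional


class Principal(NamedTuple):
    primary: str             # user name or service name
    instance: Optional[str]  # often the host name for a service principal
    realm: str


def parse_principal(name: str) -> Principal:
    """Split 'primary[/instance]@REALM' into its components."""
    local, separator, realm = name.rpartition("@")
    if not separator or not local or not realm:
        raise ValueError(f"not a fully qualified principal: {name!r}")
    primary, _, instance = local.partition("/")
    if not primary:
        raise ValueError(f"principal has no primary component: {name!r}")
    return Principal(primary, instance or None, realm)


# Hypothetical names used only for illustration.
service = parse_principal("nn/host1.example.com@EXAMPLE.COM")
user = parse_principal("alice@EXAMPLE.COM")
print(service.primary, service.instance, service.realm)
print(user.primary, user.instance, user.realm)
```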