Authentication
The purpose of authentication in Hadoop, as in other systems, is simply to prove that a user or service is who it claims to be.
Typically, authentication in enterprises is managed through a single distributed system, such as a Lightweight Directory Access Protocol (LDAP) directory. LDAP authentication consists of straightforward username/password services backed by a variety of storage systems, ranging from flat files to databases.
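To make the username/password model concrete, here is a minimal sketch of an LDAP simple bind using Java's standard JNDI API. The server URL, bind DN, and password are hypothetical placeholders; a production client would connect over LDAPS rather than plaintext LDAP.

```java
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingException;
import javax.naming.directory.InitialDirContext;

public class LdapBindCheck {
    public static void main(String[] args) throws NamingException {
        // Hypothetical server and DN; substitute your directory's values.
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, "ldap://ldap.example.com:389");
        env.put(Context.SECURITY_AUTHENTICATION, "simple");  // username/password bind
        env.put(Context.SECURITY_PRINCIPAL, "uid=jdoe,ou=people,dc=example,dc=com");
        env.put(Context.SECURITY_CREDENTIALS, "secret");

        // A successful bind proves the credentials are valid; a rejected
        // bind throws javax.naming.AuthenticationException.
        new InitialDirContext(env).close();
        System.out.println("LDAP simple bind succeeded");
    }
}
```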
A common enterprise-grade authentication system is Kerberos. Kerberos provides strong security benefits including capabilities that render intercepted authentication packets unusable by an attacker. It virtually eliminates the threat of impersonation by never sending a user's credentials in cleartext over the network.
Several components of the Hadoop ecosystem are converging on Kerberos authentication, with the option to manage and store credentials in LDAP or Active Directory (AD). Microsoft's Active Directory, for example, is an LDAP directory that also provides Kerberos authentication for added security.
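In Hadoop code, Kerberos logins are typically performed through the UserGroupInformation API (covered in detail under Authenticating Kerberos Principals in Java Code, below). The following is a minimal sketch; the principal name and keytab path are hypothetical placeholders.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

public class KerberosLogin {
    public static void main(String[] args) throws Exception {
        // Tell the Hadoop client libraries that the cluster requires Kerberos.
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);

        // Authenticate from a keytab: the key material proves identity to the
        // KDC, so no password crosses the network in cleartext.
        // Principal and keytab path below are placeholders.
        UserGroupInformation.loginUserFromKeytab(
                "jdoe@EXAMPLE.COM", "/etc/security/keytabs/jdoe.keytab");

        System.out.println("Logged in as: "
                + UserGroupInformation.getCurrentUser().getUserName());
    }
}
```

Keytab-based login also suits long-running services, which cannot type a password interactively and must reauthenticate when tickets expire.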
Continue reading:
- Configuring Authentication in Cloudera Manager
- Why Use Cloudera Manager to Implement Kerberos Authentication?
- Ways to Configure Kerberos Authentication Using Cloudera Manager
- Cloudera Manager User Accounts
- Configuring External Authentication for Cloudera Manager
- Kerberos Concepts - Principals, Keytabs and Delegation Tokens
- Enabling Kerberos Authentication Using the Wizard
- Considerations When Using an Active Directory KDC
- Step 1: Install Cloudera Manager and CDH
- Step 2: If You are Using AES-256 Encryption, Install the JCE Policy File
- Step 3: Get or Create a Kerberos Principal for the Cloudera Manager Server
- Step 4: Enabling Kerberos Using the Wizard
- Step 5: Create the HDFS Superuser
- Step 6: Get or Create a Kerberos Principal for Each User Account
- Step 7: Prepare the Cluster for Each User
- Step 8: Verify that Kerberos Security is Working
- Step 9: (Optional) Enable Authentication for HTTP Web Consoles for Hadoop Roles
- Enabling Kerberos Authentication for Single User Mode or Non-Default Users
- Configuring a Cluster with Custom Kerberos Principals
- Viewing and Regenerating Kerberos Principals
- Mapping Kerberos Principals to Short Names
- Using Auth-to-Local Rules to Isolate Cluster Users
- Configuring Kerberos for Flume Thrift Source and Sink
- Configuring YARN for Long-running Applications
- Enabling Kerberos Authentication Without the Wizard
- Step 1: Install Cloudera Manager and CDH
- Step 2: If You are Using AES-256 Encryption, Install the JCE Policy File
- Step 3: Get or Create a Kerberos Principal for the Cloudera Manager Server
- Step 4: Import KDC Account Manager Credentials
- Step 5: Configure the Kerberos Default Realm in the Cloudera Manager Admin Console
- Step 6: Stop All Services
- Step 7: Enable Hadoop Security
- Step 8: Wait for the Generate Credentials Command to Finish
- Step 9: Enable Hue to Work with Hadoop Security using Cloudera Manager
- Step 10: (Flume Only) Use Substitution Variables for the Kerberos Principal and Keytab
- Step 11: (CDH 4.0 and 4.1 only) Configure Hue to Use a Local Hive Metastore
- Step 12: Start All Services
- Step 13: Deploy Client Configurations
- Step 14: Create the HDFS Superuser Principal
- Step 15: Get or Create a Kerberos Principal for Each User Account
- Step 16: Prepare the Cluster for Each User
- Step 17: Verify that Kerberos Security is Working
- Step 18: (Optional) Enable Authentication for HTTP Web Consoles for Hadoop Roles
- Configuring Authentication in the Cloudera Navigator Data Management Component
- Configuring External Authentication for the Cloudera Navigator Data Management Component
- Managing Users and Groups for the Cloudera Navigator Data Management Component
- Configuring Authentication in CDH Using the Command Line
- Enabling Kerberos Authentication for Hadoop Using the Command Line
- Step 1: Install CDH 5
- Step 2: Verify User Accounts and Groups in CDH 5 Due to Security
- Step 3: If You are Using AES-256 Encryption, Install the JCE Policy File
- Step 4: Create and Deploy the Kerberos Principals and Keytab Files
- Step 5: Shut Down the Cluster
- Step 6: Enable Hadoop Security
- Step 7: Configure Secure HDFS
- Optional Step 8: Configuring Security for HDFS High Availability
- Optional Step 9: Configuring Secure WebHDFS
- Optional Step 10: Configuring a Secure HDFS NFS Gateway
- Step 11: Set Variables for Secure DataNodes
- Step 12: Start up the NameNode
- Step 13: Start up a DataNode
- Step 14: Set the Sticky Bit on HDFS Directories
- Step 15: Start up the Secondary NameNode (if used)
- Step 16: Configure Either MRv1 Security or YARN Security
- Flume Authentication
- HBase Authentication
- Configuring Kerberos Authentication for HBase
- Configuring Kerberos Authentication for HBase Using Cloudera Manager
- Configuring Kerberos Authentication for HBase Using the Command Line
- Configure HBase Servers to Authenticate with a Secure HDFS Cluster Using the Command Line
- Configure HBase Servers and Clients to Authenticate with a Secure ZooKeeper
- Configure HBase JVMs (all Masters, Region Servers, and clients) to use JAAS
- Configure the HBase Servers (Masters and Region Servers) to use Authentication to connect to ZooKeeper
- Configure Authentication for the HBase REST and Thrift Gateways
- Configure doAs Impersonation for the HBase Thrift Gateway
- Start HBase
- Configuring Secure HBase Replication
- Configuring the HBase Client TGT Renewal Period
- HCatalog Authentication
- Hive Authentication
- HiveServer2 Security Configuration
- Enabling Kerberos Authentication for HiveServer2
- Using LDAP Username/Password Authentication with HiveServer2
- Configuring LDAPS Authentication with HiveServer2
- Pluggable Authentication
- Trusted Delegation with HiveServer2
- HiveServer2 Impersonation
- Securing the Hive Metastore
- Disabling the Hive Security Configuration
- Hive Metastore Server Security Configuration
- Using Hive to Run Queries on a Secure HBase Server
- HttpFS Authentication
- Hue Authentication
- Impala Authentication
- Enabling Kerberos Authentication for Impala
- Enabling LDAP Authentication for Impala
- Using Multiple Authentication Methods with Impala
- Configuring Impala Delegation for Hue and BI Tools
- Llama Authentication
- Oozie Authentication
- Search Authentication
- Spark Authentication
- Sqoop 2 Authentication
- ZooKeeper Authentication
- FUSE Kerberos Configuration
- Using kadmin to Create Kerberos Keytab Files
- Configuring the Mapping from Kerberos Principals to Short Names
- Enabling Debugging Output for the Sun Kerberos Classes
- Enabling Kerberos Authentication for Hadoop Using the Command Line
- Configuring a Cluster-dedicated MIT KDC with Cross-Realm Trust
- Integrating Hadoop Security with Active Directory
- Integrating Hadoop Security with Alternate Authentication
- Hadoop Users in Cloudera Manager and CDH
- Authenticating Kerberos Principals in Java Code
- Using a Web Browser to Access a URL Protected by Kerberos HTTP SPNEGO
- Troubleshooting Authentication Issues
- Sample Kerberos Configuration files: krb5.conf, kdc.conf, kadm5.acl
- Potential Security Problems and Their Solutions
- Issues with Generate Credentials
- Running any Hadoop command fails after enabling security.
- Java is unable to read the Kerberos credentials cache created by versions of MIT Kerberos 1.8.1 or higher.
- java.io.IOException: Incorrect permission
- A cluster fails to run jobs after security is enabled.
- The NameNode does not start and KrbException Messages (906) and (31) are displayed.
- The NameNode starts but clients cannot connect to it and the error message contains enctype code 18.
- (MRv1 Only) Jobs won't run and TaskTracker is unable to create a local mapred directory.
- (MRv1 Only) Jobs will not run and TaskTracker is unable to create a Hadoop logs directory.
- After you enable cross-realm trust, you can run Hadoop commands in the local realm but not in the remote realm.
- (MRv1 Only) Jobs won't run and cannot access files in mapred.local.dir
- Users are unable to obtain credentials when running Hadoop jobs or commands.
- "Request is a replay" exceptions in the logs.
- CDH services fail to start