Cloudera Security
This guide is intended for system administrators who want to secure a cluster using data encryption, user authentication, and authorization techniques. It also provides information about Hadoop security programs and shows you how to set up a gateway to restrict access.
Cloudera strives to consolidate security configurations across several projects. The process of securing an enterprise data hub comprises at least the following operations:
- Perimeter Security and Authentication: Guarding access to the system, its data, and its various services.
- Entitlement and Access: Defining and enforcing what users and applications can do with data.
- Data Protection: Protecting data from unauthorized access, whether at rest or in transit.
Continue reading:
- Authentication
- Configuring Authentication in Cloudera Manager
- Why Use Cloudera Manager to Implement Kerberos Authentication?
- Ways to Configure Kerberos Authentication Using Cloudera Manager
- Cloudera Manager User Accounts
- Configuring External Authentication for Cloudera Manager
- Kerberos Principals and Keytabs
- Enabling Kerberos Authentication Using the Wizard
- Considerations When Using an Active Directory KDC
- Step 1: Install Cloudera Manager and CDH
- Step 2: If You are Using AES-256 Encryption, Install the JCE Policy File
- Step 3: Get or Create a Kerberos Principal for the Cloudera Manager Server
- Step 4: Enabling Kerberos Using the Wizard
- Step 5: Create the HDFS Superuser
- Step 6: Get or Create a Kerberos Principal for Each User Account
- Step 7: Prepare the Cluster for Each User
- Step 8: Verify that Kerberos Security is Working
- Step 9: (Optional) Enable Authentication for HTTP Web Consoles for Hadoop Roles
- Viewing and Regenerating Kerberos Principals
- Mapping Kerberos Principals to Short Names
- Enabling Kerberos Authentication Without the Wizard
- Step 1: Install Cloudera Manager and CDH
- Step 2: If You are Using AES-256 Encryption, Install the JCE Policy File
- Step 3: Get or Create a Kerberos Principal for the Cloudera Manager Server
- Step 4: Import KDC Account Manager Credentials
- Step 5: Configure the Kerberos Default Realm in the Cloudera Manager Admin Console
- Step 6: Stop All Services
- Step 7: Enable Hadoop Security
- Step 8: Wait for the Generate Credentials Command to Finish
- Step 9: Enable Hue to Work with Hadoop Security using Cloudera Manager
- Step 10: (Flume Only) Use Substitution Variables for the Kerberos Principal and Keytab
- Step 11: (CDH 4.0 and 4.1 only) Configure Hue to Use a Local Hive Metastore
- Step 12: Start All Services
- Step 13: Deploy Client Configurations
- Step 14: Create the HDFS Superuser Principal
- Step 15: Get or Create a Kerberos Principal for Each User Account
- Step 16: Prepare the Cluster for Each User
- Step 17: Verify that Kerberos Security is Working
- Step 18: (Optional) Enable Authentication for HTTP Web Consoles for Hadoop Roles
- Configuring Authentication in Cloudera Navigator
- Configuring Authentication in CDH Using the Command Line
- Enabling Kerberos Authentication for Hadoop Using the Command Line
- Step 1: Install CDH 5
- Step 2: Verify User Accounts and Groups in CDH 5 Due to Security
- Step 3: If You are Using AES-256 Encryption, Install the JCE Policy File
- Step 4: Create and Deploy the Kerberos Principals and Keytab Files
- Step 5: Shut Down the Cluster
- Step 6: Enable Hadoop Security
- Step 7: Configure Secure HDFS
- Optional Step 8: Configuring Security for HDFS High Availability
- Optional Step 9: Configuring Secure WebHDFS
- Optional Step 10: Configuring a Secure HDFS NFS Gateway
- Step 11: Set Variables for Secure DataNodes
- Step 12: Start up the NameNode
- Step 13: Start up a DataNode
- Step 14: Set the Sticky Bit on HDFS Directories
- Step 15: Start up the Secondary NameNode (if used)
- Step 16: Configure Either MRv1 Security or YARN Security
- Flume Authentication
- HBase Authentication
- HCatalog Authentication
- Hive Authentication
- HiveServer2 Security Configuration
- Enabling Kerberos Authentication for HiveServer2
- Encrypted Communication with Client Drivers
- Using LDAP Username/Password Authentication with HiveServer2
- Configuring LDAPS Authentication with HiveServer2
- Pluggable Authentication
- Trusted Delegation with HiveServer2
- HiveServer2 Impersonation
- Securing the Hive Metastore
- Disabling the Hive Security Configuration
- Hive Metastore Server Security Configuration
- Using Hive to Run Queries on a Secure HBase Server
- HiveServer2 Security Configuration
- HttpFS Authentication
- Hue Authentication
- Impala Authentication
- Enabling Kerberos Authentication for Impala
- Enabling LDAP Authentication for Impala
- Using Multiple Authentication Methods with Impala
- Configuring Impala Delegation for Hue and BI Tools
- Llama Authentication
- Oozie Authentication
- Search Authentication
- ZooKeeper Authentication
- FUSE Kerberos Configuration
- Using kadmin to Create Kerberos Keytab Files
- Configuring the Mapping from Kerberos Principals to Short Names
- Enabling Debugging Output for the Sun Kerberos Classes
- Enabling Kerberos Authentication for Hadoop Using the Command Line
- Configuring a Cluster-dedicated MIT KDC and Default Domain for a Cluster
- Integrating Hadoop Security with Active Directory
- Integrating Hadoop Security with Alternate Authentication
- Configuring LDAP Group Mappings
- Hadoop Users in Cloudera Manager and CDH
- Authenticating Kerberos Principals in Java Code
- Using a Web Browser to Access a URL Protected by Kerberos HTTP SPNEGO
- Troubleshooting Authentication Issues
- Sample Kerberos Configuration Files: krb5.conf, kdc.conf, kadm5.acl
- Potential Security Problems and Their Solutions
- Issues with Generate Credentials
- Running any Hadoop command fails after enabling security
- Java is unable to read the Kerberos credentials cache created by versions of MIT Kerberos 1.8.1 or higher
- java.io.IOException: Incorrect permission
- A cluster fails to run jobs after security is enabled
- The NameNode does not start and KrbException messages (906) and (31) are displayed
- The NameNode starts but clients cannot connect to it and the error message contains enctype code 18
- (MRv1 Only) Jobs won't run and the TaskTracker is unable to create a local mapred directory
- (MRv1 Only) Jobs won't run and the TaskTracker is unable to create a Hadoop logs directory
- After you enable cross-realm trust, you can run Hadoop commands in the local realm but not in the remote realm
- (MRv1 Only) Jobs won't run and cannot access files in mapred.local.dir
- Users are unable to obtain credentials when running Hadoop jobs or commands
- "Request is a replay" exceptions in the logs
- CDH services fail to start
- Configuring Authentication in Cloudera Manager
- Encryption
- SSL Certificates Overview
- Configuring TLS Security for Cloudera Manager
- Configuring TLS Encryption Only for Cloudera Manager
- Level 1: Configuring TLS Encryption for Cloudera Manager Agents
- Level 2: Configuring TLS Verification of Cloudera Manager Server by the Agents
- Level 3: Configuring TLS Authentication of Agents to the Cloudera Manager Server
- Step 1: Configure TLS encryption
- Step 2: Configure TLS Verification of Server Trust by Agents
- Approach A: Using OpenSSL to Create Private Keys and Request Agent Certificates
- Approach A - Step 3: Generate the Private Key and Certificate Signing Request for the Agent Using OpenSSL
- Approach A - Step 4: Submit the Certificate Signing Request to Your CA and Distribute the Issued Certificates
- Approach A - (Optional) Step 5: Import the OpenSSL Private Key and Certificate into the Per-Host Java Keystore
- Approach B: Creating a Java Keystore and Importing Signed Agent Certificates into It
- Step 6: Create a File that Contains the Password for the Key
- Step 7: Configure the Agent with its Private Key and Certificate
- Step 8: Verify That Steps 3-7 Were Completed for Every Agent Host in Your Cluster
- Step 9: Create a Truststore by Importing CA and Agent Certificates
- Step 10: Enable Agent Authentication and Configure the Cloudera Manager Server to Use the New Truststore
- Step 11: Restart the Cloudera Manager Server
- Step 12: Restart the Cloudera Manager Agents
- Step 13: Verify that the Server and Agents Are Communicating
- HTTPS Communication in Cloudera Manager
- Configuring SSL/TLS Encryption for CDH Services
- HDFS Data At Rest Encryption
- Use Cases
- Architecture
- crypto Command Line Interface
- Enabling HDFS Encryption on a Cluster
- DistCp Considerations
- Attack Vectors
- Configuring the Key Management Server (KMS)
- Securing the Key Management Server (KMS)
- Configuring CDH Services for HDFS Encryption
- Configuring Encrypted HDFS Data Transport
- Troubleshooting SSL/TLS Connectivity
- Authorization
- Cloudera Manager User Roles
- Cloudera Navigator User Roles
- Enabling HDFS Extended ACLs
- The Sentry Service
- Prerequisites
- Privilege Model
- Users and Groups
- Appendix: Authorization Privilege Model for Hive and Impala
- Installing and Upgrading the Sentry Service
- Migrating from Sentry Policy Files to the Sentry Service
- Configuring the Sentry Service
- Hive SQL Syntax for Use with Sentry
- Sentry Policy File Authorization
- Prerequisites
- Roles and Privileges
- Privilege Model
- Users and Groups
- Policy File
- Sample Sentry Configuration Files
- Accessing Sentry-Secured Data Outside Hive/Impala
- Debugging Failed Sentry Authorization Requests
- Authorization Privilege Model for Hive and Impala
- Installing and Upgrading Sentry for Policy File Authorization
- Configuring Sentry Policy File Authorization Using Cloudera Manager
- Configuring User to Group Mappings
- Enabling URIs for Per-DB Policy Files
- Using User-Defined Functions with HiveServer2
- Enabling Policy File Authorization for Hive
- Configuring Group Access to the Hive Metastore
- Enabling Policy File Authorization for Impala
- Enabling Sentry Authorization for Solr
- Configuring Sentry to Enable BDR Replication
- Configuring Sentry Policy File Authorization Using the Command Line
- Enabling Sentry Authorization for Impala
- The Sentry Privilege Model
- Starting the impalad Daemon with Sentry Authorization Enabled
- Using Impala with the Sentry Service (CDH 5.1 or higher only)
- Using Impala with the Sentry Policy File
- Policy File Location and Format
- Examples of Policy File Rules for Security Scenarios
- A User with No Privileges
- Examples of Privileges for Administrative Users
- A User with Privileges for Specific Databases and Tables
- Privileges for Working with External Data Files
- Controlling Access at the Column Level through Views
- Separating Administrator Responsibility from Read and Write Privileges
- Using Multiple Policy Files for Different Databases
- Setting Up Schema Objects for a Secure Impala Deployment
- Privilege Model and Object Hierarchy
- Debugging Failed Sentry Authorization Requests
- Managing Sentry for Impala through Cloudera Manager
- The DEFAULT Database in a Secure Deployment
- Enabling Sentry Authorization for Search
- Roles and Collection-Level Privileges
- Users and Groups
- Setup and Configuration
- Policy File
- Sample Configuration
- Enabling Sentry in Cloudera Search for CDH 5
- Providing Document-Level Security Using Sentry
- Enabling Secure Impersonation
- Debugging Failed Sentry Authorization Requests
- Appendix: Authorization Privilege Model for Search
- Configuring HBase Authorization
- Overview of Impala Security
- Miscellaneous Topics