Cloudera Enterprise 5.13.x
Query Options for the SET Statement
Starting the Metastore
File System Permissions
Starting, Stopping, & Using HS2
Starting HS1 and Hive CLI (deprecated)
Using Hive w/HBase
Using Schema Tool
Installing JDBC Driver on Clients
Setting HADOOP_MAPRED_HOME
Configuring HMS for HDFS HA
Using & Managing
Managing Hive with Cloudera Manager
Ingesting & Querying Data
Running Hive on Spark
Using HS2 Web UI
Accessing Table Statistics
Managing UDFs
Hive ETL Jobs on S3
Hive with Amazon RDS
Hive with ADLS
Tuning
Tuning Hive on Spark
Tuning Hive on S3
Configuring Metastore HA
Configuring HS2 HA
Data Replication
Security
Troubleshooting
Hue Guide
Hue Versions
Installation & Upgrade
Databases
Connect Hue to MySQL or MariaDB
Connect Hue to PostgreSQL
Connect Hue to Oracle (Parcel)
Connect Hue to Oracle (Package)
Migrate Hue Database
Hue Custom Database Tutorial
Populate the Hue Database
Administration
Hue Configuration Files
Hue Logs and Paths
Hue User Permissions
Create Hue Password Scripts
Customize Hue Web UI
Security
Configure Hue for High Availability
Authenticate Hue Users with LDAP
Synchronize Hue with LDAP Server
Authenticate Hue Users with SAML
Authorize Hue Groups with Sentry
Hue How-tos
Add Hue Load Balancer
Enable SQL Editor Autocompleter
Enable and Use Governance-Based Data Discovery
Enable S3 Cloud Storage in Hue
Use S3 as Source or Sink in Hue
Run Hue Shell Commands
Troubleshooting
Potential Misconfiguration
Impala Guide
Concepts and Architecture
Components
Developing Applications
Role in the Hadoop Ecosystem
Deployment Planning
Requirements
Designing Schemas
Tutorials
Administration
How to Configure Resource Management for Impala
Setting Timeouts
Load-Balancing Proxy for HA
Managing Disk Space
Auditing
Viewing Lineage Info
SQL Reference
Comments
Data Types
ARRAY Complex Type (CDH 5.5 or higher only)
BIGINT
BOOLEAN
CHAR
DECIMAL
DOUBLE
FLOAT
INT
MAP Complex Type (CDH 5.5 or higher only)
REAL
SMALLINT
STRING
STRUCT Complex Type (CDH 5.5 or higher only)
TIMESTAMP
TINYINT
VARCHAR
Complex Types (CDH 5.5 or higher only)
Literals
SQL Operators
Schema Objects and Object Names
Aliases
Databases
Functions
Identifiers
Tables
Views
SQL Statements
DDL Statements
DML Statements
ALTER TABLE
ALTER VIEW
COMPUTE STATS
CREATE DATABASE
CREATE FUNCTION
CREATE ROLE
CREATE TABLE
CREATE VIEW
DELETE
DESCRIBE
DROP DATABASE
DROP FUNCTION
DROP ROLE
DROP STATS
DROP TABLE
DROP VIEW
EXPLAIN
GRANT
INSERT
INVALIDATE METADATA
LOAD DATA
REFRESH
REVOKE
SELECT
Joins
ORDER BY Clause
GROUP BY Clause
HAVING Clause
LIMIT Clause
OFFSET Clause
UNION Clause
Subqueries
TABLESAMPLE Clause
WITH Clause
DISTINCT Operator
Hints
SET
Query Options for the SET Statement
ABORT_ON_DEFAULT_LIMIT_EXCEEDED
ABORT_ON_ERROR
ALLOW_UNSUPPORTED_FORMATS
APPX_COUNT_DISTINCT
BATCH_SIZE
BUFFER_POOL_LIMIT
COMPRESSION_CODEC
DEBUG_ACTION
DECIMAL_V2
DEFAULT_JOIN_DISTRIBUTION_MODE
DEFAULT_ORDER_BY_LIMIT
DEFAULT_SPILLABLE_BUFFER_SIZE
DISABLE_CODEGEN
DISABLE_CODEGEN_ROWS_THRESHOLD
DISABLE_ROW_RUNTIME_FILTERING
DISABLE_STREAMING_PREAGGREGATIONS
DISABLE_UNSAFE_SPILLS
ENABLE_EXPR_REWRITES
EXEC_SINGLE_NODE_ROWS_THRESHOLD
EXPLAIN_LEVEL
HBASE_CACHE_BLOCKS
HBASE_CACHING
LIVE_PROGRESS
LIVE_SUMMARY
MAX_ERRORS
MAX_IO_BUFFERS
MAX_NUM_RUNTIME_FILTERS
MAX_ROW_SIZE
MAX_SCAN_RANGE_LENGTH
MEM_LIMIT
MIN_SPILLABLE_BUFFER_SIZE
MT_DOP
NUM_NODES
NUM_SCANNER_THREADS
OPTIMIZE_PARTITION_KEY_SCANS
PARQUET_COMPRESSION_CODEC
PARQUET_ANNOTATE_STRINGS_UTF8
PARQUET_ARRAY_RESOLUTION
PARQUET_DICTIONARY_FILTERING
PARQUET_FALLBACK_SCHEMA_RESOLUTION
PARQUET_FILE_SIZE
PARQUET_READ_STATISTICS
PREFETCH_MODE
QUERY_TIMEOUT_S
REQUEST_POOL
REPLICA_PREFERENCE
RESERVATION_REQUEST_TIMEOUT
RUNTIME_BLOOM_FILTER_SIZE
RUNTIME_FILTER_MAX_SIZE
RUNTIME_FILTER_MIN_SIZE
RUNTIME_FILTER_MODE
RUNTIME_FILTER_WAIT_TIME_MS
S3_SKIP_INSERT_STAGING
SCAN_NODE_CODEGEN_THRESHOLD
SCHEDULE_RANDOM_REPLICA
SCRATCH_LIMIT
SUPPORT_START_OVER
SYNC_DDL
V_CPU_CORES
SHOW
TRUNCATE TABLE
UPDATE
UPSERT
USE
Built-In Functions
Mathematical Functions
Bit Functions
Type Conversion Functions
Date and Time Functions
Conditional Functions
String Functions
Miscellaneous Functions
Aggregate Functions
APPX_MEDIAN
AVG
COUNT
GROUP_CONCAT
MAX
MIN
NDV
STDDEV, STDDEV_SAMP, STDDEV_POP
SUM
VARIANCE, VARIANCE_SAMP, VARIANCE_POP, VAR_SAMP, VAR_POP
Analytic Functions
User-Defined Functions (UDFs)
SQL Differences Between Impala and Hive
Porting SQL
The Impala Shell
Configuration Options
Connecting to impalad
Running Commands and SQL Statements
Command Reference
Performance Tuning
Performance Best Practices
Join Performance
Table and Column Statistics
Benchmarking
Controlling Resource Usage
Runtime Filtering
HDFS Caching
Testing Impala Performance
EXPLAIN Plans and Query Profiles
HDFS Block Skew
Scalability Considerations
Dedicated Coordinators
Partitioning
File Formats
Text Data Files
Parquet Data Files
Avro Data Files
RCFile Data Files
SequenceFile Data Files
Using Impala to Query Kudu Tables
HBase Tables
S3 Tables
Configure with Cloudera Manager
Configure from Command Line
ADLS Tables
Isilon Storage
Logging
Troubleshooting Impala
Web User Interface
Breakpad Minidumps
Ports Used by Impala
Impala Reserved Words
Impala Frequently Asked Questions
Kudu Guide
Concepts and Architecture
Installation and Upgrade
Usage Limitations
Configuration
Administration
Developing Applications with Kudu
Using Apache Impala with Kudu
Schema Design
Transaction Semantics
Background Tasks
Troubleshooting
More Resources
Cloudera Search Guide
Cloudera Search Tutorial
Validating the Cloudera Search Deployment
Preparing to Index Sample Tweets with Cloudera Search
Using MapReduce Batch Indexing to Index Sample Tweets
Near Real Time (NRT) Indexing Tweets Using Flume
Using Hue with Cloudera Search
Deployment Planning for Cloudera Search
Schemaless Mode
Deploying Cloudera Search
Using Search through a Proxy for High Availability
Using Custom JAR Files with Search
Cloudera Search Security
Managing Cloudera Search
Managing Cloudera Search Configuration
Managing Collections in Cloudera Search
solrctl Reference
Example solrctl Usage
Migrating Solr Replicas
Backing Up and Restoring Cloudera Search
ETL With Cloudera Morphlines
Example Morphline Usage
Indexing Data
NRT Indexing
Flume NRT Indexing
Flume MorphlineSolrSink Configuration Options
Flume MorphlineInterceptor Configuration Options
Flume Solr UUIDInterceptor Configuration Options
Flume Solr BlobHandler Configuration Options
Flume Solr BlobDeserializer Configuration Options
Lily HBase NRT Indexing
Using the Lily HBase NRT Indexer Service
Batch Indexing
Spark Indexing
MapReduce Indexing
MapReduceIndexerTool
Lily HBase Batch Indexing
HdfsFindTool
Cloudera Search Frequently Asked Questions
Troubleshooting Cloudera Search
Static Solr Log Analysis
Spark Guide
Running Your First Spark Application
Spark Application Overview
Developing Spark Applications
Developing and Running a Spark WordCount Application
Using Spark Streaming
Using Spark SQL
Using Spark MLlib
Accessing External Storage
Accessing Data Stored in Amazon S3 through Spark
Accessing Data Stored in Azure Data Lake Store (ADLS) through Spark
Accessing Avro Data Files From Spark SQL Applications
Accessing Parquet Files From Spark SQL Applications
Building Spark Applications
Configuring Spark Applications
Running Spark Applications
Running Spark Applications on YARN
Using PySpark
Running Spark Python Applications
Spark and IPython and Jupyter Notebooks
Tuning Spark Applications
Spark and Hadoop Integration
Building and Running a Crunch Application with Spark
Cloudera Glossary
To read this documentation, you must turn JavaScript on.
V_CPU_CORES Query Option (CDH 5.0 or higher only)