Cloudera Runtime 7.2.12 (Public Cloud)
Cloudera Runtime
▼
Cloudera Runtime Release Notes
CVE-2021-45105 & CVE-2021-44832 Remediation for 7.2.12
CVE-2021-44228 Remediation for 7.2.12
Overview
Cloudera Runtime Component Versions
▶︎
Using the Cloudera Runtime Maven repository
Maven Artifacts for Cloudera Runtime 7.2.12
▶︎
What's New
Atlas
Cruise Control
HBase
Hive
Hue
Impala
Kudu
Schema Registry
Search
Spark
Sqoop
Streams Replication Manager
Unaffected Components in this release
▼
Fixed Issues In Cloudera Runtime 7.2.12
Atlas
Avro
Cloud Connectors
Cruise Control
DAS
Hadoop
HBase
HDFS
Hive
HWC
Hue
Impala
Kafka
Knox
Kudu
Oozie
Phoenix
Parquet
Ranger
Schema Registry
Search
Solr
Spark
Sqoop
Streams Messaging Manager
Streams Replication Manager
YARN
Zeppelin
ZooKeeper
Service Packs in Cloudera Runtime 7.2.12
Fixed Issues In Cloudera Runtime 7.2.12.7
Fixed Issues In Cloudera Runtime 7.2.12.8
Fixed Issues In Cloudera Runtime 7.2.12.9
Fixed Issues In Cloudera Runtime 7.2.12.10
Fixed Issues In Cloudera Runtime 7.2.12.11
Fixed Issues In Cloudera Runtime 7.2.12.12
▶︎
Known Issues In Cloudera Runtime 7.2.12
Atlas
Avro
Cruise Control
DAS
HBase
HDFS
Hive
Hue
Impala
Kafka
Knox
Kudu
Oozie
Phoenix
Ranger
Schema Registry
Search
Spark
Sqoop
Streams Messaging Manager
Streams Replication Manager
YARN
Zeppelin
ZooKeeper
▶︎
Behavioral Changes In Cloudera Runtime 7.2.12
Kudu
Search
Phoenix
▶︎
Deprecation Notices In Cloudera Runtime 7.2.12
Kudu
Kafka
HBase
Cloudera Manager Release Notes
▶︎
Concepts
▶︎
Storage
▶︎
HDFS Overview
▶︎
Introduction
Overview of HDFS
▶︎
NameNodes
▶︎
Moving NameNode roles
Moving highly available NameNode, failover controller, and JournalNode roles using the Migrate Roles wizard
Moving a NameNode to a different host using Cloudera Manager
▶︎
Sizing NameNode heap memory
Environment variables for sizing NameNode heap memory
Monitoring heap memory usage
Files and directories
Disk space versus namespace
Replication
Examples of estimating NameNode heap memory
Remove or add storage directories for NameNode data directories
▶︎
DataNodes
How NameNode manages blocks on a failed DataNode
Replace a disk on a DataNode host
Remove a DataNode
Fixing block inconsistencies
Add storage directories using Cloudera Manager
Remove storage directories using Cloudera Manager
▶︎
Configuring storage balancing for DataNodes
Configure storage balancing for DataNodes using Cloudera Manager
Perform a disk hot swap for DataNodes using Cloudera Manager
▶︎
JournalNodes
Moving the JournalNode edits directory for a role group using Cloudera Manager
Moving the JournalNode edits directory for a role instance using Cloudera Manager
Synchronizing the contents of JournalNodes
▶︎
Apache HBase Overview
Introduction
▶︎
Apache Kudu Overview
Kudu introduction
Kudu architecture in a CDP public cloud deployment
Kudu network architecture
Kudu-Impala integration
Example use cases
Kudu concepts
▶︎
Apache Kudu usage limitations
Schema design limitations
Partitioning limitations
Scaling recommendations and limitations
Server management limitations
Cluster management limitations
Impala integration limitations
Spark integration limitations
Security limitations
Other known issues
More Resources
▶︎
Apache Kudu Background Operations
Maintenance manager
Flushing data to disk
Compacting on-disk data
Write-ahead log garbage collection
Tablet history garbage collection and the ancient history mark
▶︎
Apache Hadoop YARN Overview
Introduction
YARN Features
Understanding YARN architecture
▶︎
Data Access
▶︎
Data Analytics Studio Overview
Data Analytics Studio overview
DAS architecture
Difference between Tez UI and DAS
▶︎
Apache Hive Metastore Overview
Introduction to Hive metastore
Apache Hive storage in public clouds
▶︎
Apache Hive Overview
Apache Hive features
Hive low-latency analytical processing
Hive unsupported interfaces and features in public clouds
Apache Hive 3 architectural overview
Apache Hive content roadmap
▶︎
Apache Impala Overview
Apache Impala Overview
Components
▶︎
Hue Overview
Hue overview
▶︎
Cloudera Search Overview
What is Cloudera Search
How Cloudera Search works
Cloudera Search and CDP
Search and other Runtime components
Cloudera Search architecture
Local file system support
▶︎
Cloudera Search tasks and processes
Ingestion
Indexing
Querying
ETL with Cloudera Morphlines
Backing up and restoring data
▶︎
Operational Database Overview
▶︎
Operational Database overview
Introduction to Apache HBase
▶︎
Introduction to Apache Phoenix
Apache Phoenix and SQL
▶︎
Data Engineering
▶︎
Apache Spark Overview
Apache Spark Overview
Unsupported Apache Spark Features
▶︎
Apache Zeppelin Overview
Overview
▶︎
CDP Security Overview
Cloudera Runtime Security and Governance
▶︎
Governance
▶︎
Governance Overview
Using metadata for cluster governance
Data Stewardship with Apache Atlas
Apache Atlas dashboard tour
Apache Atlas metadata collection overview
Atlas metadata model overview
▶︎
Controlling Data Access with Tags
Atlas classifications drive Ranger policies
When to use Atlas classifications for access control
▶︎
How tag-based access control works
Propagation of tags as deferred actions
Examples of controlling data access using classifications
▶︎
Extending Atlas to Manage Metadata from Additional Sources
Top-down process for adding a new metadata source
▶︎
Streams Messaging
▶︎
Apache Kafka Overview
Kafka Introduction
▶︎
Kafka Architecture
Brokers
Topics
Records
Partitions
Record order and assignment
Logs and log segments
Kafka brokers and ZooKeeper
Leader positions and in-sync replicas
▶︎
Kafka FAQ
Basics
Use cases
▶︎
Cruise Control Overview
Kafka cluster load balancing using Cruise Control
How Cruise Control retrieves metrics
How Cruise Control self-healing works
▶︎
Schema Registry Overview
▶︎
Schema Registry Overview
Examples of Interacting with Schema Registry
▶︎
Schema Registry Use Cases
Use Case 1: Registering and Querying a Schema for a Kafka Topic
Use Case 2: Reading/Deserializing and Writing/Serializing Data from and to a Kafka Topic
Use Case 3: Dataflow Management with Schema-based Routing
Schema Registry Component Architecture
▶︎
Schema Registry Concepts
Schema Entities
Compatibility Policies
▶︎
Streams Messaging Manager Overview
Introduction to Streams Messaging Manager
▶︎
Streams Replication Manager Overview
Overview
Key Features
Main Use Cases
▶︎
Use Case Architectures
▶︎
Highly Available Kafka Architectures
Active / Stand-by Architecture
Active / Active Architecture
Cross Data Center Replication
▶︎
Cluster Migration Architectures
On-premises to Cloud and Kafka Version Upgrade
Aggregation for Analytics
▶︎
Streams Replication Manager Architecture
▶︎
Streams Replication Manager Driver
Connect workers
Connectors
Task architecture and load-balancing
Driver inter-node coordination
▶︎
Streams Replication Manager Service
Remote Querying [Technical Preview]
▶︎
Understanding Replication Flows
Replication Flows Overview
Remote Topics
Bidirectional Replication Flows
Fan-in and Fan-out Replication Flows
Understanding co-located and external clusters
Understanding SRM properties, their configuration and hierarchy
▶︎
Planning
▶︎
Deployment Planning for Cloudera Search
Planning overview
Dimensioning guidelines for deploying Cloudera Search
Schemaless mode overview and best practices
Advantages of defining a schema for production use
▶︎
Planning for Apache Impala
Guidelines for Schema Design
User Account Requirements
▶︎
Planning for Apache Kudu
▶︎
Kudu schema design
The perfect schema
▶︎
Column design
Decimal type
Varchar type
Column encoding
Column compression
▶︎
Primary key design
Primary key index
Considerations for backfill inserts
▶︎
Partitioning
▶︎
Range partitioning
Adding and Removing Range Partitions
Hash partitioning
Multilevel partitioning
Partition pruning
▶︎
Partitioning examples
Range partitioning
Hash partitioning
Hash and range partitioning
Hash and hash partitioning
Schema alterations
Schema design limitations
Partitioning limitations
▶︎
Kudu transaction semantics
Single tablet write operations
Writing to multiple tablets
Read operations (scans)
▶︎
Known issues and limitations
Writes
Reads (scans)
▶︎
Scaling Kudu
Terms
Example workload
▶︎
Memory
Verifying if a memory limit is sufficient
File descriptors
Threads
Scaling recommendations and limitations
▶︎
Planning for Streams Replication Manager
Streams Replication Manager requirements
Recommended deployment architecture
▶︎
How To
▶︎
Storage
▶︎
Managing Data Storage
▶︎
Optimizing data storage
▶︎
Balancing data across disks of a DataNode
▶︎
Plan the data movement across disks
Parameters to configure the Disk Balancer
Run the Disk Balancer plan
Disk Balancer commands
▶︎
Erasure coding overview
Understanding erasure coding policies
Comparing replication and erasure coding
Best practices for rack and node setup for EC
Prerequisites for enabling erasure coding
Limitations of erasure coding
Using erasure coding for existing data
Using erasure coding for new data
Advanced erasure coding configuration
Erasure coding CLI command
Erasure coding examples
▶︎
Increasing storage capacity with HDFS compression
Enable GZipCodec as the default compression codec
Use GZipCodec with a one-time job
▶︎
Setting HDFS quotas
Set quotas using Cloudera Manager
▶︎
Configuring heterogeneous storage in HDFS
HDFS storage types
HDFS storage policies
Commands for configuring storage policies
Set up a storage policy for HDFS
Set up SSD storage using Cloudera Manager
Configure archival storage
The HDFS mover command
▶︎
Balancing data across an HDFS cluster
Why HDFS data becomes unbalanced
▶︎
Configurations and CLI options for the HDFS Balancer
Properties for configuring the Balancer
Balancer commands
Recommended configurations for the Balancer
▶︎
Configuring and running the HDFS balancer using Cloudera Manager
Configuring the balancer threshold
Configuring concurrent moves
Recommended configurations for the balancer
Running the balancer
Configuring block size
▶︎
Cluster balancing algorithm
Storage group classification
Storage group pairing
Block move scheduling
Block move execution
Exit statuses for the HDFS Balancer
HDFS
▶︎
Optimizing performance
▶︎
Improving performance with centralized cache management
Benefits of centralized cache management in HDFS
Use cases for centralized cache management
Centralized cache management architecture
Caching terminology
Properties for configuring centralized caching
Commands for using cache pools and directives
▶︎
Specifying racks for hosts
Viewing racks assigned to cluster hosts
Editing rack assignments for hosts
▶︎
Customizing HDFS
Customize the HDFS home directory
Properties to set the size of the NameNode edits directory
▶︎
Optimizing NameNode disk space with Hadoop archives
Overview of Hadoop archives
Hadoop archive components
Create a Hadoop archive
List files in Hadoop archives
Format for using Hadoop archives with MapReduce
▶︎
Detecting slow DataNodes
Enable disk IO statistics
Enable detection of slow DataNodes
▶︎
Allocating DataNode memory as storage
HDFS storage types
LAZY_PERSIST memory storage policy
Configure DataNode memory as storage
▶︎
Improving performance with short-circuit local reads
Prerequisites for configuring short-circuit local reads
Properties for configuring short-circuit local reads on HDFS
▶︎
Configure mountable HDFS
Add HDFS system mount
Optimize mountable HDFS
Configuring Proxy Users to Access HDFS
▶︎
Using DistCp to copy files
Using DistCp
DistCp syntax and examples
Using DistCp with Highly Available remote clusters
▶︎
Using DistCp with Amazon S3
Using a credential provider to secure S3 credentials
Examples of DistCp commands using the S3 protocol and hidden credentials
Kerberos setup guidelines for DistCp between secure clusters
▶︎
DistCp between secure clusters in different Kerberos realms
Configure source and destination realms in krb5.conf
Configure HDFS RPC protection
Configure acceptable Kerberos principal patterns
Specify truststore properties
Set HADOOP_CONF to the destination cluster
Launch DistCp
Copying data between a secure and an insecure cluster using DistCp and WebHDFS
Post-migration verification
Using DistCp between HA clusters using Cloudera Manager
▶︎
Using the NFS Gateway for accessing HDFS
Configure the NFS Gateway
▶︎
Start and stop the NFS Gateway services
Verify validity of the NFS services
▶︎
Access HDFS from the NFS Gateway
How NFS Gateway authenticates and maps users
▶︎
APIs for accessing HDFS
Set up WebHDFS on a secure cluster
▶︎
Using HttpFS to provide access to HDFS
Add the HttpFS role
Using Load Balancer with HttpFS
▶︎
HttpFS authentication
Use curl to access a URL protected by Kerberos HTTP SPNEGO
▶︎
Data storage metrics
Using JMX for accessing HDFS metrics
▶︎
Configure the G1GC garbage collector
Recommended settings for G1GC
Switching from CMS to G1GC
HDFS Metrics
▶︎
Configuring Data Protection
▶︎
Data protection
▶︎
Backing up HDFS metadata
▶︎
Introduction to HDFS metadata files and directories
▶︎
Files and directories
NameNodes
JournalNodes
DataNodes
▶︎
HDFS commands for metadata files and directories
Configuration properties
▶︎
Back up HDFS metadata
Prepare to back up the HDFS metadata
Backing up NameNode metadata
Back up HDFS metadata using Cloudera Manager
Restoring NameNode metadata
Restore HDFS metadata from a backup using Cloudera Manager
Perform a backup of the HDFS metadata
▶︎
Configuring HDFS trash
Trash behavior with HDFS Transparent Encryption enabled
Enabling and disabling trash
Setting the trash interval
▶︎
Using HDFS snapshots for data protection
Considerations for working with HDFS snapshots
Enable snapshot creation on a directory
Create snapshots on a directory
Recover data from a snapshot
Options to determine differences between contents of snapshots
CLI commands to perform snapshot operations
▶︎
Managing snapshot policies using Cloudera Manager
Create a snapshot policy
Edit or delete a snapshot policy
Enable and disable snapshot creation using Cloudera Manager
Create snapshots using Cloudera Manager
Delete snapshots using Cloudera Manager
Preventing inadvertent deletion of directories
▶︎
Accessing Cloud Data
Cloud storage connectors overview
The Cloud Storage Connectors
▶︎
Working with Amazon S3
Limitations of Amazon S3
▶︎
Configuring Access to S3
Configuring Access to S3 on CDP Public Cloud
▶︎
Configuring Access to S3 on CDP Private Cloud Base
Using Configuration Properties to Authenticate
Using Per-Bucket Credentials to Authenticate
Using Environment Variables to Authenticate
Using EC2 Instance Metadata to Authenticate
Referencing S3 Data in Applications
▶︎
Configuring Per-Bucket Settings
Customizing Per-Bucket Secrets Held in Credential Files
Configuring Per-Bucket Settings to Access Data Around the World
▶︎
Encrypting Data on S3
▶︎
SSE-S3: Amazon S3-Managed Encryption Keys
Enabling SSE-S3
▶︎
SSE-KMS: Amazon S3-KMS Managed Encryption Keys
Enabling SSE-KMS
IAM Role permissions for working with SSE-KMS
▶︎
SSE-C: Server-Side Encryption with Customer-Provided Encryption Keys
Enabling SSE-C
Configuring Encryption for Specific Buckets
Encrypting an S3 Bucket with Amazon S3 Default Encryption
Performance Impact of Encryption
▶︎
Using S3Guard for Consistent S3 Metadata
Introduction to S3Guard
▶︎
Configuring S3Guard
Preparing the S3 Bucket
Choosing a DynamoDB Table and IO Capacity
Creating DynamoDB Access Policy
Restricting Access to S3Guard Tables
Configuring S3Guard in Cloudera Manager
Create the S3Guard Table in DynamoDB
Monitoring and Maintaining S3Guard
Disabling S3Guard and Destroying an S3Guard Database
Pruning Old Data from S3Guard Tables
Importing a Bucket into S3Guard
Verifying that S3Guard is Enabled on a Bucket
Using the S3Guard CLI
S3Guard: Operational Issues
▶︎
Safely Writing to S3 Through the S3A Committers
Introducing the S3A Committers
Configuring Directories for Intermediate Data
Using the Directory Committer in MapReduce
Verifying That an S3A Committer Was Used
Cleaning up after failed jobs
Using the S3Guard Command to List and Delete Uploads
▶︎
Advanced Committer Configuration
Enabling Speculative Execution
Using Unique Filenames to Avoid File Update Inconsistency
Speeding up Job Commits by Increasing the Number of Threads
Securing the S3A Committers
The S3A Committers and Third-Party Object Stores
Limitations of the S3A Committers
Troubleshooting the S3A Committers
Security Model and Operations on S3
S3A and Checksums (Advanced Feature)
A List of S3A Configuration Properties
Working with versioned S3 buckets
Working with Third-party S3-compatible Object Stores
▶︎
Improving Performance for S3A
Working with S3 buckets in the same AWS region
▶︎
Configuring and tuning S3A block upload
Tuning S3A Uploads
Thread Tuning for S3A Data Upload
Optimizing S3A read performance for different file types
S3 Performance Checklist
Troubleshooting S3 and S3Guard
▶︎
Working with Google Cloud Storage
▶︎
Configuring Access to Google Cloud Storage
Create a GCP Service Account
Create a Custom Role
Modify GCS Bucket Permissions
Configure Access to GCS from Your Cluster
Additional Configuration Options for GCS
▶︎
Working with the ABFS Connector
▶︎
Introduction to Azure Storage and the ABFS Connector
Feature Comparisons
Setting up and configuring the ABFS connector
▶︎
Configuring the ABFS Connector
▶︎
Authenticating with ADLS Gen2
Configuring Access to Azure on CDP Public Cloud
Configuring Access to Azure on CDP Private Cloud Base
ADLS Proxy Setup
▶︎
Performance and Scalability
Hierarchical namespaces vs. non-namespaces
Flush options
▶︎
Using ABFS from the CLI
Hadoop File System commands
Create a table in Hive
Accessing Azure Storage account container from spark-shell
Copying data with Hadoop DistCp
DistCp and Proxy Settings
ADLS Trash Folder Behavior
Troubleshooting ABFS
▶︎
Configuring Fault Tolerance
▶︎
High Availability on HDFS clusters
▶︎
Configuring HDFS High Availability
NameNode architecture
Preparing the hardware resources for HDFS High Availability
▶︎
Using Cloudera Manager to manage HDFS HA
Enabling HDFS HA
Prerequisites for enabling HDFS HA using Cloudera Manager
Enabling High Availability and automatic failover
Disabling and redeploying HDFS HA
▶︎
Configuring other CDP components to use HDFS HA
Configuring HBase to use HDFS HA
Configuring the Hive Metastore to use HDFS HA
Configuring Impala to work with HDFS HA
Configuring Oozie to use HDFS HA
Changing a nameservice name for Highly Available HDFS using Cloudera Manager
Manually failing over to the standby NameNode
Additional HDFS haadmin commands to administer the cluster
Turning safe mode on HA NameNodes
Converting from an NFS-mounted shared edits directory to Quorum-Based Storage
Administrative commands
▶︎
Configuring HDFS ACLs
HDFS ACLs
Configuring ACLs on HDFS
Using CLI commands to create and list ACLs
ACL examples
ACLs on HDFS features
Use cases for ACLs on HDFS
▶︎
Enable authorization for HDFS web UIs
Enable authorization for additional HDFS web UIs
Configuring HSTS for HDFS Web UIs
▶︎
Configuring Apache Kudu
▶︎
Configure Kudu processes
Experimental flags
Configuring the Kudu master
Configuring tablet servers
Rack awareness (Location awareness)
▶︎
Directory configurations
Changing directory configuration
▶︎
Managing Apache Kudu
▶︎
Limitations
Server management limitations
Cluster management limitations
Start and stop Kudu processes
▶︎
Orchestrate a rolling restart with no downtime
Minimize cluster disruption during planned downtime
▶︎
Kudu web interfaces
Kudu master web interface
Kudu tablet server web interface
Common web interface pages
Best practices when adding new tablet servers
Decommission or remove a tablet server
Use cluster names in the kudu command line tool
Migrate data on the same host
▶︎
Migrate to multiple Kudu masters
Prepare for the migration
Perform the migration
▶︎
Change master hostnames
Prepare for hostname changes
Perform hostname changes
▶︎
Remove Kudu masters
Prepare for removal
Perform the removal
▶︎
Run the tablet rebalancing tool
Run a tablet rebalancing tool on a rack-aware cluster
Run a tablet rebalancing tool in Cloudera Manager
▶︎
Managing Apache Kudu Security
Security considerations
Security limitations
▶︎
Authentication
Kudu authentication with Kerberos
Internal private key infrastructure (PKI)
Authentication tokens
Client authentication to secure Kudu clusters
Configuring custom Kerberos principal for Kudu
Coarse-grained authorization
▶︎
Fine-grained authorization
Apache Ranger integration
Authorization tokens
Trusted users
Configure Kudu's integration with Apache Ranger
Ranger client caching
Ranger policies for Kudu
Configuring TLS/SSL encryption for Kudu
Configuring TLS/SSL encryption for Kudu using Cloudera Manager
Redaction
▶︎
Configure a secure Kudu cluster using Cloudera Manager
Enable Kerberos authentication and RPC encryption
Configure coarse-grained authorization with ACLs
Enable Ranger authorization
Configure HTTPS encryption
Configure a secure Kudu cluster using flag safety valves
▶︎
Backing up and Recovering Apache Kudu
▶︎
Kudu backup
Back up tables
Backup tools
Generate a table list
Backup directory structure
Physical backups of an entire node
▶︎
Kudu recovery
Restore tables from backups
Recover from disk failure
Recover from full disks
Bring a tablet that has lost a majority of replicas back online
Rebuild a Kudu filesystem layout
▶︎
Developing Applications with Apache Kudu
View the API documentation
Kudu example applications
Kudu Python client
▶︎
Kudu integration with Spark
Spark integration known issues and limitations
Spark integration best practices
Upsert option in Kudu Spark
Use Spark with a secure Kudu cluster
Spark tuning
▶︎
Using Apache Impala with Apache Kudu
▶︎
Understanding Impala integration with Kudu
Impala database containment model
Internal and external Impala tables
Verifying the Impala dependency on Kudu
Impala integration limitations
▶︎
Using Impala to query Kudu tables
Query an existing Kudu table from Impala
Create a new Kudu table from Impala
Use CREATE TABLE AS SELECT
▶︎
Partitioning tables
Basic partitioning
Advanced partitioning
Non-covering range partitions
Partitioning guidelines
Optimize performance for evaluating SQL predicates
Insert data
INSERT and primary key uniqueness violations
Update data
Upsert a row
Alter a table
Delete data
Failures during INSERT, UPDATE, UPSERT, and DELETE operations
Drop a Kudu table
▶︎
Monitoring Apache Kudu
▶︎
Kudu metrics
Listing available metrics
Collecting metrics through HTTP
Diagnostics logging
Monitor cluster health with ksck
Report crashes using breakpad
Enable core dump
Use the Charts Library
▶︎
Compute
▶︎
Using YARN Web UI and CLI
Access the YARN Web User Interface
View Cluster Overview
View Nodes and Node Details
View Queues and Queue Details
▶︎
View All Applications
Search applications
View application details
UI Tools
Use the YARN CLI to View Logs for Applications
▶︎
Configuring Apache Hadoop YARN Security
Linux Container Executor
▶︎
Managing Access Control Lists
YARN ACL rules
YARN ACL syntax
▶︎
YARN ACL types
Admin ACLs
Queue ACLs
▶︎
Application ACLs
Application ACL evaluation
MapReduce Job ACLs
Spark Job ACLs
Application logs' ACLs
▶︎
Configure TLS/SSL for Core Hadoop Services
Configure TLS/SSL for HDFS
Configure TLS/SSL for YARN
Enable HTTPS communication
Configure Cross-Origin Support for YARN UIs and REST APIs
Configure YARN Security for Long-Running Applications
▶︎
YARN Ranger authorization support
YARN Ranger authorization support compatibility matrix
Enabling YARN Ranger authorization support
Disabling YARN Ranger authorization support
Enabling custom Kerberos principal support in YARN
Enabling custom Kerberos principal support in a Queue Manager cluster
▶︎
Configuring Apache Hadoop YARN High Availability
▶︎
YARN ResourceManager High Availability
YARN ResourceManager high availability architecture
Configure YARN ResourceManager high availability
Use the yarn rmadmin tool to administer ResourceManager high availability
▶︎
Work Preserving Recovery for YARN components
Configure work preserving recovery on ResourceManager
Configure work preserving recovery on NodeManager
Example: Configuration for work preserving recovery
▶︎
Managing and Allocating Cluster Resources using Capacity Scheduler
▶︎
Resource scheduling and management
YARN resource allocation of multiple resource-types
Hierarchical queue characteristics
Scheduling among queues
Application reservations
Resource distribution workflow
Resource allocation overview
▶︎
Use CPU scheduling
Configure CPU scheduling and isolation
Use CPU scheduling with distributed shell
▶︎
Use FPGA scheduling
Configure FPGA scheduling and isolation
Use FPGA with distributed shell
▶︎
Limit CPU usage with Cgroups
Use Cgroups
Enable Cgroups
▶︎
Manage queues
Prerequisite
Add queues using YARN Queue Manager UI
Configure cluster capacity with queues
Change resource allocation mode
Start and stop queues
Delete queues
▶︎
Configure scheduler properties at the global level
Set global maximum application priority
Configure preemption
Enable Intra-Queue preemption
Enabling LazyPreemption
Set global application limits
Set default Application Master resource limit
Enable asynchronous scheduler
Configuring queue mapping to use the user name from the application tag using Cloudera Manager
Configure NodeManager heartbeat
Configure data locality
▶︎
Configure per queue properties
Set user limits within a queue
Set Maximum Application limit for a specific queue
Set Application-Master resource-limit for a specific queue
Control access to queues using ACLs
Enable preemption for a specific queue
Enable Intra-Queue Preemption for a specific queue
Configure dynamic queue properties
▶︎
Set Ordering policies within a specific queue
Configure queue ordering policies
▶︎
Dynamic Queue Scheduling [Technical Preview]
Enabling the Dynamic Queue Scheduling feature
Creating a new Dynamic Configuration
Managing Dynamic Configurations
How to read the Schedule table
▶︎
Manage placement rules
Placement rule policies
How to read the Placement Rules table
▶︎
Create placement rules
Example - Placement rules creation
Reorder placement rules
Delete placement rules
Enable override of default queue mappings
▶︎
Manage dynamic queues
Managed Parent Queues
Converting a queue to a Managed Parent Queue
Enabling dynamic child creation in weight mode
Managing dynamic child creation enabled parent queues
Managing dynamically created child queues
Disabling auto queue deletion
Deleting dynamically created child queues
▶︎
Configure Partitions
Enabling node labels on a cluster to configure partition
Create partitions
Assign or unassign a node to a partition
View partitions
Associate partitions with queues
Disassociate partitions from queues
Delete partitions
Setting a default partition expression
Use partitions when submitting a job
Provide Read-only access to Queue Manager UI
▶︎
Managing Apache Hadoop YARN Services
Configure YARN Services API to Manage Long-running Applications
Configure YARN Services using Cloudera Manager
Configuring Node Attribute for Application Master Placement
Migrating database configuration to a new location
▶︎
Running YARN Services
Deploy and manage services on YARN
Launch a YARN service
Save a YARN service definition
▶︎
Create new YARN services using UI
Create a standard YARN service
Create a custom YARN service
Manage the YARN service life cycle through the REST API
YARN services API examples
▶︎
Configuring Apache Hadoop YARN Log Aggregation
YARN Log Aggregation Overview
Log Aggregation File Controllers
Configure Log Aggregation
Log Aggregation Properties
Configure Debug Delay
▶︎
Managing Apache ZooKeeper
Add a ZooKeeper service
Use multiple ZooKeeper services
Replace a ZooKeeper disk
Replace a ZooKeeper role with ZooKeeper service downtime
Replace a ZooKeeper role without ZooKeeper service downtime
Replace a ZooKeeper role on an unmanaged cluster
Confirm the election status of a ZooKeeper service
▶︎
Configuring Apache ZooKeeper
Enable the AdminServer
Configure four-letter-word commands in ZooKeeper
▶︎
Managing Apache ZooKeeper Security
▶︎
ZooKeeper Authentication
Configure ZooKeeper server for Kerberos authentication
Configure ZooKeeper client shell for Kerberos authentication
Verify the ZooKeeper authentication
Enable server-server mutual authentication
Use Digest Authentication Provider
Configure ZooKeeper TLS/SSL using Cloudera Manager
▶︎
ZooKeeper ACLs Best Practices
ZooKeeper ACLs Best Practices: Atlas
ZooKeeper ACLs Best Practices: Cruise Control
ZooKeeper ACLs Best Practices: HBase
ZooKeeper ACLs Best Practices: HDFS
ZooKeeper ACLs Best Practices: Kafka
ZooKeeper ACLs Best Practices: Oozie
ZooKeeper ACLs Best Practices: Ranger
ZooKeeper ACLs Best Practices: Search
ZooKeeper ACLs Best Practices: YARN
ZooKeeper ACLs Best Practices: ZooKeeper
▶︎
Data Access
▶︎
Using Data Analytics Studio
Compose queries
▶︎
Manage queries
Searching queries
Refining query search using filters
Saving the search results
Compare queries
▶︎
View query details
Viewing the query recommendations
Viewing the query details
Viewing the visual explain for a query
Viewing the Hive configurations for a query
Viewing the query timeline
Viewing the task-level DAG information
Viewing the DAG flow
Viewing the DAG counters
Viewing the Tez configurations for a query
▶︎
Manage databases and tables
Using the Database Explorer
Searching tables
Managing tables
Creating tables
Uploading tables
Editing tables
Deleting tables
Managing columns
Managing partitions
Viewing storage information
Viewing detailed information
Viewing table and column statistics
Previewing tables using Data Preview
▶︎
Manage reports
Viewing the Read and Write report
Viewing the Join report
▶︎
Working with Apache Hive Metastore
HMS table storage
Configuring HMS for high availability
▶︎
Starting Apache Hive
Starting Hive on an insecure cluster
Starting Hive using a password
Accessing Hive from an external node
Running a Hive command
▶︎
Using Apache Hive
▶︎
Apache Hive 3 tables
Hive table locations
Refer to a table using dot notation
Creating a CRUD transactional table
Creating an insert-only transactional table
Creating an S3-based external table
Dropping an external table along with data
Converting a managed non-transactional table to external
▶︎
Accessing StorageHandler and other external tables
Creating secure external tables
Check for required Ranger features in Data Hub
Enable authorization of StorageHandler-based tables in Data Hub
Examples of creating secure external tables
Using constraints
Determining the table type
Apache Hive 3 ACID transactions
▶︎
Apache Hive query basics
Querying the information_schema database
Inserting data into a table
Updating data in a table
Merging data in tables
Deleting data from a table
▶︎
Using a subquery
Subquery restrictions
Use wildcards with SHOW DATABASES
Aggregating and grouping data
Querying correlated data
▶︎
Using common table expressions
Use a CTE in a query
Comparing tables using ANY/SOME/ALL
Escaping an invalid identifier
CHAR data type support
ORC vs Parquet formats
Creating a default directory for managed tables
Generating surrogate keys
▶︎
Partitions and performance
Creating partitions dynamically
▶︎
Partition refresh and configuration
Automating partition discovery and repair
Repairing partitions manually using MSCK repair
Managing partition retention time
▶︎
Query scheduling
Enabling scheduled queries
Periodically rebuilding a materialized view
Getting scheduled query information and monitoring the query
▶︎
Materialized views
▶︎
Creating and using a materialized view
Creating the tables and view
Verifying use of a query rewrite
Using optimizations from a subquery
Dropping a materialized view
Showing materialized views
Describing a materialized view
Managing query rewrites
Purposely using a stale materialized view
Creating and using a partitioned materialized view
▶︎
CDW stored procedures
Setting up a CDW client
Creating a function
Using the cursor to return record sets
HPL/SQL examples
▶︎
Using functions
Reloading, viewing, and filtering functions
▶︎
Create a user-defined function
Setting up the development environment
Creating the UDF class
Building the project and uploading the JAR
Registering the UDF
Calling the UDF in a query
▶︎
Managing Apache Hive
▶︎
ACID operations
Viewing transactions
Viewing transaction locks
▶︎
Data compaction
Compaction prerequisites
Compaction tasks
Initiating automatic compaction in Cloudera Manager
Starting compaction manually
Viewing compaction progress
Disabling automatic compaction
Configuring compaction using table properties
Configuring compaction in Cloudera Manager
Configuring the compaction check interval
Compactor properties
▶︎
Query vectorization
Vectorization default
▶︎
Configuring Apache Hive
▶︎
Generating statistics
Generating and viewing Apache Hive statistics
Statistics generation and viewing commands
▶︎
Securing Apache Hive
Transactional table access
External table access
▶︎
Hive authentication
Securing HiveServer using LDAP
Client connections to HiveServer
JDBC connection string syntax
Securing an endpoint under AutoTLS
Token-based authentication for Cloudera Data Warehouse integrations
▶︎
Signing on and running queries
Acquiring an integration token
Using the Passcode token
ODBC sign-on walkthrough
▶︎
Integrating Apache Hive with Apache Spark and BI
▶︎
Hive Warehouse Connector for accessing Apache Spark data
Set up
HWC limitations
▶︎
Reading data through HWC
Direct Reader mode introduction
Using Direct Reader mode
Direct Reader configuration properties
Direct Reader limitations
JDBC read mode introduction
Using JDBC read mode
JDBC mode configuration properties
JDBC mode limitations
Kerberos configurations for HWC
Writing data through HWC
Apache Spark executor task statistics
▶︎
HWC and DataFrame APIs
HWC and DataFrame API limitations
HWC supported types mapping
Catalog operations
Read and write operations
Committing a transaction for Direct Reader
Closing HiveWarehouseSession operations
Using HWC for streaming
HWC API Examples
Hive Warehouse Connector Interfaces
Submitting a Scala or Java application
▶︎
HWC integration with pyspark, sparklyr, and Zeppelin
Submitting a Python app
Reading and writing Hive tables in R
Livy interpreter configuration
Reading and writing Hive tables in Zeppelin
▶︎
Apache Hive-Kafka integration
Creating a table for a Kafka stream
▶︎
Querying Kafka data
Querying live data from Kafka
Perform ETL by ingesting data from Kafka into Hive
▶︎
Writing data to Kafka
Writing transformed Hive data to Kafka
Setting consumer and producer table properties
Kafka storage handler and table properties
▶︎
Connecting Hive to BI tools using a JDBC/ODBC driver
Getting the JDBC or ODBC driver in Data Hub
Integrating Hive and a BI tool
▶︎
Apache Hive Performance Tuning
Query results cache
Best practices for performance tuning
▶︎
ORC file format
Advanced ORC properties
Performance improvement using partitions
Hive low-latency analytical processing
Bucketed tables in Hive
▶︎
Migrating Data Using Sqoop
Data migration to Apache Hive
▶︎
Imports into Hive
Creating a Sqoop import command
Importing RDBMS data into Hive
Import command options
▶︎
Starting and Stopping Apache Impala
Modifying Impala Startup Options
▶︎
Configuring Client Access to Impala
▶︎
Impala Shell Tool
Impala Shell Configuration Options
Impala Shell Configuration File
Connecting to Impala Daemon in Impala Shell
Running Commands and SQL Statements in Impala Shell
Impala Shell Command Reference
Configuring ODBC for Impala
Configuring JDBC for Impala
Configuring Impyla for Impala
Configuring Delegation for Clients
Spooling Query Results
Shut Down Impala
▶︎
Setting Timeouts in Impala
Setting Timeout and Retries for Thrift Connections to Backend Client
Increasing StateStore Timeout
Setting the Idle Query and Idle Session Timeouts
▶︎
Securing Apache Impala
▶︎
Securing Impala
Configuring Impala TLS/SSL
▶︎
Impala Authentication
Configuring Kerberos Authentication
▶︎
Configuring LDAP Authentication
Enabling LDAP in Hue
Enabling LDAP Authentication for impala-shell
▶︎
Impala Authorization
Configuring Authorization
Row-level filtering in Impala with Ranger policies
▶︎
Configuring Apache Impala
Configuring Impala
▶︎
Tuning Apache Impala
Setting Up HDFS Caching
Setting up Data Cache for Remote Reads
Configuring Dedicated Coordinators and Executors
▶︎
Managing Apache Impala
▶︎
ACID Operation
Concepts Used in FULL ACID v2 Tables
Key Differences between INSERT-ONLY and FULL ACID Tables
Compaction of Data in FULL ACID Transactional Table
▶︎
Managing Resources in Impala
Admission Control and Query Queuing
Enabling Admission Control
Creating Static Pools
Configuring Dynamic Resource Pool
Dynamic Resource Pool Settings
Admission Control Sample Scenario
Cancelling a Query
Using HLL Datasketch Algorithms in Impala
Using KLL Datasketch Algorithms in Impala
▶︎
Managing Metadata in Impala
On-demand Metadata
Automatic Invalidation of Metadata Cache
▶︎
Automatic Invalidation/Refresh of Metadata
Configuring Event Based Automatic Metadata Sync
▶︎
Monitoring Apache Impala
▶︎
Impala Logs
Managing Logs
Impala lineage
▶︎
Web User Interface for Debugging
Debug Web UI for Impala Daemon
Debug Web UI for StateStore
Debug Web UI for Catalog Server
Configuring Impala Web UI
▶︎
Using Hue
Using Hue
Using SQL to query HBase from Hue
Querying existing HBase tables
Enabling the SQL editor autocompleter
▶︎
Using governance-based data discovery
Defining metadata tags
Searching metadata tags
▶︎
Using Amazon S3 with Hue
Enabling S3 browser for Hue configured with IDBroker
Enabling S3 browser for Hue configured without IDBroker
Populating an S3 bucket
Creating a table from an Amazon S3 file
Exporting query results to Amazon S3
▶︎
Using Azure Data Lake Storage Gen2 with Hue
Enabling ABFS file browser for Hue configured with IDBroker
Enabling ABFS file browser for Hue configured without IDBroker
Granting permission to access S3 and ABFS File Browser in Hue
Supported special characters
▶︎
Administering Hue
Reference architecture
Hue configuration files
Hue Advanced Configuration Snippet
Hue logs
Hue supported browsers
▶︎
Customizing the Hue web UI
Adding a custom banner
Changing the page logo
Setting the cache timeout
Enabling or disabling anonymous usage data collection
Enabling Hue applications with Cloudera Manager
Running shell commands
Downloading and exporting data from Hue
Enabling a multi-threaded environment for Hue
▶︎
Securing Hue
▶︎
User management in Hue
Understanding Hue users and groups
Finding the list of Hue superusers
Creating a Hue user
Creating a group in Hue
Managing Hue permissions
Resetting Hue user password
Assigning superuser status to an LDAP user
▶︎
User authentication in Hue
Authentication using Kerberos
▶︎
Authentication using LDAP
Import and sync LDAP users and groups
Configuring authentication with LDAP and Search Bind
Configuring authentication with LDAP and Direct Bind
Multi-server LDAP/AD authentication
Testing the LDAP configuration
Configuring group permissions
Enabling LDAP authentication with HiveServer2 and Impala
LDAP properties
Configuring LDAP on unmanaged clusters
▶︎
Authentication using SAML
Configuring SAML authentication on managed clusters
Manually configuring SAML authentication
Integrating your identity provider's SAML server with Hue
SAML properties
Troubleshooting SAML authentication
Applications and permissions reference
Securing Hue passwords with scripts
▶︎
Configuring TLS/SSL for Hue
Creating a truststore file in PEM format
Configuring Hue as a TLS/SSL client
Enabling Hue as a TLS/SSL client
Configuring Hue as a TLS/SSL server
Enabling Hue as a TLS/SSL server using Cloudera Manager
Enabling TLS/SSL for Hue Load Balancer
Enabling TLS/SSL communication with HiveServer2
Enabling TLS/SSL communication with Impala
Securing database connections with TLS/SSL
Enforcing TLS version 1.2 for Hue
Securing sessions
Specifying HTTP request methods
Restricting supported ciphers for Hue
Specifying domains or pages to which Hue can redirect users
Setting Oozie permissions
Configuring secure access between Solr and Hue
▶︎
Tuning Hue
Adding a load balancer
▶︎
Configuring high availability for Hue
Configuring Hive and Impala for high availability with Hue
Configuring for HDFS high availability
▶︎
Search Tutorial
Tutorial
▶︎
Validating the Cloudera Search deployment
Create a test collection
Index sample data
Query sample data
▶︎
Indexing sample tweets with Cloudera Search
Create a collection for tweets
Copy sample tweets to HDFS
▶︎
Using MapReduce batch indexing to index sample Tweets
Batch indexing into offline Solr shards
▶︎
Securing Cloudera Search
Cloudera Search security aspects
Enable LDAP authentication in Solr
Creating a JAAS configuration file
Manage Ranger authorization in Solr
Configuring Ranger authorization
Enable document-level authorization
▶︎
Tuning Cloudera Search
Solr server tuning categories
Setting Java system properties for Solr
Enable multi-threaded faceting
Tuning garbage collection
Enable garbage collector logging
Solr and HDFS - the block cache
▶︎
Tuning replication
Adjust the Solr replication factor for index files stored in HDFS
▶︎
Managing Cloudera Search
▶︎
Cloudera Search log files
Viewing and modifying log levels for Search and related services
▶︎
Viewing and modifying Search configuration using Cloudera Manager
Cloudera Search configuration files
▶︎
Managing collection configuration
Cloudera Search config templates
Generating collection configuration using configs
Securing configs with ZooKeeper ACLs and Ranger
Generating Solr collection configuration using instance directories
Modifying a collection configuration generated using an instance directory
Converting instance directories to configs
Using custom JAR files with Search
▶︎
Managing collections
Creating a Solr collection
Viewing existing collections
Deleting all documents in a collection
Deleting a collection
Updating the schema in a collection
Creating a replica of an existing shard
Migrating Solr replicas
Backing up a collection from HDFS
Backing up a collection from local file system
Restoring a collection
Defining a backup target in solr.xml
▶︎
Cloudera Search ETL
Using Morphlines to index Avro
Using Morphlines with Syslog
▶︎
Indexing Data Using Morphlines
Indexing data
▶︎
Lily HBase NRT indexing
Enable cluster-wide HBase replication
Adding the Lily HBase indexer service
Starting the Lily HBase NRT indexer service
▶︎
Using the Lily HBase NRT indexer service
Enable replication on HBase column families
Create a Collection in Cloudera Search
Creating a Lily HBase Indexer Configuration File
Creating a Morphline Configuration File
Understanding the extractHBaseCells Morphline Command
Registering a Lily HBase Indexer Configuration with the Lily HBase Indexer Service
Verifying that Indexing Works
Using the indexer HTTP interface
▶︎
Configuring Lily HBase Indexer Security
Configure Lily HBase Indexer to use TLS/SSL
Configure Lily HBase Indexer Service to use Kerberos authentication
▶︎
Batch indexing using Morphlines
Spark indexing using morphlines
▶︎
MapReduce indexing
▶︎
MapReduceIndexerTool
MapReduceIndexerTool input splits
MapReduceIndexerTool metadata
MapReduceIndexerTool usage syntax
Indexing data with MapReduceIndexerTool in Solr backup format
▶︎
Lily HBase batch indexing for Cloudera Search
Populating an HBase Table
Create a Collection in Cloudera Search
Creating a Lily HBase Indexer Configuration File
Creating a Morphline Configuration File
Understanding the extractHBaseCells Morphline Command
Running the HBaseMapReduceIndexerTool
HBaseMapReduceIndexerTool command line reference
Using --go-live with SSL or Kerberos
Understanding --go-live and HDFS ACLs
▶︎
Indexing Data Using Spark-Solr Connector
▶︎
Batch indexing to Solr using SparkApp framework
Create indexer Maven project
Run the spark-submit job
▶︎
Operational Database
▶︎
Overview
Operational database cluster
Before you create an operational database cluster
▶︎
Creating an operational database cluster
Default operational database cluster definition
Provision an operational database cluster
▶︎
Configuring Apache HBase
Using DNS with HBase
Use the Network Time Protocol (NTP) with HBase
Configure the graceful shutdown timeout property
▶︎
Setting user limits for HBase
Configure ulimit for HBase using Cloudera Manager
Configuring ulimit for HBase
Configure ulimit with Pluggable Authentication Modules using the Command Line
Using dfs.datanode.max.transfer.threads with HBase
Configure encryption in HBase
▶︎
Using hedged reads
Enable hedged reads for HBase
Monitor the performance of hedged reads
▶︎
Understanding HBase garbage collection
Configure HBase garbage collection
Disable the BoundedByteBufferPool
Configure the HBase canary
Configuring auto split policy in an HBase table
▶︎
Using HBase blocksize
Configure the blocksize for a column family
▶︎
Configuring HBase BlockCache
Contents of the BlockCache
Size the BlockCache
Decide to use the BucketCache
▶︎
About the Off-heap BucketCache
Off-heap BucketCache
BucketCache IO engine
Configure BucketCache IO engine
Configure the off-heap BucketCache using Cloudera Manager
Configure the off-heap BucketCache using the command line
Cache eviction priorities
Bypass the BlockCache
Monitor the BlockCache
▶︎
Using quota management
Configuring quotas
General Quota Syntax
▶︎
Throttle quotas
Throttle quota examples
Space quotas
Quota enforcement
Quota violation policies
▶︎
Impact of quota violation policy
Live write access
Bulk Write Access
Read access
Metrics and Insight
Examples of overlapping quota policies
Number-of-Tables Quotas
Number-of-Regions Quotas
▶︎
Using HBase scanner heartbeat
Configure the scanner heartbeat using Cloudera Manager
▶︎
Storing medium objects (MOBs)
Prerequisites
Configure columns to store MOBs
Configure the MOB cache using Cloudera Manager
Test MOB storage and retrieval performance
MOB cache properties
▶︎
Limiting the speed of compactions
Configure the compaction speed using Cloudera Manager
Enable HBase indexing
▶︎
Using HBase coprocessors
Add a custom coprocessor
Disable loading of coprocessors
▶︎
Configuring HBase MultiWAL
Configuring MultiWAL support using Cloudera Manager
▶︎
Configuring the storage policy for the Write-Ahead Log (WAL)
Configure the storage policy for WALs using Cloudera Manager
Configure the storage policy for WALs using the Command Line
▶︎
Using RegionServer grouping
Enable RegionServer grouping using Cloudera Manager
Configure RegionServer grouping
Monitor RegionServer grouping
Remove a RegionServer from RegionServer grouping
Enabling ACL for RegionServer grouping
Best practices when using RegionServer grouping
Disable RegionServer grouping
▶︎
Optimizing HBase I/O
HBase I/O components
Advanced configuration for write-heavy workloads
▶︎
Managing Apache HBase Security
▶︎
HBase authentication
Configuring HBase servers to authenticate with a secure HDFS cluster
Configuring secure HBase replication
Configure the HBase client TGT renewal period
Disabling Kerberos authentication for HBase clients
HBase authorization
▶︎
Configuring TLS/SSL for HBase
Prerequisites to configure TLS/SSL for HBase
Configuring TLS/SSL for HBase Web UIs
Configuring TLS/SSL for HBase REST Server
Configuring TLS/SSL for HBase Thrift Server
Configuring HSTS for HBase Web UIs
▶︎
Accessing Apache HBase
▶︎
Use the HBase shell
Virtual machine options for HBase Shell
Script with HBase Shell
Use the HBase command-line utilities
Use the HBase APIs for Java
▶︎
Use the HBase REST server
Installing the REST Server using Cloudera Manager
Using the REST API
Using the REST proxy API
Use the Apache Thrift Proxy API
▶︎
Using Apache HBase Hive integration
Configure Hive to use with HBase
Using HBase Hive integration
▶︎
Using the HBase-Spark connector
Configure HBase-Spark connector
Example: Using the HBase-Spark connector
▶︎
Use the Hue HBase app
Configure the HBase thrift server role
▶︎
Managing Apache HBase
▶︎
Starting and stopping HBase using Cloudera Manager
Start HBase
Stop HBase
▶︎
Graceful HBase shutdown
Gracefully shut down an HBase RegionServer
Gracefully shut down the HBase service
▶︎
Importing data into HBase
Choose the right import method
Use snapshots
Use CopyTable
▶︎
Use BulkLoad
Use cases for BulkLoad
Use cluster replication
Use Sqoop
Use Spark
Use a custom MapReduce job
▶︎
Use HashTable and SyncTable Tool
HashTable/SyncTable tool configuration
Synchronize table data using HashTable/SyncTable tool
▶︎
Writing data to HBase
Variations on Put
Versions
Deletion
Examples
▶︎
Reading data from HBase
Perform scans using HBase Shell
▶︎
HBase filtering
Dynamically loading a custom filter
Logical operators, comparison operators and comparators
Compound operators
Filter types
HBase Shell example
Java API example
HBase online merge
Move HBase Master Role to another host
Expose HBase metrics to a Ganglia server
▶︎
Configuring Apache HBase High Availability
Enable HBase high availability using Cloudera Manager
HBase read replicas
Timeline consistency
Keep replicas current
Read replica properties
Configure read replicas using Cloudera Manager
▶︎
Using rack awareness for read replicas
Create a topology map
Create a topology script
Activate read replicas on a table
Request a timeline-consistent read
▶︎
Using Apache HBase Backup and Disaster Recovery
HBase backup and disaster recovery strategies
▶︎
Configuring HBase snapshots
About HBase snapshots
▶︎
Manage HBase snapshots using COD CLI
Create a snapshot
List snapshots
Restore a snapshot
List restored snapshots
Delete snapshots
▶︎
Manage HBase snapshots using the HBase shell
Shell commands
Take a snapshot using a shell script
Export a snapshot to another cluster
Information and debugging
▶︎
Using HBase replication
Common replication topologies
Notes about replication
Replication requirements
▶︎
Deploy HBase replication
Replication across three or more clusters
Enable replication on a specific table
Configure secure replication
▶︎
Configure bulk load replication
Enable bulk load replication using Cloudera Manager
Create empty table on the destination cluster
Disable replication at the peer level
Stop replication in an emergency
▶︎
Initiate replication when data already exist
Replicate pre-existing data in an active-active deployment
Using the CldrCopyTable utility to copy data
Effects of WAL rolling on replication
Configuring secure HBase replication
Restore data from a replica
Verify that replication works
Replication caveats
▶︎
Configuring Apache HBase for Apache Phoenix
Configure HBase for use with Phoenix
▶︎
Using Apache Phoenix to Store and Access Data
▶︎
Mapping Apache Phoenix schemas to Apache HBase namespaces
Enable namespace mapping
▶︎
Associating tables of a schema to a namespace
Associate table in a customized Kerberos environment
Associate a table in a non-customized environment without Kerberos
▶︎
Using secondary indexing
Use strongly consistent indexing
Migrate to strongly consistent indexing
▶︎
Using transactions
Configure transaction support
Use transactions with tables
▶︎
Using JDBC API
Connecting to PQS using JDBC
Connect to Phoenix Query Server
Connect to Phoenix Query Server through Apache Knox
Using non-JDBC drivers
▶︎
Using Apache Phoenix-Spark connector
Configure Phoenix-Spark connector
Phoenix-Spark connector usage examples
▶︎
Using Apache Phoenix-Hive connector
Configure Phoenix-Hive connector
Apache Phoenix-Hive usage examples
Limitations of Phoenix-Hive connector
▶︎
Managing Apache Phoenix Security
Managing Apache Phoenix security
Enable Phoenix ACLs
Configure TLS encryption manually for Phoenix Query Server
▶︎
Data Engineering
▶︎
Configuring Apache Spark
▶︎
Configuring dynamic resource allocation
Customize dynamic resource allocation settings
Configure a Spark job for dynamic resource allocation
Dynamic resource allocation properties
▶︎
Spark security
Enabling Spark authentication
Enabling Spark Encryption
Running Spark applications on secure clusters
Configuring HSTS for Spark
Accessing compressed files in Spark
▶︎
Developing Apache Spark Applications
Introduction
Spark application model
Spark execution model
Developing and running an Apache Spark WordCount application
Using the Spark DataFrame API
▶︎
Building Spark Applications
Best practices for building Apache Spark applications
Building reusable modules in Apache Spark applications
Packaging different versions of libraries with an Apache Spark application
▶︎
Using Spark SQL
SQLContext and HiveContext
Querying files into a DataFrame
Spark SQL example
Interacting with Hive views
Performance and storage considerations for Spark SQL DROP TABLE PURGE
TIMESTAMP compatibility for Parquet files
Accessing Spark SQL through the Spark shell
Calling Hive user-defined functions (UDFs)
▶︎
Using Spark Streaming
Spark Streaming and Dynamic Allocation
Spark Streaming Example
Enabling fault-tolerant processing in Spark Streaming
Configuring authentication for long-running Spark Streaming jobs
Building and running a Spark Streaming application
Sample pom.xml file for Spark Streaming with Kafka
▶︎
Accessing external storage from Spark
▶︎
Accessing data stored in Amazon S3 through Spark
Examples of accessing Amazon S3 data from Spark
Accessing Hive from Spark
Accessing HDFS Files from Spark
▶︎
Accessing ORC Data in Hive Tables
Accessing ORC files from Spark
Predicate push-down optimization
Loading ORC data into DataFrames using predicate push-down
Optimizing queries using partition pruning
Enabling vectorized query execution
Reading Hive ORC tables
Accessing Avro data files from Spark SQL applications
Accessing Parquet files from Spark SQL applications
▶︎
Using Spark MLlib
Running a Spark MLlib example
Enabling Native Acceleration For MLlib
Using custom libraries with Spark
▶︎
Running Apache Spark Applications
Introduction
Running Spark 3 Applications
Running your first Spark application
Running sample Spark applications
▶︎
Configuring Spark Applications
Configuring Spark application properties in spark-defaults.conf
Configuring Spark application logging properties
▶︎
Submitting Spark applications
spark-submit command options
Spark cluster execution overview
Canary test for pyspark command
Fetching Spark Maven dependencies
Accessing the Spark History Server
▶︎
Running Spark applications on YARN
Spark on YARN deployment modes
Submitting Spark Applications to YARN
Monitoring and Debugging Spark Applications
Example: Running SparkPi on YARN
Configuring Spark on YARN Applications
Dynamic allocation
▶︎
Submitting Spark applications using Livy
Using the Livy API to run Spark jobs
▶︎
Running an interactive session with the Livy REST API
Livy objects for interactive sessions
Setting Python path variables for Livy
Livy API reference for interactive sessions
▶︎
Submitting batch applications using the Livy REST API
Livy batch object
Livy API reference for batch jobs
Submitting a Spark job to a Data Hub cluster using Livy
Configuring the Livy Thrift Server
Connecting to the Apache Livy Thrift Server
Using Livy with Spark
Using Livy with interactive notebooks
▶︎
Using PySpark
Running PySpark in a virtual environment
Running Spark Python applications
Automating Spark Jobs with Oozie Spark Action
▶︎
Tuning Apache Spark
Introduction
Check Job Status
Check Job History
Improving Software Performance
▶︎
Tuning Apache Spark Applications
Tuning Spark Shuffle Operations
Choosing Transformations to Minimize Shuffles
When Shuffles Do Not Occur
When to Add a Shuffle Transformation
Secondary Sort
Tuning Resource Allocation
Resource Tuning Example
Tuning the Number of Partitions
Reducing the Size of Data Structures
Choosing Data Formats
▶︎
Configuring Apache Zeppelin
Introduction
Configuring Livy
Configure User Impersonation for Access to Hive
Configure User Impersonation for Access to Phoenix
▶︎
Enabling Access Control for Zeppelin Elements
Enable Access Control for Interpreter, Configuration, and Credential Settings
Enable Access Control for Notebooks
Enable Access Control for Data
▶︎
Shiro Settings: Reference
Active Directory Settings
LDAP Settings
General Settings
shiro.ini Example
▶︎
Using Apache Zeppelin
Introduction
Launch Zeppelin
▶︎
Working with Zeppelin Notes
Create and Run a Note
Import a Note
Export a Note
Using the Note Toolbar
Import External Packages
▶︎
Configuring and Using Zeppelin Interpreters
Modify interpreter settings
Using Zeppelin Interpreters
Customize interpreter settings in a note
Use the JDBC interpreter to access Hive
Use the JDBC interpreter to access Phoenix
Use the Livy interpreter to access Spark
Using Spark Hive Warehouse and HBase Connector Client .jar files with Livy
▶︎
Security
▶︎
Apache Ranger Auditing
Audit Overview
▶︎
Managing Auditing with Ranger
View audit details
Create a read-only Admin user (Auditor)
Update Ranger audit configuration parameters
Ranger Audit Filters
Changing Ranger audit storage location and migrating data
▶︎
Apache Ranger Authorization
Using Ranger to Provide Authorization in CDP
▶︎
Ranger Policies Overview
Ranger tag-based policies
Tags and policy evaluation
Ranger access conditions
▶︎
Using the Ranger Console
Accessing the Ranger console
Ranger console navigation
Configure session timeout for Ranger Admin Web UI
▶︎
Resource-based Services and Policies
▶︎
Configuring resource-based services
Configure a resource-based service: ADLS
Configure a resource-based service: Atlas
Configure a resource-based service: HBase
Configure a resource-based service: HDFS
Configure a resource-based service: Hive
Configure a resource-based service: Kafka
Configure a resource-based service: Knox
Configure a resource-based service: NiFi
Configure a resource-based service: NiFi Registry
Configure a resource-based service: S3
Configure a resource-based service: Solr
Configure a resource-based service: YARN
▶︎
Configuring resource-based policies
Configure a resource-based policy: ADLS
Configure a resource-based policy: Atlas
Configure a resource-based policy: HBase
Configure a resource-based policy: HDFS
Configure a resource-based policy: HadoopSQL
Configure a resource-based storage handler policy: HadoopSQL
Configure a resource-based policy: Kafka
Configure a resource-based policy: Knox
Configure a resource-based policy: NiFi
Configure a resource-based policy: NiFi Registry
Configure a resource-based policy: S3
Configure a resource-based policy: Solr
Configure a resource-based policy: YARN
Wildcards and variables in resource-based policies
Preloaded resource-based services and policies
▶︎
Importing and exporting resource-based policies
Import resource-based policies for a specific service
Import resource-based policies for all services
Export resource-based policies for a specific service
Export all resource-based policies for all services
▶︎
Row-level filtering and column masking in Hive
Row-level filtering in Hive with Ranger policies
Dynamic resource-based column masking in Hive with Ranger policies
Dynamic tag-based column masking in Hive with Ranger policies
▶︎
Tag-based Services and Policies
Adding a tag-based service
▶︎
Adding tag-based policies
Using tag attributes and values in Ranger tag-based policy conditions
Adding a tag-based PII policy
Default EXPIRES ON tag policy
▶︎
Importing and exporting tag-based policies
Import tag-based policies
Export tag-based policies
Create a time-bound policy
▶︎
Ranger Security Zones
Overview
Adding a Ranger security zone
▶︎
Administering Ranger Users, Groups, Roles, and Permissions
Add a user
Edit a user
Delete a user
Add a group
Edit a group
Delete a group
Add a role through Ranger
Add a role through Hive
Edit a role
Delete a role
Add or edit permissions
▶︎
Administering Ranger Reports
View Ranger reports
Search Ranger reports
Export Ranger reports
Using Ranger client libraries
Using session cookies to validate Ranger policies
▶︎
Configuring Ranger Authentication with UNIX, LDAP, or AD
▶︎
Configuring Ranger Authentication with UNIX, LDAP, AD, or PAM
Configure Ranger authentication for UNIX
Configure Ranger authentication for AD
Configure Ranger authentication for LDAP
Configure Ranger authentication for PAM
▶︎
Ranger AD Integration
Ranger UI authentication
Ranger UI authorization
Ranger Usersync
Ranger user management
Configure Ranger Usersync for Deleted Users and Groups
▶︎
How to manage log rotation for Ranger Services
Managing logging properties for Ranger services
▶︎
Apache Knox Authentication
▶︎
Apache Knox Overview
Securing Access to Hadoop Cluster: Apache Knox
Apache Knox Gateway Overview
Knox Supported Services Matrix
Proxy Cloudera Manager through Apache Knox
▶︎
Installing Apache Knox
Apache Knox Install Role Parameters
▶︎
Governance
▶︎
Searching with Metadata
Searching overview
Using Basic Search
Using Search filters
Using Free-text Search
▶︎
Ignore or Prune pattern to filter Hive metadata entities
How the Ignore and Prune feature works
Using Ignore and Prune patterns
Saving searches
Using advanced search
▶︎
Working with Classifications and Labels
Working with Atlas classifications and labels
Creating classifications
Creating labels
Adding attributes to classifications
Associating classifications with entities
Propagating classifications through lineage
Searching for entities using classifications
▶︎
Exploring using Lineage
Lineage overview
Viewing lineage
Lineage lifecycle
▶︎
Leveraging Business Metadata
Business Metadata overview
Creating Business Metadata
Adding attributes to Business Metadata
Associating Business Metadata attributes with entities
Importing Business Metadata associations in bulk
Searching for entities using Business Metadata attributes
▶︎
Managing Business Terms with Atlas Glossaries
Glossaries overview
Creating glossaries
Creating terms
Associating terms with entities
Defining related terms
Creating categories
Assigning terms to categories
Searching using terms
▶︎
Importing Glossary terms in bulk
Enhancements related to bulk glossary terms import
▶︎
Setting up Atlas High Availability
About Atlas High Availability
Prerequisites for setting up Atlas HA
Installing Atlas in HA
▶︎
Auditing Atlas Entities
▶︎
Audit Operations
Atlas Type Definitions
Atlas Export and Import Operations
Atlas Server Operations
Audit enhancements
Examples of Audit Operations
▶︎
Securing Atlas
Securing Atlas
Configuring TLS/SSL for Apache Atlas
▶︎
Configuring Atlas Authentication
Configure Kerberos authentication for Apache Atlas
Configure Atlas authentication for AD
Configure Atlas authentication for LDAP
Configure Atlas PAM authentication
Configure Atlas file-based authentication
▶︎
Configuring Atlas Authorization
Restricting classifications based on user permission
Configuring Ranger Authorization for Atlas
Configuring Atlas Authorization using Ranger
Configuring Simple Authorization in Atlas
▶︎
Configuring Atlas using Cloudera Manager
▶︎
Configuring and Monitoring Atlas
Showing Atlas Server status
Accessing Atlas logs
▶︎
Migrating Data from Cloudera Navigator to Atlas
Transitioning Navigator content to Atlas
▶︎
Extracting S3 Metadata using Atlas
▶︎
Amazon S3 metadata collection
Accessing AWS
AWS objects and inferred hierarchy
AWS object lifecycle
▶︎
AWS configuration
To configure an SQS queue suitable for Atlas extraction
To configure an S3 bucket to publish events
▶︎
S3 Extractor configuration
Prerequisites
Configure credentials for Atlas extraction
Extraction Command
Extractor configuration properties
Defining what assets to extract metadata for
Running bulk extraction
Running incremental extraction
Logging Extractor Activity
S3 actions that produce or update Atlas entities
▶︎
S3 entities created in Atlas
AWS S3 Base
AWS S3 Container
AWS S3 Contained
AWS S3 Bucket
AWS S3 Object
▶︎
AWS S3 Directory
▶︎
S3 relationships
Example of Atlas S3 Lineage
S3 entity audit entries
▶︎
Extracting ADLS Metadata using Atlas
Before you start
Introduction to Atlas ADLS Extractor
Terminologies
Extraction Prerequisites
Updating Extractor Configuration with ADLS Authentication
Configuring ADLS Gen2 Storage Queue
▶︎
Setting up Azure Managed Identity for Extraction
Creating Managed Identity
Assigning Roles for the Managed Identities
Mapping Atlas Identity to CDP users
Running ADLS Metadata Extractor
Running Bulk Extraction
Running Incremental Extraction
Command-line options to run Extraction
Extraction Configuration
Verifying Atlas for the extracted data
Resources for on-boarding Azure for CDP users
▶︎
Configuring Oozie
Overview of Oozie
Adding the Oozie service using Cloudera Manager
Considerations for Oozie to work with AWS
▶︎
Redeploying the Oozie ShareLib
Redeploying the Oozie sharelib using Cloudera Manager
▶︎
Oozie configurations with CDP services
▶︎
Using Sqoop actions with Oozie
Deploying and configuring Oozie Sqoop1 Action JDBC drivers
Configuring Oozie Sqoop1 Action workflow JDBC drivers
Configuring Oozie to enable MapReduce jobs to read or write from Amazon S3
Configuring Oozie to use HDFS HA
Using Hive Warehouse Connector with Oozie Spark action
▶︎
Oozie High Availability
Requirements for Oozie High Availability
▶︎
Configuring Oozie High Availability using Cloudera Manager
Oozie Load Balancer configuration
Enabling Oozie High Availability
Disabling Oozie High Availability
▶︎
Scheduling in Oozie using cron-like syntax
Oozie scheduling examples
▶︎
Configuring an external database for Oozie
Configuring PostgreSQL for Oozie
Configuring MariaDB for Oozie
Configuring MySQL for Oozie
Configuring Oracle for Oozie
▶︎
Working with the Oozie server
Starting the Oozie server
Stopping the Oozie server
Accessing the Oozie server with the Oozie Client
Accessing the Oozie server with a browser
Adding schema to Oozie using Cloudera Manager
Enabling the Oozie web console on managed clusters
Enabling Oozie SLA with Cloudera Manager
▶︎
Oozie database configurations
Configuring Oozie data purge settings using Cloudera Manager
Loading the Oozie database
Dumping the Oozie database
Setting the Oozie database timezone
Prerequisites for configuring TLS/SSL for Oozie
Configure TLS/SSL for Oozie
Oozie security enhancements
Additional considerations when configuring TLS/SSL for Oozie HA
Configure Oozie client when TLS/SSL is enabled
Configuring custom Kerberos principal for Oozie
▶︎
Streams Messaging
▶︎
Configuring Apache Kafka
Operating system requirements
Performance considerations
Quotas
▶︎
JBOD
JBOD setup
JBOD Disk migration
Setting user limits for Kafka
Connecting Kafka clients to Data Hub provisioned clusters
Configuring Kafka ZooKeeper chroot
Rack awareness
▶︎
Securing Apache Kafka
▶︎
Channel encryption
Configure Kafka brokers
Configure Kafka clients
Configure Kafka MirrorMaker
Configure ZooKeeper TLS/SSL support for Kafka
▶︎
Authentication
▶︎
TLS/SSL client authentication
Configure Kafka brokers
Configure Kafka clients
Principal name mapping
Enable Kerberos authentication
▶︎
Delegation token based authentication
Enable or disable authentication with delegation tokens
Manage individual delegation tokens
Rotate the master key/secret
▶︎
Client authentication using delegation tokens
Configure clients on a producer or consumer level
Configure clients on an application level
▶︎
LDAP authentication
Configure Kafka brokers
Configure Kafka clients
▶︎
PAM authentication
Configure Kafka brokers
Configure Kafka clients
▶︎
Authorization
▶︎
Ranger
Enable authorization in Kafka with Ranger
Configure the resource-based Ranger service used for authorization
▶︎
Governance
Configuring the Atlas hook in Kafka
Inter-broker security
Configuring multiple listeners
▶︎
Kafka security hardening with ZooKeeper ACLs
Restricting access to Kafka metadata in ZooKeeper
Unlocking access to Kafka metadata in ZooKeeper
▶︎
Tuning Apache Kafka Performance
Handling large messages
▶︎
Cluster sizing
Sizing estimation based on network and disk message throughput
Choosing the number of partitions for a topic
▶︎
Broker Tuning
JVM and garbage collection
Network and I/O threads
ISR management
Log cleaner
▶︎
System Level Broker Tuning
File descriptor limits
Filesystems
Virtual memory handling
Networking parameters
Configure JMX ephemeral ports
Kafka-ZooKeeper performance tuning
▶︎
Managing Apache Kafka
▶︎
Management basics
Broker log management
Record management
Broker garbage log collection and log rotation
Client and broker compatibility across Kafka versions
▶︎
Managing topics across multiple Kafka clusters
Set up MirrorMaker in Cloudera Manager
Settings to avoid data loss
▶︎
Broker migration
Migrate brokers by modifying broker IDs in meta.properties
Use rsync to copy files from one broker to another
▶︎
Disk management
Monitoring
▶︎
Handling disk failures
Disk Replacement
Disk Removal
Reassigning replicas between log directories
Retrieving log directory replica assignment information
▶︎
Metrics
Building Cloudera Manager charts with Kafka metrics
Essential metrics to monitor
▶︎
Command Line Tools
Unsupported command line tools
kafka-topics
kafka-configs
kafka-console-producer
kafka-console-consumer
kafka-consumer-groups
▶︎
kafka-reassign-partitions
Tool usage
Reassignment examples
kafka-log-dirs
zookeeper-security-migration
kafka-delegation-tokens
kafka-*-perf-test
Configuring log levels for command line tools
Understanding the kafka-run-class Bash Script
▶︎
Developing Apache Kafka Applications
Kafka producers
▶︎
Kafka consumers
Subscribing to a topic
Groups and fetching
Protocol between consumer and broker
Rebalancing partitions
Retries
Kafka clients and ZooKeeper
▶︎
Java client
▶︎
Client examples
Simple Java consumer
Simple Java producer
Security examples
▶︎
.NET client
▶︎
Client examples
Simple .NET consumer
Simple .NET producer
Performant .NET producer
Security examples
Kafka Streams
Kafka public APIs
Recommendations for client development
▶︎
Configuring Cruise Control
Configuring capacity estimations and goals
Adding self-healing goals to Cruise Control in Cloudera Manager
▶︎
Securing Cruise Control
Enable security for Cruise Control
▶︎
Managing Cruise Control
Rebalancing with Cruise Control
Cruise Control REST API endpoints
▶︎
Securing Streams Messaging Manager
Securing Streams Messaging Manager
Verifying the setup
▶︎
Monitoring Kafka Clusters using Streams Messaging Manager
Monitoring Kafka clusters
Monitoring Kafka producers
Monitoring Kafka topics
Monitoring Kafka brokers
Monitoring Kafka consumers
▶︎
Managing Alert Policies using Streams Messaging Manager
Introduction to alert policies in Streams Messaging Manager
Component types and metrics for alert policies
Notifiers
▶︎
Managing alert policies and notifiers in SMM
Creating a notifier
Updating a notifier
Deleting a notifier
Creating an alert policy
Updating an alert policy
Enabling an alert policy
Disabling an alert policy
Deleting an alert policy
▶︎
Managing Kafka Topics using Streams Messaging Manager
Creating a Kafka topic
Modifying a Kafka topic
Deleting a Kafka topic
▶︎
Monitoring End-to-End Latency using Streams Messaging Manager
End-to-end latency overview
Granularity of metrics for end-to-end latency
Enabling interceptors
Monitoring end-to-end latency for a Kafka topic
End-to-end latency use case
▶︎
Monitoring Kafka Cluster Replications using Streams Messaging Manager
Introduction to monitoring Kafka cluster replications in SMM
Configuring SMM for monitoring Kafka cluster replications
▶︎
Viewing Kafka cluster replication details
Searching Kafka cluster replications by source
Monitoring Kafka cluster replications by quick ranges
Monitoring status of the clusters to be replicated
▶︎
Monitoring topics to be replicated
Searching by topic name
Monitoring throughput for cluster replication
Monitoring replication latency for cluster replication
Monitoring checkpoint latency for cluster replication
Monitoring replication throughput and latency by values
▶︎
Integrating with Schema Registry
▶︎
Integrating with NiFi
Understand the NiFi Record Based Processors and Controller Services
Set up the HortonworksSchemaRegistry Controller Service
Adding and Configuring Record Reader and Writer Controller Services
Using Record-Enabled Processors
▶︎
Integrating with Kafka
Integrate Kafka and Schema Registry using NiFi Processors
Integrate Kafka and Schema Registry
Integrating with Atlas
Improve Performance in Schema Registry
▶︎
Using Schema Registry
Adding a new schema
Querying a schema
Evolving a schema
Deleting a schema
Importing Confluent Schema Registry schemas into Cloudera Schema Registry
▶︎
Securing Schema Registry
▶︎
Schema Registry Authorization through Ranger Access Policies
Pre-defined Access Policies for Schema Registry
Add the user or group to a pre-defined access policy
Create a Custom Access Policy
▶︎
TLS Encryption
TLS Certificate Requirements and Recommendations
Configure TLS Encryption Manually for Schema Registry
Schema Registry TLS Properties
▶︎
Schema Registry Authorization through Ranger Access Policies
Pre-defined Access Policies for Schema Registry
Add the user or group to a pre-defined access policy
Create a Custom Access Policy
Customizing the Kerberos principal for Schema Registry
▶︎
Configuring Streams Replication Manager
Enable high availability
▶︎
Defining and adding clusters for replication
Defining external Kafka clusters
Defining co-located Kafka clusters using a service dependency
Defining co-located Kafka clusters using Kafka credentials
Adding clusters to SRM's configuration
Configuring replications
Configuring the driver role target clusters
Configuring the service role target cluster
Configuring properties not exposed in Cloudera Manager
Configuring replication specific REST servers
▶︎
Configuring Remote Querying [Technical Preview]
Enabling Remote Querying [Technical Preview]
Configuring the advertised information of the SRM Service role [Technical Preview]
Configuring automatic group offset synchronization
New topic and consumer group discovery
▶︎
Configuration examples
Bidirectional replication example of two active clusters
Cross data center replication example of multiple clusters
▶︎
Using Streams Replication Manager
▶︎
SRM Command Line Tools
▶︎
srm-control
▶︎
Configuring srm-control
Configuring the SRM client's secure storage
Configuring TLS/SSL properties
Configuring Kerberos properties
Configuring properties for non-Kerberos authentication mechanisms
Setting the secure storage password as an environment variable
Topics and Groups Subcommand
Offsets Subcommand
Monitoring Replication with Streams Messaging Manager
Replicating Data
▶︎
How to Set up Failover and Failback
Configure SRM for Failover and Failback
Migrating Consumer Groups Between Clusters
▶︎
Securing Streams Replication Manager
Security overview
Enabling TLS/SSL for the SRM service
Enabling Kerberos for the SRM service
SRM security example
▶︎
Use cases for Streams Replication Manager in CDP Public Cloud
Using SRM in CDP Public Cloud overview
Replicating data from PvC Base to Data Hub with on-prem SRM
Replicating data from PvC Base to Data Hub with cloud SRM
Replicate data between Data Hub clusters with cloud SRM
▶︎
Troubleshooting
▶︎
Troubleshooting Apache Hive
Unable to alter S3-backed tables
▶︎
Troubleshooting Apache Impala
Troubleshooting Impala
Using Breakpad Minidumps for Crash Reporting
▶︎
Troubleshooting Apache Hadoop YARN
Troubleshooting Docker on YARN
Troubleshooting Linux Container Executor
▶︎
Troubleshooting Apache HBase
Troubleshooting HBase
▶︎
Using the HBCK2 tool to remediate HBase clusters
Running the HBCK2 tool
Finding issues
Fixing issues
HBCK2 tool command reference
Thrift Server crashes after receiving invalid data
HBase is using more disk space than expected
Troubleshoot RegionServer grouping
▶︎
Troubleshooting Apache Kudu
▶︎
Issues starting or restarting the master or the tablet server
Errors during hole punching test
Already present: FS layout already exists
Troubleshooting NTP stability problems
Disk space usage issue
▶︎
Performance issues
▶︎
Kudu tracing
Accessing the tracing web interface
RPC timeout traces
Kernel stack watchdog traces
Memory limits
Block cache size
Heap sampling
Slow name resolution and nscd
▶︎
Usability issues
ClassNotFoundException: com.cloudera.kudu.hive.KuduStorageHandler
Runtime error: Could not create thread: Resource temporarily unavailable (error 11)
Tombstoned or STOPPED tablet replicas
Corruption: checksum error on CFile block
Symbolizing stack traces
▶︎
Recover from a dead Kudu master
Prepare for the recovery
Perform the recovery
▶︎
Troubleshooting Cloudera Search
Identifying problems
Troubleshooting
▶︎
Troubleshooting Data Analytics Studio
▶︎
Problem area: Queries page
Queries are not appearing on the Queries page
Query column is empty but you can see the DAG ID and Application ID
Cannot see the DAG ID and the Application ID
Cannot view queries of other users
▶︎
Problem area: Compose page
Cannot see databases, or the query editor is missing
Unable to view new databases and tables, or unable to see changes to the existing databases or tables
Troubleshooting replication failure in the DAS Event Processor
Problem area: Reports page
How DAS helps to debug Hive on Tez queries
▶︎
Troubleshooting Hue
The Hue load balancer not distributing users evenly across various Hue servers
Unable to authenticate users in Hue using SAML
Cleaning up old data to improve performance
Unable to connect to database with provided credential
Activating Hive query editor on Hue UI
Completed Hue query shows executing on CM
Finding the list of Hue superusers
Unable to view Snappy-compressed files
Invalid query handle
Services backed by PostgreSQL fail or stop responding
Invalid method name: 'GetLog' error
Authorization Exception error
Cannot alter compressed tables in Hue
MySQL: 1040, 'Too many connections' exception
Increasing the maximum number of processes for Oracle database
UTF-8 codec error
ASCII codec error
Fixing authentication issues between HBase and Hue
Lengthy BalancerMember Route length
Enabling access to HBase browser from Hue
Fixing a warning related to accessing non-optimized Hue
▶︎
Troubleshooting Apache Sqoop
Merge process stops during Sqoop incremental imports
Sqoop Hive import stops when HS2 does not use Kerberos authentication
▶︎
Reference
▶︎
Apache Hadoop YARN Reference
▶︎
Tuning Apache Hadoop YARN
YARN tuning overview
Step 1: Worker host configuration
Step 2: Worker host planning
Step 3: Cluster size
Steps 4 and 5: Verify settings
Step 6: Verify container settings on cluster
Step 6A: Cluster container capacity
Step 6B: Container parameters checking
Step 7: MapReduce configuration
Step 7A: MapReduce settings checking
Set properties in Cloudera Manager
Configure memory settings
YARN Configuration Properties
Use the YARN REST APIs to manage applications
▶︎
Comparison of Fair Scheduler with Capacity Scheduler
Why one scheduler?
Scheduler performance improvements
Feature comparison
Migration from Fair Scheduler to Capacity Scheduler
▶︎
Configuring and using Queue Manager REST API
Limitations
Using the REST API
Prerequisites
Start Queue
Stop Queue
Add Queue
Change Queue Capacities
Change Queue Properties
Delete Queue
▶︎
Data Access
▶︎
Apache Hive Materialized View Commands
ALTER MATERIALIZED VIEW REBUILD
ALTER MATERIALIZED VIEW REWRITE
CREATE MATERIALIZED VIEW
DESCRIBE EXTENDED and DESCRIBE FORMATTED
DROP MATERIALIZED VIEW
SHOW MATERIALIZED VIEWS
▶︎
Apache Impala Reference
▶︎
Performance Considerations
Performance Best Practices
Query Join Performance
▶︎
Table and Column Statistics
Generating Table and Column Statistics
Runtime Filtering
Min/Max Filtering
▶︎
Partitioning
Partition Pruning for Queries
HDFS Caching
HDFS Block Skew
Understanding Performance using EXPLAIN Plan
Understanding Performance using SUMMARY Report
Understanding Performance using Query Profile
▶︎
Scalability Considerations
Scaling Limits and Guidelines
Dedicated Coordinator
▶︎
Hadoop File Formats Support
Using Text Data Files
Using Parquet Data Files
Using ORC Data Files
Using Avro Data Files
Using RCFile Data Files
Using SequenceFile Data Files
▶︎
Storage Systems Support
Impala with HDFS
▶︎
Impala with Kudu
Configuring for Kudu Tables
▶︎
Impala DDL for Kudu
Partitioning for Kudu Tables
Impala DML for Kudu Tables
Impala with HBase
Impala with Azure Data Lake Store (ADLS)
▶︎
Impala with Amazon S3
Specifying Impala Credentials to Access S3
Ports Used by Impala
Migration Guide
Setting up Data Cache for Remote Reads
Managing Metadata in Impala
On-demand Metadata
Transactions
▶︎
Apache Impala SQL Reference
Apache Impala SQL Overview
▶︎
Schema objects
Aliases
Databases
Functions
Identifiers
Tables
Views
▶︎
Data types
ARRAY complex type
BIGINT data type
BOOLEAN data type
CHAR data type
DATE data type
DECIMAL data type
DOUBLE data type
FLOAT data type
INT data type
MAP complex type
REAL data type
SMALLINT data type
STRING data type
STRUCT complex type
▶︎
TIMESTAMP data type
Customizing time zones
TINYINT data type
VARCHAR data type
Complex types
Literals
Operators
Comments
▶︎
SQL statements
DDL statements
DML statements
ALTER DATABASE statement
ALTER TABLE statement
ALTER VIEW statement
COMMENT statement
COMPUTE STATS statement
CREATE DATABASE statement
CREATE FUNCTION statement
CREATE TABLE statement
CREATE VIEW statement
DELETE statement
DESCRIBE statement
DROP DATABASE statement
DROP FUNCTION statement
DROP STATS statement
DROP TABLE statement
DROP VIEW statement
EXPLAIN statement
GRANT statement
INSERT statement
INVALIDATE METADATA statement
LOAD DATA statement
REFRESH statement
REFRESH AUTHORIZATION statement
REFRESH FUNCTIONS statement
REVOKE statement
▶︎
SELECT statement
Joins in Impala SELECT statements
ORDER BY clause
GROUP BY clause
HAVING clause
LIMIT clause
OFFSET clause
UNION, INTERSECT, and EXCEPT clauses
Subqueries in Impala SELECT statements
TABLESAMPLE clause
WITH clause
DISTINCT operator
SET statement
SHOW statement
SHUTDOWN statement
TRUNCATE TABLE statement
UPDATE statement
UPSERT statement
USE statement
VALUES statement
Optimizer hints
Query options
▶︎
Built-in functions
Mathematical functions
Bit functions
Conversion functions
Date and time functions
Conditional functions
String functions
Miscellaneous functions
▶︎
Aggregate functions
APPX_MEDIAN function
AVG function
COUNT function
GROUPING() and GROUPING_ID() functions
GROUP_CONCAT function
MAX function
MIN function
NDV function
STDDEV, STDDEV_SAMP, STDDEV_POP functions
SUM function
VARIANCE, VARIANCE_SAMP, VARIANCE_POP, VAR_SAMP, VAR_POP functions
▶︎
Analytic functions
OVER
WINDOW
AVG
COUNT
CUME_DIST
DENSE_RANK
FIRST_VALUE
LAG
LAST_VALUE
LEAD
MAX
MIN
NTILE
PERCENT_RANK
RANK
ROW_NUMBER
SUM
▶︎
User-defined functions (UDFs)
UDF concepts
Runtime environment for UDFs
Installing the UDF development package
Writing UDFs
Writing user-defined aggregate functions (UDAFs)
Building and deploying UDFs
Performance considerations for UDFs
Examples of creating and using UDFs
Security considerations for UDFs
Limitations and restrictions for Impala UDFs
Transactions
Reserved words
Impala SQL and Hive SQL
SQL migration to Impala
▶︎
Cloudera Search solrctl Reference
solrctl Reference
Using solrctl with an HTTP proxy
▶︎
Operational Database
▶︎
Apache Phoenix Frequently Asked Questions
Frequently asked questions
▶︎
Apache Phoenix Performance Tuning
Performance tuning
▶︎
Apache Phoenix Command Reference
Apache Phoenix SQL command reference
▶︎
Apache Atlas Reference
Apache Atlas Advanced Search language reference
Apache Atlas Statistics reference
Apache Atlas metadata attributes
Defining Apache Atlas enumerations
▶︎
Purging deleted entities
Auditing purged entities
PUT /admin/purge/ API
POST /admin/audit/ API
▶︎
Apache Atlas technical metadata migration reference
System metadata migration
HDFS entity metadata migration
Hive entity metadata migration
Impala entity metadata migration
Spark entity metadata migration
AWS S3 entity metadata migration
▶︎
NiFi metadata collection
How Lineage strategy works
Understanding the data that flows into Atlas
NiFi lineage
Atlas NiFi relationships
Atlas NiFi audit entries
How the reporting task runs in a NiFi cluster
Analysing event analysis
Limitations of Atlas-NiFi integration
▶︎
HiveServer metadata collection
HiveServer actions that produce Atlas entities
HiveServer entities created in Atlas
HiveServer relationships
HiveServer lineage
HiveServer audit entries
▶︎
HBase metadata collection
HBase actions that produce Atlas entities
HBase entities created in Atlas
HBase lineage
HBase audit entries
▶︎
Schema Registry metadata collection
Configuring Atlas and Schema Registry
Schema Registry actions that produce Atlas entities
Schema relationships
Schema Registry audit entries
Troubleshooting Schema Registry
▶︎
Impala metadata collection
Impala actions that produce Atlas entities
Impala entities created in Atlas
Impala lineage
Impala audit entries
▶︎
Kafka metadata collection
Kafka actions that produce Atlas entities
Kafka relationships
Kafka lineage
Kafka audit entries
▶︎
Spark metadata collection
Spark actions that produce Atlas entities
Spark entities created in Apache Atlas
Spark lineage
Spark relationships
Spark audit entries
Spark troubleshooting
▶︎
Streams Messaging
Streams Messaging Manager REST API Reference
▶︎
Streams Replication Manager Reference
srm-control Options Reference
Configuration Properties Reference for Properties not Available in Cloudera Manager
Kafka credentials property reference
SRM Service data traffic reference
Streams Replication Manager REST API Reference
Cruise Control REST API Reference
▶︎
Encryption Reference
Auto-TLS Requirements and Limitations
Rotate Auto-TLS Certificate Authority and Host Certificates
Auto-TLS Agent File Locations
▶︎
Apache Oozie Reference
Submit Oozie Jobs in Data Engineering Cluster
.NET client
A List of S3A Configuration Properties
About Atlas High Availability
About HBase snapshots
About the Off-heap BucketCache
Access HDFS from the NFS Gateway
Access the YARN Web User Interface
Accessing Apache HBase
Accessing Atlas logs
Accessing Avro data files from Spark SQL applications
Accessing AWS
Accessing Azure Storage account container from spark-shell
Accessing Cloud Data
Accessing compressed files in Spark
Accessing data stored in Amazon S3 through Spark
Accessing external storage from Spark
Accessing HDFS Files from Spark
Accessing Hive from an external node
Accessing Hive from Spark
Accessing ORC Data in Hive Tables
Accessing ORC files from Spark
Accessing Parquet files from Spark SQL applications
Accessing Spark SQL through the Spark shell
Accessing StorageHandler and other external tables
Accessing the Oozie server with a browser
Accessing the Oozie server with the Oozie Client
Accessing the Ranger console
Accessing the Spark History Server
Accessing the tracing web interface
ACID Operation
ACID operations
ACL examples
ACLS on HDFS features
Acquiring an integration token
Activate read replicas on a table
Activating Hive query editor on Hue UI
Active / Active Architecture
Active / Stand-by Architecture
Active Directory Settings
Add a custom coprocessor
Add a group
Add a role through Hive
Add a role through Ranger
Add a user
Add a ZooKeeper service
Add HDFS system mount
Add or edit permissions
Add Queue
Add queues using YARN Queue Manager UI
Add storage directories using Cloudera Manager
Add the HttpFS role
Add the user or group to a pre-defined access policy
Add the user or group to a pre-defined access policy
Adding a custom banner
Adding a load balancer
Adding a new schema
Adding a Ranger security zone
Adding a tag-based PII policy
Adding a tag-based service
Adding and Configuring Record Reader and Writer Controller Services
Adding and Removing Range Partitions
Adding attributes to Business Metadata
Adding attributes to classifications
Adding clusters to SRM's configuration
Adding schema to Oozie using Cloudera Manager
Adding self-healing goals to Cruise Control in Cloudera Manager
Adding tag-based policies
Adding the Lily HBase indexer service
Adding the Oozie service using Cloudera Manager
Additional Configuration Options for GCS
Additional considerations when configuring TLS/SSL for Oozie HA
Additional HDFS haadmin commands to administer the cluster
Adjust the Solr replication factor for index files stored in HDFS
ADLS Proxy Setup
ADLS Trash Folder Behavior
Admin ACLs
Administering Hue
Administering Ranger Reports
Administering Ranger Users, Groups, Roles, and Permissions
Administrative commands
Admission Control and Query Queuing
Admission Control Sample Scenario
Advanced Committer Configuration
Advanced configuration for write-heavy workloads
Advanced erasure coding configuration
Advanced ORC properties
Advanced partitioning
Advantages of defining a schema for production use
Aggregate functions
Aggregating and grouping data
Aggregation for Analytics
Aliases
Allocating DataNode memory as storage
Already present: FS layout already exists
Alter a table
ALTER DATABASE statement
ALTER MATERIALIZED VIEW REBUILD
ALTER MATERIALIZED VIEW REWRITE
ALTER TABLE statement
ALTER VIEW statement
Amazon S3 metadata collection
Analysing event analysis
Analytic functions
Apache Atlas Advanced Search language reference
Apache Atlas dashboard tour
Apache Atlas metadata attributes
Apache Atlas metadata collection overview
Apache Atlas Reference
Apache Atlas Statistics reference
Apache Atlas technical metadata migration reference
Apache Hadoop YARN Overview
Apache Hadoop YARN Reference
Apache HBase Overview
Apache Hive 3 ACID transactions
Apache Hive 3 architectural overview
Apache Hive 3 tables
Apache Hive content roadmap
Apache Hive features
Apache Hive Materialized View Commands
Apache Hive Metastore Overview
Apache Hive Overview
Apache Hive Performance Tuning
Apache Hive query basics
Apache Hive storage in public clouds
Apache Hive-Kafka integration
Apache Impala Overview
Apache Impala Overview
Apache Impala Reference
Apache Impala SQL Overview
Apache Impala SQL Reference
Apache Kafka Overview
Apache Knox Authentication
Apache Knox Gateway Overview
Apache Knox Install Role Parameters
Apache Knox Overview
Apache Kudu Background Operations
Apache Kudu Overview
Apache Kudu usage limitations
Apache Oozie Reference
Apache Phoenix and SQL
Apache Phoenix Command Reference
Apache Phoenix Frequently Asked Questions
Apache Phoenix Performance Tuning
Apache Phoenix SQL command reference
Apache Phoenix-Hive usage examples
Apache Ranger Auditing
Apache Ranger Authorization
Apache Ranger integration
Apache Spark executor task statistics
Apache Spark Overview
Apache Spark Overview
Apache Zeppelin Overview
APIs for accessing HDFS
Application ACL evaluation
Application ACLs
Application logs' ACLs
Application reservations
Applications and permissions reference
APPX_MEDIAN function
ARRAY complex type
ASCII codec error
Assign or unassign a node to a partition
Assigning Roles for the Managed Identities
Assigning superuser status to an LDAP user
Assigning terms to categories
Associate a table in a non-customized environment without Kerberos
Associate partitions with queues
Associate a table in a customized Kerberos environment
Associating Business Metadata attributes with entities
Associating classifications with entities
Associating tables of a schema to a namespace
Associating terms with entities
Atlas
Atlas
Atlas
Atlas classifications drive Ranger policies
Atlas Export and Import Operations
Atlas metadata model overview
Atlas NiFi audit entries
Atlas NiFi relationships
Atlas Server Operations
Atlas Type Definitions
Audit enhancements
Audit Operations
Audit Overview
Auditing Atlas Entities
Auditing purged entities
Authenticating with ADLS Gen2
Authentication
Authentication
Authentication tokens
Authentication using Kerberos
Authentication using LDAP
Authentication using SAML
Authorization
Authorization Exception error
Authorization tokens
Auto-TLS Agent File Locations
Auto-TLS Requirements and Limitations
Automatic Invalidation of Metadata Cache
Automatic Invalidation/Refresh of Metadata
Automating partition discovery and repair
Automating Spark Jobs with Oozie Spark Action
AVG
AVG function
Avro
Avro
AWS configuration
AWS object lifecycle
AWS objects and inferred hierarchy
AWS S3 Base
AWS S3 Bucket
AWS S3 Contained
AWS S3 Container
AWS S3 Directory
AWS S3 entity metadata migration
AWS S3 Object
Back up HDFS metadata
Back up HDFS metadata using Cloudera Manager
Back up tables
Backing up a collection from HDFS
Backing up a collection from local file system
Backing up and Recovering Apache Kudu
Backing up and restoring data
Backing up HDFS metadata
Backing up NameNode metadata
Backup directory structure
Backup tools
Balancer commands
Balancing data across an HDFS cluster
Balancing data across disks of a DataNode
Basic partitioning
Basics
Batch indexing into offline Solr shards
Batch indexing to Solr using SparkApp framework
Batch indexing using Morphlines
Before you create an operational database cluster
Before you start
Behavioral Changes In Cloudera Runtime 7.2.12
Benefits of centralized cache management in HDFS
Best practices for building Apache Spark applications
Best practices for performance tuning
Best practices for rack and node setup for EC
Best practices when adding new tablet servers
Best practices when using RegionServer grouping
Bidirectional replication example of two active clusters
Bidirectional Replication Flows
BIGINT data type
Bit functions
Block cache size
Block move execution
Block move scheduling
BOOLEAN data type
Bring a tablet that has lost a majority of replicas back online
Broker garbage log collection and log rotation
Broker log management
Broker migration
Broker Tuning
Brokers
BucketCache IO engine
Bucketed tables in Hive
Building and deploying UDFs
Building and running a Spark Streaming application
Building Cloudera Manager charts with Kafka metrics
Building reusable modules in Apache Spark applications
Building Spark Applications
Building the project and uploading the JAR
Built-in functions
Bulk Write Access
Business Metadata overview
Bypass the BlockCache
Cache eviction priorities
Caching terminology
Calling Hive user-defined functions (UDFs)
Calling the UDF in a query
Canary test for pyspark command
Cancelling a Query
Cannot alter compressed tables in Hue
Cannot see databases, or the query editor is missing
Cannot see the DAG ID and the Application ID
Cannot view queries of other users
Catalog operations
CDP Security Overview
CDW stored procedures
Centralized cache management architecture
Change master hostnames
Change Queue Capacities
Change Queue Properties
Change resource allocation mode
Changing a nameservice name for Highly Available HDFS using Cloudera Manager
Changing directory configuration
Changing Ranger audit storage location and migrating data
Changing the page logo
Channel encryption
CHAR data type
CHAR data type support
Check for required Ranger features in Data Hub
Check Job History
Check Job Status
Choose the right import method
Choosing a DynamoDB Table and IO Capacity
Choosing Data Formats
Choosing the number of partitions for a topic
Choosing Transformations to Minimize Shuffles
ClassNotFoundException: com.cloudera.kudu.hive.KuduStorageHandler
Cleaning up after failed jobs
Cleaning up old data to improve performance
CLI commands to perform snapshot operations
Client and broker compatibility across Kafka versions
Client authentication to secure Kudu clusters
Client authentication using delegation tokens
Client connections to HiveServer
Client examples
Client examples
Closing HiveWarehouseSession operations
Cloud Connectors
Cloud storage connectors overview
Cloudera Runtime
Cloudera Runtime Component Versions
Cloudera Runtime Release Notes
Cloudera Runtime Security and Governance
Cloudera Search and CDP
Cloudera Search architecture
Cloudera Search config templates
Cloudera Search configuration files
Cloudera Search ETL
Cloudera Search log files
Cloudera Search Overview
Cloudera Search security aspects
Cloudera Search solrctl Reference
Cloudera Search tasks and processes
Cluster balancing algorithm
Cluster management limitations
Cluster management limitations
Cluster Migration Architectures
Cluster sizing
Coarse-grained authorization
Collecting metrics through HTTP
Column compression
Column design
Column encoding
Command Line Tools
Command-line options to run Extraction
Commands for configuring storage policies
Commands for using cache pools and directives
COMMENT statement
Comments
Committing a transaction for Direct Reader
Common replication topologies
Common web interface pages
Compacting on-disk data
Compaction of Data in FULL ACID Transactional Table
Compaction prerequisites
Compaction tasks
Compactor properties
Compare queries
Comparing replication and erasure coding
Comparing tables using ANY/SOME/ALL
Comparison of Fair Scheduler with Capacity Scheduler
Compatibility Policies
Completed Hue query shows executing on CM
Complex types
Component types and metrics for alert policies
Components
Compose queries
Compound operators
Compute
COMPUTE STATS statement
Concepts Used in FULL ACID v2 Tables
Conditional functions
Configuration examples
Configuration properties
Configuration Properties Reference for Properties not Available in Cloudera Manager
Configurations and CLI options for the HDFS Balancer
Configure a resource-based policy: ADLS
Configure a resource-based policy: Atlas
Configure a resource-based policy: HadoopSQL
Configure a resource-based policy: HBase
Configure a resource-based policy: HDFS
Configure a resource-based policy: Kafka
Configure a resource-based policy: Knox
Configure a resource-based policy: NiFi
Configure a resource-based policy: NiFi Registry
Configure a resource-based policy: S3
Configure a resource-based policy: Solr
Configure a resource-based policy: YARN
Configure a resource-based service: ADLS
Configure a resource-based service: Atlas
Configure a resource-based service: HBase
Configure a resource-based service: HDFS
Configure a resource-based service: Hive
Configure a resource-based service: Kafka
Configure a resource-based service: Knox
Configure a resource-based service: NiFi
Configure a resource-based service: NiFi Registry
Configure a resource-based service: S3
Configure a resource-based service: Solr
Configure a resource-based service: YARN
Configure a resource-based storage handler policy: HadoopSQL
Configure a secure Kudu cluster using Cloudera Manager
Configure a secure Kudu cluster using flag safety valves
Configure a Spark job for dynamic resource allocation
Configure acceptable Kerberos principal patterns
Configure Access to GCS from Your Cluster
Configure archival storage
Configure Atlas authentication for AD
Configure Atlas authentication for LDAP
Configure Atlas file-based authentication
Configure Atlas PAM authentication
Configure BucketCache IO engine
Configure bulk load replication
Configure clients on a producer or consumer level
Configure clients on an application level
Configure cluster capacity with queues
Configure coarse-grained authorization with ACLs
Configure columns to store MOBs
Configure CPU scheduling and isolation
Configure credentials for Atlas extraction
Configure Cross-Origin Support for YARN UIs and REST APIs
Configure data locality
Configure DataNode memory as storage
Configure Debug Delay
Configure dynamic queue properties
Configure encryption in HBase
Configure four-letter-word commands in ZooKeeper
Configure FPGA scheduling and isolation
Configure HBase for use with Phoenix
Configure HBase garbage collection
Configure HBase-Spark connector
Configure HDFS RPC protection
Configure Hive to use with HBase
Configure HTTPS encryption
Configure JMX ephemeral ports
Configure Kafka brokers
Configure Kafka brokers
Configure Kafka brokers
Configure Kafka brokers
Configure Kafka clients
Configure Kafka clients
Configure Kafka clients
Configure Kafka clients
Configure Kafka MirrorMaker
Configure Kerberos authentication for Apache Atlas
Configure Kudu processes
Configure Kudu's integration with Apache Ranger
Configure Lily HBase Indexer Service to use Kerberos authentication
Configure Lily HBase Indexer to use TLS/SSL
Configure Log Aggregation
Configure memory settings
Configure mountable HDFS
Configure NodeManager heartbeat
Configure Oozie client when TLS/SSL is enabled
Configure Partitions
Configure per queue properties
Configure Phoenix-Hive connector
Configure Phoenix-Spark connector
Configure preemption
Configure queue ordering policies
Configure Ranger authentication for AD
Configure Ranger authentication for LDAP
Configure Ranger authentication for PAM
Configure Ranger authentication for UNIX
Configure Ranger Usersync for Deleted Users and Groups
Configure read replicas using Cloudera Manager
Configure RegionServer grouping
Configure scheduler properties at the global level
Configure secure replication
Configure session timeout for Ranger Admin Web UI
Configure source and destination realms in krb5.conf
Configure SRM for Failover and Failback
Configure storage balancing for DataNodes using Cloudera Manager
Configure the blocksize for a column family
Configure the compaction speed using Cloudera Manager
Configure the G1GC garbage collector
Configure the graceful shutdown timeout property
Configure the HBase canary
Configure the HBase client TGT renewal period
Configure the HBase thrift server role
Configure the MOB cache using Cloudera Manager
Configure the NFS Gateway
Configure the off-heap BucketCache using Cloudera Manager
Configure the off-heap BucketCache using the command line
Configure the resource-based Ranger service used for authorization
Configure the scanner heartbeat using Cloudera Manager
Configure the storage policy for WALs using Cloudera Manager
Configure the storage policy for WALs using the Command Line
Configure TLS encryption manually for Phoenix Query Server
Configure TLS Encryption Manually for Schema Registry
Configure TLS/SSL for Core Hadoop Services
Configure TLS/SSL for HDFS
Configure TLS/SSL for Oozie
Configure TLS/SSL for YARN
Configure transaction support
Configure ulimit for HBase using Cloudera Manager
Configure ulimit using Pluggable Authentication Modules using the Command Line
Configure User Impersonation for Access to Hive
Configure User Impersonation for Access to Phoenix
Configure work preserving recovery on NodeManager
Configure work preserving recovery on ResourceManager
Configure YARN ResourceManager high availability
Configure YARN Security for Long-Running Applications
Configure YARN Services API to Manage Long-running Applications
Configure YARN Services using Cloudera Manager
Configure ZooKeeper client shell for Kerberos authentication
Configure ZooKeeper server for Kerberos authentication
Configure ZooKeeper TLS/SSL support for Kafka
Configure ZooKeeper TLS/SSL using Cloudera Manager
Configuring Access to Azure on CDP Private Cloud Base
Configuring Access to Azure on CDP Public Cloud
Configuring Access to Google Cloud Storage
Configuring Access to S3
Configuring Access to S3 on CDP Private Cloud Base
Configuring Access to S3 on CDP Public Cloud
Configuring ACLs on HDFS
Configuring ADLS Gen2 Storage Queue
Configuring an external database for Oozie
Configuring and Monitoring Atlas
Configuring and running the HDFS balancer using Cloudera Manager
Configuring and tuning S3A block upload
Configuring and using Queue Manager REST API
Configuring and Using Zeppelin Interpreters
Configuring Apache Hadoop YARN High Availability
Configuring Apache Hadoop YARN Log Aggregation
Configuring Apache Hadoop YARN Security
Configuring Apache HBase
Configuring Apache HBase for Apache Phoenix
Configuring Apache HBase High Availability
Configuring Apache Hive
Configuring Apache Impala
Configuring Apache Kafka
Configuring Apache Kudu
Configuring Apache Spark
Configuring Apache Zeppelin
Configuring Apache ZooKeeper
Configuring Atlas and Schema Registry
Configuring Atlas Authentication
Configuring Atlas Authorization
Configuring Atlas Authorization using Ranger
Configuring Atlas using Cloudera Manager
Configuring authentication for long-running Spark Streaming jobs
Configuring authentication with LDAP and Direct Bind
Configuring authentication with LDAP and Search Bind
Configuring Authorization
Configuring auto split policy in an HBase table
Configuring automatic group offset synchronization
Configuring block size
Configuring capacity estimations and goals
Configuring Client Access to Impala
Configuring compaction in Cloudera Manager
Configuring compaction using table properties
Configuring concurrent moves
Configuring Cruise Control
Configuring custom Kerberos principal for Kudu
Configuring custom Kerberos principal for Oozie
Configuring Data Protection
Configuring Dedicated Coordinators and Executors
Configuring Delegation for Clients
Configuring Directories for Intermediate Data
Configuring dynamic resource allocation
Configuring Dynamic Resource Pool
Configuring Encryption for Specific Buckets
Configuring Event Based Automatic Metadata Sync
Configuring Fault Tolerance
Configuring for HDFS high availability
Configuring for Kudu Tables
Configuring group permissions
Configuring HBase BlockCache
Configuring HBase MultiWAL
Configuring HBase servers to authenticate with a secure HDFS cluster
Configuring HBase snapshots
Configuring HBase to use HDFS HA
Configuring HDFS ACLs
Configuring HDFS High Availability
Configuring HDFS trash
Configuring heterogeneous storage in HDFS
Configuring high availability for Hue
Configuring Hive and Impala for high availability with Hue
Configuring HMS for high availability
Configuring HSTS for HBase Web UIs
Configuring HSTS for HDFS Web UIs
Configuring HSTS for Spark
Configuring Hue as a TLS/SSL client
Configuring Hue as a TLS/SSL server
Configuring Impala
Configuring Impala TLS/SSL
Configuring Impala to work with HDFS HA
Configuring Impala Web UI
Configuring Impyla for Impala
Configuring JDBC for Impala
Configuring Kafka ZooKeeper chroot
Configuring Kerberos Authentication
Configuring Kerberos properties
Configuring LDAP Authentication
Configuring LDAP on unmanaged clusters
Configuring Lily HBase Indexer Security
Configuring Livy
Configuring log levels for command line tools
Configuring MariaDB for Oozie
Configuring multiple listeners
Configuring MultiWAL support using Cloudera Manager
Configuring MySQL for Oozie
Configuring Node Attribute for Application Master Placement
Configuring ODBC for Impala
Configuring Oozie
Configuring Oozie data purge settings using Cloudera Manager
Configuring Oozie High Availability using Cloudera Manager
Configuring Oozie Sqoop1 Action workflow JDBC drivers
Configuring Oozie to enable MapReduce jobs to read or write from Amazon S3
Configuring Oozie to use HDFS HA
Configuring Oozie to use HDFS HA
Configuring Oracle for Oozie
Configuring other CDP components to use HDFS HA
Configuring Per-Bucket Settings
Configuring Per-Bucket Settings to Access Data Around the World
Configuring PostgreSQL for Oozie
Configuring properties for non-Kerberos authentication mechanisms
Configuring properties not exposed in Cloudera Manager
Configuring Proxy Users to Access HDFS
Configuring queue mapping to use the user name from the application tag using Cloudera Manager
Configuring quotas
Configuring Ranger Authentication with UNIX, LDAP, AD, or PAM
Configuring Ranger Authentication with UNIX, LDAP, or AD
Configuring Ranger authorization
Configuring Ranger Authorization for Atlas
Configuring Remote Querying [Technical Preview]
Configuring replication specific REST servers
Configuring replications
Configuring resource-based policies
Configuring resource-based services
Configuring S3Guard
Configuring S3Guard in Cloudera Manager
Configuring SAML authentication on managed clusters
Configuring secure access between Solr and Hue
Configuring secure HBase replication
Configuring secure HBase replication
Configuring Simple Authorization in Atlas
Configuring SMM for monitoring Kafka cluster replications
Configuring Spark application logging properties
Configuring Spark application properties in spark-defaults.conf
Configuring Spark Applications
Configuring Spark on YARN Applications
Configuring srm-control
Configuring storage balancing for DataNodes
Configuring Streams Replication Manager
Configuring tablet servers
Configuring the ABFS Connector
Configuring the advertised information of the SRM Service role [Technical Preview]
Configuring the Atlas hook in Kafka
Configuring the balancer threshold
Configuring the compaction check interval
Configuring the driver role target clusters
Configuring the Hive Metastore to use HDFS HA
Configuring the Kudu master
Configuring the Livy Thrift Server
Configuring the service role target cluster
Configuring the SRM client's secure storage
Configuring the storage policy for the Write-Ahead Log (WAL)
Configuring TLS/SSL encryption for Kudu
Configuring TLS/SSL encryption for Kudu using Cloudera Manager
Configuring TLS/SSL for Apache Atlas
Configuring TLS/SSL for HBase
Configuring TLS/SSL for HBase REST Server
Configuring TLS/SSL for HBase Thrift Server
Configuring TLS/SSL for HBase Web UIs
Configuring TLS/SSL for Hue
Configuring TLS/SSL properties
Configuring ulimit for HBase
Confirm the election status of a ZooKeeper service
Connect to Phoenix Query Server
Connect to Phoenix Query Server through Apache Knox
Connect workers
Connecting Hive to BI tools using a JDBC/ODBC driver
Connecting Kafka clients to Data Hub provisioned clusters
Connecting to Impala Daemon in Impala Shell
Connecting to PQS using JDBC
Connecting to the Apache Livy Thrift Server
Connectors
Considerations for backfill inserts
Considerations for Oozie to work with AWS
Considerations for working with HDFS snapshots
Contents of the BlockCache
Control access to queues using ACLs
Controlling Data Access with Tags
Conversion functions
Converting a managed non-transactional table to external
Converting a queue to a Managed Parent Queue
Converting from an NFS-mounted shared edits directory to Quorum-Based Storage
Converting instance directories to configs
Copy sample tweets to HDFS
Copying data between a secure and an insecure cluster using DistCp and WebHDFS
Copying data with Hadoop DistCp
Corruption: checksum error on CFile block
COUNT
COUNT function
Create a collection for tweets
Create a Collection in Cloudera Search
Create a Collection in Cloudera Search
Create a Custom Access Policy
Create a Custom Access Policy
Create a Custom Role
Create a custom YARN service
Create a GCP Service Account
Create a Hadoop archive
Create a new Kudu table from Impala
Create a read-only Admin user (Auditor)
Create a snapshot
Create a snapshot policy
Create a standard YARN service
Create a table in Hive
Create a test collection
Create a time-bound policy
Create a topology map
Create a topology script
Create a user-defined function
Create and Run a Note
CREATE DATABASE statement
Create empty table on the destination cluster
CREATE FUNCTION statement
Create indexer Maven project
CREATE MATERIALIZED VIEW
Create new YARN services using UI
Create partitions
Create placement rules
Create snapshots on a directory
Create snapshots using Cloudera Manager
CREATE TABLE statement
Create the S3Guard Table in DynamoDB
CREATE VIEW statement
Creating a CRUD transactional table
Creating a default directory for managed tables
Creating a function
Creating a group in Hue
Creating a Hue user
Creating a JAAS configuration file
Creating a Kafka topic
Creating a Lily HBase Indexer Configuration File
Creating a Lily HBase Indexer Configuration File
Creating a Morphline Configuration File
Creating a Morphline Configuration File
Creating a new Dynamic Configuration
Creating a notifier
Creating a replica of an existing shard
Creating a Solr collection
Creating a Sqoop import command
Creating a table for a Kafka stream
Creating a table from an Amazon S3 file
Creating a truststore file in PEM format
Creating an alert policy
Creating an insert-only transactional table
Creating an operational database cluster
Creating an S3-based external table
Creating and using a materialized view
Creating and using a partitioned materialized view
Creating Business Metadata
Creating categories
Creating classifications
Creating DynamoDB Access Policy
Creating glossaries
Creating labels
Creating Managed Identity
Creating partitions dynamically
Creating secure external tables
Creating Static Pools
Creating tables
Creating terms
Creating the tables and view
Creating the UDF class
Cross Data Center Replication
Cross data center replication example of multiple clusters
Cruise Control
Cruise Control
Cruise Control
Cruise Control Overview
Cruise Control REST API endpoints
CUME_DIST
Customize dynamic resource allocation settings
Customize interpreter settings in a note
Customize the HDFS home directory
Customizing HDFS
Customizing Per-Bucket Secrets Held in Credential Files
Customizing the Hue web UI
Customizing the Kerberos principal for Schema Registry
Customizing time zones
CVE-2021-4428 Remediation for 7.2.12
CVE-2021-45105 & CVE-2021-44832 Remediation for 7.2.12
DAS
DAS
DAS architecture
Data Access
Data Access
Data Access
Data Analytics Studio Overview
Data Analytics Studio overview
Data compaction
Data Engineering
Data Engineering
Data migration to Apache Hive
Data protection
Data Stewardship with Apache Atlas
Data storage metrics
Data types
Databases
DataNodes
DataNodes
Date and time functions
DATE data type
DDL statements
Debug Web UI for Catalog Server
Debug Web UI for Impala Daemon
Debug Web UI for StateStore
Decide to use the BucketCache
DECIMAL data type
Decimal type
Decommission or remove a tablet server
Dedicated Coordinator
Default EXPIRES ON tag policy
Default operational database cluster definition
Defining a backup target in solr.xml
Defining and adding clusters for replication
Defining Apache Atlas enumerations
Defining co-located Kafka clusters using a service dependency
Defining co-located Kafka clusters using Kafka credentials
Defining external Kafka clusters
Defining metadata tags
Defining related terms
Defining what assets to extract metadata for
Delegation token based authentication
Delete a group
Delete a role
Delete a user
Delete data
Delete partitions
Delete placement rules
Delete Queue
Delete queues
Delete snapshots
Delete snapshots using Cloudera Manager
DELETE statement
Deleting a collection
Deleting a Kafka topic
Deleting a notifier
Deleting a schema
Deleting all documents in a collection
Deleting an alert policy
Deleting data from a table
Deleting dynamically created child queues
Deleting tables
Deletion
DENSE_RANK
Deploy and manage services on YARN
Deploy HBase replication
Deploying and configuring Oozie Sqoop1 Action JDBC drivers
Deployment Planning for Cloudera Search
Deprecation Notices In Cloudera Runtime 7.2.12
DESCRIBE EXTENDED and DESCRIBE FORMATTED
DESCRIBE statement
Describing a materialized view
Detecting slow DataNodes
Determining the table type
Developing and running an Apache Spark WordCount application
Developing Apache Kafka Applications
Developing Apache Spark Applications
Developing Applications with Apache Kudu
Diagnostics logging
Difference between Tez UI and DAS
Dimensioning guidelines for deploying Cloudera Search
Direct Reader configuration properties
Direct Reader limitations
Direct Reader mode introduction
Directory configurations
Disable loading of coprocessors
Disable RegionServer grouping
Disable replication at the peer level
Disable the BoundedByteBufferPool
Disabling an alert policy
Disabling and redeploying HDFS HA
Disabling auto queue deletion
Disabling automatic compaction
Disabling Kerberos authentication for HBase clients
Disabling Oozie High Availability
Disabling S3Guard and Destroying a S3Guard Database
Disabling YARN Ranger authorization support
Disassociate partitions from queues
Disk Balancer commands
Disk management
Disk Removal
Disk Replacement
Disk space usage issue
Disk space versus namespace
DistCp and Proxy Settings
Distcp between secure clusters in different Kerberos realms
Distcp syntax and examples
DISTINCT operator
DML statements
DOUBLE data type
Downloading and exporting data from Hue
Driver inter-node coordination
Drop a Kudu table
DROP DATABASE statement
DROP FUNCTION statement
DROP MATERIALIZED VIEW
DROP STATS statement
DROP TABLE statement
DROP VIEW statement
Dropping a materialized view
Dropping an external table along with data
Dumping the Oozie database
Dynamic allocation
Dynamic Queue Scheduling [Technical Preview]
Dynamic resource allocation properties
Dynamic Resource Pool Settings
Dynamic resource-based column masking in Hive with Ranger policies
Dynamic tag-based column masking in Hive with Ranger policies
Dynamically loading a custom filter
Edit a group
Edit a role
Edit a user
Edit or delete a snapshot policy
Editing rack assignments for hosts
Editing tables
Effects of WAL rolling on replication
Enable Access Control for Data
Enable Access Control for Interpreter, Configuration, and Credential Settings
Enable Access Control for Notebooks
Enable and disable snapshot creation using Cloudera Manager
Enable asynchronous scheduler
Enable authorization for additional HDFS web UIs
Enable authorization for HDFS web UIs
Enable authorization in Kafka with Ranger
Enable authorization of StorageHandler-based tables in Data Hub
Enable bulk load replication using Cloudera Manager
Enable Cgroups
Enable cluster-wide HBase replication
Enable core dump
Enable detection of slow DataNodes
Enable disk IO statistics
Enable document-level authorization
Enable garbage collector logging
Enable GZipCodec as the default compression codec
Enable HBase high availability using Cloudera Manager
Enable HBase indexing
Enable hedged reads for HBase
Enable high availability
Enable HTTPS communication
Enable Intra-Queue preemption
Enable Intra-Queue Preemption for a specific queue
Enable Kerberos authentication
Enable Kerberos authentication and RPC encryption
Enable LDAP authentication in Solr
Enable multi-threaded faceting
Enable namespace mapping
Enable or disable authentication with delegation tokens
Enable override of default queue mappings
Enable Phoenix ACLs
Enable preemption for a specific queue
Enable Ranger authorization
Enable RegionServer grouping using Cloudera Manager
Enable replication on a specific table
Enable replication on HBase column families
Enable security for Cruise Control
Enable server-server mutual authentication
Enable snapshot creation on a directory
Enable the AdminServer
Enabling a multi-threaded environment for Hue
Enabling ABFS file browser for Hue configured with IDBroker
Enabling ABFS file browser for Hue configured without IDBroker
Enabling Access Control for Zeppelin Elements
Enabling access to HBase browser from Hue
Enabling ACL for RegionServer grouping
Enabling Admission Control
Enabling an alert policy
Enabling and disabling trash
Enabling custom Kerberos principal support in a Queue Manager cluster
Enabling custom Kerberos principal support in YARN
Enabling dynamic child creation in weight mode
Enabling fault-tolerant processing in Spark Streaming
Enabling HDFS HA
Enabling High Availability and automatic failover
Enabling Hue applications with Cloudera Manager
Enabling Hue as a TLS/SSL client
Enabling Hue as a TLS/SSL server using Cloudera Manager
Enabling interceptors
Enabling Kerberos for the SRM service
Enabling LazyPreemption
Enabling LDAP Authentication for impala-shell
Enabling LDAP authentication with HiveServer2 and Impala
Enabling LDAP in Hue
Enabling Native Acceleration For MLlib
Enabling node labels on a cluster to configure partition
Enabling Oozie High Availability
Enabling Oozie SLA with Cloudera Manager
Enabling or disabling anonymous usage data collection
Enabling Remote Querying [Technical Preview]
Enabling S3 browser for Hue configured with IDBroker
Enabling S3 browser for Hue configured without IDBroker
Enabling scheduled queries
Enabling Spark authentication
Enabling Spark Encryption
Enabling Speculative Execution
Enabling SSE-C
Enabling SSE-KMS
Enabling SSE-S3
Enabling the Dynamic Queue Scheduling feature
Enabling the Oozie web console on managed clusters
Enabling the SQL editor autocompleter
Enabling TLS/SSL communication with HiveServer2
Enabling TLS/SSL communication with Impala
Enabling TLS/SSL for Hue Load Balancer
Enabling TLS/SSL for the SRM service
Enabling vectorized query execution
Enabling YARN Ranger authorization support
Encrypting an S3 Bucket with Amazon S3 Default Encryption
Encrypting Data on S3
Encryption Reference
End to end latency overview
End to end latency use case
Enforcing TLS version 1.2 for Hue
Enhancements related to bulk glossary terms import
Environment variables for sizing NameNode heap memory
Erasure coding CLI command
Erasure coding examples
Erasure coding overview
Errors during hole punching test
Escaping an invalid identifier
Essential metrics to monitor
ETL with Cloudera Morphlines
Evolving a schema
Example - Placement rules creation
Example of Atlas S3 Lineage
Example use cases
Example workload
Example: Configuration for work preserving recovery
Example: Running SparkPi on YARN
Example: Using the HBase-Spark connector
Examples
Examples of accessing Amazon S3 data from Spark
Examples of Audit Operations
Examples of controlling data access using classifications
Examples of creating and using UDFs
Examples of creating secure external tables
Examples of DistCp commands using the S3 protocol and hidden credentials
Examples of estimating NameNode heap memory
Examples of Interacting with Schema Registry
Examples of overlapping quota policies
Exit statuses for the HDFS Balancer
Experimental flags
EXPLAIN statement
Exploring using Lineage
Export a Note
Export a snapshot to another cluster
Export all resource-based policies for all services
Export Ranger reports
Export resource-based policies for a specific service
Export tag-based policies
Exporting query results to Amazon S3
Expose HBase metrics to a Ganglia server
Extending Atlas to Manage Metadata from Additional Sources
External table access
Extracting ADLS Metadata using Atlas
Extracting S3 Metadata using Atlas
Extraction Command
Extraction Configuration
Extraction Prerequisites
Extractor configuration properties
Failures during INSERT, UPDATE, UPSERT, and DELETE operations
Fan-in and Fan-out Replication Flows
Feature comparison
Feature Comparisons
Fetching Spark Maven dependencies
File descriptor limits
File descriptors
Files and directories
Files and directories
Filesystems
Filter types
Finding issues
Finding the list of Hue superusers
Finding the list of Hue superusers
Fine-grained authorization
FIRST_VALUE
Fixed Issues In Cloudera Runtime 7.2.12
Fixed Issues In Cloudera Runtime 7.2.12.10
Fixed Issues In Cloudera Runtime 7.2.12.11
Fixed Issues In Cloudera Runtime 7.2.12.12
Fixed Issues In Cloudera Runtime 7.2.12.7
Fixed Issues In Cloudera Runtime 7.2.12.8
Fixed Issues In Cloudera Runtime 7.2.12.9
Fixing a warning related to accessing non-optimized Hue
Fixing authentication issues between HBase and Hue
Fixing block inconsistencies
Fixing issues
FLOAT data type
Flush options
Flushing data to disk
Format for using Hadoop archives with MapReduce
Frequently asked questions
Functions
General Quota Syntax
General Settings
Generate a table list
Generating and viewing Apache Hive statistics
Generating collection configuration using configs
Generating Solr collection configuration using instance directories
Generating statistics
Generating surrogate keys
Generating Table and Column Statistics
Getting scheduled query information and monitor the query
Getting the JDBC or ODBC driver in Data Hub
Glossaries overview
Governance
Governance
Governance
Governance Overview
Graceful HBase shutdown
Gracefully shut down an HBase RegionServer
Gracefully shut down the HBase service
GRANT statement
Granting permission to access S3 and ABFS File Browser in Hue
Granularity of metrics for end-to-end latency
GROUP BY clause
GROUPING() and GROUPING_ID() functions
Groups and fetching
GROUP_CONCAT function
Guidelines for Schema Design
Hadoop
Hadoop archive components
Hadoop File Formats Support
Hadoop File System commands
Handling disk failures
Handling large messages
Hash and hash partitioning
Hash and range partitioning
Hash partitioning
Hash partitioning
HashTable/SyncTable tool configuration
HAVING clause
HBase
HBase
HBase
HBase
HBase actions that produce Atlas entities
HBase audit entries
HBase authentication
HBase authorization
HBase backup and disaster recovery strategies
HBase entities created in Atlas
HBase filtering
HBase I/O components
HBase is using more disk space than expected
HBase lineage
HBase metadata collection
HBase online merge
HBase read replicas
HBase Shell example
HBaseMapReduceIndexerTool command line reference
HBCK2 tool command reference
HDFS
HDFS
HDFS
HDFS ACLs
HDFS Block Skew
HDFS Caching
HDFS commands for metadata files and directories
HDFS entity metadata migration
HDFS Metrics
HDFS Overview
HDFS storage policies
HDFS storage types
HDFS storage types
Heap sampling
Hierarchical namespaces vs. non-namespaces
Hierarchical queue characteristics
High Availability on HDFS clusters
Highly Available Kafka Architectures
Hive
Hive
Hive
Hive authentication
Hive entity metadata migration
Hive low-latency analytical processing
Hive low-latency analytical processing
Hive table locations
Hive unsupported interfaces and features in public clouds
Hive Warehouse Connector for accessing Apache Spark data
Hive Warehouse Connector Interfaces
HiveServer actions that produce Atlas entities
HiveServer audit entries
HiveServer entities created in Atlas
HiveServer lineage
HiveServer metadata collection
HiveServer relationships
HMS table storage
How Cloudera Search works
How Cruise Control retrieves metrics
How Cruise Control self-healing works
How DAS helps to debug Hive on Tez queries
How Ignore and Prune feature works
How Lineage strategy works
How NameNode manages blocks on a failed DataNode
How NFS Gateway authenticates and maps users
How tag-based access control works
How the reporting task runs in a NiFi cluster
How to manage log rotation for Ranger Services
How to read the Placement Rules table
How to read the Schedule table
How to Set up Failover and Failback
HPL/SQL examples
HttpFS authentication
Hue
Hue
Hue
Hue Advanced Configuration Snippet
Hue configuration files
Hue logs
Hue Overview
Hue overview
Hue supported browsers
HWC
HWC and DataFrame API limitations
HWC and DataFrame APIs
HWC API Examples
HWC integration with pyspark, sparklyr, and Zeppelin
HWC limitations
HWC supported types mapping
IAM Role permissions for working with SSE-KMS
Identifiers
Identifying problems
Ignore or Prune pattern to filter Hive metadata entities
Impact of quota violation policy
Impala
Impala
Impala
Impala actions that produce Atlas entities
Impala audit entries
Impala Authentication
Impala Authorization
Impala database containment model
Impala DDL for Kudu
Impala DML for Kudu Tables
Impala entities created in Atlas
Impala entity metadata migration
Impala integration limitations
Impala integration limitations
Impala lineage
Impala lineage
Impala Logs
Impala metadata collection
Impala Shell Command Reference
Impala Shell Configuration File
Impala Shell Configuration Options
Impala Shell Tool
Impala SQL and Hive SQL
Impala with Amazon S3
Impala with Azure Data Lake Store (ADLS)
Impala with HBase
Impala with HDFS
Impala with Kudu
Import a Note
Import and sync LDAP users and groups
Import command options
Import External Packages
Import resource-based policies for a specific service
Import resource-based policies for all services
Import tag-based policies
Importing a Bucket into S3Guard
Importing and exporting resource-based policies
Importing and exporting tag-based policies
Importing Business Metadata associations in bulk
Importing Confluent Schema Registry schemas into Cloudera Schema Registry
Importing data into HBase
Importing Glossary terms in bulk
Importing RDBMS data into Hive
Imports into Hive
Improve Performance in Schema Registry
Improving Performance for S3A
Improving performance with centralized cache management
Improving performance with short-circuit local reads
Improving Software Performance
Increasing StateStore Timeout
Increasing storage capacity with HDFS compression
Increasing the maximum number of processes for Oracle database
Index sample data
Indexing
Indexing data
Indexing Data Using Morphlines
Indexing Data Using Spark-Solr Connector
Indexing data with MapReduceIndexerTool in Solr backup format
Indexing sample tweets with Cloudera Search
Information and debugging
Ingestion
Initiate replication when data already exist
Initiating automatic compaction in Cloudera Manager
INSERT and primary key uniqueness violations
Insert data
INSERT statement
Inserting data into a table
Installing Apache Knox
Installing Atlas in HA
Installing the REST Server using Cloudera Manager
Installing the UDF development package
INT data type
Integrate Kafka and Schema Registry
Integrate Kafka and Schema Registry using NiFi Processors
Integrating Apache Hive with Apache Spark and BI
Integrating Hive and a BI tool
Integrating with Atlas
Integrating with Kafka
Integrating with NiFi
Integrating with Schema Registry
Integrating your identity provider's SAML server with Hue
Inter-broker security
Interacting with Hive views
Internal and external Impala tables
Internal private key infrastructure (PKI)
Introducing the S3A Committers
Introduction
Introduction
Introduction
Introduction
Introduction
Introduction
Introduction
Introduction
Introduction to alert policies in Streams Messaging Manager
Introduction to Apache HBase
Introduction to Apache Phoenix
Introduction to Atlas ADLS Extractor
Introduction to Azure Storage and the ABFS Connector
Introduction to HDFS metadata files and directories
Introduction to Hive metastore
Introduction to monitoring Kafka cluster replications in SMM
Introduction to S3Guard
Introduction to Streams Messaging Manager
Invalid method name: 'GetLog' error
Invalid query handle
INVALIDATE METADATA statement
ISR management
Issues starting or restarting the master or the tablet server
Java API example
Java client
JBOD
JBOD Disk migration
JBOD setup
JDBC connection string syntax
JDBC mode configuration properties
JDBC mode limitations
JDBC read mode introduction
Joins in Impala SELECT statements
JournalNodes
JournalNodes
JVM and garbage collection
Kafka
Kafka
Kafka
Kafka actions that produce Atlas entities
Kafka Architecture
Kafka audit entries
Kafka brokers and Zookeeper
Kafka clients and ZooKeeper
Kafka cluster load balancing using Cruise Control
Kafka consumers
Kafka credentials property reference
Kafka FAQ
Kafka Introduction
Kafka lineage
Kafka metadata collection
Kafka producers
Kafka public APIs
Kafka relationships
Kafka security hardening with Zookeeper ACLs
Kafka storage handler and table properties
Kafka Streams
kafka-*-perf-test
kafka-configs
kafka-console-consumer
kafka-console-producer
kafka-consumer-groups
kafka-delegation-tokens
kafka-log-dirs
kafka-reassign-partitions
kafka-topics
Kafka-ZooKeeper performance tuning
Keep replicas current
Kerberos configurations for HWC
Kerberos setup guidelines for Distcp between secure clusters
Kernel stack watchdog traces
Key Differences between INSERT-ONLY and FULL ACID Tables
Key Features
Known issues and limitations
Known Issues In Cloudera Runtime 7.2.12
Knox
Knox
Knox Supported Services Matrix
Kudu
Kudu
Kudu
Kudu
Kudu
Kudu architecture in a CDP public cloud deployment
Kudu authentication with Kerberos
Kudu backup
Kudu concepts
Kudu example applications
Kudu integration with Spark
Kudu introduction
Kudu master web interface
Kudu metrics
Kudu network architecture
Kudu Python client
Kudu recovery
Kudu schema design
Kudu tablet server web interface
Kudu tracing
Kudu transaction semantics
Kudu web interfaces
Kudu-Impala integration
LAG
LAST_VALUE
Launch a YARN service
Launch distcp
Launch Zeppelin
LAZY_PERSIST memory storage policy
LDAP authentication
LDAP properties
LDAP Settings
LEAD
Leader positions and in-sync replicas
Lengthy BalancerMember Route length
Leveraging Business Metadata
Lily HBase batch indexing for Cloudera Search
Lily HBase NRT indexing
LIMIT clause
Limit CPU usage with Cgroups
Limitations
Limitations
Limitations and restrictions for Impala UDFs
Limitations of Amazon S3
Limitations of Atlas-NiFi integration
Limitations of erasure coding
Limitations of Phoenix-Hive connector
Limitations of the S3A Committers
Limiting the speed of compactions
Lineage lifecycle
Lineage overview
Linux Container Executor
List files in Hadoop archives
List restored snapshots
List snapshots
Listing available metrics
Literals
Live write access
Livy API reference for batch jobs
Livy API reference for interactive sessions
Livy batch object
Livy interpreter configuration
Livy objects for interactive sessions
LOAD DATA statement
Loading ORC data into DataFrames using predicate push-down
Loading the Oozie database
Local file system support
Log Aggregation File Controllers
Log Aggregation Properties
Log cleaner
Logging Extractor Activity
Logical operators, comparison operators and comparators
Logs and log segments
Main Use Cases
Maintenance manager
Manage databases and tables
Manage dynamic queues
Manage HBase snapshots using COD CLI
Manage HBase snapshots using the HBase shell
Manage individual delegation tokens
Manage placement rules
Manage queries
Manage queues
Manage Ranger authorization in Solr
Manage reports
Manage the YARN service life cycle through the REST API
Managed Parent Queues
Management basics
Managing Access Control Lists
Managing alert policies and notifiers in SMM
Managing Alert Policies using Streams Messaging Manager
Managing and Allocating Cluster Resources using Capacity Scheduler
Managing Apache Hadoop YARN Services
Managing Apache HBase
Managing Apache HBase Security
Managing Apache Hive
Managing Apache Impala
Managing Apache Kafka
Managing Apache Kudu
Managing Apache Kudu Security
Managing Apache Phoenix Security
Managing Apache Phoenix security
Managing Apache ZooKeeper
Managing Apache ZooKeeper Security
Managing Auditing with Ranger
Managing Business Terms with Atlas Glossaries
Managing Cloudera Search
Managing collection configuration
Managing collections
Managing columns
Managing Cruise Control
Managing Data Storage
Managing dynamic child creation enabled parent queues
Managing Dynamic Configurations
Managing dynamically created child queues
Managing Hue permissions
Managing Kafka Topics using Streams Messaging Manager
Managing logging properties for Ranger services
Managing Logs
Managing Metadata in Impala
Managing Metadata in Impala
Managing partition retention time
Managing partitions
Managing query rewrites
Managing Resources in Impala
Managing snapshot policies using Cloudera Manager
Managing tables
Managing topics across multiple Kafka clusters
Manually configuring SAML authentication
Manually failing over to the standby NameNode
MAP complex type
Mapping Apache Phoenix schemas to Apache HBase namespaces
Mapping Atlas Identity to CDP users
MapReduce indexing
MapReduce Job ACLs
MapReduceIndexerTool
MapReduceIndexerTool input splits
MapReduceIndexerTool metadata
MapReduceIndexerTool usage syntax
Materialized views
Mathematical functions
Maven Artifacts for Cloudera Runtime 7.2.12
MAX
MAX function
Memory
Memory limits
Merge process stops during Sqoop incremental imports
Merging data in tables
Metrics
Metrics and Insight
Migrate brokers by modifying broker IDs in meta.properties
Migrate data on the same host
Migrate to multiple Kudu masters
Migrate to strongly consistent indexing
Migrating Consumer Groups Between Clusters
Migrating Data from Cloudera Navigator to Atlas
Migrating Data Using Sqoop
Migrating database configuration to a new location
Migrating Solr replicas
Migration from Fair Scheduler to Capacity Scheduler
Migration Guide
MIN
MIN function
Min/Max Filtering
Minimize cluster disruption during planned downtime
Miscellaneous functions
MOB cache properties
Modify GCS Bucket Permissions
Modify interpreter settings
Modifying a collection configuration generated using an instance directory
Modifying a Kafka topic
Modifying Impala Startup Options
Monitor cluster health with ksck
Monitor RegionServer grouping
Monitor the BlockCache
Monitor the performance of hedged reads
Monitoring
Monitoring and Debugging Spark Applications
Monitoring and Maintaining S3Guard
Monitoring Apache Impala
Monitoring Apache Kudu
Monitoring checkpoint latency for cluster replication
Monitoring end to end latency for Kafka topic
Monitoring End-to-End Latency using Streams Messaging Manager
Monitoring heap memory usage
Monitoring Kafka brokers
Monitoring Kafka cluster replications by quick ranges
Monitoring Kafka Cluster Replications using Streams Messaging Manager
Monitoring Kafka clusters
Monitoring Kafka Clusters using Streams Messaging Manager
Monitoring Kafka consumers
Monitoring Kafka producers
Monitoring Kafka topics
Monitoring replication latency for cluster replication
Monitoring replication throughput and latency by values
Monitoring Replication with Streams Messaging Manager
Monitoring status of the clusters to be replicated
Monitoring throughput for cluster replication
Monitoring topics to be replicated
More Resources
Move HBase Master Role to another host
Moving a NameNode to a different host using Cloudera Manager
Moving highly available NameNode, failover controller, and JournalNode roles using the Migrate Roles wizard
Moving NameNode roles
Moving the JournalNode edits directory for a role group using Cloudera Manager
Moving the JournalNode edits directory for a role instance using Cloudera Manager
Multi-server LDAP/AD authentication
Multilevel partitioning
MySQL: 1040, 'Too many connections' exception
NameNode architecture
NameNodes
NameNodes
NDV function
Network and I/O threads
Networking parameters
New topic and consumer group discovery
NiFi lineage
NiFi metadata collection
Non-covering range partitions
Notes about replication
Notifiers
NTILE
Number-of-Regions Quotas
Number-of-Tables Quotas
ODBC sign-on walkthrough
Off-heap BucketCache
OFFSET clause
Offsets Subcommand
On-demand Metadata
On-demand Metadata
On-premise to Cloud and Kafka Version Upgrade
Oozie
Oozie
Oozie configurations with CDP services
Oozie database configurations
Oozie High Availability
Oozie Load Balancer configuration
Oozie scheduling examples
Oozie security enhancements
Operating system requirements
Operational Database
Operational Database
Operational database cluster
Operational Database Overview
Operational Database overview
Operators
Optimize mountable HDFS
Optimize performance for evaluating SQL predicates
Optimizer hints
Optimizing data storage
Optimizing HBase I/O
Optimizing NameNode disk space with Hadoop archives
Optimizing performance
Optimizing queries using partition pruning
Optimizing S3A read performance for different file types
Options to determine differences between contents of snapshots
ORC file format
ORC vs Parquet formats
Orchestrate a rolling restart with no downtime
ORDER BY clause
Other known issues
OVER
Overview
Overview
Overview
Overview
Overview
Overview of Hadoop archives
Overview of HDFS
Overview of Oozie
Packaging different versions of libraries with an Apache Spark application
PAM authentication
Parameters to configure the Disk Balancer
Parquet
Partition pruning
Partition Pruning for Queries
Partition refresh and configuration
Partitioning
Partitioning
Partitioning examples
Partitioning for Kudu Tables
Partitioning guidelines
Partitioning limitations
Partitioning limitations
Partitioning tables
Partitions
Partitions and performance
PERCENT_RANK
Perform a backup of the HDFS metadata
Perform a disk hot swap for DataNodes using Cloudera Manager
Perform ETL by ingesting data from Kafka into Hive
Perform hostname changes
Perform scans using HBase Shell
Perform the migration
Perform the recovery
Perform the removal
Performance and Scalability
Performance and storage considerations for Spark SQL DROP TABLE PURGE
Performance Best Practices
Performance considerations
Performance Considerations
Performance considerations for UDFs
Performance Impact of Encryption
Performance improvement using partitions
Performance issues
Performance tuning
Performant .NET producer
Periodically rebuilding a materialized view
Phoenix
Phoenix
Phoenix
Phoenix-Spark connector usage examples
Physical backups of an entire node
Placement rule policies
Plan the data movement across disks
Planning for Apache Impala
Planning for Apache Kudu
Planning for Streams Replication Manager
Planning overview
Populating an HBase Table
Populating an S3 bucket
Ports Used by Impala
POST /admin/audit/ API
Post-migration verification
Pre-defined Access Policies for Schema Registry
Pre-defined Access Policies for Schema Registry
Predicate push-down optimization
Preloaded resource-based services and policies
Prepare for hostname changes
Prepare for removal
Prepare for the migration
Prepare for the recovery
Prepare to back up the HDFS metadata
Preparing the hardware resources for HDFS High Availability
Preparing the S3 Bucket
Prerequisite
Prerequisites
Prerequisites
Prerequisites
Prerequisites for configuring short-circuit local reads
Prerequisites for configuring TLS/SSL for Oozie
Prerequisites for enabling erasure coding
Prerequisites for enabling HDFS HA using Cloudera Manager
Prerequisites for setting up Atlas HA
Prerequisites to configure TLS/SSL for HBase
Preventing inadvertent deletion of directories
Previewing tables using Data Preview
Primary key design
Primary key index
Principal name mapping
Problem area: Compose page
Problem area: Queries page
Problem area: Reports page
Propagating classifications through lineage
Propagation of tags as deferred actions
Properties for configuring centralized caching
Properties for configuring short-circuit local reads on HDFS
Properties for configuring the Balancer
Properties to set the size of the NameNode edits directory
Protocol between consumer and broker
Provide Read-only access to Queue Manager UI
Provision an operational database cluster
Proxy Cloudera Manager through Apache Knox
Pruning Old Data from S3Guard Tables
Purging deleted entities
Purposely using a stale materialized view
PUT /admin/purge/ API
Queries are not appearing on the Queries page
Query an existing Kudu table from Impala
Query column is empty but you can see the DAG ID and Application ID
Query Join Performance
Query options
Query results cache
Query sample data
Query scheduling
Query vectorization
Querying
Querying a schema
Querying correlated data
Querying existing HBase tables
Querying files into a DataFrame
Querying Kafka data
Querying live data from Kafka
Querying the information_schema database
Queue ACLs
Quota enforcement
Quota violation policies
Quotas
Rack awareness
Rack awareness (Location awareness)
Range partitioning
Range partitioning
Ranger
Ranger
Ranger
Ranger access conditions
Ranger AD Integration
Ranger Audit Filters
Ranger client caching
Ranger console navigation
Ranger policies for Kudu
Ranger Policies Overview
Ranger Security Zones
Ranger tag-based policies
Ranger UI authentication
Ranger UI authorization
Ranger user management
Ranger Usersync
RANK
Read access
Read and write operations
Read operations (scans)
Read replica properties
Reading and writing Hive tables in R
Reading and writing Hive tables in Zeppelin
Reading data from HBase
Reading data through HWC
Reading Hive ORC tables
Reads (scans)
REAL data type
Reassigning replicas between log directories
Reassignment examples
Rebalancing partitions
Rebalancing with Cruise Control
Rebuild a Kudu filesystem layout
Recommendations for client development
Recommended configurations for the Balancer
Recommended configurations for the balancer
Recommended deployment architecture
Recommended settings for G1GC
Record management
Record order and assignment
Records
Recover data from a snapshot
Recover from a dead Kudu master
Recover from disk failure
Recover from full disks
Redaction
Redeploying the Oozie ShareLib
Redeploying the Oozie sharelib using Cloudera Manager
Reducing the Size of Data Structures
Refer to a table using dot notation
Reference architecture
Referencing S3 Data in Applications
Refining query search using filters
REFRESH AUTHORIZATION statement
REFRESH FUNCTIONS statement
REFRESH statement
Registering a Lily HBase Indexer Configuration with the Lily HBase Indexer Service
Registering the UDF
Reloading, viewing, and filtering functions
Remote Querying [Technical Preview]
Remote Topics
Remove a DataNode
Remove a RegionServer from RegionServer grouping
Remove Kudu masters
Remove or add storage directories for NameNode data directories
Remove storage directories using Cloudera Manager
Reorder placement rules
Repairing partitions manually using MSCK repair
Replace a disk on a DataNode host
Replace a ZooKeeper disk
Replace a ZooKeeper role on an unmanaged cluster
Replace a ZooKeeper role with ZooKeeper service downtime
Replace a ZooKeeper role without ZooKeeper service downtime
Replicate data between Data Hub clusters with cloud SRM
Replicate pre-existing data in an active-active deployment
Replicating Data
Replicating data from PvC Base to Data Hub with cloud SRM
Replicating data from PvC Base to Data Hub with on-prem SRM
Replication
Replication across three or more clusters
Replication caveats
Replication Flows Overview
Replication requirements
Report crashes using breakpad
Request a timeline-consistent read
Requirements for Oozie High Availability
Reserved words
Resetting Hue user password
Resource allocation overview
Resource distribution workflow
Resource scheduling and management
Resource Tuning Example
Resource-based Services and Policies
Resources for on-boarding Azure for CDP users
Restore a snapshot
Restore data from a replica
Restore HDFS metadata from a backup using Cloudera Manager
Restore tables from backups
Restoring a collection
Restoring NameNode metadata
Restricting access to Kafka metadata in Zookeeper
Restricting Access to S3Guard Tables
Restricting classifications based on user permission
Restricting supported ciphers for Hue
Retries
Retrieving log directory replica assignment information
REVOKE statement
Rotate Auto-TLS Certificate Authority and Host Certificates
Rotate the master key/secret
Row-level filtering and column masking in Hive
Row-level filtering in Hive with Ranger policies
Row-level filtering in Impala with Ranger policies
ROW_NUMBER
RPC timeout traces
Run a tablet rebalancing tool in Cloudera Manager
Run a tablet rebalancing tool on a rack-aware cluster
Run the Disk Balancer plan
Run the spark-submit job
Run the tablet rebalancing tool
Running a Hive command
Running a Spark MLlib example
Running ADLS Metadata Extractor
Running an interactive session with the Livy REST API
Running Apache Spark Applications
Running bulk extraction
Running Bulk Extraction
Running Commands and SQL Statements in Impala Shell
Running incremental extraction
Running Incremental Extraction
Running PySpark in a virtual environment
Running sample Spark applications
Running shell commands
Running Spark 3 Applications
Running Spark applications on secure clusters
Running Spark applications on YARN
Running Spark Python applications
Running the balancer
Running the HBaseMapReduceIndexerTool
Running the HBCK2 tool
Running YARN Services
Running your first Spark application
Runtime environment for UDFs
Runtime error: Could not create thread: Resource temporarily unavailable (error 11)
Runtime Filtering
S3 actions that produce or update Atlas entities
S3 entities created in Atlas
S3 entity audit entries
S3 Extractor configuration
S3 Performance Checklist
S3 relationships
S3A and Checksums (Advanced Feature)
S3Guard: Operational Issues
Safely Writing to S3 Through the S3A Committers
SAML properties
Sample pom.xml file for Spark Streaming with Kafka
Save a YARN service definition
Saving searches
Saving the search results
Scalability Considerations
Scaling Kudu
Scaling Limits and Guidelines
Scaling recommendations and limitations
Scaling recommendations and limitations
Scheduler performance improvements
Scheduling among queues
Scheduling in Oozie using cron-like syntax
Schema alterations
Schema design limitations
Schema design limitations
Schema Entities
Schema objects
Schema Registry
Schema Registry
Schema Registry
Schema Registry actions that produce Atlas entities
Schema Registry audit entries
Schema Registry Authorization through Ranger Access Policies
Schema Registry Authorization through Ranger Access Policies
Schema Registry Component Architecture
Schema Registry Concepts
Schema Registry metadata collection
Schema Registry Overview
Schema Registry Overview
Schema Registry TLS Properties
Schema Registry Use Cases
Schema relationships
Schemaless mode overview and best practices
Script with HBase Shell
Search
Search
Search
Search
Search and other Runtime components
Search applications
Search Ranger reports
Search Tutorial
Searching by topic name
Searching for entities using Business Metadata attributes
Searching for entities using classifications
Searching Kafka cluster replications by source
Searching metadata tags
Searching overview
Searching queries
Searching tables
Searching using terms
Searching with Metadata
Secondary Sort
Securing Access to Hadoop Cluster: Apache Knox
Securing an endpoint under AutoTLS
Securing Apache Hive
Securing Apache Impala
Securing Apache Kafka
Securing Atlas
Securing Atlas
Securing Cloudera Search
Securing configs with ZooKeeper ACLs and Ranger
Securing Cruise Control
Securing database connections with TLS/SSL
Securing HiveServer using LDAP
Securing Hue
Securing Hue passwords with scripts
Securing Impala
Securing Schema Registry
Securing sessions
Securing Streams Messaging Manager
Securing Streams Messaging Manager
Securing Streams Replication Manager
Securing the S3A Committers
Security
Security considerations
Security considerations for UDFs
Security examples
Security examples
Security limitations
Security limitations
Security Model and Operations on S3
Security overview
SELECT statement
Server management limitations
Server management limitations
Service Pack in Cloudera Runtime 7.2.12
Services backed by PostgreSQL fail or stop responding
Set Application-Master resource-limit for a specific queue
Set default Application Master resource limit
Set global application limits
Set global maximum application priority
Set HADOOP_CONF to the destination cluster
Set Maximum Application limit for a specific queue
Set Ordering policies within a specific queue
Set properties in Cloudera Manager
Set quotas using Cloudera Manager
SET statement
Set up
Set up a storage policy for HDFS
Set up MirrorMaker in Cloudera Manager
Set up SSD storage using Cloudera Manager
Set up the HortonworksSchemaRegistry Controller Service
Set up WebHDFS on a secure cluster
Set user limits within a queue
Setting a default partition expression
Setting consumer and producer table properties
Setting HDFS quotas
Setting Java system properties for Solr
Setting Oozie permissions
Setting Python path variables for Livy
Setting the cache timeout
Setting the Idle Query and Idle Session Timeouts
Setting the Oozie database timezone
Setting the secure storage password as an environment variable
Setting the trash interval
Setting Timeout and Retries for Thrift Connections to Backend Client
Setting Timeouts in Impala
Setting up a CDW client
Setting up and configuring the ABFS connector
Setting up Atlas High Availability
Setting up Azure managed Identity for Extraction
Setting up Data Cache for Remote Reads
Setting up Data Cache for Remote Reads
Setting Up HDFS Caching
Setting up the development environment
Setting user limits for HBase
Setting user limits for Kafka
Settings to avoid data loss
Shell commands
Shiro Settings: Reference
shiro.ini Example
SHOW MATERIALIZED VIEWS
SHOW statement
Showing Atlas Server status
Showing materialized views
Shut Down Impala
SHUTDOWN statement
Signing on and running queries
Simple .NET consumer
Simple .NET producer
Simple Java consumer
Simple Java producer
Single tablet write operations
Size the BlockCache
Sizing estimation based on network and disk message throughput
Sizing NameNode heap memory
Slow name resolution and nscd
SMALLINT data type
Solr
Solr and HDFS - the block cache
Solr server tuning categories
solrctl Reference
Space quotas
Spark
Spark
Spark
Spark actions that produce Atlas entities
Spark application model
Spark audit entries
Spark cluster execution overview
Spark entities created in Apache Atlas
Spark entity metadata migration
Spark execution model
Spark indexing using morphlines
Spark integration best practices
Spark integration known issues and limitations
Spark integration limitations
Spark Job ACLs
Spark lineage
Spark metadata collection
Spark on YARN deployment modes
Spark relationships
Spark security
Spark SQL example
Spark Streaming and Dynamic Allocation
Spark Streaming Example
Spark troubleshooting
Spark tuning
spark-submit command options
Specify truststore properties
Specifying domains or pages to which Hue can redirect users
Specifying HTTP request methods
Specifying Impala Credentials to Access S3
Specifying racks for hosts
Speeding up Job Commits by Increasing the Number of Threads
Spooling Query Results
SQL migration to Impala
SQL statements
SQLContext and HiveContext
Sqoop
Sqoop
Sqoop
Sqoop Hive import stops when HS2 does not use Kerberos authentication
SRM Command Line Tools
SRM security example
SRM Service data traffic reference
srm-control
srm-control Options Reference
SSE-C: Server-Side Encryption with Customer-Provided Encryption Keys
SSE-KMS: Amazon S3-KMS Managed Encryption Keys
SSE-S3: Amazon S3-Managed Encryption Keys
Start and stop Kudu processes
Start and stop queues
Start and stop the NFS Gateway services
Start HBase
Start Queue
Starting and Stopping Apache Impala
Starting and stopping HBase using Cloudera Manager
Starting Apache Hive
Starting compaction manually
Starting Hive on an insecure cluster
Starting Hive using a password
Starting the Lily HBase NRT indexer service
Starting the Oozie server
Statistics generation and viewing commands
STDDEV, STDDEV_SAMP, STDDEV_POP functions
Step 1: Worker host configuration
Step 2: Worker host planning
Step 3: Cluster size
Step 6: Verify container settings on cluster
Step 6A: Cluster container capacity
Step 6B: Container parameters checking
Step 7: MapReduce configuration
Step 7A: MapReduce settings checking
Steps 4 and 5: Verify settings
Stop HBase
Stop Queue
Stop replication in an emergency
Stopping the Oozie server
Storage
Storage
Storage group classification
Storage group pairing
Storage Systems Supports
Storing medium objects (MOBs)
Streams Messaging
Streams Messaging
Streams Messaging
Streams Messaging Manager
Streams Messaging Manager
Streams Messaging Manager Overview
Streams Replication Manager
Streams Replication Manager
Streams Replication Manager
Streams Replication Manager Architecture
Streams Replication Manager Driver
Streams Replication Manager Overview
Streams Replication Manager Reference
Streams Replication Manager requirements
Streams Replication Manager Service
STRING data type
String functions
STRUCT complex type
Submit Oozie Jobs in Data Engineering Cluster
Submitting a Python app
Submitting a Scala or Java application
Submitting a Spark job to a Data Hub cluster using Livy
Submitting batch applications using the Livy REST API
Submitting Spark applications
Submitting Spark Applications to YARN
Submitting Spark applications using Livy
Subqueries in Impala SELECT statements
Subquery restrictions
Subscribing to a topic
SUM
SUM function
Supported special characters
Switching from CMS to G1GC
Symbolizing stack traces
Synchronize table data using HashTable/SyncTable tool
Synchronizing the contents of JournalNodes
System Level Broker Tuning
System metadata migration
Table and Column Statistics
Tables
TABLESAMPLE clause
Tablet history garbage collection and the ancient history mark
Tag-based Services and Policies
Tags and policy evaluation
Take a snapshot using a shell script
Task architecture and load-balancing
Terminologies
Terms
Test MOB storage and retrieval performance
Testing the LDAP configuration
The Cloud Storage Connectors
The HDFS mover command
The Hue load balancer not distributing users evenly across various Hue servers
The perfect schema
The S3A Committers and Third-Party Object Stores
Thread Tuning for S3A Data Upload
Threads
Thrift Server crashes after receiving invalid data
Throttle quota examples
Throttle quotas
Timeline consistency
TIMESTAMP compatibility for Parquet files
TIMESTAMP data type
TINYINT data type
TLS Certificate Requirements and Recommendations
TLS Encryption
TLS/SSL client authentication
To configure an S3 bucket to publish events
To configure an SQS queue suitable for Atlas extraction
Token-based authentication for Cloudera Data Warehouse integrations
Tombstoned or STOPPED tablet replicas
Tool usage
Top-down process for adding a new metadata source
Topics
Topics and Groups Subcommand
Transactional table access
Transactions
Transactions
Transitioning Navigator content to Atlas
Trash behavior with HDFS Transparent Encryption enabled
Troubleshoot RegionServer grouping
Troubleshooting
Troubleshooting ABFS
Troubleshooting Apache Hadoop YARN
Troubleshooting Apache HBase
Troubleshooting Apache Hive
Troubleshooting Apache Impala
Troubleshooting Apache Kudu
Troubleshooting Apache Sqoop
Troubleshooting Cloudera Search
Troubleshooting Data Analytics Studio
Troubleshooting Docker on YARN
Troubleshooting HBase
Troubleshooting Hue
Troubleshooting Impala
Troubleshooting Linux Container Executor
Troubleshooting NTP stability problems
Troubleshooting replication failure in the DAS Event Processor
Troubleshooting S3 and S3Guard
Troubleshooting SAML authentication
Troubleshooting Schema Registry
Troubleshooting the S3A Committers
TRUNCATE TABLE statement
Trusted users
Tuning Apache Hadoop YARN
Tuning Apache Impala
Tuning Apache Kafka Performance
Tuning Apache Spark
Tuning Apache Spark Applications
Tuning Cloudera Search
Tuning garbage collection
Tuning Hue
Tuning replication
Tuning Resource Allocation
Tuning S3A Uploads
Tuning Spark Shuffle Operations
Tuning the Number of Partitions
Turning safe mode on HA NameNodes
Tutorial
UDF concepts
UI Tools
Unable to alter S3-backed tables
Unable to authenticate users in Hue using SAML
Unable to connect to database with provided credential
Unable to view new databases and tables, or unable to see changes to the existing databases or tables
Unable to view Snappy-compressed files
Unaffected Components in this release
Understand the NiFi Record Based Processors and Controller Services
Understanding --go-live and HDFS ACLs
Understanding co-located and external clusters
Understanding erasure coding policies
Understanding HBase garbage collection
Understanding Hue users and groups
Understanding Impala integration with Kudu
Understanding Performance using EXPLAIN Plan
Understanding Performance using Query Profile
Understanding Performance using SUMMARY Report
Understanding Replication Flows
Understanding SRM properties, their configuration and hierarchy
Understanding the data that flow into Atlas
Understanding the extractHBaseCells Morphline Command
Understanding the extractHBaseCells Morphline Command
Understanding the kafka-run-class Bash Script
Understanding YARN architecture
UNION, INTERSECT, and EXCEPT clauses
Unlocking access to Kafka metadata in Zookeeper
Unsupported Apache Spark Features
Unsupported command line tools
Update data
Update Ranger audit configuration parameters
UPDATE statement
Updating a notifier
Updating an alert policy
Updating data in a table
Updating Extractor Configuration with ADLS Authentication
Updating the schema in a collection
Uploading tables
Upsert a row
Upsert option in Kudu Spark
UPSERT statement
Usability issues
Use a CTE in a query
Use a custom MapReduce job
Use BulkLoad
Use Case 1: Registering and Querying a Schema for a Kafka Topic
Use Case 2: Reading/Deserializing and Writing/Serializing Data from and to a Kafka Topic
Use Case 3: Dataflow Management with Schema-based Routing
Use Case Architectures
Use cases
Use cases for ACLs on HDFS
Use cases for BulkLoad
Use cases for centralized cache management
Use cases for Streams Replication Manager in CDP Public Cloud
Use Cgroups
Use cluster names in the kudu command line tool
Use cluster replication
Use CopyTable
Use CPU scheduling
Use CPU scheduling with distributed shell
Use CREATE TABLE AS SELECT
Use curl to access a URL protected by Kerberos HTTP SPNEGO
Use Digest Authentication Provider
Use FPGA scheduling
Use FPGA with distributed shell
Use GZipCodec with a one-time job
Use HashTable and SyncTable Tool
Use multiple ZooKeeper services
Use partitions when submitting a job
Use rsync to copy files from one broker to another
Use snapshots
Use Spark
Use Spark with a secure Kudu cluster
Use Sqoop
USE statement
Use strongly consistent indexing
Use the Apache Thrift Proxy API
Use the Charts Library
Use the HBase APIs for Java
Use the HBase command-line utilities
Use the HBase REST server
Use the HBase shell
Use the Hue HBase app
Use the JDBC interpreter to access Hive
Use the JDBC interpreter to access Phoenix
Use the Livy interpreter to access Spark
Use the Network Time Protocol (NTP) with HBase
Use the YARN CLI to View Logs for Applications
Use the YARN REST APIs to manage applications
Use the yarn rmadmin tool to administer ResourceManager high availability
Use transactions with tables
Use wildcards with SHOW DATABASES
User Account Requirements
User authentication in Hue
User management in Hue
User-defined functions (UDFs)
Using --go-live with SSL or Kerberos
Using a credential provider to secure S3 credentials
Using a subquery
Using ABFS using CLI
Using advanced search
Using Amazon S3 with Hue
Using Apache HBase Backup and Disaster Recovery
Using Apache HBase Hive integration
Using Apache Hive
Using Apache Impala with Apache Kudu
Using Apache Phoenix to Store and Access Data
Using Apache Phoenix-Hive connector
Using Apache Phoenix-Spark connector
Using Apache Zeppelin
Using Avro Data Files
Using Azure Data Lake Storage Gen2 with Hue
Using Basic Search
Using Breakpad Minidumps for Crash Reporting
Using CLI commands to create and list ACLs
Using Cloudera Manager to manage HDFS HA
Using common table expressions
Using Configuration Properties to Authenticate
Using constraints
Using custom JAR files with Search
Using custom libraries with Spark
Using Data Analytics Studio
Using dfs.datanode.max.transfer.threads with HBase
Using Direct Reader mode
Using DistCp
Using DistCp between HA clusters using Cloudera Manager
Using DistCp to copy files
Using DistCp with Amazon S3
Using DistCp with Highly Available remote clusters
Using DNS with HBase
Using EC2 Instance Metadata to Authenticate
Using Environment Variables to Authenticate
Using erasure coding for existing data
Using erasure coding for new data
Using Free-text Search
Using functions
Using governance-based data discovery
Using HBase blocksize
Using HBase coprocessors
Using HBase Hive integration
Using HBase replication
Using HBase scanner heartbeat
Using HDFS snapshots for data protection
Using hedged reads
Using Hive Warehouse Connector with Oozie Spark action
Using HLL Datasketch Algorithms in Impala
Using HttpFS to provide access to HDFS
Using Hue
Using Hue
Using HWC for streaming
Using Ignore and Prune patterns
Using Impala to query Kudu tables
Using JDBC API
Using JDBC read mode
Using JMX for accessing HDFS metrics
Using KLL Datasketch Algorithms in Impala
Using Livy with interactive notebooks
Using Livy with Spark
Using Load Balancer with HttpFS
Using MapReduce batch indexing to index sample Tweets
Using metadata for cluster governance
Using Morphlines to index Avro
Using Morphlines with Syslog
Using non-JDBC drivers
Using optimizations from a subquery
Using ORC Data Files
Using Parquet Data Files
Using Per-Bucket Credentials to Authenticate
Using PySpark
Using quota management
Using rack awareness for read replicas
Using Ranger client libraries
Using Ranger to Provide Authorization in CDP
Using RCFile Data Files
Using Record-Enabled Processors
Using RegionServer grouping
Using S3Guard for Consistent S3 Metadata
Using Schema Registry
Using Search filters
Using secondary indexing
Using SequenceFile Data Files
Using session cookies to validate Ranger policies
Using solrctl with an HTTP proxy
Using Spark Hive Warehouse and HBase Connector Client .jar files with Livy
Using Spark MLlib
Using Spark SQL
Using Spark Streaming
Using SQL to query HBase from Hue
Using Sqoop actions with Oozie
Using SRM in CDP Public Cloud overview
Using Streams Replication Manager
Using tag attributes and values in Ranger tag-based policy conditions
Using Text Data Files
Using the CldrCopyTable utility to copy data
Using the Cloudera Runtime Maven repository
Using the cursor to return record sets
Using the Database Explorer
Using the Directory Committer in MapReduce
Using the HBase-Spark connector
Using the HBCK2 tool to remediate HBase clusters
Using the indexer HTTP interface
Using the Lily HBase NRT indexer service
Using the Livy API to run Spark jobs
Using the NFS Gateway for accessing HDFS
Using the Note Toolbar
Using the Passcode token
Using the Ranger Console
Using the REST API
Using the REST API
Using the REST proxy API
Using the S3Guard CLI
Using the S3Guard Command to List and Delete Uploads
Using the Spark DataFrame API
Using transactions
Using Unique Filenames to Avoid File Update Inconsistency
Using YARN Web UI and CLI
Using Zeppelin Interpreters
UTF-8 codec error
Validating the Cloudera Search deployment
VALUES statement
VARCHAR data type
Varchar type
VARIANCE, VARIANCE_SAMP, VARIANCE_POP, VAR_SAMP, VAR_POP functions
Variations on Put
Vectorization default
Verifying use of a query rewrite
Verify that replication works
Verify the ZooKeeper authentication
Verify validity of the NFS services
Verifying Atlas for the extracted data
Verifying if a memory limit is sufficient
Verifying That an S3A Committer Was Used
Verifying that Indexing Works
Verifying that S3Guard is Enabled on a Bucket
Verifying the Impala dependency on Kudu
Verifying the setup
Versions
View All Applications
View application details
View audit details
View Cluster Overview
View Nodes and Node Details
View partitions
View query details
View Queues and Queue Details
View Ranger reports
View the API documentation
Viewing and modifying log levels for Search and related services
Viewing and modifying Search configuration using Cloudera Manager
Viewing compaction progress
Viewing detailed information
Viewing existing collections
Viewing Kafka cluster replication details
Viewing lineage
Viewing racks assigned to cluster hosts
Viewing storage information
Viewing table and column statistics
Viewing the DAG counters
Viewing the DAG flow
Viewing the Hive configurations for a query
Viewing the Join report
Viewing the query details
Viewing the query recommendations
Viewing the query timeline
Viewing the Read and Write report
Viewing the task-level DAG information
Viewing the Tez configurations for a query
Viewing the visual explain for a query
Viewing transaction locks
Viewing transactions
Views
Virtual machine options for HBase Shell
Virtual memory handling
Web User Interface for Debugging
What is Cloudera Search
What's New
When Shuffles Do Not Occur
When to Add a Shuffle Transformation
When to use Atlas classifications for access control
Why HDFS data becomes unbalanced
Why one scheduler?
Wildcards and variables in resource-based policies
WINDOW
WITH clause
Work Preserving Recovery for YARN components
Working with Amazon S3
Working with Apache Hive Metastore
Working with Atlas classifications and labels
Working with Classifications and Labels
Working with Google Cloud Storage
Working with S3 buckets in the same AWS region
Working with the ABFS Connector
Working with the Oozie server
Working with Third-party S3-compatible Object Stores
Working with versioned S3 buckets
Working with Zeppelin Notes
Write-ahead log garbage collection
Writes
Writing data through HWC
Writing data to HBase
Writing data to Kafka
Writing to multiple tablets
Writing transformed Hive data to Kafka
Writing UDFs
Writing user-defined aggregate functions (UDAFs)
YARN
YARN
YARN ACL rules
YARN ACL syntax
YARN ACL types
YARN Configuration Properties
YARN Features
YARN Log Aggregation Overview
YARN Ranger authorization support
YARN Ranger authorization support compatibility matrix
YARN resource allocation of multiple resource-types
YARN ResourceManager High Availability
YARN ResourceManager high availability architecture
YARN services API examples
YARN tuning overview
Zeppelin
Zeppelin
ZooKeeper
ZooKeeper
ZooKeeper ACLs Best Practices
ZooKeeper ACLs Best Practices: Atlas
ZooKeeper ACLs Best Practices: Cruise Control
ZooKeeper ACLs Best Practices: HBase
ZooKeeper ACLs Best Practices: HDFS
ZooKeeper ACLs Best Practices: Kafka
ZooKeeper ACLs Best Practices: Oozie
ZooKeeper ACLs Best Practices: Ranger
ZooKeeper ACLs best practices: Search
ZooKeeper ACLs Best Practices: YARN
ZooKeeper ACLs Best Practices: ZooKeeper
ZooKeeper Authentication
zookeeper-security-migration
Fixed Issues in Apache Solr
There are no fixed issues for Solr in Cloudera Runtime 7.2.12.
Apache patch information
None
Parent topic: Fixed Issues In Cloudera Runtime 7.2.12