Cloudera on cloud Runtime 7.0.1 (Public Cloud)
Cloudera Runtime
Cloudera Runtime Release Notes
What's New
Atlas
HBase
HDFS
Hive
Hue
Impala
Knox
Kudu
Oozie
Ranger
Spark
YARN
ZooKeeper
Known Issues
Atlas
Hadoop
HBase
HDFS
Hive
Hue
Impala
Knox
Kudu
Oozie
Ranger
Solr
Spark
Sqoop
YARN
Zeppelin
ZooKeeper
Cloudera Runtime Component Versions
Cloudera Manager Release Notes
Concepts
Storage
HDFS Overview
Introduction
Overview of HDFS
NameNodes
Moving NameNode roles
Moving highly available NameNode, failover controller, and JournalNode roles using the Migrate Roles wizard
Moving a NameNode to a different host using Cloudera Manager
Sizing NameNode heap memory
Environment variables for sizing NameNode heap memory
Monitoring heap memory usage
Files and directories
Disk space versus namespace
Replication
Examples of estimating NameNode heap memory
DataNodes
How NameNode manages blocks on a failed DataNode
Remove a DataNode
Add storage directories using Cloudera Manager
Remove storage directories using Cloudera Manager
JournalNodes
Moving the JournalNode edits directory for a role group using Cloudera Manager
Moving the JournalNode edits directory for a role instance using Cloudera Manager
Apache Kudu Overview
Apache Kudu overview
Kudu-Impala integration
Example use cases
Apache Kudu concepts and architecture
Columnar datastore
Raft consensus algorithm
Table
Tablet
Tablet server
Master
Catalog table
Logical replication
Architectural overview
Apache Kudu usage limitations
Schema design limitations
Partitioning limitations
Scaling recommendations and limitations
Server management limitations
Cluster management limitations
Impala integration limitations
Spark integration limitations
Security limitations
Other known issues
More Resources
Apache Kudu Design
Apache Kudu schema design
The perfect schema
Column design
Decimal type
Column encoding
Column compression
Primary key design
Primary key index
Considerations for backfill inserts
Partitioning
Range partitioning
Adding and Removing Range Partitions
Hash partitioning
Multilevel partitioning
Partition pruning
Partitioning examples
Range partitioning
Hash partitioning
Hash and range partitioning
Hash and hash partitioning
Schema alterations
Schema design limitations
Apache Kudu transaction semantics
Single tablet write operations
Writing to multiple tablets
Read operations (scans)
Known issues and limitations
Writes
Reads (scans)
Scaling Kudu
Terms
Example workload
Memory
Verifying if a memory limit is sufficient
File descriptors
Threads
Apache Hadoop YARN Overview
Introduction
YARN Features
YARN Unsupported Features
Understanding YARN architecture
Data Access
Apache Hive Overview
Apache Hive key features
Changes after upgrading
Unsupported interfaces
Apache Hive 3 architectural overview
Apache Hive content roadmap
Apache Impala Overview
Introduction
Components
Hue Overview
Introduction
Apache HBase Overview
Overview
Use cases for HBase
HBase on CDP
Data Science
Apache Spark Overview
Apache Spark Overview
Unsupported Apache Spark Features
Apache Zeppelin Overview
Overview
CDP Security Overview
CDP user management system
Data Lake security
CDP identity management
FreeIPA identity management
Cloud identity federation
Authentication with Apache Knox
Security terminology
Governance
Governance Overview
Governance overview
Data Stewardship with Apache Atlas
Atlas dashboard tour
Apache Atlas metadata collection overview
Atlas Metadata model overview
Controlling Data Access with Tags
Controlling data access with tags
When to use Atlas classifications for access control
How tag-based access control works
Examples of controlling data access using classifications
Extending Atlas to Manage Metadata from Additional Sources
Extending Atlas to manage metadata from additional sources
Planning for Apache Impala
Guidelines for Schema Design
User Account Requirements
How To
Storage
Managing Data Storage
Optimizing data storage
Erasure coding overview
Understanding erasure coding policies
Comparing replication and erasure coding
Prerequisites for enabling erasure coding
Limitations of erasure coding
Using erasure coding for existing data
Using erasure coding for new data
Advanced erasure coding configuration
Erasure coding CLI command
Erasure coding examples
Increasing storage capacity with HDFS compression
Enable GZipCodec as the default compression codec
Use GZipCodec with a one-time job
Setting HDFS quotas
Set quotas using Cloudera Manager
Configuring heterogeneous storage in HDFS
Set up a storage policy for HDFS
Set up SSD storage using Cloudera Manager
Balancing data across an HDFS cluster
Why HDFS data becomes unbalanced
Configurations and CLI options for the HDFS Balancer
Properties for configuring the Balancer
Balancer commands
Recommended configurations for the Balancer
Configuring and running the HDFS balancer using Cloudera Manager
Configuring the balancer threshold
Configuring concurrent moves
Recommended configurations for the balancer
Running the balancer
Configuring block size
Cluster balancing algorithm
Storage group classification
Storage group pairing
Block move scheduling
Block move execution
Exit statuses for the HDFS Balancer
Optimizing performance
Improving performance with centralized cache management
Benefits of centralized cache management in HDFS
Use cases for centralized cache management
Centralized cache management architecture
Caching terminology
Properties for configuring centralized caching
Commands for using cache pools and directives
Customizing HDFS
Customize the HDFS home directory
Properties to set the size of the NameNode edits directory
Optimizing NameNode disk space with Hadoop archives
Overview of Hadoop archives
Hadoop archive components
Create a Hadoop archive
List files in Hadoop archives
Format for using Hadoop archives with MapReduce
Detecting slow DataNodes
Enable detection of slow DataNodes
Allocating DataNode memory as storage
HDFS storage types
LAZY_PERSIST memory storage policy
Configure DataNode memory as storage
Improving performance with short-circuit local reads
Prerequisites for configuring short-circuit local reads
Properties for configuring short-circuit local reads on HDFS
Using DistCp to copy files
Using DistCp
Update and overwrite
DistCp and security settings
Secure-to-secure: Kerberos principal name
Secure-to-secure: ResourceManager mapping rules
DistCp between HA clusters
Using DistCp with Amazon S3
Using a credential provider to secure S3 credentials
Examples of DistCp commands using the S3 protocol and hidden credentials
DistCp additional considerations
APIs for accessing HDFS
Set up WebHDFS on a secure cluster
Using HttpFS to provide access to HDFS
Add the HttpFS role
Using Load Balancer with HttpFS
Data storage metrics
Using JMX for accessing HDFS metrics
Configure the G1GC garbage collector
Recommended settings for G1GC
Switching from CMS to G1GC
HDFS Metrics
Configuring Data Protection
Data protection
Backing up HDFS metadata
Introduction to HDFS metadata files and directories
Files and directories
NameNodes
JournalNodes
DataNodes
HDFS commands for metadata files and directories
Configuration properties
Back up HDFS metadata
Prepare to back up the HDFS metadata
Backing up NameNode metadata
Back up HDFS metadata using Cloudera Manager
Restoring NameNode metadata
Restore HDFS metadata from a backup using Cloudera Manager
Perform a backup of the HDFS metadata
Using HDFS snapshots for data protection
Considerations for working with HDFS snapshots
Enable snapshot creation on a directory
Create snapshots on a directory
Recover data from a snapshot
Options to determine differences between contents of snapshots
CLI commands to perform snapshot operations
Managing snapshot policies using Cloudera Manager
Create a snapshot policy
Edit or delete a snapshot policy
Enable and disable snapshot creation using Cloudera Manager
Create snapshots using Cloudera Manager
Delete snapshots using Cloudera Manager
Configuring HDFS trash
Trash behavior with HDFS Transparent Encryption enabled
Enabling and disabling trash
Setting the trash interval
Configuring Fault Tolerance
High Availability on HDFS clusters
Configuring HDFS High Availability
NameNode architecture
Preparing the hardware resources for HDFS High Availability
Using Cloudera Manager to manage HDFS HA
Enabling HDFS HA
Prerequisites for enabling HDFS HA using Cloudera Manager
Enabling High Availability and automatic failover
Disabling and redeploying HDFS HA
Configuring other CDP components to use HDFS HA
Configuring HBase to use HDFS HA
Configuring the Hive Metastore to use HDFS HA
Configuring Impala to work with HDFS HA
Configuring Oozie to use HDFS HA
Changing a nameservice name for Highly Available HDFS using Cloudera Manager
Manually failing over to the standby NameNode
Additional HDFS haadmin commands to administer the cluster
Turning safe mode on HA NameNodes
Converting from an NFS-mounted shared edits directory to Quorum-Based Storage
Administrative commands
Configuring HDFS ACLs
Apache HDFS ACLs
Configuring ACLs on HDFS
Using CLI commands to create and list ACLs
ACL examples
ACLs on HDFS features
Use cases for ACLs on HDFS
Administering Apache Kudu
Apache Kudu administration
Starting and stopping Kudu processes
Kudu web interfaces
Kudu master web interface
Kudu tablet server web interface
Common web interface pages
Kudu metrics
Listing available metrics
Collecting metrics via HTTP
Diagnostics logging
Rack awareness (Location awareness)
Common Kudu workflows
Migrating to multiple Kudu masters
Prepare for the migration
Perform the migration
Recovering from a dead Kudu master in a multi-master deployment
Prepare for the recovery
Perform the recovery
Removing Kudu masters from a multi-master deployment
Prepare for removal
Perform the removal
Changing master hostnames
Prepare for hostname changes
Perform hostname changes
Best practices when adding new tablet servers
Monitoring cluster health with ksck
Changing directory configuration
Recovering from disk failure
Recovering from full disks
Bringing a tablet that has lost a majority of replicas back online
Rebuilding a Kudu filesystem layout
Physical backups of an entire node
Scaling storage on Kudu master and tablet servers in the cloud
Migrating Kudu data from one directory to another on the same host
Minimizing cluster disruption during temporary planned downtime of a single tablet server
Running tablet rebalancing tool
Running a tablet rebalancing tool on a rack-aware cluster
Running a tablet rebalancing tool in Cloudera Manager
Decommissioning or permanently removing a tablet server from a cluster
Using cluster names in the kudu command line tool
Managing Kudu with Cloudera Manager
Enabling core dump for the Kudu service
Verifying the Impala dependency on Kudu
Using the Charts Library with the Kudu service
Kudu security
Kudu authentication with Kerberos
Internal private key infrastructure (PKI)
Authentication tokens
Client authentication to secure Kudu clusters
Scalability
Coarse-grained authorization
Encryption
Web UI encryption
Web UI redaction
Log redaction
Configuring a secure Kudu cluster using Cloudera Manager
Enabling Kerberos authentication and RPC encryption
Configuring coarse-grained authorization with ACLs
Configuring HTTPS encryption for the Kudu master and tablet server web UIs
Configuring a secure Kudu cluster using the command line
Apache Kudu background maintenance tasks
Maintenance manager
Flushing data to disk
Compacting on-disk data
Write-ahead log garbage collection
Tablet history garbage collection and the ancient history mark
Developing Applications with Apache Kudu
Developing applications with Apache Kudu
Viewing the API documentation
Kudu example applications
Maven artifacts
Building the Java client
Kudu Python client
Kudu integration with Spark
Upsert option in Kudu Spark
Using Spark with a secure Kudu cluster
Spark integration known issues and limitations
Spark integration best practices
Using Apache Impala with Apache Kudu
Using Apache Impala with Apache Kudu
Impala database containment model
Internal and external Impala tables
Using Impala to query Kudu tables
Querying an existing Kudu table from Impala
Creating a new Kudu table from Impala
CREATE TABLE AS SELECT
Partitioning tables
Basic partitioning
Advanced partitioning
Non-covering range partitions
Partitioning guidelines
Optimizing performance for evaluating SQL predicates
Inserting a row
Inserting in bulk
INSERT and primary key uniqueness violations
Updating a row
Updating in bulk
Upserting a row
Altering a table
Deleting a row
Deleting in bulk
Failures during INSERT, UPDATE, UPSERT, and DELETE operations
Altering table properties
Dropping a Kudu table using Impala
Security considerations
Known issues and limitations
Next steps
Accessing Cloud Data
Cloud storage connectors overview
The Cloud Storage Connectors
Working with Amazon S3
Limitations of Amazon S3
Configuring Access to S3
Using EC2 Instance Metadata to Authenticate
Referencing S3 Data in Applications
Configuring Per-Bucket Settings
Customizing Per-Bucket Secrets Held in Credential Files
Configuring Per-Bucket Settings to Access Data Around the World
Encrypting Data on S3
SSE-S3: Amazon S3-Managed Encryption Keys
Enabling SSE-S3
SSE-KMS: Amazon S3-KMS Managed Encryption Keys
Enabling SSE-KMS
IAM Role permissions for working with SSE-KMS
SSE-C: Server-Side Encryption with Customer-Provided Encryption Keys
Enabling SSE-C
Configuring Encryption for Specific Buckets
Encrypting an S3 Bucket with Amazon S3 Default Encryption
Performance Impact of Encryption
Using S3Guard for Consistent S3 Metadata
Introduction to S3Guard
Configuring S3Guard
Monitoring and Maintaining S3Guard
Disabling S3Guard and destroying a table
Pruning Old Data from S3Guard Tables
Importing a Bucket into S3Guard
Verifying that S3Guard is Enabled on a Bucket
Using the S3Guard CLI
S3Guard: Operational Issues
Safely Writing to S3 Through the S3A Committers
Introducing the S3A Committers
Configuring Directories for Intermediate Data
Using the Directory Committer in MapReduce
Verifying That an S3A Committer Was Used
Cleaning up after failed jobs
Using the S3Guard Command to List and Delete Uploads
Advanced Committer Configuration
Enabling Speculative Execution
Using Unique Filenames to Avoid File Update Inconsistency
Speeding up Job Commits by Increasing the Number of Threads
Securing the S3A Committers
The S3A Committers and Third-Party Object Stores
Limitations of the S3A Committers
Troubleshooting the S3A Committers
Security Model and Operations on S3
S3A and Checksums (Advanced Feature)
A List of S3A Configuration Properties
Working with versioned S3 buckets
Working with Third-party S3-compatible Object Stores
Improving Performance for S3A
Working with S3 buckets in the same AWS region
Configuring and tuning S3A block upload
Tuning S3A Uploads
Thread Tuning for S3A Data Upload
Optimizing S3A read performance for different file types
S3 Performance Checklist
Troubleshooting S3 and S3Guard
Compute
Managing Applications on Apache Hadoop YARN
Application development
Use the YARN REST APIs to manage applications
Managing applications
Manage long-running YARN applications
Use the YARN Services API
Configure YARN Services using Cloudera Manager
Deploying and Managing Services and Microservices on YARN
Launch the Service YARN file
Save the YARN file as Service App
Managing the YARN service life cycle through the REST API
YARN Services API Examples
Configure Cross-Origin Support on YARN
Managing and Allocating Cluster Resources using Capacity Scheduler
Cluster Management with Capacity Scheduler
Using scheduling to allocate resources
YARN resource allocation
Use CPU scheduling
Configure CPU scheduling and isolation
Limit CPU usage with Cgroups
Enable Cgroups
Using Cgroups
Partition a cluster using node labels
Configure node labels
Use node labels
Allocating Resources with Capacity Scheduler
Capacity Scheduler Overview
Enable the Capacity Scheduler
Set up queues
Hierarchical Queue Characteristics
Scheduling Among Queues
Control access to queues with ACLs
Define queue mapping policies
Configure queue mapping for users and groups to specific queues
Configure queue mapping for users and groups to queues with the same name
Configure queue mapping to use the user name from the application tag
Enable override of default queue mappings
Manage cluster capacity with queues
Set queue priorities
Resource distribution workflow
Resource distribution workflow example
Set user limits
Application reservations
Set flexible scheduling policies
Examples of FIFO and Fair Sharing policies
Configure queue ordering policies
Best practices for ordering policies
Start and stop queues
Set application limits
Enable preemption
Preemption workflow
Configure preemption
Enable priority scheduling
Configure ACLs for application priorities
Enable intra-queue preemption
Properties for configuring intra-queue preemption
Intra-Queue preemption based on application priorities
Intra-Queue preemption based on user limits
Configuring Apache Hadoop YARN Security
Managing Access Control Lists
YARN ACL rules
YARN ACL syntax
YARN ACL types
YARN Admin ACLs
YARN Application ACLs
MapReduce Application ACL
Spark Application ACL
Viewing application logs
Killing an application
Application ACL evaluation
Example: Killing an application in the "Production" queue
Example: Moving the application and viewing the log in the "Test" queue
Enable ACLs
Configure ACLs
Using YARN with a secure cluster
Configure YARN for long-running applications
Configure TLS/SSL for core Hadoop services
Configure TLS/SSL for HDFS
Configure TLS/SSL for YARN
Linux Container Executor
Troubleshooting
Configuring Apache Hadoop YARN High Availability
YARN ResourceManager High Availability
YARN ResourceManager High Availability architecture
Configure YARN ResourceManager High Availability
Enable High Availability
Disable High Availability
Use the yarn rmadmin tool to Administer ResourceManager High Availability
Work preserving recovery for YARN components
Disable work preserving recovery on ResourceManager
Enable work preserving recovery on NodeManager
Example: Configuration for work preserving recovery
Configuring Apache Hadoop YARN Log Aggregation
Monitoring clusters using YARN web user interface
Access the YARN web user interface
Monitor clusters
Monitor queues
Monitor applications
Search an application
View application details
Manage and monitor services
Create new services
Create a standard service
Create a custom service
Monitor nodes
Tools
Accessing YARN logs
Use the YARN CLI to view logs for applications
Use log aggregation
Log Aggregation File Controllers
Configure log aggregation
Log aggregation properties
Enable debug-delay
Managing Apache ZooKeeper
Add a ZooKeeper service
Use multiple ZooKeeper services
Replace a ZooKeeper disk
Replace a ZooKeeper role with ZooKeeper service downtime
Replace a ZooKeeper role without ZooKeeper service downtime
Replace a ZooKeeper role on an unmanaged cluster
Confirm the election status of a ZooKeeper service
Managing Apache ZooKeeper Security
ZooKeeper Authentication
Configure ZooKeeper server for Kerberos authentication
Configure ZooKeeper client shell for Kerberos authentication
Verify the ZooKeeper authentication
Enable server-server mutual authentication
ZooKeeper ACLs Best Practices
ZooKeeper ACLs Best Practices: Atlas
ZooKeeper ACLs Best Practices: HBase
ZooKeeper ACLs Best Practices: HDFS
ZooKeeper ACLs Best Practices: Oozie
ZooKeeper ACLs Best Practices: Ranger
ZooKeeper ACLs Best Practices: YARN
ZooKeeper ACLs Best Practices: ZooKeeper
Data Access
Starting Apache Hive
Start Hive on an insecure cluster
Start Hive using a password
Run a Hive command
Convert Hive CLI scripts to Beeline
Using Apache Hive
Apache Hive 3 tables
Hive table locations
Create a CRUD transactional table
Create an insert-only transactional table
Create an S3-based external table
Drop an external table along with data
Using constraints
Determine the table type
Hive 3 ACID transactions
Using materialized views
Create and use a materialized view
Use a materialized view in a subquery
Drop a materialized view
Show materialized views
Describe a materialized view
Manage query rewrites
Create and use a partitioned materialized view
Apache Hive Query basics
Query the information_schema database
Insert data into an ACID table
Update data in a table
Merge data in tables
Delete data from a table
Create a temporary table
Configure temporary table storage
Use a subquery
Subquery restrictions
Aggregate and group data
Query correlated data
Using common table expressions
Use a CTE in a query
Escape an illegal identifier
CHAR data type support
Partitions introduction
Create partitions dynamically
Manage partitions
Automate partition discovery and repair
Repair partitions manually using MSCK repair
Manage partition retention time
Generate surrogate keys
Using JdbcStorageHandler to query an RDBMS
Using functions
Reload, view, and filter functions
Create a user-defined function
Set up the development environment
Create the UDF class
Build the project and upload the JAR
Register the UDF
Call the UDF in a query
Managing Apache Hive
ACID operations
Configure partitions for transactions
View transactions
View transaction locks
Data compaction
Compaction prerequisites
Enable automatic compaction
Start compaction manually
View compaction progress
Disable automatic compaction
Compactor properties
Query vectorization
Check query execution
Configuring Apache Hive
Limit concurrent connections
Configuring HiveServer high availability
Generating statistics
Set up the cost-based optimizer and statistics
Generate and view Apache Hive statistics
Statistics generation and viewing commands
Configuring Apache Hive Metastore
Introduction to Hive metastore
Configuring HMS for high availability
HMS table storage
Setting up the metastore database
Set up the backend Hive metastore database
Set up MariaDB or MySQL database
Set up a PostgreSQL database
Set up an Oracle database
Configure metastore database properties
Configure metastore location and HTTP mode
Set up a JDBC URL connection override
Tuning the metastore
Setting up a shared Amazon RDS as a Hive metastore
Set up Amazon RDS as a Hive metastore
Securing Apache Hive
Authorizing Apache Hive Access
HDFS ACL permissions model and YARN queues
Configure storage-based authorization
Storage-based operation permissions
Configure HiveServer for ETL using YARN queues
Transactional table access
External table access
Apache Spark access to Apache Hive
Hive Authentication
Secure HiveServer using LDAP
Client connections to HiveServer
JDBC connection string syntax
Encrypting Communication
Enable TLS/SSL for HiveServer
Enable SASL in HiveServer
Secure Hive Metastore
Integrating Apache Hive with Apache Spark and BI
Hive Warehouse Connector for accessing Apache Spark data
Apache Spark-Apache Hive connection configuration
Submit a Hive Warehouse Connector Scala or Java application
Submit a Hive Warehouse Connector Python app
Hive Warehouse Connector supported types
HiveWarehouseSession API operations
Catalog operations
Read and write operations
Close HiveWarehouseSession operations
Use the Hive Warehouse Connector for streaming
Hive Warehouse Connector API Examples
Hive Warehouse Connector Interfaces
Connecting Hive to BI tools using a JDBC/ODBC driver
Specify the JDBC connection string
JDBC connection string syntax
Using JdbcStorageHandler to query an RDBMS
Apache Hive Performance Tuning
Low-latency analytical processing
Best practices for performance tuning
Key components of warehouse processing
Query result cache and metastore cache
Maximizing storage resources using ORC
Advanced ORC properties
Improving performance using partitions
Handling bucketed tables
Migrating Data Using Sqoop
Data migration to Apache Hive
Set Up Sqoop
Moving data from databases to Apache Hive
Create a Sqoop import command
Import RDBMS data into Hive
Moving data from HDFS to Apache Hive
Import RDBMS data to HDFS
Convert an HDFS file to ORC
Incrementally update an imported table
Import command options
Managing Apache Impala
Modifying Impala Startup Options
Monitoring Impala
Impala Logs
Managing Logs
Impala lineage
Web User Interface for Debugging
Debug Web UI for Impala Daemon
Debug Web UI for StateStore
Debug Web UI for Catalog Server
Configuring Impala Web UI
Stopping Impala
Securing Impala
Configuring Impala TLS/SSL
Impala Authentication
Configuring Kerberos Authentication
Configuring LDAP Authentication
Enabling LDAP for Impala in Hue
Enabling LDAP Authentication for impala-shell
Impala Authorization
Configuring Authorization
Tuning Impala
Setting Up HDFS Caching
Setting up Data Cache for Remote Reads
Configuring Dedicated Coordinators and Executors
Managing Resources in Impala
Admission Control and Query Queuing
Enabling Admission Control
Creating Static Pools
Configuring Dynamic Resource Pool
Dynamic Resource Pool Settings
Admission Control Sample Scenario
Cancelling a Query
Managing Metadata in Impala
On-demand Metadata
Automatic Invalidation of Metadata Cache
Automatic Invalidation/Refresh of Metadata
Configuring Event Based Automatic Metadata Sync
Setting Timeouts in Impala
Setting Timeout and Retries for Thrift Connections to Backend Client
Increasing StateStore Timeout
Setting the Idle Query and Idle Session Timeouts
Configuring Load Balancer for Impala
Configuring Client Access to Impala
Impala Shell Tool
Impala Shell Configuration Options
Impala Shell Configuration File
Connecting to Impala Daemon in Impala Shell
Running Commands and SQL Statements in Impala Shell
Impala Shell Command Reference
Configuring ODBC for Impala
Configuring JDBC for Impala
Configuring Delegation for Clients
Using Hue
Using Hue
Enabling the SQL editor autocompleter
Using governance-based data discovery
Searching metadata tags
Using Amazon S3 with Hue
Populating an S3 bucket
Creating a table from an Amazon S3 file
Exporting query results to Amazon S3
Administering Hue
Hue configuration files
Hue safety valves
Hue logs
Hue supported browsers
Adding a Hue service with Cloudera Manager
Adding a Hue role instance with Cloudera Manager
Customizing the Hue web UI
Adding a custom banner
Changing the page logo
Setting the cache timeout
Enabling or disabling anonymous usage data collection
Enabling Hue applications with Cloudera Manager
Running shell commands
Operational Database
Getting Started with Apache HBase
Operational database cluster
Before you create an operational database cluster
Creating an operational database cluster
Default operational database cluster definition
Provision an operational database cluster
Configuring Apache HBase
Use DNS with HBase
Use the Network Time Protocol (NTP) with HBase
Configure the graceful shutdown timeout property
Configure the HBase Thrift Server Role
Setting User Limits for HBase
Configure ulimit for HBase using Cloudera Manager
Configure ulimit for HBase using the Command Line
Configure ulimit using Pluggable Authentication Modules using the Command Line
Use dfs.datanode.max.transfer.threads with HBase
Configure encryption in HBase
Using Hedged Reads
Enable hedged reads for HBase
Monitor the performance of hedged reads
Understanding HBase Garbage Collection
Configure HBase garbage collection
Disable the BoundedByteBufferPool
Configure the HBase Canary
Using HBase Blocksize
Configure the blocksize for a column family
Monitor blocksize metrics
Configuring HBase BlockCache
Contents of the BlockCache
Size the BlockCache
Decide to use the BucketCache
About the Off-heap BucketCache
Off-heap BucketCache
BucketCache IO engine
Configure BucketCache IO engine
Configure the off-heap BucketCache using Cloudera Manager
Configure the off-heap BucketCache using the command line
Cache eviction priorities
Bypass the BlockCache
Monitor the BlockCache
Using HBase Scanner Heartbeat
Configure the scanner heartbeat using Cloudera Manager
Limiting the Speed of Compactions
Configure the compaction speed using Cloudera Manager
Enable HBase indexing
Using HBase Coprocessors
Add a custom coprocessor
Disable loading of coprocessors
Configuring HBase MultiWAL
Configuring MultiWAL support using Cloudera Manager
Configuring the Storage Policy for the Write-Ahead Log (WAL)
Configure the storage policy for WALs using Cloudera Manager
Configure the storage policy for WALs using the Command Line
Using RegionServer Grouping
Enable RegionServer grouping using Cloudera Manager
Configure RegionServer grouping
Monitor RegionServer grouping
Remove a RegionServer from RegionServer grouping
Enable ACL for RegionServer grouping
Best practices when using RegionServer grouping
Disable RegionServer grouping
Optimizing HBase I/O
HBase I/O components
Advanced configuration for write-heavy workloads
Using Amazon S3 with Apache HBase
HBase Object Store Semantics Overview
HBase configuration to use S3 as a storage layer
Managing Apache HBase Security
HBase Authentication
Configure HBase servers to authenticate with a secure HDFS cluster
Configure secure HBase replication
Configure the HBase client TGT renewal period
HBase Authorization
Configuring TLS/SSL for HBase
Prerequisites to configure TLS/SSL for HBase
Configure TLS/SSL for HBase Web UIs
Configure TLS/SSL for HBase REST Server
Configure TLS/SSL for HBase Thrift Server
Accessing Apache HBase
HBase Shell overview
Virtual machine options for HBase Shell
Script with HBase Shell
Use HBase command-line utilities
Use the Java API
Use the Apache Thrift Proxy API
Use the Hue HBase app
Managing Apache HBase
Starting and Stopping HBase using Cloudera Manager
Start HBase
Stop HBase
Graceful HBase Shutdown
Gracefully shut down an HBase RegionServer
Gracefully shut down the HBase service
Importing data into HBase
Choose the right import method
Use snapshots
Use CopyTable
Use BulkLoad
Use cases for BulkLoad
Use cluster replication
Use Sqoop
Use Spark
Use a custom MapReduce job
Use HashTable and SyncTable Tool
HashTable/SyncTable tool configuration
Synchronize table data using HashTable/SyncTable tool
Writing data to HBase
Variations on Put
Versions
Deletion
Examples
Reading data from HBase
Perform scans using HBase Shell
HBase filtering
Dynamically loading a custom filter
Logical operators, comparison operators and comparators
Compound operators
Filter types
HBase Shell example
Java API example
HBase online merge
Move HBase Master Role to another host
Expose HBase metrics to a Ganglia server
Configuring Apache HBase High Availability
Enable HBase high availability using Cloudera Manager
HBase read replicas
Timeline consistency
Keep replicas current
Read replica properties
Configure read replicas using Cloudera Manager
Using rack awareness for read replicas
Create a topology map
Create a topology script
Activate read replicas on a table
Request a timeline-consistent read
Using Apache HBase Backup and Disaster Recovery
HBase backup and disaster recovery strategies
Configuring HBase Snapshots
About HBase snapshots
Configure snapshots
Manage HBase snapshots using Cloudera Manager
Browse HBase tables
Take HBase snapshots
Store HBase snapshots on Amazon S3
Configure HBase in Cloudera Manager to store snapshots in Amazon S3
Configure the dynamic resource pool used for exporting and importing snapshots in Amazon S3
HBase snapshots on Amazon S3 with Kerberos enabled
Manage HBase snapshots on Amazon S3 in Cloudera Manager
Delete HBase snapshots from Amazon S3
Restore an HBase snapshot from Amazon S3
Restore an HBase snapshot from Amazon S3 with a new name
Manage Policies for HBase snapshots in Amazon S3
Manage HBase snapshots using the Command Line
Shell commands
Take a snapshot using a shell script
Export a snapshot to another cluster
Snapshot failures
Information and debugging
Using HBase Replication
Common replication topologies
Notes about replication
Replication requirements
Deploy HBase replication
Replication across three or more clusters
Enable replication on a specific table
Configure secure replication
Create empty table on the destination cluster
Disable replication at the peer level
Stop replication in an emergency
▶︎
Initiate replication when data already exist
Replicate pre-existing data in an active-active deployment
Effects of WAL rolling on replication
Configure secure HBase replication
Restore data from a replica
Verify that replication works
Replication caveats
▶︎
Data Science
▶︎
Configuring Apache Spark
▶︎
Configuring dynamic resource allocation
Customize dynamic resource allocation settings
Configure a Spark job for dynamic resource allocation
Dynamic resource allocation properties
▶︎
Spark security
Enabling Spark authentication
Enabling Spark Encryption
Running Spark applications on secure clusters
Accessing compressed files in Spark
▶︎
Developing Apache Spark Applications
Introduction
Spark application model
Spark execution model
Developing and running an Apache Spark WordCount application
Using the Spark DataFrame API
▶︎
Building Spark Applications
Best practices for building Apache Spark applications
Building reusable modules in Apache Spark applications
Packaging different versions of libraries with an Apache Spark application
▶︎
Using Spark SQL
SQLContext and HiveContext
Querying files into a DataFrame
Spark SQL example
Interacting with Hive views
Performance and storage considerations for Spark SQL DROP TABLE PURGE
TIMESTAMP compatibility for Parquet files
Accessing Spark SQL through the Spark shell
Calling Hive user-defined functions (UDFs)
▶︎
Using Spark Streaming
Spark Streaming and Dynamic Allocation
Spark Streaming Example
Enabling fault-tolerant processing in Spark Streaming
Configuring authentication for long-running Spark Streaming jobs
Building and running a Spark Streaming application
Sample pom.xml file for Spark Streaming with Kafka
▶︎
Accessing external storage from Spark
▶︎
Accessing data stored in Amazon S3 through Spark
Examples of accessing Amazon S3 data from Spark
Accessing Hive from Spark
Accessing HDFS Files from Spark
▶︎
Accessing ORC Data in Hive Tables
Accessing ORC files from Spark
Predicate push-down optimization
Loading ORC data into DataFrames using predicate push-down
Optimizing queries using partition pruning
Enabling vectorized query execution
Reading Hive ORC tables
Accessing Avro data files from Spark SQL applications
Accessing Parquet files from Spark SQL applications
▶︎
Using Spark MLlib
Running a Spark MLlib example
Enabling Native Acceleration For MLlib
Using custom libraries with Spark
▶︎
Running Apache Spark Applications
Introduction
Running your first Spark application
Running sample Spark applications
▶︎
Configuring Spark Applications
Configuring Spark application properties in spark-defaults.conf
Configuring Spark application logging properties
▶︎
Submitting Spark applications
spark-submit command options
Spark cluster execution overview
Canary test for pyspark command
Fetching Spark Maven dependencies
Accessing the Spark History Server
▶︎
Running Spark applications on YARN
Spark on YARN deployment modes
Submitting Spark Applications to YARN
Monitoring and Debugging Spark Applications
Example: Running SparkPi on YARN
Configuring Spark on YARN Applications
Dynamic allocation
▶︎
Submitting Spark applications using Livy
Using Livy with Spark
Using Livy with interactive notebooks
Using the Livy API to run Spark jobs
▶︎
Running an interactive session with the Livy API
Livy objects for interactive sessions
Setting Python path variables for Livy
Livy API reference for interactive sessions
▶︎
Submitting batch applications using the Livy API
Livy batch object
Livy API reference for batch jobs
▶︎
Using PySpark
Running PySpark in a virtual environment
Running Spark Python applications
Automating Spark Jobs with Oozie Spark Action
▶︎
Tuning Apache Spark
Introduction
Check Job Status
Check Job History
Improving Software Performance
▶︎
Tuning Apache Spark Applications
Tuning Spark Shuffle Operations
Choosing Transformations to Minimize Shuffles
When Shuffles Do Not Occur
When to Add a Shuffle Transformation
Secondary Sort
Tuning Resource Allocation
Resource Tuning Example
Tuning the Number of Partitions
Reducing the Size of Data Structures
Choosing Data Formats
▶︎
Configuring Apache Zeppelin
Introduction
Configuring Livy
Configure User Impersonation for Access to Hive
Configure User Impersonation for Access to Phoenix
▶︎
Enabling Access Control for Zeppelin Elements
Enable Access Control for Interpreter, Configuration, and Credential Settings
Enable Access Control for Notebooks
Enable Access Control for Data
▶︎
Shiro Settings: Reference
Active Directory Settings
LDAP Settings
General Settings
shiro.ini Example
▶︎
Using Apache Zeppelin
Introduction
Launch Zeppelin
▶︎
Working with Zeppelin Notes
Create and Run a Note
Import a Note
Export a Note
Using the Note Toolbar
Import External Packages
▶︎
Configuring and Using Zeppelin Interpreters
Modify interpreter settings
Using Zeppelin Interpreters
Customize interpreter settings in a note
Use the JDBC interpreter to access Hive
Use the JDBC interpreter to access Phoenix
Use the Livy interpreter to access Spark
Using Spark Hive Warehouse and HBase Connector Client .jar files with Livy
▶︎
Security
▶︎
Apache Ranger Auditing
Audit Overview
▶︎
Managing Auditing with Ranger
View audit details
Create a read-only Admin user (Auditor)
▶︎
Apache Ranger Authorization
Using Ranger to Provide Authorization in CDP
▶︎
Ranger Policies Overview
Ranger tag-based policies
Tags and policy evaluation
Ranger access conditions
▶︎
Using the Ranger Console
Accessing the Ranger console
Ranger console navigation
▶︎
Resource-based Services and Policies
▶︎
Configuring resource-based services
Configure a resource-based service: Atlas
Configure a resource-based service: HBase
Configure a resource-based service: HDFS
Configure a resource-based service: Hive
Configure a resource-based service: Kafka
Configure a resource-based service: Knox
Configure a resource-based service: NiFi
Configure a resource-based service: NiFi Registry
Configure a resource-based service: Solr
Configure a resource-based service: Storm
Configure a resource-based service: YARN
▶︎
Configuring resource-based policies
Configure a resource-based policy: Atlas
Configure a resource-based policy: HBase
Configure a resource-based policy: HDFS
Configure a resource-based policy: Hive
Configure a resource-based policy: Kafka
Configure a resource-based policy: Knox
Configure a resource-based policy: NiFi
Configure a resource-based policy: NiFi Registry
Configure a resource-based policy: Solr
Configure a resource-based policy: Storm
Configure a resource-based policy: YARN
Wildcards and variables in resource-based policies
Preloaded resource-based services and policies
▶︎
Importing and exporting resource-based policies
Import resource-based policies for a specific service
Import resource-based policies for all services
Export resource-based policies for a specific service
Export all resource-based policies for all services
▶︎
Row-level filtering and column masking in Hive
Row-level filtering in Hive with Ranger policies
Dynamic resource-based column masking in Hive with Ranger policies
Dynamic tag-based column masking in Hive with Ranger policies
▶︎
Tag-based Services and Policies
Adding a tag-based service
▶︎
Adding tag-based policies
Using tag attributes and values in Ranger tag-based policy conditions
Adding a tag-based PII policy
Default EXPIRES ON tag policy
▶︎
Importing and exporting tag-based policies
Import tag-based policies
Export tag-based policies
Create a time-bound policy
▶︎
Ranger Security Zones
Overview
Adding a Ranger security zone
▶︎
Administering Ranger Users, Groups, Roles, and Permissions
Add a user
Edit a user
Delete a user
Add a group
Edit a group
Delete a group
Add or edit permissions
▶︎
Administering Ranger Reports
View Ranger reports
Search Ranger reports
Export Ranger reports
▶︎
Configuring Ranger Authentication with UNIX, LDAP, or AD
▶︎
Configuring Ranger Authentication with UNIX, LDAP, or AD
Configure Ranger authentication for UNIX
Configure Ranger authentication for AD
Configure Ranger authentication for LDAP
▶︎
Ranger AD Integration
Ranger UI authentication
Ranger UI authorization
Ranger Usersync
Ranger user management
Known issue: Ranger group mapping
▶︎
Governance
▶︎
Searching with Metadata
Searching overview
Using Basic Search
Using Free-text Search
Saving searches
Using advanced search
▶︎
Working with Classifications and Labels
Working with Atlas classifications
Creating classifications
Adding attributes to classifications
Associating classifications with entities
Propagating classifications through lineage
Searching for entities using classifications
▶︎
Exploring using Lineage
Lineage overview
Viewing lineage
Lineage lifecycle
▶︎
Managing Business Terms with Atlas Glossaries
Glossaries overview
Creating glossaries
Creating terms
Associating terms with entities
Defining related terms
Creating categories
Assigning terms to categories
Searching using terms
▶︎
Configuring Oozie
Overview of Oozie
Adding the Oozie service using Cloudera Manager
Considerations for Oozie to work with AWS
▶︎
Redeploying the Oozie ShareLib
Redeploying the Oozie sharelib using Cloudera Manager
▶︎
Oozie configurations with CDP services
▶︎
Using Sqoop actions with Oozie
Deploying and configuring Oozie Sqoop1 Action JDBC drivers
Configuring Oozie Sqoop1 Action workflow JDBC drivers
Configuring Oozie to enable MapReduce jobs to read or write from Amazon S3
Configuring Oozie to use HDFS HA
▶︎
Scheduling in Oozie using cron-like syntax
Oozie scheduling examples
▶︎
Configuring an external database for Oozie
Configuring PostgreSQL for Oozie
Configuring MariaDB for Oozie
Configuring MySQL for Oozie
Configuring Oracle for Oozie
▶︎
Working with the Oozie server
Starting the Oozie server
Stopping the Oozie server
Accessing the Oozie server with the Oozie client
Accessing the Oozie server with a browser
Adding schema to Oozie using Cloudera Manager
Enabling the Oozie web console on managed clusters
Enabling Oozie SLA with Cloudera Manager
▶︎
Oozie database configurations
Configuring Oozie data purge settings using Cloudera Manager
Loading the Oozie database
Dumping the Oozie database
Setting the Oozie database timezone
Prerequisites for configuring TLS/SSL for Oozie
Configure TLS/SSL for Oozie
Additional considerations when configuring TLS/SSL for Oozie HA
▶︎
Troubleshooting
▶︎
Troubleshooting Apache Hive
Unable to alter S3-backed tables
▶︎
Troubleshooting Apache Impala
Troubleshooting Impala
Using Breakpad Minidumps for Crash Reporting
▶︎
Troubleshooting Apache HBase
Troubleshooting HBase
▶︎
Using the HBCK2 tool to remediate HBase clusters
Running the HBCK2 tool
Finding issues
Fixing issues
HBCK2 tool command reference
Thrift Server crashes after receiving invalid data
HBase is using more disk space than expected
Troubleshoot RegionServer grouping
▶︎
Troubleshooting Apache Kudu
▶︎
Troubleshooting Apache Kudu
▶︎
Issues starting or restarting the master or the tablet server
Errors during hole punching test
Already present: FS layout already exists
▶︎
NTP clock synchronization
Installing NTP
▶︎
Monitoring NTP status
Using chrony for time synchronization
NTP configuration best practices
Troubleshooting NTP stability problems
Disk space usage
Reporting Kudu crashes using breakpad
▶︎
Troubleshooting performance issues
▶︎
Kudu tracing
Accessing the tracing web interface
RPC timeout traces
Kernel stack watchdog traces
Memory limits
Block cache size
Heap sampling
Slow name resolution and nscd
▶︎
Usability issues
ClassNotFoundException: com.cloudera.kudu.hive.KuduStorageHandler
Runtime error: Could not create thread: Resource temporarily unavailable (error 11)
Tombstoned or STOPPED tablet replicas
Corruption: checksum error on CFile block
▶︎
Troubleshooting Apache Sqoop
Merge process stops during Sqoop incremental imports
Sqoop Hive import stops when HS2 does not use Kerberos authentication
▶︎
Reference
▶︎
Apache Atlas Reference
Apache Atlas Advanced Search language reference
Apache Atlas Statistics reference
▶︎
Hive Server 2 metadata collection
Hive Server 2 actions that produce Atlas entities
Hive Server 2 entities created in Atlas
Hive Server 2 relationships
Hive Server 2 lineage
Hive Server 2 audit entries
▶︎
HBase metadata collection
HBase actions that produce Atlas entities
HBase entities created in Atlas
HBase lineage
HBase audit entries
▶︎
Impala metadata collection
Impala actions that produce Atlas entities
Impala entities created in Atlas
Impala lineage
Impala audit entries
▶︎
Spark metadata collection
Spark actions that produce Atlas entities
Spark entities created in Apache Atlas
Spark lineage
Spark relationships
Spark audit entries
Spark troubleshooting
▶︎
Apache Hadoop YARN Tuning
YARN tuning overview
Step 1: Worker host configuration
Step 2: Worker host planning
Step 3: Cluster size
Steps 4 and 5: Verify settings
Step 6: Verify container settings on cluster
Step 6A: Cluster container capacity
Step 6B: Container sanity checking
Step 7: MapReduce configuration
Step 7A: MapReduce sanity checking
Set properties in Cloudera Manager
Configure memory settings
▶︎
Apache Hive Materialized View Commands
ALTER MATERIALIZED VIEW REBUILD
ALTER MATERIALIZED VIEW REWRITE
CREATE MATERIALIZED VIEW
DESCRIBE EXTENDED and DESCRIBE FORMATTED
DROP MATERIALIZED VIEW
SHOW MATERIALIZED VIEWS
▶︎
Apache Impala Reference
▶︎
Performance Considerations
Performance Best Practices
Query Join Performance
▶︎
Table and Column Statistics
Generating Table and Column Statistics
Runtime Filtering
▶︎
Partitioning
Partition Pruning for Queries
HDFS Caching
HDFS Block Skew
Understanding Performance using EXPLAIN Plan
Understanding Performance using SUMMARY Report
Understanding Performance using Query Profile
▶︎
Scalability Considerations
Scaling Limits and Guidelines
Dedicated Coordinator
▶︎
Hadoop File Formats Support
Using Text Data Files
Using Parquet Data Files
Using ORC Data Files
Using Avro Data Files
Using RCFile Data Files
Using SequenceFile Data Files
▶︎
Storage Systems Support
Impala with HDFS
▶︎
Impala with Kudu
Configuring for Kudu Tables
▶︎
Impala DDL for Kudu
Partitioning for Kudu Tables
Impala DML for Kudu Tables
Impala with HBase
Impala with Azure Data Lake Store (ADLS)
▶︎
Impala with Amazon S3
Specifying Impala Credentials to Access S3
Ports Used by Impala
Migration Guide
▶︎
Apache Impala SQL Reference
▶︎
Impala SQL
▶︎
Schema objects
Aliases
Databases
Functions
Identifiers
Tables
Views
▶︎
Data types
ARRAY complex type
BIGINT data type
BOOLEAN data type
CHAR data type
DATE data type
DECIMAL data type
DOUBLE data type
FLOAT data type
INT data type
MAP complex type
REAL data type
SMALLINT data type
STRING data type
STRUCT complex type
▶︎
TIMESTAMP data type
Customizing time zones
TINYINT data type
VARCHAR data type
Complex types
Literals
Operators
Comments
▶︎
SQL statements
DDL statements
DML statements
ALTER DATABASE statement
ALTER TABLE statement
ALTER VIEW statement
COMMENT statement
COMPUTE STATS statement
CREATE DATABASE statement
CREATE FUNCTION statement
CREATE TABLE statement
CREATE VIEW statement
DELETE statement
DESCRIBE statement
DROP DATABASE statement
DROP FUNCTION statement
DROP STATS statement
DROP TABLE statement
DROP VIEW statement
EXPLAIN statement
GRANT statement
INSERT statement
INVALIDATE METADATA statement
LOAD DATA statement
REFRESH statement
REFRESH AUTHORIZATION statement
REFRESH FUNCTIONS statement
REVOKE statement
▶︎
SELECT statement
Joins in Impala SELECT statements
ORDER BY clause
GROUP BY clause
HAVING clause
LIMIT clause
OFFSET clause
UNION clause
Subqueries in Impala SELECT statements
TABLESAMPLE clause
WITH clause
DISTINCT operator
▶︎
SET statement
Query options
SHOW statement
SHUTDOWN statement
TRUNCATE TABLE statement
UPDATE statement
UPSERT statement
USE statement
VALUES statement
Optimizer hints
▶︎
Built-in functions
Mathematical functions
Bit functions
Conversion functions
Date and time functions
Conditional functions
String functions
Miscellaneous functions
▶︎
Aggregate functions
APPX_MEDIAN function
AVG function
COUNT function
GROUP_CONCAT function
MAX function
MIN function
NDV function
STDDEV, STDDEV_SAMP, STDDEV_POP functions
SUM function
VARIANCE, VARIANCE_SAMP, VARIANCE_POP, VAR_SAMP, VAR_POP functions
▶︎
Analytic functions
OVER
WINDOW
AVG
COUNT
CUME_DIST
DENSE_RANK
FIRST_VALUE
LAG
LAST_VALUE
LEAD
MAX
MIN
NTILE
PERCENT_RANK
RANK
ROW_NUMBER
SUM
▶︎
User-defined functions (UDFs)
UDF concepts
Runtime environment for UDFs
Installing the UDF development package
Writing UDFs
Writing user-defined aggregate functions (UDAFs)
Building and deploying UDFs
Performance considerations for UDFs
Examples of creating and using UDFs
Security considerations for UDFs
Limitations and restrictions for Impala UDFs
Transactions
Reserved words
Impala SQL and Hive SQL
SQL migration
A List of S3A Configuration Properties
About HBase snapshots
About the Off-heap BucketCache
Access the YARN web user interface
Accessing Apache HBase
Accessing Avro data files from Spark SQL applications
Accessing Cloud Data
Accessing compressed files in Spark
Accessing data stored in Amazon S3 through Spark
Accessing external storage from Spark
Accessing HDFS Files from Spark
Accessing Hive from Spark
Accessing ORC Data in Hive Tables
Accessing ORC files from Spark
Accessing Parquet files from Spark SQL applications
Accessing Spark SQL through the Spark shell
Accessing the Oozie server with a browser
Accessing the Oozie server with the Oozie client
Accessing the Ranger console
Accessing the Spark History Server
Accessing the tracing web interface
Accessing YARN logs
ACID operations
ACL examples
ACLs on HDFS features
Activate read replicas on a table
Active Directory Settings
Add a custom coprocessor
Add a group
Add a user
Add a ZooKeeper service
Add or edit permissions
Add storage directories using Cloudera Manager
Add the HttpFS role
Adding a custom banner
Adding a Hue role instance with Cloudera Manager
Adding a Hue service with Cloudera Manager
Adding a Ranger security zone
Adding a tag-based PII policy
Adding a tag-based service
Adding and Removing Range Partitions
Adding attributes to classifications
Adding schema to Oozie using Cloudera Manager
Adding tag-based policies
Adding the Oozie service using Cloudera Manager
Additional considerations when configuring TLS/SSL for Oozie HA
Additional HDFS haadmin commands to administer the cluster
Administering Apache Kudu
Administering Hue
Administering Ranger Reports
Administering Ranger Users, Groups, Roles, and Permissions
Administrative commands
Admission Control and Query Queuing
Admission Control Sample Scenario
Advanced Committer Configuration
Advanced configuration for write-heavy workloads
Advanced erasure coding configuration
Advanced ORC properties
Advanced partitioning
Aggregate and group data
Aggregate functions
Aliases
Allocating DataNode memory as storage
Allocating Resources with Capacity Scheduler
Already present: FS layout already exists
ALTER DATABASE statement
ALTER MATERIALIZED VIEW REBUILD
ALTER MATERIALIZED VIEW REWRITE
ALTER TABLE statement
ALTER VIEW statement
Altering a table
Altering table properties
Analytic functions
Apache Atlas Advanced Search language reference
Apache Atlas metadata collection overview
Apache Atlas Reference
Apache Atlas Statistics reference
Apache Hadoop YARN Overview
Apache Hadoop YARN Tuning
Apache HBase Overview
Apache HDFS ACLs
Apache Hive 3 architectural overview
Apache Hive 3 tables
Apache Hive content roadmap
Apache Hive key features
Apache Hive Materialized View Commands
Apache Hive Overview
Apache Hive Performance Tuning
Apache Hive Query basics
Apache Impala Overview
Apache Impala Reference
Apache Impala SQL Reference
Apache Kudu administration
Apache Kudu background maintenance tasks
Apache Kudu concepts and architecture
Apache Kudu Design
Apache Kudu Overview
Apache Kudu schema design
Apache Kudu transaction semantics
Apache Kudu usage limitations
Apache Ranger Auditing
Apache Ranger Authorization
Apache Spark access to Apache Hive
Apache Spark Overview
Apache Spark-Apache Hive connection configuration
Apache Zeppelin Overview
APIs for accessing HDFS
Application ACL evaluation
Application development
Application reservations
APPX_MEDIAN function
Architectural overview
ARRAY complex type
Assigning terms to categories
Associating classifications with entities
Associating terms with entities
Atlas
Atlas dashboard tour
Atlas Metadata model overview
Audit Overview
Authentication tokens
Authentication with Apache Knox
Authorizing Apache Hive Access
Automate partition discovery and repair
Automatic Invalidation of Metadata Cache
Automatic Invalidation/Refresh of Metadata
Automating Spark Jobs with Oozie Spark Action
AVG
AVG function
Back up HDFS metadata
Back up HDFS metadata using Cloudera Manager
Backing up HDFS metadata
Backing up NameNode metadata
Balancer commands
Balancing data across an HDFS cluster
Basic partitioning
Before you create an operational database cluster
Benefits of centralized cache management in HDFS
Best practices for building Apache Spark applications
Best practices for ordering policies
Best practices for performance tuning
Best practices when adding new tablet servers
Best practices when using RegionServer grouping
BIGINT data type
Bit functions
Block cache size
Block move execution
Block move scheduling
BOOLEAN data type
Bringing a tablet that has lost a majority of replicas back online
Browse HBase tables
BucketCache IO engine
Build the project and upload the JAR
Building and deploying UDFs
Building and running a Spark Streaming application
Building reusable modules in Apache Spark applications
Building Spark Applications
Building the Java client
Built-in functions
Bypass the BlockCache
Cache eviction priorities
Caching terminology
Call the UDF in a query
Calling Hive user-defined functions (UDFs)
Canary test for pyspark command
Cancelling a Query
Capacity Scheduler Overview
Catalog operations
Catalog table
CDP identity management
CDP Security Overview
CDP user management system
Centralized cache management architecture
Changes after upgrading
Changing a nameservice name for Highly Available HDFS using Cloudera Manager
Changing directory configuration
Changing master hostnames
Changing the page logo
CHAR data type
CHAR data type support
Check Job History
Check Job Status
Check query execution
Choose the right import method
Choosing Data Formats
Choosing Transformations to Minimize Shuffles
ClassNotFoundException: com.cloudera.kudu.hive.KuduStorageHandler
Cleaning up after failed jobs
CLI commands to perform snapshot operations
Client authentication to secure Kudu clusters
Client connections to HiveServer
Close HiveWarehouseSession operations
Cloud identity federation
Cloud storage connectors overview
Cloudera Runtime
Cloudera Runtime Component Versions
Cloudera Runtime Release Notes
Cluster balancing algorithm
Cluster management limitations
Cluster Management with Capacity Scheduler
Coarse-grained authorization
Collecting metrics via HTTP
Column compression
Column design
Column encoding
Columnar datastore
Commands for using cache pools and directives
COMMENT statement
Comments
Common Kudu workflows
Common replication topologies
Common web interface pages
Compacting on-disk data
Compaction prerequisites
Compactor properties
Comparing replication and erasure coding
Complex types
Components
Compound operators
Compute
COMPUTE STATS statement
Conditional functions
Configuration properties
Configurations and CLI options for the HDFS Balancer
Configure a resource-based policy: Atlas
Configure a resource-based policy: HBase
Configure a resource-based policy: HDFS
Configure a resource-based policy: Hive
Configure a resource-based policy: Kafka
Configure a resource-based policy: Knox
Configure a resource-based policy: NiFi
Configure a resource-based policy: NiFi Registry
Configure a resource-based policy: Solr
Configure a resource-based policy: Storm
Configure a resource-based policy: YARN
Configure a resource-based service: Atlas
Configure a resource-based service: HBase
Configure a resource-based service: HDFS
Configure a resource-based service: Hive
Configure a resource-based service: Kafka
Configure a resource-based service: Knox
Configure a resource-based service: NiFi
Configure a resource-based service: NiFi Registry
Configure a resource-based service: Solr
Configure a resource-based service: Storm
Configure a resource-based service: YARN
Configure a Spark job for dynamic resource allocation
Configure ACLs
Configure ACLs for application priorities
Configure BucketCache IO engine
Configure CPU scheduling and isolation
Configure Cross-Origin Support on YARN
Configure DataNode memory as storage
Configure encryption in HBase
Configure HBase garbage collection
Configure HBase in Cloudera Manager to store snapshots in Amazon S3
Configure HBase servers to authenticate with a secure HDFS cluster
Configure HiveServer for ETL using YARN queues
Configure log aggregation
Configure memory settings
Configure metastore database properties
Configure metastore location and HTTP mode
Configure node labels
Configure partitions for transactions
Configure preemption
Configure queue mapping for users and groups to queues with the same name
Configure queue mapping for users and groups to specific queues
Configure queue mapping to use the user name from the application tag
Configure queue ordering policies
Configure Ranger authentication for AD
Configure Ranger authentication for LDAP
Configure Ranger authentication for UNIX
Configure read replicas using Cloudera Manager
Configure RegionServer grouping
Configure secure HBase replication
Configure secure replication
Configure snapshots
Configure storage-based authorization
Configure temporary table storage
Configure the blocksize for a column family
Configure the compaction speed using Cloudera Manager
Configure the dynamic resource pool used for exporting and importing snapshots in Amazon S3
Configure the G1GC garbage collector
Configure the graceful shutdown timeout property
Configure the HBase Canary
Configure the HBase client TGT renewal period
Configure the HBase Thrift Server Role
Configure the off-heap BucketCache using Cloudera Manager
Configure the off-heap BucketCache using the command line
Configure the scanner heartbeat using Cloudera Manager
Configure the storage policy for WALs using Cloudera Manager
Configure the storage policy for WALs using the Command Line
Configure TLS/SSL for core Hadoop services
Configure TLS/SSL for HBase REST Server
Configure TLS/SSL for HBase Thrift Server
Configure TLS/SSL for HBase Web UIs
Configure TLS/SSL for HDFS
Configure TLS/SSL for Oozie
Configure TLS/SSL for YARN
Configure ulimit for HBase using Cloudera Manager
Configure ulimit for HBase using the Command Line
Configure ulimit using Pluggable Authentication Modules using the Command Line
Configure User Impersonation for Access to Hive
Configure User Impersonation for Access to Phoenix
Configure YARN for long-running applications
Configure YARN ResourceManager High Availability
Configure YARN Services using Cloudera Manager
Configure ZooKeeper client shell for Kerberos authentication
Configure ZooKeeper server for Kerberos authentication
Configuring a secure Kudu cluster using Cloudera Manager
Configuring a secure Kudu cluster using the command line
Configuring Access to S3
Configuring ACLs on HDFS
Configuring an external database for Oozie
Configuring and running the HDFS balancer using Cloudera Manager
Configuring and tuning S3A block upload
Configuring and Using Zeppelin Interpreters
Configuring Apache Hadoop YARN High Availability
Configuring Apache Hadoop YARN Log Aggregation
Configuring Apache Hadoop YARN Security
Configuring Apache HBase
Configuring Apache HBase High Availability
Configuring Apache Hive
Configuring Apache Hive Metastore
Configuring Apache Spark
Configuring Apache Zeppelin
Configuring authentication for long-running Spark Streaming jobs
Configuring Authorization
Configuring block size
Configuring Client Access to Impala
Configuring coarse-grained authorization with ACLs
Configuring concurrent moves
Configuring Data Protection
Configuring Dedicated Coordinators and Executors
Configuring Delegation for Clients
Configuring Directories for Intermediate Data
Configuring dynamic resource allocation
Configuring Dynamic Resource Pool
Configuring Encryption for Specific Buckets
Configuring Event Based Automatic Metadata Sync
Configuring Fault Tolerance
Configuring for Kudu Tables
Configuring HBase BlockCache
Configuring HBase MultiWAL
Configuring HBase Snapshots
Configuring HBase to use HDFS HA
Configuring HDFS ACLs
Configuring HDFS High Availability
Configuring HDFS trash
Configuring heterogeneous storage in HDFS
Configuring HiveServer high availability
Configuring HMS for high availability
Configuring HTTPS encryption for the Kudu master and tablet server web UIs
Configuring Impala TLS/SSL
Configuring Impala to work with HDFS HA
Configuring Impala Web UI
Configuring JDBC for Impala
Configuring Kerberos Authentication
Configuring LDAP Authentication
Configuring Livy
Configuring Load Balancer for Impala
Configuring MariaDB for Oozie
Configuring MultiWAL support using Cloudera Manager
Configuring MySQL for Oozie
Configuring ODBC for Impala
Configuring Oozie
Configuring Oozie data purge settings using Cloudera Manager
Configuring Oozie Sqoop1 Action workflow JDBC drivers
Configuring Oozie to enable MapReduce jobs to read or write from Amazon S3
Configuring Oozie to use HDFS HA
Configuring Oracle for Oozie
Configuring other CDP components to use HDFS HA
Configuring Per-Bucket Settings
Configuring Per-Bucket Settings to Access Data Around the World
Configuring PostgreSQL for Oozie
Configuring Ranger Authentication with UNIX, LDAP, or AD
Configuring resource-based policies
Configuring resource-based services
Configuring S3Guard
Configuring Spark application logging properties
Configuring Spark application properties in spark-defaults.conf
Configuring Spark Applications
Configuring Spark on YARN Applications
Configuring the balancer threshold
Configuring the Hive Metastore to use HDFS HA
Configuring the Storage Policy for the Write-Ahead Log (WAL)
Configuring TLS/SSL for HBase
Confirm the election status of a ZooKeeper service
Connecting Hive to BI tools using a JDBC/ODBC driver
Connecting to Impala Daemon in Impala Shell
Considerations for backfill inserts
Considerations for Oozie to work with AWS
Considerations for working with HDFS snapshots
Contents of the BlockCache
Control access to queues with ACLs
Controlling Data Access with Tags
Controlling data access with tags
Conversion functions
Convert an HDFS file to ORC
Convert Hive CLI scripts to Beeline
Converting from an NFS-mounted shared edits directory to Quorum-Based Storage
Corruption: checksum error on CFile block
COUNT
COUNT function
Create a CRUD transactional table
Create a custom service
Create a Hadoop archive
Create a read-only Admin user (Auditor)
Create a snapshot policy
Create a Sqoop import command
Create a standard service
Create a temporary table
Create a time-bound policy
Create a topology map
Create a topology script
Create a user-defined function
Create an insert-only transactional table
Create an S3-based external table
Create and Run a Note
Create and use a materialized view
Create and use a partitioned materialized view
CREATE DATABASE statement
Create empty table on the destination cluster
CREATE FUNCTION statement
CREATE MATERIALIZED VIEW
Create new services
Create partitions dynamically
Create snapshots on a directory
Create snapshots using Cloudera Manager
CREATE TABLE AS SELECT
CREATE TABLE statement
Create the UDF class
CREATE VIEW statement
Creating a new Kudu table from Impala
Creating a table from an Amazon S3 file
Creating an operational database cluster
Creating categories
Creating classifications
Creating glossaries
Creating Static Pools
Creating terms
CUME_DIST
Customize dynamic resource allocation settings
Customize interpreter settings in a note
Customize the HDFS home directory
Customizing HDFS
Customizing Per-Bucket Secrets Held in Credential Files
Customizing the Hue web UI
Customizing time zones
Data Access
Data Access
Data compaction
Data Lake security
Data migration to Apache Hive
Data protection
Data Science
Data Science
Data Stewardship with Apache Atlas
Data storage metrics
Data types
Databases
DataNodes
DataNodes
Date and time functions
DATE data type
DDL statements
Debug Web UI for Catalog Server
Debug Web UI for Impala Daemon
Debug Web UI for StateStore
Decide to use the BucketCache
DECIMAL data type
Decimal type
Decommissioning or permanently removing a tablet server from a cluster
Dedicated Coordinator
Default EXPIRES ON tag policy
Default operational database cluster definition
Define queue mapping policies
Defining related terms
Delete a group
Delete a user
Delete data from a table
Delete HBase snapshots from Amazon S3
Delete snapshots using Cloudera Manager
DELETE statement
Deleting a row
Deleting in bulk
Deletion
DENSE_RANK
Deploy HBase replication
Deploying and configuring Oozie Sqoop1 Action JDBC drivers
Deploying and Managing Services and Microservices on YARN
Describe a materialized view
DESCRIBE EXTENDED and DESCRIBE FORMATTED
DESCRIBE statement
Detecting slow DataNodes
Determine the table type
Developing and running an Apache Spark WordCount application
Developing Apache Spark Applications
Developing Applications with Apache Kudu
Developing applications with Apache Kudu
Diagnostics logging
Disable automatic compaction
Disable High Availability
Disable loading of coprocessors
Disable RegionServer grouping
Disable replication at the peer level
Disable the BoundedByteBufferPool
Disable work preserving recovery on ResourceManager
Disabling and redeploying HDFS HA
Disabling S3Guard and destroying a table
Disk space usage
Disk space versus namespace
DistCp additional considerations
DistCp and security settings
DistCp between HA clusters
DISTINCT operator
DML statements
DOUBLE data type
Drop a materialized view
Drop an external table along with data
DROP DATABASE statement
DROP FUNCTION statement
DROP MATERIALIZED VIEW
DROP STATS statement
DROP TABLE statement
DROP VIEW statement
Dropping a Kudu table using Impala
Dumping the Oozie database
Dynamic allocation
Dynamic resource allocation properties
Dynamic Resource Pool Settings
Dynamic resource-based column masking in Hive with Ranger policies
Dynamic tag-based column masking in Hive with Ranger policies
Dynamically loading a custom filter
Edit a group
Edit a user
Edit or delete a snapshot policy
Effects of WAL rolling on replication
Enable Access Control for Data
Enable Access Control for Interpreter, Configuration, and Credential Settings
Enable Access Control for Notebooks
Enable ACL for RegionServer grouping
Enable ACLs
Enable and disable snapshot creation using Cloudera Manager
Enable automatic compaction
Enable Cgroups
Enable debug-delay
Enable detection of slow DataNodes
Enable GZipCodec as the default compression codec
Enable HBase high availability using Cloudera Manager
Enable HBase indexing
Enable hedged reads for HBase
Enable High Availability
Enable intra-queue preemption
Enable override of default queue mappings
Enable preemption
Enable priority scheduling
Enable RegionServer grouping using Cloudera Manager
Enable replication on a specific table
Enable SASL in HiveServer
Enable server-server mutual authentication
Enable snapshot creation on a directory
Enable the Capacity Scheduler
Enable TLS/SSL for HiveServer
Enable work preserving recovery on NodeManager
Enabling Access Control for Zeppelin Elements
Enabling Admission Control
Enabling and disabling trash
Enabling core dump for the Kudu service
Enabling fault-tolerant processing in Spark Streaming
Enabling HDFS HA
Enabling High Availability and automatic failover
Enabling Hue applications with Cloudera Manager
Enabling Kerberos authentication and RPC encryption
Enabling LDAP Authentication for impala-shell
Enabling LDAP in Hue
Enabling Native Acceleration For MLlib
Enabling Oozie SLA with Cloudera Manager
Enabling or disabling anonymous usage data collection
Enabling Spark authentication
Enabling Spark Encryption
Enabling Speculative Execution
Enabling SSE-C
Enabling SSE-KMS
Enabling SSE-S3
Enabling the Oozie web console on managed clusters
Enabling the SQL editor autocompleter
Enabling vectorized query execution
Encrypting an S3 Bucket with Amazon S3 Default Encryption
Encrypting Communication
Encrypting Data on S3
Encryption
Environment variables for sizing NameNode heap memory
Erasure coding CLI command
Erasure coding examples
Erasure coding overview
Errors during hole punching test
Escape an illegal identifier
Example use cases
Example workload
Example: Configuration for work preserving recovery
Example: Killing an application in the "Production" queue
Example: Moving the application and viewing the log in the "Test" queue
Example: Running SparkPi on YARN
Examples
Examples of accessing Amazon S3 data from Spark
Examples of controlling data access using classifications
Examples of creating and using UDFs
Examples of DistCp commands using the S3 protocol and hidden credentials
Examples of estimating NameNode heap memory
Examples of FIFO and Fair Sharing policies
Exit statuses for the HDFS Balancer
EXPLAIN statement
Exploring using Lineage
Export a Note
Export a snapshot to another cluster
Export all resource-based policies for all services
Export Ranger reports
Export resource-based policies for a specific service
Export tag-based policies
Exporting query results to Amazon S3
Expose HBase metrics to a Ganglia server
Extending Atlas to Manage Metadata from Additional Sources
Extending Atlas to manage metadata from additional sources
External table access
Failures during INSERT, UPDATE, UPSERT, and DELETE operations
Fetching Spark Maven dependencies
File descriptors
Files and directories
Files and directories
Filter types
Finding issues
FIRST_VALUE
Fixing issues
FLOAT data type
Flushing data to disk
Format for using Hadoop archives with MapReduce
FreeIPA identity management
Functions
General Settings
Generate and view Apache Hive statistics
Generate surrogate keys
Generating statistics
Generating Table and Column Statistics
Getting Started with Apache HBase
Glossaries overview
Governance
Governance
Governance Overview
Governance overview
Graceful HBase Shutdown
Gracefully shut down an HBase RegionServer
Gracefully shut down the HBase service
GRANT statement
GROUP BY clause
GROUP_CONCAT function
Guidelines for Schema Design
Hadoop
Hadoop archive components
Hadoop File Formats Support
Handling bucketed tables
Hash and hash partitioning
Hash and range partitioning
Hash partitioning
Hash partitioning
HashTable/SyncTable tool configuration
HAVING clause
HBase
HBase
HBase actions that produce Atlas entities
HBase audit entries
HBase Authentication
HBase Authorization
HBase backup and disaster recovery strategies
HBase configuration to use S3 as a storage layer
HBase entities created in Atlas
HBase filtering
HBase I/O components
HBase is using more disk space than expected
HBase lineage
HBase metadata collection
HBase Object Store Semantics Overview
HBase on CDP
HBase online merge
HBase read replicas
HBase Shell example
HBase Shell overview
HBase snapshots on Amazon S3 with Kerberos enabled
HBCK2 tool command reference
HDFS
HDFS
HDFS ACL permissions model and YARN queues
HDFS Block Skew
HDFS Caching
HDFS commands for metadata files and directories
HDFS Metrics
HDFS Overview
HDFS storage types
Heap sampling
Hierarchical Queue Characteristics
High Availability on HDFS clusters
Hive
Hive
Hive 3 ACID transactions
Hive Authentication
Hive Server 2 actions that produce Atlas entities
Hive Server 2 audit entries
Hive Server 2 entities created in Atlas
Hive Server 2 lineage
Hive Server 2 metadata collection
Hive Server 2 relationships
Hive table locations
Hive Warehouse Connector API Examples
Hive Warehouse Connector for accessing Apache Spark data
Hive Warehouse Connector Interfaces
Hive Warehouse Connector supported types
HiveWarehouseSession API operations
HMS table storage
How NameNode manages blocks on a failed DataNode
How tag-based access control works
Hue
Hue
Hue configuration files
Hue logs
Hue Overview
Hue safety valves
Hue supported browsers
IAM Role permissions for working with SSE-KMS
Identifiers
Impala
Impala
Impala actions that produce Atlas entities
Impala audit entries
Impala Authentication
Impala Authorization
Impala database containment model
Impala DDL for Kudu
Impala DML for Kudu Tables
Impala entities created in Atlas
Impala integration limitations
Impala lineage
Impala lineage
Impala Logs
Impala metadata collection
Impala Shell Command Reference
Impala Shell Configuration File
Impala Shell Configuration Options
Impala Shell Tool
Impala SQL
Impala SQL and Hive SQL
Impala with Amazon S3
Impala with Azure Data Lake Store (ADLS)
Impala with HBase
Impala with HDFS
Impala with Kudu
Import a Note
Import command options
Import External Packages
Import RDBMS data into Hive
Import RDBMS data to HDFS
Import resource-based policies for a specific service
Import resource-based policies for all services
Import tag-based policies
Importing a Bucket into S3Guard
Importing and exporting resource-based policies
Importing and exporting tag-based policies
Importing data into HBase
Improving Performance for S3A
Improving performance using partitions
Improving performance with centralized cache management
Improving performance with short-circuit local reads
Improving Software Performance
Increasing StateStore Timeout
Increasing storage capacity with HDFS compression
Incrementally update an imported table
Information and debugging
Initiate replication when data already exists
INSERT and primary key uniqueness violations
Insert data into an ACID table
INSERT statement
Inserting a row
Inserting in bulk
Installing NTP
Installing the UDF development package
INT data type
Integrating Apache Hive with Apache Spark and BI
Interacting with Hive views
Internal and external Impala tables
Internal private key infrastructure (PKI)
Intra-Queue preemption based on application priorities
Intra-Queue preemption based on user limits
Introducing the S3A Committers
Introduction
Introduction
Introduction
Introduction
Introduction
Introduction
Introduction
Introduction
Introduction
Introduction to HDFS metadata files and directories
Introduction to Hive metastore
Introduction to S3Guard
INVALIDATE METADATA statement
Issues starting or restarting the master or the tablet server
Java API example
JDBC connection string syntax
JDBC connection string syntax
Joins in Impala SELECT statements
JournalNodes
JournalNodes
Keep replicas current
Kernel stack watchdog traces
Key components of warehouse processing
Killing an application
Known issue: Ranger group mapping
Known Issues
Known issues and limitations
Known issues and limitations
Knox
Knox
Kudu
Kudu
Kudu authentication with Kerberos
Kudu example applications
Kudu integration with Spark
Kudu master web interface
Kudu metrics
Kudu Python client
Kudu security
Kudu tablet server web interface
Kudu tracing
Kudu web interfaces
Kudu-Impala integration
LAG
LAST_VALUE
Launch the Service YARN file
Launch Zeppelin
LAZY_PERSIST memory storage policy
LDAP Settings
LEAD
LIMIT clause
Limit concurrent connections
Limit CPU usage with Cgroups
Limitations and restrictions for Impala UDFs
Limitations of Amazon S3
Limitations of erasure coding
Limitations of the S3A Committers
Limiting the Speed of Compactions
Lineage lifecycle
Lineage overview
Linux Container Executor
List files in Hadoop archives
Listing available metrics
Literals
Livy API reference for batch jobs
Livy API reference for interactive sessions
Livy batch object
Livy objects for interactive sessions
LOAD DATA statement
Loading ORC data into DataFrames using predicate push-down
Loading the Oozie database
Log Aggregation File Controllers
Log aggregation properties
Log redaction
Logical operators, comparison operators and comparators
Logical replication
Low-latency analytical processing
Maintenance manager
Manage and monitor services
Manage cluster capacity with queues
Manage HBase snapshots on Amazon S3 in Cloudera Manager
Manage HBase snapshots using Cloudera Manager
Manage HBase snapshots using the Command Line
Manage long-running YARN applications
Manage partition retention time
Manage partitions
Manage Policies for HBase snapshots in Amazon S3
Manage query rewrites
Managing Access Control Lists
Managing and Allocating Cluster Resources using Capacity Scheduler
Managing Apache HBase
Managing Apache HBase Security
Managing Apache Hive
Managing Apache Impala
Managing Apache ZooKeeper
Managing Apache ZooKeeper Security
Managing applications
Managing Applications on Apache Hadoop YARN
Managing Auditing with Ranger
Managing Business Terms with Atlas Glossaries
Managing Data Storage
Managing Kudu with Cloudera Manager
Managing Logs
Managing Metadata in Impala
Managing Resources in Impala
Managing snapshot policies using Cloudera Manager
Managing the YARN service life cycle through the REST API
Manually failing over to the standby NameNode
MAP complex type
MapReduce Application ACL
Master
Mathematical functions
Maven artifacts
MAX
MAX function
Maximizing storage resources using ORC
Memory
Memory limits
Merge data in tables
Merge process stops during Sqoop incremental imports
Migrating Data Using Sqoop
Migrating Kudu data from one directory to another on the same host
Migrating to multiple Kudu masters
Migration Guide
MIN
MIN function
Minimizing cluster disruption during temporary planned downtime of a single tablet server
Miscellaneous functions
Modify interpreter settings
Modifying Impala Startup Options
Monitor applications
Monitor blocksize metrics
Monitor clusters
Monitor nodes
Monitor queues
Monitor RegionServer grouping
Monitor the BlockCache
Monitor the performance of hedged reads
Monitoring and Debugging Spark Applications
Monitoring and Maintaining S3Guard
Monitoring cluster health with ksck
Monitoring clusters using YARN web user interface
Monitoring heap memory usage
Monitoring Impala
Monitoring NTP status
More Resources
Move HBase Master Role to another host
Moving a NameNode to a different host using Cloudera Manager
Moving data from databases to Apache Hive
Moving data from HDFS to Apache Hive
Moving highly available NameNode, failover controller, and JournalNode roles using the Migrate Roles wizard
Moving NameNode roles
Moving the JournalNode edits directory for a role group using Cloudera Manager
Moving the JournalNode edits directory for a role instance using Cloudera Manager
Multilevel partitioning
NameNode architecture
NameNodes
NameNodes
NDV function
Next steps
Non-covering range partitions
Notes about replication
NTILE
NTP clock synchronization
NTP configuration best practices
Off-heap BucketCache
OFFSET clause
On-demand Metadata
Oozie
Oozie
Oozie configurations with CDP services
Oozie database configurations
Oozie scheduling examples
Operational Database
Operational database cluster
Operators
Optimizer hints
Optimizing data storage
Optimizing HBase I/O
Optimizing NameNode disk space with Hadoop archives
Optimizing performance
Optimizing performance for evaluating SQL predicates
Optimizing queries using partition pruning
Optimizing S3A read performance for different file types
Options to determine differences between contents of snapshots
ORDER BY clause
Other known issues
OVER
Overview
Overview
Overview
Overview of Hadoop archives
Overview of HDFS
Overview of Oozie
Packaging different versions of libraries with an Apache Spark application
Partition a cluster using node labels
Partition pruning
Partition Pruning for Queries
Partitioning
Partitioning
Partitioning examples
Partitioning for Kudu Tables
Partitioning guidelines
Partitioning limitations
Partitioning tables
Partitions introduction
PERCENT_RANK
Perform a backup of the HDFS metadata
Perform hostname changes
Perform scans using HBase Shell
Perform the migration
Perform the recovery
Perform the removal
Performance and storage considerations for Spark SQL DROP TABLE PURGE
Performance Best Practices
Performance Considerations
Performance considerations for UDFs
Performance Impact of Encryption
Physical backups of an entire node
Planning for Apache Impala
Populating an S3 bucket
Ports Used by Impala
Predicate push-down optimization
Preemption workflow
Preloaded resource-based services and policies
Prepare for hostname changes
Prepare for removal
Prepare for the migration
Prepare for the recovery
Prepare to back up the HDFS metadata
Preparing the hardware resources for HDFS High Availability
Prerequisites for configuring short-circuit local reads
Prerequisites for configuring TLS/SSL for Oozie
Prerequisites for enabling erasure coding
Prerequisites for enabling HDFS HA using Cloudera Manager
Prerequisites to configure TLS/SSL for HBase
Primary key design
Primary key index
Propagating classifications through lineage
Properties for configuring centralized caching
Properties for configuring intra-queue preemption
Properties for configuring short-circuit local reads on HDFS
Properties for configuring the Balancer
Properties to set the size of the NameNode edits directory
Provision an operational database cluster
Pruning Old Data from S3Guard Tables
Query correlated data
Query Join Performance
Query options
Query result cache and metastore cache
Query the information_schema database
Query vectorization
Querying an existing Kudu table from Impala
Querying files into a DataFrame
Rack awareness (Location awareness)
Raft consensus algorithm
Range partitioning
Range partitioning
Ranger
Ranger
Ranger access conditions
Ranger AD Integration
Ranger console navigation
Ranger Policies Overview
Ranger Security Zones
Ranger tag-based policies
Ranger UI authentication
Ranger UI authorization
Ranger user management
Ranger Usersync
RANK
Read and write operations
Read operations (scans)
Read replica properties
Reading data from HBase
Reading Hive ORC tables
Reads (scans)
REAL data type
Rebuilding a Kudu filesystem layout
Recommended configurations for the Balancer
Recommended configurations for the balancer
Recommended settings for G1GC
Recover data from a snapshot
Recovering from a dead Kudu master in a multi-master deployment
Recovering from disk failure
Recovering from full disks
Redeploying the Oozie ShareLib
Redeploying the Oozie sharelib using Cloudera Manager
Reducing the Size of Data Structures
Referencing S3 Data in Applications
REFRESH AUTHORIZATION statement
REFRESH FUNCTIONS statement
REFRESH statement
Register the UDF
Reload, view, and filter functions
Remove a DataNode
Remove a RegionServer from RegionServer grouping
Remove storage directories using Cloudera Manager
Removing Kudu masters from a multi-master deployment
Repair partitions manually using MSCK repair
Replace a ZooKeeper disk
Replace a ZooKeeper role on an unmanaged cluster
Replace a ZooKeeper role with ZooKeeper service downtime
Replace a ZooKeeper role without ZooKeeper service downtime
Replicate pre-existing data in an active-active deployment
Replication
Replication across three or more clusters
Replication caveats
Replication requirements
Reporting Kudu crashes using breakpad
Request a timeline-consistent read
Reserved words
Resource distribution workflow
Resource distribution workflow example
Resource Tuning Example
Resource-based Services and Policies
Restore an HBase snapshot from Amazon S3
Restore an HBase snapshot from Amazon S3 with a new name
Restore data from a replica
Restore HDFS metadata from a backup using Cloudera Manager
Restoring NameNode metadata
REVOKE statement
Row-level filtering and column masking in Hive
Row-level filtering in Hive with Ranger policies
ROW_NUMBER
RPC timeout traces
Run a Hive command
Running a Spark MLlib example
Running a tablet rebalancing tool in Cloudera Manager
Running a tablet rebalancing tool on a rack-aware cluster
Running an interactive session with the Livy API
Running Apache Spark Applications
Running Commands and SQL Statements in Impala Shell
Running PySpark in a virtual environment
Running sample Spark applications
Running shell commands
Running Spark applications on secure clusters
Running Spark applications on YARN
Running Spark Python applications
Running tablet rebalancing tool
Running the balancer
Running the HBCK2 tool
Running your first Spark application
Runtime environment for UDFs
Runtime error: Could not create thread: Resource temporarily unavailable (error 11)
Runtime Filtering
S3 Performance Checklist
S3A and Checksums (Advanced Feature)
S3Guard: Operational Issues
Safely Writing to S3 Through the S3A Committers
Sample pom.xml file for Spark Streaming with Kafka
Save the YARN file as Service App
Saving searches
Scalability
Scalability Considerations
Scaling Kudu
Scaling Limits and Guidelines
Scaling recommendations and limitations
Scaling storage on Kudu master and tablet servers in the cloud
Scheduling Among Queues
Scheduling in Oozie using cron-like syntax
Schema alterations
Schema design limitations
Schema design limitations
Schema objects
Script with HBase Shell
Search an application
Search Ranger reports
Searching for entities using classifications
Searching metadata tags
Searching overview
Searching using terms
Searching with Metadata
Secondary Sort
Secure Hive Metastore
Secure HiveServer using LDAP
Secure-to-secure: Kerberos principal name
Secure-to-secure: ResourceManager mapping rules
Securing Apache Hive
Securing Impala
Securing the S3A Committers
Security
Security considerations
Security considerations for UDFs
Security limitations
Security Model and Operations on S3
Security terminology
SELECT statement
Server management limitations
Set application limits
Set flexible scheduling policies
Set properties in Cloudera Manager
Set queue priorities
Set quotas using Cloudera Manager
SET statement
Set up a JDBC URL connection override
Set up a PostgreSQL database
Set up a storage policy for HDFS
Set up Amazon RDS as a Hive metastore
Set up an Oracle database
Set up MariaDB or MySQL database
Set up queues
Set Up Sqoop
Set up SSD storage using Cloudera Manager
Set up the backend Hive metastore database
Set up the cost-based optimizer and statistics
Set up the development environment
Set up WebHDFS on a secure cluster
Set user limits
Setting HDFS quotas
Setting Python path variables for Livy
Setting the cache timeout
Setting the Idle Query and Idle Session Timeouts
Setting the Oozie database timezone
Setting the trash interval
Setting Timeout and Retries for Thrift Connections to Backend Client
Setting Timeouts in Impala
Setting up a shared Amazon RDS as a Hive metastore
Setting up Data Cache for Remote Reads
Setting Up HDFS Caching
Setting up the metastore database
Setting User Limits for HBase
Shell commands
Shiro Settings: Reference
shiro.ini Example
Show materialized views
SHOW MATERIALIZED VIEWS
SHOW statement
SHUTDOWN statement
Single tablet write operations
Size the BlockCache
Sizing NameNode heap memory
Slow name resolution and nscd
SMALLINT data type
Snapshot failures
Solr
Spark
Spark
Spark actions that produce Atlas entities
Spark Application ACL
Spark application model
Spark audit entries
Spark cluster execution overview
Spark entities created in Apache Atlas
Spark execution model
Spark integration best practices
Spark integration known issues and limitations
Spark integration limitations
Spark lineage
Spark metadata collection
Spark on YARN deployment modes
Spark relationships
Spark security
Spark SQL example
Spark Streaming and Dynamic Allocation
Spark Streaming Example
Spark troubleshooting
spark-submit command options
Specify the JDBC connection string
Specifying Impala Credentials to Access S3
Speeding up Job Commits by Increasing the Number of Threads
SQL migration
SQL statements
SQLContext and HiveContext
Sqoop
Sqoop Hive import stops when HS2 does not use Kerberos authentication
SSE-C: Server-Side Encryption with Customer-Provided Encryption Keys
SSE-KMS: Amazon S3-KMS Managed Encryption Keys
SSE-S3: Amazon S3-Managed Encryption Keys
Start and stop queues
Start compaction manually
Start HBase
Start Hive on an insecure cluster
Start Hive using a password
Starting and Stopping HBase using Cloudera Manager
Starting and stopping Kudu processes
Starting Apache Hive
Starting the Oozie server
Statistics generation and viewing commands
STDDEV, STDDEV_SAMP, STDDEV_POP functions
Step 1: Worker host configuration
Step 2: Worker host planning
Step 3: Cluster size
Step 6: Verify container settings on cluster
Step 6A: Cluster container capacity
Step 6B: Container sanity checking
Step 7: MapReduce configuration
Step 7A: MapReduce sanity checking
Steps 4 and 5: Verify settings
Stop HBase
Stop replication in an emergency
Stopping Impala
Stopping the Oozie server
Storage
Storage
Storage group classification
Storage group pairing
Storage Systems Support
Storage-based operation permissions
Store HBase snapshots on Amazon S3
STRING data type
String functions
STRUCT complex type
Submit a Hive Warehouse Connector Python app
Submit a Hive Warehouse Connector Scala or Java application
Submitting batch applications using the Livy API
Submitting Spark applications
Submitting Spark Applications to YARN
Submitting Spark applications using Livy
Subqueries in Impala SELECT statements
Subquery restrictions
SUM
SUM function
Switching from CMS to G1GC
Synchronize table data using HashTable/SyncTable tool
Table
Table and Column Statistics
Tables
TABLESAMPLE clause
Tablet
Tablet history garbage collection and the ancient history mark
Tablet server
Tag-based Services and Policies
Tags and policy evaluation
Take a snapshot using a shell script
Take HBase snapshots
Terms
The Cloud Storage Connectors
The perfect schema
The S3A Committers and Third-Party Object Stores
Thread Tuning for S3A Data Upload
Threads
Thrift Server crashes after receiving invalid data
Timeline consistency
TIMESTAMP compatibility for Parquet files
TIMESTAMP data type
TINYINT data type
Tombstoned or STOPPED tablet replicas
Tools
Transactional table access
Transactions
Trash behavior with HDFS Transparent Encryption enabled
Troubleshoot RegionServer grouping
Troubleshooting
Troubleshooting Apache HBase
Troubleshooting Apache Hive
Troubleshooting Apache Impala
Troubleshooting Apache Kudu
Troubleshooting Apache Kudu
Troubleshooting Apache Sqoop
Troubleshooting HBase
Troubleshooting Impala
Troubleshooting NTP stability problems
Troubleshooting performance issues
Troubleshooting S3 and S3Guard
Troubleshooting the S3A Committers
TRUNCATE TABLE statement
Tuning Apache Spark
Tuning Apache Spark Applications
Tuning Impala
Tuning Resource Allocation
Tuning S3A Uploads
Tuning Spark Shuffle Operations
Tuning the metastore
Tuning the Number of Partitions
Turning on safe mode for HA NameNodes
UDF concepts
Unable to alter S3-backed tables
Understanding erasure coding policies
Understanding HBase Garbage Collection
Understanding Performance using EXPLAIN Plan
Understanding Performance using Query Profile
Understanding Performance using SUMMARY Report
Understanding YARN architecture
UNION clause
Unsupported Apache Spark Features
Unsupported interfaces
Update and overwrite
Update data in a table
UPDATE statement
Updating a row
Updating in bulk
Upsert option in Kudu Spark
UPSERT statement
Upserting a row
Usability issues
Use a CTE in a query
Use a custom MapReduce job
Use a materialized view in a subquery
Use a subquery
Use BulkLoad
Use cases for ACLs on HDFS
Use cases for BulkLoad
Use cases for centralized cache management
Use cases for HBase
Use cluster replication
Use CopyTable
Use CPU scheduling
Use dfs.datanode.max.transfer.threads with HBase
Use DNS with HBase
Use GZipCodec with a one-time job
Use HashTable and SyncTable Tool
Use HBase command-line utilities
Use log aggregation
Use multiple ZooKeeper services
Use node labels
Use snapshots
Use Spark
Use Sqoop
USE statement
Use the Apache Thrift Proxy API
Use the Hive Warehouse Connector for streaming
Use the Hue HBase app
Use the Java API
Use the JDBC interpreter to access Hive
Use the JDBC interpreter to access Phoenix
Use the Livy interpreter to access Spark
Use the Network Time Protocol (NTP) with HBase
Use the YARN CLI to view logs for applications
Use the YARN REST APIs to manage applications
Use the yarn rmadmin tool to administer ResourceManager High Availability
Use the YARN Services API
User Account Requirements
User-defined functions (UDFs)
Using a credential provider to secure S3 credentials
Using advanced search
Using Amazon S3 with Apache HBase
Using Amazon S3 with Hue
Using Apache HBase Backup and Disaster Recovery
Using Apache Hive
Using Apache Impala with Apache Kudu
Using Apache Impala with Apache Kudu
Using Apache Zeppelin
Using Avro Data Files
Using Basic Search
Using Breakpad Minidumps for Crash Reporting
Using Cgroups
Using chrony for time synchronization
Using CLI commands to create and list ACLs
Using Cloudera Manager to manage HDFS HA
Using cluster names in the kudu command line tool
Using common table expressions
Using constraints
Using custom libraries with Spark
Using DistCp
Using DistCp to copy files
Using DistCp with Amazon S3
Using EC2 Instance Metadata to Authenticate
Using erasure coding for existing data
Using erasure coding for new data
Using Free-text Search
Using functions
Using governance-based data discovery
Using HBase Blocksize
Using HBase Coprocessors
Using HBase Replication
Using HBase Scanner Heartbeat
Using HDFS snapshots for data protection
Using Hedged Reads
Using HttpFS to provide access to HDFS
Using Hue
Using Hue
Using Impala to query Kudu tables
Using JdbcStorageHandler to query an RDBMS
Using JdbcStorageHandler to query an RDBMS
Using JMX for accessing HDFS metrics
Using Livy with interactive notebooks
Using Livy with Spark
Using Load Balancer with HttpFS
Using materialized views
Using ORC Data Files
Using Parquet Data Files
Using PySpark
Using rack awareness for read replicas
Using Ranger to Provide Authorization in CDP
Using RCFile Data Files
Using RegionServer Grouping
Using S3Guard for Consistent S3 Metadata
Using scheduling to allocate resources
Using SequenceFile Data Files
Using Spark Hive Warehouse and HBase Connector Client .jar files with Livy
Using Spark MLlib
Using Spark SQL
Using Spark Streaming
Using Spark with a secure Kudu cluster
Using Sqoop actions with Oozie
Using tag attributes and values in Ranger tag-based policy conditions
Using Text Data Files
Using the Charts Library with the Kudu service
Using the Directory Committer in MapReduce
Using the HBCK2 tool to remediate HBase clusters
Using the Livy API to run Spark jobs
Using the Note Toolbar
Using the Ranger Console
Using the S3Guard CLI
Using the S3Guard Command to List and Delete Uploads
Using the Spark DataFrame API
Using Unique Filenames to Avoid File Update Inconsistency
Using YARN with a secure cluster
Using Zeppelin Interpreters
VALUES statement
VARCHAR data type
VARIANCE, VARIANCE_SAMP, VARIANCE_POP, VAR_SAMP, VAR_POP functions
Variations on Put
Verify that replication works
Verify the ZooKeeper authentication
Verifying if a memory limit is sufficient
Verifying That an S3A Committer Was Used
Verifying that S3Guard is Enabled on a Bucket
Verifying the Impala dependency on Kudu
Versions
View application details
View audit details
View compaction progress
View Ranger reports
View transaction locks
View transactions
Viewing application logs
Viewing lineage
Viewing the API documentation
Views
Virtual machine options for HBase Shell
Web UI encryption
Web UI redaction
Web User Interface for Debugging
What's New
When Shuffles Do Not Occur
When to Add a Shuffle Transformation
When to use Atlas classifications for access control
Why HDFS data becomes unbalanced
Wildcards and variables in resource-based policies
WINDOW
WITH clause
Work preserving recovery for YARN components
Working with Amazon S3
Working with Atlas classifications
Working with Classifications and Labels
Working with S3 buckets in the same AWS region
Working with the Oozie server
Working with Third-party S3-compatible Object Stores
Working with versioned S3 buckets
Working with Zeppelin Notes
Write-ahead log garbage collection
Writes
Writing data to HBase
Writing to multiple tablets
Writing UDFs
Writing user-defined aggregate functions (UDAFs)
YARN
YARN
YARN ACL rules
YARN ACL syntax
YARN ACL types
YARN Admin ACLs
YARN Application ACLs
YARN Features
YARN resource allocation
YARN ResourceManager High Availability
YARN ResourceManager High Availability architecture
YARN Services API Examples
YARN tuning overview
YARN Unsupported Features
Zeppelin
ZooKeeper
ZooKeeper
ZooKeeper ACLs Best Practices
ZooKeeper ACLs Best Practices: Atlas
ZooKeeper ACLs Best Practices: HBase
ZooKeeper ACLs Best Practices: HDFS
ZooKeeper ACLs Best Practices: Oozie
ZooKeeper ACLs Best Practices: Ranger
ZooKeeper ACLs Best Practices: YARN
ZooKeeper ACLs Best Practices: ZooKeeper
ZooKeeper Authentication
Using Hue
Using Amazon S3 with Hue
Hue can read from and write to an Amazon S3 bucket.
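Before Hue can browse a bucket, it needs S3 credentials. One common way to provide them is through the [aws] section of hue.ini; exact property names can vary by release, so treat the following as a sketch with hypothetical placeholder values rather than a definitive configuration:

```ini
[aws]
  [[aws_accounts]]
    [[[default]]]
      # Hypothetical placeholder credentials; replace with your own,
      # or use an IAM role / credential provider instead of inline keys
      access_key_id=AKIA...
      secret_access_key=...
      region=us-east-1
```

After restarting Hue, the S3 browser appears in the file browser alongside HDFS.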
Populating an S3 bucket
Use the Hue Web UI to populate buckets in Amazon S3.
Creating a table from an Amazon S3 file
Using Hue to create a table directly from an Amazon S3 file streamlines table creation.
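Under the hood, the table Hue creates is an external table whose location points at the S3 path. As a rough illustration (bucket name, path, and columns here are hypothetical), the equivalent Hive DDL looks like:

```sql
-- Hypothetical bucket and schema; CDP accesses S3 through the s3a:// scheme
CREATE EXTERNAL TABLE customers (
  id INT,
  name STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3a://my-bucket/customers/';
```

Because the table is external, dropping it removes only the metadata; the files in S3 remain.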
Exporting query results to Amazon S3
Use Hue to export query results to Amazon S3 as a custom file, a MapReduce file, or a table.