Data Warehousing
Cloudera Data Platform Runtime includes Apache Hive, Apache Iceberg, and Apache Impala for storing and accessing data in the Hive metastore database. Hive addresses enterprise data warehouse demands for transactional data in the ORC file format. Iceberg is a table format for huge analytic datasets that you can query using familiar SQL statements. Impala runs high-performance, low-latency SQL queries on data in Parquet and other formats.
Apache Hive
Working with Hive Metastore
Describes how Hive metastore (HMS) detects client types and stores compatible tables, authorizes access to tables from Spark, and stores metadata of multiple services, such as Hive and Impala.
Starting Apache Hive
Describes how to launch Hive, execute commands, and issue queries from Beeline.
Using Hive
Covers how to use Hive 3 to query flat and transactional data using SQL statements.
Managing Apache Hive
Includes information about mature ACID v2 operations on transactions, compaction of files that accumulate during ingestion, and query vectorization.
Configuring Apache Hive
Describes how to set up Hive to generate statistics and control the number of concurrent connections to Hive.
Securing Apache Hive
Discusses how to choose an authorization model based on your use case.
Integrating Apache Hive with Apache Spark and BI
Covers accessing Hive tables from Spark through the Spark Direct Reader or Hive Warehouse Connector, using JdbcStorageHandler to access an external data source, and connecting to Business Intelligence (BI) tools.
Apache Hive Performance Tuning
Explains low-latency analytical processing, caching, and tuning options.
Apache Iceberg
Using Apache Iceberg
Describes how to build on your SQL experience to analyze big data tables in Iceberg format on your file system or object store.
Apache Impala
Starting and Stopping Apache Impala
Presents the task topics for configuring client access to Impala, and starting and stopping Impala.
Securing Apache Impala
Describes a set of security features Impala provides to protect your critical and sensitive data.
Configuring Apache Impala
Describes how to customize your environment after installing Impala.
Tuning Apache Impala
Describes how to tune Impala queries and other SQL operations.
Managing Apache Impala
Presents the task topics for managing resources and metadata in Impala.
Monitoring Apache Impala
Describes how to monitor the Impala service so that it runs smoothly and avoids conflicts with other components running on the same cluster.