Data Access

Cloudera Data Platform Runtime includes Apache Hive 3 and Apache Impala for storing and accessing data in the Hive metastore database. Hive 3 addresses enterprise data warehouse demands for transactional data in the ORC file format. Impala runs high-performance, low-latency SQL queries on data in Parquet and other formats. Hue is a web-based interactive editor for querying the Hive metastore that also creates Oozie workflows. Data Analytics Studio (DAS) is a web application for performing operations on Hive tables; it also recommends ways to optimize the performance of your queries.

Data Analytics Studio

Using Data Analytics Studio

Describes how to work with queries, manage databases and tables, and generate reports.

Apache Hive

Working with Hive Metastore

Describes how Hive metastore (HMS) detects client types and stores compatible tables, authorizes access to tables from Spark, and stores metadata of multiple services, such as Hive and Impala.

Starting Apache Hive

Describes how to launch Hive, execute commands, and issue queries from Beeline.

Using Hive

Covers how to use Hive 3 to query flat and transactional data using SQL statements.

Managing Apache Hive

Includes information about mature ACID v2 operations on transactions, compaction of files that accumulate during ingestion, and query vectorization.

Configuring Apache Hive

Describes how to set up Hive to generate statistics and control the number of concurrent connections to Hive.

Securing Apache Hive

Discusses how to choose an authorization model based on your use case.

Integrating Apache Hive with Apache Spark and BI

Covers accessing Hive tables from Spark through the Spark Direct Reader or Hive Warehouse Connector, using JdbcStorageHandler to access an external data source, and connecting to Business Intelligence (BI) tools.