What's New in Apache Hive

This topic lists new features for Apache Hive in this release of Cloudera Runtime.

ACID transaction processing

Hive 3 tables are ACID (atomicity, consistency, isolation, and durability) compliant, which is critical to meeting the right-to-be-forgotten requirement of the General Data Protection Regulation (GDPR).
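To illustrate why ACID support matters for erasure requests, the following minimal sketch generates the HiveQL for a transactional table and for a row-level delete. The table name customers, its columns, and the helper function names are hypothetical and do not come from this release. The sketch only builds statement text; running the statements requires a Hive client such as Beeline.

```python
# Sketch only: the table, columns, and function names are hypothetical.


def create_table_statement() -> str:
    """Return HiveQL for a transactional (ACID) table stored as ORC."""
    return (
        "CREATE TABLE customers (id INT, name STRING, email STRING) "
        "STORED AS ORC "
        "TBLPROPERTIES ('transactional'='true')"
    )


def erase_customer_statement(customer_id: int) -> str:
    """Return HiveQL that deletes the row for one customer."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(customer_id, bool) or not isinstance(customer_id, int):
        raise TypeError("customer_id must be an int")
    return f"DELETE FROM customers WHERE id = {customer_id}"


if __name__ == "__main__":
    print(create_table_statement() + ";")
    print(erase_customer_statement(42) + ";")
```

Accepting only an integer ID keeps arbitrary text out of the generated DELETE statement.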

Shared metastore

Hive metastore (HMS) interoperates with multiple engines, such as Impala and Spark, which simplifies sharing data between engines and user access to that data.

Low-latency analytical processing

Hive processes transactions using low-latency analytical processing (LLAP) or the Hive-on-Tez execution engine.

Spark integration with Hive

You can use Hive to query data from Apache Spark applications without workarounds. The Hive Warehouse Connector supports reading and writing Hive tables from Spark.

Security improvements

Apache Ranger secures Hive data by default. To meet demands for concurrency improvements, ACID support for GDPR, security, and other features, Hive tightly controls the location of the warehouse on a file system or object store, as well as memory resources.

Workload management at the query level

You can configure who uses query resources, how much can be used, and how fast Hive responds to resource requests. Workload management can improve parallel query execution, cluster sharing for queries running on Hive LLAP, and performance of non-LLAP queries.

Materialized views

Because multiple queries frequently need the same intermediate rollup or joined table, you can avoid repeatedly computing the costly shared portion of those queries by precomputing and caching the intermediate tables as materialized views.
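As a rough sketch of how a shared rollup might be precomputed, the following example generates a CREATE MATERIALIZED VIEW statement and the statement that rebuilds the view after its source data changes. The sales table, its columns, the view name sales_by_store, and the helper functions are hypothetical. The sketch assumes the source table is transactional.

```python
import re

# Sketch only: table, column, view, and function names are hypothetical.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

# A rollup that several reports might otherwise each recompute.
ROLLUP_QUERY = (
    "SELECT store_id, SUM(amount) AS total_amount "
    "FROM sales GROUP BY store_id"
)


def _check_identifier(name: str) -> str:
    """Accept only simple unquoted identifiers."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"not a simple identifier: {name!r}")
    return name


def create_materialized_view(view_name: str, select_sql: str) -> str:
    """Return HiveQL that precomputes select_sql into a materialized view."""
    return f"CREATE MATERIALIZED VIEW {_check_identifier(view_name)} AS {select_sql}"


def rebuild_materialized_view(view_name: str) -> str:
    """Return HiveQL that refreshes the view after the source tables change."""
    return f"ALTER MATERIALIZED VIEW {_check_identifier(view_name)} REBUILD"


if __name__ == "__main__":
    print(create_materialized_view("sales_by_store", ROLLUP_QUERY) + ";")
    print(rebuild_materialized_view("sales_by_store") + ";")
```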

Information schema database

When launched, Hive creates two databases from JDBC data sources: information_schema and sys. All Metastore tables are mapped into your tablespace and available in sys. The information_schema data reveals the state of the system, similar to sys database data. You can query information_schema using SQL standard queries, which are portable from one DBMS to another.
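For example, a standard query against information_schema can list the tables in a database. The following sketch builds such a query as text. The function name is hypothetical, and the sketch assumes the tables view exposes the SQL-standard table_schema, table_name, and table_type columns.

```python
import re

# Sketch only: the function name is hypothetical.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def list_tables_query(database: str) -> str:
    """Return a SQL-standard query that lists the tables in one database."""
    # Accept only simple names so the value is safe inside the string literal.
    if not _IDENTIFIER.fullmatch(database):
        raise ValueError(f"not a simple database name: {database!r}")
    return (
        "SELECT table_name, table_type "
        "FROM information_schema.tables "
        f"WHERE table_schema = '{database}'"
    )


if __name__ == "__main__":
    print(list_tables_query("default") + ";")
```

Because the query uses only standard information_schema names, the same text should need little or no change on another DBMS that implements the standard views.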