Key components of warehouse processing

A brief introduction to the software and systems that affect performance helps you understand the scope of the tuning task: HiveServer, batch processing using Apache Tez, interactive processing using Apache Tez, and LLAP.

HiveServer

HiveServer provides the Hive service to multiple clients that run queries against Hive simultaneously through an open API driver, such as JDBC or ODBC. For optimal performance, use HiveServer to connect your client application to the data warehouse. Using Cloudera Manager, you can install, configure, and monitor the Hive service and HiveServer.
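As a minimal sketch of what a JDBC client needs in order to reach HiveServer, the following helper assembles a connection URL in the jdbc:hive2://host:port/database form. The function name hive_jdbc_url and the host hs2.example.com are hypothetical, and the defaults (port 10000, database default) are assumptions that might not match your cluster.

```python
def hive_jdbc_url(host: str, port: int = 10000, database: str = "default") -> str:
    """Build a HiveServer JDBC URL of the form jdbc:hive2://host:port/database.

    The default port and database are assumptions; check your own deployment.
    """
    if not host:
        raise ValueError("host must not be empty")
    if not 0 < port < 65536:
        raise ValueError("port must be between 1 and 65535")
    return f"jdbc:hive2://{host}:{port}/{database}"


# Hypothetical host name, used only for illustration.
print(hive_jdbc_url("hs2.example.com"))
```

A secured cluster typically needs extra session parameters appended to this URL, which this sketch does not cover.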

Batch processing using Apache Tez

Each YARN queue must have enough capacity to support at least one complete Tez application, whose Application Master (AM) memory is set by the tez.am.resource.memory.mb property. Because a Tez AM monitors each running query, the number of Tez AMs that the cluster can host limits the number of queries it can run concurrently.
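The relationship between AM memory and concurrency can be sketched as simple arithmetic. The helper below is a rough upper bound only: it assumes that queue memory is the sole constraint and ignores the memory that task containers need and any scheduler limits on AM resources. The function name and the sample values (a 16384 MB queue and a 4096 MB AM) are hypothetical.

```python
def max_concurrent_queries(queue_memory_mb: int, am_memory_mb: int) -> int:
    """Upper bound on Tez AMs, and so on concurrent queries, for one queue.

    am_memory_mb stands for the value of tez.am.resource.memory.mb.
    Assumes queue memory is the only limit, which overstates real capacity.
    """
    if am_memory_mb <= 0:
        raise ValueError("am_memory_mb must be positive")
    if queue_memory_mb < 0:
        raise ValueError("queue_memory_mb must not be negative")
    return queue_memory_mb // am_memory_mb


# Hypothetical sizing: a 16 GB queue with 4 GB per Tez AM.
print(max_concurrent_queries(16384, 4096))  # prints 4
```

A result of 0 means that the queue cannot hold even one Tez AM, so no query could start in it.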

Hive on Tez is an advancement over earlier application frameworks for Hadoop data processing, such as Hive on MapReduce. The Tez framework is suitable for high-performance batch workloads.

After query compilation, HiveServer generates a Tez graph that is submitted to YARN. A Tez AM monitors the query as it runs.
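To illustrate what a Tez graph looks like, the toy model below represents a query as a directed acyclic graph of vertices and derives one valid execution order with the standard-library graphlib module (Python 3.9 or later). The vertex names and the shape of the graph are hypothetical, and the sketch does not reflect how the Tez AM actually schedules work, because independent vertices can run in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical query plan: each vertex maps to the set of vertices it depends on.
query_graph = {
    "Map 1": set(),
    "Map 2": set(),
    "Reducer 3": {"Map 1", "Map 2"},  # consumes the output of both map vertices
    "Reducer 4": {"Reducer 3"},       # final stage
}


def execution_order(graph: dict[str, set[str]]) -> list[str]:
    """Return one ordering in which every vertex follows its dependencies."""
    return list(TopologicalSorter(graph).static_order())


order = execution_order(query_graph)
print(order)
```

The relative order of Map 1 and Map 2 is not fixed, because neither depends on the other. Both always precede Reducer 3, and Reducer 4 always comes last.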