Apache Tez processing of Hive jobs

If you ran Hive on HDP or Cloudera, your Hive queries were executed by the Apache Tez execution engine. Hive in Cloudera Data Warehouse on premises also uses Tez to run queries and exposes a HiveServer2 endpoint, as it does in HDP or Cloudera. Learn how Tez processes Hive jobs in Cloudera and Cloudera Data Warehouse, and understand the tasks that you need to perform after migrating your workloads to Cloudera Data Warehouse.

Hive is fundamentally the same technology in HDP, Cloudera Base on premises, and Cloudera Data Warehouse on premises. Hive syntax and semantics remain largely unchanged after upgrading from HDP to Cloudera on premises or to Cloudera Data Warehouse on premises.

Apache Tez provides the framework to run a job as a graph of vertices and tasks. The physical query plan, which determines how a query is executed in a distributed fashion, is built on Apache Tez, and the entire execution plan is created under this framework. Apache Tez provides the following execution modes:

Container mode — Every time you run a Hive query, Tez requests a container from YARN.
LLAP mode — Every time you run a Hive query, Tez asks the LLAP daemon for a free thread, and starts running a fragment.

In Cloudera Data Warehouse, the Hive execution mode is LLAP. In Cloudera Data Hub on Cloudera on cloud and in Cloudera Base on premises, the Hive execution mode is container, and LLAP mode is not supported. Hive running on Apache Tez in container mode has traditionally been called Hive on Tez.
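
To make the graph of vertices and tasks more concrete, the following Python sketch models a hypothetical plan for a two-table join followed by an aggregation. The vertex names, the dependencies, and the task counts are illustrative assumptions, not output from Hive or Tez, and the wave-by-wave ordering is a simplification of how Tez actually schedules work. It requires Python 3.9 or later for graphlib.

```python
from graphlib import TopologicalSorter

# Hypothetical plan: each key is a vertex, each value is the set of
# vertices whose output it consumes. Not taken from a real EXPLAIN plan.
plan = {
    "Map 1": set(),
    "Map 2": set(),
    "Reducer 3": {"Map 1", "Map 2"},
    "Reducer 4": {"Reducer 3"},
}

# Assumed number of parallel tasks that each vertex is split into.
tasks_per_vertex = {"Map 1": 4, "Map 2": 2, "Reducer 3": 3, "Reducer 4": 1}


def execution_waves(graph):
    """Group vertices into waves; a vertex appears once all its inputs are done."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(execution_waves(plan), start=1):
        task_count = sum(tasks_per_vertex[vertex] for vertex in wave)
        print(f"wave {number}: {wave} ({task_count} tasks)")
```

In this toy model, the execution mode only changes where the tasks of each vertex run: in containers requested from YARN, or as fragments on threads of the LLAP daemons.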

Considerations

There are certain differences between Hive on Tez and LLAP that you need to be aware of before migrating to Cloudera Data Warehouse on premises.

The HiveServer2 endpoints authenticate using LDAP instead of Kerberos.
Your old Hive JDBC drivers need to be replaced with the latest drivers.
If you have Hive User-Defined Functions (UDFs) in Cloudera Base on premises, you must add the UDF JARs to the Cloudera Data Warehouse Hive classpath and register the functions.
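
Registering a UDF whose JAR is already on the Hive classpath typically comes down to one CREATE FUNCTION statement per function. The Python sketch below only generates those statements from an inventory; the database name, function names, and class names are hypothetical placeholders, and the statements still have to be run against the Virtual Warehouse with a client of your choice.

```python
def create_function_statements(udfs, database):
    """Build one HiveQL CREATE FUNCTION statement per UDF.

    udfs maps a function name to the fully qualified Java class that
    implements it. The JAR containing the class is assumed to be on the
    Hive classpath already, so no USING JAR clause is generated.
    """
    statements = []
    for name, class_name in sorted(udfs.items()):
        statements.append(
            f"CREATE FUNCTION {database}.{name} AS '{class_name}'"
        )
    return statements


if __name__ == "__main__":
    # Hypothetical inventory collected from the old cluster.
    inventory = {
        "mask_email": "com.example.udf.MaskEmail",
        "to_region": "com.example.udf.ToRegion",
    }
    for statement in create_function_statements(inventory, "analytics"):
        print(statement + ";")
```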

Post-migration tasks

After migrating to Cloudera Data Warehouse on premises, perform the following tasks:

Download the latest Hive JDBC drivers from the Hive JDBC driver download page and follow the installation instructions on that page.
Update the JDBC client connection URL to point to the Virtual Warehouse instance of HiveServer2.
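If your previous connection in Cloudera Base on premises used Kerberos for authentication, modify the connection URL to use LDAP authentication instead, because the HiveServer2 endpoints in Cloudera Data Warehouse authenticate using LDAP.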
Ensure that the UDF JARs are added to the CDW_HIVE_AUX_JARS_PATH environment variable.
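
The connection URL change can be sketched as a small Python helper. It assumes the old URL follows the common jdbc:hive2://host:port/database;key=value form and carries the Kerberos principal session parameter; the host names are hypothetical, and the helper does not handle the optional ?hiveconf or #hivevar sections. Treat the URL shown for your Virtual Warehouse as the authoritative one, and supply the LDAP user name and password to the driver separately rather than embedding them in the URL.

```python
PREFIX = "jdbc:hive2://"


def to_virtual_warehouse_url(old_url, vw_authority):
    """Point a Hive JDBC URL at a new endpoint and drop the Kerberos principal.

    old_url      -- URL used against the old HiveServer2 (assumed format)
    vw_authority -- host, optionally with port, of the Virtual Warehouse endpoint
    """
    if not old_url.startswith(PREFIX):
        raise ValueError("not a Hive JDBC URL: " + old_url)
    authority_and_db, *params = old_url[len(PREFIX):].split(";")
    # Keep the database name, discard the old host and port.
    _, _, database = authority_and_db.partition("/")
    # Kerberos is not used, so the principal parameter is removed.
    kept = [p for p in params if p and not p.lower().startswith("principal=")]
    return ";".join([f"{PREFIX}{vw_authority}/{database}", *kept])


if __name__ == "__main__":
    old = (
        "jdbc:hive2://old-hs2.example.com:10000/sales;"
        "principal=hive/_HOST@EXAMPLE.COM;ssl=true"
    )
    print(to_virtual_warehouse_url(old, "hs2-vw.example.com"))
```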