Apache Tez processing of Hive jobs

If you were running Hive on HDP or CDP, you have been running Hive queries using the Apache Tez execution engine. Hive in CDW PvC also uses Tez to run queries and is a HiveServer2 endpoint as it is in HDP or CDP. Learn how Tez processes Hive jobs in CDP and CDW and understand the tasks that you need to perform after migrating your workloads to CDW.

Hive is fundamentally the same technology in HDP, CDP Private Cloud Base, and CDW Private Cloud. Hive syntax and semantics are basically the same after upgrading from HDP to CDP Private Cloud or to CDW Private Cloud.

Apache Tez provides the framework to run a job that creates a graph with vertices and tasks. SQL semantics for deciding the query physical plan, which identifies how to execute the query in a distributed fashion, is based on Apache Tez. The entire execution plan is created under this framework. Apache Tez provides the following execution modes:
  • Container mode — Every time you run a Hive query, Tez requests a container from YARN.
  • LLAP mode — Every time you run a Hive query, Tez asks the LLAP daemon for a free thread, and starts running a fragment.

In CDW, the Hive execution mode is LLAP. In Cloudera Data Hub on CDP Public Cloud and CDP Private Cloud Base, the Hive execution mode is container, and LLAP mode is not supported. When Apache Tez runs Hive in container mode, it has traditionally been called Hive on Tez.

Considerations

There are certain differences between Hive on Tez and LLAP that you need to be aware of before migrating to CDW Private Cloud.

  • The HiveServer2 endpoints authenticate using LDAP instead of Kerberos.
  • Your old Hive JDBC drivers need to be replaced with the latest drivers.
  • If you have Hive User-Defined Functions (UDFs) in CDP Private Cloud Base then the UDF JARs have to be added to the CDW Hive classpath and registered.

Post-migration tasks

After migrating to CDW Private Cloud, perform the following tasks:

  1. Download the latest Hive JDBC drivers from the Hive JDBC driver download page and follow the driver installation instructions on the download page.
  2. Update the JDBC client connection URL to point to the Virtual Warehouse instance of HiveServer2.
  3. If your previous connection in CDP PvC Base used Kerberos for authentication, you must modify the connection URL accordingly.
  4. Ensure that the UDF JARs are added to the CDW_HIVE_AUX_JARS_PATH environment variable.