Configuring Tez
After installation, Tez works correctly without additional configuration, especially in later releases of HDP (HDP 2.2.4 and Hive 0.14 and later). In current releases of HDP, Tez is the default query execution engine. To make sure that you are using Tez, set the following property in hive-site.xml or in the Hive web UI in Ambari:
SET hive.execution.engine=tez;
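To confirm which engine a session is actually using, you can give SET the property name without a value, which prints the current setting. A minimal sketch for the Hive CLI or Beeline:

```sql
-- Print the current value of the property (no value is assigned here).
SET hive.execution.engine;

-- If another engine is reported, switch the current session to Tez.
SET hive.execution.engine=tez;
```

A SET statement issued this way applies only to the current session; a value in hive-site.xml or Ambari applies to new sessions by default.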
Tip: To analyze query execution in Tez, use the Ambari Tez View, which provides a graphical view of executing Hive queries. See the Ambari Views Guide.
Advanced Settings
Map joins are very efficient because one table (usually a dimension table) is held in memory as a hash map on every node while the larger fact table is streamed. This minimizes data movement and results in very fast joins. However, the in-memory table must fit in the memory available to the container, so you must allocate more memory to the Tez container with the following settings in hive-site.xml:
Set the Tez container size to a larger multiple of the YARN container size (in this example, a 4GB Tez container):
SET hive.tez.container.size=4096MB
Set how much of this memory can be used for tables stored as hash maps (about one-third of the Tez container size is recommended):
SET hive.auto.convert.join.noconditionaltask.size=1370MB
Note: The size is shown in bytes in the hive-site.xml file, but is set in MB in Ambari. The examples above use MB to make the size settings easier to understand.
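Because the examples above use an MB suffix for readability, it may help to see the same values as plain numbers. The sketch below assumes that hive.tez.container.size takes a number of megabytes and that hive.auto.convert.join.noconditionaltask.size takes a number of bytes. Both units are assumptions here and should be checked against the documentation for your Hive release.

```sql
-- 4GB Tez container, assuming this property is expressed in megabytes.
SET hive.tez.container.size=4096;

-- 1370MB expressed in bytes, assuming this property is expressed in bytes:
-- 1370 * 1024 * 1024 = 1436549120
SET hive.auto.convert.join.noconditionaltask.size=1436549120;
```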
Tez Container Size Configuration Example
If you discover that you are not getting map joins, check the size of your Tez containers in relation to your YARN containers. The size of a Tez container must be a multiple of the YARN container size. For example, if your YARN containers are set to 2GB, set the Tez container size to 4GB. Then run the EXPLAIN command with your query to view the query execution plan and make sure that you are getting map joins instead of shuffle joins.
Keep in mind that if your Tez containers are too large, the space is wasted. Also, to limit the size of your largest container, do not configure more than one processor per Tez container. For example, if you have 16 processors and 64GB of memory, configure one Tez container per processor and set the container size to 4GB and no larger (16 containers of 4GB each use the full 64GB).
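The EXPLAIN check described above might look like the following sketch. The table and column names (sales, stores, store_id, store_name, amount) are hypothetical and stand in for your own fact and dimension tables.

```sql
-- Hypothetical tables: sales is the large fact table,
-- stores is the small dimension table expected to fit in memory.
EXPLAIN
SELECT s.store_name, SUM(f.amount) AS total_amount
FROM sales f
JOIN stores s ON f.store_id = s.store_id
GROUP BY s.store_name;

-- In the printed plan, look at the join operator:
--   "Map Join Operator"   indicates a map join.
--   "Merge Join Operator" typically indicates a shuffle join.
```

If the plan still shows a shuffle join for a small dimension table, compare the size of that table with the value of hive.auto.convert.join.noconditionaltask.size before increasing the container size further.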