Previous versions of Hive allowed subqueries only
in FROM
clauses of SQL statements. A subquery is a SQL expression that
returns a set of rows. The subquery is evaluated and its query result set used to
evaluate the parent query, the outer query that
contains the subquery. Version 0.13 of Hive expands the use the use of subqueries to
include WHERE
clauses, as shown in the following
example.
SELECT state, net_payments FROM transfer_payments WHERE transfer_payments.year IN (SELECT year FROM us_census);
No configuration is required to enable execution of subqueries in Hive; the feature is
available by default. However, several restrictions exist for the use of subqueries in
WHERE
clauses. The next section,Understanding Subqueries,
describes the concepts necessary to understand these restrictions, and the following
section,Restrictions on Subqueries in
WHERE Clauses explains the restrictions.