Using a subquery
Hive and Impala supports subqueries in FROM clauses and WHERE clauses that you can use for many operations, such as filtering data from one table based on contents of another table.
CREATE TABLE us_census(year int);
INSERT INTO us_census VALUES(2020),(2021),(2022);
CREATE TABLE transfer_payments(state STRING, net_payments FLOAT, year INT);
INSERT INTO transfer_payments VALUES('CA',12000.75,2010),('CA',45789.95,2020),('WA',78987.52,2022);
Select all the state and net_payments values from the transfer_payments table if the value of the year column in the table matches a year in the us_census table.
Hive and Impala
example:
SELECT state, net_payments
FROM transfer_payments
WHERE transfer_payments.year IN (SELECT year FROM us_census);
The predicate starts with the first WHERE keyword. The predicate operator is the IN
keyword.
The predicate returns true for a row in the transfer_payments table if the year
value in at least one row of the us_census table matches a year value in the
transfer_payments
table.
+--------+---------------+
| state | net_payments |
+--------+---------------+
| CA | 45789.95 |
| WA | 78987.52 |
+--------+---------------+