Unified Analytics supports anti join queries to find the rows in a table that do not
match rows in another table. The performance of queries involving NULL filters, "not exists"
clauses, and "not in" clauses" are improved using the anti join optimization.
In this task, run an anti join query. For example:
SELECT * FROM left_table t1 LEFT JOIN right_table t2 ON t1.id = t2.id WHERE t2.id IS NULL;
-
Log into the Data Warehouse service as DWUser.
-
Go to the Virtual Warehouses tab, locate the Virtual
Warehouse you want to edit, and click and select hive-site from the
Configuration files drop-down menu.
-
Click Add Custom Configuration.
The Custom Configuration is displayed.
-
Specify the following values in the Configuration Key:
and Configuration Value: fields:
- Configuration Key:
hive.auto.convert.anti.join
- Configuration Value:
true
-
Open Hue, and create two tables, one having more rows than the other.
create table tt1 (order_num bigint, price decimal(5, 2));
create table tt2 (order_num bigint);
-
Insert several, say five, rows of values (order number, price) in the first
table and fewer, say 2, rows of values (order number only) in the second
table.
insert into tt1 values (42, 300.50), (141, 250.25), (150, 85.00), (150, 500.44), (166, 125.65);
insert into tt2 values (150), (166);
-
Take a look at the content of the first table.
tt1.order_num tt1.price
42 300.50
141 250.25
150 85.00
150 500.44
166 125.65
-
Take a look at the content of the second table.
-
Perform an equality left join on the order numbers in the tables.
select * from tt1 t1 LEFT JOIN tt2 t2 ON t1.order_num=t2.order_num;
t1.order_num t1.price t2.order_num
42 300.50 NULL
141 250.25 NULL
150 85.00 150
150 500.44 150
166 125.65 166
-
Add a null filter to the join query to anti join the tables.
select * from tt1 t1 LEFT JOIN tt2 t2 ON t1.order_num=t2.order_num where t2.order_num is NULL;
The anti join returns the rows having NULL in the price
column.
t1.order_num t1.price t2.order_num
42 300.50 NULL
141 250.25 NULL