Using anti joins

Unified Analytics supports anti join queries to find the rows in a table that do not match rows in another table. The performance of queries involving NULL filters, "not exists" clauses, and "not in" clauses" are improved using the anti join optimization.

In this task, run an anti join query. For example:
SELECT * FROM left_table t1 LEFT JOIN right_table t2 ON t1.id = t2.id WHERE t2.id IS NULL; 
  1. Log into the Data Warehouse service as DWUser.
  2. Go to the Virtual Warehouses tab, locate the Virtual Warehouse you want to edit, and click > Edit > Configurations > Hiveserver2 and select hive-site from the Configuration files drop-down menu.
  3. Click Add Custom Configuration.
    The Custom Configuration is displayed.
  4. Specify the following values in the Configuration Key: and Configuration Value: fields:
    • Configuration Key: hive.auto.convert.anti.join
    • Configuration Value: true
  5. Open Hue, and create two tables, one having more rows than the other.
    create table tt1 (order_num bigint, price decimal(5, 2));
    
    create table tt2 (order_num bigint);
  6. Insert several, say five, rows of values (order number, price) in the first table and fewer, say 2, rows of values (order number only) in the second table.
    insert into tt1 values (42, 300.50), (141, 250.25), (150, 85.00), (150, 500.44), (166, 125.65);
    
    insert into tt2 values (150), (166);
  7. Take a look at the content of the first table.
    select * from tt1;
    tt1.order_num	tt1.price
    42	            300.50
    141	            250.25
    150	            85.00
    150	            500.44
    166	            125.65
  8. Take a look at the content of the second table.
    select * from tt2;
    tt2.order_num
    150
    166
  9. Perform an equality left join on the order numbers in the tables.
    select * from tt1 t1 LEFT JOIN tt2 t2 ON t1.order_num=t2.order_num;
                    
    t1.order_num	t1.price	t2.order_num
    42	          300.50	   NULL
    141	         250.25	   NULL
    150	         85.00	    150
    150	         500.44	   150
    166	         125.65	   166
  10. Add a null filter to the join query to anti join the tables.
    select * from tt1 t1 LEFT JOIN tt2 t2 ON t1.order_num=t2.order_num where t2.order_num is NULL;
    The anti join returns the rows having NULL in the price column.
    t1.order_num	t1.price	t2.order_num
    42	          300.50	  NULL
    141	         250.25	  NULL