Use materialized view optimations from a subquery
You can create a query having a subquery that the optimizer rewrites based on a materialized view. You create a materialized view, and then run a query that Hive optimizes using that materialized view.
flights_hdfsthat has the following data:
Create a table schema definition named flights_hdfs for destination and origin
CREATE TABLE flights_hdfs( c_id INT, dest VARCHAR(256), origin VARCHAR(256));
Create a materialized view that counts destinations and origins.
CREATE MATERIALIZED VIEW mv1 AS SELECT dest, origin, count(*) FROM flights_hdfs GROUP BY dest, origin;
Take advantage of the materialized view to speed your queries when you have to
count destinations and origins again.
For example, use a subquery to select the number of destination-origin pairs like the materialized view.
SELECT count(*)/2 FROM( SELECT dest, origin, count(*) FROM flights_hdfs GROUP BY dest, origin ) AS t;Transparently, Hive uses the work already in place since creation of the materialized view instead of reprocessing.