Using Apache HivePDF version

Use materialized view optimations from a subquery

You can create a query having a subquery that the optimizer rewrites based on a materialized view. You create a materialized view, and then run a query that Hive optimizes using that materialized view.

In this task, you create a materialized view and use it in a subquery to return the number of destination-origin pairs. Suppose the data resides in a table named flights_hdfs that has the following data:
c_id dest origin
1 Chicago Hyderabad
2 London Moscow
...
  1. Create a table schema definition named flights_hdfs for destination and origin data.
    CREATE TABLE flights_hdfs(
      c_id INT,
      dest VARCHAR(256),
      origin VARCHAR(256));
  2. Create a materialized view that counts destinations and origins.
    CREATE MATERIALIZED VIEW mv1
    AS
      SELECT dest, origin, count(*)
      FROM flights_hdfs
      GROUP BY dest, origin;
  3. Take advantage of the materialized view to speed your queries when you have to count destinations and origins again.

    For example, use a subquery to select the number of destination-origin pairs like the materialized view.

    SELECT count(*)/2
    FROM(
      SELECT dest, origin, count(*)
      FROM flights_hdfs
      GROUP BY dest, origin
    ) AS t;
    Transparently, Hive uses the work already in place since creation of the materialized view instead of reprocessing.