Using Apache HivePDF version

Use materialized view optimations from a subquery

You can create a query having a subquery that the optimizer rewrites based on a materialized view. You create a materialized view, and then run a query that Hive optimizes using that materialized view.

In this task, you create a materialized view and use it in a subquery to return the number of destination-origin pairs. Suppose the data resides in a table named flights_hdfs that has the following data:
c_id dest origin
1 Chicago Hyderabad
2 London Moscow
...
  1. Create a table schema definition named flights_hdfs for destination and origin data.
    CREATE TABLE flights_hdfs(
      c_id INT,
      dest VARCHAR(256),
      origin VARCHAR(256));
  2. Create a materialized view that counts destinations and origins.
    CREATE MATERIALIZED VIEW mv1
    AS
      SELECT dest, origin, count(*)
      FROM flights_hdfs
      GROUP BY dest, origin;
  3. Take advantage of the materialized view to speed your queries when you have to count destinations and origins again.

    For example, use a subquery to select the number of destination-origin pairs like the materialized view.

    SELECT count(*)/2
    FROM(
      SELECT dest, origin, count(*)
      FROM flights_hdfs
      GROUP BY dest, origin
    ) AS t;
    Transparently, Hive uses the work already in place since creation of the materialized view instead of reprocessing.

We want your opinion

How can we improve this page?

What kind of feedback do you have?