Using Apache HivePDF version

Using optimizations from a subquery

You can create a query having a subquery that the optimizer rewrites based on a materialized view. You create a materialized view, and then run a query that uses the materialized view.

In this task, you create a materialized view and use it in a subquery to return the number of destination-origin pairs. Suppose the data resides in a table named flights_data that has the following columns:
c_id dest origin
1 Chicago Hyderabad
2 London Moscow
...
  1. Create a table schema definition named flights_data for destination and origin data.
    CREATE TABLE flights_data(
      c_id INT,
      dest VARCHAR(256),
      origin VARCHAR(256));
  2. Create a materialized view that counts destinations and origins.
    CREATE MATERIALIZED VIEW mv1
    AS
      SELECT dest, origin, count(*)
      FROM flights_data
      GROUP BY dest, origin;
  3. Take advantage of the materialized view to speed your queries when you have to count destinations and origins again.

    For example, use a subquery to select the number of destination-origin pairs like the materialized view.

    SELECT count(*)/2
    FROM(
      SELECT dest, origin, count(*)
      FROM flights_data
      GROUP BY dest, origin
    ) AS t;
    Transparently, the SQL engine uses the work already in place since creation of the materialized view instead of reprocessing.

We want your opinion

How can we improve this page?

What kind of feedback do you have?