Use materialized view optimations from a subquery

You can create a query having a subquery that the optimizer rewrites based on a materialized view. You create a materialized view, and then run a query that Hive optimizes using that materialized view.

In this task, you create a materialized view and use it in a subquery to return the number of destination-origin pairs. Suppose the data resides in a table named flights_hdfs that has the following data:


c_id	dest	origin
1	Chicago	Hyderabad
2	London	Moscow
...

Create a table schema definition named flights_hdfs for destination and origin data.
```
CREATE TABLE flights_hdfs(
  c_id INT,
  dest VARCHAR(256),
  origin VARCHAR(256));
```

Create a materialized view that counts destinations and origins.

CREATE MATERIALIZED VIEW mv1
AS
  SELECT dest, origin, count(*)
  FROM flights_hdfs
  GROUP BY dest, origin;

Take advantage of the materialized view to speed your queries when you have to count destinations and origins again.
For example, use a subquery to select the number of destination-origin pairs like the materialized view.
```
SELECT count(*)/2
FROM(
  SELECT dest, origin, count(*)
  FROM flights_hdfs
  GROUP BY dest, origin
) AS t;
```
Transparently, Hive uses the work already in place since creation of the materialized view instead of reprocessing.

Use materialized view optimations from a subquery

We want your opinion

How can we improve this page?