OVER clause is required for calls to pure
analytic functions such as
When you include an
OVER clause with calls to
aggregate functions such as
operate as analytic functions.
function(args) OVER([partition_by_clause] [order_by_clause [window_clause]]) partition_by_clause ::= PARTITION BY expr [, expr ...] order_by_clause ::= ORDER BY expr [ASC | DESC] [NULLS FIRST | NULLS LAST] [, expr [ASC | DESC] [NULLS FIRST | NULLS LAST] ...] window_clause: See Window clause
PARTITION BY clause:
PARTITION BY clause acts much like the
GROUP BY clause in the outermost block of a
query. It divides the rows into groups containing identical values
in one or more columns. These logical groups are known as
partitions. Throughout the discussion of analytic
functions, “partitions” refers to the groups produced by the
PARTITION BY clause, not to partitioned tables.
However, note the following limitation that applies specifically
to analytic function calls involving partitioned tables.
In queries involving both analytic functions and partitioned tables, partition pruning only occurs for columns named in the
clause of the analytic function call. For example, if an analytic function query has a clause such as
the way to make the query prune all other
YEAR partitions is to include
PARTITION BY year in the analytic function call;
OVER (PARTITION BY year,other_columns other_analytic_clauses).
The sequence of results from an analytic function “resets”
for each new partition in the result set. That is, the set of
preceding or following rows considered by the analytic function
always come from a single partition. Any
ROW_NUMBER(), and so on
apply to each partition independently. Omit the
BY clause to apply the analytic operation to all the
rows in the table.
ORDER BY clause:
ORDER BY clause works much like the
ORDER BY clause in the outermost block of a
query. It defines the order in which rows are evaluated for the
entire input set, or for each group produced by a
PARTITION BY clause. You can order by one or
multiple expressions, and for each expression optionally choose
ascending or descending order and whether nulls come first or last
in the sort order. Because this
ORDER BY clause
only defines the order in which rows are evaluated, if you want
the results to be output in a specific order, also include an
ORDER BY clause in the outer block of the
ORDER BY clause is omitted, the
analytic function applies to all items in the group produced by
PARTITION BY clause. When the
BY clause is included, the analysis can apply to all or
a subset of the items in the group, depending on the optional
The order in which the rows are analyzed is only defined for
those columns specified in
ORDER BY clauses.
One difference between the analytic and outer uses of the
ORDER BY clause: inside the
ORDER BY 1 or
other integer value is interpreted as a constant sort value
(effectively a no-op) rather than referring to column 1.
The window clause is only allowed in combination with an
ORDER BY clause. If the
BY clause is specified but the window clause is not,
the default window is
RANGE BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW.
Because HBase tables are optimized for single-row lookups rather
than full scans, analytic functions using the
OVER() clause are not recommended for HBase
tables. Although such queries work, their performance is lower
than on comparable tables using HDFS data files.
Analytic functions are very efficient for Parquet tables. The
data that is examined during evaluation of the
OVER() clause comes from a specified set of
columns, and the values for each column are arranged sequentially
within each data file.
Text table considerations:
Analytic functions are convenient to use with text tables for exploratory business intelligence. When the volume of data is substantial, prefer to use Parquet tables for performance-critical analytic queries.
The following example shows how to synthesize a numeric sequence
corresponding to all the rows in a table. The new table has the
same columns as the old one, plus an additional column
ID containing the integers 1, 2, 3, and so on,
corresponding to the order of a
in the original table.
CREATE TABLE events_with_id AS SELECT row_number() OVER (ORDER BY date_and_time) AS id, c1, c2, c3, c4 FROM events;
The following example shows how to determine the number of rows
containing each value for a column. Unlike a corresponding
GROUP BY query, this one can analyze a single
column and still return all values (not just the distinct ones)
from the other columns.
SELECT x, y, z, count() OVER (PARTITION BY x) AS how_many_x FROM t1;
You cannot directly combine the
operator with analytic function calls. You can put the analytic
function call in a
WITH clause or an inline view,
and apply the
DISTINCT operator to its result
WITH t1 AS (SELECT x, sum(x) OVER (PARTITION BY x) AS total FROM t1) SELECT DISTINCT x, total FROM t1;