Impala analytic functions
Analytic functions (also known as window functions) are a special category of built-in
functions. Like aggregate functions, they examine the contents of multiple input rows to compute
each output value. However, rather than being limited to one result value per GROUP
BY
group, they operate on windows where the input rows are ordered and
grouped using flexible conditions expressed through an OVER()
clause.
Some functions, such as LAG()
and RANK()
, can only be used in this analytic
context. Some aggregate functions do double duty: when you call the aggregation functions such as
MAX()
, SUM()
, AVG()
, and so on with an
OVER()
clause, they produce an output value for each row, based on computations across other
rows in the window.
Although analytic functions often compute the same value you would see from an aggregate function in a
GROUP BY
query, the analytic functions produce a value for each row in the result set rather
than a single value for each group. This flexibility lets you include additional columns in the
SELECT
list, offering more opportunities for organizing and filtering the result set.
Analytic function calls are only allowed in the SELECT
list and in the outermost
ORDER BY
clause of the query. During query processing, analytic functions are evaluated
after other query stages such as joins, WHERE
, and GROUP BY
,
The rows that are part of each partition are analyzed by computations across an ordered or unordered set of
rows. For example, COUNT()
and SUM()
might be applied to all the rows in
the partition, in which case the order of analysis does not matter. The ORDER BY
clause
might be used inside the OVER()
clause to defines the ordering that applies to functions
such as LAG()
and FIRST_VALUE()
.
Analytic functions are frequently used in fields such as finance and science to provide trend, outlier, and
bucketed analysis for large data sets. You might also see the term window functions
in database
literature, referring to the sequence of rows (the window
) that the function call applies to,
particularly when the OVER
clause includes a ROWS
or RANGE
keyword.