STDDEV, STDDEV_SAMP, STDDEV_POP functions
An aggregate function that returns the standard deviation of a set of numbers.
Syntax:
{ STDDEV | STDDEV_SAMP | STDDEV_POP } ([DISTINCT | ALL] expression)
This function works with any numeric data type.
Return type: DOUBLE
in Impala 2.0 and higher; STRING
in earlier
releases
This function is typically used in mathematical formulas related to probability distributions.
The STDDEV_POP()
and STDDEV_SAMP()
functions compute the population
standard deviation and sample standard deviation, respectively, of the input values.
(STDDEV()
is an alias for STDDEV_SAMP()
.) Both functions evaluate all input
rows matched by the query. The difference is that STDDEV_SAMP()
is scaled by
1/(N-1)
while STDDEV_POP()
is scaled by 1/N
.
If no input rows match the query, the result of any of these functions is NULL
. If a single
input row matches the query, the result of any of these functions is "0.0"
.
Examples:
This example demonstrates how STDDEV()
and STDDEV_SAMP()
return the same
result, while STDDEV_POP()
uses a slightly different calculation to reflect that the input
data is considered part of a larger population
.
[localhost:21000] > select stddev(score) from test_scores;
+---------------+
| stddev(score) |
+---------------+
| 28.5 |
+---------------+
[localhost:21000] > select stddev_samp(score) from test_scores;
+--------------------+
| stddev_samp(score) |
+--------------------+
| 28.5 |
+--------------------+
[localhost:21000] > select stddev_pop(score) from test_scores;
+-------------------+
| stddev_pop(score) |
+-------------------+
| 28.4858 |
+-------------------+
This example demonstrates that, because the return value of these aggregate functions is a
STRING
, you must currently convert the result with CAST
.
[localhost:21000] > create table score_stats as select cast(stddev(score) as decimal(7,4)) `standard_deviation`, cast(variance(score) as decimal(7,4)) `variance` from test_scores;
+-------------------+
| summary |
+-------------------+
| Inserted 1 row(s) |
+-------------------+
[localhost:21000] > desc score_stats;
+--------------------+--------------+---------+
| name | type | comment |
+--------------------+--------------+---------+
| standard_deviation | decimal(7,4) | |
| variance | decimal(7,4) | |
+--------------------+--------------+---------+
Restrictions:
This function cannot be used in an analytic context. That is, the OVER()
clause is not allowed at all with this function.
Related information:
The STDDEV()
, STDDEV_POP()
, and STDDEV_SAMP()
functions compute the standard deviation (square root of the variance) based on the results
of VARIANCE()
, VARIANCE_POP()
, and
VARIANCE_SAMP()
respectively. See VARIANCE, VARIANCE_SAMP, VARIANCE_POP, VAR_SAMP, VAR_POP functions for details about the variance property.