This is the documentation for CDH 5.1.x. Documentation for other versions is available at Cloudera Documentation.

COUNT Function

An aggregate function that returns the number of rows, or the number of non-NULL rows, that meet certain conditions:

The notation COUNT(*) includes NULL values in the total.
The notation COUNT(column_name) only considers rows where the column contains a non-NULL value.
You can also combine COUNT with the DISTINCT operator to eliminate duplicates before counting, and to count the combinations of values across multiple columns.

When the query contains a GROUP BY clause, returns one value for each combination of grouping values.

Return type: BIGINT

Examples:

-- How many rows total are in the table, regardless of NULL values?
select count(*) from t1;
-- How many rows are in the table with non-NULL values for a column?
select count(c1) from t1;
-- Count the rows that meet certain conditions.
-- Again, * includes NULLs, so COUNT(*) might be greater than COUNT(col).
select count(*) from t1 where x > 10;
select count(c1) from t1 where x > 10;
-- Can also be used in combination with DISTINCT and/or GROUP BY.
-- Combine COUNT and DISTINCT to find the number of unique values.
-- Must use column names rather than * with COUNT(DISTINCT ...) syntax.
-- Rows with NULL values are not counted.
select count(distinct c1) from t1;
-- Rows with a NULL value in _either_ column are not counted.
select count(distinct c1, c2) from t1;
-- Return more than one result.
select month, year, count(distinct visitor_id) from web_stats group by month, year;

Note:

Impala only allows a single COUNT(DISTINCT columns) expression in each query.

If you do not need precise accuracy, you can produce an estimate of the distinct values for a column by specifying NDV(column); a query can contain multiple instances of NDV(column).

To produce the same result as multiple COUNT(DISTINCT) expressions, you can use the following technique for queries involving a single table:

select v1.c1 result1, v2.c1 result2 from
  (select count(distinct col1) as c1 from t1) v1
    cross join
  (select count(distinct col2) as c1 from t1) v2;

Because CROSS JOIN is an expensive operation, prefer to use the NDV() technique wherever practical.

Page generated September 3, 2015.