Overview of Impala functions

Functions let you apply arithmetic, string, or other computations and transformations to Impala data. You typically use them in SELECT lists and WHERE clauses to filter and format query results so that the result set is exactly what you want, with no further processing needed on the application side.

Scalar functions return a single result for each input row.

[localhost:21000] > select name, population from country where continent = 'North America' order by population desc limit 4;
[localhost:21000] > select upper(name), population from country where continent = 'North America' order by population desc limit 4;
+-------------+------------+
| upper(name) | population |
+-------------+------------+
| USA         | 320000000  |
| MEXICO      | 122000000  |
| CANADA      | 25000000   |
| GUATEMALA   | 16000000   |
+-------------+------------+

Aggregate functions combine the results from multiple rows: either a single result for the entire table, or a separate result for each group of rows. Aggregate functions are frequently used in combination with GROUP BY and HAVING clauses in the SELECT statement.

[localhost:21000] > select continent, sum(population) as howmany from country group by continent order by howmany desc;
+---------------+------------+
| continent     | howmany    |
+---------------+------------+
| Asia          | 4298723000 |
| Africa        | 1110635000 |
| Europe        | 742452000  |
| North America | 565265000  |
| South America | 406740000  |
| Oceania       | 38304000   |
+---------------+------------+

User-defined functions (UDFs) let you code your own logic. They can be either scalar or aggregate functions. UDFs let you implement important business or scientific logic using high-performance code for Impala to automatically parallelize. You can also use UDFs to implement convenience functions to simplify reporting or porting SQL from other database systems.

[localhost:21000] > select rot13('Hello world!') as 'Weak obfuscation';
+------------------+
| weak obfuscation |
+------------------+
| Uryyb jbeyq!     |
+------------------+
[localhost:21000] > select likelihood_of_new_subatomic_particle(sensor1, sensor2, sensor3) as probability
                  > from experimental_results group by experiment;

Each function is associated with a specific database. For example, if you issue a USE somedb statement followed by CREATE FUNCTION somefunc, the new function is created in the somedb database, and you could refer to it through the fully qualified name somedb.somefunc. You could then issue another USE statement and create a function with the same name in a different database.

Impala built-in functions are associated with a special database named _impala_builtins, which lets you refer to them from any database without qualifying the name.

[localhost:21000] > show databases;
+-------------------------+
| name                    |
+-------------------------+
| _impala_builtins        |
| analytic_functions      |
| avro_testing            |
| data_file_size          |
...
[localhost:21000] > show functions in _impala_builtins like '*subs*';
+-------------+-----------------------------------+
| return type | signature                         |
+-------------+-----------------------------------+
| STRING      | substr(STRING, BIGINT)            |
| STRING      | substr(STRING, BIGINT, BIGINT)    |
| STRING      | substring(STRING, BIGINT)         |
| STRING      | substring(STRING, BIGINT, BIGINT) |
+-------------+-----------------------------------+