Overview of Impala functions
Functions let you apply arithmetic, string, or other computations and transformations
to Impala data. You typically use them in SELECT
lists and
WHERE
clauses to filter and format query results so that the result set is
exactly what you want, with no further processing needed on the application side.
Scalar functions return a single result for each input row.
[localhost:21000] > select name, population from country where continent = 'North America' order by population desc limit 4;
[localhost:21000] > select upper(name), population from country where continent = 'North America' order by population desc limit 4;
+-------------+------------+
| upper(name) | population |
+-------------+------------+
| USA | 320000000 |
| MEXICO | 122000000 |
| CANADA | 25000000 |
| GUATEMALA | 16000000 |
+-------------+------------+
Aggregate functions combine the results from multiple rows: either a single result for the
entire table, or a separate result for each group of rows. Aggregate functions are
frequently used in combination with GROUP BY
and HAVING
clauses in the SELECT
statement.
[localhost:21000] > select continent, sum(population) as howmany from country group by continent order by howmany desc;
+---------------+------------+
| continent | howmany |
+---------------+------------+
| Asia | 4298723000 |
| Africa | 1110635000 |
| Europe | 742452000 |
| North America | 565265000 |
| South America | 406740000 |
| Oceania | 38304000 |
+---------------+------------+
User-defined functions (UDFs) let you code your own logic. They can be either scalar or aggregate functions. UDFs let you implement important business or scientific logic using high-performance code for Impala to automatically parallelize. You can also use UDFs to implement convenience functions to simplify reporting or porting SQL from other database systems.
[localhost:21000] > select rot13('Hello world!') as 'Weak obfuscation';
+------------------+
| weak obfuscation |
+------------------+
| Uryyb jbeyq! |
+------------------+
[localhost:21000] > select likelihood_of_new_subatomic_particle(sensor1, sensor2, sensor3) as probability
> from experimental_results group by experiment;
Each function is associated with a specific database. For example, if you issue a USE somedb
statement followed by CREATE FUNCTION somefunc
, the new function is created in the
somedb
database, and you could refer to it through the fully qualified name
somedb.somefunc
. You could then issue another USE
statement
and create a function with the same name in a different database.
Impala built-in functions are associated with a special database named _impala_builtins
,
which lets you refer to them from any database without qualifying the name.
[localhost:21000] > show databases;
+-------------------------+
| name |
+-------------------------+
| _impala_builtins |
| analytic_functions |
| avro_testing |
| data_file_size |
...
[localhost:21000] > show functions in _impala_builtins like '*subs*';
+-------------+-----------------------------------+
| return type | signature |
+-------------+-----------------------------------+
| STRING | substr(STRING, BIGINT) |
| STRING | substr(STRING, BIGINT, BIGINT) |
| STRING | substring(STRING, BIGINT) |
| STRING | substring(STRING, BIGINT, BIGINT) |
+-------------+-----------------------------------+