CDW stored procedures
Cloudera Data Warehouse (CDW) supports Hive Hybrid Procedural SQL (HPL/SQL). HPL/SQL is an Apache open source procedural extension for SQL for Hive users. You connect over JDBC to CDW from a client to run HPL/SQL queries. You can also run stored procedures from Hue in CDW.
HPL/SQL includes imperative programming structures (variables, procedures, control flow, and exceptions), and is typically used for ETL. The HPL/SQL language understands the syntax and semantics of most procedural SQL dialects, such as Oracle PL/SQL.
HPL/SQL has its own grammar. For more information about the HPL/SQL language, see the HPL/SQL Reference.
HPL/SQL architecture
HPL/SQL has been re-architected from a command line tool to an integrated part of HiveServer (HS2). From a JDBC client, such as Beeline, you connect to HiveServer through CDW. The interpreter executes the abstract syntax tree (AST) from the parser. Hive metastore securely stores the function and procedure code permanently. The procedure is loaded and cached on demand to the interpreter's memory when needed. You can close the session or restart Hive without losing the definitions.
You can enable and use HPL/SQL from any host or third-party tool that can make a JDBC connection to HiveServer. Beeline is a popular client for use with HPL/SQL because other third-party tools do not show you some of the error messages about syntax mistakes.
Enabling HPL/SQL in the beeline connection string
After setting up a client to connect to CDW, you append
mode=hplsql
to the JDBC URL that connects the client to CDW:
beeline -u "jdbc:hive2://<HiveServer host>:10000/default;mode=hplsql"
When the client connects to HiveServer in CDW configuring this mode, HPL/SQL is enabled; otherwise, HPL/SQL is disabled.
HPL/SQL key features
- Flow of Control Statements (FOR, WHILE, IF, CASE, LOOP, LEAVE, RETURN)
- Functions, procedures, and packages
- Built-in functions (string manipulations, datetime functions, conversions)
- Exception handling and conditions
- Constants and variable, assignment (DECLARE count INT := 1)
- Processing results using a CURSOR
- UDF to run HPL/SQL scripts from Hive queries
(SELECT hplsql('mycustomfunc(:1)', name) FROM users;)
- Bulk data processing with BULK COLLECT statement
HPL/SQL limitations
- Some of the Hive specific CREATE TABLE parameters are missing.
- No colon syntax to parametrize SQL strings.
- No quoted string literals.
- No GOTO and Label.
- EXECUTE does not have output parameters.
- Some complex data types, such as Arrays and Records are not supported.
- No object-oriented extension.
- Data Analytics Studio (DAS) does not support HPL/SQL.