Unified Analytics overview

You can take advantage of SQL engine enhancements in the Cloudera Data Platform (CDP) by using Unified Analytics. Unified Analytics includes semantics commonality, backward compatibility, and optimizations.

This release of Unified Analytics is the first product of a long-term vision that brings SQL equivalency without syntax changes to CDW SQL engines. Unified Analytics is available in the following CDP environments:

Red Hat OpenShift
Embedded Container Service (ECS)

Unified Analytics SQL engines support SQL:2016 syntax and semantics, including the following key operations:

Automatic query rewrites to use materialized views
Command-line materialized view recommender
DataSketches functions and rewrites
Ranger column masking and row filtering
Query results cache
SQL set operations and grouping sets
Atlas integration
Extensive subquery support
Advanced join reordering with bushy plans generation
Integrity constraints-based rewritings
User defined functions (UDFs) in Hive
Other extensions to query optimization, such as column pruning, sort/limit merge and pushdown

Unified Analytics also brings significant optimization equivalency to the SQL engines, unifying common techniques such as subquery processing, join ordering and materialized views.

Lexical conventions🔗

The default behavior for SQL queries in Unified Analytics is to use single quotation marks for the literals. The ANSI SQL standard is to use single quotation marks for string literals. If a single quotation mark or special character appears within that string literal, it needs to be escaped. For example:

INSERT INTO MOVIES_INFO VALUES
(1,cast('Toy Story (1995)' as varchar(50)), 'Animation|Children\'s|Comedy'),
...

You can change the SQL behavior. In your Virtual Warehouse, click Edit > Configurations > HiveServer2, and add the hive.support.quoted.identifiers property. Set this property to one of the following values:

none
Quotation of identifiers and special characters in identifiers are not allowed, but regular expressions in backticks are supported for column names.
column
Use the backtick character to enclose identifiers having special characters. `col1`. Use single quotation marks to enclose string literals, for example: 'value'. Double quotation marks are also accepted, but not recommended.
standard (default)
SQL standard way to enclose identifiers. Use double quotation marks to enclose identifiers having special characters "col1" and single quotation marks for string literals, for example 'value'.

Limitations🔗

Unified Analytics does not support left/right ANTI JOIN syntax.
Unified Analytics does not support complex types - ARRAY, STRUCT, MAP.