Unified Analytics overview

You can take advantage of SQL engine enhancements in the Cloudera Data Platform (CDP) by using Unified Analytics. Unified Analytics provides common SQL semantics across engines, backward compatibility, and query optimizations.

This release of Unified Analytics is the first step in a long-term vision to bring SQL equivalency, without syntax changes, to the Cloudera Data Warehouse (CDW) SQL engines. Unified Analytics is available in the following CDP environments:
  • AWS
  • Azure

Unified Analytics SQL engines support SQL:2016 syntax and semantics, including the following key features:
  • Automatic query rewrites to use materialized views
  • Command-line materialized view recommender
  • DataSketches functions and rewrites
  • Ranger column masking and row filtering
  • Query results cache
  • SQL set operations and grouping sets
  • Atlas integration
  • Extensive subquery support
  • Advanced join reordering with bushy plans generation
  • Integrity constraints-based rewritings
  • User defined functions (UDFs) in Hive
  • Other extensions to query optimization, such as column pruning, sort/limit merge and pushdown

Unified Analytics also brings significant optimization equivalency to the SQL engines by unifying common techniques such as subquery processing, join ordering, and materialized views.
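
As an illustration of the SQL set operations and grouping sets listed above, the following is a minimal sketch rather than a tested example. The table movie_ratings and its columns genre, rating_year, and rating are hypothetical and do not come from this documentation. The syntax follows the form documented for Hive.

```sql
-- Hypothetical table: movie_ratings(genre STRING, rating_year INT, rating DOUBLE)

-- Grouping sets: one query returns the average rating per (genre, year),
-- per genre, and overall.
SELECT genre, rating_year, AVG(rating) AS avg_rating
FROM movie_ratings
GROUP BY genre, rating_year
GROUPING SETS ((genre, rating_year), (genre), ());

-- Set operation: genres that received ratings in both years.
SELECT genre FROM movie_ratings WHERE rating_year = 2015
INTERSECT
SELECT genre FROM movie_ratings WHERE rating_year = 2016;
```

In rows produced by the (genre) and () grouping sets, the columns that are not part of the grouping are returned as NULL.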

Lexical conventions

By default, SQL queries in Unified Analytics use single quotation marks for string literals, as the ANSI SQL standard specifies. If a single quotation mark or special character appears within a string literal, you must escape it. For example:
INSERT INTO MOVIES_INFO VALUES
(1,cast('Toy Story (1995)' as varchar(50)), 'Animation|Children\'s|Comedy'),
...
You can change how identifiers and literals are quoted. In your Virtual Warehouse, click Edit > Configurations > HiveServer2, and add the hive.support.quoted.identifiers property. Set this property to one of the following values:
  • none

    Quoting identifiers and using special characters in identifiers are not allowed, but regular expressions in backticks are supported for column names.

  • column

    Use the backtick character to enclose identifiers that have special characters, for example: `col1`. Use single quotation marks to enclose string literals, for example: 'value'. Double quotation marks are also accepted, but not recommended.

  • standard (default)

    Follows the SQL standard for enclosing identifiers. Use double quotation marks to enclose identifiers that have special characters, for example: "col1". Use single quotation marks to enclose string literals, for example: 'value'.
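
The three settings can be compared side by side in the following sketch. It has not been verified against a running Virtual Warehouse. The table movie_catalog and its columns, including the column named release year, are hypothetical and chosen only to show an identifier that contains a special character. Each statement assumes that the property was already set to the value named in the comment above it.

```sql
-- Hypothetical table: movie_catalog with columns title, genres, and a
-- column whose name contains a space ("release year").

-- hive.support.quoted.identifiers = standard (default):
-- double quotation marks enclose identifiers, single quotation marks
-- enclose string literals.
SELECT "release year", title
FROM movie_catalog
WHERE title = 'Toy Story (1995)';

-- hive.support.quoted.identifiers = column:
-- backticks enclose identifiers, single quotation marks enclose literals.
SELECT `release year`, title
FROM movie_catalog
WHERE title = 'Toy Story (1995)';

-- hive.support.quoted.identifiers = none:
-- backticks hold a regular expression that is matched against column
-- names, so this is expected to select the title and genres columns.
SELECT `(title|genres)`
FROM movie_catalog;
```

With the none setting, a column name such as release year cannot be referenced at all, because identifiers with special characters are not allowed.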

Limitations

  • Unified Analytics does not support LEFT ANTI JOIN or RIGHT ANTI JOIN syntax.
  • Unified Analytics does not support the complex types ARRAY, STRUCT, and MAP.
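
A query that would otherwise use an anti join can usually be expressed with a NOT EXISTS subquery, which is standard SQL and falls under the subquery support listed earlier. The following is a sketch under that assumption. The tables movies and ratings and their columns are hypothetical.

```sql
-- Hypothetical tables: movies(movie_id, title), ratings(movie_id, rating)

-- Unsupported form (shown only for comparison):
--   SELECT m.movie_id, m.title
--   FROM movies m LEFT ANTI JOIN ratings r ON m.movie_id = r.movie_id;

-- Equivalent result with NOT EXISTS: movies that have no ratings.
SELECT m.movie_id, m.title
FROM movies m
WHERE NOT EXISTS (
  SELECT 1
  FROM ratings r
  WHERE r.movie_id = m.movie_id
);
```

NOT EXISTS is generally a safer substitute than NOT IN here, because NOT IN returns no rows when the subquery produces a NULL value.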