Unified Analytics architecture
The Unified Analytics architecture supports Cloudera's long-term vision of making the Impala and Hive engines equivalent in SQL functionality, with no syntax changes required. This topic describes the part that each major Unified Analytics component in Cloudera Data Warehouse (CDW) plays in improving your analytics experience.
The architecture brings significant optimization equivalency to both SQL engines. Each engine performs optimizations in its own way within the query processor, but the architecture unifies common techniques such as subquery processing, join ordering, and materialized views. The architecture is compatible with the HiveServer (HS2) protocol.
The following diagram shows the Unified Analytics architecture.
- Clients
- Connect to the Unified Analytics endpoint parser and validator.
- Ranger
- Secures data and supports both column masking and row filtering in both engines. Without Unified Analytics, Impala supports column masking only.
- Metastore, metadata catalog and caching service
- Shared by both engines.
- Calcite planner
- Receives SQL queries from the parser/validator, and generates a plan for Hive or Impala.
- Optimized Calcite plan
- Generated for Hive or Impala.
- Tez compiler
- Hands off Tez tasks that run on LLAP daemons to the Tez Application Manager.
- Impala session
- Connects to the coordinator and runs the query on executors. The Web UI shows runtime query profiles for diagnosing problems.
- Tez AM and LLAP daemons
- The Tez Application Manager (AM) runs the Hive job in LLAP mode, obtaining free threads from LLAP daemons to quickly execute fragments of the plan.
- Impala plan
- Generated if your client connects to the Impala Virtual Warehouse. The Impala-specific plan draws on the capabilities and strengths of Impala, such as its different join algorithms and distribution types (broadcast and partitioning, for example).
- Function validation
- Type system implementation
- Expression simplification
- Static partition pruning
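The flow through the components listed above can be sketched as a toy model. This is only an illustration under stated assumptions: every name here (`parse_and_validate`, `plan_with_calcite`, `compile_for_engine`, `submit`, and the two data classes) is hypothetical and is not part of any Cloudera, Hive, Impala, or Calcite API. The sketch shows one idea from the text: both engines share the same front end and the same optimized logical plan, and differ only in the engine-specific plan and runtime.

```python
from dataclasses import dataclass
from typing import Tuple

# Toy model only. All names are hypothetical and chosen for illustration.

# Optimization techniques that the text says the architecture unifies.
SHARED_OPTIMIZATIONS: Tuple[str, ...] = (
    "subquery processing",
    "join ordering",
    "materialized views",
)

# Runtime that executes the plan for each engine, as described in the list.
RUNTIMES = {
    "hive": "Tez AM and LLAP daemons",
    "impala": "Impala coordinator and executors",
}


@dataclass(frozen=True)
class LogicalPlan:
    """Stand-in for the optimized Calcite plan shared by both engines."""
    sql: str
    optimizations: Tuple[str, ...]


@dataclass(frozen=True)
class EnginePlan:
    """Stand-in for the engine-specific plan (Tez tasks or an Impala plan)."""
    engine: str
    runtime: str
    logical: LogicalPlan


def parse_and_validate(sql: str) -> str:
    """Stand-in for the shared parser/validator: rejects empty input."""
    text = sql.strip()
    if not text:
        raise ValueError("empty query")
    return text


def plan_with_calcite(sql: str) -> LogicalPlan:
    """Stand-in for the shared planner: same logical plan for either engine."""
    return LogicalPlan(sql=sql, optimizations=SHARED_OPTIMIZATIONS)


def compile_for_engine(plan: LogicalPlan, engine: str) -> EnginePlan:
    """Attach the engine-specific runtime to the shared logical plan."""
    if engine not in RUNTIMES:
        raise ValueError(f"unknown engine: {engine}")
    return EnginePlan(engine=engine, runtime=RUNTIMES[engine], logical=plan)


def submit(sql: str, engine: str) -> EnginePlan:
    """Run a query through the shared front end, then the chosen engine."""
    return compile_for_engine(plan_with_calcite(parse_and_validate(sql)), engine)


hive_plan = submit("SELECT 1", "hive")
impala_plan = submit("SELECT 1", "impala")
```

In this model, `hive_plan.logical` and `impala_plan.logical` are equal while the `runtime` fields differ, which mirrors the split between the shared planner and the engine-specific back ends.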
Hive and Impala both support partition pruning and predicate push-down. Unified Analytics seamlessly handles partition pruning semantics, which can differ between the engines because pruning predicates can contain expressions. Support for join reordering, materialized view rewriting, and other capabilities formerly available only in Hive is now available in Impala. The query results cache supported in Hive 3 is also now available to Impala.
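To make the point about expressions in pruning predicates concrete, here is a minimal sketch of static partition pruning. It assumes a hypothetical table partitioned by `year` and `month`, and it models a predicate as a Python callable evaluated against partition metadata only. The function name `prune_partitions` and the data layout are illustrative assumptions, not how either engine implements pruning.

```python
from typing import Callable, Dict, List

# A partition is described only by its partition-column values (metadata).
Partition = Dict[str, int]


def prune_partitions(
    partitions: List[Partition],
    predicate: Callable[[Partition], bool],
) -> List[Partition]:
    """Keep only partitions whose column values satisfy the predicate.

    The predicate is evaluated at planning time against partition metadata,
    so no data files in the discarded partitions are ever read.
    """
    return [p for p in partitions if predicate(p)]


# Hypothetical table with 24 partitions: two years, twelve months each.
partitions = [
    {"year": year, "month": month}
    for year in (2022, 2023)
    for month in range(1, 13)
]

# Models: WHERE year = 2023 AND month + 1 > 11
# The predicate contains an expression (month + 1), not just a column
# compared with a literal, so the planner must evaluate the expression
# per partition to decide what to keep.
kept = prune_partitions(
    partitions,
    lambda p: p["year"] == 2023 and p["month"] + 1 > 11,
)
```

Here `kept` holds only the partitions for November and December 2023. How an expression such as `month + 1` is evaluated (types, overflow, NULL handling) is the kind of detail where engine semantics can diverge, which is why a shared pruning step matters.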