Benefits of Using Navigator Optimizer
Cloudera Navigator Optimizer provides a web UI that you can use to profile and analyze the SQL text in large, complex SQL workloads. This analysis helps you gain in-depth understanding of your workloads, identify queries best-suited for Hadoop, and optimize Hive or Impala workloads. It also shows you which workloads can be offloaded without redesign and with minimal risk.
How Does Navigator Optimizer Deliver Workload Insights?
Navigator Optimizer primarily searches workloads for:
|Duplication||Many workloads contain duplicate queries and sets of queries. Navigator Optimizer uses advanced semantic analysis and algorithms to determine the core set of unique queries in workloads. With a significantly smaller set of queries to analyze, you can more easily focus on query optimization opportunities for offloading to Hadoop.|
|Complexity||Categorizing queries by analyzing their complexity and determining frequency patterns across the workload enables you to discover optimization opportunities across the workload and on a “per-query” basis. To determine complexity, Navigator Optimizer analyzes SQL constructs in the workloads, such as joins, subqueries, aggregation functions, inline views, CASE expressions, UNIONs, and so on.|
|Compatibility||Checking the syntax compatibility between the SQL query from an EDW system against the supported SQL syntax for a SQL-on-Hadoop platform enables you to find and resolve query syntax differences so you can offload targeted workloads to Hadoop.|
How Can You Use Navigator Optimizer?
Navigator Optimizer can be used by Business Intelligence architects, database administrators, and data analysts to:
- Assess their SQL workloads, providing a dashboard or a deep dive UI where they can:
- Glean business logic from the workload queries.
- View the most frequently used tables, queries, joins, and column patterns in the workload.
- Determine where complexities lie in the workload.
- Analyze optimization opportunities, enabling them to explore table and query access patterns such as:
- Join patterns and join distributions.
- Query access patterns. For example, subqueries.
- Column usage information, such as filter columns.
- Identify tables and queries for optimization or offloading, based on risk and by using metrics, such as:
- Number of times similar queries are repeated in a workload.
- Average elapsed time for queries to execute.
- Group queries into custom categories for analysis.
- View risk alerts and recommendations based on Hive and Impala best practices.
- Get design recommendations for Hadoop, such as:
- Data modeling and schema design recommendations.
- Partitioning key design recommendations for Impala and Hive.
- Query design recommendations. For example, when using a materialized view may be preferable to an inline view.