Identifying inefficient phases of your Hive queries

Identify inefficient phases of your Hive queries for query optimization and performance tuning, such as viewing the execution phases, the order in which the operations are executed, comparing two execution plans, and validating the events performed.

Describes how to locate the Cloudera Observability Hive SQL Query Plan and DAG, the Hive Query Plan Graph, and the Counters and Configuration panels for identifying and troubleshooting inefficient operational phases of your Hive queries.

  1. Verify that you are logged in to the Cloudera Observability web UI and that you selected an environment from the Analytics Environments page.
    1. In a supported browser, log into the Cloudera Data Platform (CDP).
      The CDP web interface landing page opens.
    2. From the Your Enterprise Data Cloud landing page, select the Observability tile.
      The Cloudera Observability landing page opens to the main navigation panel.
    3. From the Cloudera Observability Environments page, select the environment required for analysis.

      The Environment navigation panel opens.

  2. Depending on the environment selected, verify that the Cluster Summary page is displayed for the environment's cluster required for analysis.
    To display the Cluster Summary page, do one of the following:
    • For Private Cloud Base and Classic Cluster environment type, verify that the Cluster Summary title is displayed in the browser tab.
    • For Data Hub and Database Catalog environment types, expand its Data Lake category in the Environment's panel and then locate and select the cluster or Virtual Warehouse of interest.

    The Cluster Summary page displays a series of performance trends and metric chart widgets about the processed jobs and queries.

  3. Optional: From the time-range list in the Cluster Summary page, select a time period that meets your requirements.
  4. Locate the Hive query of interest by doing one of the following:
    • In the Cluster Summary page, locate the Hive Query Trend chart widget, click its Total Queries value, and then from the Job column in the Queries page, locate and click the query of interest.
    • If not already expanded, from the Environment navigation panel, expand the ENGINES or Virtual Warehouse category and then select the Hive engine. Locate the Slow Queries chart widget and click the query of interest.
  5. From the .../Hive/Queries/ queryname page, select the Execution Details tab.
    The execution stages appear, displaying the Query Details Summary panel.
  6. To review the query's execution instructions and logical steps, do the following:
    1. In the stages column, verify that the query and not the DAG is selected.
    2. From the Query Details Summary panel, click Text.

    The query's Query Plan panel opens.

    The query plan displays the execution statistics in a list of execution stages that include the execution instructions and steps, such as the operations that are performed, the operators that are used, the resources that are allocated, and the stage dependencies. These stages can help you diagnose and improve a query's performance.

  7. To visually display a graphical representation of the Query Plan's DAG, which contains individual components that represent each event and the order they are executed, do the following:
    1. In the stages column, verify that the query and not the DAG is selected.
    2. From the Query Details Summary panel, click Graphical.

    The query's Query Plan Graph page opens in another tab of your browser.

    The Query Plan Graph displays the order of events and the steps and phases of the query. This page enables you to visually inspect where each operation is executed and if the order is the most efficient. For example, an operator that could be draining your CPU and memory because it is joining tables that contain a large number of records before a filtering operation was performed.

    You can also view an operator's location within the Query Plan and the operator's execution details by doing the following:
    • To display where the operator is located in the Query Plan, hover over an event box.
    • To display, in the right-side panel, the operator's execution details from the Query Plan, click on an event box.
  8. To identify and validate which tasks completed or are taking too long to run, do the following:
    1. In the stages column, click the dag_xxx link. Where, xxx is the DAG ID number.
    2. From the Dag Details Summary panel, click Counters.

    The Counters panel opens.

    This panel lists in detail the events performed and the total number of occurrences, which enable you to track, compare, and validate the events that were run for the query. For example, you can verify that the correct number of tasks were run and completed, that the number of records, rows, and the amount of bytes were read and written, and that the correct amount of CPU and memory was consumed for the query.

  9. To verify that the query's configuration settings align with your expectations, do one or more of the following:
    • To understand the query's execution configuration setting details:
      1. In the stages column, click the query link.
      2. From the Query Details Summary panel, click Configurations.
    • To understand the query's DAG execution configuration setting details:
      1. In the stages column, click the dag_xxx link. Where, xxx is the DAG ID number.
      2. From the Dag Details Summary panel, click Configurations.
  10. To troubleshoot performance-related issues between two different runs of the same query, do the following:
    1. From the query's page, select the Trends tab.
    2. Scroll down and from the table, select the check boxes adjacent to the query's job runs that you require, such as the latest run with a run from a week ago, and then click Compare.
      The Job Comparison page opens displaying more details about each job.
    3. From the Details section, select the Query Plan tab.

    You can view and analyze the selected query plans Side By Side or as a Unified plan that highlights the differences in color, which enables you to quickly identify what changed between the selected execution runs of the query.

    The Job Comparison page not only enables you to compare the query plans but also the following:
    • The Duration, Data Input, and Data Output of the selected job runs from the Performance section.
    • Their run-times by selecting the Structure tab.
    • Configuration differences by selecting the Configurations tab.
    • Statistical differences by selecting the Metrics tab.