Locations of Impala Log Files in Azure

This topic describes how to identify the Azure locations of Impala logs for the different Impala components.

The Cloudera Data Warehouse service collects logs from Impala Virtual Warehouses and uploads them to the Azure storage account that was provided while registering the Environment. This ABFS log location is configured under an external warehouse directory so that the logs are preserved even if the Virtual Warehouse they are collected from is destroyed.

To identify the location of the logs in Azure, you must have the environment_ID, database_catalog_ID, and impala_ID identifiers. To access the logs from the Azure Portal, you must also know your storage account name.

Finding the environment_ID, database_catalog_ID, and impala_ID identifiers
  1. In the Data Warehouse service, expand the Environments column by clicking More….
  2. From the Overview page, note down the environment_ID, database_catalog_ID, and impala_ID identifiers.
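    If you prefer the command line, you can also look up these identifiers with the CDP CLI dw commands. This is a sketch under the assumption that the commands below are available in your CDP CLI version and that the identifiers in the log path correspond to the cluster, Database Catalog, and Virtual Warehouse IDs returned in the output.

    # List Data Warehouse clusters (the cluster ID is expected to correspond to environment_ID)
    cdp dw list-clusters

    # List Database Catalogs and Impala Virtual Warehouses in that cluster
    # (their IDs are expected to correspond to database_catalog_ID and impala_ID)
    cdp dw list-dbcs --cluster-id <environment_ID>
    cdp dw list-vws --cluster-id <environment_ID>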
Retrieving your storage account name
  1. In the Management Console, navigate to the Environments page.
  2. On the Environments page, click on your Environment and click on the Summary tab.
  3. Scroll down to the Logs Storage and Audits section.
    Note down your storage account name.
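    If you have the CDP CLI configured, you can also cross-check the logs storage location for your Environment from the command line. This is a minimal sketch; <environment_name> is a placeholder, and the exact JSON field names in the output can vary by release.

    # Show the environment details and filter for storage-related fields,
    # including the logs storage location
    cdp environments describe-environment \
        --environment-name <environment_name> | grep -i -A 2 storage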
Accessing the different directories in the Azure Portal
  1. Log in to the Azure Portal and search for your storage account name using the Search bar.
  2. On the Overview page of your storage account, click on the Containers menu.
  3. Click on the file system you used during the Environment registration.
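    As an alternative to browsing in the Portal, you can list the file systems (containers) in the storage account with the Azure CLI. This is a minimal sketch; <storage_account_name> is a placeholder, and it assumes your Azure login has data-plane access to the account.

    # List the ADLS Gen2 file systems (containers) in the storage account
    az storage fs list \
        --account-name <storage_account_name> \
        --auth-mode login \
        --output table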
Log locations in ABFS
  1. Use the environment_ID, database_catalog_ID, and impala_ID identifiers in the following prefix, and append the component-specific directory from the table below to view the Impala and Hue logs.
    PREFIX =
    /clusters/<environment_ID>/<database_catalog_ID>/warehouse/tablespace/external/hive/sys.db/logs/dt=<date_stamp>/ns=<impala_ID>/
    Impala component   ABFS directory location
    impalad            PREFIX + "app=impala-executor-log"
    catalogd           PREFIX + "app=catalogd-log"
    coordinator        PREFIX + "app=coordinator-log"
    auto-scaler        PREFIX + "app=impala-autoscaler-log"
    Hue                PREFIX + "app=huebackend-log"
                       PREFIX + "app=hue-huedb-create-job-log"
                       PREFIX + "app=huefrontend-log"
    statestored        PREFIX + "app=statestored-log"

    For example, the impalad logs for 8 March 2020 are located in the following ABFS location:

    /clusters/<environment_ID>/<database_catalog_ID>/warehouse/tablespace/external/hive/sys.db/logs/dt=2020-03-08/ns=<impala_ID>/app=impala-executor-log/

    In this location, you can find the multiple log files that were generated on the specified day; a sketch for listing them from the command line follows.
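    To list these log files without the Portal, you can query the path with the Azure CLI. This is a minimal sketch; <storage_account_name>, <file_system>, and the identifier placeholders are assumptions that you must replace with your own values.

    # List the impalad log files for 8 March 2020 under the external warehouse directory
    az storage fs file list \
        --account-name <storage_account_name> \
        --file-system <file_system> \
        --path "clusters/<environment_ID>/<database_catalog_ID>/warehouse/tablespace/external/hive/sys.db/logs/dt=2020-03-08/ns=<impala_ID>/app=impala-executor-log/" \
        --auth-mode login \
        --output table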

Impala Minidumps
  1. Impala minidumps can be found under the debug-artifacts/impala directory:
    /clusters/<environment_ID>/<database_catalog_ID>/warehouse/debug-artifacts/impala/<impala_ID>/minidump/<pod_name>/
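    To pull the minidumps for a specific pod to your local machine for analysis, you can download the whole directory with the Azure CLI. This is a minimal sketch; the storage account, file system, and pod name are placeholders, and it assumes you are authenticated to the account (for example, with an account key or SAS token).

    # Recursively download the minidump directory for one pod into ./minidumps
    az storage fs directory download \
        --account-name <storage_account_name> \
        --file-system <file_system> \
        --source-path "clusters/<environment_ID>/<database_catalog_ID>/warehouse/debug-artifacts/impala/<impala_ID>/minidump/<pod_name>" \
        --destination-path ./minidumps \
        --recursive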
Impala Query Profiles
  1. Impala query profiles are written in Thrift-encoded format in the following location:
    Impala component        ABFS directory location
    Impala query profiles   PREFIX + "app=impala-profiles"
    Use the binary decoding tool to convert the Thrift-encoded profiles to text. This tool is provided as a Docker image with the upstream runtime Impala 4.0. Run the following command to use it:
    docker run -i apache/impala:4.0.0-impala_profile_tool < <thrift_encoded_file_to_decode>
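    Putting this together, one way to decode a profile is to download the Thrift-encoded file from ABFS and pipe it through the Docker image. This is a minimal sketch; the storage account, file system, date stamp, and profile file name are placeholders.

    # Download one Thrift-encoded profile file from the profiles directory
    az storage fs file download \
        --account-name <storage_account_name> \
        --file-system <file_system> \
        --path "clusters/<environment_ID>/<database_catalog_ID>/warehouse/tablespace/external/hive/sys.db/logs/dt=<date_stamp>/ns=<impala_ID>/app=impala-profiles/<profile_file>" \
        --destination ./profile.thrift \
        --auth-mode login

    # Decode the downloaded profile to readable text
    docker run -i apache/impala:4.0.0-impala_profile_tool < ./profile.thrift > profile.txt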