Locations of Impala Log Files in S3

This topic describes how to identify the Amazon S3 locations of Impala logs for the different Impala components.

The Cloudera Data Warehouse service collects logs from Impala Virtual Warehouses and uploads them to an Amazon S3 location. This S3 log location is configured under an external warehouse directory so that the logs are preserved even if the Virtual Warehouse they are collected from is destroyed.

To identify the location of the logs in S3, you need the environment_ID, database_catalog_ID, and impala_ID identifiers, as well as the S3 bucket name.

Finding the environment_ID, database_catalog_ID, and impala_ID identifiers
  1. In the Data Warehouse service, expand the Environments column by clicking More….
  2. From the Overview page, note down the environment_ID, database_catalog_ID, and impala_ID identifiers.
Identifying the external bucket name
  1. On the Overview page, locate the environment for which you want to find the external bucket name.
  2. In the Environment tile, click the Options menu and select Edit.
  3. A dialog opens that shows the general details of the environment including the CDW External Bucket name.
    This name is required to identify the S3 location of the logs.
Log locations in S3
  1. Now that you have the S3 bucket name and the environment_ID, database_catalog_ID, and impala_ID identifiers, use the following prefix to locate the logs generated by specific components. Append the directory listed for each component to the prefix to view the Impala and Hue logs.
    PREFIX = 
    s3://<s3_bucket_name>/clusters/<environment_ID>/<database_catalog_ID>/warehouse/tablespace/external/hive/sys.db/logs/dt=<date_stamp>/ns=<impala_ID>
    Impala component            S3 directory location
    impalad                     PREFIX + "/app=impala-executor-log"
    catalogd                    PREFIX + "/app=catalogd-log"
    coordinator                 PREFIX + "/app=coordinator-log"
    auto-scaler                 PREFIX + "/app=impala-autoscaler-log"
    Hue                         PREFIX + "/app=huebackend-log"
                                PREFIX + "/app=hue-huedb-create-job-log"
                                PREFIX + "/app=huefrontend-log"
    statestored                 PREFIX + "/app=statestored-log"
    hs2 (applies only to UA)    PREFIX + "/app=hiveserver2"

    For example, the impalad logs for 8 March 2020 are located in the following S3 location:

    s3://<s3_bucket_name>/clusters/<environment_ID>/<database_catalog_ID>/warehouse/tablespace/external/hive/sys.db/logs/dt=2020-03-08/ns=<impala_ID>/app=impala-executor-log/

    In the above location, you can find multiple logs that were generated on the specified day.
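
    The directory layout described above can be assembled with a small shell helper. The following is a minimal sketch and not part of the product: the function names impala_log_prefix and impala_log_dir are hypothetical, all argument values are placeholders, and the commented-out listing step assumes that the AWS CLI is installed and has read access to the bucket.

```shell
# Hypothetical helpers (not provided by Cloudera Data Warehouse) that
# assemble the S3 log locations described above.

# Build PREFIX from the bucket name, the identifiers, and a date stamp
# in YYYY-MM-DD form (as in the dt=2020-03-08 example).
impala_log_prefix() {
  # $1=S3 bucket name  $2=environment_ID  $3=database_catalog_ID
  # $4=date stamp      $5=impala_ID
  printf 's3://%s/clusters/%s/%s/warehouse/tablespace/external/hive/sys.db/logs/dt=%s/ns=%s\n' \
    "$1" "$2" "$3" "$4" "$5"
}

# Append a component directory (for example impala-executor-log) to PREFIX.
impala_log_dir() {
  # $1=PREFIX  $2=value of app= for the component
  printf '%s/app=%s/\n' "$1" "$2"
}

# Example usage with placeholder values; the listing assumes the AWS CLI:
# prefix=$(impala_log_prefix my-bucket my-env-id my-dbc-id 2020-03-08 my-impala-id)
# aws s3 ls "$(impala_log_dir "$prefix" impala-executor-log)"
```

    Keeping the prefix and the component directory in separate functions means that the same prefix can be reused for every component on a given day.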

Impala Minidumps
  1. Impala minidumps are stored under the debug-artifacts/impala directory:
    /clusters/{{environment_ID}}/{{database_catalog_ID}}/warehouse/debug-artifacts/impala/{{impala_ID}}/minidump/$POD_NAME/$file
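
    As a rough sketch, the minidump directory for one Virtual Warehouse can be built the same way. The function name impala_minidump_dir is hypothetical, and the commented-out listing assumes that the path lives in the same CDW external bucket as the logs and that the AWS CLI is available; neither assumption is stated in this topic.

```shell
# Hypothetical helper: build the minidump directory path shown above,
# stopping before the per-pod ($POD_NAME) and per-file ($file) components.
impala_minidump_dir() {
  # $1=environment_ID  $2=database_catalog_ID  $3=impala_ID
  printf '/clusters/%s/%s/warehouse/debug-artifacts/impala/%s/minidump/\n' \
    "$1" "$2" "$3"
}

# Example usage with placeholder values. Prepending the bucket name is an
# assumption, as is the availability of the AWS CLI:
# aws s3 ls --recursive "s3://my-bucket$(impala_minidump_dir my-env-id my-dbc-id my-impala-id)"
```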
Impala Query Profiles
  1. Impala query profiles are written in Thrift-encoded format to this location:
    Impala component          S3 directory location
    Impala query profiles     PREFIX + "/app=impala-profiles"
    Use the profile decoding tool to convert the Thrift-encoded profiles to text. This binary tool is provided as a Docker image with the upstream Impala 4.0 runtime. Run the following command to use the tool:
    docker run -i apache/impala:4.0.0-impala_profile_tool < <name_of_thrift_encoded_file>
    The tool reads the Thrift-encoded file on standard input, so the file must be available locally when you run the apache/impala:4.0.0-impala_profile_tool image.
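
    The decoding command can be wrapped in a small function so that a missing file is reported before Docker starts. This is only a sketch: the function name decode_impala_profile and the file names are hypothetical, and it assumes that the profile file has already been copied from S3 to the local machine and that Docker is installed.

```shell
# Hypothetical wrapper around the docker command shown above.
decode_impala_profile() {
  # $1=path to a local Thrift-encoded profile file; decoded text goes to stdout
  if [ ! -r "$1" ]; then
    echo "cannot read profile file: $1" >&2
    return 1
  fi
  # -i keeps standard input open so the tool can read the redirected file.
  docker run -i apache/impala:4.0.0-impala_profile_tool < "$1"
}

# Example usage (file names are placeholders):
# decode_impala_profile ./impala_profile.thrift > ./impala_profile.txt
```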