This topic describes how to identify the Amazon S3 locations of Impala logs for the
different Impala components.
The Cloudera Data Warehouse service collects logs from Impala Virtual Warehouses and
uploads them to an Amazon S3 location. This S3 log location is configured under an
external warehouse directory so that the logs are preserved even if the Virtual
Warehouse they are collected from is destroyed.
To identify the location of the logs in S3, you need the environment_ID,
database_catalog_ID, and impala_ID identifiers, as well as the S3 bucket name.
Finding the environment_ID, database_catalog_ID, and impala_ID identifiers
- In the Data Warehouse service, expand the Environments column by clicking More….
- From the Overview page, note down the environment_ID, database_catalog_ID, and impala_ID identifiers.
Identifying the external bucket name
- On the Overview page, locate the environment for which you want to find the external bucket name.
- In the Environment tile, click the Options menu and select Edit.
- A dialog opens that shows the general details of the environment, including the CDW External Bucket name. This name is required to identify the S3 location of the logs.
Log locations in S3
Now that you have the S3 bucket name and the environment_ID,
database_catalog_ID, and impala_ID identifiers, use the following prefix to
locate the logs generated by each component. The directories listed below hold
the Impala and Hue logs.
PREFIX = s3://<s3_bucket_name>/clusters/<environment_ID>/<database_catalog_ID>/warehouse/tablespace/external/hive/sys.db/logs/dt=<date_stamp>/ns=<impala_ID>
Impala component | S3 directory location
impalad | PREFIX + "/app=impala-executor-log"
catalogd | PREFIX + "/app=catalogd-log"
coordinator | PREFIX + "/app=coordinator-log"
auto-scaler | PREFIX + "/app=impala-autoscaler-log"
Hue | PREFIX + "/app=huebackend-log"; PREFIX + "/app=hue-huedb-create-job-log"; PREFIX + "/app=huefrontend-log"
statestored | PREFIX + "/app=statestored-log"
hs2 (applies only to UA) | PREFIX + "/app=hiveserver2"
For example, the impalad logs for 8 March 2020 are located in the following S3
location:
s3://<s3_bucket_name>/clusters/<environment_ID>/<database_catalog_ID>/warehouse/tablespace/external/hive/sys.db/logs/dt=2020-03-08/ns=<impala_ID>/app=impala-executor-log/
In the above location, you can find multiple logs that were generated on the
specified day.
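The prefix rule and the component directories above can be sketched as a small Python helper. This is only an illustration: the function names and the sample identifier values are hypothetical, and the directory names are copied from the table above. It builds the S3 location strings and does not contact S3.

```python
from datetime import date

# App directory names copied from the component table above.
APP_DIRS = {
    "impalad": ["app=impala-executor-log"],
    "catalogd": ["app=catalogd-log"],
    "coordinator": ["app=coordinator-log"],
    "auto-scaler": ["app=impala-autoscaler-log"],
    "hue": [
        "app=huebackend-log",
        "app=hue-huedb-create-job-log",
        "app=huefrontend-log",
    ],
    "statestored": ["app=statestored-log"],
    "hs2": ["app=hiveserver2"],  # applies only to UA
}


def log_prefix(bucket, environment_id, database_catalog_id, impala_id, day):
    """Return PREFIX for one Virtual Warehouse and one day (no trailing slash)."""
    return (
        f"s3://{bucket}/clusters/{environment_id}/{database_catalog_id}"
        "/warehouse/tablespace/external/hive/sys.db/logs"
        f"/dt={day.isoformat()}/ns={impala_id}"
    )


def log_locations(component, bucket, environment_id, database_catalog_id,
                  impala_id, day):
    """Return every S3 directory that holds logs for the given component."""
    prefix = log_prefix(bucket, environment_id, database_catalog_id,
                        impala_id, day)
    return [f"{prefix}/{app}/" for app in APP_DIRS[component]]


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    for location in log_locations("hue", "my-cdw-bucket", "env-abc123",
                                  "warehouse-xyz", "impala-789",
                                  date(2020, 3, 8)):
        print(location)
```

The resulting strings can be passed to whichever S3 client you use to list or download the log files.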
Impala Minidumps
Impala minidumps are stored under the debug-artifacts/impala directory:
/clusters/{{environment_ID}}/{{database_catalog_ID}}/warehouse/debug-artifacts/impala/{{impala_ID}}/minidump/$POD_NAME/$file
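As a rough sketch, the minidump path can be assembled the same way. The helper name below is hypothetical, and the function returns only the path shown above; it does not assume which bucket the path lives in.

```python
def minidump_prefix(environment_id, database_catalog_id, impala_id,
                    pod_name=None):
    """Return the minidump directory path, optionally narrowed to one pod."""
    path = (
        f"/clusters/{environment_id}/{database_catalog_id}"
        f"/warehouse/debug-artifacts/impala/{impala_id}/minidump/"
    )
    if pod_name is not None:
        # Minidump files sit one level below the pod name.
        path += f"{pod_name}/"
    return path
```

Calling it without a pod name gives the directory that contains the minidumps of all pods in the Virtual Warehouse.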
Impala Query Profiles
Impala query profiles are written in Thrift-encoded format to the following location:
Impala component | S3 directory location
Impala query profiles | PREFIX + "/app=impala-profiles"
Use the binary tool to decode the Thrift-encoded profile to text. This tool is
provided with the upstream Impala 4.0 runtime as a docker image. Run the
following command to use it:
docker run -i apache/impala:4.0.0-impala_profile_tool < <thrift_encoded_profile_file>
The apache/impala:4.0.0-impala_profile_tool docker image used in this command
provides the decoding tool.
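If you need to decode several downloaded profile files, the docker command can be wrapped in a script. The following is a minimal sketch, assuming docker is installed locally and the encoded file has already been downloaded from S3. The function names and the choice to write the decoded text to a file are illustrative additions, not part of the tool.

```python
import subprocess

# Image name taken from the docker command above.
PROFILE_TOOL_IMAGE = "apache/impala:4.0.0-impala_profile_tool"


def decode_command():
    """Return the docker invocation as an argument list."""
    return ["docker", "run", "-i", PROFILE_TOOL_IMAGE]


def decode_profile(encoded_path, decoded_path):
    """Feed the encoded file to the tool on stdin and save its text output."""
    with open(encoded_path, "rb") as src, open(decoded_path, "wb") as dst:
        # Equivalent to: docker run -i <image> < encoded_path > decoded_path
        subprocess.run(decode_command(), stdin=src, stdout=dst, check=True)
```

With check=True, a failed docker run raises subprocess.CalledProcessError instead of silently leaving an empty output file.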