This page lists the known issues and technical limitations for Impala in Cloudera Runtime 7.3.2, its service packs, and cumulative hotfixes.
Known issues identified in Cloudera Runtime 7.3.2
CDPD-90807: Thrift protocol limitation during Impala zero downtime upgrade (ZDU)
7.3.1.500 through 7.3.1.706, 7.3.2
Zero Downtime Upgrades (ZDU) for Impala are not supported
when upgrading to version 7.3.2. This is due to Thrift protocol incompatibilities that
can cause queries to fail during upgrade.
None
CDPD-90250: Incorrect file storage location for Impala tables when using S3 as default filesystem
7.3.2
When you create an external table in Impala by using the LOCATION parameter, the files are stored in S3 instead of HDFS. This occurs even if the provided path is intended for HDFS, provided that S3 is configured as the default filesystem (fs.defaultFS) in the cluster. This inconsistency leads to AccessDeniedException errors if the Impala service does not have the necessary Ranger permissions to write to the S3 bucket.
None
IMPALA-14472: Writing arrays to Kudu tables
7.3.2
Impala does not currently support writing array data into
Kudu tables.
Known issues identified before Cloudera Runtime 7.3.2
DWX-20490: Impala queries fail with "Caught exception The read
operation timed out, type=<class 'socket.timeout'> in ExecuteStatement"
7.3.1.500
Queries in impala-shell fail with a socket timeout error in ExecuteStatement, which
submits the query to the coordinator. The error occurs when query execution is slow to
start, typically because query planning is slow due to frequent metadata changes.
Increase the socket timeout on the client side. Set
--client_connect_timeout_ms to a higher value, e.g. add
--client_connect_timeout_ms=600000 to the impala-shell command
line.
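For example, the flag can be added directly to the shell invocation (the host name below is a placeholder for your coordinator):

```
impala-shell -i coordinator.example.com --client_connect_timeout_ms=600000
```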
DWX-20491: Impala queries fail with EOFException: End of
file reached before reading fully
7.3.1.500
Impala queries fail with an EOFException when
reading a file stored in an S3A location after the file has been removed. If the file
was removed using SQL commands like DROP PARTITION, there may be a
significant lag in Hive Metastore event processing before Impala picks up the change.
If the file was removed by non-SQL operations, Impala does not detect the removal
automatically.
Run REFRESH <table>; or INVALIDATE METADATA
<table>; on the affected table.
CDPD-94720: Impala startup failure due to invalid TLS v1.3 ciphers
7.3.1 and its higher versions
When running Impala on a machine with OpenSSL 1.1.1, providing an invalid ciphersuite, or a TLS v1.2 ciphersuite, in the --tls_ciphersuites startup flag causes the process to fail during startup. While OpenSSL 3.x ignores invalid ciphers, OpenSSL 1.1.1 returns an error if any ciphersuite in the list is invalid, even if other valid TLS v1.3 ciphers are present.
Ensure that the list in the
--tls_ciphersuites startup flag contains only valid TLS v1.3
ciphersuites and does not contain any TLS v1.2 ciphersuites.
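For example, the flag could list only the three standard TLS v1.3 suites (the value shown is illustrative; adjust it to your security policy):

```
--tls_ciphersuites=TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256:TLS_AES_128_GCM_SHA256
```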
IMPALA-532: Impala should tolerate bad locale settings
7.3.1 and its higher versions
If the LC_* environment variables specify an
unsupported locale, Impala does not start.
Add LC_ALL="C" to the environment
settings for both the Impala daemon and the Statestore daemon.
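A minimal sketch of the setting; apply the same variable to both daemon environments (for example through their environment safety valves):

```shell
# Force the C locale so an unsupported LC_* value cannot block startup.
export LC_ALL="C"
echo "LC_ALL=$LC_ALL"
```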
IMPALA-691: Process mem limit does not account for the JVM's
memory usage
7.3.1 and its higher versions
Some memory allocated by the JVM used internally by Impala is
not counted against the memory limit for the impalad daemon.
To monitor overall memory usage, use the top command, or
add the memory figures on the Impala web UI /memz tab to the JVM memory usage shown
on the /metrics tab.
IMPALA-635: Avro Scanner fails to parse some schemas
7.3.1 and its higher versions
The default value in an Avro schema must match the type of the first
union branch, e.g. if the default value is null, then the first type in the
UNION must be "null".
Swap the order of the fields in the schema specification.
For example, use ["null", "string"] instead of ["string",
"null"]. Note that the files written with the problematic schema must be
rewritten with the new schema because Avro files have embedded schemas.
IMPALA-1024: Impala BE cannot parse Avro schema that contains a
trailing semi-colon
7.3.1 and its higher versions
If an Avro table has a schema definition with a trailing
semicolon, Impala encounters an error when the table is queried.
Remove the trailing semicolon from the Avro schema.
IMPALA-1652: Incorrect results with basic predicate on CHAR
typed column
7.3.1 and its higher versions
When comparing a CHAR column value to a string
literal, the literal value is not blank-padded and so the comparison might fail when it
should match.
Use the RPAD() function to blank-pad
literals compared with CHAR columns to the expected length.
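The padding behavior can be sketched as follows (the table and column names in the SQL comment are hypothetical; the CHAR length of 10 is illustrative):

```shell
# In Impala SQL, pad the literal to the declared CHAR length, e.g.:
#   SELECT * FROM t WHERE c10 = rpad('hello', 10, ' ');
# The same blank-padding, shown locally with printf:
printf '[%s]\n' "$(printf '%-10s' 'hello')"   # prints [hello     ]
```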
IMPALA-1821: Casting scenarios with invalid/inconsistent
results
7.3.1 and its higher versions
Using a CAST() function to convert large
literal values to smaller types, or to convert special values such as
NaN or Inf, produces values not consistent with
other database systems. This could lead to unexpected results from queries.
None
IMPALA-2005: A failed CTAS does not drop the table if the insert
fails
7.3.1 and its higher versions
If a CREATE TABLE AS SELECT operation
successfully creates the target table but an error occurs while querying the source
table or copying the data, the new table is left behind rather than being dropped.
Drop the new table manually after a failed CREATE
TABLE AS SELECT statement.
IMPALA-3509: Breakpad minidumps can be very large when the
thread count is high
7.3.1 and its higher versions
The size of the breakpad minidump files grows linearly with the
number of threads. By default, each thread adds 8 KB to the minidump size. Minidump
files could consume significant disk space when the daemons have a high number of
threads.
Add --minidump_size_limit_hint_kb=size to set a soft upper
limit on the size of each minidump file. If the minidump file would exceed that limit,
Impala reduces the amount of information for each thread from 8 KB to 2 KB. (Full thread
information is captured for the first 20 threads, then 2 KB per thread after that.) The
minidump file can still grow larger than the "hinted" size. For example, if you have
10,000 threads, the minidump file can be more than 20 MB.
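For example (the 20480 KB value is illustrative only):

```
--minidump_size_limit_hint_kb=20480
```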
IMPALA-4978: Impala requires FQDN from hostname command on
Kerberized clusters
7.3.1 and its higher versions
The method Impala uses to retrieve the host name while
constructing the Kerberos principal is the gethostname() system call.
This function might not always return the fully qualified domain name, depending on the
network configuration. If the daemons cannot determine the FQDN, Impala does not start
on a Kerberized cluster.
Test if a host is affected by checking whether the output
of the hostname command includes the FQDN. On hosts where
hostname only returns the short name, pass the command-line flag
--hostname=fully_qualified_domain_name in the
startup options of all Impala-related daemons.
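The check can be sketched as follows (a short name without a domain component indicates an affected host):

```shell
# Affected hosts report a short name without a domain component.
short=$(hostname)
case "$short" in
  *.*) echo "hostname already returns an FQDN: $short" ;;
  *)   echo "short name only: $short; pass --hostname=<fqdn> to the daemons" ;;
esac
```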
IMPALA-6671: Metadata operations block read-only operations on
unrelated tables
7.3.1 and its higher versions
Metadata operations that change the state of a table, like
COMPUTE STATS or ALTER TABLE ... RECOVER PARTITIONS, may delay
the metadata propagation for unrelated, not-yet-loaded tables that is triggered by
statements like DESCRIBE or SELECT queries.
None
IMPALA-7072: Impala does not support Heimdal Kerberos
None
CDPD-28139: Set spark.hadoop.hive.stats.autogather to false by
default
As an Impala user, if you submit a query against a table containing data ingested
using Spark and you are concerned about the quality of the query plan, run
COMPUTE STATS against such a table after any ETL operation, because the numRows value
recorded by Spark could be incorrect. The other statistics computed by COMPUTE STATS,
e.g., the Number of Distinct Values (NDV) and the NULL count, also give good
selectivity estimates.
For example, when a user ingests data from a file into a partition of an existing
table using Spark, if spark.hadoop.hive.stats.autogather is not explicitly set to
false, the numRows associated with this partition is 0 even though there is at least
one row in the file. To avoid this, the workaround is to set
"spark.hadoop.hive.stats.autogather=false" in the "Spark Client Advanced Configuration
Snippet (Safety Valve) for spark-conf/spark-defaults.conf" in Spark's CM Configuration
section.
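The safety-valve entry is a single property line in spark-defaults.conf:

```
spark.hadoop.hive.stats.autogather=false
```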
IMPALA-2422: % escaping does not work correctly when occurs at
the end in a LIKE clause
7.3.1 and its higher versions
If the final character in the RHS argument of a
LIKE operator is an escaped \% character, it does
not match a final % character in the LHS argument.
A query could encounter a serious error if it includes multiple
nested levels of INNER JOIN clauses involving subqueries.
None
CDPD-59625: Impala shell in RHEL 9 with Python 2 as default does
not work
7.1.9, 7.3.1 and its higher versions
If you try to run impala-shell on RHEL 9 with the default python executable
available in PATH set to Python 2, it fails, because RHEL 9 is compatible only with
Python 3.
If you run into such issues, set the IMPALA_PYTHON_EXECUTABLE
environment variable to point to Python 3: IMPALA_PYTHON_EXECUTABLE=python3.
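A minimal sketch (the impala-shell invocation is shown as a comment because it needs a live cluster):

```shell
# Point impala-shell at Python 3 explicitly; RHEL 9 ships only Python 3.
export IMPALA_PYTHON_EXECUTABLE=python3
echo "IMPALA_PYTHON_EXECUTABLE=$IMPALA_PYTHON_EXECUTABLE"
# Then run impala-shell as usual, for example:
#   impala-shell -i <coordinator_host>
```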
Impala cannot update a table if the 'external.table.purge'
property is not set to true
Impala cannot update a table using DDL statements if the 'external.table.purge'
property is FALSE. ALTER TABLE statements return success with no changes to the table.
ALTER TABLE statements should be issued twice if
"external.table.purge" was FALSE initially.
Impala API calls via Knox require configuration if the Knox
customized Kerberos principal name is a default service user name
To access Impala API calls via Knox when the Knox
customized Kerberos principal name is a default service user name, configure
"authorized_proxy_user_config" by clicking Clusters > Impala > Configuration.
Include the Knox customized Kerberos principal name in the comma-separated list of
values: <knox_custom_kerberos_principal_name>=*, where
<knox_custom_kerberos_principal_name> is the value of the Kerberos Principal in
the Knox service. Select Clusters > Knox > Configuration and search for Kerberos
Principal to display this value.
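For example, if the Knox Kerberos principal were knox (a hypothetical value), the entry would look like:

```
authorized_proxy_user_config=knox=*
```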
CDPD-28431: Intermittent errors may be encountered
when the Impala UI is accessed from multiple Knox nodes.
7.1.7
You must use a single Knox node to access Impala UI.
CDPD-21828: Multiple permission assignment through grant is not
working
7.1.7
None
IMPALA-11871: INSERT statement does not respect Ranger policies
for HDFS
7.3.1, 7.3.1.300 and its higher version
7.3.1.100, 7.3.1.200
In a cluster with Ranger authorization (and with legacy catalog mode), even if you
grant RWX on cm_hdfs -> all-path to the user impala, inserting into a table whose HDFS
POSIX permissions happen to exclude impala access results in "AnalysisException: Unable
to INSERT into target table (default.t1) because Impala does not have WRITE access to
HDFS location: hdfs://XXXXXXXXXXXX"
OPSAPS-46641: A single parameter exists in Cloudera Manager for specifying the Impala Daemon Load Balancer. Because
BDR and Hue need to use different ports when connecting to the load balancer, it is not
possible to configure the load balancer value so that BDR and Hue will work correctly in
the same cluster.
The workaround is to use the load balancer configuration
either without a port specification, or with the Beeswax port: this will configure BDR.
To configure Hue, use the "Hue Server Advanced Configuration Snippet (Safety Valve) for
impalad_flags" to specify the load balancer address with the HiveServer2 port.
Impala known limitation when querying compacted tables
7.3.1 and its higher versions
When the compaction process deletes the files for a table from the underlying HDFS
location, the Impala service does not detect the changes because compaction does not
allocate new write IDs. When the same table is queried from Impala, it throws a 'File
does not exist' exception that looks something like
this:
Query Status: Disk I/O error on <node>:22000: Failed to open HDFS file hdfs://nameservice1/warehouse/tablespace/managed/hive/<database>/<table>/xxxxx
Error(2): No such file or directory Root cause: RemoteException: File does not exist: /warehouse/tablespace/managed/hive/<database>/<table>/xxxx
Use the REFRESH/INVALIDATE statements on the affected table to overcome the 'File does
not exist' exception.
Impala Virtual Warehouses might produce an error when querying
transactional (ACID) tables
Problem: If you are querying transactional (ACID) tables with an
Impala Virtual Warehouse and compaction is run on those tables by a Hive Virtual
Warehouse, the query might fail. The compaction process deletes files and the Impala
Virtual Warehouse might not be aware of the deletion. When the Impala Virtual Warehouse
then attempts to read the deleted file, an error can occur. This situation occurs
randomly.
Run the INVALIDATE METADATA statement on
the transactional (ACID) table to refresh the metadata. This fixes the problem until the
next compaction occurs.
IMPALA-5605: Configuration to prevent crashes caused by thread
resource limits
Impala could encounter a serious error due to resource usage
under very high concurrency. The error message is similar to:
F0629 08:20:02.956413 29088 llvm-codegen.cc:111] LLVM hit fatal error: Unable to allocate section memory!
terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::thread_resource_error> >'
To prevent such errors, configure each host running an
impalad daemon with the following settings:
Add the following lines in
/etc/security/limits.conf:
impala soft nproc 262144
impala hard nproc 262144
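After editing limits.conf, the effective limit can be verified from a fresh session (the su command is shown as a comment because it requires root and the impala account):

```shell
# From a fresh login session for the impala user, run as root:
#   su - impala -c 'ulimit -u'    # should print 262144
# The current shell's own process limit:
ulimit -u
```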
IMPALA-9350: Ranger audit logs for applying column masking
policies missing
Impala is not producing these logs.
None
IMPALA-1792: ImpalaODBC: Cannot get the value in the
SQLGetData(m-x th column) after the SQLBindCol(m th column)
If the ODBC SQLGetData is called on a series of
columns, the function calls must follow the same order as the columns. For example, if
data is fetched from column 2 then column 1, the SQLGetData call for
column 1 returns NULL.
Fetch columns in the same order they are defined in the
table.