Known Issues in Apache Impala
This topic describes known issues and workarounds for using Impala in this release of Cloudera Runtime.
- Queries stuck on failed HDFS calls and not timing out
- In Impala 3.2 and higher, if the following error appears multiple times in a short duration while running a query, the connection between the impalad and the HDFS NameNode is in a bad state.
"hdfsOpenFile() for <filename> at backend <hostname:port> failed to finish before the <hdfs_operation_timeout_sec> second timeout"
In Impala 3.1 and lower, the same issue causes Impala to wait for a long time or hang without showing the above error message.
- Workaround: Restart the
- Apache JIRA: HADOOP-15720
- Impala should tolerate bad locale settings
- If the LC_* environment variables specify an unsupported locale, Impala does not start.
- Workaround: Add LC_ALL="C" to the environment settings for both the Impala daemon and the Statestore daemon.
- Apache JIRA: IMPALA-532
- Configuration to prevent crashes caused by thread resource limits
- Impala could encounter a serious error due to resource usage under very high concurrency. The error message is similar to:
F0629 08:20:02.956413 29088 llvm-codegen.cc:111] LLVM hit fatal error: Unable to allocate section memory!
terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::thread_resource_error> >'
- Workaround: To prevent such errors, configure each host running an impalad daemon with the following settings:
echo 2000000 > /proc/sys/kernel/threads-max
echo 2000000 > /proc/sys/kernel/pid_max
echo 8000000 > /proc/sys/vm/max_map_count
Add the following lines in /etc/security/limits.conf:
impala soft nproc 262144
impala hard nproc 262144
- Apache JIRA: IMPALA-5605
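Before changing kernel settings, it can help to see which values on a host are already high enough. The following is a minimal sketch, assuming a Linux host where the three /proc/sys files from the workaround are readable; the function names `read_current` and `below_recommended` are hypothetical helpers and not part of Impala.

```python
from pathlib import Path

# Recommended values copied from the workaround above.
RECOMMENDED = {
    "/proc/sys/kernel/threads-max": 2000000,
    "/proc/sys/kernel/pid_max": 2000000,
    "/proc/sys/vm/max_map_count": 8000000,
}


def read_current(paths):
    """Return {path: int value} for each kernel setting that can be read."""
    current = {}
    for path in paths:
        try:
            current[path] = int(Path(path).read_text().strip())
        except (OSError, ValueError):
            # Missing or unreadable file (for example, not Linux): skip it.
            continue
    return current


def below_recommended(current, recommended=RECOMMENDED):
    """Return {path: (current, recommended)} for settings that are too low."""
    return {
        path: (value, recommended[path])
        for path, value in current.items()
        if path in recommended and value < recommended[path]
    }


if __name__ == "__main__":
    low = below_recommended(read_current(RECOMMENDED))
    for path, (have, want) in low.items():
        print(f"{path}: {have} is below the recommended {want}")
```

The sketch only reports values. It does not write to /proc/sys or check the nproc entries in /etc/security/limits.conf.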
- Avro Scanner fails to parse some schemas
- The default value in an Avro schema must match the type of the first union type. For example, if the default value is null, then the first type in the union must be "null".
- Workaround: Swap the order of the fields in the schema specification. For example, use ["null", "string"] instead of ["string", "null"]. Note that the files written with the problematic schema must be rewritten with the new schema because Avro files have embedded schemas.
- Apache JIRA: IMPALA-635
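To find affected fields before querying a table, a schema file can be scanned for unions that have a null default but do not list "null" first. The following is a minimal sketch using only the Python standard library; the function name `fields_with_misordered_null` is hypothetical, and it inspects only the top-level fields of a record schema, not nested records.

```python
import json


def fields_with_misordered_null(schema_json):
    """Return names of top-level fields whose default is null but whose
    union type does not list "null" as its first branch."""
    schema = json.loads(schema_json)
    bad = []
    for field in schema.get("fields", []):
        ftype = field.get("type")
        if not isinstance(ftype, list):
            continue  # not a union
        has_null_default = "default" in field and field["default"] is None
        if has_null_default and (not ftype or ftype[0] != "null"):
            bad.append(field["name"])
    return bad


EXAMPLE_SCHEMA = """
{
  "type": "record",
  "name": "example",
  "fields": [
    {"name": "a", "type": ["string", "null"], "default": null},
    {"name": "b", "type": ["null", "string"], "default": null},
    {"name": "c", "type": "int"}
  ]
}
"""

if __name__ == "__main__":
    print(fields_with_misordered_null(EXAMPLE_SCHEMA))
```

In the example schema, only field `a` is reported, because its union starts with "string" while its default is null.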
- Process mem limit does not account for the JVM's memory usage
- Some memory allocated by the JVM used internally by Impala is not counted against the memory limit for the impalad daemon.
- Workaround: To monitor overall memory usage, use the top command, or add the memory figures in the Impala web UI /memz tab to JVM memory usage shown on the /metrics tab.
- Apache JIRA: IMPALA-691
- Ranger audit logs for applying column masking policies missing
- Impala is not producing these logs.
- Workaround: None.
- Apache JIRA: IMPALA-9350
- Impala BE cannot parse Avro schema that contains a trailing semi-colon
- If an Avro table has a schema definition with a trailing semicolon, Impala encounters an error when the table is queried.
- Workaround: Remove the trailing semicolon from the Avro schema.
- Apache JIRA: IMPALA-1024
- Incorrect results with basic predicate on CHAR typed column
- When comparing a CHAR column value to a string literal, the literal value is not blank-padded and so the comparison might fail when it should match.
- Workaround: Use the RPAD() function to blank-pad literals compared with CHAR columns to the expected length.
- Apache JIRA: IMPALA-1652
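The effect of the missing padding can be modeled outside Impala. The following sketch is an illustration in plain Python, not Impala code: `char_value` and `equals_literal` are hypothetical helpers that mimic a blank-padded CHAR value and a literal that is optionally padded, in the way the RPAD() workaround pads it.

```python
def char_value(text, length):
    """Model a CHAR(length) value: blank-padded on the right, cut to length."""
    return text.ljust(length)[:length]


def equals_literal(stored, literal, pad_to=None):
    """Compare a stored CHAR value with a literal.

    With pad_to set, the literal is blank-padded first, which is the role
    RPAD() plays in the workaround.
    """
    if pad_to is not None:
        literal = literal.ljust(pad_to)
    return stored == literal


stored = char_value("abc", 5)

if __name__ == "__main__":
    print(repr(stored))
    print(equals_literal(stored, "abc"))            # unpadded literal
    print(equals_literal(stored, "abc", pad_to=5))  # padded literal
```

In this model, the unpadded comparison fails and the padded one succeeds, which mirrors the behavior described for CHAR columns.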
- ImpalaODBC: Can not get the value in the SQLGetData(m-x th column) after the SQLBindCol(m th column)
- If the ODBC SQLGetData is called on a series of columns, the function calls must follow the same order as the columns. For example, if data is fetched from column 2 then column 1, the SQLGetData call for column 1 returns
- Workaround: Fetch columns in the same order they are defined in the table.
- Apache JIRA: IMPALA-1792
- Casting scenarios with invalid/inconsistent results
- Using a CAST() function to convert large literal values to smaller types, or to convert special values such as Inf, produces values not consistent with other database systems. This could lead to unexpected results from queries.
- Apache JIRA: IMPALA-1821
- A failed CTAS does not drop the table if the insert fails
- If a CREATE TABLE AS SELECT operation successfully creates the target table but an error occurs while querying the source table or copying the data, the new table is left behind rather than being dropped.
- Workaround: Drop the new table manually after a failed CREATE TABLE AS SELECT.
- Apache JIRA: IMPALA-2005
- % escaping does not work correctly when occurs at the end in a LIKE clause
- If the final character in the RHS argument of a LIKE operator is an escaped \% character, it does not match a % final character of the LHS argument.
- Apache JIRA: IMPALA-2422
- Crash: impala::Coordinator::ValidateCollectionSlots
- A query could encounter a serious error if it includes multiple nested levels of INNER JOIN clauses involving subqueries.
- Apache JIRA: IMPALA-2603
- Incorrect result due to constant evaluation in query with outer join
- An OUTER JOIN query could omit some expected result rows due to a constant such as FALSE in another join clause. For example:
explain SELECT 1 FROM alltypestiny a1
  INNER JOIN alltypesagg a2 ON a1.smallint_col = a2.year AND false
  RIGHT JOIN alltypes a3 ON a1.year = a1.bigint_col;
+---------------------------------------------------------+
| Explain String                                          |
+---------------------------------------------------------+
| Estimated Per-Host Requirements: Memory=1.00KB VCores=1 |
|                                                         |
| 00:EMPTYSET                                             |
+---------------------------------------------------------+
- Apache JIRA: IMPALA-3094
- Breakpad minidumps can be very large when the thread count is high
- The size of the breakpad minidump files grows linearly with the number of threads. By default, each thread adds 8 KB to the minidump size. Minidump files could consume significant disk space when the daemons have a high number of threads.
- Workaround: Add --minidump_size_limit_hint_kb=size to set a soft upper limit on the size of each minidump file. If the minidump file would exceed that limit, Impala reduces the amount of information for each thread from 8 KB to 2 KB. (Full thread information is captured for the first 20 threads, then 2 KB per thread after that.) The minidump file can still grow larger than the "hinted" size. For example, if you have 10,000 threads, the minidump file can be more than 20 MB.
- Apache JIRA: IMPALA-3509
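The 10,000-thread figure above can be reproduced with simple arithmetic. The following sketch uses only the numbers stated in the description (8 KB per thread by default, full information for the first 20 threads, 2 KB per thread after that); the function names are hypothetical, and the result covers per-thread data only, not any other content of the minidump file.

```python
FULL_THREADS = 20  # threads captured with full information
FULL_KB = 8        # KB per thread at full detail
REDUCED_KB = 2     # KB per thread once the size hint is exceeded


def default_minidump_kb(thread_count):
    """Per-thread data in KB when every thread is captured at full detail."""
    return thread_count * FULL_KB


def reduced_minidump_kb(thread_count):
    """Per-thread data in KB when the size hint triggers the reduction."""
    full = min(thread_count, FULL_THREADS)
    reduced = max(thread_count - FULL_THREADS, 0)
    return full * FULL_KB + reduced * REDUCED_KB


if __name__ == "__main__":
    threads = 10000
    print(default_minidump_kb(threads), "KB without the hint")
    print(reduced_minidump_kb(threads), "KB with the hint")
```

For 10,000 threads this gives 80,000 KB at full detail and 20,120 KB after the reduction (20 × 8 KB plus 9,980 × 2 KB), which is why the file can still exceed a smaller hinted size.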
- Impala requires FQDN from hostname command on Kerberized clusters
- The method Impala uses to retrieve the host name while constructing the Kerberos principal is the gethostname() system call. This function might not always return the fully qualified domain name, depending on the network configuration. If the daemons cannot determine the FQDN, Impala does not start on a Kerberized cluster.
- Workaround: Test if a host is affected by checking whether the output of the hostname command includes the FQDN. On hosts where hostname only returns the short name, pass the command-line flag --hostname=fully_qualified_domain_name in the startup options of all Impala-related daemons.
- Apache JIRA: IMPALA-4978
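The host check in the workaround can be scripted. The following is a minimal sketch, assuming that a name with at least two non-empty dot-separated labels counts as fully qualified; that rule is a heuristic chosen for illustration, and `looks_fully_qualified` is a hypothetical helper. Python's `socket.gethostname()` is used here as a stand-in for the gethostname() system call mentioned above.

```python
import socket


def looks_fully_qualified(hostname):
    """Heuristic: True if the name has two or more non-empty labels."""
    labels = hostname.rstrip(".").split(".")
    return len(labels) >= 2 and all(labels)


if __name__ == "__main__":
    name = socket.gethostname()
    if looks_fully_qualified(name):
        print(f"{name}: looks fully qualified")
    else:
        print(f"{name}: short name only, consider setting --hostname")
```

Running the script on each host gives a quick indication of where the --hostname flag is likely needed. A name that passes the heuristic is not guaranteed to match the name expected in the Kerberos principal.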
- Metadata operations block read-only operations on unrelated tables
- Metadata operations that change the state of a table, like ALTER RECOVER PARTITIONS, may delay metadata propagation of unrelated unloaded tables triggered by statements like
- Apache JIRA: IMPALA-6671
- Impala does not support Heimdal Kerberos
- Apache JIRA: IMPALA-7072