Known Issues in Apache Impala
Learn about the known issues in Impala, the impact or changes to the functionality, and the workarounds.
- Impala known limitation when querying compacted tables
- When the compaction process deletes the files for a table from the underlying HDFS location, the Impala service does not detect the changes because compaction does not allocate new write IDs. When the same table is then queried from Impala, the query fails with a 'File does not exist' exception similar to the following:
Query Status: Disk I/O error on <node>:22000: Failed to open HDFS file hdfs://nameservice1/warehouse/tablespace/managed/hive/<database>/<table>/xxxxx Error(2): No such file or directory Root cause: RemoteException: File does not exist: /warehouse/tablespace/managed/hive/<database>/<table>/xxxx
- CDPD-28431: Intermittent errors might occur when the Impala UI is accessed from multiple Knox nodes.
- You must use a single Knox node to access the Impala UI.
- Impala API calls through Knox require configuration if the customized Knox Kerberos principal name is a default service user name
- To access Impala API calls through Knox when the customized Knox Kerberos principal name is a default service user name, configure "authorized_proxy_user_config" by selecting Clusters > Impala > Configuration. Add an entry of the form <knox_custom_kerberos_principal_name>=* to the comma-separated list of values, where <knox_custom_kerberos_principal_name> is the value of the Kerberos Principal property of the Knox service. To display this value, select Clusters > Knox > Configuration and search for Kerberos Principal.
- CDPD-21828: Multiple permission assignment through grant is not working
- None
- Problem configuring masking on tables using Ranger
- A Knowledge Base article describes the behavior when masking is configured on tables using Ranger. This configuration works for Hive, but breaks queries in some scenarios for Impala.
- IMPALA-532: Impala should tolerate bad locale settings
- If the LC_* environment variables specify an unsupported locale, Impala does not start.
- IMPALA-5605: Configuration to prevent crashes caused by thread resource limits
- Impala could encounter a serious error due to resource usage under very high
concurrency. The error message is similar to:
F0629 08:20:02.956413 29088 llvm-codegen.cc:111] LLVM hit fatal error: Unable to allocate section memory! terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::thread_resource_error> >'
- Avro Scanner fails to parse some schemas
- The default value in an Avro schema must match the first type of the union. For example, if the default value is null, then the first type in the UNION must be "null".
- IMPALA-691: Process mem limit does not account for the JVM's memory usage
- Some memory allocated by the JVM used internally by Impala is not counted against the memory limit for the impalad daemon.
- IMPALA-9350: Ranger audit logs for applying column masking policies missing
- Impala does not produce Ranger audit logs when column masking policies are applied.
- IMPALA-1024: Impala BE cannot parse Avro schema that contains a trailing semi-colon
- If an Avro table has a schema definition with a trailing semicolon, Impala encounters an error when the table is queried.
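Both Avro schema limitations in this list (the union default-value rule under "Avro Scanner fails to parse some schemas" and IMPALA-1024) can be caught before a schema is handed to Impala. The following is a minimal Python sketch, assuming the schema is available as a JSON string (for example, the value of an avro.schema.literal table property) and checking only top-level record fields; lint_avro_schema is a hypothetical helper, not part of Impala or any Avro library.

```python
import json


def lint_avro_schema(schema_text):
    """Report Avro schema patterns that Impala is known to reject (sketch).

    Hypothetical helper: only top-level record fields are checked;
    nested records are not traversed.
    """
    problems = []
    text = schema_text.strip()

    # IMPALA-1024: Impala cannot parse a schema followed by a trailing semicolon.
    if text.endswith(";"):
        problems.append("trailing semicolon after the schema")
        text = text.rstrip(";").rstrip()

    schema = json.loads(text)

    # Union rule: a field's default value must match the first type in its union.
    for field in schema.get("fields", []):
        union = field.get("type")
        if not isinstance(union, list) or not union or "default" not in field:
            continue
        first = union[0]
        if field["default"] is None and first != "null":
            problems.append(
                f"field {field['name']!r}: default is null but the first union type is {first!r}"
            )
        elif first == "null" and field["default"] is not None:
            problems.append(
                f"field {field['name']!r}: first union type is 'null' but the default is not null"
            )
    return problems


if __name__ == "__main__":
    bad = ('{"type": "record", "name": "r", "fields": '
           '[{"name": "c", "type": ["string", "null"], "default": null}]};')
    for problem in lint_avro_schema(bad):
        print(problem)
```

Reordering the union to ["null", "string"] and dropping the trailing semicolon makes the example schema pass both checks.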
- IMPALA-1652: Incorrect results with basic predicate on CHAR typed column
- When comparing a CHAR column value to a string literal, the literal value is not blank-padded, so the comparison might fail when it should match.
- IMPALA-1821: Casting scenarios with invalid/inconsistent results
- Using a CAST() function to convert large literal values to smaller types, or to convert special values such as NaN or Inf, produces values not consistent with other database systems. This could lead to unexpected results from queries.
- IMPALA-2005: A failed CTAS does not drop the table if the insert fails
- If a CREATE TABLE AS SELECT operation successfully creates the target table but an error occurs while querying the source table or copying the data, the new table is left behind rather than being dropped.
- IMPALA-2422: % escaping does not work correctly when it occurs at the end of a LIKE clause
- If the final character in the RHS argument of a LIKE operator is an escaped \% character, it does not match a % final character of the LHS argument.
- IMPALA-2603: Crash: impala::Coordinator::ValidateCollectionSlots
- A query could encounter a serious error if it includes multiple nested levels of INNER JOIN clauses involving subqueries.
- IMPALA-3094: Incorrect result due to constant evaluation in query with outer join
- IMPALA-3509: Breakpad minidumps can be very large when the thread count is high
- The size of the breakpad minidump files grows linearly with the number of threads. By default, each thread adds 8 KB to the minidump size. Minidump files could consume significant disk space when the daemons have a high number of threads.
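Because the growth is linear, the thread-related part of a minidump can be estimated directly from the thread count. This is a minimal Python sketch, assuming the default of 8 KB per thread stated above and ignoring the fixed, thread-independent part of the dump; estimate_thread_kb is a hypothetical helper.

```python
# Default per-thread contribution stated for IMPALA-3509 (assumed constant here).
KB_PER_THREAD = 8


def estimate_thread_kb(thread_count, kb_per_thread=KB_PER_THREAD):
    """Return the approximate per-thread portion of one minidump, in KB.

    Hypothetical helper; the fixed part of the dump is not included.
    """
    if thread_count < 0:
        raise ValueError("thread_count must be non-negative")
    return thread_count * kb_per_thread


if __name__ == "__main__":
    for threads in (1_000, 10_000, 50_000):
        kb = estimate_thread_kb(threads)
        print(f"{threads:>6,} threads -> ~{kb:,} KB (~{kb / 1024:.0f} MB) per minidump")
```

With 10,000 threads, for example, the thread-related part alone is about 80,000 KB (roughly 78 MB) per dump, and every retained dump adds that much again.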
- IMPALA-4978: Impala requires FQDN from hostname command on Kerberized clusters
- The method Impala uses to retrieve the host name while constructing the Kerberos principal is the gethostname() system call. This function might not always return the fully qualified domain name, depending on the network configuration. If the daemons cannot determine the FQDN, Impala does not start on a Kerberized cluster.
- IMPALA-7072: Impala does not support Heimdal Kerberos
- None
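For IMPALA-4978 above, it can help to check on each Kerberized host whether gethostname() returns a fully qualified name before starting the daemons. This is a minimal Python sketch that relies on socket.gethostname(), which on Linux is backed by the same gethostname() call; looks_fully_qualified is a hypothetical helper and only a heuristic (it does not verify that the name resolves or matches the Kerberos principal).

```python
import socket


def looks_fully_qualified(hostname):
    """Heuristic: at least two non-empty dot-separated labels.

    Hypothetical helper; a trailing dot (absolute DNS name) is ignored.
    """
    labels = hostname.rstrip(".").split(".")
    return len(labels) >= 2 and all(labels)


if __name__ == "__main__":
    name = socket.gethostname()
    if looks_fully_qualified(name):
        print(f"OK: gethostname() returned {name!r}")
    else:
        print(f"WARNING: gethostname() returned {name!r}, which is not fully qualified")
```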
- OPSAPS-46641: A single parameter exists in Cloudera Manager for specifying the Impala Daemon Load Balancer. Because BDR and Hue need to use different ports when connecting to the load balancer, it is not possible to configure the load balancer value so that BDR and Hue will work correctly in the same cluster.
- The workaround is to use the load balancer configuration either without a port specification or with the Beeswax port; this configures BDR. To configure Hue, use the "Hue Server Advanced Configuration Snippet (Safety Valve) for impalad_flags" to specify the load balancer address with the HiveServer2 port.
- IMPALA-6841
- IMPALA-635
Technical Service Bulletins
- TSB 2022-543: Impala query with predicate on analytic function may produce incorrect results
- Apache Impala may produce incorrect results for a query that meets all of the following conditions:
  - There are two or more analytic functions (for example, row_number()) in an inline view
  - Some of the functions have a partition-by expression while the others do not
  - There is a predicate on the inline view's output expression corresponding to the analytic function
- Knowledge article
- For the latest update on this issue, see the corresponding Knowledge article: TSB 2022-543: Impala query with predicate on analytic function may produce incorrect results
- TSB 2023-632: Apache Impala reads minor compacted tables incorrectly on CDP Private Cloud Base
- The issue occurs when Apache Impala (Impala) reads insert-only Hive ACID tables that were minor compacted by Apache Hive (Hive). Insert-only ACID tables (also known as micro-managed ACID tables) are the default table format in Impala in CDP Private Cloud Base 7.1.x and can be identified by the following table properties:
"transactional"="true" "transactional_properties"="insert_only"
Minor compactions can be initiated in Hive with the following statement:
ALTER TABLE <table_name> COMPACT 'minor'
A minor compaction differs from a major compaction in that it compacts only the files created by INSERTs since the last compaction, instead of all files in the table.
Performing a minor compaction creates delta directories in the table (or partition) folder, such as delta_0000001_0000008_v0000564. Impala does not handle these delta directories correctly, which can lead to results that differ from Hive: rows from some data files can be missing, or rows from some data files can be duplicated. The exact results depend on whether a major compaction was run on the table and whether the old files compacted during a minor compaction have been deleted. If the last compaction was a major compaction, or if neither a minor nor a major compaction was performed on the table, the issue does not occur.
Minor compaction is not initiated automatically by Hive Metastore (HMS) or any other CDP (Cloudera Data Platform) component, meaning that this issue can only occur if minor compactions were initiated explicitly by users or scripts.
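To check whether a table might be affected, its table (or partition) folder can be inspected for minor-compacted delta directories that have not been superseded by a later major compaction. This is a minimal Python sketch, assuming the directory names have already been listed (for example, with hdfs dfs -ls) and that compacted directories follow the naming in the example above, with a _v suffix that plain insert deltas lack. Treating a base directory that covers a delta's highest write ID as "the last compaction was major" is a heuristic, and unsuperseded_minor_deltas is a hypothetical helper.

```python
import re

# Naming patterns assumed from the example delta_0000001_0000008_v0000564:
# compacted deltas carry a _v<id> suffix; plain insert deltas do not.
MINOR_DELTA = re.compile(r"^delta_(\d+)_(\d+)_v(\d+)$")
BASE = re.compile(r"^base_(\d+)(?:_v\d+)?$")


def unsuperseded_minor_deltas(dir_names):
    """Return minor-compacted delta directories newer than the latest base.

    dir_names: subdirectory names of one table or partition folder.
    Heuristic: a delta whose max write ID is covered by a base directory
    is treated as superseded by a later major compaction.
    """
    latest_base = max(
        (int(m.group(1)) for m in map(BASE.match, dir_names) if m),
        default=-1,
    )
    at_risk = []
    for name in dir_names:
        m = MINOR_DELTA.match(name)
        if m and int(m.group(2)) > latest_base:
            at_risk.append(name)
    return sorted(at_risk)


if __name__ == "__main__":
    listing = [
        "base_0000005_v0000100",
        "delta_0000006_0000008_v0000564",
        "delta_0000009_0000009_0000",
    ]
    print(unsuperseded_minor_deltas(listing))
```

A non-empty result only flags the folder for a closer look; it does not confirm that Impala returned wrong results.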
- Knowledge article
- For the latest update on this issue, see the corresponding Knowledge article: TSB 2023-632: Apache Impala reads minor compacted tables incorrectly on CDP Private Cloud Base