Known Issues in Apache Impala

Learn about the known issues in Impala, the impact or changes to the functionality, and the workaround.

IMPALA-12043: Large catalog info triggers "TTransportException: MaxMessageSize reached"
This is a known issue in 7.1.7 SP2. This release missed overriding the thrift default limit (100MB) with the value from thrift_rpc_max_message_size. This will be fixed in SP2 CHF6.
IMPALA-10343: control_service_queue_mem_limit default is too low for large clusters
This is a known issue in 7.1.7 SP2. Workaround is not available for SP2. This has been fixed by PATCH-5622. This will be fixed in SP2 CHF1.
None
Impala known limitation when querying compacted tables
When the compaction process deletes the files for a table from the underlying HDFS location, the Impala service does not detect the changes as the compactions does not allocate new write ids. When the same table is queried from Impala it throws a 'File does not exist' exception that looks something like this:
Query Status: Disk I/O error on <node>:22000: Failed to open HDFS file hdfs://nameservice1/warehouse/tablespace/managed/hive/<database>/<table>/xxxxx
Error(2): No such file or directory Root cause: RemoteException: File does not exist: /warehouse/tablespace/managed/hive/<database>/<table>/xxxx
Use the REFRESH/INVALIDATE statements on the affected table to overcome the 'File does not exist' exception.
CDPD-28431: Intermittent errors could be potentially encountered when Impala UI is accessed from multiple Knox nodes.
You must use a single Knox node to access Impala UI.
Impala api calls via knox require configuration if the knox customized kerberos principal name is a default service user name
To access impala api calls via knox, if the knox customized kerberos principal name is a default service user name, then configure "authorized_proxy_user_config" by clicking Clusters->impala->configuration. Include the knox customized kerberos principal name in the comma separated list of values <knox_custom_kerberos_principal_name>=*" where <knox_custom_kerberos_principal_name> is the value of the Kerberos Principal in the Knox service. Select Clusters>Knox>Configuration and search for Kerberos Principal to display this value.
Problem configuring masking on tables using Ranger
The following Knowledge Base article describes the behavior when we configure masking on tables using Ranger. This configuration works for Hive, but breaks queries in some scenarios for Impala.
For a workaround, see the following Knowledge Base article: ERROR: "AnalysisException: No matching function with signature: mask(FLOAT)" when Impala jobs fail with the following error with signature: mask(FLOAT)
HADOOP-15720: Queries stuck on failed HDFS calls and not timing out
In Impala 3.2 and higher, if the following error appears multiple times in a short duration while running a query, it would mean that the connection between the impalad and the HDFS NameNode is in a bad state.
"hdfsOpenFile() for <filename> at backend <hostname:port> failed to finish before the <hdfs_operation_timeout_sec> second timeout " 
In Impala 3.1 and lower, the same issue would cause Impala to wait for a long time or not respond without showing the above error message.
Restart the impalad.
IMPALA-532: Impala should tolerate bad locale settings
If the LC_* environment variables specify an unsupported locale, Impala does not start.
Add LC_ALL="C" to the environment settings for both the Impala daemon and the Statestore daemon.
Avro Scanner fails to parse some schemas
The default value in Avro schema must match type of first union type, e.g. if the default value is null, then the first type in the UNION must be "null".
Swap the order of the fields in the schema specification. For example, use ["null", "string"] instead of ["string", "null"]. Note that the files written with the problematic schema must be rewritten with the new schema because Avro files have embedded schemas.
IMPALA-1024: Impala BE cannot parse Avro schema that contains a trailing semi-colon
If an Avro table has a schema definition with a trailing semicolon, Impala encounters an error when the table is queried.
Remove trailing semicolon from the Avro schema.
IMPALA-1652: Incorrect results with basic predicate on CHAR typed column
When comparing a CHAR column value to a string literal, the literal value is not blank-padded and so the comparison might fail when it should match.
Use the RPAD() function to blank-pad literals compared with CHAR columns to the expected length.
IMPALA-1821: Casting scenarios with invalid/inconsistent results
Using a CAST() function to convert large literal values to smaller types, or to convert special values such as NaN or Inf, produces values not consistent with other database systems. This could lead to unexpected results from queries.
None
IMPALA-2005: A failed CTAS does not drop the table if the insert fails
If a CREATE TABLE AS SELECT operation successfully creates the target table but an error occurs while querying the source table or copying the data, the new table is left behind rather than being dropped.
Drop the new table manually after a failed CREATE TABLE AS SELECT
IMPALA-2422: % escaping does not work correctly when occurs at the end in a LIKE clause
If the final character in the RHS argument of a LIKE operator is an escaped \% character, it does not match a % final character of the LHS argument.
None
IMPALA-3094: Incorrect result due to constant evaluation in query with outer join
An OUTER JOIN query could omit some expected result rows due to a constant such as FALSE in another join clause. For example:

explain SELECT 1 FROM alltypestiny a1
  INNER JOIN alltypesagg a2 ON a1.smallint_col = a2.year AND false
  RIGHT JOIN alltypes a3 ON a1.year = a1.bigint_col;
+-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-+
| Explain String                                          |
+-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-+
| Estimated Per-Host Requirements: Memory=1.00KB VCores=1 |
|                                                         |
| 00:EMPTYSET                                             |
+-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-+

          
IMPALA-4978: Impala requires FQDN from hostname command on Kerberized clusters
The method Impala uses to retrieve the host name while constructing the Kerberos principal is the gethostname() system call. This function might not always return the fully qualified domain name, depending on the network configuration. If the daemons cannot determine the FQDN, Impala does not start on a Kerberized cluster.
Test if a host is affected by checking whether the output of the hostname command includes the FQDN. On hosts where hostname, only returns the short name, pass the command-line flag ‑‑hostname=fully_qualified_domain_name in the startup options of all Impala-related daemons.
OPSAPS-46641: A single parameter exists in Cloudera Manager for specifying the Impala Daemon Load Balancer. Because BDR and Hue need to use different ports when connecting to the load balancer, it is not possible to configure the load balancer value so that BDR and Hue will work correctly in the same cluster.
The workaround is to use the load balancer configuration either without a port specification, or with the Beeswax port: this will configure BDR. To configure Hue use the "Hue Server Advanced Configuration Snippet (Safety Valve) for impalad_flags" to specify the the load balancer address with the HiveServer2 port.
Some of the unresolved issues include:
  • IMPALA-6841
  • HADOOP-15720