Cloudera Impala Known Issues and Workarounds
The following sections describe known issues and workarounds in Impala, and also the major issues fixed in each Impala release.
- Known Issues in the Current Production Release (Impala 1.3.x)
- Issues Fixed in the 1.3.3 Release
- Issues Fixed in the 1.3.2 Release
- Issues Fixed in the 1.3.1 Release
- Issues Fixed in the 1.3.0 Release
- Issues Fixed in the 1.2.4 Release
- Issues Fixed in the 1.2.3 Release
- Issues Fixed in the 1.2.2 Release
- Issues Fixed in the 1.2.1 Release
- Issues Fixed in the 1.2.0 Beta Release
- Issues Fixed in the 1.1.1 Release
- Issues Fixed in the 1.1.0 Release
- Issues Fixed in the 1.0.1 Release
- Issues Fixed in the 1.0 GA Release
- Issues Fixed in Version 0.7 of the Beta Release
- Issues Fixed in Version 0.6 of the Beta Release
- Issues Fixed in Version 0.5 of the Beta Release
- Issues Fixed in Version 0.4 of the Beta Release
- Issues Fixed in Version 0.3 of the Beta Release
- Issues Fixed in Version 0.2 of the Beta Release
Known Issues in the Current Production Release (Impala 1.3.x)
These known issues affect the current release. Any workarounds are listed here. The bug links take you to the Impala issues site, where you can see the diagnosis and whether a fix is in the pipeline.
- Impala Alternatives must be added back after RPM upgrade
- AsyncTimer runtime profile counter breaks QueryMonitoring
- CatalogServer should not require HBase to be up to reload its metadata
- Excessively long query plan serialization time in FE when querying huge tables
- Impala cannot read data written by using the LazyBinaryColumnarSerDe
- Kerberos tickets must be renewable
- Avro Scanner fails to parse some schemas
- Configuration needed for Flume to be compatible with Impala
- Impala does not support running on clusters with federated namespaces
- Impala INSERT OVERWRITE ... SELECT behavior differs from Hive in that partitions are only deleted/re-written if the SELECT statement returns data.
- Deviation from Hive behavior: Out of range values float/double values are returned as maximum allowed value of type (Hive returns NULL)
- Deviation from Hive behavior: Impala does not do implicit casts between string and numeric and boolean types.
- If Hue and Impala are installed on the same host, and if you configure Hue Beeswax in CDH 4.1 to execute Impala queries, Beeswax cannot list Hive tables and shows an error on Beeswax startup.
- Impala should tolerate bad locale settings
- Log Level 3 Not Recommended for Impala
Impala Alternatives must be added back after RPM upgrade
After upgrading Impala RPMs, the alternatives symbolic links are deleted. These symlinks determine which configuration files and which executables are used, and they have to be added back before Impala can start.
Severity: High
Workaround: Execute the following commands after upgrading the RPMs, then start the services again:
alternatives --install /etc/impala/conf impala-conf /etc/impala/conf.dist 30
alternatives --install /usr/lib/impala/sbin impala /usr/lib/impala/sbin-retail 20
alternatives --install /usr/lib/impala/sbin impala /usr/lib/impala/sbin-debug 10
If you have installed the impala-udf-devel package, also replace these alternatives:
alternatives --install /usr/lib64/libImpalaUdf.a libImpalaUdf /usr/lib64/libImpalaUdf-retail.a 20
alternatives --install /usr/lib64/libImpalaUdf.a libImpalaUdf /usr/lib64/libImpalaUdf-debug.a 10
On SLES, issue the command update-alternatives rather than alternatives.
If you have previously manually activated one of the symlinks (as opposed to just going with the default priorities), you will need to repeat that selection.
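The workaround above can be sketched as a small dry-run script. This is only an illustration: the function names are hypothetical, and choosing the tool by testing which command exists on the PATH is an assumption rather than a documented procedure. The script prints the commands instead of running them, so they can be reviewed before being executed as root.

```shell
# Choose the tool name: "alternatives" where it exists, otherwise
# "update-alternatives" (the name the text gives for SLES).
pick_alternatives_tool() {
  if command -v alternatives >/dev/null 2>&1; then
    echo "alternatives"
  else
    echo "update-alternatives"
  fi
}

# Print the three commands from the workaround without running them.
print_impala_alternatives() {
  tool=$1
  echo "$tool --install /etc/impala/conf impala-conf /etc/impala/conf.dist 30"
  echo "$tool --install /usr/lib/impala/sbin impala /usr/lib/impala/sbin-retail 20"
  echo "$tool --install /usr/lib/impala/sbin impala /usr/lib/impala/sbin-debug 10"
}

print_impala_alternatives "$(pick_alternatives_tool)"
```

Piping the printed lines to a root shell would apply them; a manually selected symlink still has to be chosen again afterwards.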
AsyncTimer runtime profile counter breaks QueryMonitoring
With Impala 1.3.1 on CDH 4, Cloudera Manager might be unable to monitor Impala queries. This issue affects Cloudera Manager 4.8.2 and lower.
Bug: IMPALA-977
Severity: High
Resolution: Fixed in Cloudera Manager 4.8.3 and higher. Does not affect Impala and Cloudera Manager on CDH 5.
Workaround: Upgrade to CDH 5.0.1 or higher, or Cloudera Manager 4.8.3 or higher.
CatalogServer should not require HBase to be up to reload its metadata
If HBase is unavailable during Impala startup or after an INVALIDATE METADATA statement, the catalogd daemon could go into an error loop, making Impala unresponsive.
Bug: IMPALA-788
Severity: High
Workaround: For systems not managed by Cloudera Manager, add the following settings to /etc/impala/conf/hbase-site.xml:
<property>
  <name>hbase.client.retries.number</name>
  <value>3</value>
</property>
<property>
  <name>hbase.rpc.timeout</name>
  <value>3000</value>
</property>
Currently, Cloudera Manager does not have an Impala-only override for HBase settings, so any HBase configuration change you make through Cloudera Manager would take effect for all HBase applications. Therefore, this change is not recommended on systems managed by Cloudera Manager.
Excessively long query plan serialization time in FE when querying huge tables
For tables with many different HDFS data blocks, due to number of files or number of partitions, the overall query time could be slower than necessary because of overhead in analyzing the table metadata.
Bug: IMPALA-958
Severity: High
Impala cannot read data written by using the LazyBinaryColumnarSerDe
The addition of a new Hive SerDe LazyBinaryColumnarSerDe for RCFile data means that RCFile tables created in Hive 0.12 could be unreadable by Impala or Impala queries could return incorrect results. The symptoms of the issue could include unexpected NULL values, error messages about incorrect conversion, or more serious errors due to the unexpected binary data format.
Bug: IMPALA-781
Severity: High
- If you use the CREATE TABLE ... STORED AS RCFILE statement in Impala, you will sidestep this problem. (The Impala CREATE TABLE statement always creates a table with an Impala-compatible SerDe.)
- Most levels of CDH that you would use with Impala come with earlier levels of Hive that do not write these incompatible files. This issue could occur with the CDH 5 beta, which does include Hive 0.12 but keeps the original default SerDe for RCFile tables. This issue is more likely to occur with files created in other Hadoop distributions, which might use this new SerDe by default for RCFiles.
- If you have a problematic Hive table, create one with a similar structure using a file format or an RCFile SerDe that Impala can read, and use Hive to copy the data into the new table. If you create the new table in Hive, use the ColumnarSerDe rather than LazyBinaryColumnarSerDe. If you create the new table with the Impala syntax CREATE TABLE ... STORED AS RCFILE, Impala automatically uses compatible properties for the table.
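The copy described in the last item could look roughly like the following sketch. The table names rc_problem and rc_compatible and the two columns are hypothetical, and the script assumes that impala-shell and hive are both on the PATH and share one metastore. The REFRESH at the end is there so that Impala picks up the files written by Hive.

```shell
# Statement for Impala: create the new table with Impala-compatible properties.
impala_sql=$(mktemp)
cat > "$impala_sql" <<'EOF'
CREATE TABLE rc_compatible (id INT, name STRING) STORED AS RCFILE;
EOF

# Statement for Hive: copy the rows out of the problematic table.
hive_sql=$(mktemp)
cat > "$hive_sql" <<'EOF'
INSERT OVERWRITE TABLE rc_compatible SELECT id, name FROM rc_problem;
EOF

# Run the steps in order, only where both tools are installed.
if command -v impala-shell >/dev/null 2>&1 && command -v hive >/dev/null 2>&1; then
  impala-shell -f "$impala_sql" &&
    hive -f "$hive_sql" &&
    impala-shell -q "REFRESH rc_compatible"
fi
```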
Kerberos tickets must be renewable
In a Kerberos environment, the impalad daemon might not start if Kerberos tickets are not renewable.
Workaround: Configure your KDC to allow tickets to be renewed, and configure krb5.conf to request renewable tickets.
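As an illustration of the client side of this workaround, the sketch below checks whether a krb5.conf-style file sets renew_lifetime, the libdefaults setting that asks for renewable tickets. The sample file, realm name, and lifetimes are hypothetical, and the function name is made up for this example. The KDC must also permit renewal (on an MIT KDC this is typically governed by max_renewable_life), which this check does not cover.

```shell
# Return success if the given krb5.conf-style file sets renew_lifetime.
requests_renewable_tickets() {
  grep -Eq '^[[:space:]]*renew_lifetime[[:space:]]*=' "$1"
}

# Hypothetical sample file standing in for /etc/krb5.conf.
sample_conf=$(mktemp)
cat > "$sample_conf" <<'EOF'
[libdefaults]
    default_realm = EXAMPLE.COM
    ticket_lifetime = 24h
    renew_lifetime = 7d
EOF

if requests_renewable_tickets "$sample_conf"; then
  echo "renew_lifetime is set"
else
  echo "renew_lifetime is missing"
fi
```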
Avro Scanner fails to parse some schemas
Querying certain Avro tables could cause a crash or return no rows, even though Impala could DESCRIBE the table.
Bug: IMPALA-635
Severity: High
Workaround: Swap the order of the fields in the schema specification. For example, ["null", "string"] instead of ["string", "null"].
Resolution: Not allowing this syntax agrees with the Avro specification, so it may still cause an error even when the crashing issue is resolved.
Configuration needed for Flume to be compatible with Impala
For compatibility with Impala, the value for the Flume HDFS Sink hdfs.writeFormat must be set to Text, rather than its default value of Writable. The hdfs.writeFormat setting must be changed to Text before creating data files with Flume; otherwise, those files cannot be read by either Impala or Hive.
Severity: High
Resolution: This information has been requested to be added to the upstream Flume documentation.
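The setting can be checked before any data files are written. The following sketch uses a hypothetical agent configuration (agent1 and hdfs-sink are placeholder names) and a made-up helper function; only the hdfs.writeFormat property itself comes from the text above.

```shell
# Hypothetical Flume agent configuration with the value Impala needs.
flume_conf=$(mktemp)
cat > "$flume_conf" <<'EOF'
agent1.sinks.hdfs-sink.type = hdfs
agent1.sinks.hdfs-sink.hdfs.writeFormat = Text
EOF

# Succeed only if the file sets hdfs.writeFormat and every such line says Text.
write_format_is_text() {
  # At least one sink must set the value to Text...
  grep -Eq 'hdfs\.writeFormat[[:space:]]*=[[:space:]]*Text[[:space:]]*$' "$1" || return 1
  # ...and no sink may set it to anything else.
  if grep -E 'hdfs\.writeFormat[[:space:]]*=' "$1" | grep -Evq '=[[:space:]]*Text[[:space:]]*$'; then
    return 1
  fi
  return 0
}

if write_format_is_text "$flume_conf"; then
  echo "hdfs.writeFormat is Text"
else
  echo "hdfs.writeFormat must be changed to Text"
fi
```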
Impala does not support running on clusters with federated namespaces
Impala does not support running on clusters with federated namespaces. The impalad process will not start on a node running a filesystem based on the org.apache.hadoop.fs.viewfs.ViewFs class.
Bug: IMPALA-77
Severity: Undetermined
Anticipated Resolution: Limitation
Workaround: Use standard HDFS on all Impala nodes.
Impala INSERT OVERWRITE ... SELECT behavior differs from Hive in that partitions are only deleted/re-written if the SELECT statement returns data.
Impala INSERT OVERWRITE ... SELECT behavior differs from Hive in that the partitions are only deleted or rewritten if the SELECT statement returns data. Hive always deletes the data.
Bug: IMPALA-89
Severity: Medium
Workaround: None
Deviation from Hive behavior: Out of range values float/double values are returned as maximum allowed value of type (Hive returns NULL)
Impala behavior differs from Hive with respect to out-of-range float/double values. Impala returns an out-of-range value as the maximum allowed value of the type, while Hive returns NULL.
Severity: Low
Workaround: None
Deviation from Hive behavior: Impala does not do implicit casts between string and numeric and boolean types.
Severity: Low
Anticipated Resolution: None
Workaround: Use explicit casts.
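A few explicit casts are sketched below. The literal values are arbitrary examples, and the script only runs the statements where impala-shell is installed and can reach a cluster.

```shell
# Statements that spell out each conversion with CAST():
# string to number, number to string, and number to boolean.
sql_file=$(mktemp)
cat > "$sql_file" <<'EOF'
SELECT CAST('100' AS INT) + 1;
SELECT CONCAT('id-', CAST(42 AS STRING));
SELECT CAST(1 AS BOOLEAN);
EOF

# Run the file only where impala-shell is installed.
if command -v impala-shell >/dev/null 2>&1; then
  impala-shell -f "$sql_file"
fi
```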
If Hue and Impala are installed on the same host, and if you configure Hue Beeswax in CDH 4.1 to execute Impala queries, Beeswax cannot list Hive tables and shows an error on Beeswax startup.
Hue requires Beeswaxd to be running in order to list the Hive tables. Because of a port conflict bug in Hue in CDH4.1, when Hue and Impala are installed on the same host, an error page is displayed when you start the Beeswax application, and when you open the Tables page in Beeswax.
Severity: High
Anticipated Resolution: Fixed in an upcoming CDH4 release
Workarounds: Choose one of the following workarounds (but only one):
- Install Hue and Impala on different hosts.
OR
- Upgrade to CDH4.1.2 and add the following property in the beeswax section of the /etc/hue/hue.ini configuration file:
beeswax_meta_server_only=9004
OR
- If you are using CDH4.1.1 and you want to install Hue and Impala on the same host, change the code in this file:
/usr/share/hue/apps/beeswax/src/beeswax/management/commands/beeswax_server.py
Replace line 66:
str(beeswax.conf.BEESWAX_SERVER_PORT.get()),
With this line:
'8004',
Beeswaxd will then use port 8004.
Note: If you used Cloudera Manager to install Impala, refer to the Cloudera Manager release notes for information about using an equivalent workaround by specifying the beeswax_meta_server_only=9004 configuration value in the options field for Hue. In Cloudera Manager 4, these fields are labelled Safety Valve; in Cloudera Manager 5, they are called Advanced Configuration Snippet.
Impala should tolerate bad locale settings
If the LC_* environment variables specify an unsupported locale, Impala does not start.
Bug: IMPALA-532
Severity: Low
Workaround: Add LC_ALL="C" to the environment settings for both the Impala daemon and the Statestore daemon. See Modifying Impala Startup Options for details about modifying these environment settings.
Resolution: Fixing this issue would require an upgrade to Boost 1.47 in the Impala distribution.
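The workaround amounts to one exported variable. The sketch below shows the lines as they might appear in the environment settings for the daemons; where those settings live depends on the installation (on package-based installs this is assumed to be a file such as /etc/default/impala), so treat the location as an assumption and follow Modifying Impala Startup Options for the actual procedure.

```shell
# Force the portable "C" locale for processes started from this environment.
LC_ALL="C"
export LC_ALL

# Show the value that the impalad and statestored processes would inherit.
echo "LC_ALL=$LC_ALL"
```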
Log Level 3 Not Recommended for Impala
The extensive logging produced by log level 3 can cause serious performance overhead and capacity issues.
Severity: Low
Workaround: Reduce the log level to its default value of 1, that is, GLOG_v=1. See Setting Logging Levels for details about the effects of setting different logging levels.
Issues Fixed in the 1.3.3 Release
Impala 1.3.3 includes fixes to address what is known as the POODLE vulnerability in SSLv3. SSLv3 access is disabled in the Impala debug web UI.
Issues Fixed in the 1.3.2 Release
This backported bug fix is the only change between Impala 1.3.1 and Impala 1.3.2.
Failed DCHECK in disk-io-mgr-reader-context.cc:174
The serious error in the title could occur, with the supplemental message:
num_used_buffers_ < 0: #used=-1 during cancellation HDFS cached data
The issue was due to the use of HDFS caching with data files accessed by Impala. Support for HDFS caching in Impala was introduced in Impala 1.4.0 for CDH 5.1.0. The fix for this issue was backported to Impala 1.3.x, and is the only change in Impala 1.3.2 for CDH 5.0.4.
Bug: IMPALA-1019
Severity: High
Workaround: On CDH 5.0.x, upgrade to CDH 5.0.4 with Impala 1.3.2, where this issue is fixed. In Impala 1.3.0 or 1.3.1 on CDH 5.0.x, do not use HDFS caching for Impala data files in Impala internal or external tables. If some of these data files are cached (for example because they are used by other components that take advantage of HDFS caching), set the query option DISABLE_CACHED_READS=true. To set that option for all Impala queries across all sessions, start impalad with the -default_query_options option and include this setting in the option argument, or on a cluster managed by Cloudera Manager, fill in this option setting on the Impala Daemon options page.
Resolution: This issue is fixed in Impala 1.3.2 for CDH 5.0.4. The addition of HDFS caching support in Impala 1.4 means that this issue does not apply to any new level of Impala on CDH 5.
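For clusters not managed by Cloudera Manager, the startup option from the workaround could be added along these lines. This is a sketch only: the variable name IMPALA_SERVER_ARGS and the existing -log_dir argument are assumptions about how the impalad arguments are assembled on a given system.

```shell
# Hypothetical existing startup arguments for impalad.
IMPALA_SERVER_ARGS="-log_dir=/var/log/impala"

# Append the query option so that it applies to all queries in all sessions.
IMPALA_SERVER_ARGS="$IMPALA_SERVER_ARGS -default_query_options=DISABLE_CACHED_READS=true"
export IMPALA_SERVER_ARGS

echo "$IMPALA_SERVER_ARGS"
```

To change the option for a single session instead, issue SET DISABLE_CACHED_READS=true in impala-shell.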
Issues Fixed in the 1.3.1 Release
This section lists the most significant issues fixed in Impala 1.3.1.
For the full list of fixed issues in Impala 1.3.1, see this report in the JIRA system. Because 1.3.1 is the first 1.3.x release for CDH 4, if you are on CDH 4, also consult Issues Fixed in the 1.3.0 Release.
- Impalad crashes when left joining inline view that has aggregate using distinct
- Incorrect result with group by query with null value in group by data
- Drop Function does not clear local library cache
- Compute stats doesn't propagate underlying error correctly
- Inserts should respect changes in partition location
- Text data with carriage returns generates wrong results for count(*)
- IO Mgr should take instance memory limit into account when creating io buffers
- Impala should provide an option for new sub directories to automatically inherit the permissions of the parent directory
- Illegal state exception (or crash) in query with UNION in inline view
- INSERT column reordering doesn't work with SELECT clause
Impalad crashes when left joining inline view that has aggregate using distinct
Impala could encounter a severe error in a query combining a left outer join with an inline view containing a COUNT(DISTINCT) operation.
Bug: IMPALA-904
Severity: High
Incorrect result with group by query with null value in group by data
If the result of a GROUP BY operation is NULL, the resulting row might be omitted from the result set. This issue depends on the data values and data types in the table.
Bug: IMPALA-901
Severity: High
Drop Function does not clear local library cache
When a UDF is dropped through the DROP FUNCTION statement, and then the UDF is re-created with a new .so library or JAR file, the original version of the UDF is still used when the UDF is called from queries.
Bug: IMPALA-786
Severity: High
Workaround: Restart the impalad daemon on all nodes.
Compute stats doesn't propagate underlying error correctly
If a COMPUTE STATS statement encountered an error, the error message is
Bug: IMPALA-762
Severity: High
Inserts should respect changes in partition location
After an ALTER TABLE statement that changes the LOCATION property of a partition, a subsequent INSERT statement would always use a path derived from the base data directory for the table.
Bug: IMPALA-624
Severity: High
Text data with carriage returns generates wrong results for count(*)
A COUNT(*) operation could return the wrong result for text tables using nul characters (ASCII value 0) as delimiters.
Bug: IMPALA-13
Severity: High
Workaround: Impala adds support for ASCII 0 characters as delimiters through the clause FIELDS TERMINATED BY '\0'.
IO Mgr should take instance memory limit into account when creating io buffers
Impala could allocate more memory than necessary during certain operations.
Bug: IMPALA-488
Severity: High
Workaround: Before issuing a COMPUTE STATS statement for a Parquet table, reduce the number of threads used in that operation by issuing SET NUM_SCANNER_THREADS=2 in impala-shell. Then issue UNSET NUM_SCANNER_THREADS before continuing with queries.
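On releases that still need this workaround, the sequence could be scripted as follows. The table name sales_parquet is hypothetical, and the sketch assumes that SET and UNSET are accepted in a script file the same way they are at the interactive impala-shell prompt.

```shell
# Lower the scanner thread count, gather statistics, then restore the default.
sql_file=$(mktemp)
cat > "$sql_file" <<'EOF'
SET NUM_SCANNER_THREADS=2;
COMPUTE STATS sales_parquet;
UNSET NUM_SCANNER_THREADS;
EOF

# Run the file only where impala-shell is installed.
if command -v impala-shell >/dev/null 2>&1; then
  impala-shell -f "$sql_file"
fi
```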
Impala should provide an option for new sub directories to automatically inherit the permissions of the parent directory
When new subdirectories are created underneath a partitioned table by an INSERT statement, previously the new subdirectories always used the default HDFS permissions for the impala user, which might not be suitable for directories intended to be read and written by other components also.
Bug: IMPALA-827
Severity: High
Resolution: In Impala 1.3.1 and higher, you can specify the --insert_inherit_permissions configuration option when starting the impalad daemon.
Illegal state exception (or crash) in query with UNION in inline view
Impala could encounter a severe error in a query where the FROM list contains an inline view that includes a UNION. The exact type of the error varies.
Bug: IMPALA-888
Severity: High
INSERT column reordering doesn't work with SELECT clause
The ability to specify a subset of columns in an INSERT statement, with order different than in the target table, was not working as intended.
Bug: IMPALA-945
Severity: High
Issues Fixed in the 1.3.0 Release
This section lists the most significant issues fixed in Impala 1.3.0, primarily issues that could cause wrong results, or cause problems running the COMPUTE STATS statement, which is very important for performance and scalability.
For the full list of fixed issues, see this report in the JIRA system.
- Inner join after right join may produce wrong results
- Incorrect results with codegen on multi-column group by with NULLs.
- Using distinct inside aggregate function may cause incorrect result when using having clause
- Aggregation on union inside (inline) view not distributed properly.
- Wrong expression may be used in aggregate query if there are multiple similar expressions
- Incorrect results when changing the order of aggregates in the select list with codegen enabled
- Union queries give Wrong result in a UNION followed by SIGSEGV in another union
- String data in MR-produced parquet files may be read incorrectly
- Compute stats need to use quotes with identifiers that are Impala keywords
- COMPUTE STATS child queries do not inherit parent query options.
- COMPUTE STATS should update partitions in batches
- Fail early (in analysis) when COMPUTE STATS is run against Avro table with no columns
Inner join after right join may produce wrong results
The automatic join reordering optimization could incorrectly reorder queries with an outer join or semi join followed by an inner join, producing incorrect results.
Bug: IMPALA-860
Severity: High
Workaround: Including the STRAIGHT_JOIN keyword in the query prevented the issue from occurring.
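A query of the affected shape, with the keyword added, might look like the sketch below. The tables t1, t2, and t3 and their columns are hypothetical. STRAIGHT_JOIN goes immediately after SELECT and makes Impala join the tables in the order they are listed.

```shell
# An outer join followed by an inner join, with join reordering turned off.
sql_file=$(mktemp)
cat > "$sql_file" <<'EOF'
SELECT STRAIGHT_JOIN t1.id, t2.val, t3.val
FROM t1
  RIGHT OUTER JOIN t2 ON t1.id = t2.id
  INNER JOIN t3 ON t2.id = t3.id;
EOF

# Run the file only where impala-shell is installed.
if command -v impala-shell >/dev/null 2>&1; then
  impala-shell -f "$sql_file"
fi
```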
Incorrect results with codegen on multi-column group by with NULLs.
A query with a GROUP BY clause referencing multiple columns could introduce incorrect NULL values in some columns of the result set. The incorrect NULL values could appear in rows where a different GROUP BY column actually did return NULL.
Bug: IMPALA-850
Severity: High
Using distinct inside aggregate function may cause incorrect result when using having clause
A query could return incorrect results if it combined an aggregate function call, a DISTINCT operator, and a HAVING clause, without a GROUP BY clause.
Bug: IMPALA-845
Severity: High
Aggregation on union inside (inline) view not distributed properly.
An aggregation query or a query with ORDER BY and LIMIT could be executed on a single node in some cases, rather than distributed across the cluster. This issue affected queries whose FROM clause referenced an inline view containing a UNION.
Bug: IMPALA-831
Severity: High
Wrong expression may be used in aggregate query if there are multiple similar expressions
If a GROUP BY query referenced the same columns multiple times using different operators, result rows could contain multiple copies of the same expression.
Bug: IMPALA-817
Severity: High
Incorrect results when changing the order of aggregates in the select list with codegen enabled
Referencing the same columns in both a COUNT() and a SUM() call in the same query, or some other combinations of aggregate function calls, could incorrectly return a result of 0 from one of the aggregate functions. This issue affected references to TINYINT and SMALLINT columns, but not INT or BIGINT columns.
Bug: IMPALA-765
Severity: High
Workaround: Setting the query option DISABLE_CODEGEN=TRUE prevented the incorrect results. Switching the order of the function calls could also prevent the issue from occurring.
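As a sketch of the first workaround, the query option is set in the same session before the affected query runs. The table t_small_ints and its SMALLINT column small_col are hypothetical.

```shell
# Turn off code generation for the session, then run the aggregate query.
sql_file=$(mktemp)
cat > "$sql_file" <<'EOF'
SET DISABLE_CODEGEN=TRUE;
SELECT COUNT(small_col), SUM(small_col) FROM t_small_ints;
EOF

# Run the file only where impala-shell is installed.
if command -v impala-shell >/dev/null 2>&1; then
  impala-shell -f "$sql_file"
fi
```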
Union queries give Wrong result in a UNION followed by SIGSEGV in another union
A UNION query could produce a wrong result, followed by a serious error for a subsequent UNION query.
Bug: IMPALA-723
Severity: High
String data in MR-produced parquet files may be read incorrectly
Impala could return incorrect string results when reading uncompressed Parquet data files containing multiple row groups. This issue only affected Parquet data files produced by MapReduce jobs.
Bug: IMPALA-729
Severity: High
Compute stats need to use quotes with identifiers that are Impala keywords
Using a column or table name that conflicted with Impala keywords could prevent running the COMPUTE STATS statement for the table.
Bug: IMPALA-777
Severity: High
COMPUTE STATS child queries do not inherit parent query options.
The COMPUTE STATS statement did not use the setting of the MEM_LIMIT query option in impala-shell, potentially causing problems gathering statistics for wide Parquet tables.
Bug: IMPALA-903
Severity: High
COMPUTE STATS should update partitions in batches
The COMPUTE STATS statement could be slow or encounter a timeout while analyzing a table with many partitions.
Bug: IMPALA-880
Severity: High
Fail early (in analysis) when COMPUTE STATS is run against Avro table with no columns
If the columns for an Avro table were all defined in the TBLPROPERTIES or SERDEPROPERTIES clauses, the COMPUTE STATS statement would fail after completely analyzing the table, potentially causing a long delay. Although the COMPUTE STATS statement still does not work for such tables, now the problem is detected and reported immediately.
Bug: IMPALA-867
Severity: High
Workaround: Re-create the Avro table with columns defined in SQL style, using the output of SHOW CREATE TABLE. (See the JIRA page for detailed steps.)
Issues Fixed in the 1.2.4 Release
This section lists the most significant issues fixed in Impala 1.2.4. For the full list of fixed issues, see this report in the JIRA system.
- The Catalog Server exits with an OOM error after a certain number of CREATE statements
- Catalog Server consumes excessive cpu cycle
- Query against Avro table crashes Impala with codegen enabled
- Statestore seems to send concurrent heartbeats to the same subscriber leading to repeated "Subscriber 'hostname' is registering with statestore, ignoring update" messages
- Join predicate incorrectly ignored
- Query result differing between Impala and Hive
- ArrayIndexOutOfBoundsException / Invalid query handle when reading large HBase cell
- select with distinct and full outer join, impalad coredump
- Impala cannot load tables with more than Short.MAX_VALUE number of partitions
- Various issues with HBase row key specification
The Catalog Server exits with an OOM error after a certain number of CREATE statements
A large number of concurrent CREATE TABLE statements can cause the catalogd process to consume excessive memory, and potentially be killed due to an out-of-memory condition.
Bug: IMPALA-818
Severity: High
Workaround: Restart the catalogd service and re-try the DDL operations that failed.
Catalog Server consumes excessive cpu cycle
A large number of tables and partitions could result in unnecessary CPU overhead during Impala idle time and background operations.
Bug: IMPALA-821
Severity: High
Resolution: Catalog server processing was optimized in several ways.
Query against Avro table crashes Impala with codegen enabled
A query against a TIMESTAMP column in an Avro table could encounter a serious issue.
Bug: IMPALA-828
Severity: High
Workaround: Set the query option DISABLE_CODEGEN=TRUE
Statestore seems to send concurrent heartbeats to the same subscriber leading to repeated "Subscriber 'hostname' is registering with statestore, ignoring update" messages
Impala nodes could produce repeated error messages after recovering from a communication error with the statestore service.
Bug: IMPALA-809
Severity: High
Join predicate incorrectly ignored
A join query could produce wrong results if multiple equality comparisons between the same tables referred to the same column.
Bug: IMPALA-805
Severity: High
Query result differing between Impala and Hive
Certain outer join queries could return wrong results. If one of the tables involved in the join was an inline view, some tests from the WHERE clauses could be applied to the wrong phase of the query.
Severity: High
ArrayIndexOutOfBoundsException / Invalid query handle when reading large HBase cell
An HBase cell could contain a value larger than 32 KB, leading to a serious error when Impala queries that table. The error could occur even if the applicable row is not part of the result set.
Bug: IMPALA-715
Severity: High
Workaround: Use smaller values in the HBase table, or exclude the column containing the large value from the result set.
select with distinct and full outer join, impalad coredump
A query involving a DISTINCT operator combined with a FULL OUTER JOIN could encounter a serious error.
Bug: IMPALA-735
Severity: High
Workaround: Set the query option DISABLE_CODEGEN=TRUE
Impala cannot load tables with more than Short.MAX_VALUE number of partitions
If a table had more than 32,767 partitions, Impala would not recognize the partitions above the 32K limit and query results could be incomplete.
Bug: IMPALA-749
Severity: High
Various issues with HBase row key specification
Queries against HBase tables could fail with an error if the row key was compared to a function return value rather than a string constant. Also, queries against HBase tables could fail if the WHERE clause contained combinations of comparisons that could not possibly match any row key.
Severity: High
Resolution: Queries now return appropriate results when function calls are used in the row key comparison. For queries involving non-existent row keys, such as WHERE row_key IS NULL or where the lower bound is greater than the upper bound, the query succeeds and returns an empty result set.
Issues Fixed in the 1.2.3 Release
This release is a fix release that supersedes Impala 1.2.2, with the same features and fixes as 1.2.2 plus one additional fix for compatibility with Parquet files generated outside of Impala by components such as Hive, Pig, or MapReduce.
Impala cannot read Parquet files with multiple row groups
The parquet-mr library included with CDH4.5 writes files that are not readable by Impala, due to the presence of multiple row groups.
Queries involving these data files might result in a crash or a failure with an error such as
This issue does not occur for Parquet files produced by Impala INSERT statements, because Impala only produces files with a single row group.
Bug: IMPALA-720
Severity: High
Issues Fixed in the 1.2.2 Release
This section lists the most significant issues fixed in Impala 1.2.2. For the full list of fixed issues, see this report in the JIRA system.
- Order of table references in FROM clause is critical for optimal performance
- Parquet in CDH4.5 writes data files that are sometimes unreadable by Impala
- Deadlock in statestore when unregistering a subscriber and building a topic update
- IllegalStateException when doing a union involving a group by
- Impala Parquet Writer hit DCHECK in RleEncoder
- Hive UDF jars cannot be loaded by the FE
Order of table references in FROM clause is critical for optimal performance
Impala does not currently optimize the join order of queries; instead, it joins tables in the order in which they are listed in the FROM clause. Queries that contain one or more large tables on the right-hand side of joins (either an explicit join expressed with the JOIN keyword or a join implicit in the list of table references in the FROM clause) may run slowly or crash Impala due to out-of-memory errors. For example:
SELECT ... FROM small_table JOIN large_table
Severity: Medium
Anticipated Resolution: Fixed in Impala 1.2.2.
Workaround: In Impala 1.2.2 and higher, use the COMPUTE STATS statement to gather statistics for each table involved in the join query, after data is loaded. Prior to Impala 1.2.2, modify the query, if possible, to join the largest table first. For example:
SELECT ... FROM small_table JOIN large_table
should be modified to:
SELECT ... FROM large_table JOIN small_table
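For Impala 1.2.2 and higher, the statistics-based workaround could be scripted as in the sketch below. The join column id is hypothetical; the table names follow the example above. With statistics in place, the tables can stay in their original order in the FROM clause.

```shell
# Gather statistics for both tables after loading data, then run the join.
sql_file=$(mktemp)
cat > "$sql_file" <<'EOF'
COMPUTE STATS small_table;
COMPUTE STATS large_table;
SELECT COUNT(*) FROM small_table JOIN large_table ON small_table.id = large_table.id;
EOF

# Run the file only where impala-shell is installed.
if command -v impala-shell >/dev/null 2>&1; then
  impala-shell -f "$sql_file"
fi
```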
Parquet in CDH4.5 writes data files that are sometimes unreadable by Impala
Other components could generate some Parquet files that Impala could not read.
Bug: IMPALA-694
Severity: High
Resolution: The underlying issue is being addressed by a fix in the CDH Parquet libraries. Impala 1.2.2 works around the problem and reads the existing data files.
Deadlock in statestore when unregistering a subscriber and building a topic update
The statestore service could experience an internal error leading to a hang.
Bug: IMPALA-699
Severity: High
IllegalStateException when doing a union involving a group by
A UNION query where one side involved a GROUP BY operation could cause a serious error.
Bug: IMPALA-687
Severity: High
Impala Parquet Writer hit DCHECK in RleEncoder
A serious error could occur when doing an INSERT into a Parquet table.
Bug: IMPALA-689
Severity: High
Hive UDF jars cannot be loaded by the FE
If the JAR file for a Java-based Hive UDF was not in the CLASSPATH, the UDF could not be called during a query.
Bug: IMPALA-695
Severity: High
Issues Fixed in the 1.2.1 Release
This section lists the most significant issues fixed in Impala 1.2.1. For the full list of fixed issues, see this report in the JIRA system.
Scanners use too much memory when reading past scan range
While querying a table with long column values, Impala could over-allocate memory leading to an out-of-memory error. This problem was observed most frequently with tables using uncompressed RCFile or text data files.
Bug: IMPALA-525
Severity: High
Resolution: Fixed in 1.2.1
Join node consumes memory way beyond mem-limit
A join query could allocate a temporary work area that was larger than needed, leading to an out-of-memory error. The fix makes Impala return unused memory to the system when the memory limit is reached, avoiding unnecessary memory errors.
Bug: IMPALA-657
Severity: High
Resolution: Fixed in 1.2.1
Excessive memory consumption when query tables with 1k columns (Parquet file)
Impala could encounter an out-of-memory condition setting up work areas for Parquet tables with many columns. The fix reduces the size of the allocated memory when not actually needed to hold table data.
Bug: IMPALA-652
Severity: High
Resolution: Fixed in 1.2.1
Issues Fixed in the 1.2.0 Beta Release
This section lists the most significant issues fixed in Impala 1.2 (beta). For the full list of fixed issues, see this report in the JIRA system.
Issues Fixed in the 1.1.1 Release
This section lists the most significant issues fixed in Impala 1.1.1. For the full list of fixed issues, see this report in the JIRA system.
- Unexpected LLVM Crash When Querying Doubles on CentOS 5.x
- "block size is too big" error with Snappy-compressed RCFile containing null
- Cannot query RC file for table that has more columns than the data file
- Views Sometimes Not Utilizing Partition Pruning
- Update the serde name we write into the metastore for Parquet tables
- Selective queries over large tables produce unnecessary memory consumption
- Impala stopped to query AVRO tables
- Impala continues to allocate more memory even though it has exceed its mem-limit
Unexpected LLVM Crash When Querying Doubles on CentOS 5.x
Certain queries involving DOUBLE columns could fail with a serious error. The fix improves the generation of native machine instructions for certain chipsets.
Bug: IMPALA-477
Severity: High
"block size is too big" error with Snappy-compressed RCFile containing null
Queries could fail with a "block size is too big" error when reading Snappy-compressed RCFile data containing NULL values.
Bug: IMPALA-482
Severity: High
Cannot query RC file for table that has more columns than the data file
Queries could fail if an Impala RCFile table was defined with more columns than in the corresponding RCFile data files.
Bug: IMPALA-510
Severity: High
Views Sometimes Not Utilizing Partition Pruning
Certain combinations of clauses in a view definition for a partitioned table could result in inefficient performance and incorrect results.
Bug: IMPALA-495
Severity: High
Update the serde name we write into the metastore for Parquet tables
The SerDes class string written into Parquet data files created by Impala was updated for compatibility with Parquet support in Hive. See Incompatible Changes Introduced in Cloudera Impala 1.1.1 for the steps to update older Parquet data files for Hive compatibility.
Bug: IMPALA-485
Severity: High
Selective queries over large tables produce unnecessary memory consumption
A query returning a small result set from a large table could tie up memory unnecessarily for the duration of the query.
Bug: IMPALA-534
Severity: High
Impala stopped to query AVRO tables
Queries against Avro tables could fail depending on whether the Avro schema URL was specified in the TBLPROPERTIES or SERDEPROPERTIES field. The fix causes Impala to check both fields for the schema URL.
Bug: IMPALA-538
Severity: High
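The lookup described for IMPALA-538 can be illustrated with a minimal Python sketch. It is not Impala's implementation: the function name find_avro_schema_url and the choice to prefer TBLPROPERTIES when both maps define the key are assumptions made for illustration. The avro.schema.url key is the one used by the Hive Avro SerDe.

```python
from typing import Mapping, Optional

AVRO_SCHEMA_URL_KEY = "avro.schema.url"


def find_avro_schema_url(tbl_properties: Mapping[str, str],
                         serde_properties: Mapping[str, str]) -> Optional[str]:
    """Return the Avro schema URL from whichever property map defines it."""
    # Assumption: TBLPROPERTIES takes precedence if both maps set the key.
    for props in (tbl_properties, serde_properties):
        url = props.get(AVRO_SCHEMA_URL_KEY)
        if url:
            return url
    # Neither map carries a schema URL.
    return None
```

Checking both maps means a table is readable regardless of which clause the CREATE TABLE statement used to record the schema location.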
Impala continues to allocate more memory even though it has exceed its mem-limit
Queries could allocate substantially more memory than specified in the impalad -mem_limit startup option. The fix causes more frequent checking of the limit during query execution.
Bug: IMPALA-520
Severity: High
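The effect of checking the limit more often (IMPALA-520) can be sketched with a toy memory tracker in Python. The class and method names are hypothetical and do not mirror Impala internals. The point is only that a check at every allocation stops usage at the limit, whereas infrequent checks let it overshoot.

```python
class MemLimitExceeded(Exception):
    """Raised when an allocation would push usage past the limit."""


class MemTracker:
    """Toy tracker that validates the limit on every allocation."""

    def __init__(self, limit_bytes: int) -> None:
        self.limit_bytes = limit_bytes
        self.used_bytes = 0

    def consume(self, num_bytes: int) -> None:
        # Check before reserving, so usage never exceeds the limit.
        if self.used_bytes + num_bytes > self.limit_bytes:
            raise MemLimitExceeded(
                f"{self.used_bytes + num_bytes} > {self.limit_bytes}")
        self.used_bytes += num_bytes

    def release(self, num_bytes: int) -> None:
        # Return memory to the pool; never drop below zero.
        self.used_bytes = max(0, self.used_bytes - num_bytes)
```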
Issues Fixed in the 1.1.0 Release
This section lists the most significant issues fixed in Impala 1.1. For the full list of fixed issues, see this report in the JIRA system.
- 10-20% perf regression for most queries across all table formats
- planner fails with "Join requires at least one equality predicate between the two tables" when "from" table order does not match "where" join order
- Parquet writer uses excessive memory with partitions
- Comments in impala-shell in interactive mode are not handled properly causing syntax errors or wrong results
- Cancelled queries sometimes aren't removed from the inflight query list
- Impala's 1.0.1 Shell Broke Python 2.4 Compatibility (AttributeError: 'module' object has no attribute 'field_size_limit)
10-20% perf regression for most queries across all table formats
This issue is due to a performance tradeoff between systems running many queries concurrently, and systems running a single query. Systems running only a single query could experience lower performance than in early beta releases. Systems running many queries simultaneously should experience higher performance than in the beta releases.
Severity: High
planner fails with "Join requires at least one equality predicate between the two tables" when "from" table order does not match "where" join order
A query could fail if it involved 3 or more tables and the last join table was specified as a subquery.
Bug: IMPALA-85
Severity: High
Parquet writer uses excessive memory with partitions
INSERT statements against partitioned tables using the Parquet format could use excessive amounts of memory as the number of partitions grew large.
Bug: IMPALA-257
Severity: High
Comments in impala-shell in interactive mode are not handled properly causing syntax errors or wrong results
The impala-shell interpreter did not accept comments entered at the command line, making it problematic to copy and paste from scripts or other code examples.
Bug: IMPALA-192
Severity: Low
Cancelled queries sometimes aren't removed from the inflight query list
The Impala web UI would sometimes display a query as if it were still running, after the query was cancelled.
Bug: IMPALA-364
Severity: High
Impala's 1.0.1 Shell Broke Python 2.4 Compatibility (AttributeError: 'module' object has no attribute 'field_size_limit)
The impala-shell command in Impala 1.0.1 does not work with Python 2.4, which is the default on Red Hat 5.
For the impala-shell command in Impala 1.0, the -o option (pipe output to a file) does not work with Python 2.4.
Bug: IMPALA-396
Severity: High
Issues Fixed in the 1.0.1 Release
This section lists the most significant issues fixed in Impala 1.0.1. For the full list of fixed issues, see this report in the JIRA system.
- Impala parquet scanner can not read all data files generated by other frameworks
- Impala is unable to query RCFile tables which describe fewer columns than the file's header.
- Impala does not correctly substitute _HOST with hostname in --principal
- HBase query missed the last region
- Hbase region changes are not handled correctly
- Query state for successful create table is EXCEPTION
- Double check release of JNI-allocated byte-strings
- Impala returns 0 for bad time values in UNIX_TIMESTAMP, Hive returns NULL
- INSERT INTO TABLE SELECT <constant> does not work.
Impala parquet scanner can not read all data files generated by other frameworks
Impala might issue an erroneous error message when processing a Parquet data file produced by a non-Impala Hadoop component.
Bug: IMPALA-333
Severity: High
Resolution: Fixed
Impala is unable to query RCFile tables which describe fewer columns than the file's header.
If an RCFile table definition had fewer columns than the fields actually in the data files, queries would fail.
Bug: IMPALA-293
Severity: High
Resolution: Fixed
Impala does not correctly substitute _HOST with hostname in --principal
The _HOST placeholder in the --principal startup option was not substituted with the correct hostname, potentially leading to a startup error in setups using Kerberos authentication.
Bug: IMPALA-351
Severity: High
Resolution: Fixed
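The _HOST substitution described for IMPALA-351 can be sketched in Python as follows. This is an illustration, not Impala's code: the function name substitute_host is hypothetical, and lowercasing the hostname is an assumption borrowed from common Hadoop practice.

```python
import socket
from typing import Optional

HOST_PLACEHOLDER = "_HOST"


def substitute_host(principal: str, hostname: Optional[str] = None) -> str:
    """Replace the _HOST instance part of service/_HOST@REALM."""
    if hostname is None:
        # Default to this machine's fully qualified domain name.
        hostname = socket.getfqdn()
    service, slash, rest = principal.partition("/")
    if not slash:
        # No instance component, so there is nothing to substitute.
        return principal
    instance, at, realm = rest.partition("@")
    if instance == HOST_PLACEHOLDER:
        # Assumption: hostnames in principals are lowercased.
        instance = hostname.lower()
    return service + slash + instance + at + realm
```

With this approach the same --principal value, for example impala/_HOST@EXAMPLE.COM, can be deployed unchanged to every node in the cluster.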
HBase query missed the last region
A query for an HBase table could omit data from the last region.
Bug: IMPALA-356
Severity: High
Resolution: Fixed
Hbase region changes are not handled correctly
After a region in an HBase table was split or moved, an Impala query might return incomplete or out-of-date results.
Bug: IMPALA-300
Severity: High
Resolution: Fixed
Query state for successful create table is EXCEPTION
After a successful CREATE TABLE statement, the corresponding query state would be incorrectly reported as EXCEPTION.
Bug: IMPALA-349
Severity: High
Resolution: Fixed
Double check release of JNI-allocated byte-strings
Operations involving calls to the Java JNI subsystem (for example, queries on HBase tables) could allocate memory but not release it.
Bug: IMPALA-358
Severity: High
Resolution: Fixed
Impala returns 0 for bad time values in UNIX_TIMESTAMP, Hive returns NULL
For a malformed time value, the Impala UNIX_TIMESTAMP function returned 0, whereas Hive returns NULL.
Impala:
impala> select UNIX_TIMESTAMP('10:02:01');
0
Hive:
hive> select UNIX_TIMESTAMP('10:02:01') FROM tmp;
NULL
Bug: IMPALA-16
Severity: Low
Anticipated Resolution: Fixed
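The Hive-compatible behavior for IMPALA-16 (NULL rather than 0 for an unparseable value) can be modeled in Python. This is a sketch under stated assumptions: the function name is hypothetical, only the default 'yyyy-MM-dd HH:mm:ss' layout is handled, and the value is interpreted as UTC.

```python
import calendar
import time
from typing import Optional


def unix_timestamp(value: str) -> Optional[int]:
    """Return seconds since the epoch, or None (SQL NULL) if unparseable."""
    try:
        parsed = time.strptime(value, "%Y-%m-%d %H:%M:%S")
    except ValueError:
        # A bad value yields NULL instead of a misleading 0.
        return None
    # Assumption: the input is treated as UTC.
    return calendar.timegm(parsed)
```

Returning None keeps a bad input distinguishable from the legitimate timestamp 0 (1970-01-01 00:00:00).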
INSERT INTO TABLE SELECT <constant> does not work.
INSERT INTO TABLE SELECT <constant> did not insert any data and could return an error.
Severity: Low
Anticipated Resolution: Fixed
Issues Fixed in the 1.0 GA Release
Here are the major user-visible issues fixed in Impala 1.0. For a full list of fixed issues, see this report in the public issue tracker.
- Undeterministically receive "ERROR: unknown row bach destination..." and "ERROR: Invalid query handle" from impala shell when running union query
- Insert with NULL partition keys results in SIGSEGV.
- INSERT queries don't show completed profiles on the debug webpage
- Impala HBase scan is very slow
- Add some library version validation logic to impalad when loading impala-lzo shared library
- Problems inserting into tables with TIMESTAMP partition columns leading table metadata loading failures and failed dchecks
- Ctrl-C sometimes interrupts shell in system call, rather than cancelling query
- Empty string partition value causes metastore update failure
- Round() does not output the right precision
- Cannot cast string literal to string
- Excessive mem usage for certain queries which are very selective
- HdfsScanNode crashes in UpdateCounters
- Parquet performance issues on large dataset
- impala not populating hive metadata correctly for create table
- impala daemons die if statestore goes down
- Constant SELECT clauses do not work in subqueries
- Right outer Join includes NULLs as well and hence wrong result count
- Parquet scanner hangs for some queries
Undeterministically receive "ERROR: unknown row bach destination..." and "ERROR: Invalid query handle" from impala shell when running union query
A query containing both UNION and LIMIT clauses could intermittently cause the impalad process to halt with a segmentation fault.
Bug: IMPALA-183
Severity: High
Resolution: Fixed
Insert with NULL partition keys results in SIGSEGV.
An INSERT statement specifying a NULL value for one of the partitioning columns could cause the impalad process to halt with a segmentation fault.
Bug: IMPALA-190
Severity: High
Resolution: Fixed
INSERT queries don't show completed profiles on the debug webpage
In the Impala web user interface, the profile page for an INSERT statement showed obsolete information for the statement once it was complete.
Bug: IMPALA-217
Severity: High
Resolution: Fixed
Impala HBase scan is very slow
Queries involving an HBase table could be slower than expected, due to excessive memory usage on the Impala nodes.
Bug: IMPALA-231
Severity: High
Resolution: Fixed
Add some library version validation logic to impalad when loading impala-lzo shared library
No validation was done to check that the impala-lzo shared library was compatible with the version of Impala, possibly leading to a crash when using LZO-compressed text files.
Bug: IMPALA-234
Severity: High
Resolution: Fixed
Workaround: Always upgrade the impala-lzo library at the same time as you upgrade Impala itself.
Problems inserting into tables with TIMESTAMP partition columns leading table metadata loading failures and failed dchecks
INSERT statements for tables partitioned on columns involving datetime types could appear to succeed, but cause errors for subsequent queries on those tables. The problem was especially serious if an improperly formatted timestamp value was specified for the partition key.
Bug: IMPALA-238
Severity: Critical
Resolution: Fixed
Ctrl-C sometimes interrupts shell in system call, rather than cancelling query
Pressing Ctrl-C in the impala-shell interpreter could sometimes display an error and return control to the shell, making it impossible to cancel the query.
Bug: IMPALA-243
Severity: Critical
Resolution: Fixed
Empty string partition value causes metastore update failure
Specifying an empty string or NULL for a partition key in an INSERT statement would fail.
Bug: IMPALA-252
Severity: High
Resolution: Fixed. The behavior for empty partition keys was made more compatible with the corresponding Hive behavior.
Round() does not output the right precision
The round() function did not always return the correct number of significant digits.
Bug: IMPALA-266
Severity: High
Resolution: Fixed
Cannot cast string literal to string
Casting from a string literal back to the same type would cause an error.
Bug: IMPALA-267
Severity: High
Resolution: Fixed
Excessive mem usage for certain queries which are very selective
Some queries that returned very few rows experienced unnecessary memory usage.
Bug: IMPALA-288
Severity: High
Resolution: Fixed
HdfsScanNode crashes in UpdateCounters
A serious error could occur for relatively small and inexpensive queries.
Bug: IMPALA-289
Severity: High
Resolution: Fixed
Parquet performance issues on large dataset
Certain aggregation queries against Parquet tables were inefficient due to lower than required thread utilization.
Bug: IMPALA-292
Severity: High
Resolution: Fixed
impala not populating hive metadata correctly for create table
The Impala CREATE TABLE command did not fill in the owner and tbl_type columns in the Hive metastore database.
Bug: IMPALA-295
Severity: High
Resolution: Fixed. The metadata was made more Hive-compatible.
impala daemons die if statestore goes down
The impalad instances in a cluster could halt when the statestored process became unavailable.
Bug: IMPALA-312
Severity: High
Resolution: Fixed
Constant SELECT clauses do not work in subqueries
A subquery would fail if the SELECT statement inside it returned a constant value rather than querying a table.
Bug: IMPALA-67
Severity: High
Resolution: Fixed
Right outer Join includes NULLs as well and hence wrong result count
The result set from a right outer join query could include erroneous rows containing NULL values.
Bug: IMPALA-90
Severity: High
Resolution: Fixed
Parquet scanner hangs for some queries
The Parquet scanner non-deterministically hangs when executing some queries.
Bug: IMPALA-204
Severity: Medium
Resolution: Fixed
Issues Fixed in Version 0.7 of the Beta Release
Impala does not gracefully handle unsupported Hive table types (INDEX and VIEW tables)
When attempting to load metadata from an unsupported Hive table type (INDEX and VIEW tables), Impala fails with an unclear error message.
Bug: IMPALA-167
Severity: Low
Resolution: Fixed in 0.7
DDL statements (CREATE/ALTER/DROP TABLE) are not supported in the Impala Beta Release
Severity: Medium
Resolution: Fixed in 0.7
Avro is not supported in the Impala Beta Release
Severity: Medium
Resolution: Fixed in 0.7
Workaround: None
Impala does not currently allow limiting the memory consumption of a single query
It is currently not possible to limit the memory consumption of a single query. All tables on the right hand side of JOIN statements need to be able to fit in memory. If they do not, Impala may crash due to out of memory errors.
Severity: High
Resolution: Fixed in 0.7
Aggregate of a subquery result set returns wrong results if the subquery contains a 'limit' and data is distributed across multiple nodes
Aggregate of a subquery result set returns wrong results if the subquery contains a 'limit' clause and data is distributed across multiple nodes. From the query plan, it looks like we are just summing the results from each slave.
Bug: IMPALA-20
Severity: Low
Resolution: Fixed in 0.7
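The wrong result described for IMPALA-20 can be reproduced with a small Python simulation of a query such as SELECT count(*) FROM (SELECT ... LIMIT n) t. The function names are hypothetical and the lists stand in for per-node row sets. This is not the Impala planner, only a model of why the limit has to be applied again after merging.

```python
from itertools import chain, islice
from typing import List, Sequence


def naive_count(node_rows: Sequence[List[int]], limit: int) -> int:
    """Buggy plan: each node applies LIMIT locally, results are summed."""
    return sum(len(rows[:limit]) for rows in node_rows)


def correct_count(node_rows: Sequence[List[int]], limit: int) -> int:
    """Correct plan: merge the locally limited rows, then apply LIMIT again."""
    local = (rows[:limit] for rows in node_rows)
    merged = islice(chain.from_iterable(local), limit)
    return sum(1 for _ in merged)
```

With three nodes holding three rows each and a limit of 2, the naive plan counts 6 rows while the correct plan counts 2.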
Partition pruning for arbitrary predicates that are fully bound by a particular partition column
We currently can't utilize a predicate like "country_code in ('DE', 'FR', 'US')" to do partition pruning, because that requires an equality predicate or a binary comparison.
We should create a superclass of planner.ValueRange, ValueSet, that can be constructed with an arbitrary predicate, and whose isInRange(analyzer, valueExpr) constructs a literal predicate by substitution of the valueExpr into the predicate.
Bug: IMPALA-144
Severity: Medium
Resolution: Fixed in 0.7
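The idea behind IMPALA-144, evaluating an arbitrary predicate against each partition value by substitution, can be sketched in Python. The names Partition and prune_partitions are hypothetical, and a Python callable stands in for the literal predicate the planner would construct. This is not the planner.ValueRange code.

```python
from typing import Callable, Dict, List

Partition = Dict[str, str]  # partition column name -> partition key value


def prune_partitions(partitions: List[Partition],
                     column: str,
                     predicate: Callable[[str], bool]) -> List[Partition]:
    """Keep only partitions whose key value satisfies the predicate."""
    # Substitute each partition's key value into the predicate and
    # discard the partition when the result is false.
    return [p for p in partitions if predicate(p[column])]


# Example: country_code IN ('DE', 'FR', 'US')
wanted = {"DE", "FR", "US"}
all_partitions = [{"country_code": c} for c in ("BR", "DE", "FR", "JP", "US")]
kept = prune_partitions(all_partitions, "country_code", lambda v: v in wanted)
```

Because the predicate references only the partition column, it can be fully evaluated from metadata without reading any data files.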
Issues Fixed in Version 0.6 of the Beta Release
Impala reads the NameNode address and port as command line parameters
Impala reads the NameNode address and port as command line parameters rather than reading them from core-site.xml. Updating the NameNode address in the core-site.xml file does not propagate to Impala.
Severity: Low
Resolution: Fixed in 0.6 - Impala reads the namenode location and port from the Hadoop configuration files, though setting -nn and -nn_port overrides this. Users are advised not to set -nn or -nn_port.
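The resolution order described above (Hadoop configuration first, explicit -nn and -nn_port as overrides) can be sketched in Python. The function name is hypothetical and this is not Impala's code. fs.defaultFS and the older fs.default.name are the Hadoop property names for the default file system URI.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def namenode_from_core_site(xml_text, nn=None, nn_port=None):
    """Return (host, port) for the NameNode, honoring explicit overrides."""
    props = {}
    # Collect <property><name>..</name><value>..</value></property> pairs.
    for prop in ET.fromstring(xml_text).iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    uri = urlparse(props.get("fs.defaultFS")
                   or props.get("fs.default.name")
                   or "")
    # Explicit command-line values win over the configuration file.
    host = nn if nn is not None else uri.hostname
    port = nn_port if nn_port is not None else uri.port
    return host, port
```

Leaving the overrides unset lets a change to core-site.xml take effect without editing any Impala startup options.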
Queries may fail on secure environment due to impalad Kerberos ticket expiration
Queries may fail in a secure environment because the impalad Kerberos tickets expire. This can happen if the Impala -kerberos_reinit_interval flag is set to a value of ten minutes or less, which may lead to an impalad requesting a ticket with a lifetime that is less than the time to the next ticket renewal.
Bug: IMPALA-64
Severity: Medium
Resolution: Fixed in 0.6
Concurrent queries may fail when Impala uses Thrift to communicate with the Hive Metastore
Concurrent queries may fail when Impala uses Thrift to communicate with part of the Hive Metastore, such as the Hive Metastore Service. In such a case, the error "get_fields failed: out of sequence response" may occur because Impala shared a single Hive Metastore Client connection across threads. With Impala 0.6, a separate connection is used for each metadata request.
Bug: IMPALA-48
Severity: Low
Resolution: Fixed in 0.6
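The fix for IMPALA-48, one connection per metadata request instead of one shared across threads, can be illustrated with a Python sketch. FakeMetastoreClient is a stand-in invented for this example. It is not the Hive Metastore Thrift client, and the helper names are hypothetical.

```python
import itertools
import threading


class FakeMetastoreClient:
    """Stand-in for a non-thread-safe client; each instance is one connection."""
    _ids = itertools.count(1)
    _ids_lock = threading.Lock()

    def __init__(self):
        with FakeMetastoreClient._ids_lock:
            self.connection_id = next(FakeMetastoreClient._ids)

    def get_fields(self, table):
        return (self.connection_id, table)

    def close(self):
        pass


def fetch_fields(table):
    # Open a dedicated connection for this request so that responses
    # from concurrent requests can never be interleaved.
    client = FakeMetastoreClient()
    try:
        return client.get_fields(table)
    finally:
        client.close()


def fetch_concurrently(tables):
    results = [None] * len(tables)

    def worker(index, table):
        results[index] = fetch_fields(table)

    threads = [threading.Thread(target=worker, args=(i, t))
               for i, t in enumerate(tables)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

The trade-off is extra connection setup per request in exchange for never sharing protocol state between threads.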
impalad fails to start if unable to connect to the Hive Metastore
Impala fails to start if it is unable to establish a connection with the Hive Metastore. This behavior was fixed, allowing Impala to start, even when no Metastore is available.
Bug: IMPALA-58
Severity: Low
Resolution: Fixed in 0.6
Impala treats database names as case-sensitive in some contexts
In some queries (including "USE database" statements), database names are treated as case-sensitive. This may lead queries to fail with an IllegalStateException.
Bug: IMPALA-44
Severity: Medium
Resolution: Fixed in 0.6
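Case-insensitive database lookup, the behavior expected after the IMPALA-44 fix, can be sketched in Python by normalizing names before they reach the lookup table. The Catalog class is hypothetical and only illustrates the normalization step.

```python
class Catalog:
    """Toy catalog that treats database names as case-insensitive."""

    def __init__(self):
        self._databases = set()

    def add_database(self, name: str) -> None:
        # Store the normalized (lowercase) form only.
        self._databases.add(name.lower())

    def use(self, name: str) -> str:
        key = name.lower()
        if key not in self._databases:
            raise KeyError(f"Database does not exist: {name}")
        return key
```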
Impala does not ignore hidden HDFS files
Impala does not ignore hidden HDFS files, meaning those files prefixed with a period '.' or underscore '_'. This diverges from Hive/MapReduce, which skips these files.
Bug: IMPALA-18
Severity: Low
Resolution: Fixed in 0.6
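The hidden-file rule described for IMPALA-18 can be expressed as a short Python filter. The function names are hypothetical. The rule itself, skipping names that start with a period or an underscore, comes from the description above.

```python
import posixpath
from typing import Iterable, List


def is_hidden(path: str) -> bool:
    """True if the last path component starts with '.' or '_'."""
    name = posixpath.basename(path.rstrip("/"))
    return name.startswith((".", "_"))


def visible_files(paths: Iterable[str]) -> List[str]:
    """Return only the paths a scan should read."""
    return [p for p in paths if not is_hidden(p)]
```

This keeps marker and temporary files such as _SUCCESS or .tmp out of query results.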
Issues Fixed in Version 0.5 of the Beta Release
Impala may have reduced performance on tables that contain a large number of partitions
Impala may have reduced performance on tables that contain a large number of partitions. This is due to extra overhead reading/parsing the partition metadata.
Severity: High
Resolution: Fixed in 0.5
Backend client connections not getting cached causes an observable latency in secure clusters
Backend impalads do not cache connections to the coordinator. On a secure cluster, this introduces a latency proportional to the number of backend clients involved in query execution, as the cost of establishing a secure connection is much higher than in the non-secure case.
Bug: IMPALA-38
Severity: Medium
Resolution: Fixed in 0.5
Concurrent queries may fail with error: "Table object has not been been initialised : `PARTITIONS`"
Concurrent queries may fail with error: "Table object has not been been initialised : `PARTITIONS`". This was due to a lack of locking in the Impala table/database metadata cache.
Bug: IMPALA-30
Severity: Medium
Resolution: Fixed in 0.5
UNIX_TIMESTAMP format behaviour deviates from Hive when format matches a prefix of the time value
The Impala UNIX_TIMESTAMP(val, format) operation compares the length of format and val and returns NULL if they do not match. Hive instead effectively truncates val to the length of the format parameter.
Bug: IMPALA-15
Severity: Medium
Resolution: Fixed in 0.5
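The Hive-style prefix behavior described for IMPALA-15 can be modeled in Python: the value is truncated to the length of the format before parsing. This is a sketch under assumptions. The function name is hypothetical, only the fixed-width tokens yyyy, MM, dd, HH, mm, and ss are supported, and the result is computed as UTC.

```python
import calendar
import re
import time
from typing import Optional

# Java-style tokens mapped to their strptime equivalents.
_TOKENS = {"yyyy": "%Y", "MM": "%m", "dd": "%d",
           "HH": "%H", "mm": "%M", "ss": "%S"}


def unix_timestamp_prefix(value: str, java_format: str) -> Optional[int]:
    """Parse only the prefix of value that the format describes."""
    # Each supported token is as wide as the text it matches, so the
    # format length equals the length of the prefix to keep.
    prefix = value[:len(java_format)]
    py_format = re.sub("yyyy|MM|dd|HH|mm|ss",
                       lambda m: _TOKENS[m.group(0)], java_format)
    try:
        parsed = time.strptime(prefix, py_format)
    except ValueError:
        return None  # SQL NULL for values that do not match
    return calendar.timegm(parsed)
```

Under this model a call with '2013-01-02 10:20:30' and 'yyyy-MM-dd' yields the timestamp for midnight on 2013-01-02 rather than NULL.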
Issues Fixed in Version 0.4 of the Beta Release
Impala fails to refresh the Hive metastore if a Hive temporary configuration file is removed
Impala is impacted by Hive bug HIVE-3596 which may cause metastore refreshes to fail if a Hive temporary configuration file is deleted (normally located at /tmp/hive-<user>-<tmp_number>.xml). Additionally, the impala-shell will incorrectly report that the failed metadata refresh completed successfully.
Severity: Medium
Anticipated Resolution: To be fixed in a future release
Workaround: Restart the impalad service. Use the impalad log to check for metadata refresh errors.
lpad/rpad builtin functions is not correct.
The lpad/rpad builtin functions generate the wrong results.
Severity: Mild
Resolution: Fixed in 0.4
Files with .gz extension reported as 'not supported'
Compressed files with the .gz extension incorrectly generated an exception reporting them as not supported.
Bug: IMPALA-14
Severity: High
Resolution: Fixed in 0.4
Queries with large limits would hang.
Some queries with large limits were hanging.
Severity: High
Resolution: Fixed in 0.4
Order by on a string column produces incorrect results if there are empty strings
Severity: Low
Resolution: Fixed in 0.4
Issues Fixed in Version 0.3 of the Beta Release
All table loading errors show as unknown table
If Impala is unable to load the metadata for a table for any reason, a subsequent query referring to that table will return an unknown table error message, even if the table is known.
Severity: Mild
Resolution: Fixed in 0.3
A table that cannot be loaded will disappear from SHOW TABLES
After failing to load metadata for a table, Impala removes that table from the list of known tables returned in SHOW TABLES. Subsequent attempts to query the table return 'unknown table', even if the metadata for that table is fixed.
Severity: Mild
Resolution: Fixed in 0.3
Impala cannot read from HBase tables that are not created as external tables in the hive metastore.
Attempting to select from these tables fails.
Severity: Medium
Resolution: Fixed in 0.3
Certain queries that contain OUTER JOINs may return incorrect results
Queries that contain OUTER JOINs may not return the correct results if there are predicates referencing any of the joined tables in the WHERE clause.
Severity: Medium
Resolution: Fixed in 0.3.
Issues Fixed in Version 0.2 of the Beta Release
Subqueries which contain aggregates cannot be joined with other tables or Impala may crash
Subqueries that contain an aggregate cannot be joined with another table or Impala may crash. For example:
SELECT * FROM (SELECT sum(col1) FROM some_table GROUP BY col1) t1 JOIN other_table ON (...);
Severity: Medium
Resolution: Fixed in 0.2
An insert with a limit that runs as more than one query fragment inserts more rows than the limit.
For example:
INSERT OVERWRITE TABLE test SELECT * FROM test2 LIMIT 1;
Severity: Medium
Resolution: Fixed in 0.2
Query with limit clause might fail.
For example:
SELECT * FROM test2 LIMIT 1;
Severity: Medium
Resolution: Fixed in 0.2
Files in unsupported compression formats are read as plain text.
Attempting to read such files does not generate a diagnostic.
Severity: Medium
Resolution: Fixed in 0.2
Impala server raises a null pointer exception when running an HBase query.
When querying an HBase table whose row-key is string type, the Impala server may raise a null pointer exception.
Severity: Medium
Resolution: Fixed in 0.2