Fixed Issues in Apache Impala
The following sections describe the major issues fixed in each Impala release.
For known issues that are currently unresolved, see Apache Impala Known Issues.
Continue reading:
- Issues Fixed in Impala for CDH 5.16.2
- Issues Fixed in Impala for CDH 5.16.1
- Issues Fixed in Impala for CDH 5.15.2
- Issues Fixed in Impala for CDH 5.15.1
- Issues Fixed in Impala for CDH 5.15.0
- Issues Fixed in Impala for CDH 5.14.4
- Issues Fixed in Impala for CDH 5.14.2
- Issues Fixed in Impala for CDH 5.14.0
- Issues Fixed in Impala for CDH 5.13.3
- Issues Fixed in Impala for CDH 5.13.2
- Issues Fixed in Impala for CDH 5.13.1
- Issues Fixed in Impala for CDH 5.13.0
- Issues Fixed in Impala for CDH 5.12.2
- Issues Fixed in Impala for CDH 5.12.1
- Issues Fixed in Impala for CDH 5.12.0
- Issues Fixed in Impala for CDH 5.11.2
- Issues Fixed in Impala for CDH 5.11.1
- Issues Fixed in Impala for CDH 5.11.0
- Issues Fixed in Impala for CDH 5.10.2
- Issues Fixed in Impala for CDH 5.10.1
- Issues Fixed in Impala for CDH 5.10.0
- Issues Fixed in Impala for CDH 5.9.3
- Issues Fixed in Impala for CDH 5.9.2
- Issues Fixed in Impala for CDH 5.9.1
- Issues Fixed in Impala for CDH 5.9.0
- Issues Fixed in Impala for CDH 5.8.5
- Issues Fixed in Impala for CDH 5.8.4
- Issues Fixed in Impala for CDH 5.8.3
- Issues Fixed in Impala for CDH 5.8.2
- Issues Fixed in Impala for CDH 5.8.0
- Issues Fixed in Impala for CDH 5.7.6
- Issues Fixed in Impala for CDH 5.7.5
- Issues Fixed in Impala for CDH 5.7.4
- Issues Fixed in Impala for CDH 5.7.2
- Issues Fixed in Impala for CDH 5.7.1
- Issues Fixed in Impala for CDH 5.7.0
- Issues Fixed in Impala for CDH 5.6.1
- Issues Fixed in Impala for CDH 5.6.0
- Issues Fixed in Impala for CDH 5.5.6
- Issues Fixed in Impala for CDH 5.5.4
- Issues Fixed in Impala for CDH 5.5.2
- Issues Fixed in Impala for CDH 5.5.1
- Issues Fixed in Impala for CDH 5.5.0
- Issues Fixed in Impala for CDH 5.4.10
- Issues Fixed in Impala for CDH 5.4.9
- Issues Fixed in Impala for CDH 5.4.8
- Issues Fixed in Impala for CDH 5.4.7
- Issues Fixed in Impala for CDH 5.4.5
- Issues Fixed in Impala for CDH 5.4.3
- Issues Fixed in Impala for CDH 5.4.1
- Issues Fixed in CDH 5.4 / Impala 2.2
- Issues Fixed in Impala for CDH 5.3.10
- Issues Fixed in the 2.1.7 Release / CDH 5.3.9
- Issues Fixed in the 2.1.6 Release / CDH 5.3.8
- Issues Fixed in the 2.1.5 Release / CDH 5.3.6
- Issues Fixed in the 2.1.4 Release / CDH 5.3.4
- Issues Fixed in the 2.1.3 Release / CDH 5.3.3
- Issues Fixed in the 2.1.2 Release / CDH 5.3.2
- Issues Fixed in the 2.1.1 Release / CDH 5.3.1
- Issues Fixed in the 2.1.0 Release / CDH 5.3.0
- Issues Fixed in the 2.0.5 Release / CDH 5.2.6
- Issues Fixed in the 2.0.4 Release / CDH 5.2.5
- Issues Fixed in the 2.0.3 Release / CDH 5.2.4
- Issues Fixed in the 2.0.2 Release / CDH 5.2.3
- Issues Fixed in the 2.0.1 Release / CDH 5.2.1
- Issues Fixed in the 2.0.0 Release / CDH 5.2.0
- Issues Fixed in the 1.4.4 Release / CDH 5.1.5
- Issues Fixed in the 1.4.3 Release / CDH 5.1.4
- Issues Fixed in the 1.4.2 Release / CDH 5.1.3
- Issues Fixed in the 1.4.1 Release / CDH 5.1.2
- Issues Fixed in the 1.4.0 Release / CDH 5.1.0
- Issues Fixed in the 1.3.3 Release / CDH 5.0.5
- Issues Fixed in the 1.3.2 Release / CDH 5.0.4
- Issues Fixed in the 1.3.1 Release / CDH 5.0.3
- Issues Fixed in the 1.3.0 Release / CDH 5.0.0
- Issues Fixed in the 1.2.4 Release
- Issues Fixed in the 1.2.3 Release
- Issues Fixed in the 1.2.2 Release
- Issues Fixed in the 1.2.1 Release
- Issues Fixed in the 1.2.0 Beta Release
- Issues Fixed in the 1.1.1 Release
- Issues Fixed in the 1.1.0 Release
- Issues Fixed in the 1.0.1 Release
- Issues Fixed in the 1.0 GA Release
- Issues Fixed in Version 0.7 of the Beta Release
- Issues Fixed in Version 0.6 of the Beta Release
- Issues Fixed in Version 0.5 of the Beta Release
- Issues Fixed in Version 0.4 of the Beta Release
- Issues Fixed in Version 0.3 of the Beta Release
- Issues Fixed in Version 0.2 of the Beta Release
Issues Fixed in Impala for CDH 5.16.2
For the full list of fixed issues for all CDH components in CDH 5.16.2, see Issues Fixed in CDH 5.16.x. The following list represents the subset of fixed Impala JIRAs from the CDH fixed issues.
- IMPALA-6323 - Impala now supports a constant in the window specifications.
- IMPALA-7960 - Impala now returns a correct result when comparing a TIMESTAMP to a string literal in a binary predicate where the TIMESTAMP is cast to a VARCHAR of smaller length.
- IMPALA-7961 - Fixed an issue where queries running with the SYNC_DDL query option could fail when the Catalog Server was under heavy load from concurrent catalog operations of long-running DDL statements.
- IMPALA-8058 - Fixed cardinality estimates for HBase queries, which could sometimes be extremely high.
- IMPALA-8109 - Impala can now read gzip files larger than 2 GB.
- IMPALA-8212 - Fixed a race condition in the Kerberos authentication code.
Issues Fixed in Impala for CDH 5.16.1
For the full list of fixed issues for all CDH components in CDH 5.16.1, see Issues Fixed in CDH 5.16.x. The following list represents the subset of fixed Impala JIRAs from the CDH fixed issues.
- IMPALA-6086 - Require the SELECT privilege on the database for built-in function calls.
- IMPALA-6451 - Fixed the AuthorizationException in CTAS for Kudu tables.
- IMPALA-6479 - DESCRIBE now respects column level privileges and only shows the columns that the user has the privilege to view.
- IMPALA-6571 - Fixed the NullPointerException in SHOW CREATE TABLE for HBase tables.
- IMPALA-7225 - REFRESH...PARTITION no longer resets the number of rows in a partition.
- IMPALA-7272 - Fixed the crash in StringMinMaxFilter.
- IMPALA-7360 - Fixed an issue where the Avro scanner sometimes skipped blocks when the skip marker was on an HDFS block boundary.
- IMPALA-7419 - Fixed the NullPointerException in SimplifyConditionalsRule.
- IMPALA-7483 - impalad and catalogd are now aborted on a JVM deadlock.
- IMPALA-7520 - Fixed the NullPointerException in SentryProxy.
Issues Fixed in Impala for CDH 5.15.2
For the full list of fixed issues for all CDH components in CDH 5.15.2, see Issues Fixed in CDH 5.15.x. The following list represents the subset of fixed Impala JIRAs from the CDH fixed issues.
- IMPALA-6907 - Impala now correctly closes all stale connections to removed Impala cluster members.
- IMPALA-7225 - Fixed an issue where the REFRESH...PARTITION statement caused statistics for the refreshed partition to be automatically reset to -1 (unknown). With the fix, statistics change only if an explicit COMPUTE STATS statement is issued for an object.
- IMPALA-7272 - Fixed a crash caused by a memory management problem when query execution requires finding strings inside a range defined by less-than and greater-than comparisons.
- IMPALA-7360 - Fixed an issue where Impala could incorrectly skip data if a record separator in a sequence-based file (Avro, RC or sequence file) straddled an HDFS block boundary.
- IMPALA-7537 - Fixed a security issue where REVOKE ALL ON SERVER did not have a permanent effect if the ALL permission was granted using the WITH GRANT option. Running INVALIDATE METADATA no longer causes the permission to reappear.
- IMPALA-7585 - Fixed an issue in KRPC, which can cause slow or hung queries on non-secured clusters.
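IMPALA-7360 above concerns record separators that straddle an HDFS block boundary. The following Python sketch illustrates, in general terms, why a scanner that owns one block has to read slightly past the end of its range to see such a separator. It is not Impala's scanner code: the names MARKER, DATA, find_markers, and scan_all_blocks, the 4-byte separator, and the 8-byte block size are all hypothetical and chosen only for illustration.

```python
from typing import List

# Hypothetical 4-byte record separator; real file formats define their own.
MARKER = b"\xde\xad\xbe\xef"

def find_markers(data: bytes, start: int, end: int, overlap: bool) -> List[int]:
    """Return offsets of separators that begin inside the block [start, end).

    With overlap=True the scanner reads up to len(MARKER) - 1 bytes past `end`,
    so a separator that begins in this block but ends in the next is still seen.
    A separator that begins at or after `end` can never fit in that window, so
    it is left for the scanner of the next block and is not reported twice.
    """
    limit = end + len(MARKER) - 1 if overlap else end
    window = data[start:limit]
    offsets = []
    pos = window.find(MARKER)
    while pos != -1:
        offsets.append(start + pos)
        pos = window.find(MARKER, pos + 1)
    return offsets

def scan_all_blocks(data: bytes, block_size: int, overlap: bool) -> List[int]:
    """Scan every fixed-size block independently and collect separator offsets."""
    found = []
    for start in range(0, len(data), block_size):
        end = min(start + block_size, len(data))
        found.extend(find_markers(data, start, end, overlap))
    return found

# The first separator starts at offset 6 and crosses the boundary at offset 8;
# the second starts exactly at the block boundary at offset 16.
DATA = b"aaaaaa" + MARKER + b"bbbbbb" + MARKER + b"cc"

print(scan_all_blocks(DATA, 8, overlap=False))  # misses the straddling separator
print(scan_all_blocks(DATA, 8, overlap=True))   # finds both separators
```

Reading at most len(MARKER) - 1 extra bytes is enough here because a separator is attributed to the block in which it begins.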
Issues Fixed in Impala for CDH 5.15.1
For the full list of fixed issues for all CDH components in CDH 5.15.1, see Issues Fixed in CDH 5.15.x. The following list represents the subset of fixed Impala JIRAs from the CDH fixed issues.
- IMPALA-6687 - Fixed INSERT with mixed case in partition column names.
- IMPALA-6822 - Added a query option to control shuffling by distinct expressions.
- IMPALA-6847 - Added a workaround for high memory estimates in admission control.
- IMPALA-6908 - IsConnResetTException() should include ECONNRESET.
- IMPALA-6934 - Fixed wrong results for EXISTS subqueries that contain ORDER BY, LIMIT, and OFFSET.
- IMPALA-7014 - Disabled stack trace symbolization by default.
- IMPALA-7078 - Reduced the queue size based on num_scanner_threads.
- IMPALA-7078 - Improved memory consumption of wide Avro scans.
- IMPALA-7288 - Fixed the Codegen crash in FinalizeModule.
- IMPALA-7298 - Impala no longer passes an IP address as the hostname in the Kerberos principal.
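IMPALA-6934 above concerns EXISTS subqueries that combine ORDER BY, LIMIT, and OFFSET. As a reminder of the expected semantics, EXISTS should be true only if the subquery still returns at least one row after OFFSET and LIMIT are applied. The sketch below checks this with Python's built-in sqlite3 module as a stand-in engine. The table t, its contents, and the helper exists_with_offset are hypothetical, and SQLite is used only to illustrate the semantics, not to reproduce Impala's planner or the original bug.

```python
import sqlite3

# In-memory stand-in database; this is SQLite, not Impala.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])

def exists_with_offset(offset: int) -> bool:
    """Evaluate EXISTS over a subquery that skips `offset` rows of t."""
    sql = "SELECT EXISTS (SELECT id FROM t ORDER BY id LIMIT 1 OFFSET ?)"
    return bool(conn.execute(sql, (offset,)).fetchone()[0])

# With three rows in t, skipping zero rows leaves a row to return,
# while skipping three rows leaves the subquery empty.
print(exists_with_offset(0))
print(exists_with_offset(3))
```

An engine that dropped the OFFSET while evaluating EXISTS would report a row in both cases, which is the kind of wrong result this class of fix addresses.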
Issues Fixed in Impala for CDH 5.15.0
For the full list of fixed issues for all CDH components in CDH 5.15.0, see Issues Fixed in CDH 5.15.x. The following list represents the subset of fixed Impala JIRAs from the CDH fixed issues.
- IMPALA-4315 - Allow USE and SHOW TABLES if there is at least one table in a database where the user has table or column privileges.
- IMPALA-4323 - The SET ROW FORMAT clause was added to the ALTER TABLE statement for the TEXT or SEQUENCE file formats.
- IMPALA-4886 - Table metrics are available in the catalog web UI.
- IMPALA-5654 - Disallows explicitly setting the Kudu table name property for managed Kudu tables in CREATE TABLE and ALTER TABLE statements, for example, CREATE TABLE t (i INT) STORED AS KUDU TBLPROPERTIES('kudu.table_name'='some_name').
- IMPALA-6549 - The file handle cache, controlled by the max_cached_file_handles flag, is enabled by default.
Issues Fixed in Impala for CDH 5.14.4
For the full list of fixed issues for all CDH components in CDH 5.14.4, see Issues Fixed in CDH 5.14.x. The following list represents the subset of fixed Impala JIRAs from the CDH fixed issues.
- IMPALA-4315 - Users with column-level privileges can run the USE and SHOW TABLES statements.
- IMPALA-5270 - Impala now passes fully resolved sort expressions to prevent crashes with inline view and analytic functions.
- IMPALA-6500 - Fixed an issue where Impala crashed under certain hypervisors that returned out-of-range CPU IDs.
- IMPALA-6687 - INSERT statements with mixed-case partition column names no longer fail.
- IMPALA-6822 - A new query option, SHUFFLE_DISTINCT_EXPRS, is available for disabling shuffling by distinct expressions.
- IMPALA-6908 - The ECONNRESET error code handling was added to the IsConnReset function that checks if a given exception is due to a stale connection.
- IMPALA-6934 - EXISTS subqueries containing ORDER BY, LIMIT, and OFFSET now return correct results.
- IMPALA-6954 - Fixed a problem with expression rewrites that caused losses of partitions with CTAS into Kudu.
- IMPALA-7032 - Codegen for CHAR type null literals is disabled to prevent a crash.
Issues Fixed in Impala for CDH 5.14.2
For the full list of fixed issues for all CDH components in CDH 5.14.2, see Issues Fixed in CDH 5.14.x. The following list represents the subset of fixed Impala JIRAs from the CDH fixed issues.
- IMPALA-3942 - Fixed incorrectly escaped string literal in front-end.
- IMPALA-4664 - Fixed an unexpected string conversion in Shell.
- IMPALA-5139 - Updated mvn-quiet.sh to print execution content to log file.
- IMPALA-5269 - Fixed the issue with the final line of a query followed by a comment.
- IMPALA-5936 - The '%' operator no longer overflows on large decimals.
- IMPALA-5940 - Impala no longer shows stack tracing with Status::Expected().
- IMPALA-6008 - Creating a UDF from a shared library with a .ll extension does not crash Impala.
- IMPALA-6114 - Require type equality for NumericLiteral::localEquals().
- IMPALA-6284 - Mark the intermediate decimal avg struct as packed.
- IMPALA-6295 - Fix min/max handling of 'nan' and 'inf'.
- IMPALA-6307 - CTAS statement fails with duplicate column exception.
- IMPALA-6348 - Redact only sensitive fields in runtime profiles.
- IMPALA-6353 - Impala no longer crashes in snappy decompressor.
- IMPALA-6384 - RequestPoolService should honor custom group mapping config.
- IMPALA-6388 - Fix the Union node number of hosts estimation.
- IMPALA-6418 - Find a reliable way to detect supported TLS versions.
- IMPALA-6435 - Disabled codegen for CHAR literals.
- IMPALA-6454 - CTAS into Kudu fails with mixed-case partition or primary key column names.
- IMPALA-6473 - An analytic sort with the same expression in PARTITION BY and ORDER BY no longer returns an error.
- IMPALA-6489 - Impala now uses the correct template tuple size.
- IMPALA-6530 - Impala now tracks time spent opening HDFS file handles.
- IMPALA-6538 - Fixed read path when Parquet min/max statistics contain NaN.
- IMPALA-6542 - Fixed inconsistent write path of Parquet min/max statistics.
- IMPALA-6570 - Support KUDU on SLES12.
- IMPALA-6577 - Avoid slow Status constructor on expiration thread.
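Several entries above (IMPALA-6295, IMPALA-6538, IMPALA-6542) involve NaN values in min/max computations. The Python sketch below shows why such values are troublesome in general: every ordered comparison with NaN is false, so a naive comparison-based minimum depends on the order of its inputs. The helpers naive_min and nan_aware_min are hypothetical; skipping NaN is just one possible policy and is not claimed to be what Impala does.

```python
import math

nan = float("nan")

def naive_min(values):
    """Comparison-based minimum; its result depends on where NaN appears."""
    best = values[0]
    for v in values[1:]:
        if v < best:  # always False when either side is NaN
            best = v
    return best

def nan_aware_min(values):
    """One possible policy (an assumption): ignore NaN, return NaN only if nothing else is left."""
    non_nan = [v for v in values if not math.isnan(v)]
    return naive_min(non_nan) if non_nan else nan

print(naive_min([1.0, nan, 0.5]))      # NaN in the middle is silently skipped
print(naive_min([nan, 1.0, 0.5]))      # NaN first "sticks" as the result
print(nan_aware_min([nan, 1.0, 0.5]))  # order no longer matters
```

Infinite values compare normally, so they need no special handling in this sketch; only NaN breaks the ordering.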
Issues Fixed in Impala for CDH 5.14.0
For the full list of fixed issues for all CDH components in CDH 5.14, see Issues Fixed in CDH 5.14.x. The following list represents the subset of fixed Impala JIRAs from the CDH fixed issues.
- IMPALA-467 - enable disabled metric verification
- IMPALA-992 - [DOCS] Document impala-shell 'rerun' command
- IMPALA-1291 - Parquet read fails if io buffer size is less than the footer size
- IMPALA-1575 - part 2: yield admission control resources
- IMPALA-1575 - Part 1: eagerly release query exec resources
- IMPALA-1767 - Adds predicate to test boolean values true, false, unknown.
- IMPALA-2107 - [DOCS] Document base64*code() functions
- IMPALA-2181 - Add query option levels for display
- IMPALA-2190 - [DOCS] from_timestamp() and to_timestamp()
- IMPALA-2234 - remove workaround from stress test
- IMPALA-2235 - Fix current db when shell auto-reconnects
- IMPALA-2250 - Make multiple COUNT(DISTINCT) message state workarounds
- IMPALA-2281 - Replace FNV with FastHash in exchange nodes
- IMPALA-2494 - Support for byte array encoded decimals in Parquet scanner
- IMPALA-2636 - HS2 GetTables() returns TABLE_TYPE as TABLE for VIEW
- IMPALA-2758 - Remove BufferedTupleStream::GetRows
- IMPALA-2810 - Remove column stats restoration when altering table
- IMPALA-3200 - [DOCS] Document user-facing aspects of new buffer pool
- IMPALA-3360 - Codegen inserting into runtime filters
- IMPALA-3437 - DECIMAL_V2: avoid implicit decimal->double conversion
- IMPALA-3504 - [DOCS] Document utc_timestamp()
- IMPALA-3516 - Avoid writing to /tmp in testing
- IMPALA-3548 - Prune runtime filters based on query options in the FE
- IMPALA-3613 - Avoid topic updates to unregistered subscriber instances
- IMPALA-3642 - Adding backend addresses to error statuses for some scratch failures.
- IMPALA-3877 - support unpatched LLVM
- IMPALA-3897 - Codegen null-aware constant in PHJ::ProcessBuildBatch()
- IMPALA-3998 - deprecate --refresh_after_connect
- IMPALA-4082 - Remove todo item in getRegionsInRange
- IMPALA-4177 - IMPALA-6039: batched bit reading and rle decoding
- IMPALA-4236 - Codegen CopyRows() for select nodes
- IMPALA-4252 - Min-max runtime filters for Kudu
- IMPALA-4252 - Move runtime filters to ScanNode
- IMPALA-4513 - Promote integer types for ABS()
- IMPALA-4524 - Batch ALTER TABLE...ADD PARTITION calls.
- IMPALA-4591 - Bound Kudu client error mem usage
- IMPALA-4620 - Refactor evalcost computation in query analysis
- IMPALA-4622 - [DOCS] New Kudu ALTER TABLE syntax
- IMPALA-4623 - [DOCS] Document file handle caching
- IMPALA-4669 - [KRPC] Add kudu_rpc library to build
- IMPALA-4670 - Introduces RpcMgr class
- IMPALA-4704 - Turns on client connections when local catalog initialized.
- IMPALA-4736 - Add SIGUSR1 behavior to help string for 'minidump_path' flag
- IMPALA-4786 - Clean up how ImpalaServers are created
- IMPALA-4847 - Simplify HdfsTable block metadata loading code
- IMPALA-4856 - Port data stream service to KRPC
- IMPALA-4856 - IMPALA-4872: Include KRPC services in plan fragment's destinations
- IMPALA-4856 - Rename thrift-deps to gen-deps
- IMPALA-4863 - IMPALA-5311: Correctly account the file type and compression codec
- IMPALA-4918 - Support getting column comments via HS2
- IMPALA-4927 - Impala should handle invalid input from Sentry
- IMPALA-4939 - IMPALA-4940: Decimal V2 multiplication
- IMPALA-4964 - Fix Decimal modulo overflow
- IMPALA-4985 - use parquet stats of nested types for dynamic pruning
- IMPALA-5018 - Error on decimal modulo or divide by zero
- IMPALA-5019 - Decimal V2 addition
- IMPALA-5129 - Use Kudu's Kinit code to avoid expensive fork
- IMPALA-5142 - EventSequence displays negative elapsed time.
- IMPALA-5174 - Suppress kudu flags that aren't relevant to Impala
- IMPALA-5199 - prevent hang on empty row batch exchange
- IMPALA-5210 - Count rows and collection items in parquet scanner separately
- IMPALA-5211 - Simplifying nullif conditional.
- IMPALA-5211 - Simplifying ifnull/isnull/nvl where conditional is a literal.
- IMPALA-5243 - Speed up code gen for wide Avro tables.
- IMPALA-5250 - Unify decompressor output_length semantics
- IMPALA-5307 - Part 4: copy out uncompressed text and seq
- IMPALA-5307 - Part 2: copy out strings in uncompressed Avro
- IMPALA-5307 - part 1: don't transfer disk I/O buffers out of parquet
- IMPALA-5307 - Part 3: remove TODO from RCFile
- IMPALA-5317 - add DATE_TRUNC() function
- IMPALA-5341 - Avoid unintended filter out in fe test
- IMPALA-5376 - Implement all TPCDS test cases or alternates for Impala.
- IMPALA-5383 - [DOCS] Document unpartitioned Kudu tables
- IMPALA-5394 - Change ThriftServer() to always use TAcceptQueueServer
- IMPALA-5416 - Fix an impala-shell command recursion bug
- IMPALA-5417 - make I/O buffer queue fixed-size
- IMPALA-5425 - Add test for validating input when setting query options
- IMPALA-5429 - Multi threaded block metadata loading
- IMPALA-5448 - fix invalid number of splits reported in Parquet scan node
- IMPALA-5501 - [DOCS] Clarify 'binaries' w.r.t. installation procedure
- IMPALA-5525 - Extend TestScannersFuzzing to test uncompressed parquet
- IMPALA-5529 - [DOCS] New trunc() signatures
- IMPALA-5538 - Revert "Use explicit catalog versions for deleted objects"
- IMPALA-5538 - Use explicit catalog versions for deleted objects
- IMPALA-5541 - Reject BATCH_SIZE greater than 65536
- IMPALA-5589 - Re-apply:change "set" in impala-shell to show empty string for unset query options
- IMPALA-5589 - Revert "change "set" in impala-shell to show empty string for unset query options"
- IMPALA-5589 - change "set" in impala-shell to show empty string for unset query options
- IMPALA-5599 - Clean up references to TimestampValue in be/src.
- IMPALA-5599 - Fix for mis-use of TimestampValue
- IMPALA-5624 - Replace "ls -l" with opendir() in ProcessStateInfo
- IMPALA-5638 - [DOCS] Add known issue for Impala-Kudu-Sentry issue
- IMPALA-5653 - Remove "unlimited" process mem_limit option
- IMPALA-5664 - Unix time to timestamp conversions may crash Impala
- IMPALA-5668 - Fix cast(X as timestamp) for negative subsecond Decimals
- IMPALA-5736 - Add impala-shell argument to set default query options
- IMPALA-5789 - Add always_false flag in bloom filter
- IMPALA-5836 - Improvements to Eclipse frontend configuration.
- IMPALA-5844 - use a MemPool for expr result allocations
- IMPALA-5846 - Fix output path for kudu libraries
- IMPALA-5854 - Revert "Update external hadoop versions"
- IMPALA-5854 - Update external hadoop versions
- IMPALA-5860 - upgrade to LLVM 3.9.1
- IMPALA-5870 - Improve runtime profile for partial sort
- IMPALA-5873 - Check for existence of sync_file_range()
- IMPALA-5894 - [DOCS] Clarify placement of STRAIGHT_JOIN hint
- IMPALA-5895 - clean up runtime profile lifecycle
- IMPALA-5902 - add ThreadSanitizer build
- IMPALA-5905 - build-all-flag-combinations addendum
- IMPALA-5905 - add script for all-build-options job
- IMPALA-5908 - Allow SET to unset modified query options.
- IMPALA-5912 - fix crash in trunc(..., "WW") in release build
- IMPALA-5920 - addendum - add missing RAT check
- IMPALA-5920 - Remove admission control dependency on YARN RM jar
- IMPALA-5926 - Avoid printing expensive stack when closing a session
- IMPALA-5927 - Fix enable_distcc for zsh
- IMPALA-5932 - Improve transitive closure computation performance in FE
- IMPALA-5940 - Avoid log spew by using Status::Expected()
- IMPALA-5940 - Avoid log spew by using Status::Expected.
- IMPALA-5941 - Fix Metastore schema creation in create-test-configuration.sh
- IMPALA-5957 - print memory address, not memory
- IMPALA-5965 - avoid per-value switch on NeedsConversionInline() in parquet
- IMPALA-5966 - Fix the result file location of PlannerTest
- IMPALA-5975 - Work around broken beeline clients
- IMPALA-5976 - Remove equivalence class computation in FE
- IMPALA-5986 - Correct set-option logic to recognize digits in names.
- IMPALA-5987 - LZ4 Codec silently produces bogus compressed data for large inputs
- IMPALA-5988 - optimise MemPool::TryAllocate()
- IMPALA-5999 - Fix LLVM linkage errors due to LibCache sync issues
- IMPALA-6001 - Part 1: Increase log level during catalog update processing
- IMPALA-6002 - Add an LLVM diagnostic handler for LLVM linker errors
- IMPALA-6004 - Fix test_row_filters failure on ASAN
- IMPALA-6009 - Upgrade Guava to 14.0.1
- IMPALA-6012 - workaround - downgrade hive temporarily
- IMPALA-6016 - Fix logging in TableLoadingMgr class
- IMPALA-6019 - Remove dead code parallel-executor*
- IMPALA-6021 - Revert "IMPALA-6009: Upgrade Guava to 14.0.1"
- IMPALA-6027 - Retry downloading toolchain components.
- IMPALA-6030 - Don't start coordinator specific thread pools if a node isn't a coordinator node
- IMPALA-6040 - skip test_multi_compression_types where Hive isn't supported
- IMPALA-6053 - Fix exception when storageIds don't match hosts
- IMPALA-6054 - Parquet dictionary pages should be freed on dictionary construction
- IMPALA-6055 - Fix hdfs encryption test for Hadoop 2.8+
- IMPALA-6060 - Check the return value of JNI exception handling functions
- IMPALA-6067 - Enable S3 access via IAM roles for EC2 VMs
- IMPALA-6068 - Revert "Fix dataload for complextypes_fileformat"
- IMPALA-6068 - Fix dataload for complextypes_fileformat
- IMPALA-6069 - Fix CodegenAnyVal's handling of 'nan'
- IMPALA-6076 - Parquet BIT_PACKED deprecation warning
- IMPALA-6080 - clean up table descriptor handling
- IMPALA-6081 - Fix test_basic_filters runtime profile failure
- IMPALA-6084 - Avoid use of the global namespace for llvm
- IMPALA-6099 - Fix crash in CheckForAlwaysFalse()
- IMPALA-6100 - increase test_exchange_delays timeout on slow builds
- IMPALA-6105 - Clarify argument order in single_node_perf_run
- IMPALA-6106 - handle comments before set in test parser
- IMPALA-6108 - IMPALA-6070: Parallel data load (re-instated).
- IMPALA-6108 - Revert "IMPALA-6070: Parallel data load."
- IMPALA-6109 - xfail TestHdfsUnknownErrors::test_hdfs_safe_mode_error_255
- IMPALA-6118 - Fix assertion failure in coordinator bloom filter updating
- IMPALA-6121 - remove I/O mgr request context cache
- IMPALA-6123 - Fix column order of a query test in test_inline_view_limit
- IMPALA-6124 - Fix alter table ddl updates and test
- IMPALA-6126 - ASAN detects heap-use-after-free in thrift-server-test
- IMPALA-6127 - Fix timeout in TestRuntimeFilter.test_wait_time
- IMPALA-6128 - Spill-to-disk Encryption(AES-CFB + SHA256) is slow
- IMPALA-6134 - Update code base to use impala::ConditionVariable
- IMPALA-6136 - Part 1: Query duration should not be normally negative.
- IMPALA-6137 - fix text scanner split delim mem mgmt
- IMPALA-6144 - PublishFilter() continues to run after query failure/cancellation
- IMPALA-6148 - Specifying thirdparty deps as URLs
- IMPALA-6151 - add query-level fragment/backend counters
- IMPALA-6155 - Allow tests to pass when ORDER BY does not cover the query.
- IMPALA-6164 - Fix stale query profile in TestAlwaysFalseFilter
- IMPALA-6171 - Revert "IMPALA-1575: part 2: yield admission control resources"
- IMPALA-6173 - Fix SHOW CREATE TABLE for unpartitioned Kudu tables
- IMPALA-6183 - Fix Decimal to Double conversion
- IMPALA-6184 - Clean up after ScalarExprEvaluator::Clone() fails
- IMPALA-6187 - Fix missing conjuncts evaluation with empty projection
- IMPALA-6188 - make test_top_n_reclaim less flaky
- IMPALA-6198 - marks a test as debug-only
- IMPALA-6201 - Fix test_basic_filters on ASAN
- IMPALA-6206 - Fix data load failure with -notests
- IMPALA-6217 - fix DCHECK in Parquet fuzz test
- IMPALA-6220 - Revert "IMPALA-6128: Spill-to-disk Encryption(AES-CFB + SHA256) is slow"
- IMPALA-6225 - Part 1: Query profile date-time strings should have ns precision.
- IMPALA-6232 - Disable file handle cache by default
- IMPALA-6241 - timeout in admission control test under ASAN
- IMPALA-6256 - Incorrect principal will be used for internal connections if FLAGS_be_principal is set
- IMPALA-6262 - Always initialize runtime profile for DataSink
- IMPALA-6280 - Materialize TupleIsNullPredicate for insert sorts
- IMPALA-6281 - Fix use-after-free in InitAuth()
- IMPALA-6285 - Don't print stack trace on RPC errors.
- IMPALA-6286 - Remove invalid runtime filter targets.
- IMPALA-6291 - disable AVX512 codegen in LLVM
- IMPALA-6292 - addendum: Fix failing test
- IMPALA-6292 - Fix incorrect DCHECK in decimal subtraction
- IMPALA-6298 - Skip test_profile_fragment_instances on local filesystem
- IMPALA-6308 - Fix bad Status() usage in data-stream-sender.cc
- IMPALA-5307 - ScannerContext API change
- IMPALA-5493 - Add Protobuf to CMakeLists.txt
Issues Fixed in Impala for CDH 5.13.3
For the full list of fixed issues for all CDH components in CDH 5.13.3, see Issues Fixed in CDH 5.13.x. The following list represents the subset of fixed Impala JIRAs from the CDH fixed issues.
- IMPALA-6353 - Fixed the crash in snappy decompressor.
- IMPALA-6473 - Error in analytic sort with same expr in 'partition by' and 'order by'.
- IMPALA-6500 - Impala crashes randomly on different queries with GROUP BY.
Issues Fixed in Impala for CDH 5.13.2
For the full list of fixed issues for all CDH components in CDH 5.13.2, see Issues Fixed in CDH 5.13.x. The following list represents the subset of fixed Impala JIRAs from the CDH fixed issues.
- IMPALA-2810 - Remove column stats restoration when altering table.
- IMPALA-4664 - Unexpected string conversion in Shell.
- IMPALA-5167 - Revert "Reduce the number of Kudu clients created (FE)".
- IMPALA-5664 - Unix time to timestamp conversions may crash Impala
- IMPALA-5926 - Avoid printing expensive stack when closing a session.
- IMPALA-5936 - Operator '%' overflows on large decimals.
- IMPALA-5940 - Avoid log spew by using Status::Expected.
- IMPALA-5951 - Remove flaky test_catalogd_timeout.
- IMPALA-5987 - LZ4 Codec silently produces bogus compressed data for large inputs.
- IMPALA-6004 - Fix the test_row_filters failure on ASAN.
- IMPALA-6053 - Fix the exception when storageIds don't match hosts.
- IMPALA-6060 - Check the return value of JNI exception handling functions.
- IMPALA-6067 - Enable S3 access via IAM roles for EC2 VMs.
- IMPALA-6069 - Fix CodegenAnyVal's handling of 'nan'.
- IMPALA-6100 - Increase test_exchange_delays timeout on slow builds.
- IMPALA-6106 - Handle comments before set in test parser.
- IMPALA-6114 - Require type equality for NumericLiteral::localEquals().
- IMPALA-6137 - Fix text scanner split delim mem mgmt.
- IMPALA-6183 - Fix Decimal to Double conversion.
- IMPALA-6184 - Clean up after ScalarExprEvaluator::Clone() fails.
- IMPALA-6280 - Materialize TupleIsNullPredicate for insert sorts.
- IMPALA-6284 - Mark the intermediate decimal avg struct as packed.
- IMPALA-6286 - Remove invalid runtime filter targets.
- IMPALA-6291 - Disable AVX512 codegen in LLVM.
- IMPALA-6295 - Fix min/max handling of 'nan' and 'inf'.
- IMPALA-6307 - CTAS statement fails with duplicate column exception.
- IMPALA-6308 - Fix bad Status() usage in data-stream-sender.cc.
- IMPALA-6348 - Redact only sensitive fields in runtime profiles.
- IMPALA-6364 - Bypass file handle cache for ineligible files.
- IMPALA-6381 - Increase test_exchange_delays timeout for isilon.
- IMPALA-6384 - RequestPoolService should honor custom group mapping config.
- IMPALA-6388 - Fix the Union node number of hosts estimation.
- IMPALA-6418 - Find a reliable way to detect supported TLS versions.
- IMPALA-6435 - Disable codegen for CHAR literals.
Issues Fixed in Impala for CDH 5.13.1
For the full list of fixed issues for all CDH components in CDH 5.13.1, see Issues Fixed in CDH 5.13.x. The following list represents the subset of fixed Impala JIRAs from the CDH fixed issues.
- IMPALA-4682 - Remove Preconditions check from analyzeAggregation().
- IMPALA-4951 - Fix database visibility for user with only column privilege
- IMPALA-5597 - Try casting targetExpr when building runtime filter plan
- IMPALA-5856 - Fix outer join predicate assignment.
- IMPALA-5923 - Print binary ID as hex in ChildQuery::Cancel()
- IMPALA-5954 - Set DO_NOT_UPDATE_STATS in alterTable() RPCs to HMS.
- IMPALA-5955 - Use totalSize tblproperty instead of rawDataSize.
- IMPALA-5983 - Fix crash in to/from_utc_timestamp
Issues Fixed in Impala for CDH 5.13.0
For the full list of fixed issues for all CDH components in CDH 5.13, see Issues Fixed in CDH 5.13.x. The following list represents the subset of fixed Impala JIRAs from the CDH fixed issues.
- IMPALA-992 - Rerun past queries from history in shell
- IMPALA-1470 - Fix error message with catalog down
- IMPALA-1478 - Improve error message when subquery is used in the ON clause
- IMPALA-1882 - Remove ORDER BY restriction from first_value()/last_value()
- IMPALA-1891 - Statestore won't send deletions in initial non-delta topic
- IMPALA-2373 - Extrapolate row counts for HDFS tables.
- IMPALA-2525 - Treat Parquet ENUMs as STRINGs when creating Impala tables.
- IMPALA-2615 - support [[nodiscard]] on Status
- IMPALA-3040 - addendum: use specific_build_type_timeout for slow builds
- IMPALA-3208 - max_row_size option
- IMPALA-3504 - UDF for current timestamp in UTC
- IMPALA-3894 - Change the behavior of parsing date "YY"
- IMPALA-3905 - HdfsScanner::GetNext() for Avro, RC, and Seq scans.
- IMPALA-3931 - arbitrary fixed-size uda intermediate types
- IMPALA-3937 - Deprecate --be_service_threads
- IMPALA-4039 - Increase width of Operator column in runtime profile
- IMPALA-4086 - Add benchmark for simple scheduler
- IMPALA-4164 - Avoid overly aggressive inlining in LLVM IR
- IMPALA-4192 - Disentangle Expr and ExprContext
- IMPALA-4276 - Profile displays non-default query options set by planner
- IMPALA-4418 - Fixes extra blank lines in query result
- IMPALA-4622 - Add ALTER COLUMN statement.
- IMPALA-4623 - Enable file handle cache
- IMPALA-4669 - Add Kudu's RPC, util, and security libraries
- IMPALA-4674 - Port spilling ExecNodes to new buffer pool
- IMPALA-4687 - Get Impala working against HBase 2.0
- IMPALA-4703 - reservation denial debug action
- IMPALA-4737 - Prevent SIGUSR1 from killing daemons when minidumps are disabled
- IMPALA-4794 - Grouping distinct agg plan robust to data skew
- IMPALA-4795 - Allow fetching function obj from catalog using signature
- IMPALA-4826 - Fix error during a scan on repeated root schema in Parquet.
- IMPALA-4833 - Use scheduling information to make per-node memory reservation tight
- IMPALA-4861 - READ_WRITE warning on CREATE TABLE LIKE PARQUET.
- IMPALA-4862 - make resource profile consistent with backend behaviour
- IMPALA-4866 - Hash join node does not apply limits correctly
- IMPALA-4905 - Don't send empty insert status to coordinator
- IMPALA-4925 - Cancel finstance if query has finished
- IMPALA-5016 - Simplify COALESCE() in SimplifyConditionalsRule.
- IMPALA-5031 - UBSAN clean and method for testing UBSAN cleanliness
- IMPALA-5036 - Parquet count star optimization
- IMPALA-5061 - Populate null_count in parquet::statistics
- IMPALA-5085 - large rows in BufferedTupleStreamV2
- IMPALA-5104 - Admit queries with mem equal to proc mem_limit
- IMPALA-5108 - idle_session_timeout kicks in later than expected
- IMPALA-5109 - Increase range of backend latency histogram
- IMPALA-5116 - Remove deprecated hash_* types in gutil
- IMPALA-5158 - Account for difference between process memory consumption and memory used by queries
- IMPALA-5160 - adjust spill buffer size based on planner estimates
- IMPALA-5167 - Reduce the number of Kudu clients created
- IMPALA-5240 - Allow config of number of disk I/O threads per disk type
- IMPALA-5275 - Avoid printing status stack trace on hot paths
- IMPALA-5280 - Coalesce chains of OR conditions to an IN predicate
- IMPALA-5286 - Query with Kudu col name w/ different casing from 'order by' fails
- IMPALA-5316 - Adds last_day() function
- IMPALA-5327 - Handle return of JNI GetStringUTFChar
- IMPALA-5336 - Fix partition pruning when column is cast
- IMPALA-5347 - reduce codegen overhead of timestamp trunc()
- IMPALA-5350 - Tidy up thread groups for finst exec threads
- IMPALA-5352 - Age out unused file handles from the cache
- IMPALA-5354 - INSERT hints for Kudu tables
- IMPALA-5376 - Loads all TPC-DS tables
- IMPALA-5386 - Fix ReopenCachedHdfsFileHandle failure case
- IMPALA-5389 - simplify BufferDescriptor lifetime
- IMPALA-5407 - Fix crash in HdfsSequenceTableWriter
- IMPALA-5412 - Fix scan result with partitions on same file
- IMPALA-5420 - Skip ACL fetch if the acl bit is not set.
- IMPALA-5427 - Fix race between CRS::UpdateQueryStatus() and beeswax RPCs
- IMPALA-5431 - Remove redundant path exists checks during table load
- IMPALA-5433 - Mark single-argument Status c'tors as explicit
- IMPALA-5446 - dropped Sorter::Reset() status
- IMPALA-5477 - Fix minidump-2-core tool
- IMPALA-5479 - Propagate the argument type in RawValue::Compare()
- IMPALA-5480 - Improve missing filters message
- IMPALA-5481 - Clarify RowDescriptor ownership
- IMPALA-5482 - fix git checkout when workloads are modified
- IMPALA-5483 - Automatically disable codegen for small queries
- IMPALA-5484 - Fix LICENSE issues discovered by IPMC in 2.9 vote
- IMPALA-5488 - Fix handling of exclusive HDFS file handles
- IMPALA-5489 - Improve Sentry authorization for Kudu tables
- IMPALA-5492 - Fix incorrect newline character in the LDAP message
- IMPALA-5494 - Fixes the selectivity of NOT IN predicates
- IMPALA-5495 - Improve error message if no impalad role is configured
- IMPALA-5497 - spilling hash joins that output build rows hit OOM
- IMPALA-5498 - Support for partial sorts in Kudu INSERTs
- IMPALA-5499 - avoid ephemeral port conflicts
- IMPALA-5500 - Reduce catalog update topic size
- IMPALA-5504 - Fix TupleIsNullPredicate evaluation.
- IMPALA-5506 - Add stdin description to help information of query_file option
- IMPALA-5507 - Add clear description to help information of KEYVAL option
- IMPALA-5511 - Add process start time to debug web page
- IMPALA-5513 - Fix display message exception when using invalid KEYVAL
- IMPALA-5514 - Throw an error when --ldap_password_cmd is used without LDAP auth
- IMPALA-5517 - Allow IMPALA_LOGS_DIR to be overridden
- IMPALA-5520 - TopN node periodically reclaims old allocations
- IMPALA-5524 - Fixes NPE during planning with DISABLE_UNSAFE_SPILLS=1
- IMPALA-5529 - Add additional function signatures for TRUNC()
- IMPALA-5531 - Fix correctness issue in correlated aggregate subqueries
- IMPALA-5532 - Stack-allocate compressors in RowBatch (de)serialization
- IMPALA-5536 - Fix TCLIService thrift compilation on Hive 2
- IMPALA-5539 - Fix Kudu timestamp with -use_local_tz_for_unix_ts
- IMPALA-5540 - Revert Sentry version back to 5.13
- IMPALA-5546 - Allow creating unpartitioned Kudu tables
- IMPALA-5547 - Rework FK/PK join detection.
- IMPALA-5548 - Fix some minor nits with HDFS parquet column readers
- IMPALA-5549 - Remove deprecated fields from CatalogService API
- IMPALA-5551 - Fix AggregationNode::Close() when Prepare() fails
- IMPALA-5554 - sorter DCHECK on null column
- IMPALA-5560 - always store CHAR(N) inline in tuple
- IMPALA-5567 - race in fragment instance teardown
- IMPALA-5570 - fix spilling null-aware anti join
- IMPALA-5572 - Timestamp codegen for text scanner
- IMPALA-5573 - Add decimal codegen in text scanner
- IMPALA-5579 - Fix IndexOutOfBoundsException in GetTables metadata request
- IMPALA-5580 - fix Java UDFs that return NULL strings
- IMPALA-5582 - Store sentry privileges in lower case
- IMPALA-5586 - Null-aware anti-join can take a long time to cancel
- IMPALA-5588 - Reduce the frequency of fault injection
- IMPALA-5591 - set should handle negative values
- IMPALA-5594 - don't import shaded classes
- IMPALA-5595 - Only set KuduScanner timestamp feature flag if necessary
- IMPALA-5598 - Fix excessive dumping in MemLimitExceeded
- IMPALA-5602 - Fix query optimization for kudu and datasource tables
- IMPALA-5611 - KuduPartitionExpr holds onto memory unnecessarily
- IMPALA-5612 - join inversion should factor in parallelism
- IMPALA-5615 - Fix compute incremental stats for general partition exprs
- IMPALA-5616 - Add --enable_minidumps startup flag
- IMPALA-5617 - Include full workload name in tpch_nested query filenames
- IMPALA-5618 - buffered-tuple-stream-v2 fixes
- IMPALA-5623 - Fix lag() on STRING cols to release UDF mem
- IMPALA-5627 - fix dropped statuses in HDFS writers
- IMPALA-5629 - avoid expensive list::size() call
- IMPALA-5630 - Add Kudu client version as a common metric
- IMPALA-5636 - Change the metadata in parquet
- IMPALA-5638 - Fix Kudu table set tblproperties inconsistencies
- IMPALA-5641 - mem-estimate should never be less than mem-reservation
- IMPALA-5643 - Add total number of threads created per group to /threadz
- IMPALA-5644 - Fail queries early when their minimum reservation is too high to execute within the given mem_limit
- IMPALA-5648 - fix count(*) mem estimate regression
- IMPALA-5650 - Make sum_init_zero a SUM function
- IMPALA-5652 - deprecate unlimited process mem_limit
- IMPALA-5657 - FunctionCallExpr.toSql() and clone() ignore 'IGNORE NULLS' case
- IMPALA-5658 - addtl. process/system-wide memory metrics
- IMPALA-5659 - Begin standardizing treatment of thirdparty libraries
- IMPALA-5659 - glog/gflags should be dynamically linked
- IMPALA-5661 - buffer pool limit
- IMPALA-5666 - ASAN poisoning for MemPool and BufferPool
- IMPALA-5670 - Misc. tidying of ExecEnv
- IMPALA-5676 - avoid expensive consistency checks in BTSv2
- IMPALA-5677 - limit clean page memory consumption
- IMPALA-5679 - Fix Parquet count(*) with group by string
- IMPALA-5681 - release reservation from blocking operators
- IMPALA-5686 - Update a mini cluster Sentry property
- IMPALA-5689 - Avoid inverting non-equi left joins
- IMPALA-5691 - recalibrate mem limit for Q18
- IMPALA-5696 - Enable cipher configuration when using TLS / Thrift
- IMPALA-5709 - Remove mini-impala-cluster
- IMPALA-5713 - always reserve memory for preaggs
- IMPALA-5714 - Add OpenSSL to bootstrap_toolchain.py
- IMPALA-5715 - (mitigation only) defer destruction of MemTrackers
- IMPALA-5716 - Don't delete cmake_modules/* when enabling distcc
- IMPALA-5722 - Fix string to decimal cast
- IMPALA-5725 - coalesce() with outer join incorrectly rewritten
- IMPALA-5739 - Correctly handle sles12 SP2
- IMPALA-5742 - De-allocate buffer in parquet-reader on exit
- IMPALA-5743 - Support TLS version configuration for Thrift servers
- IMPALA-5744 - Add 'use_krpc' flag and create DataStream interface
- IMPALA-5745 - Bump Breakpad version
- IMPALA-5749 - coordinator race hits DCHECK 'num_remaining_backends_ > 0'
- IMPALA-5750 - Catch exceptions from boost thread creation
- IMPALA-5756 - start memory maintenance thread after metric creation
- IMPALA-5757 - Make tbl property toSql deterministic
- IMPALA-5769 - Add periodic minidump cleanup
- IMPALA-5773 - Correctly account for memory used in data stream receiver queue
- IMPALA-5774 - Prevent FindInSet() from reading off end of string
- IMPALA-5775 - Allow shell to support TLSv1, v1.1 and v1.2
- IMPALA-5776 - HdfsTextScanner::WritePartialTuple() writes the varlen data to an incorrect memory pool
- IMPALA-5778 - Clarify logging around and usage of read_size startup option
- IMPALA-5784 - Separate planner and user set query options in profile
- IMPALA-5787 - Dropped status in KuduTableSink::Send()
- IMPALA-5788 - Fix agg node crash when grouping by nondeterministic exprs
- IMPALA-5791 - Make bootstrap_development.sh survive apt-get failure
- IMPALA-5796 - CTAS for Kudu fails with expr rewrite
- IMPALA-5798 - ASAN use-after-poison in Parquet decoder
- IMPALA-5799 - Kudu DML can crash if schema has changed
- IMPALA-5800 - Configure Squeasel's cipher suite and TLS version
- IMPALA-5811 - Add 'backends' tab to query details pages
- IMPALA-5812 - Fix NPE when joining on empty const select
- IMPALA-5815 - right outer join returns invalid memory
- IMPALA-5819 - DCHECK in HdfsTextScanner::Close()
- IMPALA-5823 - fix SET_DENY_RESERVATION_PROBABILITY
- IMPALA-5825 - Catch exceptions thrown by TSSLSocketFactory c'tor
- IMPALA-5838 - Improve errors on AC buffer mem rejection
- IMPALA-5840 - Don't write page-level statistics in Parquet files.
- IMPALA-5849 - Remove compile-time checks for OpenSSL > 1.0.0
- IMPALA-5850 - Cast sender partition exprs under unions.
- IMPALA-5852 - improve MINIMUM_RESERVATION_UNAVAILABLE error
- IMPALA-5853 - fix GetResultSetMetadata() error message for invalid query id
- IMPALA-5855 - Preaggregation crashes - unable to initialize hash table
- IMPALA-5857 - avoid invalid free of hedged read metrics
- IMPALA-5867 - Fix bugs parsing 2-digit year
- IMPALA-5871 - KuduPartitionExpr incorrectly handles its child types
- IMPALA-5885 - free runtime filter allocations in Parquet
- IMPALA-5888 - free other local allocations in Parquet
- IMPALA-5890 - Abort queries if scanner hits IO errors
- IMPALA-5891 - fix PeriodicCounterUpdater initialization
- IMPALA-5892 - Allow reporting status independent of fragment instance
- IMPALA-2615 - annotate Status with [[nodiscard]]
Issues Fixed in Impala for CDH 5.12.2
For the full list of fixed issues for all CDH components in CDH 5.12, see Issues Fixed in CDH 5.12.x. The following list represents the subset of fixed Impala JIRAs from the CDH fixed issues.
- IMPALA-4682 - Remove Preconditions check from analyzeAggregation().
- IMPALA-4737 - Prevent SIGUSR1 from killing daemons when minidumps are disabled
- IMPALA-4826 - Fix error during a scan on repeated root schema in Parquet.
- IMPALA-4951 - Fix database visibility for user with only column privilege
- IMPALA-5327 - Handle return of JNI GetStringUTFChar
- IMPALA-5336 - Fix partition pruning when column is cast
- IMPALA-5446 - dropped Sorter::Reset() status
- IMPALA-5495 - Improve error message if no impalad role is configured
- IMPALA-5504 - Fix TupleIsNullPredicate evaluation.
- IMPALA-5531 - Fix correctness issue in correlated aggregate subqueries
- IMPALA-5595 - Only set KuduScanner timestamp feature flag if necessary
- IMPALA-5597 - Try casting targetExpr when building runtime filter plan
- IMPALA-5598 - Fix excessive dumping in MemLimitExceeded
- IMPALA-5689 - Avoid inverting non-equi left joins
- IMPALA-5750 - Catch exceptions from boost thread creation
- IMPALA-5776 - Write partial tuple to the correct mempool
- IMPALA-5784 - Separate planner and user set query options in profile
- IMPALA-5796 - CTAS for Kudu fails with expr rewrite
- IMPALA-5815 - right outer join returns invalid memory
- IMPALA-5819 - DCHECK in HdfsTextScanner::Close()
- IMPALA-5820 - Fix string format() syntax in test_scanners_fuzz.py
- IMPALA-5840 - Don't write page-level statistics in Parquet files.
- IMPALA-5850 - Cast sender partition exprs under unions.
- IMPALA-5856 - Fix outer join predicate assignment.
- IMPALA-5871 - KuduPartitionExpr incorrectly handles its child types
- IMPALA-5885 - free runtime filter allocations in Parquet
- IMPALA-5892 - Allow reporting status independent of fragment instance
- IMPALA-5923 - Print binary ID as hex in ChildQuery::Cancel()
- IMPALA-5954 - Set DO_NOT_UPDATE_STATS in alterTable() RPCs to HMS.
- IMPALA-5983 - Fix crash in to/from_utc_timestamp
- IMPALA-5994 - Lower case struct-field names
- IMPALA-6060 - Check the return value of JNI exception handling functions
Issues Fixed in Impala for CDH 5.12.1
For the full list of fixed issues for all CDH components in CDH 5.12, see Issues Fixed in CDH 5.12.x. The following list represents the subset of fixed Impala JIRAs from the CDH fixed issues.
- IMPALA-4276 - Profile displays non-default query options set by planner
- IMPALA-4866 - Hash join node does not apply limits correctly
- IMPALA-5354 - INSERT hints for Kudu tables
- IMPALA-5427 - Fix race between CRS::UpdateQueryStatus() and beeswax RPCs
- IMPALA-5500 - Reduce catalog update topic size
- IMPALA-5524 - Fixes NPE during planning with DISABLE_UNSAFE_SPILLS=1
- IMPALA-5539 - Fix Kudu timestamp with -use_local_tz_for_unix_ts
- IMPALA-5554 - sorter DCHECK on null column
- IMPALA-5567 - race in fragment instance teardown
- IMPALA-5579 - Fix IndexOutOfBoundsException in GetTables metadata request
- IMPALA-5580 - fix Java UDFs that return NULL strings
- IMPALA-5582 - Store sentry privileges in lower case
- IMPALA-5586 - Null-aware anti-join can take a long time to cancel
- IMPALA-5588 - Reduce the frequency of fault injection
- IMPALA-5611 - KuduPartitionExpr holds onto memory unnecessarily
- IMPALA-5615 - Fix compute incremental stats for general partition exprs
- IMPALA-5616 - Add --enable_minidumps startup flag
- IMPALA-5623 - Fix lag() on STRING cols to release UDF mem
- IMPALA-5627 - fix dropped statuses in HDFS writers
- IMPALA-5638 - Fix Kudu table set tblproperties inconsistencies
- IMPALA-5657 - Fix a couple of bugs with FunctionCallExpr and IGNORE NULLS
- IMPALA-5686 - Update a mini cluster Sentry property
- IMPALA-5691 - recalibrate mem limit for Q18
Issues Fixed in Impala for CDH 5.12.0
For the full list of fixed issues for all CDH components in CDH 5.12, see Issues Fixed in CDH 5.12.x. The following list represents the subset of fixed Impala JIRAs from the CDH fixed issues.
- IMPALA-278 - Mention DIV arithmetic operator.
- IMPALA-1427 - Improvements to "Unknown disk-ID" warning
- IMPALA-1670, IMPALA-4141 - Support multiple partitions in ALTER TABLE ADD PARTITION
- IMPALA-1861 - Simplify conditionals with constant conditions
- IMPALA-1972, IMPALA-3882 - Fix client_request_state_map_lock_ contention
- IMPALA-2020 - Inline big number strings
- IMPALA-2079 - Don't fail when write to scratch dir results in error.
- IMPALA-2328 - Read support for min/max Parquet statistics
- IMPALA-2518 - DROP DATABASE CASCADE removes cache directives of tables
- IMPALA-2522 - Improve the reliability and effectiveness of ETL.
- IMPALA-2550 - Switch to per-query exec rpc
- IMPALA-2615 - warn if Status is ignored
- IMPALA-3079 - Fix sequence file writer
- IMPALA-3200 - Implement suballocator for splitting buffers
- IMPALA-3202 - implement spill-to-disk in new buffer pool
- IMPALA-3203 - Implement scalable buffer recycling in buffer pool.
- IMPALA-3524 - Don't process spilled partitions with 0 probe rows
- IMPALA-3586 - Implement union passthrough
- IMPALA-3654 - Parquet stats filtering for IN predicate
- IMPALA-3671 - Add SCRATCH_LIMIT query option.
- IMPALA-3742 - Partitions and sort INSERTs for Kudu tables
- IMPALA-3748 - minimum buffer requirements in planner
- IMPALA-3748 - add query-wide resource acquisition step
- IMPALA-3785 - Record query handle for invalid handle
- IMPALA-3794 - Workaround for Breakpad ID conflicts
- IMPALA-3905 - Implements HdfsScanner::GetNext() for text scans.
- IMPALA-3909 - Populate min/max statistics in Parquet writer
- IMPALA-3989 - Display skew warning for poorly formatted Parquet files
- IMPALA-4008 - Don't bake fields into generated IR functions of OldHashTable
- IMPALA-4014 - Introduce query-wide execution state.
- IMPALA-4029 - Reduce memory requirements for storing file metadata
- IMPALA-4036 - invalid SQL generated for partitioned table with comment
- IMPALA-4041 - Limit catalog and admission control updates to coordinators
- IMPALA-4163 - Add sortby() query hint
- IMPALA-4166 - Add SORT BY sql clause
- IMPALA-4341 - Add metadata load to planner timeline
- IMPALA-4351, IMPALA-4353 - query generator random profile options for INSERT.
- IMPALA-4355 - random query generator: modify statement execution flow to support DML
- IMPALA-4359 - qgen: add UPSERT support
- IMPALA-4390 - Separate ADD and DROP PARTITION syntax
- IMPALA-4431 - Add audit event log control mechanism to prevent disk overflow
- IMPALA-4477 - Update Kudu to latest commit on master
- IMPALA-4499 - Table name missing from exec summary
- IMPALA-4546 - Fix Moscow timezone conversion after 2014
- IMPALA-4548 - BlockingJoinNode should wait for async build thread
- IMPALA-4549 - consistently treat 9999 as upper bound for timestamp year
- IMPALA-4588 - Enable starting the minicluster when offline
- IMPALA-4611 - Checking perms on S3 files is a very expensive no-op
- IMPALA-4616 - Add missing Kudu column options
- IMPALA-4617 - remove IsConstant() analysis from backend
- IMPALA-4624 - Implement Parquet dictionary filtering
- IMPALA-4631 - avoid DCHECK in PlanFragmentExecutor::Close().
- IMPALA-4640 - Fix number of rows displayed by parquet-reader tool
- IMPALA-4647 - fix full data load with ninja
- IMPALA-4648 - remove build_thirdparty.sh
- IMPALA-4649 - add a mechanism to pass flags into make
- IMPALA-4650 - Add Protobuf 2.6.1 to toolchain and as a build dependency
- IMPALA-4651 - Add LibEv to build
- IMPALA-4652 - Add crcutil to build
- IMPALA-4653 - fix sticky config variable problem
- IMPALA-4662 - Fix NULL literal handling in Kudu IN list predicates
- IMPALA-4673 - Use --local_library_dir for tzdb startup scratch space
- IMPALA-4674 - port BufferedTupleStream to BufferPool
- IMPALA-4676 - remove vestigial references to getBlockStorageLocations() API
- IMPALA-4678 - move query MemTracker into QueryState
- IMPALA-4684 - check-hbase-nodes.py: Build failing on RHEL7 when trying to start HBase.
- IMPALA-4686 - Fix schema output for INT96 columns in parquet-reader tool
- IMPALA-4689 - Fix computation of last active time
- IMPALA-4701 - make distcc work reliably with clang
- IMPALA-4707 - fix use-after-free in QueryExecMgr
- IMPALA-4710 - There is an error in control audit log file size number
- IMPALA-4711 - clarify is_null semantics in udf.h
- IMPALA-4716 - Expr rewrite causes IllegalStateException
- IMPALA-4731, IMPALA-397, IMPALA-4728 - Sorter crash Impalad instance.
- IMPALA-4734 - Set parquet::RowGroup::sorting_columns
- IMPALA-4738 - STDDEV_SAMP should return NULL for single record input
- IMPALA-4740 - Add option to use hdfsPread() for HDFS hedged reads
- IMPALA-4747 - macros should only evaluate their arguments once
- IMPALA-4748 - crash in TmpFileMgr when hitting process mem limit
- IMPALA-4752 - make ObjectPool more efficient
- IMPALA-4757 - addendum: avoid double underscore in name
- IMPALA-4758 - Upgrade gutil to recent Kudu version.
- IMPALA-4762 - RECOVER PARTITIONS should batch partition updates
- IMPALA-4764 - Add Hedged read metrics
- IMPALA-4788 - Use HashSet in RECOVER PARTITIONS duplicate checks
- IMPALA-4789 - Fix slow metadata loading due to inconsistent paths.
- IMPALA-4792 - Fix number of distinct values for a CASE with constant outputs
- IMPALA-4801 - fix heap use after free for MemTracker
- IMPALA-4815, IMPALA-4817, IMPALA-4819 - Write and Read Parquet Statistics for remaining types
- IMPALA-4820 - avoid writing unencrypted data during write cancellation
- IMPALA-4828 - Add "Known Issues" item for
- IMPALA-4831 - enforce BufferPool reservation invariants
- IMPALA-4839 - Remove implicit 'localhost' for KUDU_MASTER_HOSTS
- IMPALA-4840 - Fix REFRESH performance regression.
- IMPALA-4846 - Upgrade Snappy to 1.1.4
- IMPALA-4849 - IllegalStateException from rewritten CASE expr
- IMPALA-4858 - add more info to MemLimitExceeded errors
- IMPALA-4859 - Push down IS NULL / IS NOT NULL to Kudu
- IMPALA-4877 - fix precedence of unary -/+
- IMPALA-4880 - Clarify synchronization policy for 'done_' in KuduScanNode
- IMPALA-4883 - Union Codegen
- IMPALA-4884 - Add JVM heap and non-heap usage in metrics and UI
- IMPALA-4885 - Expose Jvm thread info in web UI
- IMPALA-4890, IMPALA-5143 - Coordinator race involving TearDown()
- IMPALA-4892 - Session ID included in error message
- IMPALA-4893 - Efficiently update the rows read counter for sequence file
- IMPALA-4897 - AnalysisException: specified cache pool does not exist
- IMPALA-4923 - reduce memory transfer for selective scans
- IMPALA-4926 - Upgrade LZ4 to 1.7.5
- IMPALA-4937 - Remove unused kudu scanner keep alive variable
- IMPALA-4943 - Speed up block md loading for add/recover partition calls.
- IMPALA-4946 - fix hang in BufferPool
- IMPALA-4955 - Fix integer overflow in hdfs table size accounting
- IMPALA-4959 - Avoid picking up the system's boost cmake module
- IMPALA-4962 - Fix SHOW COLUMN STATS for HS2
- IMPALA-4965 - Authorize access to runtime profile and exec summary
- IMPALA-4966 - Add flatbuffers to build
- IMPALA-4983 - Set toolchain version to include LZ4 build flags
- IMPALA-4988 - Add query option read_parquet_statistics
- IMPALA-4996 - Single-threaded KuduScanNode
- IMPALA-5003 - Constant propagation in scan conjuncts
- IMPALA-5008 - Fix reading stats for TINYINT and SMALLINT
- IMPALA-5021 - Fix count(*) remaining rows overflow in Parquet.
- IMPALA-5027 - addendum - remove unneeded #define
- IMPALA-5030 - Adds support for NVL2() function
- IMPALA-5034 - Update Breakpad to newer version
- IMPALA-5038 - Fix file size regex to include bytes
- IMPALA-5041 - Allow AuthManager::Init() to be called more than once
- IMPALA-5042 - Use a HashSet instead of ArrayList for O(1) look ups
- IMPALA-5056 - Ensure analysis uses 'fresh' catalog after metadata loading
- IMPALA-5057 - Upgrade glog (0.3.4-p2) and gflags
- IMPALA-5073 - Part 1: add option to use mmap() for buffer pool
- IMPALA-5077 - add NUMA and current cpu to CpuInfo
- IMPALA-5080 - OutOfMemory PermGen space
- IMPALA-5110 - Add deb support to dump_breakpad_symbols.py
- IMPALA-5111 - Fix check when creating NOT NULL PK col in Kudu
- IMPALA-5113 - fix dirty unpinned invariant
- IMPALA-5120 - Default to partitioned join when stats are missing
- IMPALA-5123 - Fix ASAN use after free in timezone_db
- IMPALA-5125 - SimplifyConditionalsRule incorrectly handles aggregates
- IMPALA-5127 - Add history_max option
- IMPALA-5130 - fix race in MemTracker::EnableReservationReporting()
- IMPALA-5137 - Support Kudu UNIXTIME_MICROS as Impala TIMESTAMP.
- IMPALA-5140 - improve docs building guidelines
- IMPALA-5144 - Remove sortby() hint
- IMPALA-5147 - Add the ability to exclude hosts from query execution
- IMPALA-5154 - Handle 'unpartitioned' Kudu tables
- IMPALA-5158 - Part 1: include untracked memory in MemTracker dumps
- IMPALA-5166 - clean up BufferPool counters
- IMPALA-5167 - Reduce the number of Kudu clients created
- IMPALA-5169 - Add support for async pins in buffer pool
- IMPALA-5171 - update RAT excluded files list
- IMPALA-5172 - crash in tcmalloc::CentralFreeList::FetchFromOneSpans
- IMPALA-5173 - crash with hash join feeding directly into nlj
- IMPALA-5174 - Bump gflags to 2.2.0-p1
- IMPALA-5180 - Don't use non-deterministic exprs in partition pruning
- IMPALA-5181 - Extract PYPI metadata from a webpage
- IMPALA-5182 - Explicitly close connection to impalad on error from shell
- IMPALA-5184 - build fe against both Hive 1 & 2 APIs
- IMPALA-5187, IMPALA-5208 - Bump Breakpad Version, undo IMPALA-3794
- IMPALA-5188 - Add slot sorting in TupleDescriptor::LayoutEquals()
- IMPALA-5189 - Pin version of setuptools-scm
- IMPALA-5192 - Don't bake MemPool* into IR
- IMPALA-5197 - Erroneous corrupted Parquet file message
- IMPALA-5198 - Error messages are sometimes dropped before reaching client
- IMPALA-5207, IMPALA-5214 - distcc fixes
- IMPALA-5217 - KuduTableSink checks null constraints incorrectly
- IMPALA-5220 - memory maintenance cleanup
- IMPALA-5221 - Avoid re-use of stale SASL contexts.
- IMPALA-5222 - don't call Bits::Log2*() functions
- IMPALA-5223 - Add waiting for HBase Zookeeper nodes to retry loop
- IMPALA-5224 - remove defunct codehaus repository
- IMPALA-5229 - huge page-backed buffers with TCMalloc
- IMPALA-5230 - fix non-functional impalad under ASAN
- IMPALA-5232 - Parquet reader error message prints memory address instead of value
- IMPALA-5235 - Initialize resourceProfile_ with a dummy value
- IMPALA-5238 - transfer reservations between trackers
- IMPALA-5258 - Pass CMAKE_BUILD_TYPE to Impala-lzo
- IMPALA-5259 - Add REFRESH FUNCTIONS <db> statement
- IMPALA-5261 - Heap use-after-free in HdfsSequenceTableWriter
- IMPALA-5273 - Replace StringCompare with glibc memcmp
- IMPALA-5282 - Handle overflows in computeResourceProfile().
- IMPALA-5294 - Kudu INSERT partitioning fails with constants
- IMPALA-5301 - Set Kudu minicluster memory limit
- IMPALA-5304 - reduce transfer of Parquet decompression buffers
- IMPALA-5309 - Adds TABLESAMPLE clause for HDFS table refs.
- IMPALA-5318 - Generate access events with fully qualified table names
- IMPALA-5324 - Fix version check in EvalDictionaryFilters
- IMPALA-5325 - Do not update totalHdfsBytes_/numHdfsFiles_ on Catalogd
- IMPALA-5331 - Use new libHDFS API to address "Unknown Error 255"
- IMPALA-5333 - Add support for Impala to work with ADLS
- IMPALA-5338 - Fix Kudu timestamp column default values
- IMPALA-5339 - Fix analysis with sort.columns and expr rewrites
- IMPALA-5340 - Query profile displays stale query state
- IMPALA-5342 - Add comments of loaded tables in the response of GetTables
- IMPALA-5347 - Parquet scanner microoptimizations
- IMPALA-5355 - Fix the order of Sentry roles and privileges
- IMPALA-5357 - Fix unixtime to UTC TimestampValue perf
- IMPALA-5358 - Fix repeatable table sample.
- IMPALA-5363 - Reset probe_batch_ after reaching limit
- IMPALA-5364 - Correct title of query locations table
- IMPALA-5375 - Builds on CentOS 6.4 failing with broken python dependencies
- IMPALA-5377 - Impala may crash if given a fragment instance while restarting
- IMPALA-5378 - Disk IO manager needs to understand ADLS
- IMPALA-5381 - Adds DEFAULT_JOIN_DISTRIBUTION_MODE query option.
- IMPALA-5383 - Fix PARQUET_FILE_SIZE option for ADLS
- IMPALA-5388 - Only retry RPC on lost connection in send call
- IMPALA-5391 - remove C++11 from UDF header
- IMPALA-5411 - Avoid log spew from GetRuntimeProfileStr
- IMPALA-5419 - Check for cancellation when building hash tables
- IMPALA-5424 - Ignore errors when removing minidumps folder
- IMPALA-5426 - Update Hive schema script to 1.1.0
- IMPALA-5432 - Remove invalid DCHECK from SetMemLimitExceeded
- IMPALA-5438 - Always eval union const exprs in subplan.
- IMPALA-5454 - Work around template rendering bug in /memz
- IMPALA-5469 - Fix exception when processing catalog update
- IMPALA-5487 - Fix race in RuntimeProfile::toThrift()
- IMPALA-5537 - Retry RPC on some exceptions with SSL connection
- IMPALA-5558, IMPALA-5576 - Reopen stale client connection
- IMPALA-5562 - Only recomputeMemLayout() if tuple has a layout.
- IMPALA-3905 - Implements HdfsScanner::GetNext() for LZO text scans.
- IMPALA-5172 - fix incorrect cast in call to LZO decompress
Issues Fixed in Impala for CDH 5.11.2
For the full list of fixed issues for all CDH components in CDH 5.11.2, see Issues Fixed in CDH 5.11.x. The following list represents the subset of fixed Impala JIRAs from the CDH fixed issues.
- IMPALA-4276 - Profile displays non-default query options set by planner
- IMPALA-4546 - Fix Moscow timezone conversion after 2014
- IMPALA-4631 - don't use floating point operations for time unit conversions
- IMPALA-4716 - Expr rewrite causes IllegalStateException
- IMPALA-4738 - STDDEV_SAMP should return NULL for single record input
- IMPALA-4962 - Fix SHOW COLUMN STATS for HS2
- IMPALA-5021 - Fix count(*) remaining rows overflow in Parquet.
- IMPALA-5056 - Ensure analysis uses 'fresh' catalog after metadata loading
- IMPALA-5154 - Handle 'unpartitioned' Kudu tables
- IMPALA-5172 - Buffer overrun for Snappy decompression
- IMPALA-5187 - Bump breakpad version to include the fix for Breakpad #681, re-enable the strict check that was disabled in IMPALA-3794.
- IMPALA-5189 - Pin version of setuptools-scm
- IMPALA-5197 - Erroneous corrupted Parquet file message
- IMPALA-5198 - Error messages are sometimes dropped before reaching client
- IMPALA-5217 - KuduTableSink checks null constraints incorrectly
- IMPALA-5223 - Add waiting for HBase Zookeeper nodes to retry loop
- IMPALA-5301 - Set Kudu minicluster memory limit
- IMPALA-5318 - Generate access events with fully qualified table names
- IMPALA-5355 - Fix the order of Sentry roles and privileges
- IMPALA-5363 - Reset probe_batch_ after reaching limit
- IMPALA-5419 - Check for cancellation when building hash tables
- IMPALA-5469 - Fix exception when processing catalog update
- IMPALA-5487 - Fix race in RuntimeProfile::toThrift()
- IMPALA-5524 - Fixes NPE during planning with DISABLE_UNSAFE_SPILLS=1
- IMPALA-5554 - sorter DCHECK on null column
- IMPALA-5580 - fix Java UDFs that return NULL strings
- IMPALA-5615 - Fix compute incremental stats for general partition exprs
- IMPALA-5623 - Fix lag() on STRING cols to release UDF mem
- IMPALA-5638 - Fix Kudu table set tblproperties inconsistencies
- IMPALA-5657 - Fix a couple of bugs with FunctionCallExpr and IGNORE NULLS
- IMPALA-5172 - fix incorrect cast in call to LZO decompress
Issues Fixed in Impala for CDH 5.11.1
For the full list of fixed issues for all CDH components in CDH 5.11.1, see Issues Fixed in CDH 5.11.x. The following list represents the subset of fixed Impala JIRAs from the CDH fixed issues.
- IMPALA-3641 - Fix catalogd RPC responses to DROP IF EXISTS.
- IMPALA-4088 - Assign fixed values to the minicluster server ports
- IMPALA-4293 - query profile should include error log
- IMPALA-4544 - ASAN should ignore SEGV and leaks
- IMPALA-4615 - Fix create_table.sql command order
- IMPALA-4733 - Change HBase ports to non-ephemeral
- IMPALA-4787 - Optimize APPX_MEDIAN() memory usage
- IMPALA-4822 - Implement dynamic log level changes
- IMPALA-4899 - Fix parquet table writer dictionary leak
- IMPALA-4902 - Copy parameters map in HdfsPartition.toThrift().
- IMPALA-4998 - Fix missing table lock acquisition.
- IMPALA-5028 - Lock table in /catalog_objects endpoint.
- IMPALA-5055 - Fix DCHECK in parquet-column-readers.cc ReadPageHeader()
- IMPALA-5088 - Fix heap buffer overflow
- IMPALA-5115 - Handle status from HdfsTableSink::WriteClusteredRowBatch
- IMPALA-5145 - Do not constant fold null in CastExprs
- IMPALA-5156 - Drop VLOG level passed into Kudu client
- IMPALA-5186 - Handle failed CreateAndOpenScanner() in MT scan.
- IMPALA-5193 - Initialize decompressor before finding first tuple
- IMPALA-5251 - Fix propagation of input exprs' types in 2-phase agg
- IMPALA-5252 - Fix crash in HiveUdfCall::GetStringVal() when mem_limit exceeded
- IMPALA-5253 - Use appropriate transport for StatestoreSubscriber
- IMPALA-5322 - Fix a potential crash in Frontend & Catalog JNI startup
Issues Fixed in Impala for CDH 5.11.0
For the full list of fixed issues for all CDH components in CDH 5.11.0, see Issues Fixed in CDH 5.11.x. The following list represents the subset of fixed Impala JIRAs from the CDH fixed issues.
- IMPALA-1430, IMPALA-4878, IMPALA-4879 - codegen native UDAs
- IMPALA-2020, IMPALA-4915, IMPALA-4936 - Add rounding for decimal casts
- IMPALA-2020, IMPALA-4809 - Codegen support for DECIMAL_V2
- IMPALA-2605 - Omit the sort and mini stress tests
- IMPALA-4055 - Speed up to_date() with custom implementation.
- IMPALA-4263 - Fix wrong omission of agg/analytic hash exchanges.
- IMPALA-4282 - Remove max length check for type strings.
- IMPALA-4370 - Divide and modulo result types for DECIMAL version V2
- IMPALA-4449 - Revisit table locking pattern in the catalog
- IMPALA-4675 - Case-insensitive matching of Parquet fields.
- IMPALA-4702 - Fix command line help for webserver_private_key_file
- IMPALA-4705, IMPALA-4779, IMPALA-4780 - Fix some Expr bugs with codegen
- IMPALA-4725 - Query option to control Parquet array resolution.
- IMPALA-4729 - Implement REPLACE()
- IMPALA-4742 - Change "{}".format() to "{0}".format() for Py 2.6
- IMPALA-4749 - hit DCHECK in sorter with scratch limit
- IMPALA-4767 - Workaround for HIVE-15653 to preserve table stats.
- IMPALA-4808 - old hash join can reference invalid memory
- IMPALA-4809 - Enable support for DECIMAL_V2 in decimal_casting.py
- IMPALA-4810 - fix incorrect expr-test decimal types
- IMPALA-4810 - Make DECIMAL expr-test cases table driven
- IMPALA-4810 - Add DECIMAL_V2 query option
- IMPALA-4813 - Round on divide and multiply
- IMPALA-4821 - Update AVG() for DECIMAL_V2
- IMPALA-4828 - Alter Kudu schema outside Impala may crash on read
- IMPALA-4854 - Fix incremental stats with complex types.
- IMPALA-4916 - Fix maintenance of set of item sets in DisjointSet.
- IMPALA-4929 - Safe concurrent access to IR function call graph
- IMPALA-4933, IMPALA-4931 - Simplify SSL initialization on startup
- IMPALA-4934 - Disable Kudu OpenSSL initialization
- IMPALA-4981 - Re-enable spilling with MT_DOP.
- IMPALA-4995 - Fix integer overflow in TopNNode::PrepareForOutput
- IMPALA-4997 - Fix overflows in Sorter::TupleIterator
- IMPALA-5005 - Don't allow server to send SASL COMPLETE msg out of order
- IMPALA-5025 - Update binutils to 2.26.1
- IMPALA-5027 - make udf headers buildable externally
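The IMPALA-4742 entry above concerns string formatting on Python 2.6, which does not accept auto-numbered replacement fields such as "{}". The sketch below shows the explicitly indexed form that works on 2.6 and later. The helper name and its arguments are hypothetical and are not taken from the Impala code base.

```python
# Hypothetical helper for illustration only; not part of Impala.
def qualified_name(db, table):
    # Explicit field indexes ({0}, {1}) are accepted by Python 2.6 and later.
    # Auto-numbered fields ("{}.{}") require Python 2.7 or later.
    return "{0}.{1}".format(db, table)


print(qualified_name("functional", "alltypes"))
```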
Issues Fixed in Impala for CDH 5.10.2
For the full list of fixed issues for all CDH components in CDH 5.10.2, see Issues Fixed in CDH 5.10.x. The following list represents the subset of fixed Impala JIRAs from the CDH fixed issues.
- IMPALA-3641 - Fix catalogd RPC responses to DROP IF EXISTS.
- IMPALA-4088 - Assign fixed values to the minicluster server ports
- IMPALA-4293 - query profile should include error log
- IMPALA-4544 - ASAN should ignore SEGV and leaks
- IMPALA-4546 - Fix Moscow timezone conversion after 2014
- IMPALA-4615 - Fix create_table.sql command order
- IMPALA-4631 - avoid DCHECK in PlanFragmentExecutor::Close().
- IMPALA-4716 - Expr rewrite causes IllegalStateException
- IMPALA-4722 - Disable log caching in test_scratch_disk
- IMPALA-4725 - Query option to control Parquet array resolution.
- IMPALA-4733 - Change HBase ports to non-ephemeral
- IMPALA-4738 - STDDEV_SAMP should return NULL for single record input
- IMPALA-4787 - Optimize APPX_MEDIAN() memory usage
- IMPALA-4822 - Implement dynamic log level changes
- IMPALA-4899 - Fix parquet table writer dictionary leak
- IMPALA-4902 - Copy parameters map in HdfsPartition.toThrift().
- IMPALA-4920 - custom cluster tests: fix generation of py.test options
- IMPALA-4998 - Fix missing table lock acquisition.
- IMPALA-5021 - Fix count(*) remaining rows overflow in Parquet.
- IMPALA-5028 - Lock table in /catalog_objects endpoint.
- IMPALA-5055 - Fix DCHECK in parquet-column-readers.cc ReadPageHeader()
- IMPALA-5088 - Fix heap buffer overflow
- IMPALA-5115 - Handle status from HdfsTableSink::WriteClusteredRowBatch
- IMPALA-5145 - Do not constant fold null in CastExprs
- IMPALA-5154 - Handle 'unpartitioned' Kudu tables
- IMPALA-5156 - Drop VLOG level passed into Kudu client
- IMPALA-5172 - Buffer overrun for Snappy decompression
- IMPALA-5183 - increase write wait timeout in BufferedBlockMgrTest
- IMPALA-5186 - Handle failed CreateAndOpenScanner() in MT scan.
- IMPALA-5189 - Pin version of setuptools-scm
- IMPALA-5193 - Initialize decompressor before finding first tuple
- IMPALA-5197 - Erroneous corrupted Parquet file message
- IMPALA-5198 - Error messages are sometimes dropped before reaching client
- IMPALA-5208 - Bump toolchain to include fixes forand IMPALA-5187
- IMPALA-5217 - KuduTableSink checks null constraints incorrectly
- IMPALA-5244 - test_hdfs_file_open_fail fails on local filesystem build
- IMPALA-5252 - Fix crash in HiveUdfCall::GetStringVal() when mem_limit exceeded
- IMPALA-5253 - Use appropriate transport for StatestoreSubscriber
- IMPALA-5287 - Test skip.header.line.count on gzip
- IMPALA-5297 - Reduce free-pool-test mem requirement to avoid OOM
- IMPALA-5301 - Set Kudu minicluster memory limit
- IMPALA-5318 - Generate access events with fully qualified table names
- IMPALA-5322 - Fix a potential crash in Frontend & Catalog JNI startup
- IMPALA-5487 - Race in runtime-profile.cc::toThrift() can lead to corrupt profiles being generated while query is running
- IMPALA-5172 - fix incorrect cast in call to LZO decompress
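The IMPALA-4738 entry above states that STDDEV_SAMP should return NULL for single-record input. A minimal sketch of why: the sample standard deviation divides by n - 1, so it is undefined when n is 1. This is an illustrative Python function, not Impala's implementation, and `None` stands in for SQL NULL.

```python
import math


def stddev_samp(values):
    """Sample standard deviation; returns None (standing in for SQL NULL)
    when fewer than two values are supplied."""
    n = len(values)
    if n < 2:
        # The n - 1 denominator would be zero (or negative), so no result.
        return None
    mean = sum(values) / float(n)
    sum_sq = sum((v - mean) ** 2 for v in values)
    return math.sqrt(sum_sq / (n - 1))


print(stddev_samp([5]))
print(stddev_samp([1, 2, 3, 4]))
```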
Issues Fixed in Impala for CDH 5.10.1
For the full list of fixed issues for all CDH components in CDH 5.10.1, see Issues Fixed in CDH 5.10.x. The following list represents the subset of fixed Impala JIRAs from the CDH fixed issues.
- IMPALA-2605 - Omit the sort and mini stress tests.
- IMPALA-4055 - Speed up to_date() with custom implementation.
- IMPALA-4263 - Fix wrong omission of agg/analytic hash exchanges.
- IMPALA-4282 - Remove max length check for type strings.
- IMPALA-4449 - Revisit table locking pattern in the catalog.
- IMPALA-4675 - Case-insensitive matching of Parquet fields.
- IMPALA-4702 - Fix command line help for webserver_private_key_file.
- IMPALA-4705, IMPALA-4779, IMPALA-4780 - Fix some Expr bugs with codegen.
- IMPALA-4742 - Change "{}".format() to "{0}".format() for Py 2.6.
- IMPALA-4749 - hit DCHECK in sorter with scratch limit.
- IMPALA-4767 - Workaround for HIVE-15653 to preserve table stats.
- IMPALA-4808 - old hash join can reference invalid memory.
- IMPALA-4828 - Alter Kudu schema outside Impala may crash on read.
- IMPALA-4854 - Fix incremental stats with complex types.
- IMPALA-4916 - Fix maintenance of set of item sets in DisjointSet.
- IMPALA-4981 - Re-enable spilling with MT_DOP.
- IMPALA-4995 - Fix integer overflow in TopNNode::PrepareForOutput.
- IMPALA-4997 - Fix overflows in Sorter::TupleIterator.
Issues Fixed in Impala for CDH 5.10.0
For the full list of Impala fixed issues in CDH 5.10 / Impala 2.8, see this report in the Impala JIRA tracker.
For the full list of fixed issues for all CDH components in CDH 5.10.0, see Issues Fixed in CDH 5.10.x. The following list represents the subset of fixed Impala JIRAs from the CDH fixed issues.
- IMPALA-1169 - Admission control info on the queries debug webpage
- IMPALA-1286 - Extract common conjuncts from disjunctions.
- IMPALA-1430, IMPALA-4108 - codegen all builtin aggregate functions
- IMPALA-1616 - Improve the Memory Limit Exceeded error report
- IMPALA-1654 - General partition exprs in DDL operations.
- IMPALA-1702 - Enforce single-table consistency in query analysis
- IMPALA-1788 - Fold constant expressions
- IMPALA-2013 - Reintroduce steps for checking HBase health in run-hbase.sh
- IMPALA-2057 - Better error message for incorrect avro decimal column declaration
- IMPALA-2521 - Add clustered hint to insert statements
- IMPALA-2523 - Make HdfsTableSink aware of clustered input
- IMPALA-2789 - More compact mem layout with null bits at the end.
- IMPALA-2864 - Ensure that client connections are closed after a failed Open()
- IMPALA-2890 - Support ALTER TABLE statements for Kudu tables
- IMPALA-2905 - Move QueryResultSet implementations into separate module
- IMPALA-2905 - Handle coordinator fragment lifecycle like all others
- IMPALA-2916 - Add warning to query profile if debug build
- IMPALA-2925 - Mark test_alloc_update as xfail.
- IMPALA-3002, IMPALA-1473 - Cardinality observability cleanup
- IMPALA-3125 - Fix assignment of equality predicates from an outer-join On-clause
- IMPALA-3126 - Conservative assignment of inner-join On-clause predicates
- IMPALA-3167 - Fix assignment of WHERE conjunct through grouping agg + OJ
- IMPALA-3200 - move bufferpool under runtime
- IMPALA-3201 - in-memory buffer pool implementation
- IMPALA-3201 - reservation implementation for new buffer pool
- IMPALA-3202 - refactor scratch file management into TmpFileMgr
- IMPALA-3202 - DiskIoMgr improvements for new buffer pool
- IMPALA-3211 - provide toolchain build id for bootstrapping
- IMPALA-3221 - Copyright / license audit
- IMPALA-3229 - Don't assume that AUX exists just because of shell env
- IMPALA-3308 - Get expr-test passing on PPC64LE
- IMPALA-3314 - Fix Avro schema loading for partitioned tables
- IMPALA-3342 - Add thread counters to monitor plan fragment execution
- IMPALA-3346 - DeepCopy() Kudu rows into Impala tuples
- IMPALA-3348 - Avoid per-slot check vector size in KuduScanner
- IMPALA-3398 - Add docs to main Impala branch
- IMPALA-3420 - use gold by default
- IMPALA-3481 - Use Kudu ScanToken API for scan ranges
- IMPALA-3491 - Use unique db in test_scanners.py and test_aggregation.py
- IMPALA-3491 - Use unique database fixture in test_partitioning.py
- IMPALA-3491 - Use unique database fixture in test_insert_parquet.py
- IMPALA-3491 - Use unique database fixture in test_nested_types.py
- IMPALA-3491 - Use unique database fixture in test_ddl.py
- IMPALA-3552 - Make incremental stats max serialized size configurable
- IMPALA-3567, IMPALA-3899 - Part 2: factor out PHJ builder
- IMPALA-3567 - move ExecOption profile helpers to RuntimeProfile
- IMPALA-3586 - Clean up union-node.h/cc to enable improvements.
- IMPALA-3644 - Make predicate order deterministic
- IMPALA-3671 - Add query option to limit scratch space usage
- IMPALA-3676 - Use clang as a static analysis tool
- IMPALA-3710 - Kudu DML should ignore conflicts, pt2
- IMPALA-3710 - Kudu DML should ignore conflicts by default
- IMPALA-3713, IMPALA-4439 - Fix Kudu DML shell reporting
- IMPALA-3718 - Add test_cancellation tests for Kudu
- IMPALA-3718 - Support subset of functional-query for Kudu
- IMPALA-3719 - Simplify CREATE TABLE statements with Kudu tables
- IMPALA-3724 - Support Kudu non-covering range partitions
- IMPALA-3725 - Support Kudu UPSERT in Impala
- IMPALA-3726 - Add support for Kudu-specific column options
- IMPALA-3739 - Enable stress tests on Kudu
- IMPALA-3771 - Expose kudu client timeout and set default
- IMPALA-3786 - Replace "cloudera" with "apache"
- IMPALA-3788 - Fix Kudu ReadMode flag checking
- IMPALA-3788 - Add flag for Kudu read-your-writes
- IMPALA-3788 - Support for Kudu 'read-your-writes' consistency
- IMPALA-3808 - Add incubating DISCLAIMER from the Incubator Branding Guide
- IMPALA-3809 - Show Kudu-specific column metadata in DESCRIBE.
- IMPALA-3812 - Fix error message for unsupported types
- IMPALA-3815 - clean up cross-compiled comparator
- IMPALA-3823 - Add timer to measure Parquet footer reads
- IMPALA-3838, IMPALA-4495 - Codegen EvalRuntimeFilters() and fixes filter stats updates
- IMPALA-3853 - More RAT cleaning.
- IMPALA-3853 - squeasel is MIT (and dual copyright) not Apache
- IMPALA-3872 - allow providing PyPi mirror for python packages
- IMPALA-3875 - Thrift threaded server hang in some cases
- IMPALA-3884 - Support TYPE_TIMESTAMP for HashTableCtx::CodegenAssignNullValue()
- IMPALA-3902 - Scheduler improvements for running multiple fragment instances on a single backend
- IMPALA-3905 - Add single-threaded scan node.
- IMPALA-3912 - test_random_rpc_timeout is flaky.
- IMPALA-3918 - Fix straggler Cloudera -> ASF license headers
- IMPALA-3920 - TotalStorageWaitTime counter not populated for fragments with Kudu scan node
- IMPALA-3943 - Address post-merge comments.
- IMPALA-3971, IMPALA-3229 - Bootstrap an Impala dev environment
- IMPALA-3973 - add position and occurrence to instr()
- IMPALA-3980 - qgen: re-enable Hive as a target database
- IMPALA-3983, IMPALA-3974 - Delete function jar resources after load
- IMPALA-4000 - Restricted Sentry authorization for Kudu Tables
- IMPALA-4006 - Fix typo in buildall.sh introduced in
- IMPALA-4006 - dangerous rm -rf statements in scripts
- IMPALA-4008 - Don't bake ExprContext pointers into IR code
- IMPALA-4008 - don't bake in hash table and hash join pointers
- IMPALA-4011 - Remove / reword messages when statestore messages are late
- IMPALA-4020 - Handle external conflicting changes to HMS gracefully
- IMPALA-4023 - don't attach buffered tuple streams to batches
- IMPALA-4026 - Implement double-buffering for BlockingQueue
- IMPALA-4028 - Improve message for improper Sentry config to make extra spaces visible
- IMPALA-4037, IMPALA-4038 - fix locking during query cancellation
- IMPALA-4042 - Preserve root types when substituting grouping exprs
- IMPALA-4047 - Remove occurrences of 'CDH'/'cdh' from repo
- IMPALA-4048 - Misc. improvements to /sessions
- IMPALA-4054 - Remove serial test workarounds for IMPALA-2479.
- IMPALA-4056 - Fix toSql() of DistributeParam
- IMPALA-4058 - benchmark byteswap on misaligned memory
- IMPALA-4074 - Configuration items duplicate in template of YARN
- IMPALA-4080, IMPALA-3638 - Introduce ExecNode::Codegen()
- IMPALA-4087 - TestFragmentLifecycle.test_failure_in_prepare
- IMPALA-4091 - Fix backend unit to log in logs/be_tests.
- IMPALA-4096 - Allow clean.sh to work from snapshots
- IMPALA-4097 - Crash in kudu-scan-node-test
- IMPALA-4098 - Open()/Close() partition exprs once per fragment instance.
- IMPALA-4100, IMPALA-4112 - qgen: Replace EXTRACT UDF + IS [NOT] DISTINCT FROM in HiveSqlWriter
- IMPALA-4101 - qgen: Hive join predicates should only contain equality functions
- IMPALA-4102 - Remote Kudu reads should be reported
- IMPALA-4104 - add DCHECK to ConsumeLocal() and fix tests
- IMPALA-4110, IMPALA-3853 - npm.js uses Artistic License 2
- IMPALA-4110 - Clean up issues found by Apache RAT
- IMPALA-4110 - Apache RAT script on Impala tarballs
- IMPALA-4111 - backend death tests should not produce minidumps
- IMPALA-4116 - Remove 'cdh' from version string again
- IMPALA-4116 - Remove 'cdh' from version string
- IMPALA-4117 - Factor simple scheduler test code into own files
- IMPALA-4118 - extract encryption utils from BufferedBlockMgr
- IMPALA-4122 - qgen: fix bitrotted cluster unit tests
- IMPALA-4123 - Fast bit unpacking
- IMPALA-4134, IMPALA-3704 - Kudu INSERT improvements
- IMPALA-4136 - testKudu planner test hangs if Kudu is not supported
- IMPALA-4138 - Fix AcquireState() for batches that change capacity
- IMPALA-4142 - qgen: Hive does not support CTEs inside sub-query blocks
- IMPALA-4155 - Update default partition when table is altered
- IMPALA-4160 - Remove some leftover Llama references
- IMPALA-4160 - Remove Llama support.
- IMPALA-4171 - Remove JAR from repo.
- IMPALA-4180 - Synchronize accesses to RuntimeState::reader_contexts_
- IMPALA-4187 - Switch RPC latency metrics to histograms
- IMPALA-4188 - Leopard: support external Docker volumes
- IMPALA-4193 - Warn when benchmarks run with sub-optimal CPU settings
- IMPALA-4194 - Bump version to 2.8.0
- IMPALA-4199 - Add 'SNAPSHOT' to Impala version
- IMPALA-4204 - Remove KuduScanNodeTest
- IMPALA-4205 - fix tmp-file-mgr-test under ASAN
- IMPALA-4206 - Add column lineage regression test.
- IMPALA-4207 - test infra: move Hive options from connection to cluster options
- IMPALA-4213 - Fix Kudu predicates that need constant folding
- IMPALA-4230 - ASF policy issues from 2.7.0 rc3.
- IMPALA-4231 - fix codegen time regression
- IMPALA-4232 - qgen: Hive does not support aggregates inside specific analytic clauses
- IMPALA-4234 - Remove astyle config file, looks outdated.
- IMPALA-4239 - fix buffer pool test failures in release build
- IMPALA-4240 - qgen: Add "ParseException line missing )" to Known Errors for Hive
- IMPALA-4241 - remove spurious child queries event
- IMPALA-4253 - impala-server.backends.client-cache.total-clients shows negative value
- IMPALA-4258 - Remove duplicated and unused test macros
- IMPALA-4259 - build Impala without any test cluster setup
- IMPALA-4260 - Alter table add column drops all the column stats
- IMPALA-4266 - Java udf returning string can give incorrect results
- IMPALA-4269 - Codegen merging exchange node
- IMPALA-4270 - Gracefully fail unsupported queries with mt_dop > 0.
- IMPALA-4274 - hang in buffered-block-mgr-test
- IMPALA-4277 - remove unneeded LegacyTCLIService
- IMPALA-4277 - remove references for unsupported s3/s3n connectors
- IMPALA-4277 - allow overriding of Hive/Hadoop versions/locations
- IMPALA-4278 - Don't abort Catalog startup quickly if HMS is not present
- IMPALA-4283 - Ensure Kudu-specific lineage and audit behavior
- IMPALA-4285, IMPALA-4286 - Fixes for Parquet scanner with MT_DOP > 0.
- IMPALA-4287 - EE tests fail to run when KUDU_IS_SUPPORTED=false
- IMPALA-4289 - Mark agg slots of NDV() functions as non-nullable
- IMPALA-4291 - Reduce LLVM module's preparation time
- IMPALA-4294 - Make check-schema-diff.sh executable from anywhere.
- IMPALA-4295 - XFAIL wildcard SSL test
- IMPALA-4299 - add buildall.sh option to start test cluster
- IMPALA-4300 - Speed up BloomFilter::Or with SIMD
- IMPALA-4302, IMPALA-2379 - constant expr arg fixes
- IMPALA-4303 - Do not reset() qualifier of union operands.
- IMPALA-4309 - Introduce Expr rewrite phase and supporting classes
- IMPALA-4310 - Make push_to_asf.py respect --apache_remote
- IMPALA-4314 - Standardize on MT-related data structures
- IMPALA-4325 - StmtRewrite lost parentheses of CompoundPredicate
- IMPALA-4330 - Fix JSON syntax in generate_metrics.py
- IMPALA-4335 - Don't send 0-row batches to clients
- IMPALA-4338 - test infra data migrator: include tables' primary keys in PostgreSQL
- IMPALA-4339 - ensure coredumps end up in IMPALA_HOME
- IMPALA-4340 - explain how to install postgresql-9.5 or higher
- IMPALA-4343, IMPALA-4354 - qgen: model INSERTs; write INSERTs from query model
- IMPALA-4348, IMPALA-4333 - Improve coordinator fragment cancellation
- IMPALA-4350 - Crash with vlog level 2 in hash join node
- IMPALA-4352 - test infra: store Impala/Kudu primary keys in object model
- IMPALA-4357 - Fix DROP TABLE to pass analysis if the table fails to load
- IMPALA-4362 - Misc. fixes for PFE counters
- IMPALA-4363 - Add Parquet timestamp validation
- IMPALA-4365 - Enabling end-to-end tests on a remote cluster
- IMPALA-4369 - Avoid DCHECK in Parquet scanner with MT_DOP > 0
- IMPALA-4371 - Incorrect DCHECK-s in hdfs-parquet-table-writer
- IMPALA-4372 - 'Describe formatted' returns types in upper case
- IMPALA-4374 - Use new syntax for creating TPC-DS/H tables in Kudu stress test
- IMPALA-4377 - Fix Java UDF-arg buffer use-after-free in UdfExecutorTest.
- IMPALA-4379 - Fix and test Kudu table type checking, follow up
- IMPALA-4379 - Fix and test Kudu table type checking
- IMPALA-4380 - Remove 'cloudera' from hostnames in bin/generate_minidump_collection_testdata.py
- IMPALA-4381 - Incorrect AVX version of BloomFilter::Or
- IMPALA-4383 - Ensure plan fragment report thread is always started
- IMPALA-4384 - NPE when cols list has trailing comma
- IMPALA-4388 - Fix query option reset in tests
- IMPALA-4391 - fix dropped statuses in scanners
- IMPALA-4392 - restore PeakMemoryUsage to DataSink profiles
- IMPALA-4397 - addendum: remove stray semicolon
- IMPALA-4397, IMPALA-3259 - reduce codegen time and memory
- IMPALA-4403 - Implement SHOW RANGE PARTITIONS for Kudu tables
- IMPALA-4406 - Add cryptography export control notice
- IMPALA-4408 - Omit null bytes for Kudu scans with no nullable slots.
- IMPALA-4409 - respect lock order in QueryExecState::CancelInternal()
- IMPALA-4410 - Safer tear-down of RuntimeState
- IMPALA-4411 - Kudu inserts violate lock ordering and could deadlock
- IMPALA-4412 - Per operator timing in profile summary is incorrect when mt_dop > 0
- IMPALA-4415 - Fix unassigned scan range of size 1
- IMPALA-4421 - Send custom cluster & process failure test results to logs/
- IMPALA-4427 - leopard: make DOCKER_IMAGE_NAME required
- IMPALA-4432 - Handle internal codegen disabling properly
- IMPALA-4433 - Always generate testdata using the same time zone setting
- IMPALA-4434 - In Python, ''.split('\n') is [''], which has length 1
- IMPALA-4435 - Fix in-predicate-benchmark linking by moving templates
- IMPALA-4436 - StringValue::StringCompare() should match strncmp()
- IMPALA-4437 - fix crash in disk-io-mgr
- IMPALA-4437 - hit DCHECK in buffered-block-mgr-test
- IMPALA-4438 - Serialize test_failpoints.py to reduce memory pressure
- IMPALA-4440 - lineage timestamps can go backwards across daylight savings transitions
- IMPALA-4441 - Divide-by-zero in RuntimeProfile::SummaryStatsCounter::SetStats
- IMPALA-4442 - Fix FE ParserTests UnsatisfiedLinkError
- IMPALA-4444 - Transfer row group resources to row batch on scan failure
- IMPALA-4446 - expr-test fails under ASAN
- IMPALA-4447 - Rein in overly broad sed that dirties the tree
- IMPALA-4450 - qgen: use string concatenation operator for postgres queries
- IMPALA-4452 - Always call AggFnEvaluator::Open() before AggFnEvaluator::Init()
- IMPALA-4454 - test_kudu.TestShowCreateTable flaky
- IMPALA-4455 - MemPoolTest.TryAllocateAligned failure: sizeof v. alignof
- IMPALA-4458 - Fix resource cleanup of cancelled mt scan nodes.
- IMPALA-4461 - Make sure data gets loaded for wide hbase tables.
- IMPALA-4465 - Don't hold process wide lock while serializing Runtime Profile in GetRuntimeProfileStr()
- IMPALA-4466 - Improve Kudu CRUD test coverage
- IMPALA-4470 - Avoid creating a NumericLiteral from NaN/infinity/-0
- IMPALA-4476 - Use unique_database to stop races in test_udfs.py
- IMPALA-4477 - Bump Kudu version to latest master
- IMPALA-4477 - Upgrade Kudu version to latest master
- IMPALA-4478 - Initial Kudu client mem tracking for sink
- IMPALA-4479 - Use correct isSet() thrift function when evaluating constant bool exprs
- IMPALA-4480 - zero_length_region_ must be as aligned as max_align_t
- IMPALA-4488 - HS2 GetOperationStatus() should keep session alive
- IMPALA-4490 - Only generate runtime filters for hash join nodes.
- IMPALA-4493 - fix string-compare-test when using clang
- IMPALA-4494 - Fix crash in SimpleScheduler
- IMPALA-4497 - Fix Kudu client crash w/ SASL initialization
- IMPALA-4498 - crash in to_utc_timestamp/from_utc_timestamp
- IMPALA-4502 - test_partition_ddl_predicates breaks on non-HDFS filesystems
- IMPALA-4504 - fix races in PlanFragmentExecutor regarding status reporting
- IMPALA-4509 - Initialise Sasl-specific mutex
- IMPALA-4510 - Selectively filter args for metric verification tests
- IMPALA-4511 - Add missing total_time_counter() to PFE::Exec()
- IMPALA-4512 - Add a script that builds Impala on stock Ubuntu 14.04
- IMPALA-4514 - Fix broken exhaustive builds caused by non-nullable columns
- IMPALA-4516 - Don't hold process wide lock connection_to_sessions_map_lock_ while cancelling queries
- IMPALA-4518 - CopyStringVal() doesn't copy null string
- IMPALA-4519 - increase timeout in TestFragmentLifecycle
- IMPALA-4522 - Bound Kudu client threads to avoid stress crash
- IMPALA-4523 - Correct max VARCHAR size to 65535 (2^16 - 1).
- IMPALA-4525 - follow-on: cleanup error handling
- IMPALA-4525 - fix crash when codegen mem limit exceeded
- IMPALA-4527 - Columns in Kudu tables created from Impala default to "NULL"
- IMPALA-4529 - speed up parsing of identifiers
- IMPALA-4532 - Fix use-after-free in ProcessBuildInputAsync()
- IMPALA-4535 - Remove 'auto' from parameter list
- IMPALA-4539 - fix bug when scratch batch references I/O buffers
- IMPALA-4540 - Function call in DCHECK crashes scheduler
- IMPALA-4541 - fix test dimensions for test_codegen_mem_limit
- IMPALA-4542 - Fix use-after-free in some BE tests
- IMPALA-4550 - Fix CastExpr analysis for substituted slots
- IMPALA-4553 - ntpd must be synchronized for kudu to start.
- IMPALA-4554 - fix projection of nested collections with mt_dop > 0
- IMPALA-4557 - Fix flakiness with FLAGS_stress_free_pool_alloc
- IMPALA-4561 - Replace DISTRIBUTE BY with PARTITION BY in CREATE TABLE
- IMPALA-4562 - Fix for crash on kerberized clusters w/o Kudu support
- IMPALA-4564, IMPALA-4565 - mt_dop fixes for old aggs and joins
- IMPALA-4566 - Kudu client glog contention can cause timeouts
- IMPALA-4567 - Fix test_kudu_alter_table exhaustive failures
- IMPALA-4570 - shell tarball breaks with certain setuptools versions
- IMPALA-4571 - Push IN predicates to Kudu
- IMPALA-4572 - Run COMPUTE STATS on Parquet tables with MT_DOP=4
- IMPALA-4574 - Do not treat UUID() like a constant expr
- IMPALA-4577 - Adjust maximum size of row batch queue with MT_DOP
- IMPALA-4578 - Pick up bound predicates for Kudu scan nodes.
- IMPALA-4579 - SHOW CREATE VIEW fails for view containing a subquery
- IMPALA-4580 - Fix crash with FETCH_FIRST when #rows < result cache size
- IMPALA-4584 - Make alter table operations on Kudu tables synchronous
- IMPALA-4585 - Allow the $DATABASE template in the CATCH section
- IMPALA-4586 - don't constant fold in backend
- IMPALA-4592 - Improve error msg for non-deterministic predicates
- IMPALA-4594 - WriteSlot and CodegenWriteSlot handle escaped NULL slots differently
- IMPALA-4595 - Ignore discarded functions after linking
- IMPALA-4608 - Fix fragment completion times for INSERTs
- IMPALA-4609 - prefix thread counters in fragment profile
- IMPALA-4613 - Make sure timers are finished before sending report profile
- IMPALA-4614 - Set eval cost of timestamp literals.
- IMPALA-4619 - Allow NULL as default value in Kudu tables
- IMPALA-4628 - Disable broken kudu test to unblock GVOs
- IMPALA-4630 - remove debug webpage easter egg
- IMPALA-4633 - Change broken gflag default for Kudu client mem
- IMPALA-4636 - Correct Suse Linux distro string
- IMPALA-4636 - Add support for SLES12 for Kudu integration
- IMPALA-4638 - Run queries with MT_DOP through admission control
- IMPALA-4642 - Fix TestFragmentLifecycle failures; kudu test must wait
- IMPALA-4654 - KuduScanner must return when ReachedLimit()
- IMPALA-4659 - fuzz test fixes
- IMPALA-4739 - ExprRewriter fails on HAVING clauses
- IMPALA-4765 - Avoid using several loading threads on one table
- IMPALA-4768 - Improve logging of table loading
- IMPALA-3905 - Add single-threaded scan node
- IMPALA-4262 - LZO-scanner fails when reading large index files from S3
- IMPALA-4277 - Merge "build against hadoop components in different location" into cdh5-trunk
- IMPALA-4277 - build against hadoop components in different location
- IMPALA-4322 - test_scanners_fuzz.py hits a DCHECK
- IMPALA-4391 - fix dropped status in scanners
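The IMPALA-4434 entry above notes that in Python, splitting an empty string on a newline yields a list holding one empty string rather than an empty list. A minimal sketch of the pitfall and one way to guard against it follows. The `count_lines` helper is hypothetical and is not taken from the Impala test infrastructure.

```python
# "".split("\n") returns [""], so a naive length check reports one line.
naive_count = len("".split("\n"))


def count_lines(text):
    # Hypothetical helper: handle the empty string explicitly before splitting.
    # A trailing newline is not treated specially here.
    if not text:
        return 0
    return len(text.split("\n"))


print(naive_count)
print(count_lines(""))
print(count_lines("row1\nrow2"))
```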
Issues Fixed in Impala for CDH 5.9.3
For the full list of fixed issues for all CDH components, see Upstream Issues Fixed.
- IMPALA-3794 - Workaround for Breakpad ID conflicts
- IMPALA-4293 - query profile should include error log
- IMPALA-4383 - Ensure plan fragment report thread is always started
- IMPALA-4409 - respect lock order in QueryExecState::CancelInternal()
- IMPALA-4615 - Fix create_table.sql command order
- IMPALA-4787 - Optimize APPX_MEDIAN() memory usage
- IMPALA-5088 - Fix heap buffer overflow
- IMPALA-5193 - Initialize decompressor before finding first tuple
- IMPALA-5197 - Erroneous corrupted Parquet file message
- IMPALA-5252 - Fix crash in HiveUdfCall::GetStringVal() when mem_limit exceeded
- IMPALA-5253 - Use appropriate transport for StatestoreSubscriber
- IMPALA-5355 - Fix the order of Sentry roles and privileges
- IMPALA-5469 - Fix exception when processing catalog update
Issues Fixed in Impala for CDH 5.9.2
For the full list of fixed issues for all CDH components, see Upstream Issues Fixed.
- IMPALA-1702 - Enforce single-table consistency in query analysis.
- IMPALA-2864 - Ensure that client connections are closed after a failed Open()
- IMPALA-3167 - Fix assignment of WHERE conjunct through grouping agg + OJ.
- IMPALA-3314 - Fix Avro schema loading for partitioned tables.
- IMPALA-3552 - Make incremental stats max serialized size configurable
- IMPALA-3875 - Thrift threaded server hang in some cases
- IMPALA-3884 - Support TYPE_TIMESTAMP for HashTableCtx::CodegenAssignNullValue()
- IMPALA-3983, IMPALA-3974 - Delete function jar resources after load
- IMPALA-4020 - Handle external conflicting changes to HMS gracefully
- IMPALA-4037, IMPALA-4038 - fix locking during query cancellation
- IMPALA-4180 - Synchronize accesses to RuntimeState::reader_contexts_
- IMPALA-4260 - Alter table add column drops all the column stats
- IMPALA-4263 - Fix wrong omission of agg/analytic hash exchanges.
- IMPALA-4266 - Java udf returning string can give incorrect results
- IMPALA-4282 - Remove max length check for type strings.
- IMPALA-4291 - Reduce LLVM module's preparation time
- IMPALA-4363, IMPALA-4585 - Add Parquet timestamp validation
- IMPALA-4391 - fix dropped statuses in scanners
- IMPALA-4433 - Always generate testdata using the same time zone setting
- IMPALA-4449 - Revisit table locking pattern in the catalog
- IMPALA-4488 - HS2 GetOperationStatus() should keep session alive
- IMPALA-4494, IMPALA-4540 - Fix crash in SimpleScheduler
- IMPALA-4516 - Don't hold process wide lock connection_to_sessions_map_lock_ while cancelling queries
- IMPALA-4518 - CopyStringVal() doesn't copy null string
- IMPALA-4539 - fix bug when scratch batch references I/O buffers
- IMPALA-4550 - Fix CastExpr analysis for substituted slots
- IMPALA-4579 - SHOW CREATE VIEW fails for view containing a subquery
- IMPALA-4705, IMPALA-4779, IMPALA-4780 - Fix some Expr bugs with codegen
- IMPALA-4765 - Avoid using several loading threads on one table.
- IMPALA-4767 - Workaround for HIVE-15653 to preserve table stats.
- IMPALA-4916 - Fix maintenance of set of item sets in DisjointSet.
- IMPALA-4929 - Safe concurrent access to IR function call graph
- IMPALA-4995 - Fix integer overflow in TopNNode::PrepareForOutput
- IMPALA-4997 - Fix overflows in Sorter::TupleIterator
- IMPALA-5005 - Don't allow server to send SASL COMPLETE msg out of order
- IMPALA-4391 - fix dropped status in scanners
Issues Fixed in Impala for CDH 5.9.1
For the full list of fixed issues for all CDH components, see Upstream Issues Fixed.
- IMPALA-3949 - Log error message in FileSystemUtil.copyToLocal()
- IMPALA-4076 - Fix runtime filter sort compare method
- IMPALA-4099 - Fix the error message while loading UDFs with no JARs
- IMPALA-4120 - Incorrect results with LEAD() analytic function
- IMPALA-4135 - Thrift threaded server times out connections during high load
- IMPALA-4153 - Fix count(*) on all blank('') columns - test
- IMPALA-4170 - Fix identifier quoting in COMPUTE INCREMENTAL STATS
- IMPALA-4196 - Cross compile bit-byte-functions
- IMPALA-4223 - Handle truncated file read from HDFS cache
- IMPALA-4237 - Fix materialization of 4-byte decimals in data source scan node
- IMPALA-4246 - SleepForMs() utility function has undefined behavior for > 1s
- IMPALA-4301 - Fix IGNORE NULLS with subquery rewriting
- IMPALA-4336 - Cast exprs after unnesting union operands
- IMPALA-4387 - Validate decimal type in Avro file schema
- IMPALA-4751 - For unknown query IDs, /query_profile_encoded?query_id=123 starts with an empty line.
- IMPALA-4423 - Correct but conservative implementation of Subquery.equals().
Issues Fixed in Impala for CDH 5.9.0
For the full list of Impala fixed issues in CDH 5.9.0 / Impala 2.7.0, see this report in the Impala JIRA tracker.
For the full list of fixed issues for all CDH components in CDH 5.9.0, see Issues Fixed in CDH 5.9.x.
- IMPALA-1112 - Remove some unnecessary code from cross-compilation
- IMPALA-1240 - add back spilling sort now that sorter is not flaky
- IMPALA-1440 - test for insert mem limit
- IMPALA-3018 - Address various small memory allocation related bugs
- IMPALA-1619 - Support 64-bit allocations
- IMPALA-1633 - GetOperationStatus should set errorMessage and sqlState
- IMPALA-1671 - Print time and link to coordinator web UI once query is submitted in shell
- IMPALA-1683 - Allow REFRESH on a single partition
- IMPALA-2347 - Reuse metastore client connections in Catalog
- IMPALA-2459 - Implement next_day date/time UDF
- IMPALA-2700 - ASCII NUL characters are doubled on insert into text tables
- IMPALA-2767 - Web UI call to force expire sessions
- IMPALA-2878 - Fix Base64Decode error and remove duplicate codes
- IMPALA-2885 - ScannerContext::Stream objects should be owned by ScannerContext
- IMPALA-2979 - Fix scheduling on remote hosts
- IMPALA-3018 - Don't return NULL on zero length allocations
- IMPALA-3063 - Separate join inversion from join ordering
- IMPALA-3084 - Cache the sequence of table ref and materialized tuple ids during analysis
- IMPALA-3181 - Add noexcept to some functions
- IMPALA-3201 - buffer pool header only
- IMPALA-3206 - Enable codegen for AVRO_DECIMAL
- IMPALA-3210 - last/first_value() support for IGNORE NULLS
- IMPALA-3223 - Remove boost multiprecision in thirdparty
- IMPALA-3225 - Add script to push from gerrit to ASF
- IMPALA-3227 - generate test TPC data sets during data load
- IMPALA-3253 - Modify gen_build_version.sh to always output the right version
- IMPALA-3336 - qgen: do not randomly generate query options
- IMPALA-3376 - Extra definition level when writing Parquet files
- IMPALA-3418 - The Impala FE project relies on Z-tools snapshot builds
- IMPALA-3442 - Replace '> >' with '>>' in template decls
- IMPALA-3449 - Kudu deploy.py should find clusters by displayName
- IMPALA-3454 - Kudu deletes may fail if subqueries are used
- IMPALA-3470 - DecompressorTest is flaky
- IMPALA-3491 - Merge test_hbase_metadata.py into compute_stats.py. Use unique db fixture
- IMPALA-3501 - ee tests: detect build type and support different timeouts based on the same
- IMPALA-3507 - update binutils version to fix slow linking
- IMPALA-3521 - Impalad should communicate with the statestore after binding to the hs2 and beeswax ports
- IMPALA-3530 - Clean up test_ddl.py. Part 1
- IMPALA-3567 - Part 1: groundwork to make Join build sides DataSinks
- IMPALA-3575 - Add retry to backend connection request and rpc timeout
- IMPALA-3587 - Get rid of not_default_fs skip marker
- IMPALA-3600 - Add missing admission control tests
- IMPALA-3606 - Fix Java NPE when trying to add an existing partition
- IMPALA-3611 - track unused Disk IO buffer memory
- IMPALA-3627 - Clean up RPC structures in ImpalaInternalService
- IMPALA-3632 - Add script for running cppclean over the BE code
- IMPALA-3647 - track runtime filter memory in separate tracker
- IMPALA-3656 - Hitting DCHECK/CHECK does not write minidumps
- IMPALA-3664 - S3A test_keys_do_not_work fails
- IMPALA-3674 - Lazy materialization of LLVM module bitcode
- IMPALA-3677 - Write minidump on SIGUSR1
- IMPALA-3682 - Don't retry unrecoverable socket creation errors
- IMPALA-3687 - Prefer Avro field name during schema reconciliation
- IMPALA-3715 - Include total usage of JVM memory
- IMPALA-3715 - Include more info by default in Impala debug memz webpage
- IMPALA-3716 - Add Memory Tab in query's Details page
- IMPALA-3727 - Change microbenchmarks to use percentile-based reporting
- IMPALA-3729 - batch_size=1 coverage for avro scanner
- IMPALA-3734 - C++11 - Replace boost::shared_ptr with std:: equivalent
- IMPALA-3736 - Move Impala HTTP handlers to a separate class
- IMPALA-3737 - Local filesystem build failed loading custom schemas
- IMPALA-3751 - fix clang build errors and warnings
- IMPALA-3753 - Disable create table test for old aggs and joins
- IMPALA-3756 - Fix wrong argument type in HiveStringsTest
- IMPALA-3757 - Add missing lock in RuntimeProfile::ComputeTimeInProfile
- IMPALA-3762 - Download Python requirements before they are needed
- IMPALA-3763 - download_requirements fixes
- IMPALA-3764 - fuzz test HDFS scanners and fix parquet bugs found
- IMPALA-3767 - bootstrap_virtualenv fails to find cython distribution
- IMPALA-3774 - fix download_requirements for older Python versions
- IMPALA-3778 - Fix ASF packaging build
- IMPALA-3779 - Disable cache pool reader thread when HDFS isn't running
- IMPALA-3780 - avoid many small reads past end of block
- IMPALA-3786 - Remove "Cloudera" from impalad webpage title
- IMPALA-3790 - AC tests timeout in codecoverage builds
- IMPALA-3799 - Make MAX_SCAN_RANGE_LENGTH accept formatted quantities
- IMPALA-3806 - remove a few modern shell idioms to improve RHEL5 support
- IMPALA-3817 - Ensure filter hash function is the same on all hardware
- IMPALA-3839 - Fix race condition in impala_cluster.py
- IMPALA-3843 - Update warning for non-SSSE3 CPUs
- IMPALA-3845 - Split up hdfs-parquet-scanner.cc into more files/components
- IMPALA-3852 - Remove Derby and Shiro FE dependencies
- IMPALA-3854 - Fix use-after-free in HdfsTextScanner::Close()
- IMPALA-3856 - Fix BinaryPredicate normalization for Kudu
- IMPALA-3857 - KuduScanNode race on returning "optional" threads
- IMPALA-3864 - qgen: reduce likelihood of create_query() exceptions
- IMPALA-3866 - consistent user-facing terminology for scratch dirs
- IMPALA-3881 - Add DataTables 1.10.12 to www/
- IMPALA-3886 - Improve log of pip_download.py
- IMPALA-3892 - qgen: always run Impala with -convert_legacy_hive_parquet_utc_timestamps=true
- IMPALA-3905 - Add HdfsScanner::GetNext() interface and implementation for Parquet
- IMPALA-3906 - Materialize implicitly referenced IR functions
- IMPALA-3914 - SKIP_TOOLCHAIN_BOOTSTRAP skips Python package downloads
- IMPALA-3918 - Remove Cloudera copyrights and add ASF license header
- IMPALA-3923 - fix overflow in BufferedTupleStream::GetRows()
- IMPALA-3924 - Ubuntu16 support
- IMPALA-3936 - BufferedBlockMgr fixes for Pin() while write in flight
- IMPALA-3939 - Data loading may fail on tpch kudu
- IMPALA-3943 - Do not throw scan errors for empty Parquet files
- IMPALA-3946 - fix MemPool integrity issues with empty chunks
- IMPALA-3952 - Clear scratch batch mem pool if Open() failed
- IMPALA-3953 - Fixes for KuduScanNode BE test failure
- IMPALA-3954 - Add unique_database to scanner test
- IMPALA-3957 - Test failure in S3 build: TestLoadData.test_load
- IMPALA-3964 - Fix crash when a count(*) is performed on a nested collection
- IMPALA-3969 - stress test: add option to set common query options
- IMPALA-3972 - Improve display of /varz page
- IMPALA-3992 - bad shell error message when running nonexistent file
Issues Fixed in Impala for CDH 5.8.5
For the full list of fixed issues for all CDH components, see Upstream Issues Fixed.
- IMPALA-1346, IMPALA-1590, IMPALA-2344 - Fix sorter buffer mgmt when spilling
- IMPALA-1619, IMPALA-3018 - Address various small memory allocation related bugs
- IMPALA-1619 - Support 64-bit allocations.
- IMPALA-1657 - Rework detection and reporting of corrupt table stats.
- IMPALA-2864 - Ensure that client connections are closed after a failed Open()
- IMPALA-3018 - Don't return NULL on zero length allocations.
- IMPALA-3159 - impala-shell does not accept wildcard or SAN certificates
- IMPALA-3167 - Fix assignment of WHERE conjunct through grouping agg + OJ.
- IMPALA-3314 - Fix Avro schema loading for partitioned tables.
- IMPALA-3344 - Simplify sorter and document/enforce invariants.
- IMPALA-3441, IMPALA-3659 - Check for malformed Avro data
- IMPALA-3499 - Split catalog update
- IMPALA-3552 - Make incremental stats max serialized size configurable
- IMPALA-3575 - Add retry to backend connection request and rpc timeout
- IMPALA-3628 - Fix cancellation from shell when security is enabled
- IMPALA-3633 - cancel fragment if coordinator is gone
- IMPALA-3646 - Handle corrupt RLE literal or repeat counts of 0.
- IMPALA-3670 - fix sorter buffer mgmt bugs
- IMPALA-3678 - Fix migration of predicates into union operands with an order by + limit.
- IMPALA-3680 - Cleanup the scan range state after failed hdfs cache reads
- IMPALA-3682 - Don't retry unrecoverable socket creation errors
- IMPALA-3687 - Prefer Avro field name during schema reconciliation
- IMPALA-3711 - Remove unnecessary privilege checks in getDbsMetadata()
- IMPALA-3732 - handle string length overflow in avro files
- IMPALA-3745 - parquet invalid data handling
- IMPALA-3751 - fix clang build errors and warnings
- IMPALA-3754 - fix TestParquet.test_corrupt_rle_counts flakiness
- IMPALA-3776 - fix 'describe formatted' for Avro tables
- IMPALA-3820 - Handle linkage errors while loading Java UDFs in Catalog
- IMPALA-3861 - Replace BetweenPredicates with their equivalent CompoundPredicate.
- IMPALA-3875 - Thrift threaded server hang in some cases
- IMPALA-3884 - Support TYPE_TIMESTAMP for HashTableCtx::CodegenAssignNullValue()
- IMPALA-3915 - Register privilege and audit requests when analyzing resolved table refs.
- IMPALA-3930, IMPALA-2570 - Fix shuffle insert hint with constant partition exprs.
- IMPALA-3940 - Fix getting column stats through views.
- IMPALA-3949 - Log the error message in FileSystemUtil.copyToLocal()
- IMPALA-3964 - Fix crash when a count(*) is performed on a nested collection.
- IMPALA-3965 - TSSLSocketWithWildcardSAN.py not exported as part of impala-shell build lib
- IMPALA-3983, IMPALA-3974 - Delete function jar resources after load
- IMPALA-4019 - initialize member variables in HdfsTableSink
- IMPALA-4020 - Handle external conflicting changes to HMS gracefully
- IMPALA-4037, IMPALA-4038 - Fix locking during query cancellation
- IMPALA-4049 - fix empty batch handling NLJ build side
- IMPALA-4076 - Fix runtime filter sort compare method
- IMPALA-4099 - Fix the error message while loading UDFs with no JARs
- IMPALA-4120 - Incorrect results with LEAD() analytic function
- IMPALA-4135 - Thrift threaded server times-out connections during high load
- IMPALA-4153 - Fix count(*) on all blank('') columns - test
- IMPALA-4170 - Fix identifier quoting in COMPUTE INCREMENTAL STATS.
- IMPALA-4180 - Synchronize accesses to RuntimeState::reader_contexts_
- IMPALA-4196 - Cross compile bit-byte-functions
- IMPALA-4223 - Handle truncated file read from HDFS cache
- IMPALA-4237 - Fix materialization of 4 byte decimals in data source scan node.
- IMPALA-4246 - SleepForMs() utility function has undefined behavior for > 1s
- IMPALA-4260 - Alter table add column drops all the column stats
- IMPALA-4263 - Fix wrong omission of agg/analytic hash exchanges.
- IMPALA-4266 - Java udf returning string can give incorrect results
- IMPALA-4282 - Remove max length check for type strings.
- IMPALA-4293 - query profile should include error log
- IMPALA-4295 - XFAIL wildcard SSL test
- IMPALA-4336 - Cast exprs after unnesting union operands.
- IMPALA-4363 - Add Parquet timestamp validation
- IMPALA-4383 - Ensure plan fragment report thread is always started
- IMPALA-4391 - fix dropped statuses in scanners
- IMPALA-4423 - Correct but conservative implementation of Subquery.equals().
- IMPALA-4433 - Always generate testdata using the same time zone setting
- IMPALA-4449 - Revisit table locking pattern in the catalog. Fixes an issue where multiple long-running operations on the same catalog object (for example, a table) can block other catalog operations from making progress.
- IMPALA-4488 - HS2 GetOperationStatus() should keep session alive
- IMPALA-4518 - CopyStringVal() doesn't copy null string
- IMPALA-4539 - fix bug when scratch batch references I/O buffers
- IMPALA-4550 - Fix CastExpr analysis for substituted slots
- IMPALA-4579 - SHOW CREATE VIEW fails for view containing a subquery
- IMPALA-4765 - Avoid using several loading threads on one table.
- IMPALA-4767 - Workaround for HIVE-15653 to preserve table stats.
- IMPALA-4779, IMPALA-4780 - Fix conditional functions built-in and Timestamp bounds
- IMPALA-4787 - Optimize APPX_MEDIAN() memory usage
- IMPALA-4916 - Fix maintenance of set of item sets in DisjointSet.
- IMPALA-4995 - Fix integer overflow in TopNNode::PrepareForOutput
- IMPALA-4997 - Fix overflows in Sorter::TupleIterator
- IMPALA-5005 - Don't allow server to send SASL COMPLETE msg out of order
- IMPALA-5088 - Fix heap buffer overflow
- IMPALA-5253 - Use appropriate transport for StatestoreSubscriber
Issues Fixed in Impala for CDH 5.8.4
For the full list of fixed issues for all CDH components, see Upstream Issues Fixed.
- IMPALA-1702 - "invalidate metadata" can cause duplicate TableIds
- IMPALA-3167 - Fix assignment of WHERE clause predicate through grouping aggregate and outer join
- IMPALA-3314 - Fix Avro schema loading for partitioned tables
- IMPALA-3552 - Make incremental stats max serialized size configurable
- IMPALA-3575 - Add retry to backend connection request and rpc timeout
- IMPALA-3682 - Don't retry unrecoverable socket creation errors
- IMPALA-3875 - Thrift threaded server hang in some cases
- IMPALA-3884 - Support TYPE_TIMESTAMP for HashTableCtx::CodegenAssignNullValue()
- IMPALA-3949 - Log the error message in FileSystemUtil.copyToLocal()
- IMPALA-3964 - Fix crash when a count(*) is performed on a nested collection.
- IMPALA-3983 - Delete function jar resources after load
- IMPALA-4037 - ChildQuery::Cancel() appears to violate lock ordering
- IMPALA-4038 - Fix locking during query cancellation
- IMPALA-4076 - Fix runtime filter sort compare method
- IMPALA-4099 - Fix the error message while loading UDFs with no JARs
- IMPALA-4120 - Incorrect results with LEAD() analytic function
- IMPALA-4153 - Fix count(*) on all blank('') columns - test
- IMPALA-4223 - Handle truncated file read from HDFS cache
- IMPALA-4246 - SleepForMs() utility function has undefined behavior for > 1s
- IMPALA-4336 - Cast expressions after unnesting union operands
- IMPALA-4363 - Add Parquet timestamp validation
- IMPALA-4391 - Fix dropped statuses in scanners
- IMPALA-4423 - Correct but conservative implementation of Subquery.equals()
- IMPALA-4433 - Always generate test data using the same time zone setting
- IMPALA-4449 - Revisit table locking pattern in the catalog. Fixes an issue where multiple long-running operations on the same catalog object (for example, a table) can block other catalog operations from making progress
- IMPALA-4550 - Fix CastExpr analysis for substituted slots
Issues Fixed in Impala for CDH 5.8.3
For the full list of fixed issues for all CDH components, see Upstream Issues Fixed.
- IMPALA-1619 - Support 64-bit allocations
- IMPALA-3687 - Prefer Avro field name during schema reconciliation
- IMPALA-3751 - Fix clang build errors and warnings
- IMPALA-4135 - Thrift threaded server times-out connections during high load
- IMPALA-4170 - Fix identifier quoting in COMPUTE INCREMENTAL STATS
- IMPALA-4180 - Synchronize accesses to RuntimeState::reader_contexts_
- IMPALA-4196 - Cross compile bit-byte-functions
- IMPALA-4237 - Fix materialization of 4-byte decimals in data source scan node
Issues Fixed in Impala for CDH 5.8.2
For the full list of fixed issues for all CDH components, see Upstream Issues Fixed.
- IMPALA-1346, IMPALA-1590, IMPALA-2344 - Fix sorter buffer mgmt when spilling
- IMPALA-3159 - impala-shell does not accept wildcard or SAN certificates
- IMPALA-3344 - Simplify sorter and document/enforce invariants.
- IMPALA-3441, IMPALA-3659 - Check for malformed Avro data
- IMPALA-3499 - Split catalog update.
- IMPALA-3628 - Fix cancellation from shell when security is enabled
- IMPALA-3633 - cancel fragment if coordinator is gone
- IMPALA-3646 - Handle corrupt RLE literal or repeat counts of 0.
- IMPALA-3670 - fix sorter buffer mgmt bugs
- IMPALA-3678 - Fix migration of predicates into union operands with an order by + limit.
- IMPALA-3680 - Cleanup the scan range state after failed hdfs cache reads
- IMPALA-3711 - Remove unnecessary privilege checks in getDbsMetadata().
- IMPALA-3732 - handle string length overflow in avro files
- IMPALA-3745 - parquet invalid data handling
- IMPALA-3754 - fix TestParquet.test_corrupt_rle_counts flakiness
- IMPALA-3772 - Fix admission control stress test.
- IMPALA-3776 - fix 'describe formatted' for Avro tables
- IMPALA-3820 - Handle linkage errors while loading Java UDFs in Catalog
- IMPALA-3861 - Replace BetweenPredicates with their equivalent CompoundPredicate.
- IMPALA-3915 - Register privilege and audit requests when analyzing resolved table refs.
- IMPALA-3930 - Fix shuffle insert hint with constant partition exprs.
- IMPALA-3940 - Fix getting column stats through views.
- IMPALA-3965 - TSSLSocketWithWildcardSAN.py not exported as part of impala-shell build lib
- IMPALA-4020 - Handle external conflicting changes to HMS gracefully
- IMPALA-4049 - fix empty batch handling NLJ build side
Issues Fixed in Impala for CDH 5.8.0
The following list contains the most critical fixed issues (priority='Blocker') from the JIRA system. For the full list of fixed issues in CDH 5.8.0 / Impala 2.6.0, see this report in the Impala JIRA tracker.
RuntimeState::error_log_ crashes
A crash could occur, with stack trace pointing to impala::RuntimeState::ErrorLog.
Bug: IMPALA-3385
Severity: High
HiveUdfCall::Open() produces unsynchronized access to JniUtil::global_refs_ vector
A crash could occur because of contention between multiple calls to Java UDFs.
Bug: IMPALA-3378
Severity: High
HBaseTableWriter::CreatePutList() produces unsynchronized access to JniUtil::global_refs_ vector
A crash could occur because of contention between multiple concurrent statements writing to HBase.
Bug: IMPALA-3379
Severity: High
Stress test failure: sorter.cc:745] Check failed: i == 0 (1 vs. 0)
A crash or wrong results could occur if the spill-to-disk mechanism encountered a zero-length string at the very end of a data block.
Bug: IMPALA-3317
Severity: High
String data coming out of agg can be corrupted by blocking operators
If a query plan contains an aggregation node producing string values anywhere within a subplan (that is, if the aggregate function in the SQL statement appears within an inline view over a collection column), the results of the aggregation may be incorrect.
Bug: IMPALA-3311
Severity: High
CTAS with subquery throws AuthzException
A CREATE TABLE AS SELECT operation could fail with an authorization error, due to a slight difference in the privilege checking for the CTAS operation.
Bug: IMPALA-3269
Severity: High
Crash on inserting into table with binary and parquet
Impala incorrectly allowed BINARY to be specified as a column type, resulting in a crash during a write to a Parquet table with a column of that type.
Bug: IMPALA-3237
Severity: High
RowBatch::MaxTupleBufferSize() calculation incorrect, may lead to memory corruption
A crash could occur while querying tables with very large rows, for example wide tables with many columns or very large string values. This problem was identified in Impala 2.3, but had low reproducibility in subsequent releases. The fix ensures the memory allocation size is correct.
Bug: IMPALA-3105
Severity: High
Thrift buffer overflows when serialize more than 3355443200 bytes in impala
A very large memory allocation within the catalogd daemon could exceed an internal Thrift limit, causing a crash.
Bug: IMPALA-3494
Severity: High
Altering table partition's storage format is not working and crashing the daemon
If a partitioned table used a file format other than Avro, and the file format of an individual partition was changed to Avro, subsequent queries could encounter a crash.
Bug: IMPALA-3314
Severity: High
Race condition may cause scanners to spin with runtime filters on Avro or Sequence files
A timing problem during runtime filter processing could cause queries against Avro or SequenceFile tables to hang.
Bug: IMPALA-3798
Severity: High
Issues Fixed in Impala for CDH 5.7.6
For the full list of fixed issues for all CDH components, see Upstream Issues Fixed.
- IMPALA-2864 - Ensure that client connections are closed after a failed Open()
- IMPALA-3167 - Fix assignment of WHERE conjunct through grouping agg + OJ.
- IMPALA-3552 - Make incremental stats max serialized size configurable
- IMPALA-3698 - Fix Isilon permissions test
- IMPALA-3861 - Replace BetweenPredicates with their equivalent CompoundPredicate.
- IMPALA-3875 - Thrift threaded server hang in some cases
- IMPALA-3983, IMPALA-3974 - Delete function jar resources after load
- IMPALA-4153 - Return valid non-NULL pointer for 0-byte allocations
- IMPALA-4223 - Handle truncated file read from HDFS cache
- IMPALA-4336 - Cast exprs after unnesting union operands.
- IMPALA-4363 - Add Parquet timestamp validation
- IMPALA-4423 - Correct but conservative implementation of Subquery.equals().
- IMPALA-4433 - Always generate testdata using the same time zone setting
- IMPALA-4449 - Revisit table locking pattern in the catalog
- IMPALA-4488 - HS2 GetOperationStatus() should keep session alive
- IMPALA-4550 - Fix CastExpr analysis for substituted slots
- IMPALA-4579 - SHOW CREATE VIEW fails for view containing a subquery
- IMPALA-4765 - Avoid using several loading threads on one table.
Issues Fixed in Impala for CDH 5.7.5
For the full list of fixed issues for all CDH components, see Upstream Issues Fixed.
- IMPALA-1619 - Support 64-bit allocations
- IMPALA-1740 - Add support for skip.header.line.count
- IMPALA-3458 - Fix table creation to test insert with header lines
- IMPALA-3949 - Log the error message in FileSystemUtil.copyToLocal()
- IMPALA-4037 - Fix locking during query cancellation
- IMPALA-4076 - Fix runtime filter sort compare method
- IMPALA-4099 - Fix the error message while loading UDFs with no JARs
- IMPALA-4120 - Incorrect results with LEAD() analytic function
- IMPALA-4135 - Thrift threaded server times-out connections during high load
- IMPALA-4170 - Fix identifier quoting in COMPUTE INCREMENTAL STATS
- IMPALA-4196 - Cross compile bit-byte functions
- IMPALA-4237 - Fix materialization of 4 byte decimals in data source scan node
- IMPALA-4246 - SleepForMs() utility function has undefined behavior for > 1s
Issues Fixed in Impala for CDH 5.7.4
For the full list of fixed issues for all CDH components, see Upstream Issues Fixed.
- IMPALA-3081 - Increase memory limit for TestWideRow
- IMPALA-3311 - Fix string data coming out of aggs in subplans
- IMPALA-3575 - Add retry to back end connection request and rpc timeout
- IMPALA-3678 - Fix migration of predicates into union operands with an order by + limit.
- IMPALA-3682 - Do not retry unrecoverable socket creation errors
- IMPALA-3687 - Fix test failure introduced by backporting
- IMPALA-3687 - Prefer Avro field name during schema reconciliation
- IMPALA-3820 - Handle linkage errors while loading Java UDFs in Catalog
- IMPALA-3930 - Fix shuffle insert hint with constant partition exprs
- IMPALA-3940 - Fix getting column stats through views
- IMPALA-4020 - Handle external conflicting changes to HMS gracefully
- IMPALA-4049 - Fix empty batch handling NLJ build side
Issues Fixed in Impala for CDH 5.7.2
For the full list of fixed issues for all CDH components, see Upstream Issues Fixed.
- IMPALA-1928 - Fix Thrift client transport wrapping order
- IMPALA-2660 - Respect auth_to_local configs from hdfs configs
- IMPALA-3276 - Consistently handle pin failure in BTS::PrepareForRead()
- IMPALA-3369 - Add ALTER TABLE SET COLUMN STATS statement.
- IMPALA-3441 - Impala should not crash for invalid avro serialized data
- IMPALA-3499 - Split catalog update
- IMPALA-3502 - Fix race in the coordinator while updating filter routing table
- IMPALA-3633 - Cancel fragment if coordinator is gone
- IMPALA-3732 - Handle string length overflow in Avro files
- IMPALA-3745 - Corrupt encoded values in parquet files can cause crashes
- IMPALA-3751 - Fix clang build errors and warnings
- IMPALA-3754 - Fix TestParquet.test_corrupt_rle_counts flakiness
Issues Fixed in Impala for CDH 5.7.1
For the full list of fixed issues for all CDH components, see Upstream Issues Fixed.
- IMPALA-2076 - Correct execution time tracking for DataStreamSender
- IMPALA-2502 - Don't redundantly repartition grouping aggregations
- IMPALA-2892 - Buffered-tuple-stream-ir.cc is not cross-compiled
- IMPALA-3133 - Wrong privileges after a REVOKE ALL ON SERVER statement
- IMPALA-3139 - Fix drop table statement to not drop views and vice versa
- IMPALA-3141 - Send dummy filters when filter production is disabled
- IMPALA-3194 - Allow queries materializing scalar type columns in RC/sequence files
- IMPALA-3220 - Skip logging empty ScannerContext's stream in parse error
- IMPALA-3236 - Increase timeout for runtime filter tests
- IMPALA-3238 - Avoid log spam for very large hash tables
- IMPALA-3245, IMPALA-3305 - Fix crash with global filters when NUM_NODES=1
- IMPALA-3269 - Remove authz checks on default table location in CTAS queries
- IMPALA-3285 - Fix ASAN failure in webserver-test
- IMPALA-3317 - Fix crash in sorter when spilling zero-length strings
- IMPALA-3334 - Fix some bugs in query options' parsing.
- IMPALA-3367 - Ensure runtime filters tests run on 3 nodes
- IMPALA-3378, IMPALA-3379 - Fix various JNI issues
- IMPALA-3385 - Fix crashes on accessing error_log
- IMPALA-3395 - Old HT filter code uses wrong expr type
- IMPALA-3396 - Fix ConcurrentTimerCounter unit test "TimerCounterTest" failure.
- IMPALA-3412 - Fix CHAR codegen crash in tuple comparator
- IMPALA-3420 - Set IMPALA_THRIFT_VERSION patch level to +4
Issues Fixed in Impala for CDH 5.7.0
The following list contains the most critical issues (priority='Blocker') from the JIRA system. For the full list of fixed issues in CDH 5.7.0 / Impala 2.5.0, see this report in the Impala JIRA tracker.
Stress test hit assert in LLVM: external function could not be resolved
Bug: IMPALA-2683
The stress test was running a build with the TPC-H, TPC-DS, and TPC-H nested queries with scale factor 3.
Impalad is crashing if udf jar is not available in hdfs location for first time
Bug: IMPALA-2365
If a UDF JAR was not available in the HDFS location specified in the CREATE FUNCTION statement, the impalad daemon could crash.
PAGG hits mem_limit when switching to I/O buffers
Bug: IMPALA-2535
A join query could fail with an out-of-memory error despite the apparent presence of sufficient memory. The cause was the internal ordering of operations that could cause a later phase of the query to allocate memory required by an earlier phase of the query. The workaround was to either increase or decrease the MEM_LIMIT query option, because the issue would only occur for a specific combination of memory limit and data volume.
Prevent migrating incorrectly inferred identity predicates into inline views
Bug: IMPALA-2643
Referring to the same column twice in a view definition could cause the view to omit rows where that column contained a NULL value. This could cause incorrect results due to an inaccurate COUNT(*) value or rows missing from the result set.
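One way to see why an identity predicate is not harmless: under SQL's three-valued logic, c = c evaluates to NULL rather than TRUE when c is NULL, so a filter on it discards exactly the rows described above. The following is a minimal sketch in Python, with None standing in for NULL. The helper names sql_eq and where are hypothetical, and this models only the NULL semantics, not Impala's planner code.

```python
def sql_eq(a, b):
    """SQL-style equality: comparing NULL (None) with anything yields NULL."""
    if a is None or b is None:
        return None
    return a == b


def where(rows, predicate):
    """Keep only rows for which the predicate is TRUE (not FALSE, not NULL)."""
    return [r for r in rows if predicate(r) is True]


rows = [1, None, 3]

# Applying the identity predicate "c = c" silently drops the NULL row.
filtered = where(rows, lambda c: sql_eq(c, c))

print(len(rows), len(filtered))  # 3 2
```

In this model, a COUNT(*) over the filtered rows is one lower than over the original rows, which corresponds to the symptom of missing rows and inaccurate counts.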
Fix migration/assignment of On-clause predicates inside inline views
Bug: IMPALA-1459
Some combinations of ON clauses in join queries could result in comparisons being applied at the wrong stage of query processing, leading to incorrect results. Wrong predicate assignment could happen under the following conditions:
- The query includes an inline view that contains an outer join.
- That inline view is joined with another table in the enclosing query block.
- That join has an ON clause containing a predicate that only references columns originating from the outer-joined tables inside the inline view.
Wrong plan of NOT IN aggregate subquery when a constant is used in subquery predicate
Bug: IMPALA-2093
NOT IN subqueries might return wrong results if the left-hand side of the NOT IN is a constant. For example:
select * from alltypestiny t1 where 10 not in (select sum(int_col) from alltypestiny);
Parquet DictDecoders accumulate throughout query
Bug: IMPALA-2940
Parquet dictionary decoders can accumulate throughout query execution, leading to excessive memory usage. One decoder is created per-column per-split.
Planner doesn't set the has_local_target field correctly
Bug: IMPALA-3056
MemPool allocation growth behavior
Bug: IMPALA-2742
Previously, the MemPool always doubled the size of the last allocation. This could lead to bad behavior if the MemPool transferred the ownership of all its data except the last chunk: the next chunk allocated would be double the size of that large remaining chunk, which can be undesirable.
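The growth pattern can be illustrated with a small model. This is a hedged sketch, not Impala's MemPool: the class ToyPool, its method names, and the 4096-byte initial chunk size are all hypothetical choices for illustration. It models only the "double the last chunk" rule and shows how one large leftover chunk inflates the next allocation.

```python
class ToyPool:
    """Toy model of a pool that sizes each new chunk at twice the last one.

    Hypothetical illustration only; this is not Impala's MemPool API.
    """

    def __init__(self, initial_chunk_size=4096):
        self.initial_chunk_size = initial_chunk_size
        self.chunk_sizes = []  # sizes of the chunks currently owned by the pool

    def allocate_chunk(self, requested):
        # Doubling rule: a new chunk is twice the size of the last chunk,
        # or the requested size if that is larger.
        if self.chunk_sizes:
            size = max(requested, 2 * self.chunk_sizes[-1])
        else:
            size = max(requested, self.initial_chunk_size)
        self.chunk_sizes.append(size)
        return size

    def transfer_all_but_last(self):
        # Models handing every chunk except the last one to another owner.
        transferred = self.chunk_sizes[:-1]
        self.chunk_sizes = self.chunk_sizes[-1:]
        return transferred


pool = ToyPool()
for _ in range(10):
    pool.allocate_chunk(1)          # ten chunk allocations, each doubling

pool.transfer_all_but_last()        # only the largest chunk stays behind
remaining = sum(pool.chunk_sizes)
next_size = pool.allocate_chunk(1)  # a 1-byte request now gets a huge chunk

print(remaining, next_size)
```

In this model the pool holds a single 2 MiB chunk after the transfer, yet the next 1-byte request allocates 4 MiB, because the size is derived from the last chunk rather than from what the pool actually needs.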
Drop partition operations don't follow the catalog's locking protocol
Bug: IMPALA-3035
The CatalogOpExecutor.alterTableDropPartition() function violates the locking protocol used in the catalog that requires catalogLock_ to be acquired before any table-level lock. That may cause deadlocks when ALTER TABLE DROP PARTITION is executed concurrently with other DDL operations.
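The ordering rule can be sketched in a few lines. This is an illustration under stated assumptions, not the catalog's code: catalog_lock and table_lock are hypothetical stand-ins for catalogLock_ and a table-level lock, and run_ddl is a made-up placeholder for any DDL operation. The point is only that when every operation acquires the locks in the same order, no two operations can each hold one lock while waiting for the other.

```python
import threading

catalog_lock = threading.Lock()  # stand-in for the catalog-wide lock
table_lock = threading.Lock()    # stand-in for one table's lock
log = []


def run_ddl(name):
    # Protocol: always take the catalog-wide lock before any table-level lock.
    with catalog_lock:
        with table_lock:
            log.append(name)


threads = [threading.Thread(target=run_ddl, args=(f"op{i}",)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join(timeout=5)

completed = sorted(log)
all_finished = not any(t.is_alive() for t in threads)
print(all_finished, len(completed))
```

An operation that took table_lock first and then waited for catalog_lock could deadlock against run_ddl, which is the kind of hazard the description above attributes to the drop-partition path.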
HAVING clause without aggregation not applied properly
Bug: IMPALA-2215
A query with a HAVING clause but no GROUP BY clause was not being rejected, despite being invalid syntax. For example:
select case when 1=1 then 'didit' end as c1 from (select 1 as one) a having 1!=1;
Hit DCHECK Check failed: HasDateOrTime()
Bug: IMPALA-2914
TimestampValue::ToTimestampVal() requires a valid TimestampValue as input. This requirement was not enforced in some places, leading to serious errors.
Aggregation spill loop gives up too early leading to mem limit exceeded errors
Bug: IMPALA-2986
An aggregation query could fail with an out-of-memory error, despite sufficient memory being reported as available.
DataStreamSender::Channel::CloseInternal() does not close the channel on an error.
Bug: IMPALA-2592
Some queries do not close an internal communication channel on an error. This will cause the node on the other side of the channel to wait indefinitely, causing the query to hang. For example, this issue could happen on a Kerberos-enabled system if the credential cache was outdated. Although the affected query hangs, the impalad daemons continue processing other queries.
Codegen does not catch exceptions in FROM_UNIXTIME()
Bug: IMPALA-2184
Querying for the min or max value of a timestamp cast from a bigint via from_unixtime() fails silently and crashes instances of impalad when the input includes a value outside of the valid range.
Workaround: Disable native code generation with:
SET disable_codegen=true;
Impala returns wrong result for function 'conv(bigint, from_base, to_base)'
Bug: IMPALA-2788
Impala returns the wrong result for the conv() function. conv(bigint, from_base, to_base) returns an incorrect result, while conv(string, from_base, to_base) returns the correct value. For example:
select 2061013007, conv(2061013007, 16, 10), conv('2061013007', 16, 10);
+------------+--------------------------+----------------------------+
| 2061013007 | conv(2061013007, 16, 10) | conv('2061013007', 16, 10) |
+------------+--------------------------+----------------------------+
| 2061013007 | 1627467783               | 139066421255               |
+------------+--------------------------+----------------------------+
Fetched 1 row(s) in 0.65s
select 2061013007, conv(cast(2061013007 as bigint), 16, 10), conv('2061013007', 16, 10);
+------------+------------------------------------------+----------------------------+
| 2061013007 | conv(cast(2061013007 as bigint), 16, 10) | conv('2061013007', 16, 10) |
+------------+------------------------------------------+----------------------------+
| 2061013007 | 1627467783                               | 139066421255               |
+------------+------------------------------------------+----------------------------+
select 2061013007, conv(cast(2061013007 as string), 16, 10), conv('2061013007', 16, 10);
+------------+------------------------------------------+----------------------------+
| 2061013007 | conv(cast(2061013007 as string), 16, 10) | conv('2061013007', 16, 10) |
+------------+------------------------------------------+----------------------------+
| 2061013007 | 139066421255                             | 139066421255               |
+------------+------------------------------------------+----------------------------+
select 2061013007, conv(cast(cast(2061013007 as decimal(20,0)) as bigint), 16, 10), conv('2061013007', 16, 10);
+------------+-----------------------------------------------------------------+----------------------------+
| 2061013007 | conv(cast(cast(2061013007 as decimal(20,0)) as bigint), 16, 10) | conv('2061013007', 16, 10) |
+------------+-----------------------------------------------------------------+----------------------------+
| 2061013007 | 1627467783                                                      | 139066421255               |
+------------+-----------------------------------------------------------------+----------------------------+
Workaround: Cast the value to string and use conv(string, from_base, to_base) for conversion.
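The expected value in the output above can be checked independently of Impala: reading the digits 2061013007 as a base-16 number and printing the result in base 10 gives 139066421255, which matches the conv(string, ...) column. The following is a small sketch in Python that relies only on the built-in int() base parsing. The helper name conv_digits is hypothetical; it mirrors the conv() argument order for non-negative inputs and makes no claim about how Impala implements the function.

```python
def conv_digits(digits, from_base, to_base=10):
    """Interpret `digits` in from_base and render the value in to_base (2..36).

    Hypothetical helper for non-negative inputs only.
    """
    value = int(str(digits), from_base)
    if to_base == 10:
        return str(value)
    alphabet = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    out = ""
    while value:
        value, rem = divmod(value, to_base)
        out = alphabet[rem] + out
    return out or "0"


print(conv_digits(2061013007, 16, 10))  # 139066421255
```

The same digits passed as an integer or as a string produce the same answer here, which is the behavior the fix restores for conv(bigint, from_base, to_base).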
Issues Fixed in Impala for CDH 5.6.1
For the full list of fixed issues for all CDH components, see Upstream Issues Fixed.
- IMPALA-852, IMPALA-2215 - Analyze HAVING clause before aggregation
- IMPALA-1092 - Fix estimates for trivial coord-only queries
- IMPALA-1170 - Fix URL parsing when path contains '@'
- IMPALA-1934 - Allow shell to retrieve LDAP password from shell cmd
- IMPALA-2093 - Disallow NOT IN aggregate subqueries with a constant lhs expr
- IMPALA-2184 - don't inline timestamp methods with try/catch blocks in IR
- IMPALA-2425 - Broadcast join hint not enforced when low memory limit is set
- IMPALA-2503 - Add missing String.format() arg in error message
- IMPALA-2539 - Unmark collections slots of empty union operands
- IMPALA-2554 - Change default buffer size for RPC servers and clients
- IMPALA-2565 - Planner tests are flaky due to file size mismatches
- IMPALA-2592 - DataStreamSender::Channel::CloseInternal() does not close the channel on an error
- IMPALA-2599 - Pseudo-random sleep before acquiring kerberos ticket possibly not really pseudo-random
- IMPALA-2711 - Fix memory leak in Rand()
- IMPALA-2732 - Timestamp formats with non-padded values
- IMPALA-2734 - Correlated EXISTS subqueries with HAVING clause return wrong results
- IMPALA-2742 - Avoid unbounded MemPool growth with AcquireData()
- IMPALA-2749 - Fix decimal multiplication overflow
- IMPALA-2765 - Preserve return type of subexpressions substituted in isTrueWithNullSlots()
- IMPALA-2788 - conv(bigint num, int from_base, int to_base) returns wrong result
- IMPALA-2798 - Bring in AVRO-1617 fix and add test case for it
- IMPALA-2818 - Fix cancellation crashes/hangs due to BlockOnWait() race
- IMPALA-2820 - Support unquoted keywords as struct-field names
- IMPALA-2832 - Fix cloning of FunctionCallExpr
- IMPALA-2844 - Allow count(*) on RC files with complex types
- IMPALA-2870 - Fix failing metadata.test_ddl.TestDdlStatements.test_create_table test
- IMPALA-2894 - Move regression test into a different .test file
- IMPALA-2906 - Fix an edge case with materializing TupleIsNullPredicates in analytic sorts
- IMPALA-2914 - Fix DCHECK Check failed: HasDateOrTime()
- IMPALA-2926 - Fix off-by-one bug in SelectNode::CopyRows()
- IMPALA-2940 - Fix leak of dictionaries in Parquet scanner
- IMPALA-3000 - Fix BitReader::Reset()
- IMPALA-3034 - Verify all consumed memory of a MemTracker is always released at destruction time
- IMPALA-3047 - Separate create table test with nested types
- IMPALA-3054 - Disable probe side filters when spilling
- IMPALA-3071 - Fix assignment of On-clause predicates belonging to an inner join
- IMPALA-3085 - Unregister data sinks' MemTrackers at their Close() functions
- IMPALA-3093 - ReopenClient() could NULL out 'client_key' causing a crash
- IMPALA-3095 - Add configurable whitelist of authorized internal principals
- IMPALA-3151 - Impala crash for avro table when casting to char data type
- IMPALA-3194 - Allow queries materializing scalar type columns in RC/sequence files
Issues Fixed in Impala for CDH 5.6.0
The set of fixes for Impala in CDH 5.6.0 is the same as in CDH 5.5.2. See Issues Fixed in Impala for CDH 5.5.2 for details.
Issues Fixed in Impala for CDH 5.5.6
For the full list of fixed issues for all CDH components, see Issues Fixed in CDH 5.5.6.
- IMPALA-1928 - Fix Thrift client transport wrapping order
- IMPALA-3369 - Add ALTER TABLE SET COLUMN STATS statement
- IMPALA-3378 - HiveUdfCall::Open() produces unsynchronized access to JniUtil::global_refs_ vector
- IMPALA-3379 - HBaseTableWriter::CreatePutList() produces unsynchronized access to JniUtil::global_refs_ vector
- IMPALA-3441 - Check for malformed Avro data
- IMPALA-3499 - Split catalog update
- IMPALA-3575 - Add retry to backend connection request and rpc timeout
- IMPALA-3633 - Cancel fragment if coordinator is gone
- IMPALA-3682 - Do not retry unrecoverable socket creation errors
- IMPALA-3687 - Prefer Avro field name during schema reconciliation
- IMPALA-3698 - Fix Isilon permissions test
- IMPALA-3711 - Remove unnecessary privilege checks in getDbsMetadata()
- IMPALA-3732 - Handle string length overflow in Avro files
- IMPALA-3751 - Fix clang build errors and warnings
- IMPALA-3915 - Register privilege and audit requests when analyzing resolved table refs.
- IMPALA-4135 - Thrift threaded server times-out connections during high load
- IMPALA-4153 - Return valid non-NULL pointer for 0-byte allocations
Issues Fixed in Impala for CDH 5.5.4
For the full list of fixed issues for all CDH components, see Issues Fixed in CDH 5.5.4.
- IMPALA-852, IMPALA-2215 - Analyze HAVING clause before aggregation
- IMPALA-1092 - Fix estimates for trivial coord-only queries
- IMPALA-1170 - Fix URL parsing when path contains '@'
- IMPALA-1934 - Allow shell to retrieve LDAP password from shell cmd
- IMPALA-2093 - Disallow NOT IN aggregate subqueries with a constant lhs expr
- IMPALA-2184 - Don't inline timestamp methods with try/catch blocks in IR
- IMPALA-2425 - Broadcast join hint not enforced when low memory limit is set
- IMPALA-2503 - Add missing String.format() arg in error message
- IMPALA-2539 - Unmark collections slots of empty union operands
- IMPALA-2554 - Change default buffer size for RPC servers and clients
- IMPALA-2565 - Planner tests are flaky due to file size mismatches
- IMPALA-2592 - DataStreamSender::Channel::CloseInternal() does not close the channel on an error
- IMPALA-2599 - Pseudo-random sleep before acquiring kerberos ticket possibly not really pseudo-random
- IMPALA-2711 - Fix memory leak in Rand()
- IMPALA-2719 - test_parquet_max_page_header fails on Isilon
- IMPALA-2732 - Timestamp formats with non-padded values
- IMPALA-2734 - Correlated EXISTS subqueries with HAVING clause return wrong results
- IMPALA-2742 - Avoid unbounded MemPool growth with AcquireData()
- IMPALA-2749 - Fix decimal multiplication overflow
- IMPALA-2765 - Preserve return type of subexpressions substituted in isTrueWithNullSlots()
- IMPALA-2788 - conv(bigint num, int from_base, int to_base) returns wrong result
- IMPALA-2798 - Bring in AVRO-1617 fix and add test case for it
- IMPALA-2818 - Fix cancellation crashes/hangs due to BlockOnWait() race
- IMPALA-2820 - Support unquoted keywords as struct-field names
- IMPALA-2832 - Fix cloning of FunctionCallExpr
- IMPALA-2844 - Allow count(*) on RC files with complex types
- IMPALA-2870 - Fix failing metadata.test_ddl.TestDdlStatements.test_create_table test
- IMPALA-2894 - Move regression test into a different .test file
- IMPALA-2906 - Fix an edge case with materializing TupleIsNullPredicates in analytic sorts
- IMPALA-2914 - Fix DCHECK Check failed: HasDateOrTime()
- IMPALA-2926 - Fix off-by-one bug in SelectNode::CopyRows()
- IMPALA-2940 - Fix leak of dictionaries in Parquet scanner
- IMPALA-3000 - Fix BitReader::Reset()
- IMPALA-3034 - Verify all consumed memory of a MemTracker is always released at destruction time
- IMPALA-3047 - Separate create table test with nested types
- IMPALA-3054 - Disable probe side filters when spilling
- IMPALA-3071 - Fix assignment of On-clause predicates belonging to an inner join
- IMPALA-3085 - Unregister data sinks' MemTrackers at their Close() functions
- IMPALA-3093 - ReopenClient() could NULL out 'client_key' causing a crash
- IMPALA-3095 - Add configurable whitelist of authorized internal principals
- IMPALA-3151 - Impala crash for avro table when casting to char data type
- IMPALA-3194 - Allow queries materializing scalar type columns in RC/sequence files
Issues Fixed in Impala for CDH 5.5.2
This section lists the most serious or frequently encountered customer issues fixed in CDH 5.5.2 / Impala 2.3.2. For the full list of fixed Impala issues, see Issues Fixed in CDH 5.5.2.
SEGV in AnalyticEvalNode touching NULL input_stream_
A query involving an analytic function could encounter a serious error. This issue was encountered infrequently, depending upon specific combinations of queries and data.
Bug: IMPALA-2829
Free local allocations per row batch in non-partitioned AGG and HJ
An outer join query could fail unexpectedly with an out-of-memory error when the "spill to disk" mechanism was turned off.
Bug: IMPALA-2722
Free local allocations once for every row batch when building hash tables
A join query could encounter a serious error due to an internal failure to allocate memory, which resulted in dereferencing a NULL pointer.
Bug: IMPALA-2612
Prevent migrating incorrectly inferred identity predicates into inline views
Referring to the same column twice in a view definition could cause the view to omit rows where that column contained a NULL value. This could cause incorrect results due to an inaccurate COUNT(*) value or rows missing from the result set.
Bug: IMPALA-2643
Fix GRANTs on URIs with uppercase letters
A GRANT statement for a URI could be ineffective if the URI contained uppercase letters, for example in an uppercase directory name. Subsequent statements, such as CREATE EXTERNAL TABLE with a LOCATION clause, could fail with an authorization exception.
Bug: IMPALA-2695
Avoid sending large partition stats objects over thrift
The catalogd daemon could encounter a serious error when loading the incremental statistics metadata for tables with large numbers of partitions and columns. The problem occurred when the internal representation of metadata for the table exceeded 2 GB, for example in a table with 20K partitions and 77 columns. The fix causes a COMPUTE INCREMENTAL STATS operation to fail if it would produce metadata that exceeded the maximum size.
Bug: IMPALA-2664, IMPALA-2648
Throw AnalysisError if table properties are too large (for the Hive metastore)
CREATE TABLE or ALTER TABLE statements could fail with metastore database errors due to length limits on the SERDEPROPERTIES and TBLPROPERTIES clauses. (The limit on key size is 256, while the limit on value size is 4000.) The fix makes Impala handle these error conditions more cleanly, by detecting too-long values rather than passing them to the metastore database.
Bug: IMPALA-2226
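A client-side check along the same lines can catch oversized properties before a statement is submitted. The sketch below is a hypothetical helper, not part of Impala. It uses the key and value limits quoted above and assumes the limits are counted in characters, which the release note does not specify.

```python
# Limits quoted in the release note for the metastore database.
MAX_KEY_LENGTH = 256
MAX_VALUE_LENGTH = 4000


def find_oversized_properties(properties):
    """Return (key, reason) pairs for entries that exceed the limits.

    `properties` is a dict of property names to string values, such as
    the pairs intended for a TBLPROPERTIES or SERDEPROPERTIES clause.
    Assumption: lengths are measured in characters.
    """
    problems = []
    for key, value in properties.items():
        if len(key) > MAX_KEY_LENGTH:
            problems.append((key, "key longer than %d" % MAX_KEY_LENGTH))
        if len(value) > MAX_VALUE_LENGTH:
            problems.append((key, "value longer than %d" % MAX_VALUE_LENGTH))
    return problems
```

An empty list means every key and value is within the stated limits.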
Make MAX_PAGE_HEADER_SIZE configurable
Impala could fail to access Parquet data files with page headers larger than 8 MB, which could occur, for example, if the minimum or maximum values for a column were long strings. The fix adds a configuration setting --max_page_header_size, which you can use to increase the Impala size limit to a value higher than 8 MB.
Bug: IMPALA-2273
Reduce scanner memory usage
Queries on Parquet tables could consume excessive memory (potentially multiple gigabytes) due to producing large intermediate data values while evaluating groups of rows. The workaround was to reduce the value of the NUM_SCANNER_THREADS query option, the BATCH_SIZE query option, or both.
Bug: IMPALA-2473
Handle error when distinct and aggregates are used with a having clause
A query that included a DISTINCT operator and a HAVING clause, but no aggregate functions or GROUP BY, would fail with an uninformative error message.
Bug: IMPALA-2113
Handle error when star based select item and aggregate are incorrectly used
A query that included * in the SELECT list, in addition to an aggregate function call, would fail with an uninformative message if the query had no GROUP BY clause.
Bug: IMPALA-2225
Refactor MemPool usage in HBase scan node
Queries involving HBase tables used substantially more memory than in earlier Impala versions. The problem occurred starting in Impala 2.2.8, as a result of the changes for IMPALA-2284. The fix for this issue involves removing a separate memory work area for HBase queries and reusing other memory that was already allocated.
Bug: IMPALA-2731
Fix migration/assignment of On-clause predicates inside inline views
Some combinations of ON clauses in join queries could result in comparisons being applied at the wrong stage of query processing, leading to incorrect results. Wrong predicate assignment could happen under the following conditions:
- The query includes an inline view that contains an outer join.
- That inline view is joined with another table in the enclosing query block.
- That join has an ON clause containing a predicate that only references columns originating from the outer-joined tables inside the inline view.
Bug: IMPALA-1459
DCHECK in parquet scanner after block read error
A debug build of Impala could encounter a serious error after encountering some kinds of I/O errors for Parquet files. This issue only occurred in debug builds, not release builds.
Bug: IMPALA-2558
PAGG hits mem_limit when switching to I/O buffers
A join query could fail with an out-of-memory error despite the apparent presence of sufficient memory. The cause was the internal ordering of operations that could cause a later phase of the query to allocate memory required by an earlier phase of the query. The workaround was to either increase or decrease the MEM_LIMIT query option, because the issue would only occur for a specific combination of memory limit and data volume.
Bug: IMPALA-2535
Fix check failed: sorter_runs_.back()->is_pinned_
A query could fail with an internal error while calculating the memory limit. This was an infrequent condition uncovered during stress testing.
Bug: IMPALA-2559
Don't ignore Status returned by DataStreamRecvr::CreateMerger()
A query could fail with an internal error while calculating the memory limit. This was an infrequent condition uncovered during stress testing.
Bug: IMPALA-2614, IMPALA-2559
DataStreamSender::Send() does not return an error status if SendBatch() failed
Bug: IMPALA-2591
Re-enable SSL and Kerberos on server-server
These fixes lift the restriction on using SSL encryption and Kerberos authentication together for internal communication between Impala components.
Bug: IMPALA-2598, IMPALA-2747
Issues Fixed in Impala for CDH 5.5.1
The version of Impala that is included with CDH 5.5.1 / Impala 2.3.1 is identical to CDH 5.5.0 / Impala 2.3.0. There are no new bug fixes, new features, or incompatible changes.
Issues Fixed in Impala for CDH 5.5.0
This section lists the most serious or frequently encountered customer issues fixed in CDH 5.5.0 / Impala 2.3.0. Any issues already fixed in CDH 5.4 maintenance releases (up through CDH 5.4.8) are also included. Those issues are listed under the respective CDH 5.4 sections and are not repeated here. For the full list of fixed Impala issues, see Issues Fixed in CDH 5.5.0.
Fixes for Serious Errors
A number of issues were resolved that could result in serious errors when encountered. The most critical or commonly encountered are listed here.
Bugs: IMPALA-2168, IMPALA-2378, IMPALA-2369, IMPALA-2357, IMPALA-2319, IMPALA-2314, IMPALA-2016
Fixes for Correctness Errors
A number of issues were resolved that could result in wrong results when encountered. The most critical or commonly encountered are listed here.
Bugs: IMPALA-2192, IMPALA-2440, IMPALA-2090, IMPALA-2086, IMPALA-1947, IMPALA-1917
Issues Fixed in Impala for CDH 5.4.10
For the full list of fixed issues for all CDH components, see Issues Fixed in CDH 5.4.10.
- IMPALA-1702 - Check for duplicate table IDs at the end of analysis (issue not entirely fixed, but now fails gracefully)
- IMPALA-2264 - Implicit casts to integers from decimals with higher precision sometimes allowed
- IMPALA-2473 - Excessive memory usage by scan nodes
- IMPALA-2621 - Fix flaky UNIX_TIMESTAMP() test
- IMPALA-2643 - Nested inline view produces incorrect result when referencing the same column implicitly
- IMPALA-2765 - AnalysisException: operands of type BOOLEAN and TIMESTAMP are not comparable when OUTER JOIN with CASE statement
- IMPALA-2798 - After adding a column to avro table, Impala returns weird result if codegen is enabled.
- IMPALA-2861 - Fix flaky scanner test added via IMPALA-2473 backport
- IMPALA-2914 - Hit DCHECK Check failed: HasDateOrTime()
- IMPALA-3034 - MemTracker leak on PHJ failure to spill
- IMPALA-3085 - DataSinks' MemTrackers need to unregister themselves from parent
- IMPALA-3093 - ReopenClient() could NULL out 'client_key' causing a crash
- IMPALA-3095 - Allow additional Kerberos users to be authorized to access internal APIs
Issues Fixed in Impala for CDH 5.4.9
This section lists the most frequently encountered customer issues fixed in Impala for CDH 5.4.9.
For the full list of fixed issues, see Issues Fixed in CDH 5.4.9.
Query return empty result if it contains NullLiteral in inlineview
If an inline view in a FROM clause contained a NULL literal, the result set was empty.
Bug: IMPALA-1917
HBase scan node uses 2-4x memory after upgrade to Impala 2.2.8
Queries involving HBase tables used substantially more memory than in earlier Impala versions. The problem occurred starting in Impala 2.2.8, as a result of the changes for IMPALA-2284. The fix for this issue involves removing a separate memory work area for HBase queries and reusing other memory that was already allocated.
Bug: IMPALA-2731
Fix migration/assignment of On-clause predicates inside inline views
Some combinations of ON clauses in join queries could result in comparisons being applied at the wrong stage of query processing, leading to incorrect results. Wrong predicate assignment could happen under the following conditions:
- The query includes an inline view that contains an outer join.
- That inline view is joined with another table in the enclosing query block.
- That join has an ON clause containing a predicate that only references columns originating from the outer-joined tables inside the inline view.
Bug: IMPALA-1459
Fix wrong predicate assignment in outer joins
The join predicate for an OUTER JOIN clause could be applied at the wrong stage of query processing, leading to incorrect results.
Bug: IMPALA-2446
Avoid sending large partition stats objects over thrift
The catalogd daemon could encounter a serious error when loading the incremental statistics metadata for tables with large numbers of partitions and columns. The problem occurred when the internal representation of metadata for the table exceeded 2 GB, for example in a table with 20K partitions and 77 columns. The fix causes a COMPUTE INCREMENTAL STATS operation to fail if it would produce metadata that exceeded the maximum size.
Bug: IMPALA-2648, IMPALA-2664
Avoid overflow when adding large intervals to TIMESTAMPs
Adding a large INTERVAL value to a TIMESTAMP value, or subtracting one from it, could produce an incorrect result, with the value wrapping instead of returning an out-of-range error.
Bug: IMPALA-1675
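The difference between wrapping and reporting an out-of-range result can be sketched in Python, whose datetime type raises an error rather than wrapping. This is an analogy only. The helper name is hypothetical, Python's datetime range (years 1 through 9999) is used rather than Impala's TIMESTAMP range, and returning None stands in for however the engine reports the out-of-range condition.

```python
from datetime import datetime, timedelta


def add_interval_checked(ts, days):
    """Add `days` to `ts`, returning None when the result is out of range.

    Python raises OverflowError instead of silently wrapping, both when
    the interval itself is too large and when the sum leaves the
    supported datetime range.
    """
    try:
        return ts + timedelta(days=days)
    except OverflowError:
        return None


start = datetime(2015, 1, 1)
print(add_interval_checked(start, 1))          # 2015-01-02 00:00:00
print(add_interval_checked(start, 4_000_000))  # None
```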
Analysis exception when a binary operator contains an IN operator with values
An IN operator with literal values could cause a statement to fail if used as the argument to a binary operator, such as an equality test for a BOOLEAN value.
Bug: IMPALA-1949
Make MAX_PAGE_HEADER_SIZE configurable
Impala could fail to access Parquet data files with page headers larger than 8 MB, which could occur, for example, if the minimum or maximum values for a column were long strings. The fix adds a configuration setting --max_page_header_size, which you can use to increase the Impala size limit to a value higher than 8 MB.
Bug: IMPALA-2273
Fix spilling sorts with var-len slots that are NULL or empty.
A query that activated the spill-to-disk mechanism could fail if it contained a sort expression involving certain combinations of fixed-length or variable-length types.
Bug: IMPALA-2357
Work-around IMPALA-2344: Fail query with OOM in case block->Pin() fails
Some queries that activated the spill-to-disk mechanism could produce a serious error if there was insufficient memory to set up internal work areas. Now those queries produce normal out-of-memory errors instead.
Bug: IMPALA-2344
Crash (likely race) tearing down BufferedBlockMgr on query failure
A serious error could occur under rare circumstances, due to a race condition while freeing memory during heavily concurrent workloads.
Bug: IMPALA-2252
QueryExecState doesn't check for query cancellation or errors
A call to SetError() in a user-defined function (UDF) would not cause the query to fail as expected.
Bug: IMPALA-1746
Impala throws IllegalStateException when inserting data into a partition while select subquery group by partition columns
An INSERT ... SELECT operation into a partitioned table could fail if the SELECT query included a GROUP BY clause referring to the partition key columns.
Bug: IMPALA-2533
Issues Fixed in Impala for CDH 5.4.8
This section lists the most frequently encountered customer issues fixed in Impala for CDH 5.4.8.
For the full list of fixed issues, see Issues Fixed in CDH 5.4.8.
Impala is unable to read hive tables created with the "STORED AS AVRO" clause
Impala could not read Avro tables created in Hive with the STORED AS AVRO clause.
Bug: IMPALA-1136, IMPALA-2161
Make Parquet scanner fail query if the file size metadata is stale
If a Parquet file in HDFS was overwritten by a smaller file, Impala could encounter a serious error. Issuing an INVALIDATE METADATA statement before a subsequent query would avoid the error. The fix allows Impala to handle such inconsistencies in Parquet file length cleanly regardless of whether the table metadata is up-to-date.
Bug: IMPALA-2213
Avoid allocating StringBuffer > 1GB in ScannerContext::Stream::GetBytesInternal()
Impala could encounter a serious error when reading compressed text files larger than 1 GB. The fix causes Impala to issue an error message instead in this case.
Bug: IMPALA-2249
Disallow long (1<<30) strings in group_concat()
A query using the group_concat() function could encounter a serious error if the returned string value was larger than 1 GB. Now the query fails with an error message in this case.
Bug: IMPALA-2284
Avoid FnvHash64to32 with empty inputs
An edge case in the algorithm used to distribute data among nodes could result in uneven distribution of work for some queries, with all data sent to the same node.
Bug: IMPALA-2270
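One way a hash can collapse on empty input is sketched below. The release note does not describe the algorithm, so everything here is an assumption for illustration: the sketch uses an FNV-style 64-bit hash, replicates a 32-bit seed into both halves of the 64-bit state, and folds the result to 32 bits by XOR-ing the halves. Under those assumptions, a zero-byte input leaves the state untouched and the fold cancels the seed, so every empty value hashes to 0 regardless of the seed.

```python
FNV64_PRIME = 1099511628211  # standard 64-bit FNV prime
MASK64 = (1 << 64) - 1


def fnv_hash64(data, seed):
    """FNV-style 64-bit hash over `data` (bytes), starting from `seed`."""
    h = seed
    for byte in data:
        h = ((h ^ byte) * FNV64_PRIME) & MASK64
    return h


def fnv_hash64_to_32(data, seed32):
    """Hypothetical 64-to-32 fold: replicate the seed, hash, XOR the halves."""
    seed64 = (seed32 << 32) | seed32
    h = fnv_hash64(data, seed64)
    return (h >> 32) ^ (h & 0xFFFFFFFF)


# With no input bytes the two halves are still identical, so they cancel.
print([fnv_hash64_to_32(b"", seed) for seed in (1, 12345, 0xFFFFFFFF)])  # [0, 0, 0]
```

If rows are assigned to nodes by such a hash, all rows with empty values land on the same node, which matches the symptom described above.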
The catalog does not close the connection to HMS during table invalidation
A communication error could occur between Impala and the Hive metastore database, causing Impala operations that update table metadata to fail.
Bug: IMPALA-2348
Wrong DCHECK in PHJ::ProcessProbeBatch
Certain queries could encounter a serious error if the spill-to-disk mechanism was activated.
Bug: IMPALA-2364
Avoid cardinality 0 in scan nodes of small tables and low selectivity
Impala could generate a suboptimal query plan for some queries involving small tables.
Bug: IMPALA-2165
Issues Fixed in Impala for CDH 5.4.7
This section lists the most frequently encountered customer issues fixed in Impala for CDH 5.4.7.
For the full list of fixed issues, see Issues Fixed in CDH 5.4.7.
Warn if table stats are potentially corrupt.
Impala warns if it detects a discrepancy in table statistics: a table considered to have zero rows even though there are data files present. In this case, Impala also skips query optimizations that are normally applied to very small tables.
Bug: IMPALA-1983
Pass correct child node in 2nd phase merge aggregation.
A query could encounter a serious error if it included a particular combination of aggregate functions and inline views.
Bug: IMPALA-2266
Set the output smap of an EmptySetNode produced from an empty inline view.
A query could encounter a serious error if it included an inline view whose subquery had no FROM clause.
Bug: IMPALA-2216
Set an InsertStmt's result exprs from the source statement's result exprs.
A CREATE TABLE AS SELECT or INSERT ... SELECT statement could produce different results than a SELECT statement, for queries including a FULL JOIN clause and including literal values in the select list.
Bug: IMPALA-2203
Fix planning of empty union operands with analytics.
A query could return incorrect results if it contained a UNION clause, calls to analytic functions, and a constant expression that evaluated to FALSE.
Bug: IMPALA-2088
Retain eq predicates bound by grouping slots with complex grouping exprs.
A query containing an INNER JOIN clause could return undesired rows. Some predicates specified in the ON clause could be omitted from the filtering operation.
Bug: IMPALA-2089
Row count not set for empty partition when spec is used with compute incremental stats
A COMPUTE INCREMENTAL STATS statement could leave the row count for an empty partition as -1, rather than initializing the row count to 0. The missing statistic value could result in reduced query performance.
Bug: IMPALA-2199
Explicit aliases + ordinals analysis bug
A query could encounter a serious error if it included column aliases with the same names as table columns, and used ordinal numbers in an ORDER BY or GROUP BY clause.
Bug: IMPALA-1898
Fix TupleIsNullPredicate to return false if no tuples are nullable.
A query could return incorrect results if it included an outer join clause, inline views, and calls to functions such as coalesce() that can generate NULL values.
Bug: IMPALA-1987
Fix Expr::ComputeResultsLayout() logic
A query could return incorrect results if the table contained multiple CHAR columns with length of 2 or less, and the query included a GROUP BY clause that referred to multiple such columns.
Bug: IMPALA-2178
Substitute an InsertStmt's partition key exprs with the root node's smap.
An INSERT statement could encounter a serious error if the SELECT portion called an analytic function.
Bug: IMPALA-1737
Issues Fixed in Impala for CDH 5.4.5
This section lists the most frequently encountered customer issues fixed in Impala for CDH 5.4.5.
For the full list of fixed issues, see Issues Fixed in CDH 5.4.5.
Impala DML/DDL operations corrupt table metadata leading to Hive query failures
When the Impala COMPUTE STATS statement was run on a partitioned Parquet table that was created in Hive, the table subsequently became inaccessible in Hive. The table was still accessible to Impala. Regaining access in Hive required a workaround of creating a new table. The error displayed in Hive was:
Error: Error while compiling statement: FAILED: SemanticException Class not found: com.cloudera.impala.hive.serde.ParquetInputFormat (state=42000,code=40000)
Bug: IMPALA-2048
Avoiding a DCHECK of NULL hash table in spilled right joins
A query could encounter a serious error if it contained a RIGHT OUTER, RIGHT ANTI, or FULL OUTER join clause and approached the memory limit on a host so that the "spill to disk" mechanism was activated.
Bug: IMPALA-1929
Bug in PrintTColumnValue caused wrong stats for TINYINT partition cols
Declaring a partition key column as a TINYINT caused problems with the COMPUTE STATS statement. The associated partitions would always have zero estimated rows, leading to potentially inefficient query plans.
Bug: IMPALA-2136
Where clause does not propagate to joins inside nested views
A query that referred to a view whose query referred to another view containing a join could return incorrect results. WHERE clauses for the outermost query were not always applied, causing the result set to include additional rows that should have been filtered out.
Bug: IMPALA-2018
Add effective_user() builtin
The user() function returned the name of the logged-in user, which might not be the same as the user name being checked for authorization if, for example, delegation was enabled.
Bug: IMPALA-2064
Resolution: Rather than change the behavior of the user() function, the fix introduces an additional function effective_user() that returns the user name that is checked during authorization.
Make UTC to local TimestampValue conversion faster.
Query performance was improved substantially for Parquet files containing TIMESTAMP data written by Hive, when the -convert_legacy_hive_parquet_utc_timestamps=true setting is in effect.
Bug: IMPALA-2125
Workaround IMPALA-1619 in BufferedBlockMgr::ConsumeMemory()
A join query could encounter a serious error if the query approached the memory limit on a host so that the "spill to disk" mechanism was activated, and data volume in the join was large enough that an internal memory buffer exceeded 1 GB in size on a particular host. (Exceeding this limit would only happen for huge join queries, because Impala could split this intermediate data into 16 parts during the join query, and the buffer only contains compact bookkeeping data rather than the actual join column data.)
Bug: IMPALA-2065
Issues Fixed in Impala for CDH 5.4.3
This section lists the most frequently encountered customer issues fixed in Impala for CDH 5.4.3.
For the full list of fixed issues, see Issues Fixed in CDH 5.4.3.
Enable using Isilon as the underlying filesystem.
The certification of CDH and Impala with the Isilon filesystem involves a number of fixes to performance and flexibility for dealing with I/O using remote reads. See Using Impala with Isilon Storage for details on using Impala and Isilon together.
Bug: IMPALA-1968, IMPALA-1730
Expand set of supported timezones.
The set of timezones recognized by Impala was expanded. You can always find the latest list of supported timezones in the Impala source code, in the file timezone_db.cc.
Bug: IMPALA-1381
Impala Timestamp ISO-8601 Support.
Impala can now process TIMESTAMP literals including a trailing z, signifying "Zulu" time, a synonym for UTC.
Bug: IMPALA-1963
Fix wrong warning when insert overwrite to empty table
An INSERT OVERWRITE operation would encounter an error if the SELECT portion of the statement returned zero rows, such as with a LIMIT 0 clause.
Bug: IMPALA-2008
Expand parsing of decimals to include scientific notation
DECIMAL literals can now include e scientific notation. For example, now CAST(1e3 AS DECIMAL(5,3)) is a valid expression. Formerly it returned NULL. Some scientific expressions might have worked before in DECIMAL context, but only when the scale was 0.
Bug: IMPALA-1952
Issues Fixed in Impala for CDH 5.4.1
This section lists the most frequently encountered customer issues fixed in Impala for CDH 5.4.1.
For the full list of fixed issues, see Issues Fixed in CDH 5.4.1.
Issues Fixed in CDH 5.4 / Impala 2.2
This section lists the most frequently encountered customer issues fixed in Impala 2.2.0.
For the full list of fixed issues in Impala 2.2.0, including over 40 critical issues, see this report in the JIRA system.
Continue reading:
- Altering a column's type causes column stats to stop sticking for that column
- Impala may leak or use too many file descriptors
- Spurious stale block locality messages
- DROP TABLE fails after COMPUTE STATS and ALTER TABLE RENAME to a different database.
- IMPALA-1556 causes memory leak with secure connections
- unix_timestamp() does not return correct time
- Impala incorrectly handles text data missing a newline on the last line
- Impala's ACLs check do not consider all group ACLs, only checked first one.
- Fix infinite loop opening or closing file with invalid metadata
- Cannot write Parquet files when values are larger than 64KB
- Impala Will Not Run on Certain Intel CPUs
Altering a column's type causes column stats to stop sticking for that column
When the type of a column was changed in either Hive or Impala through ALTER TABLE CHANGE COLUMN, the metastore database did not correctly propagate that change to the table that contains the column statistics. The statistics (particularly the NDV) for that column were permanently reset and could not be changed by Impala's COMPUTE STATS command. The underlying cause is a Hive bug (HIVE-9866).
Bug: IMPALA-1607
Resolution: Resolved by incorporating the fix for HIVE-9866.
Workaround: On systems without the corresponding Hive fix, change the column back to its original type. The stats reappear and you can recompute or drop them.
Impala may leak or use too many file descriptors
If a file was truncated in HDFS without a corresponding REFRESH in Impala, Impala could allocate memory for file descriptors and not free that memory.
Bug: IMPALA-1854
Spurious stale block locality messages
Impala could issue messages stating the block locality metadata was stale, when the metadata was actually fine. The internal "remote bytes read" counter was not being reset properly. This issue did not cause an actual slowdown in query execution, but the spurious error could result in unnecessary debugging work and unnecessary use of the INVALIDATE METADATA statement.
Bug: IMPALA-1712
DROP TABLE fails after COMPUTE STATS and ALTER TABLE RENAME to a different database.
When a table was moved from one database to another, the column statistics were not pointed to the new database. This could result in lower performance for queries due to unavailable statistics, and also an inability to drop the table.
Bug: IMPALA-1711
IMPALA-1556 causes memory leak with secure connections
impalad daemons could experience a memory leak on clusters using Kerberos authentication, with memory usage growing as more data is transferred across the secure channel, either to the client program or between Impala nodes. The same issue affected LDAP-secured clusters to a lesser degree, because the LDAP security only covers data transferred back to client programs.
Bug: IMPALA-1674
unix_timestamp() does not return correct time
The unix_timestamp() function could return an incorrect value (a constant value of 1).
Bug: IMPALA-1623
Impala incorrectly handles text data missing a newline on the last line
Some queries did not recognize the final line of a text data file if the line did not end with a newline character. This could lead to inconsistent results, such as a different number of rows for SELECT COUNT(*) as opposed to SELECT *.
Bug: IMPALA-1476
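The row-count discrepancy described above can be pictured with a small Python model. This is only an illustrative sketch, not Impala's scanner code; the function names count_rows_buggy and count_rows_fixed are hypothetical.

```python
def count_rows_buggy(data: bytes) -> int:
    # Counts only newline terminators, so an unterminated final line is missed.
    return data.count(b"\n")


def count_rows_fixed(data: bytes) -> int:
    # Counts terminated lines, plus one more if bytes follow the last newline.
    rows = data.count(b"\n")
    if data and not data.endswith(b"\n"):
        rows += 1
    return rows


sample = b"1,a\n2,b\n3,c"  # the last line has no trailing newline
print(count_rows_buggy(sample), count_rows_fixed(sample))
```

Two code paths that disagree in this way would report different row counts for the same file, which matches the symptom of SELECT COUNT(*) and SELECT * returning different numbers of rows.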
Impala's ACLs check do not consider all group ACLs, only checked first one.
If the HDFS user ID associated with the impalad process had read or write access in HDFS based on group membership, Impala statements could still fail with HDFS permission errors if that group was not the first listed group for that user ID.
Bug: IMPALA-1805
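The difference between checking only the first group and checking every group can be sketched as follows. This is a simplified Python model with hypothetical function and group names; it does not reproduce Impala's or HDFS's actual permission logic.

```python
def has_access_first_group_only(user_groups, allowed_groups):
    # Models the faulty behavior: only the first listed group is consulted.
    return bool(user_groups) and user_groups[0] in allowed_groups


def has_access_any_group(user_groups, allowed_groups):
    # Models the corrected behavior: any matching group grants access.
    return any(group in allowed_groups for group in user_groups)


groups = ["staff", "analysts"]   # hypothetical groups for the impalad user
acl_groups = {"analysts"}        # hypothetical groups granted access by the ACL
print(has_access_first_group_only(groups, acl_groups))
print(has_access_any_group(groups, acl_groups))
```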
Fix infinite loop opening or closing file with invalid metadata
Truncating a file in HDFS, after Impala had cached the file metadata, could produce a hang when Impala queried a table containing that file.
Bug: IMPALA-1794
Cannot write Parquet files when values are larger than 64KB
Impala could sometimes fail to INSERT into a Parquet table if a column value such as a STRING was larger than 64 KB.
Bug: IMPALA-1705
Impala Will Not Run on Certain Intel CPUs
This fix relaxes the CPU requirement for Impala. Now only the SSSE3 instruction set is required. Formerly, SSE4.1 instructions were generated, making Impala refuse to start on some older CPUs.
Bug: IMPALA-1646
Issues Fixed in Impala for CDH 5.3.10
For the full list of fixed issues for all CDH components, see Issues Fixed in CDH 5.3.10.
- IMPALA-1702 - "invalidate metadata" can cause duplicate TableIds (issue not entirely fixed, but now fails gracefully)
- IMPALA-2125 - Improve perf when reading timestamps from parquet files written by hive
- IMPALA-2565 - Planner tests are flaky due to file size mismatches
- IMPALA-3095 - Allow additional Kerberos users to be authorized to access internal APIs
Issues Fixed in the 2.1.7 Release / CDH 5.3.9
This section lists the most significant Impala issues fixed in Impala 2.1.7 for CDH 5.3.9.
For the full list of Impala fixed issues in this release, see Issues Fixed in CDH 5.3.9.
Query return empty result if it contains NullLiteral in inlineview
If an inline view in a FROM clause contained a NULL literal, the result set was empty.
Bug: IMPALA-1917
Fix edge cases for decimal/integer cast
A value of type DECIMAL(3,0) could be incorrectly cast to TINYINT, producing an incorrect result when the value was outside the TINYINT range. After the fix, the smallest type that is allowed for this cast is INT, and attempting to use DECIMAL(3,0) in a TINYINT context produces a "loss of precision" error.
Bug: IMPALA-2264
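To see why a DECIMAL(3,0) value cannot always be held in a TINYINT, compare the two ranges. The Python sketch below is only arithmetic on the type ranges; the helper names are hypothetical and it does not model Impala's cast rules.

```python
def decimal_range(precision: int, scale: int = 0):
    # Largest magnitude of the integer part for DECIMAL(precision, scale).
    max_int = 10 ** (precision - scale) - 1
    return -max_int, max_int


TINYINT_RANGE = (-128, 127)  # 8-bit signed integer


def fits(inner, outer) -> bool:
    # True when every value in the inner range is also in the outer range.
    return outer[0] <= inner[0] and inner[1] <= outer[1]


print(decimal_range(3, 0))                       # (-999, 999)
print(fits(decimal_range(3, 0), TINYINT_RANGE))  # False
```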
Constant filter expressions are not checked for errors and state cleanup on exception / DCHECK on destroying an ExprContext
An invalid constant expression in a WHERE clause (for example, an invalid regular expression pattern) could fail to clean up internal state after raising a query error. Therefore, certain combinations of invalid expressions in a query could cause a crash, or cause a query to continue when it should halt with an error.
Bug: IMPALA-1756, IMPALA-2514
QueryExecState does not check for query cancellation or errors
A call to SetError() in a user-defined function (UDF) would not cause the query to fail as expected.
Bug: IMPALA-1746, IMPALA-2141
Issues Fixed in the 2.1.6 Release / CDH 5.3.8
This section lists the most significant Impala issues fixed in Impala 2.1.6 for CDH 5.3.8.
For the full list of Impala fixed issues in this release, see Issues Fixed in CDH 5.3.8.
Wrong DCHECK in PHJ::ProcessProbeBatch
Certain queries could encounter a serious error if the spill-to-disk mechanism was activated.
Bug: IMPALA-2364
LargestSpilledPartition was not checking if partition is closed
Certain queries could encounter a serious error if the spill-to-disk mechanism was activated.
Bug: IMPALA-2314
Avoid cardinality 0 in scan nodes of small tables and low selectivity
Impala could generate a suboptimal query plan for some queries involving small tables.
Bug: IMPALA-2165
fix Expr::ComputeResultsLayout() logic
Queries using the GROUP BY operator on multiple CHAR columns with length less than or equal to 2 characters could return incorrect results for some columns.
Bug: IMPALA-2178
Properly unescape string value for HBase filters
Queries against HBase tables could return incomplete results if the WHERE clause included string comparisons using literals containing escaped quotation marks.
Bug: IMPALA-2133
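As a rough illustration of what unescaping a string literal involves, the following Python sketch removes the backslash in front of escaped quotation marks and backslashes. It is an assumption-laden model (the function name and the exact escape rules are hypothetical) and is not the code Impala uses when building HBase filters.

```python
def unescape_literal(body: str) -> str:
    # Drop the backslash in front of an escaped quote or backslash;
    # leave every other character sequence untouched.
    out = []
    i = 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body) and body[i + 1] in "'\"\\":
            out.append(body[i + 1])
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)


print(unescape_literal(r"it\'s"))
```

If the escaped form rather than the unescaped value were compared against stored data, rows containing the plain quotation mark would not match, which is consistent with the incomplete results described above.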
Avoiding a DCHECK of NULL hash table in spilled right joins
A query could encounter a serious error if it contained a RIGHT OUTER, RIGHT ANTI, or FULL OUTER join clause and approached the memory limit on a host so that the "spill to disk" mechanism was activated.
Bug: IMPALA-1929
Issues Fixed in the 2.1.5 Release / CDH 5.3.6
This section lists the most significant Impala issues fixed in Impala 2.1.5 for CDH 5.3.6.
For the full list of Impala fixed issues in this release, see Issues Fixed in CDH 5.3.6.
Avoid calling ProcessBatch with out_batch->AtCapacity in right joins
Queries including RIGHT OUTER JOIN, RIGHT ANTI JOIN, or FULL OUTER JOIN clauses and involving a high volume of data could encounter a serious error.
Bug: IMPALA-1919
Issues Fixed in the 2.1.4 Release / CDH 5.3.4
This section lists the most significant Impala issues fixed in Impala 2.1.4 for CDH 5.3.4. Because CDH 5.3.5 does not include any code changes for Impala, Impala 2.1.4 is included with both CDH 5.3.4 and 5.3.5.
For the full list of Impala fixed issues in Impala 2.1.4 for CDH 5.3.4, see Issues Fixed in CDH 5.3.4.
Continue reading:
- Crash: impala::TupleIsNullPredicate::Prepare
- Expand parsing of decimals to include scientific notation
- INSERT/CTAS evaluates and applies constant predicates.
- Assign predicates below analytic functions with a compatible partition by clause
- FIRST_VALUE may produce incorrect results with preceding windows
- FIRST_VALUE rewrite fn type might not match slot type
- AnalyticEvalNode cannot handle partition/order by exprs with NaN
- AnalyticEvalNode not properly handling nullable tuples
Crash: impala::TupleIsNullPredicate::Prepare
When expressions that tested for NULL were used in combination with analytic functions, an error could occur. (The original crash issue was fixed by an earlier patch.)
Bug: IMPALA-1519
Expand parsing of decimals to include scientific notation
DECIMAL literals can now include e scientific notation. For example, CAST(1e3 AS DECIMAL(5,3)) is now a valid expression; formerly it returned NULL. Some expressions in scientific notation might have worked before in a DECIMAL context, but only when the scale was 0.
Bug: IMPALA-1952
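The meaning of e notation in a decimal literal can be sketched in Python: the digits before the e are scaled by a power of ten given after it. The function name parse_decimal_literal is hypothetical, and this is not Impala's parser.

```python
from decimal import Decimal


def parse_decimal_literal(text: str) -> Decimal:
    # Split "<mantissa>e<exponent>" and shift the mantissa by the exponent.
    mantissa, _, exponent = text.lower().partition("e")
    value = Decimal(mantissa)
    if exponent:
        value = value.scaleb(int(exponent))
    return value


print(parse_decimal_literal("1e3") == Decimal("1000"))      # True
print(parse_decimal_literal("2.5e-2") == Decimal("0.025"))  # True
```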
INSERT/CTAS evaluates and applies constant predicates.
An INSERT OVERWRITE statement would write new data, even if a constant clause such as WHERE 1 = 0 should have prevented it from writing any rows.
Bug: IMPALA-1860
Assign predicates below analytic functions with a compatible partition by clause
If the PARTITION BY clause in an analytic function refers to partition key columns in a partitioned table, now Impala can perform partition pruning during the analytic query.
Bug: IMPALA-1900
FIRST_VALUE may produce incorrect results with preceding windows
A query using the FIRST_VALUE analytic function and a window defined with the PRECEDING keyword could produce wrong results.
Bug: IMPALA-1888
FIRST_VALUE rewrite fn type might not match slot type
A query referencing a DECIMAL column with the FIRST_VALUE analytic function could encounter an error.
Bug: IMPALA-1559
AnalyticEvalNode cannot handle partition/order by exprs with NaN
A query using an analytic function could encounter an error if the evaluation of an analytic ORDER BY or PARTITION expression resulted in a NaN value, for example if the ORDER BY or PARTITION contained a division operation where both operands were zero.
Bug: IMPALA-1808
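NaN values are awkward for partitioning and ordering because a NaN does not compare equal to itself. The Python sketch below shows how a naive equality test splits identical NaN keys into separate partitions, while a NaN-aware test groups them. The function names are hypothetical and the code does not describe how AnalyticEvalNode is implemented.

```python
import math


def same_key(a: float, b: float) -> bool:
    # Treat two NaN values as the same key; plain == would say they differ.
    if math.isnan(a) and math.isnan(b):
        return True
    return a == b


def count_partitions(sorted_keys, equal) -> int:
    # Counts runs of equal adjacent keys, as a partition boundary detector would.
    partitions = 0
    previous = None
    for i, key in enumerate(sorted_keys):
        if i == 0 or not equal(previous, key):
            partitions += 1
        previous = key
    return partitions


nan = float("nan")
keys = [1.0, 1.0, nan, nan]
print(count_partitions(keys, lambda a, b: a == b))  # 3
print(count_partitions(keys, same_key))             # 2
```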
AnalyticEvalNode not properly handling nullable tuples
An analytic function containing only an OVER clause could encounter an error if another part of the query (specifically an outer join) produced all-NULL tuples.
Bug: IMPALA-1562
Issues Fixed in the 2.1.3 Release / CDH 5.3.3
This section lists the most significant issues fixed in Impala 2.1.3.
For the full list of fixed issues in Impala 2.1.3, see Issues Fixed in CDH 5.3.3.
Continue reading:
- Add compatibility flag for Hive-Parquet-Timestamps
- Use snprintf() instead of lexical_cast() in float-to-string casts
- Fix partition spilling cleanup when new stream OOMs
- Impala's ACLs check do not consider all group ACLs, only checked first one.
- Fix infinite loop opening or closing file with invalid metadata
- external-data-source-executor leaking global jni refs
- Spurious stale block locality messages
Add compatibility flag for Hive-Parquet-Timestamps
When Hive writes TIMESTAMP values, it represents them in the local time zone of the server. Impala expects TIMESTAMP values to always be in the UTC time zone, possibly leading to inconsistent results depending on which component created the data files. This patch introduces a new startup flag, -convert_legacy_hive_parquet_utc_timestamps, for the impalad daemon. Specify -convert_legacy_hive_parquet_utc_timestamps=true to make Impala recognize Parquet data files written by Hive and automatically adjust TIMESTAMP values read from those files into the UTC time zone for compatibility with other Impala TIMESTAMP processing. Although this setting is currently turned off by default, consider enabling it if practical in your environment, for maximum interoperability with Hive-created Parquet files.
Bug: IMPALA-1658
Use snprintf() instead of lexical_cast() in float-to-string casts
Converting a floating-point value to a STRING could be slower than necessary.
Bug: IMPALA-1738
Fix partition spilling cleanup when new stream OOMs
Certain calls to aggregate functions with STRING arguments could encounter a serious error when the system ran low on memory and attempted to activate the spill-to-disk mechanism. The error message referenced the function impala::AggregateFunctions::StringValGetValue.
Bug: IMPALA-1865
Impala's ACLs check do not consider all group ACLs, only checked first one.
If the HDFS user ID associated with the impalad process had read or write access in HDFS based on group membership, Impala statements could still fail with HDFS permission errors if that group was not the first listed group for that user ID.
Bug: IMPALA-1805
Fix infinite loop opening or closing file with invalid metadata
Truncating a file in HDFS, after Impala had cached the file metadata, could produce a hang when Impala queried a table containing that file.
Bug: IMPALA-1794
external-data-source-executor leaking global jni refs
Successive calls to the data source API could result in excessive memory consumption, with memory allocated but never freed.
Bug: IMPALA-1801
Spurious stale block locality messages
Impala could issue messages stating the block locality metadata was stale, when the metadata was actually fine. The internal "remote bytes read" counter was not being reset properly. This issue did not cause an actual slowdown in query execution, but the spurious error could result in unnecessary debugging work and unnecessary use of the INVALIDATE METADATA statement.
Bug: IMPALA-1712
Issues Fixed in the 2.1.2 Release / CDH 5.3.2
This section lists the most significant issues fixed in Impala 2.1.2.
For the full list of fixed issues in Impala 2.1.2, see this report in the JIRA system.
Impala incorrectly handles double numbers with more than 19 significant decimal digits
When a floating-point value was read from a text file and interpreted as a FLOAT or DOUBLE value, it could be incorrectly interpreted if it included more than 19 significant digits.
Bug: IMPALA-1622
unix_timestamp() does not return correct time
The unix_timestamp() function could return an incorrect value (a constant value of 1).
Bug: IMPALA-1623
Row Count Mismatch: Partition pruning with NULL
A query against a partitioned table could return incorrect results if the WHERE clause compared the partition key to NULL using operators such as = or !=.
Bug: IMPALA-1535
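For reference, standard SQL comparison semantics treat any comparison with NULL as unknown, and a WHERE clause keeps only rows for which the predicate is true. The following Python sketch models that rule with None standing in for NULL; the helper names are hypothetical and the code says nothing about how Impala's partition pruning is implemented.

```python
def sql_equals(a, b):
    # SQL semantics: any comparison involving NULL yields NULL (unknown).
    if a is None or b is None:
        return None
    return a == b


def where(rows, predicate):
    # A WHERE clause keeps a row only when the predicate is exactly TRUE.
    return [row for row in rows if predicate(row) is True]


partition_keys = [2013, 2014, None]
print(where(partition_keys, lambda k: sql_equals(k, None)))  # []
print(where(partition_keys, lambda k: sql_equals(k, 2014)))  # [2014]
```

To select rows whose partition key is NULL, SQL provides the IS NULL predicate rather than the = operator.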
Fetch column stats in bulk using new (Hive .13) HMS APIs
The performance of the COMPUTE STATS statement and queries was improved, particularly for wide tables.
Bug: IMPALA-1120
Issues Fixed in the 2.1.1 Release / CDH 5.3.1
This section lists the most significant issues fixed in Impala 2.1.1.
For the full list of fixed issues in Impala 2.1.1, see this report in the JIRA system.
IMPALA-1556 causes memory leak with secure connections
impalad daemons could experience a memory leak on clusters using Kerberos authentication, with memory usage growing as more data is transferred across the secure channel, either to the client program or between Impala nodes. The same issue affected LDAP-secured clusters to a lesser degree, because the LDAP security only covers data transferred back to client programs.
Bug: IMPALA-1674
TSaslServerTransport::Factory::getTransport() leaks transport map entries
impalad daemons in clusters secured by Kerberos or LDAP could experience a slight memory leak on each connection. The accumulation of unreleased memory could cause problems on long-running clusters.
Bug: IMPALA-1668
Issues Fixed in the 2.1.0 Release / CDH 5.3.0
This section lists the most significant issues fixed in Impala 2.1.0.
For the full list of fixed issues in Impala 2.1.0, see this report in the JIRA system.
Kerberos fetches 3x slower
Transferring large result sets back to the client application was slower on clusters using Kerberos authentication.
Bug: IMPALA-1455
Compressed file needs to be hold on entirely in Memory
Queries on gzipped text files required holding the entire data file and its uncompressed representation in memory at the same time. SELECT and COMPUTE STATS statements could fail or perform inefficiently as a result. The fix enables streaming reads for gzipped text, so that the data is uncompressed as it is read.
Bug: IMPALA-1556
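The idea of a streaming read, in which data is uncompressed in pieces as it is consumed, can be sketched with Python's gzip module. This is an illustration of the general technique only; the function name and chunk size are hypothetical, and for brevity the compressed input here sits in memory, whereas a real reader would pull it from a file or network stream.

```python
import gzip
import io


def count_lines_streaming(compressed: bytes, chunk_size: int = 64 * 1024) -> int:
    # Decompress in fixed-size chunks so only one uncompressed chunk
    # is held in memory at a time.
    lines = 0
    with gzip.GzipFile(fileobj=io.BytesIO(compressed)) as stream:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            lines += chunk.count(b"\n")
    return lines


payload = gzip.compress(b"a\nb\nc\n" * 1000)
print(count_lines_streaming(payload))  # 3000
```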
Cannot read hbase metadata with NullPointerException: null
Impala might not be able to access HBase tables, depending on the associated levels of Impala and HBase on the system.
Bug: IMPALA-1611
Serious errors / crashes
Improved code coverage in Impala testing uncovered a number of potentially serious errors that could occur with specific query syntax. These errors are resolved in Impala 2.1.
Bug: IMPALA-1553, IMPALA-1528, IMPALA-1526, IMPALA-1524, IMPALA-1508, IMPALA-1493, IMPALA-1501, IMPALA-1483
Issues Fixed in the 2.0.5 Release / CDH 5.2.6
For the full list of fixed issues in Impala 2.0.5, see this report in the JIRA system.
Issues Fixed in the 2.0.4 Release / CDH 5.2.5
This section lists the most significant issues fixed in Impala 2.0.4.
For the full list of fixed issues in Impala 2.0.4, see this report in the JIRA system.
Add compatibility flag for Hive-Parquet-Timestamps
When Hive writes TIMESTAMP values, it represents them in the local time zone of the server. Impala expects TIMESTAMP values to always be in the UTC time zone, possibly leading to inconsistent results depending on which component created the data files. This patch introduces a new startup flag, -convert_legacy_hive_parquet_utc_timestamps, for the impalad daemon. Specify -convert_legacy_hive_parquet_utc_timestamps=true to make Impala recognize Parquet data files written by Hive and automatically adjust TIMESTAMP values read from those files into the UTC time zone for compatibility with other Impala TIMESTAMP processing. Although this setting is currently turned off by default, consider enabling it if practical in your environment, for maximum interoperability with Hive-created Parquet files.
Bug: IMPALA-1658
IoMgr infinite loop opening/closing file when shorter than cached metadata size
If a table data file was replaced by a shorter file outside of Impala, such as with INSERT OVERWRITE in Hive producing an empty output file, subsequent Impala queries could hang.
Bug: IMPALA-1794
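A hang of this kind typically arises when a read loop trusts a cached file length and never notices that the file has ended early. The Python sketch below shows a loop that guards against that case. It is a hypothetical illustration of the general pattern, not the Impala IoMgr code, and the names read_cached_length and cached_length are assumptions.

```python
import io


def read_cached_length(stream, cached_length: int, chunk_size: int = 4) -> bytes:
    # Read up to cached_length bytes, but stop if the file ends early.
    data = bytearray()
    while len(data) < cached_length:
        chunk = stream.read(min(chunk_size, cached_length - len(data)))
        if not chunk:
            # Without this check the loop would spin forever on a file
            # that is shorter than the cached metadata claims.
            break
        data.extend(chunk)
    return bytes(data)


print(read_cached_length(io.BytesIO(b"abc"), cached_length=10))  # b'abc'
```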
Issues Fixed in the 2.0.3 Release / CDH 5.2.4
This section lists the most significant issues fixed in Impala 2.0.3.
For the full list of fixed issues in Impala 2.0.3, see this report in the JIRA system.
Anti join could produce incorrect results when spilling
An anti-join query (or a NOT EXISTS operation that was rewritten internally into an anti-join) could produce incorrect results if Impala reached its memory limit, causing the query to write temporary results to disk.
Bug: IMPALA-1471
Row Count Mismatch: Partition pruning with NULL
A query against a partitioned table could return incorrect results if the WHERE clause compared the partition key to NULL using operators such as = or !=.
Bug: IMPALA-1535
Fetch column stats in bulk using new (Hive .13) HMS APIs
The performance of the COMPUTE STATS statement and queries was improved, particularly for wide tables.
Bug: IMPALA-1120
Issues Fixed in the 2.0.2 Release / CDH 5.2.3
This section lists the most significant issues fixed in Impala 2.0.2.
For the full list of fixed issues in Impala 2.0.2, see this report in the JIRA system.
GROUP BY on STRING column produces inconsistent results
Some operations in queries submitted through Hue or other HiveServer2 clients could produce inconsistent results.
Bug: IMPALA-1453
Fix leaked file descriptor and excessive file descriptor use
Impala could encounter an error from running out of file descriptors. The fix reduces the amount of time file descriptors are kept open, and avoids leaking file descriptors when read operations encounter errors.
unix_timestamp() does not return correct time
The unix_timestamp() function could return a constant value 1 instead of a representation of the time.
Bug: IMPALA-1623
Impala should randomly select cached replica
To avoid putting too heavy a load on any one node, Impala now randomizes which scan node processes each HDFS data block rather than choosing the first cached block replica.
Bug: IMPALA-1586
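The load-balancing effect of choosing a replica at random rather than always taking the first one can be sketched in Python. The host names, the function names, and the use of a uniform random choice are assumptions for illustration; this is not Impala's scheduler code.

```python
import random
from collections import Counter


def pick_first_replica(replicas):
    # Earlier behavior as described: always take the first cached replica.
    return replicas[0]


def pick_random_replica(replicas, rng):
    # Behavior after the fix as described: choose among cached replicas at random.
    return rng.choice(replicas)


replicas = ["host-a", "host-b", "host-c"]  # hypothetical host names
rng = random.Random(0)
first_load = Counter(pick_first_replica(replicas) for _ in range(3000))
random_load = Counter(pick_random_replica(replicas, rng) for _ in range(3000))
print(first_load)
print(random_load)
```

With the first-replica policy, every block lands on the same host; with the random policy, the work is spread across all hosts that hold a cached replica.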
Impala does not always give short name to Llama.
In clusters secured by Kerberos or LDAP, a discrepancy in internal transmission of user names could cause a communication error with Llama.
Bug: IMPALA-1606
accept unmangled native UDF symbols
The CREATE FUNCTION statement could report that it could not find a function entry point within the .so file for a UDF written in C++, even if the corresponding function was present.
Bug: IMPALA-1475
Issues Fixed in the 2.0.1 Release / CDH 5.2.1
This section lists the most significant issues fixed in Impala 2.0.1.
For the full list of fixed issues in Impala 2.0.1, see this report in the JIRA system.
Queries fail with metastore exception after upgrade and compute stats
After running the COMPUTE STATS statement on an Impala table, subsequent queries on that table could fail with the exception message Failed to load metadata for table: default.stats_test.
Bug: IMPALA-1416
Workaround: Upgrading to CDH 5.2.1, or another level of CDH that includes the fix for HIVE-8627, prevents the problem from affecting future COMPUTE STATS statements. On affected levels of CDH, or for Impala tables that have become inaccessible, the workaround is to disable the hive.metastore.try.direct.sql setting in the Hive metastore hive-site.xml file and issue the INVALIDATE METADATA statement for the affected table. You do not need to rerun the COMPUTE STATS statement for the table.
Issues Fixed in the 2.0.0 Release / CDH 5.2.0
This section lists the most significant issues fixed in Impala 2.0.0.
For the full list of fixed issues in Impala 2.0.0, see this report in the JIRA system.
Continue reading:
- Join Hint is dropped when used inside a view
- WHERE condition ignored in simple query with RIGHT JOIN
- Query with self joined table may produce incorrect results
- Incorrect plan after reordering predicates (inner join following outer join)
- Combining fragments with compatible data partitions can lead to incorrect results due to type incompatibilities (missing casts).
- Predicate dropped: Inline view + DISTINCT aggregate in outer query
- Reuse of a column in JOIN predicate may lead to incorrect results
- Usage of TRUNC with string timestamp reliably crashes node
- Timestamp Cast Returns invalid TIMESTAMP
- IllegalStateException upon JOIN of DECIMAL columns with different precision
- Allow creating Avro tables without column definitions. Allow COMPUTE STATS to always work on Impala-created Avro tables.
- Ensure all webserver output is escaped
- Queries with union in inline view have empty resource requests
- Impala does not employ ACLs when checking path permissions for LOAD and INSERT
- Impala does not map principals to lowercase, affecting Sentry authorisation
Join Hint is dropped when used inside a view
Hints specified within a view query did not take effect when the view was queried, leading to slow performance. As part of this fix, Impala now supports hints embedded within comments.
Bug: IMPALA-995
WHERE condition ignored in simple query with RIGHT JOIN
Potential wrong results for some types of queries.
Bug: IMPALA-1101
Query with self joined table may produce incorrect results
Potential wrong results for some types of queries.
Bug: IMPALA-1102
Incorrect plan after reordering predicates (inner join following outer join)
Potential wrong results for some types of queries.
Bug: IMPALA-1118
Combining fragments with compatible data partitions can lead to incorrect results due to type incompatibilities (missing casts).
Potential wrong results for some types of queries.
Bug: IMPALA-1123
Predicate dropped: Inline view + DISTINCT aggregate in outer query
Potential wrong results for some types of queries.
Bug: IMPALA-1165
Reuse of a column in JOIN predicate may lead to incorrect results
Potential wrong results for some types of queries.
Bug: IMPALA-1353
Usage of TRUNC with string timestamp reliably crashes node
Serious error for certain combinations of function calls and data types.
Bug: IMPALA-1105
Timestamp Cast Returns invalid TIMESTAMP
Serious error for certain combinations of function calls and data types.
Bug: IMPALA-1109
IllegalStateException upon JOIN of DECIMAL columns with different precision
DECIMAL columns with different precision could not be compared in join predicates.
Bug: IMPALA-1121
Allow creating Avro tables without column definitions. Allow COMPUTE STATS to always work on Impala-created Avro tables.
Hive-created Avro tables with columns specified by a JSON file or literal could produce errors when queried in Impala, and could not be used with the COMPUTE STATS statement. Now you can create such tables in Impala to avoid such errors.
Bug: IMPALA-1104
Ensure all webserver output is escaped
The Impala debug web UI did not properly encode all output.
Bug: IMPALA-1133
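Encoding output means replacing characters that have special meaning in HTML before user-controlled text is placed on a page. The Python sketch below uses the standard html.escape function to show the idea; the function render_query_cell is hypothetical and does not correspond to the Impala web server's code.

```python
import html


def render_query_cell(query_text: str) -> str:
    # Escape user-controlled text before embedding it in HTML markup.
    return "<td>" + html.escape(query_text) + "</td>"


print(render_query_cell("<script>alert(1)</script>"))
# <td>&lt;script&gt;alert(1)&lt;/script&gt;</td>
```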
Queries with union in inline view have empty resource requests
Certain queries could run without obeying the limits imposed by resource management.
Bug: IMPALA-1236
Impala does not employ ACLs when checking path permissions for LOAD and INSERT
Certain INSERT and LOAD DATA statements could fail unnecessarily, if the target directories in HDFS had restrictive HDFS permissions, but those permissions were overridden by HDFS extended ACLs.
Bug: IMPALA-1279
Impala does not map principals to lowercase, affecting Sentry authorisation
In a Kerberos environment, the principal name was not mapped to lowercase, causing issues when a user logged in with an uppercase principal name and Sentry authorization was enabled.
Bug: IMPALA-1334
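The effect of case normalization on an authorization lookup can be sketched in Python. The sketch assumes the usual Kerberos principal form primary/instance@REALM; the function name short_name, the sample principals, and the sample role table are all hypothetical, and the code does not reproduce how Impala or Sentry resolve user names.

```python
def short_name(principal: str) -> str:
    # Strip the optional /instance and @REALM parts, then normalize case.
    user = principal.split("@", 1)[0].split("/", 1)[0]
    return user.lower()


roles = {"alice": "analyst"}  # hypothetical role table keyed by lowercase name

for principal in ("alice@EXAMPLE.COM", "ALICE@EXAMPLE.COM"):
    print(principal, roles.get(short_name(principal)))
```

Without the lowercase step, the second principal would fail to match the role table entry even though it refers to the same user.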
Issues Fixed in the 1.4.4 Release / CDH 5.1.5
For the list of fixed issues, see Issues Fixed in CDH 5.1.5 in the CDH 5 Release Notes.
Issues Fixed in the 1.4.3 Release / CDH 5.1.4
Impala 1.4.3 includes fixes to address what is known as the POODLE vulnerability in SSLv3. SSLv3 access is disabled in the Impala debug web UI.
Issues Fixed in the 1.4.2 Release / CDH 5.1.3
This section lists the most significant issues fixed in Impala 1.4.2.
For the full list of fixed issues in Impala 1.4.2, see this report in the JIRA system.
Issues Fixed in the 1.4.1 Release / CDH 5.1.2
This section lists the most significant issues fixed in Impala 1.4.1.
For the full list of fixed issues in Impala 1.4.1, see this report in the JIRA system.
Continue reading:
- impalad terminating with Boost exception
- Impalad uses wrong string format when writing logs
- Update HS2 client API.
- Impalad catalog updates can fail with error: "IllegalArgumentException: fromKey out of range" at com.cloudera.impala.catalog.CatalogDeltaLog
- "Total" time counter does not capture all the network transmit time
- Impala will crash when reading certain Avro files containing bytes data
- Support specifying a custom AuthorizationProvider in Impala
impalad terminating with Boost exception
Occasionally, a non-trivial query run through Llama could encounter a serious error. The detailed error in the log was:
boost::exception_detail::clone_impl <boost::exception_detail::error_info_injector<boost::lock_error> >
Severity: High
Impalad uses wrong string format when writing logs
Impala log files could contain internal error messages due to a problem formatting certain strings. The messages consisted of a Java call stack starting with:
jni-util.cc:177] java.util.MissingFormatArgumentException: Format specifier 's'
Update HS2 client API.
A downlevel version of the HiveServer2 API could cause difficulty retrieving the precision and scale of a DECIMAL value.
Bug: IMPALA-1107
Impalad catalog updates can fail with error: "IllegalArgumentException: fromKey out of range" at com.cloudera.impala.catalog.CatalogDeltaLog
The error in the title could occur following a DDL statement. This issue was discovered during internal testing and has not been reported in customer environments.
Bug: IMPALA-1093
"Total" time counter does not capture all the network transmit time
The time for some network operations was not counted in the report of total time for a query, making it difficult to diagnose network-related performance issues.
Bug: IMPALA-1131
Impala will crash when reading certain Avro files containing bytes data
Certain Avro fields for byte data could cause Impala to be unable to read an Avro data file, even if the field was not part of the Impala table definition. With this fix, Impala can now read these Avro data files, although Impala queries cannot refer to the "bytes" fields.
Bug: IMPALA-1149
Support specifying a custom AuthorizationProvider in Impala
The --authorization_policy_provider_class option for impalad was added back. This option specifies a custom AuthorizationProvider class rather than the default HadoopGroupAuthorizationProvider. It had been used for internal testing, then removed in Impala 1.4.0, but it was considered useful by some customers.
Bug: IMPALA-1142
Issues Fixed in the 1.4.0 Release / CDH 5.1.0
This section lists the most significant issues fixed in Impala 1.4.0.
For the full list of fixed issues in Impala 1.4.0, see this report in the JIRA system.
Continue reading:
- Failed DCHECK in disk-io-mgr-reader-context.cc:174
- impala-shell only works with ASCII characters
- The extended view definition SQL text in Views created by Impala should always have fully-qualified table names
- Impala forgets about partitions with non-existant locations
- CREATE TABLE LIKE fails if source is a view
- Improve partition pruning time
- Improve compute stats performance
- When I run CREATE TABLE new_table LIKE avro_table, the schema does not get mapped properly from an avro schema to a hive schema
- Race condition in IoMgr. Blocked ranges enqueued after cancel.
- Deadlock in scan node
Failed DCHECK in disk-io-mgr-reader-context.cc:174
The serious error in the title could occur, with the supplemental message:
num_used_buffers_ < 0: #used=-1 during cancellation HDFS cached data
The issue was due to the use of HDFS caching with data files accessed by Impala. Support for HDFS caching in Impala was introduced in Impala 1.4.0 for CDH 5.1.0. The fix for this issue was backported to Impala 1.3.x, and is the only change in Impala 1.3.2 for CDH 5.0.4.
Bug: IMPALA-1019
Workaround: On CDH 5.0.x, upgrade to CDH 5.0.4 with Impala 1.3.2, where this issue is fixed. In Impala 1.3.0 or 1.3.1 on CDH 5.0.x, do not use HDFS caching for Impala data files in Impala internal or external tables. If some of these data files are cached (for example because they are used by other components that take advantage of HDFS caching), set the query option DISABLE_CACHED_READS=true. To set that option for all Impala queries across all sessions, start impalad with the -default_query_options option and include this setting in the option argument, or on a cluster managed by Cloudera Manager, fill in this option setting on the Impala Daemon options page.
Resolution: This issue is fixed in Impala 1.3.2 for CDH 5.0.4. The addition of HDFS caching support in Impala 1.4 means that this issue does not apply to any new level of Impala on CDH 5.
impala-shell only works with ASCII characters
The impala-shell interpreter could encounter errors processing SQL statements containing non-ASCII characters.
Bug: IMPALA-489
The extended view definition SQL text in Views created by Impala should always have fully-qualified table names
When a view was accessed while inside a different database, references to tables were not resolved unless the names were fully qualified when the view was created.
Bug: IMPALA-962
Impala forgets about partitions with non-existant locations
If an ALTER TABLE specified a non-existent HDFS location for a partition, afterwards Impala would not be able to access the partition at all.
Bug: IMPALA-741
CREATE TABLE LIKE fails if source is a view
The CREATE TABLE LIKE clause was enhanced to be able to create a table with the same column definitions as a view. The resulting table is a text table unless the STORED AS clause is specified, because a view does not have an associated file format to inherit.
Bug: IMPALA-834
Improve partition pruning time
Operations on tables with many partitions could be slow due to the time needed to evaluate which partitions were affected. The partition pruning code was sped up substantially.
Bug: IMPALA-887
Improve compute stats performance
The performance of the COMPUTE STATS statement was improved substantially. The efficiency of its internal operations was improved, and some statistics are no longer gathered because they are not currently used for planning Impala queries.
Bug: IMPALA-1003
When I run CREATE TABLE new_table LIKE avro_table, the schema does not get mapped properly from an avro schema to a hive schema
After a CREATE TABLE LIKE statement using an Avro table as the source, the new table could have incorrect metadata and be inaccessible, depending on how the original Avro table was created.
Bug: IMPALA-185
Race condition in IoMgr. Blocked ranges enqueued after cancel.
Impala could encounter a serious error after a query was cancelled.
Bug: IMPALA-1046
Deadlock in scan node
A deadlock condition could make all impalad daemons hang, making the cluster unresponsive for Impala queries.
Bug: IMPALA-1083
Issues Fixed in the 1.3.3 Release / CDH 5.0.5
Impala 1.3.3 includes fixes to address what is known as the POODLE vulnerability in SSLv3. SSLv3 access is disabled in the Impala debug web UI.
Issues Fixed in the 1.3.2 Release / CDH 5.0.4
This backported bug fix is the only change between Impala 1.3.1 and Impala 1.3.2.
Failed DCHECK in disk-io-mgr-reader-context.cc:174
The serious error in the title could occur, with the supplemental message:
num_used_buffers_ < 0: #used=-1 during cancellation HDFS cached data
The issue was due to the use of HDFS caching with data files accessed by Impala. Support for HDFS caching in Impala was introduced in Impala 1.4.0 for CDH 5.1.0. The fix for this issue was backported to Impala 1.3.x, and is the only change in Impala 1.3.2 for CDH 5.0.4.
Bug: IMPALA-1019
Workaround: On CDH 5.0.x, upgrade to CDH 5.0.4 with Impala 1.3.2, where this issue is fixed. In Impala 1.3.0 or 1.3.1 on CDH 5.0.x, do not use HDFS caching for Impala data files in Impala internal or external tables. If some of these data files are cached (for example because they are used by other components that take advantage of HDFS caching), set the query option DISABLE_CACHED_READS=true. To set that option for all Impala queries across all sessions, start impalad with the -default_query_options option and include this setting in the option argument, or on a cluster managed by Cloudera Manager, fill in this option setting on the Impala Daemon options page.
Resolution: This issue is fixed in Impala 1.3.2 for CDH 5.0.4. Because Impala 1.4 adds HDFS caching support, this issue does not apply to any later release of Impala on CDH 5.
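For Impala 1.3.0 or 1.3.1, the session-level form of the workaround might look like the following sketch in impala-shell. The table name cached_table is hypothetical.

```sql
-- Avoid the HDFS cached-read code path for this session only.
SET DISABLE_CACHED_READS=true;

-- Subsequent queries in the session read the data files without using the HDFS cache.
SELECT COUNT(*) FROM cached_table;
```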
Issues Fixed in the 1.3.1 Release / CDH 5.0.3
This section lists the most significant issues fixed in Impala 1.3.1.
For the full list of fixed issues in Impala 1.3.1, see this report in the JIRA system.
Continue reading:
- Impalad crashes when left joining inline view that has aggregate using distinct
- Incorrect result with group by query with null value in group by data
- Drop Function does not clear local library cache
- Compute stats doesn't propagate underlying error correctly
- Inserts should respect changes in partition location
- Text data with carriage returns generates wrong results for count(*)
- IO Mgr should take instance memory limit into account when creating io buffers
- Impala should provide an option for new sub directories to automatically inherit the permissions of the parent directory
- Illegal state exception (or crash) in query with UNION in inline view
- INSERT column reordering doesn't work with SELECT clause
Impalad crashes when left joining inline view that has aggregate using distinct
Impala could encounter a severe error in a query combining a left outer join with an inline view containing a COUNT(DISTINCT) operation.
Bug: IMPALA-904
Incorrect result with group by query with null value in group by data
If the result of a GROUP BY operation is NULL, the resulting row might be omitted from the result set. This issue depends on the data values and data types in the table.
Bug: IMPALA-901
Drop Function does not clear local library cache
When a UDF is dropped through the DROP FUNCTION statement, and then the UDF is re-created with a new .so library or JAR file, the original version of the UDF is still used when the UDF is called from queries.
Bug: IMPALA-786
Workaround: Restart the impalad daemon on all nodes.
Compute stats doesn't propagate underlying error correctly
If a COMPUTE STATS statement encountered an error, the error message was "Query aborted" with no further detail. Common reasons why a COMPUTE STATS statement might fail include network errors causing the coordinator node to lose contact with other impalad instances, and column names that match Impala reserved words. (Currently, if a column name is an Impala reserved word, COMPUTE STATS always returns an error.)
Bug: IMPALA-762
Inserts should respect changes in partition location
After an ALTER TABLE statement that changes the LOCATION property of a partition, a subsequent INSERT statement would always use a path derived from the base data directory for the table.
Bug: IMPALA-624
Text data with carriage returns generates wrong results for count(*)
A COUNT(*) operation could return the wrong result for text tables using NUL characters (ASCII value 0) as delimiters.
Bug: IMPALA-13
Workaround: Impala adds support for ASCII 0 characters as delimiters through the clause FIELDS TERMINATED BY '\0'.
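A minimal sketch of a text table that uses the ASCII 0 character as its field delimiter follows. The table and column names are hypothetical.

```sql
-- Text table whose fields are separated by the NUL character (ASCII value 0).
CREATE TABLE nul_delimited (id INT, name STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\0';

-- Row count over the NUL-delimited data files.
SELECT COUNT(*) FROM nul_delimited;
```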
IO Mgr should take instance memory limit into account when creating io buffers
Impala could allocate more memory than necessary during certain operations.
Bug: IMPALA-488
Workaround: Before issuing a COMPUTE STATS statement for a Parquet table, reduce the number of threads used in that operation by issuing SET NUM_SCANNER_THREADS=2 in impala-shell. Then issue UNSET NUM_SCANNER_THREADS before continuing with queries.
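The workaround sequence can be sketched as follows in impala-shell. The table name parquet_table is hypothetical.

```sql
-- Temporarily reduce the number of scanner threads.
SET NUM_SCANNER_THREADS=2;

-- Gather statistics with the reduced thread count.
COMPUTE STATS parquet_table;

-- Restore the default before running other queries.
UNSET NUM_SCANNER_THREADS;
```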
Impala should provide an option for new sub directories to automatically inherit the permissions of the parent directory
When new subdirectories are created underneath a partitioned table by an INSERT statement, previously the new subdirectories always used the default HDFS permissions for the impala user, which might not be suitable for directories intended to be read and written by other components also.
Bug: IMPALA-827
Resolution: In Impala 1.3.1 and higher, you can specify the --insert_inherit_permissions startup option when starting the impalad daemon so that new subdirectories inherit the permissions of the parent directory.
Illegal state exception (or crash) in query with UNION in inline view
Impala could encounter a severe error in a query where the FROM list contains an inline view that includes a UNION. The exact type of the error varies.
Bug: IMPALA-888
INSERT column reordering doesn't work with SELECT clause
The ability to specify a subset of columns in an INSERT statement, in a different order than in the target table, was not working as intended.
Bug: IMPALA-945
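The following sketch shows the kind of statement affected: an INSERT that names a subset of the target columns in a different order than the table definition. The tables target and source and their columns are hypothetical.

```sql
-- Target table with three columns.
CREATE TABLE target (c1 INT, c2 STRING, c3 BIGINT);

-- Only c3 and c1 are listed, in reverse order.
-- The first SELECT expression goes to c3, the second to c1;
-- the unlisted column c2 is set to NULL.
INSERT INTO target (c3, c1)
  SELECT big_val, int_val FROM source;
```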
Issues Fixed in the 1.3.0 Release / CDH 5.0.0
This section lists the most significant issues fixed in Impala 1.3.0, primarily issues that could cause wrong results, or cause problems running the COMPUTE STATS statement, which is very important for performance and scalability.
For the full list of fixed issues, see this report in the JIRA system.
Continue reading:
- Inner join after right join may produce wrong results
- Incorrect results with codegen on multi-column group by with NULLs.
- Using distinct inside aggregate function may cause incorrect result when using having clause
- Aggregation on union inside (inline) view not distributed properly.
- Wrong expression may be used in aggregate query if there are multiple similar expressions
- Incorrect results when changing the order of aggregates in the select list with codegen enabled
- Union queries give Wrong result in a UNION followed by SIGSEGV in another union
- String data in MR-produced parquet files may be read incorrectly
- Compute stats need to use quotes with identifiers that are Impala keywords
- COMPUTE STATS child queries do not inherit parent query options.
- COMPUTE STATS should update partitions in batches
- Fail early (in analysis) when COMPUTE STATS is run against Avro table with no columns
Inner join after right join may produce wrong results
The automatic join reordering optimization could incorrectly reorder queries with an outer join or semi join followed by an inner join, producing incorrect results.
Bug: IMPALA-860
Workaround: Including the STRAIGHT_JOIN keyword in the query prevented the issue from occurring.
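A sketch of the workaround, assuming three hypothetical tables t1, t2, and t3 joined on a hypothetical id column:

```sql
-- STRAIGHT_JOIN keeps the join order exactly as written,
-- bypassing the automatic join reordering optimization.
SELECT STRAIGHT_JOIN t1.id, t3.val
FROM t1
  RIGHT OUTER JOIN t2 ON t1.id = t2.id
  INNER JOIN t3 ON t2.id = t3.id;
```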
Incorrect results with codegen on multi-column group by with NULLs.
A query with a GROUP BY clause referencing multiple columns could introduce incorrect NULL values in some columns of the result set. The incorrect NULL values could appear in rows where a different GROUP BY column actually did return NULL.
Bug: IMPALA-850
Using distinct inside aggregate function may cause incorrect result when using having clause
A query could return incorrect results if it combined an aggregate function call, a DISTINCT operator, and a HAVING clause, without a GROUP BY clause.
Bug: IMPALA-845
Aggregation on union inside (inline) view not distributed properly.
An aggregation query or a query with ORDER BY and LIMIT could be executed on a single node in some cases, rather than distributed across the cluster. This issue affected queries whose FROM clause referenced an inline view containing a UNION.
Bug: IMPALA-831
Wrong expression may be used in aggregate query if there are multiple similar expressions
If a GROUP BY query referenced the same columns multiple times using different operators, result rows could contain multiple copies of the same expression.
Bug: IMPALA-817
Incorrect results when changing the order of aggregates in the select list with codegen enabled
Referencing the same columns in both a COUNT() and a SUM() call in the same query, or some other combinations of aggregate function calls, could incorrectly return a result of 0 from one of the aggregate functions. This issue affected references to TINYINT and SMALLINT columns, but not INT or BIGINT columns.
Bug: IMPALA-765
Workaround: Setting the query option DISABLE_CODEGEN=TRUE prevented the incorrect results. Switching the order of the function calls could also prevent the issue from occurring.
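The query option workaround might be applied as in this sketch. The table t and its SMALLINT column small_col are hypothetical.

```sql
-- Turn off native code generation for the session.
SET DISABLE_CODEGEN=TRUE;

-- The same column referenced by two aggregate functions.
SELECT COUNT(small_col), SUM(small_col) FROM t;
```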
Union queries give Wrong result in a UNION followed by SIGSEGV in another union
A UNION query could produce a wrong result, followed by a serious error for a subsequent UNION query.
Bug: IMPALA-723
String data in MR-produced parquet files may be read incorrectly
Impala could return incorrect string results when reading uncompressed Parquet data files containing multiple row groups. This issue only affected Parquet data files produced by MapReduce jobs.
Bug: IMPALA-729
Compute stats need to use quotes with identifiers that are Impala keywords
Using a column or table name that conflicted with Impala keywords could prevent running the COMPUTE STATS statement for the table.
Bug: IMPALA-777
COMPUTE STATS child queries do not inherit parent query options.
The COMPUTE STATS statement did not use the setting of the MEM_LIMIT query option in impala-shell, potentially causing problems gathering statistics for wide Parquet tables.
Bug: IMPALA-903
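With the fix in place, a memory limit set in the session is expected to apply to the statistics-gathering queries as well. A sketch, using an arbitrary illustrative limit and a hypothetical table name:

```sql
-- Per-query memory limit in bytes (the value here is only an example).
SET MEM_LIMIT=2000000000;

-- The child queries issued by COMPUTE STATS inherit the limit.
COMPUTE STATS wide_parquet_table;
```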
COMPUTE STATS should update partitions in batches
The COMPUTE STATS statement could be slow or encounter a timeout while analyzing a table with many partitions.
Bug: IMPALA-880
Fail early (in analysis) when COMPUTE STATS is run against Avro table with no columns
If the columns for an Avro table were all defined in the TBLPROPERTIES or SERDEPROPERTIES clauses, the COMPUTE STATS statement would fail after completely analyzing the table, potentially causing a long delay. Although the COMPUTE STATS statement still does not work for such tables, now the problem is detected and reported immediately.
Bug: IMPALA-867
Workaround: Re-create the Avro table with columns defined in SQL style, using the output of SHOW CREATE TABLE. (See the JIRA page for detailed steps.)
Issues Fixed in the 1.2.4 Release
This section lists the most significant issues fixed in Impala 1.2.4. For the full list of fixed issues, see this report in the JIRA system.
Continue reading:
- The Catalog Server exits with an OOM error after a certain number of CREATE statements
- Catalog Server consumes excessive cpu cycle
- Query against Avro table crashes Impala with codegen enabled
- Statestore seems to send concurrent heartbeats to the same subscriber leading to repeated "Subscriber 'hostname' is registering with statestore, ignoring update" messages
- Join predicate incorrectly ignored
- Query result differing between Impala and Hive
- ArrayIndexOutOfBoundsException / Invalid query handle when reading large HBase cell
- select with distinct and full outer join, impalad coredump
- Impala cannot load tables with more than Short.MAX_VALUE number of partitions
- Various issues with HBase row key specification
The Catalog Server exits with an OOM error after a certain number of CREATE statements
A large number of concurrent CREATE TABLE statements can cause the catalogd process to consume excessive memory, and potentially be killed due to an out-of-memory condition.
Bug: IMPALA-818
Workaround: Restart the catalogd service and re-try the DDL operations that failed.
Catalog Server consumes excessive cpu cycle
A large number of tables and partitions could result in unnecessary CPU overhead during Impala idle time and background operations.
Bug: IMPALA-821
Resolution: Catalog server processing was optimized in several ways.
Query against Avro table crashes Impala with codegen enabled
A query against a TIMESTAMP column in an Avro table could encounter a serious issue.
Bug: IMPALA-828
Workaround: Set the query option DISABLE_CODEGEN=TRUE.
Statestore seems to send concurrent heartbeats to the same subscriber leading to repeated "Subscriber 'hostname' is registering with statestore, ignoring update" messages
Impala nodes could produce repeated error messages after recovering from a communication error with the statestore service.
Bug: IMPALA-809
Join predicate incorrectly ignored
A join query could produce wrong results if multiple equality comparisons between the same tables referred to the same column.
Bug: IMPALA-805
Query result differing between Impala and Hive
Certain outer join queries could return wrong results. If one of the tables involved in the join was an inline view, some tests from the WHERE clauses could be applied to the wrong phase of the query.
ArrayIndexOutOfBoundsException / Invalid query handle when reading large HBase cell
An HBase cell could contain a value larger than 32 KB, leading to a serious error when Impala queries that table. The error could occur even if the applicable row is not part of the result set.
Bug: IMPALA-715
Workaround: Use smaller values in the HBase table, or exclude the column containing the large value from the result set.
select with distinct and full outer join, impalad coredump
A query involving a DISTINCT operator combined with a FULL OUTER JOIN could encounter a serious error.
Bug: IMPALA-735
Workaround: Set the query option DISABLE_CODEGEN=TRUE.
Impala cannot load tables with more than Short.MAX_VALUE number of partitions
If a table had more than 32,767 partitions, Impala would not recognize the partitions above the 32K limit and query results could be incomplete.
Bug: IMPALA-749
Various issues with HBase row key specification
Queries against HBase tables could fail with an error if the row key was compared to a function return value rather than a string constant. Also, queries against HBase tables could fail if the WHERE clause contained combinations of comparisons that could not possibly match any row key.
Resolution: Queries now return appropriate results when function calls are used in the row key comparison. For queries involving non-existent row keys, such as WHERE row_key IS NULL or where the lower bound is greater than the upper bound, the query succeeds and returns an empty result set.
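The query shapes described above can be sketched as follows. The table hbase_events and its string row key column row_key are hypothetical.

```sql
-- Row key compared to a function return value rather than a string constant.
SELECT * FROM hbase_events WHERE row_key = CONCAT('user', '42');

-- Row key that cannot exist: succeeds with an empty result set.
SELECT * FROM hbase_events WHERE row_key IS NULL;

-- Lower bound greater than upper bound: also an empty result set.
SELECT * FROM hbase_events WHERE row_key >= 'z' AND row_key < 'a';
```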
Issues Fixed in the 1.2.3 Release
This release is a fix release that supersedes Impala 1.2.2, with the same features and fixes as 1.2.2 plus one additional fix for compatibility with Parquet files generated outside of Impala by components such as Hive, Pig, or MapReduce.
Impala cannot read Parquet files with multiple row groups
The parquet-mr library included with CDH4.5 writes files that are not readable by Impala, due to the presence of multiple row groups. Queries involving these data files might result in a crash or a failure with an error such as "Column chunk should not contain two dictionary pages".
This issue does not occur for Parquet files produced by Impala INSERT statements, because Impala only produces files with a single row group.
Bug: IMPALA-720
Issues Fixed in the 1.2.2 Release
This section lists the most significant issues fixed in Impala 1.2.2. For the full list of fixed issues, see this report in the JIRA system.
Continue reading:
- Order of table references in FROM clause is critical for optimal performance
- Parquet in CDH4.5 writes data files that are sometimes unreadable by Impala
- Deadlock in statestore when unregistering a subscriber and building a topic update
- IllegalStateException when doing a union involving a group by
- Impala Parquet Writer hit DCHECK in RleEncoder
- Hive UDF jars cannot be loaded by the FE
Order of table references in FROM clause is critical for optimal performance
Prior to Impala 1.2.2, Impala did not optimize the join order of queries; instead, it joined tables in the order in which they were listed in the FROM clause. Queries with one or more large tables on the right-hand side of joins (either an explicit join expressed with a JOIN clause or a join implicit in the list of table references in the FROM clause) could run slowly or crash Impala due to out-of-memory errors. For example:
SELECT ... FROM small_table JOIN large_table
Resolution: Fixed in Impala 1.2.2.
Workaround: In Impala 1.2.2 and higher, use the COMPUTE STATS statement to gather statistics for each table involved in the join query, after data is loaded. Prior to Impala 1.2.2, modify the query, if possible, to join the largest table first. For example:
SELECT ... FROM small_table JOIN large_table
should be modified to:
SELECT ... FROM large_table JOIN small_table
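For Impala 1.2.2 and higher, the statistics-based approach might look like this sketch. The join column id is hypothetical.

```sql
-- Gather statistics for each table involved in the join, after loading data.
COMPUTE STATS small_table;
COMPUTE STATS large_table;

-- With statistics available, the order of tables in the FROM clause
-- no longer has to be tuned by hand.
SELECT COUNT(*)
FROM small_table JOIN large_table ON small_table.id = large_table.id;
```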
Parquet in CDH4.5 writes data files that are sometimes unreadable by Impala
Other components could generate some Parquet files that Impala could not read.
Bug: IMPALA-694
Resolution: The underlying issue is being addressed by a fix in the CDH Parquet libraries. Impala 1.2.2 works around the problem and reads the existing data files.
Deadlock in statestore when unregistering a subscriber and building a topic update
The statestore service could experience an internal error leading to a hang.
Bug: IMPALA-699
IllegalStateException when doing a union involving a group by
A UNION query where one side involved a GROUP BY operation could cause a serious error.
Bug: IMPALA-687
Impala Parquet Writer hit DCHECK in RleEncoder
A serious error could occur when doing an INSERT into a Parquet table.
Bug: IMPALA-689
Hive UDF jars cannot be loaded by the FE
If the JAR file for a Java-based Hive UDF was not in the CLASSPATH, the UDF could not be called during a query.
Bug: IMPALA-695
Issues Fixed in the 1.2.1 Release
This section lists the most significant issues fixed in Impala 1.2.1. For the full list of fixed issues, see this report in the JIRA system.
Scanners use too much memory when reading past scan range
While querying a table with long column values, Impala could over-allocate memory leading to an out-of-memory error. This problem was observed most frequently with tables using uncompressed RCFile or text data files.
Bug: IMPALA-525
Resolution: Fixed in 1.2.1
Join node consumes memory way beyond mem-limit
A join query could allocate a temporary work area that was larger than needed, leading to an out-of-memory error. The fix makes Impala return unused memory to the system when the memory limit is reached, avoiding unnecessary memory errors.
Bug: IMPALA-657
Resolution: Fixed in 1.2.1
Excessive memory consumption when query tables with 1k columns (Parquet file)
Impala could encounter an out-of-memory condition setting up work areas for Parquet tables with many columns. The fix reduces the size of the allocated memory when not actually needed to hold table data.
Bug: IMPALA-652
Resolution: Fixed in 1.2.1
Issues Fixed in the 1.2.0 Beta Release
This section lists the most significant issues fixed in Impala 1.2 (beta). For the full list of fixed issues, see this report in the JIRA system.
Issues Fixed in the 1.1.1 Release
This section lists the most significant issues fixed in Impala 1.1.1. For the full list of fixed issues, see this report in the JIRA system.
Continue reading:
- Unexpected LLVM Crash When Querying Doubles on CentOS 5.x
- "block size is too big" error with Snappy-compressed RCFile containing null
- Cannot query RC file for table that has more columns than the data file
- Views Sometimes Not Utilizing Partition Pruning
- Update the serde name we write into the metastore for Parquet tables
- Selective queries over large tables produce unnecessary memory consumption
- Impala stopped to query AVRO tables
- Impala continues to allocate more memory even though it has exceed its mem-limit
Unexpected LLVM Crash When Querying Doubles on CentOS 5.x
Certain queries involving DOUBLE columns could fail with a serious error. The fix improves the generation of native machine instructions for certain chipsets.
Bug: IMPALA-477
"block size is too big" error with Snappy-compressed RCFile containing null
Queries could fail with a "block size is too big" error, due to NULL values in RCFile tables using Snappy compression.
Bug: IMPALA-482
Cannot query RC file for table that has more columns than the data file
Queries could fail if an Impala RCFile table was defined with more columns than in the corresponding RCFile data files.
Bug: IMPALA-510
Views Sometimes Not Utilizing Partition Pruning
Certain combinations of clauses in a view definition for a partitioned table could result in inefficient performance and incorrect results.
Bug: IMPALA-495
Update the serde name we write into the metastore for Parquet tables
The SerDes class string written into Parquet data files created by Impala was updated for compatibility with Parquet support in Hive. See Incompatible Changes Introduced in Impala 1.1.1 for the steps to update older Parquet data files for Hive compatibility.
Bug: IMPALA-485
Selective queries over large tables produce unnecessary memory consumption
A query returning a small result set from a large table could tie up memory unnecessarily for the duration of the query.
Bug: IMPALA-534
Impala stopped to query AVRO tables
Queries against Avro tables could fail depending on whether the Avro schema URL was specified in the TBLPROPERTIES or SERDEPROPERTIES field. The fix causes Impala to check both fields for the schema URL.
Bug: IMPALA-538
Impala continues to allocate more memory even though it has exceed its mem-limit
Queries could allocate substantially more memory than specified in the impalad -mem_limit startup option. The fix causes more frequent checking of the limit during query execution.
Bug: IMPALA-520
Issues Fixed in the 1.1.0 Release
This section lists the most significant issues fixed in Impala 1.1. For the full list of fixed issues, see this report in the JIRA system.
Continue reading:
- 10-20% perf regression for most queries across all table formats
- planner fails with "Join requires at least one equality predicate between the two tables" when "from" table order does not match "where" join order
- Parquet writer uses excessive memory with partitions
- Comments in impala-shell in interactive mode are not handled properly causing syntax errors or wrong results
- Cancelled queries sometimes aren't removed from the inflight query list
- Impala's 1.0.1 Shell Broke Python 2.4 Compatibility (AttributeError: 'module' object has no attribute 'field_size_limit)
10-20% perf regression for most queries across all table formats
This issue is due to a performance tradeoff between systems running many queries concurrently, and systems running a single query. Systems running only a single query could experience lower performance than in early beta releases. Systems running many queries simultaneously should experience higher performance than in the beta releases.
planner fails with "Join requires at least one equality predicate between the two tables" when "from" table order does not match "where" join order
A query could fail if it involved 3 or more tables and the last join table was specified as a subquery.
Bug: IMPALA-85
Parquet writer uses excessive memory with partitions
INSERT statements against partitioned tables using the Parquet format could use excessive amounts of memory as the number of partitions grew large.
Bug: IMPALA-257
Comments in impala-shell in interactive mode are not handled properly causing syntax errors or wrong results
The impala-shell interpreter did not accept comments entered at the command line, making it problematic to copy and paste from scripts or other code examples.
Bug: IMPALA-192
Cancelled queries sometimes aren't removed from the inflight query list
The Impala web UI would sometimes display a query as if it were still running, after the query was cancelled.
Bug: IMPALA-364
Impala's 1.0.1 Shell Broke Python 2.4 Compatibility (AttributeError: 'module' object has no attribute 'field_size_limit)
The impala-shell command in Impala 1.0.1 does not work with Python 2.4, which is the default on Red Hat 5.
For the impala-shell command in Impala 1.0, the -o option (pipe output to a file) does not work with Python 2.4.
Bug: IMPALA-396
Issues Fixed in the 1.0.1 Release
This section lists the most significant issues fixed in Impala 1.0.1. For the full list of fixed issues, see this report in the JIRA system.
Continue reading:
- Impala parquet scanner cannot read all data files generated by other frameworks
- Impala is unable to query RCFile tables which describe fewer columns than the file's header.
- Impala does not correctly substitute _HOST with hostname in --principal
- HBase query missed the last region
- Hbase region changes are not handled correctly
- Query state for successful create table is EXCEPTION
- Double check release of JNI-allocated byte-strings
- Impala returns 0 for bad time values in UNIX_TIMESTAMP, Hive returns NULL
- INSERT INTO TABLE SELECT <constant> does not work.
Impala parquet scanner cannot read all data files generated by other frameworks
Impala might issue an erroneous error message when processing a Parquet data file produced by a non-Impala Hadoop component.
Bug: IMPALA-333
Resolution: Fixed
Impala is unable to query RCFile tables which describe fewer columns than the file's header.
If an RCFile table definition had fewer columns than the fields actually in the data files, queries would fail.
Bug: IMPALA-293
Resolution: Fixed
Impala does not correctly substitute _HOST with hostname in --principal
The _HOST placeholder in the --principal startup option was not substituted with the correct hostname, potentially leading to a startup error in setups using Kerberos authentication.
Bug: IMPALA-351
Resolution: Fixed
HBase query missed the last region
A query for an HBase table could omit data from the last region.
Bug: IMPALA-356
Resolution: Fixed
Hbase region changes are not handled correctly
After a region in an HBase table was split or moved, an Impala query might return incomplete or out-of-date results.
Bug: IMPALA-300
Resolution: Fixed
Query state for successful create table is EXCEPTION
After a successful CREATE TABLE statement, the corresponding query state would be incorrectly reported as EXCEPTION.
Bug: IMPALA-349
Resolution: Fixed
Double check release of JNI-allocated byte-strings
Operations involving calls to the Java JNI subsystem (for example, queries on HBase tables) could allocate memory but not release it.
Bug: IMPALA-358
Resolution: Fixed
Impala returns 0 for bad time values in UNIX_TIMESTAMP, Hive returns NULL
Impala returns 0 for bad time values in UNIX_TIMESTAMP, Hive returns NULL.
Impala:
impala> select UNIX_TIMESTAMP('10:02:01') ;
impala> 0
Hive:
hive> select UNIX_TIMESTAMP('10:02:01') FROM tmp;
hive> NULL
Bug: IMPALA-16
Resolution: Fixed
INSERT INTO TABLE SELECT <constant> does not work.
An INSERT INTO TABLE SELECT <constant> statement did not insert any data and could return an error.
Resolution: Fixed
Issues Fixed in the 1.0 GA Release
Here are the major user-visible issues fixed in Impala 1.0. For a full list of fixed issues, see this report in the public issue tracker.
Continue reading:
- Undeterministically receive "ERROR: unknown row bach destination..." and "ERROR: Invalid query handle" from impala shell when running union query
- Insert with NULL partition keys results in SIGSEGV.
- INSERT queries don't show completed profiles on the debug webpage
- Impala HBase scan is very slow
- Add some library version validation logic to impalad when loading impala-lzo shared library
- Problems inserting into tables with TIMESTAMP partition columns leading table metadata loading failures and failed dchecks
- Ctrl-C sometimes interrupts shell in system call, rather than cancelling query
- Empty string partition value causes metastore update failure
- Round() does not output the right precision
- Cannot cast string literal to string
- Excessive mem usage for certain queries which are very selective
- HdfsScanNode crashes in UpdateCounters
- Parquet performance issues on large dataset
- impala not populating hive metadata correctly for create table
- impala daemons die if statestore goes down
- Constant SELECT clauses do not work in subqueries
- Right outer Join includes NULLs as well and hence wrong result count
- Parquet scanner hangs for some queries
Undeterministically receive "ERROR: unknown row bach destination..." and "ERROR: Invalid query handle" from impala shell when running union query
A query containing both UNION and LIMIT clauses could intermittently cause the impalad process to halt with a segmentation fault.
Bug: IMPALA-183
Resolution: Fixed
Insert with NULL partition keys results in SIGSEGV.
An INSERT statement specifying a NULL value for one of the partitioning columns could cause the impalad process to halt with a segmentation fault.
Bug: IMPALA-190
Resolution: Fixed
INSERT queries don't show completed profiles on the debug webpage
In the Impala web user interface, the profile page for an INSERT statement showed obsolete information for the statement once it was complete.
Bug: IMPALA-217
Resolution: Fixed
Impala HBase scan is very slow
Queries involving an HBase table could be slower than expected, due to excessive memory usage on the Impala nodes.
Bug: IMPALA-231
Resolution: Fixed
Add some library version validation logic to impalad when loading impala-lzo shared library
No validation was done to check that the impala-lzo shared library was compatible with the version of Impala, possibly leading to a crash when using LZO-compressed text files.
Bug: IMPALA-234
Resolution: Fixed
Workaround: Always upgrade the impala-lzo library at the same time as you upgrade Impala itself.
Problems inserting into tables with TIMESTAMP partition columns leading table metadata loading failures and failed dchecks
INSERT statements for tables partitioned on columns involving datetime types could appear to succeed, but cause errors for subsequent queries on those tables. The problem was especially serious if an improperly formatted timestamp value was specified for the partition key.
Bug: IMPALA-238
Resolution: Fixed
Ctrl-C sometimes interrupts shell in system call, rather than cancelling query
Pressing Ctrl-C in the impala-shell interpreter could sometimes display an error and return control to the shell, making it impossible to cancel the query.
Bug: IMPALA-243
Resolution: Fixed
Empty string partition value causes metastore update failure
Specifying an empty string or NULL for a partition key in an INSERT statement would fail.
Bug: IMPALA-252
Resolution: Fixed. The behavior for empty partition keys was made more compatible with the corresponding Hive behavior.
Round() does not output the right precision
The round() function did not always return the correct number of significant digits.
Bug: IMPALA-266
Resolution: Fixed
Cannot cast string literal to string
Casting from a string literal back to the same type would cause an "invalid type cast" error rather than leaving the original value unchanged.
Bug: IMPALA-267
Resolution: Fixed
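A minimal sketch of the kind of expression affected:

```sql
-- Casting a string literal to STRING returns the original value
-- rather than an "invalid type cast" error.
SELECT CAST('abc' AS STRING);
```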
Excessive mem usage for certain queries which are very selective
Some queries that returned very few rows experienced unnecessary memory usage.
Bug: IMPALA-288
Resolution: Fixed
HdfsScanNode crashes in UpdateCounters
A serious error could occur for relatively small and inexpensive queries.
Bug: IMPALA-289
Resolution: Fixed
Parquet performance issues on large dataset
Certain aggregation queries against Parquet tables were inefficient due to lower than required thread utilization.
Bug: IMPALA-292
Resolution: Fixed
impala not populating hive metadata correctly for create table
The Impala CREATE TABLE command did not fill in the owner and tbl_type columns in the Hive metastore database.
Bug: IMPALA-295
Resolution: Fixed. The metadata was made more Hive-compatible.
impala daemons die if statestore goes down
The impalad instances in a cluster could halt when the statestored process became unavailable.
Bug: IMPALA-312
Resolution: Fixed
Constant SELECT clauses do not work in subqueries
A subquery would fail if the SELECT statement inside it returned a constant value rather than querying a table.
Bug: IMPALA-67
Resolution: Fixed
Right outer Join includes NULLs as well and hence wrong result count
The result set from a right outer join query could include erroneous rows containing NULL values.
Bug: IMPALA-90
Resolution: Fixed
Parquet scanner hangs for some queries
The Parquet scanner non-deterministically hangs when executing some queries.
Bug: IMPALA-204
Resolution: Fixed
Issues Fixed in Version 0.7 of the Beta Release
Impala does not gracefully handle unsupported Hive table types (INDEX and VIEW tables)
When attempting to load metadata from an unsupported Hive table type (INDEX and VIEW tables), Impala fails with an unclear error message.
Bug: IMPALA-167
Resolution: Fixed in 0.7
DDL statements (CREATE/ALTER/DROP TABLE) are not supported in the Impala Beta Release
Resolution: Fixed in 0.7
Avro is not supported in the Impala Beta Release
Resolution: Fixed in 0.7
Workaround: None
Impala does not currently allow limiting the memory consumption of a single query
It is currently not possible to limit the memory consumption of a single query. All tables on the right-hand side of JOIN statements need to be able to fit in memory. If they do not, Impala may crash due to out-of-memory errors.
Resolution: Fixed in 0.7
Aggregate of a subquery result set returns wrong results if the subquery contains a 'limit' and data is distributed across multiple nodes
Aggregate of a subquery result set returns wrong results if the subquery contains a 'limit' clause and data is distributed across multiple nodes. From the query plan, it looks like we are just summing the results from each worker node.
Bug: IMPALA-20
Resolution: Fixed in 0.7
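The effect described in this entry can be modeled in a few lines of Python. This is only a sketch: the node data, the limit value, and both function names are hypothetical, and the code imitates the observed behavior rather than Impala's actual planner.

```python
from itertools import islice

# Hypothetical rows held by three worker nodes (illustrative data only).
node_rows = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]


def count_with_per_node_limit(nodes, limit):
    """Buggy model: each node applies the limit and aggregates locally,
    then the coordinator sums the per-node aggregates."""
    return sum(len(rows[:limit]) for rows in nodes)


def count_with_global_limit(nodes, limit):
    """Intended model: the limit is applied once to the merged row stream,
    and only then is the aggregate computed."""
    merged = (row for rows in nodes for row in rows)
    return len(list(islice(merged, limit)))


print(count_with_per_node_limit(node_rows, 2))  # limit applied on every node
print(count_with_global_limit(node_rows, 2))    # limit applied once overall
```

With three nodes and a limit of 2, the first function counts rows from every node, while the second counts only the rows that survive a single global limit.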
Partition pruning for arbitrary predicates that are fully bound by a particular partition column
We currently cannot use a predicate like "country_code in ('DE', 'FR', 'US')" to do partition pruning, because that requires an equality predicate or a binary comparison.
We should create a superclass of planner.ValueRange, ValueSet, that can be constructed with an arbitrary predicate, and whose isInRange(analyzer, valueExpr) constructs a literal predicate by substitution of the valueExpr into the predicate.
Bug: IMPALA-144
Resolution: Fixed in 0.7
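The substitution idea in this entry can be illustrated with a short Python sketch. The class below borrows the name ValueSet from the description, but its constructor, the is_in_range method, and the prune_partitions helper are hypothetical and are not the Impala planner's Java API.

```python
from typing import Callable, Iterable, List


class ValueSet:
    """Hypothetical stand-in for the proposed ValueSet: wraps an arbitrary
    predicate that is fully bound by a single partition column."""

    def __init__(self, predicate: Callable[[str], bool]):
        self.predicate = predicate

    def is_in_range(self, value: str) -> bool:
        # Substitute the literal partition value into the predicate
        # and evaluate the result.
        return self.predicate(value)


def prune_partitions(partition_values: Iterable[str],
                     value_set: ValueSet) -> List[str]:
    """Keep only the partitions whose key value satisfies the predicate."""
    return [v for v in partition_values if value_set.is_in_range(v)]


# Models the predicate: country_code IN ('DE', 'FR', 'US')
in_list = ValueSet(lambda country_code: country_code in ("DE", "FR", "US"))
print(prune_partitions(["DE", "JP", "US", "BR"], in_list))
```

Because the predicate is evaluated once per partition value, any expression that refers only to the partition column can drive pruning, not just equality or binary comparisons.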
Issues Fixed in Version 0.6 of the Beta Release
Impala reads the NameNode address and port as command line parameters
Impala reads the NameNode address and port as command line parameters rather than reading them from core-site.xml. Updating the NameNode address in the core-site.xml file does not propagate to Impala.
Severity: Low
Resolution: Fixed in 0.6 - Impala reads the namenode location and port from the Hadoop configuration files, though setting -nn and -nn_port overrides this. Users are advised not to set -nn or -nn_port.
Queries may fail on secure environment due to impalad Kerberos ticket expiration
Queries may fail in a secure environment because the impalad Kerberos tickets expire. This can happen if the Impala -kerberos_reinit_interval flag is set to a value of ten minutes or less. This may lead to an impalad requesting a ticket with a lifetime that is less than the time to the next ticket renewal.
Bug: IMPALA-64
Resolution: Fixed in 0.6
Concurrent queries may fail when Impala uses Thrift to communicate with the Hive Metastore
Concurrent queries may fail when Impala is using Thrift to communicate with part of the Hive Metastore such as the Hive Metastore Service. In such a case, the error "get_fields failed: out of sequence response" may occur because Impala shared a single Hive Metastore Client connection across threads. With Impala 0.6, a separate connection is used for each metadata request.
Bug: IMPALA-48
Resolution: Fixed in 0.6
impalad fails to start if unable to connect to the Hive Metastore
Impala fails to start if it is unable to establish a connection with the Hive Metastore. This behavior was fixed, allowing Impala to start, even when no Metastore is available.
Bug: IMPALA-58
Resolution: Fixed in 0.6
Impala treats database names as case-sensitive in some contexts
In some queries (including "USE database" statements), database names are treated as case-sensitive. This may lead queries to fail with an IllegalStateException.
Bug: IMPALA-44
Resolution: Fixed in 0.6
Impala does not ignore hidden HDFS files
Impala does not ignore hidden HDFS files, that is, files whose names begin with a period '.' or an underscore '_'. This diverges from Hive/MapReduce, which skips these files.
Bug: IMPALA-18
Resolution: Fixed in 0.6
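The hidden-file convention in this entry can be expressed as a simple name check. The following Python sketch is illustrative only; the function names and sample paths are hypothetical and do not come from the Impala code base.

```python
import posixpath


def is_hidden(path: str) -> bool:
    """Return True if the final path component starts with '.' or '_'."""
    name = posixpath.basename(path)
    return name.startswith((".", "_"))


def visible_files(paths):
    """Filter out hidden files, keeping the original order."""
    return [p for p in paths if not is_hidden(p)]


# Hypothetical directory listing for one table.
listing = [
    "/warehouse/t1/part-00000",
    "/warehouse/t1/_SUCCESS",
    "/warehouse/t1/.part-00001.crc",
]
print(visible_files(listing))
```

Only the basename is tested, so a data file stored under a directory whose name starts with an underscore is not treated as hidden by this sketch.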
Issues Fixed in Version 0.5 of the Beta Release
Impala may have reduced performance on tables that contain a large number of partitions
Impala may have reduced performance on tables that contain a large number of partitions. This is due to extra overhead reading/parsing the partition metadata.
Resolution: Fixed in 0.5
Backend client connections not getting cached causes an observable latency in secure clusters
Backend impalads do not cache connections to the coordinator. On a secure cluster, this introduces a latency proportional to the number of backend clients involved in query execution, as the cost of establishing a secure connection is much higher than in the non-secure case.
Bug: IMPALA-38
Resolution: Fixed in 0.5
Concurrent queries may fail with error: "Table object has not been been initialised : `PARTITIONS`"
Concurrent queries may fail with error: "Table object has not been been initialised : `PARTITIONS`". This was due to a lack of locking in the Impala table/database metadata cache.
Bug: IMPALA-30
Resolution: Fixed in 0.5
UNIX_TIMESTAMP format behaviour deviates from Hive when format matches a prefix of the time value
The Impala UNIX_TIMESTAMP(val, format) operation compares the length of format and val and returns NULL if they do not match. Hive instead effectively truncates val to the length of the format parameter.
Bug: IMPALA-15
Resolution: Fixed in 0.5
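The difference between the two behaviors can be sketched in Python. Everything here is an assumption made for illustration: the function names, the small pattern mapping, and the choice to interpret values as UTC. It models the length check and the truncation described above, not the real Impala or Hive implementations.

```python
import calendar
import time

# Minimal, assumed mapping from Java-style pattern letters to strptime codes.
_PATTERN_MAP = [("yyyy", "%Y"), ("MM", "%m"), ("dd", "%d"),
                ("HH", "%H"), ("mm", "%M"), ("ss", "%S")]


def _to_strptime(fmt: str) -> str:
    """Translate a pattern such as 'yyyy-MM-dd' into '%Y-%m-%d'."""
    for java_code, py_code in _PATTERN_MAP:
        fmt = fmt.replace(java_code, py_code)
    return fmt


def unix_timestamp_strict(val: str, fmt: str):
    """Old Impala behavior as described: NULL (None) on a length mismatch."""
    if len(val) != len(fmt):
        return None
    return calendar.timegm(time.strptime(val, _to_strptime(fmt)))


def unix_timestamp_truncating(val: str, fmt: str):
    """Hive-like behavior as described: truncate val to the format length."""
    truncated = val[:len(fmt)]
    return calendar.timegm(time.strptime(truncated, _to_strptime(fmt)))


value = "2013-01-02 10:20:30"
print(unix_timestamp_strict(value, "yyyy-MM-dd"))      # length mismatch
print(unix_timestamp_truncating(value, "yyyy-MM-dd"))  # parses "2013-01-02"
```

The strict variant rejects the value because its length differs from that of the format, while the truncating variant parses only the leading date portion.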
Issues Fixed in Version 0.4 of the Beta Release
Impala fails to refresh the Hive metastore if a Hive temporary configuration file is removed
Impala is affected by Hive bug HIVE-3596, which may cause metastore refreshes to fail if a Hive temporary configuration file is deleted (normally located at /tmp/hive-<user>-<tmp_number>.xml). Additionally, impala-shell incorrectly reports that the failed metadata refresh completed successfully.
Anticipated Resolution: To be fixed in a future release
Workaround: Restart the impalad service. Use the impalad log to check for metadata refresh errors.
lpad/rpad builtin functions are not correct.
The lpad/rpad builtin functions generate the wrong results.
Resolution: Fixed in 0.4
Files with .gz extension reported as 'not supported'
Compressed files with a .gz extension incorrectly generate an exception.
Bug: IMPALA-14
Resolution: Fixed in 0.4
Queries with large limits would hang.
Some queries with large limits were hanging.
Resolution: Fixed in 0.4
Order by on a string column produces incorrect results if there are empty strings
Resolution: Fixed in 0.4
Issues Fixed in Version 0.3 of the Beta Release
All table loading errors show as unknown table
If Impala is unable to load the metadata for a table for any reason, a subsequent query referring to that table will return an unknown table error message, even if the table is known.
Resolution: Fixed in 0.3
A table that cannot be loaded will disappear from SHOW TABLES
After failing to load metadata for a table, Impala removes that table from the list of known tables returned by SHOW TABLES. Subsequent attempts to query the table return 'unknown table', even if the metadata for that table is fixed.
Resolution: Fixed in 0.3
Impala cannot read from HBase tables that are not created as external tables in the Hive metastore.
Attempting to select from these tables fails.
Resolution: Fixed in 0.3
Certain queries that contain OUTER JOINs may return incorrect results
Queries that contain OUTER JOINs may not return the correct results if there are predicates referencing any of the joined tables in the WHERE clause.
Resolution: Fixed in 0.3
Issues Fixed in Version 0.2 of the Beta Release
Subqueries which contain aggregates cannot be joined with other tables or Impala may crash
Subqueries that contain an aggregate cannot be joined with another table or Impala may crash. For example:
SELECT * FROM (SELECT sum(col1) FROM some_table GROUP BY col1) t1 JOIN other_table ON (...);
Resolution: Fixed in 0.2
An insert with a limit that runs as more than one query fragment inserts more rows than the limit.
For example:
INSERT OVERWRITE TABLE test SELECT * FROM test2 LIMIT 1;
Resolution: Fixed in 0.2
Query with limit clause might fail.
For example:
SELECT * FROM test2 LIMIT 1;
Resolution: Fixed in 0.2
Files in unsupported compression formats are read as plain text.
Attempting to read such files does not generate a diagnostic.
Resolution: Fixed in 0.2
Impala server raises a null pointer exception when running an HBase query.
When querying an HBase table whose row-key is string type, the Impala server may raise a null pointer exception.
Resolution: Fixed in 0.2