Impala changelog for Cloudera Data Warehouse on premises
Review the changes introduced in Impala for Cloudera Data Warehouse on premises.
2025.0.19.1-74
- IMPALA-14070: Use checkedMultiply in SortNode.java
- CDPD-82599: IMPALA-14006: Bound max_instances in CreateInputCollocatedInstances
- CDPD-82303: IMPALA-14001: Start EXEC_TIME_LIMIT_S timer after backend execution begins
- CDPD-82600: IMPALA-13991: Skip CROSS_JOIN rewrite if subquery is in disjunctive
- DWX-20913: IMPALA-14029: Add Kerberos utilities to Docker image build
- IMPALA-13873: Missing equivalence conjunct in aggregation node with inline views
- IMPALA-13987: Fix stress_catalog_init_delay_ms check in RELEASE
- IMPALA-13850 (part 1): Wait until CatalogD active before resetting
- CDPD-79495: IMPALA-13759: Fix Hive ACID INSERT OVERWRITE base detection
- CDPD-81690: Ensure that OpenSSL is not using a FIPS profile for Chainguard
- IMPALA-13937: Use simpler chmod syntax to set +t on /var/tmp in Docker build
- DWX-18882: Rebase Impala Docker images on Chainguard for release builds
- IMPALA-13825: Extend Docker container build to custom base images
- IMPALA-13826: Migrate from imp to importlib in the config generator
- IMPALA-13881: Fix Workload Management Statement Expression Limit Exceeded Errors
- IMPALA-13812: Fail query for certain errors related to AI functions
- IMPALA-13724: Add hostnames for Docker host and gateway to Impala containers
- CDPD-79675: Use the toolchain mirror in us-west-2 for stack builds
- CDPD-79500: Change internal PyPI mirror to Nexus for Releng builds
- IMPALA-13671: Add Additional Debug Output
- IMPALA-13817: Impala fails to start if 'ai_endpoint' and 'ai_additional_platforms' are not set in the right order
- IMPALA-13792: Cross compile AI functions
- IMPALA-13798: Cleanup host-level remote scratch dir on shutdown
- IMPALA-13789: Defer creating Path objects in loading file metadata
- IMPALA-13783: Fix huge temp list created in catalog-update thread
- CDPD-79908: IMPALA-13804: Use redacted statement in live table
- IMPALA-13803: Fix hiveserver2_protocol_version Values in Workload Management
- IMPALA-13786: Skip rewriting expr of Hive auto-generated label
- IMPALA-13772: Fix Workload Management DMLs Timeouts
- IMPALA-13771: Fix heap-use-after-free in Cluster Membership Manager
- CDPD-78626: IMPALA-13740: (Addendum) Set velocity-engine-core.version downstream
- CDPD-78626: IMPALA-13740: Update velocity-engine-core to 2.4.1
- IMPALA-12785: Add commands to control event-processor status (#387)
- CDPD-79424: Fix missing execute_query_using_vector
- IMPALA-13201: System Table Queries Execute When Admission Queues are Full
- IMPALA-13736: Fix Use-After-Free in ExecutorGroup.RemoveExecutor
- CDPD-78505: IMPALA-13703: Cancel running queries before shutdown deadline
- CDPD-78024: IMPALA-13677: Support remote scratch directory cleanup at Impala daemon startup
- IMPALA-13594: Read Puffin stats also from older snapshots
- IMPALA-12648: Add KILL QUERY statement
- CDPD-75589: Update the golden files for ExternalFeTest
- CDPD-78207: Disable ESTIMATE_DUPLICATE_IN_PREAGG in downstream
- IMPALA-13678: Validate remote_submit_time against coordinator time
- IMPALA-13500: (Addendum) Do not assert CatalogD's partition id
- IMPALA-13666: Provide a non-null fileMetadataStats for HdfsPartition
- IMPALA-2945: Account for duplicate keys on multiple nodes preAgg
- IMPALA-13644: Generalize and move getPerInstanceNdvForCpuCosting
- IMPALA-13565: Add general AI platform support to ai_generate_text
- IMPALA-13253: Add option to enable keepalive for client connections
- IMPALA-13592: Set IV length before setting IV in OpenSsl
- IMPALA-13656: MERGE redundantly accumulates memory in HDFS WRITER
- IMPALA-13288: OAuth AuthN Support for Impala
- IMPALA-13662: Bump the ARM toolchain to support ARM builds for RHEL 9
- CDPD-77932: Use Ozone from CDP rather than the Ozone parcel
- CDPD-77710 (Part 3): Provide a non-null fileMetadataStats for HdfsPartition
- IMPALA-13154: Update metrics when loading an HDFS table
- IMPALA-13618: (Addendum) Remove IMPALA_COMMONS_IO_VERSION downstream
- CDPD-77821: IMPALA-13664: Lower datanucleus.connectionPool.maxPoolSize to 20
- IMPALA-13655: UPDATE redundantly accumulates memory in HDFS WRITER
- IMPALA-13403: Refactor the checks of skip reloading file metadata for ALTER_TABLE events
- IMPALA-13126: Obtain table read lock in EP to process partitioned event
- IMPALA-13458: Fix installing curl on Red Hat variants for dockerised tests
- CDPD-77756: Fix incorrect conflict resolution of IMPALA-13620 (#320)
- IMPALA-13361: Add INSERT * and UPDATE SET * syntax for MERGE statement
- IMPALA-13620: Refresh compute_table_stats.py script
- IMPALA-13086: Lower AggregationNode estimate using stats predicate
- IMPALA-13518: Show target name of COMMIT_TXN events in logs
- IMPALA-13305: Better thrift compatibility checks based on pyparsing
- IMPALA-13465: Trace TupleId further to reduce Agg cardinality
- IMPALA-13638: Translate apostrophe to underscore in Prometheus metric names.
- IMPALA-13641: Lazily init Parquet column read counters
- CDPD-77637: Add back a public static method removed in IMPALA-13622
- IMPALA-13637: Add ENABLE_TUPLE_ANALYSIS_IN_AGGREGATE option
- IMPALA-12141: EP shouldn't fail while releasing write lock if the lock is not held previously
- IMPALA-13362: Implement WHEN NOT MATCHED BY SOURCE syntax for MERGE statement
- IMPALA-13596: Add warnings and exceptions to reading of fair-scheduler file.
- IMPALA-13622: Fix negative cardinality bug in AggregationNode.java
- IMPALA-13628: Use impala::Thread in PeriodicCounterUpdater
- IMPALA-13619: (Addendum) Set commons-lang3 downstream
- IMPALA-13526: Fix Agg node creation order in DistributedPlanner
- IMPALA-13551: Produce the shell tarball by pip installing impala-shell
- IMPALA-13368: Fixup Redhat detection for Python >= 3.8
- IMPALA-13064: Install service for linux packaging
- IMPALA-13588: Update Puffin reading doc after IMPALA-13370
- IMPALA-13590: Use CacheLineAligned instead of CACHELINE_ALIGNED for PerFilterState
- IMPALA-13598: OPTIMIZE redundantly accumulates memory in HDFS WRITER
- IMPALA-13544: Addendum: fixed assert message.
- IMPALA-13536: Fix Workload Management Init with Catalog HA
- IMPALA-13597: Upgrade critique-gerrit-review.py to Python3
- IMPALA-13535: Add script to restore stats on PlannerTest
- IMPALA-889: Add trim() function matching ANSI SQL definition
- IMPALA-13589: SELECT INPUT__FILE__NAME can crash Impala
- IMPALA-13585: Make pip_download.py interruptible
- IMPALA-13567: Update RowsRead counter more frequently
- IMPALA-13448: Log cause when failing to flush lineage events, audit events or profiles
- IMPALA-13370: Read Puffin stats from metadata.json property if available
- IMPALA-13511: Addendum, fixed wrong case statement
- IMPALA-13556: Log GetRuntimeProfile and GetExecSummary at VLOG_QUERY
- IMPALA-13540: Calcite planner: fix wrong results for set operators
- IMPALA-13558: Workaround Python 2 tarfile issue by patching tarfile.py
- IMPALA-13511: Calcite planner: support sub-second datetime parts
- IMPALA-13513: Support decode function
- IMPALA-13543: single_node_perf_run.py must accept tpcds_partitioned
- IMPALA-13179: Make non-deterministic functions ineligible for tuple caching
- IMPALA-5792: Eliminate duplicate beeswax python code
- IMPALA-13541: Calcite planner, declare explicit_cast in operator table.
- IMPALA-13516: Fix handling of cast functions
- IMPALA-13509: Copy rows directly to OutboundRowBatch during hash partitioning
- IMPALA-12957: Support reading Inf and NaN from JSON
- IMPALA-13482: Bug fixes for lag/coalesce in analytic function.
- IMPALA-13502: Clean up around constructors
- IMPALA-13148: Show the number of in-progress Catalog operations
- IMPALA-13507: Allow disabling glog buffering via with_args fixture
- IMPALA-13327: Fix wrong unpack dir name for Apache HBase
- IMPALA-13441: Support explain statements in Impala planner
- CDPD-75833: Make impala-config-branch.sh work with ZSH
- IMPALA-12390 (part 4): Enable unnecessary-value-param
- IMPALA-12737: Query columns in workload management tables.
- IMPALA-13340: Fix missing partitions in COPY TESTCASE of LocalCatalog mode
- IMPALA-13497: Add TupleCacheBytesWritten/Read to the profile
- IMPALA-12758: Fix catalogd not setting prev_id for reloaded partitions
- IMPALA-13477: Set request_pool in QueryStateRecord for CTAS query
- CDPD-75628: Replace the datasketches.version property in java/pom.xml
- IMPALA-13395: Adds USE_APACHE_COMPONENTS=true in all-build-options job
- IMPALA-13461: Added rules to make tpcds queries work.
- IMPALA-13247: Support Reading Puffin files for the current snapshot
- IMPALA-13445: Ignore num partition for unpartitioned writes
- IMPALA-13462: Added support for functions used in tpcds
- IMPALA-13459: Handle duplicate table in same query
- IMPALA-13405: Do tuple analysis to lower AggregationNode cardinality
- IMPALA-13455: Put expressions in CNF form for performance.
- IMPALA-13412: Use the BYTES type for tuple cache entries-in-use-bytes
- IMPALA-13430: Too many RelNodes created for "IN" literals
- IMPALA-13446: Bump CDP GBN to 58457853 to get Ranger improvements
- IMPALA-11298: Allow proxy users to share hs2 session from different hosts or realms
- IMPALA-13393: Remove old javax.el config
- IMPALA-12908: (Addendum) use RUNTIME_FILTER_WAIT_TIME_MS for tuple cache TPC testing
- IMPALA-13429: Calcite planner crashing with outer join
- IMPALA-11943: Mark utf8 string functions with IR_ALWAYS_INLINE
- IMPALA-12146: Fix incorrect host memory reserved when the executor quits abnormally
- IMPALA-12216: Print timestamp for impala-shell errors
- IMPALA-13426: Log Java debug sleeps at debug
- IMPALA-13197: Implement Analytic Exprs for Calcite
- IMPALA-13181: Disable tuple caching for locations with limits
- IMPALA-13408: use a specific flag for the topic prefix cluster identifier.
- IMPALA-13186: Tag query option scope for tuple cache
- IMPALA-13427: Make connect timeout of statestore HA RPC tunable
- IMPALA-11729: Speed up start-impala-cluster.py
- IMPALA-13407: Codegen fails with struct in TOP-N
- IMPALA-13406: Switch to curl 8.10.1 to resolve CVEs
- IMPALA-13185: Include runtime filter source in key
- IMPALA-13396: Unify tmp dir management in CustomClusterTestSuite
- IMPALA-12908: Add correctness check for tuple cache
- IMPALA-12686: Switch to toolchain with basic debug info
- IMPALA-13312: Use client address from X-Forwarded-For Header in Ranger Audit Logs
- IMPALA-13384: Only install gcovr deps for coverage builds
- IMPALA-12939: Bound IMPALA_BUILD_THREADS for cgroups and memory
- IMPALA-11265: Part1: Clear GroupContentFiles once used
- CDPD-74806: Force updating Impala package version
- IMPALA-13022: Added infrastructure for implicit casting of functions
- IMPALA-13302: Restore registering all conjuncts
- CDPD-74442: Provide default implementation for IMPALA-12876
- IMPALA-13233: Improve display of instance-level skew in query timeline
- IMPALA-12876: Add catalogVersion and loaded timestamp in query profiles
- IMPALA-12594: Add flag to tune KrpcDataStreamSender mem estimate
- IMPALA-13322: Fix alter on SystemTables
- IMPALA-13378: Verify tuple ids in descriptor table received in executor side
- IMPALA-13182: Support uploading additional jars
- IMPALA-13371: Avoid throwing exception in FindFileInPath()
- IMPALA-13156: Investigation: Set explicit credential provider for S3 builds
- IMPALA-7086: Cache timezone in *_utc_timestamp()
- IMPALA-11431: Avoid getting stats for synthetic column row__id from HMS (#207)
- IMPALA-13347: Fixes TSAN Thread Leak of Workload Management Thread
- IMPALA-13344: Analyze new rewrite exprs
- CDPD-74199: Produce Java dependency tree
- IMPALA-12165: Add option for split debug information (-gsplit-dwarf)
- IMPALA-12906: Incorporate scan range information into the tuple cache key
- IMPALA-12363: Upgrade RE2 to 2023-03-01
- IMPALA-13328: Fix missing krb5-config in building impala_quickstart_client docker image
- IMPALA-12737: Refactor the Workload Management Initialization Process.
- IMPALA-12867: Filter files to OPTIMIZE based on file size
- IMPALA-13311: Hive3 INSERT failed by ClassNotFoundException: org.apache.tez.runtime.api.Event
- IMPALA-13310: Add the value of the HTTP 'X-Forwarded-For' header to the runtime profile
- IMPALA-13082: Fix downstream build scripts to use IMPALA_JACKSON_VERSION
- IMPALA-13082: Use separate versions for jackson vs jackson-databind
- IMPALA-13317: Enhance tpc_sort_key for wider name support
- IMPALA-13303: FileSystemUtil.listFiles() should handle non-recursive case
- IMPALA-13291: Filter dmesg messages by date
- IMPALA-13274: Filter out illegal output for certain join nodes
- IMPALA-12856: Event processor should ignore processing partition with empty partition values
- IMPALA-13262: Do not always migrate inferred predicates into inline view
- IMPALA-13313: Fix ExpireQueries deadlock
- IMPALA-12954: Implement Sorting capability for Calcite planner
- IMPALA-13196 (part2): fix some urls in template files
- IMPALA-10408: Support build using Apache components
- IMPALA-13296: Check column compatibility earlier for table migration
- IMPALA-13301: Replace the aircompressor.version Property in java/pom.xml
- IMPALA-13301: Upgrade aircompressor to 0.27
- IMPALA-13240: Add gerrit comments for Thrift/FlatBuffers changes
- IMPALA-12947: Implement Calcite Planner Union and Value RelNodes
- IMPALA-13246: Smallify strings during broadcast exchange
- IMPALA-13293: Fix too long wait for initial catalog update
- IMPALA-13207: Update error message for operations on blacklist dbs
- CDPD-73367: Apply hadoop-source.patch
- IMPALA-13294: Add support for long polling to avoid client side wait
- IMPALA-13276: Revise the documentation of 'RUNTIME_FILTER_WAIT_TIME_MS'
- IMPALA-12737: Add HS2 support to the InternalServer class.
- IMPALA-13115: Add query id to error messages
- IMPALA-13071: Update the doc of Impala components
- IMPALA-13252: (Addendum) PrintId cancel query
- IMPALA-13279: Upgrade gcovr to 7.2
- IMPALA-13272: Analytic function of collections can lead to crash
- IMPALA-13252: Always use PrintId for TUniqueId
- IMPALA-13271: Correct the documentation with respect to granting privileges on URI
- IMPALA-13270: Fix IllegalStateException on runtime filter
- IMPALA-13230: Dump stacktrace for impala-shell when it receives SIGUSR1
- IMPALA-13264: Use the toolchain's gcov for bin/coverage_helper.sh
- CDPD-72923: Build and publish debug docker images
- IMPALA-13243: Patch pom.xml with Dropwizard Metrics version for CDP install.
- IMPALA-13189: Reset the database on clearing imported query profiles
- IMPALA-13256: Support more than 2G rows for COUNT(*) on jdbc table
- IMPALA-12771: Impala catalogd events-skipped may mark the wrong number
- IMPALA-12277: Fix NullPointerException for partitioned inserts when partition list is stale
- IMPALA-13077: Fix selectivity estimation for SEMI JOIN
- IMPALA-13252: Consistently use PrintId to print TUniqueId
- IMPALA-13214: Skip wait_until_connected when shell exits
- IMPALA-13243: Update Dropwizard Metrics to 4.2.x
- IMPALA-12857: Add flag to enable merge-on-read even if tables are configured with copy-on-write
- IMPALA-13226: Rename TupleCacheInfo.finalize() to finalizeHash()
- IMPALA-13208: Add cluster id to the membership and request-queue topic names
- IMPALA-13231: Gitignore auto-generated files for ranger
- IMPALA-13194: Fast-serialize position delete records
- IMPALA-13161: Fix column index overflow in DelimitedTextParser
- IMPALA-13209: Optimize ConvertRowBatchTime in ExchangeNode
- IMPALA-13088, IMPALA-13109: Use RoaringBitmap instead of sorted vector of int64s
- CDPD-72675: Impala - Upgrade Apache Derby to internal version
- IMPALA-13001: Support graceful and force shutdown for impala.sh
- IMPALA-13014: Upgrade Maven to 3.9.6
- IMPALA-13193: RuntimeFilter on parquet dictionary should evaluate NULL values
- IMPALA-12786: Optimize count(*) for JSON scans
- IMPALA-13203: Rewrite 'id = 0 OR false' as expected
- IMPALA-13196: Fully qualify urls in www/query_timeline
- IMPALA-9441,IMPALA-13170: Ops listing dbs/tables should handle db not exists
- IMPALA-12093: impala-shell to preserve all cookies
- IMPALA-13028: Strip dynamic link libraries in Linux DEB/RPM packages
- IMPALA-6311: Lower max_filter_error_rate to 10%
- IMPALA-13180: Upgrade postgresql to 42.5.6
- IMPALA-12800: Implement hashCode everywhere
- IMPALA-13106: Support larger imported query profile sizes through compression
- IMPALA-13175: Upgrade Spring Framework to 5.3.37
- IMPALA-13168: Add README file for setting up Trino
- IMPALA-13076: Add pstack and jstack to Impala Redhat docker images
- IMPALA-13136: Refactor AnalyzedFunctionCallExpr (for Calcite)
- IMPALA-13169: Specify cluster id before starting HiveServer2
- IMPALA-12940: Added filtering capability for Calcite planner
- IMPALA-12370: Allow converting timestamps to UTC when writing to Kudu
- IMPALA-13137: Add additional client fetch metrics columns to the queries page
- IMPALA-13159: Fix query cancellation caused by statestore failover
- IMPALA-13150: Possible buffer overflow in StringVal::CopyFrom()
- IMPALA-13152: Avoid NaN, infinite, and negative ProcessingCost
- IMPALA-12921, IMPALA-12985: Support running Impala with locally built Ranger (#76)
- IMPALA-13075: Cap memory usage for ExprValuesCache at 256KB
- IMPALA-13138: Never smallify existing StringValue objects, only new ones during DeepCopy
- IMPALA-12712: Invalidate metadata on table should set better createEventId
- IMPALA-13131: Azure OpenAI API expects 'api-key' instead of 'Authorization' in the request header
- IMPALA-12562: Cast double and float to string with exact precision
- IMPALA-12800: Add cache for isTrueWithNullSlots() evaluation
- IMPALA-11871: Skip permissions loading and check on HDFS if Ranger is enabled
- IMPALA-12800: Use HashMap for ExprSubstitutionMap lookups
- IMPALA-13146: Download NodeJS from native toolchain
- IMPALA-12935: First pass on Calcite planner functions
- IMPALA-13130: Prioritize EndDataStream messages
- IMPALA-13119: Fix cost_ initialization at CostingSegment.java
- IMPALA-13134: DDL hang with SYNC_DDL enabled when Catalogd is changed to standby status
- IMPALA-12705: Add /catalog_ha_info page on Statestore to show catalog HA information
- CDPD-71872: IMPALA-13120: Load failed table without need for manual invalidate (#99)
- CDPD-59574: Update exception path for Avro 1.11 (#52)
- IMPALA-12680: Fix NullPointerException during AlterTableAddPartitions
- IMPALA-13129: Move runtime filter skipping at registerRuntimeFilter
- IMPALA-13111: Fix the calculation of fragment ids for impala-gdb.py
- IMPALA-13057: Incorporate tuple/slot information into tuple cache key
- IMPALA-13108: Update version to 4.5.0-SNAPSHOT
- IMPALA-13107: Don't start query on executor if instance number equals 0
- IMPALA-8042: Assign BETWEEN selectivity for discrete-unique column
- IMPALA-13105: Fix multiple imported query profiles fail to import/clear at once
- IMPALA-13034: Add logs and counters for HTTP profile requests blocking client fetches
- IMPALA-13102: Normalize invalid column stats from HMS
- IMPALA-13083: Clarify REASON_MEM_LIMIT_TOO_LOW_FOR_RESERVATION
- IMPALA-11735: Handle CREATE_TABLE event when the db is invisible to the impala server user
- IMPALA-12800: Skip O(n^2) ExprSubstitutionMap::verify() for release builds
- IMPALA-13020 (part 2): Split out external vs internal Thrift max message size
- IMPALA-13020 (part 1): Change thrift_rpc_max_message_size to int64_t
- IMPALA-12559 (part 2): Fix build issue for different versions of openssl
- IMPALA-12559: Support x5c Parameter for RSA JSON Web Keys
- IMPALA-13038: Support profile tab for imported query profiles
- IMPALA-12607: Bump the GBN and fetch events specific to the db/table from the metastore
- IMPALA-13040: Add waiting mechanism in UpdateFilterFromRemote
- IMPALA-10451: Fix avro table loading failures caused by HIVE-24157
- IMPALA-11499: Refactor UrlEncode function to handle special characters
- IMPALA-12934: Added Calcite parsing files to Impala
- IMPALA-13018: Block push down of conjuncts with implicit casting on base columns for jdbc tables
- IMPALA-13058: Init first_arrival_time_ and completion_time_ with -1
- IMPALA-13061: Create query live as external table
- IMPALA-13054: Avoid revisiting children in QueryStateExpanded
- IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions
- IMPALA-12977: add search and pagination to /hadoop-varz
- IMPALA-13044: Upgrade bouncycastle to 1.78
- IMPALA-13031: Enhancing logging for spilling configuration with local buffer directory details
- IMPALA-13049: Add dependency management for log4j2 to use 2.18.0
- IMPALA-12910: Support running TPCH/TPCDS queries for JDBC tables
- IMPALA-13045: Wait for impala_query_live to exist
- IMPALA-12684: Enable IMPALA_COMPRESSED_DEBUG_INFO by default
- IMPALA-13012: Lower default query_log_max_queued
- IMPALA-13005: Create Query Live table in HMS
- IMPALA-13024: Ignore slots if using default pool and empty group
- IMPALA-12872: Use Calcite for optimization - part 1: simple queries
- IMPALA-12950: Improve error message in case of out-of-range numeric conversions
- IMPALA-12543: Detect self-events before finishing DDL
- IMPALA-12933: Avoid fetching unnecessary events of unwanted types
- IMPALA-12657: Improve ProcessingCost of ScanNode and NonGroupingAggregator
- IMPALA-12988: Calculate an unbounded version of CpuAsk
- IMPALA-12938: add-opens for platform.cgroupv1
- IMPALA-13016: Fix ambiguous row_regex that checks for non-existence
- IMPALA-12980: Translate CpuAsk into admission control slots
- IMPALA-12874: Identify active and standby catalog and statestore in the web debug endpoint
- IMPALA-12998: Add SHOW_METADATA_TABLES to ignored DDL
- IMPALA-12990: Fix impala-shell handling of unset rows_deleted
- IMPALA-11495: Add glibc version and effective locale to the Web UI
- IMPALA-12963: Return parent PID when children spawned
- IMPALA-12999: Add log4j.properties to the DEB/RPM packages
- IMPALA-12986: Base64Encode fails if the 'out_len' output parameter is passed with certain values
- IMPALA-12564: Prevent Hive loading libfesupport.so in the minicluster during TSAN runs
- IMPALA-5323: Support BINARY columns in Kudu tables
- IMPALA-12965: Add debug query option RUNTIME_FILTER_IDS_TO_SKIP
- IMPALA-12978: Fix impala-shell's live progress with older Impalas
- IMPALA-12920: Support ai_generate_text built-in function for OpenAI's chat completion API