Behavioral changes

Summary: Changes to Unified Analytics availability in Cloudera Data Warehouse

Before this release: When creating a new Impala virtual warehouse, you could select the Enable Unified Analytics option.

After this release: The Enable Unified Analytics option is no longer available when creating a new Impala virtual warehouse. This option is disabled in the user interface. However, you can continue to create and manage Impala Unified Analytics Virtual Warehouses using the CDP CLI.

Summary: Increased Batch Sizes for COMPUTE STATS

Before this release: COMPUTE STATS queries failed on tables containing more than 5000 columns. This issue was specific to wide tables and could not be resolved by dropping and rerunning the query.

After this release: Batch retrieval or insertion of object metadata is now enabled by default. The default value of the hive.metastore.direct.sql.batch.size property is changed from 0 to 1000, and the default value of the metastore.rawstore.batch.size property is changed from -1 to 500. With these defaults, COMPUTE STATS queries run successfully on tables with more than 5000 columns.
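The effect of the new batch sizes can be illustrated with a minimal sketch. This is not the metastore implementation; it only shows how a batch size splits a long list of per-column metadata entries into smaller groups. The `batches` helper and the 5200-column table are hypothetical, and treating a non-positive batch size as "no batching" is an assumption made for the illustration.

```python
from typing import Iterator, List

# Default batch sizes named in this release note.
DIRECT_SQL_BATCH_SIZE = 1000  # hive.metastore.direct.sql.batch.size
RAWSTORE_BATCH_SIZE = 500     # metastore.rawstore.batch.size


def batches(items: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items.

    Assumption for this sketch: a non-positive batch_size means
    "no batching", so all items are returned in a single slice.
    """
    if batch_size <= 0:
        yield list(items)
        return
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Hypothetical wide table with 5200 columns.
columns = [f"col_{i}" for i in range(5200)]

unbatched = list(batches(columns, 0))
direct_sql = list(batches(columns, DIRECT_SQL_BATCH_SIZE))
rawstore = list(batches(columns, RAWSTORE_BATCH_SIZE))

# Number of groups produced under each setting.
print(len(unbatched), len(direct_sql), len(rawstore))  # prints: 1 6 11
```

With batching enabled, no single group in this sketch holds more than 1000 entries, whereas the unbatched case handles all 5200 entries at once.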

Summary: Parquet late materialization behavior has changed

The Parquet late materialization feature is enabled by default for all types, including collections.

Before this release: The Parquet late materialization feature was disabled by default. You used the parquet_late_materialization_threshold query option to set the minimum number of consecutive filtered rows required to trigger late materialization. The default value was -1. The feature was not supported for collection columns.

After this release: The Parquet late materialization feature is enabled by default. The effective threshold is set to 1 if the parquet_late_materialization_threshold query option is greater than or equal to 0 and there is a collection value that can be skipped. Otherwise, the effective threshold is the same as the query option, which defaults to 20.

Apache Jira: IMPALA-3841
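The threshold rule described for this release can be restated as a small function. This is a sketch of the rule as worded in this note, not Impala source code; the function name `effective_threshold` and the parameter `has_skippable_collection` are hypothetical.

```python
def effective_threshold(query_option: int, has_skippable_collection: bool) -> int:
    """Return the late materialization threshold that is applied.

    query_option is the value of parquet_late_materialization_threshold.
    has_skippable_collection indicates whether there is a collection
    value that can be skipped.
    """
    if query_option >= 0 and has_skippable_collection:
        return 1
    return query_option


print(effective_threshold(20, True))   # prints: 1
print(effective_threshold(20, False))  # prints: 20
print(effective_threshold(-1, True))   # prints: -1
```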

Summary: TCP Keepalive is now enabled by default for client connections

Before this release: TCP keepalive was disabled by default for client connections. Idle connections dropped by load balancers remained active in Impala, consuming service threads (fe_service_threads).

After this release: TCP keepalive is now enabled by default for all client connections, enhancing stability and availability. Impala is configured to check idle connections aggressively, every 10 minutes.

Apache Jira: IMPALA-14031
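For readers unfamiliar with TCP keepalive, the following sketch shows the general socket-level mechanism using the Python standard library. It is not Impala's implementation. Mapping the 10-minute interval to the keepalive idle time (`TCP_KEEPIDLE`) is an assumption made for illustration, and that option is applied only on platforms where Python exposes it. The `enable_keepalive` helper is hypothetical.

```python
import socket

# Assumption: the 10-minute check maps to the keepalive idle time.
IDLE_SECONDS = 600


def enable_keepalive(sock: socket.socket, idle_seconds: int = IDLE_SECONDS) -> None:
    """Turn on TCP keepalive probes for an existing socket."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # TCP_KEEPIDLE is platform-specific, so set it only when available.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_seconds)


with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    enable_keepalive(s)
    # A non-zero value means keepalive is enabled on this socket.
    keepalive_on = s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0

print(keepalive_on)
```

With keepalive enabled, the operating system probes an idle peer and closes the connection when the peer no longer responds, which is how a server can reclaim resources held by connections that a load balancer has silently dropped.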

Summary: Support for load-based routing in impala-proxy

Before this release: The impala-proxy used a random selection policy to choose a coordinator. This approach did not consider the current load on each coordinator, which led to an uneven distribution of connections and potential performance bottlenecks.

After this release: The impala-proxy now uses load-based routing to decide which coordinator should handle a new session request. The Impala proxy directs the new session to the coordinator with the minimum calculated load. You can customize how this load is calculated using the following parameters:

- IMPALA_PROXY_COORDINATOR_LOAD_CPU_WEIGHT: Determines the weight applied to the current percentage of CPU utilization when calculating the coordinator's load.
- IMPALA_PROXY_COORDINATOR_LOAD_MEMORY_WEIGHT: Determines the weight applied to the current percentage of memory utilization when calculating the coordinator's load.

By adjusting these weights, you can tune the Impala proxy to prioritize CPU or memory headroom when routing new sessions.
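The routing decision can be sketched as follows. The exact formula used by the Impala proxy is not stated in this note, so the weighted sum below is an assumption, and the `Coordinator` type, the function names, and the sample utilization figures are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Coordinator:
    name: str
    cpu_percent: float     # current CPU utilization, 0 to 100
    memory_percent: float  # current memory utilization, 0 to 100


def coordinator_load(c: Coordinator, cpu_weight: float, memory_weight: float) -> float:
    """Assumed load formula: weighted sum of CPU and memory utilization."""
    return cpu_weight * c.cpu_percent + memory_weight * c.memory_percent


def pick_coordinator(coordinators: Iterable[Coordinator],
                     cpu_weight: float = 1.0,
                     memory_weight: float = 1.0) -> Coordinator:
    """Return the coordinator with the minimum calculated load."""
    return min(coordinators,
               key=lambda c: coordinator_load(c, cpu_weight, memory_weight))


coordinators = [
    Coordinator("coord-0", cpu_percent=80.0, memory_percent=30.0),
    Coordinator("coord-1", cpu_percent=40.0, memory_percent=75.0),
]

# Equal weights: loads are 110 and 115, so coord-0 is selected.
print(pick_coordinator(coordinators).name)
# Doubling the CPU weight: loads are 190 and 155, so coord-1 is selected.
print(pick_coordinator(coordinators, cpu_weight=2.0).name)
```

In this sketch, raising the CPU weight steers new sessions away from the coordinator with higher CPU utilization, while raising the memory weight does the same for memory.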

To modify these parameters, perform the following steps:

1. Log in to the Cloudera web interface and navigate to the Cloudera Data Warehouse service.
2. From the Overview page, click the Virtual Warehouses tab.
3. Identify the Impala Virtual Warehouse you want to configure, and then click the Edit icon.
4. In the Virtual Warehouse details page, click Configurations > Impala Proxy.
5. Select env from the Configuration files drop-down.
6. Modify the values as required for the following parameters:
   - IMPALA_PROXY_COORDINATOR_LOAD_CPU_WEIGHT
   - IMPALA_PROXY_COORDINATOR_LOAD_MEMORY_WEIGHT