Behavior changes
This release of the Cloudera Data Warehouse service on Cloudera on cloud has the following behavior changes:
Summary: Changes to Unified Analytics availability in Cloudera Data Warehouse
Before this release: When creating a new Impala virtual warehouse, you could select the Enable Unified Analytics option.
After this release: The Enable Unified Analytics option is no longer available when creating a new Impala virtual warehouse. This option is disabled in the user interface. However, you can continue to create and manage Impala Unified Analytics Virtual Warehouses using the CDP CLI.
Summary: Increased batch sizes for COMPUTE STATS
Before this release: COMPUTE STATS queries failed on tables containing more than 5000 columns. The failure was specific to wide tables and could not be resolved by dropping and rerunning the query.
After this release: Batch retrieval or insertion of object metadata is now enabled by default. The default value of the hive.metastore.direct.sql.batch.size property is changed from 0 to 1000, and the default value of the metastore.rawstore.batch.size property is changed from -1 to 500. With these defaults, COMPUTE STATS queries run successfully on tables with more than 5000 columns.
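As a rough illustration of what a batch size means for a wide table, the following Python sketch splits a list of column names into fixed-size batches. It is not the metastore's implementation. The batched helper, the column names, and the 5200-column table are hypothetical, and only the batch sizes 1000 and 500 come from the new defaults described above.

```python
from typing import Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items (hypothetical helper)."""
    if batch_size <= 0:
        raise ValueError("this sketch only models positive batch sizes")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Hypothetical wide table with more than 5000 columns.
columns = [f"col_{i}" for i in range(5200)]

# Batch sizes taken from the new defaults:
# hive.metastore.direct.sql.batch.size = 1000, metastore.rawstore.batch.size = 500.
direct_sql_batches = [len(b) for b in batched(columns, 1000)]
rawstore_batches = [len(b) for b in batched(columns, 500)]

print(len(direct_sql_batches), "batches of up to 1000 columns")
print(len(rawstore_batches), "batches of up to 500 columns")
```

In this sketch, no single operation handles more than 1000 (or 500) columns at once, which is the general idea behind batching metadata operations for wide tables.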
Summary: Parquet late materialization threshold behavior has changed
The way the parquet_late_materialization_threshold query option works has been updated to better handle queries with collections.
Before this release: You would use the parquet_late_materialization_threshold query option to set the minimum number of consecutive filtered rows required to trigger late materialization. The default value was 20, and setting it to a value less than 0 would disable the feature.
After this release: The effective threshold is set to 1 if the parquet_late_materialization_threshold query option is greater than or equal to 0 and the query has a collection value that can be skipped. Otherwise, the effective threshold is the value of the query option, which defaults to 20.
Apache Jira: IMPALA-3841
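The new threshold rule for parquet_late_materialization_threshold can be restated as a small Python function. This is only a sketch of the rule as described in this entry, not Impala source code. The function name effective_threshold and the has_skippable_collection parameter are hypothetical.

```python
def effective_threshold(query_option: int = 20,
                        has_skippable_collection: bool = False) -> int:
    """Return the threshold that applies under the rule described above.

    query_option: value of parquet_late_materialization_threshold (default 20).
    has_skippable_collection: hypothetical flag for "there is a collection
    value that can be skipped".
    """
    if query_option >= 0 and has_skippable_collection:
        return 1
    # Otherwise the query option is used unchanged, including values
    # below 0, which disable the feature.
    return query_option


for option, has_collection in [(20, False), (20, True), (0, True), (-1, True)]:
    print(option, has_collection, effective_threshold(option, has_collection))
```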
Summary: Support for load-based routing in impala-proxy
Before this release: The impala-proxy used a random selection policy to choose a coordinator. This approach did not consider the current load on each coordinator, which led to an uneven distribution of connections and potential performance bottlenecks.
After this release: The impala-proxy now uses load-based routing to decide which coordinator should handle a new session request.
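The difference between the two policies can be sketched in Python. These notes do not state which load metric impala-proxy uses, so this example assumes the number of active sessions per coordinator. The coordinator names, the helper functions, and the tie-breaking rule are likewise hypothetical.

```python
import random
from typing import Dict


def pick_random(loads: Dict[str, int], rng: random.Random) -> str:
    """Old policy as described: choose a coordinator without looking at load."""
    return rng.choice(sorted(loads))


def pick_least_loaded(loads: Dict[str, int]) -> str:
    """Load-based sketch: choose the coordinator with the fewest active sessions.

    Ties are broken by name order; this is an assumption of the sketch.
    """
    return min(sorted(loads), key=lambda name: loads[name])


def route_sessions(loads: Dict[str, int], new_sessions: int) -> Dict[str, int]:
    """Assign new sessions one at a time with the load-based policy."""
    result = dict(loads)
    for _ in range(new_sessions):
        result[pick_least_loaded(result)] += 1
    return result


# Hypothetical starting point: one busy coordinator and two lightly loaded ones.
initial = {"coordinator-a": 5, "coordinator-b": 0, "coordinator-c": 2}
final = route_sessions(initial, 9)
print(final)
```

With this assumed metric, new sessions go to the lightly loaded coordinators first until the counts even out, whereas pick_random can keep sending sessions to a coordinator that is already busy.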
