Known issues in Cloudera Data Warehouse on cloud

Review the known issues in this release of the Cloudera Data Warehouse service on Cloudera on cloud.

Known issues identified in the July 26, 2024 release

DWX-19451: Cloudera Data Visualization restore job can fail with ignorable errors

A Cloudera Data Visualization restore job can end in a failed state even though the restoration succeeded, with the log displaying ignorable errors such as the following:

pg_restore: error: could not execute query: ERROR:  sequence "jobs_joblog_id_seq" does not exist
Command was: DROP SEQUENCE public.jobs_joblog_id_seq;
pg_restore: error: could not execute query: ERROR:  table "jobs_joblog" does not exist
Command was: DROP TABLE public.jobs_joblog;
pg_restore: error: could not execute query: ERROR:  sequence "jobs_jobcontent_id_seq" does not exist
Command was: DROP SEQUENCE public.jobs_jobcontent_id_seq;
.......
.......

This issue occurs because the restore job issues commands to DROP all the objects that will be restored, and if any of these objects do not exist in the destination database, such ignorable errors are reported.

This has no functional impact on the restored Cloudera Data Visualization application. All backed-up queries, datasets, connections, and dashboards are restored successfully, and Cloudera Data Visualization is available for new queries.

None.
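As an illustration of how such a restore log can be triaged, the following Python sketch separates the ignorable "does not exist" errors raised by DROP commands from any other pg_restore errors. The function name and the matching rule are assumptions made for this example and are not part of the product; the log format follows the excerpt shown for this issue.

```python
import re

# Matches the error line of a pg_restore failure, for example:
# pg_restore: error: could not execute query: ERROR:  table "x" does not exist
ERROR_LINE = re.compile(
    r"^pg_restore: error: could not execute query: ERROR:\s+(?P<msg>.*)$"
)


def split_restore_errors(log_text):
    """Return (ignorable, other) lists of (error message, command) pairs.

    An error is treated as ignorable only when its message ends with
    "does not exist" and the command that triggered it is a DROP statement.
    This rule is an assumption based on the log excerpt for this issue.
    """
    ignorable, other = [], []
    lines = log_text.splitlines()
    for i, line in enumerate(lines):
        match = ERROR_LINE.match(line.strip())
        if not match:
            continue
        # The failed command is reported on the line after the error.
        command = ""
        if i + 1 < len(lines) and lines[i + 1].strip().startswith("Command was:"):
            command = lines[i + 1].strip()[len("Command was:"):].strip()
        pair = (match.group("msg"), command)
        if pair[0].endswith("does not exist") and command.upper().startswith("DROP "):
            ignorable.append(pair)
        else:
            other.append(pair)
    return ignorable, other
```

If the second list is empty, every reported error matches the ignorable pattern described for this issue; anything in the second list deserves a closer look.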

DWX-19003: Unable to set t-shirt size for a Cloudera Data Visualization instance

You cannot set or configure a t-shirt size for a Cloudera Data Visualization instance while creating or editing it from Cloudera Data Warehouse. By default, any Cloudera Data Visualization instance created in Cloudera Data Warehouse release version 1.9.1 uses the small compute instance size.

None.

DWX-18950: Hue data restoration fails on Azure Cloudera Data Warehouse clusters having Cloudera Data Warehouse version 1.8.7 or earlier

If you back up your Cloudera Data Warehouse cluster running Cloudera Data Warehouse version 1.8.7 or earlier and then perform the restore operation without reactivating the latest Cloudera Data Warehouse environment, the Hue restoration job might fail.

None.

DWX-18843: Unable to read Iceberg table from Hive Virtual Warehouse

If you have used Apache Flink to insert data into an Iceberg table that is created from Hive, you cannot read the Iceberg table from the Hive Virtual Warehouse.

Using Beeline, add the engine.hive.enabled table property and set its value to "true". You can add this table property either when you create the Iceberg table or afterward by using the ALTER TABLE statement.

DWX-18489: Hive compaction of Iceberg tables results in a failure

When Cloudera Data Warehouse and Cloudera Data Hub are deployed in the same environment and use the same Hive Metastore (HMS) instance, the Cloudera Data Hub compaction workers can inadvertently pick up Iceberg compaction tasks. Since Iceberg compaction is not yet supported in the latest Cloudera Data Hub version, the compaction tasks will fail when they are processed by the Cloudera Data Hub compaction workers.

In such a scenario where both Cloudera Data Warehouse and Cloudera Data Hub share the same HMS instance and there is a requirement to run both Hive ACID and Iceberg compaction jobs, it is recommended that you use the Cloudera Data Warehouse environment for these jobs. If you want to run only Hive ACID compaction tasks, you can choose to use either the Cloudera Data Warehouse or Cloudera Data Hub environments.

If you want to run the compaction jobs without changing the environment, it is recommended that you use Cloudera Data Warehouse. To avoid interference from Cloudera Data Hub, change the value of the hive.compactor.worker.threads Hive Server (HS2) property to '0'. This ensures that the compaction jobs are not processed by Cloudera Data Hub.

In Cloudera Manager, click Clusters > Hive > Configuration to navigate to the configuration page for HMS.
Search for hive.compactor.worker.threads and modify the value to '0'.
Save the changes and restart the Hive service.

DWX-18854: Compaction cleaner configuration

The compaction cleaner is turned off by default in the Cloudera Data Warehouse database catalog, potentially causing compaction job failures.

To enable the compaction cleaner on the Hive Metastore (HMS) instance of the Cloudera Data Warehouse database catalog, set the hive.compactor.cleaner.on property to "true".

DWX-12703: Hue connects to only one Impala coordinator in Active-Active mode

You may not see all Impala queries that have run on the Virtual Warehouse from the Impala tab on the Hue Job Browser. You encounter this on an Impala Virtual Warehouse that has Impala coordinator configured in an active-active mode. This happens because Hue fetches this information from only one Impala coordinator that is active.

None. You can view the query history from the Impala Queries tab on the Job Browser page, because the information is fetched from the Hue Query Processor.

Known limitation: Cloudera Data Warehouse does not support S3 Express One Zone buckets

Cloudera does not recommend deploying the Data Lake on S3 Express One Zone buckets. Cloudera Data Warehouse cannot read content present in the S3 Express One Zone buckets. The following limitations apply when using S3 Express buckets:

You can use S3 Express buckets only with Cloudera Data Hub clusters running Runtime 7.2.18 or newer. Data services do not currently support them.
S3 Express buckets may not be used for logs and backups.

Known issues identified before the July 26, 2024 release

DWX-15112: Enterprise Data Warehouse database configuration problems after a Helm-related rollback

If a Helm rollback fails due to an incorrect Enterprise Data Warehouse database configuration, the Virtual Warehouse and Database Catalog roll back to a previous configuration. The incorrect Enterprise Data Warehouse configuration persists, and can affect subsequent edit, upgrade, and rebuild operations on the rolled-back Virtual Warehouse or Database Catalog.

None.

DWX-17455: ODBC client using JWT authentication cannot connect to Impala Virtual Warehouse

If you are using the Cloudera ODBC Connector and JWT authentication to connect to a Cloudera Data Warehouse Impala Virtual Warehouse where the Impala coordinators are configured for high availability in an active-active mode, the connection results in a "401 Unauthorized" error. The error is not seen on Impala Virtual Warehouses that have an active-passive high availability or where high availability is disabled.

In the ODBC Data Source Name (DSN), make the following changes:

Ensure that SSL is enabled by setting SSL=1.
Remove authentication by setting AuthMech=0.
Remove the JWTString parameter.
Add the parameter http.header.Authorization and set it to the word "Bearer" followed by a space and the JWT string.

DSN configuration before the workaround

SSL=1
AuthMech=8
JWTString=full_jwt_text

DSN configuration after the workaround

SSL=1
AuthMech=0
http.header.Authorization=Bearer full_jwt_text
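The change between the two DSN configurations above can be sketched in Python as follows. This is only an illustration that treats the DSN settings as a plain key-value mapping; the function name is hypothetical, and the sketch does not read or write a real ODBC configuration file.

```python
def apply_jwt_workaround(dsn):
    """Return a copy of a DSN settings mapping with the workaround applied.

    The input mapping is not modified. The keys used here (SSL, AuthMech,
    JWTString, http.header.Authorization) are the ones named in the steps above.
    """
    updated = dict(dsn)
    jwt = updated.pop("JWTString", None)  # remove the JWTString parameter
    if jwt is None:
        raise ValueError("DSN has no JWTString parameter to convert")
    updated["SSL"] = "1"       # ensure SSL is enabled
    updated["AuthMech"] = "0"  # remove authentication
    # Pass the token as an HTTP Authorization header instead.
    updated["http.header.Authorization"] = "Bearer " + jwt
    return updated
```

Calling the function with the "before" settings shown above yields the "after" settings, with the JWT text carried over unchanged.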

HIVE-28055: Merging Iceberg branches requires a target table alias

Hive supports only one level of qualifier when referencing columns. In other words, only one dot is accepted. For example, the following query is allowed:

select table.col from ...;

However, select db.table.col is not allowed. Using the merge statement to merge Iceberg branches without a target or source table alias causes an exception:

org.apache.hadoop.hive.ql.parse.SemanticException: ... Invalid table alias or column reference ...

Use an alias, for example t, for the target table.

merge into mydb.target.branch_branch1 t using mydb.source.branch_branch1 s on t.id = s.id when matched then update set value = 'matched';

DWX-17620: Folder having special characters in its name is not accessible in ABFS

In Cloudera Data Warehouse on cloud, if you go to the Azure Blob Filesystem (ABFS) from a Virtual Warehouse, create a folder that has special characters in its name, and then perform an action such as Move, an error occurs, as shown in the following example:

Cannot access: abfs://data-files/user/hrt_qa/~@$&()*!+'=;. 404 Client Error: The specified path does not exist.

None.

Branch FAST FORWARD does not work as expected

The Apache Iceberg spec indicates you can use either one or two arguments to fast forward a branch. The following example shows using two arguments:

ALTER TABLE <name> EXECUTE FAST-FORWARD 'x' 'y'

However, omitting the second branch name does not work as documented by Apache Iceberg. The named branch is not fast-forwarded to the current branch. An exception occurs at the Iceberg level.

You must pass two arguments to the EXECUTE FAST-FORWARD feature to forward a branch.

DWX-17613: Generic error message is displayed when you click on the directory you don't have access to on a RAZ cluster

You see the following error message when you click on an ABFS directory to which you do not have read/write permission on the ABFS File Browser in Hue: There was a problem with your request. This message is generic and does not provide insight into the actual issue.

None.

DWX-17109: ABFS File Browser operations failing intermittently

You may encounter intermittent issues while performing typical operations on files and directories in the ABFS File Browser, such as moving or renaming files.

None.

CDPD-27918: Hue does not automatically pick up RAZ HA configurations

On a Cloudera on cloud environment in which you have configured RAZ in High Availability mode, Hue in Cloudera Data Warehouse does not pick up all the RAZ host URLs automatically. Therefore, if a RAZ instance to which Hue is connected goes down, Hue becomes unavailable.

You must manually add comma-separated RAZ instances in the Hue Advanced Configuration Snippet.

Log in to the Cloudera Management Console as an Administrator.
Go to Environment > Data Lake and open Cloudera Manager for your environment.
Go to Clusters > Ranger RAZ service > Instances > RAZ server > Processes and note the value of the fs.s3a.ext.raz.rest.host.url property from the core-site.xml file. You need this to specify the value of the api_url property in the Hue configuration.
For Azure environments, note the value of the fs.azure.ext.raz.rest.host.url property.
For AWS and GCS environments, note the value of the fs.s3a.ext.raz.rest.host.url property.
Go to Cloudera Data Warehouse > Virtual Warehouse > Edit > CONFIGURATIONS and select the hue-safety-valve from the Configuration files dropdown menu.

Add the following lines in the hue-safety-valve field:

[desktop]
[[raz]]
is_enabled=true
api_url=https://[***INSTANCE-1***]:6082/,https://[***INSTANCE-2***]:6082/

Click Apply Changes.
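The hue-safety-valve lines from the steps above can be generated from a list of RAZ hosts, as in the following Python sketch. The function name and the example host names are hypothetical; the port 6082 and the URL layout are taken from the template above and may differ in your environment.

```python
def raz_safety_valve(raz_hosts, port=6082):
    """Build the [desktop] / [[raz]] snippet for the hue-safety-valve field.

    raz_hosts is a list of RAZ instance host names. The default port is the
    one used in the template above and is an assumption for other setups.
    """
    if not raz_hosts:
        raise ValueError("at least one RAZ host is required")
    # api_url takes a comma-separated list with no spaces between entries.
    urls = ",".join("https://{}:{}/".format(host.strip(), port) for host in raz_hosts)
    return "\n".join(["[desktop]", "[[raz]]", "is_enabled=true", "api_url=" + urls])


if __name__ == "__main__":
    # Placeholder host names, for illustration only.
    print(raz_safety_valve(["raz-1.example.com", "raz-2.example.com"]))
```

Listing every RAZ instance in api_url is what allows Hue to stay available when one instance goes down.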

CDPD-66779: Partitioned Iceberg table not getting loaded with insert select query from Hive

If you create a partitioned table in Iceberg and then try to insert data from another table as shown below, an error occurs.

insert into table partition_transform_4 select t, ts from vectortab10k;

Use the CLUSTER BY clause on the partitioned column to insert data. For example:

insert into table partition_transform_4 select t, ts from t1 cluster by ts;

DWX-17703: Non-HA Impala Virtual Warehouse on a private Azure Kubernetes Service (AKS) setup fails

When 'Refresh' and 'Stop' operations run in parallel, Impala might move into an error state. The Refresh operation might think that Impala is in an error state as the coordinator pod is missing.

Rebuild the Impala Virtual Warehouse or restart it using the CLI.

DWX-14923: After JWT authentication, attempting to connect the Impyla client using a user name or password should cause an error

Using a JWT token, you can connect to a Virtual Warehouse as the user who generated the token.

If you connect with the JWT token and then pass a user name or password from Impyla to the Virtual Warehouse, the connection arguments are silently ignored. Instead, the client should report an error indicating that it is illegal to specify a user name or password when using JWT authentication.

None.
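Until the client reports this condition itself, a caller can guard against it before opening a connection. The following Python sketch shows one way to do that. The helper is hypothetical and is not part of Impyla; its argument names are assumptions chosen for this example.

```python
def validate_auth_args(auth_mechanism, user=None, password=None, jwt=None):
    """Reject a user name or password when JWT authentication is requested.

    Hypothetical pre-connection check; it only inspects its arguments and
    never contacts a Virtual Warehouse.
    """
    if auth_mechanism != "JWT":
        return
    if not jwt:
        raise ValueError("JWT authentication requires a token")
    if user is not None or password is not None:
        raise ValueError(
            "user and password must not be specified with JWT authentication"
        )
```

Calling this check before connecting turns the silently ignored arguments into an explicit error on the client side.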

DWX-15145: Environment validation popup error after activating an environment

Activating an environment having a public load balancer can cause an environment validation popup error.

To reproduce this problem: 1) Create a data lake. 2) Activate an environment having a public load balancer deployment type and subnets in three different availability zones. An environment validation popup can occur even though the subnets are in different availability zones. Several different popups can occur.

None.

DWX-15144: Virtual Warehouse naming restrictions

You cannot create a Virtual Warehouse having the same name as another Virtual Warehouse even if the like-named Virtual Warehouses are in different environments. You can create a Database Catalog having the same name as another Database Catalog if the Database Catalogs are in different environments.

None.

DWX-13103: Cloudera Data Warehouse environment activation problem

When Cloudera Data Warehouse environments are activated, a race condition can occur between the prometheus pod and the istiod pod. The prometheus pod can be set up without an istio-proxy container, causing communication failures between prometheus and the other pods in the Kubernetes cluster. Cloudera Data Warehouse prometheus-related functionalities, such as autoscaling, stop working. Grafana dashboards, which get metrics from prometheus, are not populated.

Restart the prometheus pod so that it gets the istio-proxy container.

DWX-5742: Upgrading multiple Hive and Impala Virtual Warehouses or Database Catalogs at the same time fails

Upgrading multiple Hive and Impala Virtual Warehouses or Database Catalogs at the same time fails.

If you need to upgrade or create multiple Hive and Impala Virtual Warehouses or Database Catalogs, perform the upgrades or creations one at a time.

AWS availability zone inventory issue

In this release, you can select a preferred availability zone when you create a Virtual Warehouse; however, AWS might not be able to provide enough compute instances of the type that Cloudera Data Warehouse needs.

If you experience this AWS issue, try recreating the Virtual Warehouse and choosing a different availability zone.

DWX-7613: CloudFormation stack creation using AWS CLI broken for Cloudera Data Warehouse Reduced Permissions Mode

If you use the AWS CLI to create a CloudFormation stack to activate an AWS environment for use in Reduced Permissions Mode, it fails and returns the following error:

An error occurred (ValidationError) when calling the CreateStack operation: Parameters: [SdxDDBTableName] must have values

The default value of SdxDDBTableName is not being set. If you create the CloudFormation stack using the AWS Console, there is no problem.

If you must use the AWS CLI, edit the CloudFormation stack template file as follows:

SdxDDBTableName:
       Description: DynamoDB table name for the SDX S3 file listings, created through S3Guard
       Type: String
       Default: " "

Then rerun the CloudFormation stack creation command using the AWS CLI.

ENGESC-8271: Helm 2 to Helm 3 migration fails on AWS environments where the overlay network feature is in use and namespaces are stuck in a terminating state

If you use the overlay network feature for AWS environments and attempt to migrate an AWS environment from Helm 2 to Helm 3, the migration process fails.

Run the following kubectl command to determine whether you have any namespaces stuck in a terminating state:

kubectl get ns

Then contact Cloudera Technical Support to report and get help on this issue.
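To pick out the stuck namespaces from the command output, a small Python sketch such as the following can help. It assumes the default `kubectl get ns` table layout, in which the second column is the namespace status; the function name and the sample namespace names are made up for this example.

```python
def terminating_namespaces(kubectl_output):
    """Return the names of namespaces whose STATUS column is Terminating.

    kubectl_output is the text printed by `kubectl get ns` in its default
    table format (columns NAME, STATUS, AGE).
    """
    stuck = []
    for line in kubectl_output.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[0] == "NAME":
            continue  # skip blank lines and the header row
        if fields[1] == "Terminating":
            stuck.append(fields[0])
    return stuck


if __name__ == "__main__":
    sample = (
        "NAME          STATUS        AGE\n"
        "default       Active        120d\n"
        "compute-abc   Terminating   3d\n"
    )
    print(terminating_namespaces(sample))
```

The resulting list is useful information to include when you report the issue.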

DWX-6970: Tags do not get applied in existing Cloudera Data Warehouse environments

You may see the following error while trying to apply tags to Virtual Warehouses in an existing Cloudera Data Warehouse environment:

An error occurred (UnauthorizedOperation) when calling the CreateTags operation: You are not authorized to perform this operation

You may also see the message: Compute node tagging was unsuccessful. This happens because the ec2:CreateTags privilege is missing from your AWS cluster-autoscaler inline policy for the NodeInstanceRole role.

Add the ec2:CreateTags privilege to the cluster-autoscaler inline policy as follows:

Log in to the AWS IAM console at https://console.aws.amazon.com/iam/.
In the navigation panel, choose Roles.
Search the list of roles for NodeInstanceRole.
Click Permissions.
Select cluster-autoscaler and click Edit policy.

Add the ec2:CreateTags line in the Actions section after the ec2:DescribeLaunchTemplateVersions line as shown:

"Version": "2012-10-17",
"Statement": [
  {
    "Action": [
      "autoscaling:DescribeAutoScalingGroups",
      "autoscaling:DescribeAutoScalingInstances",
      "autoscaling:DescribeTags",
      "autoscaling:DescribeLaunchConfigurations",
      "autoscaling:SetDesiredCapacity",
      "autoscaling:TerminateInstanceInAutoScalingGroup",
      "ec2:DescribeLaunchTemplateVersions",
      "ec2:CreateTags"
    ],

Save changes.
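If you keep a local copy of the policy document, the same edit can be expressed programmatically. The following Python sketch works on a policy that has already been loaded into a dictionary; the function name is hypothetical, and the sketch does not call any AWS API.

```python
def add_create_tags(policy):
    """Add ec2:CreateTags next to ec2:DescribeLaunchTemplateVersions.

    policy is a parsed IAM policy document (a dict) and is modified in place.
    Returns True if any statement was changed, False otherwise.
    """
    changed = False
    for statement in policy.get("Statement", []):
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]  # IAM allows a single action as a plain string
        if "ec2:DescribeLaunchTemplateVersions" not in actions:
            continue
        if "ec2:CreateTags" in actions:
            continue  # already present, nothing to do
        position = actions.index("ec2:DescribeLaunchTemplateVersions") + 1
        statement["Action"] = actions[:position] + ["ec2:CreateTags"] + actions[position:]
        changed = True
    return changed
```

Running the function a second time on the same policy returns False, so it is safe to apply repeatedly.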

Incorrect diagnostic bundle location

The path you see to the diagnostic bundle is wrong when you create a Virtual Warehouse, collect a diagnostic bundle of log files for troubleshooting, and click Edit > Diagnostic Bundle. Your storage account name is missing from the beginning of the path.

To find your diagnostic bundle, add your storage account name to the beginning of the path, for example:

my-storage-account-name/log-files/clusters/my-env/my-warehouse-xx-yy/warehouse/debug-artifacts/hive/compute-zz-mvvf/compute-zz-mvvf-0926210111-0927090111-01234.zip

To get your storage account name, in Cloudera Management Console, click Environments, select your environment, and scroll down to Logs Storage and Audits. The first part of the path is your storage account name.

DWX-7349: In reduced permissions mode, default Database Catalog name does not include the environment name

When you activate an AWS environment in reduced permissions mode, the default Database Catalog name does not include the environment name.

This does not cause collisions because each Database Catalog named "default" is associated with a different environment. For more information about reduced permissions mode, see Reduced permissions mode for AWS environments.

None.

DWX-15064: Hive Virtual Warehouse stops but appears healthy

Due to an istio-proxy problem, the query coordinator can unexpectedly enter a not ready state instead of the expected error-state. Subsequently, the Hive Virtual Warehouse stops when reaching the autosuspend timeout without indicating a problem.

None.

DWX-14452: Parquet table query might fail

Querying a table stored in Parquet from Hive might fail with the following exception message: java.lang.RuntimeException: java.lang.StackOverflowError. This problem can occur when the IN predicate has 243 values or more, and a small stack (-Xss = 256k) is configured for Hive in CDW.

Change the -Xss value in the JVM arguments to the default value (1 MB) as follows: Select Edit from the Options menu of your Virtual Warehouse. Click CONFIGURATIONS > Query executor and select the env configuration file.

In the third line shown below, change the value of LLAP_DAEMON_OPTS from -Xss256k to -Xss1M, and then click Apply Changes:

FROM:

-Dorg.wildfly.openssl.path=/usr/lib64 -Dzookeeper.sasl.client=false -Djdk.tls.disabledAlgorithms=SSLv3,GCM -Dhttp.maxConnections=12 -Xms24G -Xmx48G -Xss256k ...

TO:

... -Xss1M ...
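The substitution in the FROM and TO lines above can be sketched in Python as follows. This is only an illustration of the string change; the function name is hypothetical, and the value still has to be applied through the Virtual Warehouse configuration page as described.

```python
import re


def raise_stack_size(llap_daemon_opts, new_size="1M"):
    """Replace the -Xss option in a JVM options string with -Xss<new_size>.

    Raises ValueError if the string contains no -Xss option, so that a
    missing setting is noticed instead of being silently skipped.
    """
    # (?<!\S) ensures -Xss starts a new option and is not part of another token.
    updated, count = re.subn(r"(?<!\S)-Xss\S+", "-Xss" + new_size, llap_daemon_opts)
    if count == 0:
        raise ValueError("no -Xss option found in the JVM options")
    return updated
```

All other options in the string, such as -Xms and -Xmx, are left untouched.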

CDPD-40730: Parquet change can cause incompatibility

Parquet files written by the parquet-mr library in this version of Cloudera Data Warehouse, where the schema contains a timestamp with no UTC conversion, are not compatible with older versions of Parquet readers. Older readers still treat these timestamps as requiring UTC conversion and therefore return incorrect results. You can encounter this problem only when you write Parquet-based tables using Hive and the tables have the non-default configuration hive.parquet.write.int64.timestamp=true.

None.

DWX-5926: Cloning an existing Hive Virtual Warehouse fails

If you have an existing Hive Virtual Warehouse that you clone by selecting Clone from the drop-down menu, the cloning process fails. This does not apply to creating a new Hive Virtual Warehouse.

Make the following configuration change to resolve this issue:

In the Hive Virtual Warehouse tile, click the edit icon. This launches the Virtual Warehouse details page.
In the details page for the Virtual Warehouse, click the Configurations tab.
Click the Hiveserver2 sub-tab.
Select hive-site from the configuration file drop-down list menu.
Search for the configuration property hive.metastore.sasl.enabled.
Set the hive.metastore.sasl.enabled configuration property to true.
Note: If the hive.metastore.sasl.enabled configuration property is already set to true, delete the setting and re-enter it.
Click Apply in the upper right corner of the page to save the configuration.
Click the Actions menu and select Clone to clone the Hive Virtual Warehouse.

DWX-2690: Older versions of Beeline return SSLPeerUnverifiedException when submitting a query

When submitting queries to Virtual Warehouses that use Hive, older Beeline clients return an SSLPeerUnverifiedException error:

javax.net.ssl.SSLPeerUnverifiedException: Host name ‘ec2-18-219-32-183.us-east-2.compute.amazonaws.com’ does not match the certificate subject provided by the peer (CN=*.env-c25dsw.dwx.cloudera.site) (state=08S01,code=0)

Only use Beeline clients from Cloudera Runtime version 7.0.1.0 or later.

DWX-16899: Error while viewing Impala job status on the mini Job Browser

When you click on the application ID after submitting an Impala query on Hue that is running on the environment level, you may notice the following error: 401 Client Error: Unauthorized for url: http://coordinator.impala-xyz.svc.cluster.local:25000/queries?json=true Must authenticate with Basic authentication. This does not happen when you do the same from Hue deployed at a Virtual Warehouse level.

None.

DWX-16895: Incorrect status of Hue pods when you edit the Hue instance properties

When you update a configuration of a Hue instance that is deployed at the environment level, such as increasing or decreasing the size of the Hue instance, you see a success message on the Cloudera Data Warehouse UI. After some time, the status of the Hue instance also changes from “Updating” to “Running”. However, when you list the Hue pods using kubectl, you see that not all backend pods are in the running state; a few of them are still in the init state.

None. The pods eventually come up successfully.

DWX-16863: The upgrade button is present on the Cloudera Data Warehouse UI, but Hue upgrades are not supported

You see the Upgrade button on the Query Editor page in the Cloudera Data Warehouse UI when Hue is deployed at the environment level. However, on Cloudera Data Warehouse version 1.8.1, upgrading the Hue instance that is deployed at the environment level is not supported.

None.

DWX-16893: A user can see all the queries in Job browser

In a Hue instance deployed at the environment level, by design, the Hue instances must not share the saved queries and query history with other Hue instances even for the same user. However, a logged in user is able to view all the queries executed by that user on all the Virtual Warehouses on a particular Database Catalog.

None.

Delay in listing queries in Impala Queries in the Job browser

Listing an Impala query in the Job browser can take an inordinate amount of time.

None.

DWX-14927: Hue fails to list Iceberg snapshots

Hue does not recognize the Iceberg history queries from Hive to list table snapshots. For example, Hue indicates an error at the . before history when you run the following query.

select * from <db_name>.<table_name>.history

None.

DWX-14968: Connection termination error in Impala queries tab after Hue inactivity

To reproduce the problem: 1) In the Impala job browser, navigate to Impala queries. 2) Wait for a few minutes.

After a few minutes of inactivity, a connection termination error is displayed.

Refresh the page, or alternatively start a new session.

DWX-15115: Error displayed after clicking on hyperlink below Hue table browser

In Hue, below the table browser, clicking the hyperlink to a location causes an HTTP 500 error because the file browser is not enabled for environments that are not Ranger authorized (RAZ).

None.

DWX-15090: CSRF error intermittently seen in the Hue Job Browser

You may intermittently see the “403 - CSRF” error on the Hue web interface as well as in the Hue logs after running Hive queries from Hue.

Reload the browser or start a new Hue session.

IMPALA-11447: Selecting certain complex types in Hue crashes Impala

Queries that have structs/arrays in the select list crash Impala if initiated by Hue.

Do not select structs/arrays in Hue.

DWX-8460: Unable to delete, move, or rename directories within the S3 bucket from Hue

You may not be able to rename, move, or delete directories within your S3 bucket from the Hue web interface. This is because of an underlying issue, which will be fixed in a future release.

You can move, rename, or delete a directory using the HDFS commands as follows:

SSH into your Cloudera environment host.
To delete a directory within your S3 bucket, run the following command:
```
hdfs dfs -rm -r [***COMPLETE-PATH-TO-S3-BUCKET***]/[***DIRECTORY-NAME***]
```

To rename a folder, create a new directory and run the following command to move files from the source directory to the target directory:

hdfs dfs -mkdir [***DIRECTORY-NAME***]

hdfs dfs -mv [***COMPLETE-PATH-TO-S3-BUCKET***]/[***SOURCE-DIRECTORY***] [***COMPLETE-PATH-TO-S3-BUCKET***]/[***TARGET-DIRECTORY***]

DWX-6674: Hue connection fails on cloned Impala Virtual Warehouses after upgrading

If you clone an Impala Virtual Warehouse from a recently upgraded Impala Virtual Warehouse, and then try to connect to Hue, the connection fails.

Create a new Impala Virtual Warehouse and do not clone from a recently upgraded warehouse. Then the connection to Hue from the new Impala Virtual Warehouse succeeds.

DWX-5650: Hue only makes the first user a superuser for all Virtual Warehouses within a Data Catalog

Hue marks the user that logs in to Hue from a Virtual Warehouse for the first time as the Hue superuser. But if multiple Virtual Warehouses are connected to a single Data Catalog, then the first user that logs in to any one of the Virtual Warehouses within that Data Catalog is the Hue superuser.

For example, consider that a Data Catalog DC-1 has two Virtual Warehouses VW-1 and VW-2. If a user named John logs in to Hue from VW-1 first, then he becomes the Hue superuser for all the Virtual Warehouses within DC-1. At this time, if Amy logs in to Hue from VW-2, Hue does not make her a superuser within VW-2.

None.

DWX-17210, DWX-13733: Timeout issue querying Iceberg tables from Hive

When querying Iceberg tables from Hive, the queries can fail due to a timeout issue.

Add the following configurations to hadoop-core-site for the Database Catalog and the Virtual Warehouse.
- fs.s3.maxConnections=1000
- fs.s3a.connection.maximum=1000
Restart the Database Catalog and Virtual Warehouse.

DWX-15014: Loading airports table in demo data fails

This issue occurs only when using a non-default Database Catalog. An invalid path error message occurs when using the load command to load the demo data airports table in Iceberg, ORC, or Parquet format.

None.

DWX-14163: Limitations reading Iceberg tables in Avro file format from Impala

The Avro, Impala, and Iceberg specifications describe some limitations related to Avro, and those limitations exist in Cloudera. In addition to these, the DECIMAL type is not supported in this release.

None.

DEX-7946: Data loss during migration of a Hive table to Iceberg

In this release, by default the table property 'external.table.purge' is set to true, which deletes the table data and metadata if you drop the table during migration from Hive to Iceberg.

Either one of the following workarounds prevents data loss during table migration:

Set the table property 'external.table.purge'='FALSE'.
Do not drop a table during migration from Hive to Iceberg.

DWX-13062: HIVE-26507 Converting a Hive table having CHAR or VARCHAR columns to Iceberg causes an exception

CHAR and VARCHAR data can be shorter than the length specified by the data type. Remaining characters are padded with spaces. Data is converted to a string in Iceberg. This process can yield incorrect results when you query the converted Iceberg table.

Change columns from CHAR or VARCHAR to string types before converting the Hive table to Iceberg.

DWX-13276: Multiple inserts into tables having different formats can cause a deadlock

Under the following conditions, a deadlock can occur:

You run a query to insert data into multiple tables that include at least one Iceberg table and at least one non-Iceberg table.
The STAT task locking feature is turned on (default = on).

Perform either one of the following workarounds:

Run separate queries to insert data into only one table at a time.

Turn off STAT task locking as follows:

set iceberg.hive.request-lock-on-stats-task=false;

IMPALA-11045: Impala Virtual Warehouses might produce an error when querying transactional (ACID) table even after you enabled the automatic metadata refresh (version DWX 1.1.2-b2008)

Impala doesn't open a transaction for select queries, so you might get a FileNotFound error after compaction even though you refreshed the metadata automatically.

Run the INVALIDATE METADATA statement on the transactional (ACID) table to refresh the metadata. This fixes the problem until the next compaction occurs. For information about running this statement, see INVALIDATE METADATA statement.

Impala Virtual Warehouses might produce an error when querying transactional (ACID) tables (DWX 1.1.2-b1949 or earlier)

If you are querying transactional (ACID) tables with an Impala Virtual Warehouse and compaction is run on the compacting Hive Virtual Warehouse, the query might fail. The compacting process deletes files and the Impala Virtual Warehouse might not be aware of the deletion. Then when the Impala Virtual Warehouse attempts to read the deleted file, an error can occur. This situation occurs randomly.

Do not use the start/stop icons in Impala Virtual Warehouses version 7.2.2.0-106 or earlier

If you use the stop/start icons in Impala Virtual Warehouses version 7.2.2.0-106 or earlier, it might render the Virtual Warehouse unusable and make it necessary for you to re-create it.

Do not use the stop/start icons in these older Virtual Warehouses. Instead, these older versions automatically suspend and resume the Impala executors depending on the absence or presence of queries, making manual start or stop unnecessary.

DWX-3914: Collect Diagnostic Bundle option does not work on older environments

The Collect Diagnostic Bundle menu option in Impala Virtual Warehouses does not work for older environments.

None.

Data caching:

This feature is limited to 200 GB per executor, multiplied by the total number of executors.

None.
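As a quick worked example of this limit, the following Python sketch computes the total data cache ceiling for a given number of executors. The function name is hypothetical; the 200 GB figure is the per-executor limit stated above.

```python
CACHE_LIMIT_PER_EXECUTOR_GB = 200  # per-executor limit stated above


def total_cache_limit_gb(executor_count):
    """Return the total data cache limit, in GB, for the given executor count."""
    if executor_count < 0:
        raise ValueError("executor_count must not be negative")
    return CACHE_LIMIT_PER_EXECUTOR_GB * executor_count
```

For instance, a Virtual Warehouse with 10 executors has a total limit of 10 x 200 GB, or 2000 GB.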

Sessions with Impala continue to run for 15 minutes after the connection is disconnected.

When a connection to Impala is disconnected, the session continues to run for 15 minutes so that the user or client can reconnect to the same session by presenting the session_token. After 15 minutes, the client must re-authenticate to Impala to establish a new connection.

None.

Technical Service Bulletins

TSB 2023-719: Cloudera Data Warehouse Backup/Restore of Cloudera Data Visualization incomplete

Cloudera Data Warehouse customers using the Cloudera Data Warehouse Automated Backup/Restore feature will encounter an issue when restoring versions of Cloudera Data Visualization older than 7.1.6.2-3, due to a schema change in this release. If the backup was taken from an older Cloudera Data Warehouse environment that contained a version of Cloudera Data Visualization older than 7.1.6.2-3, the restore procedure succeeds. However, when the user opens the Cloudera Data Visualization Queries tab, the user could encounter the error message: “column jobs_jobschedule.owner_id does not exist…”

None.

For the latest update on this issue, see the corresponding Knowledge article: TSB 2023-719: Cloudera Data Warehouse Backup/Restore of Cloudera Data Visualization incomplete.

TSB 2023-684: Automatic metadata synchronization across multiple Impala Virtual Warehouses (in Cloudera Data Warehouse on cloud) may encounter an exception

The Cloudera Data Warehouse on cloud 2023.0.14.0 (DWX-1.6.3) version incorporated a feature for performing automatic metadata synchronization across multiple Apache Impala (Impala) Virtual Warehouses. The feature is enabled by default and relies on Hive Metastore events. When a certain sequence of Data Definition Language (DDL) SQL commands is executed, as described below, users may encounter a java.lang.NullPointerException (NPE). The exception causes the event processor to stop processing other metadata operations.

If a CREATE TABLE command (not CREATE TABLE AS SELECT) is followed immediately (within approximately 1 second) by an INVALIDATE METADATA or REFRESH TABLE command on the same table (either on the same Virtual Warehouse or on a different one), the second command might not find the table in the catalog cache of a peer Virtual Warehouse and might generate an NPE.

For the latest update on this issue, see the corresponding Knowledge article: TSB 2023-684: Automatic metadata synchronization across multiple Impala Virtual Warehouses (in Cloudera Data Warehouse on cloud) may encounter an exception