Known Issues in Ozone

This section lists the known issues and technical limitations for Ozone in Cloudera Runtime 7.3.2, its service packs, and cumulative hotfixes.

Known issues identified in Cloudera Runtime 7.3.2

CDPD-73792: The new snapshot could not be found after renaming the old snapshot
7.3.2
The ozone sh snapshot rename command renames snapshots. It is a feature developed by the Apache Ozone community and is included in Cloudera Base on premises 7.3.1. However, it does not work properly, and Cloudera Base on premises does not support it.
None
Apache JIRA: HDDS-11384
CDPD-63350: Force deleting an FSO bucket and its contents while running rb --force from AWS S3 API fails
7.3.2
Force deleting a File System Optimized (FSO) bucket and its contents while running the rb --force command from the AWS S3 API might fail with an error, because the S3 client sends individual delete requests for each key in the bucket. It might delete or fail to delete individual keys or directories, depending on the availability of leaf elements. It can completely delete the bucket only when all the keys or directories are cleaned up. If some keys are not deleted, the bucket will not be deleted.
Sample error message
# aws s3 rm s3://buck-fso --recursive
delete: s3://buck-fso/dir1/
delete: s3://buck-fso/dir1/dir2/
delete: s3://buck-fso/dir3/dir4/dir5/
delete: s3://buck-fso/dir3/dir4/
delete failed: s3://buck-fso/dir3/
# Rerun the same command again
# aws s3 rm s3://buck-fso --recursive
delete: s3://buck-fso/dir1/
delete: s3://buck-fso/dir3/
# To Confirm
# ozone sh key list s3v/buck-fso
[ ]
Run the rb --force command multiple times to completely clean up the keys and directories.
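The repeated cleanup can be scripted. The following is a minimal Python sketch, assuming the aws and ozone CLIs are on the PATH and configured for the cluster exactly as in the sample above. The names run_cli and delete_until_empty and the max_attempts limit are illustrative and not part of Ozone.

```python
import subprocess


def run_cli(args):
    """Run a CLI command and return its stdout. A non-zero exit code is
    tolerated because partial delete failures are expected on FSO buckets."""
    result = subprocess.run(args, capture_output=True, text=True, check=False)
    return result.stdout


def delete_until_empty(bucket, volume="s3v", max_attempts=5, run=run_cli):
    """Repeat the recursive delete until the key listing is empty.

    Returns the number of delete passes used. Raises RuntimeError if keys
    remain after max_attempts (a hypothetical limit chosen for illustration).
    """
    for attempt in range(1, max_attempts + 1):
        # Same two commands as in the sample error message above.
        run(["aws", "s3", "rm", f"s3://{bucket}", "--recursive"])
        listing = run(["ozone", "sh", "key", "list", f"{volume}/{bucket}"])
        # The sample shows "[ ]" for an empty bucket; drop whitespace so
        # "[ ]" and "[]" compare equal. Empty output (for example, a failed
        # listing) is deliberately not treated as an empty bucket.
        if "".join(listing.split()) == "[]":
            return attempt
    raise RuntimeError(
        f"keys remain in {volume}/{bucket} after {max_attempts} passes"
    )
```

Once the function returns, the bucket no longer contains keys or directories, and the bucket removal itself can be retried.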
Apache JIRA: HDDS-9637
CDPD-98751: Mismatched Replicas tab in the Recon UI fails to display containers with inconsistent replica checksums
7.3.2
In the Ozone Recon UI, the Mismatched Replicas tab does not update or display containers when one or more replicas have differing checksums. Instead, they are displayed in the Under-Replicated tab.
Check the affected replicas in the Under-Replicated tab, or cross-check them against the API response.
CDPD-97512: Mismatch in Open Key count between Overview page and OM DB Insight in the Recon UI
7.3.2
In the Ozone Recon UI, the Open Key count on the Summary section of the Overview page might not match the count on the OM DB Insight > Open Key tab. Users might see different Open Key values for the same cluster across these two views.
Use the OM DB Insight > Open Key tab for the most accurate Open Key count.
CDPD-97376: Container replication counts mismatch in Recon UI
7.3.2
In the Ozone Recon UI, container replication counts, including Under-Replicated, Over-Replicated, and Mis-Replicated counts, differ between the updated and the legacy Container page for the same cluster.
Refer to the legacy Container page to view accurate replication counts.
CDPD-97311: Incorrect Creation Time and Modification Time displayed on the Namespace Usage page in the Recon UI
7.3.2
In the Ozone Recon UI, the Namespace Usage page displays incorrect Creation Time and Modification Time timestamps. These values do not accurately reflect the actual creation or last modification times or dates of the namespace, resulting in inaccurate metadata information.
Retrieve the correct timestamp values directly from the API response.
CDPD-97312: Mismatch between cluster State Container count and Container Summary totals
7.3.2
The Ozone Recon UI can display discrepancies between the Storage Container Manager (SCM) container count and the Recon container summary. This occurs because Recon does not synchronize all container states, such as QUASI_CLOSED, which leads to inconsistent totals.
No workaround is available within the Ozone Recon UI. Use the ozone admin container report CLI command to obtain the correct container counts for all states in the Ozone cluster.
CDPD-99248: After a CDH upgrade, Ozone encounters a failure while running the Finalize Upgrade for SCM on role Storage Container command
7.3.2
When finalizing an Ozone upgrade for the first time from Cloudera Manager, the Finalize Upgrade for SCM on role Storage Container command might fail with the following stderr message:
"Invalid response from Storage Container Manager.
Current finalization status is: FINALIZATION_IN_PROGRESS"

This error occurs even though finalization continues to run on the Storage Container Manager (SCM).

Ignore the failure in Cloudera Manager. Use the ozone admin scm finalizationstatus command to monitor progress and wait for the process to complete on SCM.
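Monitoring can be automated with a small polling loop. The following Python sketch assumes the ozone CLI is on the PATH and that the command output contains the FINALIZATION_IN_PROGRESS status shown in the error message above while finalization is still running. The function names, the polling interval, and the poll limit are illustrative choices.

```python
import subprocess
import time


def finalization_status():
    """Return the raw output of the status command named in the workaround."""
    result = subprocess.run(
        ["ozone", "admin", "scm", "finalizationstatus"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout + result.stderr


def wait_for_finalization(get_status=finalization_status, interval_s=30,
                          max_polls=120, sleep=time.sleep):
    """Poll until the output no longer reports FINALIZATION_IN_PROGRESS.

    Returns the last output so that an operator can read the final state.
    Raises TimeoutError if the in-progress state persists for max_polls polls.
    """
    for _ in range(max_polls):
        output = get_status()
        if "FINALIZATION_IN_PROGRESS" not in output:
            return output
        sleep(interval_s)
    raise TimeoutError("SCM finalization is still in progress")
```

The sketch only detects that the in-progress state has ended. Read the returned output to confirm that finalization actually completed, because an empty or failed command output also ends the loop.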
CDPD-98892: File Size Distribution bucket size range calculation is not correct
7.3.2
If a large number of buckets exist within a volume, the Ozone Recon UI might calculate the bucket size ranges incorrectly in the File Size Distribution chart on the Insights page.
None
Apache JIRA: HDDS-14827
CDPD-93116: Ozone client hangs intermittently when disks are full
7.3.2

The Ozone client hangs for approximately five minutes when writing data to a Datanode if a disk full exception or other Datanode error occurs. This hang is caused by continuous write retries that persist until the pipeline on the Datanode closes.
Control the request retry behavior by setting the following configurations on the client side:
Table 1. Client-side retry request configurations
Configuration                                       Recommended value
hdds.ratis.raft.client.rpc.request.timeout          30s
hdds.ratis.client.multilinear.random.retry.policy   1s, 1
hdds.ratis.client.exponential.backoff.max.sleep     5s
hdds.ratis.client.exponential.backoff.base.sleep    1s
hdds.ratis.client.exponential.backoff.max.retries   2
Apache JIRA: HDDS-14040
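As an illustration, the recommended values from Table 1 can be rendered as Hadoop-style property entries. This Python sketch assumes that the client reads these settings from its ozone-site.xml file. How the entries reach that file, for example through a client configuration snippet in Cloudera Manager, depends on the deployment and is not shown. The names RETRY_SETTINGS and to_property_xml are illustrative.

```python
from xml.sax.saxutils import escape

# Recommended values copied from Table 1.
RETRY_SETTINGS = {
    "hdds.ratis.raft.client.rpc.request.timeout": "30s",
    "hdds.ratis.client.multilinear.random.retry.policy": "1s, 1",
    "hdds.ratis.client.exponential.backoff.max.sleep": "5s",
    "hdds.ratis.client.exponential.backoff.base.sleep": "1s",
    "hdds.ratis.client.exponential.backoff.max.retries": "2",
}


def to_property_xml(settings):
    """Render name/value pairs as Hadoop-style <property> elements."""
    blocks = []
    for name, value in settings.items():
        blocks.append(
            "<property>\n"
            f"  <name>{escape(name)}</name>\n"
            f"  <value>{escape(value)}</value>\n"
            "</property>"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    # Print the entries so they can be reviewed before being applied.
    print(to_property_xml(RETRY_SETTINGS))
```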

Known issues identified before Cloudera Runtime 7.3.2

Known issues identified before Cloudera Runtime 7.3.2 include only unresolved issues from previous releases that continue to affect the Cloudera Runtime 7.3.2 base release.

CDPD-91562: test_validate_certs_configs configuration is failing with the maximum lifetime validation
7.3.2, 7.3.1.600
In daylight saving time zones, the autogenerated Ozone certificate duration might differ from the expected duration. This discrepancy is minor, because the default certificate duration is 365 days or five years, depending on the Ozone component.
CDPD-75954: The ozone debug ldb and ozone auditparser commands fail with java.lang.UnsatisfiedLinkError
7.3.2, 7.3.1.400
For information on the workaround, see Changing temporary path for Ozone services and CLI tools.
CDPD-54885: Ozone Prometheus does not work with TLS
7.3.2, 7.3.1 and its SPs and CHFs
The Prometheus service shipped by Ozone does not support TLS mode. As a result, Prometheus cannot gather metrics from Ozone endpoints when TLS is enabled.
Go to Ozone > Configuration > Ozone Prometheus Endpoint Token and, in the Ozone Prometheus Endpoint Token property, enter any random string. This configuration generates a plaintext token in the Ozone endpoint process directory, which allows Prometheus to authenticate and collect metrics despite the TLS limitation.
CDPD-56684: Keys and buckets get deleted without volume permission
7.3.2, 7.3.1 and its SPs and CHFs
When a volume deletion is initiated, the system recursively deletes all buckets and keys within the volume before attempting to delete the volume itself. Because the ACL check for volume deletion permissions occurs only at the end, all the data within the volume is deleted even if the user does not have delete permission on the volume.
CDPD-50610: Large file uploads are slow with OPEN and stream data approach
7.3.2, 7.3.1 and its SPs and CHFs
The Hue file browser uses the append operation for large files. Ozone in 7.1.9 does not support this API, so large file uploads can be slow or can time out in the browser.
Use the native Ozone client to upload large files instead of the Hue file browser.
OPSAPS-66469: Ozone-site.xml is missing if the host does not contain HDFS roles
7.3.2, 7.3.1 and its SPs and CHFs
Cloudera Manager does not generate the client-side /etc/hadoop/conf/ozone-site.xml file if the host does not have any HDFS role. As a result, Ozone commands issued from that host fail because they cannot find the service name to hostname mapping. When this issue occurs, an error message is displayed:
# ozone sh volume list o3://ozoneabc
23/03/06 18:46:15 WARN ha.OMProxyInfo: OzoneManager address ozoneabc:9862 for serviceID null remains unresolved for node ID null Check your ozone-site.xml file to ensure ozone manager addresses are configured properly.
Add the HDFS gateway role on that host.
CDPD-74016: Running Ozone sh token print on token generated using ozone dtutil command results in Null Pointer Exception.
Print the token generated by dtutil using dtutil.
CDPD-74013: Ozone dtutil get token fails when using o3 or ofs schemas.
Use ozone sh token get to get a token for ozone file system.
CDPD-63144: Key rename inside the FSO bucket fails and displays the Failed to get parent dir error. This happens when running Impala workloads with Ozone.
None.
CDPD-74331: Key put fails and displays the Failed to write chunk error when a volume failure occurs during configuration
None.
CDPD-74475: YCSB test with HBase on Ozone degrades performance.
7.3.2, 7.3.1 and its SPs and CHFs
None.
CDPD-74884: Exclusive size of snapshot is always 0 when you run the info command on the Ozone snapshots. It is a statistics issue and does not impact the functionality of Ozone snapshot.
7.3.2, 7.3.1 and its SPs and CHFs
The exclusiveSize and exclusiveReplicationSize statistics provided by the snapshot info command return a value of 0 even when the snapshot contains exclusive keys or files that do not exist in other snapshots.
None.
Apache JIRA: HDDS-11528
CDPD-75042: AWS CLI rm or delete command fails to delete all files and directories on the Ozone FSO bucket. It only deletes the leaf node.
7.3.2, 7.3.1 and its SPs and CHFs
None.
CDPD-75204: NameNode restart fails after dfs.namenode.fs-limits.max-component-length is set to a lower value while existing data exceeds the length limit.
7.3.2, 7.3.1 and its SPs and CHFs
Increase the value of the dfs.namenode.fs-limits.max-component-length parameter and restart the NameNode.
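To choose a sufficiently large value, it helps to know the longest existing path component. The limit is measured in UTF-8 bytes per path component. The following Python sketch assumes that a list of existing paths is already available from some earlier listing or export, which is not shown here. The function name is illustrative.

```python
def longest_component_bytes(paths):
    """Return the largest UTF-8 byte length of any single path component."""
    longest = 0
    for path in paths:
        for component in path.split("/"):
            # The limit applies to the encoded byte length, not the number
            # of characters, so non-ASCII names count for more than one.
            longest = max(longest, len(component.encode("utf-8")))
    return longest
```

Any value at or above the returned number accommodates the paths that were checked.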
CDPD-75635: Ozone write fails intermittently because SCM remains in safe mode.
7.3.2, 7.3.1 and its SPs and CHFs
Wait for SCM to exit safe mode automatically after it meets the required thresholds. Alternatively, manually force SCM to exit from safe mode using CLI options.
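Waiting for SCM can be scripted before retrying a write. The following Python sketch assumes that the ozone admin safemode status subcommand is available in the installed release and that its output contains the text "out of safe mode" once SCM has left safe mode. Both are assumptions to verify on the cluster. The function names, the marker text, and the timing values are illustrative.

```python
import subprocess
import time


def safemode_status():
    """Return the raw output of the safe mode status command (assumed CLI)."""
    result = subprocess.run(
        ["ozone", "admin", "safemode", "status"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout


def wait_until_out_of_safe_mode(get_status=safemode_status,
                                marker="out of safe mode",
                                interval_s=10, max_polls=60,
                                sleep=time.sleep):
    """Poll until the status output contains the marker text.

    Returns True once the marker is seen, or False if max_polls polls pass
    without it. The marker string is an assumption about the CLI output.
    """
    for _ in range(max_polls):
        if marker in get_status():
            return True
        sleep(interval_s)
    return False
```

A write can then be retried only after the function returns True, which avoids repeated failures while SCM is still collecting the required thresholds.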