Known Issues in Ozone

This section lists the known issues and technical limitations for Ozone in Cloudera Runtime 7.3.2, its service packs, and cumulative hotfixes.

Known issues identified in Cloudera Runtime 7.3.2

OPSAPS-76062: Exposing "ozone.replication" in Cloudera Manager configurations and assigning a default value
Affected versions: 7.3.2
Exposing the ozone.replication property in Cloudera Manager configurations and assigning a default value to it causes the property to be applied as a client-side configuration. This overrides the bucket replication configuration even when you do not intend to set any client-side configuration.
Workaround: None

CDPD-73792: The new snapshot could not be found after renaming the old snapshot
Affected versions: 7.3.2
The ozone sh snapshot rename command renames snapshots. This feature was developed by the Apache Ozone community and is included in Cloudera Base on premises 7.3.1. However, it does not work reliably, and Cloudera Base on premises does not support it.
Workaround: None
Apache JIRA: HDDS-11384

CDPD-63350: Force deleting an FSO bucket and its contents by running rb --force through the AWS S3 API fails
Affected versions: 7.3.2
Force deleting a File System Optimized (FSO) bucket and its contents by running the rb --force command through the AWS S3 API might fail with an error, because the S3 client sends an individual delete request for each key in the bucket. Individual keys or directories might or might not be deleted, depending on the availability of leaf elements. The bucket can be deleted completely only after all of its keys and directories are cleaned up; if any key is not deleted, the bucket is not deleted.
Sample error message:
    # aws s3 rm s3://buck-fso --recursive
    delete: s3://buck-fso/dir1/
    delete: s3://buck-fso/dir1/dir2/
    delete: s3://buck-fso/dir3/dir4/dir5/
    delete: s3://buck-fso/dir3/dir4/
    delete failed: s3://buck-fso/dir3/
    # Rerun the same command again
    # aws s3 rm s3://buck-fso --recursive
    delete: s3://buck-fso/dir1/
    delete: s3://buck-fso/dir3/
    # To Confirm
    # ozone sh key list s3v/buck-fso
    [ ]
Workaround: Run the rb --force command multiple times until all keys and directories are cleaned up.
Apache JIRA: HDDS-9637

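Rerunning rb --force can be scripted. The following Python sketch is one way to do it, not a Cloudera-provided tool. It assumes that the AWS CLI is installed and configured with credentials for the Ozone S3 Gateway, and that aws s3 rb exits with a non-zero code while the bucket still exists. The helper names (run_command, force_delete_bucket) and the attempt limit are hypothetical.

```python
import subprocess
from typing import Callable, Sequence

# Hypothetical runner type: takes a command line, returns the process exit code.
Runner = Callable[[Sequence[str]], int]


def run_command(cmd: Sequence[str]) -> int:
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def force_delete_bucket(bucket: str, endpoint_url: str, max_attempts: int = 5,
                        runner: Runner = run_command) -> int:
    """Repeat `aws s3 rb --force` until it succeeds.

    Returns the attempt number that succeeded; raises RuntimeError if the
    bucket still cannot be removed after max_attempts tries.
    """
    cmd = ["aws", "s3", "rb", f"s3://{bucket}", "--force",
           "--endpoint-url", endpoint_url]
    for attempt in range(1, max_attempts + 1):
        # Each pass removes whatever keys and directories are deletable at that point.
        if runner(cmd) == 0:
            return attempt
    raise RuntimeError(f"bucket {bucket!r} not deleted after {max_attempts} attempts")
```

For example, force_delete_bucket("buck-fso", "http://s3g.example.com:9878") retries up to five times; the host name is a placeholder, and 9878 is the usual default HTTP port of the Ozone S3 Gateway. The runner parameter exists so the retry logic can be exercised without a live cluster.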
CDPD-98751: Mismatched Replicas tab in the Recon UI fails to display containers with inconsistent replica checksums
Affected versions: 7.3.2
In the Ozone Recon UI, the Mismatched Replicas tab does not update or display containers when one or more of their replicas have differing checksums. These containers are displayed in the Under-Replicated tab instead.
Workaround: Check the affected containers in the Under-Replicated tab, or cross-check them against the API response.

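When cross-checking against the API response, a container has mismatched replicas if its replicas do not all report the same checksum. The following Python sketch expresses that check. The input shape (a container ID mapped to the list of checksums reported by its replicas) is a hypothetical simplification, not the actual Recon response format, so adapt it to the data you retrieve.

```python
from collections.abc import Mapping, Sequence


def containers_with_mismatched_checksums(
    replicas: Mapping[int, Sequence[str]],
) -> list[int]:
    """Return the IDs of containers whose replicas report more than one distinct checksum.

    `replicas` maps a container ID to the checksums reported by its replicas
    (hypothetical shape). Containers with zero or one distinct checksum are skipped.
    """
    return sorted(cid for cid, sums in replicas.items() if len(set(sums)) > 1)
```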
CDPD-97512: Mismatch in Open Key count between Overview page and OM DB Insight in the Recon UI
Affected versions: 7.3.2
In the Ozone Recon UI, the Open Key count in the Summary section of the Overview page might not match the count on the OM DB Insight > Open Key tab, so these two views can show different Open Key values for the same cluster.
Workaround: Use the OM DB Insight > Open Key tab for the most accurate Open Key count.

CDPD-97376: Container replication counts mismatch in Recon UI
Affected versions: 7.3.2
In the Ozone Recon UI, container replication counts, including the Under-Replicated, Over-Replicated, and Mis-Replicated counts, differ between the updated and the legacy Container pages for the same cluster.
Workaround: Refer to the legacy Container page to view accurate replication counts.

CDPD-97311: Incorrect Creation Time and Modification Time displayed on the Namespace Usage page in the Recon UI
Affected versions: 7.3.2
In the Ozone Recon UI, the Namespace Usage page displays incorrect Creation Time and Modification Time values. These timestamps do not reflect the actual creation or last modification time of the namespace, so the displayed metadata is inaccurate.
Workaround: Retrieve the correct timestamp values directly from the API response.

CDPD-97312: Mismatch between the cluster state container count and Container Summary totals
Affected versions: 7.3.2
The Ozone Recon UI can display discrepancies between the Storage Container Manager (SCM) container count and the Recon container summary. This occurs because Recon does not synchronize all container states, such as QUASI_CLOSED, which leads to inconsistent totals.
Workaround: There is no workaround within the Ozone Recon UI. Use the ozone admin container report CLI command to obtain the correct container counts for all states in the Ozone cluster.

CDPD-99248: After a CDH upgrade, the Finalize Upgrade for SCM command fails on the Storage Container Manager role
Affected versions: 7.3.2
When you finalize an Ozone upgrade from Cloudera Manager for the first time, the Finalize Upgrade for SCM command on the Storage Container Manager role might fail with the following stderr message:
"Invalid response from Storage Container Manager. Current finalization status is: FINALIZATION_IN_PROGRESS"
This error occurs even though finalization continues to run on the Storage Container Manager (SCM).
Workaround: Ignore the failure in Cloudera Manager. Use the ozone admin scm finalizationstatus command to monitor progress, and wait for the process to complete on SCM.

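Monitoring can be automated with a small polling loop. This Python sketch assumes that the output of ozone admin scm finalizationstatus contains FINALIZATION_DONE or ALREADY_FINALIZED once finalization has finished; confirm the exact wording for your version before relying on it. The polling interval, timeout, and function names are hypothetical.

```python
import subprocess
import time
from typing import Callable

# Assumption: one of these status names appears in the command output once
# finalization is no longer in progress.
DONE_STATES = ("FINALIZATION_DONE", "ALREADY_FINALIZED")


def scm_finalization_status() -> str:
    """Return the raw output of `ozone admin scm finalizationstatus`."""
    result = subprocess.run(
        ["ozone", "admin", "scm", "finalizationstatus"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def wait_for_finalization(get_status: Callable[[], str] = scm_finalization_status,
                          interval_s: float = 30.0, timeout_s: float = 3600.0,
                          sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the status output reports a finished state, then return that output."""
    waited = 0.0
    while True:
        output = get_status()
        if any(state in output for state in DONE_STATES):
            return output
        if waited >= timeout_s:
            raise TimeoutError("SCM finalization is still in progress")
        sleep(interval_s)
        waited += interval_s
```

The status source and the sleep function are parameters so the loop can be tested without a cluster; in normal use, wait_for_finalization() is called with its defaults.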
CDPD-98892: File Size Distribution bucket size range calculation is not correct
Affected versions: 7.3.2
If a large number of buckets exists within a volume, the File Size Distribution chart on the Insights page of the Ozone Recon UI might display incorrectly calculated bucket size ranges.
Workaround: None
Apache JIRA: HDDS-14827

CDPD-93116: Ozone client hangs intermittently when disks are full
Affected versions: 7.3.2
The Ozone client hangs for approximately five minutes when writing data to a Datanode if a disk full exception or another Datanode error occurs. This hang is caused by continuous write retries that persist until the pipeline on the Datanode closes.
Workaround: Control the request retry behavior by setting the following configurations on the client side:
Table 1. Client-side retry request configurations
Configuration	Recommended value
hdds.ratis.raft.client.rpc.request.timeout	30s
hdds.ratis.client.multilinear.random.retry.policy	1s, 1
hdds.ratis.client.exponential.backoff.max.sleep	5s
hdds.ratis.client.exponential.backoff.base.sleep	1s
hdds.ratis.client.exponential.backoff.max.retries	2
Apache JIRA: HDDS-14040

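If you manage client configuration files directly, the recommended values in Table 1 map onto standard Hadoop-style property entries in ozone-site.xml. This Python sketch renders them with the standard library. It only produces the XML text; how you deploy it to clients (for example, through a Cloudera Manager configuration snippet) depends on your environment and is not shown.

```python
import xml.etree.ElementTree as ET

# Recommended client-side retry values, copied from Table 1.
RETRY_SETTINGS = {
    "hdds.ratis.raft.client.rpc.request.timeout": "30s",
    "hdds.ratis.client.multilinear.random.retry.policy": "1s, 1",
    "hdds.ratis.client.exponential.backoff.max.sleep": "5s",
    "hdds.ratis.client.exponential.backoff.base.sleep": "1s",
    "hdds.ratis.client.exponential.backoff.max.retries": "2",
}


def to_site_xml(settings: dict[str, str]) -> str:
    """Render settings as <configuration><property><name/><value/></property>... XML."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(root)  # pretty-print; requires Python 3.9 or later
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_site_xml(RETRY_SETTINGS))
```
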

Known issues identified before Cloudera Runtime 7.3.2

Known issues identified before Cloudera Runtime 7.3.2 include only unresolved issues from previous releases that continue to affect the Cloudera Runtime 7.3.2 base release.

CDPD-91562: test_validate_certs_configs configuration is failing with the maximum lifetime validation
Affected versions: 7.3.2, 7.3.1.600
In time zones that observe daylight saving time, the duration of an autogenerated Ozone certificate might differ from the expected duration. This discrepancy is minor, because the default certificate duration is 365 days or five years, depending on the Ozone component.

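The following Python sketch illustrates how a daylight saving time transition can shift an expected duration by one hour. It is not Ozone's certificate code: it assumes, for illustration only, that expiry lands on the same local wall-clock time a fixed number of days later, and it hard-codes US Eastern offsets instead of reading the time zone database.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets for a US Eastern-style zone, hard-coded for this demo only.
# Real systems derive the offset for each instant from the time zone database.
EST = timezone(timedelta(hours=-5), "EST")  # standard time
EDT = timezone(timedelta(hours=-4), "EDT")  # daylight saving time


def drift_hours(not_before: datetime, not_after: datetime, expected: timedelta) -> float:
    """Actual elapsed time minus the expected validity, in hours.

    Both datetimes must be timezone-aware so that subtraction compares real instants.
    """
    return ((not_after - not_before) - expected).total_seconds() / 3600


# Issued at 12:00 local time on 2024-03-09, the day before US DST started...
not_before = datetime(2024, 3, 9, 12, 0, tzinfo=EST)
# ...and expiring at 12:00 local time 365 calendar days later, when DST was in
# effect (US DST began on 2025-03-09 at 02:00).
not_after = datetime(2025, 3, 9, 12, 0, tzinfo=EDT)

print(not_after - not_before)  # 364 days, 23:00:00
print(drift_hours(not_before, not_after, timedelta(days=365)))  # -1.0
```

Against a 365-day validity, a one-hour difference is about 0.01 percent, which is consistent with the discrepancy being described as minor.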
CDPD-75954: The ozone debug ldb and ozone auditparser commands fail with java.lang.UnsatisfiedLinkError
Affected versions: 7.3.2, 7.3.1.400
Workaround: For more information, see Changing temporary path for Ozone services and CLI tools.

CDPD-54885: Ozone Prometheus does not work with TLS
Affected versions: 7.3.2, 7.3.1 and its SPs and CHFs
The Prometheus service shipped with Ozone does not support TLS mode. As a result, Prometheus cannot gather metrics from Ozone endpoints when TLS is enabled.
Workaround: Go to Ozone > Configuration and enter any random string in the Ozone Prometheus Endpoint Token property. This configuration generates a plaintext token in the Ozone endpoint process directory, which allows Prometheus to authenticate and collect metrics despite the TLS limitation.

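Any random string works for the Ozone Prometheus Endpoint Token property. One way to generate one is Python's standard secrets module, as in this sketch; the 32-byte default length is an arbitrary choice, not a documented requirement.

```python
import secrets


def make_endpoint_token(nbytes: int = 32) -> str:
    """Return a random hexadecimal string (two characters per byte)."""
    return secrets.token_hex(nbytes)


if __name__ == "__main__":
    # Paste the printed value into the Ozone Prometheus Endpoint Token property.
    print(make_endpoint_token())
```
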
CDPD-56684: Keys and buckets get deleted without volume permission
Affected versions: 7.3.2, 7.3.1 and its SPs and CHFs
When a volume deletion is initiated, the system recursively deletes all buckets and keys within the volume before attempting to delete the volume itself. Because the ACL check for the volume delete permission occurs only at the end, all data within the volume is deleted even if the user does not have delete permission on the volume.
Note: This occurs even if no specific bucket or key permissions were granted to the user for recursive deletion.

CDPD-50610: Large file uploads are slow with OPEN and stream data approach
Affected versions: 7.3.2, 7.3.1 and its SPs and CHFs
The Hue file browser uses the append operation for large files. This API is not supported by Ozone in 7.1.9, so large file uploads can be slow or can time out in the browser.
Workaround: Use the native Ozone client instead of the Hue file browser to upload large files.

OPSAPS-66469: ozone-site.xml is missing if the host does not contain HDFS roles
Affected versions: 7.3.2, 7.3.1 and its SPs and CHFs
Cloudera Manager does not generate the client-side /etc/hadoop/conf/ozone-site.xml file if the host does not have any HDFS role. As a result, Ozone commands issued from that host fail, because the client cannot find the mapping from service name to hostname. When this issue occurs, an error message similar to the following is displayed:
    # ozone sh volume list o3://ozoneabc
    23/03/06 18:46:15 WARN ha.OMProxyInfo: OzoneManager address ozoneabc:9862 for serviceID null remains unresolved for node ID null Check your ozone-site.xml file to ensure ozone manager addresses are configured properly.
Workaround: Add the HDFS gateway role on that host.

CDPD-63144: hadoop.ozone.om.request.key.OMKeyRenameRequestWithFSO: Rename key failed with "failed to get parent dir" error
Affected versions: 7.3.2 and its SPs and CHFs, 7.3.1 and its SPs and CHFs
Renaming a key inside an FSO bucket fails and displays the Failed to get parent dir error. This happens when running Impala workloads on Ozone.
Workaround: None

CDPD-74331: Key put fails with "CompletionException: Failed to write chunk"
Affected versions: 7.3.2 and its SPs and CHFs, 7.3.1 and its SPs and CHFs
Key put fails and displays the Failed to write chunk error when a volume failure occurs during configuration.
Workaround: None

CDPD-74475: YCSB test with HBase on Ozone degrades performance
Affected versions: 7.3.2, 7.3.1 and its SPs and CHFs
HBase on Ozone is currently provided as a Technical Preview feature. Its performance characteristics, including throughput, might not yet match those of HBase running on HDFS.
Workaround: None

CDPD-74884: Exclusive size of a snapshot always returns 0
Affected versions: 7.3.2, 7.3.1 and its SPs and CHFs
The exclusiveSize and exclusiveReplicationSize statistics provided by the snapshot info command return 0 even when the snapshot contains exclusive keys or files, that is, keys or files that do not exist in other snapshots.
Workaround: None
Apache JIRA: HDDS-11528

CDPD-75042: AWS CLI recursive delete only deletes the leaf element
Affected versions: 7.3.2 and its SPs and CHFs, 7.3.1 and its SPs and CHFs
The AWS CLI rm or delete command fails to delete all files and directories in an Ozone FSO bucket; it deletes only the leaf node.
Workaround: None

CDPD-75204: Setting a file name length limit can cause NameNode shutdown with an FSImage-related error
Affected versions: 7.3.2 and its SPs and CHFs, 7.3.1 and its SPs and CHFs
NameNode restart fails after dfs.namenode.fs-limits.max-component-length is set to a lower value while existing data exceeds the new length limit.
Workaround: Increase the value of the dfs.namenode.fs-limits.max-component-length parameter and restart the NameNode.

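Before lowering dfs.namenode.fs-limits.max-component-length, you can check whether existing paths already exceed the new value. This Python sketch assumes you already have a list of paths (how you collect it, for example from a recursive listing, is left open) and that the limit is compared against the UTF-8 byte length of each path component. The function name is hypothetical.

```python
from collections.abc import Iterable


def components_over_limit(paths: Iterable[str], max_len: int) -> list[tuple[str, str]]:
    """Return (path, component) pairs whose component is longer than max_len bytes.

    Assumption: length is measured in UTF-8 bytes, so multi-byte characters
    count as more than one unit toward the limit.
    """
    offenders = []
    for path in paths:
        for component in path.split("/"):
            if len(component.encode("utf-8")) > max_len:
                offenders.append((path, component))
    return offenders
```

An empty result suggests that no existing path would violate the proposed value; any returned pairs identify data that would need to be renamed first, or a higher limit.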
CDPD-75635: Ozone write fails intermittently because SCM remains in safe mode
Affected versions: 7.3.2, 7.3.1 and its SPs and CHFs
Ozone write operations can fail intermittently after the Storage Container Manager (SCM) leader node is restarted, or when the OM leader and an SCM follower are stopped. This occurs because SCM might remain in safe mode after the restart, which causes write and block allocation prechecks to fail (for example, SafeModePrecheck failed for allocateBlock).
Workaround: Wait for SCM to exit safe mode automatically after it meets the required thresholds. Alternatively, force SCM to exit safe mode manually by using CLI options.
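
The check-then-force sequence can be sketched in Python. This sketch assumes that the ozone admin safemode status and ozone admin safemode exit subcommands are available in your Ozone version (verify with ozone admin safemode --help), and that the status output contains the phrase "out of safe mode" once SCM has left safe mode; that wording is an assumption. Forcing an exit is opt-in, because waiting for the thresholds is the safer default.

```python
import subprocess
from typing import Callable, Sequence

# Runs an `ozone` subcommand and returns its standard output.
OzoneRunner = Callable[[Sequence[str]], str]


def run_ozone(args: Sequence[str]) -> str:
    result = subprocess.run(["ozone", *args], capture_output=True, text=True, check=True)
    return result.stdout


def scm_out_of_safemode(run: OzoneRunner) -> bool:
    # Assumption: the status output says "out of safe mode" once SCM has left it.
    return "out of safe mode" in run(["admin", "safemode", "status"]).lower()


def ensure_scm_out_of_safemode(run: OzoneRunner = run_ozone, force: bool = False) -> bool:
    """Report whether SCM is out of safe mode, forcing an exit only if requested."""
    if scm_out_of_safemode(run):
        return True
    if not force:
        return False  # let SCM leave safe mode on its own once thresholds are met
    run(["admin", "safemode", "exit"])
    return scm_out_of_safemode(run)  # re-check instead of assuming the exit worked
```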