Known Issues in Ozone

This section lists the known issues and technical limitations for Ozone in Cloudera Runtime 7.3.2, its service packs, and cumulative hotfixes.

Known issues identified in Cloudera Runtime 7.3.2

CDPD-63350: Force deleting an FSO bucket and its contents by running rb --force from the AWS S3 API fails
7.3.2
Force deleting an FSO bucket and its contents by running rb --force from the AWS S3 API might fail with an error, because the S3 client sends an individual delete request for each key in the bucket. Individual keys or directories might or might not be deleted, depending on the availability of leaf elements. The bucket can be deleted only after all keys and directories are cleaned up. If some keys are not deleted, the bucket is not deleted.
Sample error message:
# aws s3 rm s3://buck-fso --recursive
delete: s3://buck-fso/dir1/
delete: s3://buck-fso/dir1/dir2/
delete: s3://buck-fso/dir3/dir4/dir5/
delete: s3://buck-fso/dir3/dir4/
delete failed: s3://buck-fso/dir3/
# Rerun the same command again
# aws s3 rm s3://buck-fso --recursive
delete: s3://buck-fso/dir1/
delete: s3://buck-fso/dir3/
# To Confirm
# ozone sh key list s3v/buck-fso
[ ]
Run the rb --force command multiple times to completely clean up the keys and directories.
Apache JIRA: HDDS-9637
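The rerun workaround above can be scripted. The following is a minimal sketch, assuming the AWS CLI is already configured for the Ozone S3 endpoint. The function name retry_until_success and the attempt limit are illustrative and are not part of Ozone or the AWS CLI.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rerun a command until it succeeds or the
# attempt limit is reached.
# Usage: retry_until_success MAX_ATTEMPTS COMMAND [ARGS...]
retry_until_success() {
  local max_attempts="$1"
  shift
  local attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      echo "succeeded on attempt ${attempt}"
      return 0
    fi
    echo "attempt ${attempt} failed" >&2
    attempt=$((attempt + 1))
  done
  return 1
}

# Example (bucket name taken from the sample above):
# retry_until_success 5 aws s3 rb s3://buck-fso --force
```

A bounded attempt count keeps the loop from running indefinitely if keys cannot be deleted for an unrelated reason, such as missing permissions.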
CDPD-98751: Mismatched Replicas tab in the Recon UI fails to display containers with inconsistent replica checksums
7.3.2
In the Ozone Recon UI, the Mismatched Replicas tab does not update or display containers when one or more replicas have differing checksums. Instead, these containers are displayed in the Under-Replicated tab.
Check the affected containers in the Under-Replicated tab, or cross-check them against the API response.
CDPD-97512: Mismatch in Open Key count between Overview page and OM DB Insight in the Recon UI
7.3.2
In the Ozone Recon UI, the Open Key count on the Overview page (Summary section) may not match the count on the OM DB Insight > Open Key tab. Users might see different Open Key values for the same cluster across these two views.
Use the OM DB Insight > Open Key tab for the most accurate Open Key count.
CDPD-97376: Container replication counts mismatch between new and old Container pages in the Recon UI
7.3.2
In the Ozone Recon UI, Container replication counts—including Under-Replicated, Over-Replicated, and Mis-Replicated—differ between the new and old Container pages for the same cluster.
Refer to the old Container page to view accurate replication counts.
CDPD-97311: Incorrect Creation Time and Modification Time displayed on the Namespace Usage page in the Recon UI
7.3.2
In the Ozone Recon UI, the Namespace Usage page displays incorrect Created and Modified timestamps. These values do not accurately reflect the actual creation or last modification times or dates of the namespace, resulting in inaccurate metadata information.
Retrieve the correct timestamp values directly from the API response.
CDPD-97312: Mismatch between cluster State Container count and Container Summary totals
7.3.2
Ozone Recon does not sync all container states, so the Storage Container Manager (SCM) container count and the Recon container count can differ because of QUASI_CLOSED and other container states.
There is no workaround in Ozone Recon. Use the ozone admin container report command to get the correct container counts for the various states in the Ozone cluster.
CDPD-99248: After a CDH upgrade, Ozone fails while executing the Finalize Upgrade for SCM on role Storage Container command
7.3.2
When finalizing an Ozone upgrade for the first time from Cloudera Manager, the finalize command Finalize Upgrade for SCM on role Storage Container may fail with the following message in stderr:
"Invalid response from Storage Container Manager.
Current finalization status is: FINALIZATION_IN_PROGRESS"

Despite this, upgrade finalization is still running on the Storage Container Manager (SCM).

Finalization continues on the SCM even though the Cloudera Manager command failed, as the message indicates. Run the ozone admin scm finalizationstatus command to check the status of SCM finalization, and wait for it to complete.
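Waiting for finalization can be automated by polling the status command. The following is a minimal sketch. The function name wait_for_finalization is hypothetical, and the completion strings FINALIZATION_DONE and ALREADY_FINALIZED are assumptions about the command output. Only FINALIZATION_IN_PROGRESS appears in the message above, so verify the strings on your cluster before relying on them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a status command until its output reports
# that finalization has completed.
# Usage: wait_for_finalization MAX_CHECKS INTERVAL_SECONDS COMMAND [ARGS...]
# Assumption: the output contains FINALIZATION_DONE or ALREADY_FINALIZED
# once finalization is complete.
wait_for_finalization() {
  local max_checks="$1" interval="$2"
  shift 2
  local i=0 status=""
  while [ "$i" -lt "$max_checks" ]; do
    status="$("$@" 2>&1)"
    case "$status" in
      *FINALIZATION_DONE*|*ALREADY_FINALIZED*)
        echo "finalization complete: ${status}"
        return 0
        ;;
    esac
    i=$((i + 1))
    sleep "$interval"
  done
  echo "finalization still not complete: ${status}" >&2
  return 1
}

# Example: check every 10 seconds, up to 60 times.
# wait_for_finalization 60 10 ozone admin scm finalizationstatus
```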
CDPD-98892: File Size Distribution bucket size range calculation is not correct
7.3.2
When a volume contains a large number of buckets, the Ozone Recon UI might display an incorrect bucket size range in the File Size Distribution chart on the Insights page.
None
Apache JIRA: HDDS-14827
CDPD-93116: Ozone client hangs for approximately five minutes intermittently when the disk is full
7.3.2
When the Ozone client writes data to a DataNode and an exception occurs because of a disk-full condition or another error on the DataNode, the client hangs for approximately five minutes because it continuously retries the write until the pipeline over the DataNode is closed.
Control request retries with the following client-side configuration:
Table 1.
Configuration                                        Recommended value
hdds.ratis.raft.client.rpc.request.timeout           30s
hdds.ratis.client.multilinear.random.retry.policy    1s, 1
hdds.ratis.client.exponential.backoff.max.sleep      5s
hdds.ratis.client.exponential.backoff.base.sleep     1s
hdds.ratis.client.exponential.backoff.max.retries    2
Apache JIRA: HDDS-14040
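The recommended values from Table 1 can be rendered as Hadoop-style property elements. The following is a minimal sketch, assuming the settings are added to the client-side ozone-site.xml. The helper names emit_property and emit_retry_properties are hypothetical, and where the output is pasted depends on how the client configuration is managed in your deployment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one Hadoop-style <property> element.
emit_property() {
  printf '<property>\n  <name>%s</name>\n  <value>%s</value>\n</property>\n' "$1" "$2"
}

# Print the five client-side retry settings with the values from Table 1.
emit_retry_properties() {
  emit_property hdds.ratis.raft.client.rpc.request.timeout "30s"
  emit_property hdds.ratis.client.multilinear.random.retry.policy "1s, 1"
  emit_property hdds.ratis.client.exponential.backoff.max.sleep "5s"
  emit_property hdds.ratis.client.exponential.backoff.base.sleep "1s"
  emit_property hdds.ratis.client.exponential.backoff.max.retries "2"
}

# Example: emit_retry_properties > retry-properties.xml
```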

Known issues identified before Cloudera Runtime 7.3.2

Known issues identified before Cloudera Runtime 7.3.2 include only unresolved issues from previous releases that continue to affect the Cloudera Runtime 7.3.2 base release.

CDPD-91562: test_validate_certs_configs configuration is failing with the maximum lifetime validation
7.3.2, 7.3.1.600
In daylight saving time zones, the autogenerated Ozone certificate duration might differ from the expected duration. This discrepancy is minor, because the default certificate duration is 365 days or five years, depending on the Ozone component.
CDPD-75954: The ozone debug ldb and ozone auditparser commands fail with java.lang.UnsatisfiedLinkError
7.3.2, 7.3.1.400
For information on the workaround, see the Changing temporary path for Ozone services and CLI tools documentation.
CDPD-54885: Ozone Prometheus does not work with TLS
7.3.2, 7.3.1 and its higher versions
The Prometheus service shipped with Ozone does not support TLS mode. As a result, Prometheus cannot gather metrics from Ozone endpoints when TLS is enabled.
Go to Ozone > Configuration > Ozone Prometheus Endpoint Token and set the Ozone Prometheus Endpoint Token to any random string. This makes a token available in plaintext in the process directory of the Ozone endpoint, which Prometheus can use to get around the TLS limitation.
CDPD-56684: Keys get deleted when you do not have permission on volume
7.3.2, 7.3.1 and its higher versions
When a volume is deleted, Ozone recursively deletes the buckets and keys inside it and only then deletes the volume. The delete ACL check on the volume is performed only at the end, so you might delete all the data inside the volume without having delete permission on the volume.
CDPD-50610: Large file uploads are slow with OPEN and stream data approach
7.3.2, 7.3.1 and its higher versions
The Hue file browser uses the append operation for large files. This API is not supported by Ozone in 7.1.9, so large file uploads can be slow or can time out in the browser.
Use the native Ozone client to upload large files instead of the Hue file browser.
OPSAPS-66469: ozone-site.xml is missing if the host does not contain HDFS roles
7.3.2, 7.3.1 and its higher versions
The client-side ozone-site.xml (/etc/hadoop/conf/ozone-site.xml) is not generated by Cloudera Manager if the host does not have any HDFS role. As a result, Ozone commands issued from that host fail because the client cannot find the service name to host name mapping. The error message is similar to this:
# ozone sh volume list o3://ozoneabc
23/03/06 18:46:15 WARN ha.OMProxyInfo: OzoneManager address ozoneabc:9862 for serviceID null remains unresolved for node ID null Check your ozone-site.xml file to ensure ozone manager addresses are configured properly.
Add the HDFS gateway role on that host.
CDPD-74016: Running ozone sh token print on a token generated using the ozone dtutil command results in a Null Pointer Exception.
Print the token generated by dtutil using dtutil.
CDPD-74013: ozone dtutil get token fails when using the o3 or ofs schemes.
Use ozone sh token get to get a token for the Ozone file system.
CDPD-63144: Key rename inside the FSO bucket fails and displays the Failed to get parent dir error. This happens when running Impala workloads with Ozone.
None.
CDPD-74331: Key put fails and displays the Failed to write chunk error when there is a volume failure during configuration.
None.
CDPD-74475: YCSB test with HBase on Ozone degrades performance.
7.3.2, 7.3.1 and its higher versions
None.
CDPD-74884: Exclusive size of snapshot is always 0 when you run the info command on the Ozone snapshots. It is a statistics issue and does not impact the functionality of Ozone snapshot.
7.3.2, 7.3.1 and its higher versions
The exclusiveSize and exclusiveReplicationSize stats presented by snapshot info are 0 even though the snapshot contains exclusive keys or files that are not present in other snapshots.
None.
Apache JIRA: HDDS-11528
CDPD-75042: AWS CLI rm or delete command fails to delete all files and directories on the Ozone FSO bucket. It only deletes the leaf node.
7.3.2, 7.3.1 and its higher versions
None.
CDPD-75204: NameNode restart fails after dfs.namenode.fs-limits.max-component-length is set to a lower value and existing data exceeds the new length limit.
7.3.2, 7.3.1 and its higher versions
Increase the value of the dfs.namenode.fs-limits.max-component-length parameter and restart the NameNode.
CDPD-75635: Ozone write fails intermittently as SCM remains in safemode.
7.3.2, 7.3.1 and its higher versions
Wait for SCM to come out of safemode, or exit safemode through the CLI options.
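The two options in this workaround can be combined in a small script. The following is a minimal sketch, assuming the ozone admin safemode wait and ozone admin safemode exit subcommands are available, that wait accepts a -t timeout in seconds, and that wait returns a nonzero exit code when the timeout expires. The function name ensure_scm_out_of_safemode is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait for SCM to leave safemode and, only if
# explicitly requested, force an exit after the wait times out.
# Usage: ensure_scm_out_of_safemode TIMEOUT_SECONDS [yes|no]
# Assumption: "ozone admin safemode wait -t N" exits with a nonzero
# code if SCM is still in safemode after N seconds.
ensure_scm_out_of_safemode() {
  local timeout_secs="$1" force_exit="${2:-no}"
  if ozone admin safemode wait -t "$timeout_secs"; then
    return 0
  fi
  if [ "$force_exit" = "yes" ]; then
    ozone admin safemode exit
    return $?
  fi
  return 1
}

# Example: wait up to 5 minutes, then force the exit.
# ensure_scm_out_of_safemode 300 yes
```

Forcing the exit is left as an explicit opt-in in this sketch, because SCM normally leaves safemode by itself once its exit rules are satisfied.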