Known issues and technical limitations for Ozone in Cloudera Runtime 7.3.2, its service packs, and cumulative
hotfixes are described in this section.
Known issues identified in Cloudera Runtime 7.3.2
CDPD-73792: The new snapshot could not be found after
renaming the old snapshot
7.3.2
The ozone sh snapshot rename
command renames snapshots. It is a feature developed by the Apache Ozone
community and is included in Cloudera Base on premises 7.3.1. However, after a
snapshot is renamed, the new snapshot might not be found. The command does not
work properly, and Cloudera Base on premises does not support it.
CDPD-63350: Force deleting a FSO bucket and its
contents while running rb --force from AWS S3 API is
failing
7.3.2
Force deleting a File System Optimized (FSO)
bucket and its contents with the rb --force
command from the AWS S3 API might fail with an error, because the S3 client
sends an individual delete request for each key in the bucket. Depending on
the availability of leaf elements, individual keys or directories might or
might not be deleted. The bucket itself can be deleted only after all of its
keys and directories are cleaned up; if any keys remain, the bucket is not
deleted.
Sample error
message
# aws s3 rm s3://buck-fso --recursive
delete: s3://buck-fso/dir1/
delete: s3://buck-fso/dir1/dir2/
delete: s3://buck-fso/dir3/dir4/dir5/
delete: s3://buck-fso/dir3/dir4/
delete failed: s3://buck-fso/dir3/
# Rerun the same command again
# aws s3 rm s3://buck-fso --recursive
delete: s3://buck-fso/dir1/
delete: s3://buck-fso/dir3/
# To Confirm
# ozone sh key list s3v/buck-fso
[ ]
Run the rb --force command
multiple times to completely clean up the keys and directories.
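The repeated runs can be scripted as a small retry loop. The sketch below is hypothetical: delete_bucket is a stand-in for the actual aws s3 rb s3://buck-fso --force call, and the attempt limit is an arbitrary example.

```shell
# Re-run the force delete until it succeeds or the attempt limit is hit.
# delete_bucket wraps the real call, for example:
#   aws s3 rb "s3://buck-fso" --force
delete_bucket() {
  aws s3 rb "s3://$1" --force
}

retry_rb() {
  bucket=$1
  max=$2
  n=0
  while [ "$n" -lt "$max" ]; do
    if delete_bucket "$bucket"; then
      return 0
    fi
    n=$((n + 1))
  done
  return 1
}
```

Each pass removes whatever leaf elements are currently deletable, so a few iterations are normally enough before the bucket itself can be removed.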
CDPD-98751: Mismatched Replicas
tab in the Recon UI fails to display containers with inconsistent replica
checksums
7.3.2
In the Ozone Recon UI, the
Mismatched Replicas tab does not update or
display containers when one or more replicas have differing checksums.
Instead, these containers are displayed in the Under-Replicated
tab.
Check the replicas in the
Under-Replicated tab, or cross-check them
against the API response.
CDPD-97512: Mismatch in Open Key count between
Overview page and OM DB Insight in the Recon
UI
7.3.2
In the Ozone Recon UI, the Open
Key count in the Summary section of the
Overview page might not match the count on the OM DB Insight > Open Key tab. Users might see different Open
Key values for the same cluster across these two views.
Use the OM DB Insight > Open Key tab for the most accurate Open Key
count.
CDPD-97376: Container replication counts mismatch in
Recon UI
7.3.2
In the Ozone Recon UI, container replication
counts, including Under-Replicated,
Over-Replicated, and
Mis-Replicated counts, differ between the updated
and the legacy Container page for the same
cluster.
Refer to the legacy
Container page to view accurate replication
counts.
CDPD-97311: Incorrect Creation
Time and Modification Time displayed
on the Namespace Usage page in the Recon UI
7.3.2
In the Ozone Recon UI, the Namespace
Usage page displays incorrect Creation
Time and Modification Time
timestamps. These values do not accurately reflect the actual creation or
last modification times or dates of the namespace, resulting in inaccurate
metadata information.
Retrieve the correct timestamp values directly
from the API response.
CDPD-97312: Mismatch between cluster State Container
count and Container Summary totals
7.3.2
The Ozone Recon UI can display discrepancies
between the Storage Container Manager (SCM) container count and the Recon
container summary. This occurs because Recon does not synchronize all
container states, such as QUASI_CLOSED, leading to inconsistent totals.
There is no workaround within the Ozone Recon UI. Use the
ozone admin container report CLI command to obtain
the correct container counts for all states in the Ozone cluster.
CDPD-99248: After a CDH upgrade, Ozone encounters a
failure while running the Finalize Upgrade for SCM on role Storage
Container command
7.3.2
When finalizing an Ozone upgrade for the first
time from Cloudera Manager, the Finalize Upgrade
for SCM on role Storage Container command might fail with the
following stderr
message:
"Invalid response from Storage Container Manager.
Current finalization status is: FINALIZATION_IN_PROGRESS"
This
error occurs even though finalization continues to run on the Storage
Container Manager (SCM).
Ignore the failure in Cloudera Manager. Use the ozone admin scm
finalizationstatus command to monitor progress and wait for
the process to complete on SCM.
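The wait can be automated with a small polling loop around the same command. This is a hypothetical sketch: the FINALIZATION_IN_PROGRESS string is taken from the error above, and the 30-second interval is an arbitrary choice.

```shell
# Poll the SCM finalization status until it is no longer in progress.
wait_for_finalization() {
  while :; do
    status=$(ozone admin scm finalizationstatus 2>&1)
    case "$status" in
      *FINALIZATION_IN_PROGRESS*) sleep 30 ;;
      *) return 0 ;;
    esac
  done
}
```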
CDPD-98892: File Size
Distribution bucket size range calculation is not
correct
7.3.2
If a large number of buckets exists within a
volume, the Ozone Recon UI might display an incorrect bucket size range
calculation in the File Size
Distribution chart on the Insights
page.
CDPD-93116: Ozone client hangs intermittently when
disks are full
7.3.2
The Ozone client hangs for approximately five minutes when writing data to a
Datanode if a disk full exception or other Datanode error occurs. The hang
is caused by continuous write retries that persist until the pipeline on the
Datanode closes.
Control the request retry behavior by adjusting the retry configurations on
the client side.
Known issues identified before Cloudera Runtime 7.3.2
Known issues identified before Cloudera Runtime 7.3.2 include only
unresolved issues from previous releases that continue to affect the Cloudera Runtime 7.3.2 base release.
CDPD-91562:
test_validate_certs_configs configuration is failing
with the maximum lifetime validation
7.3.2, 7.3.1.600
In daylight saving time zones, the autogenerated Ozone
certificate duration might differ from the expected duration. This
discrepancy is minor, because the default certificate duration is 365 days
or five years, depending on the Ozone component.
CDPD-75954: The ozone debug ldb
and ozone auditparser commands fail with
java.lang.UnsatisfiedLinkError
CDPD-54885: Ozone Prometheus does not work with
TLS
7.3.2, 7.3.1 and its SPs and CHFs
The Prometheus service shipped with Ozone does not
support TLS mode, so Prometheus cannot gather metrics from Ozone
endpoints when TLS is enabled.
Go to Ozone > Configuration and enter any random string in the
Ozone Prometheus Endpoint Token property. This configuration
generates a plaintext token in the Ozone endpoint process directory, allowing
Prometheus to authenticate and collect metrics despite the TLS limitation.
CDPD-56684: Keys and buckets get deleted without
volume permission
7.3.2, 7.3.1 and its SPs and CHFs
When a volume deletion is initiated, the system
recursively deletes all buckets and keys within the volume before attempting
to delete the volume itself. Because the ACL check for volume deletion
permission occurs only at the end, all the data within the volume is
deleted even when the user does not have delete permission on the volume.
CDPD-50610: Large file uploads are slow with OPEN and
stream data approach
7.3.2, 7.3.1 and its SPs and CHFs
The Hue file browser uses the append operation for large
files. This API is not supported by Ozone in 7.1.9; therefore, large file
uploads can be slow or can time out in the browser.
Use the native Ozone client to upload large files
instead of the Hue file browser.
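For example, a single upload with the native client looks like the following. This is a hypothetical sketch; the volume, bucket, and key names are placeholders.

```shell
# Upload a local file as an Ozone key with the native client,
# bypassing the Hue file browser's append-based upload.
put_large_file() {
  # Usage: put_large_file <volume> <bucket> <key> <local-file>
  ozone sh key put "/$1/$2/$3" "$4"
}
```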
OPSAPS-66469: ozone-site.xml is missing if the host
does not contain HDFS roles
7.3.2, 7.3.1 and its SPs and CHFs
The client side
/etc/hadoop/conf/ozone-site.xml file is not
generated by Cloudera Manager if the host does not have
any HDFS role. Because of this, issuing Ozone commands from that host fails
because it cannot find the service name to hostname mapping. When this issue
occurs, an error message is displayed:
# ozone sh volume list o3://ozoneabc
23/03/06 18:46:15 WARN ha.OMProxyInfo: OzoneManager address ozoneabc:9862 for serviceID null remains unresolved for node ID null Check your ozone-site.xml file to ensure ozone manager addresses are configured properly.
Add the HDFS gateway role on that host.
CDPD-63144: hadoop.ozone.om.request.key.OMKeyRenameRequestWithFSO: Rename
key failed with "failed to get parent dir" error
7.3.2 and its SPs and CHFs, 7.3.1 and its
SPs and CHFs
Key rename inside an FSO bucket fails and displays the Failed to
get parent dir error. This happens when running Impala
workloads with Ozone.
None.
CDPD-74331: Key put fails with "CompletionException: Failed to write
chunk"
7.3.2 and its SPs and CHFs, 7.3.1 and its
SPs and CHFs
Key put fails and displays the Failed to write chunk
error when there is a volume failure during configuration.
None.
CDPD-74475: YCSB test with HBase on Ozone degrades
performance
7.3.2, 7.3.1 and its SPs and CHFs
HBase on Ozone is currently provided as a Tech Preview
feature. Performance characteristics, including throughput, may not yet
match those of HBase running on HDFS.
None.
CDPD-74884: Exclusive size of snapshot is always returning
0
7.3.2, 7.3.1 and its SPs and CHFs
The exclusiveSize and
exclusiveReplicationSize statistics provided by the
snapshot info command return a value of
0 even when the snapshot contains exclusive keys or
files that do not exist in other snapshots.
CDPD-75042: AWS CLI recursive delete only deletes the leaf
element
7.3.2 and its SPs and CHFs, 7.3.1 and its
SPs and CHFs
The AWS CLI rm or
delete command fails to delete all files and
directories in an Ozone FSO bucket; it deletes only the leaf node.
None.
CDPD-75204: Setting a file name length limit can cause NameNode
shutdown with an FSImage-related error
7.3.2 and its SPs and CHFs, 7.3.1 and its
SPs and CHFs
NameNode restart fails after
dfs.namenode.fs-limits.max-component-length is set to a lower value and
existing data is present that exceeds the length limit.
Increase the value of the
dfs.namenode.fs-limits.max-component-length
parameter and restart the NameNode.
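As a sketch, the raised limit belongs in hdfs-site.xml (or the equivalent Cloudera Manager configuration snippet); the value 1024 below is only an illustrative example, not a recommended setting.

```xml
<!-- Illustrative only: choose a limit that covers the longest existing
     path component in the cluster. -->
<property>
  <name>dfs.namenode.fs-limits.max-component-length</name>
  <value>1024</value>
</property>
```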
CDPD-75635: Ozone write fails intermittently because SCM
remains in safe mode
7.3.2, 7.3.1 and its SPs and CHFs
Ozone write operations can fail intermittently after
restarting the Storage Container Manager (SCM) leader node (or when stopping
the OM leader and an SCM follower). This occurs because SCM may remain in
safe mode after the restart, causing write/block allocation prechecks to
fail (for example, SafeModePrecheck failed for allocateBlock).
Wait for SCM to exit safe mode automatically
after it meets the required thresholds. Alternatively, manually force SCM to
exit from safe mode using CLI options.
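A minimal sketch of the manual path, assuming the ozone admin safemode subcommands; verify the exact subcommand names against your release before use.

```shell
# Hypothetical helper: report the current safe mode state, then force
# SCM out of safe mode.
check_and_exit_safemode() {
  ozone admin safemode status
  ozone admin safemode exit
}
```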