Review the list of Ozone issues that are resolved in Cloudera Runtime 7.3.1, its service packs and cumulative hotfixes.
Cloudera Runtime 7.3.1.400 SP2
- CDPD-82201: OMKeyAclRequestWithFSO is incorrectly
setting full path as key name
- 7.3.1.400
- When you set, add, or remove an ACL for an FSO
bucket, the key name gets corrupted with the full key path. This fix ensures
that the correct key name is set during the ACL calls.
Apache Jira: HDDS-12891
- CDPD-81939: Volume scanner should fail volume if
rocksDB is inaccessible
- 7.3.1.400
- When RocksDB becomes unreadable on a DataNode
due to disk-related issues, the DataNode will mark the affected storage
volume as unhealthy. This proactive health marking enables the system to
initiate data replication processes more rapidly, thereby maintaining data
availability and integrity.
Apache Jira: HDDS-12723
- CDPD-78932: Container replication should be
atomic
- 7.3.1.400
- During container replication, the destination
node imports the container from the source node. If any issues are
encountered during the import process, the Datanode is responsible for
gracefully cleaning up any residual or stale container metadata to maintain
system integrity.
Apache Jira: HDDS-12233
- CDPD-73278: Update OM, SCM, Datanode conf for
RATIS-2135
- 7.3.1.400
- Set
raft.grpc.message.size.max to be 1MB larger than
raft.server.log.appender.buffer.byte-limit for OM,
SCM and Datanode.
Apache Jira: HDDS-11320
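The relationship between the two settings can be sketched as follows. This is a minimal illustration that assumes both settings are expressed in bytes; the helper name `grpc_message_size_max` and the 32 MB example value are hypothetical and are not documented Ozone defaults.

```python
ONE_MB = 1024 * 1024


def grpc_message_size_max(appender_buffer_byte_limit: int) -> int:
    """Return a message size limit 1 MB above the appender buffer limit."""
    if appender_buffer_byte_limit <= 0:
        raise ValueError("byte limit must be positive")
    return appender_buffer_byte_limit + ONE_MB


# Hypothetical example value (not a documented default): a 32 MB appender buffer.
buffer_limit = 32 * ONE_MB
settings = {
    "raft.server.log.appender.buffer.byte-limit": buffer_limit,
    "raft.grpc.message.size.max": grpc_message_size_max(buffer_limit),
}
```

The same calculation would apply to OM, SCM, and Datanode, each with its own buffer limit.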
- CDPD-57559: New Ozone Manager leader cannot verify the
Ozone delegation token signed by old Ozone Manager leader
- 7.3.1.400
- If an Ozone cluster is upgraded and later
downgraded, the new Ozone Manager after the downgrade cannot verify the
Ozone delegation tokens issued before the downgrade. This causes clients that
are still running during the upgrade and downgrade period to fail. Clusters
that are not downgraded are not affected.
This issue is fixed by changing Ozone delegation token signing from an
asymmetric key to a symmetric key.
Apache Jira: HDDS-8829
- CDPD-70409: Recon Overview Page UI fails to load if
Recon Solr Health throws error
- 7.3.1.400
- This fixes an issue where the Recon UI failed
to load if the Solr Health check API threw an error.
- CDPD-80742: ConstraintViolationException was crashing
the ContainerHealthTask in Ozone Recon
- 7.3.1.400
- ConstraintViolationException was crashing the
ContainerHealthTask in Recon. After this fix, the task no longer crashes and
continues to identify unhealthy containers in SCM, if any.
Apache Jira: HDDS-12585
Cloudera Runtime 7.3.1.300 SP1 CHF 1
- CDPD-80823: Snapshot creation is removing extra keys
from the Active Object Storage's DB
- 7.3.1.300
- Wrong keys were trapped in the
DeletedTable of the snapshot if the OBS bucket name
was a prefix of another OBS bucket name, resulting in orphaned blocks. After
this fix, snapshot creation no longer removes extra keys from the
DeletedTable of the Active Object Store.
Apache Jira: HDDS-12611
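The following sketch illustrates how a bucket name that is a prefix of another bucket name can cause a prefix scan to match too many keys. It is only an illustration: the key layout (`/volume/bucket/key`), the table contents, and the helper `keys_with_prefix` are assumptions and do not reproduce the Ozone implementation.

```python
def keys_with_prefix(table: dict, prefix: str) -> list:
    """Return the sorted keys of `table` that start with `prefix`."""
    return sorted(k for k in table if k.startswith(prefix))


# Hypothetical deleted-key entries for two buckets: "logs" and "logs-archive".
deleted_table = {
    "/vol1/logs/a.txt": "block-1",
    "/vol1/logs-archive/b.txt": "block-2",
}

# Scan without a trailing delimiter: also matches the "logs-archive" bucket.
naive = keys_with_prefix(deleted_table, "/vol1/logs")

# Scan with a trailing delimiter: confined to the "logs" bucket.
delimited = keys_with_prefix(deleted_table, "/vol1/logs/")
```

In this sketch, the undelimited scan returns the keys of both buckets, while the delimited scan returns only the key that belongs to `logs`.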
- CDPD-73375: Publishing hadoop metrics immediately in
Prometheus sink fills up SinkQueue quickly
- 7.3.1.300
- The Prometheus sink already has a mechanism to
publish metrics every 10 seconds by default, using a callback driven by a
timer event. This issue is fixed by removing the code that published
metrics immediately.
Apache Jira: HDDS-12193
- CDPD-78671: Metric timer task is blocking
installSnapshotFromLeader on follower node
- 7.3.1.300
- ozone.om.snapshot.rocksdb.metrics.enabled is now
available in Ozone to disable metric collection on the snapshotted DB if
necessary.
Apache Jira: HDDS-11339
- CDPD-78781: Tarball creation interfering with snapshot
purge
- 7.3.1.300
- SnapshotDeletingService now synchronizes on
BootstrapStateHandler.Lock to make sure that no
background service is running while tarball creation is in
progress.
Apache Jira: HDDS-12210
Cloudera Runtime 7.3.1.200 SP1
- CDPD-74556: EC Checksum throws
IllegalArgumentException because the buffer limit is negative
- 7.3.1.200
- When
ozone.client.bytes.per.checksum is set to a low
value (for example, 16 KB), the parity checksum calculation during the
validation phase was incorrect, leading to an IllegalArgumentException on the
client. This is now fixed.
Apache Jira: HDDS-11482
- CDPD-75981: Default native ACL limits to user and
user's primary group
- 7.3.1.200
- The default native ACL created for an object such as a
volume, bucket, or file is now limited to the object owner and the owner's
primary group.
Apache Jira: HDDS-11656
- CDPD-72782: Ozone write does not work when http proxy
is set for the JVM
- 7.3.1.200
- gRPC uses HTTP internally for its connections. As a
result, if an HTTP proxy is configured for any Ozone process that uses gRPC,
every call is directed through the proxy, including gRPC calls, which is
undesirable for performance. This fix disables the proxy for the gRPC
connections that Ozone uses.
Apache Jira: HDDS-11257
- CDPD-65714: Allow FS client to specify EC as default
filesystem replication
- 7.3.1.200
- This fix allows you to specify EC as the default
replication type for a file uploaded through the Hadoop FileSystem API to
Ozone through a client side configuration option.
Apache Jira:
HDDS-10336
Cloudera Runtime 7.3.1.100 CHF 1
There are no fixed issues in this release.
Cloudera Runtime 7.3.1
- OPSAPS-71474: In the Cloudera Manager
UI, the Ozone service Snapshot tab displays the
label label.goToBucket, which must be changed to
Go to bucket.
- 7.3.1
- This issue is now resolved.
- OPSAPS-70288: Improvements in master node
decommissioning.
- 7.3.1
- This issue is now resolved by making usability and
functional improvements to the Ozone master node decommissioning.
- CDPD-74756: Update Ratis to 3.1.1
- 7.3.1
- Updated Ratis dependency version from 3.1.0 to 3.1.1.
Apache Jira:
HDDS-11504
- CDPD-74241: OmSnapshotPurge
should be in a different Ozone manager double buffer batch.
- 7.3.1
- This issue is now resolved.
Apache Jira:
HDDS-11453
- CDPD-74200: Recon UI shows incorrect data about
volume, bucket, and keys. Recon is unable to sync its data with OM DB.
- 7.3.1
- This issue is now resolved.
- CDPD-74074: The
/v1/triggerdbsync/om API works for a non-admin
user even if security is enabled.
- 7.3.1
- This issue is now resolved.
Apache Jira:
HDDS-11436
- CDPD-73775: Replace solr.version
with solr_lkgb_jar_version for Ozone to use downstream
version of Solr.
- 7.3.1
- Replaced pom solr.version with
solr_lkgb_jar_version for Ozone to use downstream
version of Solr.
- CDPD-73447: Incorrect number of deleted containers
shown in Recon UI.
- 7.3.1
- The addition of the EMPTY_MISSING
state to the retainOrUpdateRecord method enables Recon
to correctly identify and manage the lifecycle of these containers, ensuring
that no stale or deleted containers remain in memory or in Recon's
records.
- CDPD-73330: The namespace quota and namespace dist
commands fail and display the Path not found in the system
error for an existing volume or bucket.
- 7.3.1
- Fixed the Ozone admin namespace summary.
Apache Jira:
HDDS-10581
- CDPD-72142: Keys from
DeletedTable and
DeletedDirTable of Active Object Store (AOS) should
be deleted on batch operation while creating a Snapshot.
- 7.3.1
- On snapshot creation, the
DeletedTable and
DeletedDirTable of AOS are cleared. This operation
was not performed in the same transaction as the Snapshot create operation,
which can cause orphan block objects when a follower is bootstrapping or
lagging. This issue is now resolved: Snapshot creation and the clearing of the
DeletedTable and
DeletedDirTable on AOS are now a single batch operation.
Apache Jira:
HDDS-11183
- CDPD-72076: The OMDoubleBuffer
error is displayed when handling OMRequest:
cmdType:
SnapshotMoveDeletedKeys.
- 7.3.1
- This fixes the OM crash issue when the follower is
lagging and it executes purgeKeys or
snapshotMoveDeletedKeys for the Snapshot in one
transaction.
Apache Jira:
HDDS-11152
- CDPD-72019: Remove the locks from
SnapshotPurge and
SnapshotSetProperty APIs.
- 7.3.1
- This fixes the OM crash issue when the follower is
lagging and it executes purgeKeys or
snapshotMoveDeletedKeys for the Snapshot in one transaction.
Apache Jira:
HDDS-11137
- CDPD-71702: Ozone Manager is down due to Snapshot Chain
Corruption.
- 7.3.1
- SSTFilteringService directly updated the
snapshotInfoTable, which could corrupt the snapshot
chain if OM crashed before the DB was flushed for a snapshot purge and
SSTFilteringService had already updated the next snapshot
in the chain.
Apache Jira:
HDDS-11068
- CDPD-71584: Ozone Recon
DecomissioningInfo API displays the NPE
error.
- 7.3.1
- This issue is resolved by fixing the
NullPointerException when running the
DecomissioningInfo API.
Apache Jira:
HDDS-11045
- CDPD-71502: Ozone Recon - Decommissioned datanodes
show up even after removing it from the Recon Datanodes page.
- 7.3.1
- Recon previously allowed decommissioned datanodes to be
removed and deleted them from the Recon RocksDB nodes table. However,
decommissioned datanodes continue to send heartbeats until they are shut
down. These heartbeats caused the datanodes to be registered and added again
to the Recon memory map, so they showed up again in the datanodes UI. This
issue is now resolved: Recon allows only decommissioned datanodes to be
removed and skips datanodes with any other node status or node operational
status.
Apache Jira:
HDDS-11032
- CDPD-70469: Ozone Recon - Handle startup failure and
log reasons as error because SCM non-HA is enabled.
- 7.3.1
- This issue is now resolved by fixing the Recon startup
failure when SCM runs in non-HA mode.
Apache Jira:
HDDS-10937
- CDPD-68912: Ozone Recon - Improve Recon startup
failure handling.
- This issue is now resolved. Recon should recover from
runtime or unexpected failures during startup
and provide information on the Recon UI. Recon can fail to start for several
reasons:
- Failure to register datanodes, or an invalid topology.
- Failure to initialize pipelines.
Apache Jira:
HDDS-10702
- CDPD-67668: Ozone Recon - Provide DN decommissioning
detailed status and information in line with current CLI command output.
- 7.3.1
- This issue is resolved by adding an API in Recon for
DN decommissioning. The status and information are now in line with the
current CLI command output.
Apache Jira:
HDDS-10514
- CDPD-67460: Container Balancer should only move
containers with size greater than 0 bytes.
- 7.3.1
- This issue is now resolved by introducing a check on
the size of the containers allowed to leave the source node during the
balancing process.
Apache Jira:
HDDS-10483
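The kind of size check described above can be sketched as a simple filter. This is an illustration under stated assumptions: the `Container` type, its field names, and `movable_containers` are hypothetical and are not taken from the Container Balancer code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Container:
    container_id: int
    used_bytes: int


def movable_containers(containers):
    """Keep only containers that hold data (size greater than 0 bytes)."""
    return [c for c in containers if c.used_bytes > 0]


# Hypothetical candidates on a source node; containers 1 and 3 are empty.
candidates = [Container(1, 0), Container(2, 5 * 1024**3), Container(3, 0)]
selected = movable_containers(candidates)
```

Only container 2 is selected in this sketch, because the other two candidates hold 0 bytes.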
- CDPD-67278: Fix the DN links on the Ozone SCM UI. This
is a backport of KNOX-3012.
- 7.3.1
- A change in Ozone affected Knox on the Ozone SCM UI.
The links for the datanodes did not route through Knox. This issue is now
resolved and the DN links will redirect to the correct Knox URLs.
- CDPD-67095: DN URL in SCM Page through Knox redirects
to non-Knox URL.
- A change in Ozone affected Knox on the Ozone SCM UI.
The links for the datanodes did not route through Knox. With CDPD-67278 and
CDPD-69143, this issue is now resolved and the DN links will redirect to the
correct Knox URLs.
- CDPD-64874: Intermittent failure in
TestOzoneRpcClientAbstract.testListSnapshot.
- 7.3.1
- This issue is now resolved by fixing the intermittent
wrong data returned by the listSnapshot API. The
listSnapshot API uses the
org.apache.hadoop.ozone.om.ListIterator.MinHeapIterator,
which internally uses both CacheIterator and
DBIterator. DBIterator
checked whether a RocksDB key was present in the cache in
org.apache.hadoop.ozone.om.ListIterator.DbTableIter#getNextKey.
This check read the table cache, which may be intermittently flushed,
and could add a duplicate entry to
org.apache.hadoop.ozone.om.ListIterator.MinHeapIterator.
The check must instead use the pre-loaded keys in
org.apache.hadoop.ozone.om.ListIterator.CacheIter#cacheKeyMap
in org.apache.hadoop.ozone.om.ListIterator.CacheIter.
Apache Jira:
HDDS-9967
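The duplicate-entry problem and the pre-loaded key check can be sketched as a merge of two sorted key streams. This is a simplified illustration, not the Ozone code: `merge_listing` and the sample keys are hypothetical, and both inputs are assumed to be sorted.

```python
import heapq


def merge_listing(cache_keys, db_keys):
    """Merge two sorted key streams, emitting each key once."""
    # Snapshot of the cache contents, taken once before iteration starts.
    preloaded = set(cache_keys)
    # DB entries already covered by the pre-loaded snapshot are skipped, so a
    # later cache flush cannot make the same key appear from both sources.
    db_only = (k for k in db_keys if k not in preloaded)
    return list(heapq.merge(sorted(preloaded), db_only))


cache = ["snap-b", "snap-d"]
# After a flush, "snap-b" is present in the DB as well.
db = ["snap-a", "snap-b", "snap-c"]
result = merge_listing(cache, db)
```

Because the check uses the snapshot of keys taken up front rather than the live cache, `snap-b` appears only once in the merged result.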
- CDPD-64815: NSSummary commands
should close OzoneClient.
- 7.3.1
- NSSummaryAdmin creates an
OzoneClient for some bucket-related checks. This
fix makes the following changes:
- Closes the client when it is no longer needed.
- Reuses the client (or even the bucket after lookup) for all checks.
Apache Jira:
HDDS-9944
- CDPD-64209: Ozone Recon - Potential memory overflow in
Container Health Task.
- 7.3.1
- This issue is now resolved by fixing the potential
memory overflow in the Container Health Task of Recon.
Apache Jira:
HDDS-9819
- CDPD-63596: Do not include
SpotBugs at compile scope.
- 7.3.1
- This issue is now resolved by removing
spotbugs-annotation, an LGPL third-party dependency, from the Ozone package.
Apache Jira:
HDDS-9692
- CDPD-62991: Recon UI - Bucket Drop down filter is not
getting disabled when more than 1 volume is selected. This is a backport of
HDDS-9556.
- 7.3.1
- This issue is now resolved.
Apache Jira:
HDDS-9556
- CDPD-62931: Incorrect pipeline ID for closed
container.
- 7.3.1
- This issue is now resolved.
Apache Jira:
HDDS-9544
- CDPD-62925: Ozone debug chunkinfo
command shows incorrect number of entries.
- 7.3.1
- This issue is now resolved.
Apache Jira:
HDDS-9542
- CDPD-62471: Recon UI - Disk Usage page should reflect
the information it displays.
- 7.3.1
- This issue is now resolved.
Apache Jira:
HDDS-9465
- CDPD-62466: Improve thread names in Recon.
- 7.3.1
- This issue is resolved by improving the thread naming
in the Recon process:
- Pass Recon as a thread name prefix in Recon.
- Ensure that all other threads created in Recon code also include Recon in
their name.
Apache Jira:
HDDS-9470
- CDPD-61700: Ozone debug chunkinfo
shows incorrect block path for some nodes in a phatcat cluster.
- 7.3.1
- This issue is now resolved.
Apache Jira:
HDDS-9356
- CDPD-60647: Snapshot purge should be an atomic
operation.
- 7.3.1
- This issue is resolved by fixing the OM crash issue
when the follower is lagging and it executes purgeKeys
or snapshotMoveDeletedKeys for the Snapshot in one
transaction.
Apache Jira:
HDDS-9198
- CDPD-51724: SCM should avoid sending delete
transactions for under-replicated containers.
- 7.3.1
- This issue is now resolved.
Apache Jira:
HDDS-4368