Fixed issues in Ozone
Review the list of Ozone issues that are resolved in Cloudera Runtime 7.3.1.
- OPSAPS-71474: In the Cloudera Manager UI, the Ozone service Snapshot tab displays the raw label label.goToBucket instead of Go to bucket.
- This issue is now resolved.
- OPSAPS-70288: Improvements in master node decommissioning.
- This issue is now resolved by making usability and functional improvements to the Ozone master node decommissioning.
- CDPD-74756: Update Ratis to 3.1.1
- Updated Ratis dependency version from 3.1.0 to 3.1.1.
Apache Jira: HDDS-11504
- CDPD-74241: OmSnapshotPurge should be in a different Ozone manager double buffer batch.
- This issue is now resolved.
Apache Jira: HDDS-11453
- CDPD-74200: Recon UI shows incorrect data about volume, bucket, and keys. Recon is unable to sync its data with OM DB.
- This issue is now resolved.
- CDPD-74074: The /v1/triggerdbsync/om API works for non-admin users even when security is enabled.
- This issue is now resolved.
Apache Jira: HDDS-11436
- CDPD-73775: Replace solr.version with solr_lkgb_jar_version for Ozone to use downstream version of Solr.
- Replaced pom solr.version with solr_lkgb_jar_version for Ozone to use downstream version of Solr.
- CDPD-73447: Incorrect number of deleted containers shown in Recon UI.
- The addition of the EMPTY_MISSING state to the retainOrUpdateRecord method enables Recon to correctly identify and manage the lifecycle of these containers, ensuring that no stale or deleted containers remain in memory or in Recon's records.
- CDPD-73330: The namespace quota and namespace dist commands fail and display the Path not found in the system error for an existing volume or bucket.
- Fixed the Ozone admin namespace summary.
Apache Jira: HDDS-10581
- CDPD-72142: Keys from DeletedTable and DeletedDirTable of Active Object Store (AOS) should be deleted on batch operation while creating a Snapshot.
- On snapshot creation, the DeletedTable and DeletedDirTable of AOS are cleared. This operation was not performed in the same transaction as the Snapshot create operation, which could cause orphan block objects in case of bootstrapping and a lagging follower. This issue is now resolved: Snapshot creation and clearing of the DeletedTable and DeletedDirTable on AOS are now a single batch operation.
Apache Jira: HDDS-11183
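The single-batch behavior described for CDPD-72142 can be illustrated with a minimal sketch. It is not Ozone code: TinyStore, commit_batch, and the in-memory dictionaries are hypothetical stand-ins, and only the table names come from the text above. The idea shown is that all operations of a batch are staged together and become visible together, or not at all.

```python
import copy


class TinyStore:
    """Hypothetical in-memory store; only the table names come from the release note."""

    def __init__(self):
        self.tables = {
            "snapshotInfoTable": {},
            "DeletedTable": {},
            "DeletedDirTable": {},
        }

    def commit_batch(self, operations):
        # Stage every operation on a private copy of the tables.
        staged = copy.deepcopy(self.tables)
        for operation in operations:
            operation(staged)  # an exception here leaves self.tables untouched
        # Publish all staged changes in one step.
        self.tables = staged


def create_snapshot(tables):
    tables["snapshotInfoTable"]["snap1"] = {"status": "ACTIVE"}


def clear_deleted_tables(tables):
    tables["DeletedTable"].clear()
    tables["DeletedDirTable"].clear()


store = TinyStore()
store.tables["DeletedTable"]["/vol1/bucket1/key1"] = "block-1"
store.tables["DeletedDirTable"]["/vol1/bucket1/dir1"] = "dir-1"

# Snapshot creation and table clearing are committed as one batch.
store.commit_batch([create_snapshot, clear_deleted_tables])
print(store.tables)
```

If any operation in the list raises, the store keeps its previous contents, so a reader never observes a created snapshot alongside uncleared tables, or the reverse.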
- CDPD-72076: The OMDoubleBuffer error is displayed when handling OMRequest: cmdType: SnapshotMoveDeletedKeys.
- This fixes the OM crash issue when the follower is lagging and it executes purgeKeys or snapshotMoveDeletedKeys for the Snapshot in one transaction.
Apache Jira: HDDS-11152
- CDPD-72019: Remove the locks from SnapshotPurge and SnapshotSetProperty APIs.
- This fixes the OM crash issue when the follower is lagging and it
executes purgeKeys or
snapshotMoveDeletedKeys for the Snapshot in one transaction.
Apache Jira: HDDS-11137
- CDPD-71702: Ozone Manager is down due to Snapshot Chain Corruption.
- SSTFilteringService directly updates the snapshotInfoTable, which can cause snapshot chain corruption if OM crashes before the DB is flushed for snapshot purge and SSTFilteringService has already updated the next snapshot in the chain.
Apache Jira: HDDS-11068
- CDPD-71584: Ozone Recon DecomissioningInfo API throws an NPE error.
- This issue is resolved by fixing the NullPointerException that occurred when running the DecomissioningInfo API.
Apache Jira: HDDS-11045
- CDPD-71502: Ozone Recon - Decommissioned datanodes show up even after removing them from the Recon Datanodes page.
- Recon previously allowed decommissioned datanodes to be removed and deleted them from the Recon RocksDB nodes table. However, decommissioned datanodes continue to send heartbeats until they are shut down. These heartbeats are registered and added again to the Recon memory map, which makes the datanodes show up again in the datanodes UI. This issue is now resolved: Recon allows only decommissioned datanodes to be removed and skips datanodes in any other node status or node operational status.
Apache Jira: HDDS-11032
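The removal rule described for CDPD-71502 can be sketched as a simple filter. This is an illustration under stated assumptions, not the Recon implementation: the OpState enum, the remove_datanodes function, and the dictionary used as a node table are all hypothetical.

```python
from enum import Enum


class OpState(Enum):
    """Hypothetical subset of node operational states, for illustration only."""
    IN_SERVICE = "IN_SERVICE"
    DECOMMISSIONING = "DECOMMISSIONING"
    DECOMMISSIONED = "DECOMMISSIONED"


def remove_datanodes(node_table, requested_ids):
    """Remove only decommissioned nodes; report every other request as skipped."""
    removed, skipped = [], []
    for node_id in requested_ids:
        if node_table.get(node_id) is OpState.DECOMMISSIONED:
            del node_table[node_id]
            removed.append(node_id)
        else:
            # Unknown nodes and nodes in any other state are left alone.
            skipped.append(node_id)
    return removed, skipped


nodes = {
    "dn1": OpState.IN_SERVICE,
    "dn2": OpState.DECOMMISSIONED,
    "dn3": OpState.DECOMMISSIONING,
}
removed, skipped = remove_datanodes(nodes, ["dn1", "dn2", "dn3", "dn9"])
print(removed, skipped)
```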
- CDPD-70469: Ozone Recon - Handle startup failure and log reasons as error because SCM non-HA is enabled.
- This issue is now resolved by fixing the Recon startup failure when SCM
runs in non-HA mode.
Apache Jira: HDDS-10937
- CDPD-68912: Ozone Recon - Improve Recon startup failure handling.
- This issue is now resolved. Recon is expected to recover from runtime or unexpected failures during startup and provide information on the Recon UI. Recon can fail to start for several reasons:
- Failure to register datanodes, or an invalid topology.
- Failure to initialize pipelines.
Apache Jira: HDDS-10702
- CDPD-67668: Ozone Recon - Provide DN decommissioning detailed status and information in line with the current CLI command output.
- This issue is resolved by adding an API in Recon for DN decommissioning. Status and information are now in line with the current CLI command output.
Apache Jira: HDDS-10514
- CDPD-67460: Container Balancer should only move containers with size greater than 0 bytes.
- This issue is now resolved by introducing a check on the size of the
containers allowed to leave the source node during the balancing
process.
Apache Jira: HDDS-10483
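The size check described for CDPD-67460 amounts to filtering candidate containers before a move is considered. The sketch below is hypothetical: the Container class, its used_bytes field, and movable_containers are illustrative names, not the Container Balancer API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Container:
    """Hypothetical container record used only for this illustration."""
    container_id: int
    used_bytes: int


def movable_containers(containers):
    """Keep only containers whose size is greater than 0 bytes."""
    return [c for c in containers if c.used_bytes > 0]


candidates = [
    Container(1, 0),
    Container(2, 4096),
    Container(3, 0),
    Container(4, 1),
]
selected = movable_containers(candidates)
print([c.container_id for c in selected])
```

Empty containers never enter the selection, so moving them cannot consume balancing iterations without changing disk utilization.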
- CDPD-67278: Fix the DN links on the Ozone SCM UI. This is a backport of KNOX-3012.
- A change in Ozone affected Knox on the Ozone SCM UI. The links for the datanodes did not route through Knox. This issue is now resolved and the DN links will redirect to the correct Knox URLs.
- CDPD-67095: DN URL in SCM Page through Knox redirects to non-Knox URL.
- A change in Ozone affected Knox on the Ozone SCM UI. The links for the datanodes did not route through Knox. With CDPD-67278 and CDPD-69143, this issue is now resolved and the DN links will redirect to the correct Knox URLs.
- CDPD-64874: Intermittent failure in TestOzoneRpcClientAbstract.testListSnapshot.
- This issue is now resolved by fixing the intermittent wrong data returned by the listSnapshot API. The listSnapshot API uses org.apache.hadoop.ozone.om.ListIterator.MinHeapIterator, which internally uses both CacheIterator and DBIterator. DBIterator had the logic of checking whether a RocksDB key is present in the cache in org.apache.hadoop.ozone.om.ListIterator.DbTableIter#getNextKey. This check reads the table cache, which may be flushed intermittently, and that leads to duplicate entries being added to org.apache.hadoop.ozone.om.ListIterator.MinHeapIterator. The pre-loaded keys in org.apache.hadoop.ozone.om.ListIterator.CacheIter#cacheKeyMap in org.apache.hadoop.ozone.om.ListIterator.CacheIter must be used for this check instead.
Apache Jira: HDDS-9967
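The duplicate-entry problem described for CDPD-64874 can be illustrated with a minimal sketch of a min-heap merge over a cache iterator and a DB iterator. It is not the Ozone implementation: merged_listing, the dictionaries, and the key names are hypothetical, and only the idea of checking against pre-loaded cache keys (the role the text gives to cacheKeyMap) comes from the release note.

```python
import heapq


def merged_listing(db_entries, live_cache):
    """Merge cache and DB keys in sorted order without duplicates."""
    # Copy the cache keys once, up front. Later flushes of live_cache
    # cannot change this copy.
    cache_key_map = dict(live_cache)

    def cache_iter():
        for key in sorted(cache_key_map):
            yield key, "cache"

    def db_iter():
        for key in sorted(db_entries):
            # Check the pre-loaded copy, not the live cache, so a key that
            # the cache iterator will emit is never emitted a second time.
            if key not in cache_key_map:
                yield key, "db"

    return heapq.merge(cache_iter(), db_iter())


db = {"snap1": 1, "snap2": 2, "snap3": 3}
cache = {"snap2": 2, "snap4": 4}

listing = merged_listing(db, cache)
first = next(listing)
cache.clear()  # simulate the table cache being flushed mid-listing
rest = list(listing)

keys = [key for key, _source in [first] + rest]
print(keys)
```

Had db_iter consulted the live cache after the simulated flush, snap2 would have been emitted by both iterators.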
- CDPD-64815: NSSummary commands should close OzoneClient.
- NSSummaryAdmin creates an OzoneClient for some bucket-related checks. This issue is now resolved with the following changes:
- The client is closed when it is no longer needed.
- The client (or even the bucket, after lookup) is reused for all checks.
Apache Jira: HDDS-9944
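The two changes described for CDPD-64815 (close the client, reuse it for all checks) follow a common resource-handling pattern. The sketch below shows that pattern only; FakeClient, lookup_bucket, run_bucket_checks, and the layout value are hypothetical and do not represent the OzoneClient API.

```python
from contextlib import closing


class FakeClient:
    """Hypothetical stand-in for a client that holds resources."""
    instances = 0

    def __init__(self):
        FakeClient.instances += 1
        self.closed = False

    def lookup_bucket(self, volume, bucket):
        # Illustrative payload; a real client would contact the service.
        return {"volume": volume, "bucket": bucket, "layout": "FILE_SYSTEM_OPTIMIZED"}

    def close(self):
        self.closed = True


def run_bucket_checks(volume, bucket):
    client = FakeClient()
    # closing() calls client.close() on exit, even if a check raises.
    with closing(client):
        info = client.lookup_bucket(volume, bucket)  # one lookup, reused below
        checks = {
            "exists": info is not None,
            "is_fso": info["layout"] == "FILE_SYSTEM_OPTIMIZED",
        }
    return checks, client


checks, client = run_bucket_checks("vol1", "bucket1")
print(checks, client.closed, FakeClient.instances)
```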
- CDPD-64209: Ozone Recon - Potential memory overflow in Container Health Task.
- This issue is now resolved by fixing the potential memory overflow in the Container Health Task of Recon.
Apache Jira: HDDS-9819
- CDPD-63596: Do not include SpotBugs at compile scope.
- This issue is now resolved by removing spotbugs-annotation, an LGPL third-party dependency, from the Ozone package.
Apache Jira: HDDS-9692
- CDPD-62991: Recon UI - Bucket Drop down filter is not getting disabled when more than 1 volume is selected. This is a backport of HDDS-9556.
- This issue is now resolved.
Apache Jira: HDDS-9556
- CDPD-62931: Incorrect pipeline ID for closed container.
- This issue is now resolved.
Apache Jira: HDDS-9544
- CDPD-62925: Ozone debug chunkinfo command shows incorrect number of entries.
- This issue is now resolved.
Apache Jira: HDDS-9542
- CDPD-62471: Recon UI - Disk Usage page should reflect the information it displays.
- This issue is now resolved.
Apache Jira: HDDS-9465
- CDPD-62466: Improve thread names in Recon.
- This issue is resolved by improving thread naming in the Recon process:
- Pass Recon as a thread name prefix in Recon.
- Ensure all other threads created in Recon code also include Recon in their name.
Apache Jira: HDDS-9470
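The thread naming convention described for CDPD-62466 can be sketched as follows. This only illustrates the prefixing idea: start_named_thread and the role name are hypothetical, and Recon itself is not written in Python.

```python
import threading


def start_named_thread(prefix, role, target):
    """Start a thread whose name always begins with the given prefix."""
    thread = threading.Thread(target=target, name=f"{prefix}-{role}")
    thread.start()
    return thread


seen_names = []


def record_name():
    # Each worker reports the name it runs under.
    seen_names.append(threading.current_thread().name)


worker = start_named_thread("Recon", "ContainerHealthTask", record_name)
worker.join()
print(seen_names)
```

A consistent prefix makes it easier to pick out the threads of one service in thread dumps and logs.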
- CDPD-61700: Ozone debug chunkinfo shows incorrect block path for some nodes in a phatcat cluster.
- This issue is now resolved.
Apache Jira: HDDS-9356
- CDPD-60647: Snapshot purge should be an atomic operation.
- This issue is resolved by fixing the OM crash issue when the follower is
lagging and it executes purgeKeys or
snapshotMoveDeletedKeys for the Snapshot in one
transaction.
Apache Jira: HDDS-9198
- CDPD-51724: SCM should avoid sending delete transactions for under-replicated containers.
- This issue is now resolved.
Apache Jira: HDDS-4368