Fixed issues in Ozone parcel 718.2.0
You can review the list of reported issues and their fixes in Ozone parcel 718.2.0. Fixed issues are selected issues that were previously logged through Cloudera Support and are now addressed in the current Ozone parcel release. They may have been listed as known issues in previous versions of Runtime, meaning they were reported by customers or identified by Cloudera Quality Engineering teams.
- CDPD-56006: When an incorrect hostname or service ID is provided in the Ozone URI, the filesystem client does not fail immediately. Instead, it retries until the retry limit is exhausted, and the default retry count is too high.
- This issue is resolved.
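The behavior change described in CDPD-56006 can be pictured with a small, language-neutral sketch. The Python below is not Ozone client code: `connect_with_retries`, `resolve`, `connect`, and `max_retries` are hypothetical names chosen for illustration. It assumes that an unresolvable host is a permanent error that should fail at once, while transient connection errors are retried only a small, bounded number of times.

```python
class UnknownHostError(Exception):
    """Raised when the host in the URI cannot be resolved (permanent error)."""


def connect_with_retries(host, resolve, connect, max_retries=3):
    """Fail fast on an unresolvable host; retry only transient failures.

    `resolve` and `connect` are caller-supplied callables (hypothetical):
    `resolve(host)` returns an address or None, and `connect(address)`
    returns a connection or raises ConnectionError.
    """
    address = resolve(host)
    if address is None:
        # A wrong hostname or service ID will not fix itself, so do not retry.
        raise UnknownHostError(f"cannot resolve host: {host}")

    last_error = None
    for _ in range(max_retries):
        try:
            return connect(address)
        except ConnectionError as err:
            last_error = err  # transient failure: try again, up to the bound
    raise ConnectionError(f"gave up after {max_retries} attempts") from last_error
```

With this shape, a mistyped host surfaces as an error on the first call rather than after a long retry loop, and the retry bound applies only to failures that may plausibly succeed on a later attempt.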
Bug Fixes:
- HDDS-8495. Fix permissions and path handling for ozone sh token get command
- HDDS-8320. Fix ranger jackson version conflict
- HDDS-8173. Fix to remove entries from RocksDB after container gets deleted.
- HDDS-8068. Fix Exception: JMXJsonServlet, getting attribute RatisRoles of Hadoop:service=OzoneManager.
- HDDS-8054. Fix NPE in metrics for failed volume
- HDDS-8221. Fix misplaced OzoneKey javadoc comment
- HDDS-8217. Fix OmBucketInfo#equals for comparing defaultReplicationConfig
- HDDS-8183. Fix edge case where delimiter is empty string in BucketEndpoint#get
- HDDS-8091. [addendum] Generate list of config tags from ConfigTag enum - Hadoop 3.1 compatibility fix
- HDDS-7721. Make OM Ratis roles available in /prom endpoint (fix compile error)
- HDDS-7596. Fix OM crash due to a corner case for FSO-enabled bucket
- HDDS-7871. Fix false positive in KeyManagerImpl#createFakeDirIfShould()
- HDDS-7372. Fix missing jars in classpath by specifying jar versions
- HDDS-7792. Fix package name typos in o.a.h.hdds.security.x509
- HDDS-7705. Fix OM Bootstrap request
- HDDS-7576. Prometheus metrics do not remove stale metrics until restart
- HDDS-7644. S3 multipart upload does not update quota namespace for missing parents
- HDDS-7641. Namespace quota validation is not present in multiple places
- HDDS-7696. MisReplicationHandler does not consider QUASI_CLOSED replicas as sources
- HDDS-7633. Compile error with Java 11: package com.sun.jmx.mbeanserver is not visible
- HDDS-7652. Volume Quota not enforced during write when bucket quota is not set
- HDDS-7801. Bucket not found when calling getKeyInfo with tenant context
- HDDS-7859. ICR processing does not remove container reference in NodeManager for a deleted replica
- HDDS-7838. gRPC channel created for block input/output stream not shut down properly
- HDDS-7969. CacheValue should not store value as an Optional.
- HDDS-7991. Do not return fake parent dir for deleted keys
- HDDS-8009. OM HA metrics should be unregistered if leader is not known
- HDDS-8029. [hsync] Output stream in encrypted buckets does not return the correct stream capabilities.
- HDDS-8070. DBCheckpointMetrics is not unregistered during OM stop.
- HDDS-6176. Ozone service WebUI is not accessible with 404 error.
- HDDS-8108. gRPC channel created for replication client not shutdown properly
- HDDS-7930. [addendum] input stream does not refresh expired block token.
- HDDS-7930. input stream does not refresh expired block token.
- HDDS-8139. Datanodes should not drop block delete transactions based on transaction ID
- HDDS-8275. ManagedWriteBatch is not closed properly in SCMHADBTransactionBuffer
- HDDS-8258. RocksIterator not closed properly in KeyValueHandler.logBlocksIfNonZero
- HDDS-8257. RocksIterator not closed properly in LegacyBucketHandler
- HDDS-8242. Rename operation not working with FSO bucket destination
- HDDS-8496. S3 to return not found for object head/set when keyinfo indicates a directory.
- HDDS-8324. DN data cache gets removed randomly asking for data from disk.
- HDDS-7842. java.lang.NullPointerException: hadoop.ozone.container.keyvalue.statemachine.background.StaleRecoveringContainerScrubbingService.
- HDDS-8045. Dependency convergence error for zookeeper.
- HDDS-7754. Download of container is failing with SSL/TLS error during re-replication
- HDDS-7455. ClassCastException: OzoneTokenIdentifier cannot be cast to String
- HDDS-8440. Ozone Manager crashed with ClassCastException when deleting FSO bucket.
- HDDS-8195. RDBStore.getUpdatesSince() throws RocksDBException: Requested array size exceeds VM limit.
- HDDS-8220. [Ozone-Streaming] Trigger volume check on IOException in StreamDataChannelBase
- HDDS-8057. Handle unchecked exception in KeyValueHandler more gracefully.
- HDDS-8019. IllegalStateException: call already closed at GrpcXceiverService.onCompleted.
- HDDS-7926. [hsync] Recon throws ClassCastException.
- HDDS-7714. Docker cluster ozone-om-ha fails during docker-compose up.
Upgrades:
- HDDS-8512. Bump Spring Framework to 5.3.27
- HDDS-8461. Bump jetty to 9.4.51.v20230217
- HDDS-8394. Upgrade Spring Framework to 5.3.26 due to CVE-2023-20861
- HDDS-8288. Upgrade moment.js to 2.29.4
- HDDS-7839. Upgrade Weld to 3.1.9
- HDDS-8263. [JDK17] Bump guice to 5.1.0, maven-shade-plugin to 3.4.1, remove guice-multibindings
- HDDS-8086. Bump snakeyaml from 1.33 to 2.0
- HDDS-7850. Bump maven-enforcer-plugin version to 3.2.1
- HDDS-7718. Bump Netty to 4.1.86 and gRPC to 1.51.1
- HDDS-8022. Add okio as a dependency to the root pom.
Container Balancer:
- HDDS-8167. Inject MoveManager into ContainerBalancer.
- HDDS-8153. Integrate ContainerBalancer with MoveManager
- HDDS-8169. Delay Starting ContainerBalancer after SCM failover.
Erasure Coding improvements:
- HDDS-7122. Add validation for EC chunk size
- HDDS-7825. Warn when EC write exception occurs
- HDDS-7787. GetChecksum for EC files can fail intermittently with IndexOutOfBounds exception
- HDDS-7918. EC: ECBlockReconstructedStripeInputStream should check for spare replicas before failing an index
- HDDS-8216. EC: OzoneClientConfig is overwritten in ECKeyOutputStream
- HDDS-6572. EC: ReplicationManager - add move manager for container move
- HDDS-7931. EC: ManagedChannelImpl not cleaned up properly
- HDDS-7919. EC: ECPipelineProvider.createForRead should filter out dead replicas and sort replicas
- HDDS-7928. EC: Change ContainerReplicaPendingOps to store deadline rather than scheduled time
- HDDS-7923. [EC] Reconstruction is failing with IndexOutOfBoundsException
- HDDS-7917. EC: ECBlockInputStream should try spare replicas on error
- HDDS-7844. EC: Add normal and low priority to replication supervisor and commands
- HDDS-7841. EC: Remove ECReconstructionSupervisor and send reconstruction commands to ReplicationSupervisor
- HDDS-7833. EC: Refactor ReplicationSupervisor to allow Replication and ECReconstruction tasks
- HDDS-7761. EC: ReplicationManager - Use placementPolicy.replicasToRemoveToFixOverreplication in EC Over replication handler
- HDDS-7775. EC: Exception encountered while deleting UNHEALTHY replica in Datanode
- HDDS-7695. EC metrics related to replication commands don't add up
- HDDS-7729. EC: ECContainerReplicaCount should handle pending delete of unhealthy replicas
- HDDS-7727. EC: SCM unregistered event handler for DatanodeCommandCountUpdated
- HDDS-7666. EC: Unrecoverable EC containers with some remaining replicas may block decommissioning
- HDDS-7683. EC: ReplicationManager - UnderRep maintenance handler should not request nodes if none needed
- HDDS-7654. EC: ReplicationManager - merge mis-rep queue into under replicated queue
- HDDS-8287. Add toString() method to ECContainerReplicaCount
- HDDS-8172. ECUnderReplicationHandler should consider commands already sent when processing the container
- HDDS-8112. ECReconstructionCoordinator is not closed
- HDDS-8075. ECReconstructionCoordinatorTask.runTask should catch Exception
- HDDS-7649. S3 multipart upload EC release space quota wrong for old version
Test improvements:
- HDDS-8374. Fix flaky TestContainerStateCounts in Recon
- HDDS-8274. Intermittent timeout in acceptance-MR test setup
- HDDS-8219. Run HA secure tests from s3g container
- HDDS-7874. Disable flaky unit test: TestHddsSecureDatanodeInit.testCertificateRotationRecoverableFailure
- HDDS-7975. Rebalance acceptance tests
- HDDS-8152. Reduce S3 acceptance test setup time
- HDDS-8205. Reorder OM nodes in HA acceptance tests
- HDDS-7958. Ozone client not closed in integration tests
- HDDS-8144. TestDefaultCertificateClient#testTimeBeforeExpiryGracePeriod fails as we approach DST.
- HDDS-8150. RpcClientTest and ConfigurationSourceTest not run due to naming convention
- HDDS-8087. Intermittent crash in TestHddsDatanodeService
- HDDS-8024. Mark TestHSync#testOfsHSync as flaky
- HDDS-8035. Mark TestHSync#testO3fsHSync as flaky
- HDDS-8035. Mark TestOzoneManagerHAWithData#testOMHAMetrics as flaky
- HDDS-8012. Clean up after link loop test
- HDDS-7983. Intermittent OutOfMemoryError in TestOzoneRpcClientWithRatis#testUploadWithStreamAndMemoryMappedBuffer.
- HDDS-7588. Intermittent failure in TestObjectStoreWithLegacyFS#testFlatKeyStructureWithOBS
- Revert "HDDS-7588. Intermittent failure in TestObjectStoreWithLegacyFS#testFlatKeyStructureWithOBS"
- HDDS-7988. Run S3 tests with HA Proxy
- HDDS-7617. Remove flaky tag from TestECContainerRecovery
- HDDS-7617. Intermittent timeout in TestECContainerRecovery
- HDDS-7921. Migrate TestKeyManagerImpl to JUnit5
- HDDS-7617. Mark TestECContainerRecovery as flaky
- HDDS-3265. Intermittent timeout in TestRatisPipelineLeader
- HDDS-7864. Add integration test for replication
- HDDS-7881. Intermittent timeout in basic acceptance test
- HDDS-7886. Mark parts of TestRatisPipelineCreateAndDestroy as flaky
- HDDS-7868. Intermittent failure in TestOmAcls
- HDDS-7879. Intermittent BindException in HA integration tests
- HDDS-5626. Mark parts of TestAddRemoveOzoneManager, TestBlockOutputStreamFlushDelay and TestFailureHandlingByClient as flaky
- HDDS-7874. Disable flaky unit test: TestHddsSecureDatanodeInit.testCertificateRotation
- HDDS-7856. Fix timeout in TestPushReplicator
- HDDS-7856. Disable flaky TestPushReplicator until fixed
- HDDS-7806. Add unit tests for push replication
- HDDS-7808. Intermittent failure in TestReplicationManager#testUnderReplicationQueuePopulated
- HDDS-7719. [HTTPFSGW] Fix secure integration tests for the HttpFS module
- HDDS-7628. Intermittent failure in TestOzoneContainerWithTLS
- HDDS-7588. Intermittent failure in TestObjectStoreWithLegacyFS#testFlatKeyStructureWithOBS
- HDDS-5698 [HTTPFSGW] Port HTTPFS node and robot tests to ozone-ha, and ozonesecure(-ha)
- HDDS-5615 Add a simple test suite for HTTPFS GW.
- HDDS-8244. Selective checks: handle change in junit.sh
Logging improvements:
- HDDS-8148. Improve log for Pipeline creation failure
- HDDS-8245. Info log for keyDeletingService when nonzero number of keys are deleted.
- HDDS-7867. Clean up replication logs
- HDDS-8131. Add Configuration for OM Ratis Log Purge Tuning Parameters.
- HDDS-7869. Log configuration on component startup.
- HDDS-7959. Improve log in ECBlockInputStream and ECBlockReconstructedStripeInputStream
- HDDS-8037. Improve logging in EC Reconstruction putBlock precondition check
- HDDS-7857. Apply custom RocksDB log configuration for all schema versions
- HDDS-7082. Delete out of date audit logs
- HDDS-7097. Container scanner log output lacks useful information
- HDDS-7753. Simplify DatanodeDetails#toString to improve log messages
- HDDS-7726. EC: Enhance datanode reconstruction log message
- HDDS-7739. EC: Increase the information in the RM sending command log message
- HDDS-7716. Log read requests rejected with permission denied in OM audit
- HDDS-7631. Log format error on quotas when exceeding the space quota
HTTPFS protocol support:
- HDDS-7874. Disable Httpfs path check & mark corresponding APIs unsupported
- Revert "HDDS-5447. HttpFS support in Ozone"
- HDDS-5447. HttpFS support in Ozone
- HDDS-8088. [HTTPFSGW] Conform with new checkstyle rule in master #4354
- HDDS-8023. [HTTPFSGW] Remove support for satisfyStoragePolicy operation
- Change ozone-filesystem dependency scope to runtime in the HttpFS module
- HDDS-7737. [HTTPFSGW] Clean up dependencies
- HDDS-5828 [HTTPFSGW] Add proper handling for unsupported operations
- HDDS-6027 [HTTPFSGW] Fix dependency issues after master merge
- Revert "Quick fix for HTTPFS gateway dependencies after merge... to be revisited."
- Quick fix for HTTPFS gateway dependencies after merge... to be revisited.
- HDDS-5827 [HTTPFSGW] Remove non-server side related code from Ozone.
- HDDS-5829 [HTTPFSGW] Move to org.apache.ozone package from org.apache.hadoop.
- HDDS-5722 [HTTPFSGW] junit.jar and json-simple in jar report
- HDDS-5826 [HTTPFSGW] Remove or replace Hadoop shaded guava dependencies.
- Revert "[HTTPFSGW] HDDS-5695 Review ZK and Curator dependencies, and get rid of them."
- [HTTPFSGW] HDDS-5695 Review ZK and Curator dependencies, and get rid of them.
- HDDS-5520 Add lifecycle management to HttpFS server
- HDDS-5519 Remove unnecessary hadoop dependencies from httpfs module
- HDDS-5448 Copy over HttpFS module from Hadoop 3.3.1 to Ozone
Recon:
- HDDS-8127. Exclude deleted containers from Recon container count.
- HDDS-5331. Recon: Trigger PipelineSyncTask when DN becomes stale and ContainerHealthTask when DN becomes dead.
- HDDS-6056. Recon /containers endpoint should return SCM container data instead of OM container data.
- HDDS-7505. Recon: Rename OM DB Sync to DB Sync on UI.
- HDDS-7634. Recon: Show Datanode UUID on Pipeline page
- HDDS-7770. Recon namespace summary endpoint to carry basic entity information as well
- HDDS-7854. Recon NPE when trace enabled
- HDDS-4539. Container Health Task should not run until Recon has reached steady state.
- HDDS-8272. DBStore not closed properly in ReconStorageContainerManagerFacade
- HDDS-3486. Recon cannot track missing containers that were created and went missing while it is down.
Documentation:
- HDDS-8016. Updated the ozone doc for linked bucket and deletion async limitation.
- HDDS-7540. Document separate scheduled CI.
- HDDS-7409 [doc] Update documents for better presentation.
- HDDS-7684. Embed Matomo Web Analytics tracking code in docs.
- HDDS-7667. [Admin Doc] Observability: Add Grafana integration
- HDDS-7747. Document jq filtering examples for CLI responses.
- HDDS-7774. Update outdated Trash documentation
- HDDS-7556. Translate EC doc into Chinese
- HDDS-5966. [HTTPFSGW] Update module doc, and place it in Ozone project docs
CLI commands:
- HDDS-8042. Display certificate issuer in cert list command.
- HDDS-8168. Make deadlines inside MoveManager for move commands configurable
- HDDS-8233. ReplicationManager: Throttle delete container commands from over-replication handlers
- HDDS-8187. ReplicationManager: Datanode commands should be sent to nodeManager directly
- HDDS-6802. Implement JSON output for pipeline list CLI.
- HDDS-8241. Transfer leader command doesn't work in secure cluster
- HDDS-8158. Replication Manager: Make all handlers send commands immediately instead of returning commands.
- HDDS-8133. Create ozone sh key checksum command.
- HDDS-534. Remove unsupported jmxget subcommand.
- HDDS-7783. Add compile platform info in 'ozone version' command output
- HDDS-7137. Add CLI for Getting the failed deleted block txn.
- HDDS-3591. ozone.administrators support dynamic changes through the cli.
- HDDS-8039. Allow container inspector to run from ozone debug.
- HDDS-7489. Create a Freon tool to simulate datanodes
Security:
- HDDS-5043: Consistently use user's short name in secure mode
- HDDS-7697: Restrict change of bucket properties to owner and admins in NativeACL
- HDDS-7379. Use certificate bundles instead of the sole certificate
- HDDS-7399. Enable specifying external root ca
- HDDS-7398. Tool to remove old certs from the SCM db
- HDDS-8549. Restore client-side validation of bucket names
- HDDS-7573. Use keyManager and trustManager provided by keyStoreFactory in Ratis group.
- HDDS-8204. Add testuser principals for all Ozone containers.
- HDDS-8041. Let Ozone Client fail faster with wrong OM address in URI.
- HDDS-8145. ReadReplicas should close client.
- HDDS-8163. Use try-with-resources to ensure the RocksDB connection is closed in SstFilteringService.
- HDDS-8151. Support fine-grained lifetime for root CA certificate.
- HDDS-8030. Cleanup unused/unnecessary code related to CertificateClient.
- HDDS-8095. Unbuffer not supported in TDE buckets.
- HDDS-7590. Use keyManager and trustManager provided by keyStoreFactory in om grpc services.
- HDDS-8020. File checksum helper leaking client
- HDDS-7525. Migrate key digest from MD5 to SHA256 in Ozone shell.
- HDDS-7708. No check for certificate duration config scenarios.
- HDDS-7339. Implement Certificate renewal task for services.
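As background for HDDS-7525 (migrating the key digest from MD5 to SHA256 in the Ozone shell), the following is a minimal sketch of computing a SHA-256 digest over a stream in chunks. It is written in Python with the standard `hashlib` module and is not the Ozone shell implementation. The function name `sha256_hex` and the `chunk_size` default are assumptions made for illustration.

```python
import hashlib
import io


def sha256_hex(stream, chunk_size=8192):
    """Return the SHA-256 hex digest of a binary stream, read in chunks."""
    digest = hashlib.sha256()
    # iter() with a sentinel keeps reading until read() returns b"" (EOF).
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


# Usage: digest an in-memory payload; a file opened in "rb" mode works the same way.
print(sha256_hex(io.BytesIO(b"example key content")))
```

A SHA-256 hex digest is 64 characters long, compared with 32 for MD5, so any tooling that parses or compares digest output may need to account for the longer value.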
Performance:
- HDDS-8289: Speed up FSO ListKeys, skip skipToFirst
- HDDS-8147: Introduce latency metrics for S3 Gateway operations
- HDDS-8270: Measure checkAccess latency for Ozone objects
- HDDS-8329: Performance impact during container close when Client is still writing
- HDDS-8154. Perf: Reuse Mac instances in S3 token validation
- HDDS-8367. setquota should have a check on usedNamespace [NameSpacequota]
- HDDS-8285. Eliminate leftover Guava Optional from CacheValue.
- HDDS-8128. Deduplicate the ops in RDBBatchOperation.
- HDDS-8255. Remove unnecessary sleep at secure container startup
- HDDS-8076. Use container cache in Key listing API.
- HDDS-8074. Improve synchronization around command queue updates in Node Manager.
- HDDS-8129. ContainerStateMachine allows two different tasks with the same container id running in parallel.
- HDDS-8079. RocksDB included in client libs doubles shaded jar size.
- HDDS-7524. Compaction DAG node pruning.
- HDDS-7623. Do not compress container re-replication traffic by default.
- HDDS-7635. Update failure metrics when allocate block fails in preExecute.
- HDDS-7778. Add metrics for push replication
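The idea behind an item such as HDDS-8128 (deduplicating the operations in a write batch) can be sketched in a few lines. The Python below is an illustration only and does not reflect how RDBBatchOperation is actually implemented. `DedupBatch`, its methods, and the plain dict standing in for the store are all hypothetical. It assumes that within one batch only the last put or delete for a given key affects the final state, so earlier operations on that key can be discarded before commit.

```python
class DedupBatch:
    """Collects puts and deletes, keeping only the latest operation per key."""

    def __init__(self):
        self._ops = {}       # key -> ("put", value) or ("delete", None)
        self.discarded = 0   # number of earlier operations that were superseded

    def put(self, key, value):
        self._record(key, ("put", value))

    def delete(self, key):
        self._record(key, ("delete", None))

    def _record(self, key, op):
        if key in self._ops:
            self.discarded += 1  # an earlier op on this key is now redundant
        self._ops[key] = op

    def commit(self, store):
        """Apply the surviving operations to `store` (a dict in this sketch)."""
        for key, (kind, value) in self._ops.items():
            if kind == "put":
                store[key] = value
            else:
                store.pop(key, None)
        self._ops.clear()
```

For example, two puts to the same key followed by a commit write that key once, with the second value, and the first put is counted in `discarded`.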
Stability/Reliability:
- HDDS-8424. Restore legacy bucket behavior for directory objects.
- HDDS-8292. Inconsistent key name handling for FSO bucket files.
- HDDS-8383. Misreplication cannot be resolved with single rack
- HDDS-7994. Expose information via the ClusterState endpoint on keys and Directories marked for deletion.
- HDDS-8253. Default Metadir in Current Working Directory.
- HDDS-8230. Let ReplicationManager decide the timeout for commands in Datanodes
- HDDS-8223. SCM delete block service should wait for safemode to exit before running.
- HDDS-8155. SCM Block deleting service add transaction blocks for stale/dead/decommissioning DNs
- HDDS-7853. Add support for RemoveSCM in SCMRatisServer.
- HDDS-8090. When getBlock from a datanode fails, retry other datanodes.
- HDDS-6449. Failed container delete can leave artifacts on disk.
- HDDS-8171. Replicate commands could be sent to dead maintenance nodes if the same index is being decommissioned.
- HDDS-8111. ReplicationManager: Add RatisMisReplicationHandler into rm.processUnderReplicatedContainer.
- HDDS-8142. Check if no entries in Block DB for a container on container delete.
- HDDS-8119. Remove loosely related AutoCloseable from SendContainerOutputStream.
- HDDS-8118. Fail container delete on non-empty chunks dir.
- HDDS-8028. JNI for RocksDB SST Dump tool.
- HDDS-8024. When readChunk from a datanode fails, retry other datanodes.
- HDDS-7183. Expose RocksDB critical properties.
- HDDS-8036. Unprotected flush in SCMHADBTransactionBuffer.
- HDDS-6241. Follower SCM node repeatedly sending requests to Ratis server.
- HDDS-7156. Reset pending delete block count
- HDDS-7989. UnhealthyReplicationProcessor retries failure without delay.
- HDDS-7998. Synchronize on containerInfo in ReplicationManager and MoveManager.
- HDDS-8008. Move pendingOps into ContainerStateManagerImpl to ensure consistent state.
- HDDS-7782. OM lease recovery for hsync'ed files.
- HDDS-7463. SCM Pipeline scrubber never able to clean up allocated pipeline.
- HDDS-7621. Update SCM term in datanode from heartbeat without any commands.
- HDDS-7661. Ratis Misreplication Handler.
- HDDS-7584. Addition of new OM node expels itself from the Ratis ring after restart.
- HDDS-7473. Ratis integration for support of remove registration
- HDDS-7847. Handle Replication of Unhealthy Replicas in RM
- HDDS-7924. Remove non-conflicting transitive-only dependency versions
- HDDS-7915. Force close QUASI_CLOSED replicas only when the container is CLOSED in Legacy RM
- HDDS-7560. Placement Policy Interface changes to handle Overreplication
- HDDS-7738. SCM terminates when adding container to a closed pipeline
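Items such as HDDS-8090 and HDDS-8024 in the list above describe retrying a read against other datanodes when one fails. The general pattern can be sketched as follows. This is a Python illustration, not the Ozone client code: `read_with_failover`, `replicas`, and `read` are hypothetical names, and the sketch assumes a read failure surfaces as an `OSError`.

```python
def read_with_failover(replicas, read):
    """Try each replica in order and return the first successful read.

    `replicas` is an ordered list of node identifiers and `read(node)` is a
    caller-supplied callable that returns data or raises OSError.
    """
    errors = []
    for node in replicas:
        try:
            return read(node)
        except OSError as err:
            errors.append((node, err))  # remember the failure, try the next node
    raise OSError(f"all {len(replicas)} replicas failed: {errors}")
```

The read fails only when every replica has been tried, and the final error carries the per-node failures so that the cause on each datanode remains visible.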
Usability:
- HDDS-8484. Allow Ozone volume name to have special characters
- HDDS-7586. Allow user to create bucket with non-s3-naming-convention
- HDDS-7495. Create OBS buckets by default from S3 API
- HDDS-8302. Add a flag to disable hsync by default
- HDDS-6064. Print proper JSON from ozone debug ldb scan.
- HDDS-8281. Add IntelliJ code style scheme import recommendation in CONTRIBUTING.md.
- HDDS-8222. EndpointBase#getBucket should handle BUCKET_NOT_FOUND
- HDDS-8098. Make Schema V3 as default in ozone debug ldb for container db.
- HDDS-8240. Disable JaCoCo for PRs and in forks
- HDDS-8065. Provide GNU long options.
- HDDS-8149. Refactor the way to notify keyStoreFactory about certificate renewed.
- HDDS-8099. Make unlimited length the default in ozone debug ldb.
- HDDS-8110. ReplicationManager: Introduce basic limits on ReplicateContainer commands.
- HDDS-8091. Generate list of config tags from ConfigTag enum.
- HDDS-7816. Add DataNode list to the SCM WebUI.
- HDDS-8073. Replace Usages of LegacyReplicationManager.MoveResult with MoveManager.MoveResult.
- HDDS-8078. Github CI: Allow PR title starting with Revert.
- HDDS-8046. Eliminate embedded newlines from config properties.
- HDDS-7210. Missing open containers show up as "Closing" on the container report.
- HDDS-7934. NPE in RandomKeyGenerator's shutdown hook.
- HDDS-8044. Ensure GrpcOutputStream is closed.
- HDDS-8032. SCM support reconfigurable dynamically.
- HDDS-8077. Enforce new checkstyle: NewlineAtEndOfFile
- HDDS-7301. Cleanup OmUtils.
- HDDS-7971. Support SCM transfer Ratis leadership.
- HDDS-7064. S3 get-object response emits tracing spans outside ObjectEndpoint#get.
- HDDS-7710. Support AWS s3 ListObjects API's encodingType request parameter.
- HDDS-7721. Make OM Ratis roles available in /prom endpoint.
- HDDS-8026. Replace import from shaded Guava.
- HDDS-7890. Refactor ContainerDeletionChoosingPolicy implementations.
- HDDS-7982. NPE in BlockInputStream due to null pipeline after refresh.
- HDDS-8025. ReplicationManager: Count a container once for missing, under, mis, or over replicated.
- HDDS-6743. Specify leader node for OM failover
- HDDS-7973. Let RatisMisReplicationHandler use the new RatisContainerReplicaCount constructor
- HDDS-7979. Replace import from shaded Guava
- HDDS-6650. S3MultipartUpload support update bucket usedNamespace.
- HDDS-7686. Cherry-pick proto.lock files change from ozone-1.3 release branch to master.