Known issues in 7.1.9 CHF 3

You must be aware of the known issues and limitations, the areas of impact, and the workarounds in Cloudera Runtime 7.1.9 CHF 3.

When you use the S3A committer fs.s3a.committer.name=directory with fs.s3a.committer.staging.conflict-mode=replace to write to FSO buckets, the client fails with the following error:

    DIRECTORY_NOT_FOUND org.apache.hadoop.ozone.om.exceptions.OMException: Failed to find parent directory of xxxxxxxx
        at org.apache.hadoop.ozone.om.request.file.OMFileRequest.getParentID(OMFileRequest.java:1008)
        at org.apache.hadoop.ozone.om.request.file.OMFileRequest.getParentID(OMFileRequest.java:958)
        at org.apache.hadoop.ozone.om.request.file.OMFileRequest.getParentId(OMFileRequest.java:1038)
        at org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCompleteRequestWithFSO.getDBOzoneKey(S3MultipartUploadCompleteRequestWithFSO.java:114)
        at org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCompleteRequest.validateAndUpdateCache(S3MultipartUploadCompleteRequest.java:157)
        at org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleWriteRequest(OzoneManagerRequestHandler.java:378)
        at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.runCommand(OzoneManagerStateMachine.java:568)
        at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.lambda$1(OzoneManagerStateMachine.java:363)
        at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)

This occurs because S3A uses multipart upload (MPU) to commit job results in a batch, and the staging committer's replace mode deletes the target directory before the MPU is completed. FSO does not create intermediate directories during MPU; it does so only for regular file, directory, and key requests.
Use fs.s3a.committer.name=magic for the affected versions.
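The failure sequence explained above can be illustrated with a small JavaScript model. This is a sketch under simplifying assumptions: FsoBucketModel and its methods are hypothetical names, not Ozone APIs. The model only reproduces the two behaviors described in this issue: regular key writes create missing parent directories, but multipart upload completion requires the parent to exist already.

```javascript
// Hypothetical, simplified model of an FSO bucket namespace (not Ozone code).
class FsoBucketModel {
  constructor() {
    this.dirs = new Set([""]); // "" stands for the bucket root
    this.keys = new Set();
  }

  static parentOf(path) {
    const idx = path.lastIndexOf("/");
    return idx === -1 ? "" : path.slice(0, idx);
  }

  // Regular key request: missing intermediate directories are created.
  putKey(path) {
    let parent = FsoBucketModel.parentOf(path);
    while (parent !== "" && !this.dirs.has(parent)) {
      this.dirs.add(parent);
      parent = FsoBucketModel.parentOf(parent);
    }
    this.keys.add(path);
  }

  // Recursive delete, as performed by the staging committer's replace mode.
  deleteDir(dir) {
    for (const d of [...this.dirs]) {
      if (d === dir || d.startsWith(dir + "/")) this.dirs.delete(d);
    }
    for (const k of [...this.keys]) {
      if (k.startsWith(dir + "/")) this.keys.delete(k);
    }
  }

  // MPU completion: the parent directory must already exist.
  completeMultipartUpload(path) {
    const parent = FsoBucketModel.parentOf(path);
    if (!this.dirs.has(parent)) {
      // Message text imitates the OMException shown above.
      throw new Error(`DIRECTORY_NOT_FOUND: Failed to find parent directory of ${path}`);
    }
    this.keys.add(path);
  }
}

const bucket = new FsoBucketModel();
bucket.putKey("output/old-part-0000"); // output of an earlier job
// Upload parts for output/part-0000 are uploaded here (not modeled).
bucket.deleteDir("output");            // replace mode removes the target directory
let result;
try {
  bucket.completeMultipartUpload("output/part-0000");
  result = "committed";
} catch (e) {
  result = e.message;
}
console.log(result);
```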
OPSAPS-69846: If Ozone is installed with custom Kerberos principals for its roles, operations on encrypted buckets can fail because Ranger KMS does not have proxy users and groups configured for the custom S3 Gateway user.
Add the following properties to the Ranger KMS safety valve, using the name of the custom S3 Gateway user. In this example, the user is s3gfoo0:
    hadoop.kms.proxyuser.s3gfoo0.hosts = *
    hadoop.kms.proxyuser.s3gfoo0.groups = *
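If you need the same two properties for a different custom S3 Gateway user, or want them in Hadoop XML <property> form, a helper such as the following can generate them. This is a sketch only: kmsProxyUserProperties is a hypothetical function, and the hosts and groups values default to *, matching the values above.

```javascript
// Hypothetical helper: builds the Ranger KMS proxy-user properties for a custom
// S3 Gateway user as Hadoop-style <property> XML elements.
function kmsProxyUserProperties(s3gUser, hosts = "*", groups = "*") {
  const props = [
    [`hadoop.kms.proxyuser.${s3gUser}.hosts`, hosts],
    [`hadoop.kms.proxyuser.${s3gUser}.groups`, groups],
  ];
  return props
    .map(([name, value]) =>
      `<property>\n  <name>${name}</name>\n  <value>${value}</value>\n</property>`)
    .join("\n");
}

// For the example user s3gfoo0 from this known issue:
console.log(kmsProxyUserProperties("s3gfoo0"));
```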
CDPD-66508: Shallow listing is enabled by default in 7.1.9. A bug in shallow listing causes the following error when you list an empty directory in a LEGACY or OBS bucket:

    mkdir: getFileStatus on s3a://testbucket/data/test: com.amazonaws.services.s3.model.AmazonS3Exception: Server Error (Service: Amazon S3; Status Code: 500; Error Code: 500 Server Error; Request ID: null; S3 Extended Request ID: null; Proxy: null), S3 Extended Request ID: null:500 Server Error: Server Error (Service: Amazon S3; Status Code: 500; Error Code: 500 Server Error; Request ID: null; S3 Extended Request ID: null; Proxy: null)

In the S3 Gateway log:

    Caused by: java.lang.IndexOutOfBoundsException: Index 0 out of bounds for length 0
        at java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:64)
        at java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:70)
        at java.base/jdk.internal.util.Preconditions.checkIndex(Preconditions.java:248)
        at java.base/java.util.Objects.checkIndex(Objects.java:372)
        at java.base/java.util.ArrayList.remove(ArrayList.java:535)
        at org.apache.hadoop.ozone.client.OzoneBucket$KeyIterator.getNextShallowListOfKeys(OzoneBucket.java:1234)
        at org.apache.hadoop.ozone.client.OzoneBucket$KeyIterator.getNextListOfKeys(OzoneBucket.java:1136)
        at org.apache.hadoop.ozone.client.OzoneBucket$KeyIterator.hasNext(OzoneBucket.java:1110)
        at org.apache.hadoop.ozone.s3.endpoint.BucketEndpoint.get(BucketEndpoint.java:208)
        at jdk.internal.reflect.GeneratedMethodAccessor90.invoke(Unknown Source)

Disable shallow listing.
  1. Log in to Cloudera Manager.
  2. Navigate to Clusters.
  3. Select the Ozone service.
  4. Go to Configurations.
  5. In S3 Gateway Advanced Configuration Snippet (Safety Valve) for ozone-conf/ozone-site.xml, set ozone.s3g.list-keys.shallow.enabled = false.
CDPD-65801: An intermittent issue in the native RocksDB tool causes corruption of in-memory RocksDB metadata.
Set ozone.om.snapshot.load.native.lib to false and restart the OM.
CDPD-66142: When Solr is slow or down, it can take a long time to respond to the Recon heatmap query, or it may not respond at all, so the Recon heatmap keeps trying to load data indefinitely. This issue will be addressed in a future release. A possible solution is to add a health check for Solr, or to time out the Recon query to Solr and show a meaningful message such as "Solr is not responding" in the Recon UI.
  1. Stop Recon
  2. Restart Solr
  3. Start Recon
CDPD-66247: The TestOzoneFileSystem.testListStatusOnKeyNameContainDelimiter test fails intermittently.
None
CDPD-66261: When an OBS or LEGACY bucket has a key prefix that starts with a slash, such as /readPath/, and the fsPath configuration (ozone.om.enable.filesystem.paths) is enabled (true), the org.apache.hadoop.ozone.om.KeyManagerImpl#listKeys call flow normalizes the key to vol/buck/readPath. However, the key is saved in the keyTable as vol/buck//readPath/, so the two do not match and the listKeys API cannot retrieve the key using the normalized key path. Fix planned for the next CHF release: if the key prefix starts with /, do not remove the leading slash during normalization.
Disable the fsPath configuration by setting ozone.om.enable.filesystem.paths to false in ozone-site.xml before running the test. Setting it at the global level may also affect other tests.
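The key mismatch described in CDPD-66261, and the planned fix, can be illustrated with a short JavaScript sketch. Both normalize and lookup are hypothetical helpers. normalize only approximates the behavior described in the issue, in which empty path components are dropped, and it is not the Ozone Manager implementation. lookup is a simplified prefix match against the single stored key vol/buck//readPath/.

```javascript
// Hypothetical approximation of key normalization (not Ozone Manager code):
// empty path components are dropped, so leading, trailing, and doubled
// slashes disappear.
function normalize(keyPrefix, preserveLeadingSlash) {
  // Planned fix: keep the leading "/" when the key prefix starts with one.
  const leading = preserveLeadingSlash && keyPrefix.startsWith("/") ? "/" : "";
  const body = keyPrefix.split("/").filter((part) => part.length > 0).join("/");
  return leading + body;
}

// The key as saved in keyTable: "vol/buck/" + "/readPath/" = "vol/buck//readPath/".
const storedKey = "vol/buck/" + "/readPath/";

// Simplified prefix lookup against that single stored key.
function lookup(normalizedPrefix) {
  return storedKey.startsWith("vol/buck/" + normalizedPrefix);
}

const current = normalize("/readPath/", false); // "readPath"
const fixed = normalize("/readPath/", true);    // "/readPath"
console.log(current, lookup(current)); // readPath false
console.log(fixed, lookup(fixed));     // /readPath true
```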
CDPD-66262: Because HADOOP-16226 is not included in CDH/hadoop for 7.1.9 CHF3, trailing slashes in a string are not removed during keyName normalization. Consequently, the result of the TestObjectStoreWithFSO.testListKeysAtDifferentLevels test for listing keys with unnormalized keyNames in an FSO bucket does not match the expected result.
None
CDPD-66382: When the bucket layout is LEGACY and the ozone.om.enable.filesystem.paths property is set to true, delete does not work completely if the keyName contains "/".
None.
CDPD-66252: du space calculation support for OBS and LEGACY (fsPath disabled).
You can view du space for OBS buckets and LEGACY buckets (fsPath disabled) using a CLI command.
OPSAPS-69539: CDP Runtime 7.1.9, from the base release through CHF3, does not support Oracle JDK 8u401 or OpenJDK 1.8.0_402 (8u402), and some services fail to start. This can be a problem on RHEL 9.x because 8u402 is the default OpenJDK 8 version installed by the OS.
Install an earlier version of JDK 8, for example, Oracle jdk-8u291 (1.8.0_291) or OpenJDK 8u292 (1.8.0_292).
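Before upgrading, you may want to check the JDK version on each host against the two builds named in this issue. The following JavaScript sketch parses a Java 8 version string such as "1.8.0_402", or a line such as `openjdk version "1.8.0_402"`. The function names are hypothetical. The check flags only 8u401 and 8u402, because this known issue does not say whether later updates are also affected.

```javascript
// Hypothetical helper: extracts the Java 8 update number from version text.
function jdk8UpdateNumber(versionText) {
  const match = /1\.8\.0_(\d+)/.exec(versionText);
  return match ? Number(match[1]) : null; // null: not a Java 8 version string
}

// Flags only the builds listed in OPSAPS-69539: 8u401 (Oracle) and 8u402 (OpenJDK).
function isListedUnsupportedJdk(versionText) {
  const update = jdk8UpdateNumber(versionText);
  return update === 401 || update === 402;
}

for (const v of ['openjdk version "1.8.0_402"', 'java version "1.8.0_401"', "1.8.0_292"]) {
  console.log(v, isListedUnsupportedJdk(v));
}
```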
A fresh install of 7.1.9 CHF 2 does not allow users to bypass the Setup Database screen for YARN Queue Manager
YARN Queue Manager in Cloudera Data Platform (CDP) Private Cloud Base 7.1.9 CHF 2 does not require you to install a PostgreSQL database, so users should be able to skip the Setup Database screen. Because of this known issue, users who are conducting a fresh install of 7.1.9 CHF 2 cannot bypass the Setup Database screen as expected.
  1. When conducting a fresh install of YARN Queue Manager in 7.1.9 CHF 2, you must ensure that you have both CDP and Cloudera Manager upgraded to 7.1.9 CHF 2.
  2. When you reach the Setup Database screen in the Cloudera Manager installation wizard for Queue Manager, enter any dummy values for the following fields:
    1. Database name: configstore
    2. Database Username: dbuser
    3. Database Password: dbpassword
    YARN Queue Manager does not connect to PostgreSQL with these details and falls back to the embedded database.
  3. Run the following command in the browser console to enable the Continue button:

    document.querySelector('.btn.next').removeAttribute('disabled');

  4. Click Continue and proceed with the YARN Queue Manager installation.
  5. Restart YARN Queue Manager.
CDPD-61524: Ozone Storage Container Manager fails to start on upgrading from CDP Private Cloud Base 7.1.6 to 7.1.9 CHF1. Also, if you have upgraded from CDP Private Cloud Base 7.1.6 to 7.1.7 or 7.1.8 and then to 7.1.9, the upgrade fails.
None. Cloudera recommends that you contact Cloudera Support before upgrading to CDP Private Cloud Base 7.1.9.
CDPD-62254: Ozone is not supported on SLES15 with CHF1.
If your cluster has Ozone, Cloudera recommends that you do not upgrade to 7.1.9 CHF1.
QAINFRA-18371: Conflict while installing libmysqlclient-devel on SLES 15
You may see an error such as the following while installing the mysql-devel and libmysqlclient-devel packages to set up MariaDB as a backend database on SLES 15:

    File /usr/bin/mariadb_config from install of MariaDB-devel-<version>.x86_64 conflicts with file from install of libmariadb-devel-3.1.21-150000.3.33.3.x86_64 (SLES Module Server Applications Updates)

When installing the mysql-devel and libmysqlclient-devel packages on SLES 15, use the zypper --replacefiles switch, or manually enter yes at the interactive prompt that appears when the files are being overwritten.
CDPD-62834: The status of a deleted table is shown as ACTIVE in Atlas after the navigator2atlas migration process completes
The status of the deleted table displays as ACTIVE.
None
CDPD-62837: During the navigator2atlas process, the hive_storagedesc is incomplete in Atlas
For the hive_storagedesc entity, some of the attributes are not getting populated.
None
CDPD-63690: RuntimeException encountered when generating snapshotDiff report between 2 snapshots
When snapshot feature is enabled, KeyDeletingService, SSTFilteringService and SnapDiff thread fall into a deadlock when accessing Snapshot Cache.
Restart the Ozone Manager.
CDPD-64238: Snapshot diff request failing when setting ozone.om.snapshot.db.max.open.files=-1
When snapshot feature is enabled, KeyDeletingService, SSTFilteringService and SnapDiff thread fall into a deadlock when accessing Snapshot Cache.
Restart the Ozone Manager.
OPSAPS-69481: Some Kafka Connect metrics missing from CM due to conflicting definitions
The metric definitions for kafka_connect_connector_task_metrics_batch_size_avg and kafka_connect_connector_task_metrics_batch_size_max in recent Kafka CSDs conflict with previous definitions in other CSDs. This prevents CM from registering these metrics. It also results in SMM returning an error. The metrics also cannot be monitored in CM chart builder or queried using the CM API.
Contact Cloudera support for a workaround.