Fixed Issues in Cloudera Manager 7.13.1

Fixed issues in Cloudera Manager 7.13.1.

OPSAPS-72254: FIPS Failed to upload Spark example jar to HDFS in cluster mode

Fixed an issue with deploying the Spark 3 Client Advanced Configuration Snippet (Safety Valve) for spark3-conf/spark-env.sh.

For more information, see Added a new Cloudera Manager configuration parameter spark_pyspark_executable_path to Livy for Spark 3 in Behavioral Changes In Cloudera Manager 7.13.1.

OPSAPS-71873: UCL | CKP4 | livyfoo0 kms proxy user is not allowed to access HDFS in 7.3.1.0
In the kms-core.xml file, the Livy proxy user is taken from Livy for Spark 3's configuration in Cloudera 7.3.1 and above.
OPSAPS-70976: The previously hidden real-time monitoring properties are now visible in the Cloudera Manager UI
The following properties are now visible in the Cloudera Manager UI:
  • enable_observability_real_time_jobs
  • enable_observability_metrics_dmp
OPSAPS-69996: HBase snapshot creation in Cloudera Manager does not work as expected
During the HBase snapshot creation process, the snapshot create command sometimes tries to create the same snapshot twice because of an unhandled OptimisticLockException during the database write operation. This resulted in intermittent HBase snapshot creation failures. The issue is fixed now.
OPSAPS-66459: Enable concurrent Hive external table replication policies with the same cloud root
When the HIVE_ALLOW_CONCURRENT_REPLICATION_WITH_SAME_CLOUD_ROOT_PATH feature flag is enabled, Replication Manager can run two or more Hive external table replication policies with the same cloud root path concurrently.

For example, if two Hive external table replication policies have s3a://bucket/hive/data as the cloud root path and the feature flag is enabled, Replication Manager can run these policies concurrently.

By default, this feature flag is disabled. To enable the feature flag, contact your Cloudera account team.

OPSAPS-70859: Ranger metrics APIs were not working on FedRAMP cluster
On FedRAMP HA cloud clusters, Ranger metrics APIs were not working. This issue is fixed now by introducing new Ranger configurations.

OPSAPS-71436: Telemetry publisher test Altus connection fails
An error occurred while running the test Altus connection action for Telemetry Publisher. This issue is fixed now.
OPSAPS-68252: The Ranger RMS Database Full Sync command is not visible on cloud clusters
The Ranger RMS Database Full Sync command was not visible on any cloud cluster. In addition, the minimum user privilege required to see the command in the UI had to be determined.
The issue is fixed now. The service-level command definition in Ranger RMS has been updated, and the command is now visible in the UI. The minimum user privilege required to see this command is EnvironmentAdmin.
OPSAPS-69692, OPSAPS-69693: Included filters for Ozone incremental replication in API endpoint
You can use the include filters in the POST /clusters/{clusterName}/services/{serviceName}/replications API to replicate only the filtered part of the Ozone bucket. You can use multiple path regular expressions to limit the data to be replicated for an Ozone bucket. For example, if you include the /path/to/data/.* and .*/data filters in the includeFilter field for the POST endpoint, the Ozone replication policy replicates only the keys in the Ozone bucket that match one of these expressions, that is, keys that start with /path/to/data/ or end with /data.
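As a rough illustration of how the two include filters from the example select keys, the following Python sketch applies them to a few made-up key names. It assumes that a filter must match the whole key path (full-match semantics); the exact matching rule that Replication Manager applies is not stated here, and the key names and the is_included helper are hypothetical.

```python
import re

# Filters taken from the example in the release note.
INCLUDE_FILTERS = ["/path/to/data/.*", ".*/data"]


def is_included(key, filters=INCLUDE_FILTERS):
    """Return True if the key fully matches at least one include filter.

    Full-match semantics are an assumption made for this sketch.
    """
    return any(re.fullmatch(pattern, key) for pattern in filters)


# Hypothetical key names, used only for illustration.
keys = [
    "/path/to/data/file1.txt",
    "/archive/2024/data",
    "/path/to/other/file2.txt",
]
selected = [key for key in keys if is_included(key)]
print(selected)
```

Under these assumptions, the first two keys are selected and the third is skipped.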
OPSAPS-70561: Improved page load performance of the “Bucket Browser” tab.
The Cloudera Manager > Clusters > [***OZONE SERVICE***] > Bucket Browser tab no longer loads all the entries of the bucket. Therefore, the page loads faster when you display the content of a large bucket that contains many keys.
OPSAPS-71090: The spark.*.access.hadoopFileSystems gateway properties are not propagated to Livy.
Added new properties for configuring Spark 2 (spark.yarn.access.hadoopFileSystems) and Spark 3 (spark.kerberos.access.hadoopFileSystems) that propagate to Livy.
OPSAPS-71271: The precopylistingcheck script for Ozone replication policies uses the Ozone replication safety valve value.
The "Run Pre-Filelisting Check" step during Ozone replication uses the value of the ozone_replication_core_site_safety_valve property to configure the Ozone client for the source and the target Cloudera Manager.
OPSAPS-70983: Hive replication command for Sentry to Ranger replication works as expected
The Sentry to Ranger migration during the Hive replication policy run from CDH 6.3.x or higher to CDP Public Cloud 7.3.0.1 or higher is successful.
OPSAPS-69806: Collection of YARN diagnostic bundle will fail

For any combination of CM 7.11.3 through CM 7.11.3 CHF7 with CDP 7.1.7 through CDP 7.1.8, collection of the YARN diagnostic bundle failed and no data was transmitted.

Cloudera Manager now allows the collection of the YARN diagnostic bundle, and the operation completes successfully.

OPSAPS-70655: The hadoop-metrics2.properties file is not getting generated into the ranger-rms-conf folder
The hadoop-metrics2.properties file was getting created in the process directory conf folder, for example, conf/hadoop-metrics2.properties, whereas the directory structure in Ranger RMS should be {process_directory}/ranger-rms-conf/hadoop-metrics2.properties.
The issue is fixed now. The directory name is changed from conf to ranger-rms-conf, so that the hadoop-metrics2.properties file gets created under the correct directory structure.
OPSAPS-71014: Auto action email content generation failed for some cluster(s) while loading the template file

The issue has been fixed by using a more appropriate template loader class in the FreeMarker configuration.

OPSAPS-70826: Ranger replication policies fail when target cluster uses Dell EMC Isilon storage and supports JDK17

Ranger replication policies no longer fail if the target cluster is deployed with Dell EMC Isilon storage and also supports JDK17.

OPSAPS-70861: HDFS replication policy creation process fails for Isilon source clusters

When you chose a source CDP Private Cloud Base cluster using the Isilon service and a target cloud storage bucket for an HDFS replication policy in the CDP Private Cloud Base Replication Manager UI, the replication policy creation process failed. This issue is fixed now.

OPSAPS-70708: Cloudera Manager Agent not skipping autofs filesystems during filesystem check

On clusters with a large number of network mounts on each host (for example, more than 100 networked file system mounts), the Cloudera Manager Agent took a long time to start, on the order of 10 to 20 seconds per mount point. This was due to the OS kernel on the cluster host interrogating each network mount on behalf of the Cloudera Manager Agent to gather monitoring information such as file system usage.

This issue is fixed now by adding the ability in the Cloudera Manager Agent's config.ini file to disable filesystem checks.

OPSAPS-68991: Change default SAML response binding to HTTP-POST

The default SAML response binding was HTTP-Artifact rather than HTTP-POST. HTTP-POST is designed for handling responses through the POST method, whereas HTTP-Artifact requires a direct connection between the SP (Cloudera Manager in this case) and the Identity Provider (IDP) and is rarely used. HTTP-POST should be the default choice instead.

This issue is fixed now by setting the default SAML binding to HTTP-POST.

OPSAPS-40169: Audits page does not list failed login attempts on applying Allowed = false filter

The Audits page in Cloudera Manager shows failed login attempts when no filter is applied. However, when the Allowed = false filter was applied, the page returned 0 results instead of listing those failed login attempts. This issue is fixed now.

OPSAPS-70583: File Descriptor leak from Cloudera Manager 7.11.3 CHF3 version to Cloudera Manager 7.11.3 CHF7

After an Avro library upgrade, Cloudera Manager was unable to create a NettyTransceiver, which led to a file descriptor leak. The leak occurred in Cloudera Manager when a service tried to communicate with Event Server over Avro. This issue is fixed now.

OPSAPS-70962: Creating a cloud restore HDFS replication policy with a peer cluster as destination is not supported by Replication Manager

During the HDFS replication policy creation process, incorrect Destination clusters and MapReduce services appeared. Choosing them created a dummy replication policy to replicate from a cloud account to a remote peer cluster, a scenario that Replication Manager does not support. This issue is now fixed.

OPSAPS-71108: Use the earlier format of PCR

You can use the latest version of the PCR (Post Copy Reconciliation) script, or you can restore PCR to the earlier format by setting the com.cloudera.enterprise.distcp.post-copy-reconciliation.legacy-output-format.enabled=true key value pair in the Cloudera Manager > Clusters > HDFS service > Configuration > hdfs_replication_hdfs_site_safety_valve property.
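For illustration, the legacy-format key-value pair can be rendered as an hdfs-site.xml style property element with the following Python sketch. It assumes that the safety valve accepts the XML property form used by hdfs-site.xml; the render_property helper is hypothetical and only formats text, it does not change any Cloudera Manager setting.

```python
import xml.etree.ElementTree as ET


def render_property(name, value):
    """Render one Hadoop-style <property> element as a string."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return ET.tostring(prop, encoding="unicode")


# Key and value taken from the release note.
snippet = render_property(
    "com.cloudera.enterprise.distcp.post-copy-reconciliation"
    ".legacy-output-format.enabled",
    "true",
)
print(snippet)
```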

OPSAPS-70689: Enhanced performance of DistCp CRC check operation
When a MapReduce job for an HDFS replication policy job fails, or when there are target-side changes during a replication job, Replication Manager initiates the bootstrap replication process. During this process, a cyclic redundancy check (CRC) is performed by default to determine whether a file can be skipped for replication.

By default, the CRC for each file is queried by the mapper (running on the target cluster) from the source cluster's NameNode. The round trip between the source and target cluster for each file consumes network resources and raises the cost of execution. To improve the performance of the CRC check, you can set the following variables to true in the Cloudera Manager > Clusters > HDFS service > Configuration > HDFS_REPLICATION_ENV_SAFETY_VALVE property on the target cluster:

  • ENABLE_FILESTATUS_EXTENSIONS
  • ENABLE_FILESTATUS_CRC_EXTENSIONS

By default, these are set to false.

After you set the key-value pairs, the CRC for each file is queried locally from the NameNode on the source cluster and copied over to the target cluster at the end of the replication process, which reduces the cost because the round trip is between two nodes of the same cluster. The CRC checksums are written to the file listing files.

OPSAPS-70685: Post Copy Reconciliation (PCR) for HDFS replication policies between on-premises clusters
To add the Post Copy Reconciliation (PCR) script to run as a command step during the HDFS replication policy job run, you can enter the SCHEDULES_WITH_ADDITIONAL_DEBUG_STEPS = [***ENTER COMMA-SEPARATED LIST OF NUMERICAL IDS OF THE REPLICATION POLICIES***] key-value pair in the target Cloudera Manager > Clusters > HDFS service > hdfs_replication_env_safety_valve property.
To run the PCR script on the HDFS replication policy, use the /clusters/[***CLUSTER NAME***]/services/[***SERVICE***]/replications/[***SCHEDULE ID***]/postCopyReconciliation API.
For more information about the PCR script, see How to use the post copy reconciliation script for HDFS replication policies.
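The following Python sketch shows one way to assemble the two values described in this entry: the safety valve key-value pair and the postCopyReconciliation path. The helper names, the policy IDs, and the cluster and service names are hypothetical; the path is relative to the Cloudera Manager API root, which is not shown, and writing the pair without spaces around the equals sign is an assumption.

```python
from urllib.parse import quote


def debug_steps_entry(policy_ids):
    """Render the key-value pair for the given numerical policy IDs."""
    ids = ",".join(str(int(policy_id)) for policy_id in policy_ids)
    return f"SCHEDULES_WITH_ADDITIONAL_DEBUG_STEPS={ids}"


def pcr_path(cluster, service, schedule_id):
    """Build the postCopyReconciliation path relative to the API root."""
    return (
        f"/clusters/{quote(cluster, safe='')}"
        f"/services/{quote(service, safe='')}"
        f"/replications/{int(schedule_id)}/postCopyReconciliation"
    )


# Hypothetical policy IDs and names, used only for illustration.
entry = debug_steps_entry([12, 37])
path = pcr_path("Cluster 1", "hdfs", 12)
print(entry)
print(path)
```

Percent-encoding the cluster and service names keeps the path valid when a name contains spaces.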
OPSAPS-70188: Conflicts field missing in ParcelInfo

Fixed an issue in parcels where the conflicts field in manifest.json would mark a parcel as invalid.

OPSAPS-70248: Optimize Impala Graceful Shutdown Initiation Time
This issue is resolved by streamlining the shutdown initiation process, reducing delays on large clusters.
OPSAPS-70157: Long-term credential-based GCS replication policies continue to work when cluster-wide IDBroker client configurations are deployed
Replication policies that use long-term GCS credentials work as expected even when cluster-wide IDBroker client configurations are configured.
OPSAPS-70422: Change the “Run as username(on source)” field during Hive external table replication policy creation
You can use a user other than hdfs for a Hive external table replication policy run that replicates from an on-premises cluster to a cloud bucket if the USE_PROXY_USER_FOR_CLOUD_TRANSFER=true key-value pair is set for the source Cloudera Manager > Clusters > Hive service > Configuration > Hive Replication Environment Advanced Configuration Snippet (Safety Valve) property. This applies to all external accounts other than the IDBroker external account.
OPSAPS-70460: Allow white space characters in Ozone snapshot-diff parsing
Ozone incremental replication no longer fails if a changed path contains one or more space characters.
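A minimal Python sketch of whitespace-tolerant parsing is shown below. The line format (a change-type marker, whitespace, then the path) is an assumption modeled on typical snapshot-diff reports, not the exact format that Replication Manager parses, and parse_diff_line is a hypothetical helper.

```python
def parse_diff_line(line):
    """Split a diff entry into (change_type, path), keeping spaces in the path."""
    # Split only on the first run of whitespace so the path stays intact.
    change_type, path = line.rstrip("\n").split(None, 1)
    return change_type, path


# Hypothetical diff entry whose path contains space characters.
entry = parse_diff_line("M\t./dir with space/file name.txt\n")
print(entry)
```

Splitting on every whitespace character instead would break such a path into several fields.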
OPSAPS-70594: Ozone HttpFS gateway role is not added to Rolling Restart
This issue is now resolved by adding the Ozone HttpFS gateway role to the Rolling Restart.
OPSAPS-68752: Snapshot-diff delta is incorrectly renamed/deleted twice during on-premises to cloud replication
The snapshots created during replication were deleted twice instead of once, which resulted in incorrect snapshot information. This issue is fixed. For more information, see Cloudera Customer Advisory 2023-715: Replication Manager may delete its snapshot information when migrating from on-prem to cloud.
OPSAPS-70226: Atlas uses the Solr configuration directory available in ATLAS_PROCESS/conf/solr instead of the Cloudera Manager provided directory
Atlas uses the configuration in /var/run/cloudera-scm-agent/process/151-atlas-ATLAS_SERVER/solrconf.xml.
OPSAPS-68112: Atlas diagnostic bundle should contain server log, configurations, and, if possible, heap memories
The diagnostic bundle contains server log, configurations, and heap memories in a GZ file inside the diagnostic .zip package.
OPSAPS-69921: ATLAS_OPTS environment variable is set for FIPS with JDK 11 environments to run the import script in Atlas
_JAVA_OPTIONS is populated with additional parameters, as shown in the following:
java_opts = 'export _JAVA_OPTIONS="-Dcom.safelogic.cryptocomply.fips.approved_only=true ' \
'--add-modules=com.safelogic.cryptocomply.fips.core,' \
'bctls --add-exports=java.base/sun.security.provider=com.safelogic.cryptocomply.fips.core ' \
'--add-exports=java.base/sun.security.provider=bctls --module-path=/cdep/extra_jars ' \
'-Dcom.safelogic.cryptocomply.fips.approved_only=true -Djdk.tls.ephemeralDHKeySize=2048 ' \
'-Dorg.bouncycastle.jsse.client.assumeOriginalHostName=true -Djdk.tls.trustNameService=true" '
OPSAPS-71258: Kafka, SRM, and SMM cannot process messages compressed with Zstd or Snappy if /tmp is mounted as noexec
The issue is fixed by using JVM flags that point to a different temporary folder for extracting the native library.
OPSAPS-69481: Some Kafka Connect metrics missing from Cloudera Manager due to conflicting definitions
Cloudera Manager now registers the metrics kafka_connect_connector_task_metrics_batch_size_avg and kafka_connect_connector_task_metrics_batch_size_max correctly.
OPSAPS-68708: Schema Registry might fail to start if a load balancer address is specified in Ranger
Schema Registry now always ensures that the address it uses to connect to Ranger ends with a trailing slash (/). As a result, Schema Registry no longer fails to start if Ranger has a load balancer address configured that does not end with a trailing slash.
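The normalization described above can be sketched in a few lines of Python. The helper name and the example address are hypothetical and do not reflect the actual Schema Registry code.

```python
def with_trailing_slash(address):
    """Return the address unchanged if it ends with '/', else append one."""
    return address if address.endswith("/") else address + "/"


# Hypothetical load balancer address, used only for illustration.
normalized = with_trailing_slash("https://ranger-lb.example.com:6182")
print(normalized)
```

Applying the helper to an address that already ends with a slash leaves it unchanged, so the normalization is safe to repeat.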
OPSAPS-69978: Cruise Control capacity.py script fails on Python 3
The script querying the capacity information is now fully compatible with Python 3.
OPSAPS-64385: Atlas's client.auth.enabled configuration is not configurable
In customer environments where user certificates are required to authenticate to services, the Apache Atlas web UI constantly prompted for certificates. To solve this, the client.auth.enabled parameter is set to true by default. If you need to set it to false, override the setting with a configuration snippet in the safety valve. After it is set to false, no more certificate prompts are displayed.
OPSAPS-71089: Atlas's client.auth.enabled configuration is not configurable
In customer environments where user certificates are required to authenticate to services, the Apache Atlas web UI constantly prompted for certificates. To solve this, the client.auth.enabled parameter is set to true by default. If you need to set it to false, override the setting with a configuration snippet in the safety valve. After it is set to false, no more certificate prompts are displayed.
OPSAPS-71677: When you are upgrading from CDP Private Cloud Base 7.1.9 SP1 to CDP Private Cloud Base 7.3.1, upgrade-rollback execution fails during HDFS rollback due to missing directory.
This issue is now resolved. The HDFS meta upgrade command now creates the previous directory, so the rollback no longer fails.
OPSAPS-71390: COD cluster creation is failing on INT and displays the Failed to create HDFS directory /tmp error.
This issue is now resolved. Export options for JDK 17 are added.
OPSAPS-71188: Modify default value of dfs_image_transfer_bandwidthPerSec from 0 to a feasible value to mitigate RPC latency in the namenode.
This issue is now resolved.
OPSAPS-58777: HDFS Directories are created with root as user.
This issue is now resolved by fixing service.sdl.
OPSAPS-71474: In Cloudera Manager UI, the Ozone service Snapshot tab displays label label.goToBucket and it must be changed to Go to bucket.
This issue is now resolved.
OPSAPS-70288: Improvements in master node decommissioning.
This issue is now resolved by making usability and functional improvements to the Ozone master node decommissioning.
OPSAPS-71647: Ozone replication fails for incompatible source and target Cloudera Manager versions during the payload serialization operation
Replication Manager now recognizes and annotates the required fields during the payload serialization operation. For the list of unsupported Cloudera Manager versions that do not have this fix, see Preparing clusters to replicate Ozone data.
OPSAPS-71156: PostCopyReconciliation ignores mismatching modification time for directories
The Post Copy Reconciliation (PCR) script does not check the file length, last modified time, and cyclic redundancy check (CRC) checksums for directories (paths that are directories) on both the source and target clusters.
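To illustrate the behavior, the following Python sketch compares source and target listings and skips the attribute checks for directories. It is a simplified model, not the PCR script itself: the Entry type, its fields, and the rule that a missing or type-changed path counts as a mismatch are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    path: str
    is_dir: bool
    length: int = 0
    mtime: int = 0
    crc: str = ""


def find_mismatches(source, target):
    """Return source paths that are missing or differ on the target.

    Length, modification time, and CRC are compared for files only.
    """
    target_by_path = {entry.path: entry for entry in target}
    mismatched = []
    for src in source:
        dst = target_by_path.get(src.path)
        if dst is None or dst.is_dir != src.is_dir:
            mismatched.append(src.path)
        elif not src.is_dir and (
            (src.length, src.mtime, src.crc) != (dst.length, dst.mtime, dst.crc)
        ):
            mismatched.append(src.path)
    return mismatched


# Hypothetical listings: the directory mtime differs, and so does /data/b.
source = [
    Entry("/data", True, mtime=100),
    Entry("/data/a", False, 10, 100, "c1"),
    Entry("/data/b", False, 5, 100, "c2"),
]
target = [
    Entry("/data", True, mtime=999),
    Entry("/data/a", False, 10, 100, "c1"),
    Entry("/data/b", False, 5, 200, "c2"),
]
result = find_mismatches(source, target)
print(result)
```

In this model, only the file /data/b is reported; the differing modification time of the /data directory is ignored.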
OPSAPS-70732: Atlas replication policies no longer consider inactive Atlas server instances
Replication Manager considers only the active Atlas server instances during Atlas replication policy runs.
OPSAPS-70924: Configure Iceberg replication policy level JVM options
You can add replication-policy level JVM options for the export, transfer, and sync CLIs for Iceberg replication policies on the Advanced tab in the Create Iceberg Replication Policy wizard.
OPSAPS-70657: KEYTRUSTEE_SERVER & RANGER_KMS_KTS migration to RANGER_KMS from CDP 7.1.x to UCL
KEYTRUSTEE_SERVER and RANGER_KMS_KTS services are not supported starting from the CDP 7.3.1 release. Therefore, validation and confirmation messages are added to the CM upgrade wizard to alert the user to migrate KEYTRUSTEE_SERVER keys to RANGER_KMS before upgrading to the CDP 7.3.1 release.
OPSAPS-70656: Remove KEYTRUSTEE_SERVER & RANGER_KMS_KTS from CM for UCL
The Key Trustee components (the KEYTRUSTEE_SERVER and RANGER_KMS_KTS services) are not supported starting from the CDP 7.3.1 release. These services cannot be installed or managed with CM 7.13.1 using CDP 7.3.1.
OPSAPS-67480: In 7.1.9, default Ranger policy is added from the cdp-proxy-token topology, so that after a new installation of CDP-7.1.9, the knox-ranger policy includes cdp-proxy-token. However, upgrades do not add cdp-proxy-token to cm_knox policies automatically.
This issue is fixed now.
OPSAPS-70838: Flink user should be added by default in ATLAS_HOOK topic policy in Ranger >> cm_kafka
The "flink" service user is granted publish access on the ATLAS_HOOK topic by default in the Kafka Ranger policy configuration.
OPSAPS-69411: Update AuthzMigrator GBN to point to latest non-expired GBN
Users can now export Sentry data only for the given Hive objects (databases, tables, and the respective URLs) by using the authorization.migration.export.migration_objects configuration during export.
OPSAPS-68252: "Ranger RMS Database Full Sync" option was not visible on mow-int cluster setup for hrt_qa user (7.13.0.0)
The fix makes the command visible on cloud clusters when the user has minimum EnvironmentAdmin privilege.
OPSAPS-70148: Ranger audit collection creation is failing on latest SSL enabled UCL cluster due to zookeeper connection issue
Added support for secure ZooKeeper connection for the Ranger Plugin Solr audit connection configuration xasecure.audit.destination.solr.zookeepers.
OPSAPS-52428: Add SSL to ZooKeeper in CDP
Added SSL/TLS encryption support to CDP components. The ZooKeeper SSL (secure) port is now enabled automatically, and components communicate over the encrypted channel if the cluster has AutoTLS enabled.
OPSAPS-72093: FIPS - yarn jobs are failing with No key provider is configured
The yarn.nodemanager.admin environment must contain the FIPS-related Java options, but the comma is treated as a special character in this configuration string. The fix uses single-module additions in the default FIPS options (a separate --add-modules for every module) and adds the FIPS options to the yarn.nodemanager.admin environment.
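The single-module form can be illustrated with a short Python sketch that rewrites one comma-separated --add-modules option into one option per module. The split_add_modules helper is hypothetical and is not the Cloudera Manager implementation; the module names are the ones that appear in the FIPS options elsewhere in these notes.

```python
def split_add_modules(options):
    """Rewrite --add-modules=a,b as --add-modules=a --add-modules=b."""
    prefix = "--add-modules="
    result = []
    for option in options.split():
        if option.startswith(prefix):
            modules = option[len(prefix):].split(",")
            # Emit one flag per module so the value contains no comma.
            result.extend(prefix + module for module in modules if module)
        else:
            result.append(option)
    return " ".join(result)


opts = (
    "--add-modules=com.safelogic.cryptocomply.fips.core,bctls "
    "-Djdk.tls.ephemeralDHKeySize=2048"
)
rewritten = split_add_modules(opts)
print(rewritten)
```

The rewritten string no longer contains a comma, so it can be placed in a comma-sensitive configuration value.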

Previously, yarn.nodemanager.container-localizer.admin.java.opts contained FIPS options only for 7.1.9. This patch fixes that as well and adds the proper configurations in 7.3.1 environments.

This was tested on a real cluster, and with the current changes YARN works properly, and can successfully run distcp from/to encryption zones.

OPSAPS-70113: Fix the ordering of YARN admin ACL config
The YARN Admin ACL configuration in Cloudera Manager shuffled the ordering when it was generated. This issue is now fixed, so that the input ordering is maintained and correctly generated.