You must be aware of the known issues and limitations, the areas of impact, and
workarounds in Cloudera Manager 7.13.2 and its cumulative hotfixes.
Known issues identified in Cloudera Manager 7.13.2
OPSAPS-77052: Ozone DataNode decommission command stuck for
more than 4 hours
7.13.2
Ozone DataNode decommissioning can appear stuck in Cloudera Manager while the actual decommissioning is successful. This
occurs due to a bug in the monitoring script, where a loose grep expression causes the
script to wait indefinitely.
Administrators can manually monitor the DataNode
decommission state using the Storage Container Manager (SCM) Web UI or CLI. Once all
desired DataNodes are confirmed as decommissioned, the decommission command in Cloudera Manager can be safely aborted.
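For example, the decommission state can be monitored from the command line; this sketch assumes the ozone admin datanode subcommand is available in your Ozone release, and the exact output format varies by version:

```shell
# List DataNodes together with their operational state
# (for example IN_SERVICE, DECOMMISSIONING, DECOMMISSIONED).
ozone admin datanode list
```

Once every targeted DataNode reports a decommissioned state, the stuck command in Cloudera Manager can be aborted as described above.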
OPSAPS-76455: The Stop command failed on Ozone S3 Gateway
service
7.13.2
When a restart is performed on the Ozone S3 Gateway, the
java.lang.IllegalStateException: Singleton not set for
STATIC_INSTANCE exception occurs. This exception originates from the JBoss
Weld bootstrap process when the CDI registry fails to reinitialize the static
singleton provider correctly during the restart cycle.
Avoid the restart operation. Instead, perform a manual
stop followed by a start.
When the ozone.replication property is
exposed in Cloudera Manager Ozone configurations and assigned a
default value, it unintentionally overrides the bucket-level replication configuration
as a client-side setting—even if the user does not intend to set client-side
configurations.
None
OPSAPS-76845: Last Page button on the Bucket Browser tab is
not functional
7.13.2
In Cloudera Manager UI, the Last
Page button in the Bucket Browser tab does not
function as expected. When users click the Last Page button,
the UI refreshes the current page instead of navigating to the last page of buckets.
The Next button continues to work as intended, allowing users
to move forward one page at a time. This issue is particularly noticeable when there
are a large number of buckets, as users must navigate page by page to reach the
end.
There is no direct workaround to enable the
Last Page button. However, to reduce the number of navigation
steps, users can increase the number of buckets displayed per page in the UI settings.
This adjustment allows more buckets to be viewed at once, minimizing the number of
pages to navigate.
OPSAPS-74844: Service Monitor fails to connect to multiple
clusters with distinct custom Kerberos principals
7.13.2
The Cloudera Manager Service Monitor does
not support unique Kerberos principal configurations across multiple clusters. The
following limitations apply to the Service Monitor:
You cannot apply different custom Kerberos settings to different clusters
managed by the same Service Monitor.
You cannot connect to multiple clusters simultaneously if those clusters require
distinct custom Kerberos principals.
None
OPSAPS-76330: Missing request/process context ID in Atlas logs
after migration from log4j2 to logback using CM default pattern
7.13.2
After upgrading Apache Atlas logging from log4j2 to
logback, and using the default logback pattern provided by Cloudera Manager, Atlas logs
no longer consistently include the request/process context ID (for example, etp<timestamp>-<pid>
- <uuid>). This results in a regression compared to earlier behavior with
log4j2, where the context ID was consistently present for request-scoped operations. The
missing context information makes troubleshooting and tracing individual requests more
difficult.
None
OPSAPS-75366: The Knox Gateway gateway.log.gz
file in the support bundle is corrupt
7.13.2
When you collect diagnostic data, the Knox Gateway
gateway.log.gz file under
logs/[***HOST
NAME***]/ in the downloaded bundle might have
0-byte length. The file does not contain the Knox Gateway logs even when
gateway.log on the host has content.
In the diagnostic bundle, open the
service-diagnostics/[***CLUSTER NAME***]/[***KNOX SERVICE NAME***]/
folder. The Knox Gateway role diagnostics archive there includes the gateway logs (for
example gateway.log and gateway-audit.log).
OPSAPS-75684: Spark fails due to Zookeeper Custom Kerberos Principal issue
Incorrect Zookeeper principal configuration and missing JVM property setup leads to SASL authentication failures.
When a custom Zookeeper principal is used, add the -Dzookeeper.sasl.client.username=[***USERNAME***] JVM argument to spark.*.defaultJavaOptions or spark.*.extraJavaOptions in spark-defaults.conf.
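For example, the corresponding entries in spark-defaults.conf could look like the following; the placeholder username and the choice of defaultJavaOptions for both driver and executor are illustrative:

```properties
spark.driver.defaultJavaOptions   -Dzookeeper.sasl.client.username=[***USERNAME***]
spark.executor.defaultJavaOptions -Dzookeeper.sasl.client.username=[***USERNAME***]
```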
OPSAPS-75443: Hive Metastore Server fails to start after
memory reallocation
7.13.2
After executing the
/api/v57/hosts/reallocateMemory API, the Hive Metastore Server
(HMS) might fail to start with a "Not enough space" memory error. This issue occurs
even after the heap size is set to 8GB and typically appears following an Atlas Server
memory failure. The HMS service remains in a stopped state because it cannot allocate
the required memory resources.
None
OPSAPS-76683: Hive system database creation
7.13.2
The Hive system database creation blocks the upgrade process. When upgrading, the process is interrupted or blocked during the creation of the Hive system database.
None
OPSAPS-73421: Hive Metastore performance logging
7.13.2
The performance logger does not function as expected in the Hive Metastore. Performance logging (Perflogger) fails to record entries in the Hive Metastore (HMS) logs, even when the "Enable Performance Logging" flag is enabled in the Hive service configurations. The required logger is not correctly added to the loggers list in the logging properties.
None
OPSAPS-73237: Hive default heap sizes on Data Hubs
7.13.2
The default Java heap sizes for Hive Metastore (HMS) and HiveServer2 (HS2) are too large for certain Data Hub configurations. On Data Hub clusters, such as those with a 64 GB environment, the default Java heap size for Hive Metastore and HiveServer2 is automatically configured to 16 GB. This high allocation can lead to memory overcommitment and leave insufficient memory for other cluster processes.
You can manually reduce the Java heap size for Hive
services to 8 GB or lower in the Cloudera Manager
configuration.
OPSAPS-75673: Wrong enablement of Ranger RMS Database Full
Sync command
7.1.8, 7.1.9, 7.1.9 SP1, 7.2.18, 7.3.1,
7.3.2.0
The Ranger RMS Database Full Sync
command should be enabled only when all RMS server instances are stopped. This is
required to ensure that the RMS database synchronizes correctly without introducing
conflicts or data corruption. However, when HA (High Availability) is enabled on the
cluster, the command becomes available from Cloudera Manager > Ranger RMS > Actions drop-down, even though only one Ranger RMS instance is stopped while
the others are still running.
None.
OPSAPS-73684: Service startup failures during High Availability deployment
7.13.2
During High Availability (HA) cluster deployments, Impala services can fail to start due to dependent services not having fully started.
Retrying the HA deployment might succeed in some cases.
OPSAPS-76959: Stale alternatives temporary files cause client
configuration deployment failure
7.13.2
An interrupted Cloudera Manager upgrade or Agent restart can
leave stale temporary files in the /var/lib/alternatives/
directory. These leftover files prevent subsequent Deploy Client
Configuration tasks from completing, as the
update-alternatives command fails when it encounters an existing
.new state file.
During a Cloudera Manager upgrade or configuration activation, a
race condition or a forced service restart (such as a SIGTERM/Exit Code
-15) can interrupt the update-alternatives process. This
interruption leaves behind a stale temporary file, typically named
/var/lib/alternatives/<alt_name>.new.
When you later attempt to Deploy Client Configuration, the
Agent tries to run update-alternatives --install. The command fails
because the operating system detects the pre-existing .new file and
returns a non-zero exit code (usually Exit Code 2). Cloudera Manager then reports a command failure similar to:
"client configuration ... exited with 2 and expected 0"
This issue is typically host-specific rather than cluster-wide.
If a deployment fails due to a stale alternatives state, manually clear the
temporary files on the affected host and retry the deployment from the Cloudera Manager UI.
Verify no active alternatives processes: SSH into the affected host and
ensure no other instances of the alternatives tool are currently
running:
pgrep -af 'update-alternatives|alternatives'
If the command returns no active processes, proceed to Step 2.
Identify the stale temporary files: List the temporary files to confirm
which entries are
blocked:
ls -l /var/lib/alternatives/*.new
To check a specific service (for example, HDFS or Hive),
use:
ls -l /var/lib/alternatives/<alt_name>*
Back up and remove the stale .new files: Create a
backup of the stale file in a temporary directory before deleting
it:
sudo cp -a /var/lib/alternatives/<alt_name>.new /var/tmp/<alt_name>.new.$(date +%s).bak
sudo rm -f /var/lib/alternatives/<alt_name>.new
Verify the alternatives state: Confirm the current status of the
link:
Retry the operation: Return to the Cloudera Manager UI
and re-run Deploy Client Configuration for the affected
service.
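The backup-and-remove step above can be sketched as a small shell helper; the function name and the ALT_DIR override are illustrative (on a real host the directory is /var/lib/alternatives and the commands require root):

```shell
# Back up and remove a stale update-alternatives temporary file.
# ALT_DIR is parameterized so the logic can be rehearsed against a
# scratch directory before touching /var/lib/alternatives.
ALT_DIR="${ALT_DIR:-/var/lib/alternatives}"

clear_stale_alt() {
    alt_name="$1"
    stale="$ALT_DIR/$alt_name.new"
    if [ -e "$stale" ]; then
        backup="/var/tmp/$alt_name.new.$(date +%s).bak"
        cp -a "$stale" "$backup"   # keep a copy before deleting
        rm -f "$stale"
        echo "removed $stale (backup: $backup)"
    else
        echo "no stale .new file for $alt_name"
    fi
}
```

After clearing the file, re-run Deploy Client Configuration from the Cloudera Manager UI.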
OPSAPS-76960: Missing symbolic links due to Agent restart race
condition during Cloudera Manager upgrade
7.13.2
During a Cloudera Manager upgrade, a race condition might prevent
the creation of critical parcel symbolic links (symlinks). If the
cloudera-scm-agent restarts while an
update-alternatives process is active, the system terminates the
task (Exit Code -15). This leaves symlinks unconfigured and causes
dependent services, such as Solr, to fail.
When you upgrade Cloudera Manager, the
cloudera-manager-agent package update triggers the
cloudera-scm-agent service to execute parcel activation tasks. If
an automated script or a user issues a systemctl restart
cloudera-scm-agent command while update-alternatives is
running, the operating system sends a SIGTERM (Signal 15) to
the Agent and its child processes.
This forceful termination interrupts the creation of essential symbolic links,
including:
/var/lib/hadoop-hdfs/ozone-filesystem-hadoop3.jar
/etc/alternatives/ozone-filesystem-hadoop3.jar
The Agent logs this failure as Exit Code: -15 in
/var/log/cloudera-scm-agent/cloudera-scm-agent.log. Because these
links are missing, dependent services cannot locate necessary libraries and fail to
start.
If services fail to start after an upgrade due to
missing alternatives or symbolic links (symlinks), manually complete the interrupted
activation steps on the affected host or use the Cloudera Manager
UI to reconcile the state.
Option 1: Manual fix through CLI
Verify the missing symlink: SSH into the affected host machine and check
the status of the failing JAR or symlink. For example:
If the output shows the link is missing or broken, proceed to Step 2.
Manually run the interrupted command:
Search the Agent log at
/var/log/cloudera-scm-agent/cloudera-scm-agent.log for
Exit Code: -15 to locate the failed
update-alternatives command immediately preceding that error.
Copy the full path to the parcel library from that log entry.
For the ozone-filesystem-hadoop3.jar failure, run the
following installation command manually as the root user:
Restart Services: After you verify the fix, restart the failing service
(such as Solr) through the Cloudera Manager UI.
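The reinstated command from step 2 generally has the following shape; the library path and priority are placeholders that must be copied from the failed command found in the Agent log:

```shell
sudo update-alternatives --install \
    /var/lib/hadoop-hdfs/ozone-filesystem-hadoop3.jar \
    ozone-filesystem-hadoop3.jar \
    [***PARCEL LIBRARY PATH FROM THE AGENT LOG***] \
    [***PRIORITY FROM THE AGENT LOG***]
```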
Option 2: Cloudera Manager UI (Automatic Fix)
Instead of manual
CLI intervention, you can force Cloudera Manager to recreate all symbolic
links:
Navigate to the Hosts > Parcels page in the Cloudera Manager UI.
Select the affected parcel and click Activate again.
This process declaratively identifies and recreates any missing symbolic links
across all hosts in the cluster.
CDPD-99248: Ozone upgrade finalization might fail in Cloudera Manager
7.13.2
When finalizing an Ozone upgrade for the first time through Cloudera Manager, the finalization command might report a failure in
the standard error (stderr) log, even though the process continues to run on the
Storage Container Manager (SCM).
During the initial Ozone upgrade finalization, Cloudera Manager
might return the following error
message:
Invalid response from Storage Container Manager.
Current finalization status is: FINALIZATION_IN_PROGRESS
This error occurs because Cloudera Manager fails to parse the
interim status response from the SCM. Despite the "failed" status in the Cloudera Manager interface, the SCM continues the finalization
process in the background.
If you encounter this error, do not restart the finalization command. Instead,
manually verify the progress directly on the SCM by following these steps:
Check the Status: Run the following command from the command line to
monitor the actual SCM finalization state:
ozone admin scm finalizationstatus
Wait for Completion: Monitor the output until it indicates that
finalization is complete.
Verify in Cloudera Manager: Once the CLI command confirms a successful
finalization, you can safely ignore the previous failure message in Cloudera
Manager and proceed with your post-upgrade tasks.
OPSAPS-76363: Knox gateway database connection properties are
not populated automatically when Oracle is the cluster database
7.13.2
When you run the Knox gateway in high availability (HA) and use JWT token features,
only MySQL and PostgreSQL are supported for the Knox gateway token database; Oracle
is not supported. Cloudera Manager populates the Knox gateway
database connection properties automatically only when you use MySQL or PostgreSQL as
the Knox gateway database. If you use Oracle instead, these properties are not
populated automatically.
Without those settings, JWT tokens might not behave consistently across Knox
gateway instances.
Use MySQL or PostgreSQL for the Knox gateway database when you run Knox in HA with
JWT token features.
OPSAPS-76116: Knox gateway database configuration might not be retained after upgrading a Cloudera Base on premises cluster
7.13.2
Knox database configuration properties knox_gateway_database_name,
knox_gateway_database_host,
knox_gateway_database_user, and
knox_gateway_database_password might not be retained after you upgrade
a Cloudera Base on premises cluster.
None
OPSAPS-76528: Ozone services enforce IPv4 in DUAL_STACK
configuration
7.13.2
In Cloudera Base on premises 7.3.2.0, Ozone must be able to operate correctly in a DUAL_STACK environment, but
you cannot control whether Ozone services communicate in IPv4-only mode or
in dual-stack mode.
None
OPSAPS-75735: Systemd disables Cgroup v2 Controllers written
by Cloudera Manager Agent
When you enable Cgroup v2, Cloudera Manager services might fail
to start because specific controllers (such as cpu,
memory, or pids) are missing or not enabled at
the root.
Although the Cloudera Manager Agent writes controllers to the
root subtree during startup, systemd (the cgroup manager for most
modern Linux distributions) frequently disables controllers from the delegation tree
if a service does not actively use them. On specific Linux distributions,
systemd's strict enforcement of this cleanup prevents the Cloudera Manager Agent from maintaining the necessary environment
for service sub-processes.
If you encounter an error stating that a controller is "missing" or "not enabled at
root," follow these steps to restore and persist the controllers.
Restart the Cloudera Manager Agent on the affected hosts to
force it to rewrite the controllers to the root
subtree.
sudo systemctl restart cloudera-scm-agent
Try
starting the affected services again. If the issue persists, proceed to the
next step.
Configure Systemd Delegation by performing the following
steps:
Explicitly instruct systemd to delegate cgroup
controllers to the Cloudera Manager Agent process. This
ensures the controllers remain available at the root level regardless of
systemd's cleanup policies.
Open the unit file: Use a text editor to open the Cloudera Manager Agent service unit file:
/usr/lib/systemd/system/cloudera-scm-agent.service
Add the Delegate parameter: Locate the [Service]
section. Add Delegate=yes to the
configuration.
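After the edit, the [Service] section of the unit file includes the new parameter, for example:

```ini
[Service]
Delegate=yes
```

Run sudo systemctl daemon-reload and then restart the cloudera-scm-agent service so that systemd picks up the modified unit file.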
OPSAPS-76314: Cloudera Management Service restart fails on
large clusters due to Cloudera Manager Descriptor Fetch Timeout
On large-scale deployments, the Cloudera Management Service might fail to start or
restart correctly with the following
error:
2026-01-03 04:43:05,600 INFO com.cloudera.cmf.BasicScmProxy: Authenticated to SCM.
2026-01-03 04:43:16,074 WARN com.cloudera.cmf.BasicScmProxy: Timed out while fetching the SCM descriptor. This can happen on large clusters. Timeout can be increased by configuring Descriptor Fetch Timeout under Administration > Settings.
2026-01-03 04:43:16,075 WARN com.cloudera.cmf.eventcatcher.server.EventCatcherService: No descriptor fetched from https://ip-10-129-36-226.iopscloud.cloudera.com:7183 on after 1 tries, sleeping for 2 secs.
This occurs because the Cloudera Manager Descriptor Fetch
Timeout defaults to 10 seconds, which is often insufficient for the
Cloudera Manager Server to generate and transmit the full cluster descriptor to
Cloudera Management Service roles like the Event Server or Host Monitor in
high-scale environments.
Reaching this timeout causes the service to log a warning and fail initialization.
Consequently, the Event Catcher enters a loop, unable to retrieve the necessary
configuration.
If you encounter service startup failures on large clusters, manually increase the
fetch timeout through the Cloudera Manager Admin Console:
Log in to the Cloudera Manager Admin Console.
Navigate to Administration > Settings.
Search for the parameter: Cloudera Manager Descriptor Fetch
Timeout.
Increase the value from the default 10 seconds to
60 seconds.
Click Save Changes.
Restart the Cloudera Management Service.
OPSAPS-75899: HDFS directory creation fails on JDK 11 or
higher on clusters integrated with LDAP or Active Directory that use the
hadoop.security.group.mapping property.
7.13.1, 7.13.1.100, 7.13.1.200, 7.13.1.300,
7.13.1.400, 7.13.1.500, and 7.13.1.600
Due to this issue, many critical Cloudera Manager operations fail to complete, such
as:
You must perform the following workaround steps to manually add specific Java
modules to the HDFS service script on all nodes in the cluster:
Navigate to the directory
/opt/cloudera/cm-agent/service/hdfs/.
Open the hdfs.sh file for editing.
Locate the JAVA17_ADDITIONAL_JVM_ARGS variable.
Append the following flags to the end of the existing list in the
JAVA17_ADDITIONAL_JVM_ARGS variable:
--add-exports=java.naming/com.sun.jndi.ldap=ALL-UNNAMED
--add-opens=java.naming/com.sun.jndi.ldap=ALL-UNNAMED
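The updated variable in hdfs.sh then has the following shape; the existing-flags placeholder stands for whatever the variable already contained:

```shell
JAVA17_ADDITIONAL_JVM_ARGS="[***EXISTING FLAGS***] --add-exports=java.naming/com.sun.jndi.ldap=ALL-UNNAMED --add-opens=java.naming/com.sun.jndi.ldap=ALL-UNNAMED"
```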
OPSAPS-74066, OPSAPS-74547: DataHub high memory consumption
on Hiveserver load for JDK 17
7.13.2
In upgraded DataHub deployments, HiveServer might fail to start due to memory
overallocation. This occurs because Cloudera Manager does not account for memory
already assigned to Management Service roles when allocating memory for cluster
roles. This issue is fixed in fresh installations of Cloudera Manager 7.13.1.500.
The updated algorithm now correctly reallocates memory across all roles during
cluster setup.
To resolve this issue, use the following API to manually trigger the Cloudera Manager memory allocation algorithm on the host where both
HiveServer and management roles are running, and restart the cluster to apply the
updated memory configuration:
API Endpoint: POST /api/v57/hosts/reallocateMemory
Include the host name (the host where HiveServer and management roles run) in the
API request body. This ensures that memory assignments are recomputed correctly,
taking all roles on the host into account.
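A hedged sketch of the API call using curl; the request-body schema shown here is an assumption and may differ in your Cloudera Manager API version:

```shell
curl -u [***ADMIN USER***]:[***PASSWORD***] \
     -X POST \
     -H 'Content-Type: application/json' \
     -d '{"items": ["[***HOSTNAME***]"]}' \
     'https://[***CM HOST***]:7183/api/v57/hosts/reallocateMemory'
```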
OPSAPS-74668:
ozone.snapshot.deep.cleaning.enabled and
ozone.snapshot.ordered.deletion.enabled configs are missing with
Cloudera 7.1.9 SP1 CHF and Cloudera Manager 7.13.1
7.13.2
Two Ozone Manager configs are missing while using Cloudera 7.1.9 SP1 CHF after upgrading Cloudera Manager version from 7.11.3 to 7.13.1.400.
If you are using Cloudera 7.1.9 SP1 CHF, before upgrading Cloudera Manager version from
7.11.3 to 7.13.1.400, add the following configs to Ozone Manager Advanced
Configuration Snippet (Safety Valve) for ozone-conf/ozone-site.xml so
that Ozone Manager does not miss important config after the Cloudera Manager
upgrade:
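The safety-valve entries have the following shape; the values are placeholders, set them to match the configuration your deployment used before the upgrade:

```xml
<property>
  <name>ozone.snapshot.deep.cleaning.enabled</name>
  <value>[***VALUE***]</value>
</property>
<property>
  <name>ozone.snapshot.ordered.deletion.enabled</name>
  <value>[***VALUE***]</value>
</property>
```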
Cloudera Manager might display a
false-positive error message: Port conflict detected: 8443 (Gateway Health HTTP
Port) is also used by: Knox Gateway during cluster installations. The
warning does not cause actual installation failures.
None
OPSAPS-74950: Ozone replication policies fail for Cloudera
Private Cloud Base 7.1.9 SP1 CHF11 clusters using Cloudera Manager
7.13.1.400
7.13.1.400
7.13.1.500, 7.13.2.0
Ozone replication policies for Ozone linked buckets
fail when the Cloudera Private Cloud Base 7.1.9 SP1 CHF11 source or target clusters
use Cloudera Manager 7.13.1.400.
Use Cloudera Private Cloud Base 7.1.9 SP1 CHF11 clusters
with Cloudera Manager 7.13.1.500.
OPSAPS-72439: HDFS and Hive external tables replication
policies fail when using custom “krb5.conf” files for Cloudera Private Cloud Base
7.1.9 SP1 CHF11 clusters using Cloudera Manager 7.13.1.400
The issue appears when the custom
krb5.conf file is not propagated to the required files, and you
are using Cloudera Private Cloud Base 7.1.9 SP1 CHF11 source or target clusters with
Cloudera Manager 7.13.1.400.
Use Cloudera Private Cloud Base 7.1.9 SP1 CHF11 clusters
with Cloudera Manager 7.13.1.500, and complete the instructions in
step 13 in Using a custom Kerberos configuration
path.
OPSAPS-71459: Commands continue to run after Cloudera Manager
restart
If you are using Cloudera Private Cloud Base 7.1.9 SP1
CHF11 source or target clusters with Cloudera Manager 7.13.1.400,
remote replication commands continue to run endlessly even after a Cloudera Manager
restart operation.
Use Cloudera Private Cloud Base 7.1.9 SP1 CHF11 clusters
with Cloudera Manager 7.13.1.500.
OPSAPS-73158, OPSAPS-74206: HDFS replication policies fail
when the policies prefetch the expired Kerberos ticket from the 'sourceTicketCache'
file for Cloudera Private Cloud Base 7.1.9 SP1 CHF11 clusters using Cloudera Manager 7.13.1.400
If you are using Cloudera Private Cloud Base 7.1.9 SP1
CHF11 source or target clusters with Cloudera Manager 7.13.1.400,
Replication Manager pre-fetches the Kerberos ticket from the
sourceTicketCache file for the replication policies. Issues
appear when the file contains an expired Kerberos ticket.
Use Cloudera Private Cloud Base 7.1.9 SP1 CHF11 clusters
with Cloudera Manager 7.13.1.500.
OPSAPS-73405, OPSAPS-71565, OPSAPS-72860, OPSAPS-72859:
Replication policies fail even after the source or target cluster becomes available
after it recovers from temporary node failures for Cloudera Private Cloud Base 7.1.9
SP1 CHF11 clusters using Cloudera Manager 7.13.1.400
If you are using Cloudera Private Cloud Base 7.1.9 SP1
CHF11 source or target clusters with Cloudera Manager 7.13.1.400,
Hive replication policies and HBase replication policies fail even after the source or
target cluster recovers from a temporary node failure.
Use Cloudera Private Cloud Base 7.1.9 SP1 CHF11 clusters
with Cloudera Manager 7.13.1.500.
OPSAPS-73655, OPSAPS-73737: Cloud replication fails even after
the delegation token is issued for Cloudera Private Cloud Base 7.1.9 SP1 CHF11
clusters using Cloudera Manager 7.13.1.400
If you are using Cloudera Private Cloud Base 7.1.9 SP1
CHF11 source or target clusters with Cloudera Manager 7.13.1.400,
the replication policies fail during an incremental replication run if you chose the Advanced Setting > Delete Policy > Delete permanently option during the replication policy creation process.
Use Cloudera Private Cloud Base 7.1.9 SP1 CHF11 clusters
with Cloudera Manager 7.13.1.500.
OPSAPS-74040, OPSAPS-74058: Ozone OBS replication fails due to
pre-filelisting check failure for Cloudera Private Cloud Base 7.1.9 SP1 CHF11 clusters
using Cloudera Manager 7.13.1.400
7.13.1.400
7.13.1.500, 7.13.2.0
If you are using Cloudera Private Cloud Base 7.1.9 SP1
CHF11 source or target clusters with Cloudera Manager 7.13.1.400 and
the source bucket is a linked bucket, then the replication fails during the
Run Pre-Filelisting Check step for OBS-to-OBS Ozone replication,
and the error message Source bucket is a linked bucket, however the bucket it
points to is also a link appears. This issue appears even when the source
bucket is directly linked to a regular, non-linked bucket.
Use Cloudera Private Cloud Base 7.1.9 SP1 CHF11 clusters
with Cloudera Manager 7.13.1.500.
OPSAPS-73602, OPSAPS-74353: HDFS replication policies to cloud
fails with HTTP 400 error for Cloudera Private Cloud Base 7.1.9 SP1 CHF11 clusters
using Cloudera Manager 7.13.1.400
7.13.1.100, 7.13.1.200, 7.13.1.300,
7.13.1.400
7.13.1.500, 7.13.2.0
If you are using Cloudera Private Cloud Base 7.1.9 SP1
CHF11 source or target clusters with Cloudera Manager 7.13.1.400,
the HDFS replication policies to cloud fail after you edit the replication policies in
the Cloudera Manager > Replication Manager UI.
Use Cloudera Private Cloud Base 7.1.9 SP1 CHF11 clusters
with Cloudera Manager 7.13.1.500.
OPSAPS-73645, OPSAPS-73847: Ozone bucket browser does not show
the volume buckets for Cloudera Private Cloud Base 7.1.9 SP1 CHF11 clusters using Cloudera Manager 7.13.1.400
7.13.1.100, 7.13.1.200, 7.13.1.300,
7.13.1.400
7.13.1.500, 7.13.2.0
If you are using Cloudera Private Cloud Base 7.1.9 SP1
CHF11 source or target clusters with Cloudera Manager 7.13.1.400,
the volume buckets do not appear if the number of volumes exceeds 26, when you click
Next Page on the Cloudera Manager > Clusters > Ozone service > Bucket Browser page and then click a volume name.
Use Cloudera Private Cloud Base 7.1.9 SP1 CHF11 clusters
with Cloudera Manager 7.13.1.500.
RELENG-27000: Proper link for
bigtop-detect-javahome is missing when using CDP Private Cloud Base
7.1.9 SP1 CHF5 with Cloudera Manager 7.13.1 CHF3.
Using Cloudera Manager 7.13.1
CHF3 with CDP Private Cloud Base 7.1.9 SP1 CHF5 results in an incorrect
bigtop-detect-javahome link.
Create a link under
/opt/cloudera/parcels/CDH/bin/bigtop-detect-javahome that
points to
/opt/cloudera/parcels/CDH/lib/bigtop-utils/bigtop-detect-javahome.
For
example:
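For example, run the following command as the root user; both paths are the ones named in the workaround above:

```shell
ln -sf /opt/cloudera/parcels/CDH/lib/bigtop-utils/bigtop-detect-javahome \
       /opt/cloudera/parcels/CDH/bin/bigtop-detect-javahome
```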
After restarting the Cloudera Data Hub, the services appear to be down in the
Cloudera Manager UI. The Cloudera Management Console reports a node failure error
for the master node.
The issue is caused by high memory usage due to the G1 garbage collector on Java
17, leading to insufficient memory issues and thereby moving the Cloudera clusters
to an error state.
Starting with Cloudera 7.3.1.0, Java 17 is the default runtime instead of Java 8,
and its memory management increases memory usage, potentially affecting system
performance. Clusters might report error states, and logs might show insufficient
memory exceptions.
To mitigate this issue and prevent startup failures after a Datahub restart, you
can perform either of the following actions, or both:
Reduce the Java heap size for affected services to prevent nodes from exceeding
the available memory.
Increase physical memory for the cloud or on-premises instances running the
affected services.
OPSAPS-74370: Knox's Save Alias -
IDBroker command fails due to missing variable declaration
Verify the addition using
/opt/cloudera/parcels/CDH/lib/knox/bin/knoxcli.sh list-alias --cluster
<CLUSTER_NAME>
For HA deployments, you must perform this on every Knox host (whereas the
Save Alias command applies the change to all hosts
automatically).
OPSAPS-71669: The Continue option is
disabled on the Static Service Pools Review page, affecting the
functionality of Static Service Pools
7.13.1
7.13.1.100
The minimum and maximum I/O weight values for Cgroup v2 were incorrectly set to
100 and 1000, respectively, in Cloudera Manager 7.13.1.0. According to official Cgroup v2
documentation, the valid range should be 1 to
10,000. Due to this incorrect configuration range, the
Continue option on the Static Service Pools
Review page was disabled, preventing users from proceeding with pool
configuration.
This issue might occur on clusters running Cloudera Manager
7.13.1.0 with Cgroup v2 resource management when configuring or reviewing Static
Service Pools. After upgrading to Cloudera Manager 7.13.1.100
CHF-1, this issue no longer occurs.
None
OPSAPS-75290, OPSAPS-74994: The
yarn_enable_container_usage_aggregation job is failing with
“Null real user” error on Service Monitor.
7.13.1, 7.13.1.100, 7.13.1.200, 7.13.1.300,
7.13.1.400, and 7.13.1.500
The
yarn_enable_container_usage_aggregation job is failing with
the "Null real user" error on Service Monitor when the YARN
service is running on the compute cluster with Stub DFS, and when the PowerScale
service is running in the cluster with the PowerScale DFS provider instead of HDFS.
None.
OPSAPS-71581: Cloudera Manager Agent's
append_properties function fails with the realpath:
invalid option -- 'u' error when executed from service control
scripts.
7.13.1, 7.13.1.100, 7.13.1.200, 7.13.1.300,
7.13.1.400, and 7.13.1.500
Errors appear on the standard error (stderr) log of
Cloudera Data Platform (CDP) services when you are attempting to trigger the
cloudera-config.sh script. The error log contains the following
message: realpath: invalid option -- 'u'. This is caused
by an incorrectly placed command-line flag in the script, which prevents some service
configurations from loading correctly.
To resolve this issue temporarily, you must perform the following workaround steps
on each agent node in the base cluster:
Navigate to the directory
/opt/cloudera/cm-agent/service/common/.
Open the cloudera-config.sh file for editing.
Locate the two lines that execute the Python scripts
append_properties.py and
get_property.py.
In both lines, remove the -u flag, or move it from the end of the
line to the position immediately after
python:
After saving the changes on all agent nodes, restart the entire cluster for
the new configuration to take effect.
Verify the fix by checking the stderr.log on a few
service instances to ensure the realpath: invalid option --
'u' error no longer appears.
OPSAPS-71878: Ozone fails to restart during cluster restart
and displays the error message: Service has only 0 Storage Container Manager
roles running instead of minimum required 1.
7.13.1, 7.13.1.100, 7.13.1.200, 7.13.1.300,
7.13.1.400, and 7.13.1.500
Open Cloudera Manager in a second browser window and
restart the Ozone service separately.
After the Ozone service restarts, you can resume the cluster restart from the
first browser.
ENGESC-30503, OPSAPS-74868: Cloudera Manager
limited support for custom external repository requiring basic authentication
7.13.1, 7.13.1.100, 7.13.1.200, 7.13.1.300, and
7.13.1.400
Cloudera Manager does not currently
support custom external repositories with basic authentication (the Cloudera Manager Wizard supports either HTTP (non-secured)
repositories or the Cloudera repository at https://archive.cloudera.com only).
If customers use a custom external repository with basic authentication,
they might encounter errors.
The assumption is that you can access the external custom repository (such as Nexus,
JFrog, or others) using LDAP credentials. If an application user is used to
fetch the external content (as is done in Data Services with the Docker image
repository), ensure that this application user is located under
the user base search path from which real users are retrieved during the LDAP
authentication check, so that the external repository can find it and allow it to
fetch the files.
Once done, you can use the current custom URL fields in the Cloudera Manager Wizard and enter the URL for the RPMs or
parcels/other files in the format of
"https://USERNAME:PASSWORD@server.example.com/XX".
While using the password, you are advised to use only the printable ASCII character
range (excluding space), whereas in case of a special character (not letter/number)
it can be replaced with HEX value (For example, you can replace
Aa1234$ with Aa1234%24 as '%24'
is translated into $ sign).
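The encoding rule above can be sketched as follows; POSIX printf's leading-quote form yields a character's code point, and the username and server name are the placeholders from the example:

```shell
# '$' has ASCII code 0x24, so the example password Aa1234$ is
# written as Aa1234%24 when embedded in the repository URL.
c='$'
encoded=$(printf 'Aa1234%%%02X' "'$c")
echo "$encoded"
# The URL then takes the form https://USERNAME:Aa1234%24@server.example.com/...
```

The same pattern applies to any other special character in the password.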
OPSAPS-72164: Proxy Settings and Telemetry Publisher in Cloudera Manager
7.13.1, 7.13.1.100, 7.13.1.200, 7.13.1.300, and
7.13.1.400
In Cloudera Manager 7.13.1, the PROXY
settings for the Telemetry Publisher (TP) are not functioning as expected. This may
impact the Telemetry Publisher's ability to communicate through a configured
proxy.
You must upgrade to Cloudera Manager 7.13.1 CHF5 (7.13.1.500) or higher.
OPSAPS-60726: Newly saved parcel URLs are not showing up in
the parcels page in the Cloudera Manager HA cluster.
Cgroup v2 support is enabled in CDP 7.1.9 SP1 CHF5 and higher versions. However, if you upgrade from Cloudera Manager 7.11.3.x to Cloudera Manager 7.13.1.x and the environment uses cgroup v2, the NodeManagers might fail to start during the cluster restart after the Cloudera Manager 7.13.1.x upgrade.
To resolve this issue temporarily, you must perform the following steps:
Go to the YARN service page on the Cloudera Manager UI.
Navigate to the Configuration tab.
Search for NodeManager Advanced Configuration Snippet (Safety Valve)
for yarn-site.xml.
Due to a missing dependency caused by an incomplete build and packaging in certain
OS releases, the HMS (Hive Metastore) Canary health test fails, logging a
ClassNotFoundException in the Service Monitor log.
This problem relates to all deliveries using runtime cluster version 7.1.x or 7.2.x,
while the Cloudera Manager version is 7.13.1.x and the OS is NOT
RHEL8.
If your OS is RHEL 9, SLES 15, Ubuntu 20.04, or Ubuntu 22.04 and you install the Cloudera Manager 7.13.1.x version, create a symbolic link using root user privileges on the node that hosts the Service Monitor service (cloudera-scm-firehose) at
/opt/cloudera/cm/lib/cdh71/cdh71-hive-client-7.13.1-shaded.jar,
pointing to
/opt/cloudera/cm/lib/cdh7/cdh7-hive-client-7.13.1-shaded.jar.
Restart the Service Monitor service after the change. This allows the Service Monitor to perform Canary testing correctly on the HMS (Hive Metastore) service.
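The symbolic-link workaround can be sketched as follows. This block dry-runs the same ln -s against a scratch copy of the directory layout; on the real Service Monitor host you would run the single ln -s as root against /opt/cloudera/cm/lib and then restart the cloudera-scm-firehose service.

```shell
# Scratch copy of the layout, for illustration only;
# the real path is /opt/cloudera/cm/lib.
LIB_ROOT=$(mktemp -d)
mkdir -p "$LIB_ROOT/cdh7" "$LIB_ROOT/cdh71"
touch "$LIB_ROOT/cdh7/cdh7-hive-client-7.13.1-shaded.jar"

# The documented link: the cdh71 jar name pointing at the cdh7 jar.
ln -s "$LIB_ROOT/cdh7/cdh7-hive-client-7.13.1-shaded.jar" \
      "$LIB_ROOT/cdh71/cdh71-hive-client-7.13.1-shaded.jar"
```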
OPSAPS-72706, OPSAPS-73188: Hive queries fail after upgrading
Cloudera Manager from 7.11.2 to 7.11.3 or
later
Upgrading Cloudera Manager from version
7.11.2 or earlier to 7.11.3 or later causes Hive queries to fail due to JDK17
restrictions. Some JDK8 options are deprecated, leading to inaccessible classes and
exceptions:
java.lang.reflect.InaccessibleObjectException: Unable to make field private volatile java.lang.String java.net.URI.string accessible
To resolve this issue:
In Cloudera Manager, go to Tez > Configuration.
Append the following values to both tez.am.launch.cmd-opts and
tez.task.launch.cmd-opts:
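The values themselves are not reproduced in this excerpt. As a hypothetical illustration only (verify the exact flags in the official workaround for your release), InaccessibleObjectException errors on java.net.URI under JDK17 are typically addressed by opening the affected package to unnamed modules:

```
--add-opens java.base/java.net=ALL-UNNAMED
```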
Charts for HMS event APIs (get_next_notification,
get_current_notificationEventId, and fire_listener_event) are missing in Cloudera Manager > Hive > Hive Metastore Instance > Charts Library > API
Monitor HMS event activity using Hive Metastore
logs.
OPSAPS-72270: Start ECS
command fails on uncordon nodes step
7.13.1, 7.13.1.100, 7.13.1.200
7.13.1.300
In an ECS HA cluster, the server node restarts during startup. This may cause the uncordon step to fail.
To resolve this issue temporarily, you must perform the following steps:
Run the following command on the same node to verify whether the
kube-apiserver is
ready:
kubectl get pods -n kube-system | grep kube-apiserver
Resume the command from the Cloudera Manager UI.
OPSAPS-73225: Cloudera Manager Agent
reporting inactive/failed processes in Heartbeat request
7.13.1, 7.13.1.100, 7.13.1.200
7.13.1.300
As part of introducing Cloudera Manager 7.13.x, some changes were made to Cloudera Manager logging, eventually causing the Cloudera Manager Agent to report inactive or stale processes in the Heartbeat request.
As a result, the Cloudera Manager server logs fill rapidly with these notifications, although they have no impact on the service.
In addition, with the added support for the Cloudera Observability feature, some additional messages were added to the server logging. However, if the customer did not purchase the Cloudera Observability feature, or telemetry monitoring is not being used, these messages (which appear as "TELEMETRY_ALTUS_ACCOUNT is not configured for Otelcol") fill the server logs and prevent proper follow-up on the server activities.
This will be fixed in a later release by moving these log notifications to DEBUG level so that they do not appear in the Cloudera Manager server logs.
Until that fix is available, perform the following workaround to filter out these messages.
On each Cloudera Manager server, edit the /etc/cloudera-scm-server/log4j.properties file with root credentials and add the following lines at the end of the file:
# === Custom Appender with Filters ===
log4j.appender.filteredlog=org.apache.log4j.ConsoleAppender
log4j.appender.filteredlog.layout=org.apache.log4j.PatternLayout
log4j.appender.filteredlog.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
# === Filter #1: Drop warning ===
log4j.appender.filteredlog.filter.1=org.apache.log4j.varia.StringMatchFilter
log4j.appender.filteredlog.filter.1.StringToMatch=Received Process Heartbeat for unknown (or duplicate) process.
log4j.appender.filteredlog.filter.1.AcceptOnMatch=false
# === Filter #2: Drop telemetry config warning ===
log4j.appender.filteredlog.filter.2=org.apache.log4j.varia.StringMatchFilter
log4j.appender.filteredlog.filter.2.StringToMatch=TELEMETRY_ALTUS_ACCOUNT is not configured for Otelcol
log4j.appender.filteredlog.filter.2.AcceptOnMatch=false
# === Accept all other messages ===
log4j.appender.filteredlog.filter.3=org.apache.log4j.varia.AcceptAllFilter
# === Specific logger for AgentProtocolImpl ===
log4j.logger.com.cloudera.server.cmf.AgentProtocolImpl=WARN, filteredlog
log4j.additivity.com.cloudera.server.cmf.AgentProtocolImpl=false
# === Specific logger for BaseMonitorConfigsEvaluator ===
log4j.logger.com.cloudera.cmf.service.config.BaseMonitorConfigsEvaluator=WARN, filteredlog
log4j.additivity.com.cloudera.cmf.service.config.BaseMonitorConfigsEvaluator=false
Once done, restart the Cloudera Manager server(s) for the updated configuration to be picked up.
OPSAPS-73211: Cloudera Manager 7.13.1 does
not clean up Python Path impacting Hue to start
When you upgrade from Cloudera Manager 7.7.1 or lower versions to Cloudera Manager 7.13.1 or higher versions with CDP Private Cloud Base 7.1.7.x, Hue does not start because Cloudera Manager forces Hue to start with Python 3.8, while Hue needs Python 2.7.
This issue occurs because Cloudera Manager does not clean up the Python path at any time, so when Hue tries to start, the Python path points to 3.8, which is not supported by Hue in the CDP Private Cloud Base 7.1.7.x version.
To resolve this issue temporarily, you must perform the following steps:
Locate the hue.sh in
/opt/cloudera/cm-agent/service/hue/.
Add the following line after export
HADOOP_CONF_DIR=$CONF_DIR/hadoop-conf:
OPSAPS-73011: Wrong parameter in the
/etc/default/cloudera-scm-server file
7.13.1, 7.13.1.100, 7.13.1.200, 7.13.1.300
7.13.1.400
If Cloudera Manager needs to be installed in High Availability mode (two nodes or more), the CMF_SERVER_ARGS parameter in the /etc/default/cloudera-scm-server file is missing the word "export" before it (the file contains only CMF_SERVER_ARGS= instead of export CMF_SERVER_ARGS=), so the parameter cannot be utilized correctly.
Edit the
/etc/default/cloudera-scm-server file with root credentials and
add the word "export" before the parameter
CMF_SERVER_ARGS=.
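The edit amounts to a one-word change in /etc/default/cloudera-scm-server:

```
# Before (broken): the variable is set but not exported
CMF_SERVER_ARGS=

# After (fixed): the variable is exported and can be utilized
export CMF_SERVER_ARGS=
```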
OPSAPS-60346: Upgrading Cloudera Manager
Agent triggers cert rotation in Auto-TLS use case 1
Upgrading Cloudera Manager Agent nodes from the Cloudera Manager UI wizard as part of a Cloudera Manager upgrade causes the host to get new certificates,
which becomes disruptive.
The issue happens with use case 1, where the CMCA is stored in the Cloudera Manager database, because Cloudera Manager always regenerates the host certificate as part of the host install or host upgrade step. With use case 3, however, Cloudera Manager does not regenerate the certificate because it is provided by the user.
Currently, there are three possible workarounds:
Rotate all CMCA certs again using the generateCmca API
command, and using the "location" argument to specify a directory on disk. This
will revert to the old behavior of storing the certs on disk instead of the
DB.
Switch to Auto-TLS Use Case 3 (Customer CA-signed Certificates).
Upgrade the Cloudera Manager Agents manually instead of upgrading them from the Cloudera Manager GUI.
OPSAPS-72447, CDPD-76705: Ozone incremental replication fails
to copy renamed directory
7.13.1, 7.13.1.100, 7.13.1.200, 7.13.1.300
7.13.1.400, 7.13.2.0
Ozone incremental replication using Ozone replication policies succeeds but might fail to sync nested renames for FSO buckets.
When a directory and its contents are renamed between replication runs, the outer-level rename is synced, but the contents under the previous name are not.
None
OPSAPS-72756:The runOzoneCommand API endpoint fails during the
Ozone replication policy run
7.13.1, 7.13.1.100, 7.13.1.200, 7.13.1.300
7.13.1.400, 7.13.2.0
The /clusters/{clusterName}/runOzoneCommand Cloudera Manager API endpoint fails when the API is called with the getOzoneBucketInfo command. In this scenario, the Ozone replication policy runs also fail if the following conditions are true:
The source Cloudera Manager version is 7.11.3 CHF11
or 7.11.3 CHF12.
The target Cloudera Manager is version 7.11.3
through 7.11.3 CHF10 or 7.13.0.0 or later where the feature flag
API_OZONE_REPLICATION_USING_PROXY_USER is disabled.
Choose one of the following methods as a workaround:
Upgrade the target Cloudera Manager before you upgrade the source Cloudera Manager (applicable to the 7.11.3 CHF12 version only).
Pause all replication policies, upgrade source Cloudera Manager, upgrade destination Cloudera Manager, and unpause the replication policies.
Upgrade source Cloudera Manager, upgrade target Cloudera Manager, and rerun the failed Ozone replication policies
between the source and target clusters.
OPSAPS-65377: Cloudera Manager Host Inspector does not find Psycopg2 on Ubuntu 20 or Red Hat 8.x when Psycopg2 version 2.9.3 is installed.
7.13.1, 7.13.1.100, 7.13.1.200, 7.13.1.300
7.13.1.400
Host Inspector fails with Psycopg2 version error while upgrading to Cloudera Manager 7.13.1.x versions. When you run the Host Inspector,
you get an error Not finding Psycopg2, even though it
is installed on all hosts.
None
OPSAPS-68340: Zeppelin paragraph execution fails with the
User not allowed to impersonate error.
Starting from Cloudera Manager 7.11.3, Cloudera Manager auto-configures the
livy_admin_users configuration when Livy is run for the first
time. If you add Zeppelin or Knox services later to the existing cluster and do not
manually update the service user, the User not allowed to
impersonate error is displayed.
If you add Zeppelin or Knox services later to the existing cluster, you must
manually add the respective service user to the livy_admin_users
configuration in the Livy configuration page.
OPSAPS-72804: For recurring replication policies, the interval
is overwritten to 1 after the replication policy is edited
7.13.1
7.13.1.100, 7.13.2.0
When you edit an Atlas, Iceberg, Ozone, or a Ranger
replication policy that has a recurring schedule on the Replication Manager UI, the
Edit Replication Policy modal window appears as expected. However, the frequency of
the policy is reset to run at “1” unit where the unit depends on what you have set in
the replication policy. For example, if you have set the replication policy to run
every four hours, it is reset to one hour when you edit the replication policy.
After you edit the replication policy as required, manually reset the frequency to the original scheduled frequency, and then save the replication policy.
OPSAPS-69342: Access issues identified in MariaDB 10.6 were
causing discrepancies in High Availability (HA) mode
MariaDB 10.6, by default, includes the property
require_secure_transport=ON in the configuration file
(/etc/my.cnf), which is absent in MariaDB 10.4. This setting
prohibits non-TLS connections, leading to access issues. This problem is observed in
High Availability (HA) mode, where certain operations may not be using the same
connection.
To resolve the issue temporarily, you can either comment out or disable the line
require_secure_transport in the configuration file located at
/etc/my.cnf.
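The workaround amounts to commenting out one line in /etc/my.cnf (a restart of the MariaDB service is typically required for the change to take effect):

```
# /etc/my.cnf: comment out the line that enforces TLS-only connections
#require_secure_transport=ON
```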
CDPD-53160: Incorrect job run status appears for subsequent
Hive ACID replication policy runs after the replication policy fails
7.13.1, 7.13.1.100, 7.13.1.200
7.13.1.300, 7.13.2.0
When a Hive ACID replication policy run fails with the FAILED_ADMIN status, the subsequent Hive ACID replication policy runs show the SKIPPED status instead of FAILED_ADMIN on the Cloudera Manager > Replication Manager > Replication Policies > Actions > Show History page, which is incorrect. It is recommended that you check the Hive ACID replication policy runs if multiple subsequent policy runs show the SKIPPED status.
None
CDPQE-36126: Iceberg replication fails when source and target
clusters use different nameservice names
When you run an Iceberg replication policy between
clusters where the source and target clusters use different nameservice names, the
replication policy fails.
Perform the following steps to mitigate the issue. In these steps, the source nameservice is assumed to be ns1 and the target cluster nameservice ns2:
Go to the Cloudera Manager > Replication > Replication Policies page.
Click Actions > Edit for the required Iceberg replication policy.
Go to the Advanced tab on the Edit
Iceberg Replication Policy modal window.
Enter the mapreduce.job.hdfs-servers.token-renewal.exclude = ns1, ns2 key-value pair in both the Advanced Configuration Snippet (Safety Valve) for source hdfs-site.xml and Advanced Configuration Snippet (Safety Valve) for destination hdfs-site.xml fields.
Save the changes.
Click Actions > Run Now to run the replication policy.
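In XML form, the key-value pair from the steps above looks like the following (ns1 and ns2 are the example nameservice names assumed in the steps):

```xml
<property>
  <name>mapreduce.job.hdfs-servers.token-renewal.exclude</name>
  <value>ns1,ns2</value>
</property>
```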
CDPD-53185: Clear REPL_TXN_MAP table on target cluster when
deleting a Hive ACID replication policy
7.13.1, 7.13.1.100, 7.13.1.200, 7.13.1.300
7.13.1.400, 7.13.2.0
The entry in REPL_TXN_MAP table on the target cluster is
retained when the following conditions are true:
A Hive ACID replication policy is replicating a transaction that requires
multiple replication cycles to complete.
The replication policy and databases used in it get deleted on the source and
target cluster even before the transaction is completely replicated.
In this scenario, if you create a database using the same name as the deleted
database on the source cluster, and then use the same name for the new Hive ACID
replication policy to replicate the database, the replicated database on the target
cluster is tagged as ‘database incompatible’. This happens after the housekeeper
thread process (that runs every 11 days for an entry) deletes the retained
entry.
Create another Hive ACID replication policy with a different name for the new database.
DMX-3973: Ozone replication policy with linked bucket as
destination fails intermittently
When you create an Ozone replication policy using a
linked/non-linked source cluster bucket and a linked target bucket, the replication
policy fails during the "Trigger a OZONE replication job on one of the available OZONE
roles" step.
None
OPSAPS-68143: Ozone replication policy fails for empty source OBS bucket
An Ozone incremental replication policy for an OBS
bucket fails during the “Run File Listing on Peer cluster” step when the source bucket
is empty.
None
OPSAPS-71592: Replication Manager does not read the default
value of “ozone_replication_core_site_safety_valve” during Ozone replication policy
run
7.13.1
7.13.1.100, 7.13.2
During the Ozone replication policy run, Replication
Manager does not read the value in the
ozone_replication_core_site_safety_valve advanced configuration
snippet if it is configured with the default value.
To mitigate this issue, you can use one of the following
methods:
Remove some or all the properties in
ozone_replication_core_site_safety_valve, and move them to
ozone-conf/ozone-site.xml_service_safety_valve.
Add a dummy property with no value in
ozone_replication_core_site_safety_valve. For example, add
<property><name>dummy_property</name><value></value></property>,
save the changes, and run the Ozone replication policy.
OPSAPS-71897: Finalize Upgrade command fails after upgrading
the cluster with CustomKerberos setup causing INTERNAL_ERROR with EC
writes.
The hive.compactor.initiator.on checkbox in the Cloudera Manager UI for Hive Metastore (HMS) does not reflect the actual configuration value in cloud deployments. The default value is false, which prevents the compactor from running.
To update the hive.compactor.initiator.on value:
In Cloudera Manager, go to Hive > Configuration.
Set hive.compactor.initiator.on to true in the "Hive Service Advanced Configuration Snippet (Safety Valve) for hive-site.xml".
Save the changes and restart the service.
Once applied, the compaction process runs as expected.
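The safety-valve entry described above, in XML form:

```xml
<property>
  <name>hive.compactor.initiator.on</name>
  <value>true</value>
</property>
```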
OPSAPS-70702: Ranger replication policies fail if the clusters
do not use AutoTLS
Ranger replication policies fail during the
Exporting services, policies and roles from Ranger remote
step.
Log in to the Ranger Admin host(s) on the source cluster.
Identify the Cloudera Manager agent PEM file using
the # cat /etc/cloudera-scm-agent/config.ini | grep -i
client_cert_file command. For example, the file might reside in
client_cert_file=/myTLSpath/cm_server-cert.pem
location.
Create the path for the new PEM file using the # mkdir -p
/var/lib/cloudera-scm-agent/agent-cert/ command.
Copy the client_cert_file from
config.ini as
cm-auto-global_cacerts.pem file using the # cp
/myTLSpath/cm_server-cert.pem
/var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_cacerts.pem
command.
Change the permissions to 644 using the
# chmod 644
/var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_cacerts.pem
command.
Resume the Ranger replication policy in Replication Manager.
OPSAPS-71424: The configuration sanity check step ignores the replication advanced configuration snippet values during the Ozone replication policy job run
7.13.1
7.13.1.100, 7.13.2.0
The OBS-to-OBS Ozone replication policy jobs fail if
the S3 property values for fs.s3a.endpoint,
fs.s3a.secret.key, and fs.s3a.access.key are empty
in Ozone Service Advanced Configuration Snippet (Safety Valve) for
ozone-conf/ozone-site.xml even though you defined the properties in
Ozone Replication Advanced Configuration Snippet (Safety Valve) for
core-site.xml.
Ensure that the S3 property values for fs.s3a.endpoint, fs.s3a.secret.key, and fs.s3a.access.key contain at least a dummy value in Ozone Service Advanced Configuration Snippet (Safety Valve) for ozone-conf/ozone-site.xml.
Additionally, ensure that you do not update the property values in Ozone Replication Advanced Configuration Snippet (Safety Valve) for core-site.xml for Ozone replication jobs. This is because the values in this advanced configuration snippet override the property values in core-site.xml, not in the ozone-site.xml file.
Different property values in Ozone Service Advanced Configuration Snippet (Safety Valve) for ozone-conf/ozone-site.xml and Ozone Replication Advanced Configuration Snippet (Safety Valve) for core-site.xml result in nondeterministic behavior where the replication job picks up either value during the job run, which leads to incorrect results or replication job failure.
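A sketch of the dummy entries for Ozone Service Advanced Configuration Snippet (Safety Valve) for ozone-conf/ozone-site.xml; the value dummy is a placeholder, so substitute your real endpoint and credentials where applicable:

```xml
<property><name>fs.s3a.endpoint</name><value>dummy</value></property>
<property><name>fs.s3a.access.key</name><value>dummy</value></property>
<property><name>fs.s3a.secret.key</name><value>dummy</value></property>
```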
OPSAPS-71403: Ozone replication policy creation wizard shows
"Listing Type" field in source Cloudera Private Cloud Base versions
lower than 7.1.9
When the source Cloudera Private Cloud Base cluster version is lower than 7.1.9 and the
Cloudera Manager version is 7.11.3, the Ozone replication policy
creation wizard shows Listing Type and its options. These
options are not available in Cloudera Private Cloud Base 7.1.8.x
versions.
OPSAPS-71659: Ranger replication policy fails because of
incorrect source to destination service name mapping
7.13.1
7.13.1.100, 7.13.2.0
Ranger replication policy fails because of incorrect
source to destination service name mapping format during the transform step.
If the service names are different in the source and
target, then you can perform the following steps to resolve the issue:
SSH to the host on which the Ranger Admin role is running.
Find the ranger-replication.sh file.
Create a backup copy of the file.
Locate substituteEnv
SOURCE_DESTINATION_RANGER_SERVICE_NAME_MAPPING
${RANGER_REPL_SERVICE_NAME_MAPPING} in the file.
Modify it to substituteEnv
SOURCE_DESTINATION_RANGER_SERVICE_NAME_MAPPING
"'${RANGER_REPL_SERVICE_NAME_MAPPING//\"}'"
Save the file.
Rerun the Ranger replication policy.
OPSAPS-69782: HBase COD-COD replication from 7.3.1 to 7.2.18
fails during the "create adhoc snapshot" step
7.13.1
7.13.1.100, 7.13.2.0
An HBase replication policy replicating from a 7.3.1 COD cluster to a 7.2.18 COD cluster that has "Perform Initial Snapshot" enabled fails during the snapshot creation step in Cloudera Replication Manager.
OPSAPS-71414: Permission denied for Ozone replication policy
jobs if the source and target bucket names are identical
The OBS-to-OBS Ozone replication policy job fails with
the com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden or
Permission denied error when the bucket names on the source and
target clusters are identical and the job uses S3 delegation tokens. Note that the
Ozone replication jobs use the delegation tokens when the S3 connector service is
enabled in the cluster.
You can use one of the following workarounds to mitigate
the issue:
Use different bucket names on the source and target clusters.
Set the fs.s3a.delegation.token.binding property to an empty
value in ozone_replication_core_site_safety_valve to disable the
delegation tokens for Ozone replication policy jobs.
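The second workaround, in the XML form used by ozone_replication_core_site_safety_valve (an empty value disables the delegation token binding):

```xml
<property>
  <name>fs.s3a.delegation.token.binding</name>
  <value></value>
</property>
```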
OPSAPS-71256: The “Create Ranger replication policy” action
shows 'TypeError' if no peer exists
7.13.1
7.13.1.100, 7.13.2.0
When you click target Cloudera Manager > Replication Manager > Replication Policies > Create Replication Policy > Ranger replication policy, the TypeError: Cannot read properties of undefined
error appears.
OPSAPS-71067: Wrong interval sent from the Replication Manager
UI after Ozone replication policy submit or edit process.
SMM does not show any metrics for Kafka or Kafka Connect when
multiple listeners are set in Kafka.
Workaround: SMM cannot identify multiple listeners and
still points to bootstrap server using the default broker port (9093 for
SASL_SSL). You need to override the bootstrap server URL by
performing the following steps:
In Cloudera Manager, go to SMM > Configuration > Streams Messaging Manager Rest Admin Server Advanced Configuration
Snippet (Safety Valve)
Override bootstrap server URL (hostname:port as set in the
listeners for broker) for
streams-messaging-manager.yaml.
Save your changes.
Restart SMM.
OPSAPS-69317: Kafka Connect Rolling Restart Check fails if SSL
Client authentication is required
The rolling restart action does not work in Kafka Connect when the ssl.client.auth option is set to required. The health check fails with a timeout, which blocks restarting the subsequent Kafka Connect instances.
You can set ssl.client.auth to
requested instead of required and initiate a
rolling restart again. Alternatively, you can perform the rolling restart manually by
restarting the Kafka Connect instances one-by-one and checking periodically whether
the service endpoint is available before starting the next one.
OPSAPS-70971: Schema Registry does not have permissions to use
Atlas after an upgrade
Following an upgrade, Schema Registry might not have the
required permissions in Ranger to access Atlas. As a result, Schema Registry's
integration with Atlas might not function in secure clusters where Ranger
authorization is enabled.
Access the Ranger Console (Ranger Admin web UI).
Click the cm_atlas resource-based service.
Add the schemaregistry user to the all - *
policies.
Click Manage Service > Edit Service.
Add the schemaregistry user to the
default.policy.users property.
OPSAPS-59597: SMM UI logs are not supported by Cloudera Manager
Cloudera Manager does not display a
Log Files menu for SMM UI role (and SMM UI logs cannot be
displayed in the Cloudera Manager UI) because the logging type used
by SMM UI is not supported by Cloudera Manager.
View the SMM UI logs on the host.
OPSAPS-72298: Impala metadata replication is mandatory and UDF
functions parameters are not mapped to the destination
Impala metadata replication is enabled by default, but the legacy Impala C/C++ UDFs (user-defined functions) are not replicated as expected during the Hive external table replication policy run.
Edit the location of the UDF functions after the
replication run is complete. To accomplish this task, you can edit the “path of the
UDF function” to map it to the new cluster address, or you can use a script.
OPSAPS-70713: Error appears when running Atlas replication
policy if source or target clusters use Dell EMC Isilon storage
You cannot create an Atlas replication policy between
clusters if one or both the clusters use Dell EMC Isilon storage.
None
OPSAPS-72468: Subsequent Ozone OBS-to-OBS replication policy runs do not skip already replicated files
7.13.1
7.13.1.100
The first Ozone replication policy run is a bootstrap
run. Sometimes, the subsequent runs might also be bootstrap jobs if the incremental
replication fails and the job runs fall back to bootstrap replication. In this
scenario, the bootstrap replication jobs might replicate the files that were already
replicated because the modification time is different for a file on the source and the
target cluster.
None
OPSAPS-72470: Hive ACID replication policies fail when target
cluster uses Dell EMC Isilon storage and supports JDK17
Hive ACID replication policies fail if the target
cluster is deployed with Dell EMC Isilon storage and also supports JDK17.
None
OPSAPS-73138, OPSAPS-72435: Ozone OBS-to-OBS replication
policies create directories in the target cluster even when no such directories exist
on the source cluster
Ozone OBS-to-OBS replication uses the Hadoop S3A connector to access data on the OBS buckets. Depending on the runtime version and settings in the clusters:
directory marker keys (associated with the parent directories) appear in the destination bucket even when they do not exist in the source bucket.
delete requests for non-existing keys are submitted to the destination storage, which results in `Key delete failed` messages in the Ozone Manager log.
The OBS buckets are flat namespaces with independent keys, and the character '/' has no special significance in the key names. In FSO buckets, by contrast, each bucket is a hierarchical namespace with filesystem-like semantics, where the '/'-separated components become the path in the hierarchy. The S3A connector provides filesystem-like semantics over object stores by mimicking directory behavior, that is, it creates and optionally deletes the "empty directory markers". These markers are created when the S3A connector creates an empty directory. Depending on the runtime (S3A connector) version and settings, these markers may or may not be deleted when a descendant path is created.
Empty directory marker creation is inherent to S3A
connector. Empty directory marker deletion behavior can be adjusted using the
fs.s3a.directory.marker.retention = keep
or delete key-value pair. For information about configuring the
key-value pair, see Controlling the S3A Directory Marker
Behavior.
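For example, to retain the directory markers, the key-value pair can be set as follows (keep and delete are the two documented values):

```xml
<property>
  <name>fs.s3a.directory.marker.retention</name>
  <value>keep</value>
</property>
```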
OPSAPS-73655: Cloud replication fails after the delegation
token is issued
HDFS and Hive external table replication policies from
an on-premises cluster to cloud fail when the following conditions are true:
You choose the Advanced Options > Delete Policy > Delete Permanently option during the replication policy creation process.
Incremental replication is in progress, that is, the source paths of the replication are snapshottable directories and the bootstrap replication run is complete.
None
OPSAPS-75090: Ozone replication policies fail without source
proxy user
An Ozone replication policy with an empty Run on Peer as Username field (the default value for this field is empty) fails with the java.io.IOException: Error acquiring writer for listing file "ofs://<service id>/user/om/.cm/distcp-staging/<timestamp>/fileList.seq": bucket name 'om' is too short, valid length is 3-63 characters error message.
If you do not have a source proxy user name to specify
in the Run on Peer as Username field, you can enter
om as the default user for the replication on the source
cluster.
OPSAPS-75994: Intermittent HBase replication failure because
of missing result file
7.13.2
HBase replication policies fail intermittently during the
Check if source tables exist step with the
java.lang.IllegalArgumentException: argument "src" is null error
message.
Delete and recreate the failed HBase replication
policy.
OPSAPS-72125: The arguments field size exceeds the limit for
Hive external table replications
7.13.2
When replicating a large number of Hive external tables using
table filters to target clusters that use PostgreSQL for the Cloudera Manager
database, the arguments field of the Hive Data Replication command might exceed the
column limit. By default, the arguments column limit in the COMMANDS table is
1,048,676 characters. If the command exceeds the limit, Cloudera Manager cannot
persist the command to the database.
Perform the following steps to mitigate this issue:
When using table filters, split the policy into multiple chunks so that the Hive
Data Replication command created by the chunked policies can be persisted.
Increase the arguments column size of the COMMANDS table in the target Cloudera
Manager database using the ALTER TABLE COMMANDS ALTER COLUMN arguments
TYPE character varying(10485760); command. The maximum varchar column
size is 10,485,760.
OPSAPS-73362: Temporary Ozone snapshots are not deleted
automatically
7.13.2
Temporary snapshots used by Ozone incremental data replication
for checking the target side changes are not deleted automatically in some error
modes.
Currently, the temporary snapshots are generated and reside in the cm-tmp-[***RANDOM_UUID***] target bucket. These snapshots are deleted immediately after a snapshot-diff
calculation. You can delete the snapshots manually only when no replication policy
involving this bucket is actively running.
OPSAPS-73254, OPSAPS-73252: Editing a replication policy can
set the user name to an empty string
7.13.2
On an insecure (non-Kerberos) cluster, creating or editing a replication policy with an empty Run as Username or Run on Peer as Username field might cause the replication jobs to fail.
Use the Cloudera Manager API to update the fields to
contain a null value instead of an empty string.
DMX-4681: Iceberg replication synchronization step fails for
the database created at a custom location without an Ozone key
7.13.2
The synchronization step of the Iceberg replication command
fails during bootstrap replication if you created the database in an Ozone bucket
without providing the Ozone key name. The policy fails even if you have configured the
Location Mapping field to map the correct Ozone buckets.
For existing databases or tables that you created without keys, enter the
location mapping of the source and target om service IDs in
the Location Mapping field in the Iceberg replication
policy. For example, ofs://srcomid, ofs://tgtomid.
For new databases and tables, ensure that you provide a key when you create the database. For example, CREATE DATABASE db1 LOCATION 'ofs://omid/volume1/bucket1/db1.db'; and CREATE EXTERNAL TABLE tb1 (id int) STORED BY ICEBERG LOCATION 'ofs://omid/volume1/bucket1/tb1';.
OPSAPS-76854: Cannot edit existing Iceberg replication
policies after upgrade
7.13.2
You cannot edit the existing Iceberg replication policies in
Replication Manager UI after you upgrade from Cloudera Manager 7.11.3 or 7.13.1 to
7.13.2.0.
You can use the Cloudera Manager API to view the policy
details. To edit the replication policy, use the Cloudera Manager API, or delete and
recreate the policy.
CDPD-63922, CDPD-95711: Atlas replication policies fail when
the number of databases and tables exceeds 100,000
7.13.2
When a composite replication policy targets more than 100,000
entities, for example, 100 databases containing 1,000 tables each, the following
issues occur:
An Iceberg replication policy with Atlas metadata migration – The
replication policy fails for both bootstrap and incremental jobs. However, the Cloudera Manager > Replication > Replication Policies page displays the replication policy Status as
Successful, and the Atlas UI does not show the
expected entities.
A Hive external replication policy with Atlas metadata migration – The
replication policy fails for 400 GB (40,000 entities), the Replication
Policies page displays the replication policy
Status as Failed, and the Atlas UI
becomes unresponsive.
OPSAPS-74398: Ozone and HDFS replication policies might fail
when you use different destination and source proxy users
7.13.2
HDFS on-premises to on-premises replication fails when the
following conditions are true:
You configure different Run As Username and
Run on Peer as Username during the replication policy
creation process.
The user configured in Run As Username does not
have the permission to access the source path on the source HDFS.
Ozone replication fails when the following conditions are
true:
You configured an FSO-to-FSO or OBS-to-OBS replication with
the Incremental with fallback to full file listing
or Incremental only replication type.
You configured different Run As Username and
Run on Peer as Username during the replication policy
creation process.
The user configured in Run As Username does not
have the permission to access the source bucket on the source Ozone.
On the source cluster, grant the user configured in
Run As Username the same permissions as the user configured in
Run on Peer as Username.
OPSAPS-75361: Multiple policies do not start
simultaneously
7.13.2
When multiple Atlas replication policies are scheduled to
start at the same time, some policies might fail to initiate. For example, if you
schedule seven Atlas replication policies to run simultaneously, only three
might start successfully. The remaining policies are not triggered, remain in a
None state, and do not recur, which results in incomplete
replication. The Replication Policies page displays
None for these policies.
Do not schedule multiple Atlas replication policies to
start at the same time. To avoid this issue, Replication Manager also enforces a
two-minute gap between the creation of consecutive Atlas replication
policies.
OPSAPS-76832, OPSAPS-70771: In-progress replication policy runs
must not allow you to download the performance reports
7.13.2
During a replication policy run, the A server error has
occurred. See Cloudera Manager server log for details error message appears
on the Replication Manager UI, and the Cloudera Manager log shows
java.lang.IllegalStateException: Command has no result data when you
click:
Performance Reports > Performance Summary or Performance Reports > Performance Full on the Replication Policies page.
Download CSV on the Replication History page to
download any report.
This occurs because the Replication Manager UI incorrectly shows the performance
report links as enabled and clickable. You can download the
reports only after the replication job run is complete.
None
OPSAPS-76099: Incremental Iceberg replication time exceeds the
bootstrap duration
7.13.2
Incremental Iceberg replication takes longer to
complete than bootstrap replication for Iceberg replication policies.
None
OPSAPS-75848: Composite Iceberg and Atlas replication takes
10x to 15x longer than standalone Atlas replication
7.13.2
Atlas replication takes up to 15x longer when run as part of a
composite Iceberg replication policy than as a standalone Atlas replication, though
the Iceberg data is replicated in the expected time.
None
OPSAPS-75853: The history entries display "Partial success"
for successful composite replication for Iceberg and Atlas
7.13.2
The Cloudera Manager > Replication > Replication History page for a composite Iceberg replication policy incorrectly displays
Partial success even when both the Atlas and Iceberg
replications were successful.