- OPSAPS-67152: Cloudera Manager does not allow you to update some
configuration parameters.
- Cloudera Manager does not allow you to set the dfs_access_time_precision and dfs_namenode_accesstime_precision configuration parameters to "0". If you try to enter "0" in these configuration input fields, the field is cleared and a validation error is displayed: This field is required.
- To fix this issue, perform the workaround steps described in the KB article. If you need guidance during this process, contact Cloudera Support.
- Cloudera bug: OPSAPS-59764: Memory leak in the
Cloudera Manager agent while downloading the parcels.
- Using the M2Crypto library in the Cloudera Manager agent to download parcels causes a memory leak.
The Cloudera Manager server requires parcels to install a cluster. If any of the parcel URLs are modified, the server provides the updated information to all the Cloudera Manager agent processes installed on each cluster host.
Each Cloudera Manager agent then regularly checks for updates by downloading the manifest file available under each of the URLs. However, if a URL is invalid or unreachable, the Cloudera Manager agent reports a 404 error message and the memory of the Cloudera Manager agent process keeps increasing because of a memory leak in the agent's file downloader code.
- To prevent this memory leak, ensure that all parcel URLs configured in Cloudera Manager are reachable. To achieve this, delete all unused and unreachable parcels from the Cloudera Manager Parcels page.
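To check reachability before removing anything, you can probe the manifest file that the agent downloads under each parcel repository URL. The following is only a sketch: the repository URLs are placeholders for whatever is configured on your Parcels page, and a 404 or timeout indicates a URL worth removing.
# Probe manifest.json under each configured parcel repository URL (placeholder URLs shown);
# HTTP 200 means the agent can reach the repository, 404 or 000 means it cannot.
for url in https://archive.cloudera.com/example/parcels \
           http://internal-repo.example.com/custom/parcels; do
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$url/manifest.json")
  echo "$url -> HTTP $code"
done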
- OPSAPS-65365: Do not use the $ character in the password for the custom Docker Repository for ECS installations.
- Ensure that the
$ character is not part of the Docker
Repository password.
- OPSX-2713: PVC ECS Installation:
Failed to perform First Run of services.
- If an issue is encountered during the Install Control Plane step of the ECS Cluster First Run, the installation is re-attempted indefinitely rather than the command failing.
- Since the control plane is installed
and uninstalled in a continuous cycle, it is often possible to
address the cause of the failure while the command is still running,
at which point the next attempted installation should succeed. If
this is not successful, abort the First Run command, delete the
Containerized Cluster, address the cause of the failure, and retry
from the beginning of the Add Cluster wizard. Any nodes that are
re-used must be cleaned before re-attempting installation.
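While the install loop is still cycling, standard kubectl checks run from the ECS Server node can help identify the failing component. The following is only a sketch of common diagnostics, not a documented procedure.
# List pods across all namespaces that are not Running or Completed
kubectl get pods -A | grep -Ev 'Running|Completed'
# Show the most recent cluster events to surface scheduling, image pull, or storage failures
kubectl get events -A --sort-by=.lastTimestamp | tail -n 20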
- OPSX-3359: ECS Upgrade
failure
- You may see the following error message during the Upgrade Cluster > Reapplying all settings > kubectl-patch step:
kubectl rollout status deployment/rke2-ingress-nginx-controller -n kube-system --timeout=5m
error: timed out waiting for the condition
- If you see this error, do the
following:
- Check whether all the Kubernetes nodes are ready for
scheduling. Run the following command from the ECS Server
node:
kubectl get nodes
You will see output similar to the following:
NAME STATUS ROLES AGE VERSION
<node1> Ready,SchedulingDisabled control-plane,etcd,master 103m v1.21.11+rke2r1
<node2> Ready <none> 101m v1.21.11+rke2r1
<node3> Ready <none> 101m v1.21.11+rke2r1
<node4> Ready <none> 101m v1.21.11+rke2r1
- Run the following command from the ECS Server node for the node showing a status of SchedulingDisabled:
kubectl uncordon <node1>
You will see output similar to the following:
node/<node1> uncordoned
- Scale down and scale up the
rke2-ingress-nginx-controller pod by
running the following command on the ECS Server
node:
kubectl delete pod rke2-ingress-nginx-controller-<pod number> -n kube-system
- Resume the upgrade.
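Optionally, before resuming you can confirm the ingress controller recovered by re-running the rollout command from the original error message; this verification is not part of the documented steps.
# The same command that originally timed out; it should now complete successfully
kubectl rollout status deployment/rke2-ingress-nginx-controller -n kube-system --timeout=5m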
- OPSX-3547: ECS upgrade takes 10+ hours to complete on a 25-node cluster
- In the worst case, the rolling restart performed during the upgrade takes around 24 minutes per node on a 25-node cluster.
- During the upgrade, if the stop operation on a single node takes longer than 25 minutes, or starting a node takes longer than 10 minutes, you can speed up the upgrade by configuring Cloudera Manager to reduce the default timeouts: decrease the values of the timeout parameters listed below. (To change a parameter in the Cloudera Manager Admin Console, go to the ECS service, click the Configuration tab, and search for the parameter. A sketch of making the same change through the Cloudera Manager API follows the list below.)
The stop operation on a single node has the following steps:
- Graceful drain of the node. This process has a default timeout of 10 minutes, controlled by the Cloudera Manager configuration parameter DRAIN_NODE_TIMEOUT.
- Non-graceful drain of the node. This process has a default timeout of 10 minutes, controlled by the Cloudera Manager configuration parameter DRAIN_NODE_TIMEOUT.
- Wait for the workloads to spawn on other nodes in the cluster. This process has a default timeout of 10 minutes, controlled by the Cloudera Manager configuration parameter WAIT_TIME_FOR_NODE_READINESS.
The start operation has the following steps:
- Uncordon the node. There is no timeout parameter for this step.
- Wait for the workloads to spawn on the node. This process has a default timeout of 10 minutes, controlled by the Cloudera Manager configuration parameter WAIT_TIME_FOR_NODE_READINESS.
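If you prefer to script the change, the Cloudera Manager REST API can update service-level configuration. The following is only a sketch: the host, credentials, API version, cluster and service names, the API-side property names, and the assumption that the values are in seconds are all placeholders to confirm against your own deployment; the Admin Console path described above is the documented method.
# Assumptions: CM host/port, admin credentials, API version v51, cluster and ECS service names,
# lowercase API property names, and timeout values expressed in seconds (300 = 5 minutes).
curl -u admin:changeme -X PUT -H 'Content-Type: application/json' \
  -d '{"items":[{"name":"drain_node_timeout","value":"300"},{"name":"wait_time_for_node_readiness","value":"300"}]}' \
  'http://cm-host.example.com:7180/api/v51/clusters/<cluster name>/services/<ECS service name>/config'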
- OPSX-3550: Incorrect status on the CDP Private Cloud Data Services services page in the Cloudera Manager Admin Console while ECS is upgrading
- The Cluster page might show that the Upgrade
failed while an upgrade is in progress.
- Please check the Upgrade Command for
the status of the upgrade. The Cluster page will reflect the new
version once the upgrade command is complete.
- OPSX-735: Kerberos service should
handle Cloudera Manager downtime
- The Cloudera Manager Server in the base
cluster must be running in order to generate Kerberos principals for
Private Cloud. If there is downtime, you may observe
Kerberos-related errors.
- Resolve downtime on Cloudera Manager.
If you encountered Kerberos errors, you can retry the operation
(such as retrying creation of the Virtual Warehouse).
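To confirm that the base cluster's Cloudera Manager Server is back up before retrying, a quick check on the Cloudera Manager Server host can help; this assumes a standard systemd-managed installation and the default non-TLS Admin Console port.
# On the Cloudera Manager Server host in the base cluster: verify the server process is active
systemctl status cloudera-scm-server
# Optionally confirm the Admin Console is answering (7180 is the default non-TLS port)
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:7180/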