Known Issues in CDP Private Cloud Data Services 1.5.4-SP1

The following are the new known issues in the 1.5.4 SP1 release of CDP Private Cloud Data Services.

OBS-6044 - Warning alert in the ECS Health Test status when a cluster is restarted for stability execution
The following warning alert is shown in the ECS Health Test status when a cluster is restarted in Cloudera Manager for stability execution:
Prometheus has issues compacting blocks

This issue occurs when the Prometheus write-ahead log (WAL) is corrupted. To work around the issue, perform the following steps:

  1. Run the following command to access the Prometheus container's shell:
    kubectl exec -i -t -n <prometheus server namespace> <prometheus server pod name> -c <prometheus server container name> -- sh -c "(bash || ash || sh)"
  2. Change the current working directory to the WAL directory of Prometheus.
    1. For Infrastructure Prometheus: The WAL directory location is /prometheus/wal. For example:

      cd /prometheus/wal

    2. For control plane/environment: The WAL directory location is /data/wal. For example:

      cd /data/wal

  3. Note the corrupted segment from the Prometheus pod logs. Example log entry:
    21T09:00:07.036Z caller=db.go:1074 level=error component=tsdb msg="compaction failed" err="WAL truncation in Compact: create checkpoint: read segments: corruption in segment /prometheus/wal/00000026 at 10978: unexpected full record"
  4. Skip the corrupted segment by moving the checkpoint past it. This requires renaming the checkpoint folder in the WAL directory. For example, if the corrupted segment is 00000026 and the current checkpoint folder name is checkpoint.00000020, rename the checkpoint folder to checkpoint.00000027:

    mv checkpoint.00000020 checkpoint.00000027
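
The segment arithmetic in steps 3 and 4 can be scripted. The following is a minimal sketch, not part of the documented procedure. The function names corrupted_segment and next_checkpoint_name are hypothetical. The sketch assumes that the log line has the same shape as the example log entry above, and that the new checkpoint number is the corrupted segment number plus one, as in the example.

```shell
# Hypothetical helpers; names and parsing rules are assumptions based on the example log entry.

corrupted_segment() {
  # Print the segment number from a log line containing
  # "corruption in segment <dir>/wal/<number> at ...".
  printf '%s\n' "$1" |
    sed -n 's#.*corruption in segment [^ ]*/wal/\([0-9][0-9]*\) at .*#\1#p'
}

next_checkpoint_name() {
  # Strip leading zeros so the shell does not treat the number as octal.
  seg=$(printf '%s' "$1" | sed 's/^0*//')
  seg=${seg:-0}
  # Checkpoint folder names use an 8-digit, zero-padded number.
  printf 'checkpoint.%08d\n' "$((seg + 1))"
}
```

For example, next_checkpoint_name 00000026 prints checkpoint.00000027, which is the target name used in the mv command in step 4.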

OPSX-5810 - Private Cloud Control Plane installation fails at the vault initialization phase due to longhorn-manager pods
At times, longhorn-manager pods fail to come up and repeatedly log error messages such as:
level=error msg="Failed to save TLS secret for longhorn-system/longhorn-webhook-tls: Operation cannot be fulfilled on secrets \"longhorn-webhook-tls\": the object has been modified; please apply your changes to the latest version and try again"

This causes the Longhorn nodes to remain in a NotReady state, which prevents volumes from being created or attached.

The following steps can be taken on an ECS Server node to fix the issue:

  1. Stop the Longhorn Manager daemonset by executing the following command:
    kubectl -n longhorn-system patch daemonset longhorn-manager -p '{"spec": {"template": {"spec": {"nodeSelector": {"non-existing": "true"}}}}}'
  2. Delete the Longhorn Webhook TLS secret by executing the following command:
    kubectl delete secret longhorn-webhook-tls -n longhorn-system
  3. Start the Longhorn Manager daemonset by executing the following command:
    kubectl -n longhorn-system patch daemonset longhorn-manager --type json -p='[{"op": "remove", "path": "/spec/template/spec/nodeSelector/non-existing"}]'
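
The three steps above can be chained so that a failure in one step stops the sequence. The following is a minimal sketch, assuming kubectl on the ECS Server node is already configured for the cluster. The function name reset_longhorn_webhook_secret is hypothetical, and the sketch does not wait for the longhorn-manager pods to terminate between steps. You can check their state with kubectl -n longhorn-system get pods.

```shell
# Sketch only: runs the three documented commands in order and
# returns non-zero as soon as one of them fails.
reset_longhorn_webhook_secret() {
  # Step 1: stop longhorn-manager by adding a node selector that matches no node.
  kubectl -n longhorn-system patch daemonset longhorn-manager \
    -p '{"spec": {"template": {"spec": {"nodeSelector": {"non-existing": "true"}}}}}' || return 1

  # Step 2: delete the webhook TLS secret that the pods fail to update.
  kubectl delete secret longhorn-webhook-tls -n longhorn-system || return 1

  # Step 3: remove the node selector again so that longhorn-manager pods are rescheduled.
  kubectl -n longhorn-system patch daemonset longhorn-manager --type json \
    -p='[{"op": "remove", "path": "/spec/template/spec/nodeSelector/non-existing"}]' || return 1
}
```

Call reset_longhorn_webhook_secret once, then confirm that the longhorn-manager pods come up and the Longhorn nodes leave the NotReady state.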

OPSX-5403 - Typecasting fails when truststore password is integer
The truststore_password in the SCM configuration must not be an integer for Private Cloud installation. An integer value causes typecasting to fail.

Update truststore_password in the SCM configuration to a non-integer value.
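
A candidate value can be checked before you update the configuration. The following is a minimal sketch, assuming that a value made up only of digits, optionally with a leading sign, is what gets treated as an integer. This document does not define the rule precisely, and the function name looks_like_integer is hypothetical.

```shell
# Assumption: "integer" means an optional leading + or - followed only by digits.
looks_like_integer() {
  # Drop one leading sign, then reject empty strings and anything with a non-digit.
  case ${1#[-+]} in
    ''|*[!0-9]*) return 1 ;;
    *) return 0 ;;
  esac
}
```

For example, looks_like_integer 123456 succeeds, so that value should be avoided, while looks_like_integer changeit123 fails, so that value is not purely numeric.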

OPSX-4684 - Start ECS command shows finished successfully even though start docker server failed on one of the hosts
The Docker service starts even though one or more Docker roles failed to start because the corresponding host is unhealthy.

Make sure the host is healthy, then start the Docker role on that host.

OPSX-4391 - External docker cert not base64 encoded

When using Private Cloud Data Services on ECS, in some rare situations, the CA certificate for the Docker registry in the cdp namespace is incorrectly encoded, resulting in TLS errors when connecting to the Docker registry.

Compare the contents of the "cdp-private-installer-docker-cert" secret in the cdp namespace with the contents of the same secret in other namespaces, and edit the secret in the cdp namespace so that it matches. The secrets and their corresponding namespaces can be identified using the command "kubectl get secret -A | grep cdp-private-installer-docker-cert". Inspect each secret using the command "kubectl get secret -n cdp cdp-private-installer-docker-cert -o yaml", replacing "cdp" with the different namespace names. If necessary, modify the secret in the cdp namespace using the command "kubectl edit secret -n cdp cdp-private-installer-docker-cert".
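
The comparison can be made easier by printing a checksum of the secret in each namespace. The following is a minimal sketch, assuming kubectl is configured for the cluster and sha256sum is available on the node. The function name secret_checksums is hypothetical. The sketch hashes the whole data field of the secret because this document does not name the key that holds the certificate.

```shell
# Sketch only; secret_checksums is a hypothetical helper name.
SECRET_NAME=cdp-private-installer-docker-cert

secret_checksums() {
  # Take namespaces from the first column of "kubectl get secret -A",
  # keeping only rows whose NAME column is the installer Docker cert secret.
  kubectl get secret -A | awk -v name="$SECRET_NAME" '$2 == name { print $1 }' |
  while read -r ns; do
    # Hash the secret data so that a namespace with different contents stands out.
    sum=$(kubectl get secret -n "$ns" "$SECRET_NAME" -o jsonpath='{.data}' </dev/null |
      sha256sum | cut -d' ' -f1)
    printf '%s  %s\n' "$sum" "$ns"
  done
}
```

Running secret_checksums prints one line per namespace. If the checksum for the cdp namespace differs from the others, inspect and edit that secret as described above.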

OPSX-3323 - Custom Log redaction does not work for JSON files in diag bundles

Custom log redaction is not applied to the JSON files within the diag bundle.

No workaround available.

OPSX-2772 - For Account Administrator user, update roles functionality should be disabled
When a user with administrative privileges accesses the User Management > Update Roles page in the Management Console, the user is presented with options to select various roles. Selecting or deselecting these roles does not change this user's privileges -- an administrative user, by default, has all privileges, and those privileges cannot be changed.

No workaround available.