Managing Cloudera AI workloads during a Control Plane database outage

This topic outlines the impact of a Control Plane database outage on Cloudera AI workloads and provides guidance on managing deployments to minimize disruptions. Cloudera AI workloads remain unaffected, but the Control Plane and its associated services experience downtime.

Consider the following aspects to minimize disruptions:

Cloudera AI behaviour

The Cloudera AI Workbench relies on its own embedded PostgreSQL database for all its operations. As a result, workloads such as Models, Jobs, and Experiments continue to function without any impact while the Control Plane's external database is down.

Control Plane behaviour

The Control Plane uses its external PostgreSQL database, which is the database undergoing maintenance, to store metadata for the Cloudera AI Workbench.

During the downtime of the Control Plane database:
  • The Cloudera Control Plane will be unavailable.
  • The Cloudera AI UI and login functionality will be impacted.

Affected Cloudera AI-owned deployments

The following Cloudera AI-owned deployments are affected by Control Plane database maintenance or downtime:

  • dp-mlx-control-plane-app: Represents the Control Plane and UI.
  • dp-health-poller: Captures health-related metrics.
  • dp-cadence-worker: Manages Cadence-related workflows.
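
Before changing anything, it can help to note how many replicas each of these deployments currently runs. The following is a minimal sketch, assuming kubectl is already configured for the cluster; the function name show_cai_replicas is hypothetical and not part of the product.

```shell
# Hypothetical helper: print "<deployment> <replicas>" for each affected deployment.
# The only argument is the Control Plane namespace.
show_cai_replicas() {
  namespace="$1"
  for name in dp-mlx-control-plane-app dp-health-poller dp-cadence-worker; do
    # .spec.replicas is the desired replica count stored on the Deployment object.
    count="$(kubectl get deployment "$name" --namespace "$namespace" \
      -o jsonpath='{.spec.replicas}')" || return 1
    echo "$name $count"
  done
}
```

Run it as show_cai_replicas <control plane namespace> and keep the output for reference when scaling the deployments back up.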

Recommended actions

  1. Scale down the affected deployments before starting the Control Plane database maintenance.
    kubectl scale deployment/dp-mlx-control-plane-app --namespace <control plane namespace> --replicas 0
    kubectl scale deployment/dp-health-poller --namespace <control plane namespace> --replicas 0
    kubectl scale deployment/dp-cadence-worker --namespace <control plane namespace> --replicas 0
  2. Scale the deployments back up after the database maintenance is complete.
    kubectl scale deployment/dp-mlx-control-plane-app --namespace <control plane namespace> --replicas 1
    kubectl scale deployment/dp-health-poller --namespace <control plane namespace> --replicas 1
    kubectl scale deployment/dp-cadence-worker --namespace <control plane namespace> --replicas 1
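
The two steps above can also be scripted so that all three deployments are handled in one call. This is a sketch under stated assumptions, not an official procedure: the names CAI_DEPLOYMENTS, DRY_RUN, scale_cai_deployments, and wait_for_cai_deployments are illustrative, and the 300-second timeout is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Illustrative sketch; function and variable names are not part of the product.

CAI_DEPLOYMENTS="dp-mlx-control-plane-app dp-health-poller dp-cadence-worker"

# Scale every affected deployment to the given replica count.
# Arguments: <replicas> <control plane namespace>
# Set DRY_RUN=1 to print the commands instead of running them.
scale_cai_deployments() {
  replicas="$1"
  namespace="$2"
  for name in $CAI_DEPLOYMENTS; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "kubectl scale deployment/$name --namespace $namespace --replicas $replicas"
    else
      kubectl scale "deployment/$name" --namespace "$namespace" \
        --replicas "$replicas" || return 1
    fi
  done
}

# After scaling back up, block until each rollout completes
# or the (assumed) 300-second timeout expires.
wait_for_cai_deployments() {
  namespace="$1"
  for name in $CAI_DEPLOYMENTS; do
    kubectl rollout status "deployment/$name" --namespace "$namespace" \
      --timeout=300s || return 1
  done
}
```

For example, DRY_RUN=1 scale_cai_deployments 0 <control plane namespace> prints the three scale-down commands without running them. After maintenance, scale_cai_deployments 1 <control plane namespace> followed by wait_for_cai_deployments <control plane namespace> mirrors step 2 and waits for the pods to become ready.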