Cleaning up and backing up the Cloudera AI Workbench database manually
You can use the following script to manually back up and clean up the Cloudera AI Workbench database.
- Configure the Workbench kubeconfig by setting it as an environment variable for the script to access. Use the following command:
  export KUBECONFIG=<your kubeconfig>
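  To confirm that the kubeconfig is in effect before running the script, you can, for example, list the pods in the Workbench namespace (the script defaults to the mlx namespace when no Workbench name is entered):
  kubectl get pods -n mlx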
The script prompts you to specify a directory to store the backup file, define the retention time (entries older than this will be cleaned up), and optionally provide a Workbench name.
The result is a .dump file (a PostgreSQL custom-format archive) that serves as a snapshot of the Cloudera AI database.
#!/bin/bash
DB_NAME="sense"
# Ask for database configuration
read -p "Enter backup directory: " BACKUP_DIR
while true; do
  read -p "Enter retention time (e.g., '30 days', '2 months'): " RETENTION_TIME
  # Validate format: number + space + interval unit (singular or plural)
  if [[ "$RETENTION_TIME" =~ ^[1-9][0-9]*\ (day|days|month|months|year|years|hour|hours|minute|minutes|second|seconds)$ ]]; then
    break
  else
    echo "Invalid retention time format. Please use formats like '30 days', '2 months', or '1 year'."
  fi
done
read -p "If on premises, enter workbench name: " INPUT_NAMESPACE
K8S_NAMESPACE=${INPUT_NAMESPACE:-mlx}
DB_POD="db-0"
# Table, timestamp column, and optional extra condition
TABLES=(
  "dashboards:created_at:finished_at IS NOT NULL"
  "dashboard_pods:created_at:status != 'running'"
  "model_deployments:created_at:stopped_at IS NOT NULL"
  "user_events:created_at:"
)
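# Example: with a retention time of '30 days', the "dashboards" entry above expands to
#   DELETE FROM dashboards WHERE created_at < NOW() - INTERVAL '30 days' AND finished_at IS NOT NULL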
# Ensure backup directory exists
if ! mkdir -p "$BACKUP_DIR"; then
  echo "Error: Failed to create or access backup directory '$BACKUP_DIR'. Check your permissions." >&2
  exit 1
fi
echo "Backup directory verified: $BACKUP_DIR"
# Create a timestamped backup file
BACKUP_FILE="$BACKUP_DIR/${DB_NAME}_backup_$(date +"%Y%m%d%H%M%S").dump"
echo "Backup file path: $BACKUP_FILE"
# Create the dump file inside the database pod
kubectl exec -n "$K8S_NAMESPACE" -c db "$DB_POD" -- pg_dump -d "$DB_NAME" -F c -f /tmp/db_backup.dump
# Copy the dump file from the pod to the local backup directory
kubectl cp -c db "$K8S_NAMESPACE/$DB_POD:/tmp/db_backup.dump" "$BACKUP_FILE"
if [ $? -eq 0 ]; then
  echo "Backup created successfully: $BACKUP_FILE"
else
  echo "Backup failed!" >&2
  exit 1
fi
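# Note: the backup is a PostgreSQL custom-format archive (pg_dump -F c).
# It can be inspected or restored with pg_restore, for example: pg_restore --list "$BACKUP_FILE"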
# Prompt the user for confirmation before cleanup
read -p "Are you sure you want to delete old entries from the database? (yes/no): " CONFIRM
if [[ "$CONFIRM" != "yes" ]]; then
echo "Cleanup aborted by user."
exit 0
fi
# Clean up old entries from multiple tables
for ENTRY in "${TABLES[@]}"; do
  IFS=":" read -r TABLE TIMESTAMP_COLUMN EXTRA_CONDITION <<< "$ENTRY"
  # Build base DELETE query
  DELETE_QUERY="DELETE FROM $TABLE WHERE $TIMESTAMP_COLUMN < NOW() - INTERVAL '$RETENTION_TIME'"
  # Append extra condition if present
  if [ -n "$EXTRA_CONDITION" ]; then
    DELETE_QUERY+=" AND $EXTRA_CONDITION"
  fi
  echo "Executing cleanup query on $TABLE: $DELETE_QUERY"
  # Execute the cleanup query using kubectl exec
  kubectl exec -n "$K8S_NAMESPACE" -c db "$DB_POD" -- psql -d "$DB_NAME" -c "$DELETE_QUERY"
  if [ $? -eq 0 ]; then
    echo "Old entries deleted successfully from $TABLE."
  else
    echo "Error: Failed to delete old entries from $TABLE. Cleanup process interrupted; not all tables were processed." >&2
    exit 1
  fi
done
echo "Backup and cleanup completed successfully."