Troubleshooting rollbacks

Diagnose and resolve issues encountered before or during the Virtual Warehouse rollback.

Issue: Rollback action is not available

  • Cause: The system does not recognize the Virtual Warehouse as eligible for a rollback due to missing prerequisites or an incorrect failure state.
  • Resolution: Perform the following actions to resolve the issue:
    • Verify that the "Backup Virtual Warehouse namespaces before an upgrade" option was enabled in your server configuration before the upgrade was initiated.
    • Confirm that the failure was caused by an upgrade, not by a rebuild or another task.
    • If no pre-upgrade backup reference is found, either manually re-create the Virtual Warehouse, or resolve the error that caused the upgrade to fail and then retry the upgrade.
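
The checks above can be read as a simple checklist. The following Python sketch is illustrative only: the function and its three boolean parameters are hypothetical and are not part of any Cloudera API or CLI. You supply each answer yourself after inspecting your configuration and the failure details.

```python
def unmet_rollback_prerequisites(backup_option_enabled,
                                 failed_during_upgrade,
                                 backup_reference_found):
    """Return the prerequisites from the checklist that are not met.

    All three arguments are hypothetical booleans that you determine
    manually; nothing here queries the system.
    """
    unmet = []
    if not backup_option_enabled:
        unmet.append("pre-upgrade backup option was not enabled before the upgrade")
    if not failed_during_upgrade:
        unmet.append("failure was not caused by an upgrade")
    if not backup_reference_found:
        unmet.append("no pre-upgrade backup reference was found")
    return unmet


if __name__ == "__main__":
    # Example: the option was enabled, but the failure came from a rebuild.
    for reason in unmet_rollback_prerequisites(True, False, True):
        print("Rollback not available:", reason)
```

An empty list means that none of the listed prerequisites is missing. It does not guarantee that the Rollback action is offered.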

Issue: Virtual Warehouse remains in RollingBack status

  • Cause: A status that remains unchanged for more than 15 minutes typically indicates that the underlying orchestration workflow has become unresponsive or has failed to progress.
  • Resolution: Perform the following steps to resolve the issue:
    1. Check the logs of the cdp-release-dwx-worker pod for any errors.
    2. Resolve any identified errors and retry the rollback process.
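
As a minimal sketch of the 15-minute rule described above, the following Python function flags a rollback that appears stuck. It assumes that you record the observed status and the time it last changed yourself; the function name and its inputs are hypothetical and do not correspond to any product API.

```python
from datetime import datetime, timedelta, timezone

# Threshold from the text above: more than 15 minutes without a status change.
STUCK_THRESHOLD = timedelta(minutes=15)


def is_rollback_stuck(status, status_since, now=None):
    """Return True if the status has been RollingBack for longer than the threshold.

    status       -- the status string you observed (hypothetical input)
    status_since -- timezone-aware datetime of the last status change
    now          -- optional timezone-aware datetime, defaults to the current UTC time
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return status == "RollingBack" and (now - status_since) > STUCK_THRESHOLD


if __name__ == "__main__":
    started = datetime.now(timezone.utc) - timedelta(minutes=20)
    print(is_rollback_stuck("RollingBack", started))
```

If the function returns True, continue with the log check in the steps above.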

Issue: Rollback failed again

  • Cause: Persistent rollback failures typically indicate that the backup file is corrupted or that the Data Recovery Service (DRS) is malfunctioning.
  • Resolution: Perform the following steps to resolve the issue:
    1. Check the logs of the cdp-release-drs provider pod or the specific backup job for errors.
    2. If the backup is unusable or corrupted, manually rebuild the Virtual Warehouse.
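
When the pod or backup job logs are long, a small filter can help surface likely failure entries. The Python sketch below assumes that you have already saved the log output as text. The markers it searches for (error, fatal, failed, exception) are assumptions, not a documented log format, so adjust them to match what your logs actually contain.

```python
import re

# Hypothetical markers; the real log format is not specified in this document.
ERROR_PATTERN = re.compile(r"\b(?:error|fatal|failed)\b|exception", re.IGNORECASE)


def find_error_lines(log_text):
    """Return (line_number, line) pairs for lines that match ERROR_PATTERN."""
    return [
        (number, line.rstrip())
        for number, line in enumerate(log_text.splitlines(), start=1)
        if ERROR_PATTERN.search(line)
    ]


if __name__ == "__main__":
    sample = "INFO starting restore\nERROR restore job failed: checksum mismatch\n"
    for number, line in find_error_lines(sample):
        print(f"{number}: {line}")
```

The filter only narrows down where to look; read the surrounding log lines to understand the actual cause.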

Issue: Virtual Warehouse returns to the Running or Stopped state after the upgrade is started

  • Cause: The DRS backup job for the Virtual Warehouse failed.
  • Resolution: Perform the following steps to resolve the issue:
    1. Check the backup logs to find the reason for the failure, using one of the following methods:
      • Inspect the logs of the most recently triggered backup job in the cdp-drs namespace using your cluster management tool. Look for error entries that indicate a failure to back up system entities.
      • Retrieve the logs using the CDP CLI.
        1. Run the cdp dw list-backups command to list all recent environment backups.
        2. Find your target Virtual Warehouse in the output and copy the Cloudera Resource Name (CRN) of the latest backup named "Backup Hive before upgrade" or "Backup Impala before upgrade".
        3. Run the cdp dw get-logs --crn <backup-crn> command using your copied CRN to pull the detailed failure logs.
    2. Resolve the underlying backup issues identified in the logs and trigger the Virtual Warehouse upgrade again.
    3. If the issues cannot be resolved, disable the "Backup Virtual Warehouse namespaces before an upgrade" option and trigger the upgrade again. With the option disabled, the Rollback button is not available if the upgrade fails.
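
Picking the latest pre-upgrade backup from a long listing can be sketched in Python as follows. This is an illustration under stated assumptions: the field names name, crn, and createdAt are hypothetical, because the output format of cdp dw list-backups is not described here, and the timestamps are assumed to be ISO 8601 strings with an explicit UTC offset. Narrowing the entries to your target Virtual Warehouse is left out, because this document does not say how the output identifies it.

```python
from datetime import datetime

# Backup names taken from the CLI steps above.
PRE_UPGRADE_BACKUP_NAMES = {
    "Backup Hive before upgrade",
    "Backup Impala before upgrade",
}


def latest_pre_upgrade_backup_crn(backups):
    """Return the CRN of the newest pre-upgrade backup, or None if there is none.

    `backups` is a list of dicts with the hypothetical keys
    "name", "crn", and "createdAt" (ISO 8601 with UTC offset).
    """
    candidates = [b for b in backups if b.get("name") in PRE_UPGRADE_BACKUP_NAMES]
    if not candidates:
        return None
    latest = max(candidates, key=lambda b: datetime.fromisoformat(b["createdAt"]))
    return latest["crn"]


if __name__ == "__main__":
    example = [
        {"name": "Backup Hive before upgrade", "crn": "crn:example:1",
         "createdAt": "2024-05-01T10:00:00+00:00"},
        {"name": "Backup Hive before upgrade", "crn": "crn:example:2",
         "createdAt": "2024-05-03T09:30:00+00:00"},
    ]
    print(latest_pre_upgrade_backup_crn(example))
```

The returned value is what you would pass as <backup-crn> in the get-logs step above. The crn:example values are placeholders, not real CRNs.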