Troubleshooting Data Lake restore operations

Possible issues with Data Lake restore and suggested resolutions.

Principal services running during restore

The most likely cause of a failed restore is a service that is in a state incompatible with the restore. Principal services (see "Backup and Restore for the Data Lake">"Principal services") must be stopped before you run the restore. Dependent services (see "Backup and Restore for the Data Lake">"Dependent services") must be running so that the restore can recreate their data. The restore checks the status of the principal services; however, if a dependent service is stopped and cannot be accessed during the restore, the restore operation fails.
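The state requirements above can be expressed as a simple pre-flight check. The sketch below is hypothetical and is not part of any Data Lake tooling. It assumes you have already collected each service's state into a dictionary, and it assumes the state strings "STARTED" and "STOPPED" for illustration. The service names in the usage line are placeholders, not real service names.

```python
from typing import Dict, Iterable, List


def preflight_problems(
    states: Dict[str, str],
    principal: Iterable[str],
    dependent: Iterable[str],
) -> List[str]:
    """Return the problems that would make a restore fail.

    `states` maps a service name to its state. The values "STARTED" and
    "STOPPED" are assumed for this sketch only. A service whose state is
    unknown (missing from `states`) is reported as a problem.
    """
    problems: List[str] = []
    # Principal services must be stopped before the restore runs.
    for name in principal:
        if states.get(name) != "STOPPED":
            problems.append(f"principal service {name} must be stopped")
    # Dependent services must be running so the restore can recreate their data.
    for name in dependent:
        if states.get(name) != "STARTED":
            problems.append(f"dependent service {name} must be running")
    return problems


if __name__ == "__main__":
    # Placeholder names; substitute the services listed in
    # "Backup and Restore for the Data Lake".
    states = {"principal-svc": "STARTED", "dependent-svc": "STOPPED"}
    for problem in preflight_problems(states, ["principal-svc"], ["dependent-svc"]):
        print(problem)
```

Treating an unknown state as a problem errs on the side of caution: it is cheaper to re-check a service than to recover from a failed restore.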

"failureReason": "[Datalake database restore failed.]"

If the principal services (see "Backup and Restore for the Data Lake">"Principal services") are running on the Data Lake during a restore operation, the restore fails with an error similar to the following:
{
    "accountId": "9d74eee4-1cad-45d7-b645-7ccf9edbb73d",
    "restoreId": "7c5c92c7-e3d3-408c-b18f-03bcfe0c9369",
    "backupId": "003b9882-e2fa-4fcc-ae8f-528de176c668",
    "userCrn": "crn:altus:iam:us-west-1:9d74eee4-1cad-45d7-b645-7ccf9edbb73d:user:8d1e890c-8a2e-4cf9-96bd-d50478bc027e",
    "internalState": "{ATLAS_ENTITY_AUDIT_EVENTS_TABLE=SUCCESSFUL, EDGE_INDEX_COLLECTION=SUCCESSFUL, DATABASE=FAILED, FULLTEXT_INDEX_COLLECTION=SUCCESSFUL, EDGE_INDEX_COLLECTION_DELETE=SUCCESSFUL, VERTEX_INDEX_COLLECITON_DELETE=SUCCESSFUL, RANGER_AUDITS_COLLECTION_DELETE=SUCCESSFUL, RANGER_AUDITS_COLLECTION=SUCCESSFUL, ATLAS_JANUS_TABLE=SUCCESSFUL, VERTEX_INDEX_COLLECITON=SUCCESSFUL, FULLTEXT_INDEX_COLLECTION_DELETE=SUCCESSFUL}",
    "status": "FAILED",
    "startTime": "2020-08-28 18:27:54.11",
    "endTime": "2020-08-28 18:29:55.507",
    "backupLocation": "s3a://eng-sdx-daily-datalake/biglauer-br-1/backup_01/",
    "failureReason": "[Datalake database restore failed.]"
}
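In the status output above, the "internalState" field records the result of each restore step. Here, DATABASE is the only step marked FAILED. The hypothetical helper below pulls the failed steps out of such a status document. It assumes that "internalState" is always a brace-enclosed, comma-separated list of KEY=VALUE pairs. This format is inferred from the single sample above and is not a documented contract. The parser would break if a value ever contained a comma.

```python
import json
from typing import Dict, List


def parse_internal_state(raw: str) -> Dict[str, str]:
    """Parse a string such as "{A=SUCCESSFUL, B=FAILED}" into a dict."""
    body = raw.strip().strip("{}")
    result: Dict[str, str] = {}
    for item in body.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty input such as "{}"
        key, _, value = item.partition("=")
        result[key.strip()] = value.strip()
    return result


def failed_steps(status_json: str) -> List[str]:
    """Return the sorted names of restore steps whose value is FAILED."""
    status = json.loads(status_json)
    steps = parse_internal_state(status.get("internalState", "{}"))
    return sorted(name for name, state in steps.items() if state == "FAILED")


if __name__ == "__main__":
    # Abbreviated version of the status output shown above.
    sample = (
        '{"status": "FAILED", '
        '"internalState": "{DATABASE=FAILED, ATLAS_JANUS_TABLE=SUCCESSFUL}", '
        '"failureReason": "[Datalake database restore failed.]"}'
    )
    print(failed_steps(sample))  # ['DATABASE']
```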

To resolve this error, stop the principal services (see "Backup and Restore for the Data Lake">"Principal services") and re-run the restore-datalake operation.

Failed restore renders Data Lake inoperable

If the restore operation fails, the Data Lake is rendered inoperable. To restore functionality, re-run the restore-datalake operation and let it complete successfully.