Troubleshooting Data Lake restore operations
Possible issues with Data Lake restore and suggested resolutions.
Principal services running during restore
The most likely errors in restoring data from backup is that a service is in a state that is incompatible with the restore. Principal services (see Principal services) must be stopped before running the restore. Dependent services (see Dependent services) must be running to allow the restore to recreate their data. The restore checks the status of the principal services; however, if one of the dependent services is stopped and cannot be accessed to perform the restore operation, the restore operation will fail.
"failureReason": "[Datalake database restore failed.]"
{
"accountId": "8y63idy3-2ygn-98h6-j630-7uie9renq93e",
"restoreId": "7c5c92c7-e3d3-408c-b18f-03bcfe0c9369",
"backupId": "003b9882-e2fa-4fcc-ae8f-528de176c668",
"userCrn": "crn:altus:iam:us-west-1:8y63idy3-2ygn-98h6-j630-7uie9renq93e:user:c87db52v-639m-613g-j94w-8hy944hn4i64",
"internalState": "{ATLAS_ENTITY_AUDIT_EVENTS_TABLE=SUCCESSFUL, EDGE_INDEX_COLLECTION=SUCCESSFUL, DATABASE=FAILED, FULLTEXT_INDEX_COLLECTION=SUCCESSFUL, EDGE_INDEX_COLLECTION_DELETE=SUCCESSFUL, VERTEX_INDEX_COLLECITON_DELETE=SUCCESSFUL, RANGER_AUDITS_COLLECTION_DELETE=SUCCESSFUL, RANGER_AUDITS_COLLECTION=SUCCESSFUL, ATLAS_JANUS_TABLE=SUCCESSFUL, VERTEX_INDEX_COLLECITON=SUCCESSFUL, FULLTEXT_INDEX_COLLECTION_DELETE=SUCCESSFUL}",
"status": "FAILED",
"startTime": "2020-08-28 18:27:54.11",
"endTime": "2020-08-28 18:29:55.507",
"backupLocation": "s3a://eng-sdx-daily-datalake/smith-br-1/backup_01/",
"failureReason": "[Datalake database restore failed.]"
}
To correct this scenario, stop the principal services and re-run the restore-datalake operation.
Failed restore renders Data Lake inoperable
If the restore operation fails, the Data Lake will be rendered inoperable. A restore-datalake operation must be re-run and complete successfully for the Data Lake to re-gain functionality