Minimal setup for cloud storage

This minimal secure setup uses one ADLS Gen2 storage account containing multiple containers, and multiple managed identities, each of which has at least one role assigned.

ADLS Gen2 storage account

You should create one ADLS Gen2 storage account with two containers in it: one for the Storage Location Base and one for the Logs Location Base. Additionally, you can create a container for the Backup Location Base to store FreeIPA backup data separately from logs:

  • One ADLS Gen2 container is required for use as the Storage Location Base, such as abfs://storagefs@mydatalake.dfs.core.windows.net, where mydatalake is your storage account name and storagefs is your container name. The Storage Location Base is used for storing workload data and Ranger audits.
  • One ADLS Gen2 container is required for use as the Logs Location Base, such as abfs://logsfs@mydatalake.dfs.core.windows.net, where mydatalake is your storage account name and logsfs is your container name. The Logs Location Base is used for Data Lake, FreeIPA, and Data Hub logs. It is also used for FreeIPA backups if no separate Backup Location Base is provided.
  • (Optional) One ADLS Gen2 container for use as the Backup Location Base, such as abfs://backupfs@mydatalake.dfs.core.windows.net, where mydatalake is your storage account name and backupfs is your container name. The Backup Location Base is used for FreeIPA backups. If a separate container is not provided, the backups are stored in the Logs Location Base.
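
The location format described above can be sketched in a few lines of Python. This is only an illustration of how the URIs are composed: the helper name abfs_uri is hypothetical (it is not part of CDP or any Azure SDK), and the account and container names are the example values used in this section.

```python
def abfs_uri(container: str, account: str, path: str = "") -> str:
    """Compose an abfs:// location for a container in an ADLS Gen2 account.

    Hypothetical helper for illustration only; it does not call Azure.
    """
    base = f"abfs://{container}@{account}.dfs.core.windows.net"
    path = path.strip("/")
    return f"{base}/{path}" if path else base


# Example names taken from this section.
ACCOUNT = "mydatalake"
storage_base = abfs_uri("storagefs", ACCOUNT)  # Storage Location Base
logs_base = abfs_uri("logsfs", ACCOUNT)        # Logs Location Base
backup_base = abfs_uri("backupfs", ACCOUNT)    # optional Backup Location Base

for name, uri in [("storage", storage_base), ("logs", logs_base), ("backup", backup_base)]:
    print(f"{name}: {uri}")
```

Keeping the account name in one place makes it harder to point the three location bases at different storage accounts by accident.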

Storage Location Base examples

  • Storage Location Base: abfs://storagefs@mydatalake.dfs.core.windows.net
  • Ranger Audit Logs: abfs://storagefs@mydatalake.dfs.core.windows.net/ranger/audit

Logs Location Base examples

  • Logs Location Base: abfs://logsfs@mydatalake.dfs.core.windows.net
  • FreeIPA Logs: abfs://logsfs@mydatalake.dfs.core.windows.net/cluster-logs/freeipa

If your environment was created prior to February 2021, the FreeIPA logs location is abfs://logsfs@mydatalake.dfs.core.windows.net/freeipa instead.

Backup Location Base examples

If you specify a separate container for FreeIPA backups, the backups are written to that container:

  • Backup Location Base: abfs://backupfs@mydatalake.dfs.core.windows.net
  • FreeIPA Backup: abfs://backupfs@mydatalake.dfs.core.windows.net/cluster-backups/freeipa

If the separate container is not provided, the FreeIPA backups are written to the Logs Location Base. In both cases, the same cluster-backups/freeipa directory structure is created within the container.
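
The fallback rule for FreeIPA backups can be expressed as a small Python sketch. The function name freeipa_backup_location is hypothetical and only mirrors the behavior described in this section. It does not reflect how CDP implements the rule internally.

```python
from typing import Optional

# Directory structure created within the container, as described above.
BACKUP_SUBDIR = "cluster-backups/freeipa"


def freeipa_backup_location(logs_base: str, backup_base: Optional[str] = None) -> str:
    """Return where FreeIPA backups are written.

    Uses the Backup Location Base when one is provided and falls back to
    the Logs Location Base otherwise. Illustrative only.
    """
    base = backup_base if backup_base else logs_base
    return f"{base.rstrip('/')}/{BACKUP_SUBDIR}"


logs = "abfs://logsfs@mydatalake.dfs.core.windows.net"
backup = "abfs://backupfs@mydatalake.dfs.core.windows.net"

print(freeipa_backup_location(logs, backup))  # separate backup container
print(freeipa_backup_location(logs))          # no backup container provided
```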

Managed identities

You should create four managed identities.

The IDBroker component of CDP uses user-assigned managed identities to control access to ADLS Gen2. It also stores and manages the mappings between services or users and their corresponding managed identities. The following managed identities must be created:

Assumer identity

  • Description: During Data Lake cluster creation, CDP attaches this identity to the IDBroker VM. IDBroker then uses it to attach the other managed identities to the IDBroker VM. Once these identities are attached to the VM, IDBroker can acquire access tokens for them, which eliminates the need to store credentials in the application.
  • Steps: Create a managed identity and then 1) assign the Virtual Machine Contributor and Managed Identity Operator roles to this managed identity on the scope of the subscription, and 2) assign the Storage Blob Data Contributor role to this managed identity on the scope of the Logs Location Base container created for CDP.

Data Lake Admin identity

  • Description: This managed identity is used by CDP services to access data.
  • Steps: Create a managed identity and then assign the Storage Blob Data Owner role to this managed identity on the scope of the two containers (Storage Location Base and Logs Location Base) created for CDP.

Ranger Audit identity

  • Description: This managed identity is used by Ranger to write audits.
  • Steps: Create a managed identity and then assign the Storage Blob Data Contributor role to this managed identity on the scope of the Storage Location Base container created for CDP.

Logger identity

  • Description: This managed identity is used by CDP to write telemetry logs.
  • Steps: Create a managed identity and then assign the Storage Blob Data Contributor role to this managed identity on the scope of the Logs Location Base container and, if created, the Backup Location Base container.
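
To make the role requirements easier to review, they can be written down as data. The following Python sketch lists the role assignments described above. The names RoleAssignment, container_scope, and minimal_role_plan, the identity labels, and the subscription and resource group values are all hypothetical placeholders. The container scope string follows the standard Azure resource ID layout for a blob container. The sketch only builds a plan and does not call Azure.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class RoleAssignment:
    identity: str  # hypothetical label for the managed identity
    role: str      # Azure built-in role name
    scope: str     # Azure resource ID the role is assigned on


def container_scope(subscription_id: str, resource_group: str,
                    account: str, container: str) -> str:
    """Azure resource ID of a container in a storage account."""
    return (f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
            f"/providers/Microsoft.Storage/storageAccounts/{account}"
            f"/blobServices/default/containers/{container}")


def minimal_role_plan(subscription_id: str, resource_group: str, account: str,
                      storage_container: str, logs_container: str,
                      backup_container: Optional[str] = None) -> List[RoleAssignment]:
    """List the role assignments described in this section (illustrative)."""
    sub = f"/subscriptions/{subscription_id}"
    storage = container_scope(subscription_id, resource_group, account, storage_container)
    logs = container_scope(subscription_id, resource_group, account, logs_container)
    plan = [
        RoleAssignment("assumer", "Virtual Machine Contributor", sub),
        RoleAssignment("assumer", "Managed Identity Operator", sub),
        RoleAssignment("assumer", "Storage Blob Data Contributor", logs),
        RoleAssignment("data-lake-admin", "Storage Blob Data Owner", storage),
        RoleAssignment("data-lake-admin", "Storage Blob Data Owner", logs),
        RoleAssignment("ranger-audit", "Storage Blob Data Contributor", storage),
        RoleAssignment("logger", "Storage Blob Data Contributor", logs),
    ]
    if backup_container:
        # The Logger identity also needs access to the optional backup container.
        backup = container_scope(subscription_id, resource_group, account, backup_container)
        plan.append(RoleAssignment("logger", "Storage Blob Data Contributor", backup))
    return plan


# Placeholder subscription and resource group; container names from this section.
plan = minimal_role_plan("<subscription-id>", "<resource-group>", "mydatalake",
                         "storagefs", "logsfs", backup_container="backupfs")
for item in plan:
    print(f"{item.identity}: {item.role} on {item.scope}")
```

Each entry corresponds to one role assignment that you would then create with your preferred tool, for example the Azure portal or the Azure CLI.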

The following diagram illustrates the required setup:

The following documentation provides detailed steps on how to create this setup: