Onboarding CDP users and groups for cloud storage

The minimal setup defined earlier spins up a CDP environment and Data Lake with no end-user access to cloud storage. Adding users and groups to a CDP environment requires mapping them to managed identities that grant them access to cloud storage.

In general, to onboard new users or groups, you need the following pre-created in Azure:

  • First, you need to create two more containers within the storage account (my-datalake) created earlier, one for data engineers (for example, data-eng) and one for data scientists (for example, data-science).
  • Next, you need to create two more managed identities, one for data engineers (for example, data-eng-mi) and one for data scientists (for example, data-science-mi), and assign each of them the Storage Blob Data Owner role on the scope of its corresponding newly created container. The data-eng-mi identity needs the Storage Blob Data Owner role on the scope of the data-eng container, and the data-science-mi identity needs the Storage Blob Data Owner role on the scope of the data-science container.
  • Finally, you need to grant the Data Lake Admin identity created earlier the Storage Blob Data Owner role on the scope of both newly created containers.
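The role assignments listed above can be sketched as data before they are applied. The following is a minimal sketch, not a Cloudera-provided tool: it builds the Azure resource ID of each container (used as the role-assignment scope) and emits one `az role assignment create` command per assignment. The subscription ID, resource group, Data Lake Admin identity name (`datalake-admin-mi`), and `<principal-id-of-...>` placeholders are hypothetical. The storage account is written as `mydatalake` because Azure storage account names allow only lowercase letters and digits.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical placeholders: replace with the values from your own Azure setup.
SUBSCRIPTION_ID = "<subscription-id>"
RESOURCE_GROUP = "<resource-group>"
STORAGE_ACCOUNT = "mydatalake"
ROLE = "Storage Blob Data Owner"

# Each new container maps to the managed identity dedicated to that persona.
PERSONAS = {
    "data-eng": "data-eng-mi",
    "data-science": "data-science-mi",
}
# Hypothetical name for the Data Lake Admin identity created in the minimal setup.
ADMIN_IDENTITY = "datalake-admin-mi"


@dataclass(frozen=True)
class RoleAssignment:
    identity: str
    role: str
    scope: str


def container_scope(container: str) -> str:
    """Return the Azure resource ID of a blob container, used as the role scope."""
    return (
        f"/subscriptions/{SUBSCRIPTION_ID}/resourceGroups/{RESOURCE_GROUP}"
        f"/providers/Microsoft.Storage/storageAccounts/{STORAGE_ACCOUNT}"
        f"/blobServices/default/containers/{container}"
    )


def planned_assignments() -> list[RoleAssignment]:
    """List every role assignment needed to onboard the two personas."""
    plan = []
    for container, identity in PERSONAS.items():
        scope = container_scope(container)
        # Each persona identity is Storage Blob Data Owner on its own container only.
        plan.append(RoleAssignment(identity, ROLE, scope))
        # The Data Lake Admin identity is also Storage Blob Data Owner on each new container.
        plan.append(RoleAssignment(ADMIN_IDENTITY, ROLE, scope))
    return plan


def to_az_command(a: RoleAssignment) -> str:
    """Render an assignment as an Azure CLI command (principal ID left as a placeholder)."""
    return (
        f'az role assignment create --role "{a.role}" '
        f"--assignee-object-id <principal-id-of-{a.identity}> "
        f"--assignee-principal-type ServicePrincipal "
        f"--scope {a.scope}"
    )


if __name__ == "__main__":
    for assignment in planned_assignments():
        print(to_az_command(assignment))
```

Keeping the plan as plain data makes it easy to check that no persona identity is granted access to the other persona's container before any command is run. The principal ID of each managed identity can be looked up with `az identity show --name <identity> --resource-group <resource-group> --query principalId -o tsv`.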
The end result, which builds on the minimal setup presented earlier, is the following: