Cluster planning checklist
When creating a production cluster, refer to this checklist in order to ensure that you’ve performed all prerequisite steps required for cluster deployment.
The following table includes a list of available cluster options and prerequisites required for using each option. Depending on where you are at, you can use this checklist as a starting point for planning your cluster and understanding cloud provider requirements related to specific options.
Option | Description |
---|---|
General Settings | |
Tags | You can define tags that will be applied to your cluster-related resources
(such as VMs) on your cloud provider account. If you would like to use tags, make sure to prepare a list of tags that meet your organization’s requirements as well as the cloud provider’s requirements. For more information, refer to Tags. |
Image catalog | |
Image catalog |
When creating a DataHub cluster, you can use default prewarmed images, or you can customize a default image. If you would like to use custom images, you should prepare the images and an image catalog, and then register the image catalog prior to cluster creation. |
Network and availability | |
Subnet | Your cluster is automatically provisioned within the network selected during
environment creation. There are two possibilities as far as subnets are
concerned:
|
Hardware and compute | |
EC2 instances | Prior to creating a cluster, determine which instance type you would like to use for each host group. Data Hub supports all instance types that have more than 16 GB RAM. If using cluster definitions, default instance types are suggested. Similarly, for custom deployment, there are defaults provided. |
Storage options | Prior to creating a cluster, determine the storage type, a number of storage
volumes attached per instance, and the volume size that you would like to use for
each host group. Supported storage types vary by region, including:
Note that attaching custom disks is not supported. |
Encryption | AWS By default, Data Hubs use the same default key from Amazon’s KMS or CMK as the parent environment but you have an option to pass a different CMK during Data Hub creation.For more information about configuring encryption on an environment level, refer to Adding a customer managed encryption key to a CDP environment running on AWS. For more information about configuring a CMK for a specific Data Hub. refer to EBS Encryption on AWS. Azure By default, local Data Hub disks attached to Azure VMs and the PostgreSQL server instance used by the Data Lake and Data Hubs are encrypted with server-side encryption (SSE) using Platform Managed Keys (PMK), but you can optionally configure SSE with Customer Managed Keys (CMK). This can only be configured on an environment level. For more information, refer to Adding a customer managed encryption key for Azure. |
Cluster extensions | |
Recipes | You can optionally create and run scripts (called "recipes") that perform specific tasks on all nodes of a given host group. If you would like to use recipes, you should prepare them prior to creating a cluster. For more information, refer to Recipes. |
Custom properties | When creating a cluster, it is possible to include a list of custom properties and set them to specific values. If you would like to set custom properties, prior to cluster creation you should prepare a JSON file listing these properties. For more information, refer to Custom Properties. |