Cluster planning checklist

When creating a production cluster, use this checklist to make sure that you have completed all prerequisite steps required for cluster deployment.

The following table lists the available cluster options and the prerequisites for using each option. You can use this checklist as a starting point for planning your cluster and for understanding the cloud provider requirements related to specific options.

Option Description
General Settings
Tags You can define tags that will be applied to your cluster-related resources (such as VMs) on your cloud provider account.

If you would like to use tags, make sure to prepare a list of tags that meet your organization’s requirements as well as the cloud provider’s requirements. For more information, refer to Tags.
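As a minimal sketch of how a prepared tag list could be checked before cluster creation, the following Python snippet validates tags against the general AWS tagging limits (at most 50 tags per resource, keys up to 128 characters, values up to 256 characters, and no keys starting with the reserved `aws:` prefix). The function name `validate_tags` and the sample tags are hypothetical, and any organization-specific rules would need to be added separately.

```python
from typing import Dict, List

# General AWS tagging limits; organization-specific rules are not covered here.
MAX_TAGS = 50
MAX_KEY_LEN = 128
MAX_VALUE_LEN = 256
RESERVED_PREFIX = "aws:"


def validate_tags(tags: Dict[str, str]) -> List[str]:
    """Return a list of problems found; an empty list means no problems."""
    problems = []
    if len(tags) > MAX_TAGS:
        problems.append(f"too many tags: {len(tags)} > {MAX_TAGS}")
    for key, value in tags.items():
        if not key:
            problems.append("empty tag key")
        if len(key) > MAX_KEY_LEN:
            problems.append(f"key too long: {key[:20]}...")
        if key.lower().startswith(RESERVED_PREFIX):
            problems.append(f"key uses reserved prefix: {key}")
        if len(value) > MAX_VALUE_LEN:
            problems.append(f"value too long for key: {key}")
    return problems


# Hypothetical example tags.
planned_tags = {"owner": "data-eng", "cost-center": "1234"}
print(validate_tags(planned_tags))
```

An empty result only means that the tags pass these generic checks; refer to Tags for the requirements that apply to your cluster.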

Image catalog
Image catalog When creating a Data Hub cluster, you can use default base or prewarmed images. Using custom images is currently not supported.

Network and availability
Subnet Your cluster is automatically provisioned within the network selected during environment creation. There are two possibilities for subnets:
  • If your network includes a single subnet, your cluster is automatically provisioned into that subnet.
  • If your network includes multiple subnets, you can select during cluster creation the subnet in which your cluster should be provisioned.
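The subnet behavior described above can be sketched as a small decision function. This is only an illustration of the planning logic, not the provisioning code itself; the function name `choose_subnet` and the subnet IDs are hypothetical.

```python
from typing import List, Optional


def choose_subnet(subnets: List[str], selected: Optional[str] = None) -> str:
    """Pick the subnet for a cluster from the subnets of the environment network."""
    if not subnets:
        raise ValueError("the environment network has no subnets")
    if len(subnets) == 1:
        # A single subnet is used automatically.
        return subnets[0]
    if selected is None:
        raise ValueError("several subnets are available; select one")
    if selected not in subnets:
        raise ValueError(f"unknown subnet: {selected}")
    return selected


# Hypothetical subnet IDs.
print(choose_subnet(["subnet-a"]))
print(choose_subnet(["subnet-a", "subnet-b"], selected="subnet-b"))
```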
Hardware and compute
EC2 instances Before creating a cluster, determine which instance type you would like to use for each host group. Data Hub supports all instance types that have more than 16 GB of RAM. If you use cluster definitions, default instance types are suggested; defaults are also provided for custom deployments.
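As a rough sketch of how candidate instance types could be shortlisted against the memory requirement, the following Python snippet filters a small, hand-written catalog. The catalog is an assumption for illustration only (AWS publishes memory sizes in GiB, which this sketch treats as comparable to the 16 GB threshold), and the function name `eligible_instance_types` is hypothetical.

```python
from typing import Dict, List

# Illustrative subset of instance types with memory in GiB, as published by AWS.
INSTANCE_MEMORY_GIB = {
    "t3.medium": 4,
    "m5.xlarge": 16,
    "m5.2xlarge": 32,
    "r5.xlarge": 32,
}

# The requirement is "more than 16 GB", so the threshold itself is excluded.
MIN_MEMORY_EXCLUSIVE = 16


def eligible_instance_types(catalog: Dict[str, int]) -> List[str]:
    """Return the instance types whose memory exceeds the threshold, sorted by name."""
    return sorted(
        name for name, memory in catalog.items() if memory > MIN_MEMORY_EXCLUSIVE
    )


print(eligible_instance_types(INSTANCE_MEMORY_GIB))
```

Because the requirement is strictly "more than" 16 GB, an instance type with exactly 16 GiB of memory is left out of the shortlist in this sketch.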
Storage options Before creating a cluster, determine the storage type, the number of storage volumes attached per instance, and the volume size that you would like to use for each host group.

Supported storage types vary by region and include:

  • Ephemeral
  • Magnetic
  • General purpose
  • SSD
For more information about these options, refer to Amazon EC2 Instance Store in the AWS documentation.
EBS encryption You can optionally configure encryption for Amazon Elastic Block Store (EBS) volumes used by the cluster's VM instances to store data.

If you would like to use EBS encryption, you must have an existing encryption key located in the region where you would like to create clusters and have appropriate IAM permissions assigned to the encryption key. For more information, refer to EBS Encryption on AWS.
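One way to check the region prerequisite ahead of time is to compare the region embedded in the key's ARN with the region planned for the cluster. The sketch below assumes the key is identified by an AWS KMS key ARN of the form `arn:aws:kms:<region>:<account-id>:key/<key-id>`; the function names and the sample ARN are placeholders, and the check does not cover IAM permissions.

```python
def kms_key_region(key_arn: str) -> str:
    """Extract the region from a KMS key ARN."""
    parts = key_arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "kms":
        raise ValueError(f"not a KMS key ARN: {key_arn}")
    return parts[3]


def key_matches_region(key_arn: str, cluster_region: str) -> bool:
    """Return True if the key is located in the region planned for the cluster."""
    return kms_key_region(key_arn) == cluster_region


# Placeholder ARN for illustration only.
example_arn = (
    "arn:aws:kms:us-west-2:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab"
)
print(key_matches_region(example_arn, "us-west-2"))
print(key_matches_region(example_arn, "eu-west-1"))
```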

Cluster extensions
Recipes You can optionally create and run scripts (called "recipes") that perform specific tasks on all nodes of a given host group. If you would like to use recipes, you should prepare them prior to creating a cluster. For more information, refer to Recipes.
Custom properties When creating a cluster, you can include a list of custom properties and set them to specific values. If you would like to set custom properties, prepare a JSON file listing these properties before creating the cluster. For more information, refer to Custom Properties.
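A JSON file of custom properties could be prepared and sanity-checked with a short script such as the one below. This is a sketch under stated assumptions: the property names, the flat key/value layout, and the helper names `write_properties` and `load_properties` are hypothetical, so refer to Custom Properties for the structure that is actually expected.

```python
import json
from typing import Dict


def write_properties(properties: Dict[str, str], path: str) -> None:
    """Write the properties to a JSON file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(properties, handle, indent=2, sort_keys=True)


def load_properties(path: str) -> Dict[str, str]:
    """Read the file back and confirm that it is a JSON object."""
    with open(path, encoding="utf-8") as handle:
        loaded = json.load(handle)
    if not isinstance(loaded, dict):
        raise ValueError("custom properties file must contain a JSON object")
    return loaded


# Hypothetical property names and values.
custom_properties = {
    "example.property.one": "value-1",
    "example.property.two": "value-2",
}
```

Reading the file back with `load_properties` before cluster creation catches malformed JSON early, which is cheaper than discovering it during deployment.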