Creating a Cluster on AWS

Create a cluster with an external database

Through the CDP CLI, you can configure an external, durable database when you create a Data Hub cluster. An external database keeps a persistent state for cluster services like Hue, DAS and Zeppelin in case of instance failures. You also have the option to configure an external database as a custom property within a cluster definition.

You can set a top-level custom property for the external managed database within a cluster definition:

"externalDatabase": {
  "availabilityType": "NONE" | "NON_HA" | "HA"
}

For more information on using custom cluster definitions to create clusters, see Cluster definitions.

A new flag, called --datahub-database, is available in the CDP CLI create-aws-cluster command. The supported values are NONE, NON_HA, and HA. To create a Data Hub cluster with an external managed database, use the following CLI options:

create-aws-cluster
          [--cluster-name <value>]
          [--cluster-definition-name <value>]
          [--environment-name <value>]
          [--cluster-template-name <value>]
          [--instance-groups <value>]
          [--subnet-id <value>]
          [--image <value>]
          [--tags <value>]
          [--request-template <value>]
          [--datahub-database <value>]
          [--overall-spot-percentage <value>]
          [--cli-input-json <value>]
          [--generate-cli-skeleton]
  • --cluster-name: String. The name of the cluster. This name must be unique, must have between 5 and 40 characters, and must contain only lowercase letters, numbers and hyphens. Names are case-sensitive.
  • --cluster-definition-name: String. The name or CRN of the cluster definition to use for cluster creation.
  • --environment-name: String. Name or CRN of the environment to use when creating the cluster. The environment must be an AWS environment.
  • --cluster-template-name: String. Name or CRN of the cluster template to use for cluster creation.
  • --instance-groups: Array. Instance group details.
  • --subnet-id: String. The subnet ID.
  • --image: Object. The details of the image used for cluster instances.
  • --tags: Array. Tags to be added to Data Hub-related resources.
  • --request-template: String. JSON template to use for cluster creation. This is different from the cluster template and will be removed in the future.
  • --datahub-database: String. Database type for the Data Hub cluster. Currently supported values: NONE, NON_HA, HA.
  • --overall-spot-percentage: Integer. Percentage of spot instances in all the instance groups. This may override the already set spotPercentage values under the instanceGroup(s).
  • --cli-input-json: String. Performs service operation based on the JSON string provided. The JSON string follows the format provided by --generate-cli-skeleton. If other arguments are provided on the command line, the CLI values will override the JSON-provided values.
  • --generate-cli-skeleton: Prints a sample input JSON to standard output. If this argument is specified, the operation itself is not run. The sample input can be used as an argument for --cli-input-json.
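
The naming rules listed for --cluster-name can be checked locally before calling the CLI. The following is a minimal sketch, not part of the CDP CLI: the function name is_valid_cluster_name is hypothetical, and it checks only the length and character rules stated above. Uniqueness cannot be verified locally, and the service may enforce further rules.

```shell
#!/bin/sh
# Hypothetical helper (not part of the CDP CLI).
# Returns 0 when the name meets the documented --cluster-name rules:
# between 5 and 40 characters, only lowercase letters, numbers and hyphens.
is_valid_cluster_name() {
  name=$1
  len=${#name}
  if [ "$len" -lt 5 ] || [ "$len" -gt 40 ]; then
    return 1
  fi
  # Reject any character outside the allowed set. The letters are listed
  # explicitly so the check does not depend on locale-specific ranges.
  case "$name" in
    *[!abcdefghijklmnopqrstuvwxyz0123456789-]*) return 1 ;;
  esac
  return 0
}

# Example with a placeholder name.
if is_valid_cluster_name "my-datahub-cluster"; then
  echo "name ok"
fi
```

A check like this only catches formatting mistakes early; the CLI response remains the authority on whether a name is accepted.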

For information on shorthand and JSON syntax of these options, and the output of the command, invoke the CDP CLI help.
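
As an illustration of how the --datahub-database option fits into a call, the sketch below assembles and prints a create-aws-cluster command rather than running it. The helper name datahub_create_cmd and all cluster, environment, and cluster definition names are hypothetical placeholders, and the sketch assumes the command is invoked as cdp datahub create-aws-cluster.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) a create-aws-cluster command.
# Usage: datahub_create_cmd <cluster-name> <environment-name> <cluster-definition-name> <NONE|NON_HA|HA>
datahub_create_cmd() {
  # Accept only the database types listed as supported for --datahub-database.
  case "$4" in
    NONE|NON_HA|HA) ;;
    *) echo "unsupported --datahub-database value: $4" >&2; return 1 ;;
  esac
  printf 'cdp datahub create-aws-cluster --cluster-name %s --environment-name %s --cluster-definition-name %s --datahub-database %s\n' \
    "$1" "$2" "$3" "$4"
}

# Placeholder values; substitute names from your own environment.
datahub_create_cmd "my-datahub-cluster" "my-aws-environment" "my-cluster-definition" "HA"
```

Printing the command first makes it easy to review the values before running the printed line against a real environment.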
