Create a cluster with an external database

Through the CDP CLI, you can configure an external, durable database when you create a Data Hub cluster. An external database preserves the state of cluster services such as Hue, DAS, and Zeppelin if an instance fails. You can also configure an external database as a custom property within a cluster definition.

You can set the following top-level custom property for the external managed database within a cluster definition, choosing one of the three availabilityType values:

"externalDatabase": {
  "availabilityType": "NONE" | "NON_HA" | "HA"
}

For more information on using custom cluster definitions to create clusters, see Cluster definitions.
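The availabilityType notation above lists alternatives; a real cluster definition carries exactly one of them. As a minimal sketch, the helper below (external_database_property is a hypothetical name, not part of the CDP CLI) prints the property for a single supported value and rejects anything else. Where the fragment sits inside a full cluster definition is not shown here.

```shell
# Hypothetical helper, not part of the CDP CLI: prints the externalDatabase
# custom property for one of the three supported availability types.
external_database_property() {
  case "$1" in
    NONE|NON_HA|HA)
      printf '"externalDatabase": {\n  "availabilityType": "%s"\n}\n' "$1"
      ;;
    *)
      echo "unsupported availabilityType: $1" >&2
      return 1
      ;;
  esac
}

# Print the fragment for a highly available external database.
external_database_property HA
```

The printed fragment can be pasted into a custom cluster definition at the top level.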

A new flag, called --datahub-database, is available in the CDP CLI create-aws-cluster command. The supported values are NONE, NON_HA, and HA. To create a Data Hub cluster with an external managed database, use the following CLI options:

create-aws-cluster
          [--cluster-name <value>]
          [--cluster-definition-name <value>]
          [--environment-name <value>]
          [--cluster-template-name <value>]
          [--instance-groups <value>]
          [--subnet-id <value>]
          [--image <value>]
          [--tags <value>]
          [--request-template <value>]
          [--datahub-database <value>]
          [--overall-spot-percentage <value>]
          [--cli-input-json <value>]
          [--generate-cli-skeleton]
  • --cluster-name: String. The name of the cluster. This name must be unique, must be between 5 and 40 characters long, and must contain only lowercase letters, numbers, and hyphens. Names are case-sensitive.
  • --cluster-definition-name: String. The name or CRN of the cluster definition to use for cluster creation.
  • --environment-name: String. Name or CRN of the environment to use when creating the cluster. The environment must be an AWS environment.
  • --cluster-template-name: String. Name or CRN of the cluster template to use for cluster creation.
  • --instance-groups: Array. Instance group details.
  • --subnet-id: String. The subnet ID.
  • --image: Object. The details of the image used for cluster instances.
  • --tags: Array. Tags to be added to Data Hub-related resources.
  • --request-template: String. JSON template to use for cluster creation. This is different from the cluster template and will be removed in the future.
  • --datahub-database: String. Database type for the Data Hub cluster. Currently supported values: NONE, NON_HA, HA.
  • --overall-spot-percentage: Integer. Percentage of spot instances in all the instance groups. This may override any spotPercentage values already set under the instanceGroup(s).
  • --cli-input-json: String. Performs the service operation based on the JSON string provided. The JSON string follows the format provided by --generate-cli-skeleton. If other arguments are provided on the command line, the CLI values will override the JSON-provided values.
  • --generate-cli-skeleton: Prints a sample input JSON to standard output. If this argument is specified, the operation itself is not run. The sample input can be used as an argument for --cli-input-json.
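To illustrate how the options above combine, the following sketch assembles a create-aws-cluster call that requests an external database. It assumes the command is invoked as cdp datahub create-aws-cluster; the helper name build_create_cluster_cmd and all cluster, environment, and definition names are hypothetical placeholders. The helper only prints the command so that it can be reviewed before it is run.

```shell
# Hypothetical helper: prints a create-aws-cluster command line for review.
# Arguments: $1 cluster name, $2 environment name or CRN,
#            $3 cluster definition name or CRN, $4 database type.
build_create_cluster_cmd() {
  # Accept only the three values documented for --datahub-database.
  case "$4" in
    NONE|NON_HA|HA) ;;
    *)
      echo "unsupported --datahub-database value: $4" >&2
      return 1
      ;;
  esac
  # Assumption: the command lives under the "datahub" service of the CDP CLI.
  printf 'cdp datahub create-aws-cluster --cluster-name %s --environment-name %s --cluster-definition-name %s --datahub-database %s\n' \
    "$1" "$2" "$3" "$4"
}

# Placeholder names; replace them with values from your own environment.
build_create_cluster_cmd example-datahub-01 example-aws-env example-definition HA
```

After checking the printed command, run it in a shell where the CDP CLI is installed and configured.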

For information on shorthand and JSON syntax of these options, and the output of the command, invoke the CDP CLI help.