Create a cluster with an external database
Through the CDP CLI, you can configure an external, durable database when you create a Data Hub cluster. An external database keeps a persistent state for cluster services like Hue, DAS and Zeppelin in case of instance failures. You also have the option to configure an external database as a custom property within a cluster definition.
A top-level custom property for the external managed database is available to be set within a cluster definition:
"externalDatabase": {
"availabilityType": "NONE" | "NON_HA" | "HA"
For more information on using custom cluster definitions to create clusters, see Cluster definitions.
A new flag, called --datahub-database
, is available in the CDP CLI
create-aws-cluster
command. The supported values are NONE
,
NON_HA
, and HA
. To create a Data Hub cluster with an
external managed database, use the following CLI options:
create-aws-cluster
[--cluster-name <value>]
[--cluster-definition-name <value>]
[--environment-name <value>]
[--cluster-template-name <value>]
[--instance-groups <value>]
[--subnet-id <value>]
[--image <value>]
[--tags <value>]
[--request-template <value>]
[--datahub-database <value>]
[--overall-spot-percentage <value>]
[--cli-input-json <value>]
[--generate-cli-skeleton]
- --cluster-name: String. The name of the cluster. This name must be unique, must have between 5 and 40 characters, and must contain only lowercase letters, numbers and hyphens. Names are case-sensitive.
- --cluster-definition-name: String. The name or CRN of the cluster definition to use for cluster creation.
- --environment-name: Name or CRN of the environment to use when creating the cluster. The environment must be an AWS environment.
- --cluster-template-name: String. Name or CRN of the cluster template to use for cluster creation.
- --instance-groups: Array. Instance group details.
- --subnet-id: String. The subnet ID.
- --image: Object. The details of the image used for cluster instances.
- --tags: Array. Tags to be added to Data Hub-related resources.
- --request-template: String. JSON template to use for cluster creation. This is different from the cluster template and would be removed in the future.
- --datahub-database: String. Database type for the Data Hub cluster. Currently supported values: NONE, NON_HA, HA
- --overall-spot-percentage: Integer. Percentage of spot instances in all the instance groups. This may override the already set spotPercentage values under the instanceGroup(s).
- --cli-input-json: String. Performs service operation based on the JSON string provided. The JSON string follows the format provided by --generate-cli-skeleton. If other arguments are provided on the command line, the CLI values will override the JSON-provided values.
- --generate-cli-skeleton: Prints a sample input JSON to standard output. Note the specified operation is not run if this argument is specified. The sample input can be used as an argument for --cli-input-json.
For information on shorthand and JSON syntax of these options, and the output of the command,
invoke the CDP CLI help
.