Creating a Cluster on AWS
Use these steps to create a cluster.
Troubleshooting Cluster Creation
If you experience problems during cluster creation, refer to Troubleshooting Cluster Creation.
Steps
- Log in to the Cloudbreak UI.
- Click Create Cluster. The Create Cluster wizard is displayed.
By default, Basic view is displayed. To view advanced options, click Advanced. To learn about advanced options, refer to Advanced Options.
- On the General Configuration page, specify the following general parameters for your cluster:
Parameter | Description |
---|---|
Select Credential | Choose a previously created credential. |
Cluster Name | Enter a name for your cluster. The name must be between 5 and 40 characters long, must start with a letter, and must only include lowercase letters, numbers, and hyphens. |
Region | Select the AWS region in which you would like to launch your cluster. For information on available AWS regions, refer to AWS documentation. |
Platform Version | Choose the HDP version to use for this cluster. |
Cluster Type | Choose one of the default cluster configurations or, if you have defined your own cluster configuration via an Ambari blueprint, choose it here. For more information on default and custom blueprints, refer to Blueprints. |
Flex Subscription | This option appears if you have configured your deployment for a Flex subscription. |
- On the Hardware and Storage page, for each host group, provide the following information to define your cluster nodes and attached storage:
Parameter | Description |
---|---|
Instance Type | Select an instance type. For information about instance types on AWS, refer to AWS documentation. |
Instance Count | Enter the number of instances of a given type. Default is 1. |
Ambari Server | You must select one node for Ambari Server. The "Group Size" for that host group must be set to "1". |
- On the Network page, provide the following to specify the networking resources that will be used for your cluster:
Parameter | Description |
---|---|
Select Network | Select the virtual network in which you would like your cluster to be provisioned. You can select an existing network or create a new network. |
Select Subnet | Select the subnet in which you would like your cluster to be provisioned. If you are using a new network, create a new subnet. If you are using an existing network, select an existing subnet. |
Subnet (CIDR) | If you chose to create a new subnet, you must define a valid CIDR for the subnet. Default is 10.0.0.0/16. |
Cloudbreak uses public IP addresses when communicating with cluster nodes. On AWS, you can configure it to use private IP addresses instead. For instructions, refer to Configure Communication via Private IPs on AWS.
- Define security groups for each host group. You can either create new security groups and define their rules or reuse existing security groups:
Existing security groups are only available for an existing VPC.
Option | Description |
---|---|
New Security Group (Default) | Creates a new security group with the rules that you define. A set of default rules is provided; review and adjust these default rules, because if you make no modifications, the default rules are applied. To open ports, define the CIDR, enter the port range, select the protocol, and click +. To delete default or previously added rules, use the delete icon. If you don't want to use a security group, remove the default rules. |
Existing Security Groups | Allows you to select an existing security group that is already available in the selected provider region. This selection is disabled if no existing security groups are available in your chosen region. |
Important
By default, ports 22, 443, and 9443 are set to 0.0.0.0/0 CIDR for inbound access on the Ambari node security group. We strongly recommend that you limit this CIDR, considering the following restrictions:
- Ports 22 and 9443 must be open to Cloudbreak's CIDR. You can set CB_DEFAULT_GATEWAY_CIDR in your Cloudbreak Profile file to automatically open ports 22 and 9443 to your Cloudbreak IP address. Refer to Restricting Inbound Access to Clusters.
- Port 22 must be open to your CIDR if you would like to access the master node via SSH.
- Port 443 must be open to your CIDR if you would like to access the Ambari web UI in a browser.
Important
By default, port 22 is set to 0.0.0.0/0 CIDR for inbound access on non-Ambari node security groups. We strongly recommend that you remove it.
- On the Security page, provide the following parameters:
Parameter | Description |
---|---|
Cluster User | You can log in to the Ambari UI using this username. By default, this is set to admin. |
Password | You can log in to the Ambari UI using this password. |
Confirm Password | Confirm the password. |
New SSH public key | Check this option to specify a new public key, and then enter the public key. You will use the matching private key to access your cluster nodes via SSH. |
Existing SSH public key | Select an existing public key. You will use the matching private key to access your cluster nodes via SSH. This is the default option as long as an existing SSH public key is available. |
- Click Create Cluster to create the cluster.
- You are redirected to the Cloudbreak dashboard, and a new tile representing your cluster appears at the top of the page.
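Several input rules stated in the wizard steps (the Cluster Name constraints, a single Ambari Server host group with one instance, and a valid subnet CIDR) can be checked before you submit the form. The following Python sketch is a hypothetical client-side helper, not part of Cloudbreak, whose own validation may differ. The /16 to /28 subnet size limit comes from AWS's general VPC rules rather than from Cloudbreak.

```python
import ipaddress
import re
from typing import List, Optional, Tuple

# Cluster Name: a lowercase letter followed by 4-39 lowercase letters, digits,
# or hyphens, which gives the documented total length of 5-40 characters.
_CLUSTER_NAME = re.compile(r"[a-z][a-z0-9-]{4,39}")


def check_cluster_name(name: str) -> Optional[str]:
    """Return None if the name follows the documented rules, else a message."""
    if _CLUSTER_NAME.fullmatch(name) is None:
        return f"invalid cluster name: {name!r}"
    return None


def check_ambari_group(groups: List[Tuple[str, int, bool]]) -> Optional[str]:
    """groups holds (host group name, instance count, hosts Ambari Server?)."""
    ambari = [(name, count) for name, count, is_ambari in groups if is_ambari]
    if len(ambari) != 1:
        return f"exactly one host group must host Ambari Server, found {len(ambari)}"
    name, count = ambari[0]
    if count != 1:
        return f"host group {name!r} runs Ambari Server and must have 1 instance, not {count}"
    return None


def check_subnet_cidr(cidr: str, vpc_cidr: Optional[str] = None) -> Optional[str]:
    """Return None if the CIDR is usable for a new subnet, else a message."""
    try:
        # strict=True rejects values with host bits set, such as 10.0.0.5/16.
        subnet = ipaddress.IPv4Network(cidr, strict=True)
    except ValueError as exc:
        return f"invalid CIDR {cidr!r}: {exc}"
    # AWS only accepts IPv4 subnet blocks between /16 and /28.
    if not 16 <= subnet.prefixlen <= 28:
        return f"{subnet} is outside the /16 to /28 range allowed by AWS"
    if vpc_cidr is not None and not subnet.subnet_of(ipaddress.IPv4Network(vpc_cidr)):
        return f"{subnet} does not fit inside the VPC range {vpc_cidr}"
    return None
```

Each function returns None when the value passes and a short message otherwise, so the results can be collected into a list of problems to fix before clicking Create Cluster.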
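To make the restrictions for the Ambari node security group concrete, the following Python sketch builds the minimal set of inbound rules they describe and flags any rule that is still open to 0.0.0.0/0. The rule representation and function names are hypothetical; in practice you enter these rules in the Cloudbreak UI or rely on CB_DEFAULT_GATEWAY_CIDR. The CIDRs in the test values are RFC 5737 documentation ranges standing in for your own.

```python
from typing import List, NamedTuple


class IngressRule(NamedTuple):
    """One inbound rule: TCP port and the CIDR allowed to reach it."""
    port: int
    cidr: str


WORLD = "0.0.0.0/0"


def ambari_node_rules(cloudbreak_cidr: str, your_cidr: str,
                      ssh: bool = True, https: bool = True) -> List[IngressRule]:
    """Restricted inbound rules for the Ambari node, following the listed restrictions."""
    # Ports 22 and 9443 must always be reachable from Cloudbreak.
    rules = [IngressRule(22, cloudbreak_cidr), IngressRule(9443, cloudbreak_cidr)]
    if ssh:
        rules.append(IngressRule(22, your_cidr))    # SSH to the master node
    if https:
        rules.append(IngressRule(443, your_cidr))   # browser access over port 443
    return rules


def open_to_world(rules: List[IngressRule]) -> List[IngressRule]:
    """Return rules that still allow inbound traffic from any address."""
    return [rule for rule in rules if rule.cidr == WORLD]
```

For non-Ambari host groups, following the recommendation to remove the default port 22 rule corresponds to open_to_world returning an empty list.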
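When you choose New SSH public key on the Security page, a common mistake is pasting a private key or a truncated key. The following Python sketch performs a rough structural check of a single-line OpenSSH public key: the line starts with a key type, followed by base64 data whose first field repeats that key type as a 4-byte big-endian length plus the name. It is a hypothetical helper, not a full parser, and Cloudbreak's own validation may differ.

```python
import base64
import binascii
import struct


def looks_like_openssh_public_key(line: str) -> bool:
    """Rough sanity check of a single-line OpenSSH public key (not a full parser)."""
    parts = line.strip().split()
    if len(parts) < 2:
        return False
    key_type, blob_b64 = parts[0], parts[1]
    try:
        blob = base64.b64decode(blob_b64, validate=True)
    except (binascii.Error, ValueError):
        return False
    if len(blob) < 4:
        return False
    # The decoded blob starts with a 4-byte big-endian length and the key type name.
    (name_len,) = struct.unpack(">I", blob[:4])
    return blob[4:4 + name_len].decode("ascii", errors="replace") == key_type
```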
Related Links
Blueprints
Default Cluster Security Groups
Amazon EC2 Instance Types (External)
AWS Regions and Endpoints (External)
CIDR (External)
Advanced Options
Click Advanced to view and enter additional configuration options.
Availability Zone
Choose one of the availability zones within the selected region.
Choose Image Catalog
By default, Choose Image Catalog is set to the default image catalog that is provided with Cloudbreak. If you would like to use a different image catalog, you must first create and register it. For complete instructions, refer to Custom Images.
Related Links
Custom Images
Prewarmed and Base Images
Cloudbreak supports the following types of images for launching clusters:
Image Type | Description | Default Images Provided | Support for Custom Images |
---|---|---|---|
Base Images | Base images include default configuration and default tooling. These images include the operating system but do not include Ambari or HDP software. | Yes | Yes |
Prewarmed Images | By default, Cloudbreak launches clusters from prewarmed images. Prewarmed images include the operating system as well as Ambari and HDP. The HDP and Ambari versions used by prewarmed images cannot be customized. | Yes | No |
By default, Cloudbreak uses the included default prewarmed images, which include the operating system as well as preinstalled Ambari and HDP packages. You can optionally select the base image option if you would like to:
- Use Ambari and HDP versions different from those included in the prewarmed image, and/or
- Choose a previously created custom base image
Choose Image
If you selected a custom image catalog under Choose Image Catalog, you can select an image from that catalog under Choose Image. For complete instructions, refer to Custom Images.
If you are customizing Ambari and HDP versions, you can ignore the Choose Image option; in this case, the default base image is used.
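The rules in this section for choosing between prewarmed and base images can be summarized as a small decision function. This Python sketch is an illustrative simplification, not Cloudbreak code; the returned strings are placeholders rather than real image identifiers.

```python
from typing import Optional


def resolve_image(custom_versions: bool, chosen_image: Optional[str] = None) -> str:
    """Simplified view of which image a new cluster starts from."""
    if chosen_image is not None:
        # An image picked explicitly under Choose Image, e.g. from a custom catalog.
        return chosen_image
    if custom_versions:
        # Prewarmed images cannot be customized, so a base image is required;
        # with Choose Image left untouched, the default base image is used.
        return "default base image"
    # No customization: Cloudbreak's default prewarmed image.
    return "default prewarmed image"
```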
Ambari Repository Specification
If you would like to use a custom Ambari version, provide the following information:
Ambari 2.6.1
If you would like to use Ambari 2.6.1, use the version provided by default in the Cloudbreak web UI, or newer.
Parameter | Description | Example |
---|---|---|
Version | Enter Ambari version. | 2.6.1.3 |
Repo Url | Provide a URL to the Ambari version repo that you would like to use. | http://public-repo-1.hortonworks.com/ambari/centos6/2.x/updates/2.6.1.3 |
Repo Gpg Key Url | Provide a URL to the repo GPG key. Each stable RPM package that is published by the CentOS Project is signed with a GPG signature. By default, yum and the graphical update tools verify these signatures and refuse to install any packages that are not signed or that have an incorrect signature. | http://public-repo-1.hortonworks.com/ambari/centos6/RPM-GPG-KEY/RPM-GPG-KEY-Jenkins |
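To see how the three values in this table fit together, the following Python sketch renders them as a standard yum repository definition, which is the mechanism the GPG signature check relies on. It illustrates the yum .repo format only: the repository ID and name are hypothetical, and this is not necessarily the file Cloudbreak writes on cluster nodes.

```python
def ambari_repo_file(version: str, base_url: str, gpg_key_url: str) -> str:
    """Render a yum .repo stanza for the given Ambari repository values."""
    return "\n".join([
        f"[ambari-{version}]",          # hypothetical repository ID
        f"name=Ambari {version}",
        f"baseurl={base_url}",
        "enabled=1",
        # gpgcheck=1 makes yum refuse packages whose signature does not match the key.
        "gpgcheck=1",
        f"gpgkey={gpg_key_url}",
        "",
    ])


print(ambari_repo_file(
    "2.6.1.3",
    "http://public-repo-1.hortonworks.com/ambari/centos6/2.x/updates/2.6.1.3",
    "http://public-repo-1.hortonworks.com/ambari/centos6/RPM-GPG-KEY/RPM-GPG-KEY-Jenkins",
))
```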
HDP Repository Specification
If you would like to use a custom HDP version, provide the following information:
Parameter | Description | Example |
---|---|---|
Stack | Stack name. | HDP |
Version | Stack version. | 2.6 |
OS | Operating system. | centos7 (Azure, GCP, OpenStack) or amazonlinux (AWS) |
Repository Version | Enter repository version. | 2.6.4.0-91 |
Version Definition File | Enter the URL of the VDF file. | http://public-repo-1.hortonworks.com/HDP/centos6/2.x/updates/2.6.4.0/HDP-2.6.4.0-91.xml |
Enable Ambari Server to download and install GPL Licensed LZO packages? | (Optional, only available if using Ambari 2.6.1.0 or newer) Use this option to enable LZO compression in your HDP cluster. LZO is a lossless data compression library that favors speed over compression ratio. Ambari neither installs nor enables LZO compression libraries by default; it must be explicitly configured to do so. For more information, refer to Enabling LZO. | |
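The example values in this table follow a regular pattern. The Python sketch below derives a VDF URL from the OS and repository version using that pattern, and checks that the Stack version is consistent with the repository version. The URL layout is inferred from the single example above and is an assumption; other releases or mirrors may be laid out differently, so prefer the VDF URL published for your HDP release.

```python
import re

# Repository versions such as "2.6.4.0-91": a four-part release and a build number.
_REPO_VERSION = re.compile(r"(\d+)\.(\d+)\.(\d+)\.(\d+)-(\d+)")


def hdp_vdf_url(os_name: str, repo_version: str,
                host: str = "http://public-repo-1.hortonworks.com") -> str:
    """Build a VDF URL following the pattern of the example in the table (assumed)."""
    match = _REPO_VERSION.fullmatch(repo_version)
    if match is None:
        raise ValueError(f"unexpected repository version: {repo_version!r}")
    release = repo_version.split("-")[0]   # e.g. "2.6.4.0"
    major = match.group(1)                 # e.g. "2", used as "2.x"
    return f"{host}/HDP/{os_name}/{major}.x/updates/{release}/HDP-{repo_version}.xml"


def stack_matches(stack_version: str, repo_version: str) -> bool:
    """True if the Stack version (e.g. "2.6") prefixes the repository version."""
    return repo_version.startswith(stack_version + ".")
```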
If you choose to use a base image with a custom Ambari and/or HDP version, Cloudbreak validates the information entered. When Cloudbreak detects that the information entered is incorrect, it displays a warning. Review all warnings before proceeding and make sure that the information you entered is correct. If you choose to proceed in spite of the warnings, check "Ignore repository warnings".
Related Links
Custom Images
Enable Lifetime Management
Check this option if you would like your cluster to be automatically terminated after a specific amount of time (defined as "Time to Live" in minutes).
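Because Time to Live is expressed in minutes, long lifetimes are easy to misjudge. This Python sketch computes the expected termination time, assuming the countdown starts at a given moment such as cluster creation; that starting point, and rejecting non-positive values, are assumptions rather than documented Cloudbreak behavior.

```python
from datetime import datetime, timedelta, timezone


def termination_time(start: datetime, ttl_minutes: int) -> datetime:
    """Expected termination time for a Time to Live given in minutes.

    Assumes the countdown starts at `start` (for example, cluster creation).
    """
    if ttl_minutes <= 0:
        # Assumption: only positive lifetimes make sense.
        raise ValueError("Time to Live must be a positive number of minutes")
    return start + timedelta(minutes=ttl_minutes)


created = datetime(2018, 3, 1, 12, 0, tzinfo=timezone.utc)
print(termination_time(created, 8 * 60).isoformat())  # 2018-03-01T20:00:00+00:00
```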
Tags
You can optionally add tags, which help you find your cluster-related resources, such as VMs, in your cloud provider account. For more information, refer to Resource Tagging.
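Once tags are applied, EC2 can filter resources by them using filters named tag:<key>. The following Python sketch turns a dictionary of tags into that filter list; the helper name and the example tag keys and values are hypothetical, so substitute the tags you actually added to the cluster.

```python
from typing import Dict, List


def tag_filters(tags: Dict[str, str]) -> List[dict]:
    """Convert {key: value} tags into EC2 describe_* filters of the form tag:<key>.

    EC2 combines multiple filters with AND, so a resource must carry every tag.
    """
    return [{"Name": f"tag:{key}", "Values": [value]}
            for key, value in sorted(tags.items())]


print(tag_filters({"owner": "data-team", "env": "dev"}))
```

With boto3 installed and AWS credentials configured, the result can be passed as the Filters argument of the EC2 client's describe_instances call, for example boto3.client("ec2", region_name="us-east-1").describe_instances(Filters=tag_filters({"owner": "data-team"})).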
Related Links
Resource Tagging
Storage
You can optionally specify the following storage options for your cluster:
Parameter | Description |
---|---|
Storage Type | Select the volume type. The options are: |
Attached Volumes Per Instance | Enter the number of volumes attached per instance. Default is 1. |
Volume Size (GB) | Enter the size in GBs for each volume. Default is 100. |
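Attached Volumes Per Instance and Volume Size multiply with the host group's Instance Count, so it can help to compute the total before launching. The following Python sketch does this arithmetic. The HDFS estimate divides by 3, HDFS's default replication factor (dfs.replication); it assumes all attached volumes are used for HDFS and ignores reserved and non-HDFS space, so treat it as a rough upper bound.

```python
def raw_storage_gb(instance_count: int, volumes_per_instance: int = 1,
                   volume_size_gb: int = 100) -> int:
    """Total attached volume capacity for one host group (defaults match the form)."""
    return instance_count * volumes_per_instance * volume_size_gb


def approx_hdfs_capacity_gb(raw_gb: float, replication: int = 3) -> float:
    """Very rough usable HDFS capacity: raw space divided by the replication factor.

    Ignores space reserved for the OS, logs, and non-DFS use.
    """
    return raw_gb / replication


raw = raw_storage_gb(instance_count=4, volumes_per_instance=2, volume_size_gb=500)
print(raw, approx_hdfs_capacity_gb(raw))
```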
Use Spot Instances
Check this option to use EC2 spot instances as your cluster nodes. Next, enter your bid price. The price that is pre-loaded in the form is the current on-demand price for your chosen EC2 instance type.
Note that:
- We recommend not using spot instances for any host group that includes Ambari server components.
- If you choose to use spot instances for a given host group when creating your cluster, any nodes that you add to that host group (during cluster creation or later) will use spot instances. Any additional nodes will be requested at the same bid price that you entered when creating the cluster.
- If you decide not to use spot instances when creating your cluster, any nodes that you add to your host group (during cluster creation or later) will use standard on-demand instances.
- If someone outbids you, your spot instances are reclaimed, and the corresponding nodes are removed from the cluster.
- If spot instances are not available right away, creating a cluster will take longer than usual.
After creating a cluster, you can view your spot instance requests, including bid price, on the EC2 dashboard under INSTANCES > Spot Requests. For more information about spot instances, refer to AWS documentation.
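The notes on spot instances can be summarized in a small model: each host group's purchasing choice is fixed at creation time, nodes added later inherit it, and host groups with Ambari server components are best kept on demand. In this Python sketch the class, fields, and returned dictionaries are hypothetical and do not correspond to a Cloudbreak or AWS API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SpotChoice:
    """Hypothetical record of a host group's purchasing choice at creation time."""
    host_group: str
    runs_ambari_server: bool
    bid_price: Optional[float] = None  # None means standard on-demand instances


def spot_warnings(choices: List[SpotChoice]) -> List[str]:
    """Flag host groups that use spot instances despite hosting Ambari server components."""
    return [f"{c.host_group}: spot instances are not recommended here"
            for c in choices if c.bid_price is not None and c.runs_ambari_server]


def request_for_added_node(choice: SpotChoice) -> dict:
    """Nodes added later follow the host group's original choice and bid price."""
    if choice.bid_price is None:
        return {"market": "on-demand"}
    return {"market": "spot", "bid_price": choice.bid_price}
```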
File System
HDP uses HDFS as the default file system and supports accessing the Amazon S3 object store through the S3A connector.
If you would like to access S3 through the S3A connector, you must configure access to S3 through an instance profile. For instructions, refer to Configuring Access to S3.
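Once access through an instance profile is configured, S3 data is addressed with s3a:// URIs. The following Python sketch builds such a URI and applies a simplified version of the S3 bucket naming rules (3 to 63 characters of lowercase letters, digits, dots, and hyphens, starting and ending with a letter or digit); it does not check every rule, such as the ban on IP-address-style names. The helper name and bucket are hypothetical.

```python
import re

# Simplified S3 bucket-name check; real S3 rules contain further restrictions.
_BUCKET = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def s3a_uri(bucket: str, key: str = "") -> str:
    """Build an s3a:// URI for use by HDP tools once S3 access is configured."""
    if not _BUCKET.fullmatch(bucket):
        raise ValueError(f"not a valid bucket name: {bucket!r}")
    return f"s3a://{bucket}/{key.lstrip('/')}"
```

A URI built this way can then be passed to HDP tools, for example hadoop fs -ls s3a://my-bucket/data/ from a cluster node.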
Recipes
This option allows you to select previously uploaded recipes (scripts that run before or after cluster deployment) for each host group. For more information on recipes, refer to Recipes.
Related Links
Recipes
Ambari Server Master Key
The Ambari Server Master Key is used to configure Ambari to encrypt database and Kerberos credentials that are retained by Ambari as part of the Ambari setup.
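Because this key protects the credentials that Ambari encrypts, it should be hard to guess. One way to generate a strong value is Python's secrets module, as in this sketch; the 32-character alphanumeric format is an assumption, not a documented Ambari requirement.

```python
import secrets
import string


def generate_master_key(length: int = 32) -> str:
    """Generate a random alphanumeric key using a cryptographically secure source."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


print(generate_master_key())
```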
Enable Kerberos Security
Select this option to enable Kerberos for your cluster. For information about available Kerberos options, refer to Enabling Kerberos Security.
Related Links
Kerberos
Amazon EC2 Instance Store (External)
Amazon EC2 Spot Instances (External)