Cloudbreak

Introduction

Welcome to the Cloudbreak 2.5 Technical Preview documentation!

Cloudbreak simplifies the provisioning, management, and monitoring of on-demand HDP and HDF clusters in virtual and cloud environments. It uses cloud infrastructure to create host instances, and then uses Apache Ambari, driven by Ambari blueprints, to provision and manage the clusters on those hosts.

You can create clusters using the Cloudbreak web UI, the Cloudbreak CLI, or the Cloudbreak REST API. Clusters can be launched on the public cloud platforms Microsoft Azure, Amazon Web Services (AWS), and Google Cloud Platform (GCP), and on the OpenStack private cloud platform.

Primary Use Cases

Cloudbreak allows you to create, manage, and monitor your HDP and HDF clusters on your chosen cloud platform:

Dynamically deploy, configure, and manage clusters on public and private clouds (AWS, Azure, Google Cloud, OpenStack).
Use automated scaling to seamlessly manage elasticity requirements as cluster workloads change.
Secure your cluster by enabling Kerberos.

Default Cluster Configurations

Cloudbreak includes default cluster configurations (in the form of blueprints) and supports using your own custom cluster configurations (in the form of custom blueprints).
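An Ambari blueprint is a JSON document that declares the stack (name and version), the host groups, and the components each host group runs. As a rough sketch, the Python below builds and prints a minimal custom blueprint skeleton. The blueprint name, host group names, and component selection are assumptions for illustration only. This is not one of Cloudbreak's default blueprints, and it has not been validated against Ambari's dependency checks.

```python
import json


# Hypothetical skeleton of a custom Ambari blueprint. The top-level sections
# ("Blueprints", "configurations", "host_groups") follow the Ambari blueprint
# format; the blueprint name, host group names, and component list are
# illustrative assumptions, not a validated or default configuration.
def minimal_blueprint(name: str) -> dict:
    return {
        "Blueprints": {
            "blueprint_name": name,
            "stack_name": "HDP",
            "stack_version": "2.6",
        },
        # Cluster-wide configuration overrides (none in this sketch).
        "configurations": [],
        "host_groups": [
            {
                # Master host group running the HDFS and ZooKeeper masters.
                "name": "master",
                "cardinality": "1",
                "components": [
                    {"name": "NAMENODE"},
                    {"name": "SECONDARY_NAMENODE"},
                    {"name": "ZOOKEEPER_SERVER"},
                    {"name": "HDFS_CLIENT"},
                    {"name": "ZOOKEEPER_CLIENT"},
                ],
            },
            {
                # Worker host group storing HDFS data.
                "name": "worker",
                "cardinality": "1",
                "components": [
                    {"name": "DATANODE"},
                    {"name": "HDFS_CLIENT"},
                ],
            },
        ],
    }


if __name__ == "__main__":
    # Print the blueprint as a JSON document that could be uploaded as a custom blueprint.
    print(json.dumps(minimal_blueprint("my-custom-blueprint"), indent=2))
```

A real custom blueprint would list every component for the services you need (for example, YARN or Hive) in the appropriate host groups.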

The following default cluster configurations are available:

Platform Version: HDP 2.6

Cluster Type	Main Services	Description	List of All Services Included
Data Science	Spark 2, Zeppelin	Useful for data science with Spark 2 and Zeppelin.	HDFS, YARN, MapReduce2, Tez, Hive, Pig, Sqoop, ZooKeeper, Ambari Metrics, Spark 2, Zeppelin
EDW - Analytics	Hive 2 LLAP, Zeppelin	Useful for EDW analytics using Hive LLAP.	HDFS, YARN, MapReduce2, Tez, Hive 2 LLAP, Druid, Pig, ZooKeeper, Ambari Metrics, Spark 2
EDW - ETL	Hive, Spark 2	Useful for ETL data processing with Hive and Spark 2.	HDFS, YARN, MapReduce2, Tez, Hive, Pig, ZooKeeper, Ambari Metrics, Spark 2

Platform Version: HDF 3.1

Cluster Type	Main Services	Description	List of All Services Included
Flow Management	NiFi	Useful for flow management with NiFi.	NiFi, ZooKeeper, Ambari Metrics
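The tables above can be read as a mapping from each default configuration to the services it installs. As a hedged sketch, the Python below encodes that mapping and returns the default configurations that already include a given set of services. `DEFAULT_CONFIGURATIONS` and `matching_configurations` are hypothetical names introduced here and are not part of Cloudbreak. The service lists are copied from the "List of All Services Included" column.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical lookup table. Key: (platform version, cluster type) from the
# tables above. Value: the "List of All Services Included" for that row.
DEFAULT_CONFIGURATIONS: Dict[Tuple[str, str], FrozenSet[str]] = {
    ("HDP 2.6", "Data Science"): frozenset({
        "HDFS", "YARN", "MapReduce2", "Tez", "Hive", "Pig", "Sqoop",
        "ZooKeeper", "Ambari Metrics", "Spark 2", "Zeppelin",
    }),
    ("HDP 2.6", "EDW - Analytics"): frozenset({
        "HDFS", "YARN", "MapReduce2", "Tez", "Hive 2 LLAP", "Druid", "Pig",
        "ZooKeeper", "Ambari Metrics", "Spark 2",
    }),
    ("HDP 2.6", "EDW - ETL"): frozenset({
        "HDFS", "YARN", "MapReduce2", "Tez", "Hive", "Pig",
        "ZooKeeper", "Ambari Metrics", "Spark 2",
    }),
    ("HDF 3.1", "Flow Management"): frozenset({
        "NiFi", "ZooKeeper", "Ambari Metrics",
    }),
}


def matching_configurations(required: Set[str]) -> List[Tuple[str, str]]:
    """Return, sorted, every default configuration that includes all required services.

    An empty result means no default configuration fits, so a custom
    blueprint would be needed instead.
    """
    return sorted(
        key
        for key, services in DEFAULT_CONFIGURATIONS.items()
        if required <= services  # subset test: every required service is present
    )


if __name__ == "__main__":
    print(matching_configurations({"Spark 2", "Zeppelin"}))
```

For example, `matching_configurations({"Spark 2", "Zeppelin"})` returns `[("HDP 2.6", "Data Science")]`. A subset test is used because a default configuration is suitable whenever it includes at least the services you need.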

Core Concepts

Refer to Architecture and Core Concepts.

Get Started

To get started with Cloudbreak:

Select the cloud platform on which you would like to launch Cloudbreak.
Select the deployment option that you would like to use.
Launch Cloudbreak.

Select Cloud Platform

You can deploy and use Cloudbreak on the following cloud platforms:

Amazon Web Services (AWS)
Microsoft Azure
Google Cloud Platform (GCP)
OpenStack

Select Deployment Option

There are two basic deployment options:

Deployment option	When to use
Option 1: Instantiate Cloudbreak using one of the provided pre-built cloud images	This is the basic deployment option and the easiest way to get started. The cloud images include the Cloudbreak deployer pre-installed on a CentOS VM.
Option 2: Install the Cloudbreak deployer on your own VM	This is an advanced deployment option. Select this option if you have custom VM requirements. The supported operating systems are RHEL 7, CentOS 7, and Oracle Linux 7 (64-bit).

Launch Cloudbreak

(Option 1) You can launch Cloudbreak from one of the pre-built cloud images.

(Option 2) Alternatively, you can launch Cloudbreak on your own VM on one of the supported cloud platforms. This is an advanced deployment option that you should use only if you have custom VM requirements.

In general, the steps are: meet the prerequisites, launch Cloudbreak on a VM, and create a Cloudbreak credential. After completing these steps, you can create a cluster based on one of the default blueprints, or upload your own blueprint and create a cluster from it.

Note

The Cloudbreak software runs in your cloud environment. You are responsible for the cloud infrastructure charges incurred by running Cloudbreak and the clusters that it manages.