Cloudera Data Hub

Release Notes

Cloudera Runtime documentation • CDF for Data Hub documentation • Cloudera Manager documentation

Data Hub is a service for launching and managing workload clusters. Data Hub provides a set of default cluster definitions and templates that allow you to quickly provision workload clusters for prescriptive use cases, such as Data Engineering or Flow Management. Through Data Hub and Cloudera Manager you can also access and manage the clusters that you create.

Data Hub
Data Hub Components
  • Cluster deployment: Currently, you can deploy Data Hub clusters on AWS, Azure, or GCP. Deploying a Data Hub cluster requires that you have a cloud provider account with either AWS, Azure or GCP; an environment registered in CDP; and a cloud credential. You can learn more about these specific requirements in the Management Console documentation.

  • Cluster configuration: You can easily create a Data Hub cluster from a default cluster configuration, or you can create your own custom cluster configurations. Data Hub clusters are powered by Cloudera Runtime and Cloudera DataFlow components, and the Runtime and DataFlow for Data Hub documentation contain more information about individual cluster services.

  • Cluster access and management: After you create a Data Hub cluster, you can access cluster web UIs and endpoints through the Data Hub page of the CDP UI. Through the UI you can also perform most cluster management tasks, such as resizing, stopping, or restarting a cluster, as well as stopping and restarting cluster services. Certain cluster management, configuration, and monitoring functions are performed through the Cloudera Manager Admin Console, which is accessible from Data Hub.

  • Running workload clusters: Once your cluster is up and running and you can access it, refer to Cloudera Runtime or Cloudera DataFlow for Data Hub documentation for information about running workload clusters using services such as Hive, Spark, Impala, Hue, or Kafka.

  • API, CLI and SDK: The CDP Control Plane provides API, CLI and SDK tools that you can use to script and automate certain Data Hub cluster creation and management tasks.

When you’re ready to get started with Data Hub, read the in-depth overview.