Getting started

The steps for getting started with CDP Public Cloud depend on your use case.


Regardless of your use case, your first steps in CDP should involve synchronizing your identity provider with CDP so that your users can access CDP and are authorized to access specific resources within CDP.

For more information, refer to Getting started as an admin.

Burst to the cloud

Use case: You have an on-premise CDH cluster and you would like to migrate a workload to your public cloud environment by replicating the data and creating a Data Hub cluster to host the workload.

  1. Register your on-premise cluster in CDP.
  2. Use Workload Manager to generate a workload, data movement, and compute capacity plan.
  3. Use Replication Manager to migrate your workload to S3.
  4. Create a Data Mart cluster or spin up a Data Warehouse instance.
  5. Use Impala to run queries on the migrated workload.

Born in the cloud

Use case: If you already have data in the cloud, you can provision Data Hub clusters to run your workloads.

  1. Create a Data Engineering cluster.
  2. Ingest the data into managed Hive tables.
  3. Create a Data Mart cluster.
  4. Use Hue to submit queries.
  5. Use Tableau with the JDBC connector.
  6. Use Workload Manager to analyze Impala queries and Spark jobs for resource consumption and performance.