Cloudera Public Cloud glossary
Cloudera Public Cloud documentation uses terminology related to enterprise data cloud and cloud computing.
Cloudera Public Cloud - Cloudera Public Cloud is a cloud service platform that consists of a number of services. It enables administrators to deploy Cloudera service resources and allows end users to process and analyze data by using these resources.
CDP CLI - Provides a command-line interface to access and manage Cloudera services and resources.
Cloudera web console - The web interface for accessing and managing Cloudera services and resources.
Cloudera Observability (data service) - A Cloudera data service used by database and workload administrators to troubleshoot, analyze and optimize workloads in order to improve performance and/or cost.
Cloudera Runtime - The open source software distribution within Cloudera that is maintained, supported, versioned, and packaged by Cloudera. Cloudera Runtime combines the best of CDH and HDP. Cloudera Runtime 7.0.0 is the first version.
Cluster - Also known as compute cluster, workload cluster, or Cloudera Data Hub cluster. The cluster created by using the Cloudera Data Hub service for running workloads. A cluster makes it possible to run one or more Cloudera Runtime components on some number of VMs and is associated with exactly one data lake.
Cluster definition - A reusable definition in JSON format that can be used for creating multiple Cloudera Data Hub clusters with identical cloud provider settings. Cloudera Data Hub includes a few built-in cluster definitions and allows you to save your own cluster definitions. A cluster definition is not synonymous with a cluster template (blueprint), which primarily defines Cloudera Runtime services.
Cluster Repair - A feature in Cloudera Management Console that enables you to select specific nodes within a node group for a repair operation. This feature reduces the downtime incurred when only a subset of the nodes is unhealthy.
Cluster template - A reusable template in JSON format, also referred to as a blueprint, that can be used for creating multiple Cloudera Data Hub clusters with identical Cloudera Runtime settings. It primarily defines the list of Cloudera Runtime services included and how their components are distributed on different host groups. Cloudera Data Hub includes a few built-in cluster templates and allows you to save your own. A cluster template is not synonymous with a cluster definition, which primarily defines cloud provider settings.
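The split between a cluster definition (cloud provider settings) and a cluster template (Cloudera Runtime services per host group) can be illustrated with a small data model. This is a sketch only: the class names, field names, and sample values (`ClusterTemplate`, `ClusterDefinition`, `host_groups`, `instance_types`, the region and instance type strings) are hypothetical and do not reflect the actual JSON schema used by Cloudera Data Hub.

```python
from dataclasses import dataclass


@dataclass
class ClusterTemplate:
    """Hypothetical model: which Runtime service components run on which host group."""
    name: str
    host_groups: dict  # host group name -> list of component names


@dataclass
class ClusterDefinition:
    """Hypothetical model: cloud provider settings that can be reused across clusters."""
    name: str
    cloud_provider: str
    region: str
    instance_types: dict  # host group name -> VM instance type


def describe_cluster(cluster_name, definition, template):
    """Combine the two reusable pieces into a plain description of one cluster."""
    return {
        "cluster": cluster_name,
        "provider": definition.cloud_provider,
        "region": definition.region,
        "hosts": {
            group: {
                "instance_type": definition.instance_types[group],
                "components": components,
            }
            for group, components in template.host_groups.items()
        },
    }


# Placeholder values chosen for illustration only.
template = ClusterTemplate(
    "example-template",
    {"master": ["HDFS NameNode"], "worker": ["HDFS DataNode"]},
)
definition = ClusterDefinition(
    "example-definition",
    "AWS",
    "us-west-2",
    {"master": "m5.2xlarge", "worker": "m5.xlarge"},
)

# The same definition and template can back any number of clusters.
clusters = [describe_cluster(n, definition, template) for n in ("cluster-a", "cluster-b")]
```

Reusing one `definition` and one `template` for both clusters mirrors the glossary's point that each is a reusable artifact: the clusters differ only by name while sharing identical cloud provider and Runtime settings.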
Control Plane - A Cloudera-operated cloud service that includes services such as Cloudera Management Console, Cloudera Observability, Cloudera Replication Manager, and Cloudera Data Catalog. These services interact with your account in Amazon Web Services (AWS), Microsoft Azure, and Google Cloud to provision and manage compute infrastructure that you can use to manage the lifecycle of data stored in your cloud account. In addition, the Control Plane can interface with your on-premises and Private Cloud infrastructure to support hybrid cloud deployments.
Credential - Allows an administrator to configure access from Cloudera to a cloud provider account so that Cloudera can communicate with that account and provision resources within it. There is one credential per environment.
Cloudera Data Catalog (data service) - A Cloudera data service used by data stewards to browse, search, and tag the content of a data lake, create and manage authorization policies, identify what data a user has accessed, and access the lineage of a particular data set.
Cloudera DataFlow (data service) - A Cloudera data service that enables you to import and deploy your data flow definitions efficiently, securely, and at scale.
Data Lake - A single logical store of data that provides a mechanism for storing, accessing, organizing, securing, and managing that data.
Data Lake cluster - A special cluster type that implements the Cloudera Runtime services (such as HMS, Ranger, Atlas, and so on) necessary to implement a data lake that further provides connectivity to a particular cloud storage service such as S3 or ADLS.
Cloudera Data Hub (service) - A Cloudera service that administrators use to create and manage clusters powered by Cloudera Runtime.
Cloudera Data Warehouse (data service) - A Cloudera data service for creating and managing self-service data warehouses for teams of data analysts.
Data warehouse - The output of the Cloudera Data Warehouse service. Users access data warehouses via standard interfaces and business intelligence tools such as JDBC or Tableau.
Environment - A logical environment defined with a specific virtual network and region in a customer’s cloud provider account. One can refer to a "Cloudera environment" once a cloud provider virtual network, cloud storage, and other cloud provider artifacts present in a customer's AWS, Azure, or GCP account have been registered in Cloudera. Cloudera service components such as Cloudera Data Hub clusters, Cloudera Data Warehouse clusters, and so on, run in an environment.
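The relationships stated in this glossary (one credential per environment, clusters running in an environment, each cluster associated with exactly one data lake) can be sketched as a small object model. This is only an illustration under assumptions: all class names, field names, and sample values are hypothetical and are not part of any Cloudera API.

```python
from dataclasses import dataclass, field


@dataclass
class Credential:
    """Hypothetical: access configuration for one cloud provider account."""
    name: str
    cloud_provider: str


@dataclass
class DataLake:
    """Hypothetical: the data lake a cluster is attached to."""
    name: str


@dataclass
class Cluster:
    """Hypothetical: a workload cluster, associated with exactly one data lake."""
    name: str
    data_lake: DataLake


@dataclass
class Environment:
    """Hypothetical: a logical environment tied to a region and one credential."""
    name: str
    region: str
    credential: Credential
    clusters: list = field(default_factory=list)

    def add_cluster(self, name, data_lake):
        """Create a cluster that runs in this environment and return it."""
        cluster = Cluster(name, data_lake)
        self.clusters.append(cluster)
        return cluster


# Placeholder values chosen for illustration only.
env = Environment("example-env", "us-west-2", Credential("example-cred", "AWS"))
lake = DataLake("example-datalake")
etl = env.add_cluster("etl-cluster", lake)
adhoc = env.add_cluster("adhoc-cluster", lake)
```

Here both clusters point at the same `DataLake` instance, which reflects the glossary's statement that a cluster is associated with exactly one data lake while leaving open how many clusters share it.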
Image catalog - Defines a set of images that can be used for provisioning a Cloudera Data Hub cluster. Cloudera Data Hub includes a built-in image catalog with a set of built-in base and prewarmed images and allows you to register your own image catalog.
Cloudera AI (data service) - A Cloudera data service that administrators use to create and manage Cloudera AI Workbenches and that allows data scientists to do their machine learning work.
Cloudera AI Workbench - The output of the Cloudera AI service. Each workbench corresponds to a single cluster that can be accessed by end users.
Cloudera Management Console (data service) - A Cloudera data service that allows an administrator to manage environments, users, and services, and to download and configure the CLI.
Cloudera Operational Database (data service) - A Cloudera data service that administrators use to create and manage scale-out, autonomous databases powered by Apache HBase and Apache Phoenix.
Recipe - A reusable script that can be used to perform a specific task on a specific resource.
Cloudera Replication Manager (data service) - A Cloudera data service used by administrators and data stewards to move, copy, back up, replicate, and restore data in or between data lakes.
Service - A defined subset of Cloudera functionality that enables a Cloudera user to solve a specific problem related to their data lake (process, analyze, predict, and so on). Example services: Cloudera Data Hub, Cloudera Data Warehouse, Cloudera AI.
Shared resources - A set of resources, such as cloud credentials, recipes (custom scripts), and others, that can be reused across multiple environments.