CDP Private Cloud Data Services glossary
The CDP Private Cloud Data Services documentation uses terms related to enterprise data cloud and cloud computing.
- CDP CLI:
- A command-line interface to access and manage CDP services and resources.
- CDP Private Cloud Base:
- An on-premises version of Cloudera Data Platform. It combines the best of Cloudera Enterprise Data Hub (CDH) and Hortonworks Data Platform Enterprise (HDP) along with new features and enhancements across the stack.
- Cloudera Manager for Data Services:
- A tool to install, manage, monitor, and configure your clusters and services using the Cloudera Manager Admin Console web application or the Cloudera Manager API.
- Cloudera Runtime:
- The core open-source software distribution within CDP Private Cloud Base. It includes approximately 50 open-source projects that provide data management tools within CDP.
- Control Plane service:
- A CDP service that includes services such as Management Console, Replication Manager, Data Recovery Service, and Service Discovery Service. These services interact with your environment on Embedded Container Service (ECS) or OpenShift Container Platform (OCP) to provision and manage the compute infrastructure that you use to manage the lifecycle of data stored on HDFS or Ozone.
- Data Catalog:
- This data service allows data stewards, business analysts, and data administrators to curate datasets by grouping multiple assets into them. Users can scan and profile a data lake to gather metadata about its schema and to identify sensitive information. Profilers gather the metadata and store it in Apache Atlas, where it is retrieved during search and discovery operations. Users can define custom rules to tag and classify assets according to their business requirements, view the metadata of a given asset on the Asset details page, and tag and group assets to create a dataset. Users can also search and manage datasets based on various attributes.
- Data Engineering:
- This data service allows you to create, manage, and schedule Apache Spark jobs without the overhead of creating and maintaining Spark clusters. You can define virtual clusters with a range of CPU and memory resources, and the cluster scales up and down as needed to execute your Spark workloads, helping to control your cloud costs.
- Data Lake:
- Creates a protective ring of security and governance around your data. For a CDP Private Cloud deployment, the Data Lake services are hosted on the CDP Private Cloud Base cluster. In addition, the Data Lake services are shared between multiple workloads.
- Data Service:
- A defined subset of CDP functionality that enables a CDP user to solve a specific problem related to their data lake (process, analyze, predict, and so on). Example services: Data Engineering, Data Warehouse, Machine Learning.
- Data Warehouse:
- This data service enables an enterprise to provision a new data warehouse and share a subset of the data with a specific team or department. A Data Warehouse cluster can be created from the Management Console and accessed by end users (data analysts).
- Environment:
- A logical entity that represents the association of your Private Cloud user account with the compute resources that you use to provision and manage workloads such as Data Warehouse and Machine Learning.
- Machine Learning:
- This data service enables teams of data scientists to develop, test, train, and deploy machine learning models for predictive applications, all on the data managed within the enterprise data cloud. A Machine Learning workspace can be created from the Management Console and accessed by end users (data scientists).
- Management Console:
- The user interface for administering CDP Private Cloud Data Services. As a CDP administrator, you can use Management Console to manage environments, data lakes, environment resources, and users across all Private Cloud Data Services.
- Replication Manager:
- This data service allows you to copy and migrate HDFS data, Hive external tables, and Ozone data between CDP Private Cloud Base 7.1.8 or higher clusters using Cloudera Manager version 7.7.3 or higher.
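
The version requirements in the Replication Manager entry can be checked with a short script before you plan a replication. The following is a minimal sketch, not a Cloudera tool: the function names `parse_version` and `meets_replication_minimums` are hypothetical, and it assumes that versions are plain dotted numeric strings such as `7.1.8`, with no suffixes or build tags.

```python
# Minimum versions taken from the Replication Manager glossary entry.
MIN_BASE_VERSION = "7.1.8"  # CDP Private Cloud Base
MIN_CM_VERSION = "7.7.3"    # Cloudera Manager


def parse_version(text):
    """Convert a dotted numeric version string into a tuple of integers.

    Assumption: the string contains only digits and dots (for example
    "7.1.8"). Suffixes or build tags are not handled and raise ValueError.
    """
    return tuple(int(part) for part in text.strip().split("."))


def meets_replication_minimums(base_version, cm_version):
    """Return True if both versions are at or above the stated minimums."""
    base_ok = parse_version(base_version) >= parse_version(MIN_BASE_VERSION)
    cm_ok = parse_version(cm_version) >= parse_version(MIN_CM_VERSION)
    return base_ok and cm_ok


if __name__ == "__main__":
    # Hypothetical cluster versions, used only for illustration.
    print(meets_replication_minimums("7.1.9", "7.11.3"))
    print(meets_replication_minimums("7.1.7", "7.7.3"))
```

Comparing tuples of integers rather than raw strings avoids the common mistake of treating `7.11.3` as lower than `7.7.3`.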