Deployment and high-level architecture
A Cloudera Data Services on premises deployment includes an Environment, a Data Lake, the Management Console, and Data Services (Data Warehouse, Machine Learning, Data Engineering). Other tools and utilities include Replication Manager, Data Recovery Service, CDP CLI, and monitoring using Grafana.
To deploy Cloudera Data Services on premises, you need a Cloudera Base on premises cluster, along with a container-based cluster that runs the Data Services. For the container cluster, you can either use a dedicated Red Hat OpenShift cluster or deploy a Cloudera Embedded Container Service cluster.
The on premises deployment process involves configuring Cloudera Management Console, registering an environment by providing details of the Data Lake configured on the Base cluster, and then creating the workloads.
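The ordering described above can be sketched as a small planning helper. This is only an illustrative model, not a Cloudera API: the `DeploymentPlan` class, the `ContainerPlatform` enum, and the cluster and Data Lake names are all hypothetical, and the sketch assumes the only prerequisites are a Base cluster and one of the two container platforms.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class ContainerPlatform(Enum):
    # The two container cluster options named in this section.
    OPENSHIFT = "Red Hat OpenShift"
    ECS = "Cloudera Embedded Container Service"


@dataclass
class DeploymentPlan:
    # All field names are hypothetical and exist only for illustration.
    base_cluster: str              # name of the Cloudera Base on premises cluster
    platform: ContainerPlatform    # container cluster that runs the Data Services
    data_lake: str                 # Data Lake configured on the Base cluster
    workloads: List[str]           # Data Services workloads to create

    def steps(self) -> List[str]:
        """Return the deployment steps in the order described in the text."""
        if not self.base_cluster:
            raise ValueError("a Cloudera Base on premises cluster is required")
        ordered = [
            f"Configure Management Console on {self.platform.value}",
            f"Register environment with Data Lake '{self.data_lake}' "
            f"from Base cluster '{self.base_cluster}'",
        ]
        # Workloads are created only after the environment is registered.
        ordered += [f"Create workload: {name}" for name in self.workloads]
        return ordered


if __name__ == "__main__":
    plan = DeploymentPlan(
        base_cluster="base-01",
        platform=ContainerPlatform.ECS,
        data_lake="dl-01",
        workloads=["Data Warehouse", "Data Engineering"],
    )
    for number, step in enumerate(plan.steps(), start=1):
        print(f"{number}. {step}")
```

The point of the sketch is the dependency order: environment registration needs the Data Lake details from the Base cluster, and workloads can be created only once an environment exists.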
Platform Managers and Administrators can provision and deploy the Data Services through the Cloudera Management Console, and scale them up or down as required.
The Cloudera Base on premises cluster provides the following components and services that the Data Services use:
- SDX Data Lake cluster for security, metadata, and governance
- HDFS and Ozone for storage
- Open-source Cloudera Runtime services such as Ranger, Atlas, and Hive Metastore (HMS)
- Networking infrastructure that supports network traffic between storage and compute environments