How to use the CDP Private Cloud Data Services sizing spreadsheet
You can use the sizing spreadsheet to model the hardware requirements for a CDP Private Cloud Data Services deployment.
Overview
The CDP Private Cloud Data Services Sizing spreadsheet is a spreadsheet that you can use to model the quantity and specifications for worker hosts required in a CDP Private Cloud Data Services deployment.
This spreadsheet is intended to use information about workloads you are planning to run and hardware specifications for worker nodes to arrive at an approximate number of worker nodes required for your deployment. Due to the complexity of estimating workloads, Cloudera recommends you review any sizing or purchasing decisions with Cloudera Professional Services before committing to those decisions.
How to access the spreadsheet
You can access the spreadsheet here: CDP Private Cloud Data Services Sizing. The file is in Microsoft Excel format. You can open the file in Excel, or upload it to Google Sheets.
There are three tabs in the spreadsheet. You will make your inputs only on the Worker Node Totals tab. Do not modify the following tabs (these tabs contain data used to calculate values in the spreadsheet and should not be modified):
-
Component Lookup
-
K8s Resources
Workload inputs
The spreadsheet calculates the total amount vcores, RAM, and storage required based on information you enter about the combined workloads you intend to deploy. Then based on the hardware specifications entered, calculates the number of worker nodes required, which is displayed in cell E25.
The following sections describe values you must enter into the spreadsheet. Values are required for each Data Service you intend to deploy, and values to enter for the hardware specifications for your worker nodes.
Control plane and monitoring
Label | Cell | Description |
---|---|---|
PvC Control Plane | B3 | 1 required |
– Monitoring | B4 | Increment this number by one for each environment. |
Cloudera Data Warehouse (CDW)
If you will deploy CDW, on the Worker Node Totals tab, enter the following information:
Label | Cell | Description |
---|---|---|
CDW Data Catalog (min 1 per env) | B6 | Enter the number of Data Catalogs you will need in your deployment. You must have at least one Data Catalog. |
CDW LLAP warehouses | B7 | Enter the number of LLAP warehouses you will need for each Virtual Warehouse in your deployment. |
-- LLAP Executors | B8 | Enter the total number of LLAP Executors you will need in your deployment. |
CDW Impala warehouses | B9 | Enter the number of CDW Impala warehouses for each Virtual Warehouse you will need in your deployment. |
-- Impala Coordinators (2 x for HA) | B10 | Enter the number of Impala Warehouses you will need in your deployment. If you have enabled high availability, enter twice the number of Warehouses. |
-- Impala Executors |
B11 |
Enter the number of Impala Executors you will need in your deployment. |
-- CDW Data Cache | B12 | Enter the amount of CDW Cache space for each coordinator and executor (Default 600) |
Data Viz - small | B13 | Enter the size selected when creating a Data Visualization instance. |
Data Viz -medium | B14 | |
Data Viz -large | B15 |
Cloudera Machine Learning (CML)
Sizing for a CML deployment depends on the number of concurrent jobs you expect to run and the number of Workspaces you provision.
Label | Cell | Description |
---|---|---|
CML Workspace (min of 1 ) | B17 | Enter the number of workspaces you need in your deployment. |
-- CML Small session | B18 | Enter the number of concurrent small-sized sessions you intend to run. |
-- CML Average session | B19 | Enter the number of concurrent average-sized sessions you intend to run. |
For more information about sizing the Cloudera Data Engineering service, see the following topics:
Cloudera Data Engineering (CDE)
Label | Cell | Description |
---|---|---|
CDE Service (min/max 1 per cluster) | B21 | Enter the number of CDE clusters you will need in your deployment. |
CDE Virtual Cluster | B22 | Enter the number of CDE Virtual Clusters you will need in your deployment. |
-- CDE Small jobs | B23 | Enter the number of concurrent small-sized jobs you intend to run. |
-- CDE Avg Jobs | B24 | Enter the number of concurrent average-sized jobs you intend to run. |
For more information about sizing the Cloudera Data Engineering service, see Additional resource requirements for Cloudera Data Engineering.
Worker node hardware specifications
Based on the inputs you supplied for your workloads, the spreadsheet totals the number of vcores, RAM, and storage required for the cluster in cells C20-C26. Then, based on the worker node hardware specifications you enter in cells B26-B29, divides the totals for vcores, RAM and storage by each of the worker node specifications to arrive at the required number of nodes for vcores, RAM and storage shown in cells D5-D29. The final number, in cell E27 chooses the higher value of these cells.
You may notice that the calculated values in cells D26 and D27 are different. This indicates that some nodes are oversubscribed for RAM or vcores. Adjust the hardware specifications for CPU and RAM until the two cells are closer together in value. Changing these values may also change the calculated number of worker nodes.
Label | Cell | Description |
---|---|---|
CPU recommend 32+ cores (64vcores) | B28 | Enter the number of vcores for each worker node. |
RAM (GB) recommend 384GB RAM | B29 | Enter the amount of RAM, in gigabytes, for each worker node. |
Disk (GB) Block (OCP CSI block, ECS Longhorn) | B30 |
Enter the number of gigabytes Block required for: - OpenShift Container Platform: CSI block - Embedded Container Service: ECS Longhorn |
Disk (GB) Fast Cache for CDW (nvme,ssd) | B31 | Enter the number of gigabytes of Fast Cache used in Cloudera Data Warehouse. |
NFS (GB) (choose 1 from below) | B33 | Enter required storage in either cell B30 or cell B31: |
-- Embedded nfs - (subtract from Block provider) non-prod | B33 | Enter the number of gigabytes storage for an embedded NFS. |
-- External nfs | B35 | Enter the number of gigabytes of storage for an External NFS. |
ECS Master Node requires 1 for non
HA - 3 for HA If you are using the Embedded Container Service, you will also need to provision a host for the ECS Master Node (a node running the ECS Server component). The values described here contain Cloudera’s recommendations for specifications for the ECS Master node. |
B38 |
Minimum: 8 vcores Recommended: 16 vcores |
B39 |
Minimum 16 GB RAM Recommended: 32 GB RAM |
|
B40 |
Minimum: 300 GB HDD (This amount is adequate for proof-of-concept cluster.) Recommended: 1 TB HDD |