How to use the CDP Private Cloud Data Services sizing spreadsheet

You can use the sizing spreadsheet to model the hardware requirements for a CDP Private Cloud Data Services deployment.

Overview

The CDP Private Cloud Data Services Sizing spreadsheet lets you model the quantity and specifications of the worker hosts required in a CDP Private Cloud Data Services deployment.

The spreadsheet uses information about the workloads you plan to run and the hardware specifications of your worker nodes to arrive at an approximate number of worker nodes required for your deployment. Because estimating workloads is complex, Cloudera recommends that you review any sizing or purchasing decisions with Cloudera Professional Services before committing to them.

How to access the spreadsheet

You can access the spreadsheet here: CDP Private Cloud Data Services Sizing. The file is in Microsoft Excel format. You can open the file in Excel, or upload it to Google Sheets.

The spreadsheet has three tabs. Enter your inputs only on the Worker Node Totals tab. Do not modify the following tabs, which contain data used to calculate values in the spreadsheet:

  • Component Lookup

  • K8s Resources

Workload inputs

The spreadsheet calculates the total amount of vcores, RAM, and storage required, based on the information you enter about the combined workloads you intend to deploy. Then, based on the hardware specifications you enter, it calculates the number of worker nodes required, which is displayed in cell E25.

The following sections describe the values you must enter into the spreadsheet: values for each Data Service you intend to deploy, and the hardware specifications for your worker nodes.

Control plane and monitoring

Label Cell Description
PvC Control Plane B3 Enter 1. A control plane is required.
– Monitoring B4 Increment this number by one for each environment.

Cloudera Data Warehouse (CDW)

If you will deploy CDW, on the Worker Node Totals tab, enter the following information:

Label Cell Description
CDW Data Catalog (min 1 per env) B6 Enter the number of Data Catalogs you will need in your deployment. You must have at least one Data Catalog.
CDW LLAP warehouses B7 Enter the number of LLAP warehouses you will need for each Virtual Warehouse in your deployment.
-- LLAP Executors B8 Enter the total number of LLAP Executors you will need in your deployment.
CDW Impala warehouses B9 Enter the number of CDW Impala warehouses for each Virtual Warehouse you will need in your deployment.
-- Impala Coordinators (2 x for HA) B10 Enter the number of Impala Warehouses you will need in your deployment. If you have enabled high availability, enter twice the number of Warehouses.
-- Impala Executors B11 Enter the number of Impala Executors you will need in your deployment.
-- CDW Data Cache B12 Enter the amount of CDW Cache space for each coordinator and executor (default: 600).
Data Viz - small B13 Enter the size selected when creating a Data Visualization instance.
Data Viz - medium B14
Data Viz - large B15
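
The high-availability rule for cell B10 is simple arithmetic, sketched below in Python. The function names are hypothetical. The cache total assumes that the B12 value is in gigabytes and is applied once per coordinator and executor; the spreadsheet's own formulas may differ.

```python
def impala_coordinators(warehouses: int, high_availability: bool) -> int:
    """Value to enter in cell B10 (hypothetical helper)."""
    if warehouses < 0:
        raise ValueError("warehouses must be non-negative")
    # With high availability enabled, enter twice the number of warehouses.
    return warehouses * 2 if high_availability else warehouses


def data_cache_total_gb(coordinators: int, executors: int,
                        cache_per_component_gb: int = 600) -> int:
    """Assumed total cache: the B12 value for every coordinator and executor."""
    return (coordinators + executors) * cache_per_component_gb


coordinators = impala_coordinators(warehouses=3, high_availability=True)
print(coordinators)                                      # 6
print(data_cache_total_gb(coordinators, executors=10))   # 9600
```
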
For more information about sizing Cloudera Data Warehouse deployments, see:

Cloudera Machine Learning (CML)

Sizing for a CML deployment depends on the number of concurrent jobs you expect to run and the number of Workspaces you provision.

Label Cell Description
CML Workspace (min of 1 ) B17 Enter the number of workspaces you need in your deployment.
-- CML Small session B18 Enter the number of concurrent small-sized sessions you intend to run.
-- CML Average session B19 Enter the number of concurrent average-sized sessions you intend to run.

For more information about sizing the Cloudera Machine Learning service, see the following topics:

Cloudera Data Engineering (CDE)

Label Cell Description
CDE Service (min/max 1 per cluster) B21 Enter the number of CDE clusters you will need in your deployment.
CDE Virtual Cluster B22 Enter the number of CDE Virtual Clusters you will need in your deployment.
-- CDE Small jobs B23 Enter the number of concurrent small-sized jobs you intend to run.
-- CDE Avg Jobs B24 Enter the number of concurrent average-sized jobs you intend to run.

For more information about sizing the Cloudera Data Engineering service, see Additional resource requirements for Cloudera Data Engineering.

Worker node hardware specifications

Based on the inputs you supplied for your workloads, the spreadsheet totals the number of vcores, RAM, and storage required for the cluster in cells C20-C26. Then, based on the worker node hardware specifications you enter in cells B26-B29, it divides the totals for vcores, RAM, and storage by the corresponding worker node specification to arrive at the required number of nodes for each resource, shown in cells D5-D29. The final number, in cell E27, is the highest of these values.

You may notice that the calculated values in cells D26 and D27 are different. This indicates that some nodes are oversubscribed for RAM or vcores. Adjust the hardware specifications for CPU and RAM until the two cells are closer together in value. Changing these values may also change the calculated number of worker nodes.
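
The calculation described above can be approximated outside the spreadsheet. The following Python sketch is an illustration only: the `Resources` class and `nodes_required` function are hypothetical names, the sample figures are made up, and rounding each quotient up to a whole node is an assumption about how the spreadsheet behaves.

```python
import math
from dataclasses import dataclass


@dataclass
class Resources:
    vcores: float
    ram_gb: float
    storage_gb: float


def nodes_required(totals: Resources, per_node: Resources) -> dict:
    """Divide each cluster total by the per-node spec, then take the highest."""
    per_resource = {
        # Rounding up to whole nodes is an assumption, not a documented rule.
        "vcores": math.ceil(totals.vcores / per_node.vcores),
        "ram_gb": math.ceil(totals.ram_gb / per_node.ram_gb),
        "storage_gb": math.ceil(totals.storage_gb / per_node.storage_gb),
    }
    # The resource that needs the most nodes determines the node count.
    per_resource["worker_nodes"] = max(per_resource.values())
    return per_resource


# Made-up workload totals and a 64-vcore, 384 GB, 2000 GB worker node.
totals = Resources(vcores=500, ram_gb=4000, storage_gb=20000)
worker = Resources(vcores=64, ram_gb=384, storage_gb=2000)
result = nodes_required(totals, worker)
print(result)
# {'vcores': 8, 'ram_gb': 11, 'storage_gb': 10, 'worker_nodes': 11}
```

In this example RAM needs 11 nodes while vcores need only 8, so RAM drives the node count. Raising the per-node RAM would bring the two figures closer together.
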

Label Cell Description
CPU recommend 32+ cores (64vcores) B28 Enter the number of vcores for each worker node.
RAM (GB) recommend 384GB RAM B29 Enter the amount of RAM, in gigabytes, for each worker node.
Disk (GB) Block (OCP CSI block, ECS Longhorn) B30 Enter the number of gigabytes of block storage required: CSI block for OpenShift Container Platform, or ECS Longhorn for Embedded Container Service.

Disk (GB) Fast Cache for CDW (nvme,ssd) B31 Enter the number of gigabytes of Fast Cache used in Cloudera Data Warehouse.
NFS (GB) (choose 1 from below) B33 Enter required storage in either cell B30 or cell B31:
-- Embedded nfs - (subtract from Block provider) non-prod B33 Enter the number of gigabytes storage for an embedded NFS.
-- External nfs B35 Enter the number of gigabytes of storage for an External NFS.
ECS Master Node (1 required for non-HA, 3 for HA)

If you are using the Embedded Container Service, you will also need to provision a host for the ECS Master Node (a node running the ECS Server component).

The following values are Cloudera’s recommended specifications for the ECS Master node.

Cell Description
B38 Minimum: 8 vcores. Recommended: 16 vcores.
B39 Minimum: 16 GB RAM. Recommended: 32 GB RAM.
B40 Minimum: 300 GB HDD (This amount is adequate for a proof-of-concept cluster.) Recommended: 1 TB HDD.
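
As an illustration, the ECS Master node values listed above can be used to check a candidate host. This Python sketch uses hypothetical names (`ECS_MASTER_SPECS`, `ecs_master_count`, `rate_master_host`) and treats 1 TB as 1000 GB, which is an assumption.

```python
# Minimum and recommended ECS Master node values listed above.
# 1 TB is treated as 1000 GB here, which is an assumption.
ECS_MASTER_SPECS = {
    "vcores": {"minimum": 8, "recommended": 16},
    "ram_gb": {"minimum": 16, "recommended": 32},
    "disk_gb": {"minimum": 300, "recommended": 1000},
}


def ecs_master_count(high_availability: bool) -> int:
    """One ECS Master node for non-HA, three for HA."""
    return 3 if high_availability else 1


def rate_master_host(vcores: int, ram_gb: int, disk_gb: int) -> dict:
    """Rate each resource of a candidate host against the listed values."""
    actual = {"vcores": vcores, "ram_gb": ram_gb, "disk_gb": disk_gb}
    rating = {}
    for name, value in actual.items():
        spec = ECS_MASTER_SPECS[name]
        if value < spec["minimum"]:
            rating[name] = "below minimum"
        elif value < spec["recommended"]:
            rating[name] = "meets minimum"
        else:
            rating[name] = "meets recommended"
    return rating


print(ecs_master_count(high_availability=True))   # 3
print(rate_master_host(vcores=8, ram_gb=32, disk_gb=200))
# {'vcores': 'meets minimum', 'ram_gb': 'meets recommended', 'disk_gb': 'below minimum'}
```
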