How to use the CDP Private Cloud Data Services sizing spreadsheet

You can use the sizing spreadsheet to model the hardware requirements for a CDP Private Cloud Data Services deployment.

Overview

The CDP Private Cloud Data Services sizing spreadsheet lets you model the number and specifications of the worker hosts required for a CDP Private Cloud Data Services deployment.

The spreadsheet combines information about the workloads you plan to run with the hardware specifications of your worker nodes to estimate the number of worker nodes your deployment requires. Because estimating workloads is complex, Cloudera recommends that you review any sizing or purchasing decisions with Cloudera Professional Services before committing to them.

How to access the spreadsheet

You can access the spreadsheet here: CDP Private Cloud Data Services Sizing. The file is in Microsoft Excel format. You can open the file in Excel, or upload it to Google Sheets.

The spreadsheet has three tabs. Enter your inputs only on the Worker Node Totals tab. Do not modify the following tabs, which contain data used to calculate values in the spreadsheet:

  • Component Lookup

  • K8s Resources

Workload inputs

The spreadsheet calculates the total number of vcores and the total amount of RAM and storage required, based on the information you enter about the combined workloads you intend to deploy. Then, based on the hardware specifications you enter, it calculates the number of worker nodes required, which is displayed in cell E24.

The following sections describe the values you must enter into the spreadsheet: values for each Data Service you intend to deploy, and the hardware specifications for your worker nodes.

Control plane monitoring

Label Cell Description
CP Monitoring B3 Increment this number by one for each environment.

Cloudera Data Warehouse (CDW)

If you will deploy CDW, on the Worker Node Totals tab, enter the following information:

Label Cell Description
CDW Data Catalog (min 1 per env) B5 Enter the number of Data Catalogs you will need in your deployment. You must have at least one Data Catalog.
CDW LLAP warehouses B6 Enter the number of LLAP warehouses you will need for each Virtual Warehouse in your deployment.
-- LLAP Executors B7 Enter the total number of LLAP Executors you will need in your deployment.
CDW Impala warehouses B8 Enter the number of CDW Impala warehouses for each Virtual Warehouse you will need in your deployment.
-- Impala Coordinators (2 x for HA) B9 Enter the number of Impala Warehouses you will need in your deployment. If you have enabled high availability, enter twice the number of Warehouses.
-- Impala Executors B10 Enter the number of Impala Executors you will need in your deployment.
CDW Cache B11 Enter the amount of CDW Cache space for each coordinator and executor (default 600).
Data Viz - small instances B12 Enter the size selected when creating a Data Visualization instance.
Data Viz - medium instances B13
Data Viz - large instances B14
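
As a rough illustration of the coordinator rule for cell B9, the following Python sketch doubles the count when high availability is enabled. The function name and its parameters are hypothetical and are not part of the spreadsheet.

```python
def impala_coordinators(impala_warehouses: int, high_availability: bool) -> int:
    """Return the value to enter in cell B9 (hypothetical helper)."""
    if impala_warehouses < 0:
        raise ValueError("impala_warehouses must not be negative")
    # One entry per Impala warehouse, doubled when high availability is enabled.
    return impala_warehouses * (2 if high_availability else 1)


print(impala_coordinators(3, high_availability=False))  # 3
print(impala_coordinators(3, high_availability=True))   # 6
```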
For more information about sizing Cloudera Data Warehouse deployments, see:

Cloudera Machine Learning (CML)

Sizing for a CML deployment depends on the number of concurrent jobs you expect to run and the number of Workspaces you provision.

Label Cell Description
CML Workspace (min of 1 ) B16 Enter the number of workspaces you need in your deployment.
-- CML Small concurrent sessions B17 Enter the number of concurrent small-sized sessions you intend to run.
-- CML Average concurrent sessions B18 Enter the number of concurrent average-sized sessions you intend to run.

For more information about sizing the Cloudera Machine Learning service, see the following topics:

Cloudera Data Engineering (CDE)

Label Cell Description
CDE Service (min/max 1 per cluster) B20 Enter the number of CDE clusters you will need in your deployment.
CDE Virtual Cluster B21 Enter the number of CDE Virtual Clusters you will need in your deployment.
-- CDE Small concurrent jobs B22 Enter the number of concurrent small-sized jobs you intend to run.
-- CDE Average concurrent jobs B23 Enter the number of concurrent average-sized jobs you intend to run.

For more information about sizing the Cloudera Data Engineering service, see Additional resource requirements for Cloudera Data Engineering.

Worker node hardware specifications

Based on the inputs you supplied for your workloads, the spreadsheet totals the vcores, RAM, and storage required for the cluster in cells C20-C26. It then divides each of these totals by the corresponding worker node hardware specification you enter in cells B26-B29 to arrive at the number of nodes required for vcores, RAM, and storage, shown in cells D5-D29. The final number, in cell E27, is the highest of these values.

You may notice that the calculated values in cells D26 and D27 are different. This indicates that some nodes are oversubscribed for RAM or vcores. Adjust the hardware specifications for CPU and RAM until the two cells are closer together in value. Changing these values may also change the calculated number of worker nodes.
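
The calculation described in this section can be sketched in Python as follows. This is a minimal illustration, not the spreadsheet's actual formulas: the `Resources` class, the `nodes_required` function, the rounding up to whole nodes, and all sample numbers are assumptions made for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Resources:
    vcores: float
    ram_gb: float
    storage_gb: float


def nodes_required(total: Resources, per_node: Resources) -> dict:
    """Divide cluster totals by per-node specs and keep the largest result."""
    # Assumption: partial nodes are rounded up to the next whole node.
    per_resource = {
        "vcores": math.ceil(total.vcores / per_node.vcores),
        "ram": math.ceil(total.ram_gb / per_node.ram_gb),
        "storage": math.ceil(total.storage_gb / per_node.storage_gb),
    }
    # The most constrained resource determines the worker node count.
    worker_nodes = max(per_resource.values())
    return {**per_resource, "worker_nodes": worker_nodes}


# Made-up workload totals and a 64-vcore, 384 GB RAM, 2000 GB node.
totals = Resources(vcores=900, ram_gb=4000, storage_gb=20000)
node = Resources(vcores=64, ram_gb=384, storage_gb=2000)
print(nodes_required(totals, node))
# {'vcores': 15, 'ram': 11, 'storage': 10, 'worker_nodes': 15}
```

In this made-up case the vcore requirement (15 nodes) is well above the RAM requirement (11 nodes). That is the kind of gap you can narrow by adjusting the per-node CPU and RAM specifications.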

Label Cell Description
CPU recommend 32+ cores (64vcores) B27 Enter the number of vcores for each worker node.
RAM (GB) recommend 384GB RAM B28 Enter the amount of RAM, in gigabytes, for each worker node.
Disk (GB) Block (OCP CSI block, ECS Longhorn) B29 Enter the number of gigabytes of block storage required: CSI block for OpenShift Container Platform, or ECS Longhorn for Embedded Container Service.

Disk (GB) Fast Cache for CDW (nvme,ssd) B30 Enter the number of gigabytes of Fast Cache used in Cloudera Data Warehouse.
CP Block Overhead per host (300 to 1024) B31 Enter the Control Plane block overhead for each host.
NFS (GB) (choose 1 from below) B33 Enter the required storage in either cell B34 or cell B35.
-- Embedded nfs - (subtract from Block provider) non-prod B34 Enter the number of gigabytes of storage for an embedded NFS.
-- External nfs B35 Enter the number of gigabytes of storage for an External NFS.
ECS Master Node requires 1 for non HA - 3 for HA

If you are using the Embedded Container Service, you will also need to provision a host for the ECS Master Node (a node running the ECS Server component).

The following values are Cloudera's minimum and recommended specifications for the ECS Master node.

Cell Description
B38 Minimum: 16 vcores. Recommended: 32 vcores.
B39 Minimum: 32 GB RAM. Recommended: 64 GB RAM.
B40 Minimum: 300 GB HDD (adequate for a proof-of-concept cluster). Recommended: 1 TB HDD.
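
As a hedged illustration, the following Python sketch checks a planned ECS Master host against the minimum and recommended specifications listed in this section and returns the number of master nodes to provision. The function names and dictionary keys are hypothetical. Treating 1 TB as 1000 GB is also an assumption.

```python
# Minimum and recommended ECS Master node specifications from this section.
ECS_MASTER_MIN = {"vcores": 16, "ram_gb": 32, "disk_gb": 300}
# Assumption: 1 TB is treated as 1000 GB.
ECS_MASTER_RECOMMENDED = {"vcores": 32, "ram_gb": 64, "disk_gb": 1000}


def ecs_master_count(high_availability: bool) -> int:
    """One ECS Master node without HA, three with HA."""
    return 3 if high_availability else 1


def rate_master_spec(vcores: int, ram_gb: int, disk_gb: int) -> str:
    """Classify a planned host (hypothetical helper)."""
    spec = {"vcores": vcores, "ram_gb": ram_gb, "disk_gb": disk_gb}
    if any(spec[key] < ECS_MASTER_MIN[key] for key in spec):
        return "below minimum"
    if all(spec[key] >= ECS_MASTER_RECOMMENDED[key] for key in spec):
        return "recommended"
    return "minimum"


print(ecs_master_count(high_availability=True))  # 3
print(rate_master_spec(16, 32, 300))             # minimum
print(rate_master_spec(32, 64, 1000))            # recommended
print(rate_master_spec(8, 64, 1000))             # below minimum
```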