Security groups

Security groups determine the inbound and outbound traffic to and from your CDP environment. That is, you should use security group settings to allow users from your organization access to CDP resources.

You have two options:

  • Use your existing security groups (recommended for production)
  • Have CDP create new security groups

You should verify the security group limits in your AWS account to ensure that you can create security groups for CDP.

Existing security groups

If you would like to create your own security groups, two security groups must be created: the first security group will be used for all gateway nodes and the second security group will be used for all other nodes. The gateway nodes accept incoming requests for the cluster services and so require an additional port. These security groups will be applied when creating a data lake and FreeIPA during environment creation and when you create Data Hub clusters.

Review the following guidelines prior to adding security groups rules. This describes all the inbound ports that need to be open and provides guidelines for what to enter as a source range:

Gateway security group

This is used for gateway nodes:
Protocol Port Range Source Description
TCP 22 Your CIDR This is an optional port for end user SSH access to cluster hosts. You should open it to your organization’s CIDR.
TCP 443 Your CIDR and CDP CIDR This port is used to access the Data Lake and Data Hub cluster UIs via Knox gateway. You must open this port to your organization’s CIDR in order to access cluster UIs.

When CCM is enabled, you only need to set this to your CIDR.

TCP 9443 CDP CIDR This port is used by CDP to maintain management control of clusters and data lakes.

By default, when CDP creates the security groups automatically, it opens this port to the correct IP.

This port is not needed when CCM is enabled.

TCP, UDP 0-65535 Your internal VPC CIDR (for example 10.10.0.0/16). This is required for internal communication within the VPC.
ICMP N/A Your internal VPC CIDR (for example 10.10.0.0/16). This is required for internal communication within the VPC.

Example rules provided in the VPC console on AWS:

Default security group

This is used for all nodes except Knox gateway nodes:
Protocol Port Range Source Description
TCP 22 Your CIDR This is an optional port for end user SSH access to the hosts. You should open it to your organization’s CIDR.
TCP 443 Your CIDR This port is only required if you are planning to spin up Machine Learning workspaces since HTTPS access to ML workspaces is available over port 443. If you are not planning to use the Machine Learning service, you do not need to open this port.
TCP 9443 CDP CIDR This port is used by CDP to maintain management control of clusters and data lakes.

By default, when CDP creates the security groups automatically, it opens this port to the correct IP.

This port is not needed when CCM is enabled.

TCP, UDP 0-65535 Your VPC CIDR (for example 10.10.0.0/16). This is required for internal communication within the VPC. TCP port 5432 is used by the Data Lake for communication with its attached database.
ICMP N/A Your internal VPC CIDR (for example 10.10.0.0/16). This is required for internal communication within the VPC.

Example rules provided in the VPC console on AWS:

On AWS, you can create security groups and edit their rules from the VPC console > Security Groups.

To create a security group, click on Create security group and provide the following:

You need to create two security groups: Knox and Default (You will see this terminology in the Management Console UI and CLI, so if you decide to choose different names, make sure that you are able to distinguish between the two security groups).

Use the guidelines and examples provided above when editing rules.

To edit security group rules, select the security group and click on Inbound Rules > Edit rules:

New security groups

If you would like CDP to create the security groups for you, you need to provide a CIDR range for inbound traffic to EC2 instances from your organization. CDP creates multiple security groups: one for each Data Lake host group, one for each FreeIPA host group, and one per host group when DataFlow, Data Hub, Data Warehouse, and Machine Learning clusters are created. On these security groups, CDP opens ports as described in Default security group settings documentation.