Security groups
Security groups control the inbound and outbound traffic to and from your CDP environment. Use security group rules to give users from your organization access to CDP resources.
You have two options:
- Use your existing security groups (recommended for production)
- Have CDP create new security groups
You should verify the security group limits in your AWS account to ensure that you can create security groups for CDP.
Existing security groups
If you would like to use your own security groups, you must create two: the first is used for all gateway nodes and the second for all other nodes. The gateway nodes accept incoming requests for the cluster services and so require an additional port. CDP applies these security groups when it creates the Data Lake and FreeIPA during environment creation and when you create Data Hub clusters.
Review the following guidelines before adding security group rules. The tables below list all the inbound ports that need to be open and suggest what to enter as the source range:
Gateway security group
Protocol | Port Range | Source | Description |
---|---|---|---|
TCP | 22 | Your CIDR | This is an optional port for end user SSH access to cluster hosts. You should open it to your organization’s CIDR. |
TCP | 443 | Your CIDR and CDP CIDR | This port is used to access the Data Lake and Data Hub cluster UIs via Knox gateway. You must open this port to your organization’s CIDR in order to access cluster UIs. When CCM is enabled, you only need to set this to your CIDR. |
TCP | 9443 | CDP CIDR | This port is used by CDP to maintain management control of clusters and data lakes. By default, when CDP creates the security groups automatically, it opens this port to the correct IP. This port is not needed when CCM is enabled. |
TCP, UDP | 0-65535 | Your internal VPC CIDR (for example 10.10.0.0/16). | This is required for internal communication within the VPC. |
ICMP | N/A | Your internal VPC CIDR (for example 10.10.0.0/16). | This is required for internal communication within the VPC. |
Figure: Example gateway security group rules in the VPC console on AWS.
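
As a programmatic counterpart to the Gateway security group table, the following minimal sketch builds the same inbound rules as a list of dictionaries shaped like the `IpPermissions` entries accepted by the EC2 API. The function name `gateway_ingress_rules`, its parameters, and the sample CIDRs are hypothetical; the two non-VPC CIDRs are placeholders from documentation address ranges, not real CDP or organization values. The sketch only builds data and does not call AWS.

```python
def gateway_ingress_rules(your_cidr, cdp_cidr, vpc_cidr, ccm_enabled=False, allow_ssh=True):
    """Build inbound rules for the gateway security group.

    Returns a list of dicts in the shape of EC2 IpPermissions entries.
    """

    def rule(protocol, from_port, to_port, cidrs, description):
        return {
            "IpProtocol": protocol,
            "FromPort": from_port,
            "ToPort": to_port,
            "IpRanges": [{"CidrIp": c, "Description": description} for c in cidrs],
        }

    rules = []

    # Port 22 is optional: end user SSH access from your organization.
    if allow_ssh:
        rules.append(rule("tcp", 22, 22, [your_cidr], "Optional SSH access"))

    # Port 443: cluster UIs via Knox. With CCM enabled, only your CIDR is needed.
    https_sources = [your_cidr] if ccm_enabled else [your_cidr, cdp_cidr]
    rules.append(rule("tcp", 443, 443, https_sources, "Cluster UIs via Knox"))

    # Port 9443: CDP management control. Not needed when CCM is enabled.
    if not ccm_enabled:
        rules.append(rule("tcp", 9443, 9443, [cdp_cidr], "CDP management"))

    # All TCP and UDP ports, plus ICMP, for communication inside the VPC.
    for protocol in ("tcp", "udp"):
        rules.append(rule(protocol, 0, 65535, [vpc_cidr], "Internal VPC traffic"))
    # In the EC2 API, -1/-1 on an ICMP rule means all ICMP types and codes.
    rules.append(rule("icmp", -1, -1, [vpc_cidr], "Internal VPC ICMP"))

    return rules


# Placeholder CIDRs for illustration only.
example_rules = gateway_ingress_rules(
    your_cidr="203.0.113.0/24",
    cdp_cidr="198.51.100.0/24",
    vpc_cidr="10.10.0.0/16",
)
```

Passing `ccm_enabled=True` drops the 9443 rule and restricts port 443 to your own CIDR, mirroring the CCM notes in the table.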
Default security group
Protocol | Port Range | Source | Description |
---|---|---|---|
TCP | 22 | Your CIDR | This is an optional port for end user SSH access to the hosts. You should open it to your organization’s CIDR. |
TCP | 443 | Your CIDR | This port is only required if you are planning to spin up Machine Learning workspaces since HTTPS access to ML workspaces is available over port 443. If you are not planning to use the Machine Learning service, you do not need to open this port. |
TCP | 9443 | CDP CIDR | This port is used by CDP to maintain management control of clusters and data lakes. By default, when CDP creates the security groups automatically, it opens this port to the correct IP. This port is not needed when CCM is enabled. |
TCP, UDP | 0-65535 | Your VPC CIDR (for example 10.10.0.0/16). | This is required for internal communication within the VPC. TCP port 5432 is used by the Data Lake for communication with its attached database. |
ICMP | N/A | Your internal VPC CIDR (for example 10.10.0.0/16). | This is required for internal communication within the VPC. |
Figure: Example default security group rules in the VPC console on AWS.
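
One way to sanity-check an existing Default security group against the table is to compare its inbound rules with the required entries. The sketch below assumes the rules are available as a list of dictionaries in the EC2 `IpPermissions` shape (for example, as returned when describing a security group); the helper names `allows` and `missing_default_group_rules` are hypothetical. The optional ports 22 and 443 are deliberately not checked, and TCP port 5432 is covered by the full TCP range from the VPC CIDR.

```python
import ipaddress


def allows(rules, protocol, from_port, to_port, cidr):
    """Return True if a single rule admits the whole port range from all of `cidr`."""
    wanted = ipaddress.ip_network(cidr)
    for rule in rules:
        rule_protocol = rule["IpProtocol"]
        if rule_protocol == "-1":
            # "-1" means all protocols; such rules carry no port range.
            pass
        elif rule_protocol == protocol:
            if not (rule["FromPort"] <= from_port and to_port <= rule["ToPort"]):
                continue
        else:
            continue
        for ip_range in rule.get("IpRanges", []):
            if wanted.subnet_of(ipaddress.ip_network(ip_range["CidrIp"])):
                return True
    return False


def missing_default_group_rules(rules, cdp_cidr, vpc_cidr, ccm_enabled=False):
    """List the required Default security group entries that `rules` does not cover."""
    required = [
        ("tcp", 0, 65535, vpc_cidr),   # internal VPC traffic, includes 5432
        ("udp", 0, 65535, vpc_cidr),   # internal VPC traffic
        ("icmp", -1, -1, vpc_cidr),    # all ICMP types and codes inside the VPC
    ]
    if not ccm_enabled:
        required.append(("tcp", 9443, 9443, cdp_cidr))  # CDP management control
    return [entry for entry in required if not allows(rules, *entry)]
```

An empty result means every required entry is covered by some rule; each returned tuple names a protocol, port range, and source CIDR that still needs to be opened.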
To create a security group, click Create security group in the VPC console and provide the required details.
You need to create two security groups: Knox (for the gateway nodes) and Default (for all other nodes). You will see this terminology in the Management Console UI and CLI, so if you choose different names, make sure that you are able to distinguish between the two security groups.
Use the guidelines and examples provided above when editing rules.
To edit security group rules, select the security group and click Inbound Rules > Edit rules.
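
The console steps for creating a group and editing its inbound rules can also be scripted. The following is a minimal sketch, assuming an object that behaves like a boto3 EC2 client, whose `create_security_group` and `authorize_security_group_ingress` methods it calls. The function name `create_group_with_rules` and its parameters are hypothetical, and the client is passed in rather than created here so that the sketch has no AWS dependency of its own.

```python
def create_group_with_rules(ec2, vpc_id, name, description, ip_permissions):
    """Create a security group in `vpc_id` and attach the given inbound rules.

    `ec2` is expected to behave like a boto3 EC2 client.
    `ip_permissions` is a list of dicts in the EC2 IpPermissions shape.
    Returns the ID of the new security group.
    """
    response = ec2.create_security_group(
        GroupName=name,
        Description=description,
        VpcId=vpc_id,
    )
    group_id = response["GroupId"]

    # Add all inbound rules in one call.
    ec2.authorize_security_group_ingress(
        GroupId=group_id,
        IpPermissions=ip_permissions,
    )
    return group_id
```

With boto3 installed and AWS credentials configured, `ec2` would typically be `boto3.client("ec2")`. The function would be called once for the Knox group and once for the Default group, each with its own rule list.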
New security groups
If you would like CDP to create the security groups for you, you need to provide a CIDR range for inbound traffic to EC2 instances from your organization. CDP creates multiple security groups: one for each Data Lake host group, one for each FreeIPA host group, and one per host group when DataFlow, Data Hub, Data Warehouse, and Machine Learning clusters are created. On these security groups, CDP opens ports as described in the Default security group settings documentation.
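
Because the CIDR range you provide here ends up in every security group that CDP creates, it can be worth checking it before submitting it. The sketch below uses Python's standard `ipaddress` module. The function name `normalize_inbound_cidr` is hypothetical, and rejecting a fully open range is an assumption of this sketch, not a documented CDP requirement.

```python
import ipaddress


def normalize_inbound_cidr(cidr):
    """Validate a CIDR range and return it in canonical string form.

    Raises ValueError if the value is malformed, has host bits set
    (for example 203.0.113.7/24), or covers every address.
    """
    try:
        network = ipaddress.ip_network(cidr, strict=True)
    except ValueError as exc:
        raise ValueError(f"not a valid CIDR range: {cidr!r}") from exc

    # Assumption of this sketch: a /0 range is treated as a mistake.
    if network.prefixlen == 0:
        raise ValueError("range is open to every address; narrow it to your organization")

    return str(network)
```

A value such as `203.0.113.0/24` (a placeholder from a documentation address range) passes unchanged, while a malformed or overly broad value raises `ValueError` before it reaches the environment configuration.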