Default security group settings

Depending on what you chose during environment creation, CDP can create security groups for your environment automatically or you can provide your own security groups.

Environment security groups

  • If you choose to use your own security groups, you are asked to create Knox and Default security groups as described in the Environment prerequisites: Security groups documentation.
  • If you choose for CDP to create all security groups required for an environment, the following security groups are created:

Data Lake: master

Naming convention:

${environment-name}-${random-id}-ClusterNodeSecurityGroupmaster-${random-id}

Protocol Port Range Source Description
TCP 22 Your CIDR This is an optional port for end user SSH access to cluster hosts. You should open it to your organization’s CIDR.
TCP 443 Your CIDR This port is used to access the Data Lake and Data Hub cluster UIs via Knox gateway. You must open this port to your organization’s CIDR in order to access cluster UIs. This port is also required if you are planning to spin up Machine Learning workspaces, since HTTPS access to ML workspaces is available over port 443. This port is not used by the Management Console or any other services, so if you are not planning to use the Machine Learning service, you do not need to open this port.
TCP 9443 CDP CIDR This port is used by CDP to maintain management control of clusters and data lakes.
TCP, UDP 0-65535 Your VPC’s CIDR (for example 10.10.0.0/16) and your subnet’s CIDR (for example 10.0.2.0/24). This is required for internal communication within the VPC.
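
If you provide your own security groups, it can help to write the rules in the table above down as data before creating them. The following is a minimal Python sketch, not CDP's implementation: the function name `master_ingress_rules`, the dictionary layout, and the CIDR values are all hypothetical placeholders, and you would substitute your organization's CIDR, the CDP CIDR that applies to you, and your own VPC and subnet CIDRs.

```python
import ipaddress


def master_ingress_rules(your_cidr, cdp_cidr, internal_cidrs):
    """Return the Data Lake master ingress rules from the table as dicts.

    your_cidr, cdp_cidr and internal_cidrs are placeholders supplied by the
    caller; the dict layout is an assumption made for this sketch.
    """
    # Reject malformed CIDRs early; ip_network raises ValueError.
    for cidr in [your_cidr, cdp_cidr, *internal_cidrs]:
        ipaddress.ip_network(cidr)

    rules = [
        # Optional end user SSH access.
        {"protocol": "tcp", "from_port": 22, "to_port": 22, "source": your_cidr},
        # Cluster UIs via the Knox gateway.
        {"protocol": "tcp", "from_port": 443, "to_port": 443, "source": your_cidr},
        # Management control by CDP.
        {"protocol": "tcp", "from_port": 9443, "to_port": 9443, "source": cdp_cidr},
    ]
    # Internal communication within the VPC: all TCP and UDP ports.
    for cidr in internal_cidrs:
        for protocol in ("tcp", "udp"):
            rules.append(
                {"protocol": protocol, "from_port": 0, "to_port": 65535, "source": cidr}
            )
    return rules


# Placeholder documentation-range CIDRs, not real CDP addresses.
rules = master_ingress_rules("203.0.113.0/24", "198.51.100.0/24", ["10.10.0.0/16"])
for rule in rules:
    print(rule)
```

Keeping the rules as plain data makes them easy to review against the table or to pass to whatever provisioning tool you use.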

Data Lake: IDBroker

Naming convention:

${environment-name}-${random-id}-ClusterNodeSecurityGroupidbroker-${random-id}

Protocol Port Range Source Description
TCP 22 Your CIDR This is an optional port for end user SSH access to cluster hosts.

TCP, UDP 0-65535 Your VPC’s CIDR (for example 10.10.0.0/16) and your subnet’s CIDR (for example 10.0.2.0/24). This is required for internal communication within the VPC.

FreeIPA

Naming convention:

${environment-name}-freeipa-${random-id}-ClusterNodeSecurityGroupmaster-${random-id}

Protocol Port Range Source Description
TCP 22 Your CIDR This is an optional port for end user SSH access to cluster hosts. You should open it to your organization’s CIDR.
TCP 9443 CDP CIDR This port is used by CDP to maintain management control of clusters and data lakes.
TCP, UDP 0-65535 Your VPC’s CIDR (for example 10.10.0.0/16) and your subnet’s CIDR (for example 10.0.2.0/24). This is required for internal communication within the VPC.
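
The tables recommend opening port 22 to your organization's CIDR rather than to everyone. As a rough illustration, the following Python sketch flags rules whose source covers every address. The function name `open_to_world` and the rule dictionaries are hypothetical and only serve this example; they are not part of any CDP or AWS API.

```python
import ipaddress


def open_to_world(rules):
    """Return the rules whose source CIDR covers every address.

    Each rule is assumed to be a dict with a "source" key holding a CIDR
    string; this layout is an assumption made for the sketch.
    """
    flagged = []
    for rule in rules:
        network = ipaddress.ip_network(rule["source"])
        # A prefix length of 0 means 0.0.0.0/0 (IPv4) or ::/0 (IPv6).
        if network.prefixlen == 0:
            flagged.append(rule)
    return flagged


# Placeholder rules: SSH open to everyone, 9443 limited to one range.
example_rules = [
    {"protocol": "tcp", "port": 22, "source": "0.0.0.0/0"},
    {"protocol": "tcp", "port": 9443, "source": "198.51.100.0/24"},
]
for rule in open_to_world(example_rules):
    print("Too broad:", rule)
```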

Database

Naming convention:

dsecg-dbsvr-${random-id}

Protocol Port Range Source Description
TCP 5432 Your VPC’s CIDR (for example 10.10.0.0/16) This port is used for communication between the Data Lake and its attached database.

Data Hub security groups

Depending on what you chose during environment creation, CDP can create security groups for your Data Hub clusters automatically or it can use your pre-created security groups:

  • If you provided your own security groups during environment creation, CDP uses these security groups when deploying clusters.
  • If you chose for CDP to create new security groups during environment creation, new security groups are created for each Data Hub cluster as follows:

Data Hub: master

Naming convention:

${cluster-name}-${random-id}-ClusterNodeSecurityGroupmaster-${random-id}

Protocol Port Range Source Description
TCP 22 Your CIDR This is an optional port for end user SSH access to cluster hosts. You should open it to your organization’s CIDR.
TCP 443 Your CIDR This port is used to access the Data Lake and Data Hub cluster UIs via Knox gateway. You must open this port to your organization’s CIDR in order to access cluster UIs.
TCP 9443 CDP CIDR This port is used by CDP to maintain management control of clusters and data lakes.
TCP, UDP 0-65535 Your VPC’s CIDR (for example 10.10.0.0/16) and your subnet’s CIDR (for example 10.0.2.0/24). This is required for internal communication within the VPC.

Data Hub: worker

Naming convention:

${cluster-name}-${random-id}-ClusterNodeSecurityGroupworker-${random-id}

Protocol Port Range Source Description
TCP 22 Your CIDR This is an optional port for end user SSH access to cluster hosts.
TCP, UDP 0-65535 Your VPC’s CIDR (for example 10.10.0.0/16) and your subnet’s CIDR (for example 10.0.2.0/24). This is required for internal communication within the VPC.

Data Hub: compute

Naming convention:

${cluster-name}-${random-id}-ClusterNodeSecurityGroupcompute-${random-id}

Protocol Port Range Source Description
TCP 22 Your CIDR This is an optional port for end user SSH access to cluster hosts.
TCP, UDP 0-65535 Your VPC’s CIDR (for example 10.10.0.0/16) and your subnet’s CIDR (for example 10.0.2.0/24). This is required for internal communication within the VPC.
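
The Data Hub naming conventions above make it possible to tell from a security group's name which host group it belongs to. The following is a minimal Python sketch under stated assumptions: the function name `host_group_of` is hypothetical, the exact format of `${random-id}` is not specified here, and the pattern assumes that the trailing random ID contains no hyphen.

```python
import re

# Matches <prefix>-ClusterNodeSecurityGroup<role>-<suffix>.
# Assumption: the trailing random ID contains no hyphen. The prefix
# (cluster name plus first random ID) is kept as one piece because the
# cluster name itself may contain hyphens.
_SG_NAME = re.compile(
    r"^(?P<prefix>.+)-ClusterNodeSecurityGroup"
    r"(?P<role>master|worker|compute)-(?P<suffix>[^-]+)$"
)


def host_group_of(name):
    """Return "master", "worker" or "compute", or None if the name does not match."""
    match = _SG_NAME.match(name)
    return match.group("role") if match else None


# The names below are made-up examples that follow the convention.
print(host_group_of("my-cluster-abc123-ClusterNodeSecurityGroupworker-XYZ789"))
print(host_group_of("dsecg-dbsvr-abc123"))
```

The second call returns `None` because the Database security group uses a different naming convention.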

Data Warehouse security groups

CDP always creates new security groups when data warehouses are deployed.

Machine Learning security groups

CDP always creates new security groups when machine learning workspaces are deployed.