Security groups

During the specification of a VPC to CDP, the CDP Admin can also specify the security groups. These are associated with all the workloads launched within that VPC.

Security groups for Data Lakes and Data Hubs

The security groups can be specified in two ways:

The CDP Admin can let CDP create the security groups, taking a list of IP address CIDRs as input.

These CIDRs are used to allow incoming traffic to the hosts. The list of CIDR ranges should correspond to the address ranges from which the CDP experience workloads will be accessed. In a VPN-peered VPC, this also includes address ranges from the customer's on-prem network. This model is useful for initial testing because it is easy to set up.
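Before handing a CIDR list to CDP, it can help to validate it and merge overlapping or adjacent ranges. The following is a minimal sketch using only Python's standard `ipaddress` module; `normalize_cidrs` is a hypothetical helper for illustration and is not part of CDP.

```python
import ipaddress


def normalize_cidrs(cidrs):
    """Validate IPv4 CIDR strings and merge overlapping or adjacent ranges.

    Hypothetical helper, not a CDP API. strict=True rejects entries with
    host bits set (for example "10.0.0.1/16") by raising ValueError.
    """
    networks = [ipaddress.IPv4Network(c.strip(), strict=True) for c in cidrs]
    # collapse_addresses merges overlapping/adjacent networks and yields them sorted
    return [str(n) for n in ipaddress.collapse_addresses(networks)]


# A VPC range plus two adjacent on-prem ranges reachable over a VPN
print(normalize_cidrs(["10.0.0.0/16", "192.168.0.0/24", "192.168.1.0/24"]))
# → ['10.0.0.0/16', '192.168.0.0/23']
```

Merging the ranges keeps the resulting rule set short without changing which addresses are admitted.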

Alternatively, the CDP Admin can create the security groups themselves and select them while setting up the VPC and the other network configuration. This model is better for production workloads, because it gives the CDP Admin greater control. However, the CDP Admin MUST ensure that the rules match the specification below.
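One way to audit a self-managed security group against the rules below is to test whether its ingress entries admit a given protocol, port, and source range. This is a minimal sketch, assuming the rules have already been fetched (for example with boto3's `describe_security_groups`) into the `IpPermissions` shape that the EC2 API returns; the function `allows` and the sample data are hypothetical, only IPv4 ranges are considered, and ICMP type/code matching is not checked.

```python
import ipaddress


def allows(ip_permissions, protocol, port, source_cidr):
    """Return True if some ingress entry admits `protocol`/`port` from
    every address in `source_cidr` (IPv4 only).

    `ip_permissions` mirrors the EC2 IpPermissions list shape.
    Ports are only compared for tcp/udp; "-1" means all protocols.
    """
    source = ipaddress.IPv4Network(source_cidr)
    for perm in ip_permissions:
        proto = perm["IpProtocol"]
        if proto not in ("-1", protocol):
            continue
        if proto != "-1" and protocol in ("tcp", "udp"):
            if not perm["FromPort"] <= port <= perm["ToPort"]:
                continue
        for rng in perm.get("IpRanges", []):
            # The whole source range must fall inside the allowed range
            if source.subnet_of(ipaddress.IPv4Network(rng["CidrIp"])):
                return True
    return False


# Hypothetical group: everything from the VPC, HTTPS only from on-prem
sg = [
    {"IpProtocol": "-1", "IpRanges": [{"CidrIp": "10.0.0.0/16"}]},
    {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
     "IpRanges": [{"CidrIp": "192.168.0.0/23"}]},
]
print(allows(sg, "tcp", 443, "192.168.1.0/24"))  # True
print(allows(sg, "tcp", 22, "192.168.1.0/24"))   # False: no SSH rule
```

Running such a check for each required (protocol, port, CIDR) combination gives a quick signal that a missing rule, such as SSH in this example, would block access.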

For a fully private network, security groups should be configured according to the access requirements of the different services in the workloads.

  • Services accessed only within the VPC must be configured with the following inbound rules:
    • All TCP / UDP / ICMP access is allowed for the CIDRs corresponding to the VPC.
    • No access needs to be granted to these services for any IP CIDRs outside the VPC.
  • Endpoint services are the services that can be accessed from outside the VPC through the gateway, chiefly by data consumers or CDP Admins. For example, UIs such as Hue, Atlas, Ranger, and Cloudera Manager all need to be accessed by data consumers or other administrators. To enable this, the following inbound rules are set up:
    • All TCP / UDP / ICMP access is allowed for the CIDRs corresponding to the VPC.
    • TCP ports for services such as Kafka and HBase that need to be accessed from outside the VPC are allowed for the list of CIDR ranges specified when creating the environment/experience, or in the security group created by the CDP Admin. Alternatively, all TCP / UDP / ICMP access may be allowed for these CIDR ranges.
    • SSH access is allowed for the CIDR ranges specified at the time of creating the environment.
    • HTTPS access is allowed for the CIDR ranges specified at the time of creating the environment.
  • Note that in a fully private network, even open access (such as 0.0.0.0/0) is effectively restrictive, because these services are deployed in a private subnet without public IP addresses and therefore have no route to the Internet gateway. The list of CIDR ranges is still useful for restricting which private subnets of the customer's on-prem network can access the services. Rules for EKS-based workloads are described in the following section.
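The inbound rules listed above can be expressed as EC2 `IpPermissions` entries. The sketch below assumes the endpoint-service variant (VPC-wide TCP/UDP/ICMP plus SSH, HTTPS, and selected service ports for the access CIDRs); the function name `cdp_ingress_rules` is hypothetical, and port 9093 in the example is only an illustrative value, not a port taken from this document.

```python
def cdp_ingress_rules(vpc_cidr, access_cidrs, service_ports=()):
    """Build IpPermissions entries for an endpoint-service security group.

    vpc_cidr:      CIDR of the VPC; all TCP/UDP/ICMP from it is allowed.
    access_cidrs:  CIDRs given when creating the environment/experience.
    service_ports: extra TCP ports (e.g. for Kafka or HBase) to expose.
    """
    def ranges(cidrs):
        return [{"CidrIp": c} for c in cidrs]

    # Rules shared with VPC-only services: everything from inside the VPC
    rules = [
        {"IpProtocol": "tcp", "FromPort": 0, "ToPort": 65535, "IpRanges": ranges([vpc_cidr])},
        {"IpProtocol": "udp", "FromPort": 0, "ToPort": 65535, "IpRanges": ranges([vpc_cidr])},
        # For ICMP, -1/-1 means all ICMP types and codes
        {"IpProtocol": "icmp", "FromPort": -1, "ToPort": -1, "IpRanges": ranges([vpc_cidr])},
    ]
    # SSH (22), HTTPS (443) and any service ports for the external access CIDRs
    for port in sorted({22, 443, *service_ports}):
        rules.append({"IpProtocol": "tcp", "FromPort": port, "ToPort": port,
                      "IpRanges": ranges(access_cidrs)})
    return rules


rules = cdp_ingress_rules("10.0.0.0/16", ["192.168.0.0/23"], service_ports=[9093])
for r in rules:
    print(r["IpProtocol"], r["FromPort"], r["ToPort"], r["IpRanges"])
```

A list built this way could then be applied with boto3's `ec2.authorize_security_group_ingress(GroupId=..., IpPermissions=rules)`. For a VPC-only service, keep just the first three entries; for the "allow everything from the access CIDRs" alternative, replace the per-port loop with three all-TCP/UDP/ICMP entries for `access_cidrs`.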

Additional rules for EKS-based workloads

At the time of enabling a CDP experience, the CDP Admin can specify a list of CIDR ranges used to allow incoming traffic to the workload Elastic Load Balancer (ELB). This list should correspond to the address ranges from which the CDP experience workloads will be accessed. In a VPN-peered VPC, this also includes address ranges from the customer's on-prem network. In a fully private setup, 0.0.0.0/0 still restricts access to the VPC and the peered VPN network.
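The ELB source list behaves as a simple membership test: a client is admitted only if its address lies in one of the listed ranges. The sketch below, with a hypothetical `is_allowed` helper, shows why 0.0.0.0/0 matches every IPv4 client; in a fully private setup it is the missing route from outside the VPC and peered network, not the CIDR list, that keeps other clients out.

```python
import ipaddress


def is_allowed(client_ip, allowed_cidrs):
    """Return True if client_ip falls inside any of allowed_cidrs.

    Hypothetical helper illustrating source-CIDR matching only; it does
    not model routing, which is what limits reach in a private setup.
    """
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(c) for c in allowed_cidrs)


print(is_allowed("192.168.1.20", ["192.168.0.0/23"]))  # True: on-prem client
print(is_allowed("203.0.113.7", ["192.168.0.0/23"]))   # False: outside the list
print(is_allowed("203.0.113.7", ["0.0.0.0/0"]))        # True: matches any IPv4
```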

Because the public endpoint is currently enabled by default for the control plane of every EKS cluster, it is highly recommended to provide a list of outbound public CIDR ranges when provisioning an experience, to restrict access to the EKS clusters. By default, the CDP public CIDR range is always allowed to connect to the public endpoint. The following screenshot shows an example configuration section for an experience:
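On the EKS side, access to a cluster's public API endpoint is governed by the `publicAccessCidrs` field of its `resourcesVpcConfig`. Assuming the CIDR list entered for the experience ends up in that field together with the CDP range, the effective allowlist can be sketched as below. `CDP_PUBLIC_CIDR` is a placeholder (an RFC 5737 documentation range), not CDP's real range, and `eks_public_access_config` is hypothetical.

```python
import ipaddress

# Placeholder: the real CDP public CIDR range is not given in this document.
CDP_PUBLIC_CIDR = "198.51.100.0/24"  # hypothetical (RFC 5737 documentation range)


def eks_public_access_config(customer_egress_cidrs):
    """Sketch of an EKS resourcesVpcConfig fragment limiting the public endpoint.

    Validates each customer CIDR, adds the (placeholder) CDP range that is
    allowed by default, and returns the ranges deduplicated and sorted.
    """
    cidrs = {str(ipaddress.IPv4Network(c)) for c in customer_egress_cidrs}
    cidrs.add(CDP_PUBLIC_CIDR)
    return {
        "endpointPublicAccess": True,
        "publicAccessCidrs": sorted(cidrs),
    }


print(eks_public_access_config(["203.0.113.0/24"]))
```

The returned dictionary has the shape that boto3's `eks.update_cluster_config(name=..., resourcesVpcConfig=...)` accepts; it is shown for illustration only, since in CDP the list is entered when provisioning the experience.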



Restricting access to the Kubernetes API server and to the workloads of each service is described in Restricting access for CDP services that create their own security groups on AWS.

Within the EKS cluster, several security groups are defined to facilitate communication between the EKS control plane and pods, between pods, and between worker nodes, as well as workload communication through ELBs. These groups follow the AWS documentation (see Amazon EKS security group considerations).

Outbound connectivity requirements

Outbound traffic from the worker nodes remains unrestricted; currently, this traffic is directed at other AWS services and at CDP services. The complete list of services accessed from a CDP environment can be found in the AWS documentation (see Amazon EKS security group considerations).