Public Endpoint Access Gateway for AWS

When using Cluster Connectivity Manager, you can enable Public Endpoint Access Gateway to avoid configuring complex network connectivity between users and internal cloud provider networks.

While you may choose to deploy CDP clusters using private networking to restrict cluster access from the internet, this additional security makes it difficult for users to access UIs and APIs without configuring complex network connectivity between users and internal cloud provider networks. The Public Endpoint Access Gateway provides secure connectivity to UIs and APIs in Data Lake and Data Hub clusters deployed using private networking, allowing users to access these resources without complex changes to their networking or creating direct connections to cloud provider networks.

You can enable the Public Endpoint Access Gateway when registering your AWS environment in CDP. When enabled, the gateway provides a secure connection between end users and the UIs and APIs inside their private network. The gateway interfaces with the Knox service, which is automatically integrated with the identity provider configured in CDP, so you can authenticate with your SSO credentials without any additional configuration. All communication with the gateway is over TLS, so connections are encrypted. You can control the IP ranges from which connections to the gateway can be established by configuring your security groups.

The following diagram illustrates this setup:

Enabling Public Endpoint Access Gateway

You can enable Public Endpoint Access Gateway during AWS environment registration after enabling Cluster Connectivity Manager (CCM).

During environment registration via the CDP web interface, you can optionally enable Public Endpoint Access Gateway. Once enabled, the gateway is used for the Data Lake and all Data Hubs within the environment; it cannot be enabled for an individual Data Lake or Data Hub. Once it is enabled for an environment, it cannot be disabled. The gateway can be used either with an existing VPC or with a new VPC created by CDP.

Prerequisites

  • If you choose to enable Public Endpoint Access Gateway, CDP will create two AWS network load balancers (AWS NLB) per cluster (that is, two for the Data Lake and two for each Data Hub). Make sure that your AWS NLB limits allow for the creation of these load balancers.
  • If you are using your existing network, you should have at least 2 public subnets in the VPC that you would like to use for CDP. The availability zones of the public and private subnets must match.
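
Both prerequisites can be checked before registration. The following is a minimal sketch, not part of CDP or the AWS CLI: the function names (nlbs_needed, nlb_headroom_ok, az_set, azs_match) are hypothetical helpers, and the commented wiring assumes the AWS CLI is installed and configured. The helpers only compare values you pass in; they do not call AWS themselves.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helpers; all names here are illustrative assumptions.

# Load balancers required: two NLBs per cluster (Data Lake or Data Hub).
nlbs_needed() {
  local clusters=$1
  echo $(( clusters * 2 ))
}

# Succeeds if the NLBs already in use plus the new ones fit within the limit.
# Arguments: <nlb-limit> <nlbs-in-use> <clusters-to-create>
nlb_headroom_ok() {
  local limit=$1 in_use=$2 clusters=$3
  [ $(( in_use + $(nlbs_needed "$clusters") )) -le "$limit" ]
}

# Prints a whitespace-separated list of AZ names as a sorted, de-duplicated line.
az_set() {
  # $1 is intentionally unquoted so that spaces and tabs split the list.
  printf '%s\n' $1 | sort -u | paste -sd' ' -
}

# Succeeds if both subnet groups cover exactly the same availability zones.
azs_match() {
  [ "$(az_set "$1")" = "$(az_set "$2")" ]
}

# Example wiring (not run here; requires the AWS CLI and credentials):
#   private=$(aws ec2 describe-subnets --subnet-ids <private-subnet-ids> \
#     --query 'Subnets[].AvailabilityZone' --output text)
#   public=$(aws ec2 describe-subnets --subnet-ids <public-subnet-ids> \
#     --query 'Subnets[].AvailabilityZone' --output text)
#   azs_match "$private" "$public" || echo "availability zones differ" >&2
#   aws elbv2 describe-account-limits   # lists the Region's ELB limits
```

The comparison is set-based, so the order in which subnets are listed and repeated zones do not affect the result.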

Steps

When registering your AWS environment, make sure to do the following:

  1. On the Region, Networking, and Security page, select your existing VPC or choose to have a new VPC created.
  2. If you selected an existing VPC, select at least two existing private subnets (or at least three subnets if you would like to provision Data Warehouse instances).
  3. Leave the Enable Cluster Connectivity Manager option selected. It is on by default and allows communication via private subnets.
  4. Click Enable Public Endpoint Access Gateway. This makes the UIs and APIs of the Data Lake and Data Hub clusters accessible over the internet.
  5. If you selected an existing VPC, under Select Endpoint Access Gateway Subnets, select the public subnets for which you would like to use the gateway. The availability zones of the public subnets must be the same as the availability zones of the private subnets selected under Select Subnets.
  6. Under Security Access Settings, restrict access so that connections are accepted only from your external network range.
  7. Finish registering your environment.

During environment registration via the CDP CLI, you can optionally enable Public Endpoint Access Gateway using the following CLI parameters:

--endpoint-access-gateway-scheme PUBLIC 
--endpoint-access-gateway-subnet-ids subnet-0232c7711cd864c7b subnet-05d4769d88d875cda 

The first parameter enables the gateway and the second one allows you to specify public subnets. The availability zones of the public subnets must be the same as the availability zones of the private subnets specified under --subnet-ids. For example:

cdp environments create-aws-environment \
--environment-name gk1dev \
--credential-name gk1cred \
--region "us-west-2" \
--security-access cidr=0.0.0.0/0 \
--authentication publicKeyId="gk1" \
--log-storage storageLocationBase=s3a://gk1priv-cdp-bucket,instanceProfile=arn:aws:iam::152813717728:instance-profile/mock-idbroker-admin-role \
--vpc-id vpc-037c6d94f30017c24 \
--subnet-ids subnet-0232c7711cd864c7b subnet-05d4769d88d875cda \
--endpoint-access-gateway-scheme PUBLIC \
--endpoint-access-gateway-subnet-ids subnet-0232c7711cd864c7b subnet-05d4769d88d875cda \
--free-ipa instanceCountByGroup=1
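
The example above passes cidr=0.0.0.0/0, which accepts connections from any address, whereas the registration steps recommend restricting access to your external network range. The following is a minimal sketch of a guard that a wrapper script could run before calling the CLI. The function names (cidr_is_open, security_access_arg) are hypothetical and are not part of the CDP CLI, and 203.0.113.0/24 is a documentation-only sample range.

```shell
#!/usr/bin/env bash
# Hypothetical guard; not part of the CDP CLI.

# Succeeds if the CIDR has a prefix length of 0, that is, it matches
# every address (for example 0.0.0.0/0).
cidr_is_open() {
  case "$1" in
    */0) return 0 ;;
    *)   return 1 ;;
  esac
}

# Prints the --security-access argument for a CIDR, or fails with a
# message on stderr if the range is open to the whole internet.
security_access_arg() {
  local cidr=$1
  if cidr_is_open "$cidr"; then
    echo "refusing open CIDR: $cidr" >&2
    return 1
  fi
  printf -- '--security-access cidr=%s\n' "$cidr"
}

# Example use (sample range only):
#   security_access_arg "203.0.113.0/24"
```

The guard only inspects the prefix length. It does not validate the address itself or confirm that the range belongs to your network.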

The equivalent CLI JSON fragment for an environment request looks like this:

"endpointAccessGatewayScheme": "PUBLIC",
"endpointAccessGatewaySubnetIds": [
    "subnet-0232c7711cd864c7b",
    "subnet-05d4769d88d875cda"
],