Public Endpoint Access Gateway

If the network into which you are deploying your CDP environment does not have pre-established connectivity with your corporate network, enabling the Public Endpoint Access Gateway can reduce the complexity users face when interacting with the CDP endpoints.

The recommended way to deploy production-ready CDP environments is to deploy them on private networks, but this additional security makes it difficult for users to access UIs and APIs without configuring complex network connectivity between users and internal cloud provider networks. The Public Endpoint Access Gateway provides secure connectivity to UIs and APIs in Data Lake and Data Hub clusters deployed using private networking, allowing users to access these resources without complex changes to their networking or creating direct connections to cloud provider networks.

You can enable the Public Endpoint Access Gateway when registering your AWS, Azure, or GCP environment in CDP. The gateway interfaces with the Knox service, which is automatically integrated with the identity provider configured in CDP, allowing you to authenticate using your SSO credentials without any additional configuration. All communication with the gateway is over TLS, so connections are secure. You can control the IP ranges from which connections to the gateway can be established by configuring your security groups.

The following diagram illustrates this setup:

Enable Public Endpoint Access Gateway for AWS

You can enable Public Endpoint Access Gateway during AWS environment registration.

Once activated, the gateway will be used for the Data Lake and all the Data Hubs within the environment. There is no way to activate it on a per Data Lake or per Data Hub level. Once it is enabled for an environment, there is no way to deactivate it. The gateway can be used either with an existing VPC or with a new VPC created by CDP.

Prerequisites

  • If you choose to enable Public Endpoint Access Gateway, CDP will create two AWS network load balancers (AWS NLB) per cluster (that is, two for each Data Lake and two for each Data Hub). Make sure that your AWS NLB limits allow for the load balancer creation.
  • If you are using your existing network, you should have at least 2 public subnets in the VPC that you would like to use for CDP. The availability zones of the public and private subnets must match.
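The zone-matching prerequisite above can be checked before registration. The following is a minimal sketch, not part of CDP: the function names azs_match and subnet_azs are hypothetical, "must match" is interpreted here as "the two sets of zones are equal", and the lookup assumes the AWS CLI is installed with credentials configured.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when two whitespace-separated lists of
# availability zones contain the same set of zones (order and repeats ignored).
azs_match() {
  private_azs=$(printf '%s\n' $1 | sort -u)
  public_azs=$(printf '%s\n' $2 | sort -u)
  [ "$private_azs" = "$public_azs" ]
}

# Hypothetical wrapper: prints the availability zones of the given subnets
# using the AWS CLI. It is defined here but not called by the sketch itself.
subnet_azs() {
  aws ec2 describe-subnets --subnet-ids "$@" \
    --query 'Subnets[].AvailabilityZone' --output text
}

# Example usage (replace the placeholder IDs with your own subnets):
#   azs_match "$(subnet_azs <private-subnet-ids>)" \
#             "$(subnet_azs <public-subnet-ids>)" \
#     && echo "zones match" || echo "zones differ"
```

The comparison is kept separate from the lookup so that it can be exercised with literal zone names, without access to an AWS account.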

Steps

When registering your AWS environment, make sure to do the following:

  1. On the Region, Networking, and Security page, under Network, select your existing VPC or select to have a new VPC created.
  2. If you selected an existing VPC, select at least two existing private subnets (or at least three subnets if you would like to provision Data Warehouse instances).
  3. Click on Enable Public Endpoint Access Gateway to enable it. This enables UIs and APIs of the Data Lake and Data Hub clusters to be accessible over the internet.
  4. If you selected an existing VPC, under Select Endpoint Access Gateway Subnets, select the public subnets for which you would like to use the gateway. The availability zones of the public subnets must be the same as the availability zones of the private subnets selected under Select Subnets.
  5. Under Security Access Settings, restrict access so that connections are accepted only from your external network range.
  6. Finish registering your environment.
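When the registration is scripted, the access restriction from the steps above can be guarded with a small pre-flight check. This is a sketch under stated assumptions: cidrs_are_restricted is a hypothetical helper, and it only catches the obvious case of a /0 prefix (for example 0.0.0.0/0), which admits every address. It does not judge whether a narrower range is appropriate for your network.

```shell
#!/bin/sh
# Hypothetical pre-flight check: fails if any of the given CIDR ranges
# uses a /0 prefix, which would accept connections from every address.
cidrs_are_restricted() {
  for cidr in "$@"; do
    case "$cidr" in
      */0) return 1 ;;   # a /0 prefix matches all addresses
    esac
  done
  return 0
}

# Example usage (203.0.113.0/24 is a documentation-only range):
#   cidrs_are_restricted 203.0.113.0/24 || echo "security access is wide open"
```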

During AWS environment registration via CDP CLI, you can optionally enable the Public Endpoint Access Gateway using the following CLI parameters:

--endpoint-access-gateway-scheme PUBLIC 
--endpoint-access-gateway-subnet-ids subnet-0232c7711cd864c7b subnet-05d4769d88d875cda 

The first parameter enables the gateway and the second one specifies the public subnets to use for it. The availability zones of the public subnets must be the same as the availability zones of the private subnets specified with --subnet-ids. For example:

cdp environments create-aws-environment \
--environment-name gk1dev \
--credential-name gk1cred \
--region "us-west-2" \
--security-access cidr=0.0.0.0/0 \
--authentication publicKeyId="gk1" \
--log-storage storageLocationBase=s3a://gk1priv-cdp-bucket,instanceProfile=arn:aws:iam::152813717728:instance-profile/mock-idbroker-admin-role \
--vpc-id vpc-037c6d94f30017c24 \
--subnet-ids subnet-0232c7711cd864c7b subnet-05d4769d88d875cda \
--endpoint-access-gateway-scheme PUBLIC \
--endpoint-access-gateway-subnet-ids subnet-0232c7711cd864c7b subnet-05d4769d88d875cda \
--free-ipa instanceCountByGroup=1

Equivalent CLI JSON for an environment request looks like this:

"endpointAccessGatewayScheme": "PUBLIC",
"endpointAccessGatewaySubnetIds": [
    "subnet-0232c7711cd864c7b",
    "subnet-05d4769d88d875cda"
],
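If the request JSON is generated by a script, the two gateway fields can be produced from a list of subnet IDs. The sketch below is illustrative only: gateway_json is a hypothetical helper, it emits just the gateway fields shown above rather than a complete environment request, and it assumes the subnet IDs contain no characters that need JSON escaping.

```shell
#!/bin/sh
# Hypothetical helper: prints a JSON object holding only the two gateway
# fields, with the given subnet IDs as the array elements.
gateway_json() {
  ids=""
  for id in "$@"; do
    # Append each ID in quotes, comma-separated after the first one.
    ids="${ids:+$ids, }\"$id\""
  done
  printf '{\n  "endpointAccessGatewayScheme": "PUBLIC",\n  "endpointAccessGatewaySubnetIds": [%s]\n}\n' "$ids"
}

# Example usage:
#   gateway_json subnet-0232c7711cd864c7b subnet-05d4769d88d875cda
```

The output still has to be merged with the remaining fields of the environment request. If your CDP CLI version offers a --cli-input-json option (an assumption to confirm in the CLI help), the merged document could be passed to the create command that way.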

Enable Public Endpoint Access Gateway for Azure

You can enable Public Endpoint Access Gateway during Azure environment registration.

Once activated, the gateway will be used for the Data Lake and all the Data Hubs within the environment. There is no way to activate it on a per Data Lake or per Data Hub level. Once it is enabled for an environment, there is no way to deactivate it. The gateway can be used either with an existing VNet or with a new VNet created by CDP.

If you choose to enable Public Endpoint Access Gateway, CDP will create two Azure load balancers per cluster (that is, two for each Data Lake and Data Hub).
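For quota planning, the per-cluster figure above can be turned into a total for the environment. The arithmetic below is a rough sketch: gateway_lb_count is a hypothetical helper, and it assumes one Data Lake per environment plus the number of Data Hubs you pass in.

```shell
#!/bin/sh
# Hypothetical helper: estimates how many load balancers the gateway needs,
# at two per cluster, for one Data Lake plus the given number of Data Hubs.
gateway_lb_count() {
  data_hubs=$1
  echo $(( 2 * (1 + data_hubs) ))
}

# Example usage: an environment with three Data Hubs
#   gateway_lb_count 3
```

Comparing this number with the load balancer quota of your subscription before registration helps avoid a failed cluster deployment later.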

Steps

When registering your Azure environment, make sure to do the following:

  1. On the Region, Networking, and Security page, under Network, select your existing VNet or select to have a new VNet created.
  2. If you selected an existing VNet, select at least one existing private subnet (or at least three subnets if you would like to provision Data Warehouse instances).
  3. Click on Enable Public Endpoint Access Gateway to enable it. This enables UIs and APIs of the Data Lake and Data Hub clusters to be accessible over the internet.
  4. Under Security Access Settings, restrict access so that connections are accepted only from your external network range.
  5. Finish registering your environment.

During Azure environment registration via CDP CLI, you can optionally enable the Public Endpoint Access Gateway using the --endpoint-access-gateway-scheme CLI parameter. For example:

cdp environments create-azure-environment
...
--endpoint-access-gateway-scheme PUBLIC

Equivalent CLI JSON for an environment request looks like this:

...
"endpointAccessGatewayScheme": "PUBLIC"

Enable Public Endpoint Access Gateway for GCP

You can enable Public Endpoint Access Gateway during GCP environment registration.

Once activated, the gateway will be used for the Data Lake and all the Data Hubs within the environment. There is no way to activate it on a per Data Lake or per Data Hub level. Once it is enabled for an environment, there is no way to deactivate it.

If you choose to enable Public Endpoint Access Gateway, CDP will create two Google Cloud Load Balancers (GCLB) per cluster (that is, two for each Data Lake and two for each Data Hub).

Prerequisites

If you would like to use this feature, make sure that "Private Google Access" is disabled on at least one subnet in the VPC.
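The prerequisite above can be verified from the command line. The following is a sketch under stated assumptions: has_subnet_without_pga is a hypothetical helper, and the gcloud invocation in the comment assumes that the subnet listing prints the privateIpGoogleAccess field as True or False.

```shell
#!/bin/sh
# Hypothetical helper: reads lines of "<subnet-name> <True|False>" on stdin
# and succeeds if at least one subnet has Private Google Access disabled.
has_subnet_without_pga() {
  while read -r name pga; do
    case "$pga" in
      False|false) return 0 ;;   # found a subnet with the setting disabled
    esac
  done
  return 1
}

# Example usage (replace <vpc-name> with your VPC network name):
#   gcloud compute networks subnets list --network=<vpc-name> \
#     --format='value(name,privateIpGoogleAccess)' | has_subnet_without_pga \
#     && echo "prerequisite met" || echo "no suitable subnet found"
```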

Steps

When registering your GCP environment, make sure to do the following:

  1. On the Region, Networking, and Security page, under Network, select your existing VPC network.
  2. Select at least one existing private subnet.
  3. Click on Enable Public Endpoint Access Gateway to enable it. This enables UIs and APIs of the Data Lake and Data Hub clusters to be accessible over the internet.
  4. If you selected an existing VPC, under Select Endpoint Access Gateway Subnets, select the public subnets for which you would like to use the gateway. The availability zones of the public subnets must be the same as the availability zones of the private subnets selected under Select Subnets.
  5. Under Security Access Settings, restrict access so that connections are accepted only from your external network range.
  6. Finish registering your environment.

During GCP environment registration via CDP CLI, you can optionally enable Public Endpoint Access Gateway using the following CLI parameter:

--endpoint-access-gateway-scheme PUBLIC

For example:

cdp environments create-gcp-environment
...
--endpoint-access-gateway-scheme PUBLIC

Equivalent CLI JSON for an environment request looks like this:

...
"endpointAccessGatewayScheme": "PUBLIC"