Public Endpoint Access Gateway

If the network into which you are deploying your Cloudera environment does not have pre-established connectivity with your corporate network, enabling the Public Endpoint Access Gateway can reduce the complexity users face when interacting with the Cloudera endpoints.

The recommended way to deploy production-ready Cloudera environments is to deploy them on private networks, but this additional security makes it difficult for users to access UIs and APIs without configuring complex network connectivity between users and internal cloud provider networks. The Public Endpoint Access Gateway provides secure connectivity to UIs and APIs in Data Lake and Cloudera Data Hub clusters deployed using private networking, allowing users to access these resources without complex changes to their networking or creating direct connections to cloud provider networks.

You can enable the Public Endpoint Access Gateway when registering your AWS, Azure, or GCP environment in Cloudera. The gateway interfaces with the Knox service, which is automatically integrated with the identity provider configured in Cloudera, so you can authenticate with your SSO credentials without any additional configuration. All communication with the gateway uses TLS, so connections are encrypted. You can control the IP ranges from which connections to the gateway can be established by configuring your security groups.

The following diagram illustrates this setup:

Enable Public Endpoint Access Gateway for AWS

You can enable Public Endpoint Access Gateway during AWS environment registration.

Once activated, the gateway will be used for the Data Lake and all the Cloudera Data Hub clusters within the environment. There is no way to activate it on a per Data Lake or per Cloudera Data Hub cluster level. Once it is enabled for an environment, there is no way to deactivate it. The gateway can be used either with an existing VPC or with a new VPC created by Cloudera.

Prerequisites

If you choose to enable Public Endpoint Access Gateway, Cloudera will create two AWS network load balancers (AWS NLB) per cluster (that is, two for each Data Lake and two for each Cloudera Data Hub cluster). Make sure that your AWS NLB limits allow for the load balancer creation.
You should have at least 2 public subnets in the VPC that you would like to use for Cloudera. The availability zones of the public and private subnets must match.
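The subnet prerequisite can be checked before registration. The following is a minimal Python sketch, assuming you have already looked up the availability zone of each subnet with your cloud tooling, and interpreting "must match" as the two sets of zones being equal. The function name and the subnet IDs are hypothetical and are not part of any Cloudera API.

```python
def check_gateway_subnets(private_azs, public_azs):
    """Check the subnet prerequisites for the Public Endpoint Access Gateway.

    Both arguments map a subnet ID to its availability zone.
    Returns a list of problems; an empty list means the checks passed.
    """
    problems = []
    if len(public_azs) < 2:
        problems.append("at least 2 public subnets are required")
    private_zones = set(private_azs.values())
    public_zones = set(public_azs.values())
    # Assumption: "must match" means both subnet groups cover the same zones.
    if private_zones != public_zones:
        problems.append(
            "availability zones differ: private=%s public=%s"
            % (sorted(private_zones), sorted(public_zones))
        )
    return problems


# Hypothetical subnet IDs and zones, for illustration only.
private = {"subnet-private-1": "us-west-2a", "subnet-private-2": "us-west-2b"}
public = {"subnet-public-1": "us-west-2a", "subnet-public-2": "us-west-2c"}
for problem in check_gateway_subnets(private, public):
    print(problem)
```

Returning a list of problems rather than a single boolean lets one run report both a missing subnet and a zone mismatch.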

Steps

Cloudera UI
CDP CLI

When registering your AWS environment, make sure to do the following:

On the Region, Networking, and Security page, under Network, select your existing VPC.
Select at least two existing private subnets (or at least three subnets if you would like to provision Cloudera Data Warehouse instances).
Click on Enable Public Endpoint Access Gateway to enable it. This enables UIs and APIs of the Data Lake and Cloudera Data Hub clusters to be accessible over the internet.
Under Select Endpoint Access Gateway Subnets, select the public subnets for which you would like to use the gateway. The availability zones of the public subnets must be the same as the availability zones of the private subnets selected under Select Subnets.
Under Security Access Settings, restrict access so that connections are accepted only from your external network range.
Note: The security access settings do not apply to the network load balancer used by the Public Endpoint Access Gateway. They apply to the instances that run in private subnets and to which the Public Endpoint Access Gateway routes traffic. Therefore, the security access settings should allow the users’ public IP ranges so that users can connect through the public load balancer.
Finish registering your environment.
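The guidance on Security Access Settings can be sanity-checked offline before you register the environment. The sketch below uses Python's standard ipaddress module to flag ranges that admit every address and to confirm that a given user address is covered. The function names are hypothetical, and the addresses come from the reserved documentation ranges; nothing here calls a Cloudera or cloud provider API.

```python
import ipaddress


def open_ranges(cidrs):
    """Return the CIDR blocks that admit every address (prefix length 0)."""
    return [c for c in cidrs
            if ipaddress.ip_network(c, strict=False).prefixlen == 0]


def client_allowed(client_ip, cidrs):
    """Return True if client_ip falls inside at least one allowed CIDR block."""
    address = ipaddress.ip_address(client_ip)
    return any(address in ipaddress.ip_network(c, strict=False) for c in cidrs)


# Hypothetical values: a corporate egress range and one user's public IP.
allowed = ["203.0.113.0/24"]
print(open_ranges(allowed))
print(client_allowed("203.0.113.25", allowed))
```

A non-empty result from open_ranges means the access settings are not restricted to your external network range.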

During AWS environment registration via CDP CLI, you can optionally enable Public Endpoint Access Gateway using the following CLI parameters:

--endpoint-access-gateway-scheme PUBLIC 
--endpoint-access-gateway-subnet-ids subnet-0232c7711cd864c7b subnet-05d4769d88d875cda

The first parameter enables the gateway and the second one allows you to specify public subnets. The availability zones of the public subnets must be the same as the availability zones of the private subnets specified under --subnet-ids. For example:

cdp environments create-aws-environment \
--environment-name gk1dev \
--credential-name gk1cred \
--region "us-west-2" \
--security-access cidr=0.0.0.0/0 \
--authentication publicKeyId="gk1" \
--log-storage storageLocationBase=s3a://gk1priv-cdp-bucket,instanceProfile=arn:aws:iam::152813717728:instance-profile/mock-idbroker-admin-role \
--vpc-id vpc-037c6d94f30017c24 \
--subnet-ids subnet-0232c7711cd864c7b subnet-05d4769d88d875cda \
--endpoint-access-gateway-scheme PUBLIC \
--endpoint-access-gateway-subnet-ids subnet-0232c7711cd864c7b subnet-05d4769d88d875cda \
--free-ipa instanceCountByGroup=1
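If you generate registration commands from a script, the two gateway parameters can be assembled in one place. This is a minimal Python sketch, not an official client. The helper name and the subnet IDs are hypothetical, the check for an empty list is an added assumption, and the command is only printed, not executed.

```python
import shlex


def gateway_args(public_subnet_ids):
    """Return the two gateway parameters as a list of CLI arguments."""
    subnet_ids = list(public_subnet_ids)
    if not subnet_ids:
        # Assumption: refuse to enable the gateway without public subnets.
        raise ValueError("at least one public subnet ID is required")
    return [
        "--endpoint-access-gateway-scheme", "PUBLIC",
        "--endpoint-access-gateway-subnet-ids", *subnet_ids,
    ]


# Only part of the command is shown; the remaining parameters from the
# example above (credential, region, VPC, and so on) are omitted here.
command = ["cdp", "environments", "create-aws-environment",
           "--environment-name", "gk1dev"]
# Hypothetical public subnet IDs, for illustration only.
command += gateway_args(["subnet-public-1", "subnet-public-2"])
print(shlex.join(command))
```

Building the argument list first and quoting it with shlex.join avoids dangling line-continuation characters when the command is pasted into a shell.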

Equivalent CLI JSON for an environment request looks like this:

"endpointAccessGatewayScheme": "PUBLIC",
"endpointAccessGatewaySubnetIds": [
    "subnet-0232c7711cd864c7b",
    "subnet-05d4769d88d875cda"
],

Enable Public Endpoint Access Gateway for Azure

You can enable Public Endpoint Access Gateway during Azure environment registration.

Once activated, the gateway will be used for the Data Lake and all the Cloudera Data Hub clusters within the environment. There is no way to activate it on a per Data Lake or per Cloudera Data Hub cluster level. Once it is enabled for an environment, there is no way to deactivate it. The gateway can be used either with an existing VNet or with a new VNet created by Cloudera.

If you choose to enable Public Endpoint Access Gateway, Cloudera will create two Azure load balancers per cluster (that is, two for each Data Lake and two for each Cloudera Data Hub cluster).
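To compare the number of load balancers against your subscription limits, the per-cluster rule can be written as simple arithmetic. The sketch below is in Python and assumes one Data Lake per environment by default; the function name is hypothetical. The same two-per-cluster rule is stated for AWS and GCP in their respective sections.

```python
LOAD_BALANCERS_PER_CLUSTER = 2  # as stated above: two per cluster


def gateway_load_balancers(data_hub_clusters, data_lakes=1):
    """Estimate how many load balancers the gateway needs in one environment.

    Assumption: the environment has one Data Lake unless told otherwise.
    """
    if data_hub_clusters < 0 or data_lakes < 0:
        raise ValueError("cluster counts cannot be negative")
    return LOAD_BALANCERS_PER_CLUSTER * (data_lakes + data_hub_clusters)


# Example: one Data Lake plus three Cloudera Data Hub clusters.
print(gateway_load_balancers(3))  # prints 8
```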

Steps

Cloudera UI
CDP CLI

When registering your Azure environment, make sure to do the following:

On the Region, Networking, and Security page, under Network, select your existing VNet or select to have a new VNet created.
If you selected an existing VNet, select at least one existing private subnet (or at least three subnets if you would like to provision Cloudera Data Warehouse instances).
Click on Enable Public Endpoint Access Gateway to enable it. This enables UIs and APIs of the Data Lake and Cloudera Data Hub clusters to be accessible over the internet.
Under Security Access Settings, restrict access so that connections are accepted only from your external network range.
Note: The security access settings do not apply to the load balancer used by the Public Endpoint Access Gateway. They apply to the instances that run in private subnets and to which the Public Endpoint Access Gateway routes traffic. Therefore, the security access settings should allow the users’ public IP ranges so that users can connect through the public load balancer.
Finish registering your environment.

During Azure environment registration via CDP CLI, you can optionally enable Public Endpoint Access Gateway using the --endpoint-access-gateway-scheme CLI parameter. For example:

cdp environments create-azure-environment 
...
--endpoint-access-gateway-scheme PUBLIC

Equivalent CLI JSON for an environment request looks like this:

...
"endpointAccessGatewayScheme": "PUBLIC"

Enable Public Endpoint Access Gateway for GCP

You can enable Public Endpoint Access Gateway during GCP environment registration.

Once activated, the gateway will be used for the Data Lake and all the Cloudera Data Hub clusters within the environment. There is no way to activate it on a per Data Lake or per Cloudera Data Hub cluster level. Once it is enabled for an environment, there is no way to deactivate it.

If you choose to enable Public Endpoint Access Gateway, Cloudera will create two Google Cloud Load Balancers (GCLB) per cluster (that is, two for each Data Lake and two for each Cloudera Data Hub cluster).

Prerequisites

If you would like to use this feature, make sure that "Private Google Access" is disabled on at least one subnet in the VPC.
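This prerequisite can be checked from subnet descriptions exported as JSON. The sketch below is in Python and assumes each subnet is a dictionary carrying the privateIpGoogleAccess boolean used by the Compute Engine subnetwork resource, with a missing field treated as disabled. The function name and the subnet names are hypothetical.

```python
def subnets_without_private_google_access(subnets):
    """Return the names of subnets where Private Google Access is disabled."""
    return [
        subnet["name"]
        for subnet in subnets
        # Assumption: a missing field means the setting is disabled.
        if not subnet.get("privateIpGoogleAccess", False)
    ]


# Hypothetical subnet descriptions, for illustration only.
vpc_subnets = [
    {"name": "private-subnet-1", "privateIpGoogleAccess": True},
    {"name": "gateway-subnet-1", "privateIpGoogleAccess": False},
]
eligible = subnets_without_private_google_access(vpc_subnets)
if not eligible:
    print("No subnet has Private Google Access disabled")
else:
    print("Subnets with Private Google Access disabled:", ", ".join(eligible))
```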

Steps

Cloudera UI
CDP CLI

When registering your GCP environment, make sure to do the following:

On the Region, Networking, and Security page, under Network, select your existing VPC network.
Select at least one existing private subnet.
Click on Enable Public Endpoint Access Gateway to enable it. This enables UIs and APIs of the Data Lake and Cloudera Data Hub clusters to be accessible over the internet.
If you selected an existing VPC, under Select Endpoint Access Gateway Subnets, select the public subnets for which you would like to use the gateway. The availability zones of the public subnets must be the same as the availability zones of the private subnets selected under Select Subnets.
Under Security Access Settings, restrict access so that connections are accepted only from your external network range.
Note: The security access settings do not apply to the load balancer used by the Public Endpoint Access Gateway. They apply to the instances that run in private subnets and to which the Public Endpoint Access Gateway routes traffic. Therefore, the security access settings should allow the users’ public IP ranges so that users can connect through the public load balancer.
Finish registering your environment.

During GCP environment registration via CDP CLI, you can optionally enable Public Endpoint Access Gateway using the following CLI parameter:

 --endpoint-access-gateway-scheme PUBLIC

For example:

cdp environments create-gcp-environment 
 ...
 --endpoint-access-gateway-scheme PUBLIC

Equivalent CLI JSON for an environment request looks like this:

...
"endpointAccessGatewayScheme": "PUBLIC"