Use a non-transparent proxy with Cloudera Machine Learning on AWS environments

Cloudera Machine Learning (CML) can use non-transparent proxies if the environment is configured to use a network proxy in Management Console.

Enterprise customers frequently need to deploy CDP in a virtual network that does not have direct internet access. In such deployments, outbound traffic goes through a proxy server, which may be located in a different virtual network, so that traffic can be filtered against allowed domains or IP addresses.

Transparent and non-transparent network proxies differ in the following ways.

Transparent network proxy
  • Proxy is unknown to clients and requires no additional client configuration.
  • Usually, connections by way of transparent proxies are configured in route tables on your AWS VPC.
Non-transparent proxy
  • Clients are aware of non-transparent proxies and each client must be specifically configured to use the non-transparent proxy connection.
  • Clients pass connection or security information (such as a username and password) along with each connection request.
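
As an illustration of the client-side configuration that a non-transparent proxy requires, the following minimal Python sketch builds a proxy URL with credentials and registers it with the standard library's urllib. The hostname proxy.example.internal, port 3128, and the username and password are placeholders chosen for this example, not values from CDP, and no network request is made.

```python
import urllib.request
from urllib.parse import quote


def build_proxy_url(host: str, port: int, user: str, password: str) -> str:
    """Return an http:// proxy URL with percent-encoded credentials."""
    # Percent-encode the credentials so characters such as '@' or ':'
    # do not break the URL structure.
    return f"http://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}"


# Placeholder values (assumptions for illustration only).
proxy_url = build_proxy_url("proxy.example.internal", 3128, "proxyuser", "p@ss:word")

# A non-transparent proxy must be configured explicitly on the client.
# Building the handler and opener does not send any traffic.
handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
opener = urllib.request.build_opener(handler)

print(proxy_url)
```

With a transparent proxy, none of this client-side setup would be needed, because routing is handled at the network level.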

You can configure an AWS environment to use non-transparent proxy connections when activating environments for Cloudera Machine Learning (CML).

Configuring the non-transparent proxy

To configure the non-transparent proxy, go to Management Console > Shared Resources > Proxies, and then select Create Proxy Configuration. For more information, see Using a non-transparent proxy.

One step is specific to CML when you create a non-transparent proxy. In the No Proxy Hosts field of the Create Proxy UI, enter the following:
runtime-manager.mlx.svc.cluster.local,ds-operator.mlx.svc.cluster.local,feature-flags.mlx,governance-server.mlx.svc.cluster.local,
usage-reporter.mlx.svc.cluster.local,ds-vfs.mlx.svc.cluster.local,
ds-cdh-client.mlx.svc.cluster.local,fluentd.mlx.svc.cluster.local,
livelog.mlx.svc.cluster.local,tgtgen.mlx.svc.cluster.local,web.mlx.svc.cluster.local,
s2i-client.mlx.svc.cluster.local,livelog.mlx,archiver.mlx.svc.cluster.local,cldr.work,
cloudera.site

These hostnames are CML internal service endpoints.
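
The following Python sketch shows one way to assemble the list above into a single comma-separated value and to check which hostnames it would exempt from the proxy. The matching rule used here (exact match or domain-suffix match) is an assumption based on common no-proxy conventions; the actual matching is performed by each client and may differ. The hostnames myws.cloudera.site and pypi.org are hypothetical test inputs.

```python
# Hosts copied from the list above, in the same order.
NO_PROXY_HOSTS = [
    "runtime-manager.mlx.svc.cluster.local",
    "ds-operator.mlx.svc.cluster.local",
    "feature-flags.mlx",
    "governance-server.mlx.svc.cluster.local",
    "usage-reporter.mlx.svc.cluster.local",
    "ds-vfs.mlx.svc.cluster.local",
    "ds-cdh-client.mlx.svc.cluster.local",
    "fluentd.mlx.svc.cluster.local",
    "livelog.mlx.svc.cluster.local",
    "tgtgen.mlx.svc.cluster.local",
    "web.mlx.svc.cluster.local",
    "s2i-client.mlx.svc.cluster.local",
    "livelog.mlx",
    "archiver.mlx.svc.cluster.local",
    "cldr.work",
    "cloudera.site",
]


def no_proxy_value(hosts: list[str]) -> str:
    """Join the hosts into one comma-separated string without whitespace."""
    return ",".join(h.strip() for h in hosts)


def bypasses_proxy(hostname: str, hosts: list[str]) -> bool:
    """Assumed semantics: exact match or domain-suffix match."""
    name = hostname.lower().rstrip(".")
    return any(name == h or name.endswith("." + h) for h in hosts)


value = no_proxy_value(NO_PROXY_HOSTS)
print(value)
print(bypasses_proxy("web.mlx.svc.cluster.local", NO_PROXY_HOSTS))
print(bypasses_proxy("myws.cloudera.site", NO_PROXY_HOSTS))  # hypothetical host
print(bypasses_proxy("pypi.org", NO_PROXY_HOSTS))  # hypothetical external host
```

Generating the value programmatically helps avoid stray line breaks or spaces when pasting it into the field.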

Use a non-transparent proxy in a different VPC

If you want to use the hostname for the non-transparent proxy and the proxy is configured in a different VPC, CDP needs the CIDR range of the proxy in order to allow inbound access. To configure this, in the Provision Workspace UI, select Use hostname for non-transparent proxy and enter the CIDR range in Inbound Proxy CIDR Ranges.
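
Before entering a value in Inbound Proxy CIDR Ranges, it can help to confirm that the range is well formed and actually contains the proxy's address. The sketch below does this with Python's standard ipaddress module. The addresses 10.20.1.15 and 10.30.1.15 and the range 10.20.0.0/16 are made-up examples, not values from any real environment.

```python
import ipaddress


def proxy_in_cidr(proxy_ip: str, cidr: str) -> bool:
    """Return True if proxy_ip falls inside the given CIDR range.

    Raises ValueError if the CIDR is malformed or has host bits set.
    """
    network = ipaddress.ip_network(cidr, strict=True)
    return ipaddress.ip_address(proxy_ip) in network


# Example values (assumptions for illustration only).
print(proxy_in_cidr("10.20.1.15", "10.20.0.0/16"))
print(proxy_in_cidr("10.30.1.15", "10.20.0.0/16"))
```

The strict=True setting rejects a value such as 10.20.1.15/16, which mixes a host address with a network prefix and is a common typing mistake.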