Customer on-prem network to cloud network

After compute experiences are launched in the customer’s cloud network, data consumers such as data engineers, data scientists, and analysts access the services running in these experiences. CDP administrators who set up and operate these clusters might also need this access to diagnose issues with the clusters.

Examples of these services include:
  • Web UIs such as:
    • Hue: For running SQL queries against Hive tables.
    • CML Workspaces: For accessing Machine Learning projects, models, notebooks, and so on.
    • Cloudera Manager: For Data Hubs and Data Lakes.
    • Atlas and Ranger: For metadata, governance, and security in the Data Lake.
  • JDBC endpoints: Customers can connect tools like Tableau using a JDBC URL pointing to the Hive server.
  • SSH access: Data engineers might log in to nodes in the compute experiences to run data processing jobs using YARN, Spark, or other data pipeline tools.
  • Kubernetes API access: Experiences that run on Amazon EKS (such as Cloudera Data Warehouse and Cloudera Machine Learning) also provide admin access to Kubernetes for diagnosing issues.
  • API access: Customers can use APIs to access many of the services exposed through the Web UIs, for automation and for integration with their other tools, applications, or workloads. For example, CML exposes the CML API v2 to work with Machine Learning projects and other entities.
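
To make the JDBC endpoint item above more concrete, the following is a minimal sketch of how a JDBC URL pointing to a Hive server might be assembled. The helper name hive_jdbc_url, the host name, the port, and the httpPath value are all hypothetical placeholders; only the jdbc:hive2:// scheme and the ssl, transportMode, and httpPath parameters are standard Hive JDBC URL elements. The real URL for a given cluster will differ and should be taken from that cluster's own endpoint details.

```python
def hive_jdbc_url(host, port=443, database="default", http_path="cliservice"):
    """Build a HiveServer2 JDBC URL that uses HTTP transport over TLS.

    All default values are illustrative assumptions, not CDP defaults.
    """
    return (
        f"jdbc:hive2://{host}:{port}/{database}"
        f";ssl=true;transportMode=http;httpPath={http_path}"
    )


# Hypothetical host name, used only to show the shape of the URL.
print(hive_jdbc_url("hive.example.internal"))
```

A tool such as Tableau would be given the resulting string as its connection URL. The host name in that URL is one of the DNS names that must resolve, and be reachable, from the network where the tool runs.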

Consumers access these services from within a corporate network, inside a VPN. The services typically have endpoints with DNS names, the format of which is described in more detail in the DNS section of this chapter. These DNS names resolve to IP addresses assigned to the nodes, or to load balancers fronting the ingress controllers of Kubernetes clusters. Note that these IP addresses are usually private. Therefore, connecting to them from the on-premises network within a VPN requires additional connectivity setup, typically accomplished using technologies such as VPN peering, Direct Connect, or transit gateways. While many options are possible, this document describes one concrete option for achieving this connectivity.
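
As a rough way to see the behavior described above from a machine on the corporate network, the following sketch resolves an endpoint's DNS name, reports whether each resulting address is private, and attempts a TCP connection to it. It uses only the Python standard library. The function names and the example host name and port in the usage note are assumptions made for illustration; they are not part of CDP.

```python
import ipaddress
import socket


def resolve_ipv4(hostname):
    """Return the sorted, de-duplicated IPv4 addresses a DNS name resolves to."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    return sorted({info[4][0] for info in infos})


def is_private(address):
    """Return True if the address lies in a private (non-public) range."""
    return ipaddress.ip_address(address).is_private


def can_connect(address, port, timeout=3.0):
    """Return True if a TCP connection to address:port opens within the timeout."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_endpoint(hostname, port):
    """Print the address scope and TCP reachability of each resolved address."""
    for address in resolve_ipv4(hostname):
        scope = "private" if is_private(address) else "public"
        status = "reachable" if can_connect(address, port) else "unreachable"
        print(f"{hostname} -> {address} ({scope}), port {port}: {status}")
```

For example, check_endpoint("hue.example.internal", 443) could be run with a real endpoint name substituted for the hypothetical one. A name that resolves to a private address but reports "unreachable" suggests that the connectivity between the on-premises network and the cloud network is not yet in place.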