DNS

Recommended DNS configurations for CDP Public Cloud for AWS.

The previous sections dealt with how connectivity is established to the workload infrastructure. This section deals with ‘addressability’. The workloads launched by CDP contain a few services that CDP Admins or Data Consumers need to access. These include Cloudera Manager; metadata services such as the Hive Metastore, Atlas, and Ranger; and data processing or consumption services such as the Oozie server and Hue. Because of the nature of cloud infrastructure, the IP addresses of the nodes running these services may change (for example, if the infrastructure is restarted or repaired). These services should nonetheless have stable DNS names so that users can always reach them by the same names.

To support this, CDP assigns DNS names to these nodes. The naming schemes have the following properties:

  • The DNS name of a Data Lake node, a Data Hub node, or a Data Lake/Data Hub cluster endpoint has the following format: <workload_name>-<host_group><i>.<environment_identifier>.<customer_identifier>.cloudera.site. For example: my-dataeng-master0.my-envir.aaaa-1234.cloudera.site. This name has the following components:
    • The base domain is cloudera.site. This is a publicly registered DNS suffix (see Public Suffix List). It is also a registered Route53 hosted zone in a Cloudera-owned AWS account.
    • The <customer_identifier> is unique to a customer account on CDP and consists of alphanumeric characters and ‘-’.
    • The <environment_identifier> is generated based on the environment name and is truncated to 8 characters.
    • The <workload_name> is the name given to the Data Lake or Data Hub. A host group name such as gateway, master, or worker is appended to it, depending on the role the node plays in the cluster. If more than one node plays the same role, a serial number <i> is also appended.
  • The DNS names of the experience endpoints have the following formats:
    • For a Virtual Warehouse in CDW, it is <App-virtualWarehouse_Name>.<CDW_environment_identifier>.dw.<customer_identifier>.cloudera.site
      • <App-virtualWarehouse_Name> is the name of the Virtual Warehouse. A CDP environment can have multiple Virtual Warehouses.
      • <CDW_environment_identifier> is the identifier for the CDP environment.
    • For a Session Terminal in MLX workspace, it is <TTY_Session_ID>.<MLX_Workspace_ID>.<environment_identifier>.<customer_identifier>.cloudera.site
      • <TTY_Session_ID> is the ID of the MLX Terminal Session.
      • <MLX_Workspace_ID> is the ID of the MLX workspace.
      • The <environment_identifier> is generated based on the environment name and is truncated to 8 characters. If the 8th character is a ‘-’, then it is truncated to 7 characters instead.
    • For all the experiences listed above, the DNS names share the following common portions:
      • The base domain is cloudera.site. This is a publicly registered DNS suffix (see Public Suffix List). It is also a registered Route53 hosted zone in a Cloudera-owned AWS account.
      • The <customer_identifier> is unique to a customer account on CDP made of alphanumeric characters and ‘-’.
  • The length of the DNS name is limited to 64 characters because of limitations in Hue workloads.
  • These names are stored as A records in the Route53 hosted zone in the Cloudera-managed Control Plane AWS account, so they can be resolved from any location, including outside the VPC. However, they still resolve to private IP addresses, so reaching the services remains constrained by the connectivity setup described in the preceding sections.
  • Within a CDP environment, DNS resolution works differently. Every CDP environment has a DNS server, a role played by a component called FreeIPA. This server is seeded with the hostnames of the nodes of all workload clusters in the environment. Every node in a Data Lake, Data Hub, and Data Warehouse is configured to use the FreeIPA DNS service for name resolution within the cluster.
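The naming patterns described in this section can be sketched in Python to make their structure concrete. This is a minimal illustration, not CDP's implementation; CDP generates these names itself. All helper names (environment_identifier, cluster_node_name, and so on) are hypothetical. The sketch assumes that the environment identifier is a plain prefix of the environment name, applies the 7-character rule described for the MLX case to every pattern, and applies the 64-character limit to every name.

```python
"""Hypothetical sketch of the CDP DNS naming patterns (not CDP's own code)."""
from typing import Optional

BASE_DOMAIN = "cloudera.site"
MAX_DNS_NAME_LENGTH = 64  # limit stated in the documentation


def environment_identifier(environment_name: str) -> str:
    # Assumption: the identifier is a plain prefix of the environment name.
    ident = environment_name[:8]
    # If the 8th character is '-', keep only the first 7 characters.
    if len(ident) == 8 and ident.endswith("-"):
        ident = ident[:7]
    return ident


def _checked(name: str) -> str:
    # Reject names longer than the 64-character limit
    # (assumption: the limit applies to every pattern below).
    if len(name) > MAX_DNS_NAME_LENGTH:
        raise ValueError(
            f"{name!r} has {len(name)} characters; limit is {MAX_DNS_NAME_LENGTH}"
        )
    return name


def cluster_node_name(workload_name: str, host_group: str, environment_name: str,
                      customer_identifier: str, index: Optional[int] = None) -> str:
    # <workload_name>-<host_group><i>.<environment_identifier>.<customer_identifier>.cloudera.site
    serial = "" if index is None else str(index)
    return _checked(
        f"{workload_name}-{host_group}{serial}."
        f"{environment_identifier(environment_name)}."
        f"{customer_identifier}.{BASE_DOMAIN}"
    )


def cdw_virtual_warehouse_name(virtual_warehouse_name: str,
                               cdw_environment_identifier: str,
                               customer_identifier: str) -> str:
    # <App-virtualWarehouse_Name>.<CDW_environment_identifier>.dw.<customer_identifier>.cloudera.site
    return _checked(
        f"{virtual_warehouse_name}.{cdw_environment_identifier}.dw."
        f"{customer_identifier}.{BASE_DOMAIN}"
    )


def mlx_terminal_name(tty_session_id: str, mlx_workspace_id: str,
                      environment_name: str, customer_identifier: str) -> str:
    # <TTY_Session_ID>.<MLX_Workspace_ID>.<environment_identifier>.<customer_identifier>.cloudera.site
    return _checked(
        f"{tty_session_id}.{mlx_workspace_id}."
        f"{environment_identifier(environment_name)}."
        f"{customer_identifier}.{BASE_DOMAIN}"
    )


if __name__ == "__main__":
    # Reproduces the example name from the text.
    print(cluster_node_name("my-dataeng", "master", "my-environment", "aaaa-1234", index=0))
    # prints "my-dataeng-master0.my-envir.aaaa-1234.cloudera.site"
```

Raising an error instead of silently truncating an over-long name keeps the sketch honest about the 64-character limit: how CDP itself handles such names is not described here.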
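Because these names are public A records that point to private addresses, one way to observe the behavior described above from a machine outside the VPC is to resolve a name and check whether every returned address is private. The sketch below uses only the Python standard library (socket.getaddrinfo and the ipaddress module); the function names are hypothetical, and the resolver is passed in as a parameter so that the check can be exercised without network access.

```python
import ipaddress
import socket
from typing import Callable, List


def resolve_ipv4(hostname: str) -> List[str]:
    """Return the distinct IPv4 addresses for hostname via the system resolver."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def all_private(hostname: str,
                resolver: Callable[[str], List[str]] = resolve_ipv4) -> bool:
    """True if the name resolves and every returned address is private."""
    addresses = resolver(hostname)
    return bool(addresses) and all(
        ipaddress.ip_address(addr).is_private for addr in addresses
    )


if __name__ == "__main__":
    # A canned resolver stands in for real DNS lookups in this demo.
    print(all_private("node.example", resolver=lambda _: ["10.0.1.5"]))  # prints "True"
```

Calling all_private() with a real CDP hostname requires working DNS, and an unknown name raises socket.gaierror. A True result only shows that the name resolves to private addresses; reaching the service still depends on the connectivity setup from the preceding sections. Note that is_private also treats loopback and link-local ranges as private, which is broader than the RFC 1918 ranges typically used inside a VPC.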