DNS
This topic covers recommended DNS configurations for Cloudera on AWS.
The previous sections dealt with how connectivity to the workload infrastructure is established. This section deals with addressability. The workloads launched by Cloudera contain a few services that Cloudera admins or data consumers need to access. These include Cloudera Manager, metadata services such as the Hive Metastore, Atlas, or Ranger, and data processing or consumption services such as Oozie server, Hue, and so on. Given the nature of cloud infrastructure, the IP addresses of the nodes running these services may change (for example, if the infrastructure is restarted or repaired). However, these services should have stable DNS names so that users can always access them with the same names.
To help with this, Cloudera assigns DNS names to these nodes. The naming scheme has the following properties:
- The DNS name is of the following format for each Data Lake node, Cloudera Data Hub node, and the Data Lake/Cloudera Data Hub cluster endpoint:
  <CLUSTER_NAME>-{<HOST_GROUP><i>}.<ENVIRONMENT_IDENTIFIER>.<CUSTOMER_IDENTIFIER>.cloudera.site
  An example could be my-dataeng-master0.my-envir.aaaa-1234.cloudera.site
  This name has the following components:
  - The base domain is cloudera.site. This is a publicly registered DNS suffix (see Public Suffix List). It is also a registered Route53 hosted zone in a Cloudera owned AWS account.
  - The <CUSTOMER_IDENTIFIER> is unique to a customer account on Cloudera made of alphanumeric characters and "-".
  - The <ENVIRONMENT_IDENTIFIER> is generated based on the environment name and is truncated to 8 characters.
  - The <CLUSTER_NAME> is the cluster name given to the Data Lake or Cloudera Data Hub. It is appended with a <HOST_GROUP> name such as "gateway", "master", "worker", and so on, depending on the role that the node plays in the cluster. If there are more than one of these nodes playing the same role, they are appended with a serial number, <i>.
- The DNS name of the endpoints of the Cloudera data services is of the following format:
  - For a Virtual Warehouse in Cloudera Data Warehouse, it is
    <VIRTUAL_WAREHOUSE_NAME>.<CDW_ENVIRONMENT_IDENTIFIER>.dw.<CUSTOMER_IDENTIFIER>.cloudera.site
    The <VIRTUAL_WAREHOUSE_NAME> is the name of the Virtual Warehouse created. There could be multiple virtual warehouses for a given Cloudera environment.
    The <CDW_ENVIRONMENT_IDENTIFIER> is the identifier for the Cloudera environment.
  - For a Session Terminal in a Cloudera AI workspace, it is
    <TTY_SESSION_ID>.<CML_WORKSPACE_ID>.<ENVIRONMENT_IDENTIFIER>.<CUSTOMER_IDENTIFIER>.cloudera.site
    The <TTY_SESSION_ID> is the ID of the Cloudera AI Terminal Session.
    The <CML_WORKSPACE_ID> is the ID of the Cloudera AI workspace created.
    - The <ENVIRONMENT_IDENTIFIER> is generated based on the environment name and is truncated to 8 characters. If the 8th character is a "-" (dash), then it is truncated to 7 characters instead.
  - For all the Cloudera data services listed above, the common portions of the DNS include:
    - The base domain is cloudera.site. This is a publicly registered DNS suffix (see Public Suffix List). It is also a registered Route53 hosted zone in a Cloudera owned AWS account.
    - The <CUSTOMER_IDENTIFIER> is unique to a customer account on Cloudera made of alphanumeric characters and a "-" (dash).
  - For a virtual cluster in Cloudera Data Engineering, it is
    <VIRTUAL_CLUSTER_ID>.<CDE_SERVICE_ID>.<ENVIRONMENT_IDENTIFIER>.<CUSTOMER_IDENTIFIER>.cloudera.site
    The <VIRTUAL_CLUSTER_ID> is the 8-character ID of the Cloudera Data Engineering virtual cluster, for example, afg57p98.
    The <CDE_SERVICE_ID> is the ID of the Cloudera Data Engineering service containing the virtual cluster, for example, cde-g6th4kjv.
    - The <ENVIRONMENT_IDENTIFIER> is generated based on the Cloudera environment name and is truncated to 8 characters. If the 8th character is a "-" (dash), then it is truncated to 7 characters instead.
  - For a DataFlow service in Cloudera DataFlow, it is
    dfx.<CDF_WORKLOAD_ENDPOINT_ID>.<CUSTOMER_IDENTIFIER>.cloudera.site
    The <CDF_WORKLOAD_ENDPOINT_ID> is the 8-character ID of the Cloudera DataFlow Service Workload Endpoint, for example, 1bxt50kk.
  - For a database in Cloudera Operational Database, it is
    <COD_WORKLOAD_NAME>-{<HOST_GROUP><i>}.<ENVIRONMENT_IDENTIFIER>.<CUSTOMER_IDENTIFIER>.cloudera.site
    The <COD_WORKLOAD_NAME> is the ID of the Cloudera Operational Database, for example, cod-1m6yz9uwqhrg2.
    - The user provides a database name and the environment where they want to create the database. These two entities are hashed together to create the internal <COD_WORKLOAD_NAME>, which is set as the Cloudera Data Hub cluster name.
    - Except for the <COD_WORKLOAD_NAME>, the rest of the DNS name of the endpoint follows the Cloudera Data Hub DNS format described above.
  - For all the Cloudera data services listed above, the common portions of the DNS include:
    - The base domain is cloudera.site. This is a publicly registered DNS suffix. It is also a registered Route53 hosted zone in a Cloudera owned AWS account.
    - The <CUSTOMER_IDENTIFIER> is unique to a customer account on Cloudera made of alphanumeric characters and a "-" (dash).
- The length of the DNS name is restricted to 64 characters due to some limitations with Hue workloads.
- These names are stored as A records in the Route53 hosted zone in the Cloudera-managed Cloudera Control Plane AWS account. Hence, you can resolve these names from any location, including outside of the VPC. However, note that they still resolve to private IP addresses, so reaching the services remains constrained by the connectivity setup described in the preceding sections.
- Within a Cloudera environment, DNS resolution happens differently. Every Cloudera environment has a DNS server, a role played by a component called FreeIPA. This server is seeded with the hostnames of the nodes of all workload clusters in the environment. Every node in a Data Lake, Cloudera Data Hub, and a Cloudera data service is configured to query the FreeIPA DNS service for name resolution within the cluster.
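
To make the Data Lake and Cloudera Data Hub naming format concrete, the following Python sketch assembles a node DNS name from its components and checks it against the 64-character limit. It is illustrative only and not Cloudera's implementation. It assumes that the environment identifier is a plain truncation of the environment name (the text says only that it is "generated based on" the name), that the braces in the format mean the host group part can be omitted, and that the 64-character limit is inclusive. The function names and the environment name my-environment are hypothetical.

```python
BASE_DOMAIN = "cloudera.site"
MAX_DNS_NAME_LENGTH = 64  # limit stated above; assumed here to be inclusive


def environment_identifier(environment_name):
    """Assumption: plain truncation to 8 characters, or to 7 if the 8th is a dash."""
    identifier = environment_name[:8]
    if len(identifier) == 8 and identifier.endswith("-"):
        identifier = identifier[:7]
    return identifier


def node_dns_name(cluster_name, environment_name, customer_identifier,
                  host_group=None, index=None):
    """Build <CLUSTER_NAME>-{<HOST_GROUP><i>}.<ENV_ID>.<CUSTOMER_ID>.cloudera.site."""
    host = cluster_name
    if host_group is not None:
        # The serial number <i> is optional; it is appended directly to the host group.
        suffix = host_group if index is None else f"{host_group}{index}"
        host = f"{cluster_name}-{suffix}"
    return ".".join([
        host,
        environment_identifier(environment_name),
        customer_identifier,
        BASE_DOMAIN,
    ])


def within_length_limit(dns_name):
    """Check the DNS name against the 64-character restriction."""
    return len(dns_name) <= MAX_DNS_NAME_LENGTH


name = node_dns_name("my-dataeng", "my-environment", "aaaa-1234",
                     host_group="master", index=0)
print(name)                       # my-dataeng-master0.my-envir.aaaa-1234.cloudera.site
print(within_length_limit(name))  # True
```

Under these assumptions the sketch reproduces the example name given earlier. Because the environment and customer identifiers consume a fixed share of the 64 characters, long cluster names are the most likely cause of exceeding the limit.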
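
Since the public A records resolve to private IP addresses, a quick way to confirm this behavior from a client machine is to resolve a name and inspect the result. The following Python sketch uses only the standard library (socket.getaddrinfo and the ipaddress module). The helper names are hypothetical, and the hostname in the commented usage is the example name from this topic, not a real endpoint.

```python
import ipaddress
import socket
from typing import List


def resolve_ipv4(hostname: str) -> List[str]:
    """Return the distinct IPv4 addresses that the hostname resolves to."""
    infos = socket.getaddrinfo(hostname, None,
                               family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def all_private(addresses: List[str]) -> bool:
    """True only if at least one address is given and every address is private."""
    return bool(addresses) and all(
        ipaddress.ip_address(address).is_private for address in addresses
    )


# Usage (requires network access and a real hostname from your environment):
# addresses = resolve_ipv4("my-dataeng-master0.my-envir.aaaa-1234.cloudera.site")
# print(addresses, all_private(addresses))
```

A successful lookup that returns only private addresses indicates that name resolution works but says nothing about reachability, which still depends on the network connectivity described in the preceding sections.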