Cluster template overrides

You can specify custom configurations that override or append the properties in a built-in Data Hub template or a custom template.

Overview

You can launch Data Hub clusters from a set of pre-defined cluster templates created for prescriptive use cases. These cluster templates are a “shared resource” that define the list of services that will be installed on the Data Hub, including their configurations.

For example, this is a portion of a default template:

{
  "services": [
    {
      "refName": "zookeeper",
      "serviceType": "ZOOKEEPER",
      "serviceConfigs": [
        {
          "name": "service_config_suppression_server_count_validator",
          "value": "true"
        }
      ],
      "roleConfigGroups": [
        {
          "refName": "zookeeper-SERVER-BASE",
          "roleType": "SERVER",
          "base": true
        }
      ]
    },
    {
      "refName": "hdfs",
      "serviceType": "HDFS",
      "serviceConfigs": [
        {
          "name": "hdfs_verify_ec_with_topology_enabled",
          "value": false
        },
        {
          "name": "core_site_safety_valve",
          "value": "<property><name>fs.s3a.buffer.dir</name><value>${env.LOCAL_DIRS:-${hadoop.tmp.dir}}/s3a</value></property><property><name>fs.s3a.committer.name</name><value>directory</value></property>"
        }
      ],
      "roleConfigGroups": [
        {
          "refName": "hdfs-NAMENODE-BASE",
          "roleType": "NAMENODE",
          "base": true,
          "configs": [
            {
              "name": "role_config_suppression_namenode_java_heapsize_minimum_validator",
              "value": "true"
            },
            {
              "name": "role_config_suppression_fs_trash_interval_minimum_validator",
              "value": "true"
            },
            {
              "name": "fs_trash_interval",
              "value": "0"
            },
            {
              "name": "fs_trash_checkpoint_interval",
              "value": "0"
            },
            {
              "name": "erasure_coding_default_policy",
              "value": " "
            }
          ]
        },
        {
          "refName": "hdfs-SECONDARYNAMENODE-BASE",
          "roleType": "SECONDARYNAMENODE",
          "base": true
        },
        {
          "refName": "hdfs-DATANODE-BASE",
          "roleType": "DATANODE",
          "base": true
        },
        {
          "refName": "hdfs-BALANCER-BASE",
          "roleType": "BALANCER",
          "base": true
        },
        {
          "refName": "hdfs-GATEWAY-BASE",
          "roleType": "GATEWAY",
          "base": true,
          "configs": [
            {
              "name": "dfs_client_use_trash",
              "value": false
            },
            {
              "name": "role_config_suppression_hdfs_trash_disabled_validator",
              "value": "true"
            },
            {
              "name": "hdfs_client_env_safety_valve",
              "value": "HADOOP_OPTS=\"-Dorg.wildfly.openssl.path=/usr/lib64 ${HADOOP_OPTS}\""
            }
          ]
        }
      ]

In this section of an example cluster template, two primary types of service configurations are visible: serviceConfigs and configurations for various roleConfigGroups, in addition to a special type of service configuration called safety_valves. You can read about the details of these configs in the Cloudera Manager Configuration Properties reference.

Oftentimes you may want to modify the serviceConfigs, roleConfigGroups, and safety_valves present in a default Data Hub cluster template. Although you can create a custom cluster template by modifying the JSON of a default template, this process can be cumbersome and error-prone.

As an alternative to creating a custom template, you can specify custom configurations that override or append the properties in a default template. These custom configurations are saved as a shared resource called “cluster template overrides,” and can be used and re-used across Data Hub clusters in different environments. As a shared resource, they do not need to be attached to a specific Runtime version.

By using a default cluster template along with a cluster template override, you can create a customized Data Hub cluster, along with receiving improvements to the default templates that are present in newer Cloudera Runtime releases. Cluster template overrides can be used to override or append properties present in both the identified types of service configurations, as well as safety valves.

You are not limited to customizing the configs present in a default cluster template; you can add any valid configuration recognized by Cloudera Manager for an included service. New configurations are appended to the cluster template when the cluster is launched.

Cluster template overrides can be applied to custom templates as well. If you want to apply a cluster template override to a custom template that contains properties that are dynamically replaced during cluster creation, the cluster template override will override any dynamically-replaced properties when the two conflict.

Limitations

At present, there is no way to validate any individual property to ensure that it is valid and recognised by Cloudera Manager, so that it can be overridden to the desired value. Adding an incorrect config name can lead to errors while installing the CM template on the Data Hub cluster, or the property could be ignored by Cloudera Manager entirely.

An invalid value for a particular property can also cause errors during the cluster creation process. Carefully review the Cloudera Manager Configuration Properties reference in regards to configurations that you want to customize.

Cluster template overrides are only for overriding/appending the serviceConfigs, roleConfigGroups, and safety_valves in a Data Hub template. Unlike creating a custom template, you can not add a service to the template.