Deploying CDP in multiple Azure availability zones

You can optionally choose to deploy Data Lake, FreeIPA, and Data Hubs across multiple availability zones (multi-AZ). With multi-AZ support, newly created Azure environments, enterprise Data Lakes and Data Hubs using HA templates can be deployed across multiple availability zones of the selected Azure region. This provides fault tolerance during the extreme event of an availability zone outage.

Each Azure region has multiple availability zones, which act as failure domains, preventing small outages from affecting entire regions. If you choose to deploy your CDP environment (FreeIPA and Data Lake) and Data Hubs across multiple availability zones, each of these components is spread across three availability zones, providing high availability and fault tolerance. This is illustrated in the following diagram:

With the multi-AZ option enabled, your services are deployed in the following way:

  • Azure environments are always created with three FreeIPA servers, deployed on virtual machines spread across three availability zones.

  • In an Azure Enterprise Data Lake, each host group is configured so that the virtual machines of all critical services are spread across three availability zones.

  • In HA Data Hubs, virtual machines of each host group are evenly spread across three availability zones, following a round-robin logic.
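
The round-robin placement described for HA Data Hubs can be pictured with a short shell sketch. This is only an illustration under stated assumptions: the function name assign_zones_round_robin and the node labels are hypothetical, and the sketch does not reproduce the actual placement logic inside CDP.

```shell
# Hypothetical helper (not part of CDP or the CDP CLI).
# Usage: assign_zones_round_robin <node-count> <zone> [<zone> ...]
# Prints one "node<index> zone<zone>" line per VM, cycling through the zones.
assign_zones_round_robin() {
  node_count=$1
  shift
  # At least one zone is required.
  [ "$#" -gt 0 ] || return 1
  i=0
  while [ "$i" -lt "$node_count" ]; do
    zone=$1
    echo "node$i zone$zone"
    # Rotate the zone list: move the zone just used to the end.
    shift
    set -- "$@" "$zone"
    i=$((i + 1))
  done
}

# Five VMs of one host group over zones 1, 2, and 3 print:
# node0 zone1
# node1 zone2
# node2 zone3
# node3 zone1
# node4 zone2
assign_zones_round_robin 5 1 2 3
```

With five nodes and three zones, no zone holds more than two nodes, so losing a single zone removes at most two of the five VMs in the group.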

When a zone failure happens and a cluster needs to be repaired, the replacement VMs are always provisioned in the same subnet and availability zone as the old ones, because detached disks can only be reattached to a VM in the same availability zone. This means that during an availability zone outage, cluster repair is not possible.

By default, if you do not enable multi-AZ, neither CDP nor CDP customers have visibility into how Azure distributes VMs across availability zones, because neither the Azure Portal nor the Azure CLI provides this information.

When creating Data Hubs via CDP CLI, you have the option to specify the AZs. In addition to letting you select the AZs that should be used, this allows you to set up AZ targeting, where all nodes of the cluster are placed in the same AZ. AZ targeting enables disaster recovery scenarios in which a primary and a secondary cluster run in different AZs. If an AZ outage occurs and the primary cluster is lost, it is guaranteed that the secondary cluster is not impacted.

Use cases

A multi-AZ Data Lake and FreeIPA constitute a resilient environment that provides a solid basis for multi-AZ Data Hubs and CDP data services. Data Hubs and CDP data services depend on the FreeIPA instance in the Data Lake to provide DNS resolution. Deploying FreeIPA across multiple availability zones ensures that critical DNS resolution is available in the event of an availability zone outage. Furthermore, a medium duty or enterprise Data Lake provides high availability as well as additional compute and memory resources for key SDX services, and is recommended for production workloads.

Deploying your Data Hubs across multiple availability zones is key if your mission-critical applications depend on HBase and Kafka. Multiple availability zone deployment for operational workloads is considered best practice by the cloud vendors. It ensures that your applications can continue to run in the event of an availability zone outage.

When an entire availability zone fails, HBase automatically rebalances regions among the remaining instances in the cluster to maintain availability. The write-ahead log (WAL), which is replicated across the three availability zones, is automatically replayed by the newly assigned region servers in other availability zones to ensure that writes to the database are not lost.

When using the multi-AZ feature, CDP ensures that Kafka replicates partitions across brokers in different availability zones. During an availability zone failure, this ensures that no data is lost and applications can continue to access the data they need. Cruise Control, which is deployed alongside every Kafka cluster in CDP Public Cloud, detects that topics need to be rebalanced to the remaining brokers. Once the availability zone is back online, you can repair your Kafka cluster, restoring the initial broker distribution across availability zones. Afterwards, Cruise Control ensures that all topic partitions are balanced across the cluster.

Limitations

The following limitations apply when deploying CDP across multiple availability zones:

  • When an AZ is down, you cannot create a new Data Hub, or create or activate CDP data services within the environment. Existing workloads will continue to work.

  • When an AZ is down, you cannot resize, stop, or restart Data Hubs.

  • Non-AZ environments or clusters cannot be converted to multi-AZ.

Azure requirements

In order to use multi-AZ, you should meet the following Azure requirements:

  1. The Azure region that you select should support Azure PostgreSQL Flexible Server in Zone-Redundant HA mode as well as the instance types that you plan to use. See Flexible Server Azure Regions.

  2. The ADLS Gen2 storage account should be created as zone-redundant storage (ZRS). To specify ZRS via Azure CLI during storage account creation, the --sku option should be set to Standard_ZRS. Below is a sample Azure CLI command:
    az storage account create \
     --name teststorage \
     --resource-group rg-test-rg \
     --access-tier Cool \
     --allow-blob-public-access false \
     --allow-cross-tenant-replication false \
     --allow-shared-key-access true \
     --enable-hierarchical-namespace true \
     --sku Standard_ZRS
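
To confirm that an existing storage account meets the ZRS requirement, its SKU can be read back with az storage account show. The following is a minimal sketch, not an official procedure: the helper names is_required_sku and check_storage_account are hypothetical, and the sketch assumes that the Azure CLI is installed and logged in.

```shell
# Hypothetical helper: succeeds only for the SKU required by this page.
is_required_sku() {
  [ "$1" = "Standard_ZRS" ]
}

# Hypothetical helper: reads the SKU of a storage account and checks it.
# Usage: check_storage_account <storage-account-name> <resource-group>
check_storage_account() {
  # sku.name holds values such as Standard_LRS or Standard_ZRS.
  sku=$(az storage account show \
    --name "$1" \
    --resource-group "$2" \
    --query sku.name \
    --output tsv) || return 1
  if is_required_sku "$sku"; then
    echo "OK: $1 uses $sku"
  else
    echo "WARNING: $1 uses $sku, expected Standard_ZRS"
    return 1
  fi
}
```

Calling check_storage_account with your storage account name and resource group prints an OK line when the account uses zone-redundant storage, and returns a non-zero status otherwise.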

Register a multi-AZ environment

You can register a multi-AZ Azure environment via CDP UI or CDP CLI. You may choose to enable multi-AZ for Data Lake only or for FreeIPA only. There is no requirement to enable both.

Steps

Register your environment as usual, making sure to do the following:

  1. On the Data Access and Data Lake Scaling page:

    1. Select to use the Enterprise Data Lake.
    2. On the same page, scroll down and, at the bottom of the page, enable the Advanced Options.
    3. In the Network and Availability section, enable the Enable Multiple Availability Zones for Data Lake toggle button in order to enable multi-AZ for Data Lake. The option is disabled by default and only appears when the Enterprise Data Lake is selected.
  2. On the Region, Networking, and Security page:

    1. Scroll down and, at the bottom of the page, enable the Advanced Options.

    2. In the Network and Availability section enable the Enable Multiple Availability Zones for Data Lake toggle button in order to enable multi-AZ for FreeIPA. The option is disabled by default.

  3. Finish registering your environment as usual.

Use the following CDP CLI commands to register an environment with a multi-AZ Data Lake and FreeIPA:

  1. Register an Azure environment using the cdp environments create-azure-environment command and include multiAz=true in the --free-ipa parameter, as shown in this example:
    cdp environments create-azure-environment \
    --environment-name test-env \
    ...
    --free-ipa instanceCountByGroup=3,multiAz=true
    If you do not include multiAz=true, the default AZ distribution is used.
    You can also optionally include the --availability-zones parameter to select the specific availability zones that should be used. Valid values for availability zones are 1, 2, and 3. If this parameter is not provided, all AZs are used. For example:
    cdp environments create-azure-environment \
    --environment-name test-env \
    ...
    --free-ipa instanceCountByGroup=3,multiAz=true \
    --availability-zones 1 2
  2. Set IDBroker mappings as usual using the cdp environments set-id-broker-mappings command.
  3. Create a Data Lake using the cdp datalake create-azure-datalake command and adding the --multi-az parameter. For example:
    cdp datalake create-azure-datalake \
    --datalake-name test-dl \
    --environment-name test-env \
    ...
    --scale ENTERPRISE \
    --runtime 7.2.17 \
    --multi-az
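
Because step 1 accepts only 1, 2, and 3 as availability zone values, a wrapper script may want to check its input before calling the CDP CLI. The following is a small sketch under that assumption. The function name validate_zones is hypothetical and is not part of the CDP CLI.

```shell
# Hypothetical helper: checks that every argument is a valid Azure AZ value.
# An empty list is accepted, because omitting the zones means all AZs are used.
validate_zones() {
  for zone in "$@"; do
    case "$zone" in
      1|2|3) ;;  # valid value, keep checking
      *)
        echo "invalid availability zone: $zone" >&2
        return 1
        ;;
    esac
  done
}

# Example: run the check before passing the same values to --availability-zones.
if validate_zones 1 2; then
  echo "zones accepted: 1 2"
fi
```

The check is deliberately strict about format: values such as "01" or "zone1" are rejected rather than normalized.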

Create a multi-AZ Data Hub

You can create multi-AZ Data Hubs within an existing environment, provided that it meets the prerequisite below. Detailed steps are provided below.

Prerequisites

You can create a multi-AZ Data Hub in a multi-AZ environment only. If you are trying to create a multi-AZ Data Hub in an environment that uses the default AZ distribution, you need to first edit that environment and add AZs to it.

Steps

To enable multi-AZ when creating a Data Hub on Azure, navigate to Advanced Options > Network And Availability and, in the “Azure Availability Zones” section, click the toggle button next to Enable using multiple availability zones.

You can create a multi-AZ Data Hub by adding the --multi-az option to the Data Hub creation command.

In the --instance-groups parameter, you can optionally include availabilityZones to select the specific availability zones that should be used. If this parameter is not provided, all three AZs are used. For example:
# Multi-AZ Data Hub that uses all three availability zones
cdp datahub create-azure-cluster \
 --cluster-name test-cluster1 \
 --environment-name test-env \
 --cluster-template-name "7.2.17 - Data Engineering: Apache Spark, Apache Hive, Apache Oozie" \
 --multi-az

# Multi-AZ Data Hub that selects specific availability zones per instance group
cdp datahub create-azure-cluster \
 --cluster-name test-cluster1 \
 --environment-name test-env \
 --cluster-template-name "7.2.17 - Data Engineering: Apache Spark, Apache Hive, Apache Oozie" \
 --multi-az \
 --instance-groups \
   nodeCount=1,instanceGroupName=compute,instanceGroupType=CORE,instanceType=Standard_D5_v2,rootVolumeSize=100,attachedVolumeConfiguration=\[\{volumeSize=100,volumeCount=0,volumeType=StandardSSD_LRS\}\],recoveryMode=MANUAL,availabilityZones=\[1,2\] \
   nodeCount=0,instanceGroupName=gateway,instanceGroupType=CORE,instanceType=Standard_D8_v3,rootVolumeSize=100,attachedVolumeConfiguration=\[\{volumeSize=100,volumeCount=1,volumeType=StandardSSD_LRS\}\],recoveryMode=MANUAL,availabilityZones=\[2,3\] \
   nodeCount=1,instanceGroupName=master,instanceGroupType=GATEWAY,instanceType=Standard_D16_v3,rootVolumeSize=100,attachedVolumeConfiguration=\[\{volumeSize=100,volumeCount=1,volumeType=StandardSSD_LRS\}\],recoveryMode=MANUAL,availabilityZones=\[1,2,3\] \
   nodeCount=3,instanceGroupName=worker,instanceGroupType=CORE,instanceType=Standard_D5_v2,rootVolumeSize=100,attachedVolumeConfiguration=\[\{volumeSize=100,volumeCount=1,volumeType=StandardSSD_LRS\}\],recoveryMode=MANUAL,availabilityZones=\[1,3\]