Using Azure Database for PostgreSQL Flexible Server

CDP uses Azure Database for PostgreSQL Flexible Server. The Flexible Server allows a highly available database to be deployed for Data Lake and Data Hub clusters.

Using the Flexible Server offers the following benefits to CDP customers:

  • Flexible Server is multi-AZ capable and offers zone-redundant High Availability. With the Flexible Server, Data Lakes are backed by a highly available PostgreSQL configuration of two instances. In a multi-AZ deployment, the Flexible Server instances are deployed in multiple availability zones for additional fault tolerance.

For a detailed comparison of Single Server and Flexible Server offerings, refer to the Comparison table in the Azure documentation.

Database server options

You can create Flexible Server instances with public access, where the Azure Database for PostgreSQL server is accessed through a public endpoint, or with private access (“Private Flexible Server”), where the Flexible Server has no public endpoint accessible through the internet. The latter option requires a private DNS zone to be specified, and a delegated subnet to be created and added to your CDP Azure environment beforehand.

The Flexible Server with public access is used by default because it does not require any special networking setup. During environment creation, you can instead choose the Flexible Server with private access or the Single Server.

New CDP environments on Azure automatically use Flexible Server with public endpoints. Data Hubs automatically inherit the settings of the environment they run in, but you can also enable Flexible Server when creating a Data Hub.

In general, when registering an Azure environment and creating a Data Hub, you can choose to use Flexible Server, Private Flexible Server, or Single Server, but the exact options vary depending on the environment settings. The logic that CDP uses is described in the following steps:

  1. When Data Lake or Data Hub creation is initiated, CDP checks whether the parent environment was launched with the Private Single Server setup. That is, it checks whether one of the following parameter combinations is configured:
    
    --flexible-server-subnet-ids   is empty
    --create-private-endpoints   is specified
    or
    
    --flexible-server-subnet-ids  is empty
    --existing-network-params.databasePrivateDnsZoneId  is specified
  2. If the environment has not been configured for Private Single Server, CDP checks whether the --database-type parameter is specified (for example, --database-type=FLEXIBLE_SERVER).
    1. If it is specified, CDP launches the specified database type (Flexible Server or Single Server).
    2. If it is not specified, CDP launches Flexible Server, with public or private setup, based on the environment level settings.
  3. If the environment has been configured for Private Single Server (that is, one of the parameter combinations in step 1 is set), CDP checks whether the --database-type=FLEXIBLE_SERVER parameter is set.
    1. If the --database-type=FLEXIBLE_SERVER parameter is set, CDP returns the following validation error:
      Your environment was created with Azure Private Single Server database setup. If you would like to start your cluster with private flexible server database, you have to change your environment network setup.
    2. If the --database-type=FLEXIBLE_SERVER parameter is not set, CDP launches the cluster with a Single Server.
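
The steps above can be read as a small decision function. The following Python sketch is only an illustration of that logic, not CDP's actual implementation. The EnvironmentSettings class, the function names, and the "SINGLE_SERVER" return value are hypothetical names introduced for this example; only the parameter names and the FLEXIBLE_SERVER value come from the steps above. The sketch also does not model the public versus private Flexible Server choice, which CDP derives from the environment-level settings.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Validation error text quoted from the steps above.
PRIVATE_SINGLE_SERVER_ERROR = (
    "Your environment was created with Azure Private Single Server database setup. "
    "If you would like to start your cluster with private flexible server database, "
    "you have to change your environment network setup."
)


@dataclass
class EnvironmentSettings:
    """Hypothetical container mirroring the environment-level parameters."""
    flexible_server_subnet_ids: List[str] = field(default_factory=list)
    create_private_endpoints: bool = False
    database_private_dns_zone_id: Optional[str] = None


def is_private_single_server(env: EnvironmentSettings) -> bool:
    """Step 1: the subnet list is empty AND either private endpoints
    or a database private DNS zone ID is configured."""
    subnets_empty = not env.flexible_server_subnet_ids
    private_setup = (
        env.create_private_endpoints
        or env.database_private_dns_zone_id is not None
    )
    return bool(subnets_empty and private_setup)


def resolve_database_type(
    env: EnvironmentSettings, database_type: Optional[str] = None
) -> str:
    """Return the database type to launch, following steps 1 to 3.

    "SINGLE_SERVER" is an assumed label for the Single Server option.
    """
    if is_private_single_server(env):
        # Step 3: Private Single Server environment.
        if database_type == "FLEXIBLE_SERVER":
            raise ValueError(PRIVATE_SINGLE_SERVER_ERROR)
        return "SINGLE_SERVER"
    # Step 2: not a Private Single Server environment.
    if database_type is not None:
        return database_type  # launch whatever was explicitly requested
    return "FLEXIBLE_SERVER"  # default; public or private per environment
```

For example, under these assumptions, calling resolve_database_type(EnvironmentSettings(create_private_endpoints=True)) falls into step 3 and yields the Single Server option, while passing "FLEXIBLE_SERVER" for the same environment raises the validation error.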

Limitations

The following limitations currently apply:

To set up this feature, review the Azure prerequisites, and then enable Flexible Server during Azure environment registration in CDP, as described in the linked documentation: