Database Requirements

Cloudera Manager and CDH come packaged with an embedded PostgreSQL database for use in non-production environments. The embedded PostgreSQL database is not supported in production environments. For production environments, you must configure your cluster to use dedicated external databases.

After installing a database, upgrade to the latest patch and apply appropriate updates. Available updates may be specific to the operating system on which the database is installed.

Notes:

  • Cloudera recommends that you use the default versions of databases that correspond to the operating system of your cluster nodes. Refer to the operating system's documentation to verify support if you choose to use a database other than the default. Note that Hue requires the default MySQL/MariaDB version (if used) of the operating system on which it is installed. For more information, see Hue Databases.
  • Use UTF8 encoding for all custom databases. MySQL and MariaDB must use the MySQL utf8 encoding, not utf8mb4.
  • For MySQL 5.7, you must install the MySQL-shared-compat or MySQL-shared package. This is required for the Cloudera Manager Agent installation.
  • MySQL GTID-based replication is not supported.
  • Both the Community and Enterprise versions of MySQL are supported, as well as MySQL configured by the AWS RDS service.
  • Before upgrading from CDH 5 to CDH 6, check the value of the COMPATIBLE initialization parameter in the Oracle Database using the following SQL query: 
    SELECT name, value FROM v$parameter WHERE name = 'compatible'
    The default value is 12.2.0. If the parameter has a different value, you can set it to the default as shown in the Oracle Database Upgrade Guide.
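
The encoding note and the Oracle COMPATIBLE check above can be expressed as a small pre-flight check. The following is a minimal Python sketch, not Cloudera tooling. The function names are hypothetical, and the sketch assumes you have already read the values from the database yourself (for example, the schema's DEFAULT_CHARACTER_SET_NAME from information_schema.SCHEMATA in MySQL or MariaDB, and the value column returned by the Oracle query above). It also assumes a server version that reports the encoding as "utf8".

```python
def check_database_charset(charset_name):
    """Return a list of problems for a MySQL/MariaDB database character set.

    `charset_name` is assumed to be the schema's default character set
    as reported by the server (hypothetical input; obtain it yourself).
    """
    problems = []
    charset = charset_name.strip().lower()
    if charset == "utf8mb4":
        # The notes above require the MySQL utf8 encoding, not utf8mb4.
        problems.append("database uses utf8mb4; use utf8 instead")
    elif charset != "utf8":
        problems.append("database uses %s; expected utf8" % charset)
    return problems


def oracle_compatible_is_default(value, default="12.2.0"):
    """Return True if the Oracle COMPATIBLE value equals the documented default."""
    return value.strip() == default
```

An empty list from check_database_charset means the encoding matches the note above. A False result from oracle_compatible_is_default means the parameter should be reviewed against the Oracle Database Upgrade Guide before upgrading.
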
MySQL Support across Cloudera Enterprise 6 Releases
MySQL Version Cloudera Enterprise 6.x
5.1 (default for RHEL/CentOS/OEL 6)
5.5 (default for Debian 8.9)
5.6
5.7 (default for Ubuntu 16.04, 18.04 LTS)
MariaDB Support across Cloudera Enterprise 6 Releases
MariaDB Version Cloudera Enterprise 6.3 Cloudera Enterprise 6.2 Cloudera Enterprise 6.0 - 6.1
5.5 (default for RHEL/CentOS/OEL 7)
10.0 (default for SLES 12 SP2/SP3, Debian 8.9, Ubuntu 16.04 LTS)
10.1 (default for Debian 9, Ubuntu 18.04 LTS)  
10.2 (default for SLES 12 SP4) Recommend 10.0 because of the known issue OPSAPS-52340.
PostgreSQL Support across Cloudera Enterprise 6 Releases
PostgreSQL Version Cloudera Enterprise 6.1 - 6.3 Cloudera Enterprise 6.0
8.4 (default for RHEL/CentOS/OEL 6)
9.2 (default for RHEL/CentOS/OEL 7)
9.4 (default for Debian 8.9)
9.5 (default for Ubuntu 16.04 LTS)  
9.6 (default for SLES 12 SP2/SP3, Debian 9)
10.x (default for Ubuntu 18.04 LTS)  
Oracle Support across Cloudera Enterprise 6 Releases
Oracle Version Cloudera Enterprise 6.x
12.2 (default for RHEL/CentOS/OEL 6, 7)

RDBMS High Availability Support

Various Cloudera components rely on backing RDBMS services as critical infrastructure. You may require Cloudera components to support deployment in environments where RDBMS services are made highly available. High availability (HA) solutions for RDBMS are implementation-specific, and can create constraints or behavioral changes in Cloudera components.

This section clarifies the support state and identifies known issues and limitations for HA deployments.

High Availability vs. Load Balancing

Understanding the difference between HA and load balancing is important because Cloudera components are designed to assume that services are provided by a single RDBMS instance. Load balancing distributes operations across multiple RDBMS services in parallel, while HA focuses on service continuity. Load-balanced deployments are often used as part of HA strategies to overcome the demands of monitoring and failover management in an HA environment. Although easier to implement, load-balanced deployments require applications tailored to the behavior and limitations of the particular technology.

Support Statement: Cloudera components are not designed for and do not support load-balanced deployments of any kind. Any HA strategy involving multiple active RDBMS services must ensure that all connections are routed to a single RDBMS service at any given time, regardless of vendor or HA implementation/technology.
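
As one illustration of the statement above, a connection string can be screened for settings that spread traffic across several servers. The following is a minimal Python sketch under stated assumptions: it assumes the component connects through MySQL Connector/J, whose `jdbc:mysql:loadbalance://` and `jdbc:mysql:replication://` URL schemes distribute connections across multiple hosts. The function name and the example URLs are hypothetical, and the check does not cover load balancers placed in front of the database at the network layer.

```python
# Connector/J URL schemes that distribute connections across several hosts.
MULTI_SERVER_SCHEMES = (
    "jdbc:mysql:loadbalance://",
    "jdbc:mysql:replication://",
)


def uses_load_balancing(jdbc_url):
    """Return True if the JDBC URL uses a load-balancing or replication scheme."""
    return jdbc_url.strip().lower().startswith(MULTI_SERVER_SCHEMES)
```

For example, `uses_load_balancing("jdbc:mysql://db1:3306/metastore")` is False, while a URL beginning with `jdbc:mysql:loadbalance://` is flagged.
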

General High Availability Support

Cloudera supports various RDBMS options, each of which has multiple possible strategies for implementing HA. Cloudera cannot reasonably test and certify each strategy for each RDBMS. Cloudera expects HA solutions for RDBMS to be transparent to Cloudera software; the HA solutions themselves are therefore not supported or debugged by Cloudera. It is the responsibility of the customer to provision, configure, and manage the RDBMS HA deployment so that Cloudera software behaves as it would when interfacing with a single, non-HA service. Cloudera will support and help customers troubleshoot issues when a cluster has HA enabled. While diagnosing database-related problems in Cloudera components, customers may be required to temporarily disable or bypass HA mechanisms for troubleshooting purposes. If an HA-related issue is found, it is the responsibility of the customer to engage with the database vendor to find a solution.

Support Statement: Cloudera Support may require customers to temporarily bypass HA layers and connect directly to supported RDBMS back-ends to troubleshoot issues. Issues observed only when connected through HA layers are the responsibility of the customer DBA staff to resolve.

Vendor-Specific Notes

Oracle RAC:

  • Cloudera supports Oracle Exadata and RAC instances when they serve as back-end databases for CDH components without HA. Cloudera software is designed with the assumption of a single database instance, and supports normal operations between Cloudera Enterprise and Oracle Exadata (or RAC) in such an environment.
  • Cloudera is an Oracle Partner Network Gold member, allowing us to download and use Oracle commercial software (such as RAC) for development and testing purposes.

MySQL Asynchronous Replication:

  • Supported, tested, and certified
  • Master/master or master/slave topologies are acceptable
  • You must disable Global Transaction Identifiers (GTID)
  • You must use the InnoDB storage engine
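
The GTID and InnoDB requirements above can be checked before enabling replication. The following is a minimal Python sketch, not a Cloudera utility. The function name is hypothetical, and it assumes you supply the inputs yourself: a dict of server variables (for example from SHOW VARIABLES, where MySQL 5.6 and later expose `gtid_mode`) and a dict mapping table names to storage engines (for example from the ENGINE column of information_schema.TABLES). A missing `gtid_mode` is treated as OFF, on the assumption that the server predates GTID support.

```python
def check_replication_prereqs(variables, table_engines):
    """Return a list of problems that conflict with the replication notes above."""
    problems = []

    # GTIDs must be disabled; servers without the variable are treated as OFF.
    gtid_mode = variables.get("gtid_mode", "OFF").upper()
    if gtid_mode != "OFF":
        problems.append("gtid_mode is %s; GTIDs must be disabled" % gtid_mode)

    # Every table must use InnoDB; views report no engine and are skipped.
    for table, engine in sorted(table_engines.items()):
        if engine is None:
            continue
        if engine.lower() != "innodb":
            problems.append("table %s uses %s; expected InnoDB" % (table, engine))

    return problems
```

An empty list means neither requirement is violated for the inputs given. It does not by itself confirm that the replication topology is configured correctly.
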

MySQL HA with Oracle Clusterware:

MySQL InnoDB Cluster:

  • Prohibited
  • Requires enabling GTIDs

MySQL DRBD:

  • Older HA technology stack for MySQL; performs distributed block writes at the OS kernel layer
  • Does not add semantics or requirements
  • Has performance tradeoffs for write operations
  • Poorly suited to write-intensive use cases (for example, Navigator)

MySQL Cluster (NDB):

  • Prohibited
  • Very different performance, management, and operational characteristics from the InnoDB storage engine

Galera Cluster (Percona Cluster, MariaDB Cluster):

  • Prohibited
  • Adds cluster-wide optimistic locking. This can cause unexpected deadlock errors at commit, or worse, undetected logical database corruption caused by naive retry logic in Cloudera applications