Cloudera Manager and Managed Service Datastores

Cloudera Manager uses various databases and datastores to store information about the Cloudera Manager configuration, as well as information such as the health of the system or task progress. For quick, simple installations, Cloudera Manager can install and configure an embedded PostgreSQL database as part of the Cloudera Manager installation process. In addition, some CDH services use databases and are automatically configured to use a default database. If you plan to use the embedded and default databases provided during the Cloudera Manager installation, see Installation Path A - Automated Installation by Cloudera Manager (Non-Production Mode) and Embedded PostgreSQL Database.

Although the embedded database is useful for getting started quickly, you can also use your own PostgreSQL, MariaDB, MySQL, or Oracle database for the Cloudera Manager Server and services that use databases.

For information about planning, managing, and backing up Cloudera Manager data stores, see Storage Space Planning for Cloudera Manager.

Required Databases

The Cloudera Manager Server, Oozie Server, Sqoop Server, Activity Monitor, Reports Manager, Hive Metastore Server, Sentry Server, Cloudera Navigator Audit Server, and Cloudera Navigator Metadata Server all require databases. The type of data contained in the databases and their estimated sizes are as follows:
  • Cloudera Manager - Contains all the information about services you have configured and their role assignments, all configuration history, commands, users, and running processes. This relatively small database (< 100 MB) is the most important to back up.
  • Oozie Server - Contains Oozie workflow, coordinator, and bundle data. Can grow very large.
  • Sqoop Server - Contains entities such as the connector, driver, links and jobs. Relatively small.
  • Activity Monitor - Contains information about past activities. In large clusters, this database can grow large. Configuring an Activity Monitor database is only necessary if a MapReduce service is deployed.
  • Reports Manager - Tracks disk utilization and processing activities over time. Medium-sized.
  • Hive Metastore Server - Contains Hive metadata. Relatively small.
  • Hue Server - Contains user account information, job submissions, and Hive queries. Relatively small.
  • Sentry Server - Contains authorization metadata. Relatively small.
  • Cloudera Navigator Audit Server - Contains auditing information. In large clusters, this database can grow large.
  • Cloudera Navigator Metadata Server - Contains authorization, policies, and audit report metadata. Relatively small.
See Backing Up Databases.

The Cloudera Manager Service Host Monitor and Service Monitor roles have an internal datastore.

Cloudera Manager provides three installation paths:
  • Path A automatically installs an embedded PostgreSQL database to meet the requirements of the services. This path reduces the number of installation tasks to complete and choices to make. In Path A you use the embedded PostgreSQL database for the Cloudera Manager Server and can optionally choose to create external databases for Oozie Server, Activity Monitor, Reports Manager, Hive Metastore Server, Sentry Server, Cloudera Navigator Audit Server, and Cloudera Navigator Metadata Server. If you choose to use PostgreSQL for Sqoop Server you must create an external database.
  • Path B and Path C require you to create databases for the Cloudera Manager Server, Oozie Server, Activity Monitor, Reports Manager, Hive Metastore Server, Sentry Server, Cloudera Navigator Audit Server, and Cloudera Navigator Metadata Server. If you choose to use PostgreSQL for Sqoop Server you must create an external database.

Using an external database requires more input and intervention as you install databases or gather information about existing ones. These paths also provide greater flexibility in choosing database types and configurations.

Cloudera Manager supports deploying different types of databases in a single environment, but doing so can create unexpected complications. Cloudera recommends choosing one supported database provider for all of the Cloudera databases.

In most cases, you should install databases and services on the same host. For example, if you create the database for Activity Monitor on myhost1, then you should typically assign the Activity Monitor role to myhost1. You assign the Activity Monitor and Reports Manager roles in the Cloudera Manager wizard during the installation or upgrade process. After completing the installation or upgrade process, you can also modify role assignments in the Management services pages of Cloudera Manager. Although the database location is changeable, before beginning an installation or upgrade, you should decide which hosts to use. The JDBC connector for your database must be installed on the hosts where you assign the Activity Monitor and Reports Manager roles.

You can install the database and services on different hosts. Separating databases from services is more likely in larger deployments and in cases where more sophisticated database administrators choose such a configuration. For example, databases and services might be separated if your environment includes Oracle databases that are managed separately by Oracle database administrators.

Setting up the Cloudera Manager Server Database

The Cloudera Manager Server database stores information about service and host configurations. For demonstration and proof-of-concept deployments you can use an embedded PostgreSQL database. See Embedded PostgreSQL Database.

Preparing a Cloudera Manager Server External Database

Before performing these steps, install and configure a database server as described in Configuring and Starting the MariaDB Server, Configuring and Starting the MySQL Server, Configuring the Oracle Server, or Configuring and Starting the PostgreSQL Server.

  1. Run the scm_prepare_database.sh script on the host where the Cloudera Manager Server package is installed:
    • Installer or package install
      /usr/share/cmf/schema/scm_prepare_database.sh database-type [options] database-name username password
    • Tarball install
      <tarball root>/share/cmf/schema/scm_prepare_database.sh database-type [options] database-name username password
    The script prepares the database by:
    • Creating the Cloudera Manager Server database configuration file.
    • (MariaDB, MySQL, and PostgreSQL) Creating a database for the Cloudera Manager Server to use.
    • (MariaDB, MySQL, and PostgreSQL) Setting up a user account for the Cloudera Manager Server.
  2. Remove the embedded PostgreSQL properties file if it exists:
    • Installer or package install
      /etc/cloudera-scm-server/db.mgmt.properties
    • Tarball install
      <tarball root>/etc/cloudera-scm-server/db.mgmt.properties
After successfully running the scm_prepare_database.sh script, return to Establish Your Cloudera Manager Repository Strategy.

scm_prepare_database.sh Syntax

scm_prepare_database.sh database-type [options] database-name username password
Required Parameters
Parameter Description
database-type One of the supported database types:
  • MariaDB - mysql
  • MySQL - mysql
  • Oracle - oracle
  • PostgreSQL - postgresql
database-name The name of the Cloudera Manager Server database to create or use.
username The username for the Cloudera Manager Server database to create or use.
password The password for the Cloudera Manager Server database to create or use. If you do not specify the password on the command line, the script prompts you to enter it.
Options
Option Description
-h or --host The IP address or hostname of the host where the database is installed. The default is to use the local host.
-P or --port The port number to use to connect to the database. The default port is 3306 for MariaDB, 3306 for MySQL, 5432 for PostgreSQL, and 1521 for Oracle. This option is used for a remote connection only.
-u or --user The admin username for the database application. For -u, no space occurs between the option and the provided value. If this option is supplied, the script creates a user and database for the Cloudera Manager Server; otherwise, it uses the user and database you created previously.
-p or --password The admin password for the database application. The default is no password. For -p, no space occurs between the option and the provided value.
--scm-host The hostname where the Cloudera Manager Server is installed. Omit if the Cloudera Manager Server and the database are installed on the same host.
--config-path The path to the Cloudera Manager Server configuration files. The default is /etc/cloudera-scm-server.
--schema-path The path to the Cloudera Manager schema files. The default is /usr/share/cmf/schema (the location of the script).
-f The script does not stop if an error occurs.
-? or --help Display help.

Example 1: Running the script when MySQL is installed on another host

This example explains how to run the script on the Cloudera Manager Server host (myhost2) and create and use a temporary MySQL user account to connect to MySQL remotely on the MySQL host (myhost1).
  1. At the myhost1 MySQL prompt, create a temporary user who can connect from myhost2:
    mysql> grant all on *.* to 'temp'@'%' identified by 'temp' with grant option;
    Query OK, 0 rows affected (0.00 sec)
  2. On the Cloudera Manager Server host (myhost2), run the script:
    $ sudo /usr/share/cmf/schema/scm_prepare_database.sh mysql -h myhost1.sf.cloudera.com -utemp -ptemp --scm-host myhost2.sf.cloudera.com scm scm scm
    Looking for MySQL binary
    Looking for schema files in /usr/share/cmf/schema
    Verifying that we can write to /etc/cloudera-scm-server
    Creating SCM configuration file in /etc/cloudera-scm-server
    Executing: /usr/java/jdk1.6.0_31/bin/java -cp /usr/share/java/mysql-connector-java.jar:/usr/share/cmf/schema/../lib/* com.cloudera.enterprise.dbutil.DbCommandExecutor /etc/cloudera-scm-server/db.properties com.cloudera.cmf.db.
    [ main] DbCommandExecutor INFO Successfully connected to database.
    All done, your SCM database is configured correctly!
  3. On myhost1, delete the temporary user:
    mysql> drop user 'temp'@'%';
    Query OK, 0 rows affected (0.00 sec)

Example 2: Running the script to configure Oracle

[root@rhel55-6 ~]# /usr/share/cmf/schema/scm_prepare_database.sh -h cm-oracle.example.com oracle orcl sample_user sample_pass
Verifying that we can write to /etc/cloudera-scm-server
Creating SCM configuration file in /etc/cloudera-scm-server
Executing: /usr/java/jdk1.6.0_31/bin/java -cp /usr/share/java/mysql-connector-java.jar:/usr/share/cmf/schema/../lib/* com.cloudera.enterprise.dbutil.DbCommandExecutor /etc/cloudera-scm-server/db.properties com.cloudera.cmf.db.
[ main] DbCommandExecutor INFO Successfully connected to database.
All done, your SCM database is configured correctly!

Example 3: Running the script when PostgreSQL is co-located with the Cloudera Manager Server

This example assumes that you have already created the Cloudera Management Server database and database user, naming both scm.
$ /usr/share/cmf/schema/scm_prepare_database.sh postgresql scm scm scm

External Databases for Oozie Server, Sqoop Server, Activity Monitor, Reports Manager, Hive Metastore Server, Sentry Server, Cloudera Navigator Audit Server, and Cloudera Navigator Metadata Server

You can configure Cloudera Manager to use an external database for Oozie Server, Sqoop Server, Activity Monitor, Reports Manager, Hive Metastore Server, Sentry Server, Cloudera Navigator Audit Server, and Cloudera Navigator Metadata Server. If you choose this option, you must create the databases before you run the Cloudera Manager installation wizard. For more information, see the instructions in Configuring an External Database for Oozie, Configuring an External Database for Sqoop, MariaDB Database, MySQL Database, Oracle Database, and External PostgreSQL Database.

External Databases for Hue

By default Hue is configured to use the SQLite database. Cloudera strongly recommends an external database for clusters with multiple Hue users. See Hue Custom Databases.