This is the documentation for Cloudera Manager 5.1.x. Documentation for other versions is available at Cloudera Documentation.

Cloudera Manager and Managed Service Databases

Cloudera Manager uses databases to store information about the Cloudera Manager configuration, as well as information such as the health of the system or task progress. To facilitate rapid completion of simple installations, the Cloudera Manager can install and configure an embedded PostgreSQL database as part of the Cloudera Manager installation process. This automatically installed database is referred to as an embedded PostgreSQL database. In addition, some CDH services use databases and are automatically configured to use a default database. If you plan to use the embedded and default databases provided during the Cloudera Manager installation, see Installation Path A - Automated Installation by Cloudera Manager.

While the embedded database is a useful option for getting started quickly, Cloudera Manager also allows you to opt to use your own PostgreSQL, MySQL, or Oracle database for the Cloudera Manager Server and services that use databases. To learn more about database options or if you are unsure whether or not using the embedded database is right for your environment, continue with the following sections.

What Databases Must Be Installed

The Cloudera Manager Server, Activity Monitor, Reports Manager, Hive Metastore, Sentry Server, and Cloudera Navigator Audit Server all require databases:
  • Cloudera Manager - Contains all the information about what services you have configured, their role assignments, all configuration history, commands, users, and running processes. This is a relatively small database (<100 MB), and is the most important to back up. A monitoring database contains monitoring information about service and host status. In large clusters, this database can grow large.
  • Activity Monitor - Contains information about past activities. In large clusters, this database can grow large.
  • Reports Manager - Keeps track of disk utilization and processing activities over time. Medium-sized.
  • Hive Metastore - Contains Hive metadata. Relatively small.
  • Sentry Server - Contains authorization metadata. Relatively small.
  • Cloudera Navigator Audit Server - Contains auditing information. In large clusters, this database can grow large.
The Host Monitor and Service Monitor have an internal datastore. Configuring an Activity Monitor database is only necessary if there's a MapReduce service in the deployment.
Cloudera Manager provides three install paths:
  • Path A automatically installs an embedded PostgreSQL database to meet the requirements of the services. This path reduces the number of installation tasks you must complete, as well as the number of choices to make. In Path A you can also optionally choose to create external databases for Activity Monitor, Reports Manager, Hive Metastore, Sentry Server, and Cloudera Navigator Audit Server.
  • Path B and Path C require you to create databases for the Cloudera Manager Server, Activity Monitor, Reports Manager, Hive Metastore, Sentry Server, and Cloudera Navigator Audit Server.

Using an external database requires more input and intervention as you either install databases or gather information about existing databases. These paths also provides greater flexibility in choosing database types and configurations.

Cloudera Manager supports deploying different types of databases in a single environment, but doing so may create unexpected complications. Cloudera recommends choosing one of the supported database providers to use for all of the Cloudera databases.

In most cases, you should install databases and services on the same host. For example, if you create the database for Activity Monitor on myhost1, then you should typically assign the Activity Monitor role to myhost1. You will assign the Activity Monitor and Reports Manager roles in the Cloudera Manager wizard during the install or upgrade process. After completing the install or upgrade process, you can also modify role assignments in the Management services pages of Cloudera Manager. While it is true that database location is changeable, before beginning an installation or upgrade, you should decide which hosts you will use. The JDBC connector for your database must be installed on the hosts where you assign the Activity Monitor and Reports Manager roles.

It is possible to install the database and services on different hosts. Separating databases from services is more likely to occur in larger deployments and in cases where more sophisticated database administrators actively choose to establish such a configuration. For example, databases and services might be separated if your environment includes Oracle databases that will be separately managed by Oracle database administrators.

Setting up the Cloudera Manager Server Database

The Cloudera Manager Server database stores information about service and host configurations. You can use an embedded PostgreSQL database or an external database.

Installing and Starting the Cloudera Manager Server Embedded Database

If you are using Installation Path B - Manual Installation Using Cloudera Manager Packages for a demonstration or proof of concept deployment, and you want to use an embedded PostgreSQL database for the Cloudera Management Server, use this procedure to install and start the database:
  1. Install the embedded PostgreSQL database packages:
    • Red Hat-compatible, if you have a yum repo configured:
      $ sudo yum install cloudera-manager-server-db-2
    • Red Hat-compatible, if you're transferring RPMs manually:
      $ sudo yum --nogpgcheck localinstall cloudera-manager-server-db-2.noarch.rpm
    • SLES:
      $ sudo zypper install cloudera-manager-server-db-2
    • Debian/Ubuntu
      $ sudo apt-get install cloudera-manager-server-db-2
  2. Start the PostgreSQL database:
    $ sudo service cloudera-scm-server-db start

Preparing an Cloudera Manager Server External Database

Before performing these steps, install and configure a database as described in MySQL Database, Oracle Database, or External PostgreSQL Database.

  1. Run the scm_prepare_database.sh script:
    • Installer or package install
      /usr/share/cmf/schema/scm_prepare_database.sh
    • Tarball install
      <tarball root>/share/cmf/schema/scm_prepare_database.sh
    on the host where the Cloudera Manager Server package is installed. The script prepares the database by:
    • Creating the Cloudera Manager Server database configuration file.
    • Creating a database for the Cloudera Manager Server to use. This is optional and is only completed if options are specified.
    • Setting up a user account for the Cloudera Manager Server. This is optional and is only completed if options are specified.
  2. Remove the embedded PostgreSQL properties file:
    • Installer or package install
      /etc/cloudera-scm-server/db.mgmt.properties
    • Tarball install
      <tarball root>/etc/cloudera-scm-server/db.mgmt.properties
    if it exists.

scm_prepare_database.sh Syntax

scm_prepare_database.sh database-type [options] database-name username password
  Note: You can also run scm_prepare_database.sh without options to see the syntax.
Table 1. Required Parameters
Parameter Description
database-type One of the supported database types:
  • MySQL - mysql
  • Oracle - oracle
  • PostgreSQL - postgresql
database-name The name of the Cloudera Manager Server database you want to create or use.
username The username for the Cloudera Manager Server database you want to create or use.
password The password for the Cloudera Manager Server database you want to create or use. If you don't specify the password on the command line, the script will prompt you to enter it.
Table 2. Options
Option Description
-h or --host The IP address or hostname of the host where the database is installed. The default is to use the local host.
-P or --port The port number to use to connect to the database. The default port is 3306 for MySQL, 5432 for PostgreSQL, and 1521 for Oracle. This option is used for a remote connection only.
-u or --user The admin username for the database application. For -u, there should not be a space between the option and the provided value. If this option is supplied, the script will create a user and database for the Cloudera Manager Server; otherwise, it will use the existing user and database you created previously.
-p or --password The admin password for the database application. The default is no password. For -p, there should not be a space between the option and the provided value.
--scm-host The hostname where the Cloudera Manager Server is installed. Omit if the Cloudera Manager server and the database are installed on the same host.
--config-path The path to the Cloudera Manager Server configuration files. The default is /etc/cloudera-scm-server.
--schema-path The path to the Cloudera Manager schema files. The default is /usr/share/cmf/schema (the location of the script).
-f The script will not stop if an error is encountered.
-? or --help Display help.

Example 1: Running the script when MySQL is installed on another host

This example explains how to run the script on the Cloudera Manager Server host (myhost2) and create and use a temporary MySQL user account to connect to MySQL remotely on the MySQL host (myhost1).
  1. On myhost1's MySQL prompt, create a temporary user who can connect from myhost2:
    mysql> grant all on *.* to 'temp'@'%' identified by 'temp' with grant option;
    Query OK, 0 rows affected (0.00 sec)
  2. On the Cloudera Manager Server host (myhost2), run the script:
    $ sudo /usr/share/cmf/schema/scm_prepare_database.sh mysql -h myhost1.sf.cloudera.com -utemp -ptemp --scm-host myhost2.sf.cloudera.com scm scm scm
    Looking for MySQL binary
    Looking for schema files in /usr/share/cmf/schema
    Verifying that we can write to /etc/cloudera-scm-server
    Creating SCM configuration file in /etc/cloudera-scm-server
    Executing: /usr/java/jdk1.6.0_31/bin/java -cp /usr/share/java/mysql-connector-java.jar:/usr/share/cmf/schema/../lib/* com.cloudera.enterprise.dbutil.DbCommandExecutor /etc/cloudera-scm-server/db.properties com.cloudera.cmf.db.
    [ main] DbCommandExecutor INFO Successfully connected to database.
    All done, your SCM database is configured correctly!
  3. On myhost1, delete the temporary user:
    mysql> drop user 'temp'@'%';
    Query OK, 0 rows affected (0.00 sec)

Example 2: Running the script to configure Oracle

[root@rhel55-6 ~]# /usr/share/cmf/schema/scm_prepare_database.sh -h cm-oracle.example.com oracle orcl sample_user sample_pass
Verifying that we can write to /etc/cloudera-scm-server
Creating SCM configuration file in /etc/cloudera-scm-server
Executing: /usr/java/jdk1.6.0_31/bin/java -cp /usr/share/java/mysql-connector-java.jar:/usr/share/cmf/schema/../lib/* com.cloudera.enterprise.dbutil.DbCommandExecutor /etc/cloudera-scm-server/db.properties com.cloudera.cmf.db.
[ main] DbCommandExecutor INFO Successfully connected to database.
All done, your SCM database is configured correctly!

Example 3: Running the script when PostgreSQL is co-located with the Cloudera Manager Server

This example assumes that you have already created the Cloudera Management Server database and database user, naming both scm.
$ /usr/share/cmf/schema/scm_prepare_database.sh postgresql scm scm scm

External Databases for Activity Monitor, Reports Manager, Hive Metastore, Sentry Server, and Cloudera Navigator Audit Server

You can configure Cloudera Manager to use an external database for Activity Monitor, Reports Manager, Hive Metastore, Sentry Server, and Cloudera Navigator Audit Server. If you choose this option, you must create the databases before you run the Cloudera Manager installation wizard. For more information, see the instructions in  MySQL Database, Oracle Database, or External PostgreSQL Database.

External Databases for Hue, and Oozie

Hue and Oozie are automatically configured with databases, but you can configure these services to use external databases after Cloudera Manager is installed.

Configuring an External Database for Hue

By default Hue is configured to use the SQLite database. If you want to use an external database for Hue, see Using an External Database for Hue.

Configuring an External Database for Oozie

By default Oozie is configured to use the Derby database. If you want to use an external database for Oozie, see Using an External Database for Oozie.
Page generated September 3, 2015.