Cloudera Manager and Managed Service Data Stores
Cloudera Manager uses various databases and data stores to store information about the Cloudera Manager configuration, as well as information such as the health of the system or task progress. For quick, simple installations, Cloudera Manager can install and configure an embedded PostgreSQL database as part of the Cloudera Manager installation process. In addition, some CDH services use databases and are automatically configured to use a default database. If you plan to use the embedded and default databases provided during the Cloudera Manager installation, see Installation Path A - Automated Installation by Cloudera Manager.
Although the embedded database is useful for getting started quickly, you can also use your own PostgreSQL, MySQL, or Oracle database for the Cloudera Manager Server and services that use databases.
For information about planning, managing, and backing up Cloudera Manager data stores, see Storage Space Planning for Cloudera Manager.
Required Databases
- Cloudera Manager - Contains all the information about services you have configured and their role assignments, all configuration history, commands, users, and running processes. This relatively small database (<100 MB) is the most important to back up.
- Oozie Server - Contains Oozie workflow, coordinator, and bundle data. Can grow very large.
- Sqoop Server - Contains entities such as the connector, driver, links and jobs. Relatively small.
- Activity Monitor - Contains information about past activities. In large clusters, this database can grow large. Configuring an Activity Monitor database is only necessary if a MapReduce service is deployed.
- Reports Manager - Tracks disk utilization and processing activities over time. Medium-sized.
- Hive Metastore Server - Contains Hive metadata. Relatively small.
- Sentry Server - Contains authorization metadata. Relatively small.
- Cloudera Navigator Audit Server - Contains auditing information. In large clusters, this database can grow large.
- Cloudera Navigator Metadata Server - Contains authorization, policies, and audit report metadata. Relatively small.
The Cloudera Manager Service Host Monitor and Service Monitor roles have an internal datastore.
- Path A automatically installs an embedded PostgreSQL database to meet the requirements of the services. This path reduces the number of installation tasks to complete and choices to make. In Path A you use the embedded PostgreSQL database for the Cloudera Manager Server and can optionally choose to create external databases for Oozie Server, Activity Monitor, Reports Manager, Hive Metastore Server, Sentry Server, Cloudera Navigator Audit Server, and Cloudera Navigator Metadata Server. If you choose to use PostgreSQL for Sqoop Server you must create an external database.
- Path B and Path C require you to create databases for the Cloudera Manager Server, Oozie Server, Activity Monitor, Reports Manager, Hive Metastore Server, Sentry Server, Cloudera Navigator Audit Server, and Cloudera Navigator Metadata Server. If you choose to use PostgreSQL for Sqoop Server you must create an external database.
Using an external database requires more input and intervention as you install databases or gather information about existing ones. These paths also provide greater flexibility in choosing database types and configurations.
Cloudera Manager supports deploying different types of databases in a single environment, but doing so can create unexpected complications. Cloudera recommends choosing one supported database provider for all of the Cloudera databases.
In most cases, you should install databases and services on the same host. For example, if you create the database for Activity Monitor on myhost1, then you should typically assign the Activity Monitor role to myhost1. You assign the Activity Monitor and Reports Manager roles in the Cloudera Manager wizard during the installation or upgrade process. After completing the installation or upgrade process, you can also modify role assignments in the Management services pages of Cloudera Manager. Although the database location is changeable, before beginning an installation or upgrade, you should decide which hosts to use. The JDBC connector for your database must be installed on the hosts where you assign the Activity Monitor and Reports Manager roles.
You can install the database and services on different hosts. Separating databases from services is more likely in larger deployments and in cases where more sophisticated database administrators choose such a configuration. For example, databases and services might be separated if your environment includes Oracle databases that are managed separately by Oracle database administrators.
Setting up the Cloudera Manager Server Database
The Cloudera Manager Server database stores information about service and host configurations. You can use an embedded PostgreSQL database or an external database.
Installing and Starting the Cloudera Manager Server Embedded Database
- Install the embedded PostgreSQL database packages:
OS Command Red Hat-compatible, if you have a yum repo configured $ sudo yum install cloudera-manager-server-db-2
Red Hat-compatible, if you're transferring RPMs manually sudo yum --nogpgcheck localinstall cloudera-manager-server-db-2.noarch.rpm
SLES $ sudo zypper install cloudera-manager-server-db-2
Ubuntu or Debian $ sudo apt-get install cloudera-manager-server-db-2
- Start the PostgreSQL database:
$ sudo service cloudera-scm-server-db start
Preparing a Cloudera Manager Server External Database
Before performing these steps, install and configure a database server as described in Configuring and Starting the MySQL Server, Configuring the Oracle Server, or Configuring and Starting the PostgreSQL Server.
- Run the scm_prepare_database.sh script on the
host where the Cloudera Manager Server package is installed:
- Installer or package install
/usr/share/cmf/schema/scm_prepare_database.sh database-type [options] database-name username password
- Tarball install
<tarball root>/share/cmf/schema/scm_prepare_database.sh database-type [options] database-name username password
- Creating the Cloudera Manager Server database configuration file.
- (MySQL and PostgreSQL) Creating a database for the Cloudera Manager Server to use.
- (MySQL and PostgreSQL) Setting up a user account for the Cloudera Manager Server.
- Installer or package install
- Remove the embedded PostgreSQL properties file if it exists:
- Installer or package install
/etc/cloudera-scm-server/db.mgmt.properties
- Tarball install
<tarball root>/etc/cloudera-scm-server/db.mgmt.properties
- Installer or package install
scm_prepare_database.sh Syntax
scm_prepare_database.sh database-type [options] database-name username password
Parameter | Description |
---|---|
database-type | One of the supported database types:
|
database-name | The name of the Cloudera Manager Server database to create or use. |
username | The username for the Cloudera Manager Server database to create or use. |
password | The password for the Cloudera Manager Server database to create or use. If you do not specify the password on the command line, the script prompts you to enter it. |
Option | Description |
---|---|
-h or --host | The IP address or hostname of the host where the database is installed. The default is to use the local host. |
-P or --port | The port number to use to connect to the database. The default port is 3306 for MySQL, 5432 for PostgreSQL, and 1521 for Oracle. This option is used for a remote connection only. |
-u or --user | The admin username for the database application. For -u, no space occurs between the option and the provided value. If this option is supplied, the script creates a user and database for the Cloudera Manager Server; otherwise, it uses the user and database you created previously. |
-p or --password | The admin password for the database application. The default is no password. For -p, no space occurs between the option and the provided value. |
--scm-host | The hostname where the Cloudera Manager Server is installed. Omit if the Cloudera Manager Server and the database are installed on the same host. |
--config-path | The path to the Cloudera Manager Server configuration files. The default is /etc/cloudera-scm-server. |
--schema-path | The path to the Cloudera Manager schema files. The default is /usr/share/cmf/schema (the location of the script). |
-f | The script does not stop if an error occurs. |
-? or --help | Display help. |
Example 1: Running the script when MySQL is installed on another host
- At the myhost1 MySQL prompt, create a temporary user who can connect from myhost2:
mysql> grant all on *.* to 'temp'@'%' identified by 'temp' with grant option; Query OK, 0 rows affected (0.00 sec)
- On the Cloudera Manager Server host (myhost2), run the script:
$ sudo /usr/share/cmf/schema/scm_prepare_database.sh mysql -h myhost1.sf.cloudera.com -utemp -ptemp --scm-host myhost2.sf.cloudera.com scm scm scm Looking for MySQL binary Looking for schema files in /usr/share/cmf/schema Verifying that we can write to /etc/cloudera-scm-server Creating SCM configuration file in /etc/cloudera-scm-server Executing: /usr/java/jdk1.6.0_31/bin/java -cp /usr/share/java/mysql-connector-java.jar:/usr/share/cmf/schema/../lib/* com.cloudera.enterprise.dbutil.DbCommandExecutor /etc/cloudera-scm-server/db.properties com.cloudera.cmf.db. [ main] DbCommandExecutor INFO Successfully connected to database. All done, your SCM database is configured correctly!
- On myhost1, delete the temporary user:
mysql> drop user 'temp'@'%'; Query OK, 0 rows affected (0.00 sec)
Example 2: Running the script to configure Oracle
[root@rhel55-6 ~]# /usr/share/cmf/schema/scm_prepare_database.sh -h cm-oracle.example.com oracle orcl sample_user sample_pass Verifying that we can write to /etc/cloudera-scm-server Creating SCM configuration file in /etc/cloudera-scm-server Executing: /usr/java/jdk1.6.0_31/bin/java -cp /usr/share/java/mysql-connector-java.jar:/usr/share/cmf/schema/../lib/* com.cloudera.enterprise.dbutil.DbCommandExecutor /etc/cloudera-scm-server/db.properties com.cloudera.cmf.db. [ main] DbCommandExecutor INFO Successfully connected to database. All done, your SCM database is configured correctly!
External Databases for Oozie Server, Sqoop Server, Activity Monitor, Reports Manager, Hive Metastore Server, Sentry Server, Cloudera Navigator Audit Server, and Cloudera Navigator Metadata Server
External Databases for Hue
By default Hue is configured to use the SQLite database. Cloudera strongly recommends an external database for clusters with multiple Hue users. See Using an External Database for Hue Using Cloudera Manager.