Cloudera Manager and Managed Service Databases
Cloudera Manager uses databases to store information about the Cloudera Manager configuration, as well as information such as the health of the system or task progress. To facilitate rapid completion of simple installations, the Cloudera Manager can install and configure an embedded PostgreSQL database as part of the Cloudera Manager installation process. This automatically installed database is referred to as an embedded PostgreSQL database. In addition, some CDH services use databases and are automatically configured to use a default database. If you plan to use the embedded and default databases provided during the Cloudera Manager installation, see Installation Path A - Automated Installation by Cloudera Manager.
While the embedded database is a useful option for getting started quickly, Cloudera Manager also allows you to opt to use your own PostgreSQL, MySQL, or Oracle database for the Cloudera Manager Server and services that use databases. To learn more about database options or if you are unsure whether or not using the embedded database is right for your environment, continue with the following sections.
What Databases Must Be Installed
- Cloudera Manager - Contains all the information about what services you have configured, their role assignments, all configuration history, commands, users, and running processes. This is a relatively small database (<100 MB), and is the most important to back up. A monitoring database contains monitoring information about service and host status. In large clusters, this database can grow large.
- Activity Monitor - Contains information about past activities. In large clusters, this database can grow large.
- Reports Manager - Keeps track of disk utilization and processing activities over time. Medium-sized.
- Hive Metastore - Contains Hive metadata. Relatively small.
- Sentry Server - Contains authorization metadata. Relatively small.
- Cloudera Navigator Audit Server - Contains auditing information. In large clusters, this database can grow large.
- Path A automatically installs an embedded PostgreSQL database to meet the requirements of the services. This path reduces the number of installation tasks you must complete, as well as the number of choices to make. In Path A you can also optionally choose to create external databases for Activity Monitor, Reports Manager, Hive Metastore, Sentry Server, and Cloudera Navigator Audit Server.
- Path B and Path C require you to create databases for the Cloudera Manager Server, Activity Monitor, Reports Manager, Hive Metastore, Sentry Server, and Cloudera Navigator Audit Server.
Using an external database requires more input and intervention as you either install databases or gather information about existing databases. These paths also provides greater flexibility in choosing database types and configurations.
Cloudera Manager supports deploying different types of databases in a single environment, but doing so may create unexpected complications. Cloudera recommends choosing one of the supported database providers to use for all of the Cloudera databases.
In most cases, you should install databases and services on the same host. For example, if you create the database for Activity Monitor on myhost1, then you should typically assign the Activity Monitor role to myhost1. You will assign the Activity Monitor and Reports Manager roles in the Cloudera Manager wizard during the install or upgrade process. After completing the install or upgrade process, you can also modify role assignments in the Management services pages of Cloudera Manager. While it is true that database location is changeable, before beginning an installation or upgrade, you should decide which hosts you will use. The JDBC connector for your database must be installed on the hosts where you assign the Activity Monitor and Reports Manager roles.
It is possible to install the database and services on different hosts. Separating databases from services is more likely to occur in larger deployments and in cases where more sophisticated database administrators actively choose to establish such a configuration. For example, databases and services might be separated if your environment includes Oracle databases that will be separately managed by Oracle database administrators.
Setting up the Cloudera Manager Server Database
The Cloudera Manager Server database stores information about service and host configurations. You can use an embedded PostgreSQL database or an external database.
Installing and Starting the Cloudera Manager Server Embedded Database
- Install the embedded PostgreSQL database packages:
- Red
Hat-compatible, if you have a yum repo configured:
$ sudo yum install cloudera-manager-server-db-2
- Red
Hat-compatible, if you're transferring RPMs manually:
$ sudo yum --nogpgcheck localinstall cloudera-manager-server-db-2.noarch.rpm
- SLES:
$ sudo zypper install cloudera-manager-server-db-2
- Debian/Ubuntu
$ sudo apt-get install cloudera-manager-server-db-2
- Red
Hat-compatible, if you have a yum repo configured:
- Start the PostgreSQL
database:
$ sudo service cloudera-scm-server-db start
Preparing an Cloudera Manager Server External Database
Before performing these steps, install and configure a database as described in MySQL Database, Oracle Database, or External PostgreSQL Database.
- Run the scm_prepare_database.sh script:
- Installer or package install
/usr/share/cmf/schema/scm_prepare_database.sh
- Tarball install
<tarball root>/share/cmf/schema/scm_prepare_database.sh
- Creating the Cloudera Manager Server database configuration file.
- Creating a database for the Cloudera Manager Server to use. This is optional and is only completed if options are specified.
- Setting up a user account for the Cloudera Manager Server. This is optional and is only completed if options are specified.
- Installer or package install
- Remove the embedded PostgreSQL properties file:
- Installer or package install
/etc/cloudera-scm-server/db.mgmt.properties
- Tarball install
<tarball root>/etc/cloudera-scm-server/db.mgmt.properties
- Installer or package install
scm_prepare_database.sh Syntax
scm_prepare_database.sh database-type [options] database-name username password
Parameter | Description |
---|---|
database-type | One of the supported database types:
|
database-name | The name of the Cloudera Manager Server database you want to create or use. |
username | The username for the Cloudera Manager Server database you want to create or use. |
password | The password for the Cloudera Manager Server database you want to create or use. If you don't specify the password on the command line, the script will prompt you to enter it. |
Option | Description |
---|---|
-h or --host | The IP address or hostname of the host where the database is installed. The default is to use the local host. |
-P or --port | The port number to use to connect to the database. The default port is 3306 for MySQL, 5432 for PostgreSQL, and 1521 for Oracle. This option is used for a remote connection only. |
-u or --user | The admin username for the database application. For -u, there should not be a space between the option and the provided value. If this option is supplied, the script will create a user and database for the Cloudera Manager Server; otherwise, it will use the existing user and database you created previously. |
-p or --password | The admin password for the database application. The default is no password. For -p, there should not be a space between the option and the provided value. |
--scm-host | The hostname where the Cloudera Manager Server is installed. Omit if the Cloudera Manager server and the database are installed on the same host. |
--config-path | The path to the Cloudera Manager Server configuration files. The default is /etc/cloudera-scm-server. |
--schema-path | The path to the Cloudera Manager schema files. The default is /usr/share/cmf/schema (the location of the script). |
-f | The script will not stop if an error is encountered. |
-? or --help | Display help. |
Example 1: Running the script when MySQL is installed on another host
- On myhost1's MySQL prompt, create a temporary user who can connect from myhost2:
mysql> grant all on *.* to 'temp'@'%' identified by 'temp' with grant option; Query OK, 0 rows affected (0.00 sec)
- On the Cloudera Manager Server host (myhost2), run the script:
$ sudo /usr/share/cmf/schema/scm_prepare_database.sh mysql -h myhost1.sf.cloudera.com -utemp -ptemp --scm-host myhost2.sf.cloudera.com scm scm scm Looking for MySQL binary Looking for schema files in /usr/share/cmf/schema Verifying that we can write to /etc/cloudera-scm-server Creating SCM configuration file in /etc/cloudera-scm-server Executing: /usr/java/jdk1.6.0_31/bin/java -cp /usr/share/java/mysql-connector-java.jar:/usr/share/cmf/schema/../lib/* com.cloudera.enterprise.dbutil.DbCommandExecutor /etc/cloudera-scm-server/db.properties com.cloudera.cmf.db. [ main] DbCommandExecutor INFO Successfully connected to database. All done, your SCM database is configured correctly!
- On myhost1, delete the temporary user:
mysql> drop user 'temp'@'%'; Query OK, 0 rows affected (0.00 sec)
Example 2: Running the script to configure Oracle
[root@rhel55-6 ~]# /usr/share/cmf/schema/scm_prepare_database.sh -h cm-oracle.example.com oracle orcl sample_user sample_pass Verifying that we can write to /etc/cloudera-scm-server Creating SCM configuration file in /etc/cloudera-scm-server Executing: /usr/java/jdk1.6.0_31/bin/java -cp /usr/share/java/mysql-connector-java.jar:/usr/share/cmf/schema/../lib/* com.cloudera.enterprise.dbutil.DbCommandExecutor /etc/cloudera-scm-server/db.properties com.cloudera.cmf.db. [ main] DbCommandExecutor INFO Successfully connected to database. All done, your SCM database is configured correctly!
Example 3: Running the script when PostgreSQL is co-located with the Cloudera Manager Server
This example assumes that you have already created the Cloudera Management Server database and database user, naming both scm.$ /usr/share/cmf/schema/scm_prepare_database.sh postgresql scm scm scm
External Databases for Activity Monitor, Reports Manager, Hive Metastore, Sentry Server, and Cloudera Navigator Audit Server
External Databases for Hue, and Oozie
Hue and Oozie are automatically configured with databases, but you can configure these services to use external databases after Cloudera Manager is installed.
Configuring an External Database for Hue
By default Hue is configured to use the SQLite database. If you want to use an external database for Hue, see Using an External Database for Hue.Configuring an External Database for Oozie
By default Oozie is configured to use the Derby database. If you want to use an external database for Oozie, see Using an External Database for Oozie.<< Permission Requirements | Embedded PostgreSQL Database >> | |