External PostgreSQL Database
If you want to use an external PostgreSQL database, follow these procedures.
Installing the External PostgreSQL Server
- If you already have a PostgreSQL database set up, you can skip to the section Configuring and Starting the PostgreSQL Server to verify that your PostgreSQL configurations meet the requirements for Cloudera Manager.
- It is important that the data directory, which by default is /var/lib/postgresql/data/, is on a partition that has sufficient free space.
- Use one or more of the following commands to set the locale:
export LANGUAGE=en_US.UTF-8 export LANG=en_US.UTF-8 export LC_ALL=en_US.UTF-8 locale-gen en_US.UTF-8 dpkg-reconfigure locales
- Install PostgreSQL packages:
- Red Hat
$ sudo yum install postgresql-server
- SLES
$ sudo zypper install postgresql91-server
Note: This command will install PostgreSQL 9.1. If you want to install a different version, you can use zypper search postgresql to search for available versions. You should install version 8.4 or higher. - Debian/Ubuntu
$ sudo apt-get install postgresql
- Red Hat
Configuring and Starting the PostgreSQL Server
By default, PostgreSQL only accepts connections on the loopback interface. You must reconfigure PostgreSQL to accept connections from the Fully Qualified Domain Name (FQDN) of the hosts hosting the management roles. If you do not make these changes, the management processes will not be able to connect to and use the database on which they depend.
- Initialize the external PostgreSQL database. For some versions of PostgreSQL, this is done automatically the first time that you start the PostgreSQL server. In this case, issue the command:
$ sudo service postgresql start
In other versions, you must explicitly initialize the database using:$ sudo service postgresql initdb
See the PostgreSQL documentation for more details. - Enable MD5 authentication. Edit pg_hba.conf, which is usually found in /var/lib/pgsql/data or /etc/postgresql/8.4/main. Add the following line:
host all all 127.0.0.1/32 md5
If the default pg_hba.conf file contains the following line:host all all 127.0.0.1/32 ident
then the host line specifying md5 authentication shown above must be inserted before this ident line. Failure to do so may cause an authentication error when running the scm_prepare_database.sh script. You can modify the contents of the md5 line shown above to support different configurations. For example, if you want to access PostgreSQL from a different host, replace 127.0.0.1 with your IP address and update postgresql.conf, which is typically found in the same place as pg_hba.conf to include:listen_addresses = '*'
- Configure settings to ensure your system performs as expected. Update these settings in the /var/lib/pgsql/data/postgresql.conf or /var/lib/postgresql/data/postgresql.conf file. Settings vary based on cluster size and resources.
- Small clusters - For small to mid-sized clusters, consider the following suggestions as a starting point for settings. If resources are especially limited, consider reducing the buffer sizes and checkpoint segments further. Ongoing tuning may be required based on each host's resource utilization. For example, if Cloudera Manager is running on the same host as other roles, the following values may be acceptable:
- shared_buffers - 256MB
- wal_buffers - 8MB
- checkpoint_segments - 16
- checkpoint_completion_target - 0.9
- Large clusters - may contain up to 1000 hosts. For large clusters consider the following suggestions as a starting point for settings.
- max_connection - For large clusters, each database is typically hosted on a different host. The general rule is to allow each database on a host 100 maximum connections and then add 50 extra connections. As a result, in the normal case for large clusters, configure each of the five hosts that hosts a single database for 150 connections. You may have to increase the system resources available to PostgreSQL, as described at Connection Settings.
- shared_buffers - 1024MB. This requires that the operating system can allocate sufficient shared memory. See PostgreSQL information on Managing Kernel Resources for more information on setting kernel resources.
- wal_buffers - 16MB. This value is derived from the shared_buffers value. Setting wal_buffers to be approximately 3% of shared_buffers up to a maximum of approximately 16MB works well in most case.
- checkpoint_segments - 128. The PostgreSQL Tuning Guide recommends values between 32 and 256 for write-intensive systems, such as this one.
- checkpoint_completion_target - 0.9. This setting is only available in PostgreSQL 8.3 and later. These versions are highly recommended.
- Small clusters - For small to mid-sized clusters, consider the following suggestions as a starting point for settings. If resources are especially limited, consider reducing the buffer sizes and checkpoint segments further. Ongoing tuning may be required based on each host's resource utilization. For example, if Cloudera Manager is running on the same host as other roles, the following values may be acceptable:
- Configure the PostgreSQL server to start at boot.
- Red Hat
$ sudo /sbin/chkconfig postgresql on $ sudo /sbin/chkconfig --list postgresql postgresql 0:off 1:off 2:on 3:on 4:on 5:on 6:off
- SLES
$ sudo chkconfig --add postgresql
- Debian/Ubuntu
$ sudo chkconfig postgresql on
- Red Hat
- Start or restart the PostgreSQL database:
$ sudo service postgresql restart
Creating Databases for Cloudera Manager, Hive Metastore, and Cloudera Navigator
- If you are not using the Cloudera Manager installer, the Cloudera Manager Server.
- Cloudera Management Service roles:
- Activity Monitor (if using the MapReduce service)
- Reports Manager
- Each Hive Metastore
- Cloudera Navigator Audit Server
You can create these databases on the host where the Cloudera Manager Server will run, or on any other hosts in the cluster. For performance reasons, you should typically install each database on the host on which the service runs, as determined by the roles you will assign during installation or upgrade. In larger deployments or in cases where database administrators are managing the databases the services will use, databases may be separated from services, but do not undertake such an implementation lightly.
The database must be configured to support UTF-8 character set encoding.
Note the values you enter for database names, user names, and passwords. The Cloudera Manager installation wizard requires this information to correctly connect to these databases.
- Connect to PostgreSQL:
$ sudo -u postgres psql
- If you are not using the Cloudera Manager installer, create a database for the Cloudera Manager Server. The database name, user name, and password can be anything you want. Be sure to note the names chosen, as you will need to supply them later when running the scm_prepare_database.sh script.
postgres=# CREATE ROLE scm LOGIN PASSWORD 'scm'; postgres=# CREATE DATABASE scm OWNER scm ENCODING 'UTF8';
- Create databases for the Activity Monitor, Reports Manager, Hive Metastore, and Cloudera Navigator Audit Server:
postgres=# CREATE ROLE <user> LOGIN PASSWORD '<password>'; postgres=# CREATE DATABASE <databaseName> OWNER <user> ENCODING 'UTF8';
where <database>, <user>, and <password> can be anything you want. The examples shown match the default names provided in the Cloudera Manager configuration settings:For PostgreSQL 8.2.23 or later, also do:Role Database User Password Activity Monitor amon amon amon_password Reports Manager rman rman rman_password Cloudera Navigator Audit Server nav nav nav_password Hive Metastore metastore hive hive_password postgres=# ALTER DATABASE Metastore SET standard_conforming_strings = off;
Backing up External PostgreSQL Databases
- Log in to the host where the Cloudera Manager Server is installed.
- Run the following command as root:
cat /etc/cloudera-scm-server/db.properties. The db.properties file contains: # Auto-generated by scm_prepare_database.sh # Mon Jul 27 22:36:36 PDT 2011 com.cloudera.cmf.db.type=postgresql com.cloudera.cmf.db.host=localhost:7432 com.cloudera.cmf.db.name=scm com.cloudera.cmf.db.user=scm com.cloudera.cmf.db.password=NnYfWIjlbk
- Run the following command as root using the parameters from the preceding step:
# pg_dump -h localhost -p 7432 -U scm > /tmp/scm_server_db_backup.$(date +%Y%m%d)
- Enter the password specified for the com.cloudera.cmf.db.password property on the last line of the db.properties file. If you are using the embedded database, Cloudera Manager generated the password for you during installation. If you are using an external database, enter the appropriate information for your database.
<< Embedded PostgreSQL Database | MySQL Database >> | |