External PostgreSQL Database

Installing the External PostgreSQL Server

  1. Use one or more of the following commands to set the locale:
    export LANGUAGE=en_US.UTF-8
    export LANG=en_US.UTF-8
    export LC_ALL=en_US.UTF-8
    locale-gen en_US.UTF-8
    dpkg-reconfigure locales
  2. Install PostgreSQL packages:
    • RHEL
      $ sudo yum install postgresql-server
    • SLES
      $ sudo zypper install postgresql91-server
    • Debian/Ubuntu
      $ sudo apt-get install postgresql

Configuring and Starting the PostgreSQL Server

By default, PostgreSQL only accepts connections on the loopback interface. You must reconfigure PostgreSQL to accept connections from the Fully Qualified Domain Name (FQDN) of the hosts hosting the management roles. If you do not make these changes, the management processes cannot connect to and use the database on which they depend.

  1. Initialize the external PostgreSQL database. For some versions of PostgreSQL, this occurs automatically the first time that you start the PostgreSQL server. In this case, issue the command:
    $ sudo service postgresql start
    In other versions, you must explicitly initialize the database using:
    $ sudo service postgresql initdb
    See the PostgreSQL documentation for more details.
  2. Enable MD5 authentication. Edit pg_hba.conf, which is usually found in /var/lib/pgsql/data or /etc/postgresql/8.4/main. Add the following line:
    host all all 127.0.0.1/32 md5
    If the default pg_hba.conf file contains the following line:
    host all all 127.0.0.1/32 ident
    then the host line specifying md5 authentication shown above must be inserted before this ident line. Failure to do so may cause an authentication error when running the scm_prepare_database.sh script. You can modify the contents of the md5 line shown above to support different configurations. For example, if you want to access PostgreSQL from a different host, replace 127.0.0.1 with your IP address and update postgresql.conf, which is typically found in the same place as pg_hba.conf, to include:
    listen_addresses = '*'
  3. Configure settings to ensure your system performs as expected. Update these settings in the /var/lib/pgsql/data/postgresql.conf or /var/lib/postgresql/data/postgresql.conf file. Settings vary based on cluster size and resources as follows:
    • Small to mid-sized clusters - Consider the following settings as starting points. If resources are limited, consider reducing the buffer sizes and checkpoint segments further. Ongoing tuning may be required based on each host's resource utilization. For example, if the Cloudera Manager Server is running on the same host as other roles, the following values may be acceptable:
      • shared_buffers - 256MB
      • wal_buffers - 8MB
      • checkpoint_segments - 16
      • checkpoint_completion_target - 0.9
    • Large clusters - Can contain up to 1000 hosts. Consider the following settings as starting points.
      • max_connection - For large clusters, each database is typically hosted on a different host. In general, allow each database on a host 100 maximum connections and then add 50 extra connections. You may have to increase the system resources available to PostgreSQL, as described at Connection Settings.
      • shared_buffers - 1024 MB. This requires that the operating system can allocate sufficient shared memory. See PostgreSQL information on Managing Kernel Resources for more information on setting kernel resources.
      • wal_buffers - 16 MB. This value is derived from the shared_buffers value. Setting wal_buffers to be approximately 3% of shared_buffers up to a maximum of approximately 16 MB is sufficient in most cases.
      • checkpoint_segments - 128. The PostgreSQL Tuning Guide recommends values between 32 and 256 for write-intensive systems, such as this one.
      • checkpoint_completion_target - 0.9. This setting is only available in PostgreSQL versions 8.3 and higher, which are highly recommended.
  4. Configure the PostgreSQL server to start at boot.
    • RHEL
      $ sudo /sbin/chkconfig postgresql on
      $ sudo /sbin/chkconfig --list postgresql
      postgresql          0:off   1:off   2:on    3:on    4:on    5:on    6:off
    • SLES
      $ sudo chkconfig --add postgresql
    • Debian/Ubuntu
      $ sudo chkconfig postgresql on
  5. Start or restart the PostgreSQL database:
    $ sudo service postgresql restart

Creating Databases for Activity Monitor, Reports Manager, Hive Metastore Server, Sentry Server, Cloudera Navigator Audit Server, and Cloudera Navigator Metadata Server

Create databases and user accounts for components that require databases:
  • If you are not using the Cloudera Manager installer, the Cloudera Manager Server.
  • Cloudera Management Service roles:
    • Activity Monitor (if using the MapReduce service)
    • Reports Manager
  • Each Hive metastore
  • Sentry Server
  • Cloudera Navigator Audit Server
  • Cloudera Navigator Metadata Server

You can create these databases on the host where the Cloudera Manager Server will run, or on any other hosts in the cluster. For performance reasons, you should install each database on the host on which the service runs, as determined by the roles you assign during installation or upgrade. In larger deployments or in cases where database administrators are managing the databases the services use, you can separate databases from services, but use caution.

The database must be configured to support UTF-8 character set encoding.

Record the values you enter for database names, user names, and passwords. The Cloudera Manager installation wizard requires this information to correctly connect to these databases.

  1. Connect to PostgreSQL:
    $ sudo -u postgres psql
  2. If you are not using the Cloudera Manager installer, create a database for the Cloudera Manager Server. The database name, user name, and password can be any value. Record the names chosen because you will need them later when running the scm_prepare_database.sh script.
    postgres=# CREATE ROLE scm LOGIN PASSWORD 'scm';
    postgres=# CREATE DATABASE scm OWNER scm ENCODING 'UTF8';
  3. Create databases for Activity Monitor, Reports Manager, Hive Metastore Server, Sentry Server, Cloudera Navigator Audit Server, and Cloudera Navigator Metadata Server:
    postgres=# CREATE ROLE user LOGIN PASSWORD 'password';
    postgres=# CREATE DATABASE databaseName OWNER user ENCODING 'UTF8';
    where user, password, and databaseName can be any value. The examples shown match the default names provided in the Cloudera Manager configuration settings:
    Role Database User Password
    Activity Monitor amon amon amon_password
    Reports Manager rman rman rman_password
    Hive Metastore Server metastore hive hive_password
    Sentry Server sentry sentry sentry_password
    Cloudera Navigator Audit Server nav nav nav_password
    Cloudera Navigator Metadata Server navms navms navms_password
    For PostgreSQL 8.2.23 or higher, also run:
    postgres=# ALTER DATABASE Metastore SET standard_conforming_strings = off;

Return to Establish Your Cloudera Manager Repository Strategy.

Configuring the Hue Server to Store Data in PostgreSQL

For information about installing and configuring an external PostgreSQL database , see External PostgreSQL Database.

  1. In the Cloudera Manager Admin Console, go to the Hue service status page.
  2. Select Actions > Stop. Confirm you want to stop the service by clicking Stop.
  3. Select Actions > Dump Database. Confirm you want to dump the database by clicking Dump Database.
  4. Note the host to which the dump was written under Step in the Dump Database Command window. You can also find it by selecting Commands > Recent Commands > Dump Database.
  5. Open a terminal window for the host and go to the dump file in /tmp/hue_database_dump.json.
  6. Remove all JSON objects with useradmin.userprofile in the model field, for example:
    {
    "pk": 14,
    "model": "useradmin.userprofile",
    "fields":
    { "creation_method": "EXTERNAL", "user": 14, "home_directory": "/user/tuser2" }
    },
  7. Install the PostgreSQL server.

    RHEL

    $ sudo yum install postgresql-server

    SLES

    $ sudo zypper install postgresql-server

    Ubuntu or Debian

    $ sudo apt-get install postgresql
  8. Initialize the data directories.
    $ service postgresql initdb
  9. Configure client authentication.
    1. Edit /var/lib/pgsql/data/pg_hba.conf.
    2. Set the authentication methods for local to trust and for host to password and add the following line at the end.
      host
      hue
      hue
      0.0.0.0/0
      md5
  10. Start the PostgreSQL server.
    $ su - postgres
    # /usr/bin/postgres -D /var/lib/pgsql/data > logfile 2>&1 &
    # exit
  11. Configure PostgreSQL to listen on all network interfaces.
    1. Edit /var/lib/pgsql/data/postgresql.conf and set list_addresses.
      listen_addresses = ‘0.0.0.0’     # Listen on all addresses
  12. Create the hue database and grant privileges to a hue user to manage the database.
    # psql -U postgres
    postgres=# create database hue;
    postgres=# \c hue;
    You are now connected to database 'hue'.
    postgres=# create user hue with password 'secretpassword';
    postgres=# grant all privileges on database hue to hue;
    postgres=# \q
  13. Restart the PostgreSQL server.
    $ sudo service postgresql restart
  14. Verify connectivity.
    su - postgres
    # psql –h localhost –U hue –d hue 
    Password for user hue: secretpassword
    hue=> \q
    exit
  15. Configure the PostgreSQL server to start at boot.

    RHEL

    $ sudo /sbin/chkconfig postgresql on
    $ sudo /sbin/chkconfig --list postgresql
    postgresql          0:off   1:off   2:on    3:on    4:on    5:on    6:off

    SLES

    $ sudo chkconfig --add postgresql

    Ubuntu or Debian

    $ sudo chkconfig postgresql on
  16. Configure the Hue database:
    1. In the Cloudera Manager Admin Console, click the HUE service.
    2. Click the Configuration tab.
    3. Select Scope > Hue Server.
    4. Select Category > Advanced.
    5. Set Hue Server Advanced Configuration Snippet (Safety Valve) for hue_safety_valve_server.ini with the following:
      [desktop]
      [[database]]
      engine=postgresql_psycopg2
      name=hue
      host=localhost
      port=5432
      user=hue
      password=secretpassword
    6. Click Save Changes.
  17. Optionally restore the Hue data to the new database:
    1. Select Actions > Synchronize Database.
    2. Determine the foreign key ID.
      bash# su – postgres
      $ psql –h localhost –U hue –d hue
      postgres=# \d auth_permission;
    3. Drop the foreign key that you retrieved in the previous step.
      postgres=# ALTER TABLE auth_permission DROP CONSTRAINT content_type_id_refs_id_XXXXXX;
    4. Delete the rows in the django_content_type table.
      postgres=# TRUNCATE django_content_type CASCADE;
    5. In Hue service instance page, Actions > Load Database. Confirm you want to load the database by clicking Load Database.
    6. Add back the foreign key you dropped.
      bash# su – postgres
      $ psql –h localhost –U hue –d hue 
      postgres=# ALTER TABLE auth_permission ADD CONSTRAINT content_type_id_refs_id_XXXXXX FOREIGN KEY (content_type_id) REFERENCES django_content_type(id) DEFERRABLE INITIALLY DEFERRED;
  18. Start the Hue service.

Configuring PostgreSQL for Oozie

Install PostgreSQL

Create the Oozie User and Oozie Database

For example, using the PostgreSQL psql command-line tool:

$ psql -U postgres
Password for user postgres: *****

postgres=# CREATE ROLE oozie LOGIN ENCRYPTED PASSWORD 'oozie' 
 NOSUPERUSER INHERIT CREATEDB NOCREATEROLE;
CREATE ROLE

postgres=# CREATE DATABASE "oozie" WITH OWNER = oozie
 ENCODING = 'UTF8'
 TABLESPACE = pg_default
 LC_COLLATE = 'en_US.UTF-8'
 LC_CTYPE = 'en_US.UTF-8'
 CONNECTION LIMIT = -1;
CREATE DATABASE

postgres=# \q

Configure PostgreSQL to Accept Network Connections for the Oozie User

  1. Edit the postgresql.conf file and set the listen_addresses property to *, to make sure that the PostgreSQL server starts listening on all your network interfaces. Also make sure that the standard_conforming_strings property is set to off.
  2. Edit the PostgreSQL data/pg_hba.conf file as follows:
    host    oozie         oozie         0.0.0.0/0             md5

Reload the PostgreSQL Configuration

$ sudo -u postgres pg_ctl reload -s -D /opt/PostgreSQL/8.4/data