Step 3: Install and Configure Databases

Some CDH components require a database for operation. Although you can deploy different types of databases in a single environment, doing so can create unexpected complications. Cloudera recommends choosing one supported database provider for all of the Cloudera databases.

Cloudera recommends installing the databases on hosts separate from the services. Separating databases from services helps isolate the impact of a failure or resource contention in either one. It can also simplify management in organizations that have dedicated database administrators.

You can use your own PostgreSQL, MariaDB, MySQL, or Oracle database for services that use databases.

Required Databases

The following components all require databases: Oozie Server, Sqoop Server, Hive Metastore Server, Hue Server, and Sentry Server. The type of data contained in the databases and their relative sizes are as follows:

  • Oozie Server - Contains Oozie workflow, coordinator, and bundle data. Can grow very large.
  • Sqoop Server - Contains entities such as the connector, driver, links, and jobs. Relatively small.
  • Hive Metastore Server - Contains Hive metadata. Relatively small.
  • Hue Server - Contains user account information, job submissions, and Hive queries. Relatively small.
  • Sentry Server - Contains authorization metadata. Relatively small.
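
For planning purposes, the inventory above can be captured in a small script. The following is a minimal sketch, not part of any Cloudera tooling: the database names (oozie, sqoop, metastore, hue, sentry) and both helper functions are illustrative assumptions, and the generic CREATE DATABASE statement it prints fits PostgreSQL, MariaDB, and MySQL only. Oracle organizes data by user and schema, so it needs a different procedure.

```python
# Inventory of the components listed above and the expected size of each database.
# The db_name values are illustrative placeholders, not names mandated by CDH.
REQUIRED_DATABASES = {
    "Oozie Server": {"db_name": "oozie", "expected_size": "large"},
    "Sqoop Server": {"db_name": "sqoop", "expected_size": "small"},
    "Hive Metastore Server": {"db_name": "metastore", "expected_size": "small"},
    "Hue Server": {"db_name": "hue", "expected_size": "small"},
    "Sentry Server": {"db_name": "sentry", "expected_size": "small"},
}


def create_statements(inventory):
    """Return one generic CREATE DATABASE statement per component (hypothetical helper)."""
    return [f"CREATE DATABASE {spec['db_name']};" for spec in inventory.values()]


def needs_capacity_planning(inventory):
    """Return the components whose databases are expected to grow large (hypothetical helper)."""
    return [name for name, spec in inventory.items() if spec["expected_size"] == "large"]


if __name__ == "__main__":
    for statement in create_statements(REQUIRED_DATABASES):
        print(statement)
    print("Plan extra storage for:", ", ".join(needs_capacity_planning(REQUIRED_DATABASES)))
```

Keeping the list in one place makes it easier to confirm that every component has a database on the chosen provider and to flag the Oozie database for extra storage.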

Installing and Configuring Databases

To install and configure databases for CDH, see the instructions for the type of database you want to use: