Installing dependencies for Hue before upgrading to CDP
If you want to use PostgreSQL as a backend database to store Hue metadata, job and query history, and account information, then you must install PostgreSQL server, Python 2.7.5, and the Python psycopg2 package on the Hue hosts to connect Hue to the PostgreSQL database.
source /opt/rh/python275/enable
python --version
- Download the Software Collections Library
repository:
sudo wget -O /etc/yum.repos.d/public-yum-ol7.repo http://yum.oracle.com/public-yum-ol7.repo
- Edit
/etc/yum.repos.d/public-yum-ol7.repo
and make sure thatenabled
is set to1
, as follows:[ol7_software_collections] name=Software Collection Library release 3.0 packages for Oracle Linux 7 (x86_64) baseurl=http://yum.oracle.com/repo/OracleLinux/OL7/SoftwareCollections/x86_64/ gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-oracle gpgcheck=1 enabled=1
For more information, see Installing the Software Collection Library Utility From the Oracle Linux Yum Server in the Oracle documentation.
- Install the Software Collections
utilities:
sudo yum install scl-utils
- Install Python
2.7.5:
sudo yum install python275
- Verify that Python 2.7.5 is
installed:
source /opt/rh/python275/enable python --version
Installing the psycopg2
Python Package
- RHEL 7
-
- Install the
python-pip
package:sudo yum install python-pip
- Install
psycopg2
2.7.5 usingpip
:sudo pip install psycopg2==2.7.5 --ignore-installed
- Install the
- Ubuntu / Debian
-
- Install the
python-pip
package as follows:sudo apt-get install python-pip
- Install
psycopg2
2.7.5 usingpip
:sudo pip install psycopg2==2.7.5 --ignore-installed
- Install the
- SLES 12
- Install the
python-psycopg2
package as follows:sudo zypper install python-psycopg2
Installing and configuring PostgreSQL
RHEL/CentOS 7 include PostgreSQL version 9.2. However, on these OS versions, Hue is compatible only with PostgreSQL versions 10, 11.x, and 12. Therefore, you must download and install the PostgreSQL distribution separately.
Downloading PostgreSQL:
Download the Hue-compatible PostgreSQL distribution from https://www.postgresql.org/download/ based on your OS version.
Installing PostgreSQL:
sudo yum install postgresql-server
Configuring PostgreSQL:
By default, PostgreSQL only accepts connections on the loopback interface. You must reconfigure PostgreSQL to accept connections from the fully qualified domain names (FQDN) of the hosts hosting the services for which you are configuring databases. If you do not make these changes, the services cannot connect to and use the database on which they depend.
- Make sure that
LC_ALL
is set toen_US.UTF-8
and initialize the database as follows:echo 'LC_ALL="en_US.UTF-8"' >> /etc/locale.conf sudo su -l postgres -c "postgresql-setup initdb"
- Enable MD5 authentication. Edit
pg_hba.conf
, which is usually found in/var/lib/pgsql/data
or/etc/postgresql/<version>/main
. Add the following line:host all all 127.0.0.1/32 md5
If the defaultpg_hba.conf
file contains the following line:
then thehost all all 127.0.0.1/32 ident
host
line specifyingmd5
authentication shown above must be inserted before thisident
line. Failure to do so may cause an authentication error when running thescm_prepare_database.sh
script. You can modify the contents of themd5
line shown above to support different configurations. For example, if you want to access PostgreSQL from a different host, replace127.0.0.1
with your IP address and updatepostgresql.conf
, which is typically found in the same place aspg_hba.conf
, to include:listen_addresses = '*'
- Configure settings to ensure your system performs as expected. Update these settings in
the
/var/lib/pgsql/data/postgresql.conf
or/var/lib/postgresql/data/postgresql.conf
file. Settings vary based on cluster size and resources as follows:Small to mid-sized clusters - Consider the following settings as starting points. If resources are limited, consider reducing the buffer sizes and checkpoint segments further. Ongoing tuning may be required based on each host's resource utilization. For example, if the Cloudera Manager Server is running on the same host as other roles, the following values may be acceptable:max_connection
- In general, allow each database on a host 100 maximum connections and then add 50 extra connections. You may have to increase the system resources available to PostgreSQL, as described at Connection Settings.shared_buffers
- 256MBwal_buffers
- 8MBcheckpoint_segments
- 16checkpoint_completion_target
- 0.9
Large clusters - Can contain up to 1000 hosts. Consider the following settings as starting points.max_connection
- For large clusters, each database is typically hosted on a different host. In general, allow each database on a host 100 maximum connections and then add 50 extra connections. You may have to increase the system resources available to PostgreSQL, as described at Connection Settings.shared_buffers
- 1024 MB. This requires that the operating system can allocate sufficient shared memory. See PostgreSQL information on Managing Kernel Resources for more information on setting kernel resources.wal_buffers
- 16 MB. This value is derived from theshared_buffers
value. Settingwal_buffers
to be approximately 3% ofshared_buffers
up to a maximum of approximately 16 MB is sufficient in most cases.checkpoint_segments
- 128. The PostgreSQL Tuning Guide recommends values between 32 and 256 for write-intensive systems, such as this one.checkpoint_completion_target
- 0.9.
- Configure the PostgreSQL server to start at boot.
OS Command RHEL 7 compatible sudo systemctl enable postgresql
SLES sudo chkconfig --add postgresql
Ubuntu sudo chkconfig postgresql on
- Restart the PostgreSQL database:
On RHEL:
sudo systemctl restart postgresql
Other OS:sudo service postgresql restart