Upgrading Hive
Upgrade Hive on all the hosts on which it is running: servers and clients.
Checklist to Help Ensure Smooth Upgrades
- Configure periodic backups of the metastore database. Use mysqldump, or the equivalent for your vendor if you are not using MySQL.
- Make sure datanucleus.autoCreateSchema is set to false (for all database types) and datanucleus.fixedDatastore is set to true (for MySQL and Oracle) in all hive-site.xml files. See the configuration instructions for more information about setting the properties in hive-site.xml.
- Insulate the metastore database from users by running the metastore service in Remote mode. If you do not follow this recommendation, make sure you remove DROP, ALTER, and CREATE privileges from the Hive user configured in hive-site.xml. See Configuring the Hive Metastore for complete instructions for each type of supported database.
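The backup recommendation in the checklist could be scripted along the following lines. This is a minimal sketch for a MySQL-backed metastore only; the database name metastore, the user hive, and the backup directory are assumptions rather than values from this guide, so substitute your own.

```shell
#!/bin/sh
# Sketch only: DB_NAME, DB_USER and BACKUP_DIR are assumed, site-specific values.
DB_NAME="${DB_NAME:-metastore}"
DB_USER="${DB_USER:-hive}"
BACKUP_DIR="${BACKUP_DIR:-/var/backups/hive-metastore}"

# Build a dated backup file name: <dir>/<db>-<stamp>.sql
backup_file() {
  printf '%s/%s-%s.sql\n' "$1" "$2" "$3"
}

# Dump the metastore database to a dated file.
# mysqldump prompts for the password because of -p.
backup_metastore() {
  out=$(backup_file "$BACKUP_DIR" "$DB_NAME" "$(date +%Y%m%d-%H%M%S)")
  mkdir -p "$BACKUP_DIR" || return 1
  mysqldump -u "$DB_USER" -p "$DB_NAME" > "$out" && echo "Backup written to $out"
}
```

Calling backup_metastore writes one dump per invocation. For unattended periodic runs, the password would need to come from a MySQL option file instead of the interactive -p prompt.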
Upgrading Hive from CDH 4 to CDH 5
If you have already performed the steps to uninstall CDH 4 and all components, as described under Upgrading from CDH 4 to CDH 5, you can skip Step 1 below and proceed with installing the new CDH 5 version of Hive.
Step 1: Remove Hive
You must make sure no Hive processes are running. If Hive processes are running during the upgrade, the new version will not work correctly.
- Exit the Hive console and make sure no Hive scripts are running.
- Stop any HiveServer processes that are running. If HiveServer is running as a daemon, use the following command to stop it:
$ sudo service hive-server stop
If HiveServer is running from the command line, stop it with <CTRL>-c.
- Stop the metastore. If the metastore is running as a daemon, use the following command to stop it:
$ sudo service hive-metastore stop
If the metastore is running from the command line, stop it with <CTRL>-c.
- Remove Hive.
To remove Hive on RHEL-compatible systems:
$ sudo yum remove hive
To remove Hive on SLES systems:
$ sudo zypper remove hive
To remove Hive on Ubuntu and Debian systems:
$ sudo apt-get remove hive
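When the same procedure has to run across hosts with different operating systems, the choice of removal command can be sketched as below. The function names are hypothetical, and the detection is deliberately naive: it picks the first of the three package managers found on the PATH, which may be the wrong one on a host that has more than one installed.

```shell
#!/bin/sh
# Print the Hive removal command for a given package manager.
remove_hive_cmd() {
  case "$1" in
    yum)     echo "sudo yum remove hive" ;;
    zypper)  echo "sudo zypper remove hive" ;;
    apt-get) echo "sudo apt-get remove hive" ;;
    *)       echo "unsupported package manager: $1" >&2; return 1 ;;
  esac
}

# Print the first supported package manager found on the PATH.
detect_pkg_manager() {
  for pm in yum zypper apt-get; do
    if command -v "$pm" >/dev/null 2>&1; then
      echo "$pm"
      return 0
    fi
  done
  return 1
}
```

Running pm=$(detect_pkg_manager) && remove_hive_cmd "$pm" prints the command so that it can be reviewed before it is executed.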
Step 2: Install the new Hive version on all hosts (Hive servers and clients)
See Installing Hive.
- If you install a newer version of a package that is already on the system, configuration files that you have modified will remain intact.
- If you uninstall a package, the package manager renames any configuration files you have modified from <file> to <file>.rpmsave. If you then re-install the package (probably to install a new version) the package manager creates a new <file> with applicable defaults. You are responsible for applying any changes captured in the original configuration file to the new configuration file. In the case of Ubuntu and Debian upgrades, you will be prompted if you have made changes to a file for which there is a new version; for details, see Automatic handling of configuration files by dpkg.
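To carry earlier changes over to the new configuration files, it can help to list the saved .rpmsave files and compare each one with its newly installed counterpart. The sketch below assumes the configuration directory is passed as an argument; /etc/hive/conf is a typical location but is an assumption here, and the function names are hypothetical.

```shell
#!/bin/sh
# List configuration files that the package manager saved as *.rpmsave
# under the directory given as the first argument.
list_rpmsave() {
  find "$1" -type f -name '*.rpmsave' 2>/dev/null | sort
}

# Show a unified diff between each newly installed file and its saved copy.
diff_rpmsave() {
  list_rpmsave "$1" | while IFS= read -r saved; do
    current="${saved%.rpmsave}"
    echo "== $current"
    if [ -f "$current" ]; then
      diff -u "$current" "$saved"
    else
      echo "(no new file installed at this path)"
    fi
  done
}
```

For example, diff_rpmsave /etc/hive/conf prints the lines that differ, which you can then reapply by hand to the new files.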
Step 3: Configure the Hive Metastore
You must configure the Hive metastore and initialize the service before you start the Hive Console. See Configuring the Hive Metastore for detailed instructions.
Step 4: Upgrade the Metastore Schema
- Cloudera strongly encourages you to make a backup copy of your metastore database before running the upgrade scripts. You will need this backup copy if you run into problems during the upgrade or need to downgrade to a previous version.
- You must upgrade the metastore schema before starting Hive after the upgrade. Failure to do so may result in metastore corruption.
- To run a script, you must first cd to the directory that contains it: /usr/lib/hive/scripts/metastore/upgrade/<database>.
The current version of CDH 5 includes changes in the Hive metastore schema. If you have been using Hive 0.10 or earlier, you must upgrade the Hive metastore schema after you install the new version of Hive but before you start Hive.
With CDH 5, there are two ways to do this: you can use either Hive's schematool or the schema upgrade scripts available with the Hive package.
Using schematool (Recommended):
$ schematool -dbType derby -upgradeSchemaFrom 0.10.0
Metastore connection URL: jdbc:derby:;databaseName=metastore_db;create=true
Metastore Connection Driver : org.apache.derby.jdbc.EmbeddedDriver
Metastore connection User: APP
Starting upgrade metastore schema from version 0.10.0 to <new_version>
Upgrade script upgrade-0.10.0-to-0.11.0.derby.sql
Completed upgrade-0.10.0-to-0.11.0.derby.sql
Upgrade script upgrade-0.11.0-to-<new_version>.derby.sql
Completed upgrade-0.11.0-to-<new_version>.derby.sql
schemaTool completed
Possible values for the dbType option are mysql, postgres, derby or oracle. The following table lists the Hive versions corresponding to the older CDH releases.
CDH Releases | Hive Version |
---|---|
CDH 3 | 0.7.0 |
CDH 4.0 | 0.8.0 |
CDH 4.1 | 0.9.0 |
CDH 4.2 and later | 0.10.0 |
See Using the Hive Schema Tool for more details on how to use schematool.
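The release-to-version table can be turned into a small helper that builds the schematool command for a given database type and previous CDH release. This is a sketch: the function names are hypothetical, the version strings accepted (such as 4.1 or 4.5.0) are an assumed input format, and only the two schematool options shown above are used.

```shell
#!/bin/sh
# Map an older CDH release to the Hive version listed in the table.
hive_version_for_cdh() {
  case "$1" in
    3|3.*|3u*)  echo "0.7.0" ;;
    4.0|4.0.*)  echo "0.8.0" ;;
    4.1|4.1.*)  echo "0.9.0" ;;
    4.*)        echo "0.10.0" ;;
    *)          echo "unknown CDH release: $1" >&2; return 1 ;;
  esac
}

# Print the schematool command for a database type (mysql, postgres,
# derby or oracle) and the CDH release being upgraded from.
schematool_cmd() {
  from=$(hive_version_for_cdh "$2") || return 1
  echo "schematool -dbType $1 -upgradeSchemaFrom $from"
}
```

For instance, schematool_cmd mysql 4.1 prints the command for a MySQL metastore last used with CDH 4.1; the helper only prints the command and does not run it.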
Using Schema Upgrade Scripts:
Run the appropriate schema upgrade scripts available in /usr/lib/hive/scripts/metastore/upgrade/:
- Schema upgrade scripts from 0.7 to 0.8 and from 0.8 to 0.9 for Derby, MySQL, and PostgreSQL
- 0.8 and 0.9 schema scripts for Oracle, but no upgrade scripts (you will need to create your own)
- Schema upgrade scripts from 0.9 to 0.10 for Derby, MySQL, PostgreSQL and Oracle
- Schema upgrade scripts from 0.10 to 0.11 for Derby, MySQL, PostgreSQL and Oracle
For more information about upgrading the schema, see the README in /usr/lib/hive/scripts/metastore/upgrade/.
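As an illustration of the script-based route, the sketch below runs one upgrade step against a MySQL metastore from inside the script's own directory. It assumes that the MySQL scripts live in a mysql subdirectory and follow the same naming pattern as the Derby scripts in the schematool output; the database user and database name are site-specific arguments. Treat the README in the upgrade directory as the authority on how each script should be invoked.

```shell
#!/bin/sh
UPGRADE_ROOT="/usr/lib/hive/scripts/metastore/upgrade"

# Build an upgrade script name: upgrade-<from>-to-<to>.<database>.sql
# (naming pattern assumed from the schematool output).
upgrade_script() {
  echo "upgrade-$1-to-$2.$3.sql"
}

# Run one upgrade step against MySQL.
# Arguments: <from> <to> <db_user> <db_name>
# The subshell keeps the cd from affecting the caller.
run_mysql_upgrade() {
  script=$(upgrade_script "$1" "$2" mysql)
  ( cd "$UPGRADE_ROOT/mysql" && mysql -u "$3" -p "$4" < "$script" )
}
```

A call such as run_mysql_upgrade 0.10.0 0.11.0 hive metastore would apply the 0.10-to-0.11 step; the user and database names in that call are placeholders.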
Step 5: Configure HiveServer2
HiveServer2 is an improved version of the original HiveServer (HiveServer1, no longer supported). Some configuration is required before you initialize HiveServer2; see Configuring HiveServer2 for details.
Step 6: Upgrade Scripts, etc., for HiveServer2 (if necessary)
If you have been running HiveServer1, you may need to make some minor modifications to your client-side scripts and applications when you upgrade:
- HiveServer1 does not support concurrent connections, so many customers run a dedicated instance of HiveServer1 for each client. These can now be replaced by a single instance of HiveServer2.
- HiveServer2 uses a different connection URL and driver class for the JDBC driver. If you have existing scripts that use JDBC to communicate with HiveServer1, you can modify these scripts to work with HiveServer2 by changing the JDBC driver URL from jdbc:hive://hostname:port to jdbc:hive2://hostname:port, and by changing the JDBC driver class name from org.apache.hadoop.hive.jdbc.HiveDriver to org.apache.hive.jdbc.HiveDriver.
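The JDBC change in client scripts is mechanical enough to sketch as a text filter. The example below rewrites the URL scheme from jdbc:hive:// to jdbc:hive2:// and replaces the HiveServer1 driver class, org.apache.hadoop.hive.jdbc.HiveDriver, with the HiveServer2 class, org.apache.hive.jdbc.HiveDriver. The function name is hypothetical, and a plain substitution like this will not catch URLs or class names that a script assembles at run time.

```shell
#!/bin/sh
# Rewrite HiveServer1 JDBC settings to their HiveServer2 equivalents.
# Reads text on stdin and writes the converted text to stdout.
migrate_jdbc() {
  sed -e 's#jdbc:hive://#jdbc:hive2://#g' \
      -e 's#org\.apache\.hadoop\.hive\.jdbc\.HiveDriver#org.apache.hive.jdbc.HiveDriver#g'
}
```

For example, migrate_jdbc < client.properties > client.properties.new converts a hypothetical properties file and leaves the original untouched for comparison. Input that already uses the HiveServer2 forms passes through unchanged.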
Step 7: Start the Metastore, HiveServer2, and Beeline
See:
Step 8: Upgrade the JDBC driver on the clients
The driver used for CDH 4.x does not work with CDH 5.x. Install the new version, following these instructions.
Upgrading Hive from an Earlier Version of CDH 5
The instructions that follow assume that you are upgrading Hive as part of a CDH 5 upgrade, and have already performed the steps under Upgrading from a CDH 5 Beta Release to the Latest Release.
If you are currently running Hive under MRv1, check for the following property and value in /etc/mapred/conf/mapred-site.xml:
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
Remove this property before you proceed; otherwise Hive queries spawned from MapReduce jobs will fail with a null pointer exception (NPE).
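A quick way to check whether the property is present is a text search of the file, sketched below. The function name is hypothetical, and the check is a plain string match: it does not parse the XML, so it also reports a property that has been commented out.

```shell
#!/bin/sh
# Succeed if the given file contains a mapreduce.framework.name property.
# Plain fixed-string match; does not parse XML or skip comments.
has_framework_name() {
  grep -qF '<name>mapreduce.framework.name</name>' "$1"
}
```

For example, has_framework_name /etc/mapred/conf/mapred-site.xml && echo "remove mapreduce.framework.name before upgrading" prints a reminder only when the property is found.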
To upgrade Hive from an earlier version of CDH 5, proceed as follows.
Step 1: Stop all Hive Processes and Daemons
You must make sure no Hive processes are running. If Hive processes are running during the upgrade, the new version will not work correctly.
- Stop any HiveServer processes that are running:
$ sudo service hive-server stop
- Stop any HiveServer2 processes that are running:
$ sudo service hive-server2 stop
- Stop the metastore:
$ sudo service hive-metastore stop
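The three stop commands can be wrapped in a loop when the step is repeated on many hosts. This is a sketch with hypothetical function names; a failure is reported but does not stop the loop, since a given host may not have all three services installed.

```shell
#!/bin/sh
# Service names taken from the stop commands in this step.
hive_services() {
  printf '%s\n' hive-server hive-server2 hive-metastore
}

# Stop each service in turn and report any that could not be stopped.
stop_hive_services() {
  for svc in $(hive_services); do
    sudo service "$svc" stop || echo "could not stop $svc" >&2
  done
}
```

Run stop_hive_services on each host, then confirm that no Hive processes remain before installing the new version.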
Step 2: Install the new Hive version on all hosts (Hive servers and clients)
Step 3: Verify that the Hive Metastore is Properly Configured
See Configuring the Hive Metastore for detailed instructions.
Step 4: Start the Metastore, HiveServer2, and Beeline
See:
The upgrade is now complete.