Prepare Hive for upgrade

Before upgrading, contact your account team about your eligibility to upgrade to HDP 3.1.4. If you are eligible, follow these instructions to prepare Hive 2 in HDP 2.6.0 and later for upgrade to Hive 3. Upgrading Hive in releases earlier than HDP 2.6.0 is not supported.

Upgrading to HDP 3.1.4 from HDP 3.1.0 or earlier is critical if your Hive data meets both of these conditions:

File format = AVRO or Parquet
Data type = TIMESTAMP

Upgrading to HDP 3.1.4 resolves a number of issues with TIMESTAMP data in AVRO and Parquet formats. Even if you do not experience any problems with your TIMESTAMP data, this upgrade is highly recommended to prevent problems when migrating to future Cloudera releases.

If you cannot upgrade from HDP 3.1.0 to HDP 3.1.4 now, contact Cloudera Support for a hotfix.

Before you begin

If not already installed, install a JDK on the node running the Hive Metastore.
Check that the Hive Metastore is running. The pre-upgrade tool must be able to connect to the Hive Metastore.
If you have ACID tables in your Hive metastore, enable ACID operations using Ambari Web or set these Hive configuration properties to enable ACID:
- hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
- hive.support.concurrency=true
Failure to set these properties will result in corrupt or unreadable ACID tables.
Optionally, shut down HiveServer2. Shutting down HiveServer2 is recommended, but not required, to prevent operations on ACID tables while the tool executes.
The pre-upgrade tool might submit compaction jobs, so ensure that the cluster has sufficient capacity to execute those jobs. Set the hive.compactor.worker.threads property to accommodate your data.
If you use Oracle as the backend database for Hive 1.x through Hive 3.x and use the ojdbc7 JAR, replace this JAR with the ojdbc6 JAR as described in the Cloudera Community "Unable to start Hive Metastore during HDP upgrade" article.
Obtain permissions to perform the steps in preparing Hive for upgrade.
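The ACID property and Metastore connectivity checks in the list above can be scripted before you run the tool. The following is a minimal bash sketch, not an HDP utility: the function names are hypothetical, the hive-site.xml path in the usage comment is an assumption, and the parser assumes each <name> and <value> element sits on its own line (or both share one line), as in Ambari-generated files. Port 9083 is the default Metastore Thrift port; use the port from hive.metastore.uris if yours differs.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helpers; not part of HDP or the pre-upgrade tool.

# Print the <value> of property $2 from the hive-site.xml file $1.
# Line-oriented sketch, not a general XML parser.
hive_prop() {
  awk -v want="$2" '
    /<name>/ {
      n = $0
      sub(/.*<name>[ \t]*/, "", n)
      sub(/[ \t]*<\/name>.*/, "", n)
      found = (n == want)
    }
    found && /<value>/ {
      v = $0
      sub(/.*<value>[ \t]*/, "", v)
      sub(/[ \t]*<\/value>.*/, "", v)
      print v
      exit
    }' "$1"
}

# Return non-zero, and explain why on stderr, if the ACID properties are not set.
check_acid_settings() {
  local conf="$1" rc=0 mgr conc
  mgr=$(hive_prop "$conf" hive.txn.manager)
  conc=$(hive_prop "$conf" hive.support.concurrency)
  if [ "$mgr" != "org.apache.hadoop.hive.ql.lockmgr.DbTxnManager" ]; then
    echo "hive.txn.manager is '$mgr', expected DbTxnManager" >&2
    rc=1
  fi
  if [ "$conc" != "true" ]; then
    echo "hive.support.concurrency is '$conc', expected true" >&2
    rc=1
  fi
  return "$rc"
}

# Return 0 if a TCP connection to the Metastore port succeeds within 5 seconds.
# Uses bash /dev/tcp redirection and GNU coreutils timeout.
metastore_reachable() {
  local host="$1" port="${2:-9083}"
  timeout 5 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Usage (the configuration path and host are assumptions):
#   check_acid_settings /etc/hive/conf/conf.server/hive-site.xml
#   metastore_reachable "$(hostname -f)" || echo "Hive Metastore is not reachable" >&2
```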

Required permissions

To perform some steps in this procedure, you need Hive service user permissions, or all the permissions to access Hive that Ranger provides. If you use Kerberos, you need to start Hive as the Hive service user with a valid ticket. The Hive service user is usually the default hive user. If you don’t know who the Hive service user is, go to the Ambari Web UI, click Cluster Admin > Service Accounts, and look for Hive User.

To perform some steps in this procedure, you also need to log in as the HDFS superuser. If you use Kerberos, you need to become the HDFS superuser with a valid ticket.

I. Back up Hive table data using a snapshot

Keep track of how many tables you have before upgrading so that you can compare the count after upgrading. Back up Hive table data as follows:

In Ambari, go to Services/Hive/Configs, and check the value of hive.metastore.warehouse.dir to determine the location of the Hive warehouse, /apps/hive/warehouse by default.
On any node in the cluster, as the HDFS superuser, enable snapshots. For example:
$ sudo su - hdfs
$ hdfs dfsadmin -allowSnapshot /apps/hive/warehouse
Output is:
Allowing snaphot on /apps/hive/warehouse succeeded
Create a snapshot of the Hive warehouse. For example:
$ hdfs dfs -createSnapshot /apps/hive/warehouse
Output includes the name and location of the snapshot:
Created snapshot /apps/hive/warehouse/.snapshot/s20181204-164645.898
Start Hive as a user who has SELECT privileges on the tables. For example:
$ beeline
beeline> !connect jdbc:hive2://
Enter username for jdbc:hive2://: hive
Enter password for jdbc:hive2://: *********
Output is, for example:
Connected to: Apache Hive (version 1.2.1000.2.6.5.0-292)
Driver: Hive JDBC (version 1.2.1000.2.6.5.0-292)
List the tables in each database to identify all tables whose data is outside /apps/hive/warehouse/. For example:
hive> USE my_database;
hive> SHOW TABLES;
Determine the location of each table using the DESCRIBE command. For example:
hive> DESCRIBE FORMATTED my_table PARTITION (dt='20181130');
Create a snapshot of the directory shown in the location section of the output.
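Repeat the previous three steps (listing tables, determining each table's location, and creating a snapshot) for each database and for each table whose data is outside /apps/hive/warehouse/.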
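When many tables live outside the warehouse, the counting, location lookup, and snapshot steps above can be scripted. This bash sketch assumes that Beeline's tsv2 output of DESCRIBE FORMATTED puts the label Location: in the first column and the path in the second, and that HIVE_JDBC_URL points at a HiveServer2 instance you can reach; the function names and the default snapshot name are hypothetical. Run count_tables and table_location as a user with SELECT privileges, and snapshot_dir as the HDFS superuser.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for backing up table data outside the warehouse.
HIVE_JDBC_URL="${HIVE_JDBC_URL:-jdbc:hive2://localhost:10000}"   # assumption

# Print the number of tables in database $1 (record it for post-upgrade comparison).
count_tables() {
  beeline -u "$HIVE_JDBC_URL" --silent=true --showHeader=false --outputformat=tsv2 \
    -e "SHOW TABLES IN $1;" 2>/dev/null | grep -c .
}

# Print the HDFS path of a table or partition, for example:
#   table_location "my_database.my_table PARTITION (dt='20181130')"
# The scheme and authority (hdfs://namenode:port) are stripped from the result.
table_location() {
  beeline -u "$HIVE_JDBC_URL" --silent=true --outputformat=tsv2 \
    -e "DESCRIBE FORMATTED $1;" 2>/dev/null |
    awk -F'\t' '$1 ~ /^[ \t]*Location:/ {
      loc = $2
      gsub("[ \t]+$", "", loc)
      sub("^[a-z]+://[^/]*", "", loc)
      print loc
      exit
    }'
}

# Allow snapshots on a directory and create a named snapshot of it.
snapshot_dir() {
  local dir="$1" name="${2:-pre-hive3-upgrade-$(date +%Y%m%d)}"
  hdfs dfsadmin -allowSnapshot "$dir" && hdfs dfs -createSnapshot "$dir" "$name"
}

# Usage:
#   count_tables my_database
#   loc=$(table_location my_database.my_table)   # as a user with SELECT privileges
#   snapshot_dir "$loc"                          # as the HDFS superuser
```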

II. For SparkSQL users only

Non-ACID, managed tables in ORC or in a Hive Native (but non-ORC) format that are owned by the POSIX user hive will not be SparkSQL-compatible after the upgrade unless you perform one of the following actions:

Convert the tables to external Hive tables before the upgrade.
Change the POSIX ownership to an owner other than hive.

You will need to convert managed, ACID v1 tables to external tables after the upgrade, as described later. The HDP 2.x and 3.x Table Type Comparison in the "Hive Post-upgrade Tasks" section identifies SparkSQL-incompatible table types.
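For the ownership option above, the change can be limited to directories that hive actually owns. A minimal bash sketch, run as the HDFS superuser; the function name and the replacement owner etl_user are hypothetical, and choosing which table directories to pass in remains your decision.

```shell
#!/usr/bin/env bash
# Hypothetical helper: give a managed table's directory an owner other than hive
# so that SparkSQL can still use the table after the upgrade.
reown_if_hive() {
  local dir="$1" new_owner="$2"   # new_owner is a placeholder, such as etl_user
  local owner
  owner=$(hdfs dfs -stat '%u' "$dir") || return 1
  if [ "$owner" = "hive" ]; then
    hdfs dfs -chown -R "$new_owner" "$dir"
  else
    echo "skipping $dir (owner: $owner)"
  fi
}

# Usage:
#   reown_if_hive /apps/hive/warehouse/sales.db/orders etl_user
```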

III. Download the pre-upgrade tool JAR

SSH into the host running the Hive Metastore.
Change to the /tmp directory.
Execute the following command to download the pre-upgrade tool JAR:
$ wget http://repo.hortonworks.com/content/repositories/releases/org/apache/hive/hive-pre-upgrade/3.1.0.3.1.4.0-315/hive-pre-upgrade-3.1.0.3.1.4.0-315.jar
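The download above can be wrapped so that a failed or truncated download is caught before you run the tool. This bash sketch builds the same URL as the wget command above; the function names are hypothetical, and it assumes unzip is installed (a JAR is a ZIP archive, so unzip -tq tests it without extracting).

```shell
#!/usr/bin/env bash
PRE_UPGRADE_VERSION="3.1.0.3.1.4.0-315"
PRE_UPGRADE_REPO="http://repo.hortonworks.com/content/repositories/releases/org/apache/hive/hive-pre-upgrade"

# Print the download URL for the pre-upgrade tool JAR.
pre_upgrade_jar_url() {
  echo "${PRE_UPGRADE_REPO}/${PRE_UPGRADE_VERSION}/hive-pre-upgrade-${PRE_UPGRADE_VERSION}.jar"
}

# Download the JAR into directory $1 (default /tmp), verify it, and print its path.
download_pre_upgrade_jar() {
  local dest_dir="${1:-/tmp}"
  local jar="${dest_dir}/hive-pre-upgrade-${PRE_UPGRADE_VERSION}.jar"
  wget -O "$jar" "$(pre_upgrade_jar_url)" || return 1
  if ! unzip -tq "$jar" >/dev/null; then
    echo "corrupt or incomplete download: $jar" >&2
    return 1
  fi
  echo "$jar"
}

# Usage:
#   download_pre_upgrade_jar /tmp
```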

IV. Get a Kerberos ticket if you use Kerberos

If you use Kerberos, perform these steps; otherwise, skip these steps and go to the procedure for compacting Hive tables (no Kerberos).

Become the Hive service user. For example, run the following su command on Linux:
$ sudo su - hive
In a Kerberized cluster, run kinit to get a Kerberos ticket. For example:
$ kinit -kt /etc/security/keytabs/hive.service.keytab hive/`hostname -f`
In a Kerberized environment, if you see the following error after running kinit, add -Djavax.security.auth.useSubjectCredsOnly=false to the pre-upgrade tool command:
org.ietf.jgss.GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
Perform the procedure for compacting Hive tables below.
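The ticket steps above can be made safe to rerun, so that repeating the procedure does not request a new ticket while a valid one exists. A bash sketch, assuming the keytab path and principal from the kinit example above and a klist (MIT or Heimdal) that supports -s; the function and variable names are hypothetical.

```shell
#!/usr/bin/env bash
HIVE_KEYTAB="${HIVE_KEYTAB:-/etc/security/keytabs/hive.service.keytab}"

# Obtain a ticket for the hive service principal only if the credential cache
# has no valid ticket. "klist -s" is silent and exits 0 only if a valid ticket exists.
ensure_hive_ticket() {
  if klist -s; then
    return 0
  fi
  local principal="${HIVE_PRINCIPAL:-hive/$(hostname -f)}"
  kinit -kt "$HIVE_KEYTAB" "$principal"
}

# JVM option to add to the pre-upgrade tool command if the GSSException above
# still appears after kinit.
KRB_JAVA_OPTS="-Djavax.security.auth.useSubjectCredsOnly=false"

# Usage (as the hive user):
#   ensure_hive_ticket && klist
```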

V. Optional: Override default table conversion

To override the default conversion of managed, non-ACID tables to ACID (insert-only, managed) tables, change them to external tables. For example:

ALTER TABLE T3 SET TBLPROPERTIES ('EXTERNAL'='TRUE');

For more information about upgrade changes to tables, see HDP 2.x and 3.x Table Type Comparison.
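To apply the ALTER TABLE statement above to several tables at once, you can generate one statement per table and run it through Beeline. A bash sketch; HIVE_JDBC_URL, the function names, and the table names are placeholders, and the list you pass in should contain only managed, non-ACID tables.

```shell
#!/usr/bin/env bash
HIVE_JDBC_URL="${HIVE_JDBC_URL:-jdbc:hive2://localhost:10000}"   # assumption

# Print the statement that marks one table (db.table) as external.
external_ddl() {
  printf "ALTER TABLE %s SET TBLPROPERTIES ('EXTERNAL'='TRUE');\n" "$1"
}

# Convert each table given as an argument; stop at the first failure.
make_external() {
  local t
  for t in "$@"; do
    echo "converting $t"
    beeline -u "$HIVE_JDBC_URL" -n hive -e "$(external_ddl "$t")" || return 1
  done
}

# Usage:
#   make_external sales.orders sales.customers
```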

VI. Run compaction on Hive tables

Using the downloaded JAR from step III and your Kerberos ticket (if you use Kerberos) from step IV, perform the following procedure to run compaction on Hive tables.

Log in as the hive user.
For example: $ sudo su - hive
Export the JAVA_HOME environment variable if necessary.
For example: $ export JAVA_HOME=[ path to your installed JDK ]
Set STACK_VERSION to the HDP version you are running. For example:
$ export STACK_VERSION=`hdp-select status hive-server2 | awk '{ print $3; }'`
Run the pre-upgrade tool command, replacing the {hive_log_dir} and {target_version} placeholders in the log file path:
$ $JAVA_HOME/bin/java -cp /usr/hdp/$STACK_VERSION/hive/lib/derby-10.10.2.0.jar:/usr/hdp/$STACK_VERSION/hive/lib/*:/usr/hdp/$STACK_VERSION/hadoop/*:/usr/hdp/$STACK_VERSION/hadoop/lib/*:/usr/hdp/$STACK_VERSION/hadoop-mapreduce/*:/usr/hdp/$STACK_VERSION/hadoop-mapreduce/lib/*:/usr/hdp/$STACK_VERSION/hadoop-hdfs/*:/usr/hdp/$STACK_VERSION/hadoop-hdfs/lib/*:/usr/hdp/$STACK_VERSION/hadoop/etc/hadoop/*:/tmp/hive-pre-upgrade-3.1.0.3.1.4.0-315.jar:/usr/hdp/$STACK_VERSION/hive/conf/conf.server:/etc/hadoop/conf/:/etc/hive/conf/ org.apache.hadoop.hive.upgrade.acid.PreUpgradeTool > {hive_log_dir}/pre_upgrade_{target_version}.log
The output indicates whether you need to perform compaction or not:
- In the /tmp directory, scripts named compacts_nnnnnnnnnnnnn.sql appear that contain ALTER statements for compacting tables. For example:
  ALTER TABLE default.t COMPACT 'major';
  -- Generated total of 1 compaction commands
  -- The total volume of data to be compacted is 0.001155MB
  From the volume of data to be compacted, you can gauge how long the actual upgrade might take.
- If no scripts appear, a message in the output says you do not need to compact tables:
  ... org.apache.hadoop.hive.upgrade.acid.PreUpgradeTool - No compaction is necessary
For more information about the pre-upgrade tool command, see the Pre-upgrade Tool Command Reference below.
Check the following logs on the Hive Metastore host for any errors:
- {hive_log_dir}/pre_upgrade_{target_version}.log
- /tmp/hive/hive.log
If there are no errors, go to the next step; otherwise, resolve the errors, and repeat this procedure.
On the node where the Hive Metastore resides, log in as a user who has privileges to alter the Hive database.
Start Beeline as the Hive service user. For example:
$ beeline -u 'jdbc:hive2://<HiveServer2 host name>:10000' -n hive
At the Beeline prompt, run the compaction script. For example:
hive> !run /tmp/compacts_nnnnnnnnnnnnn.sql
Output confirms that compaction is queued:
INFO : Compaction enqueued with id 3
…
Proceed to back up the Hive Metastore. This is a mandatory step.
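The log check, the script run, and the wait for queued compactions can be combined in a bash sketch that you run as the hive user after the pre-upgrade tool command above. It assumes the scripts are in /tmp (override SCRIPT_DIR), runs each one with beeline -f, which executes a script file in batch mode much as !run does interactively, and assumes SHOW COMPACTIONS reports queued or running work with the states initiated or working. The function names, log path, and JDBC URL are hypothetical.

```shell
#!/usr/bin/env bash
SCRIPT_DIR="${SCRIPT_DIR:-/tmp}"

# Print log lines that look like errors; returns 0 if any were found.
log_has_errors() {
  grep -E 'ERROR|Exception' "$1"
}

# Run every generated compacts_*.sql script through Beeline.
run_compaction_scripts() {
  local url="$1" f found=0
  for f in "$SCRIPT_DIR"/compacts_*.sql; do
    [ -e "$f" ] || continue        # the glob matched no files
    found=1
    echo "running $f"
    beeline -u "$url" -n hive -f "$f" || return 1
  done
  if [ "$found" -eq 0 ]; then
    echo "no compaction scripts in $SCRIPT_DIR"
  fi
}

# Print how many compactions are still queued or running.
pending_compactions() {
  beeline -u "$1" -n hive --silent=true --outputformat=tsv2 \
    -e "SHOW COMPACTIONS;" 2>/dev/null | grep -cE $'(^|\t)(initiated|working)(\t|$)'
}

# Usage (the log path and URL are placeholders):
#   if log_has_errors /var/log/hive/pre_upgrade.log; then exit 1; fi
#   url='jdbc:hive2://<HiveServer2 host>:10000'
#   run_compaction_scripts "$url"
#   while [ "$(pending_compactions "$url")" -gt 0 ]; do sleep 60; done
```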

VII. Back up Hive Metastore

After compaction, immediately before upgrading, back up the Hive Metastore as follows:

	Important
	Making a backup is critical to prevent data loss.

On the node where the database you use for the Hive Metastore resides, back up the Hive Metastore before upgrading HDP. For example, in MySQL, dump each database as follows:
mysqldump <hive_db_schema_name> > </path/to/dump_file>
If you use another database for the Hive Metastore, use the equivalent command, such as pg_dump for PostgreSQL, to dump the database.
Proceed to upgrade HDP if no Hive update, delete, or merge operations occurred after compaction; otherwise, repeat the compaction and Hive Metastore backup procedures, and then upgrade HDP.
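The dump step above can be wrapped so that a failed or empty dump is reported instead of silently accepted. A bash sketch; the function name, database type, schema name, user, and output directory are placeholders, and pg_dump is used as the PostgreSQL counterpart of mysqldump. For other databases, use the vendor's equivalent tool.

```shell
#!/usr/bin/env bash
# Dump the Hive Metastore database to a timestamped file and print its path.
# Arguments: db type (mysql|postgres), schema name, database user, output dir.
backup_metastore() {
  local db_type="$1" schema="$2" user="$3" out_dir="${4:-.}"
  local out
  out="${out_dir}/${schema}-$(date +%Y%m%d-%H%M%S).sql"
  case "$db_type" in
    mysql)    mysqldump -u "$user" -p "$schema" > "$out" ;;   # prompts for a password
    postgres) pg_dump -U "$user" -f "$out" "$schema" ;;
    *)        echo "unsupported database type: $db_type" >&2; return 1 ;;
  esac || return 1
  # An empty file means the dump produced no output.
  if [ ! -s "$out" ]; then
    echo "empty dump: $out" >&2
    return 1
  fi
  echo "$out"
}

# Usage:
#   backup_metastore mysql hive hive /var/backups
```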

Pre-upgrade tool command reference

You can use the following key options with the pre-upgrade tool command:

-execute
Use this option only when you want to run the pre-upgrade tool command in Ambari instead of on the Beeline command line. Using Beeline is recommended. This option automatically executes the equivalent of the generated commands.
-location
Use this option to specify the location to write the scripts generated by the pre-upgrade tool.

You can append --help to the command to see all command options. For example:

$ cd <location of downloaded pre-upgrade tool>

$ $JAVA_HOME/bin/java -Djavax.security.auth.useSubjectCredsOnly=false -cp /usr/hdp/$STACK_VERSION/hive/lib/derby-10.10.2.0.jar:/usr/hdp/$STACK_VERSION/hive/lib/*:/usr/hdp/$STACK_VERSION/hadoop/*:/usr/hdp/$STACK_VERSION/hadoop/lib/*:/usr/hdp/$STACK_VERSION/hadoop-mapreduce/*:/usr/hdp/$STACK_VERSION/hadoop-mapreduce/lib/*:/usr/hdp/$STACK_VERSION/hadoop-hdfs/*:/usr/hdp/$STACK_VERSION/hadoop-hdfs/lib/*:/usr/hdp/$STACK_VERSION/hadoop/etc/hadoop/*:/tmp/hive-pre-upgrade-3.1.0.3.1.4.0-315.jar:/usr/hdp/$STACK_VERSION/hive/conf/conf.server:/etc/hadoop/conf/:/etc/hive/conf/ org.apache.hadoop.hive.upgrade.acid.PreUpgradeTool --help

In a Kerberized environment, if you see the GSSException error described earlier after running kinit, include the following option when you run the pre-upgrade tool command, as shown in the --help example above:

-Djavax.security.auth.useSubjectCredsOnly=false
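The options above and the long classpath can be kept in one wrapper so that --help, -location, and the Kerberos option are easy to combine. A bash sketch under stated assumptions: STACK_VERSION and JAVA_HOME are set as in the compaction procedure, the JAR is in /tmp, KERBEROS=yes adds the JAAS option, and DRY_RUN=1 prints the command instead of running it. The function and variable names are hypothetical, and the -location directory in the usage comment is a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the pre-upgrade tool command shown above.
pre_upgrade() {
  if [ -z "${STACK_VERSION:-}" ] || [ -z "${JAVA_HOME:-}" ]; then
    echo "set STACK_VERSION and JAVA_HOME first" >&2
    return 1
  fi
  local h="/usr/hdp/$STACK_VERSION"
  # Same classpath entries, in the same order, as the documented command.
  local cp="$h/hive/lib/derby-10.10.2.0.jar:$h/hive/lib/*:$h/hadoop/*:$h/hadoop/lib/*"
  cp+=":$h/hadoop-mapreduce/*:$h/hadoop-mapreduce/lib/*:$h/hadoop-hdfs/*:$h/hadoop-hdfs/lib/*"
  cp+=":$h/hadoop/etc/hadoop/*:/tmp/hive-pre-upgrade-3.1.0.3.1.4.0-315.jar"
  cp+=":$h/hive/conf/conf.server:/etc/hadoop/conf/:/etc/hive/conf/"
  local cmd=("$JAVA_HOME/bin/java")
  if [ "${KERBEROS:-no}" = "yes" ]; then
    cmd+=("-Djavax.security.auth.useSubjectCredsOnly=false")
  fi
  cmd+=(-cp "$cp" org.apache.hadoop.hive.upgrade.acid.PreUpgradeTool "$@")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Usage (as the hive user):
#   pre_upgrade --help                                   # list all options
#   mkdir -p /tmp/hive_upgrade && pre_upgrade -location /tmp/hive_upgrade
#   KERBEROS=yes pre_upgrade -location /tmp/hive_upgrade
```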
