Prepare Hive for upgrade
Before upgrading, contact your account team about your eligibility to upgrade to HDP 3.1.4. If you are eligible to upgrade, follow instructions to prepare Hive 2 in HDP 2.6.0 and later for upgrade to Hive 3. Upgrading Hive in releases earlier than HDP 2.6.0 is not supported.
Upgrading to HDP 3.1.4 from HDP 3.1.0 or earlier is critical if your Hive data meets both of these conditions:
File format = AVRO or Parquet
Data type = TIMESTAMP
Upgrading to HDP 3.1.4 resolves a number of issues with TIMESTAMP data in AVRO and PARQUET formats. If you do not experience any problems with your TIMESTAMP data, this upgrade is still highly recommended to prevent problems when migrating to future Cloudera releases.
If you cannot upgrade from HDP 3.1.0 to HDP 3.1.4 now, contact Cloudera Support for a hot fix.
Before you begin
If not already installed, install JDK on the node running Hive Metastore.
Check that the Hive Metastore is running. The pre-upgrade tool requires connectivity to the Hive Metastore.
If you have ACID tables in your Hive metastore, enable ACID operations using Ambari Web or set these Hive configuration properties to enable ACID:
hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
hive.support.concurrency=true
Failure to set these properties will result in corrupt or unreadable ACID tables.
Optionally, shut down HiveServer2. Shutting down HiveServer2 is recommended, but not required, to prevent operations on ACID tables while the tool executes.
The pre-upgrade tool might submit compaction jobs, so ensure that the cluster has sufficient capacity to execute those jobs. Set the hive.compactor.worker.threads property to accommodate your data.
If you use Oracle as the backend database for Hive 1.x - Hive 3.x and the ojdbc7 JAR, replace this JAR with ojdbc6 JAR as described in the Cloudera Community "Unable to start Hive Metastore during HDP upgrade" article.
Obtain permissions to perform the steps in preparing Hive for upgrade.
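If you set the ACID properties by hand rather than through Ambari Web, a quick check of the resulting configuration file can catch a missed property. The following is a minimal sketch, not part of the product: the function names get_hive_prop and check_acid_props are hypothetical, and the parsing assumes that each <value> element sits on the line directly after its <name> element.

```shell
#!/bin/sh
# Sketch: read a property value from a hive-site.xml-style file.
# Assumption: each <value> is on the line right after its <name>.
get_hive_prop() {
  # $1 = path to the configuration file, $2 = property name
  grep -F -A1 "<name>$2</name>" "$1" |
    sed -n 's:.*<value>\(.*\)</value>.*:\1:p'
}

# Report whether the two ACID-related properties carry the expected values.
check_acid_props() {
  site="$1"
  rc=0
  [ "$(get_hive_prop "$site" hive.txn.manager)" = \
    "org.apache.hadoop.hive.ql.lockmgr.DbTxnManager" ] || {
    echo "hive.txn.manager is not set to DbTxnManager"; rc=1; }
  [ "$(get_hive_prop "$site" hive.support.concurrency)" = "true" ] || {
    echo "hive.support.concurrency is not set to true"; rc=1; }
  return "$rc"
}
```

For example, assuming the configuration file is at /etc/hive/conf/hive-site.xml, you might run: check_acid_props /etc/hive/conf/hive-site.xml && echo "ACID properties look correct"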
Required permissions
To perform some steps in this procedure, you need Hive service user permissions, or all the permissions to access Hive that Ranger provides. If you use Kerberos, you need to start Hive as the Hive service user with a valid ticket. The Hive service user is usually the default hive user. If you don’t know who the Hive service user is, go to the Ambari Web UI, click Cluster Admin > Service Accounts, and then look for Hive User.
To perform some steps in this procedure, you also need to log in as the HDFS superuser. If you use Kerberos, you need to become the HDFS superuser with a valid ticket.
I. Back up Hive table data using a snapshot
Keep track of how many tables you have before upgrading so that you can compare the count after upgrading. Back up Hive table data as follows:
In Ambari, go to Services/Hive/Configs, and check the value of hive.metastore.warehouse.dir to determine the location of the Hive warehouse, /apps/hive/warehouse by default.
On any node in the cluster, as the HDFS superuser, enable snapshots. For example:
$ sudo su - hdfs
$ hdfs dfsadmin -allowSnapshot /apps/hive/warehouse
Output is:
Allowing snaphot on /apps/hive/warehouse succeeded
Create a snapshot of the Hive warehouse. For example:
$ hdfs dfs -createSnapshot /apps/hive/warehouse
Output includes the name and location of the snapshot:
Created snapshot /apps/hive/warehouse/.snapshot/s20181204-164645.898
Start Hive as a user who has SELECT privileges on the tables. For example:
$ beeline
beeline> !connect jdbc:hive2://
Enter username for jdbc:hive2://: hive
Enter password for jdbc:hive2://: *********
Output is, for example:
Connected to: Apache Hive (version 1.2.1000.2.6.5.0-292)
Driver: Hive JDBC (version 1.2.1000.2.6.5.0-292)
Identify all tables outside /apps/hive/warehouse/. For example:
hive> USE my_database;
hive> SHOW TABLES;
Determine the location of each table using the DESCRIBE command. For example:
hive> DESCRIBE FORMATTED my_table PARTITION (dt='20181130');
Create a snapshot of the directory shown in the location section of the output.
Repeat steps 5-7 for each database and its tables outside /apps/hive/warehouse/.
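When many tables are involved, it can help to filter the locations collected from DESCRIBE FORMATTED and generate the snapshot commands for review before running them. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the locations are supplied one per line, and the script only prints the hdfs commands shown earlier in this section rather than executing them.

```shell
#!/bin/sh
# Sketch: print (not run) snapshot commands for table directories that
# live outside the Hive warehouse. Override WAREHOUSE_DIR if needed.
WAREHOUSE_DIR="${WAREHOUSE_DIR:-/apps/hive/warehouse}"

# Reduce a location such as hdfs://nn.example.com:8020/data/t to /data/t.
strip_authority() {
  case "$1" in
    hdfs://*) rest="${1#hdfs://}"; printf '/%s\n' "${rest#*/}" ;;
    *)        printf '%s\n' "$1" ;;
  esac
}

# Succeeds when the location is NOT under the warehouse directory.
is_outside_warehouse() {
  path="$(strip_authority "$1")"
  case "$path/" in
    "$WAREHOUSE_DIR"/*) return 1 ;;
    *)                  return 0 ;;
  esac
}

# Read one table location per line on stdin; emit commands for review.
print_snapshot_commands() {
  while IFS= read -r location; do
    [ -n "$location" ] || continue
    if is_outside_warehouse "$location"; then
      dir="$(strip_authority "$location")"
      echo "hdfs dfsadmin -allowSnapshot $dir"
      echo "hdfs dfs -createSnapshot $dir"
    fi
  done
}
```

Printing the commands instead of running them keeps the HDFS superuser in control of which directories become snapshottable.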
II. For SparkSQL users only
Non-ACID, managed tables in ORC or in a Hive Native (but non-ORC) format that are owned by the POSIX user hive will not be SparkSQL-compatible after the upgrade unless you perform one of the following actions:
Convert the tables to external Hive tables before the upgrade.
Change the POSIX ownership to an owner other than hive.
You will need to convert managed, ACID v1 tables to external tables after the upgrade, as described later. The HDP 2.x and 3.x Table Type Comparison in the section "Hive Post-upgrade Tasks" identifies SparkSQL-incompatible table types.
III. Download the pre-upgrade tool JAR
SSH into the host running the Hive Metastore.
Change to the /tmp directory.
Execute the following command to download the pre-upgrade tool JAR:
$ wget http://repo.hortonworks.com/content/repositories/releases/org/apache/hive/hive-pre-upgrade/3.1.0.3.1.4.0-315/hive-pre-upgrade-3.1.0.3.1.4.0-315.jar
IV. Get a Kerberos ticket if you use Kerberos
If you use Kerberos, perform these steps; otherwise, skip these steps and go to the procedure for compacting Hive tables (no Kerberos).
Become the Hive service user. For example, run the following su command on Linux:
$ sudo su - hive
In a Kerberized cluster, run kinit to get a Kerberos ticket. For example:
$ kinit -kt /etc/security/keytabs/hive.service.keytab hive/`hostname -f`
Set -Djavax.security.auth.useSubjectCredsOnly=false in a Kerberized environment if, after running kinit, you see the following error:
org.ietf.jgss.GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
Perform the procedure for compacting Hive tables below.
V. Optional: Override default table conversion
To override the default conversion of non-ACID tables to ACID (insert-only, managed table), change managed, non-ACID tables to external:
ALTER TABLE T3 SET TBLPROPERTIES ('EXTERNAL'='TRUE');
For more information about upgrade changes to tables, see HDP 2.x and 3.x Table Type Comparison.
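If you need to change several tables, you could generate the ALTER statements from a list of table names instead of typing each one. The following is a minimal sketch: the function name print_external_alters and the input format (one table name per line) are assumptions for illustration, and the statement it prints is the same one shown above.

```shell
#!/bin/sh
# Sketch: turn a list of table names (one per line on stdin) into
# ALTER TABLE statements that mark each table as external.
print_external_alters() {
  while IFS= read -r table; do
    [ -n "$table" ] || continue
    printf "ALTER TABLE %s SET TBLPROPERTIES ('EXTERNAL'='TRUE');\n" "$table"
  done
}
```

For example, printf '%s\n' my_database.t3 my_database.t4 | print_external_alters writes one statement per table, which you can review and save to a script before running it in Beeline.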
VI. Run compaction on Hive tables
Using the JAR downloaded in section III and, if you use Kerberos, the Kerberos ticket obtained in section IV, perform the following procedure to run compaction on Hive tables.
Log in as the hive user. For example:
$ sudo su - hive
Export the JAVA_HOME environment variable if necessary.
For example:
$ export JAVA_HOME=[ path to your installed JDK ]
Set STACK_VERSION to the HDP version you are running. For example:
$ export STACK_VERSION=`hdp-select status hive-server2 | awk '{ print $3; }'`
Run the pre-upgrade tool command.
$ $JAVA_HOME/bin/java -cp /usr/hdp/$STACK_VERSION/hive/lib/derby-10.10.2.0.jar:/usr/hdp/$STACK_VERSION/hive/lib/*:/usr/hdp/$STACK_VERSION/hadoop/*:/usr/hdp/$STACK_VERSION/hadoop/lib/*:/usr/hdp/$STACK_VERSION/hadoop-mapreduce/*:/usr/hdp/$STACK_VERSION/hadoop-mapreduce/lib/*:/usr/hdp/$STACK_VERSION/hadoop-hdfs/*:/usr/hdp/$STACK_VERSION/hadoop-hdfs/lib/*:/usr/hdp/$STACK_VERSION/hadoop/etc/hadoop/*:/tmp/hive-pre-upgrade-3.1.0.3.1.4.0-315.jar:/usr/hdp/$STACK_VERSION/hive/conf/conf.server:/etc/hadoop/conf/:/etc/hive/conf/ org.apache.hadoop.hive.upgrade.acid.PreUpgradeTool > {hive_log_dir}/pre_upgrade_{target_version}.log
The output indicates whether you need to perform compaction or not:
In the /tmp directory, scripts named compacts_nnnnnnnnnnnnn.sql appear that contain ALTER statements for compacting tables. For example:
ALTER TABLE default.t COMPACT 'major';
- Generated total of 1 compaction commands
- The total volume of data to be compacted is 0.001155MB
From the volume of data to be compacted, you can gauge how long the actual upgrade might take.
If no scripts appear, a message in the output says you do not need to compact tables:
... org.apache.hadoop.hive.upgrade.acid.PreUpgradeTool - No compaction is necessary
For more information about the pre-upgrade tool command, see the Pre-upgrade Tool Command Reference below.
Check the following logs on the Hive Metastore host for any errors:
{hive_log_dir}/pre_upgrade_{target_version}.log
/tmp/hive/hive.log
If there are no errors, go to the next step; otherwise, resolve the errors, and repeat this procedure.
On the node where the Hive Metastore resides, log in as a user who has privileges to alter the Hive database.
Start Beeline as the Hive service user. For example:
$ beeline -u 'jdbc:hive2://<Metastore host name>:10000' -n hive
On the Hive command line, run the compaction script. For example:
hive> !run /tmp/compacts_nnnnnnnnnnnnn.sql
Output confirms that compaction is queued:
INFO : Compaction enqueued with id 3
…
Proceed to back up the Hive Metastore. This is a mandatory step.
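The classpath in the pre-upgrade tool command is long enough that a typing error is easy to miss. One way to make it reviewable is to assemble it from its parts. The following is a minimal sketch, not a replacement for the documented command: the function names parse_stack_version and build_preupgrade_classpath are hypothetical, and the sample hdp-select line used to illustrate the awk field is an assumed format.

```shell
#!/bin/sh
# Sketch: assemble the pre-upgrade tool classpath from its parts.

# Extract the version (third field) from one line of hdp-select output,
# for example a line assumed to look like: hive-server2 - 2.6.5.0-292
parse_stack_version() {
  awk '{ print $3; }'
}

# $1 = stack version, $2 = path to the downloaded pre-upgrade JAR
# The body runs in a subshell so its variables do not leak.
build_preupgrade_classpath() (
  hdp="/usr/hdp/$1"
  cp="$hdp/hive/lib/derby-10.10.2.0.jar"
  # Entries are quoted so the shell leaves each * for Java to expand.
  for part in \
    "$hdp/hive/lib/*" \
    "$hdp/hadoop/*" \
    "$hdp/hadoop/lib/*" \
    "$hdp/hadoop-mapreduce/*" \
    "$hdp/hadoop-mapreduce/lib/*" \
    "$hdp/hadoop-hdfs/*" \
    "$hdp/hadoop-hdfs/lib/*" \
    "$hdp/hadoop/etc/hadoop/*" \
    "$2" \
    "$hdp/hive/conf/conf.server" \
    "/etc/hadoop/conf/" \
    "/etc/hive/conf/"
  do
    cp="$cp:$part"
  done
  printf '%s\n' "$cp"
)

# Possible usage on the Hive Metastore host (left commented out here):
#   STACK_VERSION="$(hdp-select status hive-server2 | parse_stack_version)"
#   CP="$(build_preupgrade_classpath "$STACK_VERSION" /tmp/hive-pre-upgrade-3.1.0.3.1.4.0-315.jar)"
#   "$JAVA_HOME/bin/java" -cp "$CP" org.apache.hadoop.hive.upgrade.acid.PreUpgradeTool
```

Keeping "$CP" quoted when calling java matters, because the wildcard entries are meant to be expanded by Java rather than by the shell.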
VII. Back up Hive Metastore
After compaction, immediately before upgrading, back up the Hive Metastore as follows:
Important: Making a backup is critical to prevent data loss.
On the node where the database you use for Hive Metastore resides, back up Hive Metastore before upgrading to HDP. For example, in MySQL, dump each database as follows:
mysqldump <hive_db_schema_name> > </path/to/dump_file>
If you use another database for the Hive Metastore, use the equivalent command, such as pg_dump for PostgreSQL, to dump the database.
Proceed to upgrade HDP, assuming no Hive update, delete, or merge occurred after compaction; otherwise, repeat the compaction and Hive Metastore backup procedures, and then upgrade HDP.
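Because the backup is critical, it may be worth confirming that a MySQL dump finished before you start the upgrade. The following is a minimal sketch: the function name dump_looks_complete is hypothetical, and the check assumes mysqldump ran with comments enabled (its default), in which case the last line of a finished dump begins with "-- Dump completed". It does not verify that the dump can be restored.

```shell
#!/bin/sh
# Sketch: sanity-check a mysqldump file before relying on it.
# Assumption: comments were not disabled when the dump was taken, so a
# finished dump ends with a line that begins "-- Dump completed".
dump_looks_complete() {
  # $1 = path to the dump file
  [ -s "$1" ] || return 1                    # missing or empty file
  tail -n 1 "$1" | grep -q '^-- Dump completed'
}
```

For example, dump_looks_complete </path/to/dump_file> && echo "dump finished" reports success only when the file is non-empty and carries the completion marker.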
Pre-upgrade tool command reference
You can use the following key options with the pre-upgrade tool command:
-execute
Use this option only when you want to run the pre-upgrade tool command in Ambari instead of on the Beeline command line. Using Beeline is recommended. This option automatically executes the equivalent of the generated commands.
-location
Use this option to specify the location to write the scripts generated by the pre-upgrade tool.
You can append --help to the command to see all command options. For example:
$ cd <location of downloaded pre-upgrade tool>
$ $JAVA_HOME/bin/java -Djavax.security.auth.useSubjectCredsOnly=false -cp /usr/hdp/$STACK_VERSION/hive/lib/derby-10.10.2.0.jar:/usr/hdp/$STACK_VERSION/hive/lib/*:/usr/hdp/$STACK_VERSION/hadoop/*:/usr/hdp/$STACK_VERSION/hadoop/lib/*:/usr/hdp/$STACK_VERSION/hadoop-mapreduce/*:/usr/hdp/$STACK_VERSION/hadoop-mapreduce/lib/*:/usr/hdp/$STACK_VERSION/hadoop-hdfs/*:/usr/hdp/$STACK_VERSION/hadoop-hdfs/lib/*:/usr/hdp/$STACK_VERSION/hadoop/etc/hadoop/*:/tmp/hive-pre-upgrade-3.1.0.3.1.4.0-315.jar:/usr/hdp/$STACK_VERSION/hive/conf/conf.server:/etc/hadoop/conf/:/etc/hive/conf/ org.apache.hadoop.hive.upgrade.acid.PreUpgradeTool --help
In a Kerberized environment, if you see errors after running kinit, include the following option when you run the pre-upgrade tool command, as shown in the --help example above:
-Djavax.security.auth.useSubjectCredsOnly=false
Next Steps