Apache Hive overview

Apache Hive 3 upgrade process

This supplemental information about preparing for the upgrade, performing it, and using Hive tables after upgrading to Hive 3 helps you complete a successful major upgrade of HDP and Apache Ambari.

Upgrading from HDP 3.x to 3.1.4

Upgrading to HDP 3.1.4 from HDP 3.1.0 or earlier is critical if your Hive data meets both of these conditions:
  • File format = AVRO or Parquet

  • Data type = TIMESTAMP

Upgrading to HDP 3.1.4 resolves a number of issues with TIMESTAMP data in AVRO and Parquet files. Even if you have not experienced problems with your TIMESTAMP data, this upgrade is highly recommended to prevent problems when you migrate to future Cloudera releases.

Upgrading to HDP 3.1.4 also includes the HIVE-16907 bug fix, which rejects a database-qualified name quoted as a single identifier, such as `db.table`, in SQL queries. Enclose the database name and the table name in separate backticks, as in `db`.`table`; otherwise, Hive interprets the entire db.table string as the table name.
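If your applications build HiveQL strings programmatically, the quoting rule can be centralized in one helper. The following is a minimal sketch, not part of any Hive client library: the function names are hypothetical, and doubling an embedded backtick follows the Hive quoted-identifier convention, which you should confirm for your Hive version. It quotes the database and table names separately so the result is `db`.`table`, never `db.table`.

```python
def quote_identifier(name: str) -> str:
    """Wrap one Hive identifier in backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def qualified_table_name(db: str, table: str) -> str:
    """Quote database and table separately: `db`.`table`, never `db.table`."""
    return quote_identifier(db) + "." + quote_identifier(table)


# Usage with a hypothetical table sales.orders.
query = f"SELECT COUNT(*) FROM {qualified_table_name('sales', 'orders')}"
print(query)  # SELECT COUNT(*) FROM `sales`.`orders`
```

Keeping the quoting in one place means a later change in identifier rules only has to be made once.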

If you cannot upgrade to HDP 3.1.4 now, contact Cloudera Support for a hot fix.

Upgrading from HDP 2.x to 3.x

Some transactional tables require a major compaction before you upgrade to HDP 3.0. The Hive pre-upgrade tool identifies the tables that need a major compaction and generates scripts that you run to perform it. Depending on the number of tables and partitions and the amount of data involved, compactions can take significant time and resources. The script output of the pre-upgrade tool includes heuristics that can help you estimate the time required. If the tool produces no script, no compaction is needed.
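To show what a compaction script does, the sketch below builds HiveQL `ALTER TABLE ... COMPACT 'major'` statements, including the `PARTITION (...)` form for partitioned tables. It is an illustration only, assuming simple partition values with no single quotes; it does not reproduce the pre-upgrade tool's actual output format, and the table names and helper functions are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple

# One compaction target: (database, table, partition spec or None).
Target = Tuple[str, str, Optional[Dict[str, str]]]


def quote(name: str) -> str:
    """Backtick-quote one identifier, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def compaction_statement(db: str, table: str,
                         partition: Optional[Dict[str, str]] = None) -> str:
    """Return one ALTER TABLE ... COMPACT 'major' statement."""
    target = f"{quote(db)}.{quote(table)}"
    if partition:
        # Assumes partition values contain no single quotes.
        spec = ", ".join(f"{quote(col)}='{val}'" for col, val in partition.items())
        target += f" PARTITION ({spec})"
    return f"ALTER TABLE {target} COMPACT 'major';"


def compaction_script(targets: Iterable[Target]) -> str:
    """Join the statements into a script, one statement per line."""
    return "".join(compaction_statement(db, table, part) + "\n"
                   for db, table, part in targets)


script = compaction_script([
    ("sales", "orders", None),                  # hypothetical unpartitioned table
    ("sales", "events", {"ds": "2018-01-01"}),  # hypothetical partitioned table
])
print(script, end="")
```

A script like this is typically run with Beeline (`beeline -u <jdbc-url> -f <script-file>`). Compaction requests are queued and processed asynchronously, so you can monitor progress with `SHOW COMPACTIONS`.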

Compaction cannot occur if the pre-upgrade tool cannot connect to the Hive Metastore. Shutting down HiveServer2 is recommended during compaction and for the rest of the upgrade process, so that users cannot run update, delete, or merge statements on tables.
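Because compaction depends on reaching the Hive Metastore, a quick reachability check before running the pre-upgrade tool can save a failed run. This is a minimal sketch, assuming the Metastore listens on its common default Thrift port 9083; the function name is hypothetical. A successful TCP connection only shows that the port is open, not that the Metastore service is healthy.

```python
import socket


def metastore_reachable(host: str, port: int = 9083, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

For example, `metastore_reachable("metastore.example.com")` (a hypothetical host name) returns False when the host cannot be resolved or the port is closed.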

You run the pre-upgrade tool from the command line, not from Ambari. You can run it either before or after upgrading Ambari 2.6.2.2 to 2.7.x.

The following properties can affect compaction:
  • hive.compactor.worker.threads

    Specifies the number of compactor worker threads, which limits how many compactions can run concurrently.

  • hive.compactor.job.queue

    Specifies the YARN queue for compaction jobs. Each compaction runs as a MapReduce job.
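Both properties are Hive configuration settings. As an illustration of their shape in `hive-site.xml`, the sketch below renders them as standard Hadoop-style `<property>` entries. The values are examples only: the thread count is arbitrary and the queue name `compaction` is hypothetical. On an Ambari-managed cluster you would normally change these settings through Ambari rather than by editing the file directly.

```python
import xml.etree.ElementTree as ET


def hive_site_fragment(props: dict) -> str:
    """Render name/value pairs as a Hadoop-style <configuration> XML fragment."""
    config = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(config, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(config, encoding="unicode")


fragment = hive_site_fragment({
    "hive.compactor.worker.threads": 4,        # example value only
    "hive.compactor.job.queue": "compaction",  # hypothetical YARN queue name
})
print(fragment)
```

Directing compaction jobs to a dedicated YARN queue keeps a large batch of pre-upgrade compactions from competing with other workloads.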

The pre-upgrade tool looks for files in ACID tables that contain update or delete events and generates scripts to compact those tables. You obtain and run these scripts as part of preparing Hive for the upgrade. After you have upgraded Ambari, you can upgrade HDP components, including Hive. After upgrading, check that the upgrade process converted your tables correctly.
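The selection rule described above (compact a table only if its files contain update or delete events) can be sketched as a simple filter. This is a simplified model, not the pre-upgrade tool's implementation: it assumes the operation codes already read from each table's delta files are available as integers, using the Hive ACID row encoding of 0 for insert, 1 for update, and 2 for delete. The table names and the function name are hypothetical.

```python
from typing import Dict, Iterable, List

# Assumed Hive ACID operation codes: 0 = insert, 1 = update, 2 = delete.
INSERT, UPDATE, DELETE = 0, 1, 2


def tables_needing_compaction(events: Dict[str, Iterable[int]]) -> List[str]:
    """Return, sorted, the tables whose recorded events include an update or delete."""
    return sorted(
        table
        for table, ops in events.items()
        if any(op in (UPDATE, DELETE) for op in ops)
    )


needs = tables_needing_compaction({
    "sales.orders": [INSERT, INSERT, UPDATE],  # has an update: needs compaction
    "sales.events": [INSERT, INSERT],          # insert-only: no compaction
    "hr.staff": [DELETE],                      # has a delete: needs compaction
})
print(needs)  # ['hr.staff', 'sales.orders']
```

Insert-only tables are skipped because only update and delete events require a major compaction before the upgrade.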