Apache Ambari Major Upgrade

Hive Post-upgrade Tasks

You might need to perform the following tasks after upgrading to HDP 3.x.

  • Changing file locations

  • Updating HDFS paths in Ranger policies

  • Adding backticks to table references using dot notation (db.table)

  • Using the Hive Warehouse Connector

  • Installing Data Analytics Studio

Changes include the location of tables, the HDFS path to the Hive warehouse, file ownership, table types, and ACID-compliance. You need to understand the following changes that occur during the upgrade before performing the post-upgrade tasks.

Hive Changes to ACID Properties

Hive 2.x and 3.x have transactional and non-transactional tables. Transactional tables have atomicity, consistency, isolation, and durability (ACID) properties. In Hive 2.x, the initial version of ACID transaction processing was ACID v1. In Hive 3.x, the mature version of ACID is ACID v2, which is the default table type in HDP 3.0.
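
For example, you can check whether an individual table is transactional by inspecting its table parameters. A minimal check, using a hypothetical table named sales:

    hive> DESCRIBE FORMATTED sales;
    -- In the Table Parameters section of the output, transactional=true
    -- marks a full ACID v2 table, while transactional_properties=insert_only
    -- marks an insert-only managed table.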

Native and Non-native Storage Formats in Hive

Storage formats are a factor in upgrade changes to table types. No change has occurred to storage formats themselves in HDP 3.x, but a table's storage format determines the table changes that occur during the upgrade. Hive 2.x and 3.x support the following Hadoop native and non-native storage formats:

  • Native: Tables with built-in support in Hive, such as those in the following file formats:

    • Text

    • SequenceFile

    • RCFile

    • Avro

    • ORC

    • Parquet

  • Non-native: Tables that use a storage handler, such as the DruidStorageHandler or HBaseStorageHandler, as shown in the example below
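
The following sketch creates a non-native table through the HBaseStorageHandler. The table name, column family, and column mapping are hypothetical, and the HBase client libraries must be available on the Hive classpath:

    hive> CREATE TABLE hbase_notes (key INT, value STRING)
          STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
          WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf:value')
          TBLPROPERTIES ('hbase.table.name' = 'notes');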

HDP 3.x Upgrade Changes to Table Types

The following table compares Hive table types and ACID operations before an upgrade from HDP 2.x and after an upgrade to HDP 3.x. The ownership of the Hive table file is a factor in determining table types and ACID operations after the upgrade.

Table 4.1. HDP 2.x and 3.x Table Type Comparison

| HDP 2.x Table Type | ACID v1 | Format               | Owner (user) of Hive Table File | HDP 3.x Table Type          | ACID v2 |
| External           | No      | Native or non-native | hive or non-hive                | External                    | No      |
| Managed            | Yes     | ORC                  | hive or non-hive                | Managed, updatable**        | Yes     |
| Managed            | No      | ORC                  | hive                            | Managed, updatable**        | Yes     |
| Managed            | No      | ORC                  | non-hive                        | External, with data delete* | No      |
| Managed            | No      | Native (but non-ORC) | hive                            | Managed, insert only**      | Yes     |
| Managed            | No      | Native (but non-ORC) | non-hive                        | External, with data delete* | No      |
| Managed            | No      | Non-native           | hive or non-hive                | External, with data delete* | No      |


* See Dropping an External Table Along with the Data.

** Not SparkSQL-compatible
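
Because file ownership drives the conversion, you can predict how a managed table will be converted by checking the owner of its files before the upgrade. For example, for a hypothetical database named sales:

    $ hdfs dfs -ls /apps/hive/warehouse/sales.db
    # The owner column shows the user (hive or non-hive) that determines
    # the post-upgrade table type, per the table above.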

Other HDP 3.x Upgrade Changes

Managed ACID tables that are not owned by the hive user remain managed tables after the upgrade, but the hive user becomes the owner.

After the upgrade, the format of a Hive table is the same as before the upgrade. For example, native or non-native tables remain native or non-native, respectively.

After the upgrade, the location of a managed table or partition does not change if any one of the following conditions applies:

  • The old table or partition directory was not in its default location /apps/hive/warehouse before the upgrade.

  • The old table or partition is in a different file system than the new warehouse directory.

  • The old table or partition directory is in a different encryption zone than the new warehouse directory.

Otherwise, the location of managed tables or partitions does change: The upgrade process moves managed files to /warehouse/tablespace/managed/hive. By default, Hive places any new external tables you create in HDP 3.x in /warehouse/tablespace/external/hive.

The /apps/hive directory, which is the former location of the Hive 2.x warehouse, might or might not exist in HDP 3.x.
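
To see where files ended up, you can list the old and new warehouse roots. These are the default paths; they differ if your warehouse was relocated:

    $ hdfs dfs -ls /apps/hive/warehouse                  # former Hive 2.x warehouse (may no longer exist)
    $ hdfs dfs -ls /warehouse/tablespace/managed/hive    # HDP 3.x managed tables
    $ hdfs dfs -ls /warehouse/tablespace/external/hive   # HDP 3.x external tables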

Correcting Hive File Locations

To perform some steps in this procedure, you need to log in as the HDFS superuser. If you use Kerberos, you need to become the superuser with a valid ticket.
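
For example, on a Kerberized cluster you might obtain a ticket using the HDFS headless keytab. The keytab path, principal, and realm shown here are typical Ambari defaults and vary by cluster:

    $ kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs-mycluster@EXAMPLE.COM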

If you had external files before the upgrade, the upgrade process carries them over to HDP 3.x with no change in location; the external files continue to reside in the /apps/hive directory. After upgrading, check the /apps/hive directory for files that do not belong there: files that the table above shows as managed in HDP 3.x. The upgrade process should have moved these managed files to the new /warehouse/tablespace/managed/hive directory. It is highly recommended that you contact Hortonworks Support for help correcting this problem. Alternatively, you can perform the following procedure to correct the locations of managed files:

  1. Log in as the HDFS superuser.

    $ sudo su - hdfs

  2. Start Hive in Ambari 2.7.x.

  3. On a node in your cluster, start Beeline in the background and a Hive shell in the foreground:

    $ hive

  4. If the database and some, but not all, tables were moved to the new, correct location, use ALTER to move the database and those tables back to the old, incorrect location /apps/hive/warehouse/...; otherwise, skip this step and go to the next step. For example:

    hive> ALTER DATABASE tpcds_bin_partitioned_orc_10 SET LOCATION 'hdfs://ns1/apps/hive/warehouse/tpcds_bin_partitioned_orc_10.db';

    hive> ALTER TABLE tpcds_bin_partitioned_orc_10.store_sales SET LOCATION 'hdfs://ns1/apps/hive/warehouse/tpcds_bin_partitioned_orc_10.db/store_sales';

  5. On the Hive Metastore node, log in as the HDFS superuser.

  6. Set STACK_VERSION to the HDP version you are running. For example:

    $ export STACK_VERSION=`hdp-select status hive-server2 | awk '{ print $3; }'`

  7. Run the following script:

    $ /usr/hdp/$STACK_VERSION/hive/bin/hive --config /etc/hive/conf --service strictmanagedmigration --hiveconf hive.strict.managed.tables=true -m automatic --modifyManagedTables --oldWarehouseRoot /apps/hive/warehouse

    The migration script corrects the location of any databases or tables that were changed in step 4.
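
    After the script completes, you can spot-check that the databases from step 4 now reside under the new warehouse root. For example:

    $ hdfs dfs -ls /warehouse/tablespace/managed/hive/tpcds_bin_partitioned_orc_10.db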

Updating HDFS Paths in Ranger Policies

You need to update the Hive warehouse path, which changes from /apps/hive in earlier releases to /warehouse/tablespace/managed/hive/ in HDP 3.x. Apache Ranger policies that refer to the obsolete warehouse path do not protect tables in the new location after the upgrade.

  1. In Ambari, in Services > Ranger > Summary, click Ranger Admin UI.

  2. Log into Ranger as the administrator.

  3. In Service Manager, in HDFS, click the policy list name you want to update.

  4. Click View for each policy, and edit the Resource Path of any policy that refers to the obsolete path.
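
    For example, a policy whose Resource Path is /apps/hive/warehouse/sales.db (a hypothetical database) would be updated to /warehouse/tablespace/managed/hive/sales.db.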

Add backticks to table references using dot notation

HDP 3.1.4 and later includes the HIVE-16907 bug fix, which causes Hive to reject db.table references in SQL queries. The database name and the table name must be enclosed in backticks as follows: `db`.`table`; otherwise, Hive interprets the entire db.table string as the table name.
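
For example, given a hypothetical database math containing a table students:

    hive> SELECT col1 FROM math.students;       -- rejected: Hive treats math.students as a single table name
    hive> SELECT col1 FROM `math`.`students`;   -- accepted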

Use the Hive Warehouse Connector to access Spark data

In HDP 3.x, you need to change the way you access Spark data as follows:

  • Workflow change: You must use the Hive Warehouse Connector API to access any managed table in the Hive catalog from Spark.

  • Service change: You must use low-latency analytical processing (LLAP) in HiveServer Interactive to read ACID, or other Hive-managed tables, from Spark.

    You do not need LLAP to write to ACID, or other managed tables, from Spark. You do not need HWC to access external tables from Spark.
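
A minimal spark-shell sketch of the HWC read path follows. It assumes the connector JAR is in its default HDP location, that required settings such as spark.sql.hive.hiveserver2.jdbc.url are already configured, and that sales is a hypothetical managed table:

    $ spark-shell --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-<version>.jar

    scala> import com.hortonworks.hwc.HiveWarehouseSession
    scala> val hive = HiveWarehouseSession.session(spark).build()
    scala> hive.executeQuery("SELECT * FROM sales").show()   // reads a managed table through LLAP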

Optionally, install Data Analytics Studio

HDP 3.1.x does not include Hive View or Tez View. In lieu of these capabilities, users who upgrade from 2.6 to 3.1.x can install Data Analytics Studio. Download Data Analytics Studio.