Apache Hive overview

Changes after upgrading to Apache Hive 3

To locate and use your Apache Hive 3 tables after an upgrade, you need to understand the changes that occur during the upgrade process. The upgrade changes how tables are managed and where they are located, the permissions on HDFS directories, table types, and ACID compliance.

Hive Management of Tables

Hive 3 takes more control of tables than Hive 2 and requires that managed tables adhere to a strict definition. The level of control Hive takes over tables is similar to that of a traditional database. If there is a change to the Hive data, Hive knows about it. This control is a required framework for performance features. For example, if Hive knows that resolving a query does not require scanning tables for new data, Hive returns results from the Hive query result cache.

When the underlying data in a materialized view changes, Hive needs to rebuild the materialized view. ACID properties reveal exactly which rows changed, and only those need to be processed and added to the materialized view.

Hive changes to ACID properties

Hive 2.x and 3.x have transactional and non-transactional tables. Transactional tables have atomicity, consistency, isolation, and durability (ACID) properties. In Hive 2.x, the initial version of ACID transaction processing is ACID v1. In Hive 3.x, the mature version is ACID v2, and ACID v2 tables are the default table type in HDP 3.0.

Native and non-native storage formats

Storage formats are a factor in upgrade changes to table types. Hive 2.x and 3.x support the following Hadoop native and non-native storage formats:

  • Native: Tables with built-in support in Hive, such as those in the following file formats:
    • Text
    • Sequence File
    • RC File
    • AVRO File
    • ORC File
    • Parquet File
  • Non-native: Tables that use a storage handler, such as the DruidStorageHandler or HBaseStorageHandler

HDP 3.x upgrade changes to table types

The following table compares Hive table types and ACID operations before an upgrade from HDP 2.x and after an upgrade to HDP 3.x. The ownership of the Hive table file is a factor in determining table types and ACID operations after the upgrade.
Table 1. HDP 2.x and 3.x Table Type Comparison

HDP 2.x table type  ACID v1  Format                Owner (user) of Hive table file  HDP 3.x table type           ACID v2
------------------  -------  --------------------  -------------------------------  ---------------------------  -------
External            No       Native or non-native  hive or non-hive                 External                     No
Managed             Yes      ORC                   hive or non-hive                 Managed, updatable           Yes
Managed             No       ORC                   hive                             Managed, updatable           Yes
Managed             No       ORC                   non-hive                         External, with data delete*  No
Managed             No       Native (but non-ORC)  hive                             Managed, insert only         Yes
Managed             No       Native (but non-ORC)  non-hive                         External, with data delete*  No
Managed             No       Non-native            hive or non-hive                 External, with data delete*  No

* See Dropping an External Table Along with the Data (link below).
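
The rows of Table 1 can also be read as a small decision procedure. The following Python sketch is only an illustration of that reading, not part of the upgrade tooling: the function name `upgraded_table_type`, the `UpgradeResult` type, and the string labels for storage formats are hypothetical names chosen for this example.

```python
from typing import NamedTuple


class UpgradeResult(NamedTuple):
    table_type: str  # table type after the upgrade to HDP 3.x
    acid_v2: bool    # whether the upgraded table is an ACID v2 table


def upgraded_table_type(table_type: str, acid_v1: bool,
                        storage: str, owner_is_hive: bool) -> UpgradeResult:
    """Look up the HDP 3.x outcome for an HDP 2.x table, following Table 1.

    table_type: "external" or "managed" (hypothetical labels)
    storage:    "orc", "native" (native but non-ORC), or "non-native"
    """
    external_with_delete = UpgradeResult("External, with data delete", False)

    if table_type == "external":
        # External tables stay external regardless of format or owner.
        return UpgradeResult("External", False)
    if table_type != "managed":
        raise ValueError(f"unknown table type: {table_type!r}")

    if acid_v1:
        # Table 1 lists ACID v1 tables only in ORC format.
        if storage != "orc":
            raise ValueError("Table 1 has no row for non-ORC ACID v1 tables")
        return UpgradeResult("Managed, updatable", True)

    if storage == "non-native":
        # Non-native managed tables become external for any owner.
        return external_with_delete
    if storage not in ("orc", "native"):
        raise ValueError(f"unknown storage format: {storage!r}")
    if not owner_is_hive:
        # Native managed tables not owned by hive become external.
        return external_with_delete
    if storage == "orc":
        return UpgradeResult("Managed, updatable", True)
    return UpgradeResult("Managed, insert only", True)
```

For example, `upgraded_table_type("managed", False, "native", True)` corresponds to the row for a non-ACID, native (non-ORC) managed table owned by hive, which Table 1 maps to a managed, insert-only ACID v2 table.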

Removal of Hive View and Tez View

HDP 3.1.x does not include Hive View or Tez View. In lieu of these capabilities, users who upgrade from 2.6 to 3.1.x can download and install Data Analytics Studio.

Hive Impersonation and Security Changes

Hive impersonation was enabled by default in Hive 2 (doAs=true) and is disabled by default in Hive 3. With impersonation enabled, Hive runs operations as the end user; with it disabled, Hive runs them as the hive service user. Ranger is recommended for use with Hive 3. You can control HDFS security using Ranger policies, which is simpler than setting up HDFS permissions.

Other HDP 3.x upgrade changes

Managed, ACID tables that are not owned by the hive user remain managed tables after the upgrade, but hive becomes the owner.

After the upgrade, the format of a Hive table is the same as before the upgrade. For example, native or non-native tables remain native or non-native, respectively.

After the upgrade, the location of managed tables or partitions does not change under any one of the following conditions:

  • The old table or partition directory was not in its default location /apps/hive/warehouse before the upgrade.
  • The old table or partition is in a different file system than the new warehouse directory.
  • The old table or partition directory is in a different encryption zone than the new warehouse directory.

Otherwise, the location of managed tables or partitions does change: The upgrade process moves managed files to /warehouse/tablespace/managed/hive. By default, Hive places any new external tables you create in HDP 3.x in /warehouse/tablespace/external/hive.
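
The relocation rule above can be summarized as a single check. The Python sketch below is a hedged illustration only: the function `is_relocated` and its boolean parameters are hypothetical, and it assumes you already know whether the old directory shares a file system and an encryption zone with the new warehouse directory. It does not compute the final path of a relocated table, because the text above gives only the new warehouse root.

```python
# Warehouse roots as stated in the text above.
OLD_DEFAULT_WAREHOUSE = "/apps/hive/warehouse"
NEW_MANAGED_WAREHOUSE = "/warehouse/tablespace/managed/hive"


def is_relocated(old_path: str,
                 same_file_system: bool,
                 same_encryption_zone: bool) -> bool:
    """Return True if the upgrade is expected to move a managed table
    or partition directory under NEW_MANAGED_WAREHOUSE.

    The location stays unchanged if any one of these holds:
      - the old directory was not under the old default warehouse
      - it is on a different file system than the new warehouse
      - it is in a different encryption zone than the new warehouse
    """
    path = old_path.rstrip("/")
    in_default_location = (
        path == OLD_DEFAULT_WAREHOUSE
        or path.startswith(OLD_DEFAULT_WAREHOUSE + "/")
    )
    return in_default_location and same_file_system and same_encryption_zone
```

The example path `/apps/hive/warehouse/sales.db/orders` is made up for illustration. With it, `is_relocated("/apps/hive/warehouse/sales.db/orders", True, True)` is true, while a table stored in a custom directory, or on another file system or encryption zone, keeps its location.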

The /apps/hive directory, which is the former location of the Hive 2.x warehouse, might or might not exist in HDP 3.x.

For disaster recovery, Hive supports incremental replication of tables from one cluster to another.

ACID table conversion

During the upgrade process, you can override the conversion of ACID v1 tables to ACID v2. For example, you can choose to first convert everything except ACID v1 tables to external tables, and then later convert them to ACID tables one by one.

After upgrading, to convert a non-transactional table to an ACID v2 transactional table, use the ALTER TABLE command and set the table property 'transactional'='true'. For example:
ALTER TABLE T3 SET TBLPROPERTIES ('transactional'='true');
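
If you convert tables one by one after the upgrade, it can help to generate the statements from a list of table names and review them before running them. The following Python sketch only builds the statement text shown above. The helper name `alter_to_acid_statements` and the deliberately conservative identifier check are assumptions made for this example; it does not connect to Hive or validate that a table is eligible for conversion.

```python
import re

# Conservative, hypothetical check: plain names or db.table names only.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?$")


def alter_to_acid_statements(table_names):
    """Build one ALTER TABLE statement per table name.

    Raises ValueError for names that do not match the simple pattern,
    so that unexpected input never ends up inside a statement.
    """
    statements = []
    for name in table_names:
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unexpected table name: {name!r}")
        statements.append(
            f"ALTER TABLE {name} SET TBLPROPERTIES ('transactional'='true');"
        )
    return statements


if __name__ == "__main__":
    # Print the statements for review; the table names are placeholders.
    for statement in alter_to_acid_statements(["T3", "sales.orders"]):
        print(statement)
```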