Changes to HDP Hive tables

As a Data Scientist, Architect, Analyst, or other Hive user you need to locate and use your Apache Hive 3 tables after an upgrade. You also need to understand the changes that occur during the upgrade process.

Managed, ACID tables that are not owned by the hive user remain managed tables after the upgrade, but hive becomes the owner.

After the upgrade, the format of a Hive table is the same as before the upgrade. For example, native or non-native tables remain native or non-native, respectively.

After the upgrade, the location of managed tables or partitions do not change under any one of the following conditions:

  • The old table or partition directory was not in its default location /apps/hive/warehouse before the upgrade.
  • The old table or partition is in a different file system than the new warehouse directory.
  • The old table or partition directory is in a different encryption zone than the new warehouse directory.

Otherwise, the upgrade process from HDP to CDP moves managed files to the Hive warehouse /warehouse/tablespace/managed/hive. The upgrade process carries the external files over to CDP with no change in location. By default, Hive places any new external tables you create in /warehouse/tablespace/external/hive. The upgrade process sets the hive.metastore.warehouse.dir property to this location, designating it the Hive warehouse location.

Changes to table references using dot notation

Upgrading to CDP includes the Hive-16907 bug fix, which rejects `db.table` in SQL queries. The dot (.) is not allowed in table names. To reference the database and table in a table name, both must be enclosed in backticks as follows: `db`.`table`.

Changes to ACID properties

Hive 3.x in CDP Private Cloud Base supports transactional and non-transactional tables. Transactional tables have atomic, consistent, isolation, and durable (ACID) properties. In Hive 2.x, the initial version of ACID transaction processing was ACID v1. In Hive 3.x, the mature version of ACID is ACID v2, which is the default table type in CDP Private Cloud Base.

Native and non-native storage formats

Storage formats are a factor in upgrade changes to table types. Hive 2.x and 3.x support the following native and non-native storage formats:

  • Native: Tables with built-in support in Hive, such as those in the following file formats:
    • Text
    • Sequence File
    • RC File
    • AVRO File
    • ORC File
    • Parquet File
  • Non-native: Tables that use a storage handler, such as the DruidStorageHandler or HBaseStorageHandler

CDP upgrade changes to HDP table types

The following table compares Hive table types and ACID operations before an upgrade from HDP 2.x and after an upgrade to CDP. The ownership of the Hive table file is a factor in determining table types and ACID operations after the upgrade.
Table 1. HDP 2.x and CDP Table Type Comparison
Table Type ACID v1 Format Owner (user) of Hive Table File Table Type ACID v2
External No Native or non-native hive or non-hive External No
Managed Yes ORC hive or non-hive Managed, updatable Yes
Managed No ORC hive Managed, updatable Yes
non-hive External, with data delete No
Managed No Native (but non-ORC) hive Managed, insert only Yes
non-hive External, with data delete No
Managed No Non-native hive or non-hive External, with data delete No