Checking and correcting Hive table locations
As a Data Engineer, you need to understand the relocation of files after the upgrade process. The file type and other factors affect the relocation during the upgrade.
The upgrade process changes table types in some cases. The following table compares Hive table types and ACID operations before and after upgrading. The ownership of the Hive table file is a factor in determining table types and ACID operations after the upgrade.
Before Upgrading | After Upgrading | ||||
Table Type | ACID v1 | Format | Owner (user) of Hive Table File | Table Type | ACID v2 |
External | No | Native or non-native | hive or non-hive | External | No |
Managed | Yes | ORC | hive or non-hive | Managed, updatable** | Yes |
Managed | No | ORC | hive | Managed, updatable** | Yes |
non-hive | External, with data delete* | No | |||
Managed | No | Native (but non-ORC) | hive | Managed, insert only** | Yes |
non-hive | External, with data delete* | No | |||
Managed | No | Non-native | hive or non-hive | External, with data delete* | No |
* See Dropping an External Table Along with the Data.
** Not SparkSQL-compatible
If you had external files before the upgrade, the upgrade process carries the
external files over to CDP after upgrading with no change in location. The external
files continue to reside in the /apps/hive/warehouse
directory.
Managed, ACID tables that are not owned by the hive user remain managed tables after the upgrade, but hive becomes the owner.
After the upgrade, the location of managed tables or partitions do not change under any one of the following conditions:
-
The old table or partition directory was not in its default location /apps/hive/warehouse before the upgrade.
-
The old table or partition directory is in a different encryption zone than the new warehouse directory.