HMS table storage

You need to understand how HMS stores Hive tables when you run a CREATE TABLE statement or migrate a table to Cloudera Data Platform. The success or failure of the statement, the resulting table type, and the table location depends on a number of factors.

HMS table transformations

The HMS includes the following Hive metadata about tables that you create:
  • A table definition
  • Column names
  • Data types
  • Comments in a central schema repository
When you use EXTERNAL keyword in the CREATE TABLE statement, HMS stores the table as an external table. When you omit the EXTERNAL keyword and create a managed table, or ingest a managed table, HMS might translate the table into an external table or the table creation can fail, depending on the table properties. An important table property that affects table transformatons is the ACID or Non-ACID table type:
Non-ACID
Table properties do not contain any ACID related properties set to true. For example, the table does not contain such properties transactional=true or insert_only=true.
ACID
Table properties do contain one or more ACID properties set to true.
Full ACID
Table properties contain transactional=true but not insert_only=true.
Insert-only ACID
Table properties contain insert_only=true.
ACID Managed Location Property Comments Action
Non-ACID Yes Yes Migrated to CDP, for example from an HDP or CDH cluster Table stored as external
Non-ACID Yes No Table location is null Table stored in subdirectory metastore.warehouse.external.dir

HMS detects type of client for interacting with HMS, for example Hive or Spark, and compares the capabilities of the client with the table requirement. HMS performs the following actions, depending on the results of the comparison:

Table requirement Client meets requirements Managed Table ACID table type Action
Client can write to any type of ACID table No Yes Yes CREATE TABLE fails
Client can write to full ACID table No Yes insert_only=true CREATE TABLE fails
Client can write to insert-only ACID table No Yes insert_only=true CREATE TABLE fails
If, for example, a Spark client does not have the capabilities required, the following type of error message appears:
Spark has no access to table `mytable`. Clients can access this table only if
they have the following capabilities: CONNECTORREAD,HIVEFULLACIDREAD, HIVEFULLACIDWRITE,
HIVEMANAGESTATS, HIVECACHEINVALIDATE, . . .