Creating a Table
To improve usability and functionality, Hive 3 significantly changed table creation. CREATE TABLE now:
- Creates an ACID-compliant table, which is the default in CDP
- Supports simple writes and inserts
- Writes to multiple partitions
- Inserts multiple data updates in a single SELECT statement
- Eliminates the need for bucketing
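For example, a plain CREATE TABLE with no storage clause or table properties now produces a managed ACID table. This is a minimal sketch; the table and column names are illustrative:

```sql
-- In CDP, a bare CREATE TABLE creates a managed, full ACID
-- transactional table stored as ORC by default.
CREATE TABLE sales (id INT, qty INT);

-- Simple multi-row inserts work without any bucketing setup.
INSERT INTO sales VALUES (1, 10), (2, 20);
```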
If you have an ETL pipeline that creates tables in Hive, those tables are created as ACID tables. Hive now tightly controls access to managed tables and compacts them periodically. The way you access managed Hive tables from Spark and other clients also changes. In CDP, access to external tables requires you to set up security access permissions.
Before Upgrade to CDP
In CDH and HDP 2.6.5, by default CREATE TABLE created a non-ACID table.
After Upgrade to CDP
In CDP, by default CREATE TABLE creates a full ACID transactional table in ORC format.
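You can confirm the new default by describing a freshly created table; for an ACID table the table parameters include `transactional=true`. A minimal check, using a hypothetical table name:

```sql
CREATE TABLE t (a INT);

-- For a full ACID table, the output should show
-- Table Type: MANAGED_TABLE and, among the table
-- parameters, transactional=true.
DESCRIBE FORMATTED t;
```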
- The upgrade process converts Hive managed tables in CDH to external tables. You must change your scripts to create the types of tables required by your use case. For more information, see Apache Hive 3 Tables.
- Configure legacy CREATE TABLE behavior (see link below) to create external tables by default.
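As a sketch of the legacy setting, the `hive.create.as.external.legacy` property can be enabled per session (verify the property name against the linked documentation for your CDP release):

```sql
-- With legacy behavior enabled, subsequent CREATE TABLE
-- statements in this session create external tables rather
-- than managed ACID tables.
SET hive.create.as.external.legacy=true;

CREATE TABLE legacy_style (id INT);  -- created as an external table
```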
- To read Hive ACID tables from Spark, you connect to Hive using the Hive Warehouse Connector (HWC) or the HWC Spark Direct Reader. To write ACID tables to Hive from Spark, you use the HWC and HWC API. Spark creates an external table with the purge property when you do not use the HWC API. For more information, see HWC Spark Direct Reader and Hive Warehouse Connector (links below).
- Set up Ranger policies and HDFS ACLs for tables. For more information, see HDFS ACLs and HDFS ACL Permissions.