To improve usability and functionality, Hive 3 significantly changed table creation in the following ways:
- Creates an ACID-compliant table, which is the default in CDP
- Supports simple writes and inserts
- Writes to multiple partitions
- Inserts multiple data updates in a single SELECT statement
- Eliminates the need for bucketing
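As a rough illustration of the points above, the following HiveQL sketch creates a partitioned table without a bucketing clause and fills several partitions from one statement. The table and column names (`sales`, `staging_sales`, `id`, `amount`, `region`) are hypothetical, `staging_sales` is assumed to exist already, and the `SET` line may be unnecessary or restricted on your cluster.

```sql
-- Hypothetical names, for illustration only.
-- No CLUSTERED BY clause: the table is not bucketed.
CREATE TABLE sales (id INT, amount DECIMAL(10,2))
PARTITIONED BY (region STRING);

-- Assumption: needed only if the cluster enforces strict dynamic partitioning.
SET hive.exec.dynamic.partition.mode=nonstrict;

-- One statement writes to multiple partitions; Hive routes each row
-- to the partition that matches its region value.
INSERT INTO sales PARTITION (region)
SELECT id, amount, region FROM staging_sales;
```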
If you have an ETL pipeline that creates tables in Hive, the tables are created as ACID tables. Hive now tightly controls access to these tables and periodically performs compaction on them. The way you access managed Hive tables from Spark and other clients also changes. In CDP, access to external tables requires you to set up security access permissions.
Before Upgrade to CDP
In CDH and HDP 2.6.5, by default CREATE TABLE created a non-ACID table.
After Upgrade to CDP
In CDP, by default CREATE TABLE creates a full ACID transactional table in ORC format. Depending on your workloads, take the following actions:
- Configure legacy CREATE TABLE behavior (see the next section) to create external tables by default.
- To read Hive ACID tables from Spark, you connect to Hive using the Hive Warehouse Connector (HWC) or the HWC Spark Direct Reader. To write ACID tables to Hive from Spark, you use the HWC and HWC API. Spark creates an external table with the purge property when you do not use the HWC API. For more information, see HWC Spark Direct Reader and Hive Warehouse Connector.
- Set up Ranger policies and HDFS ACLs for tables. For more information, see HDFS ACLs and HDFS ACL Permissions.
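The difference between the default and legacy behavior can be sketched in HiveQL as follows. This is a minimal example, not a definitive procedure: the table names and the location path are hypothetical, and the session-level property `hive.create.as.external.legacy` is assumed to be available and settable in your CDP version.

```sql
-- Default in CDP: a plain CREATE TABLE yields a managed, full ACID table in ORC format.
CREATE TABLE customers_managed (id INT, name STRING);

-- Inspect the "Table Type" field and the "transactional" table parameter.
DESCRIBE FORMATTED customers_managed;

-- Explicitly external table; the path is a placeholder.
CREATE EXTERNAL TABLE customers_ext (id INT, name STRING)
STORED AS PARQUET
LOCATION '/data/external/customers';

-- Assumption: with legacy behavior enabled for the session,
-- a plain CREATE TABLE creates an external table instead.
SET hive.create.as.external.legacy=true;
CREATE TABLE customers_legacy (id INT, name STRING);
```

Comparing the `DESCRIBE FORMATTED` output for `customers_managed` and `customers_legacy` is one way to confirm which behavior is in effect.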