Understanding CREATE TABLE behavior
Hive table creation has changed significantly since Hive 3 to improve usability and functionality. If you are upgrading from CDH or HDP, you must understand the changes affecting legacy table creation behavior. Hive has changed table creation in the following ways:
- Creates an ACID-compliant table, which is the default in CDP
- Supports simple writes and inserts
- Writes to multiple partitions
- Inserts multiple data updates in a single SELECT statement
- Eliminates the need for bucketing
If you have an ETL pipeline that creates tables in Hive, the tables will be created as ACID. Hive now tightly controls access and performs compaction periodically on the tables. The way you access managed Hive tables from Spark and other clients changes. In CDP, access to external tables requires you to set up security access permissions.
You must understand the behavior of the CREATE TABLE statement in legacy platforms like CDH or HDP and how the legacy behavior changes after you upgrade to CDP.
Before upgrading to CDP
In CDH and HDP 2, by default CREATE TABLE creates a non-ACID table in plain text format.
After upgrading to CDP
- If you are upgrading from HDP 2, by default CREATE TABLE creates a full ACID transactional table in ORC format.
- If you are upgrading from CDH to CDP 7.1.0 through 7.1.7.x, by default CREATE TABLE creates a full ACID transactional table in ORC format.
- If you are upgrading from CDH to CDP 7.1.8 or higher, by default the CREATE TABLE statement creates a non-ACID table in plain text format, which is the same as the legacy behavior in CDH. The table is created with the purge functionality (external.table.purge = 'true'). Therefore, when the table is dropped, data is also deleted from the file system.
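Because the default depends on the upgrade path, a script can state the table type explicitly instead of relying on it. The following is a minimal sketch, not a prescribed method; the table names (sales_acid, sales_ext) and columns are hypothetical.

```sql
-- Hypothetical table and column names, for illustration only.

-- Full ACID (transactional) managed table in ORC format,
-- stated explicitly instead of relying on the CREATE TABLE default.
CREATE TABLE sales_acid (
  id INT,
  amount DECIMAL(10,2)
)
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');

-- Non-ACID external table in plain text format with purge enabled,
-- so DROP TABLE also deletes the data from the file system.
CREATE EXTERNAL TABLE sales_ext (
  id INT,
  amount DECIMAL(10,2)
)
STORED AS TEXTFILE
TBLPROPERTIES ('external.table.purge' = 'true');

-- Inspect the table type and table parameters that were applied.
DESCRIBE FORMATTED sales_acid;
DESCRIBE FORMATTED sales_ext;
```

Running DESCRIBE FORMATTED on a table created with a plain CREATE TABLE is also a quick way to check which default applies in your cluster.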
Legacy behavior might solve compatibility problems with your scripts during data migration, for example, when running ETL. However, Apache Hive full ACID (transactional) tables deliver better performance, security, and user experience than non-transactional tables. Hive 3 tables are ACID-compliant, transactional tables with full ACID capabilities (insert, update, and delete) on data in ORC format only.
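The row-level operations that a full ACID table supports can be sketched as follows. This is an illustrative example only; the table name customer_acid, its columns, and the sample rows are hypothetical.

```sql
-- Hypothetical table; full ACID operations require a transactional ORC table.
CREATE TABLE customer_acid (
  id INT,
  name STRING,
  status STRING
)
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');

-- Insert rows.
INSERT INTO customer_acid VALUES (1, 'Ana', 'new'), (2, 'Raj', 'new');

-- Update a row in place.
UPDATE customer_acid SET status = 'active' WHERE id = 1;

-- Delete a row.
DELETE FROM customer_acid WHERE id = 2;
```

UPDATE and DELETE statements like these are expected to fail on a non-transactional table, which is one way a migrated script can reveal that it is running against a legacy-style table.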
Using ACID-compliant, transactional tables causes no performance or operational overload.
Now that you understand the behavior of the CREATE TABLE statement, you can choose to modify the table behavior by configuring certain properties. For more information, see Configuring CREATE TABLE behavior.
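As a hedged illustration of such a property, the sketch below assumes that hive.create.as.external.legacy is available in your CDP version and can be set at the session level. Confirm the property name, scope, and default in Configuring CREATE TABLE behavior before relying on it. The table name is hypothetical.

```sql
-- Assumption: hive.create.as.external.legacy is available in your CDP version.
-- Session-level setting: later CREATE TABLE statements in this session
-- are expected to create external (non-ACID) tables, as on the legacy platforms.
SET hive.create.as.external.legacy=true;

-- Hypothetical table name, for illustration only.
CREATE TABLE legacy_style (id INT, note STRING);

-- Verify the resulting table type.
DESCRIBE FORMATTED legacy_style;
```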
If you are a Spark user, switching to legacy behavior is unnecessary. Calling CREATE TABLE from SparkSQL, for example, creates an external table after upgrading to CDP, as it did before the upgrade. You can connect to Hive using the Hive Warehouse Connector (HWC) to read Hive ACID tables from Spark. To write ACID tables to Hive from Spark, you use the HWC and the HWC API. Spark creates an external table with the purge property when you do not use the HWC API. For more information, see Hive Warehouse Connector for accessing Spark data.