Understanding CREATE TABLE behavior
Hive table creation has changed significantly since Hive 3 to improve useability and functionality. You must understand the default behavior of the CREATE TABLE statement in CDP Public Cloud.
- Creates ACID-compliant table, which is the default in CDP
- Supports simple writes and inserts
- Writes to multiple partitions
- Inserts multiple data updates in a single SELECT statement
- Eliminates the need for bucketing.
If you have an ETL pipeline that creates tables in Hive, the tables will be created as ACID. Hive now tightly controls access and performs compaction periodically on the tables. Using ACID-compliant, transactional tables causes no performance or operational overload. The way you access managed Hive tables from Spark and other clients changes. In CDP, access to external tables requires you to set up security access permissions.
In CDP Public Cloud, by default, the CREATE TABLE statement creates a full ACID transactional table in ORC format or insert-only transactional tables for all other table formats. You can choose to modify the default behavior by configuring certain properties. The order of preference for configuration is as follows:
Modify the default CREATE TABLE behavior
- Override default behavior when creating the table
- Irrespective of the database, session, or site-level settings, you can override the default
table behavior by using the MANAGED or EXTERNAL keyword in the CREATE TABLE
statement.
CREATE [MANAGED][EXTERNAL] TABLE foo (id INT);
- Set the default table type at a database level
- You can use the database property,
defaultTableType=EXTERNAL or ACID
to specify the default table type to be created using the CREATE TABLE statement. You can specify this property when creating the database or at a later point using the ALTER DATABASE statement. For example:CREATE DATABASE test_db WITH DBPROPERTIES ('defaultTableType'='EXTERNAL');
In this example, tables created under the
test_db
database using the CREATE TABLE statement creates external tables with the purge fucntionality enabled (external.table.purge = 'true'
).You can also choose to configure a database to allow only external tables to be created and prevent creation of ACID tables. While creating a database, you can set thedatabase property, EXTERNAL_TABLES_ONLY=true
to ensure that only external tables are created in the database. For example:CREATE DATABASE test_db WITH DBPROPERTIES ('EXTERNAL_TABLES_ONLY'='true');
- Set the default table type at a session level
- You can configure the CREATE TABLE behavior within an existing beeline session by setting
hive.create.as.external.legacy
to true or false. Setting the value to true results in configuring the CREATE TABLE statement to create external tables by default.When the session ends, the default CREATE TABLE behavior also ends.
- Set the default table type at a site level
- You can configure the CREATE TABLE behavior at the site level by configuring the
hive.create.as.insert.only
andhive.create.as.acid
properties in Cloudera Manager under Hive configuration. When configured at the site level, the behavior persists from session to session.
If you are a Spark user, switching to legacy behavior is unnecessary. Calling ‘create table’ from SparkSQL, for example, creates an external table after upgrading to CDP as it did before the upgrade. You can connect to Hive using the Hive Warehouse Connector (HWC) to read Hive ACID tables from Spark. To write ACID tables to Hive from Spark, you use the HWC and HWC API. Spark creates an external table with the purge property when you do not use the HWC API. For more information, see Hive Warehouse Connector for accessing Spark data.