Configuring legacy CREATE TABLE behavior

After you upgrade to CDP and migrate old tables, you might want to briefly switch to Hive legacy behavior. Legacy behavior might solve compatibility problems with your scripts during data migration, for example, when running ETL.

By default, executing a CREATE TABLE statement creates a managed Apache Hive 3 table in the Hive metastore. You can change the default behavior to use the legacy CREATE TABLE behavior. When you configure legacy behavior, CREATE TABLE generates external tables. Legacy behavior is recommended only during upgrading due to the advantages of full ACID transactional tables over external tables.

Apache Hive full ACID (transactional) tables deliver better performance, security, and user experience than non-transactional tables. By default, executing a CREATE TABLE statement creates a managed Apache Hive 3 table in the Hive metastore. Hive 3 tables are ACID-compliant, transactional tables having the following full ACID capabilities on data in ORC format only:
  • Insert
  • Update
  • Delete

Using ACID-compliant, transactional tables causes no performance or operational overload. Bucketing is not necessary.

If you are a Spark user, switching to legacy behavior is unnecessary. Calling ‘create table’ from SparkSQL, for example, creates an external table after upgrading to CDP as it did before the upgrade.

Configure legacy CREATE TABLE behavior

When you configure legacy behavior, CREATE TABLE creates an external table in your specified warehouse, which is /warehouse/tablespace/external/hive by default. To configure legacy behavior at the session level, you can pass a property to HiveServer (HS2) in the Beeline connection string when you launch Hive. Alternatively, you can pass the property on the Hive command line to switch to the old behavior. You can also configure legacy create table behavior at the site level by configuring properties in Cloudera Manager. When configured at the site level, legacy behavior persists from session to session.

Session-level configuration

Step 1 describes two ways of configuring legacy CREATE TABLE behavior. You can override the configured legacy behavior as described in step 2 to create a managed table.
  1. Choose one of the following ways to configure legacy CREATE TABLE behavior:
    • To configure legacy behavior in any JDBC client, include hiveCreateAsExternalLegacy=true in the connection string. For example, in Beeline, include the connection string to launch Hive:

      beeline -u jdbc:hive2://10.65.13.98:10000/default;hiveCreateAsExternalLegacy=true \
      -n <your user name> -p
    • To configure legacy behavior within an existing beeline session, set hive.create.as.external.legacy=true. For example:

      hive> SET hive.create.as.external.legacy=true;              
    You can purge the table from the file system and metastore. You can change the DROP behavior, to remove metadata only.
  2. Override the configured legacy behavior to create a managed table by using the MANAGED keyword.
    CREATE MANAGED TABLE test (id INT);                 
    When your session ends, the create legacy behavior also ends. If you issue a CREATE TABLE statement, Hive creates either an insert-only or full ACID table, depending on how you set the following table properties:
    • hive.create.as.insert.only
    • hive.create.as.acid

Site-level configuration

When you configure legacy create table behavior at the site level, the legacy behavior persists from session to session. You configure this behavior at the site level using Cloudera Manager as follows:
  1. In Cloudera Manager > Clusters > Hive On Tez, search for hive.create.
  2. If Create Tables as ACID Insert Only and Create Tables as Full ACID properties appear and are set, uncheck the properties.