Data Access
Also available as:
PDF
loading table of contents...

INSERT ... VALUES, UPDATE, DELETE, and MERGE SQL Statements

INSERT ... VALUES, UPDATE, DELETE, and MERGE SQL statements are supported in Apache Hive 0.14 and later. The INSERT ... VALUES statement enables users to write data to Apache Hive from values provided in SQL statements. The UPDATE and DELETE statements enable users to modify and delete values already written to Hive. The MERGE statement streamlines UPDATEs, DELETEs, and change data capture operations by drawing on coexisting tables. All four statements support auto-commit, which means that each statement is a separate transaction that is automatically committed after the SQL statement is executed.

The INSERT ... VALUES, UPDATE, and DELETE statements require the following property values in the hive-site.xml configuration file:

Configuration PropertyRequired Value

hive.enforce.bucketing

true

hive.exec.dynamic.partition.mode

nonstrict

[Note]Note

Administrators must use a transaction manager that supports ACID and the ORC file format to use transactions. See Hive Transactions for information about configuring other properties related to use ACID-based transactions.

INSERT ... VALUES Statement

The INSERT ... VALUES statement is revised to support adding multiple values into table columns directly from SQL statements. A valid INSERT ... VALUES statement must provide values for each column in the table. However, users may assign null values to columns for which they do not want to assign a value. In addition, the PARTITION clause must be included in the DML.

INSERT INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)] VALUES values_row [, values_row...] 

In this syntax, values_row is (value [, value]) and where value is either NULL or any SQL literal.

The following example SQL statements demonstrate several usage variations of this statement:

CREATE TABLE students (name VARCHAR(64), age INT, gpa DECIMAL(3,2)) CLUSTERED BY (age) INTO 2 BUCKETS STORED AS ORC; 
INSERT INTO TABLE students VALUES ('fred flintstone', 35, 1.28), ('barney rubble', 32, 2.32); 
CREATE TABLE pageviews (userid VARCHAR(64), link STRING, from STRING) PARTITIONED BY (datestamp STRING) CLUSTERED BY (userid) INTO 256 BUCKETS STORED AS ORC; 
INSERT INTO TABLE pageviews PARTITION (datestamp = '2014-09-23') VALUES ('jsmith', 'mail.com', 'sports.com'), ('jdoe', 'mail.com', null); 
INSERT INTO TABLE pageviews PARTITION (datestamp) VALUES ('tjohnson', 'sports.com', 'finance.com', '2014-09-23'), ('tlee', 'finance.com', null, '2014-09-21'); 

UPDATE Statement

Use the UPDATE statement to modify data already written to Apache Hive. Depending on the condition specified in the optional WHERE clause, an UPDATE statement may affect every row in a table. You must have both the SELECT and UPDATE privileges to use this statement.

UPDATE tablename SET column = value [, column = value ...] [WHERE expression]; 

The UPDATE statement has the following limitations:

  • The expression in the WHERE clause must be an expression supported by a Hive SELECT clause.

  • Partition and bucket columns cannot be updated.

  • Query vectorization is automatically disabled for UPDATE statements. However, updated tables can still be queried using vectorization.

  • Subqueries are not allowed on the right side of the SET statement.

The following example demonstrates the correct usage of this statement:

UPDATE students SET name = null WHERE gpa <= 1.0; 

DELETE Statement

Use the DELETE statement to delete data already written to Apache Hive.

DELETE FROM tablename [WHERE expression]; 

The DELETE statement has the following limitation: query vectorization is automatically disabled for the DELETE operation. However, tables with deleted data can still be queried using vectorization.

The following example demonstrates the correct usage of this statement:

DELETE FROM students WHERE gpa <= 1,0; 

MERGE Statement

Use the MERGE statement to efficiently perform record-level INSERT, UPDATE, and DELETE operations within Hive tables. The MERGE statement can be a key tool of Hadoop data management.

The MERGE statement is based on ANSI-standard SQL.

The following SQL statement is an example of valid MERGE usage:

merge into customer
using ( select * from new_customer_stage) sub
on sub.id = customer.id
when matched then update set name = sub.name, state = sub.new_state
when not matched then insert values (sub.id, sub.name, sub.state);
[Tip]Tip

For a more in-depth discussion of practical applications of the MERGE statement, see the blog Apache Hive: Moving Beyond Analytics Offload with SQL MERGE.