INSERT ... VALUES, UPDATE, DELETE, and MERGE SQL Statements
INSERT ... VALUES, UPDATE, DELETE, and MERGE SQL statements are supported in Apache Hive 0.14 and later. The INSERT ... VALUES statement enables users to write data to Apache Hive from values provided in SQL statements. The UPDATE and DELETE statements enable users to modify and delete values already written to Hive. The MERGE statement streamlines UPDATEs, DELETEs, and change data capture operations by drawing on coexisting tables. All four statements support auto-commit, which means that each statement is a separate transaction that is automatically committed after the SQL statement is executed.
The INSERT ... VALUES, UPDATE, and DELETE statements require the following property values in the hive-site.xml configuration file:
Configuration Property | Required Value |
---|---|
|
|
|
|
Note | |
---|---|
Administrators must use a transaction manager that supports ACID and the ORC file format to use transactions. See Hive Transactions for information about configuring other properties related to use ACID-based transactions. |
INSERT ... VALUES Statement
The INSERT ... VALUES statement is revised to support adding multiple values into table columns directly from SQL statements. A valid INSERT ... VALUES statement must provide values for each column in the table. However, users may assign null values to columns for which they do not want to assign a value. In addition, the PARTITION clause must be included in the DML.
INSERT INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)] VALUES values_row [, values_row...]
In this syntax, values_row
is (value [, value])
and where value
is either NULL
or any SQL
literal.
The following example SQL statements demonstrate several usage variations of this statement:
CREATE TABLE students (name VARCHAR(64), age INT, gpa DECIMAL(3,2)) CLUSTERED BY (age) INTO 2 BUCKETS STORED AS ORC;
INSERT INTO TABLE students VALUES ('fred flintstone', 35, 1.28), ('barney rubble', 32, 2.32);
CREATE TABLE pageviews (userid VARCHAR(64), link STRING, from STRING) PARTITIONED BY (datestamp STRING) CLUSTERED BY (userid) INTO 256 BUCKETS STORED AS ORC;
INSERT INTO TABLE pageviews PARTITION (datestamp = '2014-09-23') VALUES ('jsmith', 'mail.com', 'sports.com'), ('jdoe', 'mail.com', null);
INSERT INTO TABLE pageviews PARTITION (datestamp) VALUES ('tjohnson', 'sports.com', 'finance.com', '2014-09-23'), ('tlee', 'finance.com', null, '2014-09-21');
UPDATE Statement
Use the UPDATE statement to modify data already written to Apache Hive. Depending on
the condition specified in the optional WHERE
clause, an
UPDATE
statement may affect every row in a table. You must have both
the SELECT and UPDATE privileges to use this statement.
UPDATE tablename SET column = value [, column = value ...] [WHERE expression];
The UPDATE statement has the following limitations:
The expression in the WHERE clause must be an expression supported by a Hive SELECT clause.
Partition and bucket columns cannot be updated.
Query vectorization is automatically disabled for UPDATE statements. However, updated tables can still be queried using vectorization.
Subqueries are not allowed on the right side of the SET statement.
The following example demonstrates the correct usage of this statement:
UPDATE students SET name = null WHERE gpa <= 1.0;
DELETE Statement
Use the DELETE
statement to delete data already written to Apache
Hive.
DELETE FROM tablename [WHERE expression];
The DELETE statement has the following limitation: query vectorization is automatically disabled for the DELETE operation. However, tables with deleted data can still be queried using vectorization.
The following example demonstrates the correct usage of this statement:
DELETE FROM students WHERE gpa <= 1,0;
MERGE Statement
Use the MERGE statement to efficiently perform record-level INSERT, UPDATE, and DELETE operations within Hive tables. The MERGE statement can be a key tool of Hadoop data management.
The MERGE statement is based on ANSI-standard SQL.
The following SQL statement is an example of valid MERGE usage:
merge into customer using ( select * from new_customer_stage) sub on sub.id = customer.id when matched then update set name = sub.name, state = sub.new_state when not matched then insert values (sub.id, sub.name, sub.state);
Tip | |
---|---|
For a more in-depth discussion of practical applications of the MERGE statement, see the blog Apache Hive: Moving Beyond Analytics Offload with SQL MERGE. |