DML refers to "Data Manipulation Language", a subset of SQL statements that modify the data stored in tables. Because Impala focuses on query performance and leverages the append-only nature of HDFS storage, currently Impala only supports a small set of DML statements:
- DELETE Statement (CDH 5.10 or higher only). Works for Kudu tables only.
- INSERT Statement.
- LOAD DATA Statement. Does not apply for HBase or Kudu tables.
- UPDATE Statement (CDH 5.10 or higher only). Works for Kudu tables only.
- UPSERT Statement (CDH 5.10 or higher only). Works for Kudu tables only.
INSERT in Impala is primarily optimized for inserting large volumes of data in a single statement, to make effective use of the multi-megabyte HDFS blocks. This is the way in Impala to create new data files. If you intend to insert one or a few rows at a time, such as using the INSERT ... VALUES syntax, that technique is much more efficient for Impala tables stored in HBase. See Using Impala to Query HBase Tables for details.
LOAD DATA moves existing data files into the directory for an Impala table, making them immediately available for Impala queries. This is one way in Impala to work with data files produced by other Hadoop components. (CREATE EXTERNAL TABLE is the other alternative; with external tables, you can query existing data files, while the files remain in their original location.)
In CDH 5.10 / Impala 2.8 and higher, Impala does support the UPDATE, DELETE, and UPSERT statements for Kudu tables. For HDFS or S3 tables, to simulate the effects of an UPDATE or DELETE statement in other database systems, typically you use INSERT or CREATE TABLE AS SELECT to copy data from one table to another, filtering out or changing the appropriate rows during the copy operation.
You can also achieve a result similar to UPDATE by using Impala tables stored in HBase. When you insert a row into an HBase table, and the table already contains a row with the same value for the key column, the older row is hidden, effectively the same as a single-row UPDATE.
Impala can perform DML operations for tables or partitions stored in the Amazon S3 filesystem with CDH 5.8 / Impala 2.6 and higher. See Using Impala with the Amazon S3 Filesystem for details.
The other major classifications of SQL statements are data definition language (see DDL Statements) and queries (see SELECT Statement).