DDL statements
DDL refers to Data Definition Language
, a subset of SQL statements that change
the structure of the database schema in some way, typically by creating, deleting, or modifying
schema objects such as databases, tables, and views. Most Impala DDL statements start with the
keywords CREATE
, DROP
, or ALTER
.
The Impala DDL statements are:
- ALTER TABLE statement
- ALTER VIEW statement
- COMPUTE STATS statement
- CREATE DATABASE statement
- CREATE FUNCTION statement
- CREATE TABLE statement
- CREATE VIEW statement
- DROP DATABASE statement
- DROP FUNCTION statement
- DROP TABLE statement
- DROP VIEW statement
- GRANT statement
- REVOKE statement
After Impala executes a DDL command, information about available tables, columns, views,
partitions, and so on is automatically synchronized between all the Impala nodes in a
cluster. (Prior to Impala 1.2, you had to issue a REFRESH
or an
INVALIDATE METADATA
statement manually on the other nodes to make them
aware of the changes.)
If the timing of automatic metadata updates is significant, such as when connecting to different
Impala nodes within an impala-shell
session for load balancing or using
round-robin scheduling, enable the SYNC_DDL
query option. This will ensure
that the DDL statement waits until all nodes are notified about the metadata changes.
See Using Impala with the Amazon S3 Filesystem for details about how Impala DDL statements interact with tables and partitions stored in the Amazon S3 filesystem.
Although INSERT
is classified as a DML statement, when the
SYNC_DDL
option is enabled, INSERT
statements also delay
their completion until all the underlying data and metadata changes are propagated to all
Impala nodes and this option applies to all filesystem-based tables. Internally, Impala
inserts have similarities with DDL statements in traditional database systems, because they
create metadata needed to track HDFS block locations for new files and they potentially add
new partitions to partitioned tables.
Because the SYNC_DDL
query option makes each DDL operation take longer than normal, you
might only enable it before the last DDL operation in a sequence. For example, if you are running a script
that issues multiple of DDL operations to set up an entire new schema, add several new partitions, and so on,
you might minimize the performance overhead by enabling the query option only before the last
CREATE
, DROP
, ALTER
, or INSERT
statement.
The script only finishes when all the relevant metadata changes are recognized by all the Impala nodes, so
you could connect to any node and issue queries through it.
The classification of DDL, DML, and other statements is not necessarily the same between Impala and Hive.
Impala organizes these statements in a way intended to be familiar to people familiar with relational
databases or data warehouse products. Statements that modify the metastore database, such as COMPUTE
STATS
, are classified as DDL. Statements that only query the metastore database, such as
SHOW
or DESCRIBE
, are put into a separate category of utility statements.