DESCRIBE Statement
The DESCRIBE statement displays metadata about a table, such as the column names and their data types. Its syntax is:
DESCRIBE [FORMATTED] table
You can use the abbreviation DESC for the DESCRIBE statement.
The DESCRIBE FORMATTED variation displays additional information, in a format familiar to users of Apache Hive. The extra information includes low-level details such as whether the table is internal or external, when it was created, the file format, the location of the data in HDFS, whether the object is a table or a view, and (for views) the text of the query from the view definition.
Usage notes:
After the impalad daemons are restarted, the first query against a table can take longer than subsequent queries, because the metadata for the table is loaded before the query is processed. This one-time delay for each table can cause misleading results in benchmark tests or cause unnecessary concern. To "warm up" the Impala metadata cache, you can issue a DESCRIBE statement in advance for each table you intend to access later.
When you are dealing with data files stored in HDFS, sometimes it is important to know details such as the path of the data files for an Impala table, and the host name for the namenode. You can get this information from the DESCRIBE FORMATTED output. You specify HDFS URIs or path specifications with statements such as LOAD DATA and the LOCATION clause of CREATE TABLE or ALTER TABLE. You might also use HDFS URIs or paths with Linux commands such as hadoop and hdfs to copy, rename, and so on, data files in HDFS.
If you connect to different Impala nodes within an impala-shell session for load-balancing purposes, you can enable the SYNC_DDL query option to make each DDL statement wait before returning, until the new or changed metadata has been received by all the Impala nodes. See SYNC_DDL Query Option for details.
Each table can also have associated table statistics and column statistics. To see these categories of information, use the SHOW TABLE STATS table_name and SHOW COLUMN STATS table_name statements. See SHOW Statement for details.
Examples:
The following example shows the results of both a standard DESCRIBE and DESCRIBE FORMATTED for different kinds of schema objects:
- DESCRIBE for a table or a view returns the name, type, and comment for each of the columns. For a view, if the column value is computed by an expression, the column name is automatically generated as _c0, _c1, and so on depending on the ordinal number of the column.
- A table created with no special format or storage clauses is designated as a MANAGED_TABLE (an "internal table" in Impala terminology). Its data files are stored in an HDFS directory under the default Hive data directory. By default, it uses Text data format.
- A view is designated as VIRTUAL_VIEW in DESCRIBE FORMATTED output. Some of its properties are NULL or blank because they are inherited from the base table. The text of the query that defines the view is part of the DESCRIBE FORMATTED output.
- A table with additional clauses in the CREATE TABLE statement has differences in DESCRIBE FORMATTED output. The output for T2 includes the EXTERNAL_TABLE keyword because of the CREATE EXTERNAL TABLE syntax, and different InputFormat and OutputFormat fields to reflect the Parquet file format.
[localhost:21000] > create table t1 (x int, y int, s string); Query: create table t1 (x int, y int, s string) [localhost:21000] > describe t1; Query: describe t1 Query finished, fetching results ... +------+--------+---------+ | name | type | comment | +------+--------+---------+ | x | int | | | y | int | | | s | string | | +------+--------+---------+ Returned 3 row(s) in 0.13s [localhost:21000] > describe formatted t1; Query: describe formatted t1 Query finished, fetching results ... +------------------------------+--------------------------------------------------------------------+----------------------+ | name | type | comment | +------------------------------+--------------------------------------------------------------------+----------------------+ | # col_name | data_type | comment | | | NULL | NULL | | x | int | None | | y | int | None | | s | string | None | | | NULL | NULL | | # Detailed Table Information | NULL | NULL | | Database: | describe_formatted | NULL | | Owner: | cloudera | NULL | | CreateTime: | Mon Jul 22 17:03:16 EDT 2013 | NULL | | LastAccessTime: | UNKNOWN | NULL | | Protect Mode: | None | NULL | | Retention: | 0 | NULL | | Location: | hdfs://127.0.0.1:8020/user/hive/warehouse/describe_formatted.db/t1 | NULL | | Table Type: | MANAGED_TABLE | NULL | | Table Parameters: | NULL | NULL | | | transient_lastDdlTime | 1374526996 | | | NULL | NULL | | # Storage Information | NULL | NULL | | SerDe Library: | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | NULL | | InputFormat: | org.apache.hadoop.mapred.TextInputFormat | NULL | | OutputFormat: | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | NULL | | Compressed: | No | NULL | | Num Buckets: | 0 | NULL | | Bucket Columns: | [] | NULL | | Sort Columns: | [] | NULL | +------------------------------+--------------------------------------------------------------------+----------------------+ Returned 26 row(s) in 0.03s [localhost:21000] > create view v1 as select x, upper(s) from t1; Query: create view v1 as select x, upper(s) from t1 [localhost:21000] > describe v1; Query: describe v1 Query finished, fetching results ... +------+--------+---------+ | name | type | comment | +------+--------+---------+ | x | int | | | _c1 | string | | +------+--------+---------+ Returned 2 row(s) in 0.10s [localhost:21000] > describe formatted v1; Query: describe formatted v1 Query finished, fetching results ... +------------------------------+------------------------------+----------------------+ | name | type | comment | +------------------------------+------------------------------+----------------------+ | # col_name | data_type | comment | | | NULL | NULL | | x | int | None | | _c1 | string | None | | | NULL | NULL | | # Detailed Table Information | NULL | NULL | | Database: | describe_formatted | NULL | | Owner: | cloudera | NULL | | CreateTime: | Mon Jul 22 16:56:38 EDT 2013 | NULL | | LastAccessTime: | UNKNOWN | NULL | | Protect Mode: | None | NULL | | Retention: | 0 | NULL | | Table Type: | VIRTUAL_VIEW | NULL | | Table Parameters: | NULL | NULL | | | transient_lastDdlTime | 1374526598 | | | NULL | NULL | | # Storage Information | NULL | NULL | | SerDe Library: | null | NULL | | InputFormat: | null | NULL | | OutputFormat: | null | NULL | | Compressed: | No | NULL | | Num Buckets: | 0 | NULL | | Bucket Columns: | [] | NULL | | Sort Columns: | [] | NULL | | | NULL | NULL | | # View Information | NULL | NULL | | View Original Text: | SELECT x, upper(s) FROM t1 | NULL | | View Expanded Text: | SELECT x, upper(s) FROM t1 | NULL | +------------------------------+------------------------------+----------------------+ Returned 28 row(s) in 0.03s [localhost:21000] > create external table t2 (x int, y int, s string) stored as parquet location '/user/cloudera/sample_data'; [localhost:21000] > describe formatted t2; Query: describe formatted t2 Query finished, fetching results ... +------------------------------+----------------------------------------------------+----------------------+ | name | type | comment | +------------------------------+----------------------------------------------------+----------------------+ | # col_name | data_type | comment | | | NULL | NULL | | x | int | None | | y | int | None | | s | string | None | | | NULL | NULL | | # Detailed Table Information | NULL | NULL | | Database: | describe_formatted | NULL | | Owner: | cloudera | NULL | | CreateTime: | Mon Jul 22 17:01:47 EDT 2013 | NULL | | LastAccessTime: | UNKNOWN | NULL | | Protect Mode: | None | NULL | | Retention: | 0 | NULL | | Location: | hdfs://127.0.0.1:8020/user/cloudera/sample_data | NULL | | Table Type: | EXTERNAL_TABLE | NULL | | Table Parameters: | NULL | NULL | | | EXTERNAL | TRUE | | | transient_lastDdlTime | 1374526907 | | | NULL | NULL | | # Storage Information | NULL | NULL | | SerDe Library: | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | NULL | | InputFormat: | com.cloudera.impala.hive.serde.ParquetInputFormat | NULL | | OutputFormat: | com.cloudera.impala.hive.serde.ParquetOutputFormat | NULL | | Compressed: | No | NULL | | Num Buckets: | 0 | NULL | | Bucket Columns: | [] | NULL | | Sort Columns: | [] | NULL | +------------------------------+----------------------------------------------------+----------------------+ Returned 27 row(s) in 0.17s
Cancellation: Cannot be cancelled.
HDFS permissions:
The user ID that the impalad daemon runs under, typically the impala user, must have read and execute permissions for all directories that are part of the table. (A table could span multiple different HDFS directories if it is partitioned. The directories could be widely scattered because a partition can reside in an arbitrary HDFS directory based on its LOCATION attribute.)
Related information:
Overview of Impala Tables, CREATE TABLE Statement, SHOW TABLES Statement, SHOW CREATE TABLE Statement