Query metadata tables feature

Apache Iceberg stores extensive metadata for its tables. From Hive and Impala, you can query the metadata tables as you would query a regular table. For example, you can use projections, joins, filters, and so on.

Iceberg metadata tables include information that is useful for efficient table maintenance (about snapshots, manifests, data, delete files, etc.) as well as statistics that help query engines plan and execute queries more efficiently (value count, min-max values, number of NULLs, etc.).

For more information about the Apache Iceberg Iceberg metadata table types, see the Apache Iceberg MetadataTableType enumeration.

For more information about querying Iceberg metadata, see the Apache Iceberg Spark documentation.

The following sections describe how you can interact with and query Iceberg metadata tables:

List metadata tables

The SHOW METADATA TABLES statement lists all metadata tables belonging to an Iceberg table. You can also filter the tables according to a specific pattern.

Impala Syntax:
SHOW METADATA TABLES IN [database_name.]table_name [[LIKE] "pattern"];
Impala Example:
SHOW METADATA TABLES IN default.ice_table;

Output:
+----------------------+
| name                 |
+----------------------+
| all_data_files       |
| all_delete_files     |
| all_entries          |
| all_files            |
| all_manifests        |
| data_files           |
| delete_files         |
| entries              |
| files                |
| history              |
| manifests            |
| metadata_log_entries |
| partitions           |
| position_deletes     |
| refs                 |
| snapshots            |
+----------------------+

Query metadata tables

You can use the regular SELECT statement from Hive or Impala to query Iceberg metadata tables. To reference a metadata table, use the fully qualified path as shown in the syntax.

Hive or Impala Syntax:
SELECT … FROM database_name.table_name.metadata_table_name;
Hive or Impala Example:
SELECT * FROM default.ice_table.files;

You can select any subset of the columns or all of them using ‘*’. In comparison to regular tables, running a SELECT * from Impala on metadata tables always includes complex-typed columns in the result. The Impala query option, EXPAND_COMPLEX_TYPES only applies to regular tables. However, Hive always includes complex columns irresepctive of whether SELECT queries are run on regular tables or metadata tables.

For Impala queries that have a mix of regular tables and metadata tables, a SELECT * expression where the sources are metadata tables always includes complex types, whereas for SELECT * expressions where the sources are regular tables, complex types are included only if the EXPAND_COMPLEX_TYPES query option is set to 'true'.

In the case of Hive, columns with complex types are always included.

You can also filter the result set using a WHERE clause, use aggregate functions such as MAX or SUM, JOIN metadata tables with other metadata tables or regular tables, and so on.

Example:
SELECT
    s.operation,
    h.is_current_ancestor,
    s.summary
FROM default.ice_table.history h
JOIN default.ice_table.snapshots s
  ON h.snapshot_id = s.snapshot_id
WHERE s.operation = 'append'
ORDER BY made_current_at;

Limitations

  • Impala does not support the DATE and BINARY data types. NULL is returned instead of their actual values.
  • Impala does not support unnesting collections from metadata tables.

DESCRIBE metadata tables

Like regular tables, Iceberg metadata tables have schemas that can be explored using the DESCRIBE statement. ​​The DESCRIBE statement displays metadata about a table, such as the column names and their data types.

To reference the metadata table, use the fully qualified path as shown in the syntax.

Hive or Impala Syntax:
DESCRIBE database_name.table_name.metadata_table_name;
Hive or Impala Example:
DESCRIBE default.ice_table.history;

Output:
+---------------------+-----------+---------+----------+
| name                | type      | comment | nullable |
+---------------------+-----------+---------+----------+
| made_current_at     | timestamp |         | true     |
| snapshot_id         | bigint    |         | true     |
| parent_id           | bigint    |         | true     |
| is_current_ancestor | boolean   |         | true     |
+---------------------+-----------+---------+----------+