Query metadata tables feature

Apache Iceberg stores extensive metadata for its tables. From Hive and Impala, you can query the metadata tables as you would query a regular table. For example, you can use projections, joins, filters, and so on.

The following Iceberg metadata tables are available from Hive and Impala:

entries
files
manifests
partitions
snapshots

The following additional Iceberg metadata tables are available from Impala:

history
all_data_files
all_entries
all_manifests

For more information about querying Iceberg metadata, see the Apache Iceberg Spark documentation.

Hive and Impala Syntax

To reference a metadata table, use the full name of the table as shown in the following syntax:

<DATABASE_NAME>.<TABLE_NAME>.<METADATA_TABLE_NAME>

Hive and Impala Example:

SELECT * FROM default.table_a.files;

You can select any subset of the columns or all of them using ‘*’. In comparison to regular tables, running a SELECT * from Impala on metadata tables always includes complex-typed columns in the result. The Impala query option, EXPAND_COMPLEX_TYPES only applies to regular tables. However, Hive always includes complex columns irresepctive of whether SELECT queries are run on regular tables or metadata tables.

For Impala queries that have a mix of regular tables and metadata tables, a SELECT * expression where the sources are metadata tables always includes complex types, whereas for SELECT * expressions where the sources are regular tables, complex types are included only if the EXPAND_COMPLEX_TYPES query option is set to 'true'.

In the case of Hive, columns with complex types are always included.

You can also filter the result set using a WHERE clause, use aggregate functions such as MAX or SUM, JOIN metadata tables with other metadata tables or regular tables, and so on.

Example:

SELECT
    s.operation,
    h.is_current_ancestor,
    s.summary
FROM default.ice_table.history h
JOIN default.ice_table.snapshots s
  ON h.snapshot_id = s.snapshot_id
WHERE s.operation = 'append'
ORDER BY made_current_at;

Limitations

The following types are not supported: ARRAY, MAP, BINARY
Nested types are not expanded by default when you execute a SELECT *.
You cannot explore the tables. For example, you cannot list the available metadata tables or describe the tables.