Query metadata tables feature

Apache Iceberg stores extensive metadata for its tables. From Hive and Impala, you can query the metadata tables as you would query a regular table. For example, you can use projections, joins, filters, and so on.

The following Iceberg metadata tables are available from Hive and Impala:
  • entries
  • files
  • manifests
  • partitions
  • snapshots
The following additional Iceberg metadata tables are available from Impala:
  • history
  • all_data_files
  • all_entries
  • all_manifests

For more information about querying Iceberg metadata, see the Apache Iceberg Spark documentation.

Hive and Impala Syntax

To reference a metadata table, use the full name of the table as shown in the following syntax:
<DATABASE_NAME>.<TABLE_NAME>.<METADATA_TABLE_NAME>
Hive and Impala Example:
SELECT * FROM default.table_a.files;

You can select any subset of the columns or all of them using ‘*’. In comparison to regular tables, running a SELECT * from Impala on metadata tables always includes complex-typed columns in the result. The Impala query option, EXPAND_COMPLEX_TYPES only applies to regular tables. However, Hive always includes complex columns irresepctive of whether SELECT queries are run on regular tables or metadata tables.

For Impala queries that have a mix of regular tables and metadata tables, a SELECT * expression where the sources are metadata tables always includes complex types, whereas for SELECT * expressions where the sources are regular tables, complex types are included only if the EXPAND_COMPLEX_TYPES query option is set to 'true'.

In the case of Hive, columns with complex types are always included.

You can also filter the result set using a WHERE clause, use aggregate functions such as MAX or SUM, JOIN metadata tables with other metadata tables or regular tables, and so on.

Example:
SELECT
    s.operation,
    h.is_current_ancestor,
    s.summary
FROM default.ice_table.history h
JOIN default.ice_table.snapshots s
  ON h.snapshot_id = s.snapshot_id
WHERE s.operation = 'append'
ORDER BY made_current_at;

Limitations

  • The following types are not supported: ARRAY, MAP, BINARY
  • Nested types are not expanded by default when you execute a SELECT *.
  • You cannot explore the tables. For example, you cannot list the available metadata tables or describe the tables.