Query metadata tables feature
Apache Iceberg stores extensive metadata for its tables. From Hive and Impala, you can query the metadata tables as you would query a regular table. For example, you can use projections, joins, filters, and so on.
- entries
- files
- manifests
- partitions
- snapshots
- history
- all_data_files
- all_entries
- all_manifests
For more information about querying Iceberg metadata, see the Apache Iceberg Spark documentation.
Hive and Impala Syntax
<DATABASE_NAME>.<TABLE_NAME>.<METADATA_TABLE_NAME>
SELECT * FROM default.table_a.files;
You can select any subset of the columns or all of them using ‘*’. In comparison to regular
tables, running a SELECT *
from Impala on metadata tables always includes
complex-typed columns in the result. The Impala query option,
EXPAND_COMPLEX_TYPES
only applies to regular tables. However, Hive always
includes complex columns irresepctive of whether SELECT
queries are run on
regular tables or metadata tables.
For Impala queries that have a mix of regular tables and metadata tables, a
SELECT * expression where the sources are metadata tables always includes complex types,
whereas for SELECT * expressions where the sources are regular tables, complex types are
included only if the EXPAND_COMPLEX_TYPES
query option is set to
'true'.
In the case of Hive, columns with complex types are always included.
You can also filter the result set using a WHERE
clause, use aggregate
functions such as MAX or SUM, JOIN metadata tables with other metadata tables or regular
tables, and so on.
SELECT
s.operation,
h.is_current_ancestor,
s.summary
FROM default.ice_table.history h
JOIN default.ice_table.snapshots s
ON h.snapshot_id = s.snapshot_id
WHERE s.operation = 'append'
ORDER BY made_current_at;
Limitations
- The following types are not supported: ARRAY, MAP, BINARY
- Nested types are not expanded by default when you execute a SELECT *.
- You cannot explore the tables. For example, you cannot list the available metadata tables or describe the tables.