Query metadata tables feature
Apache Iceberg stores extensive metadata for its tables. From Hive and Impala, you can query the metadata tables as you would query a regular table. For example, you can use projections, joins, filters, and so on.
For more information about the Apache Iceberg Iceberg metadata table types, see the Apache Iceberg MetadataTableType enumeration.
For more information about querying Iceberg metadata, see the Apache Iceberg Spark documentation.
The following sections describe how you can interact with and query Iceberg metadata tables:
List metadata tables
The SHOW METADATA TABLES
statement lists all metadata tables belonging to
an Iceberg table. You can also filter the tables according to a specific pattern.
SHOW METADATA TABLES IN [database_name.]table_name [[LIKE] "pattern"];
SHOW METADATA TABLES IN default.ice_table;
Output:
+----------------------+
| name |
+----------------------+
| all_data_files |
| all_delete_files |
| all_entries |
| all_files |
| all_manifests |
| data_files |
| delete_files |
| entries |
| files |
| history |
| manifests |
| metadata_log_entries |
| partitions |
| position_deletes |
| refs |
| snapshots |
+----------------------+
Query metadata tables
You can use the regular SELECT statement from Hive or Impala to query Iceberg metadata tables. To reference a metadata table, use the fully qualified path as shown in the syntax.
SELECT … FROM database_name.table_name.metadata_table_name;
SELECT * FROM default.ice_table.files;
You can select any subset of the columns or all of them using ‘*’. In comparison to regular
tables, running a SELECT *
from Impala on metadata tables always includes
complex-typed columns in the result. The Impala query option,
EXPAND_COMPLEX_TYPES
only applies to regular tables. However, Hive always
includes complex columns irresepctive of whether SELECT
queries are run on
regular tables or metadata tables.
For Impala queries that have a mix of regular tables and metadata tables, a
SELECT * expression where the sources are metadata tables always includes complex types,
whereas for SELECT * expressions where the sources are regular tables, complex types are
included only if the EXPAND_COMPLEX_TYPES
query option is set to
'true'.
In the case of Hive, columns with complex types are always included.
You can also filter the result set using a WHERE
clause, use aggregate
functions such as MAX or SUM, JOIN metadata tables with other metadata tables or regular
tables, and so on.
SELECT
s.operation,
h.is_current_ancestor,
s.summary
FROM default.ice_table.history h
JOIN default.ice_table.snapshots s
ON h.snapshot_id = s.snapshot_id
WHERE s.operation = 'append'
ORDER BY made_current_at;
Limitations
- Impala does not support the DATE and BINARY data types. NULL is returned instead of their actual values.
- Impala does not support unnesting collections from metadata tables.
DESCRIBE metadata tables
Like regular tables, Iceberg metadata tables have schemas that can be explored using the
DESCRIBE
statement. The DESCRIBE
statement displays
metadata about a table, such as the column names and their data types.
To reference the metadata table, use the fully qualified path as shown in the syntax.
DESCRIBE database_name.table_name.metadata_table_name;
DESCRIBE default.ice_table.history;
Output:
+---------------------+-----------+---------+----------+
| name | type | comment | nullable |
+---------------------+-----------+---------+----------+
| made_current_at | timestamp | | true |
| snapshot_id | bigint | | true |
| parent_id | bigint | | true |
| is_current_ancestor | boolean | | true |
+---------------------+-----------+---------+----------+