Time travel feature
From Hive or Impala, you can run point in time queries for auditing and regulatory workflows on Iceberg tables. Time travel queries can be time-based or based on a snapshot ID.
Iceberg generates a snapshot when you create, or modify, a table. A snapshot stores the state of a table. You can specify which snapshot you want to read, and then view the data at that timestamp. In Hive, you can use projections, joins, and filters in time travel queries. In Impala, you can add expressions to the timestamps, as shown in the examples. In Hive and Impala, you can expire snapshots.
Snapshot storage is incremental and dependent on the frequency and scale of updates. By default, Hive and Impala use the latest snapshot. You can query an earlier snapshot of Iceberg tables to get historical information. Hive and Impala use the latest schema to query an earlier table snapshot even if it has a different schema.
Hive or Impala syntax
SELECT * FROM table_name FOR SYSTEM_TIME AS OF 'time_stamp'
SELECT * FROM table_name FOR SYSTEM_VERSION AS OF snapshot_id
time_stamp
The state of the Iceberg table at the time specified by the UTC timestamp.
snapshot_id
The ID of the Iceberg table snapshot from the history output.
Hive examples
SELECT * FROM t FOR SYSTEM_TIME AS OF '2021-08-09 10:35:57' LIMIT 100;
SELECT * FROM t FOR SYSTEM_VERSION AS OF 3088747670581784990 limit 100;
Impala examples
SELECT * from ice_11 FOR SYSTEM_TIME AS OF now() - interval 30 minutes;
SELECT * from ice_11 FOR SYSTEM_TIME AS OF '2022-11-04 13:50:57';
SELECT * FROM t FOR SYSTEM_VERSION AS OF 4319441858259506031 limit 5;