Time travel feature

From Hive or Impala, you can run point in time queries for auditing and regulatory workflows on Iceberg tables. Time travel queries can be time-based or based on a snapshot ID.

Iceberg generates a snapshot when you create, or modify, a table. A snapshot stores the state of a table. You can specify which snapshot you want to read, and then view the data at that timestamp. In Hive, you can use projections, joins, and filters in time travel queries. You can add expressions to the timestamps, as shown in the examples. You can expire snapshots.

Snapshot storage is incremental and dependent on the frequency and scale of updates. By default, Hive and Impala use the latest snapshot. You can query an earlier snapshot of Iceberg tables to get historical information. Hive and Impala use the latest schema to query an earlier table snapshot even if it has a different schema.

Hive or Impala syntax

SELECT * FROM table_name FOR SYSTEM_TIME AS OF 'time_stamp' [expression]

SELECT * FROM table_name FOR SYSTEM_VERSION AS OF snapshot_id [expression]
  • time_stamp

    The state of the Iceberg table at the time specified by the UTC timestamp.

  • snapshot_id

    The ID of the Iceberg table snapshot from the history output.

Hive or Impala examples

SELECT * FROM t FOR SYSTEM_TIME AS OF '2021-08-09 10:35:57' LIMIT 100;

SELECT * FROM t FOR SYSTEM_VERSION AS OF 3088747670581784990 limit 100;

SELECT * from ice_11 FOR SYSTEM_TIME AS OF now() - interval 30 minutes;

SELECT * FROM customers FOR SYSTEM_TIME AS OF timestampAfterSnapshot(table, 0) fv, 
customers FOR SYSTEM_TIME AS OF timestampAfterSnapshot(table, 1) sv 
WHERE fv.first_name=sv.first_name;