Time travel feature

From Impala, you can run point in time queries for auditing and regulatory workflows on Iceberg tables. Time travel queries can be time-based or based on a snapshot ID.

Iceberg generates a snapshot when you create, or modify, a table. A snapshot stores the state of a table. You can specify which snapshot you want to read, and then view the data at that timestamp.

Snapshot storage is incremental and dependent on the frequency and scale of updates. By default, Impala uses the latest snapshot. You can query an earlier snapshot of Iceberg tables to get historical information. Hive and Impala use the latest schema to query an earlier table snapshot even if it has a different schema.

Impala syntax

SELECT * FROM table_name FOR SYSTEM_TIME AS OF 'time_stamp' [expression]

SELECT * FROM table_name FOR SYSTEM_VERSION AS OF snapshot_id [expression]
  • time_stamp

    The state of the Iceberg table at the time specified by the UTC timestamp.

  • snapshot_id

    The ID of the Iceberg table snapshot from the history output.

Impala examples

SELECT * FROM t FOR SYSTEM_TIME AS OF '2021-08-09 10:35:57' LIMIT 100;

SELECT * FROM t FOR SYSTEM_VERSION AS OF 3088747670581784990 limit 100;

SELECT * from ice_11 FOR SYSTEM_TIME AS OF now() - interval 30 minutes;

SELECT * FROM customers FOR SYSTEM_TIME AS OF timestampAfterSnapshot(table, 0) fv, 
customers FOR SYSTEM_TIME AS OF timestampAfterSnapshot(table, 1) sv 
WHERE fv.first_name=sv.first_name;