Time travel feature

From Hive or Impala, you can run point in time queries for auditing and regulatory workflows on Iceberg tables. Time travel queries can be time-based or based on a snapshot ID.

Iceberg uses snapshots to access the complete set of data files in the table. A snapshot stores the state of a table. You can specify which snapshot you want to read and view the data at that timestamp. In Hive, you can expire snapshots and use projections, joins, and filters in time travel queries. In Impala, you can add expressions to the timestamps, as shown in the examples.

Snapshot storage is incremental and dependent on the frequency and scale of updates. By default, Hive and Impala use the latest snapshot. You can query an earlier snapshot of Iceberg tables to get historical information. Hive and Impala use the latest schema to query an earlier table snapshot even if it has a different schema.

Hive or Impala syntax

SELECT * FROM table_name FOR SYSTEM_TIME AS OF 'time_stamp'

SELECT * FROM table_name FOR SYSTEM_VERSION AS OF snapshot_id
  • time_stamp

    The state of the Iceberg table at the time specified by the UTC timestamp.

  • snapshot_id

    The ID of the Iceberg table snapshot from the history output.

Hive examples

SELECT * FROM t FOR SYSTEM_TIME AS OF '2021-08-09 10:35:57' LIMIT 100;

SELECT * FROM t FOR SYSTEM_VERSION AS OF 3088747670581784990 limit 100;

Impala examples

SELECT * FROM t FOR SYSTEM_TIME AS OF now() - interval 7 days;

SELECT * FROM customer_demo_iceberg FOR SYSTEM_TIME AS OF '2021-12-09 05:39:18.689000000' limit 100;

SELECT * FROM t FOR SYSTEM_VERSION AS OF 4319441858259506031 limit 5;