What's New in Apache Iceberg
Learn about the new features of Iceberg in Cloudera Runtime 7.3.2, its service packs and cumulative hotfixes.
Cloudera Runtime 7.3.2
Cloudera Runtime 7.3.2 introduces new features of Iceberg and includes all service packs and cumulative hotfixes from 7.3.1.100 through 7.3.1.706. For a comprehensive record of all updates in Cloudera Runtime 7.3.1.x, see New Features.
- Cloudera Lakehouse Optimizer for Iceberg table optimization
- In Cloudera Runtime 7.3.2 and higher versions, you can use the Cloudera Lakehouse Optimizer service in Cloudera Manager to automate Iceberg table maintenance tasks.
- Integrate Iceberg scan metrics into Impala query profiles
- Iceberg scan metrics are now integrated into the Frontend section of Impala query profiles, providing deeper insight into query planning performance for Iceberg tables.
The query profile now displays scan metrics from Iceberg's planFiles() API, including total planning time, counts of data files, delete files, and manifests, and the number of skipped files.
Metrics are displayed on a per-table basis. If a query scans multiple Iceberg tables, a separate metrics section will appear in the profile for each one.
Apache Jira: IMPALA-13628
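As a rough illustration of where these metrics surface, the profile of the most recent query can be printed in impala-shell with the PROFILE command, and the Iceberg scan metrics should then be visible in the Frontend section. The table name ice_tbl and the column event_date are hypothetical, and the layout of the metrics section itself is not reproduced here.

```sql
-- Run inside impala-shell. ice_tbl and event_date are hypothetical names.
SELECT count(*) FROM ice_tbl WHERE event_date >= '2024-01-01';

-- impala-shell command (not a SQL statement): prints the profile of the
-- last executed query, including its Frontend section.
PROFILE;
```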
- Delete orphan files for Iceberg tables
- You can now use the following syntax to remove orphan files for Iceberg tables:
    -- Remove orphan files older than '2022-01-04 10:00:00'.
    ALTER TABLE ice_tbl EXECUTE remove_orphan_files('2022-01-04 10:00:00');
    -- Remove orphan files older than 5 days from now.
    ALTER TABLE ice_tbl EXECUTE remove_orphan_files(now() - interval 5 days);
This feature removes all files from a table’s data directory that are not linked from metadata files and that are older than the value of the older_than parameter. Deleting orphan files periodically is recommended to keep the size of a table’s data directory under control.
Apache Jira: IMPALA-14492
- Allow forced predicate pushdown to Iceberg
- Since IMPALA-11591, Impala has optimized query planning by avoiding predicate pushdown
to Iceberg unless it is strictly necessary. While this default behavior makes planning
faster, it can miss opportunities to prune files early based on Iceberg's file-level
statistics.
A new table property, impala.iceberg.push_down_hint, is introduced, which allows you to force predicate pushdown for specific columns. The property accepts a comma-separated list of column names, for example, 'col_a, col_b'.
If a query contains a predicate on any column listed in this property, Impala pushes that predicate down to Iceberg for evaluation during the planning phase.
Apache Jira: IMPALA-14123
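A minimal sketch of how the property might be set, assuming it is applied like any other table property through ALTER TABLE ... SET TBLPROPERTIES. The table name ice_tbl and the query are hypothetical; the column names follow the example value above.

```sql
-- Hypothetical table; col_a and col_b are the columns to force pushdown for.
ALTER TABLE ice_tbl
SET TBLPROPERTIES ('impala.iceberg.push_down_hint' = 'col_a, col_b');

-- Because col_a is listed in the property, this predicate is expected to be
-- pushed down to Iceberg during planning.
SELECT count(*) FROM ice_tbl WHERE col_a = 42;
```

Listing only columns whose file-level statistics are selective keeps the planning overhead limited to queries that are likely to benefit from early file pruning.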
- UPDATE operations now skip rows that already have the desired value
- The UPDATE statement for Iceberg and Kudu tables is optimized to reduce unnecessary writes.
Previously, an UPDATE operation would modify all rows matching the WHERE clause, even if those rows already contained the new value. For Iceberg tables, this resulted in writing unnecessary new data and delete records.
With this enhancement, Impala automatically adds an extra predicate to the UPDATE statement to exclude rows that already match the target value.
Apache Jira: IMPALA-12588
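The effect can be pictured with a hand-written equivalent. The table ice_tbl and the columns id and status are hypothetical, and the exact form of the predicate that Impala adds internally is an assumption; IS DISTINCT FROM is used here only because it also handles NULL values correctly.

```sql
-- Statement as written by the user (hypothetical table and columns).
UPDATE ice_tbl SET status = 'done' WHERE id < 100;

-- Conceptually, the statement now behaves as if it had been written like this
-- (assumed form of the automatically added predicate):
UPDATE ice_tbl SET status = 'done'
WHERE id < 100 AND status IS DISTINCT FROM 'done';
```

Rows where status is already 'done' are left untouched, so no new data or delete records are written for them.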
