Impala uses its own C++ implementation to deal with Iceberg tables. This implementation provides significant performance advantages over other engines.
To tune performance, try the following actions:
- Increase parallelism to handle large manifest list files in Spark.
By default, the number of processors determines the preset value of the iceberg.worker.num-threads system property. To speed up query compilation, try setting the iceberg.worker.num-threads system property to a higher value.
- Speed up DROP TABLE performance by preventing deletion of data files. To do so, set the following table properties:
external.table.purge=false and gc.enabled=false
- Tune the following table properties to improve concurrency on writes and reduce commit failures:
commit.retry.num-retries (default is 4),
commit.retry.min-wait-ms (default is 100)
- Read Iceberg V2 tables from Hive using vectorization when heavy table scanning occurs, as in SELECT COUNT(*) FROM TBL_ICEBERG_PART.
- Use Iceberg from Impala for querying Iceberg tables when latency is a concern.
The massively parallel SQL query engine, backend executors written in C++, and frontend (analyzer, planner) written in Java deliver query results fast.
- Cache manifest files as described in the next topic.
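
The table-property and vectorization suggestions in the list above could be applied roughly as in the following sketch. It reuses the TBL_ICEBERG_PART table name from the example query. The retry values (10 and 200) are illustrative only, not recommendations, and hive.vectorized.execution.enabled is assumed here as the standard Hive switch for vectorized execution, since the text does not name the setting.

```sql
-- Keep data files in place on DROP TABLE so that the drop returns faster.
ALTER TABLE TBL_ICEBERG_PART SET TBLPROPERTIES (
  'external.table.purge'='false',
  'gc.enabled'='false'
);

-- Allow more commit retries with a longer minimum wait between attempts.
-- Illustrative values; the defaults are 4 retries and 100 ms.
ALTER TABLE TBL_ICEBERG_PART SET TBLPROPERTIES (
  'commit.retry.num-retries'='10',
  'commit.retry.min-wait-ms'='200'
);

-- In a Hive session: turn on vectorized execution (assumed setting name),
-- then run the scan-heavy query.
SET hive.vectorized.execution.enabled=true;
SELECT COUNT(*) FROM TBL_ICEBERG_PART;
```

Raising the retry count trades longer worst-case commit latency for fewer failed writes under contention, so it is worth adjusting gradually while watching how often commits actually retry.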