Performance tuning

Impala uses its own C++ implementation to deal with Iceberg tables. This implementation provides significant performance advantages over other engines.

To tune performance, try the following actions:

Increase parallelism to handle large manifest list files in Spark.
By default, the number of processors determines the preset value of the iceberg.worker.num-threads system property. Try increasing parallelism by setting the iceberg.worker.num-threads system property to a higher value to speed up query compilation.
Speed up drop table performance, preventing deletion of data files by using the following table properties:
```
Set external.table.purge=false and gc.enabled=false
```
Tune the following table properties to improve concurrency on writes and reduce commit failures: commit.retry.num-retries (default is 4),commit.retry.min-wait-ms (default is 100)
Read Iceberg V2 tables from Hive using vectorization when heavy table scanning occurs as in SELECT COUNT(*) FROM TBL_ICEBERG_PART.
- set hive.llap.io.memory.mode=cache;
- set hive.llap.io.enabled=true;
- set hive.vectorized.execution.enabled=true;
Use Iceberg from Impala for querying Iceberg tables when latency is a concern.
The massively parallel SQL query engine, backend executors written in C++, and frontend (analyzer, planner) written in Java delivers query results fast.
Cache manifest files as described in the next topic.