Performance tuning
Impala uses its own C++ implementation to work with Iceberg tables. This implementation gives Impala significant performance advantages over other engines.
To tune performance, try the following actions:
- Increase parallelism to handle large manifest list files in Spark.
By default, the iceberg.worker.num-threads system property is set to the number of processors. To speed up query compilation, increase parallelism by setting iceberg.worker.num-threads to a higher value.
- Speed up DROP TABLE by preventing deletion of data files. Set the following table properties:
external.table.purge=false and gc.enabled=false
- Tune the following table properties to improve write concurrency and reduce commit failures:
commit.retry.num-retries (default: 4) and commit.retry.min-wait-ms (default: 100 ms)
- Read Iceberg V2 tables from Hive with vectorization when queries perform heavy table scans, such as SELECT COUNT(*) FROM TBL_ICEBERG_PART. Use the following settings:
  - set hive.llap.io.memory.mode=cache;
  - set hive.llap.io.enabled=true;
  - set hive.vectorized.execution.enabled=true;
- Use Impala to query Iceberg tables when latency is a concern.
Impala is a massively parallel SQL query engine with backend executors written in C++ and a frontend (analyzer and planner) written in Java, which lets it return query results quickly.
- Cache manifest files as described in the next topic.
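As a sketch, the drop-table properties recommended above could be applied with an ALTER TABLE ... SET TBLPROPERTIES statement in Hive or Impala. The table name ice_tbl is a hypothetical placeholder; substitute your own table.

```sql
-- Hypothetical table name: ice_tbl.
-- external.table.purge=false: keep the data files when the table is dropped.
-- gc.enabled=false: disable Iceberg operations that delete files,
-- such as snapshot expiration.
ALTER TABLE ice_tbl SET TBLPROPERTIES (
  'external.table.purge'='false',
  'gc.enabled'='false'
);
```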
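The commit retry properties mentioned above can be raised the same way. This is a minimal sketch: ice_tbl is a hypothetical table name, and the values 10 and 200 are illustrative assumptions rather than recommendations.

```sql
-- Hypothetical table name and illustrative values.
-- Retry a failed commit up to 10 times (default: 4) and wait at least
-- 200 ms before retrying (default: 100 ms).
ALTER TABLE ice_tbl SET TBLPROPERTIES (
  'commit.retry.num-retries'='10',
  'commit.retry.min-wait-ms'='200'
);
```

More retries give concurrent writers more chances to succeed after a conflicting commit instead of failing. The cost is a longer worst-case commit time.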