Impala geospatial query acceleration

Impala geospatial query acceleration uses native C++ implementations to improve query performance by reducing Java Virtual Machine (JVM) overhead and enabling storage-level filtering for Parquet and Iceberg tables.

In Cloudera Data Warehouse 2025.0.21.0 and higher, geospatial query acceleration addresses performance bottlenecks such as Java Native Interface (JNI) call overhead and repeated data serialization. Because these functions are implemented in native code, some of them can read a geometry's binary header directly instead of deserializing the full geometry, which makes them more efficient.
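The following sketch illustrates why header access is cheaper than full deserialization. It rests on a loudly labeled assumption: the byte layout is hypothetical. The sketch assumes a 5-byte prefix (SRID plus a type tag) followed by a shapefile-style record whose fixed-size header stores the shape type and the bounding box before any vertices. The real Impala/ESRI serialization may differ. The names `read_envelope`, `st_min_x`, and `make_polygon_blob` are illustrative only and are not Impala APIs. An envelope function such as ST_MinX only needs the first few bytes after the prefix, no matter how many vertices follow.

```python
import struct

# ASSUMPTION (hypothetical layout, for illustration only):
#   5-byte prefix: int32 SRID + 1-byte type tag
#   then a shapefile-style record: int32 shape type, 4 doubles (xmin, ymin, xmax, ymax),
#   followed by part/point counts and the vertex array.
PREFIX_LEN = 5
HEADER_FMT = "<i4d"  # little-endian shape type + bounding box, no padding


def read_envelope(blob: bytes) -> tuple:
    """Return (xmin, ymin, xmax, ymax) by decoding only the fixed-size header."""
    _shape_type, xmin, ymin, xmax, ymax = struct.unpack_from(HEADER_FMT, blob, PREFIX_LEN)
    return xmin, ymin, xmax, ymax


# Envelope accessors in the spirit of ST_MinX / ST_MaxX / ST_MinY / ST_MaxY.
def st_min_x(blob: bytes) -> float:
    return read_envelope(blob)[0]

def st_min_y(blob: bytes) -> float:
    return read_envelope(blob)[1]

def st_max_x(blob: bytes) -> float:
    return read_envelope(blob)[2]

def st_max_y(blob: bytes) -> float:
    return read_envelope(blob)[3]


def make_polygon_blob(points, srid=4326, type_tag=3) -> bytes:
    """Build a sample blob in the assumed layout (hypothetical helper for testing)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    prefix = struct.pack("<ib", srid, type_tag)                       # 5 bytes
    header = struct.pack(HEADER_FMT, 5, min(xs), min(ys), max(xs), max(ys))
    counts = struct.pack("<2i", 1, len(points))                       # one part, N points
    parts = struct.pack("<i", 0)                                      # part starts at index 0
    vertices = b"".join(struct.pack("<2d", x, y) for x, y in points)
    return prefix + header + counts + parts + vertices


blob = make_polygon_blob([(0.0, 0.0), (10.0, 0.0), (10.0, 5.0), (0.0, 0.0)])
print(read_envelope(blob))  # only 36 header bytes are decoded; vertices are never read
```

The design point is that the envelope sits at a fixed offset. Reading it costs the same for a 4-vertex polygon as for a 40,000-vertex one.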

This acceleration is designed specifically to optimize selective queries that filter on separate latitude and longitude columns. Because these filters are simple predicates on numeric columns, Impala can push them down to the Parquet or Iceberg storage layer. There, column statistics allow the system to skip data that cannot match the filter before that data reaches the processing engine.
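The following simplified model shows how min/max statistics let a storage layer skip data for a filter such as `lat BETWEEN 40.6 AND 40.9 AND lon BETWEEN -74.1 AND -73.7`. The column names `lat` and `lon`, the `RowGroupStats` class, and the sample statistics are hypothetical. The logic is a generic illustration of statistics-based pruning, not Impala's implementation. A row group is kept only if its recorded value range could overlap the filter range on both columns.

```python
from dataclasses import dataclass


@dataclass
class RowGroupStats:
    """Simplified per-row-group min/max statistics (hypothetical structure)."""
    name: str
    lat_min: float
    lat_max: float
    lon_min: float
    lon_max: float


def may_match(rg: RowGroupStats, lat_lo: float, lat_hi: float,
              lon_lo: float, lon_hi: float) -> bool:
    # Keep the row group unless its [min, max] range is entirely outside the
    # filter range on at least one column; only then is skipping safe.
    return (rg.lat_max >= lat_lo and rg.lat_min <= lat_hi and
            rg.lon_max >= lon_lo and rg.lon_min <= lon_hi)


def prune(row_groups, lat_lo, lat_hi, lon_lo, lon_hi):
    """Return the names of row groups that still need to be read."""
    return [rg.name for rg in row_groups
            if may_match(rg, lat_lo, lat_hi, lon_lo, lon_hi)]


row_groups = [
    RowGroupStats("rg0", 40.0, 41.0, -75.0, -73.0),  # overlaps the filter box
    RowGroupStats("rg1", 51.0, 52.0, -1.0, 1.0),     # latitude range too high: skipped
    RowGroupStats("rg2", 40.5, 52.0, -74.0, 0.0),    # wide range: must still be read
]
print(prune(row_groups, 40.6, 40.9, -74.1, -73.7))
```

This works because the predicates apply directly to plain numeric columns. A predicate wrapped around a geometry value does not map onto a column's min/max range in the same way.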

The native implementations include the following geospatial functions:

  • ST_MaxX
  • ST_MinX
  • ST_MaxY
  • ST_MinY
  • ST_EnvIntersects
  • ST_Point
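The envelope functions in this list can be answered from bounding boxes alone. The following sketch shows an envelope-intersection test in the spirit of ST_EnvIntersects, plus the degenerate envelope of a single point such as one built by ST_Point. The `Envelope` type and helper names are hypothetical. The sketch also assumes that envelopes whose boundaries only touch count as intersecting. Check the function reference for the exact boundary semantics.

```python
from typing import NamedTuple


class Envelope(NamedTuple):
    """Axis-aligned bounding box (hypothetical type for illustration)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def point_envelope(x: float, y: float) -> Envelope:
    # A single point has a zero-area envelope: min and max coincide.
    return Envelope(x, y, x, y)


def env_intersects(a: Envelope, b: Envelope) -> bool:
    # Two boxes intersect when their x ranges and y ranges both overlap.
    # ASSUMPTION: touching boundaries count as intersecting (<=, not <).
    return (a.xmin <= b.xmax and b.xmin <= a.xmax and
            a.ymin <= b.ymax and b.ymin <= a.ymax)


area = Envelope(0.0, 0.0, 10.0, 5.0)
print(env_intersects(area, Envelope(5.0, 2.0, 15.0, 8.0)))   # overlapping boxes
print(env_intersects(area, Envelope(11.0, 0.0, 12.0, 1.0)))  # disjoint in x
print(env_intersects(area, point_envelope(3.0, 4.0)))         # point inside the box
```

Because the test uses only four comparisons per axis pair, it never needs the geometries' vertices.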

Impala uses these native versions automatically after you set the geospatial_library startup flag to HIVE_ESRI.