Impala geospatial query acceleration
Learn how Cloudera Data Warehouse 2025.0.21.0 uses native geospatial query acceleration to improve query performance by reducing Java Virtual Machine (JVM) overhead and enabling storage-level filtering.
Geospatial query acceleration addresses performance bottlenecks such as Java Native Interface (JNI) overhead and frequent data serialization. By using native code, specific functions can access binary headers directly without needing to deserialize the entire geometry.
This acceleration is designed to optimize selective queries that filter data on separate latitude and longitude columns. Because such predicates compare plain numeric columns, Impala can push the filters down to the Parquet or Iceberg storage level, which lets the system skip irrelevant data before it reaches the processing engine.
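The following is a minimal Python sketch of the general idea behind storage-level filtering on numeric columns, not Impala's implementation. It assumes each row group carries min/max statistics for the latitude and longitude columns. The `RowGroupStats` class, the function names, and the sample values are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class RowGroupStats:
    """Hypothetical min/max statistics for one row group."""
    lat_min: float
    lat_max: float
    lon_min: float
    lon_max: float


def may_contain_matches(stats, lat_lo, lat_hi, lon_lo, lon_hi):
    """Return True if the row group's value ranges overlap the query box."""
    return (stats.lat_min <= lat_hi and stats.lat_max >= lat_lo
            and stats.lon_min <= lon_hi and stats.lon_max >= lon_lo)


def row_groups_to_scan(all_stats, lat_lo, lat_hi, lon_lo, lon_hi):
    """Return the indexes of row groups that cannot be skipped."""
    return [i for i, stats in enumerate(all_stats)
            if may_contain_matches(stats, lat_lo, lat_hi, lon_lo, lon_hi)]


# Illustrative statistics only; these are not taken from a real table.
row_groups = [
    RowGroupStats(40.5, 40.9, -74.3, -73.7),
    RowGroupStats(34.0, 34.3, -118.6, -118.1),
    RowGroupStats(40.0, 41.5, -75.0, -73.0),
]

# Query box: latitude 40.70 to 40.80, longitude -74.02 to -73.93.
selected = row_groups_to_scan(row_groups, 40.70, 40.80, -74.02, -73.93)
print(selected)
```

In this sketch the second row group is never read, because its latitude range cannot overlap the query box. Row groups that pass the check may still contain non-matching rows, so the engine continues to evaluate the predicate on the rows it does read.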
To improve efficiency, several geospatial functions now use native C++ code that reads the binary format header directly instead of deserializing the full geometry. The following geospatial functions have native implementations:
ST_MaxX function
ST_MinX function
ST_MaxY function
ST_MinY function
ST_EnvIntersects function
ST_Point function
Impala uses these native versions automatically after you set the geospatial_library startup flag to HIVE_ESRI.
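To illustrate why reading only the header is cheaper than deserializing a full geometry, here is a minimal Python sketch. It assumes a hypothetical binary layout in which the bounding box (xmin, ymin, xmax, ymax) is stored as four little-endian doubles at the start of the value. This layout, and every function name below, is an assumption made for illustration. It is not the actual binary format or the C++ code that Impala uses.

```python
import struct

# Hypothetical layout: four little-endian doubles (xmin, ymin, xmax, ymax),
# then a uint32 point count, then the coordinate pairs.
ENVELOPE = struct.Struct("<4d")
COUNT = struct.Struct("<I")


def encode_geometry(points):
    """Serialize (x, y) points into the hypothetical binary layout."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    header = ENVELOPE.pack(min(xs), min(ys), max(xs), max(ys))
    body = COUNT.pack(len(points))
    body += b"".join(struct.pack("<2d", x, y) for x, y in points)
    return header + body


def envelope(blob):
    """Read only the 32-byte header; the coordinates are never decoded."""
    return ENVELOPE.unpack_from(blob, 0)


def min_x(blob):
    return envelope(blob)[0]


def max_x(blob):
    return envelope(blob)[2]


def env_intersects(a, b):
    """Return True if the bounding boxes of two encoded geometries overlap."""
    ax0, ay0, ax1, ay1 = envelope(a)
    bx0, by0, bx1, by1 = envelope(b)
    return ax0 <= bx1 and bx0 <= ax1 and ay0 <= by1 and by0 <= ay1


square = encode_geometry([(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)])
inside = encode_geometry([(1.0, 1.0)])
outside = encode_geometry([(5.0, 5.0)])

print(min_x(square), max_x(square))
print(env_intersects(square, inside), env_intersects(square, outside))
```

The cost of each envelope lookup in this sketch is constant regardless of how many points the geometry contains, which is the property that header-only access is meant to provide.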
