Configuring query vectorization
You can manage query vectorization by setting HiveServer properties in Cloudera Manager. The name and description of each property help you set up vectorization.
Vectorization properties
- hive.vectorized.groupby.checkinterval
- In vectorized group-by, the number of row entries added to the hash table before re-checking average variable size for memory usage estimation.
- hive.vectorized.groupby.flush.percent
- Fraction, between 0.0 and 1.0, of the entries in the vectorized group-by aggregation hash table that are flushed when the memory threshold is exceeded.
- hive.vectorized.execution.enabled
- Enables the optimization that vectorizes query execution. Vectorization streamlines operations by processing a block of 1024 rows at a time.
- hive.vectorized.execution.reduce.enabled
- Whether to vectorize the reduce side of query execution.
- hive.vectorized.use.vectorized.input.format
- If enabled, Hive uses the native vectorized input format for vectorized query execution when it is available.
- hive.vectorized.use.checked.expressions
- To enhance performance, vectorized expressions operate on wide data types such as long and double. With wide data types, numeric overflows during expression evaluation can behave differently in vectorized expressions than in non-vectorized expressions, so the same query can return different results depending on whether it is vectorized. When this property is enabled, Hive uses vectorized expressions that handle numeric overflows in the same way as non-vectorized expressions.
- hive.vectorized.adaptor.usage.mode
- Specifies the extent to which the vectorization engine tries to vectorize UDFs that do not have native vectorized versions available. The "none" option specifies that only queries using native vectorized UDFs are vectorized. The "chosen" option specifies that Hive chooses a subset of the UDFs to vectorize with the Vectorized Adaptor, based on performance benefits. The "all" option specifies that the Vectorized Adaptor is used for all UDFs that do not have native vectorized versions available.
- hive.vectorized.use.vector.serde.deserialize
- If enabled, Hive uses built-in vector SerDes to process text and SequenceFile tables for vectorized query execution.
- hive.vectorized.input.format.excludes
- Specifies a list of file input format class names to exclude from vectorized query execution that uses the vectorized input format. Vectorized execution can still occur for an excluded input format, depending on whether row SerDes or vector SerDes are enabled.
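
Example: checking and overriding vectorization properties in a session
The properties above are normally set in Cloudera Manager. As a minimal sketch, assuming your deployment allows these properties to be changed at the session level (for example, from Beeline), the following statements show how a few of them could be inspected and overridden for testing. The values are illustrative only and are not recommendations. The input format class name on the last line is an example of the kind of value the excludes list accepts.

```sql
-- Print the current value of a property (SET with no value displays it).
SET hive.vectorized.execution.enabled;

-- Turn on vectorized execution, including the reduce side.
SET hive.vectorized.execution.enabled=true;
SET hive.vectorized.execution.reduce.enabled=true;

-- Handle numeric overflows the same way as non-vectorized expressions.
SET hive.vectorized.use.checked.expressions=true;

-- Let Hive choose which non-native UDFs to vectorize with the Vectorized Adaptor.
-- Accepted values described above: none, chosen, all.
SET hive.vectorized.adaptor.usage.mode=chosen;

-- Illustrative value: exclude one input format class from the vectorized input format.
SET hive.vectorized.input.format.excludes=org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat;
```

Session-level settings apply only to the current connection. If a statement is rejected, the property may be restricted in your environment, in which case change it in Cloudera Manager instead.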