Query vectorization

You can use vectorization to improve instruction pipelining for certain data and queries and to optimize how Hive uses the cache. Vectorized execution processes a batch of rows at a time, operating on each column's primitive values as a batch, rather than processing one row at a time.
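
As a minimal sketch, vectorization is typically turned on per session with Hive configuration properties. The statements below assume the standard property names hive.vectorized.execution.enabled and hive.vectorized.execution.reduce.enabled; default values differ between Hive versions and distributions, so check your cluster's settings before relying on them.

```sql
-- Enable vectorized execution for the current session.
SET hive.vectorized.execution.enabled = true;

-- Optionally enable vectorized execution on the reduce side as well.
SET hive.vectorized.execution.reduce.enabled = true;

-- Running SET with only the property name prints its current value.
SET hive.vectorized.execution.enabled;
```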

Unsupported functionality on vectorized data

Some functionality is not supported on vectorized data:

  • DDL queries
  • DML queries other than single-table, read-only queries
  • Formats other than Optimized Row Columnar (ORC)

Supported functionality on vectorized data

The following functionality is supported on vectorized data:

  • Single-table, read-only queries

    Selecting, filtering, and grouping data is supported.

  • Partitioned tables
  • The following expressions:
    • Comparison: >, >=, <, <=, =, !=
    • Arithmetic: plus, minus, multiply, divide, and modulo
    • Logical: AND and OR
    • Aggregates: sum, avg, count, min, and max
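
The following sketch shows a query that stays within the supported functionality listed above: a single-table, read-only query on a partitioned ORC table that uses only comparison, arithmetic, logical, and aggregate expressions. The table sales_orc and its columns are hypothetical names introduced for illustration. The CREATE TABLE statement is setup only and is not itself vectorized. The EXPLAIN VECTORIZATION form assumes a Hive release that provides it (Hive 2.3 and later, to the best of our knowledge); on older releases, use plain EXPLAIN.

```sql
-- Hypothetical ORC table, partitioned by date (setup only).
CREATE TABLE sales_orc (
  order_id   BIGINT,
  quantity   INT,
  unit_price DOUBLE,
  region     STRING
)
PARTITIONED BY (sale_date DATE)
STORED AS ORC;

SET hive.vectorized.execution.enabled = true;

-- Single-table, read-only query: select, filter, and group.
SELECT region,
       COUNT(*)                   AS orders,
       SUM(quantity * unit_price) AS revenue,
       AVG(unit_price)            AS avg_price,
       MIN(unit_price)            AS min_price,
       MAX(unit_price)            AS max_price
FROM sales_orc
WHERE quantity >= 1
  AND unit_price < 1000
GROUP BY region;

-- Ask Hive to report whether the plan for a query is vectorized.
EXPLAIN VECTORIZATION
SELECT region, SUM(quantity * unit_price) AS revenue
FROM sales_orc
WHERE quantity >= 1
GROUP BY region;
```

If the plan output reports that an operator is not vectorized, check the query against the unsupported functionality and the supported data types.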

Supported data types

You can query data of the following types using vectorized queries:

  • tinyint
  • smallint
  • int
  • bigint
  • date
  • boolean
  • float
  • double
  • timestamp
  • string
  • char
  • varchar
  • binary