Enabling vectorized query execution
Vectorized query execution greatly reduces CPU usage for typical query operations such as scans, filters, aggregates, and joins. Spark has used Whole Stage Codegen together with vectorization for Parquet since Spark 2.0. Vectorization is also implemented for the ORC format.
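As a rough illustration of the idea, the sketch below contrasts row-at-a-time processing with batch (vectorized) processing for a simple filter-and-sum query. It is plain Python, not Spark or ORC code; the function names, the two-column layout, and the batch size of 1024 are assumptions chosen for illustration only. The point is the shape of the loop: a vectorized engine works on a batch of column values per step, which amortizes per-row overhead.

```python
from typing import List, Tuple

# Hypothetical two-column row: (id, price). Not a Spark or ORC type.
Row = Tuple[int, float]


def sum_row_at_a_time(rows: List[Row], min_id: int) -> float:
    """Filter and aggregate one row per iteration."""
    total = 0.0
    for row_id, price in rows:
        if row_id >= min_id:
            total += price
    return total


def sum_vectorized(ids: List[int], prices: List[float], min_id: int,
                   batch_size: int = 1024) -> float:
    """Filter and aggregate one batch of column values per iteration.

    The data is stored column-wise (one list per column), and each step
    handles up to batch_size values. The batch size is an illustrative
    assumption, not a documented Spark default.
    """
    total = 0.0
    for start in range(0, len(ids), batch_size):
        id_batch = ids[start:start + batch_size]
        price_batch = prices[start:start + batch_size]
        # Step 1: evaluate the filter over the whole batch, keeping the
        # positions of the values that pass.
        selected = [i for i, value in enumerate(id_batch) if value >= min_id]
        # Step 2: aggregate only the selected positions of the other column.
        total += sum(price_batch[i] for i in selected)
    return total


rows = [(i, float(i)) for i in range(10)]
ids = [r[0] for r in rows]
prices = [r[1] for r in rows]

# Both strategies compute the same answer; only the processing order differs.
row_result = sum_row_at_a_time(rows, min_id=5)
batch_result = sum_vectorized(ids, prices, min_id=5, batch_size=4)
```

In pure Python the batch version is not actually faster; the sketch only shows how the work is organized. In a real engine the per-batch loops run over tightly packed column arrays, which is where the CPU savings come from.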
Use the following steps to switch to the new ORC format and enable vectorization for ORC files with Spark SQL.