Enable Vectorized Query Execution
Vectorized query execution is a feature that greatly reduces the CPU usage for typical query operations such as scans, filters, aggregates, and joins. Vectorization is also implemented for the ORC format. Spark also uses Whole Stage Codegen and this vectorization (for Parquet) since Spark 2.0 (released on July 26, 2016).
Use the following steps to implement the new ORC format and enable vectorization for ORC files with SparkSQL.