Accessing ORC files from Spark
Use the following steps to access ORC files from Apache Spark.
To start using ORC, you can define a SparkSession instance:
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder().getOrCreate()
import spark.implicits._
The following example uses data structures to demonstrate working with complex types. The Person struct data type has a name, an age, and a sequence of contacts, which are themselves defined by names and phone numbers.
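A minimal sketch of such data structures, written in spark-shell style, might look like the following. The generated record values, the local master setting, and the output path /tmp/people_orc_sketch are illustrative assumptions and should be adapted to your environment.

```scala
import org.apache.spark.sql.SparkSession

// master("local[*]") is an assumption for standalone runs; in spark-shell
// the existing session is returned by getOrCreate().
val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// A contact is defined by a name and a phone number.
case class Contact(name: String, phone: String)
// A person has a name, an age, and a sequence of contacts.
case class Person(name: String, age: Int, contacts: Seq[Contact])

// Build 100 sample records, each with two contacts (illustrative values).
val records = (1 to 100).map { i =>
  Person(s"name_$i", i, (0 to 1).map(m => Contact(s"contact_$m", s"phone_$m")))
}

// Write the records as ORC files under a hypothetical output path.
records.toDF().write.format("orc").mode("overwrite").save("/tmp/people_orc_sketch")
```

The contacts column is stored as an array of structs, so the nested Contact fields are preserved in the ORC schema.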
In this example, the physical table scan loads only the name and age columns at runtime and does not read the contacts column from the file system. Because ORC stores data by column, skipping unused columns reduces I/O and improves read performance.
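The column pruning described above can be observed with a query like the one in this sketch. The sample rows, the path /tmp/people_pruning_sketch, and the view name people are assumptions made for illustration; the exact text of the physical plan varies between Spark versions.

```scala
import org.apache.spark.sql.SparkSession

// master("local[*]") is an assumption for standalone runs.
val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Write a tiny illustrative dataset with name, age, and contacts columns.
val sample = Seq(
  ("name_1", 10, Seq(("contact_0", "phone_0"))),
  ("name_2", 20, Seq(("contact_1", "phone_1")))
).toDF("name", "age", "contacts")
sample.write.format("orc").mode("overwrite").save("/tmp/people_pruning_sketch")

// Read the ORC data back and expose it to SQL under a hypothetical view name.
val people = spark.read.format("orc").load("/tmp/people_pruning_sketch")
people.createOrReplaceTempView("people")

// The query references only name and age, so contacts need not be scanned.
val young = spark.sql("SELECT name FROM people WHERE age < 15")

// Print the physical plan; the scan's read schema should omit contacts.
young.explain()
```

Inspecting the output of explain() is a quick way to confirm which columns a scan actually requests from the ORC files.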
You can also use Spark DataFrameReader and DataFrameWriter methods to access ORC files.
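As a brief sketch of that approach, the orc methods on DataFrameWriter and DataFrameReader can be used in place of format("orc"). The sample rows and the path /tmp/orc_rw_sketch are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

// master("local[*]") is an assumption for standalone runs.
val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// A small illustrative DataFrame.
val df = Seq(("name_1", 1), ("name_2", 2)).toDF("name", "age")

// DataFrameWriter.orc(path) writes the DataFrame as ORC files.
df.write.mode("overwrite").orc("/tmp/orc_rw_sketch")

// DataFrameReader.orc(path) loads ORC files into a DataFrame.
val loaded = spark.read.orc("/tmp/orc_rw_sketch")
loaded.printSchema()
```

The schema is stored in the ORC files themselves, so no schema needs to be supplied when reading.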