CSV header and schema match
Column names of csv headers must match the schema.
Type of change: Configuration/Spark core changes
Spark 1.6 - 2.3
Column names of headers in CSV files are not checked against the against the schema of CSV data.
Spark 2.4
If columns in the CSV header and the
schema have different ordering, the following exception is
thrown:java.lang.IllegalArgumentException: CSV file header does not contain
the expected fields.
Action Required
Make the schema and header order match or set enforceSchema
to false to
prevent getting an exception. For example, read a file or directory of files in CSV format into
Spark DataFrame as follows: df3 = spark.read.option("delimiter", ";").option("header",
True).option("enforeSchema", False).csv(path)
The default "header" option is true and enforceSchema
is False.
If enforceSchema
is set to true, the specified or inferred schema will be
forcibly applied to datasource files, and headers in CSV files are ignored. If
enforceSchema
is set to false, the schema is validated against all headers in
CSV files when the header option is set to true. Field names in the schema and column names in
CSV headers are checked by their positions taking into account
spark.sql.caseSensitive
. Although the default value is true,you should disable
the enforceSchema option to prevent incorrect results.