CSV bad record handling
How Spark treats malformations in CSV files has changed.
Type of change: Property/Spark SQL changes
Spark 1.6 - 2.3
CSV rows are considered malformed if at least one column value in the row is malformed. The CSV parser drops malformed rows in the DROPMALFORMED mode or outputs an error in the FAILFAST mode.
Spark 2.4
A CSV row is considered malformed only when it contains malformed column values requested from CSV datasource, other values are ignored.
Action Required
To restore the Spark 1.6 behavior, set
spark.sql.csv.parser.columnPruning.enabled
to false.