Referencing a corrupt JSON/CSV record
In Spark 2.4, queries from raw JSON/CSV files are disallowed when the referenced columns only include the internal corrupt record column.
Type of change: Syntactic/Spark core
Spark 1.6
A query can reference a _corrupt_record column in raw JSON/CSV files.
Spark 2.4
An exception is thrown if the query is referencing _corrupt_record column in these
files. For example, the following query is not allowed:
spark.read.schema(schema).json(file).filter($"_corrupt_record".isNotNull).count()
Action Required
Cache or save the parsed results, and then resend the query.
val df = spark.read.schema(schema).json(file).cache()
df.filter($"_corrupt_record".isNotNull).count()