Referencing a corrupt JSON/CSV record

In Spark 2.4, queries from raw JSON/CSV files are disallowed when the referenced columns only include the internal corrupt record column.

Type of change: Syntactic/Spark core

Spark 1.6

A query can reference a _corrupt_record column in raw JSON/CSV files.

Spark 2.4

An exception is thrown if the query is referencing _corrupt_record column in these files. For example, the following query is not allowed: spark.read.schema(schema).json(file).filter($"_corrupt_record".isNotNull).count()

Action Required

Cache or save the parsed results, and then resend the query.

val df = spark.read.schema(schema).json(file).cache()
    df.filter($"_corrupt_record".isNotNull).count()