java.io.EOFException when reading DAG or Hive proto data files
Condition
You might encounter a java.io.EOFException when reading DAG or Hive proto data files with the Hue Query Processor or observability tools. The error typically occurs intermittently, or after an abrupt ApplicationMaster (AM) termination or Out-of-Memory (OOM) event.
When this happens, proto files used in data pipelines fail to load, causing job failures or pipeline interruptions.
Example Error Message
Caused by: java.io.EOFException
at java.base/java.io.DataInputStream.readFully(DataInputStream.java:202)
at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:70)
at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:120)
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2505)
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2637)
at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:82)
at org.apache.hadoop.hive.ql.io.protobuf.ProtobufMessageInputFormat$1.next(ProtobufMessageInputFormat.java:124)
at org.apache.hadoop.hive.ql.io.protobuf.ProtobufMessageInputFormat$1.next(ProtobufMessageInputFormat.java:84)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365)
... 24 more
Cause
- DAG proto files become empty or partially written due to an abrupt AM termination or OOM event.
- These incomplete files cannot be identified without attempting to read them.
- Reading these partially written files results in an EOFException error.
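The failure mechanism can be illustrated with a minimal, JDK-only sketch (no Hadoop dependency; the class and method names here are hypothetical, chosen for illustration). It mimics a record whose writer died mid-write: DataInputStream.readFully, the same call that appears at the top of the stack trace, throws EOFException because the stream ends before the full record is available. This is why an incomplete file only reveals itself when something attempts to read it.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

public class TruncatedReadDemo {
    // Attempt to read an 8-byte record from a stream that holds only 5 bytes,
    // mimicking a proto file whose writer was killed mid-record.
    static String readTruncatedRecord() throws IOException {
        byte[] partial = {1, 2, 3, 4, 5};            // only 5 of the 8 bytes were flushed
        byte[] record = new byte[8];
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(partial))) {
            in.readFully(record);                    // same call as in the stack trace above
            return "read OK";
        } catch (EOFException e) {
            return "EOFException: record truncated"; // stream ended before the record did
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readTruncatedRecord());
    }
}
```

Note that the file's length alone does not flag the problem: a partially written record can still leave the file non-empty, so the truncation surfaces only on read.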
