Impala sender timed out waiting error
Learn how to resolve Impala query failures that occur under high system load or delays by increasing the value of the data stream sender timeout startup flag.
Condition
Impala jobs fail under high system load with the error "Sender <IP Address> timed out waiting for receiver fragment instance". This happens when the Impala coordinator takes longer than the default 120-second timeout to dispatch query fragments to the executor nodes, so the sender times out while waiting for the receiver. You can increase the timeout to tolerate this delay and prevent query failures.
Impala jobs stop and display the following error message, which indicates a timeout while the sender waits for the receiver fragment instance:
Query Status: Sender <IP Address>:53006 timed out waiting for receiver fragment instance: ed44b43eb5b13652:9bccaee300000041, dest node: 3
The impalad logs also show a related entry in which the TransmitData call took slightly longer than 120 seconds:
I0719 04:48:46.672668 59007 rpcz_store.cc:269] Call impala.DataStreamService.TransmitData from <IP Address>:53006 (request call id 948988) took 120721ms. Trace:
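To find entries like this across a large log, a small script can help. The following is a minimal sketch, not an Impala tool: the regular expression is modeled only on the single log line shown above, and the function name find_slow_transmits and its threshold_ms parameter are hypothetical. Log formats can differ between releases, so treat the pattern as an assumption to verify against your own logs.

```python
import re

# Pattern modeled on the rpcz_store.cc log line shown above (an assumption;
# verify it against the log format of your Impala release).
SLOW_RPC = re.compile(
    r"Call (?P<method>\S+) from (?P<sender>.+?) "
    r"\(request call id (?P<call_id>\d+)\) took (?P<ms>\d+)ms"
)

# Default value of --datastream_sender_timeout_ms, in milliseconds.
DEFAULT_TIMEOUT_MS = 120_000


def find_slow_transmits(lines, threshold_ms=DEFAULT_TIMEOUT_MS):
    """Return (sender, call_id, ms) for TransmitData calls that took
    at least threshold_ms milliseconds."""
    hits = []
    for line in lines:
        m = SLOW_RPC.search(line)
        if not m or not m.group("method").endswith("TransmitData"):
            continue
        elapsed_ms = int(m.group("ms"))
        if elapsed_ms >= threshold_ms:
            hits.append((m.group("sender"), int(m.group("call_id")), elapsed_ms))
    return hits


sample = (
    "I0719 04:48:46.672668 59007 rpcz_store.cc:269] Call "
    "impala.DataStreamService.TransmitData from <IP Address>:53006 "
    "(request call id 948988) took 120721ms. Trace:"
)
print(find_slow_transmits([sample]))
```

To scan a real log, pass an open file object as lines. Entries at or just above 120,000 ms are the ones most likely to correspond to this timeout.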
Cause
When an Impala query starts, the coordinator dispatches query fragments to the Impala daemons that execute the query. The senders and receivers for the data exchange must connect during fragment startup. Under heavy load, the coordinator's dispatch of fragments can be delayed, so a receiver may not be ready when its sender starts sending. Contributing factors include Java Virtual Machine (JVM) pauses, Key Distribution Center (KDC) slowness when Kerberos (KRB5) authentication is used, system resource limitations, and network latency or disruptions.
If the time between the first batch being sent and the receiver initializing is longer than the --datastream_sender_timeout_ms threshold (default is 120,000 milliseconds or 120 seconds), the query fails with a "sender timed out waiting for receiver fragment instance" error.
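The condition described above can be modeled as a simple comparison. This is an illustrative sketch only, not Impala's implementation: the function sender_times_out and its parameters are hypothetical, and the 300,000 ms value is an arbitrary example of a raised timeout, not a recommendation.

```python
# Default value of --datastream_sender_timeout_ms, in milliseconds.
DEFAULT_DATASTREAM_SENDER_TIMEOUT_MS = 120_000


def sender_times_out(first_batch_sent_ms, receiver_ready_ms,
                     timeout_ms=DEFAULT_DATASTREAM_SENDER_TIMEOUT_MS):
    """Return True if the receiver initialized too late for the sender.

    Both timestamps are in milliseconds on the same clock. This is a
    simplified model of the condition in the text, not Impala's code.
    """
    wait_ms = receiver_ready_ms - first_batch_sent_ms
    return wait_ms > timeout_ms


# The 120,721 ms wait from the log entry above exceeds the default timeout.
print(sender_times_out(0, 120_721))
# The same wait fits within a hypothetical raised timeout of 300,000 ms.
print(sender_times_out(0, 120_721, timeout_ms=300_000))
```

Raising the flag value therefore gives delayed receivers more time to initialize. It does not remove the underlying load or latency.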
