Impala sender timed out waiting error
Resolve Impala query failures during high system load by increasing the data stream sender timeout startup flag.
Condition
Sender <IP Address> timed out waiting for receiver fragment
This happens because the Impala coordinator takes longer than the default 120-second timeout to dispatch query fragments to the executor nodes, so the sender times out while waiting for a response from the receiver. You can increase the timeout setting to tolerate this delay and prevent query failures.
Impala jobs stop and display the following error message, which indicates a timeout while the sender waits for the receiver fragment instance:
Query Status: Sender <IP Address>:53006 timed out waiting for receiver fragment instance: ed44b43eb5b13652:9bccaee300000041, dest node: 3
The ImpalaD logs also display a related timeout error:
I0719 04:48:46.672668 59007 rpcz_store.cc:269] Call impala.DataStreamService.TransmitData from <IP Address>:53006 (request call id 948988) took 120721ms. Trace:
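To confirm that slow TransmitData calls line up with the timeout, you can scan the ImpalaD log for calls that took about as long as the 120,000 ms default. The following Python sketch is one possible way to do that. The regular expression is written against the sample log line above only, and the names TRANSMIT_RE and slow_transmit_calls are hypothetical helpers, not part of Impala.

```python
import re

# Matches the rpcz_store.cc line format shown in the sample above.
# Assumption: other Impala versions may format this line differently.
TRANSMIT_RE = re.compile(
    r"Call (?P<method>\S+) from (?P<peer>.+?) "
    r"\(request call id (?P<call_id>\d+)\) took (?P<ms>\d+)ms"
)

# Default of --datastream_sender_timeout_ms as stated in this article.
DEFAULT_TIMEOUT_MS = 120_000


def slow_transmit_calls(lines, threshold_ms=DEFAULT_TIMEOUT_MS):
    """Return (peer, call_id, duration_ms) for TransmitData calls that
    took at least threshold_ms."""
    hits = []
    for line in lines:
        match = TRANSMIT_RE.search(line)
        if match is None or not match.group("method").endswith("TransmitData"):
            continue
        duration_ms = int(match.group("ms"))
        if duration_ms >= threshold_ms:
            hits.append((match.group("peer"), int(match.group("call_id")), duration_ms))
    return hits


sample = (
    "I0719 04:48:46.672668 59007 rpcz_store.cc:269] Call "
    "impala.DataStreamService.TransmitData from <IP Address>:53006 "
    "(request call id 948988) took 120721ms. Trace:"
)
print(slow_transmit_calls([sample]))
# prints [('<IP Address>:53006', 948988, 120721)]
```

In practice you would pass the lines of an ImpalaD log file instead of the single sample string.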
Cause
When an Impala query starts, the coordinator dispatches query fragments to the Impala daemons that execute the query. The senders and receivers for the data exchange must connect during fragment startup. Under heavy load, the coordinator's dispatch of fragments can be delayed. Contributing factors include Java Virtual Machine (JVM) pauses, Key Distribution Center (KDC) slowness when Kerberos (KRB5) authentication is used, system resource limitations, and network latency or disruptions.
If the time between the first batch being sent and the receiver initializing is longer than the --datastream_sender_timeout_ms threshold (default: 120,000 milliseconds, or 120 seconds), the query fails with a "sender timed out waiting for receiver fragment instance" error.
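The timeout condition described above can be modeled in a few lines of Python. This is a simplified sketch, not Impala's implementation: the function names and the 300-second value are illustrative assumptions. Only the flag name --datastream_sender_timeout_ms and its 120,000 ms default come from this article.

```python
# Default of --datastream_sender_timeout_ms as stated in this article.
DEFAULT_SENDER_TIMEOUT_MS = 120_000


def sender_times_out(first_batch_sent_ms, receiver_ready_ms,
                     timeout_ms=DEFAULT_SENDER_TIMEOUT_MS):
    """Simplified model: the sender gives up when the receiver initializes
    more than timeout_ms after the first batch was sent."""
    return (receiver_ready_ms - first_batch_sent_ms) > timeout_ms


def startup_flag(timeout_seconds):
    """Format the startup flag for a timeout given in seconds."""
    if timeout_seconds <= 0:
        raise ValueError("timeout must be positive")
    return f"--datastream_sender_timeout_ms={int(timeout_seconds * 1000)}"


# A receiver that becomes ready 120,721 ms after the first batch exceeds the
# default threshold, but not a hypothetical 300-second threshold.
print(sender_times_out(0, 120_721))                      # prints True
print(sender_times_out(0, 120_721, timeout_ms=300_000))  # prints False
print(startup_flag(300))  # prints --datastream_sender_timeout_ms=300000
```

The 300-second figure is only an example. Choose a value based on the delays you observe in your own cluster.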
