Impala query fails with "Sender timed out waiting for receiver fragment instance" error

Resolve Impala query failures under high system load by increasing the value of the --datastream_sender_timeout_ms startup flag.

Condition

Impala jobs fail under high system load with the following error:

Sender <IP Address> timed out waiting for receiver fragment

This happens when the Impala coordinator takes longer than the default 120-second timeout to dispatch query fragments to the executor nodes, so the sender times out while waiting for the receiver to start. Increasing the timeout gives receivers more time to start and can prevent these query failures, although it does not remove the underlying delay.

The full query status reports the sender port, the receiver fragment instance ID, and the destination node:

Query Status: Sender <IP Address>:53006 timed out waiting for receiver fragment instance: ed44b43eb5b13652:9bccaee300000041, dest node: 3

The ImpalaD logs also record a related entry showing that the TransmitData call took longer than 120 seconds:

I0719 04:48:46.672668 59007 rpcz_store.cc:269] Call impala.DataStreamService.TransmitData from <IP Address>:53006 (request call id 948988) took 120721ms. Trace:

Cause

When an Impala query starts, the coordinator dispatches query fragments to the Impala daemons that execute the query. The sender and receiver of each data exchange must connect during fragment startup. Under heavy load, fragment dispatch can be delayed. Contributing factors include Java Virtual Machine (JVM) pauses, slow responses from the Kerberos Key Distribution Center (KDC) when Kerberos (KRB5) authentication is used, system resource limitations, and network latency or disruptions.

If the time between the sender attempting to send its first batch and the receiver initializing exceeds the --datastream_sender_timeout_ms threshold (default 120,000 milliseconds, or 120 seconds), the query fails with the "Sender timed out waiting for receiver fragment instance" error.
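The following Python sketch is a simplified model of this behavior, not Impala's implementation. A sender waits on an event that stands in for the receiver fragment registering and gives up once its timeout expires. The function name and the scaled-down delays (fractions of a second instead of 120,000 ms) are hypothetical.

```python
import threading


def sender_waits_for_receiver(receiver_start_delay_s: float, timeout_s: float) -> str:
    """Simplified model (hypothetical): the receiver 'registers' after a delay,
    and the sender waits at most timeout_s seconds for it, analogous to
    --datastream_sender_timeout_ms."""
    receiver_ready = threading.Event()

    # Simulate delayed fragment startup (dispatch delay, JVM pause, slow KDC, ...).
    timer = threading.Timer(receiver_start_delay_s, receiver_ready.set)
    timer.start()
    try:
        # Event.wait returns False if the timeout expires before set() is called.
        if receiver_ready.wait(timeout=timeout_s):
            return "sent"
        return "timed out waiting for receiver fragment instance"
    finally:
        # Stop the simulated receiver if it has not fired yet.
        timer.cancel()


if __name__ == "__main__":
    # Same receiver delay, short timeout: the sender gives up.
    print(sender_waits_for_receiver(receiver_start_delay_s=0.3, timeout_s=0.05))
    # Same receiver delay, longer timeout: the sender waits long enough.
    print(sender_waits_for_receiver(receiver_start_delay_s=0.3, timeout_s=1.0))
```

With the same 0.3-second receiver delay, the 0.05-second timeout fails while the 1-second timeout succeeds. This mirrors why raising the timeout can absorb startup delays without removing their cause.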

Remedy

Review the ImpalaD logs to confirm that the wait exceeded the default limit of 120,000 milliseconds (120 seconds).

Look for log entries similar to the following:

Sender <IP Address>:53006 timed out waiting for receiver fragment instance: <instance ID>, dest node: 3
Call impala.DataStreamService.TransmitData from <IP Address>:53006 (request call id <call ID>) took 120721ms.
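To check many log files quickly, a short script can pull out these entries. This is a minimal Python sketch that assumes the log lines match the two formats shown above; the function names are hypothetical, and the IP addresses in the sample lines are made up.

```python
import re
from typing import Iterable, List, Tuple

# Default value of --datastream_sender_timeout_ms (120 seconds).
DEFAULT_TIMEOUT_MS = 120_000

# Matches rpcz_store entries such as:
#   Call impala.DataStreamService.TransmitData from 10.0.0.5:53006 (request call id 948988) took 120721ms.
TRANSMIT_RE = re.compile(
    r"Call impala\.DataStreamService\.TransmitData from (?P<src>\S+) "
    r"\(request call id (?P<call_id>\d+)\) took (?P<ms>\d+)ms"
)

# Matches the sender-side error and captures the fragment instance ID and dest node.
SENDER_TIMEOUT_RE = re.compile(
    r"Sender (?P<sender>\S+) timed out waiting for receiver fragment instance: "
    r"(?P<instance>[0-9a-f]+:[0-9a-f]+), dest node: (?P<node>\d+)"
)


def find_slow_transmits(lines: Iterable[str], threshold_ms: int = DEFAULT_TIMEOUT_MS) -> List[Tuple[str, int]]:
    """Return (call_id, duration_ms) for TransmitData calls at or above threshold_ms."""
    slow = []
    for line in lines:
        m = TRANSMIT_RE.search(line)
        if m and int(m.group("ms")) >= threshold_ms:
            slow.append((m.group("call_id"), int(m.group("ms"))))
    return slow


def find_sender_timeouts(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (fragment_instance_id, dest_node) for each sender timeout error."""
    found = []
    for line in lines:
        m = SENDER_TIMEOUT_RE.search(line)
        if m:
            found.append((m.group("instance"), int(m.group("node"))))
    return found


if __name__ == "__main__":
    # Hypothetical sample lines; in practice, read them from an impalad log file.
    sample = [
        "I0719 04:48:46.672668 59007 rpcz_store.cc:269] Call impala.DataStreamService.TransmitData "
        "from 10.0.0.5:53006 (request call id 948988) took 120721ms. Trace:",
        "Query Status: Sender 10.0.0.5:53006 timed out waiting for receiver fragment instance: "
        "ed44b43eb5b13652:9bccaee300000041, dest node: 3",
    ]
    print(find_slow_transmits(sample))
    print(find_sender_timeouts(sample))
```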

Identify factors that contribute to high system load or delays.

These include JVM pauses, KDC slowness (if KRB5 authentication is used), system resource limitations, and network latency or disruptions.
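One way to check for JVM pauses is to search the logs for pause-monitor messages. This Python sketch assumes the messages use the wording of the Hadoop-derived JVM pause monitor (for example, "Detected pause in JVM or host machine (eg GC): pause of approximately 4235ms"); confirm the exact text in your own logs before relying on it. The function name and the 10-second reporting threshold are hypothetical choices.

```python
import re
from typing import Iterable, List

# ASSUMPTION: pause-monitor lines follow the Hadoop-derived wording, e.g.
#   "Detected pause in JVM or host machine (eg GC): pause of approximately 4235ms"
# Verify this against your own logs; adjust the pattern if the wording differs.
PAUSE_RE = re.compile(r"Detected pause in JVM or host machine.*?pause of approximately (\d+)ms")


def long_pauses(lines: Iterable[str], min_ms: int = 10_000) -> List[int]:
    """Return pause durations (ms) at or above min_ms, in log order.
    The 10-second default is an arbitrary reporting threshold."""
    pauses = []
    for line in lines:
        m = PAUSE_RE.search(line)
        if m and int(m.group(1)) >= min_ms:
            pauses.append(int(m.group(1)))
    return pauses
```

Long pauses that line up in time with the sender timeout errors are a strong hint that JVM pauses contribute to the fragment startup delay.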

Increase the value of the --datastream_sender_timeout_ms Impala startup flag.

For example, setting --datastream_sender_timeout_ms to 240000 (240 seconds) doubles the default wait, which can prevent sender-receiver timeouts caused by transient startup delays.
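There is no single correct value for this flag. The following Python sketch shows one hypothetical heuristic for choosing it, not Cloudera guidance: take the longest wait observed in the logs, add 50% headroom, round up to a whole minute, and never go below the 120,000 ms default. For the 120,721 ms TransmitData call in the log example, this yields 240,000 ms.

```python
import math
from typing import Sequence

# Default value of --datastream_sender_timeout_ms (120 seconds).
DEFAULT_TIMEOUT_MS = 120_000


def suggest_sender_timeout_ms(observed_waits_ms: Sequence[int],
                              headroom: float = 1.5,
                              step_ms: int = 60_000) -> int:
    """Hypothetical heuristic: longest observed wait times headroom,
    rounded up to a multiple of step_ms, never below the default."""
    if not observed_waits_ms:
        return DEFAULT_TIMEOUT_MS
    target = max(observed_waits_ms) * headroom
    suggested = math.ceil(target / step_ms) * step_ms
    return max(suggested, DEFAULT_TIMEOUT_MS)


def as_startup_flag(timeout_ms: int) -> str:
    """Format the value as an impalad command-line flag."""
    return f"--datastream_sender_timeout_ms={timeout_ms}"


if __name__ == "__main__":
    print(as_startup_flag(suggest_sender_timeout_ms([120721])))
```

How the flag is applied depends on the deployment (for example, through the Impala Daemon Command Line Argument Advanced Configuration Snippet (Safety Valve) in Cloudera Manager). Because it is a startup flag, the Impala daemons typically need a restart before the new value takes effect.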

Monitor Impala query execution after adjusting the timeout to confirm that the change resolves the error. If the timeout error recurs, troubleshoot the cause of the cluster delays or contact Cloudera Support.
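To track whether the error recurs after the change, you can count occurrences per hour. This Python sketch assumes each error line carries the standard glog prefix seen in the rpcz_store example (severity letter, MMDD date, and HH:MM:SS.micros time); lines without that prefix are skipped. The function name is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# glog line prefix: severity letter, MMDD, then HH:MM:SS.micros
# (as in "I0719 04:48:46.672668 59007 rpcz_store.cc:269] ...").
GLOG_PREFIX_RE = re.compile(r"^[IWEF](?P<mmdd>\d{4}) (?P<hh>\d{2}):\d{2}:\d{2}\.\d+")
ERROR_TEXT = "timed out waiting for receiver fragment instance"


def timeouts_per_hour(lines: Iterable[str]) -> Counter:
    """Count sender timeout errors per 'MM-DD HH:00' bucket."""
    counts: Counter = Counter()
    for line in lines:
        if ERROR_TEXT not in line:
            continue
        m = GLOG_PREFIX_RE.match(line)
        if m:  # skip error lines without a glog prefix
            mmdd = m.group("mmdd")
            counts[f"{mmdd[:2]}-{mmdd[2:]} {m.group('hh')}:00"] += 1
    return counts
```

A count that drops to zero after the change suggests the new timeout is sufficient; a count that persists points to a delay that the timeout alone cannot absorb.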