Troubleshooting Linux Container Executor

This section lists the numeric error codes that the container-executor communicates to the NodeManager. These codes appear in the NodeManager log in /var/log/hadoop-yarn.

Table 1. Numeric error codes returned by the container-executor in YARN. Only the LinuxContainerExecutor uses these codes.

Numeric Code

Name

Description

1

INVALID_ARGUMENT_NUMBER

  • Incorrect number of arguments provided for the given container-executor command
  • Failure to initialize the container localizer

2

INVALID_USER_NAME

The user passed to the container-executor does not exist.

3

INVALID_COMMAND_PROVIDED

The container-executor does not recognize the command it was asked to run.

5

INVALID_NM_ROOT

The passed NodeManager root does not match the configured NodeManager root (yarn.nodemanager.local-dirs), or does not exist.

6

SETUID_OPER_FAILED

The container-executor could not read the local groups database, or could not set the UID or GID.

7

UNABLE_TO_EXECUTE_CONTAINER_SCRIPT

The container-executor could not run the container launcher script.

8

UNABLE_TO_SIGNAL_CONTAINER

The container-executor could not signal the container it was passed.

9

INVALID_CONTAINER_PID

The PID passed to the container-executor was negative or 0.

18

OUT_OF_MEMORY

The container-executor could not allocate enough memory while reading the container-executor.cfg file, or while getting the paths for the container launcher script or credentials files.

20

INITIALIZE_USER_FAILED

The container-executor could not get, stat, or secure the per-user NodeManager directory.

21

UNABLE_TO_BUILD_PATH

The container-executor could not concatenate two paths, most likely because it ran out of memory.

22

INVALID_CONTAINER_EXEC_PERMISSIONS

The container-executor binary does not have the correct permissions set.

24

INVALID_CONFIG_FILE

The container-executor.cfg file is missing, malformed, or has incorrect permissions.

24

Error Starting NodeManager

The NodeManager can fail to start if the nosuid option is set on the file system where the container-executor binary resides.

The nosuid option prevents the setuid bit on an executable from taking effect. As a result, the container-executor binary, which relies on its setuid bit to run with root privileges, cannot access the root-owned container-executor.cfg configuration file and fails with an error.

25

SETSID_OPER_FAILED

Could not set the session ID of the forked container.

26

WRITE_PIDFILE_FAILED

The container-executor could not write the PID of the launched container to the container's PID file.

255

Unknown Error

This error has several possible causes. Some common causes are:

  • User accounts on your cluster have a user ID lower than the value of the min.user.id property in the container-executor.cfg file. The default value is 1000, which is appropriate on Ubuntu systems but may not be valid for your operating system. For more information, see the documentation on setting min.user.id in the container-executor.cfg file.
  • A previous error caused this one. Look earlier in the log file for possible causes.
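
Two of the causes described in Table 1 can be checked with a short script: a nosuid mount under the container-executor binary, and a user ID below min.user.id. The following Python code is a minimal sketch rather than an official tool. It assumes mount information in /proc/mounts format and a container-executor.cfg made of key=value lines. The function names, sample mounts, sample configuration, and binary paths are all hypothetical.

```python
import posixpath


def mount_options_for(path, mounts_text):
    """Return the options of the longest mount point that contains `path`.

    `mounts_text` is assumed to be in /proc/mounts format:
    device mount_point fs_type options dump pass
    """
    path = posixpath.normpath(path)
    best_mount, best_options = "", []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, options = fields[1], fields[3].split(",")
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            if len(mount_point) > len(best_mount):
                best_mount, best_options = mount_point, options
    return best_options


def has_nosuid(path, mounts_text):
    """True if the file system holding `path` is mounted with nosuid."""
    return "nosuid" in mount_options_for(path, mounts_text)


def read_min_user_id(cfg_text, default=1000):
    """Return min.user.id from the cfg text, or the documented default."""
    for line in cfg_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip() == "min.user.id":
            return int(value.strip())
    return default


def uid_is_allowed(uid, cfg_text):
    """True if `uid` is not below the configured min.user.id."""
    return uid >= read_min_user_id(cfg_text)


# Hypothetical sample data, for illustration only.
SAMPLE_MOUNTS = """\
/dev/sda1 / ext4 rw,relatime 0 0
/dev/sdb1 /opt ext4 rw,nosuid,nodev 0 0
"""
SAMPLE_CFG = """\
yarn.nodemanager.linux-container-executor.group=hadoop
min.user.id=500
"""

if __name__ == "__main__":
    print(has_nosuid("/opt/hadoop/bin/container-executor", SAMPLE_MOUNTS))
    print(uid_is_allowed(499, SAMPLE_CFG))
```

On a real node, the contents of /proc/mounts and of the cluster's container-executor.cfg file would replace the sample strings. The sketch checks only the two conditions named above and does not validate the rest of the configuration.
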

Table 2. Exit status codes that apply to all containers in YARN. These exit status codes are part of the YARN framework and are in addition to any application-specific exit codes.

Numeric Code

Name

Description

0

SUCCESS

Container has finished successfully.

-1000

INVALID

Initial value of the container exit code. A container that does not have a COMPLETED state will always return this status.

-100

ABORTED

Container was killed by the framework, for example because the application released it or because it was 'lost' due to a node failure.

-101

DISKS_FAILED

Container exited due to local disk issues on the NodeManager node. This occurs when the number of good nodemanager-local-directories or nodemanager-log-directories drops below the health threshold.

-102

PREEMPTED

Container was preempted by the framework. In most applications, this does not count as a container failure.

-103

KILLED_EXCEEDED_VMEM

Container was terminated because it exceeded its allocated virtual memory limit.

-104

KILLED_EXCEEDED_PMEM

Container was terminated because it exceeded its allocated physical memory limit.

-105

KILLED_BY_APPMASTER

Container was terminated at the request of the application master.

-106

KILLED_BY_RESOURCEMANAGER

Container was terminated by the resource manager.

-107

KILLED_AFTER_APP_COMPLETION

Container was terminated after the application finished.
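
When reading logs, it can help to translate a numeric code back into its name. The following Python code is a minimal sketch that encodes only the codes listed in Table 1 and Table 2 above as lookup dictionaries. The dictionary and function names are hypothetical. The wording returned for unlisted codes is an assumption based on the Table 2 caption, not YARN behavior.

```python
# Codes from Table 1 (container-executor, LinuxContainerExecutor only).
CONTAINER_EXECUTOR_CODES = {
    1: "INVALID_ARGUMENT_NUMBER",
    2: "INVALID_USER_NAME",
    3: "INVALID_COMMAND_PROVIDED",
    5: "INVALID_NM_ROOT",
    6: "SETUID_OPER_FAILED",
    7: "UNABLE_TO_EXECUTE_CONTAINER_SCRIPT",
    8: "UNABLE_TO_SIGNAL_CONTAINER",
    9: "INVALID_CONTAINER_PID",
    18: "OUT_OF_MEMORY",
    20: "INITIALIZE_USER_FAILED",
    21: "UNABLE_TO_BUILD_PATH",
    22: "INVALID_CONTAINER_EXEC_PERMISSIONS",
    24: "INVALID_CONFIG_FILE",
    25: "SETSID_OPER_FAILED",
    26: "WRITE_PIDFILE_FAILED",
    255: "Unknown Error",
}

# Codes from Table 2 (exit statuses for all YARN containers).
YARN_EXIT_STATUSES = {
    0: "SUCCESS",
    -1000: "INVALID",
    -100: "ABORTED",
    -101: "DISKS_FAILED",
    -102: "PREEMPTED",
    -103: "KILLED_EXCEEDED_VMEM",
    -104: "KILLED_EXCEEDED_PMEM",
    -105: "KILLED_BY_APPMASTER",
    -106: "KILLED_BY_RESOURCEMANAGER",
    -107: "KILLED_AFTER_APP_COMPLETION",
}


def describe_executor_code(code):
    """Name a container-executor error code, if Table 1 lists it."""
    return CONTAINER_EXECUTOR_CODES.get(code, f"code {code} not listed in Table 1")


def describe_exit_status(status):
    """Name a container exit status, if Table 2 lists it."""
    name = YARN_EXIT_STATUSES.get(status)
    if name is not None:
        return name
    # Assumption: a status outside Table 2 may be application-specific.
    return f"status {status} not listed in Table 2 (possibly application-specific)"


if __name__ == "__main__":
    print(describe_executor_code(24))
    print(describe_exit_status(-104))
```

Keeping the two dictionaries separate mirrors the two tables: the same number could in principle carry a different meaning depending on whether it came from the container-executor or from the container exit status.
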