Docker on YARN example: DistributedShell
Learn how to run arbitrary shell command through a DistributedShell YARN application.
- Prepare a UNIX-based Docker image. For example, ubuntu:18.04.
- In Cloudera Manager, select the YARN service.
- Click the Configuration tab.
-
Search for
docker.trusted.registries
and find the Trusted Registries for Docker Containers property. -
Add
library
to the list of trusted registries to allow ubuntu:18.04. - Click Save Changes.
- Restart the YARN service using Cloudera Manager.
-
Search for the
hadoop-yarn-applications-distributedshell
jar in a Cloudera Manager manager host. -
Set the
YARN_JAR
environment variable to the path of thehadoop-yarn-applications-distributedshell
jar.For example, using the default value:
YARN_JAR=/opt/cloudera/parcels/CDH/jars/hadoop-yarn-applications-distributedshell-<jar version number>.jar
-
Choose an arbitrary shell command.
For example “
cat /etc/*-release
” which displays OS-related information in UNIX-based systems. -
Run the DistributedShell job providing the shell command in the
-shell_command
option:sudo -u hdfs hadoop org.apache.hadoop.yarn.applications.distributedshell.Client \ -jar $YARN_JAR \ -shell_command "cat /etc/*-release" \ -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker \ -shell_env YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=library/ubuntu:18.04
-
Check the output of the command using yarn log command line tool:
sudo -u yarn yarn logs -applicationId <id of the DistributedShell application> -log_files stdout
The output should look like the following in case of the ubuntu image:DISTRIB_ID=Ubuntu DISTRIB_RELEASE=18.04 DISTRIB_CODENAME=bionic DISTRIB_DESCRIPTION="Ubuntu 18.04.3 LTS" NAME="Ubuntu" VERSION="18.04.3 LTS (Bionic Beaver)" ...