Docker on YARN example: DistributedShell

Learn how to run an arbitrary shell command inside a Docker container through a DistributedShell YARN application.

  1. Prepare a UNIX-based Docker image, for example, ubuntu:18.04.
  2. In Cloudera Manager, select the YARN service.
  3. Click the Configuration tab.
  4. Search for docker.trusted.registries and find the Trusted Registries for Docker Containers property.
  5. Add library (the Docker Hub namespace that holds official images such as ubuntu) to the list of trusted registries so that the ubuntu:18.04 image is allowed.
  6. Click Save Changes.
  7. Restart the YARN service using Cloudera Manager.
  8. Locate the hadoop-yarn-applications-distributedshell jar on a host managed by Cloudera Manager.
  9. Set the YARN_JAR environment variable to the path of the hadoop-yarn-applications-distributedshell jar.

    For example, using the default parcel location:

    YARN_JAR=/opt/cloudera/parcels/CDH/jars/hadoop-yarn-applications-distributedshell-<jar version number>.jar

  10. Choose an arbitrary shell command.

    For example, "cat /etc/*-release", which displays OS-related information on UNIX-based systems.

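  11. Run the DistributedShell job, providing the shell command in the -shell_command option. The YARN_CONTAINER_RUNTIME_TYPE=docker environment variable selects the Docker runtime, and YARN_CONTAINER_RUNTIME_DOCKER_IMAGE selects the image: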
    sudo -u hdfs hadoop org.apache.hadoop.yarn.applications.distributedshell.Client \
     -jar $YARN_JAR \
     -shell_command "cat /etc/*-release" \
     -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker \
     -shell_env YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=library/ubuntu:18.04
    
  12. Check the output of the command using the yarn logs command-line tool:
    sudo -u yarn yarn logs -applicationId <id of the DistributedShell application> -log_files stdout
    For the ubuntu image, the output should look similar to the following:
    DISTRIB_ID=Ubuntu
    DISTRIB_RELEASE=18.04
    DISTRIB_CODENAME=bionic
    DISTRIB_DESCRIPTION="Ubuntu 18.04.3 LTS"
    NAME="Ubuntu"
    VERSION="18.04.3 LTS (Bionic Beaver)"
    ...
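Steps 8 and 9 can also be scripted. The following is a minimal sketch that assumes the jar sits directly in the default parcel directory /opt/cloudera/parcels/CDH/jars. The function name find_dshell_jar is hypothetical and is not part of Cloudera Manager or Hadoop; skipping *-tests.jar and *-sources.jar files is also an assumption about what the directory may contain.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not a Cloudera or Hadoop tool): print the path of the
# hadoop-yarn-applications-distributedshell jar found in a directory.
# Assumption: the default CDH parcel jar directory is used unless another
# directory is passed as the first argument.
find_dshell_jar() {
  local search_dir="${1:-/opt/cloudera/parcels/CDH/jars}"
  local jar
  # Look only at the top level of the directory, ignore test and source
  # jars (assumed to be possible neighbours), and pick the first match in
  # a locale-independent sort order.
  jar=$(find "$search_dir" -maxdepth 1 \
          -name 'hadoop-yarn-applications-distributedshell-*.jar' \
          ! -name '*-tests.jar' ! -name '*-sources.jar' 2>/dev/null \
        | LC_ALL=C sort | head -n 1)
  if [ -z "$jar" ]; then
    echo "distributedshell jar not found under $search_dir" >&2
    return 1
  fi
  printf '%s\n' "$jar"
}
```

To use it, define the function in a shell on the cluster host and run YARN_JAR=$(find_dshell_jar) before step 11. Passing a directory as the first argument overrides the assumed default location.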
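The check in step 12 can be automated by piping the yarn logs output into a small filter. This is a sketch under the assumption that the stdout log contains the DISTRIB_ID and DISTRIB_RELEASE lines unmodified, as in the sample output above; check_ubuntu_release is a hypothetical name, not an existing command.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeed only if the text on stdin contains the
# DISTRIB_ID line of an Ubuntu image and the expected DISTRIB_RELEASE line.
# Argument: the expected release, for example 18.04.
check_ubuntu_release() {
  local expected_release="$1"
  local output
  output=$(cat)
  # -x matches whole lines only; -F treats the pattern as a fixed string,
  # so the dot in "18.04" is not a regex wildcard.
  printf '%s\n' "$output" | grep -qxF "DISTRIB_ID=Ubuntu" || return 1
  printf '%s\n' "$output" | grep -qxF "DISTRIB_RELEASE=${expected_release}" || return 1
  return 0
}
```

For example: sudo -u yarn yarn logs -applicationId <id of the DistributedShell application> -log_files stdout | check_ubuntu_release 18.04 && echo "Docker image verified". Because the match is on whole lines, trailing whitespace in the log would produce a false negative.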