Docker on YARN example: MapReduce job

Learn how to run the Pi MapReduce example job in a Docker container. Note that the job can run into issues on a clean cluster where no fine-grained permissions are set.

  1. Prepare a UNIX-based Docker image with Java, preferably JDK8. For example, ibmjava:8.
  2. In Cloudera Manager, select the YARN service.
  3. Click the Configuration tab.
  4. Search for docker.trusted.registries and find the Trusted Registries for Docker Containers property.
  5. Add library to the list of trusted registries to allow the ibmjava image.
  6. Search for map.output.
  7. Find the Compression Codec of MapReduce Map Output property.
  8. Change its value to org.apache.hadoop.io.compress.DefaultCodec.
  9. Click Save Changes.
  10. Restart the YARN service using Cloudera Manager.
  11. Search for the hadoop-mapreduce-examples jar on a Cloudera Manager host.
  12. Set the YARN_JAR environment variable to the path of the hadoop-mapreduce-examples jar.
    For example, using the default value:

    YARN_JAR=/opt/cloudera/parcels/CDH/jars/hadoop-mapreduce-examples-<jar version number>.jar

  13. Start the Pi MapReduce job with the following command:
    yarn jar $YARN_JAR pi \
      -Dmapreduce.map.env="YARN_CONTAINER_RUNTIME_TYPE=docker,YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=library/ibmjava:8" \
      -Dmapreduce.reduce.env="YARN_CONTAINER_RUNTIME_TYPE=docker,YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=library/ibmjava:8" \
      1 40000
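
Steps 11 to 13 can also be scripted. The following is a minimal sketch, not an official Cloudera tool: the function names (find_examples_jar, docker_runtime_env, run_pi) are hypothetical, and it assumes the jar lives in the default parcel directory shown in step 12 and that picking the first matching jar is acceptable. Adjust JAR_DIR and DOCKER_IMAGE if your cluster differs.

```shell
#!/usr/bin/env bash
# Sketch only: helper names below are hypothetical, not part of Hadoop or Cloudera Manager.

# Default parcel jar directory from step 12; override if your parcels live elsewhere.
JAR_DIR="${JAR_DIR:-/opt/cloudera/parcels/CDH/jars}"
# Image reference used in step 13.
DOCKER_IMAGE="${DOCKER_IMAGE:-library/ibmjava:8}"

# find_examples_jar DIR
# Prints the first hadoop-mapreduce-examples jar found in DIR; returns 1 if none exists.
find_examples_jar() {
  local dir="$1" jar
  for jar in "$dir"/hadoop-mapreduce-examples-*.jar; do
    if [ -f "$jar" ]; then
      printf '%s\n' "$jar"
      return 0
    fi
  done
  return 1
}

# docker_runtime_env IMAGE
# Prints the environment string that selects the Docker runtime and image for a task.
docker_runtime_env() {
  printf 'YARN_CONTAINER_RUNTIME_TYPE=docker,YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=%s' "$1"
}

# run_pi JAR IMAGE MAPS SAMPLES
# Submits the Pi example with the same options as step 13, for both map and reduce tasks.
run_pi() {
  local jar="$1" runtime_env
  runtime_env="$(docker_runtime_env "$2")"
  yarn jar "$jar" pi \
    -Dmapreduce.map.env="$runtime_env" \
    -Dmapreduce.reduce.env="$runtime_env" \
    "$3" "$4"
}
```

To submit the job with the same arguments as step 13, source the script on a host where the yarn command is available and run:

    YARN_JAR="$(find_examples_jar "$JAR_DIR")" && run_pi "$YARN_JAR" "$DOCKER_IMAGE" 1 40000

Building the environment string once keeps the map and reduce settings identical, which avoids the two options drifting apart when the image name changes.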