Docker on YARN example: MapReduce job
Learn how to run the Pi MapReduce example job in a Docker image. In a clean cluster where no fine-grained permissions are set, issues can occur.
- Prepare a UNIX-based Docker image with Java, preferably JDK8. For example, ibmjava:8.
- In Cloudera Manager, select the YARN service.
- Click the Configuration tab.
-
Search for
docker.trusted.registries
and find the Trusted Registries for Docker Containers property. -
Add
library
to the list of trusted registries to allow the ibmjava. -
Search for
map.output
. - Find the Compression Codec of MapReduce Map Output property.
-
Change its value to
org.apache.hadoop.io.compress.DefaultCodec
. - Click Save Changes.
- Restart the YARN service using Cloudera Manager.
-
Search for the
hadoop-mapreduce-example
jar in a Cloudera Manager manager host. -
Set the
YARN_JAR
environment variable to the path of thehadoop-mapreduce-example
jar.For example, using the default value:YARN_JAR=/opt/cloudera/parcels/CDH/jars/hadoop-mapreduce-examples-<jar version number>.jar
-
Start the Pi MapReduce job with the following command:
yarn jar $YARN_JAR pi \ -Dmapreduce.map.env="YARN_CONTAINER_RUNTIME_TYPE=docker,YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=library/ibmjava:8" \ -Dmapreduce.reduce.env="YARN_CONTAINER_RUNTIME_TYPE=docker,YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=library/ibmjava:8" \ 1 40000