Managing Data Operating System

Prerequisites for Running Containerized Spark Jobs

To run Spark jobs in Docker containers on YARN, you must first ensure that the YARN cluster is enabled for Docker.

When you submit the application, ensure that you specify the following environment variables for the YARN containers:
  • YARN_CONTAINER_RUNTIME_TYPE=docker
  • YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=<docker_image>
  • YARN_CONTAINER_RUNTIME_DOCKER_CONTAINER_NETWORK=host
  • YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS=<any volume mounts needed by the Spark application>
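
The parameters listed above are typically passed to both the application master and the executors through Spark's spark.yarn.appMasterEnv.* and spark.executorEnv.* properties. The following is a minimal sketch, not a definitive procedure, of how the corresponding spark-submit arguments could be assembled. The helper name docker_runtime_conf, the image name example/spark:latest, the mount /etc/passwd:/etc/passwd:ro, and the application file app.py are all hypothetical placeholders; substitute the values that apply to your cluster.

```python
import shlex


def docker_runtime_conf(image, mounts=None, network="host"):
    """Return spark-submit --conf arguments for the YARN Docker runtime.

    Hypothetical helper for illustration only. It sets each variable
    for both the application master and the executors.
    """
    env = {
        "YARN_CONTAINER_RUNTIME_TYPE": "docker",
        "YARN_CONTAINER_RUNTIME_DOCKER_IMAGE": image,
        "YARN_CONTAINER_RUNTIME_DOCKER_CONTAINER_NETWORK": network,
    }
    # Mounts are optional; add them only when the application needs them.
    if mounts:
        env["YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS"] = mounts

    args = []
    for prefix in ("spark.yarn.appMasterEnv", "spark.executorEnv"):
        for name, value in env.items():
            args += ["--conf", f"{prefix}.{name}={value}"]
    return args


# Assemble an example command line. The image, mount, and app.py are
# placeholders (assumptions), not values taken from this guide.
cmd = ["spark-submit", "--master", "yarn", "--deploy-mode", "cluster"]
cmd += docker_runtime_conf("example/spark:latest", "/etc/passwd:/etc/passwd:ro")
cmd += ["app.py"]
print(shlex.join(cmd))
```

The sketch only prints the command so that you can review it before running it. Setting each variable under both property prefixes keeps the driver and the executors on the same image and network settings.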