Cloudera Docker Container

The Cloudera Docker image is a single-node deployment of the Cloudera open-source distribution, including CDH and Cloudera Manager. You can use this environment to learn Hadoop, try new ideas, and test and demonstrate your application.

Docker is not a virtual machine in the traditional sense. Virtual machines isolate or simulate access to the host’s hardware so that entire guest operating systems can run on them. Docker instead uses Linux containers, which partition the resources of the host operating system; a container has its own view of the filesystem and other resources, but it runs on the same kernel as the host. Docker provides tooling, a packaging format, and infrastructure around Linux containers and related technologies.

Docker is well supported in several recent Linux distributions. For example, on Ubuntu 14.04, you can install Docker using the following command:

sudo apt-get install docker.io

Importing the Cloudera QuickStart Image

You can import the Docker image by pulling it from the Docker Hub:

docker pull cloudera/quickstart:latest

You can also download the image from the Cloudera website. After the file is downloaded and on your host, you can import it into Docker:

tar xzf cloudera-quickstart-vm-*-docker.tar.gz
docker import - cloudera/quickstart:latest < cloudera-quickstart-vm-*-docker/*.tar

Running a Cloudera QuickStart Container

To run a container using the image, you must know the name or hash of the image. If you followed the import instructions above, the name is cloudera/quickstart:latest. The hash is also printed in the terminal when you import, or you can look up the hashes of all imported images with:

docker images

Once you know the name or hash of the image, you can run it:

docker run --hostname=quickstart.cloudera --privileged=true -t -i [OPTIONS] [IMAGE] /usr/bin/docker-quickstart

The required flags and other options are described in the following table:

Option                          Description
--hostname=quickstart.cloudera  Required: The pseudo-distributed configuration assumes this hostname.
--privileged=true               Required: Needed for HBase, MySQL-backed Hive metastore, Hue, Oozie, Sentry, and Cloudera Manager.
-t                              Required: Allocate a pseudoterminal. The services start in this terminal; once they are running, a Bash shell takes over.
-i                              Required: Keep the terminal interactive so that you can use it immediately or attach to it later.
-p 8888                         Recommended: Map the Hue port in the guest to a port on the host.
-p [PORT]                       Optional: Map any other ports (for example, 7180 for Cloudera Manager, 80 for a guided tutorial).
-d                              Optional: Run the container in the background.

Use /usr/bin/docker-quickstart to start all CDH services, and then run a Bash shell. You can directly run /bin/bash instead if you want to start services manually.
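
As an illustration of how the flags in the table combine, here is a minimal sketch that assembles the docker run command for a list of guest ports and prints it without running it. The function name quickstart_run_cmd is hypothetical and not part of the image or of Docker.

```shell
# Hypothetical helper (not shipped with the image): prints a docker run
# command that publishes each guest port given after the image name.
quickstart_run_cmd() {
  image="$1"
  shift
  # Required flags from the options table above.
  cmd="docker run --hostname=quickstart.cloudera --privileged=true -t -i"
  for port in "$@"; do
    cmd="$cmd -p $port"
  done
  printf '%s\n' "$cmd $image /usr/bin/docker-quickstart"
}

# Print the command for Hue (8888) and Cloudera Manager (7180).
quickstart_run_cmd cloudera/quickstart:latest 8888 7180
```

You can copy the printed command into your terminal. The -p flag also accepts a HOST:GUEST pair, so passing 8888:8888 instead of 8888 pins Hue to the same port number on the host.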

See Networking for details about port mapping.

Connecting to the Docker Shell

If you do not pass the -d flag to docker run, your terminal automatically attaches to the container.

A container dies when you exit the shell, but you can disconnect and leave the container running by typing Ctrl+p followed by Ctrl+q.

If you disconnected from the shell or passed the -d flag on startup, you can connect to the shell later using the following command:

docker attach [CONTAINER HASH]

You can look up the hashes of running containers using the following command:

docker ps

When attaching to a container, you might need to press Enter to see the shell prompt. As noted earlier, typing Ctrl+p followed by Ctrl+q detaches from the terminal without stopping the container.
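
If several containers are running, picking the right hash out of the docker ps output by eye can be tedious. The following sketch wraps docker ps with its -q (IDs only) and --filter ancestor= options to list only containers started from the QuickStart image. The function name quickstart_containers is hypothetical, and the sketch assumes a Docker release recent enough to support the ancestor filter, which the version packaged for older distributions might not.

```shell
# Hypothetical helper: print the IDs of running containers that were
# started from the given image (default: cloudera/quickstart:latest).
# Assumes a Docker version that supports the "ancestor" filter.
quickstart_containers() {
  docker ps -q --filter "ancestor=${1:-cloudera/quickstart:latest}"
}
```

With one QuickStart container running, docker attach "$(quickstart_containers)" reconnects to its shell.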

Networking

To make a port accessible outside the container, pass the -p [PORT] flag. Docker maps this guest port to an available port on the host system. To look up the interface it binds to and the host port it maps to, use the following command:

docker port [CONTAINER HASH] [GUEST PORT]
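
The command prints the mapping in the form INTERFACE:PORT. As a small sketch of how a script could pick out just the host port, the hypothetical helper below strips everything up to the last colon. The sample value 0.0.0.0:32768 is illustrative only and is not output captured from a real container.

```shell
# Hypothetical helper: extract the host port from one line of
# `docker port` output, which has the form INTERFACE:PORT.
host_port() {
  mapping="$1"
  # Remove the longest prefix ending in ":" to leave only the port.
  printf '%s\n' "${mapping##*:}"
}

# Illustrative input only; a real mapping comes from `docker port`.
host_port "0.0.0.0:32768"
```

Against a running container, the input could come from the first line of the docker port output, for example host_port "$(docker port [CONTAINER HASH] 8888 | head -n 1)".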

To interact with the Cloudera QuickStart image from other systems, make sure quickstart.cloudera resolves to the IP address of the machine where the image is running. You might also want to set up port forwarding so that each port you would normally connect to on a real cluster is mapped to the corresponding port in the container.

Services are not aware of these port mappings, so they might provide links or other references to specific ports that are not available on your client.
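
One way to make quickstart.cloudera resolve on a client, assuming the client reads /etc/hosts and you have no DNS entry for the host, is to add a hosts-file line. The sketch below only formats that line; the function name hosts_entry is hypothetical, and 192.0.2.10 is a documentation placeholder that you would replace with the address of the machine running the container.

```shell
# Hypothetical helper: format an /etc/hosts line that points the
# QuickStart hostname at the given IP address.
hosts_entry() {
  printf '%s\t%s\n' "$1" "quickstart.cloudera"
}

# 192.0.2.10 is a placeholder address, not a real Docker host.
hosts_entry 192.0.2.10
```

To apply it on a Linux client, you could append the line with hosts_entry [HOST IP] | sudo tee -a /etc/hosts.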

Other Notes

The Cloudera stack is designed to run on a distributed cluster. Pausing or stopping a Docker container is like pausing an entire datacenter: some services might shut down because they appear to have lost contact with the rest of the cluster.

Cloudera Manager is not started by default. To see options for starting Cloudera Manager, run the following command:

/home/cloudera/cloudera-manager

See Cloudera documentation and the Cloudera website for other information, including the license agreement associated with the Docker image.