Cloudera Docker Container
The Cloudera Docker image is a single-node deployment of the Cloudera open-source distribution, including CDH and Cloudera Manager. You can use this environment to learn Hadoop, try new ideas, and test and demonstrate your application.
Docker differs from virtual machines, which isolate or simulate access to the host’s hardware so that entire guest operating systems can run on them. Docker uses Linux containers, which partition the resources of the host operating system: each container has its own view of the filesystem and other resources, but it runs on the same kernel as the host. Docker provides tooling, a packaging format, and infrastructure around Linux containers and related technologies.
Docker is well supported in several recent Linux distributions. For example, on Ubuntu 14.04, you can install Docker using the following command:
sudo apt-get install docker.io
Importing the Cloudera QuickStart Image
You can import the Docker image by pulling it from the Docker Hub:
docker pull cloudera/quickstart:latest
You can also download the image from the Cloudera website. After the file is downloaded and on your host, you can import it into Docker:
tar xzf cloudera-quickstart-vm-*-docker.tar.gz
docker import - cloudera/quickstart:latest < cloudera-quickstart-vm-*-docker/*.tar
Running a Cloudera QuickStart Container
To run a container using the image, you must know the name or hash of the image. If you followed the import instructions above, the name is cloudera/quickstart:latest. The hash is also printed in the terminal when you import, or you can look up the hashes of all imported images with:
docker images
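As a minimal sketch of the lookup described above, the following function prints the ID of an image and pulls the image from Docker Hub first if it is not present. The function name ensure_quickstart_image is hypothetical and only used for illustration; the sketch assumes the docker CLI is on your PATH and relies on docker images -q, which prints only image IDs.

```shell
# Print the ID of the given image, pulling it first if it is missing.
# The function name is illustrative; it is not part of the QuickStart image.
# Usage: ensure_quickstart_image [IMAGE]
ensure_quickstart_image() {
    image=${1:-cloudera/quickstart:latest}
    # -q prints only the image ID, or nothing when the image is absent.
    id=$(docker images -q "$image")
    if [ -z "$id" ]; then
        docker pull "$image" >&2 && id=$(docker images -q "$image")
    fi
    echo "$id"
}
```

Running ensure_quickstart_image with no arguments checks for cloudera/quickstart:latest, the name used in the import instructions.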
Once you know the name or hash of the image, you can run it:
docker run --hostname=quickstart.cloudera --privileged=true -t -i [OPTIONS] [IMAGE] /usr/bin/docker-quickstart
The required flags and other options are described in the following table:
Option | Description |
---|---|
--hostname=quickstart.cloudera | Required: Pseudo-distributed configuration assumes this hostname. |
--privileged=true | Required: For HBase, MySQL-backed Hive metastore, Hue, Oozie, Sentry, and Cloudera Manager. |
-t | Required: Allocate a pseudoterminal for the container. The services start in this terminal, and once they are running, a Bash shell takes over. |
-i | Required: Keep standard input open so that you can use the terminal, either immediately or by attaching to it later. |
-p 8888 | Recommended: Map the Hue port in the guest to another port on the host. |
-p [PORT] | Optional: Map any other ports (for example, 7180 for Cloudera Manager, 80 for a guided tutorial). |
-d | Optional: Run the container in the background. |
Use /usr/bin/docker-quickstart to start all CDH services, and then run a Bash shell. You can directly run /bin/bash instead if you want to start services manually.
See Networking for details about port mapping.
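To show how the flags in the table combine, here is a small sketch that assembles a docker run command line for a given image and list of guest ports. It prints the command instead of executing it, so you can review it first. The helper name quickstart_run_cmd is an assumption made for this example.

```shell
# Print (do not execute) a docker run command for the QuickStart image.
# The function name is illustrative; the flags come from the table above.
# Usage: quickstart_run_cmd IMAGE [GUEST_PORT...]
quickstart_run_cmd() {
    image=$1
    shift
    ports=""
    # Add one -p flag per guest port that should be reachable from the host.
    for port in "$@"; do
        ports="$ports -p $port"
    done
    echo "docker run --hostname=quickstart.cloudera --privileged=true -t -i$ports $image /usr/bin/docker-quickstart"
}

# Example: expose Hue (8888) and Cloudera Manager (7180).
quickstart_run_cmd cloudera/quickstart:latest 8888 7180
```

Copy the printed line into your terminal to start the container, or add -d to the printed command if you prefer to run it in the background.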
Connecting to the Docker Shell
If you do not pass the -d flag to docker run, your terminal automatically attaches to the container.
The container stops when you exit the shell, but you can disconnect and leave the container running by typing Ctrl+p followed by Ctrl+q.
If you disconnect from the shell or passed the -d flag on startup, you can connect to the shell later using the following command:
docker attach [CONTAINER HASH]
You can look up the hashes of running containers using the following command:
docker ps
When attaching to a container, you might need to press Enter to see the shell prompt. To disconnect from the terminal without the container exiting, type Ctrl+p followed by Ctrl+q.
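The lookup and attach steps can be combined. The sketch below assumes the container was started from an image named cloudera/quickstart:latest and uses the ancestor filter of docker ps to find it. The function names are hypothetical.

```shell
# Print the ID of the first listed running container created from an image.
# Function names are illustrative, not part of Docker or the QuickStart image.
# Usage: quickstart_container_id IMAGE
quickstart_container_id() {
    # -q prints only container IDs; the ancestor filter matches by image.
    docker ps -q --filter "ancestor=$1" | head -n 1
}

# Attach to that container, or report that none is running.
# Usage: quickstart_attach [IMAGE]
quickstart_attach() {
    id=$(quickstart_container_id "${1:-cloudera/quickstart:latest}")
    if [ -z "$id" ]; then
        echo "no running QuickStart container found" >&2
        return 1
    fi
    docker attach "$id"
}
```

Call quickstart_attach with no arguments to attach to a container started from the default image name. If several such containers are running, only the first one listed by docker ps is used.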
Networking
To make a port accessible outside the container, pass the -p <port> flag. Docker maps this port to another port on the host system. You can look up the interface to which it binds and the port number it maps to using the following command:
docker port [CONTAINER HASH] [GUEST PORT]
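For scripting, the output of docker port can be reduced to the host port number. This is a sketch under the assumption that docker port prints lines of the form 0.0.0.0:32768 (interface, colon, port). The function names and the container ID in the comment are hypothetical.

```shell
# Print the host port that a guest port is mapped to.
# Function names are illustrative only.
# Usage: quickstart_host_port CONTAINER GUEST_PORT
quickstart_host_port() {
    # Keep the first line and drop everything up to the last colon.
    docker port "$1" "$2" | head -n 1 | sed 's/.*://'
}

# Print a URL for Hue (guest port 8888) on the given host name.
# Usage: quickstart_hue_url CONTAINER [HOSTNAME]
quickstart_hue_url() {
    echo "http://${2:-quickstart.cloudera}:$(quickstart_host_port "$1" 8888)/"
}

# Example (requires a running container; abc123 is a placeholder ID):
#   quickstart_hue_url abc123 localhost
```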
To interact with the Cloudera QuickStart image from other systems, make sure quickstart.cloudera resolves to the IP address of the machine where the image is running. You might also want to set up port forwarding so that the port you would normally connect to on a real cluster is mapped to the corresponding port.
When you map ports like this, the services are not aware of the mapping and might provide links or other references to specific ports that are not available on your client.
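One common way to make quickstart.cloudera resolve on a client is an entry in its hosts file. The sketch below appends such an entry only if one is not already present. The function name add_quickstart_host is hypothetical, and the address 192.0.2.10 in the comment is a placeholder for the IP address of the machine running the container.

```shell
# Append "IP<TAB>quickstart.cloudera" to a hosts file unless already present.
# The function name is illustrative only.
# Usage: add_quickstart_host IP [HOSTS_FILE]
add_quickstart_host() {
    ip=$1
    hosts_file=${2:-/etc/hosts}
    if grep -q '[[:space:]]quickstart\.cloudera' "$hosts_file" 2>/dev/null; then
        echo "quickstart.cloudera already present in $hosts_file"
    else
        printf '%s\tquickstart.cloudera\n' "$ip" >> "$hosts_file"
    fi
}

# Example (writing to /etc/hosts usually requires root privileges):
#   add_quickstart_host 192.0.2.10
```

Passing a different file as the second argument lets you try the function on a copy before touching the real hosts file.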
Other Notes
The Cloudera stack is designed to run on a distributed cluster. Pausing or stopping a Docker container is like pausing an entire datacenter: some services might shut down because they appear to have lost contact with the rest of the cluster.
Cloudera Manager is not started by default. To see options for starting Cloudera Manager, run the following command:
/home/cloudera/cloudera-manager
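If the container is running in the background, one way to run this script without attaching is docker exec. This is a sketch that assumes your Docker version provides docker exec; the function name quickstart_cm is hypothetical. Any extra arguments are passed to the script unchanged.

```shell
# Run the Cloudera Manager helper script inside a running container.
# The function name is illustrative only.
# Usage: quickstart_cm CONTAINER [ARGS...]
quickstart_cm() {
    container=$1
    shift
    # -t -i gives the script an interactive terminal inside the container.
    docker exec -t -i "$container" /home/cloudera/cloudera-manager "$@"
}
```

Called with only a container ID, this runs the script with no arguments, which is the invocation the text above describes for listing the available options.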
See Cloudera documentation and the Cloudera website for other information, including the license agreement associated with the Docker image.