Recommended Configuration on Amazon Web Services (AWS)

On AWS, Cloudera Data Science Workbench must be used with persistent/long-running Apache Hadoop clusters only.

CDH and Cloudera Manager Hosts
Cloudera Data Science Workbench Hosts
  • Operations
    • Use Cloudera Director to orchestrate operations. Use Cloudera Manager to monitor the cluster.
  • Networking
    • No security group or network restrictions on traffic between hosts.
    • HTTP connectivity to the corporate network for browser access. Do not use proxies or manual SSH tunnels.
  • Recommended Instance Types
    • m4.4xlarge–m4.16xlarge

      In this case, bigger is better: one m4.16xlarge host is preferable to four m4.4xlarge hosts. AWS pricing scales linearly with instance size, and larger instances have more EBS bandwidth.

  • Storage
    • 100 GB root volume block device (gp2) on all hosts
    • 500 GB Docker block devices (gp2) on all hosts
    • 1 TB Application block device (io1) on master host
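
The storage layout above can be written down as EC2-style block device mappings. The following is a minimal sketch, not an official Cloudera or AWS template. The device names, the `block_device_mappings` helper, the default IOPS value, and the choice of 1024 GiB for "1 TB" are all assumptions made for illustration. The dictionary keys follow the shape of the `BlockDeviceMappings` parameter that the AWS SDKs accept when launching instances.

```python
from typing import Any, Dict, List

# Hypothetical device names: the real names depend on the AMI and instance type.
ROOT_DEVICE = "/dev/sda1"
DOCKER_DEVICE = "/dev/sdf"
APP_DEVICE = "/dev/sdg"


def block_device_mappings(is_master: bool, app_iops: int = 3000) -> List[Dict[str, Any]]:
    """Build block device mappings for one Cloudera Data Science Workbench host.

    app_iops is a placeholder value; io1 volumes require an explicit
    provisioned IOPS figure, and the source does not recommend one.
    """
    mappings: List[Dict[str, Any]] = [
        # 100 GB root volume (gp2) on all hosts
        {"DeviceName": ROOT_DEVICE, "Ebs": {"VolumeSize": 100, "VolumeType": "gp2"}},
        # 500 GB Docker block device (gp2) on all hosts
        {"DeviceName": DOCKER_DEVICE, "Ebs": {"VolumeSize": 500, "VolumeType": "gp2"}},
    ]
    if is_master:
        # 1 TB application block device (io1) on the master host only.
        # Assumption: "1 TB" is expressed here as 1024 GiB.
        mappings.append(
            {
                "DeviceName": APP_DEVICE,
                "Ebs": {"VolumeSize": 1024, "VolumeType": "io1", "Iops": app_iops},
            }
        )
    return mappings


if __name__ == "__main__":
    for role, is_master in (("master", True), ("worker", False)):
        print(role, block_device_mappings(is_master))
```

A list built this way could be passed as the `BlockDeviceMappings` argument when launching an instance through an AWS SDK. Check the device names against the chosen AMI first.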