Using EBS Volumes for Cloudera Manager and CDH

Cloudera Director 2.2 and higher supports the use of Amazon Elastic Block Store (EBS) volumes with Cloudera Manager and CDH cluster instances. You can use EBS volumes to store HDFS data, stage data for processing, or install other applications. EBS can provide an efficient and cost-effective alternative to S3 or other storage mechanisms.

EBS Volume Types

Cloudera Director supports the EBS volume types gp2, st1, and sc1:
EBS volume type | Minimum and maximum size | Usage
gp2             | 1 GiB - 16 TiB           | General-purpose SSD (solid state drive) volume that balances price and performance for a wide variety of transactional workloads.
st1             | 500 GiB - 16 TiB         | Low-cost HDD (hard disk drive) volume designed for frequently accessed, throughput-intensive workloads.
sc1             | 500 GiB - 16 TiB         | Lowest-cost HDD (hard disk drive) volume designed for less frequently accessed workloads.

For more information, see Amazon EBS Volume Types in the AWS documentation.

Amazon EC2 Instance Stores

Instance stores, like EBS, provide block storage for EC2 instances, but Cloudera Director does not use them together with EBS volumes on the same instance. Instance store volumes are located on disks that are physically attached to the host computer, and many EC2 instance types include them.

If an instance type has instance store volumes and you do not specify EBS volumes, Cloudera Director automatically mounts all the instance store volumes that are available. If you do specify EBS volumes, Cloudera Director does not mount instance store volumes.

For more information on EC2 instance stores, see Amazon EC2 Instance Stores in the AWS documentation.

Configuring EBS Volumes

You configure EBS volumes either in the instance template in the web UI or, for clusters launched from the CLI with bootstrap-remote, in the instance section of the configuration file. To configure EBS, provide the following information:
  • Number of EBS volumes you want
  • Type of the EBS volumes (gp2, st1, or sc1). All EBS volumes for an instance must be of the same type.
  • Size of the volumes. Specifying a size outside the ranges defined in the table above causes cluster deployment to fail.
  • Encryption
    • Whether or not to encrypt data in the EBS volume
    • Whether to use the default KMS key for the EBS service or use a custom KMS key

EBS volumes for a Cloudera Manager or CDH cluster instance have the same lifecycle as the instance: they are deleted when the instance is terminated. Repairing an instance does not remount an existing EBS volume; a new volume is used instead.

EBS Volume Encryption

Data in EBS volumes can be encrypted at rest. You use two properties for configuring EBS encryption:
  • enableEbsEncryption: Labeled Enable EBS Encryption in the web UI. Set to true or false. If this value is set to true, the data on EBS volumes created with this instance template will be encrypted.
  • ebsKmsKeyId: Labeled EBS KMS Key ID in the web UI. The key used to encrypt data in the EBS volumes. KMS includes a default master key for each service that supports encryption, including EBS. If you leave this field empty, Cloudera Director configures the EBS volumes to use the KMS default master key for EBS. Alternatively, you can import a custom master key from your own key management infrastructure into KMS and specify it here. To specify a custom master key, enter its full Amazon Resource Name (ARN) as stored in KMS, in the form arn:aws:kms:region:account_id:key/key_id. For example:
    arn:aws:kms:us-west-1:635144601417:key/39b8cdf2-923e-721b-9c6c-652a7e517d72

For more information about EBS encryption, see Amazon EBS Encryption in the AWS documentation. For more information about KMS, see AWS Key Management Service Details in the AWS documentation.

Configuring an EBS Volume with the Web UI

To configure EBS volumes in the web UI, provide the required values in the Advanced Options section of the instance template.


Configuring EBS Volumes with the Configuration File

To configure EBS volumes in the configuration file for launching clusters with bootstrap-remote, provide the required values and uncomment them in the EBS Volumes section of the file:
  #
  # EBS Volumes
  #
  # Director can create and attach additional EBS volumes to the instance. These volumes
  # will be automatically deleted when the associated instance is terminated. These
  # properties don't apply to the root volume.
  #
  # See http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumes.html
  #
  # ebsVolumeCount : 0
  # ebsVolumeType: st1 # specify either st1, sc1 or gp2 volume type
  # ebsVolumeSizeGiB: 500
  #
  # EBS Volume Encryption
  #
  # Encryption can be enabled on the additional EBS volumes. An optional CMK can
  # be specified for volume encryption. Not setting a CMK means the default CMK
  # for EBS will be used. The encryption here does not apply to the root volume.
  #
  # See http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSEncryption.html
  #
  # enableEbsEncryption: false
  # ebsKmsKeyId: arn:aws:kms:REPLACE-ME # full ARN of the KMS CMK
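
As an illustration, a filled-in version of the section above might look like the following sketch. The values (four st1 volumes of 1000 GiB each) are arbitrary choices within the size ranges listed earlier, and the ARN is the sample one from the encryption section, not a real key. The ARN is quoted here as a precaution, on the assumption that the file is parsed as HOCON, where unquoted strings cannot contain colons.

```
ebsVolumeCount: 4
ebsVolumeType: st1          # all EBS volumes on the instance share this type
ebsVolumeSizeGiB: 1000      # st1 accepts 500 GiB - 16 TiB
enableEbsEncryption: true
# Sample ARN only; leave ebsKmsKeyId unset to use the default key for EBS
ebsKmsKeyId: "arn:aws:kms:us-west-1:635144601417:key/39b8cdf2-923e-721b-9c6c-652a7e517d72"
```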

Configuring Device Names for EBS Volumes and Instance Store Volumes

When you request EC2 instances with additional EBS volumes, or an instance type that includes instance store volumes, Cloudera Director automatically assigns device names to the volumes. For more information about device names in EC2, see Device Naming on Linux Instances in the AWS documentation. You can configure how device names are assigned, which may be necessary to ensure that the device names used by Cloudera Director do not overlap with those of any additional volumes associated with an AMI.

By default, instance store volumes will get device names /dev/sdb, /dev/sdc, /dev/sdd, and so on. The device name prefix and starting character can be configured by adding the following section in etc/aws-plugin.conf under the AWS plugin directory.
ephemeralDeviceMappings {
    deviceNamePrefix: /dev/sd
    rangeStart: b
}
By default, EBS volumes will get device names /dev/sdf, /dev/sdg, /dev/sdh, and so on. The device name prefix and starting character can be configured by adding the following section in etc/aws-plugin.conf under the AWS plugin directory.
ebsDeviceMappings {
    deviceNamePrefix: /dev/sd
    rangeStart: f
}
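
As a hypothetical example, suppose an AMI already maps an additional volume to /dev/sdf. Starting the EBS range one letter later would keep the names assigned by Cloudera Director from colliding with it. This is only a sketch under that assumption; check the device names your AMI actually uses before choosing a starting character.

```
ebsDeviceMappings {
    deviceNamePrefix: /dev/sd
    rangeStart: g    # EBS volumes would get /dev/sdg, /dev/sdh, /dev/sdi, ...
}
```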

Note that Cloudera Director does not attach both instance store volumes and EBS volumes at the same time. If EBS volumes are specified, instance store volumes will not be attached.