EMC DSSD D5 Storage Appliance Integration for Hadoop DataNodes

Overview of EMC DSSD D5 Integration

The EMC DSSD D5 provides a high-speed, low-latency storage solution based on flash media. It has been optimized for use as storage for DataNodes in the Cloudera CDH distribution. The DataNode hosts connect directly to the DSSD D5 using a PCIe card interface. In a CDH cluster, only the DataNodes use the DSSD D5 for storage; all other hosts use standard disks.

To manage clusters that use DSSD D5 storage, enable DSSD Mode in Cloudera Manager. When this mode is enabled, Cloudera Manager can manage only clusters that use DSSD D5 DataNodes; a single Cloudera Manager instance cannot manage both a cluster that uses only DSSD D5 DataNodes and a cluster that uses regular DataNodes. Within a cluster, all DataNodes must connect to the DSSD D5; you cannot mix DataNode types. All other Hadoop components operate normally.
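
The no-mixing rule can be illustrated with a minimal Python sketch. This is not a Cloudera Manager API: the DataNode class, the uses_dssd flag, and the check_datanode_storage function are hypothetical names invented for this example, and the sketch only models the rule that DSSD Mode and DataNode storage type must agree for every DataNode.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataNode:
    """Hypothetical description of one DataNode host in a planned cluster."""
    host: str
    uses_dssd: bool  # assumption: True if the host stores blocks on the DSSD D5


def check_datanode_storage(datanodes: List[DataNode], dssd_mode: bool) -> List[str]:
    """Return a list of problems; an empty list means the plan is consistent."""
    problems = []
    for dn in datanodes:
        if dssd_mode and not dn.uses_dssd:
            # DSSD Mode requires every DataNode to connect to the DSSD D5.
            problems.append(f"{dn.host}: DSSD Mode is enabled but the DataNode uses regular disks")
        elif not dssd_mode and dn.uses_dssd:
            # A DSSD D5 DataNode can only be managed with DSSD Mode enabled.
            problems.append(f"{dn.host}: DataNode uses the DSSD D5 but DSSD Mode is not enabled")
    return problems


plan = [DataNode("dn1.example.com", True), DataNode("dn2.example.com", False)]
for problem in check_datanode_storage(plan, dssd_mode=True):
    print(problem)
```

A plan in which every DataNode has uses_dssd set to True and dssd_mode is True produces no problems; any mixed plan is reported host by host.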

You can connect multiple DSSD D5 appliances to a single cluster by defining each DSSD D5 as a "rack." See Configuring Multiple DSSD D5 Appliances in a Cluster.
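
As a rough illustration of the one-rack-per-appliance idea, the following Python sketch derives a rack name for each DataNode host from the appliance it is attached to. The host names, the appliance labels, and the assign_racks function are hypothetical; the slash-prefixed rack names follow the usual Hadoop rack naming convention, and the actual assignment procedure is described in the referenced topic, not here.

```python
from typing import Dict


def assign_racks(host_to_appliance: Dict[str, str]) -> Dict[str, str]:
    """Map each DataNode host to a rack name derived from its appliance.

    Hosts attached to the same appliance end up in the same rack.
    """
    racks = {}
    for host, appliance in sorted(host_to_appliance.items()):
        # Assumption: one rack per appliance, named after the appliance label.
        racks[host] = f"/{appliance}"
    return racks


# Hypothetical layout: three DataNodes attached to two appliances.
layout = {
    "dn1.example.com": "d5-a",
    "dn2.example.com": "d5-b",
    "dn3.example.com": "d5-a",
}
for host, rack in assign_racks(layout).items():
    print(host, rack)
```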

Installing CDH with DSSD DataNodes

Use Cloudera Manager to install a DSSD D5-enabled cluster. You can install Cloudera Manager in several ways, and you can use Cloudera Manager to install agents and other software on all hosts in your cluster. Installing CDH with DSSD D5 DataNodes is similar to a non-DSSD D5 installation, with the following differences:
  • You cannot install a DSSD D5 cluster using a Cloudera Manager instance that is already managing a cluster.
  • You set a single property to enable DSSD Mode.
  • You set several DSSD D5-specific properties.
  • When installing CDH and other services from Cloudera Manager, only parcel installations are supported. Package installations are not supported. See Managing Software Installation Using Cloudera Manager.
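
The two hard restrictions in the list above (a Cloudera Manager instance that is not already managing a cluster, and parcel-only installation) can be expressed as a small pre-installation check. This Python sketch is illustrative only: the preflight function and its parameters are hypothetical, and it does not query Cloudera Manager; the caller supplies the facts to check.

```python
from typing import List


def preflight(cm_manages_existing_cluster: bool, install_method: str) -> List[str]:
    """Return the reasons a DSSD D5 installation cannot proceed, if any."""
    errors = []
    if cm_manages_existing_cluster:
        # A DSSD D5 cluster needs a Cloudera Manager instance with no existing cluster.
        errors.append("Cloudera Manager is already managing a cluster")
    if install_method.lower() != "parcels":
        # Only parcel installations are supported; packages are not.
        errors.append(f"install method '{install_method}' is not supported; use parcels")
    return errors


print(preflight(cm_manages_existing_cluster=False, install_method="parcels"))
print(preflight(cm_manages_existing_cluster=True, install_method="packages"))
```

An empty list means neither restriction is violated; the DSSD Mode property and the DSSD D5-specific properties still have to be set during installation.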

See Installation with the EMC DSSD D5 for complete installation instructions.