Using CDH with Isilon Storage

Dell EMC Isilon is a storage service with a distributed filesystem that can be used in place of HDFS to provide storage for CDH services.

Supported Versions

For Cloudera and Isilon compatibility information, see the product compatibility matrix, Product Compatibility for Dell EMC Isilon.

Differences Between Isilon HDFS and CDH HDFS

The following features of HDFS are not implemented with Isilon OneFS:

  • HDFS caching
  • HDFS encryption
  • HDFS ACLs

Installing Cloudera Manager and CDH with Isilon

For instructions on configuring Isilon and installing Cloudera Manager and CDH with Isilon, see the EMC Isilon documentation.

Upgrading a Cluster with Isilon

To upgrade CDH and Cloudera Manager in a cluster that uses Isilon:
  1. If required, upgrade OneFS to a version compatible with the version of CDH to which you are upgrading. See the product compatibility matrix, Product Compatibility for Dell EMC Isilon. For OneFS upgrade instructions, see the EMC Isilon documentation.
  2. (Optional) Upgrade Cloudera Manager. See Upgrading Cloudera Manager.
  3. Upgrade CDH. See Upgrading CDH.

Configuring Replication with Kerberos and Isilon

If you plan to use replication between clusters that use Isilon storage and that also have enabled Kerberos, do the following:
  1. Create a custom Kerberos Keytab and Kerberos principal that the replication jobs use to authenticate to storage and other CDH services. See Authentication.
  2. In Cloudera Manager, select Administration > Settings.
  3. Search for and enter values for the following properties:
    • Custom Kerberos Keytab Location – Enter the location of the Custom Kerberos Keytab.
    • Custom Kerberos Principal Name – Enter the principal name to use for replication between secure clusters.
  4. When you create a replication schedule, enter the Custom Kerberos Principal Name in the Run As Username field. See Configuring Replication of HDFS Data and Configuring Replication of Hive/Impala Data.
  5. Ensure that both the source and destination clusters have the same set of users and groups. When replication sets or preserves file ownership, the chown command fails on Isilon if the user or group does not exist. See Performance and Scalability Limitations.
  6. Cloudera recommends that you do not select the Replicate Impala Metadata option for Hive/Impala replication schedules. If you need to use this feature, create a custom principal of the form hdfs/hostname@realm or impala/hostname@realm.
  7. Add the following property and value to the HDFS Service Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml and Cluster-wide Advanced Configuration Snippet (Safety Valve) for core-site.xml properties:
    hadoop.security.token.service.use_ip = false
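For step 5, one way to check that the destination cluster is not missing any users or groups is to export the account lists from both clusters and compare them. The following is a minimal sketch, not a Cloudera or Isilon tool: it assumes you have already captured colon-separated listings (in the style of `getent passwd` or `getent group` output) from each cluster, and the function names and sample data are hypothetical.

```python
from typing import Set


def parse_names(listing: str) -> Set[str]:
    """Return the account names (first colon-separated field) from a listing."""
    names = set()
    for line in listing.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if line and not line.startswith("#"):
            names.add(line.split(":", 1)[0])
    return names


def missing_on_destination(source_listing: str, destination_listing: str) -> Set[str]:
    """Return names that exist on the source cluster but not on the destination."""
    return parse_names(source_listing) - parse_names(destination_listing)


# Hypothetical sample data standing in for listings exported from each cluster.
source_passwd = (
    "hdfs:x:496:493::/var/lib/hadoop-hdfs:/bin/bash\n"
    "alice:x:1001:1001::/home/alice:/bin/bash\n"
)
destination_passwd = "hdfs:x:496:493::/var/lib/hadoop-hdfs:/bin/bash\n"

print(sorted(missing_on_destination(source_passwd, destination_passwd)))
```

Any name the comparison reports would need to be created on the destination cluster before replication sets ownership for that user or group. How you export the listings depends on how users and groups are managed in your environment.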
If the replication MapReduce job fails with an error similar to the following:
java.io.IOException: Failed on local exception: java.io.IOException:
  org.apache.hadoop.security.AccessControlException:
  Client cannot authenticate via:[TOKEN, KERBEROS];
  Host Details : local host is: "foo.mycompany.com/172.1.2.3";
  destination host is: "myisilon-1.mycompany.com":8020;
Set the Isilon cluster-wide time-to-live setting to a higher value on the destination cluster for the replication. Higher values may affect load balancing in the Isilon cluster by causing workloads to be less distributed. A value of 60 is a good starting point. For example:
isi networks modify pool subnet4:nn4 --ttl=60
You can view the settings for a subnet with a command similar to the following:
isi networks list pools --subnet subnet3 -v
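Before changing the time-to-live setting, it can help to confirm that a failed replication job actually hit the authentication error shown above. The following is a minimal sketch that scans log text for that error's signature; the function name and the way you obtain the log text are assumptions, not part of Cloudera Manager.

```python
# Signature taken from the error message shown earlier in this section.
AUTH_FAILURE_SIGNATURE = "Client cannot authenticate via:[TOKEN, KERBEROS]"


def has_token_auth_failure(log_text: str) -> bool:
    """Return True if the log text contains the token/Kerberos authentication error."""
    # Collapse runs of whitespace so the check still works when the
    # log viewer wraps the exception across several lines.
    normalized = " ".join(log_text.split())
    return AUTH_FAILURE_SIGNATURE in normalized


# Sample log text copied from the error shown earlier in this section.
sample_log = """java.io.IOException: Failed on local exception: java.io.IOException:
  org.apache.hadoop.security.AccessControlException:
  Client cannot authenticate via:[TOKEN, KERBEROS];
  Host Details : local host is: "foo.mycompany.com/172.1.2.3";
  destination host is: "myisilon-1.mycompany.com":8020;"""

print(has_token_auth_failure(sample_log))
```

A match only indicates that the job log contains this error; other failures need separate investigation.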