HDFS client packages

Use these steps to install the HDFS client packages and configure HDFS to connect to a Cloudera cluster from an unmanaged node.

Prerequisites

  • Infrastructure: Unmanaged host running one of these operating systems:

    • RHEL or RHEL-compatible (using yum)

    • SUSE Linux Enterprise Server (SLES) (using zypper)

    • Ubuntu or Debian (using apt)

  • Java Development Kit: Full JDK (not just JRE) installed on the unmanaged node, matching the version used on the managed cluster.

Step 1: Set up Kerberos and Java Key Stores

If your cluster uses Kerberos for authentication (highly recommended for secure environments):

  1. Install Java Key Stores and Trust Stores.

    Import the Java keystore (.jks) and truststore files that your cluster security configuration requires.

  2. Validate Kerberos Client Installation.

    Ensure the unmanaged node has the Kerberos client utilities installed (krb5-workstation on RHEL, or the equivalent package for your operating system, such as krb5-user on Ubuntu and Debian) and configured to communicate with the cluster’s Kerberos KDC.
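
As a rough sketch of the two checks above, the following shell functions confirm that the client tools are on the PATH, list the trust store contents, and request a Kerberos ticket. The function names, the principal, and the trust store path are illustrative assumptions, not values taken from your cluster.

```shell
# Print each named command that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Check the Kerberos client and the trust store.
# Both arguments are placeholders supplied by the caller.
check_security_setup() {
    principal=$1      # hypothetical, e.g. user@EXAMPLE.COM
    truststore=$2     # hypothetical, e.g. /path/to/truststore.jks
    missing=$(missing_tools kinit klist keytool)
    if [ -n "$missing" ]; then
        echo "Missing tools: $missing" >&2
        return 1
    fi
    # List the certificates in the trust store (keytool prompts for the password).
    keytool -list -keystore "$truststore" || return 1
    # Request a ticket from the KDC, then show the ticket cache.
    kinit "$principal" && klist
}
```

A call such as check_security_setup user@EXAMPLE.COM /path/to/truststore.jks would run both checks. If kinit fails, /etc/krb5.conf and network access to the KDC are the usual first things to review.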

Step 2: Obtain and Copy Configuration Files

To keep the configuration consistent, copy the required service configuration files from a host managed by Cloudera Manager to the unmanaged node.

  1. On a managed node that has the HDFS components installed, locate the /etc/hadoop/conf configuration directory.

  2. Copy this directory, preserving ownership and permissions, to the same location on the unmanaged node.
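
One possible way to carry out the copy is to pack the directory into a tar archive, transfer it, and unpack it on the unmanaged node. This is a minimal sketch: the function names and the archive path are hypothetical, and tar restores file ownership only when the unpack step runs as root.

```shell
# Pack a configuration directory into a compressed archive.
# -p keeps permission bits; ownership metadata is recorded in the archive.
pack_conf() {
    src=$1
    archive=$2
    tar -cpzf "$archive" -C "$src" .
}

# Unpack the archive into the destination directory.
# Run as root on the unmanaged node so that ownership is restored as well.
unpack_conf() {
    archive=$1
    dest=$2
    mkdir -p "$dest" && tar -xpzf "$archive" -C "$dest"
}
```

For example, run pack_conf /etc/hadoop/conf /tmp/hadoop-conf.tar.gz on the managed node, transfer the archive with scp, then run unpack_conf /tmp/hadoop-conf.tar.gz /etc/hadoop/conf as root on the unmanaged node.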

Step 3: HDFS-Specific Configuration and Setup

3.1 Install Java Development Kit (JDK)
  • Install the full Java Development Kit (JDK) on the unmanaged node, matching the version used in the managed cluster.
  • The JDK is required for HDFS to function properly.
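
A quick way to confirm that a full JDK is present, rather than only a JRE, is to check for the javac compiler alongside the java runtime. The helper names below are illustrative. Compare the printed version with the one reported on a managed cluster node.

```shell
# Extract the version string from `java -version` output read on stdin.
parse_java_version() {
    sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# Succeed only if both java and javac are on PATH.
# javac ships with a JDK but not with a JRE.
has_full_jdk() {
    command -v java >/dev/null 2>&1 && command -v javac >/dev/null 2>&1
}

if has_full_jdk; then
    # java -version writes to stderr, so redirect it into the parser.
    echo "JDK version: $(java -version 2>&1 | parse_java_version)"
else
    echo "Full JDK not found: java and javac must both be on PATH" >&2
fi
```
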
3.2 Install HDFS and Dependencies
  • Use the unmanaged node’s default package manager to install the HDFS client along with its dependencies:
    • For RHEL/CentOS:
      sudo yum install hdfs-client
    • For Ubuntu/Debian:
      sudo apt-get install hdfs-client
    • For SLES:
      sudo zypper install hdfs-client
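
The choice among the three commands above can be sketched as a small helper that detects which package manager is available, followed by a smoke test of the installed client. The function names are hypothetical, and the smoke test assumes that the package places the hdfs command on the PATH and that any required Kerberos ticket has already been obtained.

```shell
# Print the first supported package manager found on PATH, or fail if none exists.
detect_pkg_manager() {
    for pm in yum apt-get zypper; do
        if command -v "$pm" >/dev/null 2>&1; then
            printf '%s\n' "$pm"
            return 0
        fi
    done
    return 1
}

# Smoke-test the client after installation.
# Assumes the installed package provides the `hdfs` command.
verify_hdfs_client() {
    command -v hdfs >/dev/null 2>&1 || { echo "hdfs command not found" >&2; return 1; }
    # Print the client version, then list the root of the cluster file system.
    hdfs version && hdfs dfs -ls /
}
```

With these helpers, sudo "$(detect_pkg_manager)" install hdfs-client runs the matching install command, and verify_hdfs_client then confirms that the client can reach the cluster using the configuration copied in Step 2.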