Installing or Upgrading CDS Powered by Apache Spark

Minimum Required Role: Cluster Administrator (also provided by Full Administrator)

CDS Powered by Apache Spark is distributed as two files: a custom service descriptor file and a parcel, both of which must be installed on the cluster.

Install CDS Powered by Apache Spark

Follow these steps to install CDS Powered by Apache Spark:

  1. Check that all the software prerequisites are satisfied. If not, you might need to upgrade or install other software components first. See CDS Powered by Apache Spark Requirements for details.
  2. Install the CDS Powered by Apache Spark service descriptor into Cloudera Manager.
    1. To download the CDS Powered by Apache Spark service descriptor, in the Version Information table in CDS Versions Available for Download, click the service descriptor link for the version you want to install.
    2. Log on to the Cloudera Manager Server host, and copy the CDS Powered by Apache Spark service descriptor to the location configured for service descriptor files.
    3. Set the file ownership of the service descriptor to cloudera-scm:cloudera-scm with permission 644.
    4. Restart the Cloudera Manager Server with the command for your operating system:
      RHEL 7 Compatible, SLES 12, Ubuntu:
      systemctl restart cloudera-scm-server
      RHEL 6 Compatible:
      service cloudera-scm-server restart
  3. In the Cloudera Manager Admin Console, add the CDS Powered by Apache Spark parcel repository to the Remote Parcel Repository URLs in Parcel Settings as described in Parcel Configuration Settings.
  4. Download the CDS Powered by Apache Spark parcel, distribute the parcel to the hosts in your cluster, and activate the parcel. See Managing Parcels.
  5. Add the Spark 2 service to your cluster.
    1. In step 1 of the Add Service wizard, select a dependency option:
      • HDFS, YARN, ZooKeeper: Choose this option if you do not need access to a Hive service.
      • HDFS, Hive, YARN, ZooKeeper: Hive is an optional dependency for the Spark service. If you have a Hive service and want to access Hive tables from your Spark applications, choose this option to include Hive as a dependency and have the Hive client configurations always available to Spark applications.
    2. In step 2 of the wizard, when customizing the role assignments for CDS Powered by Apache Spark, add a gateway role to every host.
    3. Note that the History Server port is 18089 instead of the usual 18088.
    4. Complete the steps to add the Spark 2 service.
  6. Return to the Home page by clicking the Cloudera Manager logo.
  7. Restart the cluster.
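
The file-handling part of step 2 can be scripted. The following is a minimal sketch, not an official Cloudera tool: install_csd and restart_cmd are hypothetical helper names, and the default directory /opt/cloudera/csd is an assumption taken from the upgrade section below; use whatever location is configured for service descriptor files in your deployment. The check for systemctl is only a heuristic for telling RHEL 7 compatible, SLES 12, and Ubuntu hosts apart from RHEL 6 compatible hosts.

```shell
#!/bin/sh
# Sketch only: helper names and the default directory are assumptions.

# Copy a service descriptor JAR into the descriptor directory, then set
# permission 644 and (when running as root) the requested ownership.
install_csd() {
    jar_path="$1"                            # downloaded descriptor JAR
    csd_dir="${2:-/opt/cloudera/csd}"        # assumed descriptor location
    owner="${3:-cloudera-scm:cloudera-scm}"  # ownership named in step 2.3

    [ -f "$jar_path" ] || { echo "no such file: $jar_path" >&2; return 1; }
    [ -d "$csd_dir" ] || { echo "no such directory: $csd_dir" >&2; return 1; }

    target="$csd_dir/$(basename "$jar_path")"
    cp "$jar_path" "$target" || return 1
    chmod 644 "$target" || return 1

    # Changing ownership to another user requires root.
    if [ "$(id -u)" -eq 0 ]; then
        chown "$owner" "$target" || return 1
    else
        echo "not root: skipped chown $owner $target" >&2
    fi
}

# Print (do not run) the restart command from step 2.4 that fits this host.
restart_cmd() {
    if command -v systemctl >/dev/null 2>&1; then
        echo "systemctl restart cloudera-scm-server"
    else
        echo "service cloudera-scm-server restart"
    fi
}
```

A typical use, run as root on the Cloudera Manager Server host, would be to call install_csd with the path of the downloaded descriptor JAR and then run the command printed by restart_cmd. The remaining steps (parcel repository, parcel activation, adding the Spark 2 service) are still done in the Cloudera Manager Admin Console.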

Upgrading to CDS 2.4 Powered by Apache Spark

If you are already using CDS 2.0, 2.1, 2.2, or 2.3, follow these steps to upgrade to CDS 2.4 Powered by Apache Spark while keeping any non-default Spark 2 configurations that have already been applied:

  • Remove the service descriptor JAR for the older version of CDS Powered by Apache Spark from /opt/cloudera/csd. Refer to CDS Powered by Apache Spark Version, Packaging, and Download Information for the names of the JAR files corresponding to each version.

  • Add the service descriptor JAR for CDS 2.4 to /opt/cloudera/csd. Set the file ownership to cloudera-scm:cloudera-scm with permission 644, as for a new installation.

  • Restart the cloudera-scm-server service.

  • In Cloudera Manager, deactivate the parcel corresponding to the older version of CDS.

  • In Cloudera Manager, activate the parcel corresponding to CDS 2.4.

  • Restart services and deploy the client configurations.

  • If you are using Cloudera Data Science Workbench, note that Cloudera Data Science Workbench does not automatically detect configuration changes on the CDH cluster. Perform a full reset of Cloudera Data Science Workbench so that it can pick up any changes as a result of the upgrade. For instructions, see the associated known issue in the Cloudera Data Science Workbench documentation.
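
The first two upgrade steps (removing the old descriptor JAR and adding the new one) can be sketched as a small shell function. This is an illustrative sketch under stated assumptions: swap_csd is a hypothetical helper name, the JAR file names are supplied by the caller (look them up in the version and download information; none are assumed here), and the 644 permission and cloudera-scm:cloudera-scm ownership are carried over from the installation steps above.

```shell
#!/bin/sh
# Sketch only: swap_csd is a hypothetical helper, not a Cloudera command.

# Remove the old descriptor JAR and put the new one in its place.
swap_csd() {
    old_name="$1"                            # file name of the old JAR
    new_jar="$2"                             # path to the downloaded new JAR
    csd_dir="${3:-/opt/cloudera/csd}"        # directory named in the steps
    owner="${4:-cloudera-scm:cloudera-scm}"  # assumed from the install steps

    # Validate everything before touching the directory.
    [ -f "$new_jar" ] || { echo "no such file: $new_jar" >&2; return 1; }
    [ -f "$csd_dir/$old_name" ] || {
        echo "old descriptor not found: $csd_dir/$old_name" >&2
        return 1
    }

    target="$csd_dir/$(basename "$new_jar")"
    rm -f "$csd_dir/$old_name" || return 1
    cp "$new_jar" "$target" || return 1
    chmod 644 "$target" || return 1

    # Changing ownership to another user requires root.
    if [ "$(id -u)" -eq 0 ]; then
        chown "$owner" "$target" || return 1
    else
        echo "not root: skipped chown $owner $target" >&2
    fi
}
```

The function checks both files before deleting anything, so a mistyped file name leaves the existing descriptor in place. After it succeeds, restart the cloudera-scm-server service and continue with the parcel steps in Cloudera Manager.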