Installing CDS 3.0 Powered by Apache Spark

CDS 3.0 Powered by Apache Spark is distributed as two files: a custom service descriptor file and a parcel, both of which must be installed on the cluster.

Install CDS Powered by Apache Spark

Follow these steps to install CDS 3 Powered by Apache Spark:

  1. Check that all the software prerequisites are satisfied. If they are not, upgrade or install the required software components first.
  2. Install the CDS Powered by Apache Spark service descriptor into Cloudera Manager.
    1. To download the CDS Powered by Apache Spark service descriptor, click the service descriptor link for the version you want to install.
    2. Log on to the Cloudera Manager Server host, and copy the CDS Powered by Apache Spark service descriptor to the location configured for service descriptor files.
    3. Set the file ownership of the service descriptor to cloudera-scm:cloudera-scm with permissions set to 644.
    4. Restart the Cloudera Manager Server with the following command:
      systemctl restart cloudera-scm-server
  3. In the Cloudera Manager Admin Console, add the CDS parcel repository to the Remote Parcel Repository URLs in Parcel Settings as described in Parcel Configuration Settings.
  4. Download the CDS Powered by Apache Spark parcel, distribute the parcel to the hosts in your cluster, and activate the parcel. For instructions, see Managing Parcels.
  5. Add the Spark 3 service to your cluster.
    1. In step 1 of the Add Service wizard, select any optional dependencies, such as HBase and Hive, or select No Optional Dependencies.
    2. In step 2 of the wizard, when customizing the role assignments, add a gateway role to every host.
    3. On the Review Changes page, you can enable TLS for the Spark History Server.
    4. Note that the History Server for this service uses port 18089 instead of the usual 18088.
    5. Complete the remaining steps in the wizard.
  6. Return to the Home page by clicking the Cloudera Manager logo in the upper left corner.
  7. Click the stale configuration icon to launch the Stale Configuration wizard and restart the necessary services.
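
As a rough sketch, sub-steps 2.2 through 2.4 above might look like the following when run on the Cloudera Manager Server host. The directory /opt/cloudera/csd and the file name spark3-csd-example.jar are assumptions made for illustration; use the service descriptor location configured in your Cloudera Manager and the file you actually downloaded. The helper names run and install_csd are hypothetical. By default the script only prints the commands; set DRY_RUN=0 and run it as root to execute them.

```shell
#!/bin/sh
# Sketch of sub-steps 2.2 to 2.4: copy the descriptor, fix ownership and
# permissions, restart Cloudera Manager Server.
CSD_DIR="${CSD_DIR:-/opt/cloudera/csd}"  # assumed location; verify in your Cloudera Manager settings
DRY_RUN="${DRY_RUN:-1}"                  # 1 = print commands only, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

install_csd() {
  jar="$1"                                  # path to the downloaded service descriptor
  target="$CSD_DIR/$(basename "$jar")"
  run cp "$jar" "$CSD_DIR/"
  run chown cloudera-scm:cloudera-scm "$target"
  run chmod 644 "$target"
  run systemctl restart cloudera-scm-server
}

# Hypothetical file name, for illustration only.
install_csd /tmp/spark3-csd-example.jar
```

Reviewing the printed commands before setting DRY_RUN=0 is a cheap way to confirm that the target directory and file name are the ones you intend.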

Install the Livy for Spark 3 Service Descriptor

CDS 3 supports Apache Livy, but it cannot use the included Livy service, which is compatible with only Spark 2. To add and manage a Livy service compatible with Spark 3, you must install a service descriptor for the Livy for Spark 3 service.

  1. Install the Livy for Spark 3 service descriptor into Cloudera Manager.
    1. To download the service descriptor, click the service descriptor link for the version you want to install.
    2. Log on to the Cloudera Manager Server host, and copy the Livy service descriptor to the location configured for service descriptor files.
    3. Set the file ownership of the service descriptor to cloudera-scm:cloudera-scm with permissions set to 644.
    4. Restart the Cloudera Manager Server with the following command:
      systemctl restart cloudera-scm-server
  2. In the Cloudera Manager Admin Console, add the CDS 3 parcel repository to the Remote Parcel Repository URLs in Parcel Settings as described in Parcel Configuration Settings.
  3. Download the CDS Powered by Apache Spark parcel, distribute the parcel to the hosts in your cluster, and activate the parcel. For instructions, see Managing Parcels.
  4. Add the Livy for Spark 3 service to your cluster.
    1. Note that the Livy for Spark 3 service uses port 28998 instead of the usual 8998.
    2. Complete the remaining steps in the wizard.
  5. Return to the Home page by clicking the Cloudera Manager logo in the upper left corner.
  6. Click the stale configuration icon to launch the Stale Configuration wizard and restart the necessary services.
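
Once the services have restarted, one possible sanity check is to query the Livy REST endpoint on the port noted above. The following is a minimal sketch, not part of the official procedure: the host name livy-host.example.com and the helper names livy_sessions_url and check_livy are hypothetical, and on a cluster with TLS or Kerberos enabled the request would likely need https and additional authentication options.

```shell
#!/bin/sh
# Sketch: build the Livy for Spark 3 sessions URL and optionally probe it.
LIVY_PORT=28998   # Livy for Spark 3 port, as noted in the steps above

livy_sessions_url() {
  host="$1"             # Livy server host name (hypothetical in this example)
  scheme="${2:-http}"   # pass "https" as the second argument if TLS is enabled
  echo "${scheme}://${host}:${LIVY_PORT}/sessions"
}

check_livy() {
  # -f: exit non-zero on HTTP errors; -sS: no progress bar, but show errors
  curl -fsS "$(livy_sessions_url "$@")"
}

# Print the URL that check_livy would request for a hypothetical host.
livy_sessions_url livy-host.example.com
```

Calling check_livy with a real host name sends the request; a non-zero exit status suggests the service is not reachable on that port or that the request was rejected.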