Creating a Data Hub cluster with custom path

When you create a Data Hub cluster with custom paths, cluster creation might fail with an access exception. To resolve this issue, you must add the policy path to the default Zeppelin notebooks policy using Ranger.

Data Hub cluster creation fails, and a Zeppelin exception is logged in the Cloudera Manager logs, if you create the cluster with a custom path such as <bucket>/custom-dh and override the YARN location and the Zeppelin notebook location with <bucket>/yarn-logs and <bucket>/zeppelin-notebook, respectively.

The following image shows the Create Cluster > Advanced Options > Cloud Storage page where you can optionally specify the base storage location used for YARN and Zeppelin:

Figure 1. Cloud Storage page
The image shows the Data Hub cluster creation using custom paths.

The following image shows the Zeppelin exception in the Cloudera Manager logs:

Figure 2. Zeppelin exception in Cloudera Manager logs
The image shows the Zeppelin exception logged in the Cloudera Manager logs.

To resolve this issue, perform the following steps:

  1. Add the policy path to the default Zeppelin notebooks policy using Ranger.

    The following image shows the options to add the policy path to the Zeppelin notebooks policy in Ranger UI:

    Figure 3. Policy path in Ranger UI
    The image shows the options to add the policy path to the Zeppelin notebooks policy in Ranger UI.
  2. For YARN logs aggregation, add the following using Ranger:

    • Add the /yarn-logs/{USER} path to the Default: Hadoop App Logs - user default policy.
    • Add the /yarn-logs path to the Default: Hadoop App Logs default policy.

    The following sample image shows the update required in the Default: Hadoop App Logs - user default policy:

    Figure 4. Default: Hadoop App Logs - user default policy
    The image shows the path that you must add to the Default: Hadoop App Logs - user default policy in Ranger UI.

    The following sample image shows the update required in the Default: Hadoop App Logs default policy:

    Figure 5. Default: Hadoop App Logs default policy
    The image shows the path that you must add to the Default: Hadoop App Logs default policy in Ranger UI.
  3. After adding the policies, perform the following steps:
    1. In Cloudera Manager, re-run the failed commands.
    2. After the commands run successfully, go to the Management Console > Data Hub Clusters page, and click Actions > Retry > Yes.

      The operation continues from the last failed step.
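
The policy change in step 1 amounts to adding one more path to the list of paths that a policy covers. The following Python snippet is a minimal sketch of that change, assuming the policy is available as a dictionary in the Ranger policy JSON shape, where the covered paths are listed under resources > path > values. The add_policy_path helper, the policy name, and the /existing-path value are hypothetical and used only for illustration. The resource key for your storage service might differ. Fetching the policy from Ranger and saving it back are not shown.

```python
import copy


def add_policy_path(policy: dict, path: str, resource_key: str = "path") -> dict:
    """Return a copy of a policy dict with `path` added to one resource.

    Assumption: the covered paths live in
    policy["resources"][resource_key]["values"] as a list of strings.
    """
    updated = copy.deepcopy(policy)  # leave the caller's policy untouched
    resource = updated.setdefault("resources", {}).setdefault(resource_key, {})
    values = resource.setdefault("values", [])
    if path not in values:  # avoid adding the same path twice
        values.append(path)
    return updated


# Hypothetical policy body, trimmed to the fields this sketch touches.
zeppelin_policy = {
    "name": "zeppelin-notebooks-policy",  # placeholder, not the real policy name
    "resources": {"path": {"values": ["/existing-path"], "isRecursive": True}},
}

# Add the custom Zeppelin notebook location used in this topic.
updated_policy = add_policy_path(zeppelin_policy, "/zeppelin-notebook")
print(updated_policy["resources"]["path"]["values"])
```

The helper returns a modified copy instead of changing the policy in place, so you can compare the original and updated policies before saving anything. Calling it again with the same path leaves the policy unchanged.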