Known Issues

You might run into some known issues while using Cloudera Private Cloud.

Using dollar character in environment variables in Cloudera AI

Environment variables that contain the dollar ($) character are not parsed correctly by Cloudera AI. For example, if you set PASSWORD="pass$123" in the project environment variables and then read it using the echo command, the output is pass23 instead of pass$123.

Workaround: Use one of the following commands to print the $ sign:
echo 24 | xxd -r -p
or
echo JAo= | base64 -d
Then insert the command into the value of the environment variable using command substitution, with either $() or backticks (``). For example, if you want to set the environment variable to ABC$123, specify:
ABC$(echo 24 | xxd -r -p)123
or
ABC`echo 24 | xxd -r -p`123
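The following is a minimal sketch of why the workaround helps, run in an ordinary shell rather than in the Cloudera AI project settings. It assumes the value is expanded the way a shell expands a double-quoted string, which is consistent with the pass23 output described above. The variable names naive, dollar, and fixed are illustrative only.

```shell
# Set $1 to an empty string so the example behaves the same in any shell.
set -- ""

# Inside double quotes the shell reads "$123" as "$1" followed by "23".
naive="pass$123"
printf '%s\n' "$naive"    # prints: pass23

# Workaround: produce the dollar sign with command substitution instead.
# "JAo=" is the base64 encoding of "$" followed by a newline; the
# command substitution strips the trailing newline.
dollar=$(echo JAo= | base64 -d)
fixed="ABC${dollar}123"
printf '%s\n' "$fixed"    # prints: ABC$123
```

The result of a command substitution is not expanded a second time, so the dollar sign survives as a literal character.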
DSE-37827: Jupyter's RTC extension throws an error and notebooks become unusable

In certain cases, Jupyter's RTC (Real Time Collaboration) extension may cause errors claiming either that other sessions are active or that other processes have accessed the notebook files. After these errors appear, the notebook becomes unusable and the Cloudera AI session must be restarted.

Workaround:

Disable the Jupyter RTC extension by performing the following steps:
  1. Create a Session.
  2. Open the terminal.
  3. Enter nano /home/cdsw/.jupyter/labconfig/page_config.json.
  4. Add the following lines to the file:
    {
      "disabledExtensions": {
        "@jupyter/collaboration-extension": true
      },
      "lockedExtensions": {
        "@jupyter/collaboration-extension": true
      }
    }
    
  5. Save and close the file.
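As an alternative to editing the file in nano, steps 3 to 5 can be sketched as a single terminal snippet. This is a sketch under two assumptions: that the session home directory is /home/cdsw, so that $HOME resolves to the path in step 3, and that page_config.json does not already contain other settings, because the snippet overwrites the file.

```shell
# Assumption: $HOME is /home/cdsw inside the session, matching step 3.
config_dir="${HOME:-/home/cdsw}/.jupyter/labconfig"
config_file="$config_dir/page_config.json"

# Create the directory if it does not exist yet.
mkdir -p "$config_dir"

# Write the configuration from step 4. This overwrites an existing file.
cat > "$config_file" <<'EOF'
{
  "disabledExtensions": {
    "@jupyter/collaboration-extension": true
  },
  "lockedExtensions": {
    "@jupyter/collaboration-extension": true
  }
}
EOF

# Show the result for a quick visual check.
cat "$config_file"
```

If the file already holds other settings, merge the two keys by hand instead of running the snippet.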
DSE-36718: Disable auto synchronization feature for users and teams

The automated team and user synchronization feature is disabled. Newly installed or upgraded workbenches do not have the automatic synchronization option in the Cloudera AI UI.

Workaround: none

DSE-36759: AMPs and Feature Announcement sections do not work in NTP setups

Cloudera AI Private Cloud setups with a non-transparent proxy (NTP) do not function properly, which affects Cloudera Accelerators for Machine Learning Projects (AMPs) and Feature Announcements. The home page freezes, the feature announcement displays an error message, and the AMPs do not load.

Workaround:

To avoid the home page freeze, copy the following environment variables from the web deployment and add them to the environment section of the API deployments:
  • HTTP_PROXY
  • HTTPS_PROXY
  • NO_PROXY
  • http_proxy
  • https_proxy
  • no_proxy
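After the variables are copied, one way to confirm that they reached a container is to check for them from a shell inside a pod of an API deployment. The sketch below assumes you have such a shell available; the function name missing_proxy_vars is hypothetical and not part of Cloudera AI.

```shell
# Print the name of every proxy variable from the list above that is
# unset or empty in the current environment.
missing_proxy_vars() {
  for name in HTTP_PROXY HTTPS_PROXY NO_PROXY http_proxy https_proxy no_proxy; do
    # Indirect lookup of the variable called $name, portable across POSIX shells.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      printf '%s\n' "$name"
    fi
  done
}

missing=$(missing_proxy_vars)
if [ -z "$missing" ]; then
  echo "all proxy variables are set"
else
  echo "missing proxy variables:"
  printf '%s\n' "$missing"
fi
```

Both the uppercase and lowercase names are checked because different tools read different spellings.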
DSE-32943: Enabling Service Accounts

Teams in the Cloudera AI Workbench can run workloads within team projects using the Run as option for service accounts only if the service accounts were first manually added to the team as collaborators.
DSE-35013: First Cloudera AI Workbench creation fails

On RHEL 8.8, during the first Cloudera AI Workbench installation on GPU with Cloudera Embedded Container Service external registry, pods might get stuck in the init or CrashLoop state.

The first workbench installation is expected to fail. Treat it as a test workbench, and apply the following manual workaround to create subsequent workbenches:
  1. Restart or delete the pods which are in init or CrashLoop state in the test workbench.
  2. Once all pods are in the running state, create new workbenches as needed.
  3. Delete the test workbench from the Cloudera AI UI if no longer needed.
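Step 1 can be sketched as a small filter over pod listings. The sample below is invented for illustration: the pod names are hypothetical, and only the STATUS strings follow what kubectl get pods typically reports. The function name stuck_pods and the namespace placeholder are likewise assumptions.

```shell
# Hypothetical sample of `kubectl get pods` output (pod names are invented).
sample_output='NAME    READY   STATUS             RESTARTS   AGE
pod-a   1/1     Running            0          10m
pod-b   0/1     Init:0/1           0          10m
pod-c   0/1     CrashLoopBackOff   5          10m'

# Print the names of pods whose STATUS starts with "Init" or contains "CrashLoop".
stuck_pods() {
  awk 'NR > 1 && ($3 ~ /^Init/ || $3 ~ /CrashLoop/) { print $1 }'
}

stuck=$(printf '%s\n' "$sample_output" | stuck_pods)
printf '%s\n' "$stuck"

# Against a live cluster, the same filter could feed a delete command
# (the namespace is a placeholder):
#   kubectl get pods -n <workbench-namespace> | stuck_pods \
#     | xargs kubectl delete pod -n <workbench-namespace>
```

Review the list before deleting anything, because a pod that is still initializing normally also reports an Init status.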
OPSX-4603: BuildKit in Cloudera Embedded Container Service in Cloudera AI Private Cloud

Issue: BuildKit was introduced in Cloudera Embedded Container Service for building images of models and experiments. BuildKit replaces Docker, which was previously used to build images of Cloudera AI's models and experiments in Cloudera Embedded Container Service. BuildKit is supported only on RHEL 8.x and CentOS 8.x.

BuildKit in Cloudera AI Private Cloud 1.5.2 is a Technical Preview feature. Therefore, Docker must still be installed on the nodes/hosts for models and experiments to work smoothly. An upcoming release will completely remove the dependency on Docker on the nodes.

Workaround: None.

DSE-32285: Migration: Migrated models are failing due to image pull errors

Issue: After a CDSW to Cloudera AI migration (on-premises) using the full-fledged migration tool, migrated models on a Cloudera AI Workbench on Private Cloud fail on initial deployment. This is because the initial model deployment tries to pull images from the on-premises registry.

Workaround: Redeploy the migrated model. Because redeployment involves the build and deploy process, the image is built, pushed to the registry configured for the Private Cloud Cloudera AI Workbench, and then used from that registry from then on.

DSE-28768: Spark Pushdown is not working with Scala 2.11 runtime

Issue: Scala and R are not supported for Spark Pushdown.

Workaround: None.

DSE-32304: On Cloudera AI Private Cloud with Cloudera Embedded Container Service, terminal and SSH connections can terminate

Issue: In Cloudera Private Cloud with Cloudera Embedded Container Service, Cloudera AI Terminal and SSH connections can terminate after an unpredictable amount of time, usually 4 to 10 minutes. This issue affects the use of local IDEs with Cloudera AI, as well as any customer application that uses a websocket connection.

Workaround: None.

DSE-35251: Web pod crashes if forking a project takes more than 60 minutes

The web pod crashes if forking a project takes more than 60 minutes. This is because the timeout is set to 60 minutes by the grpc_git_clone_timeout_minutes property. The following error is displayed after the web pod crashes:
2024-04-23 22:52:36.384   1737    ERROR      AppServer.VFS.grpc                    crossCopy grpc error    data = [{"error":"1"},{"code":4,"details":"2","metadata":"3"},"Deadline exceeded",{}]
          ["Error: 4 DEADLINE_EXCEEDED: Deadline exceeded\n    at callErrorFromStatus (/home/cdswint/services/web/node_modules/@grpc/grpc-js/build/src/call.js:31:19)\n    at Object.onReceiveStatus (/home/cdswint/services/web/node_modules/@grpc/grpc-js/build/src/client.js:192:76)\n    at Object.onReceiveStatus (/home/cdswint/services/web/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:360:141)\n    at Object.onReceiveStatus (/home/cdswint/services/web/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:323:181)\n    at /home/cdswint/services/web/node_modules/@grpc/grpc-js/build/src/resolving-call.js:94:78\n    at process.processTicksAndRejections (node:internal/process/task_queues:77:11)\nfor call at\n    at ServiceClientImpl.makeUnaryRequest (/home/cdswint/services/web/node_modules/@grpc/grpc-js/build/src/client.js:160:34)\n    at ServiceClientImpl.crossCopy (/home/cdswint/services/web/node_modules/@grpc/grpc-js/build/src/make-client.js:105:19)\n    at /home/cdswint/services/web/server-dist/grpc/vfs-client.js:235:19\n    at new Promise (<anonymous>)\n    at Object.crossCopy (/home/cdswint/services/web/server-dist/grpc/vfs-client.js:234:12)\n    at Object.crossCopy (/home/cdswint/services/web/server-dist/models/vfs.js:280:38)\n    at projectForkAsyncWrapper (/home/cdswint/services/web/server-dist/models/projects/projects-create.js:229:19)"]
          node:internal/process/promises:288
          triggerUncaughtException(err, true /* fromPromise */);
          ^Error: 4 DEADLINE_EXCEEDED: Deadline exceeded
          at callErrorFromStatus (/home/cdswint/services/web/node_modules/@grpc/grpc-js/build/src/call.js:31:19)
          at Object.onReceiveStatus (/home/cdswint/services/web/node_modules/@grpc/grpc-js/build/src/client.js:192:76)
          at Object.onReceiveStatus (/home/cdswint/services/web/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:360:141)
          at Object.onReceiveStatus (/home/cdswint/services/web/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:323:181)
          at /home/cdswint/services/web/node_modules/@grpc/grpc-js/build/src/resolving-call.js:94:78
          at process.processTicksAndRejections (node:internal/process/task_queues:77:11)
          for call at
          at ServiceClientImpl.makeUnaryRequest (/home/cdswint/services/web/node_modules/@grpc/grpc-js/build/src/client.js:160:34)
          at ServiceClientImpl.crossCopy (/home/cdswint/services/web/node_modules/@grpc/grpc-js/build/src/make-client.js:105:19)
          at /home/cdswint/services/web/server-dist/grpc/vfs-client.js:235:19
          at new Promise (<anonymous>)
          at Object.crossCopy (/home/cdswint/services/web/server-dist/grpc/vfs-client.js:234:12)
          at Object.crossCopy (/home/cdswint/services/web/server-dist/models/vfs.js:280:38)
          at projectForkAsyncWrapper (/home/cdswint/services/web/server-dist/models/projects/projects-create.js:229:19) {
          code: 4,
          details: 'Deadline exceeded',
          metadata: Metadata { internalRepr: Map(0) {}, options: {} }
          }  
Workaround: Increase the timeout limit, for example, to 120 minutes, by updating the grpc_git_clone_timeout_minutes property:
UPDATE site_config SET grpc_git_clone_timeout_minutes = <new value>;
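As a small sketch, the statement above can be filled in from a shell variable with a basic sanity check on the value. The value 120 is the example from the workaround; the variable names timeout_minutes and sql are illustrative. How the statement is then run against the database is not covered here.

```shell
# 120 minutes is the example value from the workaround; adjust as needed.
timeout_minutes=120

# Reject anything that is not a positive integer before building the SQL.
case "$timeout_minutes" in
  ''|*[!0-9]*|0*)
    echo "timeout must be a positive integer" >&2
    exit 1
    ;;
esac

# Fill the placeholder in the UPDATE statement.
sql="UPDATE site_config SET grpc_git_clone_timeout_minutes = ${timeout_minutes};"
printf '%s\n' "$sql"    # prints: UPDATE site_config SET grpc_git_clone_timeout_minutes = 120;
```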