CVE-2021-44228 Remediation for Cloudera Machine Learning Data Service
This document describes how to remediate CVE-2021-44228 on Cloudera Machine Learning Data Service.
On December 15, 2021, the ML team released version 2.0.25-b110 of Cloudera Machine Learning Data Service for Cloudera Public Cloud. This version addresses CVE-2021-44228, which affects Apache Log4j2 versions 2.0 through 2.14.1. We urge all customers to upgrade their workspaces to the latest version.
The Cloudera Machine Learning service code itself is not written in Java and is therefore not vulnerable. However, the log4j2 jar file does exist in the (now deprecated) engine, as well as in the Hadoop CLI and Spark ML Runtimes add-ons, so it is available in sessions, jobs, models, and applications. There is no direct threat, but a data scientist could inadvertently use this log4j2 jar file and expose themselves to the vulnerability.
To address this, we are releasing a new version of the engine and of the ML Runtimes add-ons with the vulnerable code removed.
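Because the log4j2 jar file is available inside sessions, it can help to see where copies of it live before and after upgrading. The following is a minimal sketch in POSIX shell, not a Cloudera-provided tool. It assumes the jars follow the usual log4j-core-<version>.jar naming, and the function name list_log4j_jars is hypothetical. Jars packaged inside other jar files will not appear in its output.

```shell
#!/bin/sh
# Hypothetical helper (not part of Cloudera Machine Learning): print every file
# under a directory (default: /) whose name matches the usual Log4j2 core jar
# pattern, log4j-core-<version>.jar. Jars nested inside other jars are not found.
list_log4j_jars() {
  root="${1:-/}"
  # Discard permission-denied errors for directories the user cannot read,
  # and sort the results so the output order is stable.
  find "$root" -type f -name 'log4j-core-*.jar' 2>/dev/null | sort
}

# Example usage from a session terminal (scanning / can take a while):
#   list_log4j_jars /
```

A version number in the file name alone does not tell you whether a jar is still exploitable, because the fix described below leaves the old jar versions in place and only removes one class from them.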
Upgrade Cloudera Machine Learning Workspace to the new version
Administrators can now upgrade Cloudera Machine Learning workspaces to version 2.0.25-b110. To upgrade a workspace, select the Actions icon, then select Upgrade Workspace.
Once the workspace is upgraded, follow the steps below to ensure that the patched engine and ML Runtimes add-ons are used.
ML Runtimes Add-ons
The first step concerns the runtime add-ons, which consist of Hadoop CLI and Spark code that is added to ML Runtimes. An administrator must go to the Site Administration -> Runtime/Engine page and ensure that the selection in the “Hadoop CLI Version” drop-down list ends with “HOTFIX-1”.
When starting a session with “Enable Spark” turned on, users must select a Spark version ending with “HOTFIX-1”.
For jobs and applications that use Spark in projects using runtimes, go to the job or application Settings page and select a Spark version that ends with “HOTFIX-1”. Applications must then be restarted.
Models and experiments that use Spark runtimes must be rebuilt and redeployed with a fixed version of Spark. The models and experiments do not need to be deleted.
Engines
The patched engine image is:
docker.repository.cloudera.com/cloudera/cdsw/engine:15-cml-2021.09-2
The “-2” suffix is important. This engine is identical to the one ending in “-1”, except that the vulnerability has been removed. It is now the default engine for new workspaces. To ensure that this engine is used:
- An administrator should go to Site Administration -> Runtime/Engine, scroll down to Engine Images, and ensure that the above version is selected as the default.
- For all projects using engines, verify that the project's engine is set to the version above.
- For all applications and jobs in projects that use engines, go to the application or job settings page and ensure that under Select Job Engine the above version is selected.
- Deploy a new build for all models that use engines.
Technical Details of the Fix
The log4j2 jar file exists in several places and is also packaged inside other jar files. Instead of upgrading the log4j2 jar file, we have chosen to remove the vulnerable Java class file. This is one of the mitigations proposed in the CVE text:
"it can be mitigated ... by removing the JndiLookup class from the classpath (example:
zip -q -d log4j-core-*.jar */JndiLookup.class
)."
As a result, older versions of the log4j2 jar file still exist in sessions after the fix, but the offending class file has been removed from them. To verify this, run the following command. If the class file has been removed, grep prints nothing:
grep JndiLookup.class <jar file>
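To run the same check over every jar in a session at once, the grep command can be wrapped in a loop. This is a sketch, not a Cloudera-provided tool, and the function name find_jndilookup_jars is hypothetical. It relies on the same property as the single-file command: entry names in a zip/jar archive are stored uncompressed, so a top-level JndiLookup.class entry is visible to grep. A log4j2 jar nested inside another jar is normally compressed, so a class file remaining there may not be detected.

```shell
#!/bin/sh
# Hypothetical helper (not part of Cloudera Machine Learning): print every .jar
# file under a directory (default: /) whose raw bytes still contain the entry
# name "JndiLookup.class". Zip entry names are stored uncompressed, so grep can
# match them directly. Jars nested inside other jars may be missed.
find_jndilookup_jars() {
  root="${1:-/}"
  find "$root" -type f -name '*.jar' 2>/dev/null | sort | while IFS= read -r jar; do
    # -F: treat the pattern as a fixed string, so "." is literal.
    # -q: report the result through the exit status only.
    if grep -qF 'JndiLookup.class' "$jar"; then
      printf '%s\n' "$jar"
    fi
  done
}

# Example usage from a session terminal. Jars that users added to their own
# projects are scanned too, so any output is worth reviewing:
#   find_jndilookup_jars /
```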