Security hardened Spark image migration guide

Starting with Cloudera Data Engineering version 1.24.1, the Redhat-based images for Apache Spark 3.x are deprecated, and security hardened images are provided for runtime usage.

To support successful migrations during in-place upgrades and backup-and-restore operations, clusters that come from previous versions continue to use the Redhat-based images. When you manually create a cluster through the Cloudera Data Engineering UI, the API, or the CDP CLI, the security hardened image is the default option.

The security hardened image differs from the Redhat-based images in the following aspects:

  • Different Java and Python versions are used. For more information, see Compatibility for Cloudera Data Engineering and Runtime components.
  • The security hardened image follows a distroless approach, which excludes packages that are not required for running Spark. Several Linux packages that are present in the Redhat-based images are not included in the security hardened image. As a consequence, any Python or Java module that depends on one of these excluded packages might fail to load or run in the security hardened image. For more information, see Libraries not included in the security hardened image.
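
One way to find Python modules affected by the excluded packages is to try importing each module your jobs rely on inside the security hardened image and record the failures. The following is a minimal sketch, not a Cloudera-provided tool. The function name find_unimportable and the module list are hypothetical, so replace the list with the modules your own jobs use. The sketch only detects failures that occur at import time. A module that loads a native library later, when a function is first called, is not caught.

```python
import importlib


def find_unimportable(module_names):
    """Return a dict mapping each module that fails to import to its error text."""
    failures = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except (ImportError, OSError) as exc:
            # ImportError covers modules that are missing or whose compiled
            # extension cannot be loaded; OSError can surface when a native
            # shared library is opened directly (for example through ctypes).
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


if __name__ == "__main__":
    # Hypothetical list: replace with the modules your Spark jobs import.
    required_modules = ["numpy", "pandas", "pyarrow"]
    for module, error in find_unimportable(required_modules).items():
        print(f"{module}: {error}")
```

Running the script with the Python interpreter of the target image prints one line for each module that cannot be imported. An empty output means that all listed modules were imported successfully.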

If a Python or Java module requires packages that are excluded from the security hardened image, you can build a custom image that adds the required libraries on top of the security hardened base Spark runtime image. For more information, see Using Custom Spark Runtime Docker Images via API/CLI.