Overview of using Cloudera Data Engineering resources
Cloudera Data Engineering offers two key strategies for managing job resources: Python virtual environments for managing job dependencies, and custom Spark runtime Docker images for adding specialized packages and libraries to Spark jobs.
Using Python virtual environments
Cloudera Data Engineering supports Python virtual environments to manage job dependencies by using the python-env resource type.
For more information, see Using Python virtual environments.
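As a rough illustration, the sketch below creates a python-env resource with the CDE CLI, uploads a requirements file, and attaches the environment to a Spark job. The resource, file, and job names (my-python-env, requirements.txt, app.py, my-pyspark-job) are placeholders, and exact flag names can vary between CDE CLI releases; consult the linked documentation for your version.

  # Create a python-env resource that builds a virtual environment from a requirements file
  cde resource create --name my-python-env --type python-env --python-version python3

  # Upload the dependency list; CDE builds the virtual environment from it
  cde resource upload --name my-python-env --local-path requirements.txt

  # Reference the environment when creating a Spark job so its packages are available at run time
  cde job create --name my-pyspark-job --type spark \
    --application-file app.py \
    --python-env-resource-name my-python-env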
Using custom Spark runtime Docker images via API/CLI
Use custom Spark runtime Docker images when Spark jobs require custom packages and libraries to be installed and available at run time. These packages can include proprietary software, such as RPMs that must be compiled to produce the required binaries. A custom runtime image lets you pre-bake these dependencies into a self-contained Docker image that can be reused across multiple Spark jobs.
For more information, see Using custom Spark runtime Docker images via API/CLI.
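As a rough illustration, the sketch below extends a base Spark runtime image with an extra package, registers it as a custom-runtime-image resource, and runs a Spark job against it. The base image, registry host, package, credential, and job names are placeholders, and the exact CLI flags (for example --image-engine and --runtime-image-resource-name) may differ across CDE releases; check the linked documentation for your version.

  # Dockerfile: start from the Cloudera-provided Spark runtime base image and add the custom package
  cat > Dockerfile <<'EOF'
  FROM <base-spark-runtime-image>
  USER root
  # Install the proprietary RPM (or other packages) required by your Spark jobs
  COPY <custom-package>.rpm /tmp/
  RUN yum install -y /tmp/<custom-package>.rpm && yum clean all
  EOF

  # Build and push the image to a registry that the CDE service can reach
  docker build -t registry.example.com/myteam/custom-spark-runtime:1.0 .
  docker push registry.example.com/myteam/custom-spark-runtime:1.0

  # Register the registry credentials and the image as a custom-runtime-image resource
  cde credential create --name docker-creds --type docker-basic \
    --docker-server registry.example.com --docker-username myuser
  cde resource create --name custom-image-resource --type custom-runtime-image \
    --image-engine spark3 --image-credential docker-creds \
    --image registry.example.com/myteam/custom-spark-runtime:1.0

  # Run a Spark job on the custom runtime image
  cde job create --name custom-image-job --type spark \
    --application-file app.py \
    --runtime-image-resource-name custom-image-resource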