Using the workers API

The workers API lets you run tasks in parallel by launching multiple workers from a session. The workers process queued parameters with a function that you specify, which makes the API well suited to web scraping and other distributed computations.
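The pattern described above, where several workers take parameters from a queue and apply one function to each, can be illustrated on a single machine with Python's standard concurrent.futures module. This is only a local analogy. It does not use the workers API, and fetch_length is a hypothetical stand-in for a real task such as downloading a web page.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_length(url: str) -> int:
    # Hypothetical task: a real scraper would download the page here.
    # To keep the sketch self-contained, it only measures the URL string.
    return len(url)


def run_parallel(func, params, n_workers=2):
    # Each pool thread plays the role of a worker: it takes the next
    # queued parameter and applies func to it. map() keeps input order.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(func, params))


if __name__ == "__main__":
    urls = ["https://example.com", "https://example.org/a", "https://example.net/bc"]
    print(run_parallel(fetch_length, urls))
```

With the workers API, the same idea is spread across separate engines in the cluster instead of threads in one process.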

Launching workers

You can launch worker engines into the cluster.

Use the following function to launch workers:

launch_workers(n, cpu, memory, nvidia_gpu=0, kernel="python3", script="", code="", env={})

Specify the following parameters:

n (int) - Defines the number of engines to launch.
cpu (float) - Defines the number of CPU cores to allocate to each engine.
memory (float) - Defines the number of gigabytes of memory to allocate to each engine.
nvidia_gpu (int, optional) - Defines the number of GPUs to allocate to each engine.
kernel (str, optional) - Defines the kernel. The kernel can be R, Python2, Python3, or Scala. This parameter is only available for projects that use legacy engines.
script (str, optional) - Defines the name of a Python source file the worker runs as soon as it starts up.
code (str, optional) - Defines the Python code the engine runs as soon as it starts up. If a script is specified, the code will be ignored.
env (dict, optional) - Defines the environment variables to set in the engine.
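Before calling launch_workers, it can help to check the arguments against the types listed above. The following is a minimal sketch of a hypothetical helper, make_launch_args, that validates n, cpu, memory, and nvidia_gpu, warns when both script and code are set (because code would be ignored), and copies env into a fresh dictionary. It leaves out kernel, which applies only to legacy engines. The positivity checks are assumptions about sensible inputs, not documented limits.

```python
import warnings


def make_launch_args(n, cpu, memory, nvidia_gpu=0, script="", code="", env=None):
    """Build keyword arguments for launch_workers, checking the documented types.

    Hypothetical helper: it does not call the workers API itself.
    """
    if not isinstance(n, int) or n < 1:
        raise ValueError("n must be a positive integer")
    if cpu <= 0 or memory <= 0:
        raise ValueError("cpu and memory must be positive")
    if not isinstance(nvidia_gpu, int) or nvidia_gpu < 0:
        raise ValueError("nvidia_gpu must be a non-negative integer")
    if script and code:
        # Per the parameter list, code is ignored when script is set.
        warnings.warn("both script and code given; the worker will ignore code")
    return {
        "n": n,
        "cpu": float(cpu),
        "memory": float(memory),
        "nvidia_gpu": nvidia_gpu,
        "script": script,
        "code": code,
        # Use a fresh dict rather than sharing one mutable default.
        "env": dict(env or {}),
    }
```

Assuming the module is imported as in the Python example, the result could be passed on with workers.launch_workers(**make_launch_args(n=2, cpu=0.2, memory=0.5, env={"TASK": "scrape"})), where TASK is a made-up variable name.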

Examples:

Launching workers using Python

import cml.workers_v1 as workers
launched_workers = workers.launch_workers(n=2, cpu=0.2, memory=0.5, code="print('Hello from a Cloudera AI Worker')")

Launching workers using R

library('cml')
launched_workers <- launch.workers(n=2, cpu=0.2, memory=0.5, env="", code="print('Hello from a Cloudera AI Worker')")

Listing workers

The list_workers function returns information about all the workers in the cluster.

Use the following function to list workers:

list_workers()
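This page does not describe the shape of the returned information. Assuming each entry is a dictionary with at least id and status keys (an assumption; inspect the real output of list_workers() first), a small helper can collect the IDs you need, for example to pass to stop_workers. The worker_ids helper, the sample data, and the "running" status value are all hypothetical.

```python
def worker_ids(worker_list, status=None):
    """Return worker IDs, optionally only those with the given status.

    Assumes each entry is a dict with "id" and "status" keys; this
    structure is an assumption, not documented behavior.
    """
    return [
        w["id"]
        for w in worker_list
        if status is None or w.get("status") == status
    ]


if __name__ == "__main__":
    # Hypothetical sample shaped like the assumed list_workers() output.
    sample = [
        {"id": 1, "status": "running"},
        {"id": 2, "status": "succeeded"},
        {"id": 3, "status": "running"},
    ]
    print(worker_ids(sample, status="running"))
```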

Stopping workers

You can stop specific worker engines.

Use the following function to stop workers:

stop_workers(*worker_id)

Specify the following parameters:

worker_id (int, optional) - Defines the IDs of the worker engines to stop. You can pass several IDs as separate arguments. If no ID is provided, all the worker engines on the cluster are stopped.
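Because calling stop_workers with no IDs stops every worker engine on the cluster, unpacking an empty list (stop_workers(*ids) with ids == []) has the same effect. The following minimal sketch of a hypothetical helper, stop_selected, refuses that case unless you opt in explicitly. It takes the stop function as an argument so it can be tried with a stand-in that only records its arguments.

```python
def stop_selected(stop_fn, ids, allow_all=False):
    """Call stop_fn(*ids), refusing an empty ID list unless allow_all is True.

    stop_fn is expected to behave like stop_workers(*worker_id), where
    calling it with no arguments stops every worker on the cluster.
    """
    ids = list(ids)
    if not ids and not allow_all:
        raise ValueError("no worker IDs given; this would stop all workers")
    return stop_fn(*ids)


if __name__ == "__main__":
    # Stand-in for stop_workers that only records what it was asked to stop.
    calls = []

    def fake_stop(*worker_id):
        calls.append(worker_id)

    stop_selected(fake_stop, [3, 7])
    print(calls)
```

With the real API, and assuming stop_workers is exposed by the module imported in the Python example, the call would presumably be stop_selected(workers.stop_workers, ids).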