Sessions example for the CDE CLI

This example uses the Cloudera Data Engineering (CDE) CLI to create a Session with resources specified at creation time: a Python environment, a runtime image, files, a Git repository, and workload credentials. The session is then opened with `cde session interact` to confirm that each resource is available.
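If you script session creation, the same resource flags can be assembled programmatically. The sketch below is in Python to match the interactive session later in this example, and it builds the argument list for the command that follows. The helper `build_session_create_args` is hypothetical, and it assumes the numbered `--mount-N-resource` pattern continues past the three mounts shown here. It only builds the list. To execute it, pass it to `subprocess.run(args, check=True)`, which requires the `cde` CLI on your `PATH`.

```python
from typing import List, Optional, Sequence


def build_session_create_args(
    name: str,
    session_type: str,
    python_env: Optional[str] = None,
    runtime_image: Optional[str] = None,
    mounts: Sequence[str] = (),
    workload_credentials: Sequence[str] = (),
) -> List[str]:
    """Hypothetical helper: assemble argv for `cde session create`.

    The flag names are copied from the example command; the mount
    numbering beyond 3 is an assumption.
    """
    args = ["cde", "session", "create", "--name", name, "--type", session_type]
    if python_env:
        args += ["--python-env-resource-name", python_env]
    if runtime_image:
        args += ["--runtime-image-resource-name", runtime_image]
    # Mount flags are numbered from 1: --mount-1-resource, --mount-2-resource, ...
    for i, resource in enumerate(mounts, start=1):
        args += [f"--mount-{i}-resource", resource]
    # --workload-credential is repeated once per credential.
    for cred in workload_credentials:
        args += ["--workload-credential", cred]
    return args


args = build_session_create_args(
    "resources",
    "pyspark",
    python_env="example-virtual-env",
    runtime_image="docker-image",
    mounts=["octocat", "example-files", "example-data"],
    workload_credentials=["workload-cred", "workload-cred-2"],
)
print(" ".join(args))
```

With these inputs, the printed command matches the one used in this example.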

> cde session create --name resources --type pyspark --python-env-resource-name example-virtual-env --runtime-image-resource-name docker-image --mount-1-resource octocat --mount-2-resource example-files --mount-3-resource example-data --workload-credential workload-cred --workload-credential workload-cred-2
{
  "name": "resources",
  "type": "pyspark",
  "creator": "csso_surya.balakrishnan",
  "created": "2023-10-06T03:13:03Z",
  "mounts": [
    {
      "dirPrefix": "/",
      "resourceName": "octocat"
    },
    {
      "dirPrefix": "/",
      "resourceName": "example-files"
    },
    {
      "dirPrefix": "/",
      "resourceName": "example-data"
    }
  ],
  "lastStateUpdated": "2023-10-06T03:13:03Z",
  "state": "starting",
  "interactiveSpark": {
    "id": 1,
    "driverCores": 1,
    "executorCores": 1,
    "driverMemory": "1g",
    "executorMemory": "1g",
    "numExecutors": 1,
    "pythonEnvResourceName": "example-virtual-env"
  },
  "workloadCredentials": [
    "workload-cred",
    "workload-cred-2"
  ],
  "runtimeImageResourceName": "docker-image"
}
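The command prints the new session as JSON. As a minimal sketch, assuming you have captured that output as a string (for example with `subprocess.run(..., capture_output=True, text=True)`), Python's `json` module can extract the resource-related fields. The function name `summarize_session` is hypothetical; the field names are copied from the output above, and the sample is an abbreviated copy of it. Note that `state` is still `starting` at this point, so the session is not yet ready to accept code.

```python
import json

# Abbreviated copy of the `cde session create` output shown above.
SAMPLE = """{
  "name": "resources",
  "type": "pyspark",
  "mounts": [
    {"dirPrefix": "/", "resourceName": "octocat"},
    {"dirPrefix": "/", "resourceName": "example-files"},
    {"dirPrefix": "/", "resourceName": "example-data"}
  ],
  "state": "starting",
  "interactiveSpark": {"id": 1, "pythonEnvResourceName": "example-virtual-env"},
  "workloadCredentials": ["workload-cred", "workload-cred-2"],
  "runtimeImageResourceName": "docker-image"
}"""


def summarize_session(payload: str) -> dict:
    """Hypothetical helper: pull resource fields out of the session JSON."""
    session = json.loads(payload)
    spark = session.get("interactiveSpark", {})
    return {
        "name": session["name"],
        "state": session["state"],
        # The Python environment is reported inside interactiveSpark.
        "python_env": spark.get("pythonEnvResourceName"),
        "runtime_image": session.get("runtimeImageResourceName"),
        # Map each mounted resource name to its dirPrefix.
        "mounts": {m["resourceName"]: m["dirPrefix"] for m in session.get("mounts", [])},
        "workload_credentials": session.get("workloadCredentials", []),
    }


summary = summarize_session(SAMPLE)
print(summary["mounts"])
```

All three mounts use the `dirPrefix` `/`, which appears consistent with the directory listing later in the session, where the Git repository contents (`.git`, `README`) sit alongside other files in `/app/mount`.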

> cde session interact --name resources
Starting REPL...
Waiting for the session to go into an available state...
Connected to Cloudera Data Engineering...
Press Ctrl+D (i.e. EOF) to exit
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\
      /_/

Type in expressions to have them evaluated.

>>> import os

>>> os.listdir("/app/mount")
['.git', 'README', 'access-logs-ETL-iceberg.py', 'access-logs-ETL.py', 'access-logs.txt', 'cdeoperator.py', 'pyspark-batch-job.py', 'pyspark_wordcount.py', 'spark-load-data.py', 'word_count_templates.txt', 'wordcount_input_1.txt', 'dex-spark-driver-template-txckdpxp.yaml', 'dex-spark-executor-template-txckdpxp.yaml']

>>> sec_path = "/etc/dex/secrets/workload-cred/key1"

>>> with open(sec_path) as f:
...     for line in f:
...         print(line)
value1
>>> sec_path = "/etc/dex/secrets/workload-cred-2/key2"

>>> with open(sec_path) as f:
...     for line in f:
...         print(line)
value2
>>> import pandas

>>> dates = pandas.date_range("20130101", periods=6)

>>> dates
DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')
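In the session above, each workload credential is read from a file under `/etc/dex/secrets/<credential>/<key>`, and `pandas`, presumably supplied by the `example-virtual-env` Python environment resource, is imported and used. A minimal sketch of both checks as reusable helpers follows. The helper names are hypothetical, and the secrets path layout is inferred from this one transcript rather than from a documented contract.

```python
import importlib.util
import os
from typing import Iterable, List

# Path layout inferred from the transcript: /etc/dex/secrets/<credential>/<key>
SECRETS_ROOT = "/etc/dex/secrets"


def read_workload_credential(credential: str, key: str, root: str = SECRETS_ROOT) -> str:
    """Hypothetical helper: return the value stored for `key` in a mounted credential."""
    path = os.path.join(root, credential, key)
    with open(path) as f:
        # Strip surrounding whitespace, including the trailing newline.
        return f.read().strip()


def missing_packages(names: Iterable[str]) -> List[str]:
    """Hypothetical helper: return the top-level packages that cannot be imported.

    Pass top-level names only (for example "pandas"); find_spec imports
    parent packages for dotted names and can raise if a parent is missing.
    """
    return [n for n in names if importlib.util.find_spec(n) is None]
```

Run inside the session, `read_workload_credential("workload-cred", "key1")` should return `value1`, matching the output above, and `missing_packages(["pandas"])` should return an empty list.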