Sessions example for the CDE CLI
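In this example, a Session is created using the Cloudera Data Engineering (CDE) CLI with resources specified at creation time. The Session uses a Python environment, a runtime image, files, a Git repository, and workload credentials. The Session is then opened with cde session interact to list the mounted files, read the workload credential values, and import pandas.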
> cde session create --name resources --type pyspark --python-env-resource-name example-virtual-env --runtime-image-resource-name docker-image --mount-1-resource octocat --mount-2-resource example-files --mount-3-resource example-data --workload-credential workload-cred --workload-credential workload-cred-2
{
  "name": "resources",
  "type": "pyspark",
  "creator": "csso_surya.balakrishnan",
  "created": "2023-10-06T03:13:03Z",
  "mounts": [
    {
      "dirPrefix": "/",
      "resourceName": "octocat"
    },
    {
      "dirPrefix": "/",
      "resourceName": "example-files"
    },
    {
      "dirPrefix": "/",
      "resourceName": "example-data"
    }
  ],
  "lastStateUpdated": "2023-10-06T03:13:03Z",
  "state": "starting",
  "interactiveSpark": {
    "id": 1,
    "driverCores": 1,
    "executorCores": 1,
    "driverMemory": "1g",
    "executorMemory": "1g",
    "numExecutors": 1,
    "pythonEnvResourceName": "example-virtual-env"
  },
  "workloadCredentials": [
    "workload-cred",
    "workload-cred-2"
  ],
  "runtimeImageResourceName": "docker-image"
}
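The JSON printed by cde session create can be inspected programmatically, for example to confirm that every requested resource was attached before connecting. The following is a minimal sketch, assuming the command output has been captured into a string. The summarize_session helper is hypothetical and not part of the CDE CLI, and the sample is a trimmed copy of the response above.

```python
import json

# Trimmed copy of the response shown above; in practice this string would be
# the captured standard output of `cde session create`.
SAMPLE_OUTPUT = """
{
  "name": "resources",
  "type": "pyspark",
  "mounts": [
    {"dirPrefix": "/", "resourceName": "octocat"},
    {"dirPrefix": "/", "resourceName": "example-files"},
    {"dirPrefix": "/", "resourceName": "example-data"}
  ],
  "state": "starting",
  "interactiveSpark": {"pythonEnvResourceName": "example-virtual-env"},
  "workloadCredentials": ["workload-cred", "workload-cred-2"],
  "runtimeImageResourceName": "docker-image"
}
"""


def summarize_session(raw_json):
    """Hypothetical helper: collect the resources attached to a session."""
    session = json.loads(raw_json)
    return {
        "name": session["name"],
        "state": session["state"],
        # Each mount entry carries the name of the resource it exposes.
        "mounts": [m["resourceName"] for m in session.get("mounts", [])],
        "credentials": session.get("workloadCredentials", []),
        "python_env": session.get("interactiveSpark", {}).get("pythonEnvResourceName"),
        "runtime_image": session.get("runtimeImageResourceName"),
    }


if __name__ == "__main__":
    for key, value in summarize_session(SAMPLE_OUTPUT).items():
        print(f"{key}: {value}")
```

Using .get() with defaults keeps the helper tolerant of responses that omit optional fields, such as a Session created without mounts or credentials.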
> cde session interact --name resources
Starting REPL...
Waiting for the session to go into an available state...
Connected to Cloudera Data Engineering...
Press Ctrl+D (i.e. EOF) to exit
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\
      /_/
Type in expressions to have them evaluated.
>>> import os
>>> os.listdir("/app/mount")
['.git', 'README', 'access-logs-ETL-iceberg.py', 'access-logs-ETL.py', 'access-logs.txt', 'cdeoperator.py', 'pyspark-batch-job.py', 'pyspark_wordcount.py', 'spark-load-data.py', 'word_count_templates.txt', 'wordcount_input_1.txt', 'dex-spark-driver-template-txckdpxp.yaml', 'dex-spark-executor-template-txckdpxp.yaml']
>>> sec_path = "/etc/dex/secrets/workload-cred/key1"
>>> with open(sec_path) as f:
...     for line in f:
...         print(line)
value1
>>> sec_path = "/etc/dex/secrets/workload-cred-2/key2"
>>> with open(sec_path) as f:
...     for line in f:
...         print(line)
value2
>>> import pandas
>>> dates = pandas.date_range("20130101", periods=6)
>>> dates
DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')
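The session above reads each workload credential key from a file under /etc/dex/secrets/<credential-name>/<key>. A minimal sketch of a reusable helper follows, assuming that this one-file-per-key layout holds for every credential. The function name read_workload_credential and the secrets_root parameter are hypothetical, introduced here only for illustration.

```python
from pathlib import Path

# Directory observed in the session above; treat it as an assumption for
# other environments.
DEFAULT_SECRETS_ROOT = "/etc/dex/secrets"


def read_workload_credential(credential, key, secrets_root=DEFAULT_SECRETS_ROOT):
    """Hypothetical helper: return the value stored for one credential key.

    Assumes each key is a file at <secrets_root>/<credential>/<key>.
    """
    path = Path(secrets_root) / credential / key
    # Strip surrounding whitespace (including the trailing newline) so the
    # value can be passed directly to a client library.
    return path.read_text().strip()
```

Given the files shown in the session, read_workload_credential("workload-cred", "key1") should return value1. Making the root directory a parameter allows the helper to be exercised outside a Session against a temporary directory.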