Guidelines for Virtual Cluster upkeep
Consider the following upkeep guidelines for the Cloudera Data Engineering (CDE) Spark History Server (SHS).
Lifecycle configuration of CDE Spark event logs
The number of Spark event logs (controlled by spark.eventLog.enabled) produced by Spark jobs that run on a CDE Virtual Cluster grows indefinitely with each new CDE Spark run. These event logs are not deleted automatically and are stored on the object store under <CDP env storage location>/dex/<Service ID>/<VC ID>/eventlog/.
Examples of event log locations include the following:
- For Amazon Web Services (AWS): s3a://dex-storage-bucket/datalake/logs/dex/cluster-2xvl4pfp/rdw8q2sh/eventlog/
- For Azure: abfs://logs@dexstorageaccount.dfs.core.windows.net/dex/cluster-4p54mk8j/22bnm99g/eventlog/
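If you want to gauge how many event logs have accumulated, you can list the objects under the eventlog/ prefix. The following is a minimal sketch using boto3 for the AWS case; the bucket name and prefix are the illustrative values from the example path above and must be replaced with the values for your environment, and AWS credentials are assumed to be configured already.

    # Minimal sketch: count accumulated CDE Spark event logs on S3 (AWS example).
    # The bucket and prefix below are illustrative values taken from the example
    # path above -- replace them with your own.
    import boto3

    s3 = boto3.client("s3")
    bucket = "dex-storage-bucket"
    prefix = "datalake/logs/dex/cluster-2xvl4pfp/rdw8q2sh/eventlog/"

    count = 0
    total_bytes = 0
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            count += 1
            total_bytes += obj["Size"]

    print(f"{count} event log objects, {total_bytes / (1024 ** 2):.1f} MiB under {prefix}")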
To avoid delays in event log availability after CDE job runs, you can configure an object store lifecycle policy so that event logs are deleted automatically from the object store. For more information about an Amazon S3 lifecycle policy, see Setting lifecycle configuration on a bucket linked below. For more information about Azure lifecycle management policies, see Configure a lifecycle management policy linked below.
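As an illustration of what such a lifecycle policy can look like on AWS, the following sketch uses boto3 to apply an S3 lifecycle rule that expires objects under the eventlog/ prefix after 30 days. The bucket name, prefix, and retention period are assumptions based on the example above; adjust them to your environment and refer to the linked documentation for the authoritative procedure (the Azure equivalent is configured through a lifecycle management policy on the storage account).

    # Minimal sketch: apply an S3 lifecycle rule that automatically deletes CDE
    # Spark event logs older than 30 days. The bucket, prefix, and retention
    # period are illustrative assumptions -- replace them with your own values.
    import boto3

    s3 = boto3.client("s3")

    s3.put_bucket_lifecycle_configuration(
        Bucket="dex-storage-bucket",
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "expire-cde-spark-event-logs",
                    "Filter": {"Prefix": "datalake/logs/dex/cluster-2xvl4pfp/rdw8q2sh/eventlog/"},
                    "Status": "Enabled",
                    "Expiration": {"Days": 30},
                }
            ]
        },
    )

Note that applying a lifecycle configuration this way replaces any existing lifecycle rules on the bucket, so if the bucket already has lifecycle rules, merge the new rule into the existing configuration instead of overwriting it.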