Troubleshooting Spark issues

Consider some common Spark issues and their recommended solutions.

Distributing sparklyr packages

When using sparklyr, you may see an error from Spark similar to the following:
ERROR sparklyr: RScript (4922) terminated unexpectedly: namespace ‘vctrs’ 0.6.3 is being loaded, but >= 0.6.4 is required
This error means that the R process started by Spark is loading an older version of the vctrs package than the one required. To resolve it, upgrade the vctrs package and configure Spark to use the R libraries available in the home directory of the Cloudera AI session. This works because, with Spark on Kubernetes, the Spark executors run in the same cluster and have access to the same underlying filesystem. Include the following code when creating the Spark session:
install.packages("vctrs")
print(packageVersion("vctrs")) # should be 0.6.4 or later, for example 0.6.5
...
config$spark.executorEnv.R_LIBS="/home/cdsw/.local/lib/R/4.3/library"
config$spark.executorEnv.R_LIBS_USER="/home/cdsw/.local/lib/R/4.3/library"
config$spark.executorEnv.R_LIBS_SITE="/opt/cmladdons/r/libs"
...
spark_apply(test_fx, packages = TRUE) # packages = TRUE is the default
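
The fragments above omit the surrounding session setup. As a minimal sketch of how they might fit together, the following assumes that the configuration object comes from spark_config() and is passed to spark_connect(). The master value, the application name, the sample data, and the body of test_fx are illustrative placeholders that do not come from the original steps. Replace them with the values used in your own session.

```r
library(sparklyr)

# Upgrade vctrs in the session's library and stop early if the installed
# version is still below the minimum named in the error message.
install.packages("vctrs")
stopifnot(packageVersion("vctrs") >= "0.6.4")

# Start from the default sparklyr configuration and point the executors
# at the R library paths used by the session.
config <- spark_config()
config$spark.executorEnv.R_LIBS <- "/home/cdsw/.local/lib/R/4.3/library"
config$spark.executorEnv.R_LIBS_USER <- "/home/cdsw/.local/lib/R/4.3/library"
config$spark.executorEnv.R_LIBS_SITE <- "/opt/cmladdons/r/libs"

# Assumption: "local" and the app name are placeholders. Use the master
# value that your deployment normally uses, because the executor
# environment settings only matter when executors run separately.
sc <- spark_connect(master = "local", app_name = "vctrs-check", config = config)

# Hypothetical test function: reports the vctrs version seen by the
# R worker process that Spark starts for each partition.
test_fx <- function(df) {
  data.frame(vctrs_version = as.character(packageVersion("vctrs")))
}

# Hypothetical sample data: a small Spark DataFrame with ten rows.
sdf <- sdf_len(sc, 10)
result <- spark_apply(sdf, test_fx, packages = TRUE)
print(result)

spark_disconnect(sc)
```

If the reported version is still the older one, the executors are likely not reading the library paths set in the configuration, so check that the paths match the R version of the session.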