Troubleshooting Spark issues
Consider some common Spark issues and their recommended solutions.
Distributing SparklyR packages
When using SparklyR, you may see an error from Spark similar to the
following:
ERROR sparklyr: RScript (4922) terminated unexpectedly: namespace ‘vctrs’ 0.6.3 is being loaded, but >= 0.6.4 is required
This means you need to upgrade the vctrs package and configure Spark to use the R libraries available in the home directory of the Cloudera AI session. This works because, with Spark on Kubernetes, the Spark executors run in the same cluster and have access to the same underlying filesystem. Include the following code when creating the Spark session:
install.packages("vctrs")
print(packageVersion("vctrs")) # expect 0.6.5
...
config$spark.executorEnv.R_LIBS="/home/cdsw/.local/lib/R/4.3/library"
config$spark.executorEnv.R_LIBS_USER="/home/cdsw/.local/lib/R/4.3/library"
config$spark.executorEnv.R_LIBS_SITE="/opt/cmladdons/r/libs"
...
spark_apply(test_fx, packages = TRUE) # true by default
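The version check and the library path above can also be expressed as small helpers. The following is only a sketch in base R: `meets_minimum` and `user_lib_path` are hypothetical names introduced here for illustration, and the path layout is assumed to follow the `/home/cdsw/.local/lib/R/<version>/library` pattern shown in the configuration above.

```r
# Minimal sketch (base R only). Both helper names are hypothetical and
# are not part of sparklyr or Cloudera AI.

# TRUE when the installed version string satisfies the required minimum.
meets_minimum <- function(installed, required) {
  utils::compareVersion(installed, required) >= 0
}

# Builds the per-user library path for an R "major.minor" version.
# By default it uses the version of the R process that is running,
# so the path does not go stale when the runtime's R version changes.
# Assumption: the layout matches the path used in the configuration above.
user_lib_path <- function(r_version = paste(R.version$major,
                                            sub("\\..*$", "", R.version$minor),
                                            sep = ".")) {
  file.path("/home/cdsw/.local/lib/R", r_version, "library")
}

meets_minimum("0.6.3", "0.6.4")  # FALSE: the version named in the error
meets_minimum("0.6.5", "0.6.4")  # TRUE
user_lib_path("4.3")             # "/home/cdsw/.local/lib/R/4.3/library"
```

If this layout holds in your runtime, the result of `user_lib_path()` could be assigned to the `R_LIBS` and `R_LIBS_USER` executor settings instead of a hard-coded path, and `meets_minimum(as.character(packageVersion("vctrs")), "0.6.4")` could confirm the upgrade before the Spark session is created.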