Using Spark 3 from R

R users can access Spark 3 using sparklyr. Although Cloudera does not ship or support sparklyr, we do recommend using sparklyr as the R interface for Cloudera AI.

The spark_apply() function requires the R Runtime environment to be pre-installed on your cluster. This will likely require intervention from your cluster administrator. For details, refer to the RStudio documentation.

Install the latest version of sparklyr:
```
install.packages("sparklyr")
```

Optionally, connect to a local or remote Spark 3 cluster:

```
## Connecting to Spark 3
# Connect to an existing Spark 3 cluster using the spark_connect function.
library(sparklyr)
system.time(sc <- spark_connect())
# The returned Spark 3 connection (sc) provides a remote dplyr data source to the Spark 3 cluster.
```
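
To illustrate what a remote dplyr data source means in practice, the following is a minimal sketch rather than a Cloudera-provided example. It assumes the dplyr package is installed and that a local Spark installation is available, so it connects with master = "local"; in a Cloudera AI session, the connection settings would normally come from your environment instead. The table name mtcars_tbl is a hypothetical name chosen for illustration.

```r
library(sparklyr)
library(dplyr)

# Assumption: a local Spark installation exists. Replace this call with the
# connection appropriate to your own cluster or session.
sc <- spark_connect(master = "local")

# Copy a small built-in R data frame into Spark.
# "mtcars_tbl" is an illustrative table name, not one the platform defines.
mtcars_tbl <- copy_to(sc, mtcars, "mtcars_tbl", overwrite = TRUE)

# dplyr verbs are translated to Spark SQL and executed in Spark;
# collect() brings the (small) result back into the R session.
avg_mpg <- mtcars_tbl %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg, na.rm = TRUE)) %>%
  arrange(cyl) %>%
  collect()

print(avg_mpg)

# Release the Spark resources when finished.
spark_disconnect(sc)
```

Only the final collect() moves data into R, so the grouping and aggregation run in Spark, which is generally the pattern to prefer for large tables.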

For a complete example, see Importing Data into Cloudera AI.
