Use JDBC Connection with SparklyR

Obtain the JDBC connection string, as described above, and paste it into the script where the “jdbc” string is shown. You will also need to insert your user name and password, or create environment variables for holding those values.

In your session, open the workbench and add the following code.

# Run this once
#install.packages("sparklyr")
library(sparklyr)

config <- spark_config()
                
config$spark.security.credentials.hiveserver2.enabled="false"
config$spark.datasource.hive.warehouse.read.via.llap="false"
config$spark.datasource.hive.warehouse.read.jdbc.mode="client"
config$spark.sql.hive.hiveserver2.jdbc.url="jdbc:hive2://hs2-aws-2-hive-viz.env-j2ln9x.dw.ylcu-atmi.cloudera.site/default;transportMode=http;httpPath=cliservice;ssl=true;retries=3;user=<username>;password=<password>"

sc <- spark_connect(config = config)
                    
ss <- spark_session(sc)
hive <- invoke_static(sc,"com.hortonworks.hwc.HiveWarehouseSession","session",ss)%>%invoke("build")
                            
df <- invoke(hive,"execute","select * from default.foo")
sparklyr::sdf_collect(df)
    
spark_disconnect(sc)