Setting Up an HTTP Proxy for Spark 2

If your Cloudera Data Science Workbench cluster uses an HTTP proxy, Spark 2 needs the proxy settings to perform web-related actions, such as outbound HTTP or HTTPS requests from the Spark driver. To supply them, set the Spark configuration parameter spark.driver.extraJavaOptions on your gateway hosts.

To configure the proxy for Spark 2:
  1. Log in to Cloudera Manager.
  2. Go to Spark2 > Configuration.
  3. Filter the properties with Scope > Gateway and Category > Advanced.
  4. Scroll down to Spark 2 Client Advanced Configuration Snippet (Safety Valve) for spark2-conf/spark-defaults.conf.
  5. Enter the following configuration code, substituting your proxy host and port values:
    spark.driver.extraJavaOptions= \
    -Dhttp.proxyHost=<YOUR HTTP PROXY HOST> \
    -Dhttp.proxyPort=<HTTP PORT> \
    -Dhttps.proxyHost=<YOUR HTTPS PROXY HOST> \
    -Dhttps.proxyPort=<HTTPS PORT>
  6. Click Save Changes.
  7. Choose Actions > Deploy Client Configuration.
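
The value entered in step 5 is a single property whose value is a space-separated list of JVM flags. As a minimal sketch, the following Python helper assembles that line from host and port values so it can be pasted into the safety valve field as one line. The function name build_proxy_java_options and the host and port values in the example are hypothetical and only illustrate the format; they are not part of Cloudera Manager or Spark.

```python
def build_proxy_java_options(http_host, http_port, https_host, https_port):
    """Return a one-line spark-defaults.conf entry carrying the JVM proxy flags.

    Hypothetical helper for illustration; it only formats a string.
    """
    flags = [
        f"-Dhttp.proxyHost={http_host}",
        f"-Dhttp.proxyPort={int(http_port)}",    # int() rejects non-numeric ports
        f"-Dhttps.proxyHost={https_host}",
        f"-Dhttps.proxyPort={int(https_port)}",
    ]
    # The flags are joined with single spaces into one property value.
    return "spark.driver.extraJavaOptions=" + " ".join(flags)


if __name__ == "__main__":
    # proxy.example.com and 3128 are placeholder values, not real defaults.
    print(build_proxy_java_options("proxy.example.com", 3128,
                                   "proxy.example.com", 3128))
```

The single-line form is meant to be equivalent to the backslash-continued form shown in step 5; use whichever is easier to review in your environment.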