Enabling the Intel MKL library
This procedure shows how to use Cloudera Manager to enable the Intel MKL math library to accelerate Spark ML applications.
By: Zuling Kang, Senior Solutions Architect at Cloudera, Inc.
- Intel provides the MKL native library as a Cloudera Manager parcel on its website. Add it as a remote parcel repository in Cloudera Manager, then download and activate the library:
- In Cloudera Manager, navigate to .
- Select Configuration.
- In the Remote Parcel Repository URLs section, click the plus sign and add the following URL:
http://parcels.repos.intel.com/mkl/latest
- Click Save Changes. You are returned to the page that lists available parcels.
- Click Download for the mkl parcel.
- Click Distribute, and when it finishes distributing to the hosts on your cluster, click Activate.
- The MKL parcel contains only Linux shared library files (.so files), so a JNI wrapper is needed to make it accessible to the JVM. The MKL wrapper parcel provides that wrapper. Use the same procedure as in the first step to add the following link on the Cloudera Manager parcel configuration page, then download the parcel, distribute it to the hosts, and activate it:
https://raw.githubusercontent.com/Intel-bigdata/mkl-wrappers-parcel-repo/master/
- Restart the corresponding CDH services as guided by Cloudera Manager, and redeploy the client configuration if needed.
- In Cloudera Manager, add the following configuration to the Spark Client Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-defaults.conf:
spark.driver.extraJavaOptions=-Dcom.github.fommil.netlib.BLAS=com.intel.mkl.MKLBLAS -Dcom.github.fommil.netlib.LAPACK=com.intel.mkl.MKLLAPACK
spark.driver.extraClassPath=/opt/cloudera/parcels/mkl_wrapper_parcel/lib/java/mkl_wrapper.jar
spark.driverEnv.MKL_VERBOSE=1
spark.executor.extraJavaOptions=-Dcom.github.fommil.netlib.BLAS=com.intel.mkl.MKLBLAS -Dcom.github.fommil.netlib.LAPACK=com.intel.mkl.MKLLAPACK
spark.executor.extraClassPath=/opt/cloudera/parcels/mkl_wrapper_parcel/lib/java/mkl_wrapper.jar
spark.executorEnv.MKL_VERBOSE=1
This configuration information instructs the Spark application to load the MKL wrapper and use MKL as the default native library for Spark ML.
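As a rough way to confirm that these settings reached a Spark session, the following sketch checks a configuration map for the four class path and Java option keys listed above. The helper name `missingMklSettings` and the `sample` map are hypothetical and exist only for illustration. The check is a simple substring match, not a full validation of the values.

```scala
// Keys and expected substrings are copied from the safety-valve snippet above.
val expected: Seq[(String, String)] = Seq(
  "spark.driver.extraJavaOptions"   -> "com.intel.mkl.MKLBLAS",
  "spark.driver.extraClassPath"     -> "mkl_wrapper.jar",
  "spark.executor.extraJavaOptions" -> "com.intel.mkl.MKLBLAS",
  "spark.executor.extraClassPath"   -> "mkl_wrapper.jar"
)

// Hypothetical helper: returns the keys that are absent or lack the expected substring.
def missingMklSettings(conf: Map[String, String]): Seq[String] =
  expected.collect {
    case (key, needle) if !conf.get(key).exists(_.contains(needle)) => key
  }

// Stand-in configuration for illustration: only the driver class path is set.
val sample = Map(
  "spark.driver.extraClassPath" ->
    "/opt/cloudera/parcels/mkl_wrapper_parcel/lib/java/mkl_wrapper.jar"
)

// Report every setting that the stand-in configuration does not carry.
missingMklSettings(sample).foreach(key => println(s"not set as expected: $key"))
```

In a Spark shell started after the client configuration is redeployed, calling `missingMklSettings(sc.getConf.getAll.toMap)` would run the same check against the live session. An empty result suggests that all four settings arrived.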
- Open the Spark shell again and verify which native library is loaded. You should see the following output:
scala> import com.github.fommil.netlib.BLAS
import com.github.fommil.netlib.BLAS
scala> println(BLAS.getInstance().getClass().getName())
com.intel.mkl.MKLBLAS
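The configuration sets both a BLAS and a LAPACK implementation, so the check above can be extended to cover LAPACK and to run one small BLAS call. This is a minimal sketch meant to be pasted into the same Spark shell. It assumes netlib-java is on the class path, as it is in the session above, and it makes no claim about what the LAPACK line prints on your cluster beyond the class named in the configuration.

```scala
import com.github.fommil.netlib.{BLAS, LAPACK}

// Report the implementation classes that netlib-java actually loaded.
val blasImpl = BLAS.getInstance().getClass.getName
val lapackImpl = LAPACK.getInstance().getClass.getName
println(s"BLAS:   $blasImpl")
println(s"LAPACK: $lapackImpl")

// Run one small BLAS call (a dot product) through whichever implementation is active.
val x = Array(1.0, 2.0, 3.0)
val y = Array(4.0, 5.0, 6.0)
val dot = BLAS.getInstance().ddot(x.length, x, 1, y, 1)
println(s"ddot = $dot") // 1*4 + 2*5 + 3*6 = 32.0
```

If either line reports a different class than the one named in the configuration, the wrapper JAR or the Java options probably did not reach that JVM. The dot product returns the same value with any backend, so it only confirms that the loaded library can be called.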