Enabling the Intel MKL library

This procedure shows how to use Cloudera Manager to enable the Intel Math Kernel Library (MKL) to accelerate Spark ML applications.

By: Zuling Kang, Senior Solutions Architect at Cloudera, Inc.

  1. Intel provides the MKL native library as a Cloudera Manager parcel on its website. Add it as a remote parcel repository in Cloudera Manager, and then download, distribute, and activate the parcel:

    1. In Cloudera Manager, navigate to Hosts > Parcels.
    2. Select Configuration.
    3. In the Remote Parcel Repository URLs section, click the plus sign and add the following URL:

    4. Click Save Changes. You are returned to the page that lists the available parcels.
    5. Click Download for the mkl parcel.

    6. Click Distribute, and when it finishes distributing to the hosts on your cluster, click Activate.
  2. The MKL parcel contains only Linux shared library files (.so files), so the JVM needs a JNI wrapper to access them. To provide the wrapper, use the following MKL wrapper parcel. Using the same procedure described in Step 1, add the following link to the Cloudera Manager parcel configuration page, download the parcel, distribute it among the hosts, and then activate it:

  3. Restart the affected CDH services as prompted by Cloudera Manager, and redeploy the client configuration if needed.
  4. In Cloudera Manager, add the following configuration to the Spark Client Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-defaults.conf:

    spark.driver.extraJavaOptions=-Dcom.github.fommil.netlib.BLAS=com.intel.mkl.MKLBLAS -Dcom.github.fommil.netlib.LAPACK=com.intel.mkl.MKLLAPACK
    spark.executor.extraJavaOptions=-Dcom.github.fommil.netlib.BLAS=com.intel.mkl.MKLBLAS -Dcom.github.fommil.netlib.LAPACK=com.intel.mkl.MKLLAPACK

    These settings instruct Spark applications to load the MKL wrapper and use MKL as the default native library for Spark ML.

  5. Open the Spark shell again and verify which native library is loaded. You should see the following output:

    scala> import com.github.fommil.netlib.BLAS
    import com.github.fommil.netlib.BLAS
    scala> println(BLAS.getInstance().getClass().getName())
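
The check in Step 5 covers only BLAS on the driver. As a rough sketch, the same idea can be extended to LAPACK and to the executors. The snippet below is not part of the original procedure: it assumes it is pasted into a Spark shell (so `sc` is already defined), that netlib-java is on the classpath as in Step 5, and the helper name `netlibSettings` is made up for illustration.

```scala
import com.github.fommil.netlib.{BLAS, LAPACK}

// System property keys set through extraJavaOptions in Step 4.
val netlibKeys = Seq(
  "com.github.fommil.netlib.BLAS",
  "com.github.fommil.netlib.LAPACK"
)

// Hypothetical helper: look up each key in a map of JVM system properties.
def netlibSettings(props: scala.collection.Map[String, String]): Seq[(String, Option[String])] =
  netlibKeys.map(k => k -> props.get(k))

// 1. Confirm that the -D options reached the driver JVM.
netlibSettings(sys.props).foreach { case (key, value) =>
  println(s"$key = ${value.getOrElse("<not set>")}")
}

// 2. Print the implementations that netlib-java actually loaded on the driver.
println("driver BLAS:   " + BLAS.getInstance().getClass.getName)
println("driver LAPACK: " + LAPACK.getInstance().getClass.getName)

// 3. Ask the executors which BLAS implementation they loaded.
//    One lookup per partition; the partition count (4) is arbitrary.
val executorImpls = sc
  .parallelize(1 to 100, 4)
  .mapPartitions(_ => Iterator(BLAS.getInstance().getClass.getName))
  .distinct()
  .collect()

executorImpls.foreach(name => println("executor BLAS: " + name))
```

If the configuration took effect, the class names printed on the driver and on the executors should match the ones set in Step 4. A different name, such as the pure-Java `com.github.fommil.netlib.F2jBLAS`, would suggest that the wrapper could not be loaded on that JVM and that netlib-java fell back to another implementation.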