Using Spark MLlib

Running a Spark MLlib Example

To try Spark MLlib using one of the Spark example applications, do the following:

  1. Download MovieLens sample data and copy it to HDFS:
    $ wget --no-check-certificate \
    https://raw.githubusercontent.com/apache/spark/branch-2.2/data/mllib/sample_movielens_data.txt
    $ hdfs dfs -copyFromLocal sample_movielens_data.txt /user/hdfs
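  2. Run the Spark MLlib MovieLens example application, which calculates recommendations based on movie reviews. Replace SPARK_HOME with the path to your Spark installation. On a cluster whose default file system is HDFS, the relative input path resolves to the submitting user's HDFS home directory, so either run the command as the hdfs user or pass the full path /user/hdfs/sample_movielens_data.txt: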
    $ spark-submit --master local --class org.apache.spark.examples.mllib.MovieLensALS \
    SPARK_HOME/lib/spark-examples.jar \
    --rank 5 --numIterations 5 --lambda 1.0 --kryo sample_movielens_data.txt
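
The example application prints a test RMSE (root-mean-square error) as its final line. As a rough illustration of what that number measures, the following Python sketch parses rating lines and computes an RMSE for a set of predictions. It assumes the user::movie::rating line layout of the sample file. The function names, the sample lines, and the predicted values are hypothetical, and this is not the code that the Spark example runs.

```python
import math

def parse_rating(line):
    """Split one 'user::movie::rating' line into (user, movie, rating)."""
    user, movie, rating = line.strip().split("::")[:3]
    return int(user), int(movie), float(rating)

def rmse(pairs):
    """Root-mean-square error over (predicted, actual) pairs."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one (predicted, actual) pair")
    squared_error = sum((p - a) ** 2 for p, a in pairs)
    return math.sqrt(squared_error / len(pairs))

# Hypothetical rating lines and model predictions, for illustration only.
lines = ["0::2::3", "0::3::1", "1::2::4"]
actual = [parse_rating(line)[2] for line in lines]
predicted = [2.5, 1.5, 3.0]
print("Test RMSE = %s" % rmse(zip(predicted, actual)))
```

A lower RMSE means the predicted ratings are closer to the held-out ratings, which is why the value is useful for comparing runs with different --rank or --lambda settings.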

Enabling Native Acceleration For MLlib

MLlib algorithms are compute-intensive and benefit from hardware acceleration. To enable native acceleration for MLlib, perform the following tasks.

Install Required Software

  • Install the appropriate libgfortran 4.6+ package for your operating system. No compatible version is available for RHEL 6.
    OS            Package Name   Package Version
    RHEL 7.1      libgfortran    4.8.x
    SLES 11 SP3   libgfortran3   4.7.2
    Ubuntu 12.04  libgfortran3   4.6.3
    Ubuntu 14.04  libgfortran3   4.8.4
    Debian 7.1    libgfortran3   4.7.2
  • Install the GPL Extras parcel or package.
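
The 4.6+ requirement can be checked mechanically. The Python sketch below compares a package version string against that minimum and asks the system whether a libgfortran shared library can be located. The helper names are hypothetical. Note that `ctypes.util.find_library` only reports whether a library is found, not which package version provides it, so the version string still has to come from your package manager.

```python
import ctypes.util

MINIMUM = (4, 6)  # libgfortran 4.6+ is required for native acceleration

def meets_minimum(version, minimum=MINIMUM):
    """Compare the major.minor part of a version string to the minimum."""
    major, minor = version.split(".")[:2]
    return (int(major), int(minor)) >= minimum

def find_libgfortran():
    """Return the library name the system can locate, or None if absent."""
    return ctypes.util.find_library("gfortran")

# Versions taken from the table above, plus the 4.4 package on RHEL 6.
for os_name, version in [("RHEL 6", "4.4"),
                         ("RHEL 7.1", "4.8.x"),
                         ("Ubuntu 12.04", "4.6.3")]:
    status = "ok" if meets_minimum(version) else "too old"
    print(os_name, version, status)

print("libgfortran located:", find_libgfortran())
```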

Verify Native Acceleration

You can verify that native acceleration is working by examining logs after running an application. To verify native acceleration with an MLlib example application:
  1. Complete the steps in Running a Spark MLlib Example.
  2. Check the logs. If the native libraries fail to load, the following four warnings appear before the final line, which prints the RMSE:

    15/07/12 12:33:01 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
    15/07/12 12:33:01 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
    15/07/12 12:33:01 WARN LAPACK: Failed to load implementation from: com.github.fommil.netlib.NativeSystemLAPACK
    15/07/12 12:33:01 WARN LAPACK: Failed to load implementation from: com.github.fommil.netlib.NativeRefLAPACK
    Test RMSE = 1.5378651281107205.

    This output appears on a system with no libgfortran installed. The same warnings appear after installing libgfortran on RHEL 6, because that package provides version 4.4 rather than the required 4.6+.

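    After installing libgfortran 4.8 on RHEL 7, you should see output like the following. The NativeSystemBLAS and NativeSystemLAPACK warnings remain, but the NativeRefBLAS and NativeRefLAPACK warnings are gone, which indicates that the native reference implementations loaded: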
    15/07/12 13:32:20 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
    15/07/12 13:32:20 WARN LAPACK: Failed to load implementation from: com.github.fommil.netlib.NativeSystemLAPACK
    Test RMSE = 1.5329939324808561.
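
The two log excerpts above can be told apart programmatically. The following Python sketch scans log text for the "Failed to load implementation" warnings and reports which case it matches. The function name and the three result labels are hypothetical, and the check assumes that WARN-level messages are present in the captured log.

```python
import re

# Matches the netlib-java warnings shown above and captures the class name.
FAILED = re.compile(
    r"WARN (?:BLAS|LAPACK): Failed to load implementation from: "
    r"com\.github\.fommil\.netlib\.(\w+)"
)

def classify(log_text):
    """Label a log by which native implementations failed to load."""
    failed = set(FAILED.findall(log_text))
    if failed & {"NativeRefBLAS", "NativeRefLAPACK"}:
        # Both the system and reference levels were tried and rejected.
        return "no-native"
    if failed & {"NativeSystemBLAS", "NativeSystemLAPACK"}:
        # Only the system level failed; the reference level did not warn.
        return "native-reference"
    # No warnings found; this says nothing if WARN logging was disabled.
    return "no-warnings"

before = """\
15/07/12 12:33:01 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
15/07/12 12:33:01 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
"""
after = """\
15/07/12 13:32:20 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
15/07/12 13:32:20 WARN LAPACK: Failed to load implementation from: com.github.fommil.netlib.NativeSystemLAPACK
"""
print(classify(before))
print(classify(after))
```

To use it on a real run, capture the spark-submit output to a file and pass the file contents to classify.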