Native math libraries for Spark ML

Spark’s MLlib uses the Breeze linear algebra package, which depends on netlib-java for optimized numerical processing. netlib-java is a wrapper for low-level BLAS, LAPACK, and ARPACK libraries.

By: Zuling Kang, Senior Solutions Architect at Cloudera, Inc.

Although Spark's MLlib can use these libraries, due to licensing issues with runtime proprietary binaries, neither the Cloudera distribution of Spark nor the community version of Apache Spark includes the netlib-java native proxies by default. So if you make no manual configuration, netlib-java only uses the F2J library, a Java-based math library that is translated from Fortran77 reference source code.

To check whether you are using native math libraries in Spark ML or the Java-based F2J, use the Spark shell to load and print the implementation library of netlib-java. For example, the following commands return information on the BLAS library and include that it is using F2J in the line, com.github.fommil.netlib.F2jBLAS, which is bolded below:

scala> import com.github.fommil.netlib.BLAS
import com.github.fommil.netlib.BLAS

scala> println(BLAS.getInstance().getClass().getName())
18/12/10 01:07:06 WARN netlib.BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
18/12/10 01:07:06 WARN netlib.BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS