Native math libraries for Spark ML
Spark’s MLlib uses the Breeze linear algebra package,
which depends on netlib-java
for optimized numerical
processing. netlib-java
is a wrapper for low-level
BLAS, LAPACK, and ARPACK libraries.
By: Zuling Kang, Senior Solutions Architect at Cloudera, Inc.
Although Spark's MLlib can use these libraries, due to licensing
issues with runtime proprietary binaries, neither the Cloudera
distribution of Spark nor the community version of Apache Spark includes
the netlib-java
native proxies by default. So if you
make no manual configuration, netlib-java
only uses the
F2J library, a Java-based math library that is translated from
Fortran77 reference source code.
To check whether you are using native math libraries in Spark ML or the
Java-based F2J, use the Spark shell to load and print the implementation
library of netlib-java
. For example, the following
commands return information on the BLAS library and include that it is
using F2J in the line, com.github.fommil.netlib.F2jBLAS
,
which is bolded below:
scala> import com.github.fommil.netlib.BLAS import com.github.fommil.netlib.BLAS scala> println(BLAS.getInstance().getClass().getName()) 18/12/10 01:07:06 WARN netlib.BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS 18/12/10 01:07:06 WARN netlib.BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS com.github.fommil.netlib.F2jBLAS