XUtils

mlpack

A scalable c++ machine learning library. [LGPLv3] [website](http://www.mlpack.org/)


1. Citation details

If you use mlpack in your research or software, please cite mlpack using the citation below (given in BibTeX format):

@article{mlpack2023,
    title     = {mlpack 4: a fast, header-only C++ machine learning library},
    author    = {Ryan R. Curtin and Marcus Edel and Omar Shrit and 
                 Shubham Agrawal and Suryoday Basak and James J. Balamuta and 
                 Ryan Birmingham and Kartik Dutt and Dirk Eddelbuettel and 
                 Rishabh Garg and Shikhar Jaiswal and Aakash Kaushik and 
                 Sangyeon Kim and Anjishnu Mukherjee and Nanubala Gnana Sai and 
                 Nippun Sharma and Yashwant Singh Parihar and Roshan Swain and 
                 Conrad Sanderson},
    journal   = {Journal of Open Source Software},
    volume    = {8},
    number    = {82},
    pages     = {5026},
    year      = {2023},
    doi       = {10.21105/joss.05026},
    url       = {https://doi.org/10.21105/joss.05026}
}

Citations are beneficial for the growth and improvement of mlpack.

3.1.a. Linking with autodownloaded Armadillo

When the autodownloader is used to download Armadillo (-DDOWNLOAD_DEPENDENCIES=ON), the Armadillo runtime library is not built and Armadillo must be used in header-only mode. The autodownloader also does not download dependencies of Armadillo such as OpenBLAS. For this reason, it is recommended to instead install Armadillo using your system package manager, which will also install the dependencies of Armadillo. For example, on Ubuntu and Debian systems, Armadillo can be installed with

sudo apt-get install libarmadillo-dev

and other package managers such as dnf and brew and pacman also have Armadillo packages available.

If the autodownloader is used to provide Armadillo, mlpack programs cannot be linked with -larmadillo. Instead, you must link directly with the dependencies of Armadillo. For example, on a system that has OpenBLAS available, compilation can be done like this:

g++ -O3 -std=c++17 -o my_program my_program.cpp -lopenblas -fopenmp

See the Armadillo documentation for more information on linking Armadillo programs.

3.2. Reducing compile time

mlpack is a template-heavy library, and if care is not used, compilation time of a project can be increased greatly. Fortunately, there are a number of ways to reduce compilation time:

  • Include individual headers, like <mlpack/methods/decision_tree.hpp>, if you are only using one component, instead of <mlpack.hpp>. This reduces the amount of work the compiler has to do.

  • Only use the MLPACK_ENABLE_ANN_SERIALIZATION definition if you are serializing neural networks in your code. When this define is enabled, compilation time will increase significantly, as the compiler must generate code for every possible type of layer. (The large amount of extra compilation overhead is why this is not enabled by default.)

  • If you are using mlpack in multiple .cpp files, consider using extern templates so that the compiler only instantiates each template once; add an explicit template instantiation for each mlpack template type you want to use in a .cpp file, and then use extern definitions elsewhere to let the compiler know it exists in a different file.

Other strategies exist too, such as precompiled headers, compiler options, ccache, and others.

4. Building mlpack bindings to other languages

mlpack is not just a header-only library: it also comes with bindings to a number of other languages, this allows flexible use of mlpack’s efficient implementations from languages that aren’t C++.

In general, you should not need to build these by hand—they should be provided by either your system package manager or your language’s package manager.

Building the bindings for a particular language is done by calling cmake with different options; each example below shows how to configure an individual set of bindings, but it is of course possible to combine the options and build bindings for many languages at once.

4.i. Command-line programs

See also the command-line quickstart.

The command-line programs have no extra dependencies. The set of programs that will be compiled is detailed and documented on the command-line program documentation page.

From the root of the mlpack sources, run the following commands to build and install the command-line bindings:

mkdir build && cd build/
cmake -DBUILD_CLI_PROGRAMS=ON ../
make
sudo make install

You can use make -j<N>, where N is the number of cores on your machine, to build in parallel; e.g., make -j4 will use 4 cores to build.

4.ii. Python bindings

See also the Python quickstart.

mlpack’s Python bindings are available on PyPI and conda-forge, and can be installed with either pip install mlpack or conda install -c conda-forge mlpack. These sources are recommended, as building the Python bindings by hand can be complex.

With that in mind, if you would still like to manually build the mlpack Python bindings, first make sure that the following Python packages are installed:

  • setuptools
  • wheel
  • cython >= 0.24
  • numpy
  • pandas >= 0.15.0

Now, from the root of the mlpack sources, run the following commands to build and install the Python bindings:

mkdir build && cd build/
cmake -DBUILD_PYTHON_BINDINGS=ON ../
make
sudo make install

You can use make -j<N>, where N is the number of cores on your machine, to build in parallel; e.g., make -j4 will use 4 cores to build. You can also specify a custom Python interpreter with the CMake option -DPYTHON_EXECUTABLE=/path/to/python.

4.iii. R bindings

See also the R quickstart.

mlpack’s R bindings are available as the R package mlpack on CRAN. You can install the package by running install.packages('mlpack'), and this is the recommended way of getting mlpack in R.

If you still wish to build the R bindings by hand, first make sure the following dependencies are installed:

  • R >= 4.0
  • Rcpp >= 0.12.12
  • RcppArmadillo >= 0.9.800.0
  • RcppEnsmallen >= 0.2.10.0
  • roxygen2
  • testthat
  • pkgbuild

These can be installed with install.packages() inside of your R environment. Once the dependencies are available, you can configure mlpack and build the R bindings by running the following commands from the root of the mlpack sources:

mkdir build && cd build/
cmake -DBUILD_R_BINDINGS=ON ../
make
sudo make install

You may need to specify the location of the R program in the cmake command with the option -DR_EXECUTABLE=/path/to/R.

Once the build is complete, a tarball can be found under the build directory in src/mlpack/bindings/R/, and then that can be installed into your R environment with a command like install.packages(mlpack_3.4.3.tar.gz, repos=NULL, type='source').

4.v. Go bindings

See also the Go quickstart.

To build mlpack’s Go bindings, ensure that Go >= 1.11.0 is installed, and that the Gonum package is available. You can use go get to install mlpack for Go:

go get -u -d mlpack.org/v1/mlpack
cd ${GOPATH}/src/mlpack.org/v1/mlpack
make install

The process of building the Go bindings by hand is a little tedious, so following the steps above is recommended. However, if you wish to build the Go bindings by hand anyway, you can do this by running the following commands from the root of the mlpack sources:

mkdir build && cd build/
cmake -DBUILD_GO_BINDINGS=ON ../
make
sudo make install

5. Building mlpack’s test suite

mlpack contains an extensive test suite that exercises every part of the codebase. It is easy to build and run the tests with CMake and CTest, as below:

mkdir build && cd build/
cmake -DBUILD_TESTS=ON ../
make
ctest .

If you want to test the bindings, too, you will have to adapt the CMake configuration command to turn on the language bindings that you want to test—see the previous sections for details.

6. Further Resources

More documentation is available for both users and developers.

User documentation:

Tutorials:

Developer documentation:

To learn about the development goals of mlpack in the short- and medium-term future, see the vision document.

If you have problems, find a bug, or need help, you can try visiting the mlpack help page, or mlpack on Github. Alternately, mlpack help can be found on Matrix at #mlpack; see also the community page.


Articles

  • coming soon...