[C/C++ AI] Getting Started with mlpack

I recently needed to run a benchmarking experiment with k-NN in C++, and I found mlpack, which appears to be a popular, high-performance machine learning library for C++.

I’m not a very strong Linux user (though I’m working on it!), so I actually had a lot of trouble getting up and going with mlpack, despite their documentation.

In this guide, I’ll cover the steps needed to get up and running, but also offer some explanation of what’s going on in each. So if you’re already an expert at working with C++ libraries in Linux, you may find this post pretty boring :).

Downloading pre-compiled mlpack with the package manager

I’m currently running Ubuntu 16.04, so some of this may be Ubuntu-specific.

The Ubuntu package manager helps you get mlpack installed as well as any dependencies (and it appears that mlpack has a lot of dependencies on, e.g., other linear algebra libraries).

Note that the package manager is different from the “Ubuntu Software Center”. The software center is more like an app-store, and you won’t find mlpack there.

The name of the package is ‘libmlpack-dev’. This is going to install the mlpack libraries and header files for you–it does not include the source code for mlpack, which you don’t need if you’re just planning to reference the libraries. It also does not include any source examples. They provide a couple code examples as text on their website; to run these you would create your own .cpp file and paste in the code (but you also need to supply your own dataset! 0_o). More on example code later.

I found the package name a little confusing (why isn’t it just “mlpack”?), so here are some clarifications on the “lib” and “-dev” parts of the package name:

  • Library packages like mlpack conventionally have ‘lib’ prepended to their package name (like liblapack, libblas, etc.).
    • The linker in Linux (called “ld”) looks for shared libraries with names of the form lib<name>.so, so the flag -lmlpack makes it search for libmlpack.so (Reference).
    • “.so” stands for “Shared Object”, and it’s analogous to DLLs on Windows.
  • The “-dev” suffix on the package name is a convention that indicates that this package contains libraries and header files that you can compile against, as opposed to just executable binaries. (Reference)

Another thing that confused me–how would you know the name of the package you want to install if all you know is that the project is called “mlpack”?

This page provides a nice tutorial (with a lot of detail) about how you can find packages and learn about them. Here’s the command that I would have found most helpful, though: apt-cache search 'mlpack'. The single quotes are ordinary shell quoting, not wildcards. The search matches anywhere because apt-cache search treats its argument as a regular expression and checks it against package names and descriptions, so any package with mlpack anywhere in the name is listed.

chrismcc@ubuntu:~$ apt-cache search 'mlpack'
libmlpack-dev - intuitive, fast, scalable C++ machine learning library (development libs)
libmlpack2 - intuitive, fast, scalable C++ machine learning library (runtime library)
mlpack-bin - intuitive, fast, scalable C++ machine learning library (binaries)
mlpack-doc - intuitive, fast, scalable C++ machine learning library (documentation)

Here’s what each of those packages provides.

  • libmlpack-dev - If you are going to write code that references the mlpack libraries, this is what you need.
  • libmlpack2 - If you’re not programming with mlpack, but you’re using an application that uses the mlpack libraries, then you’d just need this package with the “runtime library” (the dynamically-linked library).
  • mlpack-bin - The mlpack project actually includes command line tool versions of some of the machine learning algorithms it implements. So, for example, you could run k-means clustering on a dataset from the command line without doing any programming. This package contains those binaries.
  • mlpack-doc - Documentation for the libraries.

So to write our own code using the mlpack libraries, we just need libmlpack-dev. Grab it with the APT (Advanced Packaging Tool) package manager with the following command:

sudo apt-get install libmlpack-dev

This will install mlpack and all of the libraries it depends on. Except one, apparently–you’ll also need to install Boost:

sudo apt-get install libboost-all-dev

Maybe Boost was left out of the dependency list because it’s so commonly used? I don’t know.

Install location

Something that left me pretty confused from the installation was that I had no idea where mlpack was installed to. (Mainly, I wanted to know this because I assumed it would have installed some example source files for me somewhere, but I learned later that it doesn’t include any.)

To list out all of the files installed by mlpack, use this command:

dpkg -L libmlpack-dev

There are some default locations for libraries in Linux, and that’s where you’ll find mlpack:

  • It installs lots of headers under /usr/include/mlpack/.
  • It installs the library file to /usr/lib/x86_64-linux-gnu/libmlpack.so

These default locations are already on the search paths that gcc / g++ use for headers and libraries, so you’re all set to #include the mlpack header files in your code!

What's "/usr/"? Is that like my User directory on Windows? (Answer: No.) Linux usually starts you out in your ‘home’ directory, e.g. /home/chrismcc/. This is where you find your personal files (documents, desktop, pictures, etc.). You can also refer to your home directory with just a tilde ‘~’. This used to confuse me because I assumed ~ was the symbol for the root of the file system--it’s not! Just ‘/’ is the root directory. /usr/ is a directory under root where installed software lives.

Compiling and Running an Example

As a first example, we’ll use the sample code from the mlpack site for doing a nearest neighbor search.

This very simple example takes a dataset of vectors and finds the nearest neighbor for each data point. It uses the dataset both as the reference and the query vectors.

I’ve taken their original example and added some detailed comments to explain what’s going on.

Save the following source code in a file called knn_example.cpp:

/*
 * ======== knn_example.cpp =========
 * This very simple example takes a dataset of vectors and finds the nearest 
 * neighbor for each data point. It uses the dataset both as the reference and
 * the query vectors.
 *
 * mlpack documentation is here:
 * http://www.mlpack.org/docs/mlpack-2.0.2/doxygen.php
 */

#include <mlpack/core.hpp>
#include <mlpack/methods/neighbor_search/neighbor_search.hpp>

using namespace mlpack;
using namespace mlpack::neighbor; // NeighborSearch and NearestNeighborSort
using namespace mlpack::metric; // ManhattanDistance

int main()
{
    // Armadillo is a C++ linear algebra library; mlpack uses its matrix data type.
    arma::mat data;
    
    /*
     * Load the data from a file. mlpack does not provide an example dataset, 
     * so I've included a tiny one.
     *
     * 'data' is a namespace in mlpack with helpers for saving and loading
     * matrices and models.
     *
     * Pass the filename, matrix to hold the data, and set fatal = true to have
     * it throw an exception if there is an issue.
     *
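     * 'Load' accepts comma-separated and tab-separated text files, and will 
     * infer the format. Each row of the file is one data point; by default
     * mlpack transposes on load, so each point becomes a column of 'data'.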
     */
    data::Load("data.csv", data, true);
    
    /* 
     * Create a NeighborSearch model. The parameters of the model are specified
     * with templates:
     *  - Sorting method: "NearestNeighborSort" - This class sorts by increasing
     *    distance.
     *  - Distance metric: "ManhattanDistance" - The L1 distance, sum of absolute
     *    distances.
     *
     * Pass the reference dataset (the vectors to be searched through) to the
     * constructor.
     */
    NeighborSearch<NearestNeighborSort, ManhattanDistance> nn(data);
    
    /*
     * Create the matrices to hold the results of the search. 
     *   neighbors [k  x  n] - Indices of the nearest neighbor(s).
     *                         One column per query vector and one row for
     *                         each of the 'k' neighbors.
     *   distances [k  x  n] - Calculated distance values.
     *                         One column per query vector and one row for
     *                         each of the 'k' neighbors.
     */
    arma::Mat<size_t> neighbors;
    arma::mat distances; 
    
    /*
     * Find the nearest neighbors. 
     *
     * If no query vectors are provided (as is the case here), then the 
     * reference vectors are searched against themselves.
     *
     * Specify the number of neighbors to find, k = 1, and provide matrices
     * to hold the result indices and distances.
     */ 
    nn.Search(1, neighbors, distances);
    
    // Print out each neighbor and its distance.
    for (size_t i = 0; i < neighbors.n_elem; ++i)
    {
        std::cout << "Nearest neighbor of point " << i << " is point "
        << neighbors[i] << " and the distance is " << distances[i] << ".\n";
    }
}

And save this toy dataset as data.csv:

3,3,3,3,0
3,4,4,3,0
3,4,4,3,0
3,3,4,3,0
3,6,4,3,0
2,4,4,3,0
2,4,4,1,0
3,3,3,2,0
3,4,4,2,0
3,4,4,2,0
3,3,4,2,0
3,6,4,2,0
2,4,4,2,0

To compile the example, you’ll use g++ (the C++ equivalent of gcc).

$ g++ knn_example.cpp -o knn_example -std=c++11 -larmadillo -lmlpack -lboost_serialization

  • knn_example.cpp - The code to compile.
  • -o knn_example - The binary (executable) to output.
  • -std=c++11 - mlpack documentation says you need to set this flag.
  • -larmadillo -lmlpack -lboost_serialization - The “-l” flag tells the linker to look for these libraries.
    • armadillo is a linear algebra library that mlpack uses.

Finally, to run the example, execute the binary:

$ ./knn_example

Don't leave out the "./"! In Windows, you can just type the name of an executable in the current directory and hit enter and it will run. In Linux, the current directory is normally not on the shell's search path, so you need to prepend the "./" to tell it where the binary is.

And you should see the following output:

Nearest neighbor of point 0 is point 7 and the distance is 1.
Nearest neighbor of point 1 is point 2 and the distance is 0.
Nearest neighbor of point 2 is point 1 and the distance is 0.
Nearest neighbor of point 3 is point 10 and the distance is 1.
Nearest neighbor of point 4 is point 11 and the distance is 1.
Nearest neighbor of point 5 is point 12 and the distance is 1.
Nearest neighbor of point 6 is point 12 and the distance is 1.
Nearest neighbor of point 7 is point 10 and the distance is 1.
Nearest neighbor of point 8 is point 9 and the distance is 0.
Nearest neighbor of point 9 is point 8 and the distance is 0.
Nearest neighbor of point 10 is point 9 and the distance is 1.
Nearest neighbor of point 11 is point 4 and the distance is 1.
Nearest neighbor of point 12 is point 9 and the distance is 1.

You’re up and running!

[Source] https://mccormickml.com/2017/02/01/getting-started-with-mlpack/

 

 
