2020.02.01 23:43

How to use Weka in your Java code

Introduction

Weka is a standard Java tool both for performing machine learning experiments and for embedding trained models in Java applications. It can be used for supervised and unsupervised learning. There are three ways to use Weka: from the command line, through the Weka GUI, and through its Java API. The Weka library provides a large collection of machine learning algorithms implemented in Java.
The objective of this post is to explain how to generate a model from an ARFF data file and how to classify a new instance with that model using the Weka API.

Requirements:

Java Development Kit (JDK), which you can download from http://www.oracle.com/technetwork/java/javase/downloads/index.html.
NetBeans, which you can download from https://netbeans.org/downloads.
The source code, which you can download from https://github.com/emara-geek/weka-example.
The Weka library. You can add it to your Java project via Maven by inserting the following dependency into your pom.xml file.

<dependency>
    <groupId>nz.ac.waikato.cms.weka</groupId>
    <artifactId>weka-stable</artifactId>
    <version>3.8.0</version>
</dependency>

Dataset

Weka uses a data file format called ARFF (Attribute-Relation File Format). An ARFF file declares the attributes in a header and then lists all the instances, with the attribute values of each instance separated by commas.
I will use the Iris 2D dataset in this example. It has three attributes: petallength, petalwidth, and class (Iris-setosa, Iris-versicolor, or Iris-virginica). Our objective is to generate a model that correctly classifies any new instance, given its petallength and petalwidth, as Iris-setosa, Iris-versicolor, or Iris-virginica.

@relation iris-weka.filters.unsupervised.attribute.Remove-R1-2

@attribute petallength numeric
@attribute petalwidth numeric
@attribute class {Iris-setosa,Iris-versicolor,Iris-virginica}

@data
1.4,0.2,Iris-setosa
4.7,1.4,Iris-versicolor
5,1.5,Iris-virginica
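
To make the layout of the file concrete, here is a minimal plain-Java sketch that splits an ARFF text into attribute names and data rows. It is only an illustration of the format and is not how Weka reads ARFF files; the class name ArffSketch and its fields are made up for this example, and it assumes attribute names without spaces or quotes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Illustrative ARFF reader (hypothetical helper, not part of Weka). */
final class ArffSketch {

    final List<String> attributeNames = new ArrayList<>();
    final List<String[]> rows = new ArrayList<>();

    static ArffSketch parse(String text) {
        ArffSketch result = new ArffSketch();
        boolean inData = false;
        for (String raw : text.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) {
                continue; // skip blank lines and ARFF comment lines
            }
            String lower = line.toLowerCase(Locale.ROOT);
            if (!inData && lower.startsWith("@attribute")) {
                // "@attribute <name> <type>": keep only the name (assumes no spaces in it)
                result.attributeNames.add(line.split("\\s+")[1]);
            } else if (!inData && lower.startsWith("@data")) {
                inData = true; // everything after @data is one instance per line
            } else if (inData) {
                result.rows.add(line.split(","));
            }
            // other header lines such as @relation are ignored in this sketch
        }
        return result;
    }

    public static void main(String[] args) {
        String arff = String.join("\n",
                "@relation iris",
                "@attribute petallength numeric",
                "@attribute petalwidth numeric",
                "@attribute class {Iris-setosa,Iris-versicolor,Iris-virginica}",
                "@data",
                "1.4,0.2,Iris-setosa",
                "4.7,1.4,Iris-versicolor",
                "5,1.5,Iris-virginica");
        ArffSketch parsed = parse(arff);
        System.out.println("attributes: " + parsed.attributeNames);
        System.out.println("instances:  " + parsed.rows.size());
    }
}
```

In the real code below, DataSource.read does this work and returns an Instances object, so a hand-written parser like this is not needed.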

Generating a model

In this stage we will generate a model using MultilayerPerceptron (a neural network) to classify the Iris 2D dataset. I used the default values for the neural network's learning parameters; you can change them through the setter methods.

This is done by the ModelGenerator class, which has four methods, described in the following table.

Name	Function
loadDataset	Loads the dataset from an ARFF file into an Instances object
buildClassifier	Builds a classifier from the training set using MultilayerPerceptron (neural network)
evaluateModel	Evaluates the accuracy of the generated model on the test set
saveModel	Saves the generated model to a file path for use in future predictions

package com.emaraic.ml;

import java.util.logging.Level;
import java.util.logging.Logger;
import weka.classifiers.Classifier;
import weka.classifiers.evaluation.Evaluation;
import weka.classifiers.functions.MultilayerPerceptron;
import weka.core.Instances;
import weka.core.SerializationHelper;
import weka.core.converters.ConverterUtils.DataSource;

/**
 *
 * @author Taha Emara 
 * Website: http://www.emaraic.com 
 * Email : taha@emaraic.com
 * Created on: Jun 28, 2017
 * Github link: https://github.com/emara-geek/weka-example
 */
public class ModelGenerator {

    public Instances loadDataset(String path) {
        Instances dataset = null;
        try {
            dataset = DataSource.read(path);
            if (dataset.classIndex() == -1) {
                dataset.setClassIndex(dataset.numAttributes() - 1);
            }
        } catch (Exception ex) {
            Logger.getLogger(ModelGenerator.class.getName()).log(Level.SEVERE, null, ex);
        }

        return dataset;
    }

    public Classifier buildClassifier(Instances traindataset) {
        MultilayerPerceptron m = new MultilayerPerceptron();
        
        //m.setGUI(true);
        //m.setValidationSetSize(0);
        //m.setBatchSize("100");
        //m.setLearningRate(0.3);
        //m.setSeed(0);
        //m.setMomentum(0.2);
        //m.setTrainingTime(500);//epochs
        //m.setNormalizeAttributes(true);
        
        /* MultilayerPerceptron parameters and their default values:
         *
         * Learning rate for the backpropagation algorithm (between 0 and 1, default 0.3).
         *   m.setLearningRate(0.3);
         *
         * Momentum for the backpropagation algorithm (between 0 and 1, default 0.2).
         *   m.setMomentum(0.2);
         *
         * Number of epochs to train through (default 500).
         *   m.setTrainingTime(500);
         *
         * Percentage size of the validation set used to terminate training; if this is
         * non-zero it can pre-empt the number of epochs (between 0 and 100, default 0).
         *   m.setValidationSetSize(0);
         *
         * Value used to seed the random number generator (>= 0, default 0).
         *   m.setSeed(0);
         *
         * Hidden layers to create for the network, as a comma-separated list of natural
         * numbers or the wildcards 'a' = (attribs + classes) / 2, 'i' = attribs,
         * 'o' = classes, 't' = attribs + classes (default "a").
         *   m.setHiddenLayers("2,3,3"); // three hidden layers with 2, 3, and 3 nodes
         *
         * Desired batch size for batch prediction (default "100").
         *   m.setBatchSize("100");
         */
        try {
            m.buildClassifier(traindataset);

        } catch (Exception ex) {
            Logger.getLogger(ModelGenerator.class.getName()).log(Level.SEVERE, null, ex);
        }
        return m;
    }

    public String evaluateModel(Classifier model, Instances traindataset, Instances testdataset) {
        try {
            // Initialise the evaluation with the training set, then score the model on the test set
            Evaluation eval = new Evaluation(traindataset);
            eval.evaluateModel(model, testdataset);
            return eval.toSummaryString("", true);
        } catch (Exception ex) {
            Logger.getLogger(ModelGenerator.class.getName()).log(Level.SEVERE, null, ex);
            // Return a message instead of dereferencing a null Evaluation
            return "Evaluation failed: " + ex.getMessage();
        }
    }

    public void saveModel(Classifier model, String modelpath) {

        try {
            SerializationHelper.write(modelpath, model);
        } catch (Exception ex) {
            Logger.getLogger(ModelGenerator.class.getName()).log(Level.SEVERE, null, ex);
        }
    }

}
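
As far as I know, SerializationHelper.write and SerializationHelper.read are thin wrappers around standard Java object serialization, which is why the saved model can later be read back and cast to MultilayerPerceptron. The following plain-Java sketch shows the same round trip with an in-memory byte array instead of a file. ToyModel is a hypothetical stand-in for a trained classifier, and the write and read helpers here are my own, not Weka's.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Arrays;

final class SerializationSketch {

    /** Hypothetical stand-in for a trained model; any Serializable object works the same way. */
    static final class ToyModel implements Serializable {
        private static final long serialVersionUID = 1L;
        final double[] weights;

        ToyModel(double[] weights) {
            this.weights = weights;
        }
    }

    /** Serializes the object to bytes, as a save-model step would serialize it to a file. */
    static byte[] write(Object model) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(model);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    /** Restores the object; the caller casts it back to the concrete model type. */
    static Object read(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        } catch (ClassNotFoundException ex) {
            throw new IllegalStateException("Model class is not on the classpath", ex);
        }
    }

    public static void main(String[] args) {
        ToyModel saved = new ToyModel(new double[]{0.3, 0.2});
        ToyModel loaded = (ToyModel) read(write(saved));
        System.out.println(Arrays.toString(loaded.weights));
    }
}
```

One practical consequence of this mechanism is that the class of the saved object must be available on the classpath when the model is loaded.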

Classification using the generated model

In this stage we will use the model generated by the ModelGenerator class to classify a new instance. This is done by the ModelClassifier class.

package com.emaraic.ml;

import java.util.ArrayList;
import java.util.logging.Level;
import java.util.logging.Logger;
import weka.classifiers.Classifier;
import weka.classifiers.functions.MultilayerPerceptron;
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instances;
import weka.core.SerializationHelper;

/**
 * This is a classifier for iris.2D.arff dataset  
 * @author Taha Emara 
 * Website: http://www.emaraic.com 
 * Email  : taha@emaraic.com
 * Created on: Jul 1, 2017
 * Github link: https://github.com/emara-geek/weka-example
 */
public class ModelClassifier {

    private Attribute petallength;
    private Attribute petalwidth;

    // Type parameters are required: classVal.get(...) must return a String in classifiy()
    private ArrayList<Attribute> attributes;
    private ArrayList<String> classVal;
    private Instances dataRaw;


    public ModelClassifier() {
        petallength = new Attribute("petallength");
        petalwidth = new Attribute("petalwidth");
        attributes = new ArrayList<>();
        classVal = new ArrayList<>();
        classVal.add("Iris-setosa");
        classVal.add("Iris-versicolor");
        classVal.add("Iris-virginica");

        attributes.add(petallength);
        attributes.add(petalwidth);

        attributes.add(new Attribute("class", classVal));
        dataRaw = new Instances("TestInstances", attributes, 0);
        dataRaw.setClassIndex(dataRaw.numAttributes() - 1);
    }

    
    public Instances createInstance(double petallength, double petalwidth, double result) {
        dataRaw.clear();
        double[] instanceValue1 = new double[]{petallength, petalwidth, 0};
        dataRaw.add(new DenseInstance(1.0, instanceValue1));
        return dataRaw;
    }


    public String classifiy(Instances insts, String path) {
        String result = "Not classified!!";
        Classifier cls = null;
        try {
            cls = (MultilayerPerceptron) SerializationHelper.read(path);
            result = classVal.get((int) cls.classifyInstance(insts.firstInstance()));
        } catch (Exception ex) {
            Logger.getLogger(ModelClassifier.class.getName()).log(Level.SEVERE, null, ex);
        }
        return result;
    }


    public Instances getInstance() {
        return dataRaw;
    }
    

}
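
For a nominal class attribute, classifyInstance returns the index of the predicted class as a double, which is why ModelClassifier casts the result to int and looks it up in classVal. The lookup only gives the right name if classVal lists the labels in the same order as the ARFF header. Here is a small plain-Java sketch of that lookup. The names LabelLookupSketch, mostLikelyIndex, and labelFor are made up for illustration, and the probability array is an invented example rather than real model output.

```java
import java.util.Arrays;
import java.util.List;

final class LabelLookupSketch {

    // Same order as the class attribute in the ARFF header
    static final List<String> CLASS_VALUES =
            Arrays.asList("Iris-setosa", "Iris-versicolor", "Iris-virginica");

    /** Index of the largest probability, returned as a double like a predicted class index. */
    static double mostLikelyIndex(double[] distribution) {
        int best = 0;
        for (int i = 1; i < distribution.length; i++) {
            if (distribution[i] > distribution[best]) {
                best = i;
            }
        }
        return best;
    }

    /** Maps a predicted class index back to its label. */
    static String labelFor(double predictedIndex) {
        return CLASS_VALUES.get((int) predictedIndex);
    }

    public static void main(String[] args) {
        // Invented class probabilities, for illustration only
        double[] distribution = {0.05, 0.90, 0.05};
        System.out.println(labelFor(mostLikelyIndex(distribution)));
    }
}
```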

Test

In the Test class, I provide a complete example of using the ModelGenerator and ModelClassifier classes to generate a model and use it for future predictions.

import com.emaraic.ml.ModelClassifier;
import com.emaraic.ml.ModelGenerator;
import weka.classifiers.functions.MultilayerPerceptron;
import weka.core.Debug;
import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Normalize;

/**
 *
 * @author Taha Emara 
 * Website: http://www.emaraic.com 
 * Email : taha@emaraic.com
 * Created on: Jul 1, 2017
 * Github link: https://github.com/emara-geek/weka-example
 */
public class Test {

    public static final String DATASETPATH = "/Users/Emaraic/Temp/ml/iris.2D.arff";
    public static final String MODElPATH = "/Users/Emaraic/Temp/ml/model.bin";

    public static void main(String[] args) throws Exception {
        
        ModelGenerator mg = new ModelGenerator();

        Instances dataset = mg.loadDataset(DATASETPATH);

        Filter filter = new Normalize();

        // divide dataset to train dataset 80% and test dataset 20%
        int trainSize = (int) Math.round(dataset.numInstances() * 0.8);
        int testSize = dataset.numInstances() - trainSize;

        dataset.randomize(new Debug.Random(1)); // if you comment out this line, the accuracy of the model drops from 96.6% to 80%
        
        //Normalize dataset
        filter.setInputFormat(dataset);
        Instances datasetnor = Filter.useFilter(dataset, filter);

        Instances traindataset = new Instances(datasetnor, 0, trainSize);
        Instances testdataset = new Instances(datasetnor, trainSize, testSize);

        // build classifier with train dataset             
        MultilayerPerceptron ann = (MultilayerPerceptron) mg.buildClassifier(traindataset);

        // Evaluate classifier with test dataset
        String evalsummary = mg.evaluateModel(ann, traindataset, testdataset);
        System.out.println("Evaluation: " + evalsummary);

        //Save model 
        mg.saveModel(ann, MODElPATH);

        // classify a single instance
        ModelClassifier cls = new ModelClassifier();
        String classname = cls.classifiy(Filter.useFilter(cls.createInstance(1.6, 0.2, 0), filter), MODElPATH);
        System.out.println("\n The class name for the instance with petallength = 1.6 and petalwidth =0.2 is  " + classname);

    }

}
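
The two numeric steps in the Test class, the 80/20 split and the normalization, can be illustrated without Weka. The sketch below repeats the split arithmetic and rescales one column to the range 0 to 1, which is my understanding of what the Normalize filter does with its default settings. The class and method names are hypothetical, and the sample values are simply the three petallength values from the ARFF excerpt above.

```java
import java.util.Arrays;

final class SplitAndNormalizeSketch {

    /** Same arithmetic as the Test class: round 80% of the instance count. */
    static int trainSize(int numInstances) {
        return (int) Math.round(numInstances * 0.8);
    }

    /** Rescales each value to [0, 1] using the minimum and maximum of the column. */
    static double[] minMaxNormalize(double[] column) {
        double min = Arrays.stream(column).min().orElse(0.0);
        double max = Arrays.stream(column).max().orElse(0.0);
        double range = max - min;
        double[] scaled = new double[column.length];
        for (int i = 0; i < column.length; i++) {
            // A constant column has no range; map it to 0 to avoid dividing by zero
            scaled[i] = range == 0.0 ? 0.0 : (column[i] - min) / range;
        }
        return scaled;
    }

    public static void main(String[] args) {
        int numInstances = 150; // the usual size of the iris dataset
        int train = trainSize(numInstances);
        int test = numInstances - train;
        System.out.println("train = " + train + ", test = " + test);

        double[] petallength = {1.4, 4.7, 5.0};
        System.out.println(Arrays.toString(minMaxNormalize(petallength)));
    }
}
```

Note that the Test class reuses the same filter object for the single new instance, so the new instance is scaled with the minimum and maximum learned from the dataset rather than from itself.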

Output

Evaluation: 
Correctly Classified Instances          29               96.6667 %
Incorrectly Classified Instances         1                3.3333 %
Kappa statistic                          0.9497
K&B Relative Info Score               2783.763  %
K&B Information Score                   44.1136 bits      1.4705 bits/instance
Class complexity | order 0              47.6278 bits      1.5876 bits/instance
Class complexity | scheme                3.5142 bits      0.1171 bits/instance
Complexity improvement     (Sf)         44.1136 bits      1.4705 bits/instance
Mean absolute error                      0.046 
Root mean squared error                  0.1051
Relative absolute error                 10.3365 %
Root relative squared error             22.2694 %
Total Number of Instances               30     


The class name for the instance with petallength = 1.6 and petalwidth =0.2 is  Iris-setosa
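
To show where the first and third lines of the summary come from, here is a plain-Java sketch that computes accuracy and Cohen's kappa from a confusion matrix. The matrix in main is hypothetical: the source does not print the confusion matrix of this run, so I picked one with 30 instances and a single error purely for illustration. The class and method names are also made up.

```java
final class KappaSketch {

    /** Fraction of instances on the diagonal of the confusion matrix. */
    static double accuracy(int[][] confusion) {
        long correct = 0;
        long total = 0;
        for (int actual = 0; actual < confusion.length; actual++) {
            for (int predicted = 0; predicted < confusion[actual].length; predicted++) {
                total += confusion[actual][predicted];
                if (actual == predicted) {
                    correct += confusion[actual][predicted];
                }
            }
        }
        return (double) correct / total;
    }

    /** Cohen's kappa: agreement beyond what the row and column totals give by chance. */
    static double kappa(int[][] confusion) {
        int n = confusion.length;
        double[] rowSums = new double[n];
        double[] colSums = new double[n];
        double total = 0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                rowSums[i] += confusion[i][j];
                colSums[j] += confusion[i][j];
                total += confusion[i][j];
            }
        }
        double chance = 0;
        for (int i = 0; i < n; i++) {
            chance += (rowSums[i] / total) * (colSums[i] / total);
        }
        double observed = accuracy(confusion);
        return (observed - chance) / (1 - chance);
    }

    public static void main(String[] args) {
        // HYPOTHETICAL confusion matrix (rows = actual, columns = predicted),
        // not taken from the run above: 30 instances, one versicolor misclassified
        int[][] confusion = {
            {10, 0, 0},
            {0, 11, 1},
            {0, 0, 8}
        };
        System.out.printf("accuracy = %.4f %%%n", 100 * accuracy(confusion));
        System.out.printf("kappa    = %.4f%n", kappa(confusion));
    }
}
```

With this particular matrix the accuracy is 29/30 and kappa is 566/596, about 0.9497, which is consistent with the summary. Other matrices with a single error give slightly different kappa values, so it should not be read as the actual confusion matrix of the run.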

[Source] http://www.emaraic.com/blog/weka-java-example
