How to use Weka in your Java code

Introduction

Weka is a widely used Java toolkit both for running machine learning experiments and for embedding trained models in Java applications. It supports supervised as well as unsupervised learning. There are three ways to use Weka: from the command line, through the Weka GUI, and through its Java API. Weka's library provides a large collection of machine learning algorithms implemented in Java.
The objective of this post is to explain how to generate a model from an ARFF data file and how to classify a new instance with this model using the Weka API.

Requirements:

  1. Java Development Kit (JDK), which you can download from http://www.oracle.com/technetwork/java/javase/downloads/index.html.
  2. NetBeans, which you can download from https://netbeans.org/downloads.
  3. Source code, which you can download from https://github.com/emara-geek/weka-example.
  4. Weka library. You can add it to your Java project via Maven by inserting these lines into your pom.xml file:
<dependency>
    <groupId>nz.ac.waikato.cms.weka</groupId>
    <artifactId>weka-stable</artifactId>
    <version>3.8.0</version>
</dependency>

 

Dataset

Weka uses a data file format called ARFF (Attribute-Relation File Format). An ARFF file starts with a header that declares the relation name and its attributes (@relation, @attribute), followed by an @data section that lists all the instances, one per line, with the attribute values of each instance separated by commas.
I will use the Iris 2D dataset (iris.2D.arff) in this example. It has three attributes: petallength, petalwidth, and class (Iris-setosa, Iris-versicolor, or Iris-virginica). Our objective is to generate a model that correctly classifies any new instance, given its petallength and petalwidth, as Iris-setosa, Iris-versicolor, or Iris-virginica. The header of the file and a few of its instances are shown below.

@relation iris-weka.filters.unsupervised.attribute.Remove-R1-2

@attribute petallength numeric
@attribute petalwidth numeric
@attribute class {Iris-setosa,Iris-versicolor,Iris-virginica}

@data
1.4,0.2,Iris-setosa
4.7,1.4,Iris-versicolor
5,1.5,Iris-virginica
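The layout above (header lines starting with @, then one comma-separated instance per line after @data) can be made concrete with a small plain-Java sketch. It is not Weka code: in this project DataSource.read does the real parsing. The class name ArffDataSketch is hypothetical, and the sketch assumes the simple case shown here (no quoted values, sparse rows, or missing '?' values), treating the last value on each line as the class label.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: reads the @data rows of a simple ARFF text.
// Real ARFF parsing (quoting, sparse rows, '?' missing values) is left to Weka.
final class ArffDataSketch {

    /** One parsed row: numeric attribute values plus the nominal class label. */
    static final class Row {
        final double[] features;
        final String label;

        Row(double[] features, String label) {
            this.features = features;
            this.label = label;
        }
    }

    static List<Row> parseData(String arff) {
        List<Row> rows = new ArrayList<>();
        boolean inData = false;
        for (String raw : arff.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) {
                continue; // skip blank lines and ARFF comments
            }
            if (!inData) {
                // Everything before @data is header (@relation, @attribute)
                inData = line.equalsIgnoreCase("@data");
                continue;
            }
            String[] parts = line.split(",");
            double[] features = new double[parts.length - 1];
            for (int i = 0; i < features.length; i++) {
                features[i] = Double.parseDouble(parts[i].trim());
            }
            // Assumption for this file: the last value is the class label
            rows.add(new Row(features, parts[parts.length - 1].trim()));
        }
        return rows;
    }
}
```

Fed the excerpt above, this returns three rows, the last one with features [5.0, 1.5] and label Iris-virginica.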

Generating a model

In this stage we generate a model using MultilayerPerceptron (a neural network) to classify the Iris 2D dataset. I used the default values for the neural network's learning parameters; of course, you can change them through the corresponding setter methods.
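Two of the most commonly tuned setters, setLearningRate (default 0.3) and setMomentum (default 0.2), control the weight update of backpropagation. As a rough illustration, assuming the textbook update rule Δw(t) = -η·∂E/∂w + α·Δw(t-1) (η = learning rate, α = momentum), a plain-Java sketch looks like this. MomentumUpdateSketch is a hypothetical name, and this is not Weka's internal MultilayerPerceptron code.

```java
// Hypothetical sketch of the textbook momentum update; not Weka's internal code.
final class MomentumUpdateSketch {

    private final double learningRate;    // eta, cf. setLearningRate (default 0.3)
    private final double momentum;        // alpha, cf. setMomentum (default 0.2)
    private final double[] previousDelta; // weight change from the previous step

    MomentumUpdateSketch(int numWeights, double learningRate, double momentum) {
        this.learningRate = learningRate;
        this.momentum = momentum;
        this.previousDelta = new double[numWeights]; // starts at zero
    }

    /** delta(t) = -learningRate * gradient + momentum * delta(t-1), applied in place. */
    void step(double[] weights, double[] gradient) {
        for (int i = 0; i < weights.length; i++) {
            double delta = -learningRate * gradient[i] + momentum * previousDelta[i];
            weights[i] += delta;
            previousDelta[i] = delta;
        }
    }
}
```

With a learning rate of 0.3 and momentum of 0.2, two steps with a constant gradient of 1.0 move a weight from 1.0 to 0.7 and then to 0.34, because the second step also carries 0.2 times the first step's change.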

This is done by the ModelGenerator class, which has four methods, described in the following table.

Method            Function
loadDataset       Loads the dataset from an ARFF file into an Instances object (using the last attribute as the class if none is set)
buildClassifier   Builds a MultilayerPerceptron (neural network) classifier on the training set
evaluateModel     Evaluates the accuracy of the generated model on the test set
saveModel         Saves the generated model to a path so it can be used for future predictions

 


package com.emaraic.ml;

import java.util.logging.Level;
import java.util.logging.Logger;
import weka.classifiers.Classifier;
import weka.classifiers.evaluation.Evaluation;
import weka.classifiers.functions.MultilayerPerceptron;
import weka.core.Instances;
import weka.core.SerializationHelper;
import weka.core.converters.ConverterUtils.DataSource;

/**
 *
 * @author Taha Emara 
 * Website: http://www.emaraic.com 
 * Email : taha@emaraic.com
 * Created on: Jun 28, 2017
 * Github link: https://github.com/emara-geek/weka-example
 */
public class ModelGenerator {

    public Instances loadDataset(String path) {
        Instances dataset = null;
        try {
            dataset = DataSource.read(path);
            if (dataset.classIndex() == -1) {
                dataset.setClassIndex(dataset.numAttributes() - 1);
            }
        } catch (Exception ex) {
            Logger.getLogger(ModelGenerator.class.getName()).log(Level.SEVERE, null, ex);
        }

        return dataset;
    }

    public Classifier buildClassifier(Instances traindataset) {
        MultilayerPerceptron m = new MultilayerPerceptron();
        
        //m.setGUI(true);
        //m.setValidationSetSize(0);
        //m.setBatchSize("100");
        //m.setLearningRate(0.3);
        //m.setSeed(0);
        //m.setMomentum(0.2);
        //m.setTrainingTime(500);//epochs
        //m.setNormalizeAttributes(true);
        
        /* MultilayerPerceptron parameters and their default values:
         *
         * Learning rate for the backpropagation algorithm (value should be between 0 and 1, default = 0.3).
         *   m.setLearningRate(0.3);
         *
         * Momentum rate for the backpropagation algorithm (value should be between 0 and 1, default = 0.2).
         *   m.setMomentum(0.2);
         *
         * Number of epochs to train through (default = 500).
         *   m.setTrainingTime(500);
         *
         * Percentage size of the validation set used to terminate training; if this is non-zero,
         * it can pre-empt the number of epochs (value should be between 0 and 100, default = 0).
         *   m.setValidationSetSize(0);
         *
         * The value used to seed the random number generator (value should be >= 0, default = 0).
         *   m.setSeed(0);
         *
         * The hidden layers to be created for the network (value should be a comma-separated list of
         * natural numbers, or the wildcards 'a' = (attribs + classes) / 2, 'i' = attribs,
         * 'o' = classes, 't' = attribs + classes; default = a).
         *   m.setHiddenLayers("2,3,3"); // three hidden layers: 2 nodes in the first, 3 in the second, 3 in the third
         *
         * The desired batch size for batch prediction (default = 100).
         *   m.setBatchSize("100");
         */
        try {
            m.buildClassifier(traindataset);

        } catch (Exception ex) {
            Logger.getLogger(ModelGenerator.class.getName()).log(Level.SEVERE, null, ex);
        }
        return m;
    }

    public String evaluateModel(Classifier model, Instances traindataset, Instances testdataset) {
        try {
            // The training set initializes the class priors used by Evaluation
            Evaluation eval = new Evaluation(traindataset);
            // Evaluate the classifier on the test dataset
            eval.evaluateModel(model, testdataset);
            return eval.toSummaryString("", true);
        } catch (Exception ex) {
            Logger.getLogger(ModelGenerator.class.getName()).log(Level.SEVERE, null, ex);
            // Return a message instead of dereferencing a null Evaluation
            return "Evaluation failed: " + ex.getMessage();
        }
    }

    public void saveModel(Classifier model, String modelpath) {

        try {
            SerializationHelper.write(modelpath, model);
        } catch (Exception ex) {
            Logger.getLogger(ModelGenerator.class.getName()).log(Level.SEVERE, null, ex);
        }
    }

}
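The comment in buildClassifier describes the wildcard values accepted by setHiddenLayers. For the Iris 2D data there are two input attributes and three classes, so the default 'a' corresponds to (2 + 3) / 2 hidden nodes. The hypothetical helper below expands such a specification into a node count per hidden layer using the wildcard meanings from that comment; rounding 'a' down with integer division is my assumption about how Weka handles the fraction.

```java
// Hypothetical helper, not part of Weka: expands a hiddenLayers-style spec
// such as "a" or "2,3,3" into a node count per hidden layer.
final class HiddenLayerSpecSketch {

    static int[] expand(String spec, int numAttributes, int numClasses) {
        String[] tokens = spec.split(",");
        int[] nodes = new int[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            String token = tokens[i].trim();
            switch (token) {
                case "a": // (attribs + classes) / 2, rounded down (assumption)
                    nodes[i] = (numAttributes + numClasses) / 2;
                    break;
                case "i": // attribs
                    nodes[i] = numAttributes;
                    break;
                case "o": // classes
                    nodes[i] = numClasses;
                    break;
                case "t": // attribs + classes
                    nodes[i] = numAttributes + numClasses;
                    break;
                default: // an explicit number of nodes, e.g. "3"
                    nodes[i] = Integer.parseInt(token);
            }
        }
        return nodes;
    }
}
```

For example, expand("i,o,t", 2, 3) gives [2, 3, 5], and expand("2,3,3", 2, 3) gives the three layers used in the comment's example.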

 

Classification using the generated model

In this stage we use the model generated by the ModelGenerator class to classify a new instance. This is done by the ModelClassifier class.

 

package com.emaraic.ml;

import java.util.ArrayList;
import java.util.logging.Level;
import java.util.logging.Logger;
import weka.classifiers.Classifier;
import weka.classifiers.functions.MultilayerPerceptron;
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instances;
import weka.core.SerializationHelper;

/**
 * This is a classifier for iris.2D.arff dataset  
 * @author Taha Emara 
 * Website: http://www.emaraic.com 
 * Email  : taha@emaraic.com
 * Created on: Jul 1, 2017
 * Github link: https://github.com/emara-geek/weka-example
 */
public class ModelClassifier {

    private Attribute petallength;
    private Attribute petalwidth;

    private ArrayList<Attribute> attributes;
    private ArrayList<String> classVal;
    private Instances dataRaw;


    public ModelClassifier() {
        petallength = new Attribute("petallength");
        petalwidth = new Attribute("petalwidth");
        attributes = new ArrayList<>();
        classVal = new ArrayList<>();
        classVal.add("Iris-setosa");
        classVal.add("Iris-versicolor");
        classVal.add("Iris-virginica");

        // The header must match the training data: same attributes, same order, same class values
        attributes.add(petallength);
        attributes.add(petalwidth);

        attributes.add(new Attribute("class", classVal));
        dataRaw = new Instances("TestInstances", attributes, 0);
        dataRaw.setClassIndex(dataRaw.numAttributes() - 1);
    }

    
    // Builds a one-row dataset for the given measurements. The "result" argument is not used;
    // the class value is only a placeholder (index 0) because it is what the model predicts.
    public Instances createInstance(double petallength, double petalwidth, double result) {
        dataRaw.clear();
        double[] instanceValue1 = new double[]{petallength, petalwidth, 0};
        dataRaw.add(new DenseInstance(1.0, instanceValue1));
        return dataRaw;
    }


    public String classifiy(Instances insts, String path) {
        String result = "Not classified!!";
        Classifier cls = null;
        try {
            cls = (MultilayerPerceptron) SerializationHelper.read(path);
            result = classVal.get((int) cls.classifyInstance(insts.firstInstance()));
        } catch (Exception ex) {
            Logger.getLogger(ModelClassifier.class.getName()).log(Level.SEVERE, null, ex);
        }
        return result;
    }


    public Instances getInstance() {
        return dataRaw;
    }
    

}
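classifyInstance returns the index of the predicted class value as a double, which is why ModelClassifier casts it to int and looks it up in classVal. For a nominal class, the default implementation in Weka's AbstractClassifier derives that index from distributionForInstance by picking the most probable class. The plain-Java sketch below shows that mapping; PredictionSketch is a hypothetical name, and it assumes the probabilities are ordered like the class values declared in the header.

```java
import java.util.List;

// Hypothetical plain-Java sketch: turn a class-probability distribution into a label.
final class PredictionSketch {

    /** Index of the largest probability; on ties the lower index wins. */
    static int argMax(double[] distribution) {
        int best = 0;
        for (int i = 1; i < distribution.length; i++) {
            if (distribution[i] > distribution[best]) {
                best = i;
            }
        }
        return best;
    }

    /**
     * Maps the most probable index to its class name. Unlike the AbstractClassifier
     * default, this does not return a missing value when every probability is zero.
     */
    static String label(double[] distribution, List<String> classValues) {
        if (distribution.length != classValues.size()) {
            throw new IllegalArgumentException("expected one probability per class value");
        }
        return classValues.get(argMax(distribution));
    }
}
```

With a list holding the three iris class names in the same order as classVal, label(new double[]{0.05, 0.9, 0.05}, names) returns "Iris-versicolor".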

 

Test

In the Test class, I provide a complete example of using the ModelGenerator and ModelClassifier classes to generate a model and use it for future predictions.

 

import com.emaraic.ml.ModelClassifier;
import com.emaraic.ml.ModelGenerator;
import weka.classifiers.functions.MultilayerPerceptron;
import weka.core.Debug;
import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Normalize;

/**
 *
 * @author Taha Emara 
 * Website: http://www.emaraic.com 
 * Email : taha@emaraic.com
 * Created on: Jul 1, 2017
 * Github link: https://github.com/emara-geek/weka-example
 */
public class Test {

    // Adjust these paths to your local copy of iris.2D.arff and to where the model should be saved
    public static final String DATASETPATH = "/Users/Emaraic/Temp/ml/iris.2D.arff";
    public static final String MODElPATH = "/Users/Emaraic/Temp/ml/model.bin";

    public static void main(String[] args) throws Exception {
        
        ModelGenerator mg = new ModelGenerator();

        Instances dataset = mg.loadDataset(DATASETPATH);

        Filter filter = new Normalize();

        // Split the dataset: 80% for training and 20% for testing
        int trainSize = (int) Math.round(dataset.numInstances() * 0.8);
        int testSize = dataset.numInstances() - trainSize;

        // Shuffle with a fixed seed before splitting. iris.2D.arff is sorted by class, so without this line
        // the test set would contain only Iris-virginica instances; in my run the accuracy dropped from 96.6% to 80%.
        dataset.randomize(new Debug.Random(1));
        
        //Normalize dataset
        filter.setInputFormat(dataset);
        Instances datasetnor = Filter.useFilter(dataset, filter);

        Instances traindataset = new Instances(datasetnor, 0, trainSize);
        Instances testdataset = new Instances(datasetnor, trainSize, testSize);

        // build classifier with train dataset             
        MultilayerPerceptron ann = (MultilayerPerceptron) mg.buildClassifier(traindataset);

        // Evaluate classifier with test dataset
        String evalsummary = mg.evaluateModel(ann, traindataset, testdataset);
        System.out.println("Evaluation: " + evalsummary);

        //Save model 
        mg.saveModel(ann, MODElPATH);

        // Classify a single new instance, normalized with the same filter as the dataset
        ModelClassifier cls = new ModelClassifier();
        String classname = cls.classifiy(Filter.useFilter(cls.createInstance(1.6, 0.2, 0), filter), MODElPATH);
        System.out.println("\n The class name for the instance with petallength = 1.6 and petalwidth =0.2 is  " +classname);

    }

}
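main() performs two preprocessing steps before training: a seeded shuffle followed by an 80/20 split, and normalization with Weka's Normalize filter, which by default scales numeric attributes to [0, 1] using the minimum and maximum of the first batch it processes. The same filter object is then reused for the new instance, so that instance is scaled with the dataset's ranges. Below is a plain-Java sketch of both ideas with hypothetical names (PreprocessingSketch, MinMaxScaler); java.util.Random will not reproduce the order produced by Debug.Random, so only the procedure, not the exact split, matches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical plain-Java sketch of the preprocessing in main(); not Weka code.
final class PreprocessingSketch {

    /** Shuffles a copy of rows with a fixed seed, then splits it at round(n * trainFraction). */
    static <T> List<List<T>> shuffleAndSplit(List<T> rows, double trainFraction, long seed) {
        List<T> copy = new ArrayList<>(rows);
        Collections.shuffle(copy, new Random(seed)); // same seed, same order on every run
        int trainSize = (int) Math.round(copy.size() * trainFraction);
        List<List<T>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(copy.subList(0, trainSize)));           // training part
        parts.add(new ArrayList<>(copy.subList(trainSize, copy.size()))); // test part
        return parts;
    }

    /** Min-max scaling to [0, 1]: fit() records per-column ranges once, transform() reuses them. */
    static final class MinMaxScaler {
        private double[] min;
        private double[] max;

        void fit(List<double[]> rows) {
            int cols = rows.get(0).length;
            min = new double[cols];
            max = new double[cols];
            Arrays.fill(min, Double.POSITIVE_INFINITY);
            Arrays.fill(max, Double.NEGATIVE_INFINITY);
            for (double[] row : rows) {
                for (int j = 0; j < cols; j++) {
                    min[j] = Math.min(min[j], row[j]);
                    max[j] = Math.max(max[j], row[j]);
                }
            }
        }

        /** New rows outside the fitted range map outside [0, 1]; nothing is clipped. */
        double[] transform(double[] row) {
            double[] out = new double[row.length];
            for (int j = 0; j < row.length; j++) {
                double range = max[j] - min[j];
                // A constant column has no range; map it to 0 instead of dividing by zero
                out[j] = range == 0 ? 0.0 : (row[j] - min[j]) / range;
            }
            return out;
        }
    }
}
```

With the 150 iris rows, round(150 × 0.8) gives 120 training rows and 30 test rows, which matches the 30 test instances in the evaluation output that follows.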

 

Output

Evaluation: 
Correctly Classified Instances          29               96.6667 %
Incorrectly Classified Instances         1                3.3333 %
Kappa statistic                          0.9497
K&B Relative Info Score               2783.763  %
K&B Information Score                   44.1136 bits      1.4705 bits/instance
Class complexity | order 0              47.6278 bits      1.5876 bits/instance
Class complexity | scheme                3.5142 bits      0.1171 bits/instance
Complexity improvement     (Sf)         44.1136 bits      1.4705 bits/instance
Mean absolute error                      0.046 
Root mean squared error                  0.1051
Relative absolute error                 10.3365 %
Root relative squared error             22.2694 %
Total Number of Instances               30     


The class name for the instance with petallength = 1.6 and petalwidth =0.2 is  Iris-setosa
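The first lines of the summary are easy to check by hand: 29 correct out of 30 test instances gives 29 / 30 = 96.6667 %. The kappa statistic compares that observed agreement p_o with the agreement p_e expected by chance from the row and column totals of the confusion matrix, kappa = (p_o - p_e) / (1 - p_e). The plain-Java sketch below computes both from a confusion matrix; SummaryStatsSketch is a hypothetical name, it assumes unweighted instances, and since the post does not show the actual confusion matrix it does not try to reproduce the 0.9497 above.

```java
// Hypothetical plain-Java sketch of two summary numbers; not Weka's Evaluation code.
// matrix[actual][predicted] counts test instances.
final class SummaryStatsSketch {

    /** Fraction of instances on the diagonal ("Correctly Classified Instances"). */
    static double accuracy(int[][] matrix) {
        int correct = 0, total = 0;
        for (int i = 0; i < matrix.length; i++) {
            for (int j = 0; j < matrix[i].length; j++) {
                total += matrix[i][j];
                if (i == j) {
                    correct += matrix[i][j];
                }
            }
        }
        return (double) correct / total;
    }

    /** Cohen's kappa; undefined (division by zero) when chance agreement is 1. */
    static double kappa(int[][] matrix) {
        int k = matrix.length;
        double total = 0;
        double[] rowSums = new double[k]; // actual class totals
        double[] colSums = new double[k]; // predicted class totals
        for (int i = 0; i < k; i++) {
            for (int j = 0; j < k; j++) {
                rowSums[i] += matrix[i][j];
                colSums[j] += matrix[i][j];
                total += matrix[i][j];
            }
        }
        double chance = 0;
        for (int i = 0; i < k; i++) {
            chance += (rowSums[i] / total) * (colSums[i] / total);
        }
        return (accuracy(matrix) - chance) / (1 - chance);
    }
}
```

For instance, the made-up matrix {{10, 0, 0}, {0, 9, 1}, {0, 0, 10}} also has 29 of 30 correct, but its kappa is 0.95, because chance agreement depends on how the classes are distributed in the test set.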