Machine Learning with WEKA

WEKA: A Machine Learning Toolkit

Eibe Frank
Department of Computer Science,
University of Waikato, New Zealand

• The Explorer
  - Classification and Regression
  - Clustering
  - Association Rules
  - Attribute Selection
  - Data Visualization
• The Experimenter
• The Knowledge Flow GUI
• Conclusions
WEKA: the bird

Copyright: Martin Kramer (mkramer@wxs.nl)


12/8/2008 University of Waikato 2
WEKA: the software
• Machine learning/data mining software written in Java (distributed under the GNU General Public License)
• Used for research, education, and applications
• Complements “Data Mining” by Witten & Frank
• Main features:
  - Comprehensive set of data pre-processing tools, learning algorithms and evaluation methods
  - Graphical user interfaces (incl. data visualization)
  - Environment for comparing learning algorithms

WEKA: versions
• There are several versions of WEKA:
  - WEKA 3.0: “book version” compatible with the description in the data mining book
  - WEKA 3.2: “GUI version” adds graphical user interfaces (book version is command-line only)
  - WEKA 3.3: “development version” with lots of improvements
• This talk is based on the latest snapshot of WEKA 3.3 (soon to be WEKA 3.4)

WEKA only deals with “flat” files
@relation heart-disease-simplified

@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}

@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
...
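
To make the "flat file" idea concrete, here is a minimal Python sketch that reads the ARFF sample above into plain dictionaries. It is not WEKA's own loader: the function name `parse_arff` is hypothetical, and the sketch assumes only the constructs shown on the slide (`@relation`, `@attribute` with `numeric` or a `{...}` list of nominal values, `@data`, and `?` for a missing value). The elided `...` rows are left out.

```python
SAMPLE = """\
@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
"""


def parse_arff(text):
    """Parse the small ARFF subset shown on the slide (hypothetical helper)."""
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blanks and comments
            continue
        lower = line.lower()
        if in_data:
            values = [v.strip() for v in line.split(",")]
            row = {}
            for (name, kind), value in zip(attributes, values):
                if value == "?":            # "?" marks a missing value
                    row[name] = None
                elif kind == "numeric":
                    row[name] = float(value)
                else:                       # nominal: keep the label as text
                    row[name] = value
            rows.append(row)
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):        # nominal attribute: list of labels
                kind = [v.strip() for v in spec.strip("{}").split(",")]
            else:
                kind = spec.lower()
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


relation, attributes, rows = parse_arff(SAMPLE)
print(relation, len(attributes), "attributes,", len(rows), "instances")
```

Every instance ends up as one row with one value per attribute, which is what "flat" means here.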
Explorer: pre-processing the data
• Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary
• Data can also be read from a URL or from an SQL database (using JDBC)
• Pre-processing tools in WEKA are called “filters”
• WEKA contains filters for:
  - Discretization, normalization, resampling, attribute selection, transforming and combining attributes, …
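
As a rough illustration of what two of these filters do, the Python sketch below applies min-max normalization and equal-width discretization to the three known cholesterol values from the ARFF sample. The function names are hypothetical, and WEKA's filters offer more options than shown; this only sketches the basic idea.

```python
def normalize(values):
    """Rescale numeric values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # constant attribute: avoid dividing by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def discretize(values, bins):
    """Map numeric values to equal-width bin indices 0 .. bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    if width == 0:
        return [0 for _ in values]
    # The maximum value would land in bin `bins`, so clamp it to the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in values]


cholesterol = [233.0, 286.0, 229.0]   # known values from the sample data
print(normalize(cholesterol))
print(discretize(cholesterol, bins=3))
```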

Explorer: building “classifiers”
• Classifiers in WEKA are models for predicting nominal or numeric quantities
• Implemented learning schemes include:
  - Decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes’ nets, …
• “Meta”-classifiers include:
  - Bagging, boosting, stacking, error-correcting output codes, locally weighted learning, …
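
To give a feel for one of the scheme families listed above, here is a minimal Python sketch of an instance-based classifier (k-nearest neighbours with majority voting). It is an illustration only, not WEKA's implementation; the function name `knn_predict` and the toy training points are made up for this example.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict the label of `query` from its k nearest training instances."""
    # Sort training instances by Euclidean distance to the query point.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    # Majority vote among the k closest labels.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Toy two-attribute data (illustrative values only).
train = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b"),
]
print(knn_predict(train, (1.1, 1.0)))
print(knn_predict(train, (5.0, 5.2)))
```

An instance-based learner stores the training data and defers all work to prediction time, which is why there is no separate training step in the sketch.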

Explorer: clustering data
• WEKA contains “clusterers” for finding groups of similar instances in a dataset
• Implemented schemes are:
  - k-Means, EM, Cobweb, X-means, FarthestFirst
• Clusters can be visualized and compared to “true” clusters (if given)
• Evaluation is based on log-likelihood if the clustering scheme produces a probability distribution
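
The first scheme in the list, k-means, can be sketched in a few lines of Python. This is a simplified illustration, not WEKA's clusterer: the function name `k_means` is hypothetical, the initial centroids are simply the first k points (real implementations usually pick them randomly), and the toy points are made up.

```python
import math


def k_means(points, k, iterations=10):
    """Alternate between assigning points and recomputing centroids."""
    centroids = [points[i] for i in range(k)]   # naive initialization (assumption)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid's cluster.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        updated = []
        for c in range(k):
            if clusters[c]:
                size = len(clusters[c])
                updated.append(tuple(sum(axis) / size for axis in zip(*clusters[c])))
            else:
                updated.append(centroids[c])    # keep an empty cluster's centroid
        centroids = updated
    assignments = [
        min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
    ]
    return centroids, assignments


points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.2, 4.8), (0.8, 1.2), (4.8, 5.2)]
centroids, assignments = k_means(points, k=2)
print(centroids, assignments)
```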

Explorer: finding associations
• WEKA contains an implementation of the Apriori algorithm for learning association rules
  - Works only with discrete data
• Can identify statistical dependencies between groups of attributes:
  - milk, butter ⇒ bread, eggs (with confidence 0.9 and support 2000)
• Apriori can compute all rules that have a given minimum support and exceed a given confidence
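
The two measures attached to the example rule can be computed directly. The Python sketch below counts support as the number of transactions containing all items of a rule (matching the count-style figure on the slide) and confidence as the fraction of antecedent transactions that also contain the consequent. It covers only these measures, not Apriori's candidate generation; the function names and the five toy transactions are made up for illustration.

```python
def support(transactions, items):
    """Number of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= set(t))


def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent) / support(antecedent)."""
    base = support(transactions, antecedent)
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base if base else 0.0


# Toy market-basket data (illustrative only).
transactions = [
    {"milk", "butter", "bread", "eggs"},
    {"milk", "butter", "bread", "eggs"},
    {"milk", "butter", "bread"},
    {"milk", "eggs"},
    {"bread"},
]
print(support(transactions, {"milk", "butter"}))
print(confidence(transactions, {"milk", "butter"}, {"bread", "eggs"}))
```

A rule is reported only if its support reaches the chosen minimum and its confidence exceeds the chosen threshold.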

Explorer: attribute selection
• Panel that can be used to investigate which (subsets of) attributes are the most predictive ones
• Attribute selection methods contain two parts:
  - A search method: best-first, forward selection, random, exhaustive, genetic algorithm, ranking
  - An evaluation method: correlation-based, wrapper, information gain, chi-squared, …
• Very flexible: WEKA allows (almost) arbitrary combinations of these two
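
One such combination, the ranking search paired with information gain as the evaluation method, can be sketched in Python as follows. This is an illustration rather than WEKA's code: the function names are hypothetical, and the four rows reuse the nominal values from the ARFF sample shown earlier.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Reduction in class entropy obtained by splitting on `attribute`."""
    labels = [r[target] for r in rows]
    remainder = 0.0
    for value in set(r[attribute] for r in rows):
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder


def rank_attributes(rows, attributes, target):
    """Evaluation method: information gain. Search method: simple ranking."""
    return sorted(attributes, key=lambda a: information_gain(rows, a, target), reverse=True)


rows = [
    {"sex": "male", "exercise_induced_angina": "no", "class": "not_present"},
    {"sex": "male", "exercise_induced_angina": "yes", "class": "present"},
    {"sex": "male", "exercise_induced_angina": "yes", "class": "present"},
    {"sex": "female", "exercise_induced_angina": "no", "class": "not_present"},
]
print(rank_attributes(rows, ["sex", "exercise_induced_angina"], "class"))
```

Keeping the evaluator and the search as separate functions mirrors the two-part design described above: either part could be swapped without touching the other.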

Explorer: data visualization
• Visualization very useful in practice: e.g. helps to determine difficulty of the learning problem
• WEKA can visualize single attributes (1-d) and pairs of attributes (2-d)
  - To do: rotating 3-d visualizations (Xgobi-style)
• Color-coded class values
• “Jitter” option to deal with nominal attributes (and to detect “hidden” data points)
• “Zoom-in” function
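
The idea behind the jitter option can be sketched in Python: a small random offset is added to each plotted coordinate so that points sharing the same value no longer sit exactly on top of each other. The function name `jitter`, the `amount` parameter, and the fixed seed are assumptions made for this illustration; WEKA's own option may work differently in detail.

```python
import random


def jitter(values, amount, seed=0):
    """Add uniform noise in [-amount, +amount] to each plotted coordinate."""
    rng = random.Random(seed)          # fixed seed keeps the plot reproducible
    return [v + rng.uniform(-amount, amount) for v in values]


# Three instances share the same nominal value (encoded as position 1),
# so without jitter they would be drawn as a single point.
positions = [1.0, 1.0, 1.0, 2.0]
print(jitter(positions, amount=0.1))
```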
Performing experiments
• Experimenter makes it easy to compare the performance of different learning schemes
  - For classification and regression problems
• Results can be written into file or database
• Evaluation options: cross-validation, learning curve, hold-out
• Can also iterate over different parameter settings
• Significance-testing built in!
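
Two of the ingredients above, cross-validation splits and a significance test on paired results, can be sketched in Python. This is a generic illustration, not the Experimenter's code: the function names are hypothetical, the accuracy figures are made up, and the plain paired t statistic shown here is not necessarily the test WEKA applies.

```python
import math
import statistics


def k_fold_indices(n, k):
    """Yield (train, test) index lists for k-fold cross-validation."""
    folds = [list(range(i, n, k)) for i in range(k)]   # round-robin folds
    for test in folds:
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        yield train, test


def paired_t_statistic(scores_a, scores_b):
    """t statistic for per-fold score differences between two schemes."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    if sd == 0:                        # identical differences in every fold
        return 0.0 if mean == 0 else math.copysign(math.inf, mean)
    return mean / (sd / math.sqrt(len(diffs)))


for train, test in k_fold_indices(6, 3):
    print("train", train, "test", test)

# Made-up per-fold accuracies for two hypothetical schemes.
print(paired_t_statistic([0.90, 0.80, 0.85], [0.80, 0.70, 0.80]))
```

Both schemes must be evaluated on the same folds for the pairing to be meaningful; the resulting statistic is then compared against a t distribution with (number of folds - 1) degrees of freedom.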

The Knowledge Flow GUI
• New graphical user interface for WEKA
• Java-Beans-based interface for setting up and running machine learning experiments
• Data sources, classifiers, etc. are beans and can be connected graphically
• Data “flows” through components: e.g., “data source” -> “filter” -> “classifier” -> “evaluator”
• Layouts can be saved and loaded again later
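
The flow “data source” -> “filter” -> “classifier” -> “evaluator” can be mimicked in Python by chaining plain functions. This only sketches the dataflow idea and has nothing to do with the Java Beans machinery itself. All component names are hypothetical, the cholesterol threshold of 250 is an arbitrary rule chosen for illustration, and the four (cholesterol, class) pairs come from the ARFF sample shown earlier.

```python
def run_flow(source, *stages):
    """Pass the output of each component to the next one in the chain."""
    data = source()
    for stage in stages:
        data = stage(data)
    return data


def data_source():
    # (cholesterol, class) pairs from the sample; None marks the missing value.
    return [(233.0, "not_present"), (286.0, "present"),
            (229.0, "present"), (None, "not_present")]


def drop_missing(rows):
    """Filter component: remove instances with a missing attribute value."""
    return [r for r in rows if r[0] is not None]


def threshold_classifier(rows):
    """Classifier component: an arbitrary fixed rule, for illustration only."""
    return [("present" if value > 250 else "not_present", actual)
            for value, actual in rows]


def accuracy_evaluator(pairs):
    """Evaluator component: fraction of predictions matching the true class."""
    return sum(predicted == actual for predicted, actual in pairs) / len(pairs)


print(run_flow(data_source, drop_missing, threshold_classifier, accuracy_evaluator))
```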

Conclusion: try it yourself!
• WEKA is available at http://www.cs.waikato.ac.nz/ml/weka
  - The site also has a list of projects based on WEKA
• WEKA contributors:
  Abdelaziz Mahoui, Alexander K. Seewald, Ashraf M. Kibriya, Bernhard Pfahringer, Brent Martin, Peter Flach, Eibe Frank, Gabi Schmidberger, Ian H. Witten, J. Lindgren, Janice Boughton, Jason Wells, Len Trigg, Lucio de Souza Coelho, Malcolm Ware, Mark Hall, Remco Bouckaert, Richard Kirkby, Shane Butler, Shane Legg, Stuart Inglis, Sylvain Roy, Tony Voyle, Xin Xu, Yong Wang, Zhihai Wang
