Navneet Goyal
Introduction
What if I tell you that you can achieve better accuracy with fewer labeled examples, if the algorithm is allowed to choose the data from which it learns?
Introduction
Majority of ML tasks fall under:
Supervised Learning (e.g., Classification)
Unsupervised Learning (e.g., Clustering & Model Building)
Both are forms of Passive Learning
Introduction
One of the most resource-intensive tasks is gathering data!
In most cases, we have limited resources
for collecting data
Try to make the best use of these
resources
Randomly collected data instances are
independent & identically distributed (iid)
Can we guide the sampling process?
Introduction
In most cases, data is abundantly
available
Mails, images, videos, songs, speeches,
documents, ratings, tweets, etc.
Which of these are different from the others?
Mails & ratings
Labeled data is freely available
Others?
Labeled instances are very difficult, time-consuming, and expensive to obtain
Introduction
Some Examples where labeled data is
hard to come by:
Speech Recognition
Document Classification
Image & Video annotation
Introduction
Speech Recognition
Accurate labeling of speech utterances is
extremely time consuming and requires
trained linguists
Annotation at the word level can take ten
times longer than the actual audio (e.g.,
one minute of speech takes ten minutes to
label), and annotating phonemes can take
400 times as long (e.g., nearly seven hours)
The problem is compounded for rare
languages or dialects
The labeling bottleneck
Active learning systems attempt to overcome it by querying an oracle (e.g., a human annotator) for the labels of carefully selected instances
Introduction
Document classification
Large pool of unlabelled documents
available
Randomly pick documents to be
labeled manually
OR
Carefully choose (or query) from the pool the documents that are to be labeled
Introduction
Parameter estimation and structure discovery tasks
E.g., studying lung cancer in a medical setting, starting from a preliminary list of the ages and smoking habits of patients
Active Learning
We need not fix our desired queries in
advance
Instead, we can choose our next query
based upon the answers to our
previous queries
The process of guiding the sampling
process by querying for certain types of
instances based upon the data that we
have seen so far is called active
learning
Active Learning
An interesting analogy!
A passive learner is a student who gathers information by sitting and listening to a lecture, whereas an active learner asks questions
Active Learning
The core difference between an active learner and a passive learner is the ability to ask queries
Active Learning
The key hypothesis is that if the learner is allowed to choose the data from which it learns, it will perform better with less training
Active Learning
ML algorithms choose the training data they learn from
Active Learning
Also called Query Learning in ML, and Optimal Experimental Design in Statistics
The learner guides training by querying unlabeled data
What kinds of queries?
How are queries formulated?
Query strategy frameworks
A toy example: learning a 1-D threshold function (a decision stump)
A passive learner will be presented with n labeled examples and will produce a predictor that minimizes the number of disagreements
That is, the learner could choose f such that:
|{1 ≤ i ≤ n : f(xi) ≠ yi}| is minimum
*Algorithms for Active Learning
Daniel Joseph Hsu, Columbia Univ. Dissertation, 2010
At every step, the binary search can make a query (label request) that results in labeling (for free) at least half of the other unlabeled points. Viewed another way, each query eliminates at least half of the potential classifiers still in contention.
We crucially made the assumption that the labels yi = f(xi) correspond to some threshold function f
The binary search for b pretends that all points on the same side of a queried positive example share its label (and likewise for negative examples), which is only valid because of that threshold assumption
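The binary-search strategy above can be sketched as a short program. This is a minimal illustration, not code from the dissertation; the "positive at or above b" convention and all names here are assumptions:

```python
# Active learning of a 1-D threshold classifier by binary search.
# Assumption: labels come from a hidden threshold b, with points at or
# above b labeled positive. A passive learner would need all n labels;
# binary search recovers the boundary with O(log n) label queries.

def binary_search_threshold(points, oracle):
    """Return (index of first positive point, number of label queries).

    points must be sorted in increasing order."""
    queries = 0
    lo, hi = 0, len(points)        # invariant: first positive index lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1               # one label request to the oracle
        if oracle(points[mid]):    # positive: boundary is at mid or to its left
            hi = mid
        else:                      # negative: boundary is to the right of mid
            lo = mid + 1
    return lo, queries

# Usage: hidden threshold 0.62 on 1000 evenly spaced points.
points = [i / 1000 for i in range(1000)]
boundary, n_queries = binary_search_threshold(points, lambda x: x >= 0.62)
print(boundary, n_queries)   # boundary index 620, found with ~10 queries, not 1000
```

Each query halves the interval that can still contain the boundary, which is exactly the "eliminates at least half of the classifiers in contention" argument above.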
Membership Query Synthesis (Angluin 1988)
The learner may request labels for any instance in the input space, including instances synthesized by the learner itself
Well suited to some problems
But labeling such arbitrary instances can be awkward if the oracle is a human annotator
E.g., when human oracles were used to train an ANN to classify handwritten characters, many of the query images generated by the learner contained no recognizable symbols
Stream-Based Selective Sampling
Assumes that obtaining an unlabeled instance is free or inexpensive
Each instance is first sampled from the actual distribution, and the learner then decides whether or not to request its label
How to decide whether to query an instance?
Use an informativeness measure or query strategy
Region of uncertainty
The part of the instance space that is still ambiguous to the learner
Query only those instances that fall in this region
Pool-Based Sampling
Queries are selected from a large static pool of unlabeled instances, typically by evaluating the entire set
Request labels for 1 or more carefully selected instances
Focus on difficult-to-label tuples
Analogy with Boosting?
Focus on the most informative instance
A greedy approach?
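The "most informative instance" in this greedy approach is usually scored with an uncertainty measure. Below is a sketch of three standard measures, assuming some model has already produced class-probability estimates for each instance; the function names are illustrative, not from the slides:

```python
import math

# Three common informativeness scores for uncertainty sampling, computed
# from a model's predicted class probabilities for a single instance.
# Higher score = model is less sure = more informative to query.

def least_confidence(probs):
    # 1 minus the probability of the most likely class
    return 1.0 - max(probs)

def margin_score(probs):
    # negative gap between the top two classes: a small gap means high uncertainty
    top, second = sorted(probs, reverse=True)[:2]
    return -(top - second)

def entropy_score(probs):
    # Shannon entropy of the predicted class distribution
    return -sum(p * math.log(p) for p in probs if p > 0)

# Usage: a greedy learner would query the second instance under all three measures.
confident = [0.90, 0.05, 0.05]
uncertain = [0.40, 0.35, 0.25]
print(entropy_score(confident) < entropy_score(uncertain))   # True
```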
The informativeness measure is used to decide which instance to query next
Newly labeled instances are added to the labeled set
When is the stream-based approach more appropriate??
When memory or processing power is limited, as with mobile and embedded devices
Potential of Active Learning
Learning Curves
Active learning algorithms are evaluated by plotting accuracy as a function of the amount of labeled training data
The model is re-trained using the new training data
The process repeats until we have no budget left for getting labels or we have attained the desired accuracy!
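The loop just described (query, label, re-train, repeat until the budget runs out) can be sketched end to end on a toy 1-D threshold task. Everything here is illustrative, not a real library API:

```python
# Pool-based active learning loop on a toy 1-D threshold task:
# query the instance nearest the current decision boundary, get its label
# from the oracle, add it to the labeled set, re-train, repeat on budget.

def train(labeled):
    """Toy model: threshold halfway between the largest known negative
    point and the smallest known positive point."""
    neg = [x for x, y in labeled if y == 0]
    pos = [x for x, y in labeled if y == 1]
    return (max(neg) + min(pos)) / 2

def active_learn(pool, oracle, budget):
    pool = sorted(pool)
    # seed the labeled set with the two extreme points (one of each class here)
    labeled = [(pool[0], oracle(pool[0])), (pool[-1], oracle(pool[-1]))]
    unlabeled = pool[1:-1]
    b = train(labeled)
    for _ in range(budget):              # stop when the label budget runs out
        if not unlabeled:
            break
        # greedy query: the instance the model is least certain about,
        # i.e. the one closest to the current boundary
        x = min(unlabeled, key=lambda u: abs(u - b))
        unlabeled.remove(x)
        labeled.append((x, oracle(x)))   # one label request to the oracle
        b = train(labeled)               # re-train with the new training data
    return b

# Usage: hidden threshold 0.637; 10 queries pin it down to within ~0.01,
# instead of labeling all 101 points in the pool.
pool = [i / 100 for i in range(101)]
b = active_learn(pool, oracle=lambda x: int(x >= 0.637), budget=10)
```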
[Figure: uncertainty sampling — query the instances the model is most uncertain about. Courtesy: Irina Rish, IBM T.J. Watson Research Center]
The hypotheses consistent with the current labeled set form the version space
In AL, we try to constrain the size of the version space as much as possible
Why?
So that the search can be more
precise with as few labeled
instances as possible
A Query-By-Committee learner must:
Be able to construct a committee of models
E.g., via Boosting & Bagging
[Figure: uncertainty sampling illustration. Courtesy: Irina Rish, IBM T.J. Watson Research Center]
How do we choose the committee of classifiers/hypotheses?
Sample a set of classifiers from a distribution
Natural for ensemble methods, which are already samples
Random Forests, Bagged classifiers, etc.
Measures of disagreement
Entropy of predicted responses (vote entropy)
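Vote entropy, a standard disagreement measure for such a committee, can be sketched as follows. This assumes the committee members are already trained (e.g., the trees of a Random Forest); names are illustrative:

```python
import math
from collections import Counter

# Query-by-committee disagreement via vote entropy: each committee member
# casts a vote (a predicted class) for the instance. Unanimous votes give
# entropy 0; evenly split votes give the maximum, so the active learner
# queries the instances whose votes are most evenly split.

def vote_entropy(votes):
    """votes: list of class labels predicted by the committee members."""
    n = len(votes)
    counts = Counter(votes)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

# Usage: a unanimous 4-member committee vs. an evenly split one.
agree = vote_entropy(["spam", "spam", "spam", "spam"])   # no disagreement
split = vote_entropy(["spam", "ham", "spam", "ham"])     # maximal for 2 classes
print(agree < split)   # True
```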
Web Searching
A Web-based company wishes to gather labeled pages to train its search engine
Personalized Email Filter
The user wishes to create a personalized filter for spam or interesting mail
Relevance Feedback
The user wishes to sort through a large collection of documents for those of interest
Active Learning
Happy ACTIVE LEARNING from now
on!!