Active Learning from Evolving Streaming Data
Indrė Žliobaitė ([email protected]), STRC, Bournemouth University
13 June 2012

www.infer.eu

Joint work with
● Albert Bifet
● Geoff Holmes
● Bernhard Pfahringer

Žliobaitė, I., Bifet, A., Pfahringer, B., Holmes, G. (2011). Active Learning with Evolving Streaming Data. Proc. of the 21st European Conf. on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD'11), Springer LNCS 6913, p. 597-612.

setting

Data stream mining

Chemical production plant: given sensor readings, predict the quality of the output; 24/7 plant operation.

The process changes over time, but the model does not.

source: Evonik Industries

Examples of data streams
● Sensor data
● Web data (logs, content)
● Activity data

Mining data streams
● Data
  – arrives in real time, potentially infinitely
  – is changing over time
  – not possible to store everything; discard (or archive) after processing

Requirements for predictive models
● operate in less than the example arrival time
● fit into strictly limited memory
● adapt to changing data (update/retrain online)
  – otherwise accuracy will degrade over time

Predictive models for data streams
Receive an example → MODEL → predict

Predictive models for data streams
Receive an example → MODEL → predict → receive true label → update the model

Predictive models for data streams
Receive another example → UPDATED MODEL → predict → receive true label → save the model, discard the example

Predictive models for data streams
Receive an example → MODEL → predict → receive true label → update the model

It is unreasonable to ask for feedback at every iteration: labels may be costly or infeasible to obtain due to
– human labour (text, images)
– laboratory tests
– destructive tests

Active learning for data streams
Receive an example → MODEL → predict
→ are resources for labelling available? (no → continue with the next example)

Active learning for data streams
Receive an example → MODEL → predict
→ are resources for labelling available? (no → continue with the next example)
→ yes: do we need THIS label? (no → continue with the next example)
→ yes: receive true label, update the model, save the model, discard the example

Active learning for data streams
The decision must be made now or never. ACTIVE LEARNING STRATEGIES can answer the question "do we need THIS label?"; they can be used as a wrapper with a learning model of the user's choice.

Problem setting summary
● Supervised learning
● Evolving (changing) streaming data
● Models need to adapt to changes over time
● For adapting, feedback is needed
● True labels may be costly or infeasible to obtain
● We need to decide whether to ask for the true label for an example now or never

active learning strategies for data streams

How to decide whether to ask for the true label for a given example?

Do we need this label?
● ask for all
● ask for some → at random
● select actively → fixed threshold / our strategies


Random strategy (naive)
● Receive example Xt
● If a random draw falls within the budget B, ask for the true label yt

[Figure: uniform random sampling over the original instance space]

Random strategy (naive)
● Receive example Xt
● If a random draw falls within the budget B, ask for the true label yt

Drawback: slow to learn.
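As a hedged illustration, the random strategy above can be sketched in a few lines of Python; the function names and the budget value are our own, not from the slides:

```python
import random

def make_random_strategy(budget, rng=None):
    """Naive random strategy: ask for the true label of each incoming
    example with probability equal to the labelling budget B."""
    rng = rng or random.Random(1)  # fixed seed for reproducibility
    def should_label(x_t):
        # uniform random sampling, independent of the example itself
        return rng.random() <= budget
    return should_label

should_label = make_random_strategy(budget=0.3)
decisions = [should_label(None) for _ in range(10_000)]
fraction_labelled = sum(decisions) / len(decisions)
# in expectation, about 30% of the examples get labelled
```

Because the choice ignores the example entirely, the labelled sample stays i.i.d. with the stream, but the learner is slow: most of the budget goes to examples the model already classifies well.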


Online active learning in the data stream setting?
● Online setting
  – fix a threshold (e.g. an uncertainty threshold)
  – check every incoming example against the threshold
  – if over the threshold, ask for the true label

Fixed uncertainty
● Receive example Xt and a prediction y*t
● If labelling budget is available [u/t < B]
  – If uncertainty of Xt is greater than the threshold [P(y*t|Xt) < K]
    · ask for the true label yt
    · update the model with (Xt, yt), u = u + 1

[Figure: original instance space with a fixed uncertainty threshold]
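A minimal Python sketch of the slide's pseudocode, where `confidence` stands for P(y*t|Xt); the names and parameter values are ours:

```python
def fixed_uncertainty(confidence, threshold, budget, state):
    """Fixed-uncertainty strategy: query the true label when the
    classifier's confidence P(y*|x) falls below a fixed threshold K,
    as long as the spent fraction of labels u/t stays within budget B.
    `state` holds the counters u (labels used) and t (examples seen)."""
    state["t"] += 1
    if state["u"] / state["t"] < budget and confidence < threshold:
        state["u"] += 1   # a label was spent; the caller would now
        return True       # receive y_t and update the model with (x_t, y_t)
    return False

state = {"u": 0, "t": 0}
decisions = [fixed_uncertainty(c, threshold=0.7, budget=0.5, state=state)
             for c in [0.95, 0.55, 0.95, 0.55]]
# confident predictions (0.95) are skipped; uncertain ones (0.55) are queried
```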

Online active learning in the data stream setting?

PROBLEMS for streaming data: data is changing and models need to evolve; if the threshold is fixed, the model becomes confident, stops learning, fails to notice changes, and fails to adapt.

What is needed?
● In data streams, changes may happen at any time
● Requirement 1
  – we should ask for labels over time in a balanced way

[Figure: classifier certainty vs. time against a fixed threshold. Budget: resources to label up to 1/3 of the incoming examples. Early on, examples fall below the threshold and need labels but no resources are left; later the classifier is very certain, nothing is labelled, and unused labelling resources accumulate. Very certain, but not necessarily accurate, as data evolves.]

What is needed?
● In data streams, changes may happen at any time
● Requirement 1
  – we should ask for labels over time in a balanced way
● Changes may happen anywhere in the instance space
● Requirement 2
  – given enough time, we should ask for the label of any data point
  – otherwise, we may never detect changes in some regions, and the model will never adapt

[Figure: all data vs. the region labelled by a fixed threshold; changes in the regions where the classifier is very certain should not be missed]


What is needed?
● Requirement 1
  – we should ask for labels over time in a balanced way
  – we propose: adaptive threshold
● Requirement 2
  – given enough time, we should ask for the label of any data point
  – we propose: add randomization to the threshold

Adaptive uncertainty strategy
● Receive example Xt and a prediction y*t
● If labelling budget is available [u/t < B]
  – If uncertainty of Xt is greater than the threshold [P(y*t|Xt) < K]
    · ask for the true label yt
    · update the model with (Xt, yt), increment the budget counter u = u + 1
    · shrink the threshold [K = K(1 − s)]
  – else
    · expand the threshold [K = K(1 + s)]

Requirement 1: balances the labelling budget over infinite time.
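The shrink/expand loop above can be sketched as follows; the starting threshold K = 1.0, the step s = 0.1, and the example confidences are illustrative choices of ours:

```python
def variable_uncertainty(confidences, budget=0.5, step=0.1, K=1.0):
    """Adaptive uncertainty strategy from the slide: shrink the threshold
    K by (1 - s) after each query and expand it by (1 + s) when the model
    was too certain, so labelling effort is balanced over time."""
    u, decisions = 0, []
    for t, conf in enumerate(confidences, start=1):
        asked = False
        if u / t < budget:                # labelling budget still available
            if conf < K:                  # uncertain enough: spend a label
                u += 1
                K *= (1 - step)
                asked = True
            else:                         # too certain: loosen the threshold
                K *= (1 + step)
        decisions.append(asked)
    return decisions, K

decisions, K = variable_uncertainty([0.9, 0.9, 0.9, 0.9])
```

Note the self-correcting behaviour: after a query the threshold tightens, and after a skipped confident example it loosens again, which is what keeps the spending rate u/t near the budget B indefinitely.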

Randomized uncertainty
● Receive example Xt and a prediction y*t
● If labelling budget is available [u/t < B]
  – If uncertainty of Xt is greater than a randomized threshold [P(y*t|Xt) < Krandomized, where Krandomized = K·v, v ~ N(1, d)]
    · ask for the true label yt
    · update the predictive model with (Xt, yt), u = u + 1
    · shrink the threshold [K = K(1 − s)]
  – else
    · expand the threshold [K = K(1 + s)]

Requirement 2: balances labelling to cover the instance space.
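A sketch of the randomized variant; the parameter values (d = 0.3, the seed, the stream of confidences) are illustrative assumptions:

```python
import random

def randomized_uncertainty(confidences, budget=0.5, step=0.1,
                           d=0.3, rng=None):
    """Uncertainty with randomization: compare each confidence against
    K * v with v ~ N(1, d), so that, given enough time, even regions
    where the classifier is very certain get labelled occasionally."""
    rng = rng or random.Random(42)
    K, u, decisions = 1.0, 0, []
    for t, conf in enumerate(confidences, start=1):
        asked = False
        if u / t < budget:
            K_rand = K * rng.gauss(1.0, d)   # randomized threshold
            if conf < K_rand:
                u += 1
                K *= (1 - step)
                asked = True
            else:
                K *= (1 + step)
        decisions.append(asked)
    return decisions

# even a stream of near-certain predictions is queried now and then,
# which is what lets the strategy notice changes anywhere in the space
decisions = randomized_uncertainty([0.99] * 200)
```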

Properties of the techniques

                                            Random   Fixed uncertainty   Adaptive uncertainty   Randomized uncertainty
Requirement 1: balancing over time          yes      no                  handled                handled
Requirement 2: coverage of instance space   full     partial             partial                full
Training data distribution                  iid      biased              biased                 biased

Summary of strategies: Random, Fixed uncertainty, Adaptive (variable) uncertainty, Randomized variable uncertainty (RandVarUncertainty).

empirical results

MOA
● {M}assive {O}nline {A}nalysis is a framework for online learning from data streams
● It is closely related to WEKA
● It includes a collection of online and offline algorithms and tools for evaluation
  – classification
  – clustering
● Easy to extend
● Easy to design and run experiments

Experimental evaluation
● Strategies: random sampling, fixed uncertainty, adaptive uncertainty, randomized (adaptive) uncertainty, selective uncertainty
● Adaptive learner: DDM (Gama et al., 2004)
● Evaluation: accuracy over a dataset, accuracy over time
● Datasets
  – synthetic (hyperplane)
  – real-life textual with our labels (IMDB-E, IMDB-D, Reuters)
  – real-life with expected changes (Electricity, Cover type, Airlines)

The results demonstrate the advantages of our strategies over a fixed threshold and random sampling in data stream settings where data is evolving.

REUTERS data, start of the stream

[Figure: geometric accuracy over time (0–2000) for the randomized uncertainty, adaptive uncertainty, random, and fixed uncertainty strategies]

Fixed uncertainty becomes very confident in its predictions and adapts slowly.

REUTERS data, stable period

[Figure: geometric accuracy over time (3500–5000) for the four strategies]

Fixed uncertainty and adaptive uncertainty do not waste labelling budget querying very certain examples, and are thus more accurate when there are no changes in the data.

REUTERS data, after a change

[Figure: geometric accuracy over time (7000–9500) for the four strategies]

Fixed uncertainty fails to adapt; strategies with randomization adapt faster.

conclusion

Conclusion
● We explore active learning in the strict data stream setting
● We equip active learning strategies with mechanisms to
  – trade off labelling some of the uncertain examples for labelling very confident examples, in order to capture changes anywhere in the input space
● Empirical results suggest that our strategies
  – control the distribution of the labelling budget over infinite time
  – have an advantage in accuracy over a fixed threshold and random sampling in data stream settings where data evolves over time
● Adaptive uncertainty is preferred when mild changes are expected; randomized uncertainty is preferred for data with strong changes

Thanks!

Acknowledgements Part of the research leading to these results has received funding from the EC within the Marie Curie Industry and Academia Partnerships and Pathways (IAPP) programme under grant agreement no. 251617.
