Active Learning from Evolving Streaming Data
Indrė Žliobaitė ([email protected])
STRC, Bournemouth University
13 June 2012

www.infer.eu

Joint work with Albert Bifet, Geoff Holmes, and Bernhard Pfahringer.

Žliobaitė, I., Bifet, A., Pfahringer, B., Holmes, G. (2011). Active Learning with Evolving Streaming Data. In Proc. of the 21st European Conf. on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD'11), Springer LNCS 6913, pp. 597-612.

setting

Data stream mining

Chemical production plant: given sensor readings, predict the quality of the output; the plant operates 24/7.

The model does not change, but the process changes.

(source: Evonik Industries)

Examples of data streams

● sensor data
● web data (logs, content)
● activity data

Mining data streams

Data:
● arrives in real time, potentially infinitely
● is changing over time
● cannot all be stored; discard (or archive) each example after processing

Requirements for predictive models:
● operate in less than the example arrival time
● fit into strictly limited memory
● adapt to changing data (update/retrain online); otherwise accuracy will degrade over time

Predictive models for data streams

For each incoming example:
● receive the example
● predict
● receive the true label
● update the model
● save the (updated) model, discard the example
● repeat with the next example

It is unreasonable to ask for feedback at every iteration: labels may be costly or infeasible to obtain due to
● human labour (text, images)
● laboratory tests
● destructive tests

Active learning for data streams

For each incoming example:
● receive the example and predict
● are resources for labelling available? if no: save the model, discard the example
● if yes: do we need THIS label, now or never? if no: save the model, discard the example
● if yes: ask for and receive the true label, update the model

ACTIVE LEARNING STRATEGIES can answer the 'do we need THIS label?' question; they can be used as a wrapper with a learning model of the user's choice.
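To make the wrapper idea concrete, here is a minimal Python sketch of this loop. All names here (predict_proba, update, oracle, strategy.query) are illustrative assumptions for this sketch, not the paper's or MOA's API; the strategy objects are sketched on the following slides.

```python
# A minimal sketch of the active-learning wrapper loop, assuming a
# hypothetical online model with predict_proba()/update() and an oracle()
# that returns a costly true label on request.
def stream_active_learning(stream, model, strategy, budget, oracle):
    queried = 0                                  # u: labels requested so far
    for t, x in enumerate(stream, start=1):
        p_max = max(model.predict_proba(x))      # certainty of the prediction
        # now-or-never: budget left AND the strategy wants THIS label
        if queried / t < budget and strategy.query(p_max):
            y = oracle(x)                        # ask for the true label
            model.update(x, y)                   # online update of the model
            queried += 1
        # save the model, discard the example
```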

Problem setting summary
● supervised learning
● evolving (changing) streaming data
● models need to adapt to changes over time
● for adapting, feedback is needed
● true labels may be costly or infeasible to obtain
● we need to decide whether to ask for the true label of an example now or never

active learning strategies for data streams

How to decide whether to ask for the true label for a given example?

Do we need this label?
● ask for all
● ask for some:
– at random
– select actively: with a fixed threshold, or with our strategies

Random strategy (naive)
● receive example Xt
● if ζ < B, where ζ is drawn uniformly at random from [0,1], ask for the true label yt

This is uniform random sampling over the original instance space; it is slow to learn.
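A minimal sketch of the random strategy, using the query(p_max) interface assumed in the loop sketch above; class and parameter names are hypothetical:

```python
import random

class RandomStrategy:
    """Label each incoming example with probability B, ignoring its certainty."""
    def __init__(self, budget):
        self.budget = budget                     # B in (0, 1]

    def query(self, p_max):
        # uniform random sampling: the decision does not depend on the example
        return random.random() < self.budget
```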

Do we need this label? → next branch: select actively, with a fixed threshold.

Online active learning in the data stream setting?

Online setting:
● fix a threshold (e.g. an uncertainty threshold)
● check every incoming example against the threshold
● if the uncertainty is over the threshold, ask for the true label

Fixed uncertainty
● receive example Xt and a prediction y*t
● if labelling budget is available [u/t < B]:
– if the uncertainty of Xt is greater than the threshold [P(y*t|Xt) < K]:
● ask for the true label yt
● update the model with (Xt, yt); u = u + 1

[Illustration: the original instance space; only examples below the fixed threshold are labelled.]
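A sketch of the fixed-threshold rule under the same assumed interface (hypothetical names; K = 0.9 is an arbitrary example value):

```python
class FixedUncertainty:
    """Query when the posterior of the predicted class is below a fixed K."""
    def __init__(self, threshold=0.9):
        self.threshold = threshold               # K, fixed for all time

    def query(self, p_max):
        return p_max < self.threshold            # P(y*|X) < K: uncertain
```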

PROBLEMS for streaming data: the data is changing and models need to evolve; if the threshold is fixed, the model becomes confident, stops learning, fails to notice changes, and fails to adapt.

What is needed?

In data streams, changes may happen at any time.

Requirement 1: we should ask for labels over time in a balanced way.

[Illustration: certainty vs. time against a fixed threshold, with a budget to label up to 1/3 of the incoming examples. Early on, many examples fall below the threshold: there is a need to label, but no resources. Later the model becomes very certain (but not necessarily accurate, as the data evolves): nothing is labelled, and the available labelling resources accumulate unused.]

What is needed?

In data streams, changes may happen at any time (Requirement 1: ask for labels over time in a balanced way), and changes may happen anywhere.

Requirement 2: given enough time, we should ask for the label of any data point; otherwise we may never detect changes in some regions, and the model will never adapt there.

[Illustration: changes occur across all of the data, but a fixed threshold labels only part of the instance space.] Changes in the regions where the classifier is very certain should not be missed.

Do we need this label? → next branch: select actively, with our strategies.

What is needed?
● Requirement 1: ask for labels over time in a balanced way → we propose: an adaptive threshold
● Requirement 2: given enough time, ask for the label of any data point → we propose: add randomization to the threshold

Adaptive uncertainty strategy
● receive example Xt and a prediction y*t
● if labelling budget is available [u/t < B]:
– if the uncertainty of Xt is greater than the threshold [P(y*t|Xt) < K]:
● ask for the true label yt
● update the model with (Xt, yt); increment the budget counter u = u + 1
● shrink the threshold [K = K(1 − s)]
– else:
● expand the threshold [K = K(1 + s)]

Requirement 1: balances the labelling budget over infinite time.
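A sketch of the adaptive threshold under the same assumed interface; the initial K and step s below are example values, not the paper's recommended settings:

```python
class VariableUncertainty:
    """Adapt K online: shrink after a query, expand otherwise (Requirement 1)."""
    def __init__(self, threshold=1.0, step=0.01):
        self.threshold = threshold               # K, adapted over time
        self.step = step                         # s, adaptation speed

    def query(self, p_max):
        if p_max < self.threshold:               # uncertain: ask for the label
            self.threshold *= 1 - self.step      # shrink: K = K(1 - s)
            return True
        self.threshold *= 1 + self.step          # expand: K = K(1 + s)
        return False
```

Shrinking after each query and expanding otherwise makes the threshold track the budget: if too many examples look uncertain, K drops and fewer are queried, and vice versa.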

Randomized uncertainty
● receive example Xt and a prediction y*t
● if labelling budget is available [u/t < B]:
– if the uncertainty of Xt is greater than a randomized threshold [P(y*t|Xt) < Krandomized, where Krandomized = Kv, v ~ N(1, d)]:
● ask for the true label yt
● update the predictive model with (Xt, yt); u = u + 1
● shrink the threshold [K = K(1 − s)]
– else:
● expand the threshold [K = K(1 + s)]

Requirement 2: balances labelling to cover the instance space.
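A sketch of the randomized variant under the same assumed interface; parameter names and defaults are illustrative:

```python
import random

class RandomizedUncertainty:
    """Compare certainty against a randomized threshold K*v, v ~ N(1, d),
    then adapt K as in the adaptive strategy (Requirements 1 and 2)."""
    def __init__(self, threshold=1.0, step=0.01, delta=1.0):
        self.threshold = threshold               # K
        self.step = step                         # s, adaptation speed
        self.delta = delta                       # d, randomization strength

    def query(self, p_max):
        k_rand = self.threshold * random.gauss(1.0, self.delta)
        if p_max < k_rand:                       # P(y*|X) < K_randomized
            self.threshold *= 1 - self.step      # shrink: K = K(1 - s)
            return True
        self.threshold *= 1 + self.step          # expand: K = K(1 + s)
        return False
```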

Properties of the techniques

Strategy                  Requirement 1           Requirement 2              Training data
                          (balancing over time)   (instance space coverage)  distribution
Random                    yes                     full                       iid
Fixed uncertainty         no                      partial                    biased
Adaptive uncertainty      handled                 partial                    biased
Randomized uncertainty    handled                 full                       biased

Summary of strategies: Random, Fixed uncertainty, and RandVarUncertainty (variable threshold, with randomization).
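For illustration, the sketches above could be wired together as follows (stream, model, and ask_expert are placeholders for the reader's own components):

```python
# Hypothetical wiring of the strategy sketches into the stream loop.
strategies = {
    "random":     RandomStrategy(budget=0.1),
    "fixed":      FixedUncertainty(threshold=0.9),
    "adaptive":   VariableUncertainty(threshold=1.0, step=0.01),
    "randomized": RandomizedUncertainty(threshold=1.0, step=0.01, delta=1.0),
}
# for name, s in strategies.items():
#     stream_active_learning(stream, model, s, budget=0.1, oracle=ask_expert)
```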

empirical results

MOA

{M}assive {O}nline {A}nalysis is a framework for online learning from data streams:
● it is closely related to WEKA
● it includes a collection of online and offline algorithms, and tools for evaluation (classification, clustering)
● easy to extend
● easy to design and run experiments

Experimental evaluation
● Strategies: random sampling, fixed uncertainty, adaptive uncertainty, randomized (adaptive) uncertainty, selective uncertainty
● Adaptive learner: DDM (Gama et al., 2004)
● Evaluation: accuracy over a dataset, accuracy over time
● Datasets:
– synthetic (hyperplane)
– real-life textual with our labels (IMDB-E, IMDB-D, Reuters)
– real-life with expected changes (Electricity, Cover type, Airlines)

The results demonstrate the advantages of our strategies over a fixed threshold and random sampling in data stream settings where the data is evolving.

[Figure: REUTERS data, start of the stream. Geometric accuracy over time (0–2000) for the randomized uncertainty, adaptive uncertainty, random, and fixed uncertainty strategies.]

Fixed uncertainty becomes very confident in its predictions and adapts slowly.

[Figure: REUTERS data, stable period. Geometric accuracy over time (3500–5000) for the same strategies.]

Fixed uncertainty and adaptive uncertainty do not waste labelling budget on querying very certain examples, and are thus more accurate when there are no changes in the data.

[Figure: REUTERS data, around change 1. Geometric accuracy over time (7000–9500) for the same strategies.]

Fixed uncertainty fails to adapt; strategies with randomization adapt faster.

conclusion

Conclusion
● We explore active learning in the strict data stream setting
● We equip active learning strategies with mechanisms to:
– control the distribution of the labelling budget over infinite time
– trade off labelling some of the uncertain examples for labelling very confident examples, in order to capture changes anywhere in the input space
● Empirical results suggest that our strategies have an advantage in accuracy over a fixed threshold and random sampling in data stream settings where data evolves over time
● Adaptive uncertainty is preferred when mild changes are expected; randomized uncertainty is preferred for data with strong changes

Thanks!

Acknowledgements: part of the research leading to these results has received funding from the EC within the Marie Curie Industry and Academia Partnerships and Pathways (IAPP) programme under grant agreement no. 251617.
