Vowpal Wabbit 2015

Kai-Wei Chang, Markus Cozowicz, Hal Daume, Luong Hoang, TK Huang, John Langford
http://hunch.net/~vw/
git clone git://github.com/JohnLangford/vowpal_wabbit.git

Why does Vowpal Wabbit exist?

1. Prove research.
2. Curiosity.
3. Perfectionist.
4. Solve problem better.

A user base becomes addictive

1. Mailing list of >400
2. The official strawman for large scale logistic regression @ NIPS :-)

An example

    wget http://hunch.net/~jl/VW_raw.tar.gz
    vw -c rcv1.train.raw.txt -b 22 --ngram 2 --skips 4 -l 0.25 --binary

provides stellar performance in 12 seconds.
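The -b 22 flag in the command above sets the number of bits in the feature table, so every feature is hashed into one of 2^22 weight slots. The following is a minimal Python sketch of that idea only. It uses zlib.crc32 as a stand-in hash (VW uses its own hash function), and the helper name feature_index and the "namespace^feature" key layout are hypothetical.

```python
import zlib

def feature_index(namespace: str, feature: str, bits: int = 22) -> int:
    """Map a (namespace, feature) pair to a slot in a table of 2**bits weights."""
    key = f"{namespace}^{feature}".encode("utf-8")
    # Stand-in hash for illustration; VW itself does not use crc32.
    return zlib.crc32(key) & ((1 << bits) - 1)

table_size = 1 << 22                  # number of weight slots with -b 22
idx = feature_index("w", "monster")   # always falls inside the table
```

A larger bits value gives a larger table and fewer collisions between distinct features, at the cost of memory.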

Surface details

1. BSD license, automated test suite, github repository.
2. VW supports all I/O modes: executable, library, port, daemon, service (see next).
3. VW has a reasonable++ input format: sparse, dense, namespaces, etc...
4. Mostly C++, but bindings in other languages of varying maturity (python, C#, Java good).
5. A substantial user base + developer base. Thanks to many who have helped.

What does Vowpal Wabbit do well?

Older:

1. Online learning. Support for real online learning.
2. Parallelization. Via allreduce.
3. Scalable solutions. Logarithmic time prediction!

Newer:

1. Problem Framing.
2. Learning lifecycle.

What does VW not do well?

1. GPU training.
2. Representational flexibility.

Next

1. Learning to Search (Hal/John/Kai-Wei)
2. Active Learning (TK)
3. System Integration (Markus)
4. Client side Decision Service (Luong)

What are joint predictions?

Task: Machine Translation
Input: Ces deux principes se tiennent à la croisée de la philosophie, de la politique, de l’économie, de la sociologie et du droit.
Output: Both principles lie at the crossroads of philosophy, politics, economics, sociology, and law.

Task: Sequence Labeling
Input: The monster ate a big sandwich
Output: Det Noun Verb Det Adj Noun

Task: Syntactic Analysis
Input: The monster ate a big sandwich
Output: [figure]

Task: 3d point cloud classification
Input: 3d range scan data
Output: [figure]

...many more...

Hal Daumé III ([email protected])

VW learning to search

Structured Prediction Haiku

A joint prediction
Across a single input
Loss measured jointly
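To make "loss measured jointly" concrete, here is a minimal Python sketch that scores a whole predicted tag sequence against the gold sequence at once. Hamming loss is used as one common choice of joint loss; the slides do not say which loss the speakers have in mind, so treat that choice and the function name hamming_loss as assumptions.

```python
def hamming_loss(predicted, gold):
    """Count positions where the predicted label differs from the gold label."""
    if len(predicted) != len(gold):
        raise ValueError("sequences must have the same length")
    return sum(p != g for p, g in zip(predicted, gold))

# Tags for "The monster ate a big sandwich".
gold = ["Det", "Noun", "Verb", "Det", "Adj", "Noun"]
pred = ["Det", "Noun", "Verb", "Det", "Noun", "Noun"]
loss = hamming_loss(pred, gold)  # the tag for "big" is the only mismatch
```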

We want to minimize...

- Programming complexity. Most joint prediction problems are not addressed using structured learning because of programming complexity.
- Test loss. If it doesn't work, game over.
- Training time. Debug/develop productivity, hyperparameter tuning, maximum data input.
- Test time. Application efficiency.

Programming complexity


Python interface to VW

- Library interface to VW (not a command line wrapper)
- It is actually documented!!!
- Allows you to write code like:

    import pyvw
    vw = pyvw.vw("--quiet")
    ex1 = vw.example("1 |x a b |y c")
    ex2 = vw.example({'x': ['a', ('b', 1.0)], 'y': ['c']})
    ex1.learn()
    print(ex2.predict())
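The two example constructors above take either a VW text line or a dictionary keyed by namespace. As a rough illustration of how the dictionary form lines up with the text form, here is a pure-Python sketch that renders such a dictionary as a text line. It is not pyvw's implementation; the function name to_vw_line is hypothetical, and the mapping of a (name, value) tuple to name:value is an assumption based on the VW text format.

```python
def to_vw_line(features, label=None):
    """Render {namespace: [feature or (feature, value)]} as one VW text example."""
    parts = [] if label is None else [str(label)]
    for namespace, feats in features.items():
        tokens = []
        for f in feats:
            if isinstance(f, tuple):
                name, value = f
                tokens.append(f"{name}:{value}")   # explicit feature value
            else:
                tokens.append(str(f))              # value left implicit
        parts.append("|" + namespace + " " + " ".join(tokens))
    return " ".join(parts)

line = to_vw_line({'x': ['a', ('b', 1.0)], 'y': ['c']}, label=1)
```

Here line is the string "1 |x a b:1.0 |y c", which has the same shape as the text example passed to vw.example above.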


iPython Notebook for Learning to Search

http://tinyurl.com/pyvwsearch
http://tinyurl.com/pyvwtalk
http://tinyurl.com/lolstalk2

State of the art accuracy in....

Part of speech tagging (1 million words)

  System    Code               Time
  wc        -                  3.2 seconds
  vw        6 lines of code    10 seconds to train
  CRFsgd    1068 lines         6 minutes
  CRF++     777 lines          hours

Named entity recognition (200 thousand words)

  System    Code               Time
  wc        -                  0.8 seconds
  vw        30 lines of code   5 seconds to train
  CRFsgd    -                  1 minute (subopt accuracy)
  CRF++     -                  10 minutes (subopt accuracy)
  SVMstr    876 lines          30 minutes (subopt accuracy)


Training time versus test accuracy

[Figure: training time versus test accuracy, with an annotated gap due to predicting independently.]

Test time speed

Possibly the fastest test-time prediction out there, and without “label dictionary” hacks.

Command-line usage

    % wget http://bilbo.cs.illinois.edu/~kchang10/tmp/wsj.vw.zip
    % unzip wsj.vw.zip
    % vw -b 24 -d wsj.train.vw -c --search_task sequence \
         --search 45 --search_neighbor_features -1:w,1:w \
         --affix -1w,+1w -f wsj.weights
    % vw -t -i wsj.weights wsj.test.vw

Identifying Relationship between Words

I ate a cake with a fork

Dependency Parser in VW

- # lines of code ~ 300
- [Arxiv 15a]: Learning to search dependencies

Shift-Reduce Parser

- Maintain a buffer and a stack
- Make predictions from left to right
- Three types of actions: Shift, Reduce-Left, Reduce-Right

[Figure: example on "I ate a cake": a Shift step, then a Reduce-Left step after which "I" appears as a child of "ate".]
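The buffer, stack, and three actions can be sketched in a few lines of Python. This is an illustrative transition system, not the code of the VW dependency parser: the exact meaning given to Reduce-Left and Reduce-Right below is an assumption chosen to be consistent with the "I ate a cake" example, and the function name parse is hypothetical.

```python
def parse(words, actions):
    """Apply shift-reduce actions; return {child_position: head_position} (0 = root)."""
    stack = []
    buffer = list(range(1, len(words) + 1))  # 1-based word positions
    heads = {}
    for action in actions:
        if action == "shift":
            # Move the next buffer word onto the stack.
            stack.append(buffer.pop(0))
        elif action == "reduce_left":
            # Assumed: top of stack becomes a child of the first buffer word.
            heads[stack.pop()] = buffer[0]
        elif action == "reduce_right":
            # Assumed: top of stack becomes a child of the item below it,
            # or of the root (0) when it is the only item left.
            child = stack.pop()
            heads[child] = stack[-1] if stack else 0
        else:
            raise ValueError(f"unknown action: {action}")
    return heads

words = ["I", "ate", "a", "cake"]
actions = ["shift", "reduce_left", "shift", "shift", "reduce_left",
           "shift", "reduce_right", "reduce_right"]
heads = parse(words, actions)
```

With this action sequence, "I" and "cake" attach to "ate", "a" attaches to "cake", and "ate" attaches to the root. In a learned parser the action at each step would be predicted from features of the stack and buffer rather than supplied by hand.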

Features

- Lexicon & POS tags of ...
  - top three words in the stack,
  - first three words in the buffer,
  - and their children
- Combination (quadratic, cubic) of features

[Figure: partial parse over the words "I", "ate", "a", "cake".]

Run the Parser

- Under demo/dependencyparsing
- Data:

    2 2 2:nmod|w ms. |p nnp
    3 5 3:sub|w haag |p nnp
    0 8 0:root|w plays |p vbz
    3 7 3:obj|w piano|p nn
    3 4 3:p|w . |p .

  Sentence: Ms. Haag plays piano .
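For readers who want to inspect data in this shape, here is a small Python sketch that pulls the head position, relation, word, and POS tag out of each line. The slide does not explain the two leading numeric fields, so the sketch ignores them and reads only the "head:relation" tag and the w and p namespaces; that interpretation, and the function name parse_line, are assumptions.

```python
def parse_line(line):
    """Split one example line into (head, relation, word, pos)."""
    label_part, *namespaces = line.split("|")
    # Assumed: the last label token has the form "head:relation".
    head, relation = label_part.split()[-1].split(":")
    fields = {}
    for ns in namespaces:
        name, *tokens = ns.split()
        fields[name] = tokens[0]
    return int(head), relation, fields["w"], fields["p"]

data = [
    "2 2 2:nmod|w ms. |p nnp",
    "3 5 3:sub|w haag |p nnp",
    "0 8 0:root|w plays |p vbz",
    "3 7 3:obj|w piano|p nn",
    "3 4 3:p|w . |p .",
]
rows = [parse_line(line) for line in data]
```

Under this reading, "ms." attaches to word 2 ("haag"), "haag" attaches to word 3 ("plays"), and "plays" is the root.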

Active Learning in VW

Streaming Selective Sampling

Repeat:

1. Receive a new x ∼ D_X (i.i.d. from the source).
2. Query for label? Yes/no
3. If yes, obtain label y.

[Diagram: a source streams i.i.d. examples x1, x2, x3 to the learner; the learner asks the labeler "label of x1?" and receives y1; another example is marked "No query".]

Goal: Maximize classifier accuracy per label query

Key step: query decision
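The receive, decide, and query loop can be sketched as follows. The query rule here (ask only when the current margin is small) is a simple hypothetical stand-in, not the rule VW applies under --active, and the additive update, the function name selective_sampling, and the toy data are likewise assumptions made for illustration.

```python
def selective_sampling(stream, labeler, threshold=0.5, lr=0.25):
    """Streaming loop: receive x, decide whether to query, update if labeled."""
    w = [0.0, 0.0]
    queries = 0
    for x in stream:
        margin = w[0] * x[0] + w[1] * x[1]
        # Hypothetical query rule: ask only when the prediction is uncertain.
        if abs(margin) < threshold:
            y = labeler(x)           # label is +1 or -1
            queries += 1
            # Simple additive update on every labeled example.
            w = [w[0] + lr * y * x[0], w[1] + lr * y * x[1]]
        # Otherwise: no query and no update.
    return w, queries

# Toy stream: two alternating points, labeled by the sign of the first coordinate.
stream = [(1.0, 0.0), (-1.0, 0.0)] * 50
labeler = lambda x: 1 if x[0] > 0 else -1
w, queries = selective_sampling(stream, labeler)
```

On this toy stream the learner asks for only 2 of the 100 labels, because after two updates every margin reaches the threshold. Real data would need a more careful query rule to keep accuracy high per label.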

Active Learning in VW: Simulation Mode

    vw --binary --active --simulation --mellowness 0.01 labeled.data

--mellowness: small value leads to few label queries

    vw --binary --active --cover 10 --mellowness 0.01 train.data

--cover: number of classifiers used to measure uncertainty about the label. Use a large -b (e.g. 29) with a large --cover (e.g. 50).

Active Learning in VW: Simulation Mode

[Figure: titanic dataset. Test error (about 0.22 to 0.32) versus number of label queries (10^1 to 10^3) for Passive, Active, and Active Cover.]

Active Learning in VW: Interactive Mode

    vw --active --port 6075 --mellowness 0.01

--port: port number VW is listening on

    python utl/active_interactor.py -v -m -o labeled.dat localhost 6075 unlabeled.dat

nuget.org

    using (var vw = new VowpalWabbit("--quiet"))
    {
        vw.Learn("1 |f 13:3.9 24:3.4 69:4.6");
        var prediction = vw.Predict(
            "|f 13:3.9 24:3.4 69:4.6",
            VowpalWabbitPredictionType.Scalar);
        vw.SaveModel("output.model");
    }

    public class MyExample
    {
        [Feature(FeatureGroup = 'p')]
        public float Income { get; set; }

        [Feature(Enumerize = true)]
        public int Age { get; set; }
    }

    new MyExample { Income = 40, Age = 25 }
    -> "|p Income:40.0 | Age25"

    using (var vw = new VowpalWabbit(""))
    {
        var ex = new MyExample { Income = 40, Age = 25 };
        var label = new SimpleLabel { Label = 1 };
        vw.Learn(ex, label);
        var prediction = vw.Predict(ex, VowpalWabbitPredictionType.Scalar);
    }

    var vwModel = new VowpalWabbitModel("-t -i m1.model");
    using (var pool = new VowpalWabbitThreadedPrediction(vwModel))
    {
        // thread-safe
        using (var vw = pool.GetOrCreate())
        {
            // vw.Value is not thread-safe
            vw.Value.Predict(example);
        }

        // thread-safe
        pool.UpdateModel(new VowpalWabbitModel("-t -i m2.model"));
    }

VowpalWabbitThreadedLearning

[Diagram: VowpalWabbitThreadedLearning with several VowpalWabbitAsync feeders and multiple VW instances.]

- Examples distributed uniform or round robin
- AllReduce every N examples
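The scheme named on this slide, round-robin distribution of examples with a periodic AllReduce, can be pictured with a toy Python loop. This is only a sketch of the general pattern: the use of averaging as the reduction, the one-weight squared-loss update, and the function name train_round_robin are all assumptions and do not describe how the C# wrapper or VW implements it.

```python
def train_round_robin(examples, n_workers=2, sync_every=4, lr=0.1):
    """Toy data-parallel loop: per-worker weights, averaged every sync_every examples."""
    weights = [0.0] * n_workers           # one scalar weight per worker
    for i, (x, y) in enumerate(examples, start=1):
        k = (i - 1) % n_workers           # round-robin assignment
        # Toy squared-loss gradient step on the assigned worker only.
        weights[k] += lr * (y - weights[k] * x) * x
        if i % sync_every == 0:
            # Stand-in for AllReduce: every worker receives the average.
            mean = sum(weights) / n_workers
            weights = [mean] * n_workers
    return weights

weights = train_round_robin([(1.0, 1.0)] * 8)
```

Between synchronization points the workers drift apart on their own examples; after each synchronization they all hold the same weights again.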

    var settings = new VowpalWabbitSettings(
        parallelOptions: new ParallelOptions { MaxDegreeOfParallelism = 16 },
        exampleCountPerRun: 2000);

    using (var vw = new VowpalWabbitThreadedLearning(settings))
    {
        using (var vwFeeder = vw.Create())
        {
            var prediction = await vwFeeder.Learn(example, label, VowpalWabbitPredictionType.Scalar);
        }

        await vw.Complete();
    }




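[Diagram: Decision Service architecture. Components: User Application, Client Library, Join Server, AzureML, Command Center, User Storage. Labeled flows: "action, prob, context, key"; "reward, key"; "joined data"; "model"; "settings". Interaction steps: arrives, chooses, responds.]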

    . . .
    var serviceConfig = new DecisionServiceConfiguration(
        authorizationToken: MwtServiceToken,
        explorer: new EpsilonGreedyExplorer(. . .));

    var service = new DecisionService(serviceConfig);

    uint topicId = service.ChooseAction(uniqueKey: userId, context: userContext);
    . . .
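The client code above plugs in an epsilon-greedy explorer, and the architecture records an action together with its probability and a key. A minimal Python sketch of epsilon-greedy exploration that produces such records follows. It is not the Decision Service client library: the function name epsilon_greedy, the record layout, and the fixed default action are hypothetical, and the context is left out for brevity.

```python
import random

def epsilon_greedy(default_action, n_actions, epsilon, rng):
    """Return (action, probability) under epsilon-greedy exploration."""
    if rng.random() < epsilon:
        action = rng.randrange(1, n_actions + 1)   # explore uniformly
    else:
        action = default_action                    # follow the default policy
    # Probability that this scheme picks the chosen action.
    prob = epsilon / n_actions
    if action == default_action:
        prob += 1.0 - epsilon
    return action, prob

rng = random.Random(0)
log = []
for key in ["user-1", "user-2", "user-3"]:
    action, prob = epsilon_greedy(default_action=1, n_actions=4, epsilon=0.2, rng=rng)
    log.append({"key": key, "action": action, "prob": prob})
```

Recording the probability alongside the action is what later allows the logged data, once joined with rewards by key, to be used for evaluating or training other policies.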


Further Pointers

- Learning to Search tutorial: http://hunch.net/~l2s
- Talks on Decision Service this afternoon: 2:30 @Learning Systems, 4:30 @Adaptive Learning
- More details: http://aka.ms/mwt
- Mailing list: [email protected]
