Cluster-parallel learning with VW

Goals for future from last year

1. Finish scaling up. I want a kilonode program.
2. Native learning reductions. Just like more complicated losses.
3. Other learning algorithms, as interest dictates.
4. Persistent Demonization

Goals for future from last year

1. Finish scaling up. I want a kilonode program.

Some design considerations

Hadoop compatibility: Widely available, scheduling and robustness
Iteration-friendly: Lots of iterative learning algorithms exist
Minimum code overhead: Don't want to rewrite learning algorithms from scratch
Balance communication/computation: Imbalance on either side hurts the system
Scalable: John has nodes aplenty

Current system provisions

Hadoop-compatible AllReduce
Various parameter averaging routines
Parallel implementation of adaptive GD, CG, L-BFGS
Robustness and scalability tested up to 1K nodes and thousands of node hours

Basic invocation on single machine

./spanning_tree
../vw --total 2 --node 0 --unique_id 0 -d $1 --span_server localhost > node_0 2>&1 &
../vw --total 2 --node 1 --unique_id 0 -d $1 --span_server localhost
killall spanning_tree

Command-line options

--span_server: Location of server for setting up spanning tree
--unique_id (=0): Unique id for cluster parallel job
--total (=1): Total number of nodes used in cluster parallel job
--node (=0): Node id in cluster parallel job

Basic invocation on a non-Hadoop cluster

Spanning-tree server: Runs on cluster gateway, organizes communication
./spanning_tree
Worker nodes: Each worker node runs VW
./vw --span_server <server> --total <n> --node <id> --unique_id <job id> -d <data>

Basic invocation in a Hadoop cluster

Spanning-tree server: Runs on cluster gateway, organizes communication
./spanning_tree
Map-only jobs: Map-only job launched on each node using Hadoop streaming
hadoop jar $HADOOP_HOME/hadoop-streaming.jar -Dmapred.job.map.memory.mb=2500 -input <input> -output <output> -file vw -file runvw.sh -mapper 'runvw.sh' -reducer NONE
Each mapper runs VW
Model stored in <output>/model on HDFS
runvw.sh calls VW, used to modify VW arguments

mapscript.sh example

# Hadoop streaming has no option for the number of mappers, so we set it indirectly
total=<input size in bytes>
mapsize=`expr $total / $nmappers`
maprem=`expr $total % $nmappers`
mapsize=`expr $mapsize + $maprem`
# Start the spanning-tree server on the gateway
./spanning_tree
# Note the argument mapred.min.split.size to control the number of mappers
hadoop jar $HADOOP_HOME/hadoop-streaming.jar -Dmapred.min.split.size=$mapsize -Dmapred.map.tasks.speculative.execution=true -input $in_directory -output $out_directory -file ../vw -file runvw.sh -mapper runvw.sh -reducer NONE

Communication and computation

Two main additions in cluster-parallel code:
Hadoop-compatible AllReduce communication
New and old optimization algorithms modified for AllReduce

Communication protocol

Spanning-tree server runs as a daemon and listens for connections
Workers connect via TCP with a node-id and job-id
Two workers with the same job-id and node-id are duplicates; the faster one is kept (speculative execution)
Node-id and job-id are available as mapper environment variables in Hadoop:
mapper=`printenv mapred_task_id | cut -d "_" -f 5`
mapred_job_id=`echo $mapred_job_id | tr -d 'job'`

Communication protocol contd.

Each worker connects to the spanning-tree server
Server creates a spanning tree on the n nodes, communicates parent and children to each node
Node connects to parent and children via TCP
AllReduce is run on the spanning tree

AllReduce

Every node begins with a number (vector); after AllReduce every node ends up with the sum.

[Figure: 7-node spanning tree with root 1, internal nodes 2 and 3, and leaves 4-7. Values are summed up the tree (internal nodes hold the partial sums 11 and 16, the root holds 28), then the total 28 is broadcast back down so every node holds 28.]

Extends to other functions: max, average, gather, ...
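To make the reduce-then-broadcast pattern concrete, here is a minimal single-process sketch of AllReduce-as-sum on the 7-node tree above. The names (TreeNode, reduce_up, broadcast_down, allreduce_sum) are illustrative only; the real system exchanges these values over the TCP connections set up by spanning_tree rather than following pointers within one process.

// Single-process sketch of AllReduce on a spanning tree: sum values up the
// tree, then broadcast the total back down so every node holds it.
#include <cstdio>
#include <vector>

struct TreeNode {
  double value;                 // this node's local number (a vector in VW)
  std::vector<TreeNode*> kids;  // children in the spanning tree
};

// Phase 1: each node adds its children's subtotals to its own value.
double reduce_up(TreeNode& n) {
  for (TreeNode* k : n.kids) n.value += reduce_up(*k);
  return n.value;
}

// Phase 2: the global total is pushed back down the tree.
void broadcast_down(TreeNode& n, double total) {
  n.value = total;
  for (TreeNode* k : n.kids) broadcast_down(*k, total);
}

double allreduce_sum(TreeNode& root) {
  double total = reduce_up(root);
  broadcast_down(root, total);
  return total;
}

int main() {
  // The 7-node tree from the figure: root 1, internal nodes 2 and 3, leaves 4-7.
  TreeNode n4{4, {}}, n5{5, {}}, n6{6, {}}, n7{7, {}};
  TreeNode n2{2, {&n4, &n5}}, n3{3, {&n6, &n7}};
  TreeNode n1{1, {&n2, &n3}};
  std::printf("every node now holds %g\n", allreduce_sum(n1));  // prints 28
  return 0;
}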

AllReduce Examples

Counting: n = allreduce(1)
Average: avg = allreduce(n_i) / allreduce(1)
Non-uniform averaging: weighted_avg = allreduce(n_i * w_i) / allreduce(w_i)
Gather: node_array = allreduce({0, 0, ..., 1, ..., 0}), with node i contributing the 1 in position i

Current code provides 3 routines:
accumulate(): Computes vector sums
accumulate_scalar(): Computes scalar sums
accumulate_avg(): Computes weighted and unweighted averages
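As a toy illustration of the identities above, the sketch below simulates the per-node values in one process; allreduce() here is a hypothetical stand-in for a scalar sum across all nodes (what accumulate_scalar() provides over the spanning tree).

// Toy illustration of counting and (weighted) averaging via AllReduce sums.
#include <cstdio>
#include <vector>

// Hypothetical in-process stand-in for an AllReduce sum of one scalar per node.
double allreduce(const std::vector<double>& one_value_per_node) {
  double s = 0;
  for (double v : one_value_per_node) s += v;
  return s;
}

int main() {
  std::vector<double> n_i = {100, 300, 200};  // examples held by each of 3 nodes
  std::vector<double> w_i = {1.0, 2.0, 1.0};  // per-node weights

  double nodes = allreduce({1.0, 1.0, 1.0});  // counting: every node contributes 1
  double avg = allreduce(n_i) / nodes;        // plain average

  std::vector<double> nw(n_i.size());
  for (size_t i = 0; i < n_i.size(); ++i) nw[i] = n_i[i] * w_i[i];
  double weighted_avg = allreduce(nw) / allreduce(w_i);  // non-uniform average

  std::printf("nodes=%g avg=%g weighted_avg=%g\n", nodes, avg, weighted_avg);
  return 0;
}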

Machine learning with AllReduce

Previously: Single node SGD, multiple passes over data
Parallel: Each node runs SGD, averages parameters after every pass (or more often!)
Code change:
if(global.span_server != "") {
  if(global.adaptive)
    accumulate_weighted_avg(global.span_server, params->reg);
  else
    accumulate_avg(global.span_server, params->reg, 0);
}
Weighted averages are computed for adaptive updates, which weight features differently
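A minimal sketch of this scheme, assuming two simulated nodes in one process: each node runs an SGD pass on its own shard, and average_weights() stands in for the accumulate_avg() call over the spanning tree.

// Sketch: parallel SGD with parameter averaging after every pass (simulated).
#include <cstdio>
#include <utility>
#include <vector>

using Weights = std::vector<double>;
using Example = std::pair<std::vector<double>, double>;  // (features, label)

// Hypothetical stand-in for accumulate_avg(): element-wise average over nodes.
Weights average_weights(const std::vector<Weights>& per_node) {
  Weights avg(per_node[0].size(), 0.0);
  for (const Weights& w : per_node)
    for (size_t j = 0; j < w.size(); ++j) avg[j] += w[j] / per_node.size();
  return avg;
}

// One local SGD pass with squared loss on this node's shard.
void sgd_pass(Weights& w, const std::vector<Example>& shard, double eta) {
  for (const Example& ex : shard) {
    double pred = 0;
    for (size_t j = 0; j < w.size(); ++j) pred += w[j] * ex.first[j];
    double err = pred - ex.second;
    for (size_t j = 0; j < w.size(); ++j) w[j] -= eta * err * ex.first[j];
  }
}

int main() {
  std::vector<std::vector<Example>> shards = {
      {{{1, 0}, 1.0}, {{0, 1}, 0.0}},   // node 0's data
      {{{1, 1}, 1.0}, {{0, 1}, 0.5}}};  // node 1's data
  std::vector<Weights> w(2, Weights(2, 0.0));

  for (int pass = 0; pass < 5; ++pass) {
    for (size_t node = 0; node < w.size(); ++node)
      sgd_pass(w[node], shards[node], 0.1);
    Weights avg = average_weights(w);  // the AllReduce step after every pass
    for (Weights& wn : w) wn = avg;    // every node continues from the average
  }
  std::printf("averaged weights: [%g, %g]\n", w[0][0], w[0][1]);
  return 0;
}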

Machine learning with AllReduce contd.

L-BFGS requires gradients and loss values
One call to AllReduce for each
Parallel synchronized L-BFGS updates
Same with CG, plus another AllReduce operation for the Hessian
Extends to many other common algorithms
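The same pattern for the batch optimizers, again as an in-process sketch: each node computes a local gradient and loss, one AllReduce sums each, and every node applies the identical update. allreduce_sum() is a hypothetical stand-in for accumulate(), and a plain gradient step replaces the real L-BFGS/CG update.

// Sketch: synchronized batch optimization with AllReduce-summed gradient/loss.
#include <cstdio>
#include <vector>

// Hypothetical stand-in for accumulate(): element-wise sum across nodes.
std::vector<double> allreduce_sum(const std::vector<std::vector<double>>& per_node) {
  std::vector<double> total(per_node[0].size(), 0.0);
  for (const std::vector<double>& v : per_node)
    for (size_t j = 0; j < v.size(); ++j) total[j] += v[j];
  return total;
}

int main() {
  const size_t nodes = 4, dim = 3;
  std::vector<double> w(dim, 0.0);  // identical weights on every node

  for (int iter = 0; iter < 10; ++iter) {
    std::vector<std::vector<double>> grads(nodes), losses(nodes);
    for (size_t n = 0; n < nodes; ++n) {
      // Toy local objective on node n: sum_j (w_j - n)^2 over its shard.
      grads[n].assign(dim, 0.0);
      double loss = 0;
      for (size_t j = 0; j < dim; ++j) {
        grads[n][j] = 2 * (w[j] - double(n));
        loss += (w[j] - double(n)) * (w[j] - double(n));
      }
      losses[n] = {loss};
    }
    std::vector<double> g = allreduce_sum(grads);   // one AllReduce for gradients
    std::vector<double> l = allreduce_sum(losses);  // one AllReduce for loss values
    for (size_t j = 0; j < dim; ++j) w[j] -= 0.05 * g[j];  // synchronized step
    std::printf("iter %d: global loss = %g\n", iter, l[0]);
  }
  return 0;
}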

Communication and computation

Two main additions in cluster-parallel code:
Hadoop-compatible AllReduce communication
New and old optimization algorithms modified for AllReduce

Hybrid optimization for rapid convergence

SGD converges fast initially, but is slow to squeeze out the final bit of precision
L-BFGS converges rapidly towards the end, once in a good region
Each node performs a few local SGD passes, averaging parameters after every pass
Switch to L-BFGS with synchronized iterations using AllReduce
Two calls to VW

Speedup

Near-linear speedup.

[Figure: speedup vs. number of nodes (10 to 100), showing roughly linear scaling.]

Hadoop helps

Naïve implementation is driven by the slowest node
Speculative execution ameliorates the problem

Table: Distribution of computing time (in seconds) over 1000 nodes. The first three columns are quantiles. The first row is without speculative execution; the second row is with speculative execution.

                      5%   50%   95%   Max   Comm. time
Without spec. exec.   29    34    60   758           26
With spec. exec.      29    33    49    63           10

Fast convergence

auPRC curves for two tasks, higher is better.

[Figure: two panels of auPRC vs. iteration (left: 0-50 iterations, right: 0-20 iterations), each comparing Online, L-BFGS w/ 5 online passes, L-BFGS w/ 1 online pass, and L-BFGS.]

Conclusions

AllReduce is quite general yet easy to use for machine learning
Marriage with Hadoop is great for robustness
Hybrid optimization strategies are effective for rapid convergence
John gets his kilonode program
