Goals for future from last year

1. Finish scaling up. I want a kilonode program.
2. Native learning reductions. Just like more complicated losses.
3. Other learning algorithms, as interest dictates.
4. Persistent Demonization

Some design considerations

Hadoop compatibility: Widely available, scheduling and robustness
Iteration-friendly: Lots of iterative learning algorithms exist
Minimum code overhead: Don't want to rewrite learning algorithms from scratch
Balance communication/computation: Imbalance on either side hurts the system
Scalable: John has nodes aplenty

Current system provisions

Hadoop-compatible AllReduce
Various parameter averaging routines
Parallel implementation of Adaptive GD, CG, L-BFGS
Robustness and scalability tested up to 1K nodes and thousands of node hours

Basic invocation on single machine

./spanning_tree
../vw --total 2 --node 0 --unique_id 0 -d $1 --span_server localhost > node_0 2>&1 &
../vw --total 2 --node 1 --unique_id 0 -d $1 --span_server localhost
killall spanning_tree

Command-line options

--span_server: Location of server for setting up spanning tree
--unique_id (=0): Unique id for cluster parallel job
--total (=1): Total number of nodes used in cluster parallel job
--node (=0): Node id in cluster parallel job

Basic invocation on a non-Hadoop cluster

Spanning-tree server: runs on the cluster gateway, organizes communication
./spanning_tree
Worker nodes: each worker node runs VW
./vw --span_server <server> --total <total> --node <node> --unique_id <id> -d <data>

Basic invocation in a Hadoop cluster

Spanning-tree server: runs on the cluster gateway, organizes communication
./spanning_tree
Map-only jobs: a map-only job is launched on each node using Hadoop streaming
hadoop jar $HADOOP_HOME/hadoop-streaming.jar -Dmapred.job.map.memory.mb=2500 -input <input> -output <output> -file vw -file runvw.sh -mapper 'runvw.sh' -reducer NONE
Each mapper runs VW
Model stored in <output>/model on HDFS
runvw.sh calls VW, and is used to modify VW arguments

mapscript.sh example

# Hadoop streaming has no option for the number of mappers, so we set it indirectly via the split size
total=            # total size of the input data
mapsize=`expr $total / $nmappers`
maprem=`expr $total % $nmappers`
mapsize=`expr $mapsize + $maprem`
# Start the spanning-tree server on the gateway
./spanning_tree
# Note the argument mapred.min.split.size, which determines the number of mappers
hadoop jar $HADOOP_HOME/hadoop-streaming.jar -Dmapred.min.split.size=$mapsize -Dmapred.map.tasks.speculative.execution=true -input $in_directory -output $out_directory -file ../vw -file runvw.sh -mapper runvw.sh -reducer NONE

Communication and computation

Two main additions in the cluster-parallel code:
Hadoop-compatible AllReduce communication
New and old optimization algorithms modified for AllReduce

Communication protocol

Spanning-tree server runs as a daemon and listens for connections
Workers connect via TCP with a node-id and job-id
Two workers with the same job-id and node-id are duplicates; the faster one is kept (speculative execution)
Node-id and job-id are available as mapper environment variables in Hadoop:
mapper=`printenv mapred_task_id | cut -d "_" -f 5`
mapred_job_id=`echo $mapred_job_id | tr -d 'job_'`

Communication protocol contd.

Each worker connects to the spanning-tree server
The server creates a spanning tree on the n nodes and communicates parent and children to each node
Each node connects to its parent and children via TCP
AllReduce is run on the spanning tree
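
For concreteness, here is a minimal sketch of one possible spanning-tree layout, assigning parents and children by node id as a complete binary tree; the topology the spanning_tree daemon actually builds may differ.

  #include <cstdio>

  int main() {
    const int n = 7;  // number of workers in the job (hypothetical)
    for (int id = 0; id < n; ++id) {
      int parent = (id == 0) ? -1 : (id - 1) / 2;   // -1 marks the root
      int left = 2 * id + 1, right = 2 * id + 2;    // children, when they exist
      printf("node %d: parent %d", id, parent);
      if (left < n)  printf(", child %d", left);
      if (right < n) printf(", child %d", right);
      printf("\n");
    }
    return 0;
  }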

AllReduce

Every node begins with a number (vector); after AllReduce, every node ends up with the sum.

[Figure: a spanning tree over 7 nodes holding the values 1-7. Partial sums flow up the tree (one subtree sums to 11, the other to 16, the root to 28), then the total 28 is broadcast back down so every node holds 28.]

Extends to other functions: max, average, gather, ...
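
To make the reduce-then-broadcast pattern concrete, here is a single-process sketch that replays the 7-node example above; it only illustrates the arithmetic, not VW's networked AllReduce.

  #include <cstdio>
  #include <vector>

  int main() {
    // Node i's children are 2i+1 and 2i+2; node 0 is the root.
    std::vector<double> value = {1, 2, 3, 4, 5, 6, 7};
    const int n = (int)value.size();

    // Reduce phase: fold children into parents, deepest nodes first.
    for (int i = n - 1; i > 0; --i)
      value[(i - 1) / 2] += value[i];

    // Broadcast phase: push the root's total back down to every node.
    for (int i = 1; i < n; ++i)
      value[i] = value[0];

    for (int i = 0; i < n; ++i)
      printf("node %d holds %g\n", i, value[i]);  // every node prints 28
    return 0;
  }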

AllReduce Examples

Counting: n = allreduce(1)
Average: avg = allreduce(n_i) / allreduce(1)
Non-uniform averaging: weighted_avg = allreduce(n_i * w_i) / allreduce(w_i)
Gather: node_array = allreduce({0, 0, ..., 1, ..., 0}), with the 1 in position i

Current code provides 3 routines:
accumulate(): Computes vector sums
accumulate_scalar(): Computes scalar sums
accumulate_avg(): Computes weighted and unweighted averages
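
A single-process sketch of the examples above: each entry of a vector stands for one node's local value, and a hypothetical allreduce_sum() returns the total every node would end up holding. This is not VW's accumulate API, and the per-node numbers are made up.

  #include <cstdio>
  #include <numeric>
  #include <vector>

  // Stand-in for AllReduce over scalars: sum of every node's contribution.
  double allreduce_sum(const std::vector<double>& per_node) {
    return std::accumulate(per_node.begin(), per_node.end(), 0.0);
  }

  int main() {
    std::vector<double> n = {10, 20, 30};  // a local statistic n_i on 3 nodes
    std::vector<double> w = {1, 1, 2};     // per-node weights w_i

    // Counting: every node contributes 1.
    double nodes = allreduce_sum(std::vector<double>(n.size(), 1.0));

    // Average: allreduce(n_i) / allreduce(1)
    double avg = allreduce_sum(n) / nodes;

    // Non-uniform averaging: allreduce(n_i * w_i) / allreduce(w_i)
    std::vector<double> nw(n.size());
    for (size_t i = 0; i < n.size(); ++i) nw[i] = n[i] * w[i];
    double weighted_avg = allreduce_sum(nw) / allreduce_sum(w);

    printf("nodes=%g avg=%g weighted_avg=%g\n", nodes, avg, weighted_avg);
    return 0;
  }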

Machine learning with AllReduce

Previously: single-node SGD, multiple passes over the data
Parallel: each node runs SGD, averages parameters after every pass (or more often!)
Code change:
if(global.span_server != "") {
  if(global.adaptive)
    accumulate_weighted_avg(global.span_server, params->reg);
  else
    accumulate_avg(global.span_server, params->reg, 0);
}
Weighted averages are computed for adaptive updates, which weight features differently
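
As a sketch of that averaging step (not VW's accumulate_avg()/accumulate_weighted_avg() internals), the following single-process code averages per-node weight vectors after a pass; the inner sums over nodes are what the AllReduce calls compute, and the example weights are made up.

  #include <cstdio>
  #include <vector>

  int main() {
    // Weight vectors from 3 nodes after one local pass (2 features each).
    std::vector<std::vector<double>> w = {{0.9, 0.1}, {1.1, 0.3}, {1.0, 0.2}};
    // Per-node, per-feature importance (e.g. from adaptive updates);
    // all 1s gives the plain unweighted average.
    std::vector<std::vector<double>> g = {{1, 1}, {2, 1}, {1, 2}};

    const size_t nodes = w.size(), dim = w[0].size();
    std::vector<double> avg(dim, 0.0);
    for (size_t j = 0; j < dim; ++j) {
      double num = 0.0, den = 0.0;
      for (size_t i = 0; i < nodes; ++i) {  // these two sums are the AllReduce calls
        num += g[i][j] * w[i][j];
        den += g[i][j];
      }
      avg[j] = num / den;
    }
    // Every node would now start its next pass from `avg`.
    printf("averaged weights: %g %g\n", avg[0], avg[1]);
    return 0;
  }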

Machine learning with AllReduce contd.

L-BFGS requires gradients and loss values: one call to AllReduce for each
Parallel synchronized L-BFGS updates
Same with CG, plus another AllReduce operation for the Hessian
Extends to many other common algorithms
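
A sketch of that aggregation, simulated in one process: each node's local gradient and loss on its shard are summed (the two AllReduce calls), after which every node holds the same global quantities and can apply the same deterministic L-BFGS/CG step, which is left abstract here. The numbers are made up.

  #include <cstdio>
  #include <vector>

  struct Shard { std::vector<double> grad; double loss; };

  int main() {
    // Local (gradient, loss) from 3 nodes for a 2-dimensional model.
    std::vector<Shard> shards = {{{0.5, -0.2}, 1.0}, {{0.1, 0.4}, 0.7}, {{-0.3, 0.2}, 0.9}};

    // AllReduce #1: sum the gradients; AllReduce #2: sum the losses.
    std::vector<double> grad(2, 0.0);
    double loss = 0.0;
    for (const Shard& s : shards) {
      for (size_t j = 0; j < grad.size(); ++j) grad[j] += s.grad[j];
      loss += s.loss;
    }

    // Every node now holds the same (grad, loss) and performs the same
    // synchronized update, so the replicas stay consistent.
    printf("global loss %g, gradient (%g, %g)\n", loss, grad[0], grad[1]);
    return 0;
  }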

Hybrid optimization for rapid convergence

SGD converges fast initially, but is slow to squeeze out the final bit of precision
L-BFGS converges rapidly towards the end, once in a good region
Each node performs a few local SGD passes, averaging after every pass
Switch to L-BFGS with synchronized iterations using AllReduce
Two calls to VW

Speedup

Near linear speedup.

[Plot: speedup (1-10) vs. number of nodes (10-100), showing near-linear scaling.]

Hadoop helps

Naïve implementation is driven by the slowest node
Speculative execution ameliorates the problem

Table: Distribution of computing time (in seconds) over 1000 nodes. The first three columns are quantiles; the first row is without speculative execution, the second row with it.

                      5%   50%   95%   Max   Comm. time
Without spec. exec.   29    34    60   758           26
With spec. exec.      29    33    49    63           10

Fast convergence

auPRC curves for two tasks, higher is better.

[Plots: auPRC vs. iteration for the two tasks, comparing Online, L-BFGS w/ 5 online passes, L-BFGS w/ 1 online pass, and L-BFGS. One task spans roughly 0.2-0.55 auPRC over 50 iterations, the other roughly 0.466-0.484 over 20 iterations.]

Conclusions

AllReduce is quite general, yet easy to use for machine learning
Marriage with Hadoop is great for robustness
Hybrid optimization strategies are effective for rapid convergence
John gets his kilonode program
