Normalized Online Learning Tutorial
Paul Mineiro
joint work with Stephane Ross & John Langford
December 9th, 2013


Motivation: Covertype Data Set

54 total features

Name                       Units
Elevation, Distance to X   meters
Aspect, Slope              degrees
Hillshade at time t        “hillshade index” (0-255)
Wilderness Area            $\{0, 1\}^4$
Soil Type                  $\{0, 1\}^{40}$


The Geometry of Real Data

In practice, features often have different scales. This is a problem for first-order online learning methods.

Example: “vanilla” online GD regret:

$$R \le \sqrt{T} \, \|w^*\|_2 \max_{t \in 1:T} \|g_t\|_2$$

This can be made arbitrarily bad in only two dimensions by scaling one of the dimensions while leaving the other fixed. Not an artifact of the analysis.
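To make the scaling argument concrete, here is a small sketch that evaluates the product $\|w^*\|_2 \|g\|_2$ from the bound for a toy two-dimensional problem. The function name `gd_bound_factor` and the toy values (both optimal weights equal to 1 and unit gradient magnitudes at scale 1) are assumptions chosen for illustration, not taken from the talk.

```python
import math

def gd_bound_factor(s):
    """Return ||w*||_2 * ||g||_2 when feature 2 is multiplied by s.

    Toy assumptions: at s = 1 both optimal weights are 1 and a
    worst-case gradient has magnitude 1 in each coordinate.
    """
    w_star = (1.0, 1.0 / s)  # optimal weight on the scaled feature shrinks by 1/s
    g_max = (1.0, s)         # gradient on the scaled feature grows by s
    return math.hypot(*w_star) * math.hypot(*g_max)

# The factor grows at both extremes of s, while the problem itself is unchanged.
for s in (1e-3, 1.0, 1e3):
    print(f"s={s:g}  bound factor={gd_bound_factor(s):.3f}")
```

Only the first coordinate keeps the norm of $w^*$ from shrinking, and only the second drives the gradient norm up, so the product cannot stay constant as s moves away from 1.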


Example

Generate data like this:

$$x_1 \sim N(0, 1), \qquad x_2 \sim N(0, s), \qquad z \sim N\left(x_1 + \frac{1}{\sqrt{s}} x_2, \; 1\right)$$

Do squared-loss prediction of z. NB: x2 is statistically identical to x1 up to a rescaling controlled by s.
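A minimal sketch of a generator for this synthetic data, using only the standard library. It assumes that N(0, s) means variance s, so x2 has standard deviation sqrt(s) and enters z with coefficient 1/sqrt(s). The names `make_example` and `sample_variance` are hypothetical helpers.

```python
import math
import random

def make_example(s, rng):
    """Draw one (x1, x2, z) triple.

    Assumption: N(0, s) is parameterized by variance, so x2 is drawn
    with standard deviation sqrt(s) and divided by sqrt(s) inside z.
    """
    x1 = rng.gauss(0.0, 1.0)
    x2 = rng.gauss(0.0, math.sqrt(s))
    z = rng.gauss(x1 + x2 / math.sqrt(s), 1.0)
    return x1, x2, z

def sample_variance(values):
    """Unbiased sample variance of a list of floats."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

rng = random.Random(0)
data = [make_example(1e4, rng) for _ in range(20000)]
var_x1 = sample_variance([d[0] for d in data])
var_x2 = sample_variance([d[1] for d in data])
print(f"var(x1) ~ {var_x1:.3f}, var(x2) ~ {var_x2:.1f}")
```

Both features carry the same information about z; only their numeric ranges differ, which is what makes this a clean test of scale sensitivity.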


Example

Demo


Summary of Demo

Un-normalized learning
- Lots of fiddling with learning rate.
- Slow convergence at extreme scales.

Normalized learning
- No fiddling with learning rate.
- Same convergence across different scales.


On “Non-Demo” Datasets

Un-normalized learning
- Lots of fiddling with learning rate.
- Slow convergence at extreme scales.

Normalized learning
- Less (rather than no) fiddling with learning rate.
- Similar (rather than the same) convergence across different scales.


How it Works (Mechanically)

Intuition: if feature i is scaled by s, then the i-th coordinate of $w^*$ should be scaled by 1/s.

Ergo:
- Algorithm keeps track of $\max_{s \le t} |x_i^{(s)}|$.
- $w_i \leftarrow w_i \cdot \dfrac{\max_{s < t} |x_i^{(s)}|}{\max_{s \le t} |x_i^{(s)}|}$.
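The bookkeeping on this slide can be sketched as a short function. This is an illustration under assumptions, not the reference implementation: it assumes the weight is multiplied by the first power of (old maximum / new maximum), and the name `rescale_on_new_max` is hypothetical.

```python
def rescale_on_new_max(w, scale, x):
    """Update running per-feature maxima and rescale weights in place.

    w, scale and x are equal-length lists; scale[i] holds the largest
    |x_i| seen so far. Assumption: a weight shrinks by the first power
    of old_max / new_max whenever its feature sets a new maximum.
    """
    for i, xi in enumerate(x):
        mag = abs(xi)
        if mag > scale[i]:
            if scale[i] > 0.0:  # nothing to rescale before the first nonzero value
                w[i] *= scale[i] / mag
            scale[i] = mag
    return w, scale

w = [0.5, 2.0]
scale = [1.0, 10.0]
rescale_on_new_max(w, scale, [0.3, 40.0])
print(w, scale)
```

In this example only the second feature exceeds its previous maximum, so only its weight and scale estimate change.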


How it Works (Mechanically) II

Intuition: learning rate parameter should control average change in the prediction.

But: gradient is proportional to input size.

Ergo:
- Divide each $\partial / \partial w_i$ by $\max_{s \le t} |x_i^{(s)}|$, and ...
- Normalize the entire update by the average change in prediction $N_t / t$, where

  $$N_t = N_{t-1} + \sum_i \frac{(x_i^{(t)})^2}{(\max_{s \le t} |x_i^{(s)}|)^2}$$

- Intuition behind $N_t$: if this is an example with small $x_i$, prediction is not changing very fast because gradient is normalized by scale.
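One way to combine these steps into a runnable update is sketched below, under several assumptions that go beyond the slides: squared loss, an AdaGrad-style accumulator `G` of squared gradients per coordinate (without it, dividing the gradient by only the first power of the running maximum would not give scale invariance), and a square root applied to $t / N_t$. The class `NormalizedLearner`, the helper `run`, and the default `eta` are hypothetical.

```python
import math
import random

class NormalizedLearner:
    """Sketch of a normalized, AdaGrad-style online learner (squared loss)."""

    def __init__(self, dim, eta=0.5):
        self.eta = eta
        self.w = [0.0] * dim
        self.scale = [0.0] * dim  # running max of |x_i|
        self.G = [0.0] * dim      # sum of squared gradients (assumption)
        self.N = 0.0              # accumulated normalized squared inputs
        self.t = 0

    def predict(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def update(self, x, z):
        self.t += 1
        for i, xi in enumerate(x):  # track maxima, rescale weights
            mag = abs(xi)
            if mag > self.scale[i]:
                if self.scale[i] > 0.0:
                    self.w[i] *= self.scale[i] / mag
                self.scale[i] = mag
        self.N += sum((xi / si) ** 2 for xi, si in zip(x, self.scale) if si > 0.0)
        if self.N == 0.0:
            return  # all inputs so far were zero
        err = self.predict(x) - z
        rate = self.eta * math.sqrt(self.t / self.N)
        for i, xi in enumerate(x):
            g = err * xi  # gradient of 0.5 * (prediction - z)^2
            if g == 0.0:
                continue
            self.G[i] += g * g
            self.w[i] -= rate * g / (self.scale[i] * math.sqrt(self.G[i]))

def run(scale_factor, n=2000, seed=0):
    """Train on the same random stream with feature 2 multiplied by scale_factor."""
    rng = random.Random(seed)
    learner = NormalizedLearner(dim=2)
    for _ in range(n):
        x1, x2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        z = x1 + x2 + rng.gauss(0.0, 1.0)
        learner.update([x1, scale_factor * x2], z)
    return learner.w

w_a = run(1.0)
w_b = run(1000.0)
print(w_a, [w_b[0], 1000.0 * w_b[1]])
```

Because the second feature is multiplied by 1000 in the second run, its learned weight should come out about 1000 times smaller while the first weight is unaffected, so both runs make the same predictions with the same `eta`.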


When it fails

Algorithm normalizes by scale estimate derived from history. If the scale suddenly gets very large near the end of the input sequence, the scale estimates have been poor for most of the updates.

Theorems are driven by

$$\Delta_i = \max_{t \in 1:T} \frac{|x_{ti}|}{|x_{t_i^0 i}|}$$
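A small sketch of how such a ratio behaves on a single feature column. It assumes that $t_i^0$ denotes the first step at which feature i is nonzero, which the slide does not state; the function name `scale_ratio` and the sample columns are hypothetical.

```python
def scale_ratio(column):
    """Largest |x| in the column divided by its first nonzero |x|.

    Assumption: the denominator is the first nonzero observation of
    the feature. A feature that never fires returns 1.0 by convention
    in this sketch.
    """
    first = next((abs(v) for v in column if v != 0.0), None)
    if first is None:
        return 1.0
    return max(abs(v) for v in column) / first

steady = [1.0, -0.8, 1.2, 0.9]
late_spike = [1.0, -0.8, 1.2, 900.0]
print(scale_ratio(steady), scale_ratio(late_spike))
```

A feature whose magnitude stays near its first value keeps the ratio small, while a late spike inflates it, matching the failure case described on this slide.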


How to use

It is enabled by default in vw. To not use:

    --adaptive --invariant

... will give you vanilla AdaGrad without normalization.
