SLICE INVERSE REGRESSION WITH SCORE FUNCTIONS
by Dmitry Babichev and Francis Bach

Problem: we consider the projection pursuit regression model for non-linear regression: $y = g(w_1^\top X, \dots, w_k^\top X) + \varepsilon$.
Goal: to find the effective dimension reduction (e.d.r.) space spanned by $w_1, \dots, w_k$.
Approach: we use the method of moments, using the notion of score functions and an extension of Stein's Lemma ($E(S_1(x)\,y)$ lies in the e.d.r. space [1]).
Definition: the score function $S_1(x)$ is defined as $S_1(x) = -\nabla \log p(x)$, where $p(x)$ is the probability density of $X$.
Extension of Stein's Lemma: $E(S_1(x) \mid y)$ lies in the e.d.r. space almost surely.

[1] T. Stoker, Consistent estimation of scaled coefficients, Econometrica, 54 (1986), p. 1461-1481.

Results
Score function extensions to the sliced inverse regression method: first-order (SADE) and second-order (SPHD).
Infinite and finite sample cases.
Finite sample estimators and their consistency.
Learning score functions: in two steps as well as directly.

[Figure, two plots. Captions: "Mean and standard deviation of error for d = 10; $y = \frac{x_1}{1/2 + (x_2 + 2)^2} + \varepsilon$" and "Comparison of one-step and two-step algorithms". Legends: SADE, SPHD, PHD+; SADE with true score, 1-step algorithm, 2-step algorithm. Horizontal axes: $\log_2 n$ and $\sigma$. Vertical axes: $\log_2 R^2(\mathcal{E}, \hat{\mathcal{E}})$ and $R^2(\hat{\mathcal{E}}, \mathcal{E})$.]

Dmitry Babichev, Francis Bach (INRIA - École Normale Supérieure), November 25, 2016
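The first-order idea (slice on $y$, average the score within each slice, keep the leading directions) can be illustrated with a minimal sketch. It assumes a standard Gaussian design, for which $S_1(x) = -\nabla \log p(x) = x$, so the sketch reduces to classical sliced inverse regression. The function names, the number of slices and the single-index test model are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def score_standard_normal(X):
    # Assumption: X ~ N(0, I), so S_1(x) = -grad log p(x) = x.
    return X

def sliced_score_directions(X, y, score_fn, n_slices=10, k=1):
    """Estimate k e.d.r. directions from slice averages of the score."""
    n, d = X.shape
    S = score_fn(X)
    order = np.argsort(y)                     # slice the data by the response
    M = np.zeros((d, d))
    for idx in np.array_split(order, n_slices):
        m = S[idx].mean(axis=0)               # empirical E(S_1(x) | y in slice)
        M += (len(idx) / n) * np.outer(m, m)  # weighted second moment of slice means
    _, eigvecs = np.linalg.eigh(M)            # eigenvalues in ascending order
    return eigvecs[:, ::-1][:, :k]            # top-k eigenvectors

# Hypothetical single-index example: y = (w^T x)^3 + noise.
rng = np.random.default_rng(0)
w = np.array([1.0, 2.0, 0.0, 0.0, -1.0])
w /= np.linalg.norm(w)
X = rng.standard_normal((4000, 5))
y = (X @ w) ** 3 + 0.1 * rng.standard_normal(4000)
W_hat = sliced_score_directions(X, y, score_standard_normal)
print("|cos(angle)| between estimate and w:", abs(W_hat[:, 0] @ w))
```

For a non-Gaussian design, `score_fn` would have to be replaced by the true or a learned score function.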

Non-convex Phase Retrieval of Low-Rank Matrix Columns Seyedehsara Nayer*, Namrata Vaswani*, Yonina C. Eldar** *Iowa State University, **Technion

Contributions
Goal: recover a low-rank matrix, X, from phaseless measurements of its columns.
Applications: X-ray crystallography, astronomy, sub-diffraction imaging, ...
Contributions:
1. Develop AltMinTrunc, which exploits the low-rank structure of X:
   - compute a truncated spectral initialization
   - rest of algorithm: intuitive modification of AltMinPhase for the above problem
2. Obtain high-probability sample complexity bounds for the AltMinTrunc initialization to provide a good approximation of X:
   - when the rank of X is low enough, these are significantly smaller than what existing single-vector phase retrieval algorithms need


Problem Setting
Instead of a single vector $x$, we have a set of $q$ vectors $x_1, x_2, \dots, x_q$ such that the $n \times q$ matrix $X := [x_1, x_2, \dots, x_q]$ has rank $r \ll \min(n, q)$.
For each $x_k$, we observe a set of $m$ measurements of the form $y_{i,k} := (a_{i,k}' x_k)^2$, $i = 1, 2, \dots, m$, $k = 1, 2, \dots, q$.

Motivating application: dynamic solar imaging from phaseless measurements; image changes are often influenced by only a few (r) factors
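
To make the measurement model concrete, here is a minimal sketch that generates a rank-$r$ matrix and its column-wise phaseless measurements. The Gaussian choice for the measurement vectors $a_{i,k}$ and the function name are assumptions made for illustration; the slides do not specify them.

```python
import numpy as np

def lowrank_phaseless_measurements(n, q, r, m, seed=0):
    """Return (X, A, Y) with Y[i, k] = (a_{i,k}' x_k)^2 and rank(X) = r."""
    rng = np.random.default_rng(seed)
    # Rank-r matrix built as a product of two thin Gaussian factors.
    X = rng.standard_normal((n, r)) @ rng.standard_normal((r, q))
    # Assumption: i.i.d. Gaussian measurement vectors; a_{i,k} is A[i, :, k].
    A = rng.standard_normal((m, n, q))
    # Inner product of a_{i,k} with column x_k, then squared (the sign is lost).
    Y = np.einsum("ink,nk->ik", A, X) ** 2
    return X, A, Y

X, A, Y = lowrank_phaseless_measurements(n=30, q=20, r=2, m=50)
print(X.shape, Y.shape, np.linalg.matrix_rank(X))
```

Because only squared inner products are observed, each column $x_k$ can be recovered at best up to a sign.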


“Non-convex Optimization with Frank-Wolfe Algo. & Its Variants” Jean Lafond, Hoi-To Wai and Eric Moulines (Poster ID: 12)



Frank-Wolfe (FW) algorithm is popular for ML tasks due to its efficiency in handling high dimensional problems.



Little is known for its behaviors on non-convex problems.



Our Contributions: analyze a general FW algorithm with time-varying non-convex objective and inexact linear optimization.
▶ General FW converges as fast as $O(1/\sqrt{T})$,
▶ with potential acceleration to (close to) $O(1/T)$.

[Figure: FW algorithm, showing $\theta_t$, $a_t$, $\theta^\star$ and the update $\theta_{t+1} = (1 - \gamma_t)\theta_t + \gamma_t a_t$.]

Main Results
▶ If the FW gap $g_t = \max_{\theta \in C} \langle \nabla F_t(\theta_t), \theta_t - \theta \rangle$ is zero, then $\theta_t$ is a stationary point of $\min_{\theta \in C} F_t(\theta)$.
▶ Consider $F_t(\theta)$ as a time-varying objective function, $\gamma_t = t^{-\alpha}$, $\alpha \in [0.5, 1)$.
  G-FW: $\theta_{t+1} = \theta_t + \gamma_t(\hat{a}_t - \theta_t)$, where $\hat{a}_t \approx a_t := \arg\min_{a \in C} \langle \hat{\nabla} F_t(\theta_t), a \rangle$.
▶ Assumption: as $t \to \infty$, the variation $|F_t(\theta) - F_{t-1}(\theta)| \to 0$ and $\hat{a}_t$ can accurately track $a_t$, both at a sufficiently fast rate.
▶ Result: (i) the FW gap decreases as $\min_{t \in [T/2+1, T]} g_t = O(1/T^{1-\alpha})$; (ii) the rate can be improved to close to $O(1/T)$; (iii) the accumulation points of the sequence $\{\theta_t\}_{t \ge 1}$ are stationary points.
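
The G-FW recursion and the FW gap can be sketched as follows. This is a simplified illustration, not the authors' code: it assumes a fixed (not time-varying) objective, an exact gradient and an exact linear minimization oracle, takes the constraint set $C$ to be an $\ell_1$ ball, and uses a hypothetical indefinite quadratic as the non-convex objective.

```python
import numpy as np

def lmo_l1_ball(grad, radius=1.0):
    """Exact linear minimization oracle: argmin over the l1 ball of <grad, a>."""
    j = np.argmax(np.abs(grad))
    a = np.zeros_like(grad)
    a[j] = -radius * np.sign(grad[j])   # a signed vertex of the l1 ball
    return a

def frank_wolfe(grad_fn, theta0, n_iter=500, alpha=0.5, radius=1.0):
    """Run theta_{t+1} = theta_t + gamma_t (a_t - theta_t) with gamma_t = t^(-alpha)."""
    theta = theta0.copy()
    gaps = []
    for t in range(1, n_iter + 1):
        g = grad_fn(theta)
        a = lmo_l1_ball(g, radius)
        gaps.append(g @ (theta - a))    # FW gap g_t at the current iterate
        theta = theta + t ** (-alpha) * (a - theta)
    return theta, np.array(gaps)

# Hypothetical non-convex objective: F(theta) = 0.5*||A theta - b||^2 - 0.5*mu*||theta||^2.
rng = np.random.default_rng(0)
A = rng.standard_normal((10, 20))
b = rng.standard_normal(10)
mu = 1.0
grad_fn = lambda th: A.T @ (A @ th - b) - mu * th

theta, gaps = frank_wolfe(grad_fn, np.zeros(20))
print("smallest FW gap over the second half of the run:", gaps[len(gaps) // 2:].min())
```

Each iterate is a convex combination of points of $C$, so feasibility is maintained without any projection step.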



Applications: (i) Online FW; (ii) Decentralized FW.



Example: Non-cvx. formulation for sparse + low-rank matrix completion.



See you at the poster!


Approximating Traffic Simulation using Neural Networks and its Application in Traffic Optimization Paweł Gora University of Warsaw

Karol Kurach University of Warsaw

Problem
Traffic optimization and many traffic analysis tasks require running a large number of time-consuming simulations. How to do it efficiently?

Solution
We can try to approximate outcomes of simulations (e.g., waiting times) using neural networks!

Results
Best average relative error: 1.56%
Best maximal relative error: 8.47%
Neural networks trained using TensorFlow (TensorTraffic) and the Adam optimizer.
200-400 neurons, 1-3 layers; training set: > 50000 traffic signal settings (evaluated in the Traffic Simulation Framework software); CV: 5-fold
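
The two reported metrics can be computed as in the sketch below. The slides do not spell out the exact definition, so the formula used here, $|\hat{y} - y| / |y|$ per traffic signal setting, is an assumption, as are the function name and the toy numbers.

```python
import numpy as np

def relative_errors(y_true, y_pred):
    """Return (average, maximal) relative error of predictions (assumed definition)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rel = np.abs(y_pred - y_true) / np.abs(y_true)
    return rel.mean(), rel.max()

# Toy numbers (not from the study): simulated vs. approximated waiting times.
avg_err, max_err = relative_errors([100.0, 200.0, 400.0], [101.0, 190.0, 400.0])
print(f"average: {avg_err:.2%}, maximal: {max_err:.2%}")
```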

An Empirical Study of ADMM for Nonconvex Problems
Zheng Xu1, Soham De1, Mário A. T. Figueiredo2, Christoph Studer3, Tom Goldstein1

1 Department of Computer Science, University of Maryland, College Park, MD
2 Instituto de Telecomunicações, Instituto Superior Técnico, Universidade de Lisboa, Portugal
3 Department of Electrical and Computer Engineering, Cornell University, Ithaca, NY

December, 2016

Alternating direction method of multipliers (ADMM)
▶ Objective: $\min_{u,v} H(u) + G(v)$, subject to $Au + Bv = b$
▶ Steps:
  $u_{k+1} = \arg\min_u H(u) + \langle \lambda_k, -Au \rangle + \frac{\tau_k}{2} \| b - Au - Bv_k \|_2^2$
  $v_{k+1} = \arg\min_v G(v) + \langle \lambda_k, -Bv \rangle + \frac{\tau_k}{2} \| b - Au_{k+1} - Bv \|_2^2$
  $\lambda_{k+1} = \lambda_k + \tau_k (b - Au_{k+1} - Bv_{k+1})$
▶ (Adaptive) penalty parameter $\tau_k$
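
As an illustration of the three steps, the sketch below applies vanilla ADMM (constant penalty $\tau$) to $\ell_0$ regularized linear regression, $\min_x \frac{1}{2}\|Dx - c\|_2^2 + \rho\|x\|_0$. The splitting $H(u) = \frac{1}{2}\|Du - c\|_2^2$, $G(v) = \rho\|v\|_0$ with $A = I$, $B = -I$, $b = 0$ is an assumption made for this example, and the function names are hypothetical. With this splitting the $u$-step is a linear solve and the $v$-step is a hard threshold.

```python
import numpy as np

def hard_threshold(w, thresh):
    """Keep entries with |w_i| > thresh and set the rest to zero."""
    return np.where(np.abs(w) > thresh, w, 0.0)

def admm_l0_regression(D, c, rho, tau=1.0, n_iter=200):
    """Vanilla ADMM sketch for min 0.5*||Dx - c||^2 + rho*||x||_0 (constant tau)."""
    n = D.shape[1]
    u, v, lam = np.zeros(n), np.zeros(n), np.zeros(n)
    lhs = D.T @ D + tau * np.eye(n)
    Dtc = D.T @ c
    for _ in range(n_iter):
        # u-step: minimize 0.5*||Du - c||^2 - <lam, u> + tau/2*||u - v||^2.
        u = np.linalg.solve(lhs, Dtc + lam + tau * v)
        # v-step: minimize rho*||v||_0 + <lam, v> + tau/2*||u - v||^2,
        # i.e. hard-threshold u - lam/tau at sqrt(2*rho/tau).
        v = hard_threshold(u - lam / tau, np.sqrt(2.0 * rho / tau))
        # Multiplier step: lam + tau*(b - Au - Bv) with A = I, B = -I, b = 0.
        lam = lam + tau * (v - u)
    return v

# Toy check with D = I, where the minimizer keeps c_i only if c_i^2 > 2*rho.
x_hat = admm_l0_regression(np.eye(3), np.array([3.0, 0.1, -2.0]), rho=0.1)
print(x_hat)
```

Since the problem is nonconvex, the point reached generally depends on $\tau$ and on the initialization, which is what the questions on the next slide probe.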

Questions
▶ Does ADMM converge in practice?
▶ Does the update order of $H(u)$ and $G(v)$ matter?
▶ Is the local optimal solution good?
▶ Does the penalty parameter $\tau_k$ matter?
▶ Is an adaptive penalty choice effective?

Empirical study on nonconvex applications
▶ Nonconvex applications
  ▶ $\ell_0$ regularized linear regression: $\min_x \frac{1}{2}\|Dx - c\|_2^2 + \rho\|x\|_0$
  ▶ $\ell_0$ regularized image denoising: $\min_x \frac{1}{2}\|x - c\|_2^2 + \rho\|\nabla x\|_0$
  ▶ Phase retrieval: $\min_x \frac{1}{2}\|\mathrm{abs}(Dx) - c\|_2^2$
  ▶ Eigenvector computation: $\max_x \|Dx\|_2^2$ subject to $\|x\|_2 = 1$

[Figures: empirical study. Panels: L0 LinReg ($\ell_0$ regularized linear regression), L0 ImgRes, Phase Retrieval. Horizontal axis: initial penalty parameter (log scale). Vertical axes: iterations, objective, PSNR. Curves: vanilla ADMM, residual balance, adaptive ADMM.]

More results in poster and paper.

Incremental Reshaped Wirtinger Flow and Its Connection to Kaczmarz Method
Huishuai Zhang, Yingbin Liang, Yuejie Chi
∗ Syracuse University, † The Ohio State University

December 9, 2016


Nonconvex Phase Retrieval
Problem: recover $x \in \mathbb{R}^n / \mathbb{C}^n$ from the magnitude of linear measurements $y_i = |\langle a_i, x \rangle|$, for $i = 1, \dots, m$.
▶ Wirtinger flow (WF) (Candès et al. 14) minimizes the nonconvex loss
  $\ell_{WF}(z) := \frac{1}{4m} \sum_{i=1}^{m} \left( |a_i' z|^2 - y_i^2 \right)^2$
▶ Reshaped Wirtinger flow (RWF) minimizes another loss
  $\ell(z) := \frac{1}{2m} \sum_{i=1}^{m} \left( |a_i' z| - y_i \right)^2$

[Figure: WF loss surface, plotted over $(z_1, z_2)$.]

Incremental RWF (IRWF)
Problem: $\arg\min_z \sum_{i=1}^{m} \left( |a_i' z| - y_i \right)^2$.

IRWF: for iteration $t$, choose $i_t$ uniformly from $\{1, 2, \dots, m\}$ and let
  $z^{(t+1)} = z^{(t)} - \mu \left( a_{i_t}' z^{(t)} - y_{i_t} \cdot \mathrm{sgn}(a_{i_t}' z^{(t)}) \right) a_{i_t}$.

▶ Converges very fast: to recover a real image (1920 × 1080),

                 IRWF    RWF    WF
  #passes        8       70     315
  time cost (s)  13.7    107    426

▶ Initialize it by spectral method → provable linear convergence
▶ Random initialization → works well empirically, but lacks a proof
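
The IRWF update can be sketched for real-valued signals as below. The step size $\mu = 1/n$, the Gaussian measurement vectors and the particular spectral initialization (top eigenvector of $\frac{1}{m}\sum_i y_i^2 a_i a_i'$, scaled by $\sqrt{\mathrm{mean}(y_i^2)}$) are assumptions chosen for a self-contained example; the initialization and step size used by the authors may differ.

```python
import numpy as np

def spectral_init(A, y):
    """A standard spectral initialization (assumed variant, see the lead-in)."""
    m, n = A.shape
    Y = (A * (y ** 2)[:, None]).T @ A / m      # (1/m) * sum_i y_i^2 a_i a_i'
    _, vecs = np.linalg.eigh(Y)                # eigenvalues in ascending order
    return np.sqrt(np.mean(y ** 2)) * vecs[:, -1]

def irwf(A, y, n_passes=50, mu=None, seed=0):
    """Incremental RWF: one randomly chosen measurement per update."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    mu = 1.0 / n if mu is None else mu         # assumed step size
    z = spectral_init(A, y)
    for _ in range(n_passes * m):
        i = rng.integers(m)                    # i_t uniform on {0, ..., m-1}
        a = A[i]
        inner = a @ z
        z = z - mu * (inner - y[i] * np.sign(inner)) * a
    return z

# Synthetic noiseless example with Gaussian measurements.
rng = np.random.default_rng(1)
n, m = 20, 400
x = rng.standard_normal(n)
A = rng.standard_normal((m, n))
y = np.abs(A @ x)
z = irwf(A, y)
# The signal is identifiable only up to a global sign.
rel_err = min(np.linalg.norm(z - x), np.linalg.norm(z + x)) / np.linalg.norm(x)
print("relative error up to sign:", rel_err)
```

Replacing the constant $\mu$ by $1/\|a_{i_t}\|^2$ would turn each update into a Kaczmarz-style projection onto the hyperplane $a_{i_t}' z = y_{i_t} \cdot \mathrm{sgn}(a_{i_t}' z^{(t)})$, which is the connection named in the title.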

Future direction: How do stochastic methods escape local minima and saddle points when initialized randomly?

L-SR1: A Novel Optimization Method for Deep Learning Vivek Ramamurthy, Nigel Duffy Sentient Technologies

December 9, 2016


Motivation and Algorithm Outline

Second Order Methods
▶ potential for distributed training
▶ large mini-batches
▶ curvature information

Critical Weaknesses
▶ proliferation of saddle points
▶ ill-conditioned curvature matrices
▶ line search: multiple gradient/function evaluations

Our Solution
▶ 'limited memory' symmetric rank one update
▶ use of trust region method instead of line search
▶ improved conditioning using batch normalization
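
For reference, the textbook symmetric rank one (SR1) update of a Hessian approximation $B$ from a step $s$ and a gradient difference $y$ is $B_{+} = B + \frac{(y - Bs)(y - Bs)^\top}{(y - Bs)^\top s}$. The sketch below shows this dense update with the usual skip rule for a small denominator. It is not the limited-memory variant or the trust-region machinery from the slides, and the function name and tolerance `r` are illustrative assumptions.

```python
import numpy as np

def sr1_update(B, s, y, r=1e-8):
    """Dense SR1 update of a symmetric Hessian approximation B (textbook form)."""
    v = y - B @ s
    denom = v @ s
    # Standard safeguard: skip the update when the denominator is too small.
    if abs(denom) <= r * np.linalg.norm(s) * np.linalg.norm(v):
        return B
    return B + np.outer(v, v) / denom

B0 = np.eye(3)
s = np.array([1.0, 0.0, 0.0])
y = np.array([2.0, 1.0, 0.0])
B1 = sr1_update(B0, s, y)
print(B1 @ s)   # the updated matrix satisfies the secant equation B1 @ s = y
```

Unlike BFGS, the SR1 update does not force $B$ to stay positive definite, so it can represent negative curvature; this is one common reason for pairing it with a trust region rather than a line search.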


Experimental Results


later, the power of today's machines plus the development of different methods allows us to think of IK used in real-time. The most recommendable reference ... (for example, certain part of a car) [Lan98]. The node that is wanted to .... goal and Σ