Shrinkage Structure of Partial Least Squares
Ole C. Lingjaerde, Nils Christophersen
University of Oslo

Scandinavian Journal of Statistics, 2000

Lingjaerde, Christophersen (U Oslo)

Shrinkage Structure of Partial Least Squares

Scandinavia J Stat 2000

1 / 20

Outline

1. Introduction
   - Shrinkage structure
   - Partial least squares
   - Ritz values
   - Filter factors for PLS
2. Shrinkage structure of PLS
3. Discussion


Starting from OLS

Consider the linear model $y = X\beta + \varepsilon$, where $\beta$ is a $p$-vector of unknown parameters and $\varepsilon$ is an $n$-dimensional noise vector.

Definition (Singular value decomposition). For an $n \times p$ matrix $X$ with $n \ge p$, the singular value decomposition (SVD) is

$$X = UDV^T \qquad (1)$$

where $U^TU = V^TV = VV^T = I_p$ and $D$ is a diagonal matrix with the singular values $\sigma_1 \ge \dots \ge \sigma_p$ on the diagonal. If all $\sigma_i > 0$, the pseudoinverse of $X$ is $X^+ = VD^{-1}U^T$, and $X^TX = VD^2V^T$, so the eigenvalues of $X^TX$ are $\lambda_i = \sigma_i^2$.
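The identities above can be checked numerically. The following is a small NumPy sketch on an arbitrary random matrix (the sizes and seed are illustrative assumptions); note that `np.linalg.svd` returns $V^T$ rather than $V$.

```python
import numpy as np

# Illustrative full-column-rank design (n > p); sizes and seed are arbitrary.
rng = np.random.default_rng(0)
n, p = 50, 6
X = rng.standard_normal((n, p))

# Thin SVD: U is n x p, sigma holds sigma_1 >= ... >= sigma_p, Vt is V^T.
U, sigma, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt.T
I_p = np.eye(p)

# U^T U = V^T V = V V^T = I_p.
orthogonal = (np.allclose(U.T @ U, I_p) and np.allclose(V.T @ V, I_p)
              and np.allclose(V @ V.T, I_p))
# X^T X = V D^2 V^T, so the eigenvalues of X^T X are lambda_i = sigma_i^2.
gram_ok = np.allclose(X.T @ X, V @ np.diag(sigma**2) @ V.T)
# V D^{-1} U^T is the Moore-Penrose pseudoinverse of X.
pinv_ok = np.allclose(V @ np.diag(1.0 / sigma) @ U.T, np.linalg.pinv(X))
print(orthogonal, gram_ok, pinv_ok)
```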


More from OLS

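Following Equation (1) on the previous slide, we have

$$\hat\beta_{OLS} = VD^{-1}U^Ty = \sum_{i=1}^{p} \frac{u_i^Ty}{\sigma_i} v_i \qquad (2)$$

where $u_i$ and $v_i$ are the $i$th columns of $U$ and $V$, and $\sigma_i$ is the $i$th singular value of $X$ (so $\sigma_i^2$ is the $i$th eigenvalue of $X^TX$). Under high collinearity some $\sigma_i$ are very small, so the corresponding terms $u_i^Ty / \sigma_i$ become very large and the OLS estimate suffers from huge variability. The quantity $f_i = u_i^Ty$ is called the $i$th Fourier coefficient; it plays a role in the later discussion.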
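As an illustration of (2), here is a minimal NumPy sketch on hypothetical simulated data in which the last column is made nearly collinear with the first. It forms the Fourier coefficients $f_i$, evaluates the sum, and compares the result with `np.linalg.lstsq`; the tiny last singular value shows where the variance problem comes from.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 40, 5
X = rng.standard_normal((n, p))
# Hypothetical near-collinearity: column p is almost a copy of column 1.
X[:, -1] = X[:, 0] + 1e-3 * rng.standard_normal(n)
y = X @ np.ones(p) + 0.1 * rng.standard_normal(n)

U, sigma, Vt = np.linalg.svd(X, full_matrices=False)
f = U.T @ y                        # Fourier coefficients f_i = u_i^T y
beta_ols = Vt.T @ (f / sigma)      # Equation (2): sum_i (f_i / sigma_i) v_i

beta_ref, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_ols, beta_ref))
# sigma_p is tiny, so the p-th term f_p / sigma_p amplifies any noise in f_p.
print("singular values:", np.round(sigma, 4))
print("f_i / sigma_i:  ", np.round(f / sigma, 2))
```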


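The salvation of OLS: shrinkage estimators

To address the collinearity problem of OLS, several shrinkage estimators have been developed. For example, principal component regression (PCR) keeps only the first $m$ terms of (2):

$$\hat\beta_{PCR} = \sum_{i=1}^{m} \frac{u_i^Ty}{\sigma_i} v_i \qquad (3)$$

where $m < p$, and ridge regression (RR) with ridge parameter $k > 0$ damps every term:

$$\hat\beta_{RR} = \sum_{i=1}^{p} \frac{\sigma_i^2}{\sigma_i^2 + k} \frac{u_i^Ty}{\sigma_i} v_i \qquad (4)$$

Both estimators shrink $\hat\beta_{OLS}$: each term of (2) is multiplied by a factor in $[0, 1]$, and since the $v_i$ are orthonormal, $\|\hat\beta_{PCR}\| \le \|\hat\beta_{OLS}\|$ and $\|\hat\beta_{RR}\| \le \|\hat\beta_{OLS}\|$.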
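A sketch of (3) and (4) in NumPy, written directly from the SVD sums (the helper names `pcr`, `ridge`, `svd_terms` and the random data are assumptions for illustration). It also checks the ridge sum against the closed form $(X^TX + kI)^{-1}X^Ty$ and the norm inequalities stated above.

```python
import numpy as np

def svd_terms(X, y):
    """Return sigma_i, V (columns v_i) and the Fourier coefficients f_i = u_i^T y."""
    U, sigma, Vt = np.linalg.svd(X, full_matrices=False)
    return sigma, Vt.T, U.T @ y

def pcr(X, y, m):
    """Equation (3): keep only the first m terms of the OLS sum (2)."""
    sigma, V, f = svd_terms(X, y)
    return V[:, :m] @ (f[:m] / sigma[:m])

def ridge(X, y, k):
    """Equation (4): damp term i by sigma_i^2 / (sigma_i^2 + k)."""
    sigma, V, f = svd_terms(X, y)
    return V @ (sigma**2 / (sigma**2 + k) * f / sigma)

rng = np.random.default_rng(2)
X = rng.standard_normal((30, 6))
y = rng.standard_normal(30)
k = 2.5  # illustrative ridge parameter

# The SVD form of ridge matches the familiar closed form (X^T X + k I)^{-1} X^T y.
ridge_closed = np.linalg.solve(X.T @ X + k * np.eye(6), X.T @ y)
print(np.allclose(ridge(X, y, k), ridge_closed))

# Both estimates are no longer than the OLS estimate.
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.linalg.norm(pcr(X, y, 3)) <= np.linalg.norm(beta_ols),
      np.linalg.norm(ridge(X, y, k)) <= np.linalg.norm(beta_ols))
```

Keeping all $p$ terms in `pcr` (that is, $m = p$) recovers OLS, which is a convenient sanity check.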


Filter factors

Many well-known shrinkage estimators take the form

$$\hat\beta = \sum_{i=1}^{p} w_i \frac{u_i^Ty}{\sigma_i} v_i \qquad (5)$$

where the $w_i$ are called filter factors. Filter factors give a convenient way to describe the shrinkage of PLS. For example:
- OLS: $w_i = 1$
- PCR: $w_i = 1$ for $i \le m$ and $w_i = 0$ for $i > m$
- RR: $w_i = \sigma_i^2 / (\sigma_i^2 + k)$
- Generalized ridge regression (GRR): $w_i = \sigma_i^2 / (\sigma_i^2 + k_i)$
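The representation (5) can be inverted: taking the inner product of (5) with $v_i$ gives $v_i^T\hat\beta = w_i f_i / \sigma_i$, so $w_i = \sigma_i v_i^T\hat\beta / f_i$ whenever the Fourier coefficient $f_i = u_i^Ty$ is nonzero. The hypothetical helpers below build an estimate from given filter factors and read them back; the GRR penalties `k_i` are made-up values.

```python
import numpy as np

def filtered_estimate(X, y, w):
    """Equation (5): beta = sum_i w_i (u_i^T y / sigma_i) v_i."""
    U, sigma, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt.T @ (w * (U.T @ y) / sigma)

def filter_factors_of(X, y, beta):
    """Invert (5): w_i = sigma_i v_i^T beta / f_i, assuming every f_i = u_i^T y != 0."""
    U, sigma, Vt = np.linalg.svd(X, full_matrices=False)
    return sigma * (Vt @ beta) / (U.T @ y)

rng = np.random.default_rng(3)
X = rng.standard_normal((30, 5))
y = rng.standard_normal(30)
sigma = np.linalg.svd(X, compute_uv=False)
m, k = 2, 1.0
k_i = np.array([0.5, 1.0, 2.0, 4.0, 8.0])  # hypothetical GRR penalties

w = {
    "OLS": np.ones(5),
    "PCR": (np.arange(1, 6) <= m).astype(float),
    "RR": sigma**2 / (sigma**2 + k),
    "GRR": sigma**2 / (sigma**2 + k_i),
}
for name, wi in w.items():
    beta = filtered_estimate(X, y, wi)
    print(name, np.allclose(filter_factors_of(X, y, beta), wi))
```

Since any $\hat\beta \in \mathbb{R}^p$ can be expanded in the basis $v_1, \dots, v_p$, the same recovery step can be applied to a PLS estimate as well.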


Partial least squares

Definition (Partial least squares (PLS)). For $m \ge 1$ define the Krylov space

$$K_m = \mathrm{span}\{X^Ty, (X^TX)X^Ty, \dots, (X^TX)^{m-1}X^Ty\}.$$

The PLS estimate with $m$ factors is then

$$\hat\beta_{PLS} = \arg\min_{\beta} \|y - X\beta\|^2 \quad \text{subject to } \beta \in K_m. \qquad (6)$$

Note that the PLS solution depends on $m$. If $R_m$ is a $p \times m$ matrix whose columns span $K_m$, the explicit solution is

$$\hat\beta_{PLS} = R_m \left(R_m^TX^TXR_m\right)^{-1} R_m^TX^Ty. \qquad (7)$$
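A direct NumPy sketch of (6) and (7). The basis of $K_m$ is obtained here by QR-orthonormalizing the normalized Krylov vectors, a simple choice that is adequate for small $m$; a Lanczos-type recursion would be more stable for larger $m$. The function names and data are illustrative assumptions.

```python
import numpy as np

def krylov_basis(X, y, m):
    """Orthonormal basis (p x m) of K_m = span{X^T y, ..., (X^T X)^{m-1} X^T y}."""
    A, v = X.T @ X, X.T @ y
    cols = []
    for _ in range(m):
        cols.append(v / np.linalg.norm(v))   # rescale each Krylov vector
        v = A @ cols[-1]
    Q, _ = np.linalg.qr(np.column_stack(cols))
    return Q

def pls_krylov(X, y, m):
    """Equation (7): R (R^T X^T X R)^{-1} R^T X^T y with R = krylov_basis(X, y, m)."""
    R = krylov_basis(X, y, m)
    return R @ np.linalg.solve(R.T @ X.T @ X @ R, R.T @ (X.T @ y))

rng = np.random.default_rng(4)
X = rng.standard_normal((40, 5))
y = rng.standard_normal(40)

# Equation (6) directly: least squares over beta = R c.
R = krylov_basis(X, y, 2)
c, *_ = np.linalg.lstsq(X @ R, y, rcond=None)
print(np.allclose(pls_krylov(X, y, 2), R @ c))
# With m = p = rank(X), K_m is all of R^p and PLS coincides with OLS.
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.allclose(pls_krylov(X, y, 5), beta_ols))
```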

Partial least squares (continued)

Definition (Partial least squares, another version¹). The general underlying model of multivariate PLS is

$$X = TP^T + E, \qquad Y = TQ^T + F \qquad (8)$$

where $X$ is an $n \times p$ matrix of predictors, $Y$ is an $n \times 1$ matrix of responses, $T$ is an $n \times m$ score matrix, $P$ and $Q$ are $p \times m$ and $1 \times m$ loading matrices, respectively, and $E$ and $F$ are error terms. When $m = \mathrm{rank}(X)$, PLS gives the same result as OLS. However, the shrinkage properties and filter factors $w_i$ of PLS are not as apparent as those of PCR and RR, because the Krylov space $K_m$ depends on $y$, which makes the PLS estimate a nonlinear function of $y$.

¹ http://en.wikipedia.org/wiki/Partial_least_squares_regression
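Model (8) is usually fitted with a NIPALS-style deflation algorithm. Below is a sketch of one common PLS1 variant (single response, no centering or intercept; these simplifications are assumptions, not taken from the slides), with coefficients $\hat\beta = W(P^TW)^{-1}q$. For data of full column rank it should agree with the Krylov definition (6) up to rounding, since the NIPALS weight vectors span $K_m$.

```python
import numpy as np

def pls1_nipals(X, y, m):
    """PLS1 with deflation; returns beta = W (P^T W)^{-1} q (no centering, no intercept)."""
    Xa, ya = X.copy(), y.copy()
    W, P, q = [], [], []
    for _ in range(m):
        w = Xa.T @ ya
        w /= np.linalg.norm(w)             # weight vector
        t = Xa @ w                          # score vector (a column of T)
        tt = t @ t
        p_a = Xa.T @ t / tt                 # X loading (a column of P)
        q_a = ya @ t / tt                   # y loading (an entry of Q)
        Xa = Xa - np.outer(t, p_a)          # deflate X
        ya = ya - q_a * t                   # deflate y
        W.append(w)
        P.append(p_a)
        q.append(q_a)
    W, P = np.column_stack(W), np.column_stack(P)
    return W @ np.linalg.solve(P.T @ W, np.array(q))

def pls_krylov(X, y, m):
    """Least squares restricted to K_m, i.e. definition (6)."""
    A, v = X.T @ X, X.T @ y
    cols = []
    for _ in range(m):
        cols.append(v / np.linalg.norm(v))
        v = A @ cols[-1]
    R, _ = np.linalg.qr(np.column_stack(cols))
    c, *_ = np.linalg.lstsq(X @ R, y, rcond=None)
    return R @ c

rng = np.random.default_rng(5)
X = rng.standard_normal((40, 6))
y = rng.standard_normal(40)
for m in range(1, 5):
    print(m, np.allclose(pls1_nipals(X, y, m), pls_krylov(X, y, m)))
```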


Ritz values

Definition (Ritz values). Let $(v, \lambda)$ be an eigenvector-eigenvalue pair of $X^TX$. To find a good approximation to $(v, \lambda)$ within $K_m$, an orthogonal projection method seeks $u \in K_m$ and $\theta$ such that

$$X^TXu - \theta u \perp K_m. \qquad (9)$$

Such a $\theta$ is called a Ritz value. Intuitively, the Ritz values should approach eigenvalues of $X^TX$ as $m$ increases. Equivalently, if the columns of $R_m$ form an orthonormal basis of $K_m$, the Ritz values are the eigenvalues of $R_m^TX^TXR_m$.
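Ritz values can be computed exactly as in the last sentence of the definition: build an orthonormal basis of $K_m$ and take the eigenvalues of the projected matrix. In this NumPy sketch (hypothetical random data; `ritz_values` is an illustrative name), the Ritz values coincide with the eigenvalues of $X^TX$ once $m = p$.

```python
import numpy as np

def ritz_values(X, y, m):
    """Eigenvalues (descending) of R_m^T X^T X R_m, R_m an orthonormal basis of K_m."""
    A, v = X.T @ X, X.T @ y
    cols = []
    for _ in range(m):
        cols.append(v / np.linalg.norm(v))   # normalized Krylov vector
        v = A @ cols[-1]
    R, _ = np.linalg.qr(np.column_stack(cols))
    return np.sort(np.linalg.eigvalsh(R.T @ A @ R))[::-1]

rng = np.random.default_rng(6)
X = rng.standard_normal((50, 6))
y = rng.standard_normal(50)
lam = np.sort(np.linalg.eigvalsh(X.T @ X))[::-1]   # lambda_1 >= ... >= lambda_p

for m in range(1, 7):
    print(m, np.round(ritz_values(X, y, m), 3))
print("eigenvalues:", np.round(lam, 3))
```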


Ritz values for PLS

Theorem (Properties of Ritz values). The Ritz values $\theta_1^{(m)} \ge \dots \ge \theta_m^{(m)}$ satisfy the following properties:

1. $\lambda_1 > \theta_1^{(m)} \ge \dots \ge \theta_m^{(m)} > \lambda_p$.
2. $\lambda_{p-m+i} \le \theta_i^{(m)} \le \lambda_i$ for $i = 1, \dots, m$.
3. Each of the open intervals $(\theta_{i+1}^{(m)}, \theta_i^{(m)})$ contains one or more eigenvalues $\lambda_j$.
4. The Ritz values $\{\theta_i^{(m)}\}$ and $\{\theta_i^{(m+1)}\}$ separate each other: $\theta_1^{(m+1)} > \theta_1^{(m)} > \theta_2^{(m+1)} > \dots > \theta_m^{(m)} > \theta_{m+1}^{(m+1)}$.

Thus, for fixed $k$, the $k$th largest Ritz value increases with $m$ and the $k$th smallest Ritz value decreases with $m$.
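The four properties can be checked numerically on random data. The sketch below (the helper names, tolerance and data are illustrative assumptions) returns one boolean per property for $m = 1, \dots, 4$ with $p = 8$; with continuous random data the strict inequalities are expected to hold.

```python
import numpy as np

def ritz_values(X, y, m):
    """Eigenvalues (descending) of R^T X^T X R, with R an orthonormal basis of K_m."""
    A, v = X.T @ X, X.T @ y
    cols = []
    for _ in range(m):
        cols.append(v / np.linalg.norm(v))
        v = A @ cols[-1]
    R, _ = np.linalg.qr(np.column_stack(cols))
    return np.sort(np.linalg.eigvalsh(R.T @ A @ R))[::-1]

def check_ritz_theorem(X, y, m, tol=1e-10):
    """Return booleans for properties 1-4 of the theorem (requires m + 1 <= p)."""
    lam = np.sort(np.linalg.eigvalsh(X.T @ X))[::-1]
    p = lam.size
    th, th_next = ritz_values(X, y, m), ritz_values(X, y, m + 1)
    # Property 1: all Ritz values lie strictly inside (lambda_p, lambda_1).
    prop1 = bool(lam[0] > th[0] and th[-1] > lam[-1])
    # Property 2 with 0-based indices: lam[p-m+i] <= th[i] <= lam[i].
    prop2 = all(lam[p - m + i] - tol <= th[i] <= lam[i] + tol for i in range(m))
    # Property 3: every gap between consecutive Ritz values contains an eigenvalue.
    prop3 = all(bool(np.any((lam > th[i + 1]) & (lam < th[i]))) for i in range(m - 1))
    # Property 4: strict interlacing of the Ritz values for m and m + 1.
    prop4 = all(th_next[i] > th[i] > th_next[i + 1] for i in range(m))
    return prop1, prop2, prop3, prop4

rng = np.random.default_rng(7)
X = rng.standard_normal((60, 8))
y = rng.standard_normal(60)
for m in range(1, 5):
    print(m, check_ritz_theorem(X, y, m))
```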


A picture explains everything: Ritz values

[Figure: Ritz values (image not available)]


Filter factors of PLS

Theorem (Filter factor values for PLS). Assume that $\dim(K_m) = m$. The filter factors for PLS with $m$ factors are given by

$$w_i^{(m)} = 1 - \prod_{j=1}^{m} \left(1 - \frac{\lambda_i}{\theta_j^{(m)}}\right) \qquad (10)$$

for $i = 1, \dots, p$. Here $\lambda_1 \ge \dots \ge \lambda_p$ are the eigenvalues of $X^TX$, and $\theta_1^{(m)} \ge \dots \ge \theta_m^{(m)}$ are the eigenvalues of $V_m^TX^TXV_m$ (the Ritz values), where $V_m$ is any $p \times m$ matrix whose columns form an orthonormal basis for $K_m$. Hence the PLS filter factors are completely determined by the eigenvalues of $X^TX$ and of $V_m^TX^TXV_m$; the dependence on $y$ enters only through the Ritz values.
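Formula (10) can be compared against filter factors read directly off the PLS estimate (7), using $w_i = \sigma_i v_i^T\hat\beta / (u_i^Ty)$, which follows from (5). This NumPy sketch uses hypothetical random data and illustrative helper names, and assumes all Fourier coefficients $u_i^Ty$ are nonzero; the two sets of filter factors should agree up to rounding.

```python
import numpy as np

def krylov_basis(X, y, m):
    """Orthonormal basis V_m of K_m via QR of normalized Krylov vectors."""
    A, v = X.T @ X, X.T @ y
    cols = []
    for _ in range(m):
        cols.append(v / np.linalg.norm(v))
        v = A @ cols[-1]
    Q, _ = np.linalg.qr(np.column_stack(cols))
    return Q

def pls_filter_factors_formula(X, y, m):
    """Equation (10), with lambda_i = sigma_i^2 and theta_j the Ritz values."""
    lam = np.linalg.svd(X, compute_uv=False) ** 2          # descending
    Vm = krylov_basis(X, y, m)
    theta = np.linalg.eigvalsh(Vm.T @ X.T @ X @ Vm)
    return 1 - np.prod(1 - lam[:, None] / theta[None, :], axis=1)

def pls_filter_factors_direct(X, y, m):
    """Fit PLS via (7), then read off w_i = sigma_i v_i^T beta / (u_i^T y) from (5)."""
    U, sigma, Vt = np.linalg.svd(X, full_matrices=False)
    Vm = krylov_basis(X, y, m)
    beta = Vm @ np.linalg.solve(Vm.T @ X.T @ X @ Vm, Vm.T @ (X.T @ y))
    return sigma * (Vt @ beta) / (U.T @ y)

rng = np.random.default_rng(9)
X = rng.standard_normal((60, 8))
y = rng.standard_normal(60)
for m in range(1, 5):
    print(m, np.allclose(pls_filter_factors_formula(X, y, m),
                         pls_filter_factors_direct(X, y, m)))
```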



Properties of filter factors

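Theorem (Largest and smallest filter factors). For all $m$, $w_p^{(m)} \le 1$, while

$$w_1^{(m)} \ge 1 \text{ for } m = 1, 3, 5, \dots, \qquad w_1^{(m)} \le 1 \text{ for } m = 2, 4, 6, \dots$$

This theorem reveals a striking difference between the filter factors of PLS and those of other shrinkage methods such as PCR and RR. For those methods the filter factors never exceed 1, whereas the PLS filter factor $w_1^{(m)}$ oscillates around 1 as $m$ changes. In terms of (10): $\lambda_1 > \theta_j^{(m)}$ for every $j$, so each factor $1 - \lambda_1/\theta_j^{(m)}$ is negative and the product has sign $(-1)^m$; and $\lambda_p < \theta_j^{(m)}$ for every $j$, so each factor for $i = p$ lies in $(0, 1)$.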
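A quick numerical illustration of the theorem, computing filter factors from (10) on hypothetical random data (the helper name and data are assumptions). With $p = 8$, $w_1^{(m)}$ should alternate above and below 1 as $m$ goes from 1 to 4, while $w_p^{(m)}$ stays below 1.

```python
import numpy as np

def pls_filter_factors(X, y, m):
    """Filter factors from equation (10); returns w_1, ..., w_p (eigenvalues descending)."""
    A, v = X.T @ X, X.T @ y
    cols = []
    for _ in range(m):
        cols.append(v / np.linalg.norm(v))
        v = A @ cols[-1]
    R, _ = np.linalg.qr(np.column_stack(cols))
    theta = np.linalg.eigvalsh(R.T @ A @ R)          # Ritz values
    lam = np.sort(np.linalg.eigvalsh(A))[::-1]       # lambda_1 >= ... >= lambda_p
    return 1 - np.prod(1 - lam[:, None] / theta[None, :], axis=1)

rng = np.random.default_rng(8)
X = rng.standard_normal((60, 8))
y = rng.standard_normal(60)
for m in range(1, 5):
    w = pls_filter_factors(X, y, m)
    print(f"m={m}: w_1={w[0]:.4f}, w_p={w[-1]:.4f}")
```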


A picture explains everything: filter factors

[Figure: PLS filter factors (image not available)]


Property of filter factors, continued

Theorem (Filter factors in the middle). For $m < M$ there is a partition of the integers $1, \dots, p$ into $m + 1$ non-empty disjoint sets $I_1, \dots, I_{m+1}$, where each element of $I_j$ is smaller than each element of $I_k$ when $j < k$. When $m$ is even,

$$w_i^{(m)} \le 1 \text{ for } i \in I_1 \cup I_3 \cup \dots \cup I_{m+1}, \qquad w_i^{(m)} \ge 1 \text{ for } i \in I_2 \cup I_4 \cup \dots \cup I_m.$$

When $m$ is odd,

$$w_i^{(m)} \ge 1 \text{ for } i \in I_1 \cup I_3 \cup \dots \cup I_m, \qquad w_i^{(m)} \le 1 \text{ for } i \in I_2 \cup I_4 \cup \dots \cup I_{m+1}.$$

So the intermediate filter factors $w_i^{(m)}$ also oscillate around 1 as $m$ varies, and they approach 1 as $m$ increases.
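One way to see where the partition comes from follows directly from (10) (this is a reading of the formula, not quoted from the slides): the factor $1 - \lambda_i/\theta_j^{(m)}$ is negative exactly when $\theta_j^{(m)} < \lambda_i$, so the sign of $w_i^{(m)} - 1$ depends only on how many Ritz values lie below $\lambda_i$. The NumPy sketch below (hypothetical random data, $m = 3$, $p = 8$) groups the indices by that count, which gives sets consistent with $I_1, \dots, I_{m+1}$, and checks the sign rule.

```python
import numpy as np

rng = np.random.default_rng(10)
X = rng.standard_normal((60, 8))
y = rng.standard_normal(60)
m = 3

A, v = X.T @ X, X.T @ y
cols = []
for _ in range(m):
    cols.append(v / np.linalg.norm(v))
    v = A @ cols[-1]
R, _ = np.linalg.qr(np.column_stack(cols))
theta = np.linalg.eigvalsh(R.T @ A @ R)                       # Ritz values
lam = np.sort(np.linalg.eigvalsh(A))[::-1]                    # lambda_1 >= ... >= lambda_p
w = 1 - np.prod(1 - lam[:, None] / theta[None, :], axis=1)    # Equation (10)

# count[i] = number of Ritz values below lambda_i (0-based i).
count = (theta[None, :] < lam[:, None]).sum(axis=1)
# Group j (1-based) holds the indices whose eigenvalue lies above exactly m - j + 1 Ritz values.
groups = [np.flatnonzero(count == m - j + 1) + 1 for j in range(1, m + 2)]
for j, idx in enumerate(groups, start=1):
    print(f"I_{j} = {idx.tolist()}:", np.round(w[idx - 1], 3).tolist())

# w_i - 1 = -prod_j (1 - lambda_i / theta_j) has sign (-1)^(count_i + 1).
print(np.all(np.sign(w - 1) == (-1.0) ** (count + 1)))
```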


Close to unity condition

Theorem (Close to unity condition). For $1 \le i \le m \le p - 2$,

$$\left|w_i^{(m)} - 1\right| \le \rho_i^{(m)} \frac{(\lambda_1 - \lambda_p)^m}{\lambda_p^m} \tan^2 \phi(X^Ty, v_i).$$

Here $\rho_i^{(m)}$ is a rather complicated function of the Ritz values and the eigenvalues of $X^TX$, $\phi(X^Ty, v_i)$ is the angle between $X^Ty$ and $v_i$, and $v_i$ is the $i$th eigenvector of $X^TX$. The theorem shows that when the angle between $X^Ty$ and $v_i$ is small, $w_i^{(m)}$ is close to unity.
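The angle in this bound can be written in terms of the Fourier coefficients $f_j = u_j^Ty$ from (2). The short derivation below follows from the SVD (1) and is not taken from the slides: since $X^Ty = VDU^Ty = \sum_j \sigma_j f_j v_j$ and the $v_j$ are orthonormal,

$$\cos\phi(X^Ty, v_i) = \frac{\sigma_i |f_i|}{\sqrt{\sum_{j=1}^{p} \sigma_j^2 f_j^2}}, \qquad \tan^2\phi(X^Ty, v_i) = \frac{\sum_{j \ne i} \sigma_j^2 f_j^2}{\sigma_i^2 f_i^2} = \frac{\sum_{j \ne i} \lambda_j f_j^2}{\lambda_i f_i^2}.$$

So, for fixed $m$ and $\rho_i^{(m)}$, the right-hand side of the bound is small when the $i$th term $\lambda_i f_i^2$ dominates $\|X^Ty\|^2$.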


Large deviation condition

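Theorem (Large deviation condition). Let $N$ be an integer with $m \le N \le p$, and let $\delta^* = \min\{|\lambda_i - \lambda_k| : i \ne k,\ i, k \le N\}$. Given a positive $\delta \le \delta^*/(2\lambda_1)$, suppose there is a set of $m$ distinct indices $J \subseteq \{1, \dots, N\}$ such that $|w_i^{(m)} - 1| < \delta^m$ for $i \in J$. Then for any index $l \in \{1, \dots, p\} \setminus J$,

$$\left|w_l^{(m)} - 1\right| > \prod_{i \in J}\left|1 - \frac{\lambda_l}{\lambda_i}\right| - \delta \sum_{i \in J} \frac{\lambda_l}{\lambda_i} \prod_{k \in J \setminus \{i\}} \left|1 - \frac{\lambda_l}{\lambda_k}\right| + O(\delta^2).$$

This shows that for an index $l$ with $\lambda_l \ll \lambda_i$ for all $i \in J$ (e.g. in cases of high collinearity), the corresponding filter factor $w_l^{(m)}$ must deviate significantly from 1.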
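A worked instance of the bound, using hypothetical eigenvalues chosen only for illustration: take $m = 2$, $J = \{1, 2\}$ with $\lambda_1 = 10$, $\lambda_2 = 5$, and an index $l$ with $\lambda_l = 0.1$. Then

$$\prod_{i \in J}\left|1 - \frac{\lambda_l}{\lambda_i}\right| = (1 - 0.01)(1 - 0.02) = 0.9702, \qquad \sum_{i \in J} \frac{\lambda_l}{\lambda_i} \prod_{k \in J \setminus \{i\}} \left|1 - \frac{\lambda_l}{\lambda_k}\right| = 0.01 \cdot 0.98 + 0.02 \cdot 0.99 = 0.0296,$$

so $|w_l^{(2)} - 1| > 0.9702 - 0.0296\,\delta + O(\delta^2)$. For small $\delta$ this forces $w_l^{(2)}$ to be either close to 0 or larger than about 1.97, i.e. far from 1.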


Discussion

- The filter factors of PLS oscillate around 1.
- The first PLS filter factor quickly converges to 1, but the last Ritz value is a poor approximation of the corresponding eigenvalue.
- Ritz values approximate their corresponding eigenvalues in natural order.
- Intermediate filter factors follow a less consistent pattern.
- Filter factors may be negative!
- ...


Thank you!

