Square Deal: Lower Bounds and Improved Relaxations for Tensor Recovery
Cun Mu, Bo Huang, John Wright and Donald Goldfarb

Introduction

▶ Recovering a low-rank tensor $\mathcal{X}_0 \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_K}$ from incomplete information: a recurring problem in signal processing and machine learning.
▶ Tucker decomposition (low-rank). [Figure: Tucker decomposition $\mathcal{X} = \mathcal{C} \times_1 U_1 \times_2 U_2 \times_3 U_3$]
▶ Matricization/unfolding/flattening: $X_{(i)}$ denotes the mode-$i$ unfolding of $\mathcal{X}$, whose columns are the mode-$i$ fibers of $\mathcal{X}$; the Tucker rank is $\operatorname{rank}_{tc}(\mathcal{X}) := \big(\operatorname{rank}(X_{(1)}), \ldots, \operatorname{rank}(X_{(K)})\big)$. [Figure: Matricization/Unfolding/Flattening]
▶ Gaussian measurements: $z = \mathcal{G}[\mathcal{X}_0] \in \mathbb{R}^m$, where $z_i = \langle \mathcal{G}_i, \mathcal{X}_0 \rangle$ for $i = 1, 2, \cdots, m$, and each $\mathcal{G}_i$ has i.i.d. standard normal entries.
▶ WLOG, we assume $\mathcal{X}_0$ lies in the set
$$\mathsf{T}_r := \{\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_K} \mid n_1 = n_2 = \cdots = n_K = n, \ \operatorname{rank}_{tc}(\mathcal{X}) \preceq (r, r, \cdots, r)\}.$$
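To make the unfolding and the Tucker rank concrete, here is a minimal numpy sketch (our own illustration, not part of the poster; the sizes n = 10, K = 3, r = 2 are arbitrary) that builds a random element of $\mathsf{T}_r$ via a Tucker decomposition and checks its rank:

```python
import numpy as np

def unfold(X, i):
    """Mode-i unfolding: move mode i to the front, then flatten the rest."""
    return np.reshape(np.moveaxis(X, i, 0), (X.shape[i], -1))

def tucker_rank(X):
    """Tucker rank: the tuple of ranks of the K mode-i unfoldings."""
    return tuple(np.linalg.matrix_rank(unfold(X, i)) for i in range(X.ndim))

# A random X0 in T_r with n = 10, K = 3, r = 2: an r x r x r core multiplied
# by an n x r factor along each mode.
n, K, r = 10, 3, 2
rng = np.random.default_rng(0)
X0 = rng.standard_normal((r,) * K)                # core tensor C
for i in range(K):
    U = rng.standard_normal((n, r))               # mode-i factor U_i
    X0 = np.moveaxis(np.tensordot(U, np.moveaxis(X0, i, 0), axes=(1, 0)), 0, i)

print(tucker_rank(X0))  # (2, 2, 2) with probability one
```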
Bounds for Non-Convex Recovery

▶ A vector optimization problem:
$$\text{minimize (w.r.t. } \mathbb{R}^K_+) \quad \operatorname{rank}_{tc}(\mathcal{X}) \quad \text{subject to} \quad \mathcal{G}[\mathcal{X}] = \mathcal{G}[\mathcal{X}_0]. \tag{1}$$
▶ Recoverability: (1) recovers $\mathcal{X}_0$ exactly when
$$\{\mathcal{X} \neq \mathcal{X}_0 \mid \mathcal{G}[\mathcal{X}] = \mathcal{G}[\mathcal{X}_0], \ \operatorname{rank}_{tc}(\mathcal{X}) \preceq_{\mathbb{R}^K_+} \operatorname{rank}_{tc}(\mathcal{X}_0)\} = \emptyset.$$
▶ Theorem (Non-convex recovery): Whenever $m \geq (2r)^K + 2nrK + 1$, with probability one, $\operatorname{null}(\mathcal{G}) \cap \mathsf{T}_{2r} = \{0\}$, and hence (1) recovers every $\mathcal{X}_0 \in \mathsf{T}_r$.
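For a sense of scale, a quick numeric check of this bound (the sizes are our own illustration, not from the poster):

```python
# Non-convex sample complexity (2r)^K + 2nrK + 1 versus the ambient dimension
# n^K, for illustrative sizes n = 30, r = 2, K = 4.
n, r, K = 30, 2, 4
m_nonconvex = (2 * r) ** K + 2 * n * r * K + 1
print(m_nonconvex, n ** K)  # 737 vs 810000: far fewer measurements than entries
```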
Convexification: Sum of Nuclear Norms (SNN)?

▶ SNN minimization model (Liu et al. '09, Signoretto et al. '10, plus many others):
$$\text{minimize} \quad \sum_{i=1}^{K} \lambda_i \|X_{(i)}\|_* \quad \text{subject to} \quad \mathcal{G}[\mathcal{X}] = \mathcal{G}[\mathcal{X}_0]. \tag{2}$$
▶ Corollary ([Tomioka et al. '11]): Suppose that $\mathcal{X}_0 \in \mathsf{T}_r$ and $m \geq C r n^{K-1}$. Then, with high probability, $\mathcal{X}_0$ is the optimal solution to (2), with each $\lambda_i = 1$. Here, $C$ is an absolute constant.
▶ A suboptimal bound:
  - A better bound for the SNN model? NO!
  - An alternative convexification with better performance? Yes!
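A minimal cvxpy sketch of the SNN model (2) with Gaussian measurements; everything here (the sizes, the rank-1 ground truth, and the index-permutation trick used to express the unfoldings of a vectorized variable) is our own illustration, not the authors' code:

```python
import cvxpy as cp
import numpy as np

n, K, m = 8, 3, 300
rng = np.random.default_rng(1)

# Rank-1 ground truth tensor, vectorized in C order.
u = rng.standard_normal((K, n))
X0 = np.einsum('i,j,k->ijk', *u)
x0 = X0.ravel()

# Gaussian measurements z = G[X0].
A = rng.standard_normal((m, n ** K))
z = A @ x0

# Index maps: x[idx[i]], reshaped in Fortran order, equals the mode-i unfolding.
ids = np.arange(n ** K).reshape((n,) * K)
idx = [np.moveaxis(ids, i, 0).reshape(n, -1).ravel(order='F') for i in range(K)]

x = cp.Variable(n ** K)
snn = sum(cp.normNuc(cp.reshape(x[idx[i]], (n, n ** (K - 1)), order='F'))
          for i in range(K))
prob = cp.Problem(cp.Minimize(snn), [A @ x == z])
prob.solve(solver=cp.SCS)
print(np.linalg.norm(x.value - x0) / np.linalg.norm(x0))  # small relative error
```

Here all $\lambda_i = 1$ as in the corollary, and $m = 300$ exceeds $r n^{K-1} = 64$, so the sketch sits in the corollary's recovery regime.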
Recovering Objects with Multiple Structures

▶ Signal $x_0 \in \mathbb{R}^n$: several low-dimensional structures (e.g. sparsity, low-rank, etc.) simultaneously imposed.
▶ $\|\cdot\|_{(i)}$: the penalty norm corresponding to the $i$-th structure (e.g. $\ell_1$, nuclear norm).
▶ Composite norm optimization:
$$\min_{x \in \mathbb{R}^n} \ f(x) := \lambda_1 \|x\|_{(1)} + \lambda_2 \|x\|_{(2)} + \cdots + \lambda_K \|x\|_{(K)} \quad \text{subject to} \quad \mathcal{G}[x] = \mathcal{G}[x_0]. \tag{3}$$
▶ $\operatorname{null}(\mathcal{G})$: a uniformly oriented random subspace of dimension $n - m$.
▶ $C$: the descent cone $C(f, x_0) := \operatorname{cone}\{v \mid f(x_0 + v) \leq f(x_0)\}$.
▶ Optimality condition: $\operatorname{null}(\mathcal{G}) \cap C = \{0\}$.
▶ When $C$ is "large": more likely to have a nontrivial intersection. Size of a convex cone: statistical dimension [Amelunxen et al. '13].
[Figure: the descent cones $C(\|\cdot\|_{(1)}, x_0)$ and $C(\|\cdot\|_{(2)}, x_0)$ at $x_0$, with the polar cones $\operatorname{cone}(\partial\|x_0\|_{(1)})$ and $\operatorname{cone}(\partial\|x_0\|_{(2)})$ subtending angles $\theta_1$ and $\theta_2$ around $x_0$.]
▶ Theorem (Multi-structured object recovery): Let $x_0 \neq 0$. Suppose that for each $i$, $\|\cdot\|_{(i)}$ is $L_i$-Lipschitz. Set
$$\kappa_i = n \cos^2(\theta_i) := \frac{n \|x_0\|_{(i)}^2}{L_i^2 \|x_0\|_2^2},$$
and $\kappa = \min_i \kappa_i$. Then if $m \leq \kappa - 1$,
$$\mathbb{P}\left[x_0 \text{ is the unique optimal solution to (3)}\right] \leq 4 \exp\left(-\frac{(\kappa - m - 1)^2}{16(\kappa - 1)}\right).$$
▶ Compared to [Oymak et al. '13], our result gives (a) a more general family of regularizers, (b) a sharper bound, and (c) a clearer geometric interpretation.
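As a sanity check on the theorem, $\kappa_i$ is directly computable; for the $\ell_1$ norm, $L_i = \sqrt{n}$, so $\kappa$ collapses to $\|x_0\|_1^2 / \|x_0\|_2^2$, which lies in $[1, k]$ for $k$-sparse $x_0$ (a sketch with our own sizes, matching the first row of the table below):

```python
import numpy as np

n, k = 1000, 20
rng = np.random.default_rng(2)
x0 = np.zeros(n)
x0[:k] = rng.standard_normal(k)

L = np.sqrt(n)   # Lipschitz constant of ||.||_1 with respect to ||.||_2
kappa = n * np.linalg.norm(x0, 1) ** 2 / (L ** 2 * np.linalg.norm(x0) ** 2)
print(kappa)     # in [1, k]: recovery fails w.h.p. whenever m <= kappa - 1
```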
▶ Sparse models and their surrogates $\|\cdot\|$:

  Object                                                        | Complexity measure              | Relaxation           | $\cos^2\theta$   | $\kappa := n\cos^2\theta$
  --------------------------------------------------------------|---------------------------------|----------------------|------------------|--------------------------
  Sparse $x \in \mathbb{R}^n$                                   | $k = \|x\|_0$                   | $\|x\|_1$            | $[1/n, k/n]$     | $[1, k]$
  Column-sparse $X \in \mathbb{R}^{n_1 \times n_2}$             | $c = \#\{j \mid X e_j \neq 0\}$ | $\sum_j \|X e_j\|_2$ | $[1/n_2, c/n_2]$ | $[n_1, c n_1]$
  Low-rank $X \in \mathbb{R}^{n_1 \times n_2}$ ($n_1 \geq n_2$) | $r = \operatorname{rank}(X)$    | $\|X\|_*$            | $[1/n_2, r/n_2]$ | $[n_1, r n_1]$

  (For the matrix cases, $n = n_1 n_2$ is the ambient dimension.)

▶ Proof sketch: with $C^\circ$ denoting the polar cone of $C$,
$$C^\circ = \operatorname{cone}(\partial f(x_0)) = \operatorname{cone}(\partial\|x_0\|_{(1)} + \partial\|x_0\|_{(2)}) \subseteq \operatorname{conv}\big(\operatorname{cone}(\partial\|x_0\|_{(1)}), \operatorname{cone}(\partial\|x_0\|_{(2)})\big) \subseteq \operatorname{conv}\big(\operatorname{circ}(x_0, \theta_1), \operatorname{circ}(x_0, \theta_2)\big) \subseteq \operatorname{circ}(x_0, \max_i \theta_i),$$
so $C^\circ = \operatorname{cone}(\partial f(x_0)) \subseteq \operatorname{circ}(x_0, \max_i \theta_i)$. Hence $C$ will be "large" whenever $C^\circ$ is "small".
▶ Corollary (Lower bound for SNN): Let $\mathcal{X}_0 \in \mathsf{T}_r$ be nonzero. Set
$$\kappa = \min_i \left\{ \|(\mathcal{X}_0)_{(i)}\|_*^2 \big/ \|\mathcal{X}_0\|_F^2 \right\} \times n^{K-1}.$$
Then if the number of measurements $m \leq \kappa - 1$, $\mathcal{X}_0$ is not the unique solution to (2), with probability at least $1 - 4\exp\left(-\frac{(\kappa - m - 1)^2}{16(\kappa - 1)}\right)$. Moreover, there exists $\mathcal{X}_0 \in \mathsf{T}_r$ for which $\kappa = r n^{K-1}$.
▶ So $\Theta(r n^{K-1})$ measurements are in general unavoidable for SNN: the bound of [Tomioka et al. '11] cannot be improved, and a better convexification must take a different form.
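The $\kappa$ in this corollary is also directly computable; a small sketch with our own sizes, reusing the Tucker construction and `unfold` helper from the earlier sketch:

```python
import numpy as np

def unfold(X, i):
    return np.reshape(np.moveaxis(X, i, 0), (X.shape[i], -1))

# Random X0 in T_r with n = 10, K = 3, r = 2, built via a Tucker decomposition.
n, K, r = 10, 3, 2
rng = np.random.default_rng(3)
X0 = rng.standard_normal((r,) * K)
for i in range(K):
    U = rng.standard_normal((n, r))
    X0 = np.moveaxis(np.tensordot(U, np.moveaxis(X0, i, 0), axes=(1, 0)), 0, i)

nuc = [np.linalg.svd(unfold(X0, i), compute_uv=False).sum() for i in range(K)]
kappa = min(s ** 2 for s in nuc) / np.linalg.norm(X0) ** 2 * n ** (K - 1)
print(kappa)  # at most r * n^(K-1) = 200; SNN fails w.h.p. when m <= kappa - 1
```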
A Better Convexification: Square Norm

▶ Let $\mathcal{X} \in \mathsf{T}_r$. Then we define the matrix $\hat{X}$ as
$$\hat{X} = \operatorname{reshape}\left( X_{(1)}, \ n^{\lfloor K/2 \rfloor}, \ n^{\lceil K/2 \rceil} \right), \tag{4}$$
a more balanced (square) matrix.
▶ Preserving the low-rank property:
  a. $\mathcal{X}_0$ has CP rank $r$ $\implies \operatorname{rank}(\hat{X}) \leq r$;
  b. $\mathcal{X}_0$ has Tucker rank $(r, r, \cdots, r)$ $\implies \operatorname{rank}(\hat{X}) \leq r^{\lfloor K/2 \rfloor}$.
▶ Square norm: $\|\mathcal{X}\|_{\square} := \|\hat{X}\|_*$. Square norm minimization model:
$$\text{minimize} \quad \|\mathcal{X}\|_{\square} \quad \text{subject to} \quad \mathcal{G}[\mathcal{X}] = \mathcal{G}[\mathcal{X}_0]. \tag{5}$$
▶ Theorem (Square norm minimization model): Let $\mathcal{X}_0 \in \mathsf{T}_r$. Then using (5), $m \geq C r^{\lfloor K/2 \rfloor} n^{\lceil K/2 \rceil}$ is sufficient to recover $\mathcal{X}_0$ with high probability.
▶ Partially bridging the gap:

  Model       | Sampling complexity
  ------------|---------------------
  Non-convex  | $(2r)^K + 2nrK + 1$
  SNN         | $\Theta(r n^{K-1})$
  Square norm | $\Theta(r^{\lfloor K/2 \rfloor} n^{\lceil K/2 \rceil})$
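A numpy sketch of the square reshaping (4) and property (b); here we group the first $\lfloor K/2 \rfloor$ native modes as rows, which matches the reshape of $X_{(1)}$ up to the ordering convention, with our own sizes n = 10, K = 4, r = 2:

```python
import numpy as np

def square_reshape(X):
    """Group the first floor(K/2) modes as rows, the remaining modes as columns."""
    n, K = X.shape[0], X.ndim
    return X.reshape(n ** (K // 2), -1)

# Random X0 with Tucker rank (2, 2, 2, 2), built via a Tucker decomposition.
n, K, r = 10, 4, 2
rng = np.random.default_rng(4)
X0 = rng.standard_normal((r,) * K)
for i in range(K):
    U = rng.standard_normal((n, r))
    X0 = np.moveaxis(np.tensordot(U, np.moveaxis(X0, i, 0), axes=(1, 0)), 0, i)

X_hat = square_reshape(X0)   # 100 x 100 rather than the 10 x 1000 unfolding
print(X_hat.shape, np.linalg.matrix_rank(X_hat))  # rank <= r^floor(K/2) = 4
# Sampling complexities at these sizes: square norm ~ r^2 * n^2 = 400,
# SNN ~ r * n^3 = 2000 (both up to constants).
```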
Simulation Results for Tensor Completion

▶ Tensor completion with SNN minimization:
$$\text{minimize} \quad \sum_{i=1}^{K} \lambda_i \|X_{(i)}\|_* \quad \text{subject to} \quad P_\Omega[\mathcal{X}] = P_\Omega[\mathcal{X}_0].$$
▶ Tensor completion with square norm minimization:
$$\text{minimize} \quad \|\mathcal{X}\|_{\square} \quad \text{subject to} \quad P_\Omega[\mathcal{X}] = P_\Omega[\mathcal{X}_0].$$
[Figure: two phase-transition colormaps, tensor completion with SNN minimization (left) and with square norm minimization (right); horizontal axis: fraction of entries observed (rho), 0.02 to 0.2; vertical axis: size of tensor (n), 10 to 30.] The colormap indicates the fraction of correct recovery, which increases with brightness from certain failure (black) to certain success (white). Here we generate $\mathcal{X}_0 \in \mathbb{R}^{n \times n \times n \times n}$ with $\operatorname{rank}_{tc}(\mathcal{X}_0) = (1, 1, 2, 2)$ randomly.
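A minimal cvxpy sketch of square-norm tensor completion in the experiment's setting ($K = 4$, $\operatorname{rank}_{tc}(\mathcal{X}_0) = (1, 1, 2, 2)$); the construction and sizes are our own illustration, not the authors' experiment code:

```python
import cvxpy as cp
import numpy as np

# X0[i,j,k,l] = u_i * v_j * M_kl with rank(M) = 2 has ranktc = (1, 1, 2, 2).
n = 10
rng = np.random.default_rng(5)
M = rng.standard_normal((n, 2)) @ rng.standard_normal((2, n))
X0 = np.einsum('i,j,kl->ijkl', rng.standard_normal(n), rng.standard_normal(n), M)
x0 = X0.ravel()

rho = 0.1                                            # fraction observed
obs = np.flatnonzero(rng.random(n ** 4) < rho)       # observed flat indices

x = cp.Variable(n ** 4)
X_hat = cp.reshape(x, (n ** 2, n ** 2), order='C')   # square reshaping of X
prob = cp.Problem(cp.Minimize(cp.normNuc(X_hat)), [x[obs] == x0[obs]])
prob.solve(solver=cp.SCS)
print(np.linalg.norm(x.value - x0) / np.linalg.norm(x0))  # typically small
```

For this $\mathcal{X}_0$ the square reshaping is a rank-1 100 x 100 matrix, so roughly 1000 observed entries usually suffice, whereas every mode-$i$ unfolding is a highly unbalanced 10 x 1000 matrix.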
Summary & Discussion

▶ A combination of structure-inducing norms is often not significantly more powerful than the best individual structure-inducing norm, as first discovered by [Oymak et al. '13]; using the elegant geometric framework of [Amelunxen et al. '13], our result covers a more general family of regularizers, with a sharper bound and a clearer geometric interpretation.
▶ Still suboptimal!
  - Open problem: near-optimal convex relaxations for $K > 2$.
  - The general multi-structured object recovery problem?!