Square Deal: Lower Bounds and Improved Relaxations for Tensor Recovery
Cun Mu, Bo Huang, John Wright and Donald Goldfarb

Introduction

▶ Recovering a low-rank tensor $\mathcal{X}_0 \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_K}$ from incomplete information: a recurring problem in signal processing and machine learning.
▶ Tucker decomposition (low-rank). [Figure: Tucker decomposition $\mathcal{X} = \mathcal{C} \times_1 U_1 \times_2 U_2 \times_3 U_3$]
▶ Matricization/unfolding/flattening: $X_{(i)}$ denotes the mode-$i$ unfolding of $\mathcal{X}$, whose columns are the mode-$i$ fibers of $\mathcal{X}$; the Tucker rank is $\operatorname{rank}_{tc}(\mathcal{X}) := \big(\operatorname{rank}(X_{(1)}), \ldots, \operatorname{rank}(X_{(K)})\big)$. [Figure: Matricization/Unfolding/Flattening]
▶ Gaussian measurements: $z = \mathcal{G}[\mathcal{X}_0] \in \mathbb{R}^m$, where $z_i = \langle \mathcal{G}_i, \mathcal{X}_0 \rangle$ for $i = 1, 2, \cdots, m$, and each $\mathcal{G}_i$ has i.i.d. standard normal entries.
▶ WLOG, we assume $\mathcal{X}_0$ lies in the set
$$\mathsf{T}_r := \{\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_K} \mid n_1 = n_2 = \cdots = n_K = n, \ \operatorname{rank}_{tc}(\mathcal{X}) \preceq (r, r, \cdots, r)\}.$$
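To make the unfolding and the Tucker rank concrete, here is a minimal numpy sketch (our own illustration, not part of the poster; the sizes n = 10, K = 3, r = 2 are arbitrary) that builds a random element of $\mathsf{T}_r$ via a Tucker decomposition and checks its rank:

```python
import numpy as np

def unfold(X, i):
    """Mode-i unfolding: move mode i to the front, then flatten the rest."""
    return np.reshape(np.moveaxis(X, i, 0), (X.shape[i], -1))

def tucker_rank(X):
    """Tucker rank: the tuple of ranks of the K mode-i unfoldings."""
    return tuple(np.linalg.matrix_rank(unfold(X, i)) for i in range(X.ndim))

# A random X0 in T_r with n = 10, K = 3, r = 2: an r x r x r core multiplied
# by an n x r factor along each mode.
n, K, r = 10, 3, 2
rng = np.random.default_rng(0)
X0 = rng.standard_normal((r,) * K)                # core tensor C
for i in range(K):
    U = rng.standard_normal((n, r))               # mode-i factor U_i
    X0 = np.moveaxis(np.tensordot(U, np.moveaxis(X0, i, 0), axes=(1, 0)), 0, i)

print(tucker_rank(X0))  # (2, 2, 2) with probability one
```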
Bounds for Non-Convex Recovery

▶ A vector optimization problem:
$$\text{minimize (w.r.t. } \mathbb{R}^K_+) \quad \operatorname{rank}_{tc}(\mathcal{X}) \quad \text{subject to} \quad \mathcal{G}[\mathcal{X}] = \mathcal{G}[\mathcal{X}_0]. \tag{1}$$
▶ Recoverability: (1) recovers $\mathcal{X}_0$ exactly when
$$\{\mathcal{X} \neq \mathcal{X}_0 \mid \mathcal{G}[\mathcal{X}] = \mathcal{G}[\mathcal{X}_0], \ \operatorname{rank}_{tc}(\mathcal{X}) \preceq_{\mathbb{R}^K_+} \operatorname{rank}_{tc}(\mathcal{X}_0)\} = \emptyset.$$
▶ Theorem (Non-convex recovery): Whenever $m \geq (2r)^K + 2nrK + 1$, with probability one, $\operatorname{null}(\mathcal{G}) \cap \mathsf{T}_{2r} = \{0\}$, and hence (1) recovers every $\mathcal{X}_0 \in \mathsf{T}_r$.
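For a sense of scale, a quick numeric check of this bound (the sizes are our own illustration, not from the poster):

```python
# Non-convex sample complexity (2r)^K + 2nrK + 1 versus the ambient dimension
# n^K, for illustrative sizes n = 30, r = 2, K = 4.
n, r, K = 30, 2, 4
m_nonconvex = (2 * r) ** K + 2 * n * r * K + 1
print(m_nonconvex, n ** K)  # 737 vs 810000: far fewer measurements than entries
```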
Convexification: Sum of Nuclear Norms (SNN)?

▶ SNN minimization model (Liu et al. '09, Signoretto et al. '10, plus many others):
$$\text{minimize} \quad \sum_{i=1}^{K} \lambda_i \|X_{(i)}\|_* \quad \text{subject to} \quad \mathcal{G}[\mathcal{X}] = \mathcal{G}[\mathcal{X}_0]. \tag{2}$$
▶ Corollary ([Tomioka et al. '11]): Suppose that $\mathcal{X}_0 \in \mathsf{T}_r$ and $m \geq C r n^{K-1}$. Then, with high probability, $\mathcal{X}_0$ is the optimal solution to (2), with each $\lambda_i = 1$. Here, $C$ is an absolute constant.
▶ A suboptimal bound:
  - A better bound for the SNN model? NO!
  - An alternative convexification with better performance? Yes!
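A minimal cvxpy sketch of the SNN model (2) with Gaussian measurements; everything here (the sizes, the rank-1 ground truth, and the index-permutation trick used to express the unfoldings of a vectorized variable) is our own illustration, not the authors' code:

```python
import cvxpy as cp
import numpy as np

n, K, m = 8, 3, 300
rng = np.random.default_rng(1)

# Rank-1 ground truth tensor, vectorized in C order.
u = rng.standard_normal((K, n))
X0 = np.einsum('i,j,k->ijk', *u)
x0 = X0.ravel()

# Gaussian measurements z = G[X0].
A = rng.standard_normal((m, n ** K))
z = A @ x0

# Index maps: x[idx[i]], reshaped in Fortran order, equals the mode-i unfolding.
ids = np.arange(n ** K).reshape((n,) * K)
idx = [np.moveaxis(ids, i, 0).reshape(n, -1).ravel(order='F') for i in range(K)]

x = cp.Variable(n ** K)
snn = sum(cp.normNuc(cp.reshape(x[idx[i]], (n, n ** (K - 1)), order='F'))
          for i in range(K))
prob = cp.Problem(cp.Minimize(snn), [A @ x == z])
prob.solve(solver=cp.SCS)
print(np.linalg.norm(x.value - x0) / np.linalg.norm(x0))  # small relative error
```

Here all $\lambda_i = 1$ as in the corollary, and $m = 300$ exceeds $r n^{K-1} = 64$, so the sketch sits in the corollary's recovery regime.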
Recovering Objects with Multiple Structures

▶ Signal $x_0 \in \mathbb{R}^n$: several low-dimensional structures (e.g. sparsity, low-rank, etc.) simultaneously imposed.
▶ $\|\cdot\|_{(i)}$: the penalty norm corresponding to the $i$-th structure (e.g. $\ell_1$, nuclear norm).
▶ Composite norm optimization:
$$\min_{x \in \mathbb{R}^n} \ f(x) := \lambda_1 \|x\|_{(1)} + \lambda_2 \|x\|_{(2)} + \cdots + \lambda_K \|x\|_{(K)} \quad \text{subject to} \quad \mathcal{G}[x] = \mathcal{G}[x_0]. \tag{3}$$
▶ $\operatorname{null}(\mathcal{G})$: a uniformly oriented random subspace of dimension $n - m$.
▶ $C$: the descent cone $C(f, x_0) := \operatorname{cone}\{v \mid f(x_0 + v) \leq f(x_0)\}$.
▶ Optimality condition: $\operatorname{null}(\mathcal{G}) \cap C = \{0\}$.
▶ When $C$ is "large": more likely to have a nontrivial intersection. Size of a convex cone: statistical dimension [Amelunxen et al. '13].
[Figure: the descent cones $C(\|\cdot\|_{(1)}, x_0)$ and $C(\|\cdot\|_{(2)}, x_0)$ at $x_0$, with the polar cones $\operatorname{cone}(\partial\|x_0\|_{(1)})$ and $\operatorname{cone}(\partial\|x_0\|_{(2)})$ subtending angles $\theta_1$ and $\theta_2$ around $x_0$.]
▶ Theorem (Multi-structured object recovery): Let $x_0 \neq 0$. Suppose that for each $i$, $\|\cdot\|_{(i)}$ is $L_i$-Lipschitz. Set
$$\kappa_i = n \cos^2(\theta_i) := \frac{n \|x_0\|_{(i)}^2}{L_i^2 \|x_0\|_2^2},$$
and $\kappa = \min_i \kappa_i$. Then if $m \leq \kappa - 1$,
$$\mathbb{P}\left[x_0 \text{ is the unique optimal solution to (3)}\right] \leq 4 \exp\left(-\frac{(\kappa - m - 1)^2}{16(\kappa - 1)}\right).$$
▶ Compared to [Oymak et al. '13], our result gives (a) a more general family of regularizers, (b) a sharper bound, and (c) a clearer geometric interpretation.
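As a sanity check on the theorem, $\kappa_i$ is directly computable; for the $\ell_1$ norm, $L_i = \sqrt{n}$, so $\kappa$ collapses to $\|x_0\|_1^2 / \|x_0\|_2^2$, which lies in $[1, k]$ for $k$-sparse $x_0$ (a sketch with our own sizes, matching the first row of the table below):

```python
import numpy as np

n, k = 1000, 20
rng = np.random.default_rng(2)
x0 = np.zeros(n)
x0[:k] = rng.standard_normal(k)

L = np.sqrt(n)   # Lipschitz constant of ||.||_1 with respect to ||.||_2
kappa = n * np.linalg.norm(x0, 1) ** 2 / (L ** 2 * np.linalg.norm(x0) ** 2)
print(kappa)     # in [1, k]: recovery fails w.h.p. whenever m <= kappa - 1
```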
▶ Sparse models and their surrogates $\|\cdot\|$:

  Object                                                        | Complexity measure              | Relaxation           | $\cos^2\theta$   | $\kappa := n\cos^2\theta$
  --------------------------------------------------------------|---------------------------------|----------------------|------------------|--------------------------
  Sparse $x \in \mathbb{R}^n$                                   | $k = \|x\|_0$                   | $\|x\|_1$            | $[1/n, k/n]$     | $[1, k]$
  Column-sparse $X \in \mathbb{R}^{n_1 \times n_2}$             | $c = \#\{j \mid X e_j \neq 0\}$ | $\sum_j \|X e_j\|_2$ | $[1/n_2, c/n_2]$ | $[n_1, c n_1]$
  Low-rank $X \in \mathbb{R}^{n_1 \times n_2}$ ($n_1 \geq n_2$) | $r = \operatorname{rank}(X)$    | $\|X\|_*$            | $[1/n_2, r/n_2]$ | $[n_1, r n_1]$

  (For the matrix cases, $n = n_1 n_2$ is the ambient dimension.)

▶ Proof sketch: with $C^\circ$ denoting the polar cone of $C$,
$$C^\circ = \operatorname{cone}(\partial f(x_0)) = \operatorname{cone}(\partial\|x_0\|_{(1)} + \partial\|x_0\|_{(2)}) \subseteq \operatorname{conv}\big(\operatorname{cone}(\partial\|x_0\|_{(1)}), \operatorname{cone}(\partial\|x_0\|_{(2)})\big) \subseteq \operatorname{conv}\big(\operatorname{circ}(x_0, \theta_1), \operatorname{circ}(x_0, \theta_2)\big) \subseteq \operatorname{circ}(x_0, \max_i \theta_i),$$
so $C^\circ = \operatorname{cone}(\partial f(x_0)) \subseteq \operatorname{circ}(x_0, \max_i \theta_i)$. Hence $C$ will be "large" whenever $C^\circ$ is "small".
▶ Corollary (Lower bound for SNN): Let $\mathcal{X}_0 \in \mathsf{T}_r$ be nonzero. Set
$$\kappa = \min_i \left\{ \|(\mathcal{X}_0)_{(i)}\|_*^2 \big/ \|\mathcal{X}_0\|_F^2 \right\} \times n^{K-1}.$$
Then if the number of measurements $m \leq \kappa - 1$, $\mathcal{X}_0$ is not the unique solution to (2), with probability at least $1 - 4\exp\left(-\frac{(\kappa - m - 1)^2}{16(\kappa - 1)}\right)$. Moreover, there exists $\mathcal{X}_0 \in \mathsf{T}_r$ for which $\kappa = r n^{K-1}$.
▶ So $\Theta(r n^{K-1})$ measurements are in general unavoidable for SNN: the bound of [Tomioka et al. '11] cannot be improved, and a better convexification must take a different form.
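The $\kappa$ in this corollary is also directly computable; a small sketch with our own sizes, reusing the Tucker construction and `unfold` helper from the earlier sketch:

```python
import numpy as np

def unfold(X, i):
    return np.reshape(np.moveaxis(X, i, 0), (X.shape[i], -1))

# Random X0 in T_r with n = 10, K = 3, r = 2, built via a Tucker decomposition.
n, K, r = 10, 3, 2
rng = np.random.default_rng(3)
X0 = rng.standard_normal((r,) * K)
for i in range(K):
    U = rng.standard_normal((n, r))
    X0 = np.moveaxis(np.tensordot(U, np.moveaxis(X0, i, 0), axes=(1, 0)), 0, i)

nuc = [np.linalg.svd(unfold(X0, i), compute_uv=False).sum() for i in range(K)]
kappa = min(s ** 2 for s in nuc) / np.linalg.norm(X0) ** 2 * n ** (K - 1)
print(kappa)  # at most r * n^(K-1) = 200; SNN fails w.h.p. when m <= kappa - 1
```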
A Better Convexification: Square Norm

▶ Let $\mathcal{X} \in \mathsf{T}_r$. Then we define the matrix $\hat{X}$ as
$$\hat{X} = \operatorname{reshape}\left( X_{(1)}, \ n^{\lfloor K/2 \rfloor}, \ n^{\lceil K/2 \rceil} \right), \tag{4}$$
a more balanced (square) matrix.
▶ Preserving the low-rank property:
  a. $\mathcal{X}_0$ has CP rank $r$ $\implies \operatorname{rank}(\hat{X}) \leq r$;
  b. $\mathcal{X}_0$ has Tucker rank $(r, r, \cdots, r)$ $\implies \operatorname{rank}(\hat{X}) \leq r^{\lfloor K/2 \rfloor}$.
▶ Square norm: $\|\mathcal{X}\|_{\square} := \|\hat{X}\|_*$. Square norm minimization model:
$$\text{minimize} \quad \|\mathcal{X}\|_{\square} \quad \text{subject to} \quad \mathcal{G}[\mathcal{X}] = \mathcal{G}[\mathcal{X}_0]. \tag{5}$$
▶ Theorem (Square norm minimization model): Let $\mathcal{X}_0 \in \mathsf{T}_r$. Then using (5), $m \geq C r^{\lfloor K/2 \rfloor} n^{\lceil K/2 \rceil}$ is sufficient to recover $\mathcal{X}_0$ with high probability.
▶ Partially bridging the gap:

  Model       | Sampling complexity
  ------------|---------------------
  Non-convex  | $(2r)^K + 2nrK + 1$
  SNN         | $\Theta(r n^{K-1})$
  Square norm | $\Theta(r^{\lfloor K/2 \rfloor} n^{\lceil K/2 \rceil})$
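A numpy sketch of the square reshaping (4) and property (b); here we group the first $\lfloor K/2 \rfloor$ native modes as rows, which matches the reshape of $X_{(1)}$ up to the ordering convention, with our own sizes n = 10, K = 4, r = 2:

```python
import numpy as np

def square_reshape(X):
    """Group the first floor(K/2) modes as rows, the remaining modes as columns."""
    n, K = X.shape[0], X.ndim
    return X.reshape(n ** (K // 2), -1)

# Random X0 with Tucker rank (2, 2, 2, 2), built via a Tucker decomposition.
n, K, r = 10, 4, 2
rng = np.random.default_rng(4)
X0 = rng.standard_normal((r,) * K)
for i in range(K):
    U = rng.standard_normal((n, r))
    X0 = np.moveaxis(np.tensordot(U, np.moveaxis(X0, i, 0), axes=(1, 0)), 0, i)

X_hat = square_reshape(X0)   # 100 x 100 rather than the 10 x 1000 unfolding
print(X_hat.shape, np.linalg.matrix_rank(X_hat))  # rank <= r^floor(K/2) = 4
# Sampling complexities at these sizes: square norm ~ r^2 * n^2 = 400,
# SNN ~ r * n^3 = 2000 (both up to constants).
```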
Simulation Results for Tensor Completion

▶ Tensor completion with SNN minimization:
$$\text{minimize} \quad \sum_{i=1}^{K} \lambda_i \|X_{(i)}\|_* \quad \text{subject to} \quad P_\Omega[\mathcal{X}] = P_\Omega[\mathcal{X}_0].$$
▶ Tensor completion with square norm minimization:
$$\text{minimize} \quad \|\mathcal{X}\|_{\square} \quad \text{subject to} \quad P_\Omega[\mathcal{X}] = P_\Omega[\mathcal{X}_0].$$
[Figure: two phase-transition colormaps, tensor completion with SNN minimization (left) and with square norm minimization (right); horizontal axis: fraction of entries observed (rho), 0.02 to 0.2; vertical axis: size of tensor (n), 10 to 30.] The colormap indicates the fraction of correct recovery, which increases with brightness from certain failure (black) to certain success (white). Here we generate $\mathcal{X}_0 \in \mathbb{R}^{n \times n \times n \times n}$ with $\operatorname{rank}_{tc}(\mathcal{X}_0) = (1, 1, 2, 2)$ randomly.
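A minimal cvxpy sketch of square-norm tensor completion in the experiment's setting ($K = 4$, $\operatorname{rank}_{tc}(\mathcal{X}_0) = (1, 1, 2, 2)$); the construction and sizes are our own illustration, not the authors' experiment code:

```python
import cvxpy as cp
import numpy as np

# X0[i,j,k,l] = u_i * v_j * M_kl with rank(M) = 2 has ranktc = (1, 1, 2, 2).
n = 10
rng = np.random.default_rng(5)
M = rng.standard_normal((n, 2)) @ rng.standard_normal((2, n))
X0 = np.einsum('i,j,kl->ijkl', rng.standard_normal(n), rng.standard_normal(n), M)
x0 = X0.ravel()

rho = 0.1                                            # fraction observed
obs = np.flatnonzero(rng.random(n ** 4) < rho)       # observed flat indices

x = cp.Variable(n ** 4)
X_hat = cp.reshape(x, (n ** 2, n ** 2), order='C')   # square reshaping of X
prob = cp.Problem(cp.Minimize(cp.normNuc(X_hat)), [x[obs] == x0[obs]])
prob.solve(solver=cp.SCS)
print(np.linalg.norm(x.value - x0) / np.linalg.norm(x0))  # typically small
```

For this $\mathcal{X}_0$ the square reshaping is a rank-1 100 x 100 matrix, so roughly 1000 observed entries usually suffice, whereas every mode-$i$ unfolding is a highly unbalanced 10 x 1000 matrix.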
Summary & Discussion

▶ A combination of structure-inducing norms is often not significantly more powerful than the best individual structure-inducing norm, as first discovered by [Oymak et al. '13]; using the elegant geometric framework of [Amelunxen et al. '13], our result covers a more general family of regularizers, with a sharper bound and a clearer geometric interpretation.
▶ Still suboptimal!
  - Open problem: near-optimal convex relaxations for $K > 2$.
  - The general multi-structured object recovery problem?!