Toward Faster Nonnegative Matrix Factorization: A New Algorithm and Comparisons
Jingu Kim and Haesun Park
Georgia Tech
2008.12.16
2008 Eighth IEEE International Conference on Data Mining (ICDM’08), Pisa, Italy
Outline
GOAL: present a new algorithm for NMF and provide related experimental evidence about its computational efficiency
1. Introduction
2. Algorithms for NMF
3. Block principal pivoting algorithm
4. Comparison results
5. Summary
Introduction
Nonnegative Matrix Factorization [Paatero and Tapper, 1994, Lee and Seung, 1999]
● Given a matrix $A \in \mathbb{R}^{m \times n}$ with nonnegative elements and a desired rank $k$, find $W \in \mathbb{R}^{m \times k}$ and $H \in \mathbb{R}^{k \times n}$ such that $A \approx WH$, where W and H have nonnegative elements only.
● Nonnegativity constraints are often physically meaningful and provide a natural interpretation: additive linear combinations of nonnegative parts. Successful applications include:
✦ Pixels in digital images [Lee and Seung, 1999, Li et al., 2001]
✦ Term-document matrices for text analysis [Xu et al., 2003, Pauca et al., 2004]
✦ Bioinformatics: microarray data analysis [Brunet et al., 2004, H. Kim and Park, 2007]
✦ Speech and audio processing [Behnke, 2003, Smaragdis and Brown, 2003]
✦ and many more; see the references in [Devarajan, 2008]
NMF Formulation
● Formulation: how to assert $A \approx WH$
✦ Minimize the Frobenius norm: $\min_{W,H} \|A - WH\|_F^2$ s.t. $W \ge 0, H \ge 0$
✦ Alternative formulation that minimizes KL-divergence: $\min_{W,H} D(A \| WH)$ s.t. $W \ge 0, H \ge 0$, where $D(A \| B) = \sum_{ij} \left( A_{ij} \log \frac{A_{ij}}{B_{ij}} - A_{ij} + B_{ij} \right)$
● Better Approximation vs. Better Representation/Interpretation
✦ SVD: Better Approximation → $\min \|A - WH\|_F^2$
✦ NMF: Better Representation/Interpretation → $\min \|A - WH\|_F^2$ where $W \ge 0$ and $H \ge 0$
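The two objectives on this slide can be evaluated directly. The following is a minimal NumPy sketch; the function names `frobenius_objective` and `kl_objective` and the small constant `eps` (which guards the logarithm against zeros) are illustrative assumptions, not part of the talk.

```python
import numpy as np

def frobenius_objective(A, W, H):
    """Squared Frobenius distance ||A - WH||_F^2."""
    return np.linalg.norm(A - W @ H, "fro") ** 2

def kl_objective(A, W, H, eps=1e-12):
    """Divergence D(A || WH) = sum_ij (A log(A/B) - A + B) with B = WH."""
    B = W @ H
    # eps is an assumption of this sketch: it avoids log(0) for zero entries
    return np.sum(A * np.log((A + eps) / (B + eps)) - A + B)

rng = np.random.default_rng(0)
W = rng.random((6, 2))
H = rng.random((2, 5))
A = W @ H            # exactly factorizable, so both objectives vanish
A_noisy = A + 0.1    # any mismatch makes both objectives positive

print(frobenius_objective(A, W, H), kl_objective(A, W, H))
print(frobenius_objective(A_noisy, W, H), kl_objective(A_noisy, W, H))
```

Both functions return zero only when $A = WH$, which is what makes either one usable as a measure of how well $A \approx WH$ holds.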
Algorithms for NMF and preparation
Algorithms for NMF
● Given a matrix $A \in \mathbb{R}^{m \times n}$ with nonnegative elements and a desired rank $k$, $\min_{W,H} \|A - WH\|_F^2$ s.t. $W \ge 0$ and $H \ge 0$.
✦ Non-convex optimization
✦ W and H are not unique (think of $\hat{W} = WD$, $\hat{H} = D^{-1}H$).
● Algorithms developed
✦ Multiplicative update rules [Lee and Seung, 2001]
✦ Alternating Least Squares (ALS) [Berry et al., 2007]
✦ Alternating Nonnegative Least Squares (ANLS) [Paatero and Tapper, 1994]
■ Several algorithms using this framework: [Lin, 2007, Kim et al., 2007, H. Kim and Park, 2008]
✦ Other algorithms and variants: [Li et al., 2001, Hoyer, 2004, Pauca et al., 2004, Gao and Church, 2005, Chu and Lin, 2008] ···
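The non-uniqueness remark can be checked numerically. This is a small sketch assuming a diagonal $D$ with positive entries (the particular values in `d` are arbitrary choices for illustration): rescaling the columns of W and the rows of H leaves the product unchanged and keeps both factors nonnegative.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.random((5, 3))
H = rng.random((3, 4))

d = np.array([2.0, 0.5, 10.0])   # diagonal of D, chosen arbitrarily (positive)
W_hat = W * d                     # W D: scales column j of W by d[j]
H_hat = H / d[:, None]            # D^{-1} H: scales row j of H by 1/d[j]

same_product = bool(np.allclose(W @ H, W_hat @ H_hat))
still_nonnegative = bool((W_hat >= 0).all() and (H_hat >= 0).all())
print(same_product, still_nonnegative)
```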
Previous algorithms and drawbacks
● Multiplicative Updating Rules [Lee and Seung, 2001]:
$H_{qj} \leftarrow H_{qj} \frac{(W^T A)_{qj}}{(W^T W H)_{qj}}$ and $W_{iq} \leftarrow W_{iq} \frac{(A H^T)_{iq}}{(W H H^T)_{iq}}$
✦ Under this updating, the distance $\|A - WH\|_F^2$ is monotonically decreasing.
✦ Simple implementation, but the monotonically decreasing property may not imply convergence to a stationary point [Gonzalez and Zhang, 2005].
● Alternating Least Squares (ALS) [Berry et al., 2007]
✦ Fix H and solve for W in $\min \|H^T W^T - A^T\|_F^2$, and set all negative elements in W to 0.
✦ Fix W and solve for H in $\min \|WH - A\|_F^2$, and set all negative elements in H to 0.
✦ No claim is made for convergence to a stationary point.
● → Alternating Nonnegative Least Squares (ANLS)
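As a rough illustration of the multiplicative rules, the sketch below applies one H update followed by one W update per round and records the squared Frobenius error. The function name `multiplicative_update`, the matrix sizes, and the constant `eps` added to the denominators are assumptions made for this example.

```python
import numpy as np

def multiplicative_update(A, W, H, eps=1e-12):
    """One round of the multiplicative rules: update H, then W."""
    # eps is an assumption of this sketch: it guards against division by zero
    H = H * (W.T @ A) / (W.T @ W @ H + eps)
    W = W * (A @ H.T) / (W @ (H @ H.T) + eps)
    return W, H

rng = np.random.default_rng(0)
A = rng.random((30, 20))
W = rng.random((30, 4))
H = rng.random((4, 20))

errors = []
for _ in range(50):
    W, H = multiplicative_update(A, W, H)
    errors.append(np.linalg.norm(A - W @ H, "fro") ** 2)

print(errors[0], errors[-1])
```

The recorded errors never increase, which matches the monotonicity property stated above; that alone says nothing about whether the limit is a stationary point.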
Alternating Nonnegative Least Squares [Paatero and Tapper, 1994]
1. Initialize W (or H) with nonnegative values.
2. Iterate the following ANLS steps until convergence:
(a) Fixing W, solve $\min_{H \ge 0} \|WH - A\|_F^2$
(b) Fixing H, solve $\min_{W \ge 0} \|H^T W^T - A^T\|_F^2$
3. The columns of W are normalized to unit L2-norm.
● Block coordinate descent method in bound-constrained optimization
● Convergence analysis
✦ No matter how many blocks there are, if the subproblems have unique solutions, then the limit point of the sequence is a stationary point [Bertsekas, 1999]
✦ For two-block problems, any limit point of the sequence is a stationary point [Grippo and Sciandrone, 2000]
✦ It is important to find an optimal solution of 2-(a),(b) at each iteration!
● It remains to provide the algorithm for solving the subproblems in 2-(a),(b). How to design a fast algorithm for this?
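The three steps of the ANLS framework can be sketched as follows. To keep the example self-contained, the subproblems are solved exactly by a deliberately naive placeholder, `nnls_exact_small`, which enumerates all supports and is only usable for very small $k$; it is an assumption of this sketch and stands in for whichever exact NNLS solver is plugged in. All function names are hypothetical.

```python
import itertools
import numpy as np

def nnls_exact_small(C, b):
    """Exact NNLS for tiny q by enumerating supports (placeholder solver)."""
    q = C.shape[1]
    best_x, best_val = np.zeros(q), np.linalg.norm(b) ** 2
    for r in range(1, q + 1):
        for F in itertools.combinations(range(q), r):
            F = list(F)
            xF = np.linalg.lstsq(C[:, F], b, rcond=None)[0]
            if (xF >= 0).all():                  # keep feasible candidates only
                val = np.linalg.norm(C[:, F] @ xF - b) ** 2
                if val < best_val:
                    best_x = np.zeros(q)
                    best_x[F] = xF
                    best_val = val
    return best_x

def solve_columns(C, B):
    """Solve min_{X >= 0} ||CX - B||_F^2 one column at a time."""
    return np.column_stack([nnls_exact_small(C, B[:, j])
                            for j in range(B.shape[1])])

def anls(A, k, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.random((A.shape[0], k))      # step 1: nonnegative initialization
    history = []
    for _ in range(iters):               # step 2: alternate the subproblems
        H = solve_columns(W, A)          # (a) fix W, solve for H
        W = solve_columns(H.T, A.T).T    # (b) fix H, solve for W
        history.append(np.linalg.norm(A - W @ H, "fro") ** 2)
    norms = np.linalg.norm(W, axis=0)    # step 3: unit L2-norm columns of W
    norms[norms == 0] = 1.0
    return W / norms, H * norms[:, None], history

rng = np.random.default_rng(1)
A = rng.random((12, 3)) @ rng.random((3, 10))
W, H, history = anls(A, k=3)
print(history[0], history[-1])
```

Because each subproblem is solved to optimality, the recorded objective cannot increase from one iteration to the next. Rescaling H by the column norms of W in step 3 is a choice made here so that the product $WH$ is unchanged.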
NMF/ANLS Algorithms
Problem to solve: $\min_{X \ge 0} \|CX - B\|_F^2$
● Active Set [H. Kim and Park, 2008]
✦ The classical algorithm for NNLS with a single right-hand side, $\min_{h \ge 0} \|Wh - a\|_2$, is the active set algorithm of [Lawson and Hanson, 1995].
✦ Faster algorithms for multiple right-hand side problems by [Bro and de Jong, 1997] and [Van Benthem and Keenan, 2004].
● Projected Gradient [Lin, 2007]: $x^{k+1} \leftarrow P_+\left(x^k - \alpha_k \nabla f(x^k)\right)$
✦ Improved selection of step constant $\alpha_k$
● Projected Quasi-Newton [Kim et al., 2007]: $x^{k+1} \leftarrow \begin{bmatrix} y \\ z \end{bmatrix} = \begin{bmatrix} P_+\left[ y^k - \alpha \bar{D}^k \nabla f(y^k) \right] \\ 0 \end{bmatrix}$
✦ Gradient scaling only for inactive variables
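The projected gradient iteration $x^{k+1} \leftarrow P_+(x^k - \alpha_k \nabla f(x^k))$ can be sketched for a single right-hand side as below. This sketch assumes $f(x) = \frac{1}{2}\|Cx - b\|_2^2$ and a fixed step $\alpha = 1/L$, where $L$ is the largest eigenvalue of $C^T C$; the adaptive step selection of [Lin, 2007] is not reproduced here, and the function name is hypothetical.

```python
import numpy as np

def projected_gradient_nnls(C, b, iters=2000):
    """Minimize 0.5*||Cx - b||^2 over x >= 0 by projected gradient steps."""
    CtC, Ctb = C.T @ C, C.T @ b
    # Fixed step 1/L is an assumption of this sketch, not Lin's step rule
    alpha = 1.0 / np.linalg.eigvalsh(CtC)[-1]
    x = np.zeros(C.shape[1])
    for _ in range(iters):
        grad = CtC @ x - Ctb                    # gradient of f at x
        x = np.maximum(x - alpha * grad, 0.0)   # P_+ clips negatives to zero
    return x

rng = np.random.default_rng(0)
C = rng.random((50, 4))
b = rng.random(50)
x = projected_gradient_nnls(C, b)

# At an optimum, min(x, gradient) vanishes componentwise
y = C.T @ (C @ x - b)
kkt_violation = np.abs(np.minimum(x, y)).max()
print(x, kkt_violation)
```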
Structure of NNLS problems
● Recognizing the long and thin structure is very important for developing a fast algorithm for NMF.
● $\min_{H \ge 0} \|WH - A\|_F^2$
● $\min_{W \ge 0} \|H^T W^T - A^T\|_F^2$
Block principal pivoting algorithm
Block principal pivoting algorithm [Portugal et al., 1994]
● Consider the single right-hand side problem: for $x \in \mathbb{R}^q$,
$\min_{x \ge 0} \|Cx - b\|_2^2$ (1)
● KKT conditions for (1):
$y = C^T C x - C^T b$ (2a)
$y \ge 0$ (2b)
$x \ge 0$ (2c)
$x_i y_i = 0, \; i = 1, \cdots, q$ (2d)
● Find x and y that satisfy (2).
● Repeat:
✦ Guess two index sets F and G that partition $\{1, \cdots, q\}$.
✦ Force $x_G = 0$ and $y_F = 0$. Solve $x_F = \arg\min_{x_F} \|C_F x_F - b\|_2^2$ and set $y_G = C_G^T (C_F x_F - b)$.
✦ Check: if $x_F \ge 0$ and $y_G \ge 0$, the optimal values are found. Otherwise, update F and G.
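A single guess-solve-check round from the loop above can be sketched as follows. The function name `solve_partition` and the tiny example matrix are assumptions for illustration, and indices are 0-based in the code while the slide counts from 1.

```python
import numpy as np

def solve_partition(C, b, F):
    """For a guessed F (G is its complement), force x_G = 0 and y_F = 0."""
    q = C.shape[1]
    F = np.asarray(F, dtype=int)
    G = np.setdiff1d(np.arange(q), F)
    x, y = np.zeros(q), np.zeros(q)
    if F.size:
        # x_F = argmin ||C_F x_F - b||_2^2
        x[F] = np.linalg.lstsq(C[:, F], b, rcond=None)[0]
    # y_G = C_G^T (C_F x_F - b)
    y[G] = C[:, G].T @ (C[:, F] @ x[F] - b)
    optimal = bool((x[F] >= 0).all() and (y[G] >= 0).all())
    return x, y, optimal

C = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
b = np.array([1.0, -1.0, 0.0])

# Guess F = {0, 1}: the unconstrained solution has a negative entry
x_bad, y_bad, ok_bad = solve_partition(C, b, [0, 1])
# Guess F = {0}, G = {1}: both x_F >= 0 and y_G >= 0 hold
x_opt, y_opt, ok_opt = solve_partition(C, b, [0])
print(ok_bad, ok_opt, x_opt, y_opt)
```

The first guess fails the check because $x_F$ has a negative entry, so that index would move from F to G in the next round; the second guess passes and yields the optimum.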
How block principal pivoting works
Update by $C_F^T C_F x_F = C_F^T b$, and $y_G = C_G^T C_F x_F - C_G^T b$.
Refining exchange rules
● Previous example: block exchange rule. One can also exchange only a subset of the infeasible variables.
✦ Exchange only one variable → single principal pivoting
✦ Exchange several variables → block principal pivoting
● The active set algorithm is a special instance of the single principal pivoting algorithm.
● The block exchange rule is not always safe.
✦ The residual is not guaranteed to decrease monotonically. The block exchange rule may lead to a cycle and fail to find an optimal solution (although this occurs rarely).
✦ Modification: if the block exchange rule fails to decrease the number of infeasible variables, use a backup exchange rule.
✦ With this modification, the block principal pivoting algorithm finds the solution of NNLS in a finite number of iterations [Portugal et al., 1994].
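Putting the block exchange rule and a backup rule together gives a complete single right-hand side solver. The sketch below is one possible reading, not the authors' code: the slide does not spell out the backup rule, so exchanging only the infeasible variable with the largest index after `max_tries` unsuccessful block exchanges is an assumption, as are the function name, the default `max_tries=3`, and the `max_iter` safeguard.

```python
import numpy as np

def block_principal_pivoting(C, b, max_tries=3, max_iter=1000):
    """NNLS by block principal pivoting with a backup exchange rule (sketch)."""
    q = C.shape[1]
    CtC, Ctb = C.T @ C, C.T @ b
    in_F = np.zeros(q, dtype=bool)          # start with F empty, G = all indices
    x, y = np.zeros(q), -Ctb
    tries, fewest = max_tries, q + 1
    for _ in range(max_iter):
        infeasible = (in_F & (x < 0)) | (~in_F & (y < 0))
        count = int(infeasible.sum())
        if count == 0:
            return x, y                      # all KKT conditions hold
        if count < fewest:                   # progress: keep the block rule
            fewest, tries = count, max_tries
            swap = infeasible
        elif tries > 0:                      # no progress: a few more block tries
            tries -= 1
            swap = infeasible
        else:                                # backup rule (assumed): one variable
            swap = np.zeros(q, dtype=bool)
            swap[np.flatnonzero(infeasible)[-1]] = True
        in_F = in_F ^ swap                   # move swapped indices between F and G
        F, G = np.flatnonzero(in_F), np.flatnonzero(~in_F)
        x, y = np.zeros(q), np.zeros(q)
        if F.size:
            x[F] = np.linalg.solve(CtC[np.ix_(F, F)], Ctb[F])
        y[G] = CtC[np.ix_(G, F)] @ x[F] - Ctb[G]
    raise RuntimeError("no solution found within max_iter pivots")

rng = np.random.default_rng(0)
C = rng.standard_normal((40, 6))
b = rng.standard_normal(40)
x, y = block_principal_pivoting(C, b)
print(x, y)
```

The sketch assumes $C$ has full column rank so that every $C_F^T C_F$ is nonsingular.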
Multiple right-hand side case
$\min_{X \ge 0} \|CX - B\|_F^2$
● It is possible to solve separately for each column of X. → SLOW
● Two improvements [Bro and de Jong, 1997, Van Benthem and Keenan, 2004]
✦ Precompute $C^T C$ and $C^T B$: the updates of $x_F$ and $y_G$ are given by
$C_F^T C_F x_F = C_F^T b$
$y_G = C_G^T C_F x_F - C_G^T b$.
All coefficients can be directly retrieved from $C^T C$ and $C^T B$!
✦ Exploit common F and G sets.
● Let us see why these improvements are effective for our problem.
Multiple right-hand side case
$\min_{X \ge 0} \|CX - B\|_F^2$
● Recall the long and thin structure. →
✦ $C^T C$ and $C^T B$ are small. → Storage is not a problem.
✦ X is flat and wide. → More common cases of F and G sets.
● This completes the description of our algorithm for NMF: ANLS framework + Block principal pivoting algorithm with improvements for multiple right-hand sides
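The two improvements for multiple right-hand sides can be illustrated on one update round. In this sketch $C^T C$ and $C^T B$ are computed once, and columns that share the same F set are solved together with a single linear system. The function name `update_grouped`, the sizes, and the random guesses for the F sets are assumptions made for the example.

```python
import numpy as np

def update_grouped(CtC, CtB, in_F):
    """Update X and Y for all columns, solving once per distinct F set."""
    q, n = CtB.shape
    X, Y = np.zeros((q, n)), np.zeros((q, n))
    groups = {}
    for j in range(n):                       # group columns by identical F set
        groups.setdefault(in_F[:, j].tobytes(), []).append(j)
    for cols in groups.values():
        mask = in_F[:, cols[0]]
        F, G = np.flatnonzero(mask), np.flatnonzero(~mask)
        if F.size:
            # C_F^T C_F x_F = C_F^T b, for every column in the group at once
            X[np.ix_(F, cols)] = np.linalg.solve(CtC[np.ix_(F, F)],
                                                 CtB[np.ix_(F, cols)])
        # y_G = C_G^T C_F x_F - C_G^T b
        Y[np.ix_(G, cols)] = (CtC[np.ix_(G, F)] @ X[np.ix_(F, cols)]
                              - CtB[np.ix_(G, cols)])
    return X, Y, len(groups)

rng = np.random.default_rng(0)
C = rng.random((1000, 5))            # long and thin
B = rng.random((1000, 200))
CtC, CtB = C.T @ C, C.T @ B          # only 5 x 5 and 5 x 200
in_F = rng.random((5, 200)) < 0.5    # arbitrary guesses, one F set per column
X, Y, n_groups = update_grouped(CtC, CtB, in_F)
print(CtC.shape, CtB.shape, n_groups)
```

With $q = 5$ there are at most $2^5 = 32$ distinct F sets, so 200 columns need at most 32 linear solves, and none of them touches the 1000-row matrices again.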
Extensions
● Like other ANLS algorithms, it is easily extended to other formulations.
● Sparse NMF [H. Kim and Park, 2007]:
$\min_{W,H} \|A - WH\|_F^2 + \eta \|W\|_F^2 + \beta \sum_{j=1}^{n} \|H(:,j)\|_1^2$
subject to $\forall ij, \; W_{ij}, H_{ij} \ge 0$.
ANLS reformulation [H. Kim and Park, 2007]: alternate the following
$\min_{H \ge 0} \left\| \begin{pmatrix} W \\ \sqrt{\beta} e_{1 \times k} \end{pmatrix} H - \begin{pmatrix} A \\ 0_{1 \times n} \end{pmatrix} \right\|_F^2$
$\min_{W \ge 0} \left\| \begin{pmatrix} H^T \\ \sqrt{\eta} I_k \end{pmatrix} W^T - \begin{pmatrix} A^T \\ 0_{k \times m} \end{pmatrix} \right\|_F^2$ (3)
● Similar reformulation for regularized NMF [Pauca et al., 2006]:
$\min_{W,H} \left\{ \|A - WH\|_F^2 + \alpha \|W\|_F^2 + \beta \|H\|_F^2 \right\}$ (4)
subject to $\forall ij, \; W_{ij}, H_{ij} \ge 0$.
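The sparse NMF reformulation works because stacking extra rows onto the least squares problem reproduces the penalty terms. The sketch below builds the stacked matrices and checks numerically that each stacked objective equals the corresponding penalized one for nonnegative factors; the helper names `stacked_for_H` and `stacked_for_W` and the parameter values are assumptions for illustration.

```python
import numpy as np

def stacked_for_H(A, W, beta):
    """Matrices of the H subproblem: [W; sqrt(beta) e_{1xk}] and [A; 0_{1xn}]."""
    k, n = W.shape[1], A.shape[1]
    C = np.vstack([W, np.sqrt(beta) * np.ones((1, k))])
    B = np.vstack([A, np.zeros((1, n))])
    return C, B

def stacked_for_W(A, H, eta):
    """Matrices of the W subproblem: [H^T; sqrt(eta) I_k] and [A^T; 0_{kxm}]."""
    k, m = H.shape[0], A.shape[0]
    C = np.vstack([H.T, np.sqrt(eta) * np.eye(k)])
    B = np.vstack([A.T, np.zeros((k, m))])
    return C, B

rng = np.random.default_rng(0)
A = rng.random((8, 6))
W = rng.random((8, 3))
H = rng.random((3, 6))
eta, beta = 0.3, 0.7
fit = np.linalg.norm(A - W @ H, "fro") ** 2

C, B = stacked_for_H(A, W, beta)
lhs_H = np.linalg.norm(C @ H - B, "fro") ** 2
rhs_H = fit + beta * np.sum(np.abs(H).sum(axis=0) ** 2)   # beta * sum_j ||H(:,j)||_1^2

C, B = stacked_for_W(A, H, eta)
lhs_W = np.linalg.norm(C @ W.T - B, "fro") ** 2
rhs_W = fit + eta * np.linalg.norm(W, "fro") ** 2

print(lhs_H, rhs_H, lhs_W, rhs_W)
```

Since both subproblems are again plain NNLS problems, any NNLS solver used inside the ANLS framework applies unchanged to the stacked matrices.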
Comparison results
Experimental Setup
● Stopping criterion: normalized KKT optimality condition as defined in [H. Kim and Park, 2007]:
$\Delta \le \epsilon \Delta_0$, where $\Delta = \frac{\delta}{\delta_W + \delta_H}$
● Datasets
✦ Synthetic: 300 × 200; create sparse W and H and produce A = WH with noise
✦ Text: Topic Detection and Tracking 2, randomly select 20 topics, 12617 × 1491
✦ Image: Olivetti Research Laboratory face images, 10304 × 400
● Compared algorithms
✦ (mult) Lee and Seung’s multiplicative updating algorithm
✦ (als) Berry et al.’s alternating least squares algorithm
✦ (lsqnonneg) ANLS with Lawson and Hanson’s algorithm
✦ (projnewton) ANLS with Kim et al.’s projected quasi-Newton algorithm
✦ (projgrad) ANLS with Lin’s projected gradient algorithm
✦ (activeset) ANLS with Kim and Park’s active set algorithm
✦ (blockpivot) ANLS with the block principal pivoting algorithm proposed in this paper
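Synthetic dataset

time (sec)
| k | mult | als | lsqnonneg | projnewton | projgrad | activeset | blockpivot |
|---|---|---|---|---|---|---|---|
| 5 | 35.336 | 36.697 | 23.188 | 5.756 | 0.976 | 0.262 | 0.252 |
| 10 | 47.132 | 52.325 | 82.619 | 13.43 | 4.157 | 0.848 | 0.786 |
| 20 | 72.888 | 83.232 | | 45.007 | 9.32 | 4.41 | 4.004 |
| 30 | | | | 127.33 | 62.317 | 17.252 | 14.384 |
| 40 | | | | | 81.445 | 22.246 | 16.132 |
| 60 | | | | | 128.76 | 37.376 | 21.368 |
| 80 | | | | | 276.29 | 65.566 | 30.055 |

iterations
| k | mult | als | lsqnonneg | projnewton | projgrad | activeset | blockpivot |
|---|---|---|---|---|---|---|---|
| 5 | 9784.2 | 10000 | 25.6 | 25.8 | 30 | 26.4 | 26.4 |
| 10 | 10000 | 10000 | 34.8 | 35.2 | 45 | 35.2 | 35.2 |
| 20 | 10000 | 10000 | | 70.8 | 104 | 69.8 | 69.8 |
| 30 | | | | 166 | 205.2 | 166.6 | 166.6 |
| 40 | | | | | 234.8 | 118 | 117.8 |
| 60 | | | | | 157.8 | 84.2 | 84.2 |
| 80 | | | | | 131.8 | 67.2 | 67.2 |

residual
| k | mult | als | lsqnonneg | projnewton | projgrad | activeset | blockpivot |
|---|---|---|---|---|---|---|---|
| 5 | 0.04035 | 0.04043 | 0.04035 | 0.04035 | 0.04035 | 0.04035 | 0.04035 |
| 10 | 0.04345 | 0.04379 | 0.04343 | 0.04343 | 0.04344 | 0.04343 | 0.04343 |
| 20 | 0.04603 | 0.04556 | | 0.04412 | 0.04414 | 0.04412 | 0.04412 |
| 30 | | | | 0.04313 | 0.04316 | 0.04327 | 0.04327 |
| 40 | | | | | 0.04944 | 0.04943 | 0.04944 |
| 60 | | | | | 0.04106 | 0.04063 | 0.04063 |
| 80 | | | | | 0.03411 | 0.03390 | 0.03390 |

size 300 × 200, $\epsilon = 10^{-4}$. Average of 10 executions with different initial values.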
Text dataset

time (sec)
| k | projgrad | activeset | blockpivot |
|---|---|---|---|
| 5 | 107.24 | 81.476 | 82.954 |
| 10 | 131.12 | 87.012 | 88.728 |
| 20 | 161.56 | 154.1 | 144.77 |
| 30 | 355.28 | 314.78 | 234.61 |
| 40 | 618.1 | 753.92 | 479.49 |
| 50 | 1299.6 | 1333.4 | 741.7 |
| 60 | 1616.05 | 2405.76 | 1041.78 |

iterations
| k | projgrad | activeset | blockpivot |
|---|---|---|---|
| 5 | 66.2 | 60.6 | 60.6 |
| 10 | 51.8 | 42 | 42 |
| 20 | 45.8 | 44.6 | 44.6 |
| 30 | 100.6 | 67.2 | 67.2 |
| 40 | 118 | 103.2 | 103.2 |
| 50 | 120.4 | 126.4 | 126.4 |
| 60 | 154.2 | 171.4 | 172.6 |

residual
| k | projgrad | activeset | blockpivot |
|---|---|---|---|
| 5 | 0.9547 | 0.9547 | 0.9547 |
| 10 | 0.9233 | 0.9229 | 0.9229 |
| 20 | 0.8898 | 0.8899 | 0.8899 |
| 30 | 0.8724 | 0.8727 | 0.8727 |
| 40 | 0.8600 | 0.8597 | 0.8597 |
| 50 | 0.8490 | 0.8488 | 0.8488 |
| 60 | 0.8386 | 0.8387 | 0.8387 |

size 12617 × 1491, $\epsilon = 10^{-4}$. Average of 10 executions with different initial values.
Image dataset

time (sec)
| k | projgrad | activeset | blockpivot |
|---|---|---|---|
| 16 | 68.529 | 11.751 | 11.998 |
| 25 | 124.05 | 25.675 | 22.305 |
| 36 | 109.1 | 53.528 | 35.249 |
| 49 | 150.49 | 115.54 | 57.85 |
| 64 | 169.7 | 270.64 | 91.035 |
| 81 | 249.45 | 545.94 | 146.76 |

iterations
| k | projgrad | activeset | blockpivot |
|---|---|---|---|
| 16 | 26.8 | 16.4 | 16.4 |
| 25 | 20.6 | 15 | 15 |
| 36 | 17.6 | 13.4 | 13.4 |
| 49 | 16.2 | 12.4 | 12.4 |
| 64 | 16.6 | 13.2 | 13.2 |
| 81 | 16.8 | 14.4 | 14.4 |

residual
| k | projgrad | activeset | blockpivot |
|---|---|---|---|
| 16 | 0.1905 | 0.1907 | 0.1907 |
| 25 | 0.1757 | 0.1751 | 0.1751 |
| 36 | 0.1630 | 0.1622 | 0.1622 |
| 49 | 0.1524 | 0.1514 | 0.1514 |
| 64 | 0.1429 | 0.1417 | 0.1417 |
| 81 | 0.1343 | 0.1329 | 0.1329 |

size 10304 × 400, $\epsilon = 5 \times 10^{-4}$. Average of 10 executions with different initial values.
Summary
● A new algorithm for NMF is proposed: ANLS framework + Block principal pivoting algorithm with improvements for multiple right-hand sides
● Important observation: long and thin structure
● Inherits the good convergence property of the ANLS framework
● Extensions for sparse/regularized NMF
● Outperforms other algorithms in computational experiments
● Source code will become available
Comparison by tolerance
[Figure: average elapsed time (seconds, axis from 0 to 1500) versus tolerance ($10^{-2}$, $10^{-4}$, $10^{-6}$) for activeset, blockpivot, and projgrad.]
12617 × 1491 text dataset. Average of 10 executions with different initial values.
Stopping Criterion
● KKT condition:
$W \ge 0$, $\partial f(W,H)/\partial W \ge 0$, $W .* (\partial f(W,H)/\partial W) = 0$
$H \ge 0$, $\partial f(W,H)/\partial H \ge 0$, $H .* (\partial f(W,H)/\partial H) = 0$
● These conditions can be simplified as
$\min(W, \partial f(W,H)/\partial W) = 0$ (5a)
$\min(H, \partial f(W,H)/\partial H) = 0$ (5b)
where the minimum is taken componentwise [Gonzalez and Zhang, 2005].
● Normalized KKT residual:
$\Delta = \frac{\delta}{\delta_W + \delta_H}$ (6)
where
$\delta = \sum_{i=1}^{m} \sum_{q=1}^{k} \left| \min(W_{iq}, (\partial f(W,H)/\partial W)_{iq}) \right| + \sum_{q=1}^{k} \sum_{j=1}^{n} \left| \min(H_{qj}, (\partial f(W,H)/\partial H)_{qj}) \right|$ (7)
$\delta_W = \# \left( \min(W, \partial f(W,H)/\partial W) \ne 0 \right)$ (8)
$\delta_H = \# \left( \min(H, \partial f(W,H)/\partial H) \ne 0 \right)$ (9)
● Convergence criterion: $\Delta \le \epsilon \Delta_0$, where $\Delta_0$ was computed using the initial values.
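The normalized KKT residual can be computed in a few lines. This sketch assumes $f(W,H) = \|A - WH\|_F^2$, so that $\partial f/\partial W = 2(WH - A)H^T$ and $\partial f/\partial H = 2W^T(WH - A)$, and it sums absolute values of the componentwise minima; the function name and the convention of returning 0 when no entry is nonzero are also assumptions of the example.

```python
import numpy as np

def normalized_kkt_residual(A, W, H):
    """Delta = delta / (delta_W + delta_H) for f(W, H) = ||A - WH||_F^2."""
    grad_W = 2 * (W @ H - A) @ H.T
    grad_H = 2 * W.T @ (W @ H - A)
    p_W = np.minimum(W, grad_W)     # componentwise minimum, zero at a KKT point
    p_H = np.minimum(H, grad_H)
    delta = np.abs(p_W).sum() + np.abs(p_H).sum()
    count = np.count_nonzero(p_W) + np.count_nonzero(p_H)   # delta_W + delta_H
    return delta / count if count else 0.0   # assumed convention when count is 0

rng = np.random.default_rng(0)
W_true = rng.random((10, 3))
H_true = rng.random((3, 8))
A = W_true @ H_true

# Delta_0 from random initial values, and Delta at an exact factorization
delta_0 = normalized_kkt_residual(A, rng.random((10, 3)), rng.random((3, 8)))
delta_exact = normalized_kkt_residual(A, W_true, H_true)

epsilon = 1e-4
print(delta_0, delta_exact, delta_exact <= epsilon * delta_0)
```

Dividing by the number of nonzero entries makes the residual comparable across problems of different sizes, and scaling the threshold by $\Delta_0$ makes it relative to the starting point.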
References
● S. Behnke. Discovering hierarchical speech features using convolutional non-negative matrix factorization. In Proceedings of the International Joint Conference on Neural Networks (IJCNN), pages 2758–2763, 2003.
● D. P. Bertsekas. Nonlinear Programming. Athena Scientific, Belmont, Mass., 1999.
● R. Bro and S. de Jong. A fast non-negativity-constrained least squares algorithm. Journal of Chemometrics, 11:393–401, 1997.
● J. Brunet, P. Tamayo, T. Golub, and J. Mesirov. Metagenes and molecular pattern discovery using matrix factorization. Proceedings of the National Academy of Sciences, 101(12):4164–4169, 2004.
● M. T. Chu and M. M. Lin. Low-dimensional polytope approximation and its applications to nonnegative matrix factorization. SIAM Journal on Scientific Computing, 30(3):1131–1155, 2008.
● K. Devarajan. Nonnegative matrix factorization: An analytical and interpretive tool in computational biology. PLoS Computational Biology, 4(7), 2008.
● Y. Gao and G. Church. Improving molecular cancer class discovery through sparse non-negative matrix factorization. Bioinformatics, 21(21):3970–3975, 2005.
● E. F. Gonzalez and Y. Zhang. Accelerating the Lee-Seung algorithm for non-negative matrix factorization. Technical report, Department of Computational and Applied Mathematics, Rice University, 2005.
● P. O. Hoyer. Non-negative matrix factorization with sparseness constraints. The Journal of Machine Learning Research, 5:1457–1469, 2004.
● D. Kim, S. Sra, and I. S. Dhillon. Fast Newton-type methods for the least squares nonnegative matrix approximation problem. In Proceedings of the 2007 SIAM International Conference on Data Mining, 2007.
● H. Kim and H. Park. Sparse non-negative matrix factorizations via alternating non-negativity-constrained least squares for microarray data analysis. Bioinformatics, 23(12):1495–1502, 2007.
● H. Kim and H. Park. Non-negative matrix factorization based on alternating non-negativity constrained least squares and active set method. SIAM Journal on Matrix Analysis and Applications, to appear.
● C. L. Lawson and R. J. Hanson. Solving Least Squares Problems. Society for Industrial Mathematics, 1995.
● D. D. Lee and H. S. Seung. Algorithms for non-negative matrix factorization. In Advances in Neural Information Processing Systems 13, pages 556–562. MIT Press, 2001.
● D. D. Lee and H. S. Seung. Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755):788–791, 1999.
● S. Z. Li, X. Hou, H. Zhang, and Q. Cheng. Learning spatially localized, parts-based representation. In CVPR ’01: Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2001.
● C.-J. Lin. Projected gradient methods for nonnegative matrix factorization. Neural Computation, 19(10):2756–2779, 2007.
● P. Paatero and U. Tapper. Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values. Environmetrics, 5(1):111–126, 1994.
● V. P. Pauca, F. Shahnaz, M. W. Berry, and R. J. Plemmons. Text mining using non-negative matrix factorizations. In Proceedings of the 2004 SIAM International Conference on Data Mining, 2004.
● V. P. Pauca, J. Piper, and R. J. Plemmons. Nonnegative matrix factorization for spectral data analysis. Linear Algebra and Its Applications, 416(1):29–47, 2006.
● L. F. Portugal, J. J. Judice, and L. N. Vicente. A comparison of block pivoting and interior-point algorithms for linear least squares problems with nonnegative variables. Mathematics of Computation, 63(208):625–643, 1994.
● P. Smaragdis and J. C. Brown. Non-negative matrix factorization for polyphonic music transcription. In 2003 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pages 177–180, 2003.
● M. H. Van Benthem and M. R. Keenan. Fast algorithm for the solution of large-scale non-negativity-constrained least squares problems. Journal of Chemometrics, 18:441–450, 2004.
● W. Xu, X. Liu, and Y. Gong. Document clustering based on non-negative matrix factorization. In SIGIR ’03: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 267–273, New York, NY, USA, 2003. ACM Press.