Smoothness Maximization via Gradient Descents

Bin Zhao, Fei Wang, Changshui Zhang
Department of Automation, Tsinghua University, Beijing 100084, P.R. China
Email: [email protected], [email protected], [email protected]

Semi-Supervised Learning

• Learning from partially labeled data
• Transductive Learning
• Inductive Learning

Graph Based Semi-Supervised Learning

Represent the dataset as a weighted undirected graph G = <V, E>
• V: node set, corresponding to the labeled and unlabeled examples
• E: edge set, each edge e_ij ∈ E associated with a weight

W_{ij} = \exp\left(-\|x_i - x_j\|^2 / (2\sigma^2)\right)    (1)

Iterative Smoothness Maximization

• Smoothness maximization: optimize the objective function Q (defined in the Objective Function section) w.r.t. F with σ fixed; setting the gradient to zero gives the closed-form solution

\frac{\partial Q}{\partial F} = 0 \;\Rightarrow\; F^{*} = (1 - \alpha)(I - \alpha S)^{-1} T    (3)

where S = D^{-1/2} W D^{-1/2} and α = 1/(1 + µ)
• Graph reconstruction: optimize the objective function Q w.r.t. σ via gradient descent
• Iterate between the above two steps until convergence (a code sketch of the graph construction and the closed-form F step follows)
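A minimal NumPy sketch of the graph construction of Eq. (1) and the closed-form smoothness-maximization step of Eq. (3); the function names and the zeroed diagonal of W are illustrative assumptions rather than choices prescribed by the poster.

```python
import numpy as np

def build_graph(X, sigma):
    """Gaussian-weighted graph of Eq. (1): W_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # pairwise squared distances
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)                 # no self-loops (an assumption, common in GBSSL)
    D = W.sum(axis=1)                        # degrees D_ii
    S = W / np.sqrt(np.outer(D, D))          # S = D^{-1/2} W D^{-1/2}
    return W, D, S

def smoothness_maximization(S, T, mu):
    """Closed-form optimum of Q w.r.t. F with sigma fixed, Eq. (3)."""
    alpha = 1.0 / (1.0 + mu)
    n = S.shape[0]
    return (1.0 - alpha) * np.linalg.solve(np.eye(n) - alpha * S, T)
```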

The Influence of σ

Figure: toy data (two-moon). (a) The dataset, with unlabeled points and labeled points of classes +1 and −1; (b) classification result with σ = 0.15; (c) classification result with σ = 0.4.

Gradient Computing

Compute the gradient of Q(F, σ) w.r.t. σ with F fixed at F^{*} = \arg\min_F Q:

\frac{\partial Q(F,\sigma)}{\partial \sigma} = \frac{1}{2}\sum_{i,j=1}^{n}\left\{ \frac{\partial W_{ij}}{\partial \sigma} \left\| \frac{F_i}{\sqrt{D_{ii}}} - \frac{F_j}{\sqrt{D_{jj}}} \right\|^2 - W_{ij} \left( \frac{F_i}{\sqrt{D_{ii}}} - \frac{F_j}{\sqrt{D_{jj}}} \right) \cdot \left( \frac{F_i}{\sqrt{D_{ii}^3}} \frac{\partial D_{ii}}{\partial \sigma} - \frac{F_j}{\sqrt{D_{jj}^3}} \frac{\partial D_{jj}}{\partial \sigma} \right) \right\}    (4)

where, with d_{ij} = \|x_i - x_j\|,

\frac{\partial W_{ij}}{\partial \sigma} = \frac{\partial \exp(-d_{ij}^2/(2\sigma^2))}{\partial \sigma} = \frac{d_{ij}^2 \exp(-d_{ij}^2/(2\sigma^2))}{\sigma^3}    (5)

\frac{\partial D_{ii}}{\partial \sigma} = \sum_j \frac{\partial W_{ij}}{\partial \sigma} = \frac{\sum_j d_{ij}^2 W_{ij}}{\sigma^3}    (6)
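A direct NumPy transcription of Eqs. (4)-(6) (helper names are ours; the label-fitting term of Q does not depend on σ and therefore does not appear):

```python
import numpy as np

def grad_Q_sigma(X, F, sigma):
    """dQ(F, sigma)/dsigma with F held fixed, following Eqs. (4)-(6)."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)   # d_ij^2
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)                                     # same convention as build_graph
    D = W.sum(axis=1)
    dW = d2 * W / sigma ** 3                                     # Eq. (5)
    dD = dW.sum(axis=1)                                          # Eq. (6)
    Fn = F / np.sqrt(D)[:, None]                                 # rows F_i / sqrt(D_ii)
    diff = Fn[:, None, :] - Fn[None, :, :]                       # F_i/sqrt(D_ii) - F_j/sqrt(D_jj)
    term1 = dW * np.sum(diff ** 2, axis=-1)
    dFn = (dD / D ** 1.5)[:, None] * F                           # (dD_ii/dsigma) F_i / D_ii^{3/2}
    term2 = W * np.sum(diff * (dFn[:, None, :] - dFn[None, :, :]), axis=-1)
    return 0.5 * np.sum(term1 - term2)                           # Eq. (4)
```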

Convergence Study

• Smoothness maximization: with σ fixed, the Hessian of Q w.r.t. F is

H = \frac{\partial^2 Q}{\partial F^2} = (I - S) + \mu I    (7)

For any x ∈ R^n with x ≠ 0,

x^T H x = x^T (I - S) x + \mu x^T x = \frac{1}{2}\sum_{i,j=1}^{n} W_{ij} \left( \frac{x_i}{\sqrt{D_{ii}}} - \frac{x_j}{\sqrt{D_{jj}}} \right)^2 + \mu \sum_{i=1}^{n} x_i^2 > 0    (8)

Hence the Hessian matrix is positive definite; moreover, F^{*} is the unique zero point of ∂Q/∂F. Therefore F^{*} is the global minimum of Q with σ fixed, so each smoothness-maximization step satisfies

Q(F_{n+1}, \sigma_n) < Q(F_n, \sigma_n)    (9)

• Graph reconstruction: the learning rate η used in the gradient descent step is chosen to guarantee

Q(F_{n+1}, \sigma_{n+1}) < Q(F_{n+1}, \sigma_n)    (10)

Hence the objective function Q decreases monotonically and is lower bounded by 0, so the algorithm is guaranteed to converge. (A numerical sanity check of the positive definiteness follows.)
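The positive definiteness of Eqs. (7)-(8) can be checked numerically; a small sketch on random toy data, reusing build_graph from the earlier sketch (the data and parameter values are illustrative only):

```python
import numpy as np

# Numerical check of Eq. (8): H = (I - S) + mu*I should be positive definite.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))            # random toy points (illustrative only)
mu = 0.1
W, D, S = build_graph(X, sigma=0.5)     # helper from the earlier sketch
H = np.eye(len(X)) - S + mu * np.eye(len(X))
print(np.linalg.eigvalsh(H).min())      # smallest eigenvalue; positive, as Eq. (8) predicts
```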

Digit Recognition

Figure: digit recognition on USPS. (a) σ versus iteration step; (b) recognition accuracy of ISM, NN, LLGC, and SVM on USPS.

Learning Rate Selection

• σ is updated as σ_new = σ_old − η ∂Q/∂σ |_{σ = σ_old}
• The learning rate η strongly affects the performance of the gradient descent
• In ISM, η is selected dynamically to accelerate convergence: the objective function should decrease monotonically to guarantee convergence and, to keep the method simple, σ should avoid oscillation (a simplified sketch of such a monotonicity-checked update follows; the full rule is given in the Implementation section)
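A simplified sketch of one monotonicity-checked update of σ. This is our own simplification, not the full ISM rule; Q and dQ are assumed callables evaluating the objective and its σ-gradient with F held fixed, and the doubling/halving logic of the actual rule appears in the Implementation section.

```python
def update_sigma(sigma, eta, Q, dQ, eta_small=1e-3):
    """One gradient step on sigma, accepted only if the objective decreases."""
    candidate = sigma - eta * dQ(sigma)
    if Q(candidate) < Q(sigma):          # monotone decrease: accept the step
        return candidate, eta
    return sigma, eta_small              # otherwise keep sigma and fall back to a small rate
```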

Object Recognition

Figure: object recognition on COIL-20. (a) σ versus iteration step; (b) recognition accuracy of ISM, NN, LLGC, and SVM on COIL-20.

Graph Construction

• Graph construction is at the heart of GBSSL; the final classification result is significantly affected by σ
• So far there is no reliable approach that can determine an optimal σ automatically

Objective Function

• T_ij = 1 if x_i is labeled as t_i = j, and T_ij = 0 otherwise
• The classification result is represented as F = [F_1^T, ..., F_n^T]^T, which determines the label of x_i by t_i = arg max_{1≤j≤c} F_ij
• W is the weight matrix, and D_ii = Σ_j W_ij

Q = \frac{1}{2}\sum_{i,j=1}^{n} W_{ij} \left\| \frac{F_i}{\sqrt{D_{ii}}} - \frac{F_j}{\sqrt{D_{jj}}} \right\|^2 + \mu \sum_{i=1}^{n} \|F_i - T_i\|^2    (2)

• The first term measures label smoothness
• The second term measures label fitness
• µ > 0 adjusts the trade-off between these two terms
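A direct NumPy transcription of Eq. (2), together with a helper that builds T from partial labels; the names are ours, and W, D are as returned by the construction sketch above.

```python
import numpy as np

def make_T(labels, n_classes):
    """T_ij = 1 if x_i is labeled with class j, 0 otherwise; labels[i] = -1 means unlabeled."""
    T = np.zeros((len(labels), n_classes))
    for i, y in enumerate(labels):
        if y >= 0:
            T[i, y] = 1.0
    return T

def objective_Q(W, D, F, T, mu):
    """Objective of Eq. (2): smoothness term plus mu times the label-fitting term."""
    Fn = F / np.sqrt(D)[:, None]                       # rows F_i / sqrt(D_ii)
    diff = Fn[:, None, :] - Fn[None, :, :]
    smooth = 0.5 * np.sum(W * np.sum(diff ** 2, axis=-1))
    fit = np.sum((F - T) ** 2)
    return smooth + mu * fit
```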


Implementation of Iterative Smoothness Maximization

1. Initialization: σ = σ_0, total number of iteration steps N_0, initial learning rate η_0, and a small learning rate η_s.
2. Calculate the optimal F via Eq. (3).
3. Update σ with gradient descent and adjust the learning rate η: σ̂_1 = σ_n − η ∂Q/∂σ |_{σ=σ_n}
   (a) If Q(F_{n+1}, σ̂_1) < Q(F_{n+1}, σ_n) and sgn(∂Q/∂σ |_{σ=σ̂_1}) = sgn(∂Q/∂σ |_{σ=σ_n}), then set η = 2η and σ̂_2 = σ_n − η ∂Q/∂σ |_{σ=σ_n}.
      i. If Q(F_{n+1}, σ̂_2) < Q(F_{n+1}, σ_n) and sgn(∂Q/∂σ |_{σ=σ̂_2}) = sgn(∂Q/∂σ |_{σ=σ_n}), then σ_{n+1} = σ̂_2.
      ii. Else σ_{n+1} = σ̂_1 and η = η/2.
   (b) Else η = η_s and σ_{n+1} = σ_n.
4. If n > N_0, quit the iteration and output the classification result; else go to step 2.
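Putting the pieces together, a sketch of the full ISM loop following steps 1-4, reusing build_graph, smoothness_maximization, grad_Q_sigma, and objective_Q from the earlier sketches; the parameter defaults are illustrative assumptions.

```python
import numpy as np

def ism(X, T, mu=0.1, sigma0=1.0, N0=50, eta0=0.1, eta_s=1e-3):
    """Iterative Smoothness Maximization, steps 1-4."""
    sigma, eta = sigma0, eta0                          # step 1: initialization
    for n in range(N0):
        W, D, S = build_graph(X, sigma)
        F = smoothness_maximization(S, T, mu)          # step 2: optimal F, Eq. (3)
        Q_now = objective_Q(W, D, F, T, mu)
        g = grad_Q_sigma(X, F, sigma)

        def Q_at(s):                                   # Q(F, s) with F held fixed
            Wn, Dn, _ = build_graph(X, s)
            return objective_Q(Wn, Dn, F, T, mu)

        s1 = sigma - eta * g                           # step 3: gradient step on sigma
        if Q_at(s1) < Q_now and np.sign(grad_Q_sigma(X, F, s1)) == np.sign(g):
            eta *= 2.0                                 # (a): try a doubled learning rate
            s2 = sigma - eta * g
            if Q_at(s2) < Q_now and np.sign(grad_Q_sigma(X, F, s2)) == np.sign(g):
                sigma = s2                             # (a)i: keep the larger step
            else:
                sigma, eta = s1, eta / 2.0             # (a)ii: keep the first step, restore eta
        else:
            eta = eta_s                                # (b): reject the step, fall back to eta_s
    return F, sigma                                    # step 4: output after N0 iterations

# Usage on hypothetical data: F, sigma = ism(X, make_T(labels, n_classes)); pred = F.argmax(axis=1)
```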

References

• D. Zhou, O. Bousquet, T. N. Lal, J. Weston, and B. Schölkopf, Learning with local and global consistency, Advances in Neural Information Processing Systems, 2003.
• M. Belkin, P. Niyogi, and V. Sindhwani, On manifold regularization, Proceedings of the 10th International Workshop on Artificial Intelligence and Statistics, 2005.
• X. Zhu, Semi-supervised learning literature survey, Computer Sciences Technical Report 1530, University of Wisconsin-Madison, 2006.
• L. Zelnik-Manor and P. Perona, Self-tuning spectral clustering, Advances in Neural Information Processing Systems, 2004.
