Motivation Our New Method Computational Comparison Conclusion

A Novel Model of Working Set Selection for SMO Decomposition Methods

Zhen-Dong Zhao¹, Lei Yuan², Yu-Xuan Wang², Sheng Bao², Shun-Yi Zhang¹, Yan-Fei Sun¹
¹ Institute of Information & Network Technology, Nanjing University of Posts & Telecomm., China
² School of Communications & Information Engineering, Nanjing University of Posts & Telecomm., China

International Conference on Tools with AI

Zhen-Dong Zhao, Lei Yuan, et al.
WSS-WR for SVM Training
Main Topic
SVM training methods: B. Boser [1992], E. Osuna [1997], T. Joachims [1998]. SMO method: J. C. Platt [1999], S. S. Keerthi [2001], C.-J. Lin [2001, 2006]. The key point of SMO is how to choose the two α's in each iteration; LIBSVM uses WSS 3 to select them. Our work is compared against WSS 3 as implemented in LIBSVM.
Outline
1. Motivation: Traditional Method for Training SVMs
2. Our New Method: Model and Definition; Some Properties of the WSS-WR Model
3. Computational Comparison: Datasets and Experiment Setting; Experimental Results
4. Conclusion
Traditional Method for Training SVMs
Key Work in SVM Training

Quadratic Problem: the key work in training SVMs is to solve the following quadratic optimization problem:

    min_α f(α) = (1/2) α^T Q α − e^T α
    subject to  0 ≤ α_i ≤ C,  i = 1, …, l,
                y^T α = 0,                          (1)

where e is the vector of all ones, C is the upper bound of all variables, Q_ij = y_i y_j K(x_i, x_j), and K(x_i, x_j) is the kernel function.
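To make the objective concrete, here is a minimal sketch (not the authors' code) that evaluates f(α) from (1) for an RBF kernel; the choice of kernel width gamma is an assumption of the illustration.

```python
import numpy as np

def dual_objective(alpha, X, y, gamma=0.5):
    """Evaluate f(alpha) = 1/2 alpha^T Q alpha - e^T alpha from Eq. (1),
    with Q_ij = y_i y_j K(x_i, x_j) and an RBF kernel K.
    Illustrative sketch only; gamma=0.5 is an assumed value."""
    sq = np.sum(X ** 2, axis=1)
    # Pairwise squared distances, then the RBF kernel matrix
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2.0 * X @ X.T))
    Q = np.outer(y, y) * K
    return 0.5 * alpha @ Q @ alpha - alpha.sum()
```

Decomposition methods such as SMO minimize this f(α) without ever forming the full l×l matrix Q, which is exactly why working set selection matters.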
SMO (Sequential Minimal Optimization) Method: Explanation

Figure: Steepest Descent Method in QP

Employing the idea of the steepest descent method, WSS 3 (LIBSVM) adjusts only two α's at a time; in other words, it changes the descent direction very slowly in each iteration.
Figure: The Model of WSS 3
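As a rough sketch of what working set selection does at each iteration, the simpler first-order "maximal violating pair" rule can be written as below; LIBSVM's WSS 3 refines the choice of the second index using second-order information, so this illustrates the general mechanism, not WSS 3 itself.

```python
def select_pair_first_order(grad, y, alpha, C):
    """Simplified first-order working set selection (the 'maximal violating
    pair'): pick the most violating index from I_up and from I_low.
    grad is the gradient of f at the current alpha."""
    I_up = [t for t in range(len(y))
            if (y[t] == 1 and alpha[t] < C) or (y[t] == -1 and alpha[t] > 0)]
    I_low = [t for t in range(len(y))
             if (y[t] == 1 and alpha[t] > 0) or (y[t] == -1 and alpha[t] < C)]
    i = max(I_up, key=lambda t: -y[t] * grad[t])   # most violating from above
    j = min(I_low, key=lambda t: -y[t] * grad[t])  # most violating from below
    return i, j
```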
Interesting Phenomena
Interesting Phenomena: We notice that several α's are selected repeatedly to optimize the QP problem, while others remain untouched. But is this kind of reuse necessary? The second figure indicates that repeated selection reduces the optimal value only slightly.
Hypothesis
Hypothesis: The direct and simple method is to reduce, or even eliminate, this kind of reselection during the training procedure. Figure: Our direct and simple modification
Model and Definition Some Properties of WSS-WR Model
Our New Model: Working Set Selection without Reselection (WSS-WR)

Definition: T^(k+1), k ∈ {1, …, ⌈l/2⌉}, is called the optimized set, in which every α ∈ T has already been selected and optimized once during working set selection.

Definition: C^(k+1) ⊂ {1, …, l} \ T^k is called the available set, in which every α ∈ C has never been selected.

Figure: The Model of WSS-WR
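Under these definitions, WSS-WR's selection can be sketched as a violating-pair search restricted to the available set, with the chosen pair added to T afterwards; this is a hypothetical first-order sketch of the mechanism, not the paper's exact procedure.

```python
def select_pair_wss_wr(grad, y, alpha, C, optimized):
    """WSS-WR sketch: search for a violating pair only among indices not yet
    in the optimized set T, then add the chosen pair to T (no reselection).
    'optimized' is the set T of already-used indices."""
    avail = [t for t in range(len(y)) if t not in optimized]
    I_up = [t for t in avail
            if (y[t] == 1 and alpha[t] < C) or (y[t] == -1 and alpha[t] > 0)]
    I_low = [t for t in avail
             if (y[t] == 1 and alpha[t] > 0) or (y[t] == -1 and alpha[t] < C)]
    if not I_up or not I_low:
        return None                  # available set exhausted: terminate
    i = max(I_up, key=lambda t: -y[t] * grad[t])
    j = min(I_low, key=lambda t: -y[t] * grad[t])
    optimized.update((i, j))
    return i, j
```

Since each index enters T at most once, the available set is exhausted after at most ⌈l/2⌉ selections, consistent with the termination bound given below.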
Theoretical Proof

Theorem 1: The values of all selected {α_i, α_j} ⊂ B are always 0 at the time of selection.

Lemma 1: In the WSS-WR model, I_up ≡ I_1 and I_low ≡ I_4.

Theorem 2: The algorithm terminates after a maximum of ⌈l/2⌉ iterations.

Lemma 2: In the WSS-WR model, α_1^new = α_2^new.
Datasets and Experiment Setting Experimental Results
Datasets and Kernel Methods
Datasets
1. Classification: a1a, w1a, Australian, Splice, Breast-cancer, Diabetes, Fourclass, German.number, Heart
2. Regression: MPG, MG
3. Large datasets: a9a, w8a, IJCNN1

Kernel methods: RBF, Linear, Polynomial, Sigmoid
Experiment Setting
Experiment Methods
1. "Parameter selection" step: conduct five-fold cross validation to find the best parameters.
2. "Final training" step: train on the whole set with the best parameters to obtain the final model.
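The two-step protocol above can be sketched generically as follows; train_fn, score_fn, and the parameter grid are placeholders (assumptions of this sketch), not the authors' actual tooling.

```python
from itertools import product
import numpy as np

def five_fold_cv_select(train_fn, score_fn, X, y, grid, seed=0):
    """Five-fold cross validation over a parameter grid (for instance C and
    gamma), returning the best-scoring parameter setting.  A generic sketch
    with assumed helpers train_fn/score_fn, not the authors' protocol."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, 5)          # five roughly equal folds
    best, best_score = None, -np.inf
    for values in product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        scores = []
        for k in range(5):
            val = folds[k]                  # hold out fold k for validation
            trn = np.concatenate([folds[m] for m in range(5) if m != k])
            model = train_fn(X[trn], y[trn], **params)
            scores.append(score_fn(model, X[val], y[val]))
        if np.mean(scores) > best_score:
            best, best_score = params, np.mean(scores)
    return best, best_score
```

The "final training" step then calls train_fn once on all of X, y with the returned parameters.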
Cross Validation Accuracy of Classification
Dataset         WSS-WR     WSS 3
a1a             83.4268    83.8006
w1a             97.7796    97.9814
australian      86.2319    86.8116
splice          86         86.8
breast-cancer   97.3646    97.2182
diabetes        77.9948    77.474
fourclass       100        100
german.number   75.8       77.6
heart           85.1852    84.4444

Table: Accuracy comparison between WSS 3 and WSS-WR (RBF kernel)
Notes: Entries shown in red mark datasets where WSS-WR achieves lower accuracy than WSS 3; green entries mark the opposite.
Iteration and Time Consumption Ratios of WSS-WR to WSS 3. Notes: RBF kernel for the "parameter selection" step (top: with shrinking; bottom: without shrinking).
Iteration and Time Consumption Ratios of WSS-WR to WSS 3. Notes: RBF kernel for the "final training" step (top: with shrinking; bottom: without shrinking).
The Effects of Caching Size and Shrinking Techniques

WSS-WR
Method                RBF       Linear    Poly.     Sigm.
100M, shrinking       56.8936   15.6038    9.3744   15.9825
100M, nonshrinking    56.8936   15.6038    9.7964   16.1321
100K, shrinking       59.9929   17.7689    9.4823   16.7169
100K, nonshrinking    59.8346   15.3692    8.9024   16.8676

WSS 3
Method                RBF         Linear     Poly.      Sigm.
100M, shrinking        400.7368    49.5343    34.0337    29.0977
100M, nonshrinking     505.7596    62.5636    34.3321    32.2544
100K, shrinking       1655.7840   232.5998   140.9247   103.3014
100K, nonshrinking    1602.4966   533.9386    93.3340   220.2574

Table: w1a, comparison of caching and shrinking (training time)
Explanation: The caching size and shrinking technique barely affect the training time in the WSS-WR model, whereas both noticeably change the training time in WSS 3.
Experiment on Large Datasets
300MB cache, RBF kernel
Problem   #data    #feat.   Iter.    Time
a9a       32,561   123      0.3439   0.8498
w8a       49,749   300      0.0149   0.2106
IJCNN1    49,990   22       0.3474   0.7422

1MB cache, RBF kernel
Problem   #data    #feat.   Iter.    Time
a9a       32,561   123      0.0522   0.0687
w8a       49,749   300      0.0370   0.0365
IJCNN1    49,990   22       0.1187   0.1673
Table: Large datasets: Iteration and time ratios of WSS-WR and WSS 3 for 16-point "parameter selection".
Convergence of WSS-WR. Notes: Comparison of convergence between WSS-WR and WSS 3 with the RBF kernel on several datasets (a1a, w1a, australian, and splice, respectively).
Conclusion
We have proposed a new working set selection model, WSS-WR; the definition and a theoretical proof of the model were also given; and full-scale experiments demonstrate that WSS-WR outperforms WSS 3 on almost all datasets in both the "parameter selection" and "final training" steps.
Future Work
This is preliminary work. Extending the model to multi-class classification is our next step; theoretical study of the convergence of WSS-WR and continued improvement of the model are future work.
Thank you
Q&A. Thank you! Questions? E-Mail:
[email protected]