Motivation Our New Method Computational Comparison Conclusion

A Novel Model of Working Set Selection for SMO Decomposition Methods

Zhen-Dong Zhao¹, Lei Yuan², Yu-Xuan Wang², Sheng Bao², Shun-Yi Zhang¹, Yan-Fei Sun¹
¹ Institute of Information & Network Technology, Nanjing University of Posts & Telecomm., China
² School of Communications & Information Engineering, Nanjing University of Posts & Telecomm., China

International Conference on Tools with AI

Zhen-Dong Zhao, Lei Yuan, et al.
WSS-WR for SVM Training
Main Topic
SVM training methods: B. Boser [1992], E. Osuna [1997], T. Joachims [1998]. SMO method: J. C. Platt [1999], S. S. Keerthi [2001], C.-J. Lin [2001, 2006]. The key point of SMO is how to choose the two α's in each iteration; LIBSVM uses WSS 3 to select them. Our work is compared against WSS 3 as implemented in LIBSVM.
Outline
1. Motivation: Traditional Method for Training SVMs
2. Our New Method: Model and Definition; Some Properties of the WSS-WR Model
3. Computational Comparison: Datasets and Experiment Setting; Experimental Results
4. Conclusion
Traditional Method for Training SVMs
Key Work in SVM Training

Quadratic Problem: the key work in training SVMs is to solve the following quadratic optimization problem:

    min_α f(α) = (1/2) α^T Q α − e^T α
    subject to  0 ≤ α_i ≤ C,  i = 1, …, l,
                y^T α = 0,                          (1)

where e is the vector of all ones, C is the upper bound of all variables, Q_ij = y_i y_j K(x_i, x_j), and K(x_i, x_j) is the kernel function.
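To make the objective concrete, here is a minimal sketch (not the authors' code) that evaluates f(α) from (1) for an RBF kernel; the choice of kernel width gamma is an assumption of the illustration.

```python
import numpy as np

def dual_objective(alpha, X, y, gamma=0.5):
    """Evaluate f(alpha) = 1/2 alpha^T Q alpha - e^T alpha from Eq. (1),
    with Q_ij = y_i y_j K(x_i, x_j) and an RBF kernel K.
    Illustrative sketch only; gamma=0.5 is an assumed value."""
    sq = np.sum(X ** 2, axis=1)
    # Pairwise squared distances, then the RBF kernel matrix
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2.0 * X @ X.T))
    Q = np.outer(y, y) * K
    return 0.5 * alpha @ Q @ alpha - alpha.sum()
```

Decomposition methods such as SMO minimize this f(α) without ever forming the full l×l matrix Q, which is exactly why working set selection matters.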
SMO (Sequential Minimal Optimization) Method: Explanation

Figure: Steepest Descent Method in QP

Employing the idea of the steepest descent method, WSS 3 (LIBSVM) adjusts only two α's at a time; in other words, it changes the descent direction very slowly in each iteration.
Figure: The Model of WSS 3
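As a rough sketch of what working set selection does at each iteration, the simpler first-order "maximal violating pair" rule can be written as below; LIBSVM's WSS 3 refines the choice of the second index using second-order information, so this illustrates the general mechanism, not WSS 3 itself.

```python
def select_pair_first_order(grad, y, alpha, C):
    """Simplified first-order working set selection (the 'maximal violating
    pair'): pick the most violating index from I_up and from I_low.
    grad is the gradient of f at the current alpha."""
    I_up = [t for t in range(len(y))
            if (y[t] == 1 and alpha[t] < C) or (y[t] == -1 and alpha[t] > 0)]
    I_low = [t for t in range(len(y))
             if (y[t] == 1 and alpha[t] > 0) or (y[t] == -1 and alpha[t] < C)]
    i = max(I_up, key=lambda t: -y[t] * grad[t])   # most violating from above
    j = min(I_low, key=lambda t: -y[t] * grad[t])  # most violating from below
    return i, j
```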
Interesting Phenomena
Interesting Phenomena: We notice that several α's are selected repeatedly to optimize the QP problem, while others remain untouched. But is this kind of reuse necessary? The second figure indicates that repeated selection reduces the optimal value only slightly.
Hypothesis
Hypothesis: The direct and simple method is to reduce, or even eliminate, this kind of reselection during the training procedure. Figure: Our direct and simple modification
Model and Definition Some Properties of WSS-WR Model
Our New Model: Working Set Selection without Reselection (WSS-WR)

Definition: T^(k+1), k ∈ {1, …, ⌈l/2⌉}, is called the optimized set, in which every α ∈ T has already been selected and optimized once during working set selection.

Definition: C^(k+1) ⊂ {1, …, l} \ T^k is called the available set, in which every α ∈ C has never been selected.

Figure: The Model of WSS-WR
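Under these definitions, WSS-WR's selection can be sketched as a violating-pair search restricted to the available set, with the chosen pair added to T afterwards; this is a hypothetical first-order sketch of the mechanism, not the paper's exact procedure.

```python
def select_pair_wss_wr(grad, y, alpha, C, optimized):
    """WSS-WR sketch: search for a violating pair only among indices not yet
    in the optimized set T, then add the chosen pair to T (no reselection).
    'optimized' is the set T of already-used indices."""
    avail = [t for t in range(len(y)) if t not in optimized]
    I_up = [t for t in avail
            if (y[t] == 1 and alpha[t] < C) or (y[t] == -1 and alpha[t] > 0)]
    I_low = [t for t in avail
             if (y[t] == 1 and alpha[t] > 0) or (y[t] == -1 and alpha[t] < C)]
    if not I_up or not I_low:
        return None                  # available set exhausted: terminate
    i = max(I_up, key=lambda t: -y[t] * grad[t])
    j = min(I_low, key=lambda t: -y[t] * grad[t])
    optimized.update((i, j))
    return i, j
```

Since each index enters T at most once, the available set is exhausted after at most ⌈l/2⌉ selections, consistent with the termination bound given below.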
Theoretical Proof

Theorem 1: The values of all selected {α_i, α_j} ⊂ B are always 0 at the time of selection.

Lemma 1: In the WSS-WR model, I_up ≡ I_1 and I_low ≡ I_4.

Theorem 2: The algorithm terminates after a maximum of ⌈l/2⌉ iterations.

Lemma 2: In the WSS-WR model, α_1^new = α_2^new.
Datasets and Experiment Setting Experimental Results
Datasets and Kernel Methods
Datasets
1. Classification: a1a, w1a, Australian, Splice, Breast-cancer, Diabetes, Fourclass, German.number, Heart
2. Regression: MPG, MG
3. Large datasets: a9a, w8a, IJCNN1

Kernel methods: RBF, Linear, Polynomial, Sigmoid
Experiment Setting
Experiment Methods
1. "Parameter selection" step: conduct five-fold cross validation to find the best parameters.
2. "Final training" step: train on the whole set with the best parameters to obtain the final model.
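The two-step protocol above can be sketched generically as follows; train_fn, score_fn, and the parameter grid are placeholders (assumptions of this sketch), not the authors' actual tooling.

```python
from itertools import product
import numpy as np

def five_fold_cv_select(train_fn, score_fn, X, y, grid, seed=0):
    """Five-fold cross validation over a parameter grid (for instance C and
    gamma), returning the best-scoring parameter setting.  A generic sketch
    with assumed helpers train_fn/score_fn, not the authors' protocol."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, 5)          # five roughly equal folds
    best, best_score = None, -np.inf
    for values in product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        scores = []
        for k in range(5):
            val = folds[k]                  # hold out fold k for validation
            trn = np.concatenate([folds[m] for m in range(5) if m != k])
            model = train_fn(X[trn], y[trn], **params)
            scores.append(score_fn(model, X[val], y[val]))
        if np.mean(scores) > best_score:
            best, best_score = params, np.mean(scores)
    return best, best_score
```

The "final training" step then calls train_fn once on all of X, y with the returned parameters.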
Cross Validation Accuracy of Classification
Dataset         WSS-WR     WSS 3
a1a             83.4268    83.8006
w1a             97.7796    97.9814
australian      86.2319    86.8116
splice          86         86.8
breast-cancer   97.3646    97.2182
diabetes        77.9948    77.474
fourclass       100        100
german.number   75.8       77.6
heart           85.1852    84.4444

Table: Accuracy comparison between WSS 3 and WSS-WR (RBF kernel)
Notes: Entries shown in red mark datasets where WSS-WR achieves lower accuracy than WSS 3; green entries mark the opposite.
Iteration and Time Consumption Ratios of WSS-WR to WSS 3. Notes: RBF kernel for the "parameter selection" step (top: with shrinking; bottom: without shrinking).
Iteration and Time Consumption Ratios of WSS-WR to WSS 3. Notes: RBF kernel for the "final training" step (top: with shrinking; bottom: without shrinking).
The Effects of Caching Size and Shrinking Techniques

WSS-WR
Method                RBF       Linear    Poly.     Sigm.
100M, shrinking       56.8936   15.6038    9.3744   15.9825
100M, nonshrinking    56.8936   15.6038    9.7964   16.1321
100K, shrinking       59.9929   17.7689    9.4823   16.7169
100K, nonshrinking    59.8346   15.3692    8.9024   16.8676

WSS 3
Method                RBF         Linear     Poly.      Sigm.
100M, shrinking        400.7368    49.5343    34.0337    29.0977
100M, nonshrinking     505.7596    62.5636    34.3321    32.2544
100K, shrinking       1655.7840   232.5998   140.9247   103.3014
100K, nonshrinking    1602.4966   533.9386    93.3340   220.2574

Table: w1a, comparison of caching and shrinking (training time)
Explanation: The caching size and shrinking technique barely affect the training time in the WSS-WR model, whereas both noticeably change the training time in WSS 3.
Experiment on Large Datasets
300MB cache, RBF kernel
Problem   #data    #feat.   Iter.    Time
a9a       32,561   123      0.3439   0.8498
w8a       49,749   300      0.0149   0.2106
IJCNN1    49,990   22       0.3474   0.7422

1MB cache, RBF kernel
Problem   #data    #feat.   Iter.    Time
a9a       32,561   123      0.0522   0.0687
w8a       49,749   300      0.0370   0.0365
IJCNN1    49,990   22       0.1187   0.1673
Table: Large datasets: Iteration and time ratios of WSS-WR and WSS 3 for 16-point "parameter selection".
Convergence of WSS-WR. Notes: Comparison of convergence between WSS-WR and WSS 3 with the RBF kernel on several datasets (a1a, w1a, australian, and splice, respectively).
Conclusion
We have proposed a new working set selection model, WSS-WR; the definition and a theoretical proof of the model were also given; and full-scale experiments demonstrate that WSS-WR outperforms WSS 3 on almost all datasets in both the "parameter selection" and "final training" steps.
Future Work
This is preliminary work. Extending the model to multi-class classification is our next step; theoretical study of the convergence of WSS-WR and continued improvement of the model are future work.
Thank you
Q&A. Thank you! Questions? E-Mail:
[email protected]