
A Novel Model of Working Set Selection for SMO Decomposition Methods

Zhen-Dong Zhao (1), Lei Yuan (2), Yu-Xuan Wang (2), Sheng Bao (2), Shun-Yi Zhang (1), Yan-Fei Sun (1)

(1) Institute of Information & Network Technology, Nanjing University of Posts & Telecomm., China
(2) School of Communications & Information Engineering, Nanjing University of Posts & Telecomm., China

International Conference on Tools with AI

Main Topic

SVM training methods: B. Boser [1992], E. Osuna [1997], T. Joachims [1998]
SMO method: J. C. Platt [1999], S. S. Keerthi [2001], C.-J. Lin [2001, 2006]

The key point of SMO is how to choose the two α's at each step; LIBSVM uses WSS 3 to pick them. Our work is compared against WSS 3 as implemented in LIBSVM.


Outline

1. Motivation: Traditional Method for Training SVMs
2. Our New Method: Model and Definition; Some Properties of the WSS-WR Model
3. Computational Comparison: Datasets and Experiment Setting; Experimental Results
4. Conclusion


Motivation: Traditional Method for Training SVMs

Key Work in SVM Training

The key task in training SVMs is to solve the following quadratic optimization problem:

\[
\min_{\alpha} f(\alpha) = \frac{1}{2}\,\alpha^{T} Q \alpha - e^{T} \alpha
\]
subject to
\[
0 \le \alpha_{i} \le C, \quad i = 1, \dots, l, \qquad y^{T}\alpha = 0, \tag{1}
\]

where e is the vector of all ones, C is the upper bound of all variables, Q_ij = y_i y_j K(x_i, x_j), and K(x_i, x_j) is the kernel function.
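For concreteness, here is a minimal sketch (ours, not from the slides) of evaluating this dual objective with an RBF kernel; the function names are illustrative:

```python
import numpy as np

def rbf_kernel(X, gamma=0.1):
    # Pairwise RBF kernel matrix: K[i, j] = exp(-gamma * ||x_i - x_j||^2)
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * d2)

def dual_objective(alpha, y, K):
    # f(alpha) = 1/2 alpha^T Q alpha - e^T alpha, with Q_ij = y_i y_j K_ij
    Q = np.outer(y, y) * K
    return 0.5 * alpha @ Q @ alpha - alpha.sum()
```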


SMO (Sequential Minimal Optimization) Method

Figure: Steepest descent method in QP

Following the idea of the steepest descent method, WSS 3 (LIBSVM) adjusts only two α's at a time; in other words, it changes the optimization direction very slowly at each iteration.

Figure: The model of WSS 3
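To fix ideas, here is the standard textbook two-variable SMO update (a sketch of the usual formulation, not the authors' code; errors[k] denotes f(x_k) − y_k):

```python
import numpy as np

def smo_pair_update(i, j, alpha, y, K, errors, C):
    # Curvature along the chosen direction: eta = K_ii + K_jj - 2*K_ij
    eta = K[i, i] + K[j, j] - 2.0 * K[i, j]
    if eta <= 0:
        return False  # non-positive curvature; real solvers treat this case specially
    # Unconstrained optimum for alpha_j, then clip to the feasible segment [L, H]
    aj = alpha[j] + y[j] * (errors[i] - errors[j]) / eta
    if y[i] == y[j]:
        L, H = max(0.0, alpha[i] + alpha[j] - C), min(C, alpha[i] + alpha[j])
    else:
        L, H = max(0.0, alpha[j] - alpha[i]), min(C, C + alpha[j] - alpha[i])
    aj = float(np.clip(aj, L, H))
    # The equality constraint y^T alpha = 0 then determines alpha_i
    alpha[i] += y[i] * y[j] * (alpha[j] - aj)
    alpha[j] = aj
    return True
```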


Interesting Phenomena

We notice that several α's are selected and optimized repeatedly, while others remain untouched. But is this kind of reuse necessary? The second figure indicates that repeated reselection reduces the objective value only slightly.


Hypothesis

The direct and simple remedy is to reduce, or even eliminate, this kind of reselection during the training procedure.

Figure: Our direct and simple modification



Our New Model: Working Set Selection without Reselection (WSS-WR)

Definition. T^(k+1), k ∈ {1, …, ⌈l/2⌉}, is called the optimized set: every α ∈ T has been selected and optimized once during working set selection.

Definition. C^(k+1) ⊂ {1, …, l} \ T^k is called the available set: no α ∈ C has ever been selected.

Figure: The model of WSS-WR
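A minimal sketch (ours; select_most_violating_pair and optimize_pair are hypothetical helpers, the latter like the SMO step sketched above) of the selection loop these definitions imply:

```python
def train_wss_wr(l, select_most_violating_pair, optimize_pair):
    # WSS-WR outer loop: every index enters the working set at most once,
    # so the loop runs at most ceil(l / 2) times (cf. Theorem 2 below).
    optimized = set()           # T: indices already selected and optimized
    available = set(range(l))   # C: indices never selected
    while len(available) >= 2:
        i, j = select_most_violating_pair(available)  # pick only from C
        optimize_pair(i, j)     # one analytic two-variable update
        optimized |= {i, j}     # move the pair from C to T: no reselection
        available -= {i, j}
    return optimized
```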



Theoretical Proof

Theorem 1. The values of all selected {α_i, α_j} ⊂ B will always be 0 (when a pair is first selected it still holds its initial value 0, and in WSS-WR no index is ever reselected).

Lemma 1. In the WSS-WR model, I_up ≡ I_1 and I_low ≡ I_4.

Theorem 2. The algorithm terminates after at most ⌈l/2⌉ iterations.

Lemma 2. In the WSS-WR model, α_1^new = α_2^new.
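For Theorem 2, a one-line counting argument (our paraphrase): each iteration moves two indices from the available set C to the optimized set T, so the number of iterations is at most

\[
\left\lceil \frac{l}{2} \right\rceil .
\]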



Datasets and Kernel Methods

Datasets:
1. Classification: a1a, w1a, Australian, Splice, Breast-cancer, Diabetes, Fourclass, German.number, Heart
2. Regression: MPG, MG
3. Large datasets: a9a, w8a, IJCNN1

Kernel methods: RBF, linear, polynomial, sigmoid


Experiment Setting

1. "Parameter selection" step: conduct five-fold cross validation to find the best parameters.
2. "Final training" step: train on the whole set with the best parameters to obtain the final model.

A sketch of this two-step protocol appears below.

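As an illustration only (the slides use LIBSVM directly; here scikit-learn's SVC, which wraps LIBSVM, stands in, and the parameter grid is an assumption):

```python
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

def two_step_protocol(X, y):
    # Step 1: "parameter selection" via five-fold cross validation
    grid = {"C": [2.0 ** k for k in range(-5, 16, 2)],
            "gamma": [2.0 ** k for k in range(-15, 4, 2)]}
    search = GridSearchCV(SVC(kernel="rbf"), grid, cv=5)
    search.fit(X, y)
    # Step 2: "final training" on the whole set with the best parameters
    final_model = SVC(kernel="rbf", **search.best_params_).fit(X, y)
    return search.best_params_, final_model
```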


Cross Validation Accuracy of Classification

Dataset   WSS-WR    WSS 3
a1a       83.4268   83.8006
w1a       97.7796   97.9814
aust.     86.2319   86.8116
spli.     86        86.8
brea.     97.3646   97.2182
diab.     77.9948   77.474
four.     100       100
germ.     75.8      77.6
heart     85.1852   84.4444

Table: Accuracy comparison (%) between WSS 3 and WSS-WR (RBF kernel)


Note: in the original slides, datasets on which WSS-WR achieves lower accuracy than WSS 3 are shown in red, and those on which it achieves higher accuracy in green.

Iteration and Time Consumption Ratios of WSS-WR to WSS 3

Figure: RBF kernel, "parameter selection" step (top: with shrinking; bottom: without shrinking).


Figure: Iteration and time consumption ratios of WSS-WR to WSS 3, RBF kernel, "final training" step (top: with shrinking; bottom: without shrinking).


The Effects of Cache Size and Shrinking

WSS-WR (training time):
Setting              RBF        Linear     Poly.     Sigm.
100M, shrinking      56.8936    15.6038    9.3744    15.9825
100M, non-shrinking  56.8936    15.6038    9.7964    16.1321
100K, shrinking      59.9929    17.7689    9.4823    16.7169
100K, non-shrinking  59.8346    15.3692    8.9024    16.8676

WSS 3 (training time):
Setting              RBF         Linear     Poly.      Sigm.
100M, shrinking      400.7368    49.5343    34.0337    29.0977
100M, non-shrinking  505.7596    62.5636    34.3321    32.2544
100K, shrinking      1655.7840   232.5998   140.9247   103.3014
100K, non-shrinking  1602.4966   533.9386   93.3340    220.2574

Table: w1a, comparison of caching and shrinking

Explanation: Cache size and the shrinking technique barely affect the training time of the WSS-WR model, whereas they change the training time of WSS 3 markedly.


Experiment on Large Datasets

300MB cache, RBF kernel:
Problem   #data    #feat.   Iter. ratio   Time ratio
a9a       32,561   123      0.3439        0.8498
w8a       49,749   300      0.0149        0.2106
IJCNN1    49,990   22       0.3474        0.7422

1MB cache, RBF kernel:
Problem   #data    #feat.   Iter. ratio   Time ratio
a9a       32,561   123      0.0522        0.0687
w8a       49,749   300      0.0370        0.0365
IJCNN1    49,990   22       0.1187        0.1673

Table: Large datasets; iteration and time ratios of WSS-WR to WSS 3 for 16-point "parameter selection".


Convergence of WSS-WR

Figure: Comparison of convergence between WSS-WR and WSS 3 with the RBF kernel on several datasets (a1a, w1a, Australian, and Splice, respectively).


Conclusion

We have proposed a new working set selection model, WSS-WR. The definition and theoretical proofs of the model were given, and full-scale experiments demonstrate that WSS-WR outperforms WSS 3 on almost all datasets in both the "parameter selection" and "final training" steps.


Future Work

This is a preliminary work. Extending the model to multi-class classification is our next step; a theoretical study of the convergence of WSS-WR and continued improvement of the model are future work.


Q&A

Thank you! Questions? E-mail: [email protected]

