AAAI-08, Chicago

Trace Ratio Criterion for Feature Selection

Feiping Nie¹, Shiming Xiang¹, Yangqing Jia¹, Changshui Zhang¹, Shuicheng Yan²
¹Department of Automation, Tsinghua University, China
²National University of Singapore, Singapore

Outline

➢ Feature Selection
➢ Our Method
➢ Experiments
➢ Conclusion

Feature Selection vs. Subspace Learning

➢ Feature selection is often faster than the corresponding subspace learning algorithm.
➢ The result of the selection is physically interpretable: the selected dimensions are original features.
➢ Only a small subset of the features needs to be processed in further data processing.

Feature Selection

➢ Select a subset of m features from the d original features. We denote a selection option by a matrix $W$.
➢ Feature selection can be viewed as a special case of subspace learning, $y = W^T x$, where the transformation matrix $W$ is constrained to be a 0-1 "selection" matrix.

Selection Matrix

➢ We define $W = [w_{i_1}, w_{i_2}, \ldots, w_{i_m}]$, where each column vector comes from the set $\{e_1, e_2, \ldots, e_d\}$, the columns of the $d \times d$ identity matrix.
➢ Also, $W^T W = I_{m \times m}$.

An Example

➢ If we are going to choose two features from the original 3-dimensional data, a possible option is to select the 1st and 3rd features:

    $W = [e_1, e_3] = \begin{bmatrix} 1 & 0 \\ 0 & 0 \\ 0 & 1 \end{bmatrix}$
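As a quick illustration (not from the original slides; the data matrix X here is hypothetical), the following NumPy sketch verifies that multiplying by such a selection matrix simply picks out the chosen feature columns:

    import numpy as np

    # Hypothetical data: 4 samples with d = 3 features each.
    X = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0],
                  [7.0, 8.0, 9.0],
                  [0.0, 1.0, 2.0]])

    # Selection matrix choosing the 1st and 3rd features
    # (its columns are columns of the identity matrix).
    W = np.eye(3)[:, [0, 2]]

    # X @ W keeps exactly the selected feature columns.
    assert np.array_equal(X @ W, X[:, [0, 2]])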

Trace Ratio Criterion for Feature Selection

➢ A general graph-based framework is to maximize a trace ratio criterion:

    $\max_W \frac{\operatorname{tr}(W^T B W)}{\operatorname{tr}(W^T E W)}$

➢ $B$ reflects the between-class or global affinity relationship of the data; $E$ reflects the within-class or local affinity relationship.

Examples

➢ Supervised: Fisher score [Bishop, 1995], using the within-class and between-class scatter matrices.
➢ Unsupervised: Laplacian score [He, Cai, & Niyogi, 2005], using the graph Laplacian and its degree matrix.
➢ Semi-supervised: can be readily extended based on this framework.

Scores

➢ Subset score of a selected subset $S$ (with selection matrix $W$):

    $F(W) = \frac{\operatorname{tr}(W^T B W)}{\operatorname{tr}(W^T E W)} = \frac{\sum_{i \in S} B_{ii}}{\sum_{i \in S} E_{ii}}$

➢ Feature score of a single feature $i$:

    $F(i) = \frac{B_{ii}}{E_{ii}}$

➢ The goal of feature selection is to find the subset with the largest subset score.
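A minimal sketch (function names are mine, not the paper's) of how both scores reduce to sums over the diagonals of B and E once W is a 0-1 selection matrix:

    import numpy as np

    def subset_score(b, e, subset):
        # b, e: diagonals of B and E; subset: indices of the selected features.
        return b[subset].sum() / e[subset].sum()

    def feature_scores(b, e):
        # Individual ratio score of every feature.
        return b / e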

Previous Methods

➢ Without loss of generality, suppose the features are sorted by their feature scores:

    $\frac{B_{11}}{E_{11}} \ge \frac{B_{22}}{E_{22}} \ge \cdots \ge \frac{B_{dd}}{E_{dd}}$

➢ Then the first m features are selected to form the matrix $W$.
➢ However, this can actually be viewed as a greedy algorithm: it essentially maximizes the individual feature scores $B_{ii}/E_{ii}$, but not the subset score $\frac{\sum_{i \in S} B_{ii}}{\sum_{i \in S} E_{ii}}$. A small counterexample is sketched below.
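A tiny numeric counterexample (constructed here for illustration, not taken from the paper): with diagonals b = (3, 30, 4) and e = (1, 14, 2) and m = 2, the feature scores are 3 > 30/14 > 2, so greedy picks features {0, 1} (0-indexed), yet the subset {0, 2} has the larger subset score:

    import numpy as np
    from itertools import combinations

    b = np.array([3.0, 30.0, 4.0])   # diagonal of B (illustrative values)
    e = np.array([1.0, 14.0, 2.0])   # diagonal of E
    m = 2

    # Greedy: top-m features by individual ratio score.
    greedy = np.argsort(-(b / e))[:m]                        # -> features 0 and 1
    print(b[greedy].sum() / e[greedy].sum())                 # 33/15 = 2.2

    # Exhaustive search over all subsets of size m.
    best = max(combinations(range(3), m),
               key=lambda s: b[list(s)].sum() / e[list(s)].sum())
    print(best, b[list(best)].sum() / e[list(best)].sum())   # (0, 2), 7/3 ~= 2.33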

Main Goal

➢ We aim to maximize the trace ratio criterion for feature selection and to find the globally optimal solution:

    $W^* = \arg\max_W \frac{\operatorname{tr}(W^T B W)}{\operatorname{tr}(W^T E W)}$

➢ It appears that we need to search a solution space containing $\binom{d}{m}$ options.

The Trace Difference Function

➢ Suppose we have the globally optimal solution $W^*$ with optimal ratio $\lambda^* = \operatorname{tr}(W^{*T} B W^*)/\operatorname{tr}(W^{*T} E W^*)$. Then for every admissible $W$,

    $\operatorname{tr}(W^T B W) - \lambda^* \operatorname{tr}(W^T E W) \le 0,$

    with equality attained at $W = W^*$.

The Trace Difference Function

➢ Define the trace difference function

    $f(\lambda) = \max_W \operatorname{tr}\left(W^T (B - \lambda E) W\right)$

➢ It can be verified that $f$ is monotonically decreasing.
➢ The trace ratio problem is then equivalent to solving the equation $f(\lambda) = 0$.
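A short justification of the monotonicity (standard, spelled out here for completeness; it assumes $E$ is positive semidefinite, as scatter matrices and graph Laplacians are): for $\lambda_1 < \lambda_2$, let $W_2$ attain the maximum at $\lambda_2$. Then

    $f(\lambda_1) \ge \operatorname{tr}\big(W_2^T (B - \lambda_1 E) W_2\big) = f(\lambda_2) + (\lambda_2 - \lambda_1)\operatorname{tr}\big(W_2^T E W_2\big) \ge f(\lambda_2).$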

How to calculate f and the corresponding W?

➢ For a given $\lambda$, we can calculate the trace difference score of each feature:

    $s_i(\lambda) = B_{ii} - \lambda E_{ii}$

➢ Then select the m features with the largest scores to form $W$; the corresponding function value is the sum of the m largest scores, $f(\lambda) = \sum_{i \in S} s_i(\lambda)$.
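A direct transcription of this step (a sketch; variable names are mine):

    import numpy as np

    def trace_difference(b, e, m, lam):
        # b, e: diagonals of B and E. Score every feature at this lambda,
        # keep the m largest scores; their sum is f(lambda).
        scores = b - lam * e
        top_m = np.argsort(-scores)[:m]
        return scores[top_m].sum(), top_m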

An Iterative Algorithm

➢ Starting from an initial subset, alternately (1) compute the ratio $\lambda$ of the current subset, and (2) re-select the m features with the largest trace difference scores $B_{ii} - \lambda E_{ii}$, until the subset no longer changes (i.e., $f(\lambda) = 0$). A sketch follows.
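A minimal, self-contained sketch of such an iteration (assumptions: only the diagonals of B and E matter because W is a 0-1 selection matrix, and the greedy initialization by individual ratio scores is my choice):

    import numpy as np

    def trace_ratio_selection(b, e, m, max_iter=100):
        # b, e: length-d diagonals of B and E. Returns the selected
        # feature indices and the achieved trace ratio lambda.
        selected = np.argsort(-(b / e))[:m]             # greedy initialization
        for _ in range(max_iter):
            lam = b[selected].sum() / e[selected].sum() # current subset ratio
            scores = b - lam * e                        # trace difference scores
            new_selected = np.argsort(-scores)[:m]      # top-m features
            if set(new_selected) == set(selected):      # fixed point: f(lam) = 0
                break
            selected = new_selected
        return np.sort(selected), lam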

An Example
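Running trace_ratio_selection from the previous sketch on the earlier toy numbers (illustrative data, not the paper's example):

    import numpy as np

    b = np.array([3.0, 30.0, 4.0])
    e = np.array([1.0, 14.0, 2.0])

    # Iteration trace for m = 2:
    #   start {0, 1}: lambda = 33/15 = 2.2
    #   scores at 2.2 are (0.8, -0.8, -0.4)  -> re-select {0, 2}
    #   {0, 2}: lambda = 7/3 ~= 2.333; scores (0.667, -2.667, -0.667)
    #           -> top-2 is again {0, 2}, so f(lambda) = 0 and we stop.
    subset, lam = trace_ratio_selection(b, e, m=2)
    print(subset, lam)   # [0 2] 2.333...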

Datasets

UCI Results

Face Datasets

Conclusion

➢ A general graph-based feature selection framework.
➢ An algorithm to find the globally optimal solution for the subset score.
➢ Experiments show the superiority of the subset score.

Trace Ratio Criterion for Feature Selection Feiping Nie, Shiming Xiang, Yangqing Jia, Changshui Zhang, Shuicheng Yan

THANK YOU!


