AAAI-08 Chicago
Trace Ratio Criterion for Feature Selection
Feiping Nie¹, Shiming Xiang¹, Yangqing Jia¹, Changshui Zhang¹, Shuicheng Yan²
¹ Department of Automation, Tsinghua University, China
² National University of Singapore, Singapore
Outline
• Feature Selection
• Our Method
• Experiments
• Conclusion
Feature Selection vs. Subspace Learning
• Feature selection is often faster than the corresponding subspace learning algorithm.
• The result of the selection is physically explainable.
• We only need to process a small subset of features for further data processing.
Feature Selection
• Select a subset of m features from the original set of d features. We denote a selection option by its index set $I = \{I(1), \dots, I(m)\} \subset \{1, \dots, d\}$.
• It can be viewed as a special case of subspace learning:
  $y = W^T x$
• where the corresponding matrix $W$ is constrained to be a 0-1 "selection" matrix.
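Selection Matrix
• We define $W = [w_{I(1)}, w_{I(2)}, \dots, w_{I(m)}] \in \mathbb{R}^{d \times m}$ for the selected feature indices $I(1), \dots, I(m)$, where each column vector comes from the set $\{w_1, \dots, w_d\}$ and $w_i \in \mathbb{R}^d$ has a 1 in position $i$ and 0 elsewhere.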
• Also, $W^T W$ is the $m \times m$ identity matrix, since the selected columns are distinct unit vectors.
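An Example
• If we are going to choose two features from the original 3-dimensional data, a possible option is:
  $W = \begin{bmatrix} 1 & 0 \\ 0 & 0 \\ 0 & 1 \end{bmatrix}$
  which keeps the first and third features: $W^T x = [x_1, x_3]^T$.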
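The selection-matrix view can be sketched in NumPy as follows. This is a minimal illustration, not code from the talk: the helper name `selection_matrix` and the sample vector are hypothetical, and indices are 0-based as usual in Python.

```python
import numpy as np

def selection_matrix(indices, d):
    """Build the d x m 0-1 matrix whose k-th column is the unit vector for indices[k]."""
    W = np.zeros((d, len(indices)))
    W[indices, np.arange(len(indices))] = 1.0
    return W

x = np.array([3.0, 5.0, 7.0])       # one 3-dimensional sample (made-up values)
W = selection_matrix([0, 2], d=3)   # keep the first and third features
y = W.T @ x                         # identical to x[[0, 2]]
```

Multiplying by `W.T` only picks out entries of `x`, which is why feature selection can be treated as a constrained form of subspace learning.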
Trace Ratio Criterion for Feature Selection
• A general graph-based framework is to maximize a trace ratio criterion:
  $\max_W \frac{\mathrm{tr}(W^T B W)}{\mathrm{tr}(W^T E W)}$
• $B$ reflects the between-class or global affinity relationship of the data; $E$ reflects the within-class or local affinity relationship.
Examples
• Supervised: Fisher Score [Bishop 1995], using the within-class and between-class scatter matrices
• Unsupervised: Laplacian Score [He, Cai, & Niyogi, 2005], using the graph Laplacian and its degree matrix
• Semi-supervised: can be readily extended based on this framework.
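For the supervised case, the sketch below computes the diagonals of the between-class and within-class scatter matrices, which is all a 0-1 selection matrix ever touches. It assumes the common convention of weighting each class by its size; the function name `scatter_diagonals` and the toy data are hypothetical, not taken from the slides.

```python
import numpy as np

def scatter_diagonals(X, labels):
    """Return diag(S_b) and diag(S_w) for data X of shape (n_samples, n_features)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    mean_all = X.mean(axis=0)
    sb = np.zeros(X.shape[1])
    sw = np.zeros(X.shape[1])
    for c in np.unique(labels):
        Xc = X[labels == c]
        mean_c = Xc.mean(axis=0)
        sb += len(Xc) * (mean_c - mean_all) ** 2   # between-class contribution
        sw += ((Xc - mean_c) ** 2).sum(axis=0)     # within-class contribution
    return sb, sw

# Toy data (made up): feature 0 separates the classes, feature 1 does not.
X = [[0, 0], [2, 1], [10, 0], [12, 1]]
labels = [0, 0, 1, 1]
sb, sw = scatter_diagonals(X, labels)
fisher = sb / sw   # per-feature Fisher score
```

Here the first feature gets a high score and the second a score of zero, matching the intuition that only the first feature carries class information.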
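Scores
• Subset score: $\frac{\mathrm{tr}(W^T B W)}{\mathrm{tr}(W^T E W)}$
• Feature score: $\frac{w_i^T B w_i}{w_i^T E w_i}$, where $w_i$ is the selection vector of feature $i$
• The goal of feature selection is to find the subset with the largest subset score.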
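The two scores can be sketched as below. The matrices are made-up toy values and the function names are hypothetical; the point is that, for a 0-1 selection matrix, both scores depend only on the diagonal entries of B and E.

```python
import numpy as np

def subset_score(B, E, idx):
    """tr(W^T B W) / tr(W^T E W) for the selection matrix W built from idx."""
    W = np.eye(B.shape[0])[:, idx]
    return np.trace(W.T @ B @ W) / np.trace(W.T @ E @ W)

def feature_scores(B, E):
    """w_i^T B w_i / w_i^T E w_i for every feature i."""
    return np.diag(B) / np.diag(E)

B = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 0.0], [0.0, 0.0, 6.0]])
E = np.array([[2.0, 0.0, 0.0], [0.0, 1.0, 0.5], [0.0, 0.5, 3.0]])

per_feature = feature_scores(B, E)     # one score per feature
score_02 = subset_score(B, E, [0, 2])  # (4 + 6) / (2 + 3)
score_12 = subset_score(B, E, [1, 2])  # (3 + 6) / (1 + 3)
```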
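Previous Methods
• Without loss of generality, suppose the features are ordered by feature score:
  $\frac{w_1^T B w_1}{w_1^T E w_1} \ge \frac{w_2^T B w_2}{w_2^T E w_2} \ge \dots \ge \frac{w_d^T B w_d}{w_d^T E w_d}$
• Then the first m vectors are selected to form the matrix W.
• However, this can be viewed as a greedy algorithm that essentially maximizes $\sum_{i=1}^{m} \frac{w_i^T B w_i}{w_i^T E w_i}$ but not the subset score $\frac{\sum_{i=1}^{m} w_i^T B w_i}{\sum_{i=1}^{m} w_i^T E w_i}$.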
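A small made-up example shows why ranking by feature score need not maximize the subset score. The numbers below are hypothetical diagonals of B and E chosen only to expose the gap.

```python
import numpy as np

b = np.array([100.0, 50.0, 0.4])   # hypothetical diagonal of B
e = np.array([10.0, 10.0, 0.1])    # hypothetical diagonal of E
m = 2

feature_score = b / e                      # roughly [10, 5, 4]
greedy = np.argsort(-feature_score)[:m]    # top-m by feature score: features 0 and 1
greedy_subset_score = b[greedy].sum() / e[greedy].sum()   # 150 / 20

other = np.array([0, 2])                   # swap in the lowest-ranked feature
other_subset_score = b[other].sum() / e[other].sum()      # 100.4 / 10.1
```

The subset score is a weighted average of feature scores with weights given by the diagonal of E, so a mid-ranked feature with a large E entry can drag the ratio down more than a low-ranked feature with a tiny one.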
Main Goal
• We aim to maximize the trace ratio criterion for feature selection and find the globally optimal solution:
  $W^* = \arg\max_W \frac{\mathrm{tr}(W^T B W)}{\mathrm{tr}(W^T E W)}$
• It appears that we need to search a solution space containing $\binom{d}{m}$ options.
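The naive approach would enumerate every subset. The sketch below does exactly that on hypothetical toy diagonals of B and E, and also counts how many subsets a moderately sized problem would have; the function name is illustrative only.

```python
from itertools import combinations
from math import comb

import numpy as np

def exhaustive_best_subset(b, e, m):
    """Try every m-subset and return the one with the largest subset score."""
    best_idx, best_score = None, -np.inf
    for idx in combinations(range(len(b)), m):
        idx = list(idx)
        score = b[idx].sum() / e[idx].sum()
        if score > best_score:
            best_idx, best_score = idx, score
    return best_idx, best_score

b = np.array([100.0, 50.0, 0.4])   # hypothetical diagonal of B
e = np.array([10.0, 10.0, 0.1])    # hypothetical diagonal of E
best_idx, best_score = exhaustive_best_subset(b, e, m=2)   # visits comb(3, 2) subsets

n_options = comb(1000, 10)   # number of subsets for d = 1000, m = 10
```

Even for d = 1000 and m = 10 the count exceeds 10^23, so enumeration is only usable as a correctness check on tiny problems.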
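The Trace Difference function
• Suppose we have the global optimum solution $W^*$, with optimal value $\lambda^* = \frac{\mathrm{tr}(W^{*T} B W^*)}{\mathrm{tr}(W^{*T} E W^*)}$. Then for every selection matrix $W$,
  $\mathrm{tr}(W^T (B - \lambda^* E) W) \le 0$,
  with equality at $W = W^*$, so $\max_W \mathrm{tr}(W^T (B - \lambda^* E) W) = 0$.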
The Trace Difference function
• Define the trace difference function
  $f(\lambda) = \max_W \mathrm{tr}(W^T (B - \lambda E) W)$
• It can be verified that $f$ is monotonically decreasing.
• The trace ratio problem is then equivalent to solving the equation $f(\lambda) = 0$.
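Both properties can be checked numerically on a toy problem. The sketch assumes the diagonal of E is strictly positive and uses made-up diagonals of B and E; the optimal ratio is obtained by brute force purely for verification.

```python
from itertools import combinations

import numpy as np

b = np.array([100.0, 50.0, 0.4])   # hypothetical diagonal of B
e = np.array([10.0, 10.0, 0.1])    # hypothetical diagonal of E (all positive)
m = 2

def f(lam):
    """Trace difference function: sum of the m largest values of b_i - lam * e_i."""
    return np.sort(b - lam * e)[-m:].sum()

# Optimal trace ratio by brute force over all m-subsets (verification only).
lam_star = max(
    b[list(s)].sum() / e[list(s)].sum() for s in combinations(range(len(b)), m)
)

grid = np.linspace(0.0, 20.0, 201)
values = np.array([f(lam) for lam in grid])
is_decreasing = bool(np.all(np.diff(values) < 0))   # f falls as lam grows
root_residual = f(lam_star)                         # should vanish at the optimum
```

Because `f` is decreasing and crosses zero exactly at the optimal ratio, finding the best subset reduces to a one-dimensional root-finding problem.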
How to calculate f and the corresponding W?
• For a given $\lambda$, we can calculate the trace difference score of each feature:
  $w_i^T (B - \lambda E) w_i$
• and select the m vectors with the largest scores to form W, then calculate the corresponding function value $f(\lambda)$.
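One evaluation of f for a fixed λ can be sketched as follows. The function name `trace_difference_step` and the diagonal toy matrices are hypothetical; the code relies only on the fact that $w_i^T (B - \lambda E) w_i$ is the i-th diagonal entry of $B - \lambda E$.

```python
import numpy as np

def trace_difference_step(B, E, lam, m):
    """Return the m selected feature indices and the value f(lam)."""
    scores = np.diag(B) - lam * np.diag(E)   # trace difference score of each feature
    idx = np.argsort(-scores)[:m]            # the m largest scores
    return np.sort(idx), scores[idx].sum()

B = np.diag([100.0, 50.0, 0.4])   # made-up matrices for illustration
E = np.diag([10.0, 10.0, 0.1])
idx, f_val = trace_difference_step(B, E, lam=7.5, m=2)
```

Each evaluation costs one sort over d scores, compared with enumerating all m-subsets.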
An iterative Algorithm
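A minimal sketch of one iteration scheme consistent with the preceding slides: alternate between selecting the m largest trace difference scores and resetting λ to the subset score of that selection. The starting value λ = 0, the stopping tolerance, and the function name are assumptions made for illustration, not details taken from the talk.

```python
import numpy as np

def trace_ratio_select(b, e, m, lam=0.0, max_iter=100, tol=1e-12):
    """b, e: diagonals of B and E. Returns selected indices and the final lam."""
    for _ in range(max_iter):
        idx = np.argsort(-(b - lam * e))[:m]     # top-m trace difference scores
        new_lam = b[idx].sum() / e[idx].sum()    # subset score of this selection
        if abs(new_lam - lam) <= tol:            # lam stopped changing: f(lam) is ~0
            break
        lam = new_lam
    return np.sort(idx), lam

b = np.array([100.0, 50.0, 0.4])   # hypothetical diagonal of B
e = np.array([10.0, 10.0, 0.1])    # hypothetical diagonal of E
selected, lam = trace_ratio_select(b, e, m=2)
```

On this toy input the first pass selects features 0 and 1, the second switches to features 0 and 2, and the third leaves λ unchanged, so the loop stops at the subset that brute-force search would also return.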
An Example
Datasets
UCI Results
Face Datasets
Conclusion
• A general feature selection framework
• An algorithm to find the global optimum solution for the subset score
• Experiments show the superiority of the subset score.
THANK YOU!