2007 IEEE International Conference on Bioinformatics and Biomedicine

A Comparison of Unsupervised Dimension Reduction Algorithms for Classification

Jaegul Choo, Hyunsoo Kim, Haesun Park, Hongyuan Zha
College of Computing, Georgia Institute of Technology
266 Ferst Drive, Atlanta, GA 30332, USA
{joyfull,hskim,hpark,zha}@cc.gatech.edu

Abstract

Distance preserving dimension reduction (DPDR) using the singular value decomposition has recently been introduced. In this paper, for disease diagnosis using gene or protein expression data, we present empirical comparison results between DPDR and various other dimension reduction (DR) methods (PCA, MDS, Isomap, and LLE) when used with support vector machines with a radial basis function kernel. Our results show that DPDR outperforms the other DR methods as a whole in terms of classification accuracy and, at the same time, offers significantly higher efficiency than the other methods, since it has no parameter to be optimized. Based on these empirical results, we reach the promising conclusion that DPDR is one of the best DR methods at hand for modeling an efficient and distortion-free classifier for gene or protein expression data.

1 Introduction

These days, various fields of study in bioinformatics are accelerating their advancement with the help of machine learning methods [21, 25]. As a result, one of their major areas, disease diagnosis using microarray data that contain the gene or protein expression patterns of patients, shows accurate and reliable results when sophisticated statistical techniques are applied. Most such data have a small number of instances (∼100) but a large number of genes or proteins (∼10,000), which form a high dimensional feature space. This often causes dramatic inefficiency and noise when applying state-of-the-art classification techniques, such as support vector machines (SVMs) [34, 35] and boosting [12], that are computationally intensive. To resolve this problem, incorporating dimension reduction (DR) is becoming a crucial step in the classification of microarray data, and there are mainly two different approaches. The first strategy is feature selection, which simply selects a subset of genes or proteins that produces the best classification results; the other is feature extraction, which maps data points into a lower dimensional space through a data transformation procedure. Several advanced feature selection approaches have been devised and applied to microarray data [33, 36]. However, given that feature selection methods completely disregard the gene or protein data that are not selected, feature extraction methods may have more potential. In this paper, we deal only with feature extraction methods and generally refer to them as DR methods. Among the various DR methods, there are linear DR algorithms such as principal component analysis (PCA) [20], multidimensional scaling (MDS) [32], and distance preserving dimension reduction (DPDR) using the singular value decomposition [18]. There are also nonlinear DR (NLDR) algorithms such as isometric mapping (Isomap) [31] and locally linear embedding (LLE) [26, 27]. As NLDR methods have recently been receiving more attention for their advantages over traditional linear methods, there have been several studies successfully applying them in bioinformatics [10, 28, 22]. However, very few detailed quantitative comparisons between DR methods for classification problems have been done so far [4, 23]. Recently, Lee et al. [19] presented interesting results showing that, among various linear and nonlinear methods including PCA, MDS, Isomap, and LLE, the nonlinear methods worked better than the linear ones in general, and LLE showed the best overall performance when combined with linear SVM classifiers. However, linear SVMs are not capable of discriminating nonlinearly separable data classes, and thus it remains unclear which DR methods, including DPDR, would be suitable for the more powerful classifiers available to us. Generally, a DR method is applied as a pre-processing step prior to modeling a classifier for the sake of efficiency and noise reduction. In most cases, however, when using one of them, we have to take the risk that it may influence the classification performance in an undesirable way.



In this paper, we present empirical comparisons of DR methods for disease diagnosis applications using microarray data, combined with radial basis function (RBF) kernel SVMs, which are known as among the most powerful classifiers and are therefore widely applied in statistical approaches to microarray data [29, 7, 17, 11]. The rest of this paper is organized as follows. In Section 2, we review the dimension reduction algorithms and describe our experiments. In Section 3, we present the experimental results and discuss several issues concerning DR methods for classification. Finally, conclusions are given in Section 4.

2 Experimental Setup

As our classifier, we chose SVMs with an RBF kernel; a detailed description of SVMs can be found in [6, 9]. For our experiments, we used the LIBSVM software package [8], which is implemented in C++. The five DR algorithms (DPDR, PCA, MDS, Isomap, and LLE) were implemented by the original authors of the methods, except for PCA, and executed in MATLAB 7.0. All our experiments were performed on a P4 1.6 GHz machine with 1 GB of memory.

2.1 Data Sets

The data sets we used are summarized in Table 1 and are publicly available from [1]. Each data set contains persons' gene or protein expression levels, and each instance has a corresponding class label. For instance, the ALL-AML Leukemia data set [14] contains information on 6817 genes from 72 humans, and each human's gene expression data is labeled as either Acute Lymphoblastic Leukemia (ALL) or Acute Myeloid Leukemia (AML). The MLL Leukemia data set [3] has three classes: ALL, AML, and Mixed-Lineage Leukemia (MLL). The Lung cancer data set [15] is composed of two kinds of lung cancer, malignant pleural mesothelioma (MPM) and adenocarcinoma (ADCA), and each sample is described by 12533 genes. The Types of Diffuse Large B-cell Lymphoma (DLBCL) data set [2] includes two categories of lymphoma, depending on whether it is germinal center B-like or activated B-like. The DLBCL-Harvard data set [30] contains gene expression data from people with DLBCL versus Follicular Lymphoma. The Ovarian cancer data set [24] is a collection of proteomic patterns in serum used to distinguish ovarian cancer from non-cancer. Note that every data set we used has a relatively small number of instances compared to its dimensionality, which is common in gene or protein expression data. For the first three data sets in Table 1, we used the original training/test split provided with the data set, while for the next two data sets (DLBCL and DLBCL-Harvard) we randomly split into 2/3 training and 1/3 test, and for the Ovarian cancer data set into 1/3 training and 2/3 test. To prevent a possible lack of generality in our results due to random splits of fairly small data sets, we made 5 different splits of both the DLBCL and DLBCL-Harvard data sets and used the results averaged over the 5 trials. Besides these original data sets, we generated two tainted versions of each of them, except for the Ovarian cancer data set, in order to assess the robustness of the DR methods against outliers and Gaussian noise. In the first version, we assigned wrong labels to a randomly chosen 10% of the training data. In the second, we added independent and identically distributed Gaussian noise to every element of each instance, with zero mean and variance equal to 10% of the maximum absolute value among all elements in the data set. For the data sets with outliers, we applied 5 different sets of wrong labels and averaged the results. Among the 5 previously generated splits of the DLBCL and DLBCL-Harvard data sets, we selected the one that produced fairly good accuracy on the test data and then treated it in the same manner as the other data sets.
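For concreteness, the two tainting procedures described above can be sketched as follows. This is a minimal NumPy illustration of ours, not the authors' MATLAB code; the function names, the random seed, and the default arguments are our own choices.

    import numpy as np

    rng = np.random.default_rng(0)  # illustrative seed, not from the paper

    def mislabel(y_train, fraction=0.10):
        """Assign wrong labels to a randomly chosen fraction of the training data."""
        y_noisy = y_train.copy()
        n_flip = int(round(fraction * len(y_train)))
        idx = rng.choice(len(y_train), size=n_flip, replace=False)
        classes = np.unique(y_train)
        for i in idx:
            wrong = classes[classes != y_noisy[i]]  # any label other than the current one
            y_noisy[i] = rng.choice(wrong)
        return y_noisy

    def add_gaussian_noise(X, level=0.10):
        """Add i.i.d. zero-mean Gaussian noise whose variance is `level` times
        the maximum absolute value among all elements of the data set."""
        var = level * np.abs(X).max()
        return X + rng.normal(loc=0.0, scale=np.sqrt(var), size=X.shape)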


Table 1. Description of the data sets.

Data set                                         # data (# training + # test)   # features   # classes
ALL-AML Leukemia                                 72 (38+34)                      7129         2
MLL Leukemia                                     72 (57+15)                      12582        3
Lung cancer                                      181 (32+149)                    12533        2
Types of Diffuse Large B-cell Lymphoma (DLBCL)   47 (32+15)                      4029         2
DLBCL-Harvard                                    77 (52+25)                      6817         2
Ovarian cancer                                   253 (84+169)                    15154        2

2.2 Dimension Reduction Algorithms

DPDR keeps Euclidean distances and cosine similarities unchanged; as long as the metrics of concern are Euclidean distances and cosine similarities, it enables us to handle the data without any distortion, as if in the original space, but with highly improved efficiency due to the reduced amount of data to be dealt with. On the other hand, many DR methods with linear or even nonlinear capabilities already exist. For example, PCA and MDS are widespread linear DR methods. Basically, these methods seek the best linear transformation of the data matrix according to their own criteria. PCA uses the singular value decomposition (SVD) [13] of the centered data matrix to find its principal components, and the solution minimizes the distortion, in the Frobenius norm, of the centered data matrix. MDS deals with the Gram matrix of the data, which contains all possible inner products between pairs of points, and performs a symmetric eigendecomposition of it in order to find the solution that minimizes the distortion of the Gram matrix. Beyond this linear limitation, several NLDR methods have recently been proposed, including Isomap and LLE. These NLDR methods share with the linear methods the characteristic of being based on a specific kind of proximity information, but they assume that the data structure itself contains a nonlinearly folded manifold that can be unfolded in a lower dimensional space. More specifically, they derive the manifold structure by using local information about each data point, and then construct similarity measures by tracing paths in the manifold. In particular, Isomap builds a connectivity graph from k-nearest neighbors and calculates similarities along the shortest paths in the graph, which approximate the geodesic distances in the manifold. LLE builds a similar graph, but assigns edge weights by finding the optimal local convex/linear combinations of the k-nearest neighbors that represent each original data point, and obtains the lower dimensional representations that minimize the distortion of this nonlinearly constructed proximity matrix.
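To make the preceding descriptions concrete, the sketch below places an SVD-based reduction in the spirit of DPDR [18] next to off-the-shelf scikit-learn versions of the other four methods. This is our own Python illustration under stated assumptions: the paper used the original authors' MATLAB implementations, scikit-learn's MDS is SMACOF-based rather than the classical Gram-matrix eigendecomposition described above, and the toy matrix and the values of d and k are arbitrary. The distance-preserving property rests on the fact that, for a thin SVD X = U Σ Vᵀ of the n×m data matrix (m ≫ n), the n-dimensional rows of UΣ have exactly the same pairwise Euclidean distances and inner products (hence cosine similarities) as the rows of X.

    import numpy as np
    from scipy.spatial.distance import pdist
    from sklearn.decomposition import PCA
    from sklearn.manifold import MDS, Isomap, LocallyLinearEmbedding

    def svd_reduction(X):
        """Map the rows of X (n x m, m >> n) to n-dimensional points whose pairwise
        Euclidean distances and cosine similarities equal those of the original rows."""
        U, s, Vt = np.linalg.svd(X, full_matrices=False)  # thin SVD
        return U * s                                      # same as U @ np.diag(s)

    # toy stand-in for an expression matrix: 60 instances, 5000 features
    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 5000))

    Z = svd_reduction(X)
    assert np.allclose(pdist(X), pdist(Z))  # distances preserved up to round-off

    # the other four methods via scikit-learn; d and k are illustrative choices
    d, k = 5, 10
    Z_pca = PCA(n_components=d).fit_transform(X)
    Z_mds = MDS(n_components=d).fit_transform(X)
    Z_iso = Isomap(n_neighbors=k, n_components=d).fit_transform(X)
    Z_lle = LocallyLinearEmbedding(n_neighbors=k, n_components=d).fit_transform(X)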

2.3 Parameter Search and Performance Measure

Several parameters must be optimized in order to obtain the best classifier for a specific data set and DR method. Common to all settings are the penalty parameter for classification errors, C, and the RBF kernel parameter, γ, of the SVM [16], and some DR methods have additional parameters to be optimized, e.g. the value of k in k-nearest neighbors and/or the target dimension d. Following the general framework of classification applications, we used the training set not only for building classifiers but also for optimizing parameters. Parameter optimization was done by 10-fold cross-validation (CV) on the training set. As our performance measures, we report the best CV rate on the training set and the accuracy on the test set when the optimized classifier is applied. Beforehand, in order to determine a reasonable range for each parameter to be searched, we first built linear SVM classifiers using several parameter values and measured the best accuracy on the test set. Using this measure as a baseline, we then adjusted the range of each parameter and its exponentially incremental step size so that the results were better than the baseline accuracy in all cases, to make sure that an optimized classifier within these parameter ranges is the same as, or at least close to, the one with the globally best performance. Through these steps, the ranges used in the parameter search are the following:

C, the penalty parameter of the SVM: 2^i, i = −5, −3, . . . , 15
γ in the RBF kernel: 2^j, j = −50, −46, . . . , 10
k in k-nearest neighbors (used in LLE and Isomap): 6, 8, 10, 12

In our setting, the target dimension d is also regarded as one of the parameters to be optimized, and its range for each DR method is as follows:

DPDR: n (the total number of instances)
PCA: 1, 2, . . . , n
MDS: 50, 100, . . . , 1000
LLE: 1, 2, . . . , n − 1
Isomap: 1, 2, . . . , n

The range of the target dimension d comes from the inherent limitation of each method, except for MDS. Once the parameter ranges to be searched were determined, we performed CV over all possible combinations of parameters within those ranges and selected the parameter sets with the best CV rate for each DR method. Due to the fairly wide ranges used in the parameter search, it is often the case that multiple parameter sets produce the best CV rate. In that case, we built classifiers using all of these parameter sets and report their averaged accuracy on the test set along with the standard deviation of the accuracies used in computing the average.
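The search procedure above maps naturally onto a standard grid search. The following is a minimal sketch of ours using scikit-learn's SVC (which is built on LIBSVM) in place of the LIBSVM command-line tools, with PCA standing in for the DR step; the C and γ grids follow the ranges above, while the d grid is thinned to keep the sketch cheap. The function name is our own.

    from sklearn.svm import SVC
    from sklearn.decomposition import PCA
    from sklearn.pipeline import Pipeline
    from sklearn.model_selection import GridSearchCV

    def tune_rbf_svm_with_pca(X_train, y_train):
        """10-fold CV grid search over C, gamma, and the target dimension d,
        with PCA standing in for the DR step (every method except DPDR is
        tuned this way; DPDR fixes d = n and needs only the C/gamma grid)."""
        pipe = Pipeline([("dr", PCA()), ("svm", SVC(kernel="rbf"))])
        param_grid = {
            # the paper searches d = 1, ..., n for PCA; a coarse subset keeps this cheap
            "dr__n_components": [1, 2, 5, 10, 20],
            "svm__C": [2.0 ** i for i in range(-5, 16, 2)],       # 2^-5, 2^-3, ..., 2^15
            "svm__gamma": [2.0 ** j for j in range(-50, 11, 4)],  # 2^-50, 2^-46, ..., 2^10
        }
        search = GridSearchCV(pipe, param_grid, cv=10, scoring="accuracy")
        search.fit(X_train, y_train)
        return search.best_params_, search.best_score_  # best CV rate on the training set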

3 Results and Discussion

The results on CV rate and averaged test accuracy for the original data sets are shown in Table 2 and Table 3, respectively. Table 4 and Table 5 show the results for the tainted data sets, either with mislabeling or with Gaussian noise. Note that in most cases, the zero deviations shown in parentheses in Table 3, Table 4, and Table 5 come from averaging multiple identical results, not from a single result. Finally, Table 6 gives an example of the total computing time required to construct optimal classifiers using each DR method. Based on these results, we discuss the following points of view: (1) classification accuracy, (2) robustness, (3) nonlinearity in data, and (4) computing speed.

73

Table 2. Best cross-validation rate (in percent) on training data.

Data set            DPDR      PCA       MDS       Isomap    LLE
ALL-AML Leukemia    97.3684   97.3684   97.3684   100       97.3684
MLL Leukemia        92.9825   84.2105   92.9825   77.1930   96.4912
Lung cancer         96.875    96.875    96.875    84.375    100
DLBCL               94.375    90.625    94.375    89.375    92.500
DLBCL-Harvard       96.1538   95.3846   95.3846   90.7692   96.1538
Ovarian cancer      98.8235   98.8235   98.8235   98.8235   98.8235

Table 3. Average accuracy on test data. Each entry is shown as the average accuracy in percent and its standard deviation in parentheses.

Data set            DPDR              PCA               MDS               Isomap            LLE
ALL-AML Leukemia    97.3529 (.0675)   93.6174 (.2020)   96.1250 (.0468)   85.6816 (.0513)   86.0662 (.2440)
MLL Leukemia        100 (0)           77.1428 (.7924)   92.9412 (1.7536)  64.4445 (1.0779)  91.0000 (.9941)
Lung cancer         98.1544 (.1263)   89.9329 (0)       98.5403 (.0076)   71.0799 (2.3737)  97.6102 (.0181)
DLBCL               93.3333 (.7787)   83.4961 (.3118)   89.2949 (.0027)   73.0546 (.3545)   76.6802 (.4289)
DLBCL-Harvard       96.8000 (0)       96.1358 (.2007)   96.4876 (.0728)   87.9021 (.5668)   91.9634 (.3213)
Ovarian cancer      97.0238 (0)       97.3364 (.0340)   96.9795 (.0663)   88.0952 (0)       89.9733 (.2243)

3.1 Classification Accuracy

DPDR shows quite good, but not always the best, performance in CV rate. The reason the CV results of DPDR do not match its test accuracy is that DPDR has no parameter to be tuned; the other methods have additional parameters, and fine tuning them can give the flexibility to enhance the CV rate. However, the test-set accuracies in Table 3 indicate that DPDR, together with MDS, gives the best overall performance on the test sets, whereas Isomap and LLE produce good overall CV rates but do not maintain reliable performance on the test sets. This means that finely tuned parameters in the other DR methods do not necessarily bring a performance improvement, whereas DPDR maintains its generalization ability even with no additional parameters. Regarding such discrepancy, Braga-Neto et al. [5] pointed out the unreliability of cross-validation for estimating generalization error due to its high variance. As a matter of fact, DPDR provides a distortion-free reduced dimensional representation, just as in the original space, as long as the classifier uses only Euclidean distances and cosine similarities [18]. MDS has a similar characteristic in that it tries to preserve the inner products between data points as much as possible, and the fairly wide range of the reduced dimensionality parameter, up to 1000 in our experimental setup, may make the distortion small enough for MDS to rival DPDR. In fact, the reduced dimensionality of MDS that produced the best CV rate was usually much higher, e.g. several hundred, than the number of instances, which is the dimensionality used by DPDR. Other DR methods such as Isomap and LLE impose their own assumptions about the data structure and their own optimization criteria in order to obtain lower dimensional representations of the original data. These approaches bring some inevitable distortion to the data, and in practice there is little justification for assuming that our data fit the assumptions of a specific method well enough to ignore the effect of this distortion. For instance, LLE worked well on the MLL Leukemia and Lung cancer data sets, where it seems to successfully recover the embedded data structure, but on the ALL-AML Leukemia and DLBCL data sets, LLE showed poor results, failing to model the data structure correctly. In summary, it is noticeable that DPDR consistently works well combined with nonlinear SVMs using an RBF kernel.
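The distortion-free argument can be verified directly: the RBF kernel exp(−γ‖xᵢ − xⱼ‖²) depends only on pairwise Euclidean distances, so an SVM trained on a distance-preserving SVD reduction must agree with one trained on the raw data. The following is a toy check of ours, with synthetic data and illustrative C and γ; for simplicity the reduction is computed on the training and test instances together, whereas the actual DPDR treatment of unseen points is the one given in [18].

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(1)

    # toy stand-in for expression data: 60 instances, 4000 features, 2 classes
    X = rng.normal(size=(60, 4000))
    y = np.tile([0, 1], 30)
    train, test = np.arange(40), np.arange(40, 60)

    # SVD-based reduction applied to all 60 instances at once,
    # so every pairwise Euclidean distance is preserved exactly
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    Z = U * s                                        # 60 instances, now 60-dimensional

    params = dict(kernel="rbf", C=1.0, gamma=1e-4)   # illustrative values
    clf_raw = SVC(**params).fit(X[train], y[train])
    clf_red = SVC(**params).fit(Z[train], y[train])

    diff = np.abs(clf_raw.decision_function(X[test])
                  - clf_red.decision_function(Z[test])).max()
    print(f"max decision-value difference: {diff:.1e}")  # expected at round-off level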


Table 4. Average accuracy on test data when 10% of the training data are mislabeled. Each entry is shown as the average accuracy in percent and its standard deviation in parentheses.

Data set            DPDR              PCA               MDS               Isomap            LLE
ALL-AML Leukemia    94.2647 (.3481)   88.6643 (1.0854)  91.8199 (.2750)   77.7760 (.7345)   77.7075 (1.3701)
MLL Leukemia        100 (0)           66.6667 (0)       90.8333 (1.0878)  73.3333 (0)       91.1111 (1.7106)
Lung cancer         87.4720 (.0544)   80.4067 (.0864)   85.6036 (.1369)   58.3806 (.5173)   72.4319 (.4472)
DLBCL               80.8889 (.2595)   73.8182 (.9513)   72.1004 (.0928)   71.4561 (.9825)   69.1285 (1.9710)
DLBCL-Harvard       85.2571 (.1522)   85.6467 (1.0982)  87.7015 (.1030)   79.9240 (1.4870)  85.7397 (.4297)

Table 5. Average accuracy on test data when using data sets with artificially added Gaussian noise, whose mean is zero and whose variance is set to 10% of the maximum absolute value of each data set. Each entry is shown as the average accuracy in percent and its standard deviation in parentheses.

Data set            DPDR              PCA               MDS               Isomap            LLE
ALL-AML Leukemia    97.3529 (.0675)   93.6913 (.4348)   96.0121 (.0525)   88.3604 (.0564)   85.7369 (.2605)
MLL Leukemia        100 (0)           80.4762 (1.7169)  92.9412 (1.7536)  63.3334 (1.0050)  92.6667 (.2223)
Lung cancer         98.1124 (.1366)   89.9329 (0)       98.3892 (.0174)   63.7364 (.3357)   97.3752 (.0012)
DLBCL               73.3333 (0)       70.1333 (.8924)   68.4324 (.2551)   63.9332 (.2154)   72.6984 (.1420)
DLBCL-Harvard       92.0000 (0)       98.4762 (1.4481)  92.7188 (.0638)   79.2258 (.5889)   88.6038 (.0837)

3.2 Robustness

We performed two different types of experiments, shown in Table 4 and Table 5, to assess the robustness of the DR methods. The former concerns mislabeled data, and the latter Gaussian noise that may be introduced as measurement error. Outliers, i.e., mislabeled data, are expected to degrade a classifier's performance, and the results are indeed generally lower than those in Table 3. Over the 5 trials per data set using different selections of outliers, the performance varied considerably during our experiments, and in some cases it fell far below 50% regardless of the DR method. This is due to the sensitivity of the SVM, which involves only support vectors in constructing its classifier model; whether an outlier acts as a support vector is responsible for such drastic degradation. On the contrary, regarding robustness against noisy measurement data, shown in Table 5, performance is not much degraded compared with Table 3, except for the DLBCL data set, which contains the smallest number of instances. The comparison between DR methods shows tendencies similar to those in Table 3, indicating that DPDR and MDS produce consistently good results compared with the other methods. In view of the noise reduction effect of DR methods, we can see that although DPDR carries the noisy data as it is into the reduced space, it does not lose its advantages, because the SVM can handle the noise properly without any prior noise reduction.

3.3 Nonlinearity in Data

By virtue of the nonlinear capability provided by an appropriate kernel such as the RBF kernel, an SVM can handle nonlinearly separable data effectively. In this sense, a nonlinear transformation of the data, which one might hope would help the classifier do its job better, can turn out to be superfluous or even harmful. Supporting this claim, a comparison between the linear methods (PCA, MDS) and the nonlinear methods (Isomap, LLE) in Table 2 reveals no clear superiority of either side. We therefore do not necessarily have to exploit the extra capability of NLDR beyond the literal purpose of improving efficiency by reducing the amount of data. This philosophy agrees with the idea of the DPDR method, which simply gives reduced dimensional representations with Euclidean distances and cosine similarities unchanged.
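As a small self-contained illustration of this point (a toy example of ours, not one of the paper's experiments), a linear SVM cannot separate concentric-circle data, while the same data pose no problem for an RBF-kernel SVM without any nonlinear DR step:

    from sklearn.datasets import make_circles
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    # two classes that are separable only nonlinearly (concentric circles)
    X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

    linear_acc = SVC(kernel="linear", C=1.0).fit(X_tr, y_tr).score(X_te, y_te)
    rbf_acc = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_tr, y_tr).score(X_te, y_te)

    print(f"linear SVM test accuracy: {linear_acc:.2f}")  # near chance level
    print(f"RBF SVM test accuracy:    {rbf_acc:.2f}")     # close to 1.0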


Table 6. Computing times (in seconds) for dimension reductions and cross-validations to optimize parameters for the ALL-AML data set. All entries include the total computation time over the whole range of the parameter search described in Section 2.

                       DPDR          PCA            MDS            Isomap          LLE
Dimension Reductions   0.172         47.203         592.906        73.266          186.453
Cross-validations      28            1613           1294           6067            6264
Approx. Total          29 (.5 min)   1650 (28 min)  1887 (31 min)  6140 (102 min)  6450 (108 min)

3.4 Computing Speed

Another issue is the amount of computation required to obtain the optimal parameter set. Each additional parameter to be tuned makes the parameter optimization even more tedious. The best we can do is to extend the parameter ranges to be searched as much as possible, as we did in our experiments, but this is a very exhaustive task, and even then it is hard to be convinced that what we obtain is the best solution, because of the high dimensionality of the parameter space itself. As shown in Table 6, Isomap and LLE, which need optimization of two additional parameters (k in k-nearest neighbors and the target dimension d), took much more computing time than the other methods. In contrast, the computing speed of DPDR, with no parameter optimization, is remarkably fast, and together with its consistently good performance on test sets, this is enough to give it high priority for consideration in most classification applications.
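A rough feel for the cost of a single reduction on data of the ALL-AML shape can be obtained as below (our own sketch on synthetic data with NumPy/scikit-learn implementations, so the absolute numbers will differ from the MATLAB timings above); note that the dominant cost in Table 6 is the repeated cross-validation over the parameter grids, not any single reduction.

    import time
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.manifold import MDS, Isomap, LocallyLinearEmbedding

    rng = np.random.default_rng(0)
    X = rng.normal(size=(72, 7129))      # shaped like the ALL-AML data set

    def timed(label, fn):
        t0 = time.perf_counter()
        fn()
        print(f"{label:>6s}: {time.perf_counter() - t0:.3f} s")

    d, k = 5, 10                         # illustrative target dimension and neighborhood size
    timed("SVD", lambda: np.linalg.svd(X, full_matrices=False))  # DPDR-style reduction
    timed("PCA", lambda: PCA(n_components=d).fit_transform(X))
    timed("MDS", lambda: MDS(n_components=d).fit_transform(X))
    timed("Isomap", lambda: Isomap(n_neighbors=k, n_components=d).fit_transform(X))
    timed("LLE", lambda: LocallyLinearEmbedding(n_neighbors=k, n_components=d).fit_transform(X))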

4 Conclusion

In this paper, we performed quantitative experimental comparisons among several DR methods (DPDR, PCA, MDS, Isomap, and LLE) for classifying gene or protein expression data. Methods other than DPDR are usually hard to optimize and can even cause unnecessary distortion of the data. DPDR, however, as long as the classifier uses only Euclidean distance or cosine similarity measures, gives not only results identical to those in the original space but also significantly higher efficiency, both in obtaining the reduced dimensional representations and in building the classifiers. Therefore, DPDR can be a good complement to state-of-the-art classifiers such as SVMs with an RBF kernel, which have high discriminative power and noise-handling ability but require a great deal of computation. In the long run, it can also be widely applied to other data analyses where the number of instances is much smaller than the dimensionality.

Acknowledgments

This work is supported in part by the National Science Foundation Grants ACI-0305543 and CCF-0621889. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

References

[1] Kent Ridge bio-medical data set repository. Data sets available at http://sdmc.lit.org.sg/GEDatasets/Datasets.html.
[2] A. A. Alizadeh, M. B. Eisen, E. E. Davis, C. Ma, I. S. Lossos, A. Rosenwald, J. C. Boldrick, H. Sabet, T. Tran, X. Yu, J. I. Powell, L. Yang, G. E. Marti, T. Moore, J. Hudson, L. Lu, D. B. Lewis, R. Tibshirani, G. Sherlock, W. C. Chan, T. C. Greiner, D. D. Weisenburger, J. O. Armitage, R. Warnke, R. Levy, W. Wilson, M. R. Grever, J. C. Byrd, D. Botstein, P. O. Brown, and L. M. Staudt. Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature, 403(6769):503–511, February 2000.
[3] S. A. Armstrong, J. E. Staunton, L. B. Silverman, R. Pieters, M. L. den Boer, M. D. Minden, S. E. Sallan, E. S. Lander, T. R. Golub, and S. J. Korsmeyer. MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia. Nature Genetics, 30:41–47, 2001.
[4] T. Balachander, R. Kothari, and H. Cualing. An empirical comparison of dimensionality reduction techniques for pattern classification. In Proc. of the 7th International Conference on Artificial Neural Networks (ICANN 97), volume 1327, pages 589–594. Springer, Berlin, 1997.


[5] U. M. Braga-Neto and E. R. Dougherty. Is cross-validation valid for small-sample microarray classification? Bioinformatics, 20(3):374–380, February 2004.
[6] C. J. C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2:121–167, 1998.
[7] Y. D. Cai, X. J. Liu, X. B. Xu, and G. P. Zhou. Support vector machines for predicting protein structural class. BMC Bioinformatics, 2:3, 2001.
[8] C. C. Chang and C. J. Lin. LIBSVM: A library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[9] N. Cristianini and J. Shawe-Taylor. Support Vector Machines and Other Kernel-based Learning Methods. Cambridge University Press, Cambridge, 2000.
[10] K. Dawson, R. L. Rodriguez, and W. Malyj. Sample phenotype clusters in high-density oligonucleotide microarray data sets are revealed using Isomap, a nonlinear algorithm. BMC Bioinformatics, 6:195, 2005.
[11] C. H. Q. Ding and I. Dubchak. Multi-class protein fold recognition using support vector machines and neural networks. Bioinformatics, 17:349–358, 2001.
[12] J. Friedman. Recent advances in predictive (machine) learning. In Statistical Problems in Particle Physics, Astrophysics, and Cosmology, pages 196–207, 2003.
[13] G. H. Golub and C. F. van Loan. Matrix Computations, third edition. Johns Hopkins University Press, Baltimore, 1996.
[14] T. Golub, D. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J. Mesirov, H. Coller, M. Loh, J. Downing, M. Caligiuri, C. Bloomfield, and E. Lander. Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science, 286(5439):531–537, October 1999.
[15] G. J. Gordon, R. V. Jensen, L.-L. Hsiao, S. R. Gullans, J. E. Blumenstock, S. Ramaswamy, W. G. Richards, D. J. Sugarbaker, and R. Bueno. Translation of microarray data into clinically relevant cancer diagnostic tests using gene expression ratios in lung cancer and mesothelioma. Cancer Research, 62:4963–4967, September 2002.
[16] C.-W. Hsu, C.-C. Chang, and C.-J. Lin. A practical guide to support vector classification.
[17] S. J. Hua and Z. R. Sun. Support vector machine approach for protein subcellular localization prediction. Bioinformatics, 17:721–728, 2001.
[18] H. Kim, H. Park, and H. Zha. Distance preserving dimension reduction for manifold learning. In Proceedings of the 2007 SIAM International Conference on Data Mining (SDM07), 2007. To appear.
[19] G. Lee, C. Rodriguez, and A. Madabhushi. An empirical comparison of dimensionality reduction methods for classifying gene and protein expression datasets. In Proceedings of the International Symposium on Bioinformatics Research and Applications, pages 170–181, 2007.
[20] W. F. Massay. Principal components regression in exploratory statistical research. J. Amer. Statist. Assoc., 60:234–246, 1965.

[21] S. Mitra and Y. Hayashi. Bioinformatics with soft computing. IEEE Transactions on Systems, Man and Cybernetics, Part C: Applications and Reviews, 36(5):616–635, 2006.
[22] J. Nilsson, T. Fioretos, M. Höglund, and M. Fontes. Approximate geodesic distances reveal biologically relevant structures in microarray data. Bioinformatics, 20(6):874–880, 2004.
[23] M. Niskanen and O. Silven. Comparison of dimensionality reduction methods for wood surface inspection. In K. W. Tobin, Jr. and F. Meriaudeau, editors, Proceedings of the Society of Photo-Optical Instrumentation Engineers (SPIE), volume 5132, pages 178–188, Apr. 2003.
[24] E. Petricoin, A. Ardekani, B. Hitt, P. Levine, V. Fusaro, S. Steinberg, G. Mills, C. Simone, D. Fishman, E. Kohn, et al. Use of proteomic patterns in serum to identify ovarian cancer. The Lancet, 359(9306):572–577, 2002.
[25] S. Ray, S. Bandyopadhyay, P. Mitra, and S. Pal. Bioinformatics in neurocomputing framework. IEE Proceedings - Circuits, Devices and Systems, 152(5):556–564, 2005.
[26] S. T. Roweis and L. K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290:2323–2326, 2000.
[27] L. K. Saul and S. T. Roweis. Think globally, fit locally: Unsupervised learning of low dimensional manifolds. Journal of Machine Learning Research, 4:119–155, 2003.
[28] C. Shi and L. Chen. Feature dimension reduction for microarray data analysis using locally linear embedding. In Proceedings of the 3rd Asia-Pacific Bioinformatics Conference, 2005.
[29] G. Shieh, Y. C. Jiang, and Y. Shih. Comparison of support vector machines to other classifiers using gene expression data. Communications in Statistics: Simulation and Computation, 35(1):241–256, January 2006.
[30] M. A. Shipp, K. N. Ross, P. Tamayo, A. P. Weng, J. L. Kutok, R. C. Aguiar, M. Gaasenbeek, M. Angelo, M. Reich, G. S. Pinkus, T. S. Ray, M. A. Koval, K. W. Last, A. Norton, T. A. Lister, J. Mesirov, D. S. Neuberg, E. S. Lander, J. C. Aster, and T. R. Golub. Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning. Nature Medicine, 8(1):68–74, January 2002.
[31] J. B. Tenenbaum, V. de Silva, and J. C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500):2319–2323, 2000.
[32] W. S. Torgerson. Theory & Methods of Scaling. Wiley, New York, 1958.
[33] V. Tseng and C. Kao. Efficiently mining gene expression data via a novel parameterless clustering method. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2(4):355–365, 2005.
[34] V. Vapnik. The Nature of Statistical Learning Theory. Springer-Verlag, New York, 1995.
[35] V. Vapnik. Statistical Learning Theory. Wiley, New York, 1998.
[36] L. Wang, F. Chu, and W. Xie. Accurate cancer classification using expressions of very few genes. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 4(1):40–53, January 2007.
