
Equilibrium-Based Support Vector Machine for Semisupervised Classification
Daewon Lee and Jaewook Lee

Abstract—A novel learning algorithm for semisupervised classification is proposed. The proposed method first constructs a support function that estimates the support of a data distribution using both labeled and unlabeled data. Then, it partitions the whole data space into a small number of disjoint regions with the aid of a dynamical system. Finally, it labels the decomposed regions utilizing the labeled data and the cluster structure described by the constructed support function. Simulation results show the effectiveness of the proposed method in labeling out-of-sample unlabeled test data as well as in-sample unlabeled data.

Index Terms—Dynamical systems, inductive learning, kernel methods, semisupervised learning, support vector machines (SVMs).

I. INTRODUCTION

In statistical machine learning, there are three different scenarios: supervised learning, unsupervised learning, and semisupervised learning. In supervised learning, a set of labeled data is given, and the task is to construct a classifier that predicts the labels of future unknown data. In unsupervised learning, such as clustering [14], [18], only a set of unlabeled data is given, and the task is to segment the unlabeled data into clusters that reflect meaningful structure of the data domain. In semisupervised learning, a set of both labeled and unlabeled data is given, and the task is to construct a better classifier using both the labeled and the unlabeled data than could be obtained using only the labeled data, as in supervised learning.

Recently, semisupervised learning has come to occupy an important position in many real-world applications such as bioinformatics, web and text mining, database marketing, face recognition, and video indexing. This is because a large amount of unlabeled data can be collected easily by automated means in many practical learning domains, while labeled data are often difficult, expensive, or time consuming to obtain, as they typically require the effort of human experts [12]. Many learning algorithms have been developed to solve semisupervised learning problems, including graph-based models, generative mixture models using expectation-maximization (EM) [5], self-training, cotraining [12], the transductive support vector machine (TSVM) and its variants [1], [4], kernel methods [17], and semisupervised clustering methods [13], [19]. Most of these existing methods are, however, designed for transductive semisupervised learning. Since transduction is only concerned with predicting given specific test points (e.g., the in-sample unlabeled points in semisupervised learning), it does not provide a straightforward way to make predictions on out-of-sample points for inductive learning. Although some transductive methods can be extended into inductive ones, their performance on out-of-sample points tends to be poor or inefficient.

In this letter, to overcome such difficulties, we propose a novel robust, efficient, and inductive learning algorithm for semisupervised learning. The proposed method consists of three phases. In the first phase, we build a support function that characterizes the support of the multidimensional distribution of a given data set consisting of both labeled and unlabeled data. In the second phase, we decompose the whole data space into a small number of separate clustered regions via a dynamical system associated with the constructed support function. Finally, in the third phase, we assign a class label to each decomposed region utilizing the information of its constituent labeled data and the topological and dynamical properties of the constructed dynamical system, thereby classifying in-sample unlabeled data as well as unknown out-of-sample data. The detailed procedure of each phase is described in Section II (see Fig. 1).

Manuscript received August 11, 2005; revised March 13, 2006 and July 13, 2006; accepted September 28, 2006. This work was supported by the Korean Science and Engineering Foundation (KOSEF) under Grant R01-2005-00010746-0. The authors are with the Department of Industrial and Management Engineering, Pohang University of Science and Technology, Pohang, Kyungbuk 790784, Korea (e-mail: [email protected]). Color versions of one or more of the figures in this letter are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TNN.2006.889495

II. PROPOSED METHOD

A. Phase I: Constructing a Trained Gaussian Kernel Support Function via SVDD

Suppose that a set of labeled or unlabeled data {(x_i, y_i)}_{i=1}^N ⊂ X × Y is given, where x_i ∈ ℜ^n and y_i denotes the class label of x_i, which is missing for the unlabeled points. Following the support vector domain description (SVDD) [11], [15], the data points are mapped by a nonlinear feature map Φ into a high-dimensional feature space, in which we seek the smallest enclosing sphere with center a and radius R

$$\min_{R,\,a,\,\xi}\; R^2 + C\sum_{i=1}^{N}\xi_i \quad \text{subject to} \quad \|\Phi(x_j)-a\|^2 \le R^2 + \xi_j,\;\; \xi_j \ge 0,\;\; j = 1,\ldots,N \tag{1}$$

where a is the center of the sphere and the ξ_j are slack variables allowing for soft boundaries. The solution of the primal problem (1) can then be obtained by solving its dual problem

$$\max_{\beta}\; W = \sum_{j}\beta_j K(x_j, x_j) - \sum_{i,j}\beta_i\beta_j K(x_i, x_j) \quad \text{subject to} \quad 0 \le \beta_j \le C,\;\; \sum_{j}\beta_j = 1,\;\; j = 1,\ldots,N \tag{2}$$

where the inner product Φ(x_i) · Φ(x_j) is replaced by a kernel K(x_i, x_j). Only those points with 0 < β_j < C lie on the boundary of the sphere and are called support vectors (SVs). Note that both labeled and unlabeled data are used as the training set in Phase I; the label information y_i is not involved and does not affect the solution of (2), as can be seen directly from the form of (2). Now, let its solution be β_j, j = 1, ..., N, and let J ⊂ {1, ..., N} be the index set of the nonzero β_j. Then, the trained Gaussian kernel support function, defined by the squared radial distance of the image of x from the sphere center, is given by

$$f(x) := R^2(x) = \|\Phi(x)-a\|^2 = K(x,x) - 2\sum_{j\in J}\beta_j K(x_j, x) + \sum_{i,j\in J}\beta_i\beta_j K(x_i, x_j) = 1 - 2\sum_{j\in J}\beta_j e^{-q\|x-x_j\|^2} + \sum_{i,j\in J}\beta_i\beta_j e^{-q\|x_i-x_j\|^2} \tag{3}$$

Fig. 1. (a) Original data set. Unlabeled points and labeled points are marked with different symbols; the number attached to each labeled point represents its class label. (b) Contour map of the trained Gaussian support function constructed in Phase I. (c) Basin cells generated by Phase II. The solid lines represent the set of contours given by {x : f(x) = r_s}, and the dash-dot lines represent the boundaries of the basin cells; each marked point is a stable equilibrium vector, the representative point of its basin cell. (d) Ultimate labeled regions determined by Phase III. The solid lines represent the decision boundaries separating the labeled regions.

where a widely used Gaussian kernel of the form K(x_i, x_j) = exp(−q‖x_i − x_j‖²) with width parameter q is employed. For an illustration, Fig. 1(b) shows the contour map of a trained Gaussian support function for the original data set of labeled and unlabeled points shown in Fig. 1(a). One distinguishing feature of the kernel support function trained via SVDD is that cluster boundaries can be described by the set of contours that enclose the points in data space, given by {x : f(x) = r_s}, where r_s = R²(x_i) for any SV x_i. Another distinguishing feature is that, in practice, only a small portion of the β_j take nonzero values, which not only simplifies the cluster structure but also greatly reduces the computational burden involved in evaluating f or its derivative. A small sketch of this phase is given below.
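As an illustration of Phase I, the following sketch (in Python with NumPy/SciPy; the function names svdd_fit and support_function are ours, and the small SLSQP solver merely stands in for the LIBSVM-based QP solver used in our implementation) solves the dual (2) on a mixed labeled/unlabeled sample and evaluates the support function (3).

```python
import numpy as np
from scipy.optimize import minimize

def gaussian_kernel(X, Y, q):
    """K(x, y) = exp(-q * ||x - y||^2) for all pairs of rows of X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-q * d2)

def svdd_fit(X, C=1.0, q=1.0):
    """Solve the SVDD dual (2); the labels are not used at all in this phase."""
    N = len(X)
    K = gaussian_kernel(X, X, q)

    def neg_dual(beta):
        # negative of W = sum_j beta_j K(x_j, x_j) - sum_{i,j} beta_i beta_j K(x_i, x_j)
        return -(beta @ np.diag(K) - beta @ K @ beta)

    res = minimize(neg_dual, np.full(N, 1.0 / N),
                   bounds=[(0.0, C)] * N,
                   constraints=[{"type": "eq", "fun": lambda b: b.sum() - 1.0}],
                   method="SLSQP")
    beta = res.x
    beta[beta < 1e-8] = 0.0      # only a small portion of the beta_j stay nonzero
    return beta

def support_function(x, X, beta, q):
    """Trained Gaussian kernel support function f(x) = R^2(x) of (3)."""
    J = beta > 0
    kx = np.exp(-q * ((X[J] - np.asarray(x, dtype=float)) ** 2).sum(axis=-1))
    const = beta[J] @ gaussian_kernel(X[J], X[J], q) @ beta[J]
    return 1.0 - 2.0 * float(beta[J] @ kx) + const
```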
B. Phase II: Decomposing the Data Space Into Separate Clustered Regions via a Dynamical System

The objective of Phase II is to decompose the whole data space, say ℜ^n, into a small number of separate clustered regions via the following dynamical system, which will be shown to preserve the topological structure of the clusters described by f in (3):

$$\frac{dx}{dt} = F(x) := -x + \sum_{j\in J}\mu_j(x)\,x_j, \qquad \mu_j(x) = \frac{\beta_j e^{-q\|x-x_j\|^2}}{\sum_{j\in J}\beta_j e^{-q\|x-x_j\|^2}} \tag{4}$$

Note that 0 < μ_j(x) < 1 and Σ_{j∈J} μ_j(x) = 1. The existence of a unique solution (or trajectory) x(·) : ℜ → ℜ^n for each initial condition x(0) is guaranteed by the smoothness of the function F. A state vector s ∈ ℜ^n satisfying F(s) = 0 is called an equilibrium vector of (4), and it is called an (asymptotically) stable equilibrium vector (SEV) if all the eigenvalues of the Jacobian matrix of F at s have negative real parts. Geometrically, for each x, the vector field F(x) in (4) is orthogonal to the hypersurface {y : f(y) = r}, where r = f(x), and points inward with respect to that surface.


Fig. 2. Contour map of the trained kernel support function for varying level values. The two connected components in (a), with level value R_old, are merged into one connected component in (b), with level value R_new > R_old.

This property makes each trajectory flow inward and remain in one of the clusters described by f, which will be proved rigorously below. The basin of attraction of a stable equilibrium vector s is defined as the set of all points converging to s when process (4) is applied, i.e.,

$$A(s) := \{\, x(0)\in\Re^n : x(t)\to s \text{ as } t\to\infty \,\}.$$

A basin cell of a stable equilibrium vector s, one important concept used in this letter, is defined as the closure of the basin A(s) and is denoted by Ā(s). From the form of F(·) in (4), the basin cell Ā(s) can be interpreted, in the context of clustering, as a single approximated Gaussian cluster whose center is a stable equilibrium vector s satisfying

$$s = \frac{\sum_{j\in J}\beta_j e^{-q\|s-x_j\|^2}\,x_j}{\sum_{j\in J}\beta_j e^{-q\|s-x_j\|^2}}.$$
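To make Phase II concrete, here is a minimal sketch (our own function names, continuing the NumPy-based sketch of Phase I and assuming its beta coefficients and training points X) that follows process (4) by explicit Euler steps until an equilibrium is reached, and groups points by the stable equilibrium vector they converge to.

```python
import numpy as np

def flow_to_sev(x0, X, beta, q, step=0.5, tol=1e-6, max_iter=1000):
    """Follow dx/dt = F(x) = -x + sum_{j in J} mu_j(x) x_j, process (4),
    with explicit Euler steps until (numerically) F(s) = 0."""
    J = beta > 0
    XJ, bJ = X[J], beta[J]
    x = np.array(x0, dtype=float)
    for _ in range(max_iter):
        w = bJ * np.exp(-q * ((XJ - x) ** 2).sum(axis=-1))  # beta_j exp(-q||x - x_j||^2)
        mu = w / w.sum()                                     # mu_j(x) in (4)
        F = -x + mu @ XJ
        if np.linalg.norm(F) < tol:                          # reached an equilibrium vector
            break
        x = x + step * F
    return x

def group_by_sev(points, X, beta, q, merge_tol=1e-3):
    """Assign each point to a basin cell: points whose trajectories end at the same
    stable equilibrium vector share a cell, cf. the decomposition (5)."""
    sevs, cell = [], []
    for p in points:
        s = flow_to_sev(p, X, beta, q)
        for k, t in enumerate(sevs):
            if np.linalg.norm(s - t) < merge_tol:
                cell.append(k)
                break
        else:
            sevs.append(s)
            cell.append(len(sevs) - 1)
    return np.array(sevs), np.array(cell)
```

The Euler step size and tolerances above are illustrative; as noted in Section III, our implementation locates SEVs with an unconstrained minimization or nonlinear-equation solver instead.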

The next result, which serves as a theoretical basis of Phase II, shows that the data space can be decomposed into several basin cells under process (4) while preserving the topological structure of the clusters described by the support function f.

Theorem 1: Each connected component of the level set L_f(r) := {x ∈ ℜ^n : f(x) ≤ r} for any level value r is positively invariant, i.e., if a point is on a connected component of L_f(r), then its entire positive trajectory lies on the same component when process (4) is applied. Furthermore, the whole data space is composed of the basin cells, i.e.,

$$\Re^n = \bigcup_{i=1}^{M} \bar{A}(s_i) \tag{5}$$

where {s_i : i = 1, ..., M} is the set of the stable equilibrium vectors of (4).
Proof: See the Appendix.
One nice property of the constructed system (4) is that the topological structure of each level set L_f(r) is preserved under process (4). Another nice property of (4) is that the data space can be decomposed into a small number of disjoint regions (i.e., basin cells), where each


region is represented by a stable equilibrium vector. From a computational point of view, we can identify the basin cell to which a data point belongs by locating the stable equilibrium vector to which it converges under (4), without directly determining the exact basin cells. In the illustrative example of Fig. 1(c), the whole data space is decomposed into 14 disjoint regions (A1-A14), and all the data points within a basin cell converge to a common stable equilibrium vector under process (4).

C. Phase III: Classifying the Decomposed Regions

Up to now, we have not used any of the label information provided in the labeled data set. In Phase III, we classify each decomposed region, i.e., each basin cell constructed in Phase II, with the aid of the labeled data set as follows. First, for each basin cell Ā(s) that contains at least one labeled data point, we take a majority vote over the labeled data points in it to assign a class label to Ā(s) (and hence to the stable equilibrium vector s). All the unlabeled data points in Ā(s) are then assigned the same class label. In the illustrative example of Fig. 1(c) and (d), the decomposed region A1 is assigned to class 1, while A4 and A7 are assigned to class 2 by the majority votes over the labeled data points in each region, and so on. Second, for a basin cell Ā(s) with no labeled data point in it [A2 in Fig. 1(c)], we utilize the cluster structure of the trained Gaussian kernel support function f constructed in Phase I to classify it. Concretely, we notice that the level set L_f(r_s) is composed of several disjoint clusters

$$L_f(r_s) = \{x : f(x) \le r_s\} = C_1 \cup \cdots \cup C_p \tag{6}$$

where r_s = R²(x_i) for some SV x_i, and each cluster C_i, i = 1, ..., p, is a connected component of L_f(r_s) [see Fig. 1(c)]. Therefore, if two decomposed regions (i.e., basin cells) share the same cluster, it is natural to assign the same class label to both regions. For an illustration, in Fig. 1(c), the region A2, with no labeled data point in it, shares the same cluster as region A6. Therefore, A2 and A6 are assigned the same class label, as shown in Fig. 1(d). In the worst case, when all the decomposed regions sharing the same cluster contain no labeled data, we increase the current level value, starting from R_old = r_s, until the unlabeled component unites with


TABLE I BENCHMARK DATA DESCRIPTION AND PARAMETER SETTINGS

a labeled component. Then, we assign to the unlabeled component the same class label as the labeled component. To illustrate this, see Fig. 2. In Fig. 2(a), with level value r = R_old, the left connected component contains no labeled data, while the right connected component contains a labeled data point, say of class "6". By increasing the level value r from R_old to R_new, the two connected components are merged into one connected component with a labeled data point in it, which is assigned class "6", as shown in Fig. 2(b).

To identify the connected components of a level set L_f(r), we employ the following reduced-complete graph (R-CG) labeling strategy [2], [6], restricted to the set of stable equilibrium vectors {s_k}_{k=1}^M, which generates an adjacency matrix A_ij between pairs s_i and s_j: A_ij = 1 if max_{0≤λ≤1} f(λ s_i + (1−λ) s_j) ≤ r, and A_ij = 0 otherwise. A pair of decomposed regions Ā(s_i) and Ā(s_j) is then assigned to the same connected component of L_f(r) if s_i and s_j belong to the same connected component of the graph induced by A. A simple labeling strategy using the nearest neighboring labeled data, as in [7], can also be adopted for data sets with well-partitioned convex shapes. For data sets with highly curved data distributions, a more robust strategy suggested in [7] may be preferred, but it is not reported here.

After labeling all the decomposed regions, we can use the trained classifier not only to classify the in-sample unlabeled data, but also to predict the class labels of future unknown out-of-sample data by applying process (4), which is one distinguishing feature of the proposed method. Specifically, for a given test data point, we apply process (4) with this point as the initial guess and locate the stable equilibrium vector to which the test point converges. Then, we assign the class label of the corresponding stable equilibrium vector to the test data point. A sketch of this labeling procedure is given below.
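The labeling logic of Phase III can be sketched as follows (Python, reusing support_function and the SEVs produced by the earlier sketches; the names rcg_adjacency and label_cells are ours, the maximization over the segment is approximated on a finite grid of lambda values, and the level-raising fallback for the worst case described above is omitted).

```python
import numpy as np
from collections import Counter

def rcg_adjacency(sevs, X, beta, q, r, n_grid=20):
    """R-CG adjacency: A[k, l] = 1 if f stays below the level r along the whole
    segment between s_k and s_l (maximum approximated on a grid of lambda values)."""
    M = len(sevs)
    A = np.zeros((M, M), dtype=bool)
    lams = np.linspace(0.0, 1.0, n_grid)
    for k in range(M):
        for l in range(k + 1, M):
            fmax = max(support_function(lam * sevs[k] + (1 - lam) * sevs[l], X, beta, q)
                       for lam in lams)
            A[k, l] = A[l, k] = fmax <= r
    return A

def label_cells(cell_of_labeled, y_labeled, A, n_cells):
    """Majority vote inside each basin cell; cells without labeled points copy the
    label of a cell sharing their cluster (a connected component of the graph A)."""
    votes = {k: Counter() for k in range(n_cells)}
    for k, y in zip(cell_of_labeled, y_labeled):
        votes[k][y] += 1
    label = {k: votes[k].most_common(1)[0][0] for k in range(n_cells) if votes[k]}
    changed = True
    while changed:                       # propagate labels through the adjacency graph
        changed = False
        for k in range(n_cells):
            if k in label:
                continue
            for l in range(n_cells):
                if A[k, l] and l in label:
                    label[k] = label[l]
                    changed = True
                    break
    return label
```

Here cell_of_labeled and n_cells would come from running group_by_sev on the labeled points; an out-of-sample point is then classified by running flow_to_sev on it and returning the label of the cell whose SEV it reaches.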

III. NUMERICAL RESULTS AND REMARKS

In order to evaluate the effectiveness of the proposed method, denoted by "Proposed," we conducted experiments on 12 data sets and compared its generalization performance with that of existing methods. A description of the data sets is given in Table I. The "tae," "ring," and "sunflower" data sets are artificially generated from multimodal and nonlinearly separable distributions. The "sonar," "iris," "wine," "satimage," "segment," and "shuttle" data sets are widely used classification data sets from the University of California at Irvine (UCI) repository [20]. The "Coil20," "g50c," and "Uspst" data sets are taken from [4]. To check the performance of inductive learning, we randomly partition each unlabeled data set into an (in-sample) unlabeled set and an (out-of-sample) test set, as in [17]. (As a result, we obtain somewhat different results from those of [4], in which the data sets do not contain test examples.)

The main parts of "Proposed" are implemented as follows. The quadratic programming solver used to optimize the kernel support function is based on the LIBSVM library [21]. To find SEVs, an unconstrained minimization solver or a nonlinear equation solver from the Matlab optimization toolbox is used. We performed experiments based on the setup of [4] and [17]. The model parameters (C and q for the proposed method, together with the corresponding parameters of the competing methods) are chosen among combinations of values on a finite grid by fivefold cross validation on the unlabeled examples. The best combinations found are reported in Table I. For each data set, 100 random splits are used to partition the training set into labeled and unlabeled sets. Performance is evaluated by the error rate and its standard deviation averaged over the 100 splits, to provide an analysis of the statistical relevance of the reported results.

The goal of the experiments is to show the generalization performance of "Proposed" as an inductive method. To do this, we calculate misclassification error rates on the unlabeled data and on the test data, and then compare the performance of "Proposed" with four widely used and very competitive semisupervised learning algorithms: SVM [3], [16], the mixed ensemble approach (MEA-EM) [5], rTSVM, and low-density separation (LDS) [4]. SVM constructs an optimal separating hyperplane using only the labeled data. The MEA-EM algorithm is a clustering-based method closely related to the proposed method. rTSVM is a modification of TSVM that reduces the time complexity by directly optimizing the objective in the primal using nonlinear optimization techniques. LDS aims to find a decision boundary that passes through low-density regions by combining a so-called manifold-learning step (e.g., Isomap) with rTSVM.

The experimental results are presented in Table II. SVM is highly dependent on the initial labeled points and hence is not stable. MEA-EM shows similar performance on the small-scale data sets, but its performance degrades on the large-scale data sets. rTSVM performs poorly on nonlinearly separable or merged data (e.g., "iris," "ring," and "sunflower"). This is because rTSVM is optimized in the primal without utilizing an inner-product kernel; in order to deal with nonlinearly separable data, some kernel-based preprocessing, for example kernel PCA, is needed [4]. LDS achieves lower (sometimes the best) unlabeled error rates on most data sets, but it has several drawbacks. First, the manifold-learning part of LDS cannot be extended to out-of-sample test data, so the new representation of future unseen data cannot be computed, which is an obstacle to predicting their labels.


TABLE II AVERAGED MISCLASSIFICATION ERROR RATES (%) AND THEIR STANDARD DEVIATIONS ON UNLABELED AND TEST EXAMPLES

As shown in Table II, LDS reports only unlabeled errors. Second, it is not applicable to large-scale data sets (e.g., "satimage" and "shuttle") because of its heavy computational complexity and large memory requirements. On the other hand, "Proposed" yields unlabeled errors similar to, and sometimes slightly worse than, those of LDS, but it achieves good generalization performance on future unseen patterns, showing its potential as a good alternative for inductive semisupervised learning, whereas LDS, being transductive, offers no straightforward way to classify such patterns. As a result, "Proposed" shows a statistically comparable performance (better or slightly worse) relative to the other methods on unlabeled in-sample data and fairly good generalization performance on out-of-sample data.

Remarks:
1) Our proposed method shares some similarities with other clustering-based semisupervised algorithms [5], [13], [19]. Methods of this kind first cluster the given sample data points and then assign to each cluster a label based on some prespecified rule using the labeled data set. However, most of them focus on clustering the in-sample data, not out-of-sample data, thereby concentrating on enhancing the performance of transductive rather than inductive learning [17]. Although they can be extended to label the entire space, and hence made inductive, by some simple strategy (e.g., after K-means clustering, each out-of-sample point is assigned the class label of the nearest cluster center), the performance on out-of-sample data is often not satisfactory when restrictive cluster assumptions are violated [4], [17]. Moreover, since they must determine the number of components (or clusters) and the parameter values, they easily fail if appropriate values are not found. On the other hand, the proposed method employs an SVDD, which has a good ability to describe the data distribution rather than to cluster the data points themselves, and it automatically detects the structure of a high-dimensional data distribution with a highly nonlinear shape. These properties make the proposed method, by means of its dynamical system process, a competitive method for inductive semisupervised learning.
2) To analyze the time complexity of the proposed method, let N be the number of training patterns and M (≪ N) the number of SEVs. The proposed method involves a quadratic programming (QP) procedure in Phase I, and most QP solvers have time complexity O(N³) [10]. In Phase II, the time complexity of obtaining the decomposition (5) is O(Nm), where m is the average number of iteration steps needed to converge to an SEV; it is independent of N and usually takes a value between 5 and 20 [6]. In Phase III, the complexity of labeling the M decomposed regions is O(M²). Putting this together, the time complexity of the proposed method is O(N³ + Nm + M²) ≈ O(N³) for large-scale data sets. This implies that the computing speed of the proposed method is comparable to that of other existing methods such as rTSVM and LDS [4].

IV. CONCLUSION

In this letter, we have proposed an inductive semisupervised learning method. The proposed method first builds a trained Gaussian kernel support function that estimates the support of the data distribution via an SVDD procedure, using both labeled and unlabeled data. Then, it decomposes the whole data space into separate clustered regions, i.e., basin cells, with the aid of a dynamical system. Finally, it classifies the decomposed regions utilizing the information of the labeled data and the topological structure of the clusters described by the constructed support function. A theoretical basis of the proposed method has also been given. Benchmark results demonstrate that the proposed method has competitive performance for inductive learning (i.e., the ability to label out-of-sample unlabeled test points as well as in-sample unlabeled points) and is applicable to large-scale data sets. The application of the proposed method to larger scale practical semisupervised learning problems remains to be investigated.

APPENDIX
PROOF OF THEOREM 1

Proof: 1) Let x(0) = x_0 be a point in a connected component, say C, of L_f(r), and let x(t) be the trajectory starting at x(0) = x_0. Since

$$\frac{d}{dt} f(x) = \nabla f(x)^{T}\frac{dx}{dt} = -4q\Big(\sum_{j\in J}\beta_j e^{-q\|x-x_j\|^2}\Big)\Big\|x-\sum_{j\in J}\mu_j(x)\,x_j\Big\|^{2} \le 0
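For completeness, the displayed expression follows by differentiating (3) and using the definitions of μ_j(x) and F(x) in (4):

$$\nabla f(x) = 4q\sum_{j\in J}\beta_j e^{-q\|x-x_j\|^2}(x-x_j) = 4q\Big(\sum_{j\in J}\beta_j e^{-q\|x-x_j\|^2}\Big)\Big(x-\sum_{j\in J}\mu_j(x)\,x_j\Big) = -4q\Big(\sum_{j\in J}\beta_j e^{-q\|x-x_j\|^2}\Big)F(x)$$

so that ∇f(x)^T (dx/dt) = ∇f(x)^T F(x) = −4q(Σ_{j∈J} β_j e^{−q‖x−x_j‖²}) ‖F(x)‖² ≤ 0.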


f(x(t)) is a nonincreasing function of t, and so we have f(x(t)) ≤ f(x_0) for all t ≥ 0, or equivalently, {x(t) : t ≥ 0} ⊂ L_f(r). Since {x(t) : t ≥ 0} is connected, we must have x(t) ∈ C for all t ≥ 0.
2) First, we show that every trajectory is bounded. Let V(x) = (1/2)‖x‖² and choose R > max_{j∈J} ‖x_j‖. Then, on ‖x‖ = R, we have

$$\frac{d}{dt}V(x) = x^{T}\frac{dx}{dt} = x^{T}\Big(-x+\sum_{j\in J}\mu_j(x)\,x_j\Big) = -\|x\|^{2}+\sum_{j\in J}\mu_j(x)\,x^{T}x_j \le -\|x\|^{2}+\|x\|\max_{j\in J}\|x_j\| < 0.$$

This implies that the trajectory starting from any point on ‖x‖ = R enters the bounded set ‖x‖ ≤ R, which implies that {x(t) : t ≥ 0} is bounded. Next, we show that every bounded trajectory converges to one of the equilibrium vectors. Since the closure of {x(t) : t ≥ 0} is nonempty and compact, and g(t) = f(x(t)) is a nonincreasing function of t, g is bounded from below because f is continuous. Hence, g(t) has a limit a as t → ∞. Let ω(x_0) be the ω-limit set of x_0. Then, for any p ∈ ω(x_0), there exists a sequence {t_n} with t_n → ∞ and x(t_n) → p as n → ∞. By the continuity of f, f(p) = lim_{n→∞} f(x(t_n)) = a. Hence, f(p) = a for all p ∈ ω(x_0). Since ω(x_0) is an invariant set, for all x ∈ ω(x_0)

$$\frac{d}{dt} f(x) = -4q\Big(\sum_{j\in J}\beta_j e^{-q\|x-x_j\|^2}\Big)\Big\|x-\sum_{j\in J}\mu_j(x)\,x_j\Big\|^{2} = 0

or, equivalently, F(ω(x_0)) = 0. Since every bounded trajectory converges to its ω-limit set and x(t) is bounded, x(t) approaches ω(x_0) as t → ∞. Hence, it follows that every bounded trajectory of system (4) converges to one of the equilibrium vectors. Therefore, the trajectory of x(0) = x_0 under process (4) approaches one of its equilibrium vectors, say s. If s is not a stable equilibrium vector, then the region of attraction of s has dimension less than or equal to n − 1. Therefore, we have

$$\Re^n = \bigcup_{i=1}^{M} \bar{A}(s_i)$$

where {s_i : i = 1, ..., M} is the set of the stable equilibrium vectors of system (4).

REFERENCES
[1] K. P. Bennett and A. Demiriz, "Semi-supervised support vector machines," in Advances in Neural Information Processing Systems. Cambridge, MA: MIT Press, 1999, vol. 11, pp. 368–374.
[2] A. Ben-Hur, D. Horn, H. T. Siegelmann, and V. Vapnik, "Support vector clustering," J. Mach. Learn. Res., vol. 2, pp. 125–137, 2001.
[3] C. J. Burges, "A tutorial on support vector machines for pattern recognition," Data Mining Knowl. Disc., vol. 2, no. 2, pp. 121–167, 1998.
[4] O. Chapelle and A. Zien, "Semi-supervised classification by low density separation," in Proc. 10th Int. Workshop Artif. Intell. Statist., 2005, pp. 57–64.
[5] E. Dimitriadou, A. Weingessel, and K. Hornik, "A mixed ensemble approach for the semi-supervised problem" [Online]. Available: http://citeseer.ist.psu.edu/590958.html
[6] J. Lee and D. Lee, "An improved cluster labeling method for support vector clustering," IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 3, pp. 461–464, Mar. 2005.
[7] J. Lee and D. Lee, "Dynamic characterization of cluster structures for robust and inductive support vector clustering," IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 11, pp. 1869–1874, Nov. 2006.
[8] D. Lee and J. Lee, "A novel semi-supervised learning methods using support vector domain description," presented at the World Congr. Comput. Intell. (WCCI), Vancouver, BC, Canada, Jul. 16–21, 2006.
[9] D. Lee and J. Lee, "Domain described support vector classifier for multi-classification problems," Pattern Recognit., vol. 40, pp. 41–51, 2007.
[10] J. C. Platt, "Fast training of support vector machines using sequential minimal optimization," in Advances in Kernel Methods: Support Vector Machines. Cambridge, MA: MIT Press, 1999, pp. 185–208.
[11] B. Schölkopf, J. Platt, J. Shawe-Taylor, A. J. Smola, and R. C. Williamson, "Estimating the support of a high-dimensional distribution," Neural Comput., vol. 13, no. 7, pp. 1443–1472, 2001.
[12] M. Seeger, "Learning with labeled and unlabeled data," Univ. Edinburgh, Tech. Rep., 2001.
[13] S. Basu, M. Bilenko, and R. J. Mooney, "A probabilistic framework for semi-supervised clustering," in Proc. 10th ACM SIGKDD Int. Conf. Knowl. Disc. Data Mining, Seattle, WA, Aug. 2004, pp. 59–68.
[14] A. Szymkowiak-Have, M. A. Girolami, and J. Larsen, "Clustering via kernel decomposition," IEEE Trans. Neural Netw., vol. 17, no. 1, pp. 256–264, Jan. 2006.
[15] D. M. J. Tax and R. P. W. Duin, "Support vector domain description," Pattern Recognit. Lett., vol. 20, pp. 1191–1199, 1999.
[16] V. N. Vapnik, "An overview of statistical learning theory," IEEE Trans. Neural Netw., vol. 10, no. 5, pp. 988–999, Sep. 1999.
[17] V. Sindhwani, P. Niyogi, and M. Belkin, "Beyond the point cloud: From transductive to semi-supervised learning," in Proc. 22nd Int. Conf. Mach. Learn., Bonn, Germany, 2005, pp. 824–831.
[18] R. Xu and D. Wunsch, II, "Survey of clustering algorithms," IEEE Trans. Neural Netw., vol. 16, no. 3, pp. 645–678, May 2005.
[19] S. Zhong, "Semi-supervised model-based document clustering: A comparative study," Mach. Learn., vol. 65, no. 1, pp. 3–29, Oct. 2006.
[20] Univ. California, Irvine, "UCI repository of machine learning databases" [Online]. Available: http://www.ics.uci.edu/~mlearn/MLRepository.html
[21] C.-C. Chang and C.-J. Lin, "LIBSVM: A library for support vector machines," 2001 [Online]. Available: http://www.csie.ntu.edu.tw/~cjlin/libsvm
