

Graph-Based Multiprototype Competitive Learning and Its Applications

Chang-Dong Wang, Student Member, IEEE, Jian-Huang Lai, Member, IEEE, and Jun-Yong Zhu, Student Member, IEEE

Abstract—Partitioning nonlinearly separable datasets is a basic problem in data clustering. In this paper, a novel approach termed graph-based multiprototype competitive learning (GMPCL) is proposed to handle this problem. A graph-based method is employed to produce an initial, coarse clustering. After that, multiprototype competitive learning is introduced to refine the coarse clustering and discover clusters of arbitrary shape. The GMPCL algorithm is further extended to deal with high-dimensional data clustering, i.e., the fast graph-based multiprototype competitive learning (FGMPCL) algorithm. An experimental comparison has been performed on both synthetic and real-world datasets to validate the effectiveness of the proposed methods. Additionally, we apply GMPCL/FGMPCL to two computer-vision tasks, namely, automatic color image segmentation and video clustering. Experimental results show that GMPCL/FGMPCL provide an effective and efficient tool with application to computer vision.

Index Terms—Competitive learning, graph-based method, multiprototype, nonlinear clustering.

Manuscript received September 18, 2010; revised February 22, 2011, May 11, 2011, and August 7, 2011; accepted October 16, 2011. Date of publication December 20, 2011; date of current version October 12, 2012. This work was supported by the National Science Foundation of China under Grant 61173084 and Grant 61128009. This paper was recommended by Associate Editor G. I. Papadimitriou. C.-D. Wang and J.-H. Lai are with the School of Information Science and Technology, Sun Yat-sen University, Guangzhou 510006, China (e-mail: [email protected]; [email protected]). J.-Y. Zhu is with the School of Mathematics and Computational Science, Sun Yat-sen University, Guangzhou 510006, China (e-mail: [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TSMCC.2011.2174633

I. INTRODUCTION

Data clustering plays an indispensable role in various fields, such as computer science, medical science, social science, and economics [1]–[8]. Many types of algorithms have been proposed, for instance, prototype-based methods [9]–[12], evolutionary clustering [13]–[17], graph-based methods [18], [19], density-based clustering [20], [21], kernel methods [22]–[24], conceptual clustering [25], and hierarchical clustering [26]. Discovering clusters of arbitrary shape has been a hot research topic over the past few years, since the development of density-based clustering [20] and kernel methods [22]. It aims at breaking the linear-separability assumption of traditional clustering methods, such as k-means [9] and competitive learning [10], and tries to generate more accurate clusterings by allowing cluster shapes to be arbitrary. In this paper, we propose a novel approach termed graph-based multiprototype

competitive learning (GMPCL) for partitioning nonlinearly separable datasets. The proposed method is based on the ideas of graph clustering and multiprototype competitive learning.

Nonlinear clustering methods include multiprototype clustering (MPC) [27], graph-based methods [18], [19], density-based clustering [20], [21], kernel methods [22]–[24], etc. MPC uses multiple prototypes to generate subregions of the Voronoi diagram, whose boundaries are combined to separate clusters of arbitrary shape [27]. Graph-based methods, such as shared-nearest-neighbor (SNN) clustering [18] and spectral clustering [19], first construct a graph from the dataset and then utilize the notion of useful links between data points or the eigenstructure of the affinity matrix to generate clusters of arbitrary shape. Density-based clustering [20], [21] relies on a density-based notion to find clusters of different sizes and shapes (e.g., DBSCAN [20]). Kernel methods [22]–[24] first map data points into a feature space, where the nonlinear pattern becomes linear, and then perform clustering in this kernel space. These approaches provide various methodologies and ideas for partitioning nonlinearly separable datasets.

In this paper, we propose a novel nonlinear clustering algorithm named GMPCL. This algorithm employs a graph-based method to generate an initial, coarse clustering. Multiprototype competitive learning is then performed to refine the clustering and identify clusters of arbitrary shape. Therefore, the proposed approach consists of two phases, namely, graph-based initial clustering and multiprototype competitive learning.

High-dimensional clustering applications, such as video clustering, are characterized by a high computational load, which is mainly due to the redundant calculation of the distances between high-dimensional points. To tackle this problem, we further extend GMPCL to a fast form, termed fast graph-based multiprototype competitive learning (FGMPCL). The basic idea is to eliminate the redundant calculation of the distances between high-dimensional points by using a novel cluster representation named the multiprototype descriptor.

This paper is organized as follows. In Section II, we introduce the related work and the notation. In Section III, we describe the proposed GMPCL method, which is further extended to deal with high-dimensional data clustering in Section IV. Experimental results are reported in Section V. We conclude this paper in Section VI.

II. RELATED WORK AND NOTATIONS

In the graph-based initial clustering of GMPCL, a concept of vertex energy is introduced to measure how "important" a vertex is. In the DBSCAN algorithm [20], the density is




measured simply by the number of points within the neighborhood of a point. In contrast, the proposed vertex energy takes into account the correlations between all data points, which results in a global estimate of the vertex energy. Although both can be used to discover arbitrarily shaped clusters, the proposed vertex energy is more suitable for datasets containing clusters of differing densities.

After a coarse clustering is generated in the first phase, affinity propagation (AP) [11] is utilized to select multiple prototypes for the initial representation of each cluster. AP [11] initially considers all data points as potential prototypes and recursively transmits real-valued messages between data points to generate the most appropriate prototypes, whose number is controlled by the preference value. These multiple prototypes are further updated by classical competitive learning. In [10], the authors proposed rival penalized competitive learning (RPCL), where for each input not only is the winning prototype learned to adapt to the input, but its rival (the second winner) is also delearned in order to eliminate redundant prototypes. Both GMPCL and RPCL [10] use competitive learning to update prototypes. However, RPCL uses rival penalization to eliminate redundant prototypes and is limited to linearly separable datasets, whereas GMPCL relies on graph clustering to initialize a coarse clustering and has the capability of discovering nonlinear clusters. Another work related to the proposed GMPCL algorithm is MPC [27]. Both GMPCL and MPC [27] use multiple prototypes to represent a cluster. However, MPC requires precomputing a number of prototypes located in regions of high density and utilizes an agglomerative method to group these prototypes, whereas GMPCL relies on graph clustering to initialize coarse clusters and uses multiprototype competitive learning to refine the coarse clustering. In Table I, we summarize the notations that are used throughout the paper.

TABLE I NOTATIONS THAT ARE USED THROUGHOUT THE PAPER

III. GRAPH-BASED MULTIPROTOTYPE COMPETITIVE LEARNING

The proposed approach consists of two phases, namely, graph-based initial clustering (see Section III-A) and multiprototype competitive learning (see Section III-B).

A. Graph-Based Initial Clustering

Given a dataset D = {x_1, . . . , x_n} of n points in R^d, the first step of the graph-based algorithm is to construct a graph G_e = (V, A, e). The vertex set V contains one node for each sample in D. The affinity matrix A = [A_{ij}]_{n×n} is defined as

\[
A_{ij} = \begin{cases} \exp(-\|x_i - x_j\|^2), & \text{if } x_i \in N_k(x_j) \wedge x_j \in N_k(x_i) \\ 0, & \text{otherwise} \end{cases} \tag{1}
\]

where N_k(x_i) denotes the set consisting of the k nearest neighbors of x_i. The vertex energy vector e = [e_1, . . . , e_n]^T is defined as

\[
e_i = \log_2\!\left(1 + \frac{\sum_j A_{ij}}{\max_{l=1,\dots,n} \sum_j A_{lj}}\right), \quad i = 1, \dots, n. \tag{2}
\]

The component e_i ∈ [0, 1] is the vertex energy of x_i, which measures how "important" x_i is. In Fig. 1(a) and (b), we show a smile face dataset and plot its vertex energy, respectively. In [20], the density is measured simply by the number of points within the ε-neighborhood of a point. In contrast, the proposed vertex energy defined in (2) takes into account the correlations between all data points, which results in a global estimate of the vertex energy. Although both can be used to discover arbitrarily shaped clusters, the proposed vertex energy is more suitable for datasets that contain clusters of differing densities. A possible limitation is that, in an extremely unbalanced dataset, the presence of a dense cluster containing a large number of points will hinder the detection of smaller ones because of the global estimate of the vertex energy. This is a problem to be addressed in our future research. A subset S comprising the vertices of high energy is obtained [see Fig. 1(c)], which is termed the core-point set.

Definition 1: Given a graph G_e = (V, A, e) and a percentage ρ, the core-point set S is defined as S = {x_i | e_i ≥ ζ}, with ζ ∈ [0, 1], such that |S|/|V| = ρ.



Fig. 1. Smile face example. (a) Smile face dataset; different classes are plotted with different markers. (b) Vertex energy of the smile face; the color and size of each point are drawn according to its vertex energy. (c) Subset S comprising the vertices of high energy (core points). (d) k-NN graph of S with k = 21.

The core-point connectivity of any two core points p and q in S is defined as follows.

Definition 2 (Core-Point Connectivity): Two core points p and q in S are core-point-connected w.r.t. k (denoted as p ∼_k^S q) if there exists a chain of core points p_1, . . . , p_m with p_1 = p, p_m = q, such that p_{i+1} ∈ N_k(p_i) ∩ S and p_i ∈ N_k(p_{i+1}) ∩ S.

From the viewpoint of density-based clustering [18], [20], the core-point connectivity separates S into some natural subgroups, as shown in Fig. 1(d), which are defined as connected components as follows.

Definition 3: A set of c connected components {I_1, . . . , I_c} is obtained by separating the core-point set S w.r.t. k, such that ∀i ≠ j, I_i ∩ I_j = ∅, S = ∪_{i=1}^{c} I_i, and ∀p, q ∈ I_i, p ∼_k^S q, while ∀p ∈ I_i, ∀q ∈ I_j, i ≠ j, p ∼_k^S q does not hold.

The connected components {I_1, . . . , I_c} are taken as initial clusters, which will be further refined via multiprototype competitive learning. A sketch of this first phase is given below.
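To make the first phase concrete, the following is a minimal sketch of graph-based initial clustering, assuming NumPy/SciPy; the function name, the dense-matrix implementation, and the quantile-based threshold are our illustrative choices, not the authors' code.

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def initial_clustering(X, k=10, rho=0.5):
    """Phase one of GMPCL: affinity (1), vertex energy (2), core-point
    set (Definition 1), and connected components (Definitions 2 and 3)."""
    n = X.shape[0]
    d2 = cdist(X, X, 'sqeuclidean')
    # k nearest neighbors of each point (column 0 is the point itself).
    nn = np.argsort(d2, axis=1)[:, 1:k + 1]
    knn = np.zeros((n, n), dtype=bool)
    knn[np.repeat(np.arange(n), k), nn.ravel()] = True
    mutual = knn & knn.T                    # x_i in N_k(x_j) and x_j in N_k(x_i)
    A = np.where(mutual, np.exp(-d2), 0.0)  # affinity matrix, Eq. (1)
    deg = A.sum(axis=1)
    e = np.log2(1.0 + deg / deg.max())      # vertex energy, Eq. (2)
    # Core-point set S: roughly the top rho fraction of vertices by energy.
    zeta = np.quantile(e, 1.0 - rho)
    core = np.flatnonzero(e >= zeta)
    # Mutual k-NN links restricted to S give core-point connectivity;
    # its connected components are the initial clusters I_1, ..., I_c.
    sub = csr_matrix(mutual[np.ix_(core, core)])
    c, comp = connected_components(sub, directed=False)
    return core, comp, c
```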

B. Multiprototype Competitive Learning

The initial clusters {I_1, . . . , I_c} obtained in the first phase take into account only data points of high energy, and the remaining data points are not assigned cluster labels. Therefore, the output of the first phase is only a coarse clustering that requires further refinement. Rather than directly assigning the unlabeled data points to the core points as in [18], this section employs classical competitive learning to refine the initial clustering and assign cluster labels to all data points. Experimental results show that the proposed approach obtains at least a 9.8% improvement over direct assignment.

Since the dataset is nonlinearly separable, a nonlinear cluster with concave boundaries would always exist, which cannot be characterized by a single prototype that produces convex boundaries [27]. However, multiple prototypes produce subregions of the Voronoi diagram, which can approximately characterize one cluster of arbitrary shape. Therefore, we represent each cluster by multiple prototypes. Every point in I_j can be taken as one of the initial prototypes representing the jth cluster C_j. However, there is no need to use so many prototypes to represent one cluster, and some of them are more appropriate and more effective than others. These points should be as few as possible, to lower the computational complexity of multiprototype competitive learning, while being scattered over the whole space of the initial cluster, in order to suitably characterize the corresponding cluster. AP [11] can generate suitable prototypes to represent an input dataset without preselecting the number of prototypes. In our experiments, the representative points are obtained by applying AP to each I_j. The similarity s(x_i, x_{i'}) between x_i, x_{i'} ∈ I_j is set to −‖x_i − x_{i'}‖^2 and the preferences are set to the median of the similarities, which outputs p_j suitable multiprototypes w_j^1, . . . , w_j^{p_j}. This way, we obtain an initial multiprototype set

\[
W = \{\underbrace{w_1^1, \dots, w_1^{p_1}}_{\text{represent } C_1},\; \underbrace{w_2^1, \dots, w_2^{p_2}}_{\text{represent } C_2},\; \dots,\; \underbrace{w_c^1, \dots, w_c^{p_c}}_{\text{represent } C_c}\}. \tag{3}
\]

Throughout the paper, we use the index notation ω_j^q to denote the multiprototype w_j^q. That is, referring to the ω_j^q-th multiprototype is equivalent to mentioning w_j^q, and ω = {ω_1^1, . . . , ω_1^{p_1}, ω_2^1, . . . , ω_2^{p_2}, . . . , ω_c^1, . . . , ω_c^{p_c}}. After the initial multiprototype set W is obtained, classical competitive learning is performed to iteratively update the multiprototypes such that the multiprototype objective function is minimized:

\[
J(W) = \sum_{i=1}^{n} \|x_i - w_{\nu_i}^{\upsilon_i}\|^2 \tag{4}
\]

where w_{ν_i}^{υ_i} satisfies ω_{ν_i}^{υ_i} = arg min_{ω_j^q ∈ ω} ‖x_i − w_j^q‖^2, i.e., the winning multiprototype of x_i is the one nearest to x_i. For each randomly taken x_i, the winning multiprototype ω_{ν_i}^{υ_i} is selected via the winner selection rule

\[
\omega_{\nu_i}^{\upsilon_i} = \arg\min_{\omega_j^q \in \omega} \|x_i - w_j^q\|^2 \tag{5}
\]

and is updated by the winner update rule

\[
w_{\nu_i}^{\upsilon_i} \leftarrow w_{\nu_i}^{\upsilon_i} + \eta_t (x_i - w_{\nu_i}^{\upsilon_i}) \tag{6}
\]

with learning rates {η_t} satisfying [28]: lim_{t→∞} η_t = 0, Σ_{t=1}^{∞} η_t = ∞, and Σ_{t=1}^{∞} η_t^2 < ∞. In practice, η_t = const/t, where "const" is a small constant, e.g., 0.5. In Fig. 2(a), we illustrate the procedure of updating the winning multiprototype. The converged multiprototype set W and the corresponding Voronoi diagram are shown in Fig. 2(b). The multiprototypes that represent different clusters are plotted with different markers. The piecewise linear separator consists of the hyperplanes shared by two subregions that are induced by multiprototypes representing different clusters.
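Here is a minimal sketch of this competitive-learning phase, assuming NumPy; the stacked-array representation of W and the name `owner` (mapping each prototype to its cluster) are ours, not the authors' notation.

```python
import numpy as np

def sq_dists(X, W):
    # Squared Euclidean distances between all points and all prototypes.
    return ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)

def competitive_learning(X, W, owner, const=0.5, eps=1e-4, t_max=20, seed=0):
    rng = np.random.default_rng(seed)
    W = W.copy()
    for t in range(1, t_max + 1):
        eta = const / t                      # learning rate eta_t = const/t
        W_old = W.copy()
        for i in rng.permutation(len(X)):
            win = np.argmin(((X[i] - W) ** 2).sum(axis=1))  # rule (5)
            W[win] += eta * (X[i] - W[win])                 # rule (6)
        if ((W - W_old) ** 2).sum() <= eps:  # total prototype movement
            break
    # Each point inherits the cluster of its winning multiprototype.
    labels = owner[np.argmin(sq_dists(X, W), axis=1)]
    return W, labels
```

Assigning each point the cluster of its winning multiprototype is what realizes the piecewise linear separator of Fig. 2.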



Fig. 2. Winner update and the clustering result of smile face. (a) Procedure of updating the winning multiprototype, during which both the winning multiprototype and the corresponding lines in the Voronoi diagram move slightly: red “×” mark and dashed lines represent before update, while green “+” mark and lines represent after update (updating the winner). (b) Converged multiprototypes and the corresponding Voronoi diagram: the multiprototypes that represent different clusters are plotted in different markers, and the piecewise linear separator is plotted in red (converged multiprototypes). (c) Final clusters separated by the piecewise linear separator (final partitioning).

This piecewise linear separator is used to identify nonlinearly separable clusters, as shown in Fig. 2(c). In Algorithm 1, we summarize the proposed GMPCL method.

IV. FAST GRAPH-BASED MULTIPROTOTYPE COMPETITIVE LEARNING

High-dimensional clustering applications, such as video clustering, are characterized by a high computational load, which is mainly due to the redundant calculation of the distances between high-dimensional points in the update procedure of competitive learning. To overcome this problem, an approach similar to the kernel trick [22] is considered. First, an inner-product matrix M = [M_{i,j}]_{n×n} of the dataset D is computed, such that M_{i,j} = ⟨x_i, x_j⟩. Then, the computation of ‖x_i − x_j‖^2 is efficiently accomplished by ‖x_i − x_j‖^2 = M_{i,i} + M_{j,j} − 2M_{i,j}. Thus, the redundant high-dimensional computation is avoided. Unfortunately, this trick cannot be directly applied in competitive learning because of the incremental update rule. Since the winning multiprototype w_{ν_i}^{υ_i} is updated by w_{ν_i}^{υ_i} ← w_{ν_i}^{υ_i} + η_t(x_i − w_{ν_i}^{υ_i}), it is unlikely that the updated w_{ν_i}^{υ_i} satisfies w_{ν_i}^{υ_i} ∈ D, so no precomputed distance ‖x_i − w_{ν_i}^{υ_i}‖^2 is available for calculating (5). In our previous work [23], a prototype descriptor W^ψ was designed to represent c prototypes {μ_1, . . . , μ_c} in the kernel space induced by a mapping ψ. The prototype descriptor W^ψ is a c × (n + 1) matrix, whose rows represent prototypes via the inner products between a prototype and the data points, as well as the squared length of the prototype:

\[
W^\psi = \begin{pmatrix}
\langle \mu_1, \psi(x_1) \rangle & \dots & \langle \mu_1, \psi(x_n) \rangle & \langle \mu_1, \mu_1 \rangle \\
\langle \mu_2, \psi(x_1) \rangle & \dots & \langle \mu_2, \psi(x_n) \rangle & \langle \mu_2, \mu_2 \rangle \\
\vdots & \ddots & \vdots & \vdots \\
\langle \mu_c, \psi(x_1) \rangle & \dots & \langle \mu_c, \psi(x_n) \rangle & \langle \mu_c, \mu_c \rangle
\end{pmatrix}. \tag{7}
\]

Competitive learning in the kernel space then becomes a process of updating W^ψ. In this section, inspired by our previous work [23] and [24], we develop a multiprototype descriptor, which is a row-block matrix that is independent of the dimensionality, and extend GMPCL to deal with high-dimensional clustering.

A. Inner-Product-Based Computation

According to the initialization of multiprototypes, the initial W satisfies W ⊂ D. The multiprototype descriptor is defined as follows.

Definition 4 (Multiprototype Descriptor): A multiprototype descriptor is a row-block matrix W of size |W| × (n + 1)

\[
W = \begin{pmatrix} W_1 \\ \vdots \\ W_c \end{pmatrix} \tag{8}
\]

such that the jth block W_j represents C_j, and the qth row of W_j, i.e., W_{j,:}^q, represents w_j^q by

\[
W_{j,i}^q = \langle w_j^q, x_i \rangle, \quad i = 1, \dots, n, \qquad W_{j,n+1}^q = \langle w_j^q, w_j^q \rangle \tag{9}
\]



where W_{j,i}^q denotes the ith entry of W_{j,:}^q, i.e.,

\[
W = \begin{pmatrix}
\langle w_1^1, x_1 \rangle & \langle w_1^1, x_2 \rangle & \dots & \langle w_1^1, x_n \rangle & \langle w_1^1, w_1^1 \rangle \\
\langle w_1^2, x_1 \rangle & \langle w_1^2, x_2 \rangle & \dots & \langle w_1^2, x_n \rangle & \langle w_1^2, w_1^2 \rangle \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
\langle w_1^{p_1}, x_1 \rangle & \langle w_1^{p_1}, x_2 \rangle & \dots & \langle w_1^{p_1}, x_n \rangle & \langle w_1^{p_1}, w_1^{p_1} \rangle \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
\langle w_c^1, x_1 \rangle & \langle w_c^1, x_2 \rangle & \dots & \langle w_c^1, x_n \rangle & \langle w_c^1, w_c^1 \rangle \\
\langle w_c^2, x_1 \rangle & \langle w_c^2, x_2 \rangle & \dots & \langle w_c^2, x_n \rangle & \langle w_c^2, w_c^2 \rangle \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
\langle w_c^{p_c}, x_1 \rangle & \langle w_c^{p_c}, x_2 \rangle & \dots & \langle w_c^{p_c}, x_n \rangle & \langle w_c^{p_c}, w_c^{p_c} \rangle
\end{pmatrix}.
\]

Using the ω-notation, the ω_j^q-th row, i.e., W_{j,:}^q, represents the ω_j^q-th multiprototype, i.e., w_j^q. The initial multiprototype descriptor W is obtained as a submatrix of M. In Algorithm 1, three key procedures of multiprototype competitive learning involve the redundant computation of distances: the winning multiprototype selection, the winner update, and the computation of the sum of prototype updates, i.e., L. Based on the multiprototype descriptor, we implement these procedures with computational complexity independent of the dimensionality. For detailed proofs of the following theorems and lemmas, readers can see the Appendix.

Theorem 1 (Winner selection rule): The selection of the winning multiprototype ω_{ν_i}^{υ_i} of x_i can be realized by

\[
\omega_{\nu_i}^{\upsilon_i} = \arg\min_{\omega_j^q \in \omega} \left( W_{j,n+1}^q - 2 W_{j,i}^q \right). \tag{10}
\]

Theorem 2 (Winner update rule): The update of the winning multiprototype ω_{ν_i}^{υ_i} of x_i can be realized by

\[
W_{\nu_i,j}^{\upsilon_i} \leftarrow
\begin{cases}
(1-\eta_t)\, W_{\nu_i,j}^{\upsilon_i} + \eta_t M_{i,j}, & \text{if } j = 1, \dots, n \\
(1-\eta_t)^2\, W_{\nu_i,j}^{\upsilon_i} + \eta_t^2 M_{i,i} + 2(1-\eta_t)\eta_t W_{\nu_i,i}^{\upsilon_i}, & \text{if } j = n+1.
\end{cases} \tag{11}
\]

Similar to [23], in one iteration of competitive learning, each data point x_i is assigned to exactly one multiprototype. Let the index array π_j^q = [π_j^q(1), π_j^q(2), . . . , π_j^q(m_j^q)] store the indices of the m_j^q ordered data points that are assigned to the ω_j^q-th multiprototype in one iteration. For instance, if x_1, x_32, x_8, x_20, and x_15 are five ordered data points assigned to the ω_3^2-th multiprototype in the tth iteration, then the index array π_3^2 = [π_3^2(1), π_3^2(2), . . . , π_3^2(m_3^2)] = [1, 32, 8, 20, 15], with π_3^2(1) = 1, π_3^2(2) = 32, π_3^2(3) = 8, π_3^2(4) = 20, π_3^2(5) = 15, and m_3^2 = 5. The following lemma formulates the cumulative update of the ω_j^q-th multiprototype based on the index array π_j^q.

Lemma 1: In the tth iteration, the relation between the updated multiprototype w_j^q and the old ŵ_j^q is

\[
w_j^q = (1-\eta_t)^{m_j^q}\, \hat{w}_j^q + \eta_t \sum_{l=1}^{m_j^q} (1-\eta_t)^{m_j^q - l}\, x_{\pi_j^q(l)}. \tag{12}
\]

Theorem 3 (Iteration stopping criteria): The iteration stopping criteria of multiprototype competitive learning can be realized by L ≤ ε or t ≥ t_max, where L is computed as

\[
L = \sum_{\omega_j^q \in \omega} \left(1 - \frac{1}{(1-\eta_t)^{m_j^q}}\right)^{2} W_{j,n+1}^q
+ \eta_t^2 \sum_{\omega_j^q \in \omega} \sum_{h=1}^{m_j^q} \sum_{l=1}^{m_j^q} \frac{M_{\pi_j^q(h),\, \pi_j^q(l)}}{(1-\eta_t)^{h+l}}
+ 2\eta_t \sum_{\omega_j^q \in \omega} \left(1 - \frac{1}{(1-\eta_t)^{m_j^q}}\right) \sum_{l=1}^{m_j^q} \frac{W_{j,\pi_j^q(l)}^q}{(1-\eta_t)^{l}}. \tag{13}
\]

B. Fast Graph-Based Multiprototype Competitive Learning in High Dimension

Based on the multiprototype descriptor W and the aforementioned theorems, we propose FGMPCL for high-dimensional clustering, which is summarized in Algorithm 2. According to the three theorems, it is easy to prove that FGMPCL generates the same clustering as GMPCL. The analysis of the asymptotic computational complexity reveals that, in high-dimensional clustering, i.e., when d ≫ n, FGMPCL saves O(t_max · n(d|W| − n)) computations. First, we should note that one computation of the distances or the inner products between all data points is unavoidable, which takes O(n^2 d) operations. Our goal here is to eliminate the redundant computation occurring in the procedure of the multiprototype update. The initialization of the multiprototype descriptor W only takes O(|W|(n + 1)) operations (taking |W|(n + 1) entries from the matrix M). For the tth iteration, the initialization takes O(|W| + 1 + n) operations (O(|W|) for initializing |W| empty index arrays, O(1) for increasing t = t + 1, and O(n) for randomly permuting the dataset). There are n data points, and each


takes O(|W|) operations to select a winner by (10) and O(n + 1) operations to update the winner by (11). Thus, O(n(n + 1 + |W|)) operations are needed for the update procedure in one iteration. The computation of (13) takes O(|W| + n^2 + n) operations (the first term is O(|W|), the second term is O(n^2), and the third term is O(n)). Therefore, in one iteration, the total computational complexity is O(|W| + 1 + n) + O(n(n + 1 + |W|)) + O(|W| + n^2 + n) = O(n^2), since |W| < n. Assuming that the iteration number reaches the maximum t_max, the computational complexity of the iteration procedure is O(t_max · n^2). Therefore, in FGMPCL, the computational complexity of multiprototype competitive learning is O(|W|(n + 1)) + O(t_max · n^2) = O(t_max · n^2). In GMPCL, by contrast, the computational complexity of multiprototype competitive learning is O(t_max · nd|W|). In high-dimensional clustering, where d ≫ n, it is easy to see that O(t_max · nd|W|) ≫ O(t_max · n^2 |W|) > O(t_max · n^2). The fast version thus saves O(t_max · nd|W|) − O(t_max · n^2) = O(t_max · n(d|W| − n)) computations when d ≫ n. In practice, we suggest choosing between GMPCL and FGMPCL before performing clustering based on the relation between n and d: if d ≫ n, FGMPCL is preferable; otherwise, use GMPCL.

Fig. 3. Face dataset; different classes are plotted with different markers.

TABLE II DATASETS THAT ARE USED IN THE COMPARATIVE EXPERIMENTS

V. EXPERIMENTAL RESULTS AND APPLICATIONS IN COMPUTER VISION

In this section, we first present experimental results comparing the proposed GMPCL approach with seven algorithms from the literature over six datasets (two synthetic and four real). The comparison shows that GMPCL outperforms or is comparable to the state-of-the-art clustering algorithms in terms of accurately identifying nonlinearly separable clusters. We then apply GMPCL/FGMPCL to two computer-vision tasks: automatic color image segmentation and video clustering. The experimental comparison shows that GMPCL/FGMPCL provides an effective tool with application to computer vision. All the experiments are implemented in MATLAB 7.8.0.347 (R2009a) 64-bit edition on a workstation (Windows 64 bit, 8 Intel 2-GHz processors, 16 GB of RAM).

A. Experimental Results 1) Datasets: The two synthetic datasets that we used are the smile face dataset [644 points, 4 classes of different sizes, nonlinearly separable, 2 attributes, as shown in Fig. 1(a)] and the face dataset used in [29] (266 points, 3 classes of different sizes, nonlinearly separable, 2 attributes, as shown in Fig. 3). The four real-world datasets that are used are the synthetic control chart time series (sccts) dataset (four statistical features of each series are used as the feature vector, i.e., the mean, the standard deviation, the skewness, and the kurtosis), the glass identification dataset (glass), the pen-based recognition of handwritten digit dataset (pendigits), and the multiple features dataset (mfeat), all from the UCI repository [30]. In Table II, we summarize the properties of the six datasets.

2) Methods and Settings: The algorithms used in the experimental comparison are as follows.

1) Rival penalized competitive learning (RPCL) [10]: The initial cluster number was preselected to be larger than the actual one. The learning rate and the delearning rate were set to 0.05 and 0.0002, respectively, as suggested in [10].

2) Kernel k-means (kkmeans) [22]: The code was obtained from the Mathworks repository,1 and the Gaussian kernel κ(a_i, a_j) = exp(−‖a_i − a_j‖^2 / (2α^2)) was used, where α was determined by trying multiple values and selecting the value that obtained the lowest kernel k-means objective. The initial seeds were randomly initialized, as is common in k-means-like algorithms.

3) Spectral clustering based on normalized cut (Ncut) [19]: We used the code obtained from J. Shi2 and followed the authors' suggestion to construct the graph by computing the weight matrix from (D/σ) squared, with scale σ = 0.05 max(D), where D is the pairwise Euclidean distance matrix.

4) Graclus [31]: A fast graph clustering software provided by I. Dhillon.3 The default settings suggested by the authors were used. The graph was constructed in the same way as in Ncut. The ratio association

1 www.mathworks.com/matlabcentral/fileexchange/26182-kernel-k-means
2 www.cis.upenn.edu/~jshi
3 www.cs.utexas.edu/~inderjit/software.shtml



TABLE III ESTIMATED CLUSTER NUMBER c AS A FUNCTION OF k IN k-NN GRAPH ON THE SIX DATASETS

was used, and spectral clustering was used at the base clustering phase.

5) Affinity propagation (AP) [11]: We used the code obtained from B. J. Frey.4 The similarity s(x_i, x_j) between two points x_i, x_j was set to −‖x_i − x_j‖^2, and the preferences were set to the median of the similarities, as suggested by the authors.

6) Multiprototype clustering (MPC): The multiprototype clustering proposed in [27] was performed and compared.

7) Spectral curvature clustering (SCC) [29]: We used the code obtained from G. Chen.5

4 www.psi.toronto.edu/
5 www.math.duke.edu/~glchen/scc.html

For kkmeans, Ncut, Graclus, MPC, and SCC, the best clustering results are reported, where the actual cluster number was provided as an input parameter. The parameters for GMPCL were set as follows: ρ = 0.5, η_t = 0.5/t, ε = 0.0001, and t_max = 20 (which is suitable for ε = 0.0001).

3) Clustering Evaluation: Since the underlying class labels of all datasets are known, we evaluated clustering results via external criteria. Although there exist many external clustering evaluation measures [32]–[34], as pointed out in [34], mutual information provides a sound indication of the shared information between a pair of clusterings. Normalized mutual information (NMI) [34] is one of the most widely used measures of clustering quality. Given a dataset D of size n, the clustering labels β of c clusters, and the actual class labels θ of ĉ classes, we build a confusion matrix, whose entry (i, j) gives the number n_i^{(j)} of points in cluster i and class j. Then, NMI can be computed from the confusion matrix [34]:

\[
\mathrm{NMI} = \frac{2 \sum_{l=1}^{c} \sum_{h=1}^{\hat c} \frac{n_l^{(h)}}{n} \log \frac{n_l^{(h)}\, n}{n_l\, n^{(h)}}}{H(\beta) + H(\theta)} \tag{14}
\]

where H(β) = −Σ_{i=1}^{c} (n_i/n) log(n_i/n) and H(θ) = −Σ_{j=1}^{ĉ} (n^{(j)}/n) log(n^{(j)}/n) are the Shannon entropies of the cluster labels β and the class labels θ, respectively, with n_i and n^{(j)} denoting the number of points in cluster i and in class j. A high NMI value indicates that the clustering and the underlying class labels match well. See [34] for further details.
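For reference, the following is a minimal sketch of (14) from two integer label arrays, assuming NumPy; the helper is ours, not the authors' code.

```python
import numpy as np

def nmi(beta, theta):
    """Normalized mutual information, Eq. (14)."""
    n = len(beta)
    _, bi = np.unique(beta, return_inverse=True)
    _, ti = np.unique(theta, return_inverse=True)
    conf = np.zeros((bi.max() + 1, ti.max() + 1))
    np.add.at(conf, (bi, ti), 1.0)           # confusion matrix n_l^{(h)}
    p = conf / n                             # joint distribution
    pl = p.sum(axis=1)                       # cluster marginals n_l / n
    ph = p.sum(axis=0)                       # class marginals n^{(h)} / n
    mask = p > 0
    mi = (p[mask] * np.log(p[mask] / np.outer(pl, ph)[mask])).sum()
    h_beta = -(pl[pl > 0] * np.log(pl[pl > 0])).sum()
    h_theta = -(ph[ph > 0] * np.log(ph[ph > 0])).sum()
    return 2.0 * mi / (h_beta + h_theta)
```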

4) Determining the Parameters k and ρ: The determination of k is based on the following observation. Let ξ be the distance of a point p to its kth nearest neighbor. It is unlikely that several points have the same distance ξ from p. Additionally, changing k for a point in a cluster does not

result in large changes of ξ, unless the kth nearest neighbors of p for k = 1, 2, 3, . . . are located approximately on a straight line, which is not true in general [20]. When k increases from a small integer to a larger one, the initial cluster number decreases from a large integer to a smaller one, and along the way some of the initial cluster numbers may repeat consecutively. The number c that repeats itself the most times is a rational choice for the underlying cluster number. Thus, a simple yet effective approach for determining an appropriate k is to run graph construction and clustering initialization with different k ranging from 2 to some large integer (e.g., 0.2 × n, since k will not be too large) and to set k to the median of the set of integers that generate the cluster number appearing consecutively the most times. This most consecutively repeated number is taken as the number of clusters (a sketch of this heuristic is given at the end of this discussion).

In AP [11], the shared value (i.e., the preference of each point to be a prototype) controls the number of clusters. Similarly, the percentage ρ of the points that constitute the subset S could be small (i.e., only a small number of almost isolated points of quite high energy are taken as core points, resulting in a large number of tiny clusters), moderate (i.e., marginal points are appropriately removed such that the core points that belong to the same underlying cluster are connected, while those that belong to different underlying clusters are disconnected, resulting in a moderate and natural number of clusters), or large (i.e., a large number of almost connected points are taken as core points, resulting in a small number of huge clusters).

In Table III, we illustrate the estimated number of clusters for different choices of k. For the smile face dataset, when k is selected from 15 to 26, the correct number of clusters, i.e., 4, is obtained. For the other five datasets, the values of k that generate the actual number of clusters are as follows: 9–25 on face, 10–24 on sccts, 5–9 on glass, 7–17 on pendigits, and 6–11 on mfeat. The results show that, on the six datasets, the metaparameter k can be effectively chosen by the procedure discussed earlier, yielding a correct estimate of the actual cluster number. In Fig. 4, we plot the means and variances of NMI values (over ten experiments) as a function of k on the six datasets. Different values of k can generate different clustering results, because different values of k generate different vertex energies, which result in different core-point sets and different core-point connectivities. Although the same number of connected components may be obtained, the partitioning of the core-point set into connected components may differ. Empirically, it can be seen that the best clustering results are obtained when k is selected around the median of the integer set that generates the cluster number appearing consecutively the most times, which confirms the choice of k as discussed earlier.
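A minimal sketch of this k-selection heuristic follows, assuming the initial_clustering routine sketched in Section III; the scan range and run-finding code are our illustrative choices.

```python
import numpy as np

def estimate_k(X, rho=0.5, k_max=None):
    n = len(X)
    k_max = k_max if k_max is not None else max(3, int(0.2 * n))
    # Initial cluster number c for each k (initial_clustering returns c last).
    counts = [initial_clustering(X, k=k, rho=rho)[2]
              for k in range(2, k_max + 1)]
    # Longest run of a consecutively repeated cluster number.
    best_len, best_start, start = 0, 0, 0
    for i in range(1, len(counts) + 1):
        if i == len(counts) or counts[i] != counts[start]:
            if i - start > best_len:
                best_len, best_start = i - start, start
            start = i
    ks = list(range(2 + best_start, 2 + best_start + best_len))
    return int(np.median(ks)), counts[best_start]  # chosen k, estimated c
```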



Fig. 4. Means and variances of NMI values (over ten experiments) as a function of k on the six datasets. For comparison, the means and variances of NMI values (over ten experiments) of the Graclus algorithm are plotted as the baseline. (a) Smile face. (b) Face. (c) Sccts. (d) Glass. (e) Pendigits. (f) Mfeat.

TABLE IV MEANS AND VARIANCES OF NMI VALUES (OVER 100 EXPERIMENTS) THAT ARE OBTAINED BY THE EIGHT ALGORITHMS ON THE SIX DATASETS

Fig. 5. NMI value as a function of ρ on the six datasets. Almost the best clustering results are obtained when ρ is set to 0.5 on all datasets except sccts.

We investigated the performance of GMPCL when selecting different percentages ρ of core points. The NMI value as a function of ρ on the six datasets is shown in Fig. 5, with k automatically estimated as discussed earlier. The core-point set is empty when ρ = 0, and no cluster is generated; we denote the corresponding NMI as 0. From the figure, when ρ is selected from 0.4 to 0.6, the results of GMPCL are less

sensitive to ρ, and almost the best clustering results are obtained when ρ is set to 0.5 on all datasets except sccts. According to this empirical study, we suggest ρ = 0.5, which implies ζ = median(e_1, . . . , e_n).

5) Comparative Results and Discussion: In Table IV, we list the means and variances of the NMI values obtained by the eight algorithms on the six datasets. On all datasets except mfeat, GMPCL obtained the highest NMI among the compared algorithms. On the challenging sccts dataset, only Graclus and GMPCL obtained an NMI higher than 0.79, with GMPCL slightly better than Graclus. On the difficult glass dataset, GMPCL was comparable to the second winner, Graclus, achieving a 0.5% improvement, and obtained a 28.5% improvement when compared with SCC. On the large pendigits dataset of size 10 992, GMPCL obtained an NMI of 0.809, which is 0.086 higher than the second winner. As a guideline, when there is no prior knowledge of the cluster number but it is possible to estimate the cluster number by the proposed technique,6 GMPCL should be used in place of Graclus.

It should be noted that RPCL and AP obtained much lower NMI than the nonlinear clustering approaches. The main reason is that the linear-separability assumption does not always hold, but the linear clustering methods, such as RPCL and AP, strongly rely on this assumption.

6 That is, when performing the graph-based initial clustering of GMPCL with different k, the initial cluster numbers follow a similar pattern as shown in Table III.



Fig. 6. Some of the image-segmentation results by GMPCL. The first row displays the original images and the second row displays the segmentation results. Each segment (cluster) is painted with its mean color. TABLE V AVERAGE NMI VALUES BY GMPCL AND GCDA ON THE SIX DATASETS

In addition, the proposed GMPCL generates a higher NMI than its nonlinear clustering counterparts, namely, kkmeans, Ncut, Graclus, MPC, and SCC.

We also compared the performance of GMPCL with and without the second phase. That is, after the initial clusters {I_1, . . . , I_c} are obtained, we use two different approaches to assign cluster labels to the remaining data points: one is multiprototype competitive learning, and the other is to directly assign the remaining data points the labels of their nearest core points. We denote the algorithm with the second assignment approach by graph clustering with direct assignment (GCDA). In Table V, we compare the average NMI values of GMPCL and GCDA on the six datasets. Since each run of GCDA achieves the same result, the variances are omitted in this table. By comparison, we can see that, except on the face dataset, GMPCL achieves at least a 9.8% improvement over GCDA. The reason why GCDA obtains an NMI of 1 on face is that the face dataset contains three easily separable classes, as shown in Fig. 3.

B. Automatic Color Image Segmentation

In this section, we apply the proposed GMPCL to automatic color image segmentation. Our intention here is to demonstrate that GMPCL has the capability of finding visually appealing structures in real color images. The experiment was performed on images from the Berkeley segmentation dataset (BSDS)7 [35]. BSDS contains 300 images of a wide variety of natural scenes, as well as "ground truth" segmentations produced by humans [36], aiming at providing an empirical basis for research on image segmentation and boundary detection. The size of each image is either 480 × 320 or 320 × 480, which is too large for directly computing 153 600 pixel feature vectors. Therefore, we resized the images by a factor of 0.4 into either 192 × 129 or 129 × 192. We used the 3-D vector of color features of each pixel as the feature vector to segment a color image. Since the L*a*b* color space is designed to approximate human vision and is suitable for interpreting the real world [37], the coordinates in the L*a*b* color space were used as the features. Thus, for each image, we obtained a dataset D = {x_i ∈ R^3 : i = 1, . . . , 24 768}. Before applying clustering, smoothing was performed using a 3 × 3 averaging filter to avoid the oversegmentation caused by local color variation. A sketch of this feature preparation follows.

7 www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/segbench/
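As a rough illustration of this preparation, the following sketch, assuming scikit-image and SciPy (our choice of libraries, not the authors' MATLAB code), produces the per-pixel L*a*b* features:

```python
import numpy as np
from scipy.ndimage import uniform_filter
from skimage.color import rgb2lab
from skimage.transform import rescale

def image_features(rgb):
    # Resize by 0.4 (channel_axis requires scikit-image >= 0.19).
    small = rescale(rgb, 0.4, channel_axis=-1, anti_aliasing=True)
    lab = rgb2lab(small)                        # L*a*b* coordinates
    # 3 x 3 averaging filter per channel to suppress local color variation.
    smooth = uniform_filter(lab, size=(3, 3, 1))
    return smooth.reshape(-1, 3)                # one 3-D feature per pixel
```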

Besides NMI, two additional indices for the objective evaluation of image segmentation were used: the probabilistic Rand index (PRI) and the normalized probabilistic Rand index (NPRI) [38], which are two widely accepted measures for quantitative comparison between image-segmentation algorithms using "ground truth" segmentations. Let {S_l} denote a set of F manual segmentations of an image, and let S_test be the segmentation to be compared with the manually labeled set. The PRI of S_test is defined as [38]

\[
\mathrm{PRI}(S_{\text{test}}, \{S_l\}) = \frac{1}{\binom{n}{2}} \sum_{i<j} \left( p_{ij}^{c_{ij}} (1 - p_{ij})^{1 - c_{ij}} \right) \tag{15}
\]

where c_{ij} denotes the event of a pair of pixels i and j having the same label in the test image S_test, and p_{ij} is the probability that the pair of pixels i and j has the same label. They are computed, respectively, as c_{ij} = I(g_i^{S_{test}} = g_j^{S_{test}}) and p_{ij} = (1/F) Σ_{l=1}^{F} I(g_i^{S_l} = g_j^{S_l}), where g_i^{S_{test}} and g_i^{S_l} are the region labels of pixel i in S_test and S_l, respectively, and the binary operator I(x) returns 1 if x is true and 0 otherwise. To compute NPRI, we need to normalize PRI w.r.t. its baseline as [38]

\[
\mathrm{NPRI} = \frac{\mathrm{PRI} - \text{Expected PRI}}{\text{Maximum PRI} - \text{Expected PRI}}. \tag{16}
\]

The Maximum PRI is set to 1, and the Expected PRI is computed based on the whole image dataset:

\[
\text{Expected PRI} = \frac{1}{\binom{n}{2}} \sum_{i<j} \left( p'_{ij}\, p_{ij} + (1 - p'_{ij})(1 - p_{ij}) \right) \tag{17}
\]

with p'_{ij} = (1/Φ) Σ_{φ=1}^{Φ} (1/F_φ) Σ_{l=1}^{F_φ} I(g_i^{S_l^φ} = g_j^{S_l^φ}), where Φ is the number of images in the dataset (i.e., 300 in BSDS) and F_φ is the number of "ground truth" segmentations of image φ. Higher PRI and NPRI values indicate better segmentations. See [38] for further details.

In Fig. 6, we display some of the segmentation results obtained by GMPCL, without any further postprocessing.
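For concreteness, here is a minimal sketch of the PRI computation in (15), assuming NumPy; label images are flattened integer arrays, and the quadratic pair enumeration means pixels should be subsampled for full-size images (our simplification).

```python
import numpy as np

def pri(test, manual):
    """Probabilistic Rand index, Eq. (15), of one test segmentation
    against a list of manual segmentations (all flat label arrays)."""
    n = len(test)
    iu, ju = np.triu_indices(n, k=1)               # all pixel pairs i < j
    c = test[iu] == test[ju]                       # c_ij in the test labels
    p = np.mean([s[iu] == s[ju] for s in manual], axis=0)  # p_ij
    # Bernoulli likelihood p^c (1-p)^(1-c), averaged over the C(n,2) pairs.
    return float(np.mean(np.where(c, p, 1.0 - p)))
```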



TABLE VI MEANS AND VARIANCES OF PRI, NPRI AND NMI, AND THE AVERAGE COMPUTATIONAL TIME IN SECONDS ON IMAGES FROM BSDS†

Fig. 7. Clustering frames of Sequence 1 into groups of scenes using FGMPCL. "Fade in" and "fade out" occur at the boundaries of different scenes.

In Table VI, we list the means and variances of PRI, NPRI, and NMI, together with the average computational time in seconds, on the 300 images from BSDS. The compared methods include RPCL, kkmeans, Ncut, Graclus, and gPb-owt-ucm. The gPb-owt-ucm algorithm [39] is a two-step image-segmentation method that constructs a hierarchy of regions from the contours detected by [40]; the code was obtained from M. Maire.8 The results show that, although GMPCL is not the best of all compared methods, it ranks second and is comparable to the best algorithm. Note that gPb-owt-ucm is coupled to the high-performance contour detector proposed in [40], whereas the other algorithms rely only on color information to perform image segmentation.

8 www.vision.caltech.edu/~mmaire/grouping.zip

TABLE VII SUMMARY OF THE VIDEO SEQUENCES THAT ARE USED IN VIDEO CLUSTERING

C. Video Clustering

In this section, we report experimental results for the video clustering task. Video clustering aims at clustering video frames according to different scenes. It plays an important role, as a preprocessing step, in automatic video summarization/abstraction [41]. Since our intention here is to show that FGMPCL provides an effective and efficient tool for video clustering, using domain-specific cues [41] is beyond the scope of this paper. The gray-scale values of the raw pixels were used as the feature vector of each frame. Three video sequences were used: the "NASA 25th Anniversary Show, Segment 01" video sequence [42] (Sequence 1), the "Interview with Anne Klinefelter and Deborah Gerhardt" video clip [42] (Sequence 2), and the "Win a Date With Tad Hamilton" video clip (Sequence 3). Different kinds of transition techniques occur at the boundaries of different scenes. In Table VII, we summarize the properties of the three video sequences.

In Fig. 7, we show the segmentation result for Sequence 1 by FGMPCL. There are eight scenes, with "fade in" and "fade

Fig. 8. Clustering frames of Sequence 2 into groups of scenes using FGMPCL. The camera is switched smoothly among the three scenes.

Fig. 9. Clustering frames of Sequence 3 into groups of scenes using FGMPCL. “Cut” occurs at the boundaries of different scenes.

out” occurring at the boundaries of different scenes. In Fig. 8, we show the segmentation result for Sequence 2 by FGMPCL. Since there are only two interviewees, Klinefelter and Gerhardt, the sequence can be divided into three scenes: Gerhardt talking, Klinefelter and Gerhardt discussing, and Klinefelter talking, with the camera switched smoothly among the three scenes. In Fig. 9, we show the segmentation result for Sequence 3 by FGMPCL. It contains three scenes with “Cut” occurring at the boundaries of different scenes.



TABLE VIII MEANS AND VARIANCES OF NMI VALUES AND THE AVERAGE COMPUTATIONAL TIME IN SECONDS ON THE THREE VIDEO SEQUENCES (OVER TEN EXPERIMENTS) BY THE SIX ALGORITHMS†

For comparison, the "ground truth" segmentation of each video sequence was manually obtained, according to which the NMI values were computed to compare FGMPCL with five algorithms: RPCL, kkmeans, Ncut, Graclus, and AP. In Table VIII, we list the means and variances of the NMI values and the average computational time in seconds on the three video sequences (over ten experiments). The average NMI values reveal that FGMPCL generates the best segmentation among the compared methods. In particular, on Sequence 3, an NMI of 1 is achieved by FGMPCL. From the viewpoint of computational time, FGMPCL is comparable with the fastest methods, i.e., Graclus and AP, while much faster than Ncut and kkmeans.

VI. CONCLUSION

In this paper, a graph-based multiprototype competitive learning (GMPCL) algorithm for partitioning nonlinearly separable datasets has been presented. The proposed algorithm exploits a graph-based approach to produce an initial, coarse clustering, and multiprototype competitive learning is then introduced to refine the clustering and identify clusters of arbitrary shape. In order to cluster high-dimensional datasets, fast graph-based multiprototype competitive learning (FGMPCL) has further been proposed, which performs several orders of magnitude faster than GMPCL when d ≫ n. An experimental comparison has been performed on both synthetic and real-world datasets to validate the effectiveness of the proposed method. Additionally, we have applied GMPCL/FGMPCL to two computer-vision tasks, including automatic color image segmentation and video clustering, which shows that the proposed GMPCL/FGMPCL provide an efficient and effective tool for computer-vision applications.

APPENDIX

A. Proof of Theorem 1

Proof: Considering the winner selection rule (5), one can obtain

\[
\omega_{\nu_i}^{\upsilon_i} = \arg\min_{\omega_j^q \in \omega} \|x_i - w_j^q\|^2 \tag{18}
\]
\[
= \arg\min_{\omega_j^q \in \omega} \left( \langle w_j^q, w_j^q \rangle - 2 \langle w_j^q, x_i \rangle \right) \tag{19}
\]

which, by the definition (9) of the multiprototype descriptor, is the formula (10) that is required. ∎

B. Proof of Theorem 2

Proof: Let ẃ_{ν_i}^{υ_i} denote the updated multiprototype of w_{ν_i}^{υ_i}. Substituting the winner update rule (6) into the entries of the multiprototype descriptor yields, for j = 1, . . . , n,

\[
W_{\nu_i,j}^{\upsilon_i} = \langle \acute{w}_{\nu_i}^{\upsilon_i}, x_j \rangle
= \langle w_{\nu_i}^{\upsilon_i} + \eta_t (x_i - w_{\nu_i}^{\upsilon_i}),\, x_j \rangle
= (1-\eta_t) \langle w_{\nu_i}^{\upsilon_i}, x_j \rangle + \eta_t \langle x_i, x_j \rangle
\]

and, for j = n + 1,

\[
W_{\nu_i,j}^{\upsilon_i} = \langle \acute{w}_{\nu_i}^{\upsilon_i}, \acute{w}_{\nu_i}^{\upsilon_i} \rangle
= \langle w_{\nu_i}^{\upsilon_i} + \eta_t (x_i - w_{\nu_i}^{\upsilon_i}),\; w_{\nu_i}^{\upsilon_i} + \eta_t (x_i - w_{\nu_i}^{\upsilon_i}) \rangle
= (1-\eta_t)^2 \langle w_{\nu_i}^{\upsilon_i}, w_{\nu_i}^{\upsilon_i} \rangle + \eta_t^2 \langle x_i, x_i \rangle + 2(1-\eta_t)\eta_t \langle w_{\nu_i}^{\upsilon_i}, x_i \rangle. \tag{20}
\]

The proof is complete. ∎

C. Proof of Lemma 1

Proof: We use the principle of mathematical induction. One can verify that (12) is true for m_j^q = 1 from (6):

\[
w_j^q = \hat{w}_j^q + \eta_t (x_{\pi_j^q(1)} - \hat{w}_j^q)
= (1-\eta_t)^1 \hat{w}_j^q + \eta_t \sum_{l=1}^{1} (1-\eta_t)^{1-l} x_{\pi_j^q(l)}. \tag{21}
\]

Assume that it is true for m_j^q = m, i.e., w_j^q = (1-η_t)^m ŵ_j^q + η_t Σ_{l=1}^{m} (1-η_t)^{m-l} x_{π_j^q(l)}. Then, for m_j^q = m + 1, i.e., the (m+1)th point assigned to w_j^q, from (6) we have

\[
\acute{w}_j^q = w_j^q + \eta_t (x_{\pi_j^q(m+1)} - w_j^q)
= (1-\eta_t) \left( (1-\eta_t)^m \hat{w}_j^q + \eta_t \sum_{l=1}^{m} (1-\eta_t)^{m-l} x_{\pi_j^q(l)} \right) + \eta_t x_{\pi_j^q(m+1)}
= (1-\eta_t)^{m+1} \hat{w}_j^q + \eta_t \sum_{l=1}^{m+1} (1-\eta_t)^{m+1-l} x_{\pi_j^q(l)}. \tag{22}
\]

The aforementioned equation shows that (12) is true for m_j^q = m + 1. Therefore, by mathematical induction, it is true for all m_j^q. ∎

D. Proof of Theorem 3

Proof: By Lemma 1, the old ŵ_j^q can be obtained from the updated w_j^q as

\[
\hat{w}_j^q = \frac{w_j^q}{(1-\eta_t)^{m_j^q}} - \eta_t \sum_{l=1}^{m_j^q} \frac{x_{\pi_j^q(l)}}{(1-\eta_t)^l}.
\]

Substituting this into L = Σ_{ω_j^q ∈ ω} ‖w_j^q − ŵ_j^q‖^2, we have

\[
L = \sum_{\omega_j^q \in \omega} \left\| w_j^q - \frac{w_j^q}{(1-\eta_t)^{m_j^q}} + \eta_t \sum_{l=1}^{m_j^q} \frac{x_{\pi_j^q(l)}}{(1-\eta_t)^l} \right\|^2
\]
\[
= \sum_{\omega_j^q \in \omega} \left(1 - \frac{1}{(1-\eta_t)^{m_j^q}}\right)^{2} \langle w_j^q, w_j^q \rangle
+ \eta_t^2 \sum_{\omega_j^q \in \omega} \sum_{h=1}^{m_j^q} \sum_{l=1}^{m_j^q} \frac{\langle x_{\pi_j^q(h)}, x_{\pi_j^q(l)} \rangle}{(1-\eta_t)^{h+l}}
+ 2\eta_t \sum_{\omega_j^q \in \omega} \left(1 - \frac{1}{(1-\eta_t)^{m_j^q}}\right) \sum_{l=1}^{m_j^q} \frac{\langle w_j^q, x_{\pi_j^q(l)} \rangle}{(1-\eta_t)^{l}}.
\]

Thus, L can be computed by (13). This ends the proof. ∎

ACKNOWLEDGMENT

The authors would like to thank the Associate Editor and the reviewers for their comments, which helped in improving the manuscript.

REFERENCES

[1] R. Xu and D. Wunsch, II, "Survey of clustering algorithms," IEEE Trans. Neural Netw., vol. 16, no. 3, pp. 645–678, May 2005.
[2] A. M. Martínez and J. Vitrià, "Clustering in image space for place recognition and visual annotations for human-robot interaction," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 31, no. 5, pp. 669–682, Oct. 2001.
[3] Q. Lu and X. Yao, "Clustering and learning Gaussian distribution for continuous optimization," IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 35, no. 2, pp. 195–204, May 2005.
[4] A. Zakarian, "A new nonbinary matrix clustering algorithm for development of system architectures," IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 38, no. 1, pp. 135–141, Jan. 2008.
[5] Z. Wang, L. Liu, M.-C. Zhou, and N. Ansari, "A position-based clustering technique for ad hoc intervehicle communication," IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 38, no. 2, pp. 201–208, Mar. 2008.
[6] F. Ashraf, T. Özyer, and R. Alhajj, "Employing clustering techniques for automatic information extraction from HTML documents," IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 38, no. 5, pp. 660–673, Sep. 2008.
[7] J. Kubalík, P. Tichý, R. Šindelář, and R. J. Staron, "Clustering methods for agent distribution optimization," IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 40, no. 1, pp. 78–86, Jan. 2010.
[8] Z. Liang, W. A. Chaovalitwongse, A. D. Rodriguez, D. E. Jeffcoat, D. A. Grundel, and J. K. O'Neal, "Optimization of spatiotemporal clustering for target tracking from multisensor data," IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 40, no. 2, pp. 176–187, Mar. 2010.
[9] J. MacQueen, "Some methods for classification and analysis of multivariate observations," in Proc. 5th Berkeley Symp. Math. Statist. Probab., vol. 1, Berkeley, CA: Univ. California Press, 1967, pp. 281–297.
[10] L. Xu, A. Krzyżak, and E. Oja, "Rival penalized competitive learning for clustering analysis, RBF net, and curve detection," IEEE Trans. Neural Netw., vol. 4, no. 4, pp. 636–649, Jul. 1993.
[11] B. J. Frey and D. Dueck, "Clustering by passing messages between data points," Science, vol. 315, pp. 972–976, 2007.
[12] D. Bacciu and A. Starita, "Competitive repetition suppression (CoRe) clustering: A biologically inspired learning model with application to robust clustering," IEEE Trans. Neural Netw., vol. 19, no. 11, pp. 1922–1941, Nov. 2008.
[13] S. Bandyopadhyay and U. Maulik, "Nonparametric genetic clustering: Comparison of validity indices," IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 31, no. 1, pp. 120–125, Feb. 2001.
[14] S.-M. Pan and K.-S. Cheng, "Evolution-based tabu search approach to automatic clustering," IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 37, no. 5, pp. 827–838, Sep. 2007.
[15] K. Krishna and M. N. Murty, "Genetic k-means algorithm," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 29, no. 3, pp. 433–439, Jun. 1999.
[16] C.-H. Cheng, W.-K. Lee, and K.-F. Wong, "A genetic algorithm-based clustering approach for database partitioning," IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 32, no. 3, pp. 215–230, Aug. 2002.
[17] E. R. Hruschka, R. J. G. B. Campello, A. A. Freitas, and A. C. P. L. F. de Carvalho, "A survey of evolutionary algorithms for clustering," IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 39, no. 2, pp. 133–155, Mar. 2009.
[18] L. Ertöz, M. Steinbach, and V. Kumar, "Finding clusters of different sizes, shapes, and densities in noisy, high dimensional data," in Proc. 3rd SIAM Int. Conf. Data Mining, 2003, pp. 47–58.
[19] J. Shi and J. Malik, "Normalized cuts and image segmentation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 8, pp. 888–905, Aug. 2000.
[20] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, "A density-based algorithm for discovering clusters in large spatial databases with noise," in Proc. 2nd Int. Conf. Knowl. Discovery Data Mining, 1996, pp. 226–231.
[21] M. Ester, "Density-based clustering," in Encyclopedia of Database Systems. New York: Springer-Verlag, 2009.
[22] B. Schölkopf, A. Smola, and K.-R. Müller, "Nonlinear component analysis as a kernel eigenvalue problem," Neural Comput., vol. 10, pp. 1299–1319, 1998.
[23] C.-D. Wang, J.-H. Lai, and J.-Y. Zhu, "A conscience on-line learning approach for kernel-based clustering," in Proc. 10th Int. Conf. Data Mining, 2010, pp. 531–540.
[24] C.-D. Wang, J.-H. Lai, and J.-Y. Zhu, "Conscience online learning: An efficient approach for robust kernel-based clustering," Knowl. Inf. Syst., to be published.
[25] G. Biswas, J. B. Weinberg, and D. H. Fisher, "ITERATE: A conceptual clustering algorithm for data mining," IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 28, no. 2, pp. 219–230, May 1998.
[26] K. A. Heller and Z. Ghahramani, "Bayesian hierarchical clustering," in Proc. 22nd Int. Conf. Mach. Learn., 2005, pp. 297–304.
[27] M. Liu, X. Jiang, and A. C. Kot, "A multi-prototype clustering algorithm," Pattern Recognit., vol. 42, pp. 689–698, 2009.
[28] C. M. Bishop, Pattern Recognition and Machine Learning, M. Jordan, J. Kleinberg, and B. Schölkopf, Eds. New York: Springer-Verlag, 2006.
[29] G. Chen and G. Lerman, "Spectral curvature clustering," Int. J. Comput. Vis., vol. 81, pp. 317–330, 2009.
[30] A. Asuncion and D. Newman. (2007). UCI machine learning repository. [Online]. Available: http://www.ics.uci.edu/mlearn/MLRepository.html
[31] I. S. Dhillon, Y. Guan, and B. Kulis, "Weighted graph cuts without eigenvectors: A multilevel approach," IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 11, pp. 1944–1957, Nov. 2007.
[32] L. Hubert and P. Arabie, "Comparing partitions," J. Classif., vol. 2, pp. 193–218, 1985.
[33] A. Strehl, J. Ghosh, and R. J. Mooney, "Impact of similarity measures on web-page clustering," in Proc. AAAI Workshop Artificial Intelligence for Web Search, 2000, pp. 58–64.
[34] A. Strehl and J. Ghosh, "Cluster ensembles: A knowledge reuse framework for combining multiple partitions," J. Mach. Learn. Res., vol. 3, pp. 583–617, 2002.
[35] D. Martin, C. Fowlkes, D. Tal, and J. Malik, "A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics," in Proc. 8th Int. Conf. Comput. Vis., Jul. 2001, vol. 2, pp. 416–423.
[36] C. Fowlkes, D. Martin, and J. Malik, "Local figure-ground cues are valid for natural images," J. Vis., vol. 7, no. 8, pp. 2.1–2.9, 2007.
[37] R. S. Hunter, "Photoelectric color difference meter," J. Opt. Soc. Amer., vol. 48, no. 12, pp. 985–993, Dec. 1958.
[38] R. Unnikrishnan, C. Pantofaru, and M. Hebert, "Toward objective evaluation of image segmentation algorithms," IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 6, pp. 929–944, Jun. 2007.
[39] P. Arbeláez, M. Maire, C. Fowlkes, and J. Malik, "From contours to regions: An empirical evaluation," in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., 2009, pp. 2294–2301.
[40] M. Maire, P. Arbeláez, C. Fowlkes, and J. Malik, "Using contours to detect and localize junctions in natural images," in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., 2008, pp. 1–8.
[41] B. T. Truong and S. Venkatesh, "Video abstraction: A systematic review and classification," ACM Trans. Multimedia Comput. Commun. Appl., vol. 3, no. 1, pp. 1–37, Feb. 2007.



[42] Open Video Project, The Interaction Design Laboratory, School of Information and Library Science, University of North Carolina, Chapel Hill. (1998). [Online]. Available: http://www.open-video.org

Chang-Dong Wang (S'10) received the B.S. degree in applied mathematics and the M.Sc. degree in computer science from Sun Yat-sen University, Guangzhou, China, in 2008 and 2010, respectively. Since September 2010, he has been working toward the Ph.D. degree at the School of Information Science and Technology, Sun Yat-sen University. From January 2012 to January 2013, he will be a Visiting Student at the University of Illinois at Chicago, Chicago, where he will be involved in research on data mining under the guidance of Professor P. S. Yu. He has authored and co-authored several scientific papers in international journals and conferences, such as Knowledge and Information Systems, Neurocomputing, and the International Conference on Data Mining (ICDM). His current research interests include machine learning and data mining, in particular, data clustering and its applications. Mr. Wang is the recipient of the IEEE TCII Student Travel Award, and his paper won an Honorable Mention for the Best Research Paper Awards at ICDM 2010.

Jian-Huang Lai (M'10) received the M.Sc. degree in applied mathematics and the Ph.D. degree in mathematics from Sun Yat-sen University, Guangzhou, China, in 1989 and 1999, respectively. He joined Sun Yat-sen University in 1989 as an Assistant Professor, where he is currently a Professor with the Department of Automation, School of Information Science and Technology, and the Vice-Dean of the School of Information Science and Technology. He has authored and co-authored more than 80 scientific papers in international journals and conferences on image processing and pattern recognition, e.g., IEEE TRANSACTIONS ON NEURAL NETWORKS, IEEE TRANSACTIONS ON IMAGE PROCESSING, IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART B: CYBERNETICS, Pattern Recognition, the International Conference on Computer Vision, the IEEE Conference on Computer Vision and Pattern Recognition, and the International Conference on Data Mining. His current research interests include digital image processing, pattern recognition, multimedia communication, and wavelets and their applications. Dr. Lai is a Standing Member of the Image and Graphics Association of China, as well as a Standing Director of the Image and Graphics Association of Guangdong.

Jun-Yong Zhu (S'11) received the B.S. and M.S. degrees from the School of Mathematics and Computational Science, Sun Yat-sen University, Guangzhou, China, in 2008 and 2010, respectively. He is currently working toward the Ph.D. degree at the Department of Mathematics, Sun Yat-sen University. His current research interests include machine learning, transfer learning using auxiliary data, and pattern recognition, such as heterogeneous face recognition.
