
Regularized k-means clustering of high-dimensional data and its asymptotic consistency

Wei Sun
Department of Statistics, Purdue University
Jan 25, 2012

Joint work with Junhui Wang (UIC) and Yixin Fang (NYU)


Outline

1. Cluster analysis and K-means clustering
2. Regularized K-means and its implementation
3. Tuning via clustering stability
4. Estimation and selection consistency
5. Simulation study
6. Applications to gene microarray data


Cluster Analysis

Goal: assign observations to clusters so that observations in the same cluster are similar.



K-means Cluster Analysis (STAT 598G)

n observations: X_1, ..., X_n with X_i ∈ R^p.
Number of clusters: K.
The K clusters: A_1, ..., A_K.
Centers: C_1, ..., C_K with C_k ∈ R^p.

K-means clustering solves

    min_{A_k, C_k}  Σ_{k=1}^{K} Σ_{X_i ∈ A_k} ‖X_i − C_k‖²,        (1)

by iterating: given C_k^(t), assign A_k^(t); with A_k^(t) fixed, update C_k^(t+1).


An illustrative example – n = 12, p = 2, K = 3 (Wikipedia)

Step 1. Randomly select 3 initial centers C_1, C_2, C_3.
Step 2. Create 3 clusters A_1, A_2, A_3 by assigning each observation to the nearest center.
Step 3. The mean of each cluster becomes the new center.
Step 4. Repeat Steps 2 and 3 until convergence.
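The four steps can be sketched as a minimal Lloyd iteration (plain NumPy; the function and variable names are illustrative, not from the talk):

```python
import numpy as np

def kmeans(X, K, n_iter=100, rng=None):
    """Lloyd's algorithm: alternate nearest-center assignment (Step 2)
    and mean updates (Step 3) until the assignment stabilizes (Step 4)."""
    rng = np.random.default_rng(rng)
    # Step 1: randomly pick K observations as the initial centers.
    centers = X[rng.choice(len(X), K, replace=False)].astype(float)
    labels = None
    for _ in range(n_iter):
        # Step 2: assign each observation to its nearest center.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # Step 4: the assignment no longer changes.
        labels = new_labels
        # Step 3: the mean of each cluster becomes the new center.
        for k in range(K):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    return labels, centers
```

On toy data with well-separated groups this typically recovers the grouping in a handful of iterations.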


When p ≫ n

Euclidean distances concentrate and become less informative (Hall et al., 2005).
Many variables are redundant and carry no information about the clustering structure, yet K-means clustering tends to use all of them.


Lymphoma Dataset (Alizadeh et al., 2000)

n = 62 samples, p = 4026 genes, 3 types of lymphoma:
Diffuse large B-cell lymphoma (DLBCL): 42 samples.
Follicular lymphoma (FL): 9 samples.
B-cell chronic lymphocytic leukemia (CLL): 11 samples.

Goal: cluster the samples and identify the genes that are informative for distinguishing DLBCL, FL, and CLL.


Heatmap of Lymphoma Dataset [figure omitted]



Select informative variables

Multiple testing: Donoho and Jin (2008).
Regularization: LASSO (Tibshirani, 1996), adaptive LASSO (Zou, 2006), group LASSO (Yuan and Lin, 2006), adaptive group LASSO (Wang and Leng, 2008).


Regularized K-means

Our regularized K-means clustering:

    min_{A_k, C_k}  Σ_{k=1}^{K} Σ_{X_i ∈ A_k} ‖X_i − C_k‖² + Σ_{j=1}^{p} J(C_(j)).

C_(j) is the jth variable across all the centers.
J(C_(j)) = λ_j ‖C_(j)‖, where λ_j = λ ‖Ĉ_(j)‖⁻¹ and Ĉ_(j) is the unpenalized estimate from standard K-means.
A small ‖Ĉ_(j)‖ gives a large λ ‖Ĉ_(j)‖⁻¹, hence a more heavily shrunken C_(j).


Implementation

An iterative scheme:
With C_k fixed, A_k is updated by assigning each X_i to the nearest center.
With A_k fixed, the following lemma shows that C_k can be solved componentwise.


Lemma 1

    Σ_{k=1}^{K} Σ_{X_i ∈ A_k} ‖X_i − C_k‖² + Σ_{j=1}^{p} J(C_(j))
        = Σ_{j=1}^{p} [ (X_(j) − L C_(j))ᵀ (X_(j) − L C_(j)) + J(C_(j)) ],

where X_(j) is the jth variable and L is the cluster-assignment matrix with L_ik = I(X_i ∈ A_k).

When L is fixed, the regularized K-means problem therefore separates across variables:

    min_{C_(j)}  (X_(j) − L C_(j))ᵀ (X_(j) − L C_(j)) + J(C_(j)),   for each j.
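For intuition: under the extra simplifying assumption that every cluster has the same size w, each per-variable problem has a closed-form group soft-threshold solution in terms of the within-cluster means. The sketch below is illustrative (the name `center_update` is mine, not from the talk); with unequal cluster sizes a one-dimensional numerical solve is needed instead.

```python
import numpy as np

def center_update(means, w, lam):
    """Solve min_c sum_k w * (c_k - m_k)^2 + lam * ||c||_2 for one
    variable j, assuming every cluster has the same size w.
    means: length-K vector of within-cluster means of variable j.
    lam:   the adaptive penalty level lambda_j."""
    norm = np.linalg.norm(means)
    if 2 * w * norm <= lam:
        # Subgradient condition at zero holds: the variable is
        # screened out of all K centers at once.
        return np.zeros_like(means)
    return (1.0 - lam / (2 * w * norm)) * means
```

With lam = 0 this reproduces the unpenalized cluster means; a large lam zeroes the jth coordinate across all centers simultaneously, which is what makes the penalty a variable selector.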


Algorithm of Regularized K-means

Step 1. Initialize centers C_1^(0), ..., C_K^(0) by standard K-means.
Step 2. Until the termination condition is met, repeat:
    Given C_1^(t−1), ..., C_K^(t−1), find L^(t).
    Given L^(t), update C^(t) by minimizing over each j.

Initialization uses multiple random starts. The iteration stops when L^(t) no longer changes; the number of iterations is typically ≤ 5.
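Putting the pieces together, an end-to-end toy sketch of the two steps (my own simplification: equal-size clusters are assumed in the penalized update so a closed-form group soft-threshold applies; all names are illustrative):

```python
import numpy as np

def regularized_kmeans(X, K, lam, n_iter=50, rng=None):
    """Sketch of the algorithm: Step 1 initializes with plain K-means;
    Step 2 alternates the assignment L^(t) and componentwise penalized
    center updates with adaptive weights lambda_j = lam / ||Chat_(j)||."""
    rng = np.random.default_rng(rng)
    n, p = X.shape
    # Step 1: a few unpenalized Lloyd iterations.
    centers = X[rng.choice(n, K, replace=False)].astype(float)
    for _ in range(10):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for k in range(K):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    # Adaptive penalty levels from the unpenalized centers.
    lam_j = lam / np.maximum(np.linalg.norm(centers, axis=0), 1e-12)
    # Step 2: alternate penalized center updates and reassignment.
    for _ in range(n_iter):
        means = np.zeros((K, p))
        for k in range(K):
            if np.any(labels == k):
                means[k] = X[labels == k].mean(axis=0)
        w = n / K  # equal-cluster-size simplification
        norms = np.maximum(np.linalg.norm(means, axis=0), 1e-12)
        centers = means * np.maximum(1.0 - lam_j / (2 * w * norms), 0.0)
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = d.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # L^(t) no longer changes
        labels = new_labels
    return labels, centers
```

Columns of `centers` shrunk exactly to zero correspond to variables screened out of the clustering.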


Model tuning – K and λ

Tuning a clustering algorithm is difficult because there is no objective criterion to judge against (Wang, 2010).

Clustering stability. Key idea: if we repeatedly draw samples from the same population and apply the clustering algorithm with given K and λ, a good choice of (K, λ) should produce stable clusterings.


Clustering Stability (Wang, 2010)

Assume Z = (X_1, ..., X_n) ~ F(x) with x ∈ R^p.
A clustering assignment is a map ψ: R^p → {1, ..., K}.
A clustering algorithm Ψ(Z; K, λ) yields a clustering assignment ψ when applied to a sample Z.

Clustering distance between ψ_1 and ψ_2:

    d(ψ_1, ψ_2) = Pr[ I{ψ_1(X) = ψ_1(Y)} + I{ψ_2(X) = ψ_2(Y)} = 1 ],

where I(·) is the indicator function and X, Y are independent draws from F(x).

d(ψ_1, ψ_2) is the probability that the two clusterings disagree about whether a pair of points belongs together.
Example: ψ_1 = (1, 1, 1) and ψ_2 = (1, 2, 2) give d(ψ_1, ψ_2) = 2/3.
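On a finite sample, d can be estimated by checking every pair of points; the helper below (illustrative name) reproduces the slide's example:

```python
from itertools import combinations

def clustering_distance(labels1, labels2):
    """Empirical clustering distance: the fraction of pairs on which
    exactly one of the two clusterings puts the pair in the same cluster."""
    pairs = list(combinations(range(len(labels1)), 2))
    disagreements = sum(
        (labels1[i] == labels1[j]) != (labels2[i] == labels2[j])
        for i, j in pairs
    )
    return disagreements / len(pairs)
```

For ψ_1 = (1, 1, 1) and ψ_2 = (1, 2, 2) the pairs are (1,2), (1,3), (2,3): the first two disagree and the last agrees, giving 2/3 as on the slide. Note the distance only compares pair co-membership, so it is invariant to relabeling the clusters.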


Clustering Stability (Wang, 2010)

Clustering instability of Ψ(·; K, λ):

    S(Ψ, K, λ, n) = E[ d(Ψ(Z_1; K, λ), Ψ(Z_2; K, λ)) ],        (2)

where Ψ(Z_1; K, λ) and Ψ(Z_2; K, λ) are clusterings obtained by applying Ψ(·; K, λ) to two samples Z_1 and Z_2.

In practice the distribution F(x) is unknown, and the sample size n is relatively small compared with p.


Estimation based on Bootstrap

Generate bootstrap samples of the same size n: Z_1^{*b}, Z_2^{*b}, Z_3^{*b}.
Construct clusterings Ψ(Z_1^{*b}; K, λ) and Ψ(Z_2^{*b}; K, λ).
Estimate S(Ψ, K, λ, n) on Z_3^{*b} as the distance between Ψ(Z_1^{*b}; K, λ) and Ψ(Z_2^{*b}; K, λ), giving Ŝ^{*b}(Ψ, K, λ, n).

K̂ = mode{K̂_λ : λ > 0}, where K̂_λ = mode{K̂_λ^{*1}, ..., K̂_λ^{*B}} and K̂_λ^{*b} = argmin_{2 ≤ K ≤ K_max} Ŝ^{*b}(Ψ, K, λ, n).
Given K̂, λ̂ = mode{λ̂^{*1}, ..., λ̂^{*B}}, where λ̂^{*b} = argmin_λ Ŝ^{*b}(Ψ, K̂, λ, n).
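A schematic of the bootstrap estimate (the `cluster_fn` interface is hypothetical: it fits a clustering on one bootstrap sample and returns a rule that labels new points, e.g. by nearest fitted center):

```python
import numpy as np
from itertools import combinations

def pair_distance(a, b):
    """Empirical clustering distance between two label vectors."""
    pairs = list(combinations(range(len(a)), 2))
    return sum((a[i] == a[j]) != (b[i] == b[j]) for i, j in pairs) / len(pairs)

def bootstrap_instability(X, cluster_fn, B=20, rng=None):
    """Estimate S(Psi, K, lam, n): for each b, fit on Z1*b and Z2*b,
    then measure the disagreement of the two fitted rules on Z3*b."""
    rng = np.random.default_rng(rng)
    n = len(X)
    estimates = []
    for _ in range(B):
        # Three bootstrap samples of size n.
        Z1, Z2, Z3 = (X[rng.integers(0, n, n)] for _ in range(3))
        psi1, psi2 = cluster_fn(Z1), cluster_fn(Z2)
        estimates.append(pair_distance(psi1(Z3), psi2(Z3)))
    return float(np.mean(estimates))
```

Selecting K̂ and λ̂ then amounts to minimizing this estimate over the grid and taking modes across bootstrap replicates, as on the slide.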


Asymptotic Consistency – Fixed p

Theorem 1 (Estimation Consistency). Under regularity assumptions, if √n λ → 0 and nλ → ∞, then Ĉ → C̄ a.s. and ‖Ĉ − C̄‖ = O_p(n^{1/2} λ).

Theorem 2 (Selection Consistency). Under regularity assumptions, if √n λ → 0 and nλ → ∞, then P(‖Ĉ_(j)‖ = 0) → 1 for any j ∈ A^c, where A^c is the set of non-informative variables.


Asymptotic Consistency – Diverging p: p = O(n^{1/3})

Theorem 3 (Estimation Consistency). Under regularity assumptions, if √n λ p → 0 and n^{−2} λ^{−2} p → 0 as n → ∞, then Ĉ → C̄ almost surely and ‖Ĉ − C̄‖ = O_p(n^{1/2} λ p^{−1}).

Theorem 4 (Selection Consistency). Under regularity assumptions, if n^{1/2} λ p → 0 and n^{−2} λ^{−2} p → 0 as n → ∞, then P(‖Ĉ_(j)‖ = 0) → 1 for any j ∈ A^c.


Simulation 1: K is known

n = 80, K = 4, p = 50, 200, 500, 1000, µ = 0.4, 0.6, 0.8.
The first 50 informative variables ~ N(µ_kj, 1):

Variables    Cluster 1    Cluster 2    Cluster 3    Cluster 4
1–25         µ            −µ           −µ           µ
26–50        −µ           −µ           µ            µ

The remaining p − 50 uninformative variables ~ N(0, 1).

Compared with standard K-means and sparse K-means (Witten and Tibshirani, 2010). All algorithms are randomly started 100 times; λ is chosen by grid search.
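The design is easy to reproduce; a sketch of the data generator (function name illustrative):

```python
import numpy as np

def simulate(n=80, p=200, mu=0.6, rng=None):
    """Simulation 1 design: K = 4 equal clusters; variables 1-25 and
    26-50 carry the +/- mu mean pattern from the table above, and the
    remaining p - 50 variables are pure N(0, 1) noise."""
    rng = np.random.default_rng(rng)
    X = rng.standard_normal((n, p))
    labels = np.repeat([1, 2, 3, 4], n // 4)
    # Sign pattern (variables 1-25, variables 26-50) per cluster.
    signs = {1: (+1, -1), 2: (-1, -1), 3: (-1, +1), 4: (+1, +1)}
    for k, (s1, s2) in signs.items():
        X[labels == k, :25] += s1 * mu
        X[labels == k, 25:50] += s2 * mu
    return X, labels
```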


Case I: µ = 0.4 [figure omitted]

Case II: µ = 0.6 [figure omitted]

Case III: µ = 0.8 [figure omitted]


Performance of variable selection
Number of selected variables, with standard errors in parentheses:

µ      Method        p=50          p=200          p=500           p=1000
0.4    K-means       50(0)         200(0)         500(0)          1000(0)
       Sparse        33.3(3.05)    84.6(15.80)    127.0(39.30)    362.6(87.80)
       Regularized   36.6(1.68)    35.9(4.20)     45.1(9.20)      60.3(11.90)
0.6    K-means       50(0)         200(0)         500(0)          1000(0)
       Sparse        45.2(0.99)    128.3(9.57)    182.8(41.46)    43.6(6.04)
       Regularized   49.8(0.12)    52.1(1.77)     47.3(2.30)      64.8(9.80)
0.8    K-means       50(0)         200(0)         500(0)          1000(0)
       Sparse        46.4(1.09)    157.1(7.53)    126.8(30.40)    44.9(4.41)
       Regularized   50(0)         65.5(1.08)     53.2(1.85)      65.3(7.03)


Simulation 2: K is unknown

The same setup as in Simulation 1 with p = 200, µ = 0.8; grid search over K and λ.

Method                 K=2    K=4    Number         Error
Standard K-means       0      20     200(0)         0.001(0.001)
Sparse K-means         18     2      138.0(4.11)    0.228(0.017)
Regularized K-means    0      20     50.0(0.05)     0(0)


Performance of tuning [figure omitted]


Two gene microarray examples

Lymphoma: n = 62, p = 4026 genes; 42 DLBCL, 9 FL, and 11 CLL samples.
Leukemia: n = 72, p = 6817 genes; two types of human acute leukemia: 25 AML and 47 ALL patients.

Clustering errors are estimated by comparing the clustering assignments with the known cancer types.


Performance

Data       Method                K    Genes    Error
Leukemia   K-means               2    3571     2/72
           Sparse K-means        4    2577     2/72
           Regularized K-means   2    211      2/72
Lymphoma   K-means               2    4206     4/62
           Sparse K-means        3    3025     2/62
           Regularized K-means   3    66       1/62


Heatmap of Lymphoma on selected genes [figure omitted]
