LOCALITY REGULARIZED SPARSE SUBSPACE CLUSTERING WITH APPLICATION TO CORTEX PARCELLATION ON RESTING FMRI

Xiuchao Sui (1), Shaohua Li (2), Jagath C Rajapakse (1)

(1) School of Computer Engineering, Nanyang Technological University, Singapore
(2) Joint NTU-UBC Research Centre of Excellence in Active Living for the Elderly (LILY), Nanyang Technological University, Singapore

ABSTRACT

Parcellating the brain into functionally homogeneous regions can be achieved by performing clustering on fMRI data. Mathematically, parcellation amounts to constructing the affinity matrix between voxels and identifying the subspaces within it. Most previous methods derive the affinity matrix using Pearson correlation, which is sensitive to the noise inherent in fMRI. One recent work uses Sparse Representation to reconstruct the signals of each voxel as a sparse linear combination of other voxels, and the reconstruction coefficients are used as the affinity matrix for parcellation. We extend a popular Sparse Representation method, Sparse Subspace Clustering (SSC), by incorporating a locality regularization that encourages a voxel to be represented more by nearby voxels, and establish the Local-SSC method. Local-SSC performs comparably to SSC on simulated rs-fMRI data, and outperforms SSC on real fMRI data. Our approach demonstrates the benefit of the locality regularization for parcellation, and suggests improvements for other applications involving functional connectome analysis.

Index Terms— Resting state fMRI, parcellation, locality regularization, sparse subspace clustering

1. INTRODUCTION

Parcellating brain structures into functionally homogeneous subregions based on resting-state fMRI (rs-fMRI) data can be achieved by clustering image voxels on the affinity matrix ("similarity matrix" in some literature) of voxels [1]. This problem is challenging due to two inherent properties of fMRI data: 1) various sources of noise exist within fMRI data, which can easily distort the correlations between voxels; and 2) fMRI data suffer from high dimensionality and low sample size, which undermine the reliability of statistical methods [2].

The affinity matrix is the basis of parcellation, and its soundness is crucial for parcellation quality. Traditional parcellation methods typically compute the similarity between each pair of voxels based on the Pearson correlation of their BOLD signals [1, 3, 4]. However, Pearson correlation is sensitive to noise and outliers [5], which are common in fMRI data. For example, head motion can increase spurious correlations between weakly correlated brain regions and decrease the correlations between strongly correlated regions [6]. Naturally, such noise impairs parcellations based on Pearson correlation.

Sparse Representation is a recently developed approach for handling noisy, high-dimension low-sample-size problems [7, 8]. It has been widely adopted in signal and image processing, and was recently applied with success to brain parcellation, with advantages over traditional methods [9]. Given a set of observations, Sparse Representation seeks to reconstruct each observation by a linear combination of the other observations. The sparsity regularization on the reconstruction coefficients removes insignificant correlations that are often caused by noise. Hence the reconstruction coefficient matrix is a good candidate for the affinity matrix in parcellation.

Neuroimaging data often exhibit locality of correlations, i.e., signals of neighboring voxels are more correlated and signals of voxels far apart are less correlated. Such locality of correlations is unexploited in Sparse Representation. By incorporating this prior knowledge, spurious correlations arising from noise can be further reduced, and the reconstruction coefficient matrix may better agree with the functional connectivity patterns. Consequently, better parcellation can be achieved. The locality of correlations can be incorporated into Sparse Representation with a locality regularization, encouraging the signals of a voxel to be reconstructed more by its neighboring voxels and less by distant voxels. We implement the locality regularization on a popular Sparse Representation method, Sparse Subspace Clustering (SSC). Our extension of SSC is named Locality Regularized Sparse Subspace Clustering (Local-SSC).

Our locality regularization is essentially different from previous methods using spatial constraints [3, 4, 10]. On one hand, spatial constraints in previous works are incorporated into Pearson correlation, which still suffers from the disadvantages of Pearson correlation. On the other hand, in [4, 10], the spatial constraint eliminates all correlations between two voxels whose distance is beyond a specified threshold, enforcing a hard sparsity pattern. In contrast, the locality regularization imposes a soft constraint on the sparsity pattern, leaving the algorithm flexibility to pursue better reconstructions.

2. METHOD

The parcellation scheme using subspace clustering can be summarized in the following steps: 1) for each voxel, the BOLD signals are represented as a sparse linear combination of the other voxels within the target brain region, and the coefficient matrix is normalized into the affinity matrix; 2) clustering is performed on the affinity matrix to obtain functionally homogeneous regions. In this section, we propose the Local-SSC method, and compare the consistency of parcellations using SSC and Local-SSC on simulated as well as real fMRI data.
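To make the clustering step concrete, below is a minimal sketch of step 2, assuming the reconstruction coefficient matrix C has already been computed. The symmetrization |C| + |C|^T and the use of spectral clustering follow the standard SSC pipeline of [8]; the function name is ours, and the per-column normalization used in [8] is omitted for brevity.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def parcellate_from_coefficients(C, n_parcels):
    """Step 2: turn a reconstruction coefficient matrix C
    (n_voxels x n_voxels) into parcel labels."""
    # Symmetrize the coefficients into a nonnegative affinity matrix,
    # as in the standard SSC pipeline.
    A = np.abs(C) + np.abs(C).T
    # Spectral clustering on the precomputed affinity yields the parcels.
    labels = SpectralClustering(n_clusters=n_parcels,
                                affinity='precomputed').fit_predict(A)
    return labels
```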

2.1. fMRI data and preprocessing

2.1.1. Target brain region for parcellation

The medial frontal cortex (MFC), which includes the supplementary motor area (SMA) and the pre-supplementary motor area (pre-SMA), is commonly used as a target region to evaluate the performance of different parcellation methods. We manually drew the MFC from the MNI152 template; it extends from y = -22 to y = 30, with a short distance above the cingulate sulcus [11].

2.1.2. Real rs-fMRI data

The Oxford resting-state fMRI dataset (20 subjects, age 20-35, 175 time points, TR = 2s) was obtained from http://fcon_1000.projects.nitrc.org. The preprocessing was performed using FSL (http://fsl.fmrib.ox.ac.uk/fsl) as described in [10].

2.1.3. Simulated rs-fMRI data

For simulated data, we use y = 0 as the boundary that separates SMA from pre-SMA [12]. We generated the simulated dataset by filling the MFC with synthetic BOLD signals, which were created based on real rs-fMRI data. The generation pipeline is as follows: 1) define two regions of interest (ROIs) of 27-voxel cubes in SMA and pre-SMA, centered at (8, 22, 50) and (9, -6, 64) respectively; 2) extract the mean time course from each ROI, which was used as the source signal for the synthetic data in the corresponding subunit; and 3) add Gaussian noise (SD = 5) throughout the MFC region. Thus, the target brain region was filled with synthetic signals composed of source signals in separate functional parcels and different noise signals in each voxel. In addition, the noise level we employed in the simulated data ensured a mean temporal signal-to-noise ratio (SNR) comparable with real fMRI data after 6 mm Gaussian smoothing. A sketch of this generation scheme is given below.
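The following is a minimal sketch of the generation scheme, assuming the two source time courses src_sma and src_presma have already been extracted from the ROIs; the function name, the argument layout, and the voxel_y coordinate vector are illustrative, not part of the released code.

```python
import numpy as np

def simulate_mfc(src_sma, src_presma, voxel_y, noise_sd=5.0, seed=0):
    """Fill the MFC voxels with synthetic BOLD signals: one source
    time course per parcel (split at y = 0 in MNI space), plus
    independent Gaussian noise (SD = 5) in every voxel."""
    rng = np.random.default_rng(seed)
    src_sma = np.asarray(src_sma, dtype=float)
    src_presma = np.asarray(src_presma, dtype=float)
    voxel_y = np.asarray(voxel_y)
    data = np.empty((len(src_sma), len(voxel_y)))
    in_presma = voxel_y > 0            # y > 0 belongs to pre-SMA
    data[:, in_presma] = src_presma[:, None]
    data[:, ~in_presma] = src_sma[:, None]
    # Voxel-wise Gaussian noise on top of the shared source signals.
    data += rng.normal(scale=noise_sd, size=data.shape)
    return data
```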

2.2. Locality regularized sparse subspace clustering

The locality of voxel correlations is incorporated into Sparse Subspace Clustering (SSC) by adding different penalties to the reconstruction coefficients in the objective function: nearby voxel pairs receive low penalties and distant voxel pairs receive higher penalties. We propose a weight function to assign such weights, and solve the resulting optimization problem using the Alternating Direction Method of Multipliers, similar to [8].

2.2.1. Adding locality regularization to SSC

The original optimization objective of SSC [8] is:

$$\min_{C,E}\; \|C\|_1 + \lambda\|E\|_F \quad \text{s.t.}\quad Y = YC + E,\; \operatorname{diag}(C) = 0, \tag{1}$$

where $Y$ is the input data, $C$ is the reconstruction coefficient matrix, $E$ is the random error matrix, $\|\cdot\|_F$ is the Frobenius norm $\|A\|_F = \sqrt{\sum_{i,j} a_{ij}^2}$, $\|\cdot\|_1$ is the $\ell_1$ norm, and $\operatorname{diag}(C)$ denotes the diagonal elements of $C$. The $\ell_1$ regularizer $\|C\|_1$ drives $C$ to be a sparse matrix, hence the name Sparse Subspace Clustering. When SSC is applied to fMRI data to obtain the affinity matrix, the $i$-th column of $C$ holds the coefficients of the other voxels when reconstructing the BOLD signals of the $i$-th voxel. The constraint $\operatorname{diag}(C) = 0$ avoids the trivial solution in which a voxel is reconstructed by itself. These reconstruction coefficients naturally measure the similarity between the $i$-th voxel and the other voxels.

Considering the property of fMRI signals that a voxel tends to be highly correlated with spatially neighboring voxels and less correlated with distant voxels, we propose to add a locality regularization when reconstructing the BOLD signals of each voxel. The locality regularization assigns varying penalties to the reconstruction coefficients, increasing with the distance between a pair of voxels. We replace $\|C\|_1$ in (1) with a weighted $\ell_1$ regularizer $\|C \circ W\|_1 = \sum_{i,j} w_{ij}|c_{ij}|$, where "$\circ$" is the element-wise product. $W$ is initialized according to the spatial proximity between voxels in the input data $Y$. Specifically, let $w_{ij}$ denote the $(i,j)$-th element of $W$; then $w_{ij} = f(\|v_i - v_j\|_2)$, where $v_i$ is the coordinate vector of the $i$-th voxel in a Euclidean space, and $f(\cdot)$ is a monotonically increasing weight function. The objective function of Local-SSC then becomes:

$$\min_{C,E}\; \|C \circ W\|_1 + \lambda\|E\|_F \quad \text{s.t.}\quad Y = YC + E,\; \operatorname{diag}(C) = 0. \tag{2}$$
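A minimal sketch of building W from the voxel coordinates follows, with the concrete choice of the weight function f deferred to the next subsection; cdist from scipy computes all pairwise Euclidean distances, and the function name is ours.

```python
import numpy as np
from scipy.spatial.distance import cdist

def build_weight_matrix(coords, f):
    """coords: (n_voxels, 3) array of voxel coordinates;
    f: monotonically increasing weight function applied element-wise."""
    D = cdist(coords, coords)   # pairwise Euclidean distances ||v_i - v_j||
    return f(D)                 # w_ij = f(||v_i - v_j||)
```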

Fig. 1. Locality weight function used in Local-SSC.

2.2.2. Locality Weight Function

The weight function $f(\cdot)$ is essential for imposing the locality of correlations. We choose the following weight function:

$$f(d) = \alpha\big(1 + \delta - e^{-d/\beta}\big), \tag{3}$$

where $\alpha, \beta, \delta > 0$ are parameters of this weight function, and $\delta \ll 1$. It is easy to see that

$$\lim_{d \to 0} f(d) = \alpha\delta, \qquad \lim_{d \to +\infty} f(d) = \alpha(1 + \delta). \tag{4}$$

Hence $\alpha$ determines the maximum value of $f$, $\delta$ determines its minimum value, and $\beta$ is a discount factor determining the speed at which $f$ changes with $d$. A typical choice of $\alpha, \beta, \delta$ is $\alpha = 2$, $\delta = 0.1$, $\beta = 10$. Under this setting, $f(d) = 2(1.1 - e^{-d/10})$. A schematic illustration of this weight function is given in Fig. 1. We can see from the values in Fig. 1 that when $d$ is small, $f(d)$ increases quickly with $d$, whereas when $d$ is big ($\geq 10$), the increase of $f(d)$ slows down, and eventually $f(d)$ becomes saturated, approaching the limit value 2.2. Thereby, $f(d)$ gives decreasing preference across the neighborhood of a voxel, as voxels range from the center to the edge of the neighborhood. For voxels outside the neighborhood, $f(d)$ no longer differentiates them, and gives them almost the same low preference. The reason we use $e^{-d/\beta}$ instead of the more commonly used Gaussian filter $e^{-d^2/\beta}$ is that $e^{-d/\beta}$ decays much slower than $e^{-d^2/\beta}$ as $d$ increases; thus the regularization is milder for remote voxels, effectively reserving a larger neighborhood for the sparse reconstruction. A sketch of this weight function is given below.
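Below is a one-line implementation of Eq. (3) with the typical parameters above; plugged into the build_weight_matrix sketch from Section 2.2.1, it yields W. The function name and defaults are ours.

```python
import numpy as np

def locality_weight(d, alpha=2.0, beta=10.0, delta=0.1):
    """Eq. (3): f(d) = alpha * (1 + delta - exp(-d / beta)).
    f(0) = alpha * delta = 0.2; f(d) -> alpha * (1 + delta) = 2.2."""
    return alpha * (1.0 + delta - np.exp(-np.asarray(d) / beta))
```

For example, `W = build_weight_matrix(coords, locality_weight)` assigns nearby voxel pairs penalties close to 0.2 and distant pairs penalties close to 2.2.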

2.2.3. Optimization Algorithm

We adopt the Alternating Direction Method of Multipliers (ADMM), as described in [8], to obtain a locally sparse solution of (2). The optimization algorithm uses the same auxiliary matrix $A$, and the Lagrange multiplier matrix $\Delta \in \mathbb{R}^{N \times N}$ for the equality constraint $\operatorname{diag}(C) = 0$. The augmented Lagrangian for (2) is

$$\mathcal{L}(C, A, \delta, \Delta) = \|C \circ W\|_1 + \frac{\lambda}{2}\|Y - YA\|_F^2 + \frac{\rho}{2}\|A - C + \operatorname{diag}(C)\|_F^2 + \operatorname{tr}\!\big(\Delta^{\top}(A - C + \operatorname{diag}(C))\big), \tag{5}$$

where $\rho$ is the penalty parameter of the Lagrangian, and $\operatorname{diag}(C)$ here denotes a diagonal matrix with the same diagonal as $C$. As only the $\ell_1$ regularizer $\|C \circ W\|_1$ is changed in this objective function, the update equations for the other variables are identical to those in [8] and are omitted here. The only difference is in the update equation for the reconstruction coefficients $C$, performed while keeping $(A^{(k)}, \delta^{(k)}, \Delta^{(k)})$ fixed. In this step, we first observe that the optimal $C$ always has all diagonal elements equal to 0: otherwise, $C' = C - \operatorname{diag}(C)$ achieves a smaller value of the objective function, and thus $C$ is not optimal. Given this property, $C - \operatorname{diag}(C) = C$. The terms involving only fixed variables are constant in this step. Using $b$ to denote their sum, (5) is rearranged as

$$\tilde{\mathcal{L}}(C) = b + \sum_{i,j}\Big(w_{ij}|c_{ij}| + \frac{\rho}{2}(a_{ij} - c_{ij})^2 + \Delta_{ij}(a_{ij} - c_{ij})\Big).$$

It is easy to observe that in $\tilde{\mathcal{L}}(C)$, the different $c_{ij}$ are independent of each other and can be optimized individually. For each $c_{ij}$, the minimum is obtained at

$$c_{ij} = \begin{cases} 0 & i = j \\ \operatorname{sgn}\!\big(a_{ij} + \frac{\Delta_{ij}}{\rho}\big)\cdot\big(\big|a_{ij} + \frac{\Delta_{ij}}{\rho}\big| - \frac{w_{ij}}{\rho}\big)_+ & i \neq j \end{cases} \tag{6}$$

where $(x)_+$ is a truncation operator returning $x$ if $x \geq 0$, and 0 otherwise.

2.2.4. Explanation of how locality regularization works

In (6), $c_{ij}$ is truncated by at most $w_{ij}/\rho$, which is proportional to $w_{ij}$. This has a simple interpretation: when the $i$-th and $j$-th voxels are close in space, $w_{ij}$ is small, and less truncation is applied to the reconstruction coefficient. Conversely, distant voxels receive more truncation on their reconstruction coefficients. In this fashion, Local-SSC tends to prefer nearby voxels over remote voxels for reconstructing the signals of a voxel. SSC is equivalent to Local-SSC with $w_{ij} = 1$ for all $i, j$, i.e., all voxels treated equally for reconstructing the signals of a voxel. A sketch of the update (6) is given below.
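A minimal vectorized sketch of the C-update in (6), where A and Delta are the current ADMM iterates; with W set to all ones it reduces to the plain SSC soft-thresholding update. The function name is ours.

```python
import numpy as np

def update_C(A, Delta, W, rho):
    """Per-element soft-thresholding update of Eq. (6): entries backed
    by nearby voxels (small w_ij) are truncated less."""
    V = A + Delta / rho
    C = np.sign(V) * np.maximum(np.abs(V) - W / rho, 0.0)
    np.fill_diagonal(C, 0.0)   # enforce diag(C) = 0
    return C
```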

2.3. Implementation and parameter settings

In our study, the SSC code was obtained from http://www.eecs.berkeley.edu/~ehsan.elhamifar/code.htm, and Local-SSC was implemented based on it. We set alpha=20 for both SSC and Local-SSC. (This alpha is not the α used in our weight function; in the code it is transformed into λ in Eq. (1).)

2.4. Performance evaluation and group consistency

We used normalized mutual information (NMI) [13] to evaluate the similarity between different parcellations. For the simulated data, the parcellation of each subject was compared with the ground truth, the vertical boundary y = 0 in MNI space. For real fMRI data, where no ground truth is accessible, we evaluated the consistency of parcellations within the dataset. We randomly separated the subjects into two sub-groups, obtained parcellations separately, and evaluated the consistency between the two maximum probability maps (MPM) using NMI. By repeating this procedure 100 times, we derived the mean NMI value; a higher value indicates better reproducibility. A sketch of this procedure is given below.
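A minimal sketch of the split-half consistency procedure, assuming a user-supplied parcellate function that maps a sub-group of subjects to one group-level label vector (e.g., via the MPM); the NMI score is available in scikit-learn, and the function names are ours.

```python
import numpy as np
from sklearn.metrics import normalized_mutual_info_score

def split_half_consistency(subjects, parcellate, n_repeats=100, seed=0):
    """Mean NMI between parcellations of random half-splits of subjects."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_repeats):
        order = rng.permutation(len(subjects))
        half = len(subjects) // 2
        labels_a = parcellate([subjects[i] for i in order[:half]])
        labels_b = parcellate([subjects[i] for i in order[half:]])
        scores.append(normalized_mutual_info_score(labels_a, labels_b))
    return float(np.mean(scores))
```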

3. RESULTS

3.1. MFC Parcellation based on Local-SSC

For simulated data with a vertical boundary line (y = 0), Local-SSC achieves quite accurate parcellation of the MFC, as shown in Fig. 2A. For a representative real fMRI dataset, the parcellation result is shown in Fig. 2B, which complies with earlier studies.

Fig. 2. Maximum probability maps of MFC based on Local-SSC parcellation on the simulated and real fMRI datasets.

3.2. Comparison with SSC based parcellation

We compared the performance of Local-SSC with SSC on simulated data and real fMRI data. As shown in Fig. 3, the accuracy of parcellation on simulated data was comparable (no statistically significant difference) between the two subspace clustering methods. On real fMRI data, Local-SSC achieves higher consistency of parcellations than SSC (p < 0.001, statistically significant).

We should note that our proposed Local-SSC relies on the locality hypothesis that the signal in each voxel can be better represented by neighboring voxels than by voxels far away. The simulated fMRI data we used was generated according to the scheme in Section 2.1.3, which only reflects the predefined border but fails to incorporate locality. This may explain why Local-SSC performs unstably on the simulated data. In contrast, real fMRI data conform better to the locality hypothesis and thus highlight the main advantage of Local-SSC. For future work, it would be desirable to generate simulated fMRI data that model the locality effect and re-evaluate Local-SSC.

Fig. 3. Performance comparison of Local-SSC and SSC on simulated and real rs-fMRI data.

4. DISCUSSION

In this work, we propose the Locality Regularized Sparse Subspace Clustering (Local-SSC) method to detect functionally homogeneous parcels. The novelty of our method lies in a locality regularization, which encourages the signals of a voxel to be reconstructed more by its neighboring voxels and less by distant voxels. This enhancement gives Local-SSC better performance than SSC in MFC parcellation.

The proposed locality regularization is not limited to parcellation. It could also be used to construct robust functional brain networks, with the potential to improve the accuracy of connectome analysis. For future work, one could investigate the effect of spatial smoothing in fMRI preprocessing on parcellation results, and compare Local-SSC with other parcellation methods, such as those based on eta2 [14] and the KNN graph [4]. We would extend our evaluation to more complicated brain regions, as well as multiple datasets, to further verify the impact of the locality regularization. In addition, we would attempt to incorporate the locality regularization into other applications involving connectome analysis.

5. ACKNOWLEDGEMENT

This research is partly supported by the AcRF Tier-1 grant RG19/15 by the Ministry of Education, Singapore, and the National Research Foundation Singapore under its Interactive Digital Media (IDM) Strategic Research Program.

6. REFERENCES

[1] Jae-Hun Kim, Jong-Min Lee, Hang Joon Jo, Sook Hui Kim, Jung Hee Lee, Sung Tae Kim, Sang Won Seo, Robert W. Cox, Duk L. Na, Sun I. Kim, and Ziad S. Saad, "Defining functional SMA and pre-SMA subregions in human MFC using resting state fMRI: Functional connectivity-based parcellation method," NeuroImage, vol. 49, no. 3, pp. 2375–2386, 2010.

[2] Katherine S. Button, John P. A. Ioannidis, Claire Mokrysz, Brian A. Nosek, Jonathan Flint, Emma S. J. Robinson, and Marcus R. Munafò, "Power failure: why small sample size undermines the reliability of neuroscience," Nature Reviews Neuroscience, vol. 14, no. 5, pp. 365–376, 2013.

[3] R. Cameron Craddock, G. Andrew James, Paul E. Holtzheimer, Xiaoping P. Hu, and Helen S. Mayberg, "A whole brain fMRI atlas generated via spatially constrained spectral clustering," Human Brain Mapping, vol. 33, no. 8, pp. 1914–1928, 2012.

[4] Xilin Shen, Xenophon Papademetris, and R. Todd Constable, "Graph-theory based parcellation of functional subunits in the brain from resting-state fMRI data," NeuroImage, vol. 50, no. 3, pp. 1027–1035, 2010.

[5] Peter Langfelder and Steve Horvath, "Fast R functions for robust correlations and hierarchical clustering," Journal of Statistical Software, vol. 46, no. 11, 2012.

[6] Koene R. A. Van Dijk, Mert R. Sabuncu, and Randy L. Buckner, "The influence of head motion on intrinsic functional connectivity MRI," NeuroImage, vol. 59, no. 1, pp. 431–438, 2012.

[7] Michael Elad, Sparse and Redundant Representations: From Theory to Applications in Signal and Image Processing, Springer, 1st edition, 2010.

[8] Ehsan Elhamifar and Rene Vidal, "Sparse subspace clustering: Algorithm, theory, and applications," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 11, pp. 2765–2781, 2013.

[9] Yu Zhang, Svenja Caspers, Lingzhong Fan, Yong Fan, Ming Song, Cirong Liu, Yin Mo, Christian Roski, Simon Eickhoff, Katrin Amunts, and Tianzi Jiang, "Robust brain parcellation using sparse representation on resting-state fMRI," Brain Structure and Function, vol. 220, no. 6, pp. 3565–3579, 2015.

[10] Hewei Cheng, Hong Wu, and Yong Fan, "Optimizing affinity measures for parcellating brain structures based on resting state fMRI data: A validation on medial superior frontal cortex," Journal of Neuroscience Methods, vol. 237, pp. 90–102, 2014.

[11] Simon B. Eickhoff, Danilo Bzdok, Angela R. Laird, Christian Roski, Svenja Caspers, Karl Zilles, and Peter T. Fox, "Co-activation patterns distinguish cortical modules, their connectivity and functional differentiation," NeuroImage, vol. 57, no. 3, pp. 938–949, 2011.

[12] Karl Zilles, Gottfried Schlaug, Stefan Geyer, Giuseppe Luppino, Massimo Matelli, M. Qü, Axel Schleicher, and Thorsten Schormann, "Anatomy and transmitter receptors of the supplementary motor areas in the human and nonhuman primate brain," in Supplementary Sensorimotor Area, pp. 29–43, Lippincott-Raven, 1996.

[13] Andrea Lancichinetti and Santo Fortunato, "Community detection algorithms: a comparative analysis," Physical Review E, vol. 80, no. 5, pp. 056117, 2009.

[14] Alexander L. Cohen, Damien A. Fair, Nico U. F. Dosenbach, Francis M. Miezin, Donna Dierker, David C. Van Essen, Bradley L. Schlaggar, and Steven E. Petersen, "Defining functional areas in individual human brains using resting functional connectivity MRI," NeuroImage, vol. 41, no. 1, pp. 45–57, 2008.
