ICA-based Identification of Overlapping Spatial Clusters in ECoG Data
Tim R. Mullen
VS298: Neural Computation, Final Project, Fall 2006
Prof. Bruno Olshausen

Abstract

An interesting issue in cognitive neuroscience is discovering the spatial distribution of cortical areas engaged in specific task-dependent information processing. We may also like to know whether any of these (perhaps distant) cortical areas are functionally linked, constituting a spatially-fixed, distributed processing network. A method that groups individual electroencephalographic / electrocorticographic (EEG/ECoG) electrodes into overlapping spatial clusters based on the underlying structure of their time-series may allow us to address both these issues. Independent Component Analysis (ICA) [1, 3] allows us to decompose a set of observed signals collected from the scalp or cortex into a separate set of statistically independent components (ICs), each of which may be thought of as reflecting the activity of a spatially-fixed, though not necessarily spatially restricted, potential-generating system [2]. It is possible to determine the influence of each IC on the observed signal at each electrode, thus allowing electrodes (and the cortical areas they overlie) to be clustered together on the basis of shared underlying brain processes. In this paper, we apply an ICA-based overlapping clustering algorithm to subdural ECoG data collected from a patient performing a language comprehension and production task. We show that the algorithm is able to group electrodes into functionally plausible clusters corresponding to classical areas such as auditory and motor/premotor cortex, “Broca’s” and “Wernicke’s” areas, and prefrontal areas that may be involved in language processing. Furthermore, the clusters exhibit a moderate degree of overlap, revealing a complex network linking areas of primary and associative auditory cortex, “Broca’s” area, motor, premotor, and mouth somatosensory cortex.

1. Introduction

1.1 Independent Component Analysis

Independent Component Analysis (ICA) [1] is a blind source separation/deconvolution technique for decomposing a set of observed signals (e.g., EEG time-series), $x = [x_1, \ldots, x_N]^T$, into a set of latent variables, $s = [s_1, \ldots, s_N]^T$, under the criterion that the probability density function (p.d.f.) of s factorizes: $f_s(s) = \prod_{i=1}^{N} f_{s_i}(s_i)$; in other words, the elements of s are as statistically independent as possible (hence the term independent component analysis). The ICA model is generally expressed as:

$$x = As \tag{1.1}$$

where A is an unknown mixing matrix. The goal is to find a set of weights W, such that

$$s = Wx = A^{-1}x \tag{1.2}$$
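As a minimal sketch of equations (1.1) and (1.2), the snippet below mixes two independent sources with a known 2x2 matrix and recovers them with its inverse. The mixing matrix, the uniform sources, and all function names are illustrative assumptions; in real ICA the matrix A is unknown and W must be estimated from x alone. The sketch also reports excess kurtosis, one common measure of how far a signal is from Gaussian, to show that each mixture is closer to Gaussian than the sources it was built from.

```python
import random

def apply_2x2(M, samples):
    """Apply a 2x2 matrix M to every 2-vector in samples."""
    return [(M[0][0] * a + M[0][1] * b, M[1][0] * a + M[1][1] * b)
            for a, b in samples]

def inverse_2x2(M):
    """Closed-form inverse of a 2x2 matrix."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]

def excess_kurtosis(xs):
    """Fourth standardized moment minus 3 (zero for a Gaussian)."""
    n = len(xs)
    mu = sum(xs) / n
    m2 = sum((v - mu) ** 2 for v in xs) / n
    m4 = sum((v - mu) ** 4 for v in xs) / n
    return m4 / m2 ** 2 - 3.0

rng = random.Random(0)
# Two independent, uniformly distributed (nongaussian) toy sources.
s = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(20000)]

A = [[1.0, 1.0], [1.0, -1.0]]   # hypothetical mixing matrix, known only in this toy
x = apply_2x2(A, s)             # x = A s   (equation 1.1)
W = inverse_2x2(A)              # W = A^-1
s_rec = apply_2x2(W, x)         # s = W x   (equation 1.2)

k_source = excess_kurtosis([a for a, _ in s])
k_mixed = excess_kurtosis([a for a, _ in x])
print(f"source kurtosis {k_source:.2f}, mixture kurtosis {k_mixed:.2f}")
```

The mixture's excess kurtosis lies nearer to zero than the source's, which is the property that nongaussianity-based ICA methods exploit when searching for W.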

ICA assumes that the independent components (ICs) are nongaussian, and it can be shown that maximizing the nongaussianity of $w_i^T x$, where $W = [w_1, \ldots, w_N]^T$, corresponds to finding one of the ICs [3]. There are several methods for finding W [1, 3], mostly differing in the particular measure of nongaussianity used (e.g., kurtosis, negentropy). Other methods, such as minimizing mutual information between components and entropy maximization, can be shown to be equivalent (under certain conditions) to maximizing nongaussianity [3]. The Infomax method of Bell & Sejnowski [1, 2] maximizes the entropy of a non-linearly transformed vector $y = F_s(s)$, where $F_s$ is the logistic function $F_s(s) = (1 + e^{-s})^{-1}$. (Technically, infomax ICA assumes that the unknown ICs, s, all have the same form of cumulative distribution function (c.d.f.), and $F_s$ is taken to be this c.d.f.; however, the logistic function has been found to be a good substitute for the true c.d.f.) Infomax finds W using a neural network approach, performing stochastic gradient ascent with the weight update rules:

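$$\Delta W \propto [W^T]^{-1} + \hat{y}x^T, \qquad \Delta w \propto \hat{y} \tag{1.3}$$

where w is a bias term added to equation (1.2), $\hat{y} = [\hat{y}_1, \ldots, \hat{y}_N]^T$, and

$$\hat{y}_i = \frac{\partial}{\partial y_i}\frac{\partial y_i}{\partial s_i} = \frac{\partial f_s(s_i)}{\partial F_s(s_i)} \quad \text{[if } y = F_s(s)\text{]} \tag{1.4}$$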
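For the logistic nonlinearity used by Infomax, equation (1.4) reduces to a simple closed form. The short derivation below is a worked example under that assumption only; it uses the symbols already defined in the text (y, s, F_s) and introduces nothing new.

```latex
\begin{align*}
y_i &= F_s(s_i) = \frac{1}{1 + e^{-s_i}} \\
\frac{\partial y_i}{\partial s_i} &= \frac{e^{-s_i}}{(1 + e^{-s_i})^2} = y_i\,(1 - y_i) \\
\hat{y}_i &= \frac{\partial}{\partial y_i}\,\frac{\partial y_i}{\partial s_i}
           = \frac{\partial}{\partial y_i}\bigl[\,y_i (1 - y_i)\,\bigr] = 1 - 2 y_i
\end{align*}
```

With this substitution the update in (1.3) becomes $\Delta W \propto [W^T]^{-1} + (1 - 2y)\,x^T$ and $\Delta w \propto 1 - 2y$, which is the form given by Bell & Sejnowski [1] for the logistic case.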

This method is equivalent to making the mutual information between the outputs $s_i$ go to zero, at which point the desired factorization of the p.d.f. of s is achieved and the independent components have been found.

ICA has been shown in numerous studies [2, 4, 5] to be applicable to EEG (and other) time-series analysis, both for source localization and separation of event-related potentials (ERPs) and other neurophysiologic phenomena, and for removing artifacts such as eye blinks. One assumption is that the observed EEG (or ECoG) is the output of a number of statistically independent, spatially-fixed potential-generating systems, and that ICA can be used to separate the mixed signal into these underlying generating systems.

1.2 ICA Clustering

Suppose a subject hears a series of nouns and is instructed to respond with an associated verb after each noun. Now assume that there is some set of independent brain processes that contribute to the overall ability to do this task. We might be interested in identifying the spatial distribution of cortical areas engaged in carrying out these brain processes. Furthermore, we may like to know whether any of these (perhaps distant) cortical areas are functionally linked, constituting a spatially-fixed, distributed processing network.

In previous and ongoing work [6, 7], we have addressed the first issue (identification of functional clusters) using a hierarchical clustering algorithm with cluster similarity computed as a function of both the mutual information (or cross correlation, CC) between time-series and the optimal delay offset that maximizes mutual information (CC). We regard channels* with high mutual information (CC) and low optimal (propagation) delays as being engaged in co-processing of similar information (i.e., due to a common underlying processing mechanism). However, because this method yields a disjoint partitioning of the channels, it does not address the second issue, functional linkage between clusters. It seems plausible that any one cortical region may be functionally engaged in several different processes, and thus any method that forces each channel into a single cluster will overlook this shared processing; an overlapping clustering algorithm is needed to address this functional linkage. The ability of ICA to perform overlapping clustering of EEG time-series based on shared latent structure makes it a potentially promising tool for identifying cortical areas engaged in shared processing.
The ICACLUS algorithm of Wu and Yu [8] has been applied to artificially (and naturally) generated time series and was shown to far outperform K-Means in several overlapping clustering tasks, grouping together time series that were generated (with several kinds of additive noise) from the same source with 100% accuracy. For this paper I have modified the basic ICACLUS algorithm (modified step 4, added step 5). The modified algorithm is outlined below:

ICACLUS Algorithm (modified)

1. Import n time series X1,…,Xn with m observations.
2. Use ICA to compute k ICs and the corresponding mixing matrix A.
3. For each Xi, sort its ICs by their loadings (coefficients in A) and select the C% of ICs with the highest positive loadings and the C% of ICs with the most negative loadings (0 < C% ≤ 50%); this determines the C% * k positive dominant ICs and the C% * k negative dominant ICs.
4. Build a pairwise similarity matrix (for all Xi, Xj) using a similarity threshold, t ≤ 2C% * k; if Xi and Xj (i ≠ j) have at least t dominant ICs in common, then Xi and Xj are ‘matched’ in the similarity matrix.
5. Agglomeratively combine pairs of smaller clusters (beginning with single-channel ‘clusters’) into progressively larger clusters using the following rule:
   • For all pairs of clusters (sets of time-series) Ci, Cj in the current hierarchy level, if Ci ─ Cj has all elements ‘matched’ with Cj ─ Ci, then combine Ci, Cj: Ci ∪ Cj → Ck.
   • Store Ck in the next level of the hierarchy (it is now a candidate for merging in the next iteration).
   • Any cluster that does not merge with another cluster on this level is stored in the final list of clusters (it will not merge with any clusters higher up the hierarchy).
6. When no further merging is possible, output the clustering results.
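Steps 3 to 6 above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the implementation used for the results in this paper: the function names and the toy loading matrix are hypothetical, `c` stands for the count C% * k, a dominant IC is identified by its index together with the sign of its loading, and single-channel clusters are dropped from the output (the Results section counts only clusters of two or more channels).

```python
from itertools import combinations

def dominant_ics(loadings, c):
    """Step 3: signed indices of the c largest and c most negative loadings.
    Assumption: a shared IC only counts if it is dominant with the same sign."""
    order = sorted(range(len(loadings)), key=lambda i: loadings[i])
    negative = {(i, "-") for i in order[:c]}
    positive = {(i, "+") for i in order[-c:]}
    return positive | negative

def match_matrix(A, c, t):
    """Step 4: channels i and j are matched if they share at least t dominant ICs."""
    dom = [dominant_ics(row, c) for row in A]
    n = len(A)
    return [[i != j and len(dom[i] & dom[j]) >= t for j in range(n)]
            for i in range(n)]

def merge_clusters(matched):
    """Steps 5-6: level-by-level agglomerative merging of clusters."""
    level = {frozenset([i]) for i in range(len(matched))}
    final = []
    while level:
        merged, next_level = set(), set()
        for ci, cj in combinations(level, 2):
            # Merge when every channel unique to ci is matched with every
            # channel unique to cj.
            if all(matched[a][b] for a in ci - cj for b in cj - ci):
                next_level.add(ci | cj)
                merged.update((ci, cj))
        # Clusters that did not merge on this level are final
        # (singletons are discarded in this sketch).
        final.extend(cl for cl in level if cl not in merged and len(cl) > 1)
        level = next_level
    return sorted(sorted(cl) for cl in final)

# Toy mixing matrix: 4 channels (rows) by k = 4 ICs (columns), made-up values.
A_toy = [
    [0.9, 0.1, -0.8, 0.0],
    [0.7, 0.1, 0.0, -0.6],
    [0.0, 0.8, 0.1, -0.9],
    [0.1, 0.6, 0.0, -0.7],
]
matched = match_matrix(A_toy, c=1, t=1)
clusters = merge_clusters(matched)
print(clusters)
```

In this toy case channel 1 shares its positive dominant IC with channel 0 and its negative dominant IC with channels 2 and 3, so it ends up in both resulting clusters; this is the kind of overlap the algorithm is meant to expose.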

* I may interchangeably use channel to refer to the electrode itself, the time-series recorded at an electrode site, as well as the cortical region underlying an electrode.

Note that varying the threshold, t, controls the number (and size) of clusters generated; a low t will result in a more liberal clustering, while a high t will force fewer (and likely smaller) clusters to be generated. Furthermore, it is evident that any Xi can have membership in multiple clusters, so the algorithm is capable of producing overlapping clusters. To estimate the degree of cluster overlap we use the following simple measure:

$$\text{overlap} = \frac{\text{number of time-series in more than one cluster}}{\text{number of time-series}} \tag{2.1}$$
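The overlap measure in (2.1) is straightforward to compute from a list of clusters. The helper below is a small sketch; the function name and the example clusters are illustrative, and the denominator is taken to be the total number of time-series passed to the clustering, as the formula states.

```python
from collections import Counter

def cluster_overlap(clusters, n_series):
    """Fraction of time-series that belong to more than one cluster (eq. 2.1)."""
    membership = Counter(ch for cluster in clusters for ch in set(cluster))
    shared = sum(1 for count in membership.values() if count > 1)
    return shared / n_series

# Hypothetical example: 4 channels, channel 1 sits in both clusters.
example_clusters = [[0, 1], [1, 2, 3]]
overlap = cluster_overlap(example_clusters, n_series=4)
print(overlap)
```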

Wu and Yu advocate the use of fastICA [9], a fast, parallel, fixed-point ICA algorithm, to perform the IC decomposition. However, in my experience fastICA tends to yield somewhat different mixing matrices, A, between two identical runs, with somewhat different rankings of independent components relative to each time-series. Depending on the choices of t, C%, and k, ICACLUS may be more or less sensitive to these fluctuations; I found that the clustering results were frequently unstable, differing somewhat between two identical runs of ICACLUS. In contrast, the logistic infomax algorithm of Bell and Sejnowski, implemented by Makeig, Bell, et al. [2, 10] as the runica procedure in the EEGLAB toolbox, exhibits only very slight fluctuations and produced stable clusters. Furthermore, runica uses a natural gradient feature that significantly speeds up computation, making it comparable to fastICA in speed. Thus, runica was selected as the preferred ICA algorithm for clustering.

To better illustrate the clustering process and its implications, let me provide a concrete example. Suppose that a subject is performing the verb generation task described above. Assume some set of cortical processes occurs in primary auditory cortex in response to the noun stimulus, and, due to fiber projections between the regions (or perhaps common thalamic input to both regions), a related set of cortical processes occurs in associative auditory cortex. Suppose we have two subdural electrodes, A and B, over primary auditory and associative auditory cortex, respectively. Since the processing mechanisms underlying both these regions are related, we might expect some shared similarity in the structure of the observed time-series A′, B′.
Now, if we applied ICA and found that the majority of ICs that dominate A′ (i.e., have high weights in the row of the mixing matrix A corresponding to channel A) also dominate B′, then we can conclude that there is indeed similarity in the underlying structure of A′ and B′. In fact, if we view each independent component (excluding components corresponding to eye blinks, muscle, and line noise artifacts) as reflecting a spatially-fixed potential-generating system (i.e., the electrical activity of a specific processing network), as others have suggested [2], then it seems reasonable to attribute this IC similarity to a functional coupling between the cortical areas underlying A and B. Now, suppose we find that there is another similarly clustered group {C D B}, where C and D comprise a region of mouth motor cortex. What can we say about the fact that B has shared membership in {A B} (auditory processing) and {C D B} (mouth motor)? The fact that these clusters overlap tells us something significant about the functional relationship between these separate cortical areas: namely, that they seem to have some underlying processing mechanisms in common. We might even view B as a sort of ‘link’ between these two cortical processing regions; B seems to be playing a role in both the auditory processing system and the mouth motor system. While some might dismiss this assumption as mere speculation, it is at least clear that B warrants further investigation. If we are interested in studying the interaction between specific cortical areas, this method at least allows us to narrow our focus to cortical regions that may be playing a role in this interaction.

2. Methods

Multi-channel subdural electrocorticogram (ECoG) data was recorded from patients undergoing neurosurgery for epilepsy. The 64-channel grid (shown in Figure 1) is implanted for 4-7 days, during which experimental recording is performed. This paper focuses on data from a single patient performing a verb generation task. Subjects are presented aurally with a noun, to which they must verbally respond with an action verb (e.g., ‘apple’ → ‘eat’). ECoG is collected at a sampling rate of 2003 Hz, and each electrode is common-averaged with respect to electrode #64. A 3-160 Hz bandpass filter was applied to remove any residual ocular movement artifacts (they are vastly reduced in ECoG), as well as a 180 Hz line noise harmonic. The 60 Hz fundamental line noise artifact was present but had very low power and thus was not removed (extended ICA was unable to cleanly separate these noise artifacts). Channels that showed evidence of epileptiform activity or were corrupted with noise were removed. The reference electrode (#64) was also removed. The remaining time-series were segmented into 98 epochs, each spanning [0 1300] ms from stimulus onset, which were concatenated to form the input data.

A necessary ICA preprocessing step is centering (mean-removing) the signal data, and each epoch was centered separately. Pre-whitening (another standard ICA preprocessing step, which removes 1st and 2nd order statistics by linearly transforming the observed variables such that their covariance matrix equals the identity matrix) is a standard feature of both the runica and fastICA procedures. Furthermore, running ICA on a principal component (PC) subspace (dimensionality reduction) of the full data set can reduce noise and prevent overlearning [3]. Dimensionality reduction was always performed; PCs with low eigenvalues (subjectively determined) were discarded.

The modified ICACLUS algorithm (using runica) was applied to the preprocessed data with the following parameters: k = 20, (C% * k) = 3, t = 5. Thus 20 ICs were computed (on a 20-dimensional PCA subspace), with the similarity constraint that at least 5 of the 6 dominant ICs (top 3 positive and top 3 negative) must match.
An adaptive ICA learning rate was employed with an initial learning rate of 0.000156, multiplied by 0.9 whenever the weight change angle was less than 60°. ICA training was set to halt when the weight change fell below $10^{-6}$. The starting weight matrix was fixed to the identity matrix.
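The per-epoch centering and concatenation described in the Methods can be sketched as follows. The data layout (a list of epochs, each a list of channels, each a list of samples), the function names, and the toy values are assumptions made for illustration; the actual preprocessing was done with the tools named in the text.

```python
def center_epoch(epoch):
    """Subtract each channel's mean within a single epoch (channels x samples)."""
    centered = []
    for channel in epoch:
        mean = sum(channel) / len(channel)
        centered.append([v - mean for v in channel])
    return centered

def center_and_concatenate(epochs):
    """Center every epoch separately, then join epochs end to end per channel."""
    n_channels = len(epochs[0])
    data = [[] for _ in range(n_channels)]
    for epoch in epochs:
        for c, channel in enumerate(center_epoch(epoch)):
            data[c].extend(channel)
    return data

# Toy input: 2 epochs, 2 channels, 3 samples per epoch.
toy_epochs = [
    [[1.0, 2.0, 3.0], [10.0, 10.0, 13.0]],
    [[5.0, 5.0, 8.0], [0.0, -3.0, 3.0]],
]
ica_input = center_and_concatenate(toy_epochs)
print(ica_input)
```

Centering each epoch separately, rather than the concatenated series as a whole, removes slow baseline differences between trials before ICA sees the data.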

3. Results

The resulting clusters showed a complex network of overlapping interaction between several classical functional structures, including the superior temporal plane (including auditory cortex and “Wernicke’s Area”), mouth motor cortex and premotor cortex, “Broca’s Area,” and portions of the middle frontal gyrus and PFC. Twenty clusters were generated (a cluster comprises two or more channels), with an overlap of 18.2%. Figure 2 shows the clustering results overlaid on a scaled rendering of the prominent sulci on this patient’s cortex. Figure 3 shows previous spectral analysis of a phoneme mismatch task for this patient. Note the early high-gamma onset in channels 50, 58, 49, 57, indicating their role in initial auditory processing. A separate analysis (not shown) of a passive tone task identified channels 50, 58 as primary auditory. Furthermore, it was found that 50, 58 had a high-gamma response to both words and nonwords, whereas 49, 47 showed a response only to words, indicating the role of 49, 47 as an associative auditory structure. It is interesting to note that these two electrode groups are clustered separately by the ICA analysis. Note also the pronounced alpha desynchronization occurring in channels 39, 47; these two channels likewise form a distinct ICA cluster. The ICA clustering also shows long-range links between associative auditory cortex (49) and lower premotor cortex (54). A group of channels (5, 13, 21) seems to comprise “Broca’s” Area (Brodmann’s Area (BA) 44) and is linked to lower mouth motor cortex (36). The mouth motor and somatosensory regions show a complex overlapping network, perhaps indicating a significant degree of complex, distributed processing occurring within these regions. Figure 4 shows spectrograms for this patient during this verb generation task (background sulci map omitted, but the layout is identical to Figure 3). Note that many of the clustered regions also show similar spectral activity.
The strong late-onset low-frequency activity in PFC areas (16, 24, 32, 7, 15, 23, 31) seems to be reflected in the formation of several PFC clusters, perhaps indicating a shared processing mechanism (this is near PFC language area 46). Finally, Figure 5 shows previous clustering results using a hierarchical clustering algorithm (disjoint partitioning) based on high mutual information and low propagation delays between channels.

Details on this method are described in [6] and [7]. The interesting thing to note is that there is a distinct similarity in overall clustering (clusters of interest have been circled). There is a high degree of similarity in the clustering of Broca’s area, PFC, auditory cortex, premotor regions and some somatosensory areas. The high degree of overlap in ICA clustering of mouth motor and lower somatosensory areas may explain why these are not similarly clustered in the disjoint partitioning method. In a separate analysis, the seizure channels were not removed, and this resulted in tight clusters forming around channels that were known from clinical analysis (as well as post-operative results) to be seizure focal points. This suggests a possible method for identifying epileptic cortical areas based on this ICA decomposition.

4. Conclusions

ICA-based clustering appears to be a useful tool for discovering the spatial distribution of cortical areas engaged in specific task-dependent information processing. Furthermore, the ability of ICA clustering to group electrodes into overlapping spatial clusters based on the underlying structure of their time-series makes it a potentially promising tool for exploring the interaction of spatially separated cortical areas that are linked within a larger distributed processing network. Channels that are members of several clusters may play a mediating role between spatially separated, focal processes, and these can be identified for further exploration. An interesting area of further exploration is to perform spectral and ERP analysis on the independent components that dominate each cluster, as well as on those ICs that dominate channels with shared membership. This may yield valuable insight into the nature of individual computational processes as well as the manner by which different processes interact to allow a cognitive task to be performed.

Figure 1. 64-Channel subdural grid overlaid on patient anatomical scan. 2-dimensional grid overlay does not account for 3-D curvature of the cortex, and thus may not accurately represent the actual locations of the electrodes on the cortex. Yellow electrodes denote motor interrupt regions; blue electrode is language interrupt region.

[Figure 2 image labels: premotor; mouth motor; “Broca’s” Area; lateral sulcus; auditory cortex; associative auditory cortex]

Figure 2. Clustering results using the modified ICACLUS algorithm. k = 20, (C% * k) = 3, t = 5. Grayed-out electrodes were removed prior to ICA clustering due to epileptiform activity or poor signal-to-noise ratio. All channels within any circled area are part of the same cluster. Similarly, channels mutually connected with a solid line form a cluster (e.g., {44, 43}). Dashed lines denote links (overlapping connections) between clusters. For example, {5 13 21}, {36 45}, and {36 5 13} are all separate clusters. The dashed line connecting the pair {5 13} to channel 36 indicates the link channel 36 provides between the {5 13 21} and {36 45} clusters.

Figure 3. Spectrograms for a passive phoneme mismatch task. Note the high-gamma activity in the 4 bottom-rightmost electrodes, indicating their role in auditory processing. In a simple tone task, the upper two of these electrodes (50, 58) were found to be primary auditory (the other two are likely associative auditory, as determined by their discriminating pattern of activity with respect to words and nonwords). Ref: Fogelson et al., Society for Neuroscience Poster, 2006.

Figure 4. Spectrograms for the verb generation task. Ref: Canolty et al. (in preparation). Electrocorticographic evidence that high gamma activity differentiates processing of words and pseudowords.

Figure 5. Disjoint clustering results using a hierarchical, weighted-linkage function with distance computed as a function of channel mutual information and optimal delay. The distance between two channels, X, Y is computed as $D[X;Y] = e^{\beta \cdot dt(X;Y)} \cdot e^{-\alpha \cdot I(X;Y)} = e^{\beta \cdot dt(X;Y) - \alpha \cdot I(X;Y)}$ with α = 5, β = 1, where dt is the temporal offset (delay) that maximizes mutual information (MI) between X, Y and I is this maximum MI value. Line thickness denotes strength of MI connection, while line color denotes propagation delay offset. Arrows denote direction of propagation. The important thing to note is the cluster coloring. Channels that belong to the same cluster are identically colored; e.g., (5, 13, 21) form a cluster. Clusters that are well-matched with those in the ICA analysis are circled.
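The distance used for the disjoint clustering in Figure 5 can be written as a one-line function. This is only a sketch of the formula in the caption: the function name is hypothetical, and the units of the delay and of the mutual information are not stated in the text, so the example values are arbitrary.

```python
import math

def channel_distance(mi, delay, alpha=5.0, beta=1.0):
    """D[X;Y] = exp(beta * dt(X;Y) - alpha * I(X;Y)) from the Figure 5 caption.
    mi: maximum mutual information between the two channels.
    delay: temporal offset that maximizes it (units assumed, not given)."""
    return math.exp(beta * delay - alpha * mi)

# Higher mutual information or a shorter delay gives a smaller distance.
d_close = channel_distance(mi=0.5, delay=0.0)
d_far = channel_distance(mi=0.1, delay=1.0)
print(d_close, d_far)
```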

REFERENCES

1) A.J. Bell & T.J. Sejnowski (1995). An information-maximization approach to blind separation and blind deconvolution. Neural Computation 7:1129-1159.
2) S. Makeig, A.J. Bell, T. Jung & T.J. Sejnowski (1996). Independent Component Analysis of Electroencephalographic Data. Advances in Neural Information Processing Systems 8:145-151. MIT Press, Cambridge MA.
3) A. Hyvärinen & E. Oja (2000). Independent component analysis: algorithms and applications. Neural Networks 13:411-430.
4) A. Delorme & S. Makeig (2004). EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. Journal of Neuroscience Methods 134(1):9-21.
5) R. Vigario (1997). Extraction of ocular artifacts from EEG using independent component analysis. Electroenceph. Clin. Neurophysiol. 103(3):395-404.
6) T. Mullen (2005). Dynamic Identification of Focal Cluster ROIs Using Time-lagged Mutual Information. CS182 Final Project (Prof. Srini Narayanan).
7) G. Fuhrmann Alpert, T. Mullen, S. Suppiah, M. Soltani, E. Edwards, R. Canolty, S. Dalal, H. Kirsch, N. Barbaro & R.T. Knight (2006). Functional Connectivity and Information Flow in the Human Brain during Language Processing: evidence from ECoG data. Poster, Cognitive Neuroscience Society Conference, 2007 (abstract submitted).
8) E.H.C. Wu & P.L.H. Yu (2005). Independent Component Analysis for Clustering Multivariate Time Series Data. In X. Li, S. Wang, Z.Y. Dong (Eds.), ADMA 2005, LNAI 3584, pp. 474-482. Springer-Verlag Berlin Heidelberg.
9) A. Hyvärinen & E. Oja (1997). A fast fixed-point algorithm for independent component analysis. Neural Computation 9:1483-1492.
10) http://www.sccn.ucsd.edu/eeglab/
