Decoding Cognitive Processes from Functional MRI

Oluwasanmi Koyejo(1) and Russell A. Poldrack(2)

(1) Imaging Research Center, University of Texas at Austin, [email protected]
(2) Depts. of Psychology and Neuroscience, University of Texas at Austin, [email protected]

Abstract. The goal of cognitive neuroscience is to understand the brain processes that underlie cognitive function. These brain processes are studied by examining neural responses to experimental tasks and stimuli. While most experiments are designed to isolate a single cognitive process, the resulting brain images often encode multiple processes simultaneously. Thus, standard classification methods are inappropriate for decoding cognitive processes. We propose a multilabel classification approach for decoding, and present empirical evidence that multilabel classification can accurately predict the set of cognitive processes associated with an experimental contrast image.

1 Introduction

An important hypothesis in modern cognitive neuroscience is that brain function is decomposable into a set of elementary cognitive processes, representing the basis set of brain functions recruited for cognitive tasks [13]. For example, recognizing a face may require the cognitive processes of vision, working memory, and retrieval, while music comprehension may require, in addition to the shared cognitive processes of working memory and retrieval, the additional cognitive processes of rhythm and intonation. Cognitive neuroscientists and other researchers measure these processes in the laboratory setting by developing experiments that allow (e.g., via cognitive subtraction) the isolation of a specific cognitive process from other recruited processes. Unfortunately, despite careful selection of the stimuli and control tasks, the measured brain function often captures multiple cognitive processes simultaneously [8].

Functional magnetic resonance imaging (fMRI) has enabled the non-invasive measurement of brain function in response to experimental stimuli at fine spatial scales. From initial studies that used classifiers to discriminate between different classes of visual objects [4] to more recent studies showing large-scale classification across experiments [11], decoding from brain images has become an important research tool [7]. Decoding performance can be used to test hypotheses about the cognitive content of brain images. Further, the classifier parameters can be used to localize predictive voxels [3], or to select regions of interest for additional processing. In addition to the general scientific utility of decoding, the specific application to cognitive processes may help address additional scientific questions, such as which cognitive processes outlined in the literature represent true differences in brain function, and which merely reflect theoretical distinctions [10]. Despite these potential insights, direct decoding of cognitive processes from brain function has not been attempted before.

Table 1. Cognitive Process Labels Sorted by Prevalence in Data. The L = 22 labels, coded as capital letters A-V in order of prevalence: Vision, Action Execution, Decision Making, Orthography, Shape Vision, Audition, Phonology, Conflict, Semantics, Reinforcement Learning, Working Memory, Feedback, Response Inhibition, Reward, Stimulus-driven Attention, Speech, Emotion Regulation, Mentalizing, Punishment, Error Processing, Memory Encoding, Spatial Attention.

[Figure 1: bar chart omitted. Fraction of examples carrying each cognitive process label (y-axis from 0.00 to 0.40) over the label codes A-V of Table 1.]

Fig. 1. Fraction of Data with each Cognitive Process Label.

We study the decoding of cognitive processes from brain function measured via functional magnetic resonance imaging (fMRI) contrasts using a multilabel classification approach. Multilabel classifiers are designed to solve classification problems where each example may be associated with multiple labels, and are popular in several domains such as image processing and text processing [14]. We focus on the subclass of multilabel classification methods known as label decomposition methods [14], where the multilabel classification problem is decomposed into multiple binary classification problems. Our work is enabled by the recent availability of a large public fMRI database (OpenfMRI, openfmri.org) [9] and a large cognitive ontology labeled by domain experts (Cognitive Atlas, www.cognitiveatlas.org) [12]. Our results provide empirical evidence that the set of cognitive processes associated with an experimental contrast can be accurately decoded.

Notation: We denote vectors by bold-face lower case letters x and matrices by bold-face capital letters X. The set of real-valued D-dimensional vectors is denoted by R^D. Label sets are denoted by script capital letters S, with cardinality |S|.

2 Methods

Let x_n ∈ R^D denote the n-th brain volume, with voxels collected into a real-valued D-dimensional vector. The total number of brain volumes is N. Each brain volume is associated with a set of process labels S_n = {s_1, ..., s_K}, chosen from the full set of possible process labels L = ∪_{n=1,...,N} S_n, with |L| = L. Multilabel classification involves estimating a predictive mapping f : x_n ↦ S_n. There are several approaches in the literature for multilabel classification, including label decomposition, label ranking, and label projection methods [14]. We focus on label decomposition methods due to their simplicity, scalability, and ease of interpretation. Label decomposition methods separate the multilabel classification task into a set of binary classification tasks. A popular approach in this family is the One-Vs-All decomposition, where each binary classification model is trained to predict the presence or absence of a single label independently. We experimented with this multilabel decomposition approach using the following base classifiers: (i) the l2-regularized support vector machine (SVM) [2], (ii) l2-regularized logistic regression (Logistic) [1], and (iii) the l2-regularized squared-loss classifier (Ridge) [1]. Each sub-classifier was implemented using a linear model of the form f_l(x_n) = w_l^T x_n, where w_l ∈ R^D for all l = 1, ..., L is a real-valued weight vector. In addition, we experimented with a baseline multilabel classifier (Popularity) designed to approximate the dataset label statistics. To this end, the set of predicted process labels was determined based on prevalence in the training set. Specifically, the indicator denoting the presence of each label was drawn independently from a Bernoulli distribution with probability given by the fraction of examples in the training data containing that label.
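The One-Vs-All decomposition and the Popularity baseline described above can be sketched as follows. This is an illustrative reconstruction on synthetic data; the use of scikit-learn, the toy problem sizes, and all variable names are our assumptions, not the paper's actual implementation.

```python
# One-Vs-All multilabel decomposition: one independent l2-regularized
# binary linear classifier per cognitive process label.
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression, RidgeClassifier

rng = np.random.default_rng(0)
N, D, L = 100, 500, 22           # toy sizes; the paper has N=479, D=174264, L=22
X = rng.standard_normal((N, D))               # contrast images (z-statistics)
Y = (rng.random((N, L)) < 0.2).astype(int)    # N x L binary label-indicator matrix

models = {
    "SVM": OneVsRestClassifier(LinearSVC(C=1.0)),
    "Logistic": OneVsRestClassifier(LogisticRegression(C=1.0, max_iter=1000)),
    "Ridge": OneVsRestClassifier(RidgeClassifier(alpha=1.0)),
}
predictions = {}
for name, model in models.items():
    model.fit(X, Y)
    predictions[name] = model.predict(X)      # N x L predicted label indicators

# Popularity baseline: draw each label from a Bernoulli whose probability
# is the label's prevalence in the training data.
prevalence = Y.mean(axis=0)
Z_pop = (rng.random((N, L)) < prevalence).astype(int)
```

Each column of Y is fit by its own linear model, matching the independent per-label training described in the text; the l2 regularization strength (C or alpha) would be selected by cross validation in practice.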

3 Empirical Results

We compiled brain image data from the publicly available OpenfMRI database [9]. OpenfMRI contains pre-extracted z-statistic contrasts for each subject, computed using a generalized linear model. This data extraction was implemented using the FMRIB Software Library (FSL). Combining the whole-brain data with the standard brain mask resulted in D = 174,264 extracted voxels. We extracted N = 479 contrast images associated with 26 contrasts in the database. Further details on data preprocessing may be found in [9]. In addition to the brain volumes, we extracted a list of cognitive process labels associated with each experimental contrast. The list was curated starting from processes in the Cognitive Atlas [12] and refined by domain experts. The final set of L = 22 cognitive process labels is provided in Table 1. It is clear from Fig. 1 that some process labels are significantly more prevalent in the data than others; for example, vision is more than 20 times more prevalent than spatial attention. The data samples included an average of 3.5 process labels per example, with a maximum of 9 and a minimum of 1.

We evaluated the models using (label) Accuracy, Precision, Recall, Hamming Loss, and F1Score, metrics commonly applied for evaluating multilabel classification [14]. Let S_n represent the true process labels and Z_n the predicted process labels associated with the n-th example. The metrics are computed as:

  Precision    = (1/N) Σ_{n=1}^{N} |S_n ∩ Z_n| / |Z_n|,
  Recall       = (1/N) Σ_{n=1}^{N} |S_n ∩ Z_n| / |S_n|,
  Accuracy     = (1/N) Σ_{n=1}^{N} |S_n ∩ Z_n| / |S_n ∪ Z_n|,
  Hamming Loss = (1/N) Σ_{n=1}^{N} (1/L) |S_n Δ Z_n|,
  F1Score      = (1/N) Σ_{n=1}^{N} (2 · Precision_n · Recall_n) / (Precision_n + Recall_n),

Table 2. Mean (var) of Aggregated Performance Metrics. * represents models where all metrics are statistically significant (p < 10^-3) w.r.t. the permutation-based null distribution for the model.

              Accuracy     Precision    Recall       F1Score      1 - Hamming Loss
  SVM*        0.43 (0.03)  0.53 (0.03)  0.68 (0.03)  0.51 (0.03)  0.79 (0.01)
  Logistic*   0.44 (0.03)  0.53 (0.02)  0.68 (0.03)  0.52 (0.03)  0.79 (0.01)
  Ridge*      0.34 (0.02)  0.47 (0.02)  0.37 (0.02)  0.39 (0.02)  0.91 (0.00)
  Popularity  0.12 (0.01)  0.21 (0.02)  0.18 (0.03)  0.18 (0.02)  0.76 (0.01)

where A Δ B represents the symmetric difference of sets A and B. Label Accuracy measures the average fraction of process labels that are predicted correctly, relative to the cardinality of the union of the true and predicted process labels. Precision measures the fraction of predicted process labels that are relevant, and Recall measures the fraction of relevant process labels that are predicted. The F1Score combines Precision and Recall into a single score. Higher scores indicate superior performance for Accuracy, Precision, Recall, and F1Score, and the best possible score is 1. The Hamming Loss penalizes false positives and false negatives equally; lower scores indicate superior performance, and the best possible score is 0. To simplify comparison with the other scores, we present results as 1 - Hamming Loss. Further details on the metrics are available in [14].

All models were trained using 5-fold double-loop cross validation. The inner loop was used for parameter selection, and the outer loop was used to estimate generalization performance. The l2 regularization parameter for all models was selected from the set {10^2, 10^1.5, 10^1, ..., 10^-2.5, 10^-3}. We used the Hamming Loss metric for parameter selection; we also evaluated the other metrics for parameter selection and found the results to be qualitatively equivalent.

In addition to performance comparisons, we were interested in the statistical significance of the results. Hence, we computed an empirical null distribution by randomly permuting the process labels 1000 times and retraining the model. Note that the empirical null distribution was estimated separately for each trained model, so the presented statistical significance is model dependent. We computed statistical significance using a threshold of p = 10^-3, suggesting high confidence in rejecting the hypothesis that the performance scores were the result of chance.
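The permutation-based null distribution described above can be sketched as follows, substituting a simple cross-validated score for the full double-loop procedure. The scikit-learn usage, the synthetic data, and the reduced permutation count are our assumptions.

```python
# Permutation test sketch: permute process labels across examples, retrain,
# and compare the observed score against the resulting null distribution.
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.linear_model import RidgeClassifier
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
N, D, L = 80, 50, 5
X = rng.standard_normal((N, D))
Y = (rng.random((N, L)) < 0.3).astype(int)

def hamming_score(model, X, Y):
    """Cross-validated 1 - Hamming loss for a multilabel classifier."""
    Z = cross_val_predict(model, X, Y, cv=5)
    return 1.0 - np.mean(Z != Y)

model = OneVsRestClassifier(RidgeClassifier(alpha=1.0))
observed = hamming_score(model, X, Y)

n_perm = 100                      # 1000 permutations in the paper
null = np.array([hamming_score(model, X, Y[rng.permutation(N)])
                 for _ in range(n_perm)])
p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
```

Because the null distribution is estimated per model, the resulting p-value is model dependent, matching the caveat in the text. On this random data the observed score is expected to fall inside the null distribution.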
We found that the performance of SVM and Logistic was almost identical in aggregate (Table 2). Ridge was comparable to SVM and Logistic in terms of Precision, but performed worse in terms of Accuracy and Recall. On the other hand, Ridge significantly outperformed all other models in terms of Hamming Loss. To investigate these observations further, we computed per-label performance metrics as shown in Fig. 2. As expected, the overall trend of most of the metrics was correlated with the label imbalance, i.e., more common process labels were easier to predict. Our results show that Ridge was the most accurate model for prevalent process labels, but not for rare process labels. Surprisingly, some cognitive process labels, such as Speech, were well predicted by Ridge despite their rarity.

[Figure 2: per-label bar charts omitted, comparing SVM, Logistic, Ridge, and Popularity over label codes A-V in three panels: (a) Accuracy per Cognitive Process Label, (b) F1Score per Cognitive Process Label, (c) 1 - Hamming Loss per Cognitive Process Label.]

Fig. 2. Model Performance per Cognitive Process Label. For metrics other than Hamming Loss, label prevalence is highly correlated with classification performance. Ridge was especially accurate for the most prevalent process labels, but was relatively less accurate for rare process labels. The cognitive process labels are coded as capital letters A, ..., V (see Table 1). Figure is best viewed in color.

To investigate any systematic bias in classifier mistakes, we computed the classifier confusion matrices as shown in Fig. 3 (only the confusion matrices for Ridge and Logistic are shown due to limited space). Each row represents the average fraction of examples where the cognitive process label associated with the row was predicted as the cognitive process label associated with the column. Across process labels, it was clear that labeling mistakes were systematically in the direction of more prevalent process labels, i.e., the confusion matrices are brighter towards the left side. The cooler color for Ridge was mostly due to the high proportion of mistakes made for Spatial Attention, the rarest process label.

[Figure 3: heat maps omitted, two panels: (a) Logistic and (b) Ridge, with true labels A-V along the rows and predicted labels A-V along the columns.]

Fig. 3. Avg. of Normalized Confusion Matrices for Logistic and Ridge. The true process labels are along the rows, and the predicted process labels are along the columns. Each row represents the average fraction of examples where the cognitive process label associated with the row was predicted as the cognitive process label associated with the column. The matrix entries are scaled ×10^2 to improve readability. Cognitive process labels are coded as capital letters A, ..., V (see Table 1). Figure is best viewed in color.

Examining the right side of the confusion matrices, we note that Logistic sometimes classified examples of prevalent process labels as rare process labels, while Ridge rarely made such mistakes, at the expense of low accuracy on rare process labels. Recall that the cost of ignoring rare process labels is relatively low for the Hamming Loss compared to the other losses; this explains the relatively high Hamming Loss performance of Ridge. The empirical results suggest that a multi-classifier approach combining the advantages of the different classifiers may be effective. For example, Ridge could be used for predicting the most prevalent process labels and combined with Logistic for predicting rare cognitive process labels.
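One way to realize the suggested combination is to gate each label's prediction on its training prevalence. This is a minimal sketch under assumed synthetic data; the gating rule and the threshold value are hypothetical, not something the paper evaluates.

```python
# Hybrid sketch: use Ridge predictions for prevalent labels and Logistic
# predictions for rare labels, gated by a per-label prevalence threshold.
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.linear_model import LogisticRegression, RidgeClassifier

rng = np.random.default_rng(0)
N, D, L = 120, 60, 6
X = rng.standard_normal((N, D))
# Labels with prevalences spread from rare (0.1) to common (0.5).
Y = (rng.random((N, L)) < np.linspace(0.1, 0.5, L)).astype(int)

ridge = OneVsRestClassifier(RidgeClassifier(alpha=1.0)).fit(X, Y)
logistic = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, Y)

prevalence = Y.mean(axis=0)
use_ridge = prevalence >= 0.2          # hypothetical prevalence threshold
# np.where broadcasts the (L,) mask across columns of the (N, L) predictions.
Z = np.where(use_ridge, ridge.predict(X), logistic.predict(X))
```

In practice the threshold would itself be a hyperparameter, selected in the inner cross-validation loop alongside the regularization parameters.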

4 Conclusion

The decoding of cognitive processes is an important first step towards evaluating and verifying the latent processes the brain employs to complete various tasks. We have provided experimental evidence that cognitive processes can be accurately decoded from brain function using a multilabel classification approach, and we have studied some of the trade-offs that arise due to the imbalance of the process labels. We intend to further verify the decoding performance by evaluating additional multilabel classification methods from the literature [14]; this will also aid in understanding the trade-offs between different methods in the specific application to neuroimaging data. In addition, we plan to incorporate structured regularizers, such as total variation regularization [5] or Bayesian models for structured sparsity [6], that may help to localize the sources of classification performance, improving the interpretability of the results.

Bibliography

[1] Bishop, C.M.: Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag New York, Inc., Secaucus, NJ, USA (2006)
[2] Burges, C.J.C.: A tutorial on support vector machines for pattern recognition. Data Min. Knowl. Discov. 2, 121–167 (1998)
[3] De Martino, F., Valente, G., Staeren, N., Ashburner, J., Goebel, R., Formisano, E.: Combining multivariate voxel selection and support vector machines for mapping and classification of fMRI spatial patterns. NeuroImage 43, 44–58 (2008)
[4] Haxby, J.V., Gobbini, M.I., Furey, M.L., Ishai, A., Schouten, J.L., Pietrini, P.: Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science 293(5539), 2425–2430 (2001)
[5] Michel, V., Gramfort, A., Varoquaux, G., Eger, E., Thirion, B.: Total variation regularization for fMRI-based prediction of behavior. IEEE Transactions on Medical Imaging 30, 1328–1340 (2011)
[6] Park, M., Koyejo, O., Ghosh, J., Poldrack, R.R., Pillow, J.W.: Bayesian structure learning for functional neuroimaging. In: International Conference on Artificial Intelligence and Statistics (AISTATS) (2013)
[7] Pereira, F., Mitchell, T., Botvinick, M.: Machine learning classifiers and fMRI: A tutorial overview. NeuroImage 45, S199–S209 (2009)
[8] Poldrack, R.A.: Subtraction and beyond: The logic of experimental designs for neuroimaging. In: Hanson, S.J., Bunzl, M. (eds.) Foundational Issues in Human Brain Mapping, pp. 147–160. MIT Press, Cambridge, MA (2010)
[9] Poldrack, R.A., Barch, D.M., Mitchell, J.P., Wager, T.D., Wagner, A.D., Devlin, J.T., Cumba, C., Koyejo, O., Milham, M.P.: Towards open sharing of task-based fMRI data: The OpenfMRI project. Frontiers in Neuroinformatics (2013)
[10] Poldrack, R.A.: Inferring mental states from neuroimaging data: from reverse inference to large-scale decoding. Neuron 72(5), 692–697 (2011)
[11] Poldrack, R.A., Halchenko, Y.O., Hanson, S.J.: Decoding the large-scale structure of brain function by classifying mental states across individuals. Psychological Science 20, 1364–1372 (2009)
[12] Poldrack, R.A., Kittur, A., Kalar, D., Miller, E., Seppa, C., Gil, Y., Parker, D.S., Sabb, F.W., Bilder, R.M.: The Cognitive Atlas: Towards a knowledge foundation for cognitive neuroscience. Frontiers in Neuroinformatics 5 (2011)
[13] Posner, M.I., Petersen, S.E., Fox, P.T., Raichle, M.E.: Localization of cognitive operations in the human brain. Science 240(4859), 1627–1631 (1988)
[14] Zhang, M.L., Zhou, Z.H.: A review on multi-label learning algorithms. IEEE Transactions on Knowledge and Data Engineering 99(PrePrints), 1 (2013)
