NEUROIMAGE 4, 34–54 (1996)
Article No. 0027

Reproducibility of PET Activation Studies: Lessons from a Multi-Center European Experiment

EU Concerted Action on Functional Imaging

J-B. POLINE, R. VANDENBERGHE, A. P. HOLMES, K. J. FRISTON, AND R. S. J. FRACKOWIAK¹

The Wellcome Department of Cognitive Neurology, 12 Queen Square, WC1N 3BG, United Kingdom

Received April 9, 1996

PET activation studies are performed widely to study human brain function. The question of reproducibility, reliability, and comparability of the results of such experiments has never been addressed on a large scale. Recently, 12 European PET centers performed the same cognitive activation experiment in a European Union funded concerted action. The experiment involved a standardized and validated cross-lingual experimental and control task involving verbal fluency. Each center contributed at least 6 subjects. In total there were 77 subjects and 247 scans in each of the two conditions, giving 494 scans in total. We have analyzed each center’s dataset and pooled datasets using statistical parametric mapping. We present results that address the consistency of these analyses, discuss the factors that influence their sensitivity, and comment on a number of related methodological issues. We used a MANOVA to test for center, condition, and center by condition effects and found strong condition and center effects and weaker interactions. The main effect determining reproducibility was the overall sensitivity of the experiment, to which the scanner and number of scans contribute in a major way, with a marked advantage for 3D scanners and a large field of view. An important conclusion is that data from different centers can be pooled to improve the reliability of results, which is of particular importance for studies in patients with rare conditions. © 1996 Academic Press, Inc.

INTRODUCTION

Over the past decade positron emission tomography (PET) has provided one of the most powerful tools available for studying the relationship between human brain functions and neuroanatomy. Applications in this field are many and PET experiments have brought new insights into the functional anatomy of a number of behaviors. Occasionally, controversies have emerged when different laboratories were unable to replicate results (cf. Petersen et al., 1990; Howard et al., 1992; Price et al., 1994). Generally speaking, differences between PET activation experiments can be divided into those that depend on neuropsychological aspects (design of the various tasks, subject selection, etc.), those that depend on data acquisition parameters (PET scanner characteristics, reconstruction and correction algorithms, number of subjects scanned, the number of scans per subject, etc.), and those that depend on data analysis (including statistical analysis and image post processing). These factors can affect the end result of an experiment to a greater or lesser extent. For these reasons the European concerted action on functional imaging decided to study the reproducibility of the PET activation technique across laboratories. While it is relatively easy to analyze the same dataset with different methodologies (method comparison), it is much more difficult to gather a large number of comparable datasets (same protocol). Because of the great practical difficulties involved in such a project,

¹ On behalf of the PET centers in Cologne: K. Herholz, U. Pietrzyk, A. Thiel, Max-Planck-Institut für Neurologische Forschung, Germany. Copenhagen: I. Law, C. Svarer, K. Rune, C. Bonde, O. B. Paulson, Department of Neurology, Rigshospitalet, Denmark. Dusseldorf: P. Indefrey, Max Planck Institut für Psycholinguistik; R. J. Seitz, Department of Neurology, H. Heine University; and Julich: H. Herzog, Institute of Medicine, Research Centre, Germany. Essen: C. Weiller, S. Kiebel, M. Rijntjes, S. Müller, Neurology Clinic, University of Essen, Germany. London: L. Warburton, FIL, Wellcome Department of Cognitive Neurology, Queen Square, UK. Groningen: L. A. Stowe, A. M. J. Paans, A. A. Wijers, A. T. M. Willemsen, PET Center, University Hospital Groningen, The Netherlands. Leuven: R. Vandenberghe, P. Dupont, G. Orban, Department of Brain & Behaviour, Katholieke University Leuven, Belgium. Liege: P. Maquet, E. Salmon, C. Degueldre, CRC Universite de Liege, Belgium. Lyon: I. Faillenot, J. Decety, D. Comar, Hopital Neuro-cardio, Cermep, France. Milano: V. Bettinardi, S. F. Cappa, F. Fazio, M. Gorno-Tempini, F. Grassi, D. Perani, T. Schnur, G. Striano, INB-CNR Universities of Milan and Brescia, Scientific Institute HS Raffaele, Italy. Orsay: B. Mazoyer, N. Tzourio, F. Crivello, SHFJ-DRIPP-DSV, CEA, France. Stockholm: M. Ingvar, K. M. Petersson, G. von Heijne, Cognitive Neurophysiology, Department of Clinical Neuroscience, Karolinska Hospital, Sweden.

1053-8119/96 $18.00 Copyright © 1996 by Academic Press, Inc. All rights of reproduction in any form reserved.


LESSONS FROM A MULTI-CENTER PET ACTIVATION STUDY

little has been done so far to address this question. Among the few exceptions are the study by Senda et al. (1993), involving 3 Japanese PET centers, that addressed localization issues and a more recent study by Watson et al. (1995) that compared results of a word recognition study between two centers.

The results of our experiment have implications for two important aspects of functional brain mapping. First, they define the degree of confidence that can be invested in results from different centers and allow an objective assessment of findings that are not reproduced. Second, they address the question “can data from different centers be pooled during analyses?”, a facility that will be important when rare cases are studied independently at different sites.

Much work has been put into defining a common anatomical reference space to efficiently and reliably compare the results of brain functional imaging experiments. This is the main task of human brain mapping projects and related databases (see Fox et al., 1994). We address here some fundamental issues that should be of particular interest to the database community by providing information about interlaboratory variability. This information may in turn influence the way neuroscientists interrogate or construct functional mapping databases.

During 1994, a European Union concerted action was undertaken by 12 PET centers in 8 European countries (EU Concerted Action on Functional Imaging, 1995). While motor or sensory tasks have been shown to be highly reproducible (see Watson et al., 1993), replication of cognitive tasks is less certain, possibly because of the relatively greater organizational variability of higher brain functions. The 12 centers performed the same silent verbal fluency brain activation experiment.
The language activation paradigm was chosen on scientific and practical grounds; importantly, however, this paper is concerned only with methodological issues such as reproducibility, variance partitioning, and differential sensitivity. Data produced by the different laboratories had very different characteristics, due to the physical characteristics of the PET scanners used, reconstruction procedures, the number of scans collected per subject, the number of subjects scanned, the blood flow tracer used, and the subject characteristics. While a number of issues could be addressed with this dataset, in this paper we concentrate on three important themes:

• Reproducibility: How reproducible are the results from one center to another, despite differences between centers?
• Sensitivity: What increase in sensitivity is obtained when pooling data and how do the results compare with the results of analyses of data from individual centers?
• Variance analysis: How does the variance of this dataset partition and what are the sources of this variance?

Future work should address questions relating to the variability in location and in extent of activation foci across centers, which is a large topic on its own. While the main issues of this work were methodological, important neurolinguistic questions (for instance the involvement of area 47 in semantic processing and questions that might relate to processing differences due to language) could also be addressed. They are, however, beyond the scope of this article. The large number of centers involved in this experiment (77 subjects, almost 500 scans) is unique, representing the first large-scale experiment on the reproducibility of cognitive PET activation studies.

MATERIALS AND METHODS

Data Acquisition and Data Collection

Data were collected in 12 PET centers (Cologne, Copenhagen, Dusseldorf/Julich, Essen, Groningen, Leuven, Liege, London, Lyon, Milano, Orsay, Stockholm) in 8 countries and the complete dataset was gathered in London for analysis. There were great differences in the physical characteristics of the various scanners (sensitivities, resolution, fields of view (FoV)) and in the activity measurements (oxygen-15 was used to label either water or butanol). Tracer was delivered by infusion or bolus techniques, and acquisition duration varied between 40 and 120 s. The reconstruction procedures also differed in terms of reconstruction filter and scatter correction. Table 1 summarizes some of the physical characteristics and general acquisition parameters of each dataset.

Cognitive and Control Tasks

The control condition was silent rest, with eyes closed, and earphones in place. Subjects were asked to relax and avoid movements. The cognitive task was the silent generation of verbs related to nouns presented binaurally via earphones. Subjects were asked not to articulate. Nouns were presented every 6 s for 2 min, giving a total of 20 nouns per scan.
Nouns for the different languages were selected using a common set of pictures (Snodgrass and Vanderwart, 1980). The task order was randomized between subjects, although the randomization differed between centers (most centers chose to scan half of the subjects with an ABABAB design and the other half with BABABA, while others used more elaborate, Latin-square-like randomization techniques). Subjects were males, between 20 and 60 years old, and strongly right handed as assessed by the Edinburgh inventory. They were screened for auditory deficits, were taking no drugs, and had no present or past neurological disorders. Each subject was able to produce at least 2 verbs per noun during a training session. This training session consisted of an articulated generation of verbs to 30 spoken nouns and a silent generation to 15 spoken nouns. Each center agreed to scan at least 6 subjects with 6 acquisitions per subject (in accord with local ethics committees). However, experiments were purposely conducted with very little standardization in terms of data acquisition procedure. In particular, centers were free to proceed with their usual scanning paradigm, including the tracer injected, method of subject positioning, acquisition duration, scan start time, and so on. Not all centers succeeded in providing 6 subjects or 6 scans, while others scanned more subjects and/or acquired more scans per subject. Nevertheless, all datasets have been analyzed as received (see next paragraph). Overall, we collected a database of 77 subjects with a total of 494 scans.

TABLE 1
General Characteristics of the Dataset (Physical Characteristics of Scanners and Simplified Protocol Acquisition Parameters)

Center   Subjects × scans  Scan duration (s)  Language  Camera        Slice thickness (mm)  Slices  Axial FOV (cm)  2D/3D  Injection
Col      7 × 6             90                 Ger.      EcatHR        3.125                 47      15.0            3D     Bolus
Cop      10 × 6 (12)       90                 Dan.      GE-Advanced   4.25                  35      15.2            3D     Bolus
Due/Jul  6 (4) × 6         40                 Ger.      Scandi.PC-15  6.5                   15      10.4            2D     Bolus/But
Ess      6 (5) × 12        90                 Ger.      CTI-953       3.375                 15      5.4             2D     Infusion
Gro      6 × 4             90                 Dut.      CTI-951       3.375                 31      10.8            2D     Bolus
Leu      6 × 6             40                 Dut.      CTI-931       6.75                  15      10.8            2D     Infusion
Lie      6 × 6             120                Fre.      CTI-951       3.375                 31      10.8            2D     Infusion
Lon      6 × 8 (12)        90                 Eng.      CTI-953B      3.375                 31      10.8            3D     Slow bolus
Lyo      6 × 6             90                 Fre.      TTV03         9.0 (12)              7       8.1             2D     Bolus
Mil      6 × 6             90                 Ita.      CTI-931 (1R)  6.75                  7       5.4             2D     Bolus
Ors      6 × 6             80                 Fre.      CTI-953B      3.375                 31      10.8            2D     Bolus
Sto      6 × 6             100                Swe.      EXACT HR      3.125                 47      15.0            2D     Bolus/But

Note. All scans were corrected for attenuation by measured transmission scans. Subjects × scans column: the number of subjects with a full number of scans is in parentheses; for the number of scans, the number of scans acquired, as opposed to the number of scans used in the analysis (some scans were acquired under some other experimental conditions), is in parentheses. Slice thickness column: the slice separation, when it differs from the slice thickness, is in parentheses. Last column: But, 15O-butanol. Other abbreviations: Ger, German; Dan, Danish; Dut, Dutch; Fre, French; Eng, English; Ita, Italian; Swe, Swedish.

Individual Center Analysis

Individual center datasets were first analyzed with the latest version of statistical parametric mapping (SPM95, Wellcome Department of Cognitive Neurology; Friston et al., 1995a). Scans were stereotactically normalized into Talairach space in preparation for pixel by pixel analyses and filtered with a Gaussian kernel of 12 mm FWHM in the x, y, and z axes. Stereotactic normalization used affine and quadratic transformations (Friston et al., 1995). We did not try to match the resolutions of the final statistical maps, as these depend on the initial image resolution and on the

spatial structure of the residual variances which, perforce, varied across centers. Instead, the same filter was applied to all datasets. Each of the 12 datasets was analyzed using a completely randomized block design. ANCOVA was used to normalize to a mean global flow of 50 ml/100 ml/min and an SPM{t} statistic was constructed for the contrast of condition effects. Equivalent Z statistic volumes (SPM{Z}) were assessed for significant regions by voxel intensity (Friston et al., 1991; Worsley et al., 1992). Supra-threshold cluster sizes and significance values (Friston et al., 1994a) were also computed for completeness but not used to select significant regions unless explicitly stated. A direct comparison of each center’s results used the set of regions found at the 5 and the 20% significance levels. The second threshold was chosen to limit the risk of missing true positives. A third threshold was chosen to define a set of regions with a very low probability of including a false positive. These regions survived a correction for the number N of centers analyzed (excluding the 3 centers with small fields of view: N = 9, i.e., $\alpha = 1 - (1 - 0.05)^{1/N} = 0.0057$). For each center dataset, we tried to analyze data optimally, leaving out subjects that were badly positioned in the scanner (a total of 4) but including as many scans as possible (for instance keeping subjects for which a scan was missing). We wanted to focus on a comparison of results as if they were obtained in individual laboratories so that, for example, we did not match the datasets for number of scans and/or subjects.

Pooled Analysis

A group of 9 centers with fields of view greater than 10 cm were analyzed together (57 subjects, 347 scans),


using a split plot design implemented in SPM95 with a regression coefficient for global effects specific to each center. Within this model, inferences regarding overall activation and activation × center were performed. We were mainly interested in the condition effect but we also tested for some center by condition interactions and for some individual center effects to show the impact of increased degrees of freedom (these analyses were performed on a subset of 7 centers described below). All computations were performed on a SUN Sparc 20 workstation (SUN Inc.) with 64 Mb of RAM (the pooled analysis was computer intensive and took 5 h to run on this machine).

Pooled Analysis of Datasets Containing the Lower Part of the Brain

A subset of centers was chosen to analyze the lower part of the brain. Six centers were chosen because they included this region during scanning. This subset included 41 subjects and 252 scans and was used to assess the significance of regions below z = −12 in the standardized coordinates (Talairach and Tournoux, 1988).

Reporting Anatomical Regions

Significant regions were located in the Talairach and Tournoux atlas using the lists of locally identified activation maxima provided by SPM. On some occasions, significant Z scores were found that did not correspond to any local maximum. This was the case especially in the pooled analysis maps that were smoother than individual maps due to noise reduction. These regions were also reported and we have indicated that a local maximum was not found (see Table 3).

Singular Value Decomposition (SVD)

Singular value decomposition reveals the spatial and/or temporal patterns that account for the largest portion of the variability of the data in a noninferential manner.
In this analysis we excluded centers with fewer than 6 subjects or fewer than 6 scans, leaving 7 centers with large fields of view (FoV) and equal numbers of subjects, and scans per subject, to ensure that components due to different centers would contribute equally. (We arbitrarily chose the first 6 subjects and the first 6 scans in datasets with more scans or subjects.) A singular value decomposition (SVD; Friston et al., 1993, 1994b; Strother et al., 1995) was performed on the whole series of adjusted within center condition effects. Therefore, the data subjected to SVD were corrected for global activity and subject (or block) effects, ensuring that subject effects did not confound other effects of interest. Although it is possible to perform SVD on the individual (adjusted) data, this yields a data matrix 6 times greater than that actually used (42 × 29000), leading to impractical data processing and inclusion of confounding subject effects. SVD was also performed on the average condition effect of scan effects across centers to remove differences due to centers.

Multivariate Analysis of Variance (MANOVA)

We were unable to examine all the different variance components due to the vast variety of acquisition/reconstruction parameters specific to each center. In other words, there were many more effects to account for than centers. However, we performed a multivariate analysis of variance to test for condition, center, and center by condition interaction effects. These effects are pairwise orthogonal to each other. Note that this would not be the case in a dataset where the number of scans differed between centers. Although this analysis does not have localizing power, it provides an overall significance for these three effects, and more importantly gives a quantitative measure of the relative importance of each of them. Note that this analysis does not need to be performed on the data in the original image space. For computational purposes, we first reduced the dimensionality of the dataset through a singular value decomposition. Results in both spaces are identical if all components of the SVD are used in the MANOVA (Chatfield and Collins, 1980). In this work we only used the first 8 components, which explained more than 83% of the variance (increasing the number of components had little effect on the results). C, A, and I denote the center, condition (activation), and interaction effects, respectively, and P values were obtained using Wilks’ Λ statistic, comparing the determinants of the residual matrices of the full model (model including all three effects) and of the model excluding the effect to be tested.
For instance, to test for the center effect, we have $\Lambda(p, h, r) = \det(\mathrm{CBI})/\det(\mathrm{BI})$, where CBI is the residual matrix of the full model, BI is the residual matrix of the model excluding the center effect, and p, h, r are, respectively, the dimensionality of the data, the degrees of freedom of the full model, and that of the restricted model (see Friston et al., in press, for a more complete description of this technique).

ANOVA on SVD First Four Axes

A similar univariate (ANOVA) analysis was applied to the first four components of the singular value decomposition. These explained 25.2, 23.3, 12.5, and 8.1% of the variance, respectively. The variances due to the center, condition, and interaction effects were partitioned and significance was tested with the usual F statistic.
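The determinant ratio used for the MANOVA can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the authors' SPM95 code: the data are synthetic, the helper names (`effect_code`, `residual_sscp`, `wilks_lambda`) are hypothetical, and the balanced layout (7 centers, 3 task and 3 rest observations each, 8 retained components) simply mirrors the dimensions quoted in the text. P values are not computed here.

```python
import numpy as np

def effect_code(labels):
    """Sum-to-zero coding: one column per level except the last, which is coded -1."""
    levels = np.unique(labels)
    M = np.zeros((len(labels), len(levels) - 1))
    for j, level in enumerate(levels[:-1]):
        M[labels == level, j] = 1.0
    M[labels == levels[-1], :] = -1.0
    return M

def residual_sscp(Y, X):
    """Residual sums-of-squares-and-cross-products matrix of Y after regression on X."""
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    R = Y - X @ beta
    return R.T @ R

def wilks_lambda(Y, X_full, X_reduced):
    """det(residuals of full model) / det(residuals of model without the tested effect)."""
    return np.linalg.det(residual_sscp(Y, X_full)) / np.linalg.det(residual_sscp(Y, X_reduced))

# Hypothetical balanced layout: 7 centers x (3 task + 3 rest) observations, 8 components.
n_centers, n_per_condition, n_components = 7, 3, 8
center = np.repeat(np.arange(n_centers), 2 * n_per_condition)
condition = np.tile(np.repeat([0, 1], n_per_condition), n_centers)

# Synthetic scores: noise + a strong center effect + a condition effect on component 0.
rng = np.random.default_rng(0)
Y = rng.standard_normal((center.size, n_components))
Y += 3.0 * rng.standard_normal((n_centers, n_components))[center]
Y[:, 0] += 2.0 * condition

ones = np.ones((center.size, 1))
C = effect_code(center)                                      # center effect columns
A = effect_code(condition)                                   # condition effect column
I = np.hstack([C[:, [j]] * A for j in range(C.shape[1])])    # interaction columns

X_full = np.hstack([ones, C, A, I])
lam_center = wilks_lambda(Y, X_full, np.hstack([ones, A, I]))
lam_condition = wilks_lambda(Y, X_full, np.hstack([ones, C, I]))
lam_interaction = wilks_lambda(Y, X_full, np.hstack([ones, C, A]))
print(lam_center, lam_condition, lam_interaction)
```

Values close to 0 indicate that removing the effect inflates the residuals considerably; values close to 1 indicate that the effect explains little. With a single data column the same ratio reduces to the residual sum of squares of the full model over that of the reduced model, which is the quantity underlying the univariate F test applied to individual SVD axes.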


FIG. 1. Typical output of an SPM analysis of the dataset of one laboratory (in this case the Leuven PET center). The sagittal, coronal, and axial maximum intensity projection maps are thresholded at Z = 3.09. The top right panel shows the design matrix of the general linear model used to partition the data (6 blocks corresponding to the 6 subjects, 6 scan effects, and global activity as a covariate of no interest). The SPM{Z} shows voxels that survived an intensity or spatial extent statistical threshold of 0.2.


FIG. 3. Pooled SPM analysis of the 9 centers with a large field of view PET camera. The design matrix implemented a study (center)-specific ANCOVA for the correction of global activity in the brain. Notice the very high Z score reached by this analysis (Z > 17). The Z map has been thresholded such that every voxel in this figure survived a multiple comparison correction for an overall level of 5% (Z = 3.72).


FIG. 2. Sagittal, coronal, and axial maximum intensity projections of the statistical parametric maps for each of the 12 centers. The Z maps were thresholded at Z = 3.09. Row by row, from left to right and top to bottom: Copenhagen, London, Cologne, Liege, Leuven, Stockholm, Groningen, Orsay, Dusseldorf/Julich, Essen, Lyon, Milano.

Examples of Center by Condition Interactions

To obtain specific examples of interactions between centers and condition, we tested the interaction between pairs of centers using the above dataset (7 centers included: the number of centers included does not matter in this instance and indeed we could have used a larger group of datasets at the expense of more computationally expensive analyses).

Effect of Pooled Variance Estimates

Finally, the group of 7 centers was used to investigate the effects of working with a “variance map” averaged across laboratories, in terms of the sensitivity and the specificity of an individual center analysis. Again, the entire dataset could have been used but a restricted group was chosen due to practical considerations.
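The correction for the number of centers described under Individual Center Analysis, α = 1 − (1 − 0.05)^(1/N) with N = 9, can be checked numerically. The sketch below is only an arithmetic illustration; the function name `per_test_alpha` is hypothetical, and the formula is the one commonly referred to as the Šidák correction for independent tests (a name the text itself does not use).

```python
def per_test_alpha(family_alpha, n_tests):
    """Per-test level such that n_tests independent tests have the given overall level."""
    return 1.0 - (1.0 - family_alpha) ** (1.0 / n_tests)

# 9 centers with large fields of view, overall level 5%.
alpha_per_center = per_test_alpha(0.05, 9)
print(round(alpha_per_center, 4))   # 0.0057, the value quoted in the text

# Sanity check: applying the per-center level to 9 independent analyses
# recovers the overall 5% risk of at least one false positive.
overall = 1.0 - (1.0 - alpha_per_center) ** 9
print(overall)
```

The correction assumes that the analyses of the individual centers are independent, which is plausible here because each center scanned different subjects.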

RESULTS

Individual Center Analyses

Figure 1 gives an example of a typical result for a single center analysis with 6 subjects and 6 scans (in this instance results are from the Leuven center). To compare visually the results of all these single center analyses, we present SPM maximum intensity projections in the sagittal, coronal, and axial orientations (Figs. 2a, 2b, 2c), all thresholded at Z = 3.09. Only voxels or clusters that survived a corrected statistical threshold of 0.20 are shown to reveal main patterns of activity. The high consistency between these patterns is seen clearly, especially among centers with high sensitivity (due either to 3D acquisition mode or to large numbers of subjects, e.g., Cologne, Copenhagen, Hammersmith, Stockholm). On the other hand, studies with smaller numbers of scans or subjects give noisier maps


and the patterns are generally less similar. Note that the bottom row shows centers with small fields of view, which explains the absence of signal in the higher or lower parts of the brain. Some of the variation seen is due to differences in detection sensitivity (due to scanner sensitivity, and more importantly, number of scans and number of subjects). The highest Z scores ranged from 4.0 to 9.4 (average: 5.8), and the number of voxels above a Z = 3.09 threshold ranged from 237 to 7519. A better idea of the consistency between maps would be obtained if different statistical thresholds were specified for each center’s dataset to counterbalance (as far as possible) the differential sensitivities. However, this was not our aim. Characteristics of the single center analyses are summarized in Table 2. Resolutions of SPM{Z} maps ranged from 15 to 25 mm (FWHM) for an initial Gaussian filter of 12 mm FWHM and, as expected, the resolution was influenced by the degrees of freedom of the residuals (Worsley et al., 1992). Given the intrinsic variability of the smoothness assessment (Poline et al., 1995), values found in different SPM{Z}

maps were generally similar. The number of voxels and resels in an analyzed volume was determined by the dimensions of the field of view. The degrees of freedom associated with a single center analysis were usually 24 (36 scans − 6 conditions − (6 − 1) subjects − 1 confound), reflecting the number of scans (centers acquiring data in 3D often scanned more frequently, leading to greater stability of the SPM{Z}). We located the significant (risk of error 5% per center) local maxima in the Talairach and Tournoux atlas. Table 3 presents the set of regions found. Not all the regions found in individual center analyses are reported in this table but the list of other significant findings is presented in the Appendix. Regions that survived a correction for the number of centers tested ($\alpha = 1 - (1 - 0.05)^{1/9} = 0.0057$) are marked with a “+” in this table and represent regions that are very unlikely to contain false positives. On the other hand, we also report the regions found at a lower significance level in the temporal lobe. If these regions are true positives then there is a reduction in the risk of missing such


activations with an increase in the risk of false positives. See, for instance, how the signal in the left middle temporal lobe is detected by almost all studies at the α = 0.25 level. Note in this particular example that the only dataset in which a signal is not detected in this region had only 4 scans per subject. In other words, an important component of the apparent variability in Table 3 can be explained by the statistical threshold at which we chose to report our results. Shaded cells represent regions that were not consistently scanned by a particular center. The number of false positives expected at the α = 0.2 level is one activation every 5 independent analyses. To assess the number of inconsistent activations caused by lowering the threshold, we determined which activations appeared at P < 0.2 and were not found in any of the centers at P < 0.05 nor in the pooled analysis. Only one such region was found: the right posterior inferior temporal sulcus, at the border between BA37 and 39. It is left to the researcher to decide whether this is an acceptable risk or not. Results reported at that level (0.2) should be regarded

with caution and ideally reproduced to sustain any neurobiological conclusion drawn from them.

Pooled Analysis (Including the Lower Part of the Brain)

Figure 3 presents the results of the pooled analysis. Because of the massively significant response due to the very high sensitivity of the analysis, the display cutoff threshold was set such that each voxel above the threshold survived a correction for multiple comparisons (at the α = 0.05 level). A large number of structures in the right hemisphere were found to be very significantly activated (e.g., MTG, MFG, Insula). The highest Z score was about 3 times the highest Z score obtained in the single center analyses. Because of the smoothness of the pooled SPM{Z} map, a number of highly significant areas did not show up as local maxima. A corollary that we develop in the discussion is that one has to be careful interpreting local maxima in highly smoothed data. We interrogated the pooled SPM directly and significant regions without corresponding


TABLE 2
Summary of the Results of Individual Center Analysis

Center  k Vox.  W (x y z)  Resels  df   Lower z  Higher z  Max Z score
Col     67      18 19 18   167     29   −28      60        6.0
Cop     74      21 22 25   104     44   −28      70        9.4
Due     55      18 19 20   123     18   −16      54        5.3
Ess     27      22 22 20   46      43   −2       36        5.9
Gro     62      14 20 20   177     14   −28      52        4.8
Leu     64      18 20 22   130     24   −28      54        5.7
Lie     67      23 24 22   86      24   −28      54        6.0
Lon     67      21 21 21   119     34   −28      58        7.4
Lyo     44      16 19 20   115     24   −20      32        5.1
Mil     25      17 18 18   73      24   8        36        4.3
Ors     64      18 19 19   155     24   −16      56        5.2
Sto     67      20 22 22   107     24   −28      64        5.3
Pool    44      29 28 29   30      231  −16      52        17.3

Note. Column 2: number of voxels analyzed (×10³). Column 3: Smoothness (FWHM) in xyz in mm. Column 5: Degrees of freedom in SPM analysis. Column 6: Highest and lowest Z coordinate in the Talairach and Tournoux atlas. Column 7: Maximum Z value found in the statistical map (SPM{Z}) (note that this last figure does not always give a fair idea of the sensitivity of the analysis).
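The degrees of freedom in Table 2 follow the rule given in the text for a single center analysis: scans, minus scan (condition) effects, minus subjects less one, minus one confound. The sketch below applies that rule to a few centers; the helper name `residual_df` is hypothetical, the (subjects, scans per subject) pairs are read from Table 1, and the calculation assumes that every listed subject contributed the full number of scans, which is why centers with incomplete data are left out.

```python
def residual_df(n_subjects, scans_per_subject, n_confounds=1):
    """Residual degrees of freedom for a randomized block design with one global covariate."""
    n_scans = n_subjects * scans_per_subject
    return n_scans - scans_per_subject - (n_subjects - 1) - n_confounds

# (subjects, scans per subject) as listed in Table 1; complete datasets only.
layouts = {"Col": (7, 6), "Cop": (10, 6), "Gro": (6, 4), "Leu": (6, 6), "Lon": (6, 8)}
df_by_center = {name: residual_df(*layout) for name, layout in layouts.items()}
print(df_by_center)
# {'Col': 29, 'Cop': 44, 'Gro': 14, 'Leu': 24, 'Lon': 34}
```

These values agree with the df column of Table 2 for the same centers, and they show how strongly the residual degrees of freedom depend on the number of scans per subject.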

local maxima are noted with an asterisk in Table 3. Comparing results from the meta-analysis (regions that survive the 0.0057 level in at least one dataset) and the pooled analysis, it is clear that the averaging strategy is much more sensitive (giving additional significant results, e.g., left SFG, left IFS, right IFG, etc.). Obviously an important constraint is that the volume analyzed is restricted to the FoV common to all scans. Activations that were found using the “lower pooled analysis” are indexed with a small “l.” It is remarkable that all regions reported in the pooled analysis are found in at least one or two centers. This means that none of the biological signals we were trying to detect is so small that it could not be detected within the dataset of one laboratory. It is no less remarkable that there is no significant region (at α = 0.05, corrected for multiple comparisons) in the individual center results that is not detected by pooled analysis. This finding suggests that there are very few false positives in results of single center studies at the threshold chosen (α = 0.05, corrected). Differences in the results are also important but, unfortunately, more difficult to comment on. Specifically, the source of these differences is not easy to trace, given confounding factors such as different subjects, scanners, etc.

Singular Value Decomposition (SVD)

The first two eigenimages of the singular value decomposition of the 7 centers data (with similar fields of view and number of scans per subject) are presented in Fig. 4. Figure 5 shows the scores for the first four axes explaining 69.1% of the total variance. In Figure 5


each color corresponds to one center: for each the first three observations relate to the experimental task while the last three correspond to the resting control. We observe that the condition (activation) effect is mainly expressed in the first axis (which also includes some center effects). This is confirmed by the eigenimage. The second, third, and fourth eigenimages (or axes) reflect mainly center effects, although these axes are not free from condition effects. Note that variations in the analysis due, for instance, to other global activity normalization procedures or other versions of stereotactic normalization produce slightly different SVD results, but do not change their interpretation.

On removing the center effects from analysis (by averaging across centers, see Fig. 6), we found that the first axis accounted for more than 87% of the variance, clearly reflecting the condition effect (again, the first three scans represent the experimental task and the last three the resting control, see Fig. 6). Not surprisingly, the first component in image space is very similar to the pooled SPM{Z} testing for a condition effect, with maximum variation in the inferior and medial frontal areas as well as in the SMA. The second component shows a pattern which clearly reflects a time effect. In most cases, subjects were scanned according to an “ABABAB” design (“A” being the experimental task for half the subjects and the rest condition for the other half). In image space, areas of maximum positive variation (activity following the scan pattern) were located in the left parietal cortex. This axis, however, represents less than 7% of the variance. Other axes are less easily interpretable and account for very little of the variance (third axis: less than 3%).
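The percentages of variance quoted for the SVD axes correspond to squared singular values divided by their sum. The sketch below illustrates this on synthetic data; it is not the authors' pipeline. The matrix has 42 rows (7 centers × 6 scan effects, first three task and last three rest, as in the text), but the number of columns is reduced from roughly 29000 voxels to 500, and the activation pattern, its amplitude, and the noise level are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# 7 centers x 6 scan effects; +1 = experimental task, -1 = resting control.
condition = np.tile([1.0, 1.0, 1.0, -1.0, -1.0, -1.0], 7)
n_voxels = 500                                  # stand-in for ~29000 voxels
pattern = rng.standard_normal(n_voxels)         # hypothetical activation pattern

# Synthetic data: condition effect along one spatial pattern plus unit noise.
data = 2.0 * np.outer(condition, pattern) + rng.standard_normal((condition.size, n_voxels))

# Economy-size SVD: rows of Vt are eigenimages, columns of U * s are scores.
U, s, Vt = np.linalg.svd(data, full_matrices=False)
explained = s**2 / np.sum(s**2)                 # fraction of variance per axis
scores = U * s

print(np.round(100 * explained[:4], 1))
print(abs(np.corrcoef(scores[:, 0], condition)[0, 1]))

# The four percentages reported in the paper add up to the quoted total.
reported_total = round(25.2 + 23.3 + 12.5 + 8.1, 1)   # 69.1
```

In this synthetic example the first axis carries the condition effect by construction, so its scores separate task from rest; in the real data the first axis also contained center effects, which is why the authors repeated the decomposition after averaging across centers.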
Multivariate Analysis of Variance (MANOVA)

The results of the MANOVA performed on the dataset were significant for all three effects (center: P < 10⁻¹³, condition: P < 10⁻¹², and center by condition interaction: P = 0.01, see Fig. 7), with the center effect explaining most of the variance. Although the interaction effect was significant in the MANOVA, it was much weaker than the center or the condition effect. The same observation applies to the ANCOVA results (performed on the first 4 SVD axes), where the percentages of variance explained by the center (C), activation (A), and interaction (I) effects for the first two axes were, respectively, 21.8% (C), 74.3% (A), 3.8% (I) and 93.3% (C), 6.5% (A), 0.2% (I) (Fig. 8). This means that although the contribution of the interaction term (differences in subject responses to condition between centers) was significant, it contributes very little variance to the data along the first two eigenimages (axes that accounted for almost 50% of the variance). Note that the interaction may be due to two different factors: responses to the protocol may be greater in some centers than in others due to greater physiological activations and better signal to noise ratio (not to be


confused with the overall sensitivity of the analysis) or alternatively groups of subjects may respond in different ways due to different strategies of processing (e.g., induced by language differences), scanning conditions, or subject characteristics (we see an example of such an interaction below).

An Example of a Specific Center by Condition Interaction

We present in Fig. 9 the results of a contrast testing for the interaction between two centers and the condition effect. In both interactions (center A (Orsay) versus center B (Liege), and vice versa) very significant interactions were found in regions that were not generally involved in the main condition effect, supporting the conclusion that subjects in these two groups showed similar activations in these regions but responded in different ways in other brain areas. The unexpected response seen in occipital cortex is easily explained by an incident during the acquisition of data from center A: a subject opened his eyes during the activation condition (see Crivello et al., 1995). We chose this example to make the following methodological comment: the response of a single subject (here, an important visual response) can be powerfully detected even when averaged over a group of 6 subjects. However, although such a signal is visible in the original dataset and close to significance (P = 0.15 for the peak intensity test (PI), P = 0.28 for the spatial extent test (SE); all P values are corrected), the interaction detected the

activation much better (P = 0.015 (PI) and P = 0.002 (SE)) than the original center analysis. When testing for other center by condition interactions, significant results were often found, even after Bonferroni correction for the number of comparisons considered. Results, however, were usually less clearly interpretable because they could not be attributed to a specific element of the design or implementation of the protocol.

Increase of Sensitivity with the Number of Scans Analyzed

A simple way of increasing the sensitivity or power of a single center analysis is to include more data (more subjects or other groups of subjects), even if the effect tested is restricted to one particular group of subjects. We demonstrate this by showing (see Table 4) the Z scores obtained with an individual center analysis and from the multistudy design while testing for exactly the same effect (i.e., for one center). The maximum Z score increased by 1.27 (from 5.72 to 6.99), while most Z scores increased by about one unit. The total number of voxels above a given threshold also increased (for the 4 clusters reported in Table 4: 2850 to 4126). The location of peaks was only slightly affected. This substantial increase in sensitivity has the usual important implications, e.g., a potential for reduction of dose per individual. There is a simple theoretical reason why the sensitivity is increased. Adding further groups of subjects does not increase signal but reduces uncertainties in the estimate of the residual variance. In other words,

TABLE 3 Regions Found at the 0.0057 (+), 0.05, and 0.2 (−) Levels for the Individual Anonymous Center Analyses

Center Col Cop Du/Ju Ess Gro Leu Lie Lon Lyo Mil Ors Sto P (9)

MTG STS

ITG L2.25 L L2

L1 L1 L1 L2

12345 12345 12345 12345 L L1 L1

L2 R2 L2 R2 L1 R2.56 L2

STG

R1 L L L R L-.26

SFG

IFG

IFS

L L1 R2 L1 L1 L1 L L L L1 L L1 R L L2 L2 L L L

L 1234 1234 L

R L1 R1 L1 R2 L1

1234 12345 12345 12345 1234 12345 1234 1234 12345 12345 12345 1234 12345 1234 L2.25 R L2.46

L1

L*1

L

L

L2.53 R

L*

L

L*1

R* 1

L*

MFG

L L L L L

R

Cin Pre SMA Cen GFd Sul M1 M1 M1 M1 M M1 M1 M1 M2

L1 L1 L L1

R*

L L1 L1 R1 L1 R R2

Orb. Fr. IPar R R2

M1 M

L

M

L

Th

L

R2 R

L

R

L

L

123 1234 123 123 1234 123 123 1234 123 M 1234 123 1234 123 123 1234 1234 123 123 1234 1234 1 123 23 123 123 M11234 R

R2 R*

R1 R1

R1 L1 M1 L1 R1 L1 M L2 R1 L1 M

L1 L

L1

Cer.

M1 L M1

L2

R2 123 1234 1234 123 1234 1234 L2 L*

Put. BG Ve.

L

L L1 L1 R1 L R1

L2

L

Ins

R*

L*

M

L1

R1

Note. In the temporal regions we also report regions with higher risk of error (in this case the P value is reported as well). The differential sensitivity between these analyses is due not only to physical factors (e.g., 3D acquisition), but also to the different numbers of subjects or scans. Shaded cells correspond to regions (of the temporal lobe) that were not scanned. In the pooled dataset (P(9)), the * corresponds to significant regions not represented by any local maximum (due to the smoothness of the pooled dataset). ITG, inferior temporal gyrus; MTG, middle temporal gyrus; STS, superior temporal sulcus; STG, superior temporal gyrus; SFG, superior frontal gyrus; IFG, inferior frontal gyrus; IFS, inferior frontal sulcus; MFG, middle frontal gyrus; Cin, cingulate; SMA, supplementary motor area; GFd, medial frontal gyrus; Pre Cen Sul, precentral sulcus; Ins, insula; Orb. Fr., orbitofrontal; IPar, inferior parietal cortex; Th, thalamus; Put., putamen; BG, basal ganglia; Ve., vermis; Cer., cerebellum; subscript l, lower pooled analysis.

FIG. 4. First two axes of the singular value decomposition in image space (see Fig. 5 for the scan effect space) showing pixels that varied the most in the entire dataset (block and global activity effects have been previously removed). The first axis is clearly a condition effect while the second axis presents a center effect that still clearly contains a component due to a condition effect (cf. Fig. 5).

FIG. 5. Singular value decomposition in “scan effect” space. The first 4 axes account for 68% of the variance. Each color corresponds to one center, and the first 3 observations of each correspond to the activation state, while the last 3 observations correspond to the resting state. The condition (activation) effect is clearly seen on the first axis, while other axes showed a clear center effect, although they are not entirely free from a condition effect. The first center that shows a behavior different from the other centers in the second axis is Cologne. Unfortunately, we have no definite explanation why the subjects from this center should show such a difference (see Discussion for some hypotheses).


we average noise but not signal. Not only should this increase the signal to noise ratio in an area where changes are genuine, but it should also prevent false positives, because the residual variance is better estimated. Therefore the Z scores may decrease in some areas, meaning that the residual variance is in fact greater than was originally computed with a smaller number of scans in this specific location. (Pros and cons of the use of a pooled variance map are further developed under Discussion.) Note that this point is general and is not restricted to the framework of this analysis. (This point was also experimentally confirmed when analyzing other center datasets with and without pooling the residual variance.)

DISCUSSION

Consistency between the Results and Related Issues

In this work, we have been concerned with the consistency of the results of cognitive activation experiments when performed in different laboratories (i.e., not matching for the number of scans or the volume of the brain analyzed). An important point is that differences in sensitivities should not be confused with a lack of consistency. Apparent failures to reproduce results are often due to statistical thresholding. Matching datasets for the number of scans and volume analyzed should improve the consistency of output, but would still leave other sources of differential sensitivity such as subject responses, physical characteristics of the scanner, and associated differences in acquisition/reconstruction parameters. The use of pooled variance estimates could also eliminate variability due to poor estimates of the variance of the residuals. A pooled variance estimate may also stabilize the location of the peaks and eliminate false positives. These questions should be addressed in the near future.
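The remark that differences in sensitivity should not be confused with a lack of consistency can be made concrete with a small calculation. The sketch below is a deliberate simplification and not the SPM procedure used in this study: it assumes a one-sided z test at a single voxel, an uncorrected threshold, and a hypothetical standardized effect size. The function names and the numbers of scans are illustrative assumptions.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution

def detection_power(effect_size, n_scans, alpha):
    """Power of a one-sided z test at a single voxel.

    effect_size is the true change divided by the scan-to-scan
    standard deviation (a hypothetical, standardized quantity).
    """
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha)   # threshold for the chosen alpha
    expected_z = effect_size * n_scans ** 0.5  # mean of the test statistic
    return 1.0 - STD_NORMAL.cdf(z_crit - expected_z)

def disagreement_probability(power_a, power_b):
    """Chance that exactly one of two independent analyses detects the effect."""
    return power_a * (1.0 - power_b) + power_b * (1.0 - power_a)

# Same true effect, two hypothetical centers differing only in scan count.
effect = 0.4
small = detection_power(effect, n_scans=24, alpha=0.05)
large = detection_power(effect, n_scans=72, alpha=0.05)
print(f"power with 24 scans: {small:.2f}")
print(f"power with 72 scans: {large:.2f}")
print(f"chance the two centers disagree: {disagreement_probability(small, large):.2f}")
```

Under these assumptions, two centers that study an identical effect can still report different thresholded results with appreciable probability, purely because their detection powers differ.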
Despite the variety of method-based issues, a high consistency was found among the SPMs between and across centers, which was reflected by the consistent set of activated regions reaching a significance level of 20% in most of the center by center analyses. In particular, the left middle temporal gyrus and left superior temporal sulcus, the left inferior frontal gyrus, the medial frontal gyrus, the insula on the left, and the cerebellum on the right were found, respectively, in 9/11, 11/12, 11/12, 8/11, and 6/7 centers. Less consistently activated across centers were the left thalamus, the left superior temporal gyrus, the left posterior inferior temporal sulcus (4/12, 5/12, 5/12), and more generally the right-hemispheric activations. One activation in the left inferior parietal lobule was found in one dataset only, even when the significance threshold was lowered to α < 0.2. One way of decreasing the number of false negatives is to raise the risk of error. In fact, the statistical

threshold of 0.05 (an arbitrary value) may be misleading, especially if only regions that survive this threshold are reported and discussed. Regions close to significance should certainly be discussed and might indicate a need to increase sensitivity by augmenting the number of scans for each condition. The results of this multi-center study emphasize the well known fact that experiments designed with poor sensitivity (or power) will miss important activations at high thresholds. It is likely that activations of small magnitude will not reach significance (or will reach significance in one experiment but not in another) because of random factors affecting the sensitivity of analysis. In other words, statistical tests are not symmetrical, and failure to reject the null hypothesis does not imply the absence of an effect (especially with a risk of error of 0.05). Lowering the significance level to P < 0.2 (marked with a “−” sign in Table 3) yielded only one activation (in two centers) that was not seen in any other center at α < 0.05 nor in the pooled analysis. At the 0.2 corrected P value threshold, only 2.4 false positive activations are expected over 12 analyses (12 × 0.2). Overall, the gain in sensitivity by lowering the threshold to α < 0.2 seems to outweigh the loss of reliability. Note that we have not tried to generate a list of true positives (or false negatives) since the true signal is unknown (and may vary across centers). Positioning of the subject in the scanner remains a crucial aspect for intersubject averaging, either within or between centers. This problem should gradually disappear with the new generation of scanners that have fields of view large enough to acquire data from the whole brain. For our series of scans, a number of datasets did not include the cerebellum and/or the higher parts of the brain. A further source of variation found when comparing results from the literature is due to differences in data processing and statistical analysis.
This question is often addressed by comparing methods on one or two datasets. More knowledge about the sensitivity and specificity of the various methods available could be gained by using the data collected by this European Union concerted action.

2D versus 3D Scanners and Sensitivity Issues

A greater effective sensitivity (for the same number of subjects scanned) was observed in the comparisons of results from the 3D relative to 2D scanners (typically maximum Z scores increase by 1.5 or 2 units). There is also a better reproducibility of results, which is a consequence of the increased number of scans (greater degrees of freedom) and improved signal-to-noise ratio of these scanners (Bailey et al., 1991; Spinks et al., 1992). The field of view also influences the sensitivity of the analysis indirectly because the stereotactic normalization is performed on larger volumes. This often results in a more complete match between the scans and the template. The issue of determining the number of subjects needed to achieve good sensitivity in a cognitive activation experiment is, unfortunately, theoretically not solvable, because the magnitude (and indeed shape and extent) of the activation is unknown. Practical experience with our dataset indicates that virtually all regions are detected when pooling optimal (most sensitive) datasets from 3 centers.

Using Pooled Variance Estimates: Constraint, Advantages and Disadvantages

We have demonstrated that use of several datasets to estimate the residual variance greatly improves results. This is an expected finding and is of practical importance. Pooled estimation of variance may be a necessity, especially when dealing with low degrees of freedom. One important restriction is that the residual variance of added “dummy” subjects must be similar to that of the primary group of subjects analyzed. “Dummy” subjects are those used in the estimation of the residual variance but that are not included in the analysis of interest. They constitute a means of improving estimation of error variance to render the biologically important comparisons more sensitive. The similarity requirement will be met in most normal volunteer datasets but may not be met in patient studies. Note that datasets acquired using parametric designs can be included as well, as long as the variance introduced by the experimental design can be modeled. A practical drawback is the additional computer resources needed when solving the general linear model, depending on the number of “dummy” subjects added.
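The role of “dummy” subjects can be sketched for a single voxel as follows. This is an illustrative simplification under stated assumptions, not the general linear model implementation used for the analyses reported here: each group contributes only residuals about its own mean, the effect of interest is the mean of the primary group, and the residual variance is assumed similar in all groups. The function names and the numerical values are hypothetical.

```python
from statistics import mean

def residual_ss(values):
    """Sum of squared residuals about the group mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def t_single(primary):
    """t statistic using the primary group's own variance estimate."""
    n = len(primary)
    df = n - 1
    variance = residual_ss(primary) / df
    return mean(primary) / (variance / n) ** 0.5, df

def t_pooled(primary, dummy_groups):
    """t statistic for the primary group using a pooled variance estimate.

    The numerator (the signal) is unchanged; only the variance estimate
    and its degrees of freedom benefit from the dummy groups.
    """
    groups = [primary] + list(dummy_groups)
    df = sum(len(g) for g in groups) - len(groups)
    variance = sum(residual_ss(g) for g in groups) / df
    return mean(primary) / (variance / len(primary)) ** 0.5, df

# Hypothetical activation-minus-rest differences at one voxel.
primary = [2.1, 0.4, 3.0, 1.2, 2.6, 0.9]
dummy_groups = [
    [0.3, -1.1, 1.4, 0.2, -0.6, 0.8],
    [1.9, 0.1, 2.7, 0.6, 1.5, 2.2],
]

t_own, df_own = t_single(primary)
t_all, df_all = t_pooled(primary, dummy_groups)
print(f"own variance:    t = {t_own:.2f} on {df_own} degrees of freedom")
print(f"pooled variance: t = {t_all:.2f} on {df_all} degrees of freedom")
```

The pooled statistic may be larger or smaller than the single-group statistic at a given voxel, in line with the observation that Z scores can decrease where the single-group residual variance was underestimated. The gain lies in the additional degrees of freedom of the variance estimate.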


Sources of Center Effect

Because the center effect is a source of variability in the complete dataset we investigated whether the variance can be attributed to instrumental factors related to the scanner used, to the acquisition protocol, or to some characteristic of the subject groups. A preliminary multilinear regression failed to show a simple relation between a set of instrumental factors and subject characteristics and the center effect (after removing the condition and condition by center interaction effects). Similarly, no significant results were found when testing for the effect of age and of the rates of verb generation. Note that the subjects were not selected randomly from a population. Some laboratories recruited significantly younger groups of subjects than others, introducing a dependency between a “group of subjects effect” and other center effects. Due to dosimetry limitations, each subject could be scanned in one center only, preventing within-subject comparisons across centers. Although it seems that we are unable to disentangle different factors that contribute to the center effect, we can speculate about factors that are likely to cause such an effect. The SVD gives important information in this respect by detecting voxels that most follow similar patterns. There is a central versus cortical effect visible in Fig. 4. This may be due to several causes, e.g., differences in stereotactic normalization due to brain edge detection or some variation imposed by differences in scatter and attenuation characteristics and corrections between centers. Other SVD components related to the center effect are more difficult to interpret. Even if the center effect does increase residual variance, the results of the pooled variance analysis show that this increase does not dominate when testing for condition effects and that averaging over centers still results in dramatic improvements in sensitivity that outweigh the disadvantages discussed above.

Local Maxima and Peak Location: How Do We Report Results?

Local maxima provide a simple, concise, and more importantly, a standard way of reporting results of

FIG. 6. SVD decomposition of the averaged (across center) scan effect. When the center effect is removed through averaging, 88.4% of the variance is explained by the first component, which reflects a condition effect (a). The first three scans correspond to the active condition. The second component (7% of the variance) shows a profile attributable to a time effect (b).

FIG. 7. Multivariate analysis of variance. The MANOVA was performed on the first 8 scores obtained with the SVD analysis (explaining more than 75% of the variance). The bars represent a measure of the variance due to the center (Cent.), condition (Cond.), and interaction (Int.) effects. First and fifth bars represent measures of the total variance (Total) and the residual variance (Res.), respectively. All modeled effects were significant at P < .01, but center and condition effects were much more significant than the interaction term. The height of the bars is to be compared with the height of the first bar, a measure of the total variance, which itself is arbitrarily scaled.

FIG. 8. ANOVA on first SVD axes. Four ANOVAs on the first 4 scores of the singular value decomposition (explaining, respectively, 24.6, 23.1, 12, and 8.6% of the variance). Notice that the condition effect is restricted to (and reaches significance in) the first 4 axes but has higher values in the first 2 axes. The center effect is highly significant in all axes, and the interaction effect shows an effect of small significance in axes 1 and 3. Tot, Total variance; Ce., Co., and In. are the variances explained by the center, condition, and interaction effects, respectively. Res., residual variance.

FIG. 9. Interaction between center and condition effect in two laboratories (from a group of 7 datasets with matched numbers of scans and subjects). The interaction was tested between two centers with comparable 2D scanners and subjects matched for their mother language (French). The strong visual cortex signal is due to a single subject who opened his eyes during the active condition in the first center.


5.02 44,248,28

260,244,28 3.89

44,240,0 6.16 5.17 5.04 4.99 4.35

5.03 48,244,20

54,238,4 44,238,24 258,238,4 4.42 260,236,4 258,244,8 252,234,4 40,242,8 3.82

3.83 42,222,24 4.01 38,230,24 4.16 46,220,24

56,244,216 4.13 54,250,28 3.94

52,222,24 5.02 44,240,4 58,240,0 L GTM (BA21) R GTM

4.59 4.05 258,226,4 4.78 R STS (BA22)

L 1 octmp s (BA20/37) R 1 octmp (BA20/37) L pst ins WM

3.87

4.27 46,242,12

4.08 50,234,8 6.04 58,244,8

4.38 42,18,22 4.30

L GTS 44,12,28 (BA38/47?) L GTS (BA22) L STS 44,232,0

Note. GTS, superior temporal gyrus; STS, superior temporal sulcus; GTM, middle temporal gyrus; l octmp s, lateral occipitotemporal sulcus; pst ins, posterior insula; WM, white matter.

6.6 4.17 242,238,4 5.44 38,242,0 34,24,220 3.77

5.34

42,8,28

Kar

TABLE A1

Lyo

4.65

Ors

Note. First column displays the location of the local maxima found in the standard analysis, and their Z scores are in the second column. The second value in the second column is the Z score at the same location when the pooled variance was used. Third column shows the size of the cluster (defined by a threshold of Z = 3.09) to which these local maxima (LM) belong. The location of the closest LM in the pooled variance analysis with corresponding Z scores and size of cluster (defined at the same threshold Z = 3.09) are shown in the last three columns, respectively. * Significance at the 0.05 level.

262,242,0 3.72 258,226,24 4.35

347*

46,246,212 3.73 52,252,24 5.28

4.33 38,242,0 5.44 4.06 262,242,0 3.59 44,246,4 54,252,0

Leu

698*

functional imaging studies (when they refer to a standard atlas). The EU dataset also gives some important information on the variability of the location of local maxima (LM, see the Appendix). We observed that variation in the location of peaks is large, although it may decrease when highly sensitive experiments are compared. However, it should be noted that local maxima clearly represent variable “volumes” of significant change depending on the smoothness of the SPM and have an intrinsic statistical variability that is at present unknown (but depends on the initial data smoothness, the low pass filter width, and the degrees of freedom of the test). Local maxima are at times too limited a description of activations. With large smoothness, one problem is that a single local maximum may be spread by the smoothing function to overlie a number of adjacent distinct anatomical regions. Thus it may be necessary to interrogate these data directly. On the other hand, a description in terms of gyri or Brodmann areas is in some instances too vague. Localization of peaks may not be an easy task and requires experience whenever the LM falls between different anatomical regions. The resolution of the SPM{Z} and the uncertainty due to anatomical variation between subjects should be borne in mind before biological conclusions are drawn. Probability maps of anatomy (see Evans et al., 1995) are likely to be an important future tool for reporting results of functional imaging experiments in a meaningful anatomical framework.

48,238,0 12.2 50,_16,24 7.35 260,222,0 6.77 252,228,0 6.67 260,234,4 5.79 9.1

5.5

4.96

2517*

48,244,12

6.99* 6.89* 6.38* 6.13* 6.08* 6.13* 3.98* 4.02*

Lie

564*

Hamm

73

6.75*

Ess

339*

4 6 48 4 6 48 38 30 0 42 20 16 32 24 20 42 242 0 52 252 0 42 242 0 2 284 8 24 294 0

Gron

1937*

Vox.

Duss

501*

Z score

Cop

5.72*/5.58* 4.46*/4.45* 5.72*/6.63* 5.39*/5.78* 5.33*/5.41* 4.33*/5.78* 4.062/5.82* 3.732/5.48* 3.422/3.95* 3.362/3.88

Closest LM

Col

0 8 48 12 16 52 36 28 24 44 20 24 28 30 20 44 246 4 54 252 0 46 246 212 4 284 8 22 290 0

Vox.

Temporal Lobe Activations

Z Score/

Mil

Improvement of Z Scores Using Pooled Variance Estimates for the Leuven Dataset Location

6.1

POOL

8.6

TABLE 4

10.8


6,4,60 4,12,52

5.88 5.80

4,0,56

Cop 9.39

4.95 4.46

24,10,44

4.71

2,18,20

0,4,48

Duss

8,20,24

12,38,24

4,56,32 6,50,24

Gron

4.71

4.41

4.17 3.99

10,18,32 216,20,24 210,30,20

26,16,36

Ess

4.13 3.96 3.73

5.75

2,14,40

16,2,56 4,4,48 210,0,52

Hamm

6.37

5.81 6.76 5.30 10,48,32 10,52,16 14,20,52

4,4,64

Kar

4.60 4.14 3.87

4.31

26,22,28

Lyo

3.88

Cop

5.37

48,26,0

4.42

44,36,16 4.40

Gron

40,24,8

44,40,8 4.51 36,42,16 4.97 220,40,28 4.58 8,48,40

4.77

Hamm

4.64 254,24,12

20,28,48

4.64

250,18,24 4.04

4.65

44,28,24 4.27

6.75 44,16,12 6.47 6.60

6.12

Kar

42,20,16 40,28,4 36,22,8

36,28,20

46,10,16 7.14 46,14,24 7.43

5.90

Ess

36,26,32 4.22 40,36,28 4.52 32,36,24 5.81 44,28,28 4.13 228,36,38 4.56

5.48 46,28,24

7.50 6.70 6.04

Duss

42,36,4 3.80

Lyo

Ors

2,8,48

Leu

5.28

44,20,24

Leu

Lie

4.46 5.72

Lie

16,24,28

14,32,36 8,32,48 0,16,32 24,36,28

0,4,48

Mil

4.53

4.80 4.72 5.67 4.53

6.01

Mil

POOL

10,54,28

2,6,48

10,40,44

abs 5.09

8.7

3.31

7.91

4.5

14

12.8

3.75

8.4

14.9

10.9

11.5

6.7 5.6

10.3

7.8

4.18

17.3

POOL

5.39 40,18,28 4.49 44,24,28 3.74 24,40,24 36,30,16 3.92 238,36,32

40,18,4 5.42 46,14,12 4.30 44,10,12 4.98

5.07 38,14,36 4.47

12,16,52 0,8,48

36,28,24 5.62

42,14,32

Ors

Note. SFI, inferior frontal sulcus; GFI, inferior frontal gyrus; GFM, middle frontal gyrus; GFS, superior frontal gyrus; abs, absent.

L SFI (BA45/46) L SFI (BA9/44) L GFI 44,20,8 5.80 44,16,8 (BA44/45) 48,24,4 L GFI 42,40,0 (BA44/45) R GFI (BA44/45) L GFI (BA44) L GFI 24,38,212 (BA47) R GFI (BA47) L GFM (BA9) R GFM (BA9) L GFM (BA46) L GFS (BA9) L GFS (BA8/9) L GFS (BA6)

Col

Lateral Frontal Activations

TABLE A3

Note. GFD, medial frontal gyrus; cing s, cingulate sulcus; sccall, s. callosomarginalis; ant cing, anterior cingulate.

L GFd (BA9) L GFd (BA8) cing s (BA32) sccall ant cing (BA24)

L GFd (BA6)

Col

Medial Frontal Activations

TABLE A2


5.31

4.96

4.47

38,22,48

34,2,28

218,38,216

5.27 4.83 4.37 4.34 4.22

14,272,212 4.41 210,270,216 4.43 26,264,212 4.22

232,260,224 238,268,228 232,278,220 222,272,224 36,260,224

8,238,212

26,272,220 3.79

4.93

232,262,228 6.29 214,266,228 4.81

Cop

26,20,4 30,2,24

Ess

5.14 4.03

Duss

8,238,212 8,234,224

Gron

4.66 3.92

4.09

Ess

Gron

4.34

7.05

TABLE A6

38,22,36

Hamm

Kar

210,256,220 4.23 4,272,220 4.45

238,266,224 4.69 216,276,220 4.70

Hamm

26,234,212 26,226,24

22,260,224 28,280,228

4.26 3.69

4.67 4.15

234,266,228 4.30

Kar

Lyo

226,44,212 218,26,220

Lyo

24,258,212

Infratentorial Activations

54,8,12

Ess

Lyo 38,18,4 4.49

Ors

Ors

3.83 3.73

Leu

4.72

Leu

232,274,224 4.81 238,284,228 3.75

42,8,16

Ors

248,16,8 5.14

Lie

Leu

5.11

32,262,228 44,262,228 230,258,228 4,266,220 2,240,215

Lie

3.81

4.68 3.82 5.24 3.69

Mil

Mil

POOL 13.4

3.78 3.71

17.1

2,230,24

24,266,220

44,262,224

6.93

7.49

7.73

236,266,228 11.2 214,278,224 7.89

POOL

226,30,216 224,38,212

44,15,8

3.90 3.85

11.6

Mil POOL

38,22,44

34,12,24 5.63 34,6,12 4.40

238,262,228 5.16

Lie

32,22,44

32,10,12 4.45 24,28,0 3.87

4.46

30,24,12 5.25 32,16,16 5.05

Kar

Additional Lateral Frontal Activations

TABLE A5

244,4,0 5.28

Hamm

Note. cerbll, cerebellum; cerbll nc, cerebellar nuclei; mes, mesencephalon; verm, vermis.

mes/verm pons/mes

R cerbll nc vermis

L cerbll

R cerbll.

Col

Gron

230,24,4 4.05 232,22,16 3.74

42,24,44

Duss

3.81 4.52

224,34,216

6.96

24,22,20

36,24,52

Cop

Note. precntr s, precentral sulcus.

L precntr s (BA4/6) L precentr s (BA6/44) R GFM (BA11) WM

Col

4.60 4.97 4.11 5.28

Duss

28,20,0 4.47 34,20,4 6.58 40,8,24 34,30,0 5.95 42,8,8 42,26,0 244,4,0

Cop

Note. ant ins, anterior insula; ins, insula.

R ins R ant ins

L ins

L ant ins

Col

Insular Activations

TABLE A4


6.7 7 7.1 8.2 4.5 4.80 6.5 9.5 214,26,4

This multi-center study, the largest of its kind, shows that highly consistent results can be found with the same cognitive protocol. The consistency is a function of the sensitivity of the analysis procedure, which is itself dependent on scanner sensitivity and field of view, optimization of scanning protocols, subject selection, etc. Differences in sensitivity indicate the need for careful choice of statistical thresholds when reporting results and when comparing results from different studies. The EU concerted action on functional imaging has produced a database that could be used in the future to test new analytical tools. It offers unique material for the investigation of methodological issues that require a large number of scans.

12,22,12

4.89

5.24 3.59 28,18,12 230,8,0

Lie Leu

3.88 210,230,0

4.19 3.64 12,2,8 8,216,20

3.92 3.88

3.79

18,6,8 26,4,16

22,20,12

L int. caps. L lat. V L putamen L claustrum R claustrum int pall L caudate WM

Note. thal, thalamus; int caps, internal capsule; lat. V, lateral ventricle; int pall, internal pallidum.

22,216,22

18,216,4 10,22,8 4.07 4.30 220,218,0 16,210,16 4.14 218,218,12 R thal. L thal.

APPENDIX: LIST OF ALL SIGNIFICANT REGIONS AT THE 0.05 LEVEL (PER CENTER)

3.86

4.97 18,210,24 5.09 4.90

4.97 26,22,78 16,26,4 12,8,12

4.58 12,4,8

4.40 5.27 4.12 3.69

8,220,8

216,210,8

3.93

12,218,12

Ors Lyo Kar Hamm Ess Gron Duss Cop Col

Subcortical Activations

TABLE A7


CONCLUSION

Mil

8,212,12

POOL

4.0 7.81


Each table lists the local maxima found in the statistical parametric maps and their localization in the Talairach space in terms of anatomical regions and Brodmann areas. The columns correspond to the centers (Col, Cologne; Cop, Copenhagen; Duss, Dusseldorf/Julich; Gron, Groningen; Ess, Essen; Hamm, London (Hammersmith); Kar, Stockholm (Karolinska); Lyo, Lyon; Ors, Orsay; Leu, Leuven; Lie, Liege; Mil, Milano).

ACKNOWLEDGMENTS

This work is the result of a large collaboration involving many people. We thank all the participants of this EU concerted action who have acquired this unique dataset and the coordinators B. Mazoyer and R. Frackowiak. We are indebted to D. Comar for his help in the organisation of this action. (EU Biomed 1 Concerted Action on Positron Emission Tomography of Cellular Degeneration and Regeneration.) The PET center of Copenhagen received additional support from the Danish Medical Association Research Fund and from the Foundation for the Progress of Medical Research in Denmark. We thank our colleagues from the Wellcome Department for their help: Liz Warburton, Jon Heather, John Ashburner, Cathy Price, and Richard Wise. J.B.P. was funded by an EU Grant ERB4001GT932036, Human Capital and Mobility. A.P.H., K.J.F., and R.S.J.F. were funded by the Wellcome Trust.

REFERENCES Bailey, D. L., Jones, T., Spinks, T. J., Gilardi, M-C., and Towsend, D. W. 1991. Noise Equivalent Count measurements in a NeuroPET scanner with retractable septas. IEEE Trans. Med. Imaging 10:256–260. Chatfield, C., and Collins, A. J. 1980. Introduction to Multivariate Analysis, pp. 189–210. Chapman & Hall, London. Crivello, F., Tzourio, N., Poline, J-B., Woods, R. P., Maziotta, J. C., Mazoyer, B. M. 1995. Intersubject variability in functional neuroanatomy of silent verb generation: Assessment by a new activation detection algorithm based on amplitude and size information. Neuroimage 2:253–263.

54

POLINE ET AL.

EU Concerted Action on Functional Imaging (rapp. Frackowiak, R. S. J.). 1995. An European activation study on verbal fluency: Results from a multicenter PET experiment. 17th International Symposium on Cerebral Blood Flow and Metabolism, Cologne, Germany, 2–6 July. S52.
EU Concerted Action on Functional Imaging (rapp. Poline, J-B.). 1995. Analysing an European PET Activation Study: Lessons from a Multi Centre Experiment. Second International Meeting on Quantification of Brain Functions, Oxford, England, 7–9 July. Neuroimage 2(2):S74.
Evans, A. C., Collins, D. L., and Holmes, C. J. 1995. Automatic 3D Regional MRI Segmentation and Statistical Probability Maps. Second International Meeting on Quantification of Brain Functions, Oxford, England, 7–9 July. Neuroimage 2(2):S29.
Fox, P. T., and Lancaster, J. L. 1994. Neuroscience on the net. Science 226:994–996.
Friston, K. J., Frith, C. D., Liddle, P. F., and Frackowiak, R. S. J. 1991. Comparing functional (PET) images: The assessment of significant change. J. Cereb. Blood Flow Metab. 11:690–699.
Friston, K. J., Frith, C. D., Liddle, P. F., and Frackowiak, R. S. J. 1993. Functional connectivity: The principal component analysis of large PET data sets. J. Cereb. Blood Flow Metab. 13:5–14.
Friston, K. J., Worsley, K. J., Frackowiak, R. S. J., Mazziotta, J. C., and Evans, A. C. 1994a. Assessing the significance of focal activations using their spatial extent. Human Brain Mapping 1:214–220.
Friston, K. J. 1994b. Functional and effective connectivity in neuroimaging: A synthesis. Human Brain Mapping 2:56–78.
Friston, K. J., Holmes, A. P., Worsley, K. J., Poline, J-B., Frith, C. D., and Frackowiak, R. S. J. 1995. Statistical parametric maps in functional imaging: A general linear approach. Human Brain Mapping 2:189–210.
Friston, K. J., Ashburner, J., Frith, C. D., Poline, J-B., Heather, J. D., and Frackowiak, R. S. J. 1996. Spatial registration and normalization of images. Human Brain Mapping, in press.
Friston, K. J., Poline, J-B., Strother, S., Holmes, A. P., Frith, C. D., and Frackowiak, R. S. J. 1996. A Multivariate analysis of PET activation studies. Human Brain Mapping, in press.
Howard, D., Patterson, K., Wise, R., Brown, W. D., Friston, K. J., and Weiller, C. 1992. The Cortical localisation of the lexicon: Positron emission tomography evidences. Brain 115:1769–1782.
Petersen, S. E., Fox, P. T., Snyder, A. Z., and Raichle, M. E. 1990. Activation of the extrastriate and frontal cortical areas by visual words and word like stimuli. Science 249:1941–1944.
Poline, J-B., Worsley, K. J., Holmes, A. P., Frackowiak, R. S. J., and Friston, K. J. 1995. Estimating smoothness in statistical parametric maps: A variability of p values. J. Comp. Assist. Tomogr. 19(5):788–796.
Price, C. J., Wise, R. J. S., Watson, J. D. G., Patterson, K., Howard, D., and Frackowiak, R. S. J. 1994. Brain activity during reading: The effect of exposure duration and task. Brain 117:1255–1269.
Senda, M., Kanno, I., Yonekura, Y., Fujita, H., Ishii, K., Lyshkow, H., Miura, S., Oda, K., Sadato, N., and Toyama, H. 1993. Comparison of three anatomical standardisation methods regarding foci localisation and its between subject variation in the sensorimotor activation. In Quantification of Brain Functions: Tracer Kinetics and Image Analysis in Brain PET (K. Uemura, N. A. Lassen, T. Jones, and I. Kanno, Eds.), pp. 439–445. Excerpta Medica, Amsterdam.
Snodgrass, J. G., and Vanderwart, M. 1980. A standardized set of 260 pictures: Norms for name agreement, image agreement, familiarity and visual complexity. J. Exp. Psychol. Human Learning Memory 2:174–215.
Spinks, T. J., Jones, T., Bailey, D. L., Townsend, D. W., Grootoonk, S., Bloomfield, P. M., Gilardi, M-C., Casey, M. E., Sipe, B., and Reed, J. 1992. Physical performances of a positron tomograph for brain imaging with retractable septa. Phys. Med. Biol. 8:1637–1655.
Strother, S. C., Kanno, I., and Rottenberg, D. A. 1995. Principal component analysis, variance partitioning and ‘Functional Connectivity.’ J. Cereb. Blood Flow Metab. 15:353–360.
Talairach, J., and Tournoux, P. 1988. Co-planar Stereotaxic Atlas of the Human Brain. Stuttgart, Georg Thieme Verlag.
Watson, J. D. G., Coltheart, M., O’Keefe, G. J., O’Sullivan, B. T., Egan, G. F., Tochon-Danguy, H. J., Barrett, N. A., Large, M., Bierlangieri, S. U., and Miekle, S. R. 1995. 17th International Symposium on Cerebral Blood Flow and Metabolism, Cologne, Germany, 2–6 July. S51.
Watson, J. D. G., Myers, R., Frackowiak, R. S. J., Hajnal, J. V., Woods, R. P., Mazziotta, J. C., Shipp, S., and Zeki, S. 1993. Area V5 of the human brain: Evidence from a combined study using positron emission tomography and magnetic resonance imaging. Cereb. Cortex 3:79–94.
Worsley, K. J., Evans, A. C., Marrett, S., and Neelin, P. 1992. A three-dimensional statistical analysis for rCBF activation studies in human brain. J. Cereb. Blood Flow Metab. 12:900–918.
