Combinatoric Codes in Ventral Medial Temporal Lobes for Objects: Haxby 2001 Revisited
Stephen J. Hanson†, Toshihiko Matsuka†, James V. Haxby‡
†RUMBA, Rutgers University – Newark; ‡Princeton University
ABSTRACT
Haxby et al. (2001) recently made the observation that responses in the ventral temporal lobe during object identification were overlapping and distributed in topography. This observation contrasts with various prevailing views that object codes are focal and localized to specific areas such as the fusiform and parahippocampal gyri. We conclusively test Haxby's hypothesis and rule out the other two logical possibilities (localist codes or unique distributed codes) that were consistent with that analysis. Using a neural network classifier that achieves 84% correct generalization performance (on unseen scans) across all categories/subjects, together with a voxel-wise sensitivity analysis, we show that the response in ventral temporal lobe is combinatorial; that is, substantially the same voxels contribute to the classification of all visually presented objects. Moreover, there appear to be no spatially local representations contributing to category assignment. The neural network representations (hidden units) of the voxel codes are also shown to be sensitive to each category and, for the first time in these types of neuroimaging data, to a superordinate-level feature (animate/inanimate) that was only implicitly available in the object categories.
PROBLEM :
How does the brain encode and represent objects?
1. Local Coding (e.g., Kanwisher et al., 1997) - there could be a relatively local code for object types, segregated by category type (e.g., "faces", "places", "body parts", etc.).
2. Distributed but Unique Coding (e.g., Haxby et al., 2001) - there could be a distributed code that is relatively unique to each category type but completely non-local in its coding properties.
3. Combinatoric Coding (Hanson et al., 2004) - the codes could be combinatorial, in the sense that the same voxels/features are reused in an efficient way across object-category codes.
A NN can detect either a localized or a distributed code with no initial bias toward either; the NN thus offers a more general method for detecting topographic patterns than the correlation method.

Methods
2-layer feedforward NN with a 10-node hidden layer, trained with scaled conjugate gradient (SCG) search. Hyperbolic tangent transfer function (TF) for the hidden layer and softmax TF (error = cross entropy) for the category nodes. N-1 bootstrap: 88 scans (11 scans x 8 categories) in-sample for training and eight scans (1 scan x 8 categories) out-of-sample for testing. Other settings tested are shown in Table 1.

Results
Overall we account for a mean of 99.5% correct in training and a mean of 82.5% correct in transfer (Figure 1). Model selection (Figure 2) tested transfer at seven different hidden-unit values, finding the best case to be 10 hidden units, similar to what the preliminary PCA indicated about the structure of the voxel intensities.
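The N-1 bootstrap split described above can be sketched as follows; the scan and category counts (12 scans per category, 8 categories) come from the poster, while the function name and index representation are illustrative:

```python
# Sketch of the leave-one-scan-out ("N-1 bootstrap") splitting scheme:
# for each of 8 categories, 11 scans are held in-sample and 1 scan is
# held out, giving 88 training and 8 test opportunities per split.
N_SCANS, N_CATEGORIES = 12, 8

def leave_one_scan_out(held_out_scan):
    """Return (train, test) lists of (scan, category) index pairs."""
    train, test = [], []
    for cat in range(N_CATEGORIES):
        for scan in range(N_SCANS):
            (test if scan == held_out_scan else train).append((scan, cat))
    return train, test

train, test = leave_one_scan_out(held_out_scan=0)
print(len(train), len(test))  # 88 in-sample, 8 out-of-sample
```

Cycling `held_out_scan` over all 12 scans yields the full set of out-of-sample generalization tests.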
Overview
Neural Network Classifier: nonlinear classifiers were trained to predict which visual object/category was being viewed, given particular patterns of voxel activity.
Hidden Unit Analysis: activities of the hidden units of the trained NN were analyzed to identify aspects of the underlying representation that supports the classifier.
Sensitivity Analysis: using the NN classifier, each voxel's sensitivity to particular objects/categories was identified.
Table 1. NN classifier settings tested.

Error Metric  | Output Func. | Gradient Est. | Input Tr.        | Background
SSE, MSE      | Logistic     | BP            | -1,1             | REST
ABSOLUTE      | Logistic     | BP w/ Mom.    | Min-max          | REST
SSE, MSE      | Linear       | SCG           | z-scanwise       | NO REST
Crossentropy  | SOFTMAX      | SCG           | means            | NO REST
Crossentropy  | SOFTMAX      | SCG           | means z-scanwise | NO REST
Figure 1. Performance on generalization test for all categories averaged for all subjects. Overall generalization accuracy is 82.5%
Figure 2. Model selection results indicating that between 9 and 15 hidden units are most appropriate.
Figure 3. Cluster dendrogram, showing the response of the Hidden Units to all scans. Note that each category set is represented in hidden unit space and that there appears to be an “animate/inanimate” distinction learned from examples of object exemplar scans by the NN.
Sensitivity Analysis

Methods
To determine the contribution of each voxel to the overall classification and generalization results, Gaussian noise of sufficient width is added to each voxel, one at a time, and the generalization error is recalculated for the NN classifier with the modified input (i.e., perturbed voxel activities). Noise is sampled and added hundreds of times to obtain a stable estimate of the error contribution. If the error increases significantly, the voxel is contributing to classification performance; if, on the other hand, noise added to that voxel produces little or no significant change in classification error, we index it as contributing little to classification performance. In this way we effectively assay each voxel's classification contribution by "lesioning" it with the perturbing noise source.

Results
In Figure 4 a typical subject's sensitivity values are plotted in a mid-level slice for all eight object categories. There are two observations to note: (1) inter-subject variability in object coding is high, notwithstanding that the VMT mask is relatively small; (2) the voxels that are most sensitive across all object types within each subject are practically identical. The overlap for all subjects (calculated within subjects) between category-type voxels (500-600 voxels) is 88.4%.
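The lesioning procedure above can be sketched as follows; `predict`, the toy data, and the noise width are illustrative stand-ins, not the poster's trained classifier or fMRI data:

```python
# Sketch of the voxel "lesioning" sensitivity analysis: add Gaussian
# noise to one voxel at a time, re-run the classifier many times, and
# record the resulting generalization error.
import numpy as np

rng = np.random.default_rng(0)

def error_rate(predict, X, y):
    return float(np.mean(predict(X) != y))

def voxel_sensitivity(predict, X, y, voxel, sigma=1.0, n_repeats=200):
    """Mean classification error with Gaussian noise added to one voxel."""
    errs = []
    for _ in range(n_repeats):
        Xp = X.copy()
        Xp[:, voxel] += rng.normal(0.0, sigma, size=X.shape[0])
        errs.append(error_rate(predict, Xp, y))
    return float(np.mean(errs))

# Toy demo: a "classifier" that depends only on voxel 0, so only voxel 0
# should show error above baseline when perturbed.
X = rng.normal(size=(64, 5))
y = (X[:, 0] > 0).astype(int)
predict = lambda X: (X[:, 0] > 0).astype(int)

base = error_rate(predict, X, y)
sens = [voxel_sensitivity(predict, X, y, v, sigma=2.0) for v in range(5)]
```

Voxels whose perturbed error rises well above `base` are indexed as contributing to classification; in the poster, a 30% error increase is the threshold plotted in Figure 4.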
Hidden Node Analysis
The trained neural network's hidden-unit states can be analyzed to reveal aspects of the underlying representation that supports the classifier (Hanson & Burr, 1990). It is clear that the network has produced a 10-dimensional embedding of the original exemplars as distinct categories. From left to right the hidden space shows "faces", "cats", "houses", etc., and at the next level of the dendrogram the network apparently distinguishes the group "faces" and "cats" from all other categories. This "animate/inanimate" distinction is evidence, for the first time, that fMRI signals can encode an implicit semantic distinction based only on learning categories from specific exemplars sampled from those categories. As we see below, this type of distinction is apparently part of a larger code that indexes a specific exemplar while at the same time coding for the entire category.
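The clustering behind a dendrogram like Figure 3 can be sketched as below; the 10-d hidden activations here are synthetic stand-ins (two "animate" and two "inanimate" category centers), not states from the trained network, and the linkage method is an assumption:

```python
# Sketch of the hidden-unit analysis: hierarchically cluster the
# hidden-layer state for each scan and cut the tree at two clusters to
# look for an animate/inanimate split.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)

# Synthetic 10-d hidden states: category centers chosen so that "face"
# and "cat" lie near each other, far from "house" and "chair".
centers = {"face": 1.0, "cat": 1.2, "house": -1.0, "chair": -1.2}
labels, H = [], []
for name, c in centers.items():
    for _ in range(6):                      # 6 scans per category
        labels.append(name)
        H.append(rng.normal(c, 0.1, size=10))
H = np.array(H)

Z = linkage(H, method="average")            # average-linkage tree
top = fcluster(Z, t=2, criterion="maxclust")  # cut into 2 clusters

animate = {top[i] for i, l in enumerate(labels) if l in ("face", "cat")}
inanimate = {top[i] for i, l in enumerate(labels) if l in ("house", "chair")}
```

With this geometry the two-cluster cut separates the animate from the inanimate categories, which is the structure the poster reports emerging in the trained network's hidden space.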
Sensitivity Analysis (cont.)

Additional analysis
To provide the strongest possible test of the localization of object identity in temporal lobe, highly selective masks were identified for the "fusiform face area" (FFA) and the "parahippocampal place area" (PPA) that did not overlap in voxel space. These voxels were then probed for their responses to "face" and to "house", based on the sensitivities previously computed. Distributions of sensitivities over all voxels and subjects were calculated for the four possible combinations of voxel mask and object sensitivity. Specifically, Figure 5 shows FACE sensitivity given HOUSE voxels, HOUSE sensitivity given FACE voxels, HOUSE sensitivity given HOUSE voxels, and FACE sensitivity given FACE voxels. All distributions are skewed to the left, have roughly the same range, and show the same median response to either object by either the FFA or the PPA voxels. In effect, the distributions for each voxel type overlap in their responses, indicating no particular local response to object type.
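The mask-by-object comparison can be sketched as follows; the sensitivity values here are illustrative left-skewed placeholders (a Beta(5,2) draw), not the measured FFA/PPA distributions:

```python
# Sketch of the 2x2 mask-by-object comparison: collect per-voxel
# sensitivities for each (mask, object) combination and compare medians.
import numpy as np

rng = np.random.default_rng(2)
sens = {  # stand-in sensitivity distributions, one per combination
    ("FFA", "face"): rng.beta(5, 2, 300),
    ("FFA", "house"): rng.beta(5, 2, 300),
    ("PPA", "face"): rng.beta(5, 2, 300),
    ("PPA", "house"): rng.beta(5, 2, 300),
}
medians = {k: float(np.median(v)) for k, v in sens.items()}
```

If the four medians and ranges coincide, as the poster reports for the real data, there is no evidence that FFA voxels respond preferentially to faces or PPA voxels to houses.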
Appendix: NN classifier configuration

- Feedforward functions: hyperbolic tangent and softmax

$$O_k = \frac{\exp\left(\sum_j w_{jk} \tanh\left(\sum_i w_{ij} v_i\right)\right)}{\sum_{m \in C} \exp\left(\sum_j w_{jm} \tanh\left(\sum_i w_{ij} v_i\right)\right)}$$

where $v_i$ is the activity of voxel $i$, $w_{ij}$ is the association weight between hidden unit $j$ and $v_i$, $\tanh(\cdot)$ is the hyperbolic tangent function shown below, $O_k$ is the activation of output (category) node $k$, $w_{jm}$ is the weight between hidden unit $j$ and category $m$, and $C$ indexes the 8 categories.

$$\tanh(h) = \frac{\exp(h) - \exp(-h)}{\exp(h) + \exp(-h)}$$
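The feedforward equations transcribe directly into NumPy; the weights below are random stand-ins and the voxel count is arbitrary for the demo (the poster's actual network has 10 hidden units and 8 category outputs):

```python
# Direct transcription of the tanh/softmax feedforward pass: hidden
# activations h_j = tanh(sum_i w_ij v_i), outputs O_k = softmax over the
# 8 category nodes.
import numpy as np

rng = np.random.default_rng(3)
n_voxels, n_hidden, n_categories = 20, 10, 8

v = rng.normal(size=n_voxels)                    # voxel activities v_i
W_ih = rng.normal(size=(n_voxels, n_hidden))     # weights w_ij
W_ho = rng.normal(size=(n_hidden, n_categories)) # weights w_jk

h = np.tanh(v @ W_ih)                 # hidden-layer activations
z = h @ W_ho                          # category node inputs
O = np.exp(z) / np.exp(z).sum()       # softmax over the 8 categories
```

The softmax guarantees the category activations form a probability distribution, which is what makes the cross-entropy error below well defined.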
- Error function: cross entropy

$$E = -\sum_{n=1}^{N} \sum_{k=1}^{K} t_{kn} \ln \frac{O_{kn}}{t_{kn}}$$

Figure 4. Sensitivity analysis in one slice containing ventral medial temporal lobe. Red voxel patterns show the voxels that had more than 30% classification error due to noise perturbation at each voxel. (FA: face; HO: house; CA: cat; BO: bottle; SS: scissors; SH: shoe; CH: chair; SC: scrambled image.)

Figure 5. Distribution of sensitivities of FFA and PPA voxels to either Face or House. Note the lack of right skew in any distribution, indicating no special response of voxels to object type.
where $t_{kn}$ is the target output for scan $n$.

References
Hanson, S. J., & Burr, D. J. (1990). What connectionist models learn: Toward a theory of representation in connectionist networks. Behavioral and Brain Sciences, 13, 471-518.
Haxby, J. V., Gobbini, M. I., Furey, M. L., Ishai, A., Schouten, J. L., & Pietrini, P. (2001). Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science, 293, 2425-2430.
Kanwisher, N., McDermott, J., & Chun, M. M. (1997). The fusiform face area: A module in human extrastriate cortex specialized for face perception. Journal of Neuroscience, 17, 4302-4311.
Acknowledgements: James S. McDonnell Foundation NSF (EIA-0205178)