Connectionist Neuroimaging Stephen José Hanson, Michiro Negishi, and Catherine Hanson


Psychology Department Rutgers University Newark N.J. USA

Abstract. Connectionist modeling and neuroscience have little common ground or mutual influence. Despite impressive algorithms and analysis within connectionism and neural networks, there has been little influence on neuroscience, which remains primarily an empirical science. This chapter advocates two strategies to increase the interaction between neuroscience and neural networks: (1) focus on emergent properties in neural networks that are apparently “cognitive”; (2) take neuroimaging data seriously and develop neural models of dynamics in both the spatial and temporal dimensions.

1 Introduction

In 1990 the then President of the USA, George H. W. Bush, declared the “Decade of the Brain”. This year the “Decade of the Brain” ended (although perhaps George W. Bush, the son, will declare yet another decade), so it is worth asking what happened during it, and in particular what influence neural computation had on neuroscience. How did neuroscience help define or delineate aspects of neural computation during this last decade? Neural networks have become mainstream engineering tools (IEEE, 1998) and helped stir a resurgence of statistical methods (especially Bayesian methods). They have been incorporated into a large and diverse application base, from medical to automotive and control applications. And Hollywood continues to believe “intelligence” is some property of a neural network. On the other hand, paradoxically, neural networks have had little or no effect on the larger mainstream neuroscience community. It is clear over the last decade, despite increasing sophistication and development of neural network algorithms, that little has changed in the neuroscience field with respect to computation or the representational issues concerning neural tissue. Neuroscientists continue to focus on cell-level mechanisms and generic properties of system-level interaction. Most people blame neural networks for this lack of impact on neuroscience. Four reasons are often cited:


- They are not biologically plausible
- They do not scale well with large problems
- They are not new---just statistics
- They don't process symbols and humans do

Also at Telcordia Technologies, Piscataway, New Jersey.

S. Wermter et al. (Eds.): Emergent Neural Computational Architectures, LNAI 2036, pp. 560-576, 2001. © Springer-Verlag Berlin Heidelberg 2001


I blame neuroscience. I see three reasons for this lack of connection:

- For nearly 100 years neuroscience has been essentially an empirical enterprise, one that has not easily embraced common abstract principles underlying common behavioral/physiological observations (“splitters” as opposed to “lumpers”).
- Systems neuroscience, which should have the greatest impact on computational approaches, is in general the most difficult level at which to obtain the requisite data to constrain network models or provide common principles, due to the potential complexity of the multiple neuron-body problem.
- Most serious is the level of analysis that neuroscientists tend to cling to: the cellular level (or, god help us, the molecular level). This focus persists notwithstanding the lack of any fundamental identification of this anatomically distinct structure as also a unique unit of computation. It can easily be shown that computational regularity at the behavioral level does not force unique implementations at the neural level.

We suggest two strategies to encourage more connections between neuroscience and neural networks:

- One: attempt to show emergent behavior that is similar to human COGNITIVE performance, and analyze the network representations to understand the nature of the interactions between learning and representations.
- Two: take neuroimaging data seriously and model it with neural networks (that embody dynamical systems) rather than, for example, doing inferential statistics. Neuroimaging data can be seen as spatio-temporal multivariate data, representing some time-dynamical system distributed through a 3-d volume.

In the end we shall suggest it is also productive to look for ways to combine cognitively suggestive models with the data-rich methods of neuroimaging such as EEG and fMRI.

2 Network Emergent Behavior: The Case of Symbol Learning

We argue it is useful to demonstrate emergent behaviors in networks that were not programmed, engineered, or previously constrained by choice of architecture, learning rule, or distributional properties of the data. It is known that recurrent neural networks can induce regular grammars from exposure to valid strings drawn from the grammar. However, it has been claimed that neural networks cannot learn symbols independent of rules (see Pinker). A basic puzzle in the cognitive neurosciences (30) is how simple associationist learning, which has been proposed to exist at the cellular and synaptic levels of the brain, can be used to construct known properties of cognition that appear to require abstract reference, variable binding, and symbols. The ability of humans to parse sentences and to abstract knowledge from specific examples appears to be inconsistent with local associationist algorithms for knowledge representation (3, 8, 16, 20, 21, 22; but see 11). Part of the puzzle is how neuron-like elements could, from simple signal-processing properties, emulate symbol-like behavior. Properties of symbols include the following (14).


A set of arbitrary "physical tokens" (scratches on paper, holes on a tape, events in a digital computer) manipulated on the basis of "explicit rules" that are likewise physical tokens and strings of tokens. The rule-governed symbol-token manipulation is based purely on the shape of the symbol tokens (not their "meaning"), i.e., it is purely syntactic, and consists of "rulefully combining" and recombining symbol tokens. There are primitive atomic symbol tokens and composite symbol-token strings. The entire system and all its parts -- the atomic tokens, the composite tokens, the syntactic manipulations (both actual and possible), and the rules -- are all "semantically interpretable": the syntax can be systematically assigned a meaning (e.g., as standing for objects, as describing states of affairs). As this definition implies, a key element in the acquisition of symbolic structure involves a type of independence from the task the symbols are found in and the vocabulary they represent. Fundamental to this type of independence is the ability of the learning system to factor the generic nature of the task or rules it confronts from the aspect of the symbols or vocabulary set, which are arbitrarily bound to the input description or external referents of the task. In this report we describe a series of experiments with an associationist neural network that creates abstract structure that is context sensitive, hierarchical, and extensible.


Fig. 1. The Recurrent Network Architecture used in the simulations. This is a simple neural network learning architecture that possesses a simple memory. All weights are subject to adaptation or learning; there are no fixed structures in the RNN prior to or during learning.
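As a concrete, purely illustrative sketch of the architecture in Fig. 1, the forward pass of a second-order recurrent network can be written in a few lines of numpy. The layer sizes, weight scales, and single END-prediction output unit below are assumptions for illustration; the chapter does not specify them.

```python
import numpy as np

rng = np.random.default_rng(0)

n_in, n_hid = 8, 6   # hypothetical layer sizes; not specified in the text
W = rng.normal(0.0, 0.5, size=(n_hid, n_in, n_hid))  # second-order weights
V = rng.normal(0.0, 0.5, size=n_hid)                 # hidden -> output weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def step(x, h_prev):
    """One time step of a second-order RNN: each hidden unit i sums the
    products x_j * h_prev_k through its own weight W[i, j, k], so the
    current input gates the previous (feedback-layer) state."""
    h = sigmoid(np.einsum('ijk,j,k->i', W, x, h_prev))
    y = sigmoid(V @ h)   # scalar: "can the sentence END here?"
    return h, y

h = np.full(n_hid, 0.5)        # neutral initial feedback state
for t in range(3):             # present three one-hot "words"
    x = np.eye(n_in)[t]
    h, y = step(x, h)
```

In training, all of `W` and `V` would be adapted (e.g., by the Williams-Zipser algorithm cited in the text); nothing here is structurally fixed.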

Consider the simple problem of learning a grammar from valid (positive-only) sentences consisting of strings of symbols drawn randomly from an infinite population of such valid strings. This sort of learning might very well underlie the acquisition of language in children from exposure to grammatically correct sentences during normal


discourse with their language community2. It is well known now that neural networks3 can induce the structure of an FSM (Finite State Machine; for example see Fig. 2) purely from presentation of strings drawn from the FSM (9, 25). In fact, it has recently been shown (2) that the underlying attractors of such neural networks have no choice but to be in a one-to-one correspondence with the states of the state machine from which the strings are sampled. This surprising theorem is the precedent for proposing that a neural network embodying the underlying rules of a state machine in its attractor space could also learn to ‘‘factor’’ the input encoding or external symbols. In the present report, we employ a Recurrent Neural Network (RNN, see Fig. 1) with a standard learning algorithm developed by Williams and Zipser, extended to second-order connections (10). The network was trained with newly generated sentences until its performance met a learning criterion4. Input sentences were limited to 20 symbols and were constructed from a local binary encoding of each symbol.5 All weights in the RNN were candidates for adaptation; no structures were fixed prior to or during learning. Humans are known to gain a memorial advantage from exposure to strings drawn from an FSM over ones constructed randomly (17, 18, 23, 24), as though they are extracting abstract knowledge of the grammar itself from exposure to strings drawn randomly from the FSM. A more stringent test of knowledge of a grammar

2 Although controversial, language acquisition must surely involve the exposure of children to valid sentences in their language. Chomsky (3) and other linguists have stressed the importance of the a priori embodiment of the possible grammars in some form more generic than the exact target grammar.
Although not the main point of the present report, it must surely be true that, of the distribution of possible grammars, some learning bias must exist that helps guide the acquisition and selection of one grammar over another in the presence of data. What the nature of this learning bias is might be a more profitable avenue of research in language acquisition than the recent polarizations inherent in the nativist/empiricist dichotomy (5, 16, 20, 22).

3 Neural networks consist of simple analogue computing elements that operate in parallel over an input and output field. Recurrent Neural Networks are networks that have recurrent connections to their intermediate or ‘‘hidden’’ layers. Such recurrent connections implement a local memory of recent input/output and processing states of the network. Feedforward networks have only unidirectional connections and hence no mechanism for examining past inputs.

4 That is, after each training sentence, the network was tested with 1000 randomly generated sentences, and the training session was completed only when the network yielded output node activity below the low threshold when sentences could not end and above the high threshold when they could. Thresholds were initialized to 0.20 (high threshold) and 0.17 (low threshold) and were adapted using output values while the network was processing test sentences. The high threshold was modified to the minimum value yielded at the end of test sentences minus a margin (0.01), and the low threshold was modified to the high threshold minus another margin (0.02) during the test; these thresholds were then used for the next test and training sentences.

5 Each word was represented as an activation value of 1.0 of a unique node in the input layer, while all other node activations were set to 0.0. The task for the network was to predict whether the next word was END (in which case the output node activation was trained to become 1.0) or not (output should be 0.0). Note that when the FSM is at the end state, a sentence can either end or continue. Therefore at this state the network is sometimes taught to predict the end of a sentence and sometimes not. However, the network eventually learns to yield higher output node activation when the sentence can end.
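The string-generation and teaching-signal scheme of footnotes 4 and 5 can be made concrete with a small sketch. The transition table below is a hypothetical 3-state machine in the spirit of Fig. 2 (the chapter does not list the actual arcs), and `targets` produces the END/not-END training signal described in footnote 5.

```python
import random

# Hypothetical transition table: state -> list of (symbol, next_state).
TRANSITIONS = {1: [('A', 2), ('B', 3)],
               2: [('C', 3), ('D', 1)],
               3: [('E', 1), ('F', 2)]}
START, END_STATE = 1, 3
MAX_LEN = 20   # sentences were limited to 20 symbols

def sample_sentence(rng=random):
    """Draw a valid string; at the end state the sentence may stop or continue."""
    state, sent = START, []
    while len(sent) < MAX_LEN:
        if state == END_STATE and rng.random() < 0.5:
            break
        sym, state = rng.choice(TRANSITIONS[state])
        sent.append(sym)
    return sent

def targets(sent):
    """Per-position teaching signal: 1.0 wherever the next word could be END."""
    state, out = START, []
    for sym in sent:
        state = dict(TRANSITIONS[state])[sym]
        out.append(1.0 if state == END_STATE else 0.0)
    return out

def one_hot(sym, alphabet='ABCDEF'):
    """Footnote 5's local (one-of-N) encoding of a word."""
    return [1.0 if a == sym else 0.0 for a in alphabet]

sent = sample_sentence()
```

A vocabulary switch in the transfer task below amounts to relabeling the symbols on the arcs while leaving the state graph untouched.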


would be to expose subjects to an FSM with one external symbol set and then see whether they transfer knowledge to a novel external symbol set. In principle, in this type of task it is impossible for subjects to use the symbol set as a basis for generalization without noting the patterns that are commensurate with the properties of the FSM6. A version of this type of transfer is shown in Fig. 2. In this task new symbols are assigned randomly to the arcs, such that the external symbols are completely new. This task, which we call the Vocabulary Transfer Task, was used in the first simulation to train recurrent neural networks and to examine their ability to transfer over novel, unknown symbol sets.

Fig. 2. The SYMBOL transfer task. The figure shows two finite state machine representations, each of which has 3 states (1, 2, 3) with transitions indicated by arrows and legal transition symbols (A, B, ... , F) for each state. Note that this task involves no possible generalization from the transition symbol. Rather, all that is available are the state configuration geometries. The task explicitly forces the network to process the symbol set independently from the transition rule.

In this task, the network was trained on three regular grammars (the source grammars) which have the same syntactic structure (Fig. 2) defined on three unique sets of words, and the effect of this prior training on the training of the target grammar was measured as the network was trained with yet another new set of words. One indicator of such an effect is the savings in terms of the number of trials needed to meet the completion criterion. Fig. 3 shows the number of trials for both the source grammar trainings (vocabulary switching = 1, 2, ..., 9 in the figure) and the target grammar training (vocabulary switching = 10), averaged over 20 networks with different initial random weights. The result of vocabulary switching in the first 9 cycles is a complete accommodation of the new symbol sets, with near 100% savings. This accommodation represents the network's ability to create a data structure that is consistent with a number of independent vocabularies. More critically, however, there was a 63% reduction in the number of required training trials for the new, unseen vocabulary. This result is remarkable given the required independence of syntax and vocabulary. Apparently the RNN is able to partially factor the transition rules from their constituent symbol assignments after exposure to a diversity of vocabularies. One obvious question that arises is whether the source of the novel transfer is due to a network memory. Our initial studies in this area showed that in fact local memory is

6 Reber (24) showed that humans transfer significantly in such a task; however, his symbol sets allowed subjects to use similarity as a basis for transfer, as they were composed of contiguous letters from the alphabet. Recent reviews of the literature indicate that this type of transfer is common even across modalities (16).


important. We showed in a series of similar tasks that there was no significant savings in learning for feedforward networks that were exposed to rule-learning contexts (e.g., the ‘‘Penzias problem’’) with subsequent permutation transfer7. These preliminary studies suggest that memory in the network is an important component of the ability of a neural network to transfer its syntactic knowledge.

Fig. 3. The learning savings from subsequent relearning of the symbol transfer task. Each data point represents the average of 20 networks trained to criterion on the same grammar. The relearning cycles show an immediate transfer to the novel symbol set, which continues to improve to near perfect transfer through the ninth cycle (3 cycles of the 3 symbol sets), until the 10th cycle where a completely novel set is used with the same grammar. Over 60% of the original learning on the grammar, independent of symbol set, is saved.
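The savings measure behind Fig. 3 can be sketched as a protocol skeleton. `train_to_criterion` is a caller-supplied stand-in for the full RNN training loop (a hypothetical interface, not the chapter's code); the `toy_trainer` below merely illustrates the decreasing-cost pattern.

```python
def percent_savings(baseline_trials, transfer_trials):
    """Fraction of the original training effort no longer needed."""
    return 100.0 * (1.0 - transfer_trials / baseline_trials)

def run_transfer_protocol(train_to_criterion, vocabularies, novel_vocab, cycles=3):
    """Vocabulary Transfer Task: cycle the source vocabularies (3 sets x 3
    cycles = 9 trainings in the chapter), then train once on an unseen
    vocabulary and report savings relative to the first training."""
    trials = [train_to_criterion(v)
              for _ in range(cycles) for v in vocabularies]
    novel = train_to_criterion(novel_vocab)
    return trials, novel, percent_savings(trials[0], novel)

# Toy trainer whose cost drops with experience, for illustration only.
seen = []
def toy_trainer(vocab):
    seen.append(vocab)
    return max(100, 4000 // len(seen))

trials, novel, savings = run_transfer_protocol(
    toy_trainer, ['ABC', 'DEF', 'GHI'], 'JKL')
```

Plotting `trials + [novel]` against cycle number would reproduce the shape of Fig. 3 for this toy trainer.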

How is the neural network accomplishing these abstractions? Note that in the vocabulary transfer task the network, as with a human subject, has no possible way to transfer based on the external symbol set. It follows that the network must abstract away from the input encoding. In effect the network must find a way to buffer or recode the input in order to defer symbol binding until enough string sequences have been observed. If the network extracted the common syntactic structure, the hidden layer activities would be expected to represent the corresponding FSM states, regardless of vocabulary. This is, in fact, shown by linear discriminant analysis (LDA)8. After the network learned the first vocabulary, hidden node activity was shown to be sensitive to FSM states (Fig. 4A). In this figure, different FSM states are

7 Feedforward networks were trained on the Penzias task, a boolean counting task studied previously by Denker et al. (4). A permutation task was defined which was similar to the vocabulary transfer task, but the feed-forward network showed only interference effects, even with significant increases in the capacity of the network.

8 LDA of the hidden unit states allows for a complete search of linear projections that are optimally consistent with organizations based on FSM state or on vocabulary. LDA was applied to the hidden unit activations over 20 networks to find a stable result for the preferred encoding of the input space. Evidence from the LDA for state representations would indicate that the RNN found a solution to the multiple vocabularies by referencing them hierarchically within each state, based on context sensitivity within each vocabulary cluster.


represented by different clusters, while the different symbol sets are plotted with different graphic symbols. Note that these clusters represent attractors for the states in the FSM. Moreover, if one starts a trajectory near one of the clusters, it proceeds to a location near the cluster representing the appropriate state transition. Hence this space possesses context sensitivity, in that coordinate positions encode both state and trajectory information.

Fig. 4A. Linear Discriminant Analysis of hidden activities of networks that learned a single FSM/symbol set. Note that the different clusters represent different states while the "+" sign codes for the single symbol set.
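The state-versus-vocabulary LDA comparison can be reproduced in miniature on synthetic hidden-unit activations. The cluster geometry below is invented to mimic Fig. 4 (well-separated state clusters, weak vocabulary subdivisions), and scikit-learn's LinearDiscriminantAnalysis stands in for the analysis used in the chapter.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)

# Synthetic activations: three well-separated FSM-state clusters, each
# weakly subdivided by vocabulary (all numbers are illustrative).
n_per, n_hid = 60, 8
states = np.repeat([0, 1, 2], n_per)                   # 180 samples
vocabs = np.tile(np.repeat([0, 1, 2], n_per // 3), 3)
state_centers = rng.normal(0.0, 3.0, size=(3, n_hid))  # strong state signal
vocab_offsets = rng.normal(0.0, 0.2, size=(3, n_hid))  # weak vocab signal
X = (state_centers[states] + vocab_offsets[vocabs]
     + 0.3 * rng.normal(size=(len(states), n_hid)))

acc_state = LinearDiscriminantAnalysis().fit(X, states).score(X, states)
acc_vocab = LinearDiscriminantAnalysis().fit(X, vocabs).score(X, vocabs)
# Expect acc_state near 1.0 and acc_vocab well below it, mirroring the
# 80% vs 45% pattern reported in the text.
```

The first two discriminant projections of `X` would give scatter plots analogous to Figs. 4B and 4C.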

After each of the three vocabularies was learned in three cycles, LDA of the hidden layer node activities with respect to FSM states (Fig. 4B) was contrasted with LDA with respect to vocabulary sets (Fig. 4C). The discrimination rates clearly show that the state space is organized by FSM states: FSM states could be correctly classified by the former linear discriminants with an accuracy of 80% (SD=16, n=20), whereas vocabulary set could be classified correctly only 45% of the time (SD=9.7, n=20). Notice in both Figs. 4B and 4C, relative to 4A, that the symbol sets have spread out and occupy more of the hidden unit space, with significant gaps between clusters of the same and different symbol sets. Moreover, from Fig. 4B one can also see that the vocabularies are hierarchically organized into states corresponding to the FSM. This hierarchical structure provides a super-structure for the accommodation of the already learned vocabularies and any new ones the RNN is asked to learn. It can also be seen from Fig. 4C that the hidden layer activations are sensitive to, but not linearly separable by, vocabularies. LDA after the test vocabulary is learned once also shows that the network state is predominantly organized by FSM states (Fig. 5), although the linear separation by FSM states of a small fraction of activities is compromised. This interference by the new vocabulary is not surprising, considering that the old vocabularies were not re-learned after the new vocabulary was learned. What is more interesting is the spatial location


Fig. 4B. Linear discriminant analysis of the hidden units of the RNN that have learned three independent FSMs with three different symbol sets using state as the discriminant variable. Notice how the hidden unit space spreads out compared to Fig. 4A. Notice further that the space is organized by clusters corresponding to states which are internally differentiated by symbol sets (represented by different graphic symbols: +=ABC, triangle=DEF, square=GHI).

Fig. 4C. Linear Discriminant Analysis of hidden unit activities from networks trained on three independent FSMs with three different symbol sets. The LDA used the symbol set as the discriminant. Note the symbol sets are coded by 1 of 3 graphic codes, the same as in Figs. 4A and 4B. In this case note that the discriminant function based on symbol set produces no simple spatial classification, unlike Fig. 4B, which shows the same activations classified by the state of the FSM.

of the new vocabulary ("stars"). The hidden unit activity again clearly shows that the state discriminant structure is dominant and organizes the symbol sets. The fourth vocabulary, the unseen symbol set that the networks are exposed to, simply finds empty spots in the hidden unit space to code its location relative to the existing state structure, indicating the strength of the present abstraction to encourage


observance of the hierarchical state representation and its existing context sensitivity. In effect the network bootstraps from existing nearby vocabularies, allowing generalization to the same FSM that all symbol sets are using.

Fig. 5. Linear Discriminant analysis of the hidden state space after training on 3 independent symbol sets for 3 cycles and then transfer to a new untrained symbol set (coded by stars). Note how the new symbol set slides into the gaps between symbol sets previously learned. Presumably this provides initial context sensitivity for the new symbol set, creating the 60% savings.

It has been argued that neural networks are incapable of representing, processing, and generalizing symbolic information (3, 16, 20, 21, 22). Pinker, for one, argues there must be some distinction drawn between what the brain can do with "mere statistical information" and the sorts of symbol processing that must be required to understand that a "whale is not a fish or Tina Turner is a grandmother, overriding our statistical information about what fish or grandmothers look like" (20). The alternative, as demonstrated by the present experiments, is that neural networks incorporating associative mechanisms can be sensitive to the statistical substrate of the world and yet create data structures that have the property of following a deterministic rule and that, once learned, can be used to override even large amounts of statistical evidence (e.g., from another FSM). Quite conveniently, as demonstrated, these data structures can arise even when the rule has only been expressed implicitly by examples and learned by mere exposure to regularities in the data. The next section focuses on neuroimaging techniques and discusses some of their problems and some of their promise. This will help develop the idea of using RNNs to extract FSM properties of real-time cognition.


3 Taking Neuroimaging Seriously: New Tools for Neuroimaging

Is neuroimaging just 21st-century phrenology? Neuroimaging is an important new technology for the analysis and representation of cognitive processes and the neural tissue that supports them. Nonetheless, these technologies run the danger of becoming the new phrenology. Unfortunately, when it comes to studying cognitive processes, the data analytic techniques commonly employed in neuroimaging limit and distort the hypotheses researchers can consider. For example, several statistical problems exist within the assumptions of these analyses:

- Independence of voxels in space and time: clearly there is dependence structure in both time and space.
- The Gaussian assumption is unlikely to hold; hence statistical tests will be inefficient and miss structure in low signal-to-noise environments.
- Contrastive testing is subject to linearity and minimal-components assumptions.
- The modularity metaphor -- looking for local focal areas of computation -- face and teapot areas! The statistical thresholds that are chosen are enormously high (10^-5, 10^-20!), implying either that the underlying distribution is non-Gaussian or that locality of signal is preferred.

We have looked at a number of these assumptions, recently showing that BOLD susceptibility in the brain is non-Gaussian (Hanson & Bly (32)). We have also looked specifically at the notion that the fMRI time series has a specific temporal dependence (Murthy, Lange, Bly & Hanson (33)). We used two kinds of simple sensory tasks. The first was a finger-tapping task in a boxcar paradigm, with 4 seconds of tapping and 4 seconds of no tapping. A GE 1.5T scanner was used to collect the data from a single subject. We also used an auditory task in which a single tone was presented, again to a single subject, also in a boxcar presentation. Autoregressive time series models were fit to every voxel in the brain over time, and the goodness of fit was collected. All fits above a criterion value (>95%) were re-projected back into the brain slice without further thresholding. In Fig. 6 below we show two slices, one showing a standard SPM99 analysis of the boxcar for the finger-tapping task (leftmost graphic) and one showing the projection of an AR(3) model (rightmost graphic), which required no reference to the contrastive boxcar design; rather, it created sensitivity in the brain slice due only to the temporal structure of the fMRI signal itself. We have also used the time-delay coefficients derived from the AR analysis, which indicate the sensitivity of the indicated tissue to the time dependence. Recently we have also shown that the time series is generally not stationary, especially in areas that seem “activated”. In these cases, and perhaps more generally, it would be appropriate to consider ARIMA-style models, which can explicitly model nonstationarity. We note in passing that a more general ARIMA model is in fact a Recurrent Neural Network, which, as we have shown previously, has the useful property of extracting an unknown FSM from a time series.
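The per-voxel AR analysis can be sketched with ordinary least squares. This is an illustrative reconstruction, not the authors' code; the synthetic "voxels" below merely mimic a periodic task-driven time course versus pure noise.

```python
import numpy as np

def ar_r2(ts, p=3):
    """Fit y_t = c + a_1*y_{t-1} + ... + a_p*y_{t-p} by least squares and
    return the R^2 goodness of fit for one voxel's time series."""
    ts = np.asarray(ts, dtype=float)
    y = ts[p:]
    lags = [ts[p - i:len(ts) - i] for i in range(1, p + 1)]
    X = np.column_stack([np.ones(len(y))] + lags)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return 1.0 - float(resid @ resid) / float(((y - y.mean()) ** 2).sum())

# A periodic, task-driven voxel scores high; a white-noise voxel scores
# near zero -- no reference to the boxcar design is needed.
t = np.arange(200)
driven = (np.sin(2 * np.pi * t / 8)
          + 0.05 * np.random.default_rng(2).normal(size=200))
noise = np.random.default_rng(3).normal(size=200)
mask = [ar_r2(v) > 0.95 for v in (driven, noise)]  # criterion from the text
```

Applying `ar_r2` to every voxel and re-projecting `mask` into the slice reproduces the spirit of the rightmost panel of Fig. 6.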



Fig. 6. Brain slices showing the “active” areas of the brain during a finger-tapping task. The leftmost graphic shows a standard SPM99 analysis, while the rightmost graphic shows the AR(3) model.


4 Dynamics of Cognition: The Case of Event Perception and Signal Fusion (EEG, fMRI)

4.1 Event Perception: Perceiving and Encoding Events

Day-to-day experience is characterized, remembered, and communicated as a series of events. We think about driving to work, we remember having an argument with our spouse, and we tell a friend about our plans to attend the theatre next Saturday. Abbreviated phrases such as "driving to work" act as a type of shorthand notation for describing complex action sequences. Thus, our ability to communicate successfully with others using such labels reflects a certain level of familiarity with the referenced activities that we share, or presume to share, with our intended audience. How common is our knowledge about common events? Empirical work suggests that there is considerable consensus concerning the constituent actions of familiar events. For example, Bower, Black, and Turner (1) asked subjects to describe the typical actions involved in going to a restaurant, attending a lecture, getting up, grocery shopping, and visiting a doctor. They found that subjects showed


considerable agreement about the composition of common events, many responses being offered by more than 70% of their subjects and very few being unique. Familiarity with events may provide the basis for understanding and encoding new information. In a second experiment of their 1979 study, Bower et al. (1) asked subjects to parse prose stories centered on events such as visiting a doctor into smaller "parts." They found that subjects tended to choose similar points in the story as constituent boundaries. Agreement about event boundaries extends to online measures of parsing as well (e.g., Newtson (19); Hanson & Hirst (12)). For example, Hanson & Hirst (12) asked subjects to indicate the boundaries of events while viewing videotapes of common activities (e.g., playing a game) under various orientation instructions, and found that subjects had little difficulty agreeing about the boundaries of such events.

4.2 Recurrent Nets and Schemata: Neisser's Perceptual Cycle

Neisser has suggested that perception is a cyclical activity in which (1) memory in the form of schemata guides the exploration of the environment, (2) exploration yields samples of available information, and (3) data collected from the exploration process modify the prevailing schema. According to Neisser: "The schema assures the continuity of perception over time in two different ways. Because schemata are anticipations, they are the medium by which the past affects the future; information already acquired determines what will be picked up next. (This is the underlying mechanism of memory, though that term is best restricted to cases in which time and a change of situation intervene between the formation of the schema and its use.) In addition, however, some schemata are temporal in their very nature. One can anticipate temporal patterns as well as spatial ones." (pp. 22-23). By focusing on the interaction of perception and memory, Neisser's "perceptual cycle" model offers a particularly fertile context for studying the processing of event information. However, because this is a processing model rather than a model of knowledge representation, little emphasis is placed on the structure of schematized knowledge. Thus, it is not clear how "turning the ignition" might be related to "driving home", or even what role the decomposition of events might play in generating the expectations purportedly used to guide sampling of available information. Germane to this issue is another that arises in relation to the proposed modification process. How does the prevailing schema change in response to the sampling process? In particular, what is the basis for the similarity between the ongoing situation and the schemata that are subsequently activated? These questions lead to computational considerations of how one might implement a system that can represent schemata, their similarity, and their dynamic properties in the presence of the ongoing stimulus situation. We discuss two possibilities, the first of which has a classical status in the event-perception literature. This hypothesis concerning event structure was first introduced by Schank & Abelson (28) and by Minsky, and is referred to as "Scripts" or "Frames". It basically asserts that the world can be clustered into simple categories that predict and organize stimulus situations according to their goals and expectancies.

The second approach is a competing account, which we introduce here for the first time in the context of a connectionist hypothesis concerning temporal processing of information. Temporal control in connectionist networks was introduced by Jordan (15; also see 26, 27), as well as by Elman (6, 7), who first discussed the relationship between introducing recurrence into connectionist networks and temporal processing. Both Jordan and Elman were interested in the psychological constraints arising from phenomena involving recognition or production of serial order. Jordan's models addressed Lashley's general challenge to associationism, which requires the addition of a memory to the connectionist network, allowing context-sensitive behavioral sequences. Similarly, Elman introduced a memory to the standard connectionist network in order to recognize simple grammars.

4.3 Categories and Temporal Control

In the present paper we introduce and focus on another important property of recurrent connectionist networks: the evolution and dynamic control of the adjunct memory, and the construction of recognition categories ("schemata") from the stimulus situation. The difference between our focus and the Jordan-Elman focus underscores our interest in the interaction between perception and memory. Our experiments and simulations are meant to elucidate the types and nature of interactions between a "memory" in the neural network and its dependence on evolving schemata (the hidden layer) and present features of the situation. The present work is actually more closely related to the general framework of Arbib, who has used the term "schema" in the constructive sense we use it here, as well as to Hanson & Burr (11), who have emphasized the importance of representational and learning interactions in connectionist networks.
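The Jordan-Elman style of recurrence, in which the hidden layer receives a copy of its own previous state as a context "memory", can be sketched in a few lines. This is a generic illustration, not the simulation used in this chapter; the layer sizes and random weights are arbitrary.

```python
import numpy as np

def srn_step(x, h_prev, W_xh, W_hh, W_hy):
    """One step of a simple (Elman-style) recurrent network: the hidden
    layer receives the current input plus a copy of its own previous
    state (the context units that serve as the network's memory)."""
    h = np.tanh(W_xh @ x + W_hh @ h_prev)    # context feeds back into hidden
    y = 1.0 / (1.0 + np.exp(-(W_hy @ h)))    # sigmoid output layer
    return h, y

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 4, 8, 4
W_xh = rng.normal(0, 0.5, (n_hid, n_in))
W_hh = rng.normal(0, 0.5, (n_hid, n_hid))
W_hy = rng.normal(0, 0.5, (n_out, n_hid))

h = np.zeros(n_hid)
for t in range(5):                 # process a 5-step input sequence
    x = np.eye(n_in)[t % n_in]     # one-hot input symbol
    h, y = srn_step(x, h, W_xh, W_hh, W_hy)
```

At each step the hidden state h carries forward information about the sequence seen so far, which is what makes context-sensitive serial behavior possible.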
Twenty-two subjects were asked to provide judgments of event change while watching videotapes of actors engaging in everyday events (eating in a restaurant, driving a car, working in an office). One videotape showed two people playing a game of Monopoly and the other showed a woman in a restaurant drinking coffee and reading a newspaper. Subjects watched the videotapes under various orientations and pressed a response button whenever they believed a new event was beginning. In the present study, we used responses made when subjects had been oriented toward "small" events while viewing the tapes, the orientation that produced the greatest number of perceived event boundaries. There was high agreement among the 22 subjects on event boundaries, and we chose cases with at least 75% agreement on the second at which an event boundary occurred, which produced 15 event boundaries for averaging. Both EEG and fMRI were recorded simultaneously from a single subject making event-boundary judgments on the Restaurant tape (scripted second by second, showing an actor drinking coffee and reading a newspaper in a restaurant), which had been used previously in the behavioral study. ERP and fMRI were averaged before and after the 15 event boundaries and are time aligned (fMRI was sampled every 4 seconds; ERPs were sampled every 1 ms and then down-sampled to one sample per second) and shown below from 20 seconds before to 8 seconds after an event change. The ERP shows positive active areas in visual areas and the temporal lobe prior to the event change, evolving to a large negative wave starting from prefrontal areas and moving back towards temporal and visual areas. Simultaneous with EEG, we recorded from
the same subject fMRI, which is also shown in Figure 7, time-synchronized with the ERP averages. In the fMRI case, an arbitrary baseline was created by averaging the first 4 scans, which was then contrasted against the other 41 scans to produce a standard t-map. These t-maps were then treated like the ERPs and averaged within a window before and after an event change. Prior to an event change there is a large amount of distributed activity early on (at t values where p < .1, the low-threshold case), which "thins out" just before the event change and also seems to lateralize towards the right hemisphere after the event-change boundary. Areas that were significantly active in what we termed the low-threshold case (t values where p < 0.1) numbered 10-12 and are shown in Figure 8. These areas include dorsolateral prefrontal cortex, anterior cingulate, precuneus, and temporal lobe areas similar to those seen in the ERP coherence maps. Also shown in Figure 8 is a "meta-analysis" of labels taken from the literature in which these areas have been implicated in different tasks. Shown are at least 3 labels that have been used to characterize these areas across different studies or tasks in the neuroimaging literature.
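The low-threshold t-map construction, in which the first 4 scans serve as a baseline contrasted against the remaining 41, can be sketched roughly as follows. This is a simplified stand-in for the SPM99 analysis (a plain per-voxel Welch t rather than a full general linear model), and all data here are synthetic.

```python
import numpy as np

def low_threshold_tmap(scans, n_baseline=4, t_crit=1.68):
    """Per-voxel two-sample (Welch) t contrasting the first `n_baseline`
    scans against the rest. Voxels with |t| above a lenient threshold
    (t_crit ~ 1.68, roughly two-tailed p < .1 at ~40 df) survive,
    mimicking the "low threshold case" described in the text."""
    base, task = scans[:n_baseline], scans[n_baseline:]
    m1, m2 = base.mean(axis=0), task.mean(axis=0)
    v1, v2 = base.var(axis=0, ddof=1), task.var(axis=0, ddof=1)
    t = (m2 - m1) / np.sqrt(v1 / len(base) + v2 / len(task))
    return t, np.abs(t) > t_crit

rng = np.random.default_rng(2)
scans = rng.normal(size=(45, 100))     # 45 scans x 100 toy "voxels"
scans[4:, :10] += 3.0                  # 10 voxels activate after the baseline
tmap, active = low_threshold_tmap(scans)
```

With a lenient threshold many voxels survive by chance, which is exactly why the text describes this as a deliberately permissive, exploratory map rather than a corrected one.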

Fig. 7. ERP and fMRI showing concurrently collected and averaged activity 20 seconds before and 8 seconds after 15 judged event changes.
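The event-locked averaging behind Figure 7 can be sketched as follows, under the criteria stated in the text (button presses from 22 subjects, at least 75% agreement on a boundary second, a window from 20 s before to 8 s after). The data, array shapes, and function names here are hypothetical.

```python
import numpy as np

def boundary_consensus(presses, threshold=0.75):
    """Seconds at which at least `threshold` of subjects marked an event
    boundary. `presses` is a (subjects x seconds) 0/1 array."""
    agreement = presses.mean(axis=0)
    return np.where(agreement >= threshold)[0]

def event_locked_average(signal, boundaries, before=20, after=8):
    """Average a 1-D signal (one sample per second) in a window from
    `before` s before to `after` s after each boundary."""
    windows = [signal[b - before : b + after]
               for b in boundaries
               if b - before >= 0 and b + after <= len(signal)]
    return np.mean(windows, axis=0)

rng = np.random.default_rng(1)
presses = (rng.random((22, 300)) < 0.1).astype(int)  # 22 subjects, 300 s tape
presses[:, 100] = 1                                  # all subjects agree at t = 100
bounds = boundary_consensus(presses)
erp = rng.normal(size=300)                           # toy per-second signal
avg = event_locked_average(erp, bounds)              # 28-sample averaged window
```

The same windowing was applied to the per-second ERP samples and to the t-maps, which is what makes the two modalities directly comparable in Figure 7.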

It is clear from the putative functions of these areas that it would be possible to characterize the event boundary judgments as a dynamical system that includes attentional, spatial-orienting, and detection functions (anterior cingulate, anterior parietal), encompassing areas that have been implicated in so-called "attentional networks" (e.g., Posner). Such an event detection system would also necessarily seem to include planning for schema processing and some sort of short-term buffer for comparison to well-known schemata, which would involve memory and/or language comprehension (temporal lobe). Finally, in terms of comparison and schema processing, functions such as imagery and spatial modeling (parietal lobule, precuneus) would be critical to such a dynamic schema detection circuit. The most highly significant areas as indexed by t-value were the anterior cingulate and dorsolateral prefrontal cortex. The subject also responded with a thumb
movement to indicate a behavioral response to an event change, which was most likely associated with the SMA and primary motor cortex. This potential circuit would not be apparent from the normative neuroimaging method, which stresses single-area functions rather than interactivity, distributed processing, and systemic function. Key to the proposal in this chapter is the idea that neuroimaging data are subject to dynamical systems analysis, and in particular to the type of sequential structure inherent in the simple FSMs discussed earlier in this chapter. Without this kind of view, neuroimaging data are far too limited to reveal the likely complexity of commerce between brain areas. Neural networks have considerable power and generalization scope that could be useful in analyzing and modeling neuroimaging time series. In particular, we end with a proposal to fuse time-rich signals (such as EEG) and space-rich signals (such as fMRI) in order to create spatio-temporal signals that are commensurate with, and sensitive to, cognitive interactivity and system-level brain function.

[Figure 8 shows brain areas labeled with putative functions. The labels, recovered from the garbled figure text, read: Frontal areas: decisions, maintain attentional focus. Dorsal lateral prefrontal: planning, short-term buffer, attention. Parietal lobule / precuneus: imagery, spatial modeling. Anterior cingulate: attention, "detection", choice. Temporal lobe: long-term storage, language. Thalamus / anterior parietal: attention spotlight, shunting towards frontal areas; attention, orienting in space, stimulus disengagement. Primary motor area / supplementary motor area: motor planning, initiation and modulation of movement. Left occipital lobe: visual pattern recognition, encoding.]

Fig. 8. SPM99 analysis of the event perception task, using low-threshold values.

4.4 Fusing EEG and fMRI for "Real Time" Cognitive Measurement

Event perception tasks are productive in a cognitive sense, evoking many cognitive processes at once: perception, memory, sequencing, grammar, and so on. System-level and real-time measures of processing are therefore informative and potentially revealing. We claim it is important to understand the system of brain areas that supports cognitive function as an interactive, distributed, temporally dynamic, structured neural network. Specifically, it is well known that the source reconstruction problem using ERPs is intractable due to the lack of constraints on the position and number of dipoles giving rise to scalp voltage potentials (the problem is ill-posed). We would constrain the ERP inverse equations with the number and locations of fMRI activations obtained in a low-threshold approach, as described earlier. Next, one would solve the equations and iterate at the sampling rate of the fMRI image acquisition. The solved-for dipoles would then initiate the location estimation procedures from before (for example, in the Gaussian kernel method, the mean might be placed at the now-identified dipole), and the locations of neural activity would be re-estimated. These new estimates would then reseed the ERP inverse, and the process would iterate. This kind of approach would allow the location-stable measures in fMRI to be interpolated with the ERP estimators between fMRI sample points. In effect, the fMRI would be augmented with millisecond estimation of position between every sampled image. Further imposing a temporal regularizer would ensure that the ERP-fMRI estimator remains smooth and stable between fMRI samples. The successful application of this new method would constitute the first demonstration of real-time brain imaging.
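The core of the proposed fMRI-seeded ERP inverse can be sketched schematically. Everything here is a toy stand-in: the leadfield is random rather than derived from a head model, the "fMRI seeds" are just column indices, and the temporal regularizer and the iteration across fMRI frames are omitted. The sketch shows only the central idea, namely that restricting the otherwise ill-posed inverse to fMRI-identified locations turns it into a small, well-conditioned least-squares problem.

```python
import numpy as np

def constrained_inverse(erp, leadfield, fmri_seeds, lam=0.1):
    """One pass of the proposed scheme: restrict the ERP inverse to the
    source locations identified by the low-threshold fMRI map, then
    solve a ridge-regularized least-squares problem for their
    millisecond time courses. Returns a (seeds x time) array."""
    L = leadfield[:, fmri_seeds]                  # sensors x seeds
    G = L.T @ L + lam * np.eye(len(fmri_seeds))   # regularized normal equations
    return np.linalg.solve(G, L.T @ erp)          # seeds x time

rng = np.random.default_rng(3)
n_sensors, n_sources, n_times = 32, 50, 100
leadfield = rng.normal(size=(n_sensors, n_sources))
src = np.zeros((n_sources, n_times))
src[7] = np.sin(np.linspace(0, 6, n_times))       # one truly active dipole
erp = leadfield @ src + 0.01 * rng.normal(size=(n_sensors, n_times))

S = constrained_inverse(erp, leadfield, fmri_seeds=[5, 7, 20])
# the fMRI-seeded inverse assigns most energy to the truly active source
```

In the full proposal this solve would be re-run and re-seeded at each fMRI acquisition, with a temporal smoothness penalty linking solutions between successive fMRI frames.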

5 Some Conclusions: Connectionist Neuroimaging

We need to look for emergent properties of networks that might guide measurement in neuroscience. As in the grammar transfer task, the kinds of computations found were metric and similarity based. Neuroimaging data can help constrain our modeling and provide insights into the complex spatio-temporal dynamical system of the brain.

References

1. Bower, G.H., Black, J.B., & Turner, T.J. (1979). Scripts in memory for text. Cognitive Psychology, 11, 177-220.
2. Casey, M. (1996). The dynamics of discrete-time computation, with applications to recurrent neural networks and finite state machine extraction. Neural Computation, 8(6), 1135-1178.
3. Chomsky, N. (1957). Syntactic Structures. The Hague: Mouton.
4. Denker, J., Schwartz, D., Wittner, B., Solla, S., Howard, R., Jackel, L., & Hopfield, J. (1987). Automatic learning, rule extraction and generalization. Complex Systems, 1(5), 877-922.
5. Elman, J.L., Bates, E.A., Johnson, M.H., Karmiloff-Smith, A., Parisi, D., & Plunkett, K. (1996). Rethinking Innateness. Cambridge, MA: MIT Press.
6. Elman, J.L. (1990). Finding structure in time. Cognitive Science, 14, 179-211.
7. Elman, J.L. (1988). Finding structure in time. CRL Technical Report 8801. Center for Research in Language, UCSD.
8. Fodor, J.A. & Pylyshyn, Z.W. (1988). Connectionism and cognitive architecture: A critical analysis. In S. Pinker & J. Mehler (Eds.), Connections and Symbols. Cambridge, MA: MIT Press.
9. Giles, C.L., Horne, B.G., & Lin, T. (1995). Learning a class of large finite state machines with a recurrent neural network. Neural Networks, 8(9), 1359-1365.
10. Giles, C.L., Miller, C.B., Chen, D., Chen, H.H., Sun, G.Z., & Lee, Y.C. (1992). Learning and extracting finite state automata with second-order recurrent neural networks. Neural Computation, 4(3), 393-405.
11. Hanson, S.J. & Burr, D.J. (1990). What connectionist models learn: Learning and representation in connectionist models. Behavioral and Brain Sciences, 13(3), 471.
12. Hanson, C. & Hirst, W. (1989). On the representation of events: A study of orientation, recall, and recognition. Journal of Experimental Psychology: General, 118, 124-150.
13. Hanson, S.J. & Burr, D.J. (1990). What connectionist models learn: Learning and representation in connectionist networks. Behavioral and Brain Sciences, 13(3), 477-518.
14. Harnad, S. (1990). The symbol grounding problem. Physica D, 42, 335-346.
15. Jordan, M.I. (1986). Serial order: A parallel distributed processing approach. ICS Technical Report. UCSD.
16. Marcus, G.F., Vijayan, S., Bandi Rao, S., & Vishton, P.M. (1999). Rule learning by seven-month-old infants. Science, 283, 77-80.
17. Medin, D.L. & Schaffer, M.M. (1978). Context theory of classification learning. Psychological Review, 85, 207-238.
18. Miller, G.A. & Stein, M. (1963). Grammarama I: Preliminary studies and analysis of protocols. Technical Report No. CS-2. Cambridge, MA: Harvard University, Center for Cognitive Studies.
19. Newtson, D. (1973). Attribution and the unit of perception of ongoing behavior. Journal of Personality and Social Psychology, 28, 28-38.
20. Pinker, S. (1999). Enhanced: Out of the minds of babes. Science, 283, 40-41.
21. Pinker, S. (1997). How the Mind Works. New York: W.W. Norton & Co.
22. Pinker, S. (1994). The Language Instinct. New York: Morrow & Co.
23. Reber, A.S. (1967). Implicit learning of artificial grammars. Journal of Verbal Learning and Verbal Behavior, 6, 855-863.
24. Reber, A.S. (1969). Transfer of syntactic structure in synthetic languages. Journal of Experimental Psychology, 81, 115-119.
25. Redington, M. & Chater, N. (1996). Transfer in artificial grammar learning: A reevaluation. Journal of Experimental Psychology: General, 125(2), 123-138.
26. Rumelhart, D.E., Hinton, G.E., & Williams, R.J. (1986). Learning representations by back-propagating errors. Nature, 323, 533-536.
27. Rumelhart, D.E., Hinton, G.E., & Williams, R.J. (1986). Learning internal representations by error propagation. In D.E. Rumelhart & J.L. McClelland (Eds.), Parallel Distributed Processing, Vol. 1: Foundations. Cambridge, MA: MIT Press.
28. Schank, R.C. (1982). Dynamic Memory: A theory of reminding and learning in computers and people. Cambridge: Cambridge University Press.
29. Servan-Schreiber, D., Cleeremans, A., & McClelland, J.L. (1988). Encoding sequential structure in simple recurrent networks. CMU Technical Report CS-88-183.
30. Special issue on cognitive neuroscience. (1997). Science, 275, 1580-1608.
31. Watrous, R.L. & Kuhn, G.M. (1992). Induction of finite-state languages using second-order recurrent networks. Neural Computation, 4(3), 406-414.
32. Hanson, S.J. & Bly, B.M. (2000). The distribution of BOLD susceptibility in the brain is non-Gaussian.
33. Murthy, Bly, B.M., & Hanson, S.J. (1999). Identification of the fMRI signal. Cognitive Neuroscience Society.
