
Neural Networks PERGAMON

Neural Networks 11 (1998) 1417–1428

1998 Special Issue

Coordinate-free sensorimotor processing: computing with population codes

Pietro G. Morasso a,*, Vittorio Sanguineti b, Francesco Frisone a, Luca Perico a

a Department of Informatics, Systems and Telecommunications, University of Genova, Genova, Italy
b Department of Physiology, Northwestern University Medical School, Chicago, USA

Received 13 November 1997; revised 4 May 1998; accepted 4 May 1998

Abstract

The purpose of the study is to outline a computational architecture for the intelligent processing of sensorimotor patterns. The focus is on the nature of the internal representations of the outside world which are necessary for planning and other goal-oriented functions. A model of cortical map dynamics and self-organization is proposed that integrates a number of concepts and methods partly explored in the field. The novelty and the biological plausibility of the model lie in its global architecture, which allows one to deal with sensorimotor patterns in a coordinate-free way, using population codes as distributed internal representations of external variables and the coupled dynamics of cortical maps as a general tool of trajectory formation. The basic computational features of the model are demonstrated in the case of articulatory speech synthesis and some of the metric properties are evaluated by means of simple simulation studies. © 1998 Elsevier Science Ltd. All rights reserved.

Keywords: Population code; Cortical map; Cortical dynamics; Field computing; Self-organization; Speech; Topology representing network; Hebbian learning

1. Introduction

A fundamental feature of sensorimotor processing in biological or robotic organisms is its ecological nature, i.e. the fact that the relevant dynamics applies to the whole ensemble "organism + environment" and the latter is a full partner, not a mere passive "slave" of the former. There is no doubt that the implications of this concept of circularity have not been explored to their full extent, although the main idea has been around for some time (since the pioneering work of J. Piaget and J.J. Gibson), shifting from one research field to another: cognitive psychology, cognitive neuroscience, robotics, neural networks, artificial life, etc. In artificial life (AL), for example, attention is focused on very simple organisms which are able to exhibit some form of intelligent behavior without any explicit form of internal intelligence, a situation that has also been described as pre-rational intelligence (Cruse, 1996). Such simple organisms have "simple" sensory and motor organs and the related sensorimotor processing can be reduced to rather straightforward (although tunable) analog circuitry, which directly transforms sensory signals into motor commands. The richness of the observed behavior is mainly a consequence of the complexity and, in a sense, creativity of the non-linear, dissipative, non-equilibrium dynamics of the environment; thus, a small amount of adaptability is sufficient for the organism to tailor its behavior to the essential constraints.

* Requests for reprints should be sent to Dr P.G. Morasso, University of Genova, DIST, via Opera Pia 13, I-16145 Genova, Italy. Tel.: +39-10-3532749; fax: +39-10-3532154; e-mail: [email protected]

0893-6080/98/$19.00 © 1998 Elsevier Science Ltd. All rights reserved. PII: S0893-6080(98)00065-3

However, adaptability does not necessarily imply intelligence. With the exception of "hard-wired" tropistic organisms, most existing organisms must be adaptable (in order to survive) but not all of them can be considered "intelligent", whatever the specific definition we use for such an elusive concept (Fig. 1). For the scope of the paper, we limit ourselves to the domain of sensorimotor processing and we argue that in such a context a thing called intelligence is a necessity for organisms which have the burden of managing the complex sensory and motor organs required for complex tasks, such as manipulation and phonation. "Complex organs" of this kind would be useless for an insect and, in general, for an organism that could only rely on reflexive processing modules, however adaptable. What is needed, in general, is the ability to build internal representations of the external world, which allow the organism to anticipate, plan, and imagine sensorimotor patterns (whether real or only plausible) in order to free itself from the "tyranny" of control automatisms. We propose such ability as an operational definition of sensorimotor intelligence, for an organism that can take advantage of complex sensory and motor organs when carrying out complex sensorimotor tasks.

Fig. 1. The ecological nature of sensorimotor processing.

Many possible computational architectures may be conceived that fit the requirement, but it may be considered that the nature of the phylogenetic process favors solutions which evolve in an incremental, although non-linear, way from older ones. In the evolution of the nervous system from a series of ganglia (invertebrates) to a central nervous system differentiated into spinal cord and brain (vertebrates), the emergence of the cerebral cortex is clearly a turning point. In fact, it makes available a new piece of neuronal hardware, which not only is massively integrated with afferent and efferent flows (a strong link with real reality) but, at the same time, has an endogenous dynamics which gives it the power of planning and problem solving on realistic but not necessarily real sensorimotor patterns (i.e. a suitable degree of virtual reality).

Substantial advances in the study of sensorimotor cortical areas have been achieved since the pioneering work in the 1950s and 1960s by Mountcastle, Hubel, Wiesel, Evarts, and others, leading to an understanding of the cortex as a continuously adapting system, shaped by competitive and cooperative interactions. However, the greatest part of the effort has been devoted to the investigation of the receptive-field properties of the cortical maps, whereas relatively little attention has been devoted to the role of lateral connections and the cortical dynamic processes that are determined by the patterns of recurrent excitation (Amari, 1977; Kohonen, 1982; Grajski et al., 1990; Reggia et al., 1992; Martinetz et al., 1994; Sirosh et al., 1996; Morasso et al., 1996).
This paper contributes in this direction by investigating a computational model for the cortical processing of high-dimensional sensorimotor variables. It is based on topologically organized cortical maps, which support the formation of internal representations of the variables in a coordinate-free way by means of population codes. We show that the same mechanism of cortical dynamics, which induces the emergence of the population code from the topology of connections, is also able to carry out task-relevant computations with population codes by exploiting the coupled dynamics of the different cortical areas. The model is demonstrated in the field of speech motor control, with the very limited goal of showing the feasibility of the computational mechanism and its ability to deal with complex patterns in a general way. Admittedly, this is a qualitative test and the main purpose is to outline a new way of looking at cortical computation, without attempting, for lack of space, a quantitative comparison with alternative models on specific aspects of the theory. The model develops previous concepts on field computing (Morasso et al., 1997a) and speech production (Sanguineti et al., 1997; 1998) and investigates in more detail the metric aspects of the map dynamics.

Fig. 2. Cortical representation of environmental variables.

2. Cortical dynamics and population codes

In our model, a cortical map is characterized by its ability to store internal, distributed representations C of a generic environmental vector x. The representation is constructed according to the scheme of Fig. 2:¹

• $x \in X \subset \mathbb{R}^n$ is mapped onto a large set of "filters" $F = \{f_k\}_{1,N}$, each of which is characterized by a particular response function $f_k(x)$. We call F a thalamic representation and we map it onto the cortical map, as an external input, via a set of convergent, unidirectional connections $W_{ki}$.
• C emerges from the dynamic interaction between the thalamo-cortical inputs (via $W_{ki}$, giving rise to the receptive field properties of the cortical units) and the cortico-cortical inputs (via the intra-connections $C_{ij}$, which express the topological structure of the map): x ⇒ F ⇒ C ⇔ C. C is given by a population code $\{V_i\}_{1,M}$ (a pattern of activity clustered around a winning neuron) that implicitly expresses the posterior probability distribution of x given the available sensorimotor measurements.

¹ For simplicity, Fig. 2 does not include the cross-connections among cortical maps that are an essential part of the theory.

In principle, this scheme can be applied to maps that represent the "distal" space of targets and/or obstacles and to maps related to the "proximal" space of body configurations and/or motor commands. Our theory suggests that the relations among such maps should be bi-directional, in order to carry out a variety of task-related computational operations. For example, in some cases we might wish to predict the distal sensory patterns which are likely to arise from an internally generated motor command pattern; in others, we might be interested in planning the time course of the motor commands which is required for reaching a distal target while satisfying proximal constraints and avoiding distal obstacles. In both cases, the different interacting maps are required to deal with patterns that are motor and sensory at the same time.

In this section we have simply introduced some general ideas about cortical maps and population codes that can be applied to a whole family of possible models. The next section addresses some topics of biological plausibility for this class of models and then we present our proposal of a multiple map architecture.

2.1. Biological plausibility of cortical map models

As regards the biological plausibility of cortical map models, the somatotopic or ecotopic layout of many cortical areas has long suggested a kind of topologic organization, associated with a dimensionality reduction of the representational space, thus motivating the development of a (large) family of self-organizing network models. From its beginning, however, the effort has been affected by a number of misconceptions, partly due to the overemphasis on the receptive field properties of cortical neurons. Only recently has a new understanding of the cortex as a dynamical system begun to emerge, one that focuses attention on the competitive and cooperative effects of lateral connections (Sirosh et al., 1996).
It has been shown that cortico-cortical organization is not static but changes with ontogenetic development together with patterns of thalamocortical connections (Katz et al., 1992). In short, it has been suggested that cortical areas can be seen as a massively interconnected set of elementary processing elements, which constitute a computational map (Knudsen et al., 1987). From the modeling point of view, the most common misconceptions about cortical functionality can be reduced to the following three items:

• flatness of cortical maps (related to the locality of lateral connections);
• fixed lateral connections (versus plastic thalamo-cortical connections, which determine receptive-field properties);
• Mexican-hat function of lateral interactions (it implies a significant amount of recurrent inhibition for the formation of localized responses by lateral feedback).

The flatness assumption that characterizes the classic map models (Amari, 1977; Kohonen, 1982) is contradicted by the fact that the structure of lateral connections is not genetically determined but depends mostly on electrical activity during development. More precisely, the connections have been observed to grow exuberantly after birth and reach their full extent within a short period; during the subsequent development, a pruning process takes place so that the mature cortex is characterized by a well defined pattern of connectivity, which includes a large amount of non-local connections: this rules out all the models limited to a purely 2-D circuitry. Moreover, the superficial connections to non-neighboring columns are organized into characteristic patterns: a collateral of a pyramidal axon typically travels a characteristic lateral distance without giving off terminal branches and then produces tightly packed terminal clusters (possibly repeating the process several times over a total distance of several millimeters). This characteristic distance is not a universal cortical parameter and is not distributed in a purely random fashion but is different in different cortical areas (Gilbert et al., 1979; Schwark et al., 1989; Calvin, 1995). Thus, the development of lateral connections depends on the cortical activity caused by the external inflow, in such a way as to capture and represent the (hidden) correlation in the input channels.

Each individual lateral connection is "weak" enough to go virtually unnoticed while mapping the receptive fields of cortical neurons, but the total effect on the overall dynamics of cortical maps can be substantial, as is revealed by cross-correlation studies (Singer, 1995). Lateral connections from superficial pyramids tend to be recurrent (and excitatory) because 80% of synapses are with other pyramids and only 20% with inhibitory interneurons, most of them acting within columns (Nicoll et al., 1993). Recurrent excitation is likely to be the underlying mechanism which produces the synchronized firing that has been observed in distant columns.
The existence (and preponderance) of massive recurrent excitation in the cortex is in contrast with what could be expected, at least in primary sensory areas, considering the ubiquitous presence of peristimulus competition (or "Mexican-hat pattern"), which has been observed in many pathways, such as the primary somatosensory cortex, and has been confirmed by direct excitation of cortical areas as well as correlation studies; in other words, in the cortex there is a significantly larger amount of long-range inhibition than expected from the density of inhibitory synapses. In general, "recurrent competition" has been assumed to be the same as "recurrent inhibition", for providing an antagonistic organization that sharpens responsiveness to an area smaller than would be predicted from the anatomical funneling of inputs. Thus, an intriguing question is how long-range competition can arise without long-range inhibition, and a possible solution is the mechanism of gating inhibition based on a competitive distribution of activation, proposed by Reggia et al. (1992) and further investigated by Morasso et al. (1996).

2.2. The proposed model: interacting cortical maps

The following model of cortical dynamics is proposed, where, for simplicity, we lump the generic ith cortical column into a single processing element, characterized by an activity level $V_i$ and two kinds of inputs ($h_i^{lat}$ and $h_i^{ext}$):

$$\frac{dV_i}{dt} = g(t)\left[-g_i V_i + h_i^{lat} + h_i^{ext}\right] \qquad (1)$$

The equation simply says that $V_i$ evolves under the action of three competing influences:

1. a self-inhibition (weighted by the parameter $g_i > 0$);
2. a net input $h_i^{lat}$ coming from the set of lateral connections inside the same cortical map;
3. a net external input $h_i^{ext}$ coming from thalamo-cortical connections (or cortico-cortical connections from other cortical maps).

The first term is consistent with the already mentioned intra-columnar nature of inhibitory synapses; we can also say that it gives the column the character of a "leaky integrator". The second term is a recurrent input, intended to express the massive lateral excitatory connections:

$$h_i^{lat} = \sum_j C_{ij} \frac{V_j}{\sum_k V_k}$$

where the sum is extended to the set of columns laterally connected to the given element and the connection weights $C_{ij}$ are positive and symmetric. This term includes an element of gating inhibition, because the activity level of each neuron is normalized (as in Reggia's model) according to the average activity of its immediate neighbors. The symmetry of connections implies that the map, as a kind of continuous Hopfield network, is characterized by point-attractor dynamics. This kind of gating inhibition allows the attractor pattern (i.e. the population code) to be much sharper than the receptive field, as is clearly apparent in the simulations.

The external input, which in the scheme of Fig. 2 is detected by a set of "filters", defines the receptive field properties of the unit. It is a function of the environmental variable x, with a preferred value or receptive field center $w_i$. In the simulations, we simply used a broad Gaussian $G_i(x)$,² whose covariance matrix identifies the receptive field size and shape. For the dynamics of the cortical map, however, it is essential to add a shunting interaction term (an idea borrowed from Grossberg, 1973):

$$h_i^{ext} = G_i(x)\, V_i$$

² With respect to the diagram of Fig. 2, $G_i(x)$ must be considered as an approximation of the convergent thalamo-cortical input $\sum_k f_k(x)$.

This contributes, together with gating inhibition, to the following type of transient behavior: the sudden shift of the input variable x (say the selection of a new target) induces first a diffusion process (which initially flattens the population code, spreading the activity pattern over a large part of the network) and then a re-sharpening process around the target (which builds up faster and faster as the diffused wave-form reaches the target area). The combination of the two processes is the propagation of the population code toward the new target, following a geodesic in the characteristic manifold of the map, and this kind of behavior is the basic component of the envisaged coordinate-free computational mechanism operating on internal representations of the external world.³

The transient behavior is smooth due to the combined action of diffusion and re-sharpening, even with a time-invariant gain g. In this case, however, there is no control over the timing, which can change unpredictably because of the unavoidable fluctuations of the equation parameters. This undesirable effect can be counteracted, without changing the nature and simplicity of the model, by using a suitable time-varying gain g(t),⁴ a concept which has been explored in a variety of contexts: "GO-signal" (Bullock et al., 1989), "terminal attractor" pace-maker (Barhen et al., 1989), "y-model" (Morasso et al., 1993; Morasso et al., 1997b). In this way, the timing behavior is more robust with respect to the fluctuations in the equation parameters and allows the synchronization of concurrent cortical dynamic processes. A biologically plausible implementation of this concept is related to the basal-thalamo-cortical loop and the well established role of the basal ganglia in the initiation and speed-control of voluntary movements.

Fig. 3 shows a simulation of a cortical map that illustrates the described computational mechanism. The input environmental variable x is two-dimensional, varying in a circular domain.
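Before the details of that simulation, the dynamics of Eq. (1) with the gating and shunting terms can be illustrated numerically. The Python sketch below integrates a small one-dimensional map with the Euler method. Everything beyond the equations themselves is an assumption made for illustration only: the chain topology with unit weights ($C_{ij} = 1$ between neighbors), the normalization of the lateral term over the whole map, the receptive-field amplitude and width, the constant gain, and all function and parameter names.

```python
import math


def gaussian(x, center, sigma, amplitude):
    """Broad Gaussian receptive field G_i(x) for a scalar input x."""
    return amplitude * math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))


def euler_step(v, x, centers, neighbors, dt=0.01, gain=1.0,
               g_self=1.0, sigma=0.5, amplitude=0.8):
    """One Euler step of dV_i/dt = gain * (-g_i V_i + h_lat_i + h_ext_i)."""
    total = sum(v)  # assumption: the gating normalisation runs over the whole map
    new_v = []
    for i, v_i in enumerate(v):
        # Lateral (gating) term with C_ij = 1 on the links of the chain.
        h_lat = sum(v[j] for j in neighbors[i]) / total
        # Shunting external input: receptive field times current activity.
        h_ext = gaussian(x, centers[i], sigma, amplitude) * v_i
        new_v.append(v_i + dt * gain * (-g_self * v_i + h_lat + h_ext))
    return new_v


def run_map(x_target, n_units=21, steps=2000):
    """Start from a bump around 0.2 and integrate with the input held at x_target."""
    centers = [i / (n_units - 1) for i in range(n_units)]
    neighbors = [[j for j in (i - 1, i + 1) if 0 <= j < n_units]
                 for i in range(n_units)]
    # Narrow initial population code; the small offset keeps every unit positive.
    v = [math.exp(-((c - 0.2) ** 2) / (2.0 * 0.05 ** 2)) + 1e-6 for c in centers]
    for _ in range(steps):
        v = euler_step(v, x_target, centers, neighbors)
    return centers, v


if __name__ == "__main__":
    centers, activity = run_map(0.8)
    for c, a in zip(centers, activity):
        print(f"{c:.2f} {a:.4f}")
```

In this sketch the receptive-field amplitude is kept below the self-inhibition parameter so that the activity stays bounded. The values are not tuned to reproduce Fig. 3, and the time-varying gain g(t) is deliberately left out.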
The map neurons (N = 128) are characterized as follows: (i) the receptive field centers are set according to a regular tessellation of the input domain; (ii) the receptive fields are radially symmetric and their sizes are large (comparable to the size of the input domain); and (iii) the lateral connections are consistent with the Voronoi tessellation of the input domain. The map is initially at rest in the point x = (0.2, −0.2); the transient is initiated by switching on the pacemaker y(t) and during the simulation the external input x remains fixed at its final value (−0.2, 0.2).

Fig. 3. Transient behavior of a cortical map model.

A few words of justification are needed on the choice of Reggia's model as a reference for our attempt to model some dynamic aspects of the sensorimotor cortex. Although no solid experimental evidence of competitive distribution of activation is available, the weaker and less committing feature of gating inhibition that we use in our model, with the specific goal of inducing the smooth propagation of population codes, is quite attractive from many points of view: (i) it is consistent with the widespread presence of recurrent excitation;⁵ (ii) it gives the right emphasis to the role of lateral connections; (iii) it is supported by converging lines of evidence on the continuous variation of the equilibrium trajectories that underlie coordinated movements; and (iv) it is not contradicted by specific experimental data.

³ The propagation of the population code cannot be obtained with Reggia's original model. In that model the map equation (to be compared with Eq. (1)) is as follows: $dV_i/dt = -g_i V_i + (M - V_i)\,[h_i^{lat} + h_i^{ext}]$, where $h_i^{lat} = c_p \sum_j V_j (V_i + q) / \sum_k (V_k + q)$ and $c_p$, $q$ are map-wide constants. Instead of a propagation, the model yields a combination of a waning peak (in the old location) and a growing peak (in the new location). In our model, the gating and shunting terms are both necessary in order to obtain the propagation effect.

⁴ The following time-varying gain has been used: $g = g(t) = (dy/dt)/(1 - y)$, where y(t) is a sigmoid (0 → 1 in T s). See Morasso et al. (1997b) for details.

⁵ The basic dynamic behavior of the network is preserved, even if we include some degree of recurrent inhibition.

2.3. Training

The learning procedure of the cortical model is an extension of the technique proposed by Martinetz et al. (1994) for the TRN (topology representing network) model. If we apply such a technique to the network illustrated by the previous simulation, with a training set uniformly distributed in the domain, we obtain a distribution of receptive fields and a pattern of lateral connections quite similar to the one adopted in the simulation. In the TRN model, however, the lateral connections do not influence the network dynamics and there are different learning rules for the thalamo-cortical and the cortico-cortical connections: the former rule is Hebbian but the latter uses an explicit ordering procedure which is biologically implausible and contradicts the basic requirement of parallelism and locality.

In the proposed extension (Frisone et al., 1997), the same Hebbian rule is utilized for adapting both the thalamo-cortical $W_{ij}$ and cortico-cortical $C_{kl}$ connection weights: in the former case the weights are increased proportionally to the correlation between the thalamic input and the cortical activation ($X_i V_j$) and, in the latter, to the correlation between the corresponding cortical activities ($V_k V_l$). The ordering procedure, which is explicit in the TRN rule, is implicitly performed in our model by the cortical dynamics itself: it attributes to the cortical units, at steady state, an activation level $V_i$ that is higher the closer the receptive field center $w_i$ is to the input pattern x. Thus, learning should only be applied at steady state, after the application of a new input pattern. In fact, simulations have shown that this kind of learning strategy yields a tessellation quite similar to that of the TRN model. As learning proceeds, both groups of connection weights are modulated, thus shifting the receptive field centers and, at the same time, pruning the lateral connection weights which fall under a given threshold.
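The shared Hebbian rule can be sketched as a single update applied once the map has settled on a new input. This is a minimal Python illustration, not the procedure of Frisone et al. (1997): the learning rate, the decay of the lateral weights (introduced here only so that unused connections can fall below the pruning threshold), the threshold value and all names are assumptions.

```python
def hebbian_update(w, c, x_in, v, rate=0.1, decay=0.05, prune_below=1e-3):
    """Apply one Hebbian step at steady state.

    w[i][j]: weight from thalamic input i to cortical unit j (grows with X_i * V_j).
    c[k][l]: lateral weight between cortical units k and l (grows with V_k * V_l).
    The decay and pruning parameters are hypothetical illustration values.
    """
    n_in, n_units = len(x_in), len(v)

    # Thalamo-cortical weights: increase proportional to input/activation correlation.
    new_w = [[w[i][j] + rate * x_in[i] * v[j] for j in range(n_units)]
             for i in range(n_in)]

    # Cortico-cortical weights: same rule on pairs of cortical activities,
    # followed by pruning of the weights that fall under the threshold.
    new_c = [[0.0] * n_units for _ in range(n_units)]
    for k in range(n_units):
        for l in range(n_units):
            if k == l:
                continue  # no self-connection
            value = (1.0 - decay) * c[k][l] + rate * v[k] * v[l]
            new_c[k][l] = value if value >= prune_below else 0.0
    return new_w, new_c


if __name__ == "__main__":
    w0 = [[0.0] * 3 for _ in range(2)]
    c0 = [[0.0] * 3 for _ in range(3)]
    w1, c1 = hebbian_update(w0, c0, x_in=[1.0, 0.0], v=[1.0, 0.5, 0.0])
    print(w1)
    print(c1)
```

Because the update uses the product $V_k V_l$, a symmetric lateral matrix stays symmetric, which matches the symmetry assumed for the $C_{ij}$ in Section 2.2.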

3. Cortical control of speech movements

In this section, we show how the cortical model can be applied to sensorimotor control problems by means of the dynamic interaction between cortical maps. As an example, we consider the case of speech motor control.⁶ The minimal computational architecture that has been implemented consists of (i) an articulatory map, representing the space X of articulatory gestures, i.e. the geometric configurations that the vocal tract can possibly assume during speech movements, and (ii) an acoustic map, storing the acoustic consequences in a formant space Y. In particular, the environmental vectors x and y were defined as follows:

• x is 10-dimensional and is based on a geometric characterization of the vocal tract due to Badin et al. (1995): x = [LH (Lip Height), LP (Lip Protrusion), JH (Jaw Height), TB (Tongue Body), TD (Tongue Dorsum), TT (Tongue Tip), TA (Tongue Advance), LY (Larynx), VH (Velum Height), LV (Lips Vertical)]^T;
• y is five-dimensional and stores the first five formants of the vocal tract: y = [$F_1$, $F_2$, $F_3$, $F_4$, $F_5$]^T.

⁶ This is only an exercise, although a complex one. There is no space for a detailed comparison with alternative models and we do not claim that it is better in any sense. The point we wish to make is that it is possible to handle high-dimensional sensorimotor patterns by means of a totally distributed architecture that only relies on population codes.

An important comment is that this choice of parameterization for the acoustic and motor spaces does not imply that the cortical maps are directly encoded in this way. We chose those parameters for convenience and compatibility with the available training set. However, any other kind of parameterization would be acceptable if it is powerful enough to fit the data. One of the strong points of the cortical map model is that it gives a coordinate-free representation of the environmental variables and so is rather insensitive to the parameterization of input and output.

The two maps (which contain 1000 and 500 neurons, respectively) were trained by means of a data set in which the acoustic output of a male French speaker,⁷ pronouncing VVV and VCV sequences, was synchronized with a cineradiographic acquisition, yielding about 5000 digitized X-ray images of the sagittal view of the vocal tract (at the sampling frequency of 50 Hz) (Badin et al., 1995). From these data x and y vectors were extracted⁸ and then used in a training procedure which adopted the TRN strategy (for learning the receptive field centers and the patterns of intra-connections), extended so as to learn an analogous set of cross-connections between the two maps.⁹ The cross-connections implicitly code the functional relationship between the two manifolds and also allow one to map the population code of one map as external input for the other: this induces coupled acoustic-articulatory dynamics that is a general-purpose tool for solving a number of sensorimotor problems in a simple and unified framework. Unfortunately there is no space for exploring all the implications of the model in this context. We simply list a few of them as an illustration of the generality of the approach.

Fig. 4 shows the histograms of lateral connections in the two maps after learning; they are consistent, for the X manifold, with an intrinsic dimensionality of about 3–4 and, for the Y manifold, with a dimensionality of about 4–5.¹⁰ This is consistent with estimates of dimensionality performed with standard statistical methods. Moreover, if we project the acoustic map onto the plane of the first two formants, we recover the classical triangle of vowels (Fig. 5 (top)), which in fact is "bent" if we also consider the third formant (Fig. 5 (bottom)). An important issue in motor control is redundancy.
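The connection histograms of Fig. 4 amount to counting, for each neuron, the lateral connections that survive learning. A minimal Python sketch follows, assuming the learned weights are available as a square symmetric matrix; the matrix layout, the `threshold` parameter and the function name are illustrative assumptions, not taken from the paper.

```python
from collections import Counter


def connection_histogram(weights, threshold=0.0):
    """Count lateral connections per neuron and tabulate the counts.

    weights[i][j] is the lateral weight between neurons i and j; a connection
    is counted when its weight exceeds `threshold` (self-connections ignored).
    Returns the per-neuron counts and a histogram {count: number of neurons}.
    """
    degrees = [
        sum(1 for j, w in enumerate(row) if j != i and w > threshold)
        for i, row in enumerate(weights)
    ]
    return degrees, Counter(degrees)


if __name__ == "__main__":
    # Toy example: four neurons connected in a chain.
    chain = [
        [0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0],
    ]
    degrees, histogram = connection_histogram(chain)
    print(degrees, dict(histogram))
```

The typical number of connections per neuron obtained this way is the quantity that the text compares with the kissing number when estimating the dimensionality of each manifold.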
⁷ The data were made available in the framework of the Esprit Project SPEECH-MAPS, coordinated by ICP-INPG in Grenoble.

⁸ The TRN algorithm requires a training set of suitable size, particularly for yielding a sufficient approximation of the lateral connectivity. To overcome the limitation of the available data set we exploited the fact that a Gaussian mixture centered on the learned prototype vectors optimally approximates the underlying probability density function of the data set and we randomly sampled the Gaussians in order to extend the training set to a minimum size (10 times the number of neurons).

⁹ The connection weight $C_{ij}$ between neuron i in one map and neuron j in the other was modified proportionally to the product $V_i V_j$ with a decay term proportional to $C_{ij}$.

¹⁰ In a regular tessellation, according to the theory of dense sphere packing, the kissing number K is a precise index of the dimensionality of the manifold (e.g. K = 2, 6, 12, 24, 40 for a dimensionality equal to 2, 3, 4, 5, 6, respectively). In a quasi-regular tessellation, as achieved by the TRN algorithm, the same information is approximated by the number of lateral connections.

Fig. 4. Histogram of intra-connections for the acoustic map (left) and articulatory map (right).

The maps can be used as a visualization tool. The speech articulatory system (which includes tongue, jaw, lips and larynx), mechanically speaking, has an infinite number of degrees of freedom. However, in functional terms the real number of degrees of freedom or functional articulators is only 4–5 but is greater than the estimated dimensionality of the acoustic manifold. This means that each target phoneme can be produced by a variety of articulations, which define the no-motion-manifold of the phoneme.¹¹ The maps provide a good way of exploring the morphology of such manifolds. For example, if we consider the /u/ phoneme, which is identified by a point in the Y space, the corresponding no-motion-manifold in X can be evaluated by projecting that point onto the articulatory map, via the cross-connections. It turns out that such a manifold is basically two-dimensional, as can be seen simply by looking at the projections on the different articulatory planes, such as the JH–LH plane (Fig. 6).

In the same way, it is possible to describe the effect of articulatory constraints, such as the bite-block, which can be defined by a constant value of the articulatory variable JH (or a constant constraining a combination of several articulatory variables). The functional reduction of the acoustic space that is a consequence of such a constraint can be evaluated by identifying the neurons, in the articulatory map, which approximately fit the constraint, and isolating, in the acoustic map, the neurons that are cross-connected with them. We get a direct picture (Fig. 7) of the reduced manifold of phonemes that can be uttered in such a constrained condition, to be compared with the larger manifold of Fig. 5.

Finally, the computational power of the dual-map model is demonstrated by testing its ability to generate coordinated acoustic-articulatory patterns in VV transitions. Fig. 8 illustrates the case of an /ae/ transition, for which we had experimental data. The initial conditions in the two maps were chosen by centering the two population codes according to the available data vectors and allowing the overall system to stabilize. The phoneme /e/ was then given as new external input at t = 0.
It was applied to the neuron in the acoustic Fig. 5. Acoustic map (500 neurons) projected onto the F 1 –F 2 and F 2 –F 3 planes (receptive field centers are visualized as points and lateral connections as segments).

11 Articulatory movements inside the no motion manifold of a given phoneme are called non-audible gestures because they do not affect the acoustic characteristics of the phoneme.

1424

P.G. Morasso et al. / Neural Networks 11 (1998) 1417–1428

Fig. 6. Map of non-audible gestures corresponding to /u/ displayed in the plane JH–LH.

map that was the closest to /e/, with an amplitude modulated in time as shown in the lower-right graph of Fig. 8. (At the same time, the amplitude of the initial /a/ input was decreased to 0 in a symmetric way, while keeping constant the equation gain g.) The two maps started co-evolving in time, as dictated by Eq. (1), under the driving influence of the following cross-coupling term nX o ext hi articulatory map ¼ k Cik Vk acoustic map

where the C-coefficients identify the cross-connections between the two maps. The population code in the acoustic map was attracted by the target phoneme /e/, producing a moving wave of activation that was similar to the one illustrated in Fig. 3, with the difference that it was five-dimensional instead of two-dimensional. At the same time, the population code in the articulatory map was attracted by a moving target, identified by the cross-coupling term above. At the end of the transient, the articulatory map settled in a configuration that implicitly selected, in the no-motion manifold of /e/, the configuration closest to the initial one. In other words, an effect of the cross-coupling is to establish a correspondence between phonemes and no-motion manifolds and the map dynamics is then a navigation tool that carries out the inverse acoustic-articulatory mapping, without any explicit regularization or optimization procedure. Fig. 8 shows the articulatory-acoustic transitions generated by integrating the equations: they are consistent, from the qualitative and quantitative points of view, with the experimental data mentioned above. In principle, the same model could be applied to generate VCV transitions, but then we would need to expand it by including an additional map that is appropriate to represent consonant-targets in addition to vowel-targets. It is well

known indeed that consonants are poorly represented acoustically (e.g. in terms of formants) whereas they can be precisely identified by specifying the location of constrictions in the vocal tract. Therefore, a straightforward extension of the dual-map model to the VCV paradigm requires training a constriction map, together with the acoustic and articulatory maps described above. A VCV transition would be generated by activating the sequence of targets in the two maps (acoustic and constriction, respectively) and allowing the combined dynamics to carry out the overall computation. In fact, the available data set allowed us to estimate the constrictions but unfortunately the size of the set (barely enough for the vowels) was quite insufficient for the consonants.
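The cross-coupling term used in the /ae/ simulation is a weighted sum of acoustic-map activities delivered as external input to each articulatory neuron. A minimal Python sketch of that sum follows; the matrix `C`, the activity vector `V_acoustic` and the function name are illustrative assumptions with made-up numbers, not the trained cross-connections of the model.

```python
from typing import List, Sequence


def cross_coupling_input(C: Sequence[Sequence[float]],
                         V_acoustic: Sequence[float]) -> List[float]:
    """Return h_ext[i] = sum_k C[i][k] * V_acoustic[k] for each articulatory neuron i."""
    for row in C:
        if len(row) != len(V_acoustic):
            raise ValueError("each row of C needs one weight per acoustic neuron")
    return [sum(c_ik * v_k for c_ik, v_k in zip(row, V_acoustic)) for row in C]


# Toy example (hypothetical numbers): 3 articulatory neurons, 2 acoustic neurons.
C = [[1.0, 0.0],
     [0.5, 0.5],
     [0.0, 1.0]]
V_acoustic = [0.2, 0.8]
h_ext = cross_coupling_input(C, V_acoustic)
```

As the acoustic population code moves toward the target phoneme, `V_acoustic` changes at every integration step, so `h_ext` acts as the moving target that attracts the articulatory map.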

4. Metric properties

In the previous section, the computational power of the proposed cortical map model is demonstrated by showing the qualitative properties of the mechanism in dealing with complex sensorimotor patterns in a coordinate-free way, i.e. operating directly with population codes. At no point of the sensorimotor processing architecture is there a need for the organism to perform an explicit evaluation of coordinate values. In this section we intend to investigate the basic metric properties and the robustness with respect to different kinds of disturbances.

4.1. Static and dynamic precision

First, we wish to estimate the degree of accuracy to which it is possible to recover a given environmental variable from


sided trajectory near the center of the domain. The map was allowed to settle in each vertex before turning on the next one. The top-left panel of the figure displays four input stimuli (small circles), the estimated trajectory (dotted lines), and the corresponding ideal trajectories (full lines); the bottom-left panel shows the evolution of the errors12 along the curvilinear coordinates, respectively for the four movements; the top-right panel shows the neurons of the map (dots) and the succession of neurons with the highest activity during the transients (stars; circles + stars identify the four corners); in the bottom-right panel the same information is displayed together with the underlying pattern of lateral connections. The figure shows that, as expected, the dynamic accuracy is significantly lower than the static accuracy but is still acceptable. Fig. 9(B) and (C) document the graceful degradation of performance when the mechanism operates near the border of the domain or even beyond (carrying out an extrapolation function). Moreover, in spite of the increasing metric error, the computational mechanism remarkably preserves the topological structure of the reconstructed pattern.

4.2. Sensitivity to noise in the thalamo-cortical connections

Such noise affects the position of the receptive field center in the input domain. We added noise uniformly distributed in a circle of radius 0.1 (10% of the domain size) and the simulation result is shown in Fig. 10. The dynamic errors are larger than in the previous case but the static ones are comparable (they are exaggerated by the fact that we used, for simplicity, the same read-out matrix computed for the perfect map). The topological robustness is also confirmed in this case.

Fig. 7. Reduced acoustic map, corresponding to the bite-block experiment.
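The perturbation of Section 4.2 displaces each receptive field center by an offset drawn uniformly from a disc of radius 0.1. One common way to draw such offsets is to take the square root of a uniform variate for the radius; the Python sketch below uses that approach. It is only an illustration: the function names, the seed and the toy grid of centers are assumptions, and the paper does not state how its noise samples were generated.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def uniform_disc_noise(radius: float, rng: random.Random) -> Point:
    """Draw one 2-D offset uniformly distributed over a disc of the given radius."""
    # sqrt() compensates for the area growing with r**2, so that samples
    # do not cluster near the center of the disc.
    r = radius * math.sqrt(rng.random())
    theta = 2.0 * math.pi * rng.random()
    return (r * math.cos(theta), r * math.sin(theta))


def perturb_centers(centers: List[Point], radius: float, seed: int = 0) -> List[Point]:
    """Return a noisy copy of the receptive-field centers."""
    rng = random.Random(seed)
    noisy = []
    for x, y in centers:
        dx, dy = uniform_disc_noise(radius, rng)
        noisy.append((x + dx, y + dy))
    return noisy


# Toy regular grid of centers (hypothetical), perturbed with radius 0.1 as in the text.
grid = [(0.25 * i, 0.25 * j) for i in range(-2, 3) for j in range(-2, 3)]
noisy_grid = perturb_centers(grid, radius=0.1)
```

Reusing a read-out operator fitted on the unperturbed `grid` to decode a map built on `noisy_grid` would mirror the simplification mentioned in the text, which is why the reported static errors are somewhat exaggerated.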

4.3. Sensitivity to noise in the lateral connections

its population code in the best possible conditions. For this purpose, we considered a perfect topological map in which the receptive field centers were pre-set according to a regular tessellation of the input domain and the lateral connections were perfectly consistent with the Delaunay triangulation. A simple read-out mechanism x = f(V), based on the least-squares approach, was then used. The input domain was sampled, the samples were used as input stimuli, and the map was allowed to reach the corresponding equilibrium states. These were the data used for finding the best linear read-out operator at steady state (the Moore–Penrose pseudo-inverse operator). During the transients from an initial state to a target, illustrated for example by the graphs in Fig. 3, we used the same operator for estimating the trajectory implicitly determined by the dynamic-map equations and evaluating how well these trajectories approximate the corresponding geodesic curves or ideal trajectories (straight lines in the simulations). A 2-D, circular input domain, with a radius of 1, was used in the simulations. Fig. 9(A) shows the simulation of a four-

In the previous simulations, the lateral connections were all topologically correct, symmetric and equal. In one set of simulations, we estimated the influence of additive noise on these connections, while keeping the topological correctness. We found that a 10–20% noise level (on top of the nominal value = 1) hardly had any effect on the performance of the network. The effect is more significant, as one might expect, if such ‘‘noise’’ affects the topological structure of the network. In Fig. 11 we show two examples: one can be labeled ‘‘random pruning’’ (top panel: 25% of the connections were randomly destroyed) and the other ‘‘random sprouting’’ (bottom panel: 10% long connections were randomly added). From this experiment and a number of similar ones, we could conclude that the cortical map model is more sensitive to over-pruning than to over-sprouting. The possible biological implication is intriguing:

12 The error is the magnitude of the vector difference between the points on the ideal trajectory (the straight line joining the initial to the final point) and the points estimated from the population code via the read-out procedure.
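The error measure of footnote 12 can be sketched in a few lines of Python. The footnote does not say how each estimated point is paired with a point on the ideal straight line, so the sketch below assumes the closest point on the segment (an orthogonal projection clamped to the end points); that pairing rule, the function names and the toy trajectory are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def closest_point_on_segment(p: Point, a: Point, b: Point) -> Point:
    """Orthogonal projection of p onto segment a-b, clamped to the end points."""
    abx, aby = b[0] - a[0], b[1] - a[1]
    length_sq = abx * abx + aby * aby
    if length_sq == 0.0:
        return a  # degenerate segment: start and end coincide
    t = ((p[0] - a[0]) * abx + (p[1] - a[1]) * aby) / length_sq
    t = max(0.0, min(1.0, t))
    return (a[0] + t * abx, a[1] + t * aby)


def trajectory_errors(estimated: Sequence[Point], start: Point, end: Point) -> List[float]:
    """Magnitude of the difference between each estimated point and the ideal line."""
    errors = []
    for p in estimated:
        q = closest_point_on_segment(p, start, end)
        errors.append(math.hypot(p[0] - q[0], p[1] - q[1]))
    return errors


# Toy trajectory (hypothetical read-out values) bulging away from the line y = 0.
estimated = [(0.0, 0.0), (0.5, 0.1), (1.0, 0.0)]
errors = trajectory_errors(estimated, start=(0.0, 0.0), end=(1.0, 0.0))
```

A pairing based on equal normalized progress along the two curves would be an equally plausible reading of the footnote and would give larger errors when the estimated trajectory lags or leads the ideal one.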


Fig. 8. Evolution of the articulatory commands (left) and formant variables (right) during a simulated /ae/ transition.

we might expect biological networks to follow the strategy of having some redundancy in the patterns of lateral connections (although it slightly degrades the metric precision) in order to exhibit a greater robustness with respect to the unavoidable destruction due to aging or accidental damage.

4.4. Sensitivity to the receptive field size

In most simulations the receptive field size (coded by the standard deviation of G_i(x), the external input functions) was 0.75, i.e. a large fraction of the domain size. Halving it did not have any effect on the performance. In a similar way, using a non-circular covariance had a small effect, provided that the eigenvalue ratio was not too large. The reason is probably that the intrinsic dynamics tends in any case to sharpen the population code to a region in the immediate neighborhood of the ‘‘winning’’ neuron. In general, it appears that the cortical model is a rather robust computational mechanism of trajectory formation

that can carry out a number of ‘‘computational missions’’ in a homogeneous way. In particular, it allows the population codes to implicitly store distributed representations of high-dimensional environmental variables and operates on them with spatio-temporal competence. We wish to emphasize that the read-out mechanism of the population code employed in the simulations is simply a probe and there is no need for the robotic/biological organism to ever use it, because the multi-map computational architecture is totally distributed and only operates with population codes. This feature of our model should be contrasted with the same notion of population code in motor control, as originally proposed by Georgopoulos et al. (1983) and later developed in a large set of models inspired by those findings; the models generally assume the presence of a specific read-out mechanism, thus introducing the vexing question about the intrinsic coordinates of the population code. Our model, on the contrary, in agreement with Sanger (1994) is based on the idea that population codes give the cortex the opportunity to manipulate coordinate-free representations


Fig. 10. Map with noise in the thalamo-cortical connections (uniformly distributed on a circle with a radius of 0.1).

macroscopic-behavioral level (the Piagetian circular reaction) as well as at the microscopic level (where the Hebbian paradigm applies). Both operational levels imply that what is learnt inside is somehow similar to the world outside. In fact, such similarity or somatotopic/ecotopic organization has always been the target of a line of criticism centered on the homunculus paradox: if the brain operates

Fig. 9. Ideal map with border effects.

of external variables, eliminating even the need to formulate the question.
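The receptive field size varied in Section 4.4 is the standard deviation of the external input functions G_i(x). The Python sketch below assumes an isotropic Gaussian form for G_i, which is not spelled out in this section, and compares the input pattern for a width of 0.75 with the pattern for half that width. The centers, the stimulus and the function name are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def external_inputs(x: Point, centers: Sequence[Point], sigma: float) -> List[float]:
    """Assumed isotropic Gaussian tuning: G_i(x) = exp(-|x - c_i|^2 / (2 sigma^2))."""
    two_sigma_sq = 2.0 * sigma * sigma
    return [math.exp(-((x[0] - cx) ** 2 + (x[1] - cy) ** 2) / two_sigma_sq)
            for cx, cy in centers]


# Three toy receptive-field centers and one stimulus (hypothetical values).
centers = [(-0.5, 0.0), (0.0, 0.0), (0.5, 0.0)]
stimulus = (0.4, 0.0)

wide = external_inputs(stimulus, centers, sigma=0.75)     # width used in most simulations
narrow = external_inputs(stimulus, centers, sigma=0.375)  # half of that width

# Index of the most strongly driven neuron for each width.
winner_wide = max(range(len(centers)), key=wide.__getitem__)
winner_narrow = max(range(len(centers)), key=narrow.__getitem__)
```

The most strongly driven neuron is the same for both widths, and only the spread of the input around it changes. This toy comparison leaves out the map dynamics, so it is merely consistent with, and does not reproduce, the observation that the dynamics sharpens the population code around the winning neuron.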

5. Conclusions

The circularity of the organism–environment interaction has two complementary implications: one is related to learning and the other to dynamics. Learning is a process of self-organization at the

Fig. 11. Map with topological noise in the lateral connections: 25% random pruning (top); 10% random addition of long connections (bottom).
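The two kinds of topological noise shown in Fig. 11 can be sketched as operations on a set of undirected lateral connections. The Python example below is an illustration under assumptions: connections are stored as pairs `(i, j)` with `i < j`, the 10% of added connections is taken relative to the number of existing ones, any pair of neurons not already linked counts as a candidate "long" connection, and the chain-shaped toy map stands in for the real lattice.

```python
import random
from typing import Set, Tuple

Edge = Tuple[int, int]


def random_pruning(edges: Set[Edge], fraction: float, rng: random.Random) -> Set[Edge]:
    """Destroy a random fraction of the lateral connections."""
    n_remove = round(fraction * len(edges))
    removed = set(rng.sample(sorted(edges), n_remove))
    return edges - removed


def random_sprouting(edges: Set[Edge], n_neurons: int, fraction: float,
                     rng: random.Random) -> Set[Edge]:
    """Add random connections between neurons that are not already linked."""
    candidates = [(i, j) for i in range(n_neurons) for j in range(i + 1, n_neurons)
                  if (i, j) not in edges]
    n_add = min(round(fraction * len(edges)), len(candidates))
    return edges | set(rng.sample(candidates, n_add))


# Toy map (hypothetical): 21 neurons linked in a chain, i.e. 20 lateral connections.
n_neurons = 21
edges = {(i, i + 1) for i in range(n_neurons - 1)}

rng = random.Random(0)
pruned = random_pruning(edges, fraction=0.25, rng=rng)
sprouted = random_sprouting(edges, n_neurons, fraction=0.10, rng=rng)
```

Running the map dynamics on `pruned` and on `sprouted` and comparing the resulting trajectory errors would be one way to probe the asymmetry reported in Section 4.3, although the sketch itself does not perform that comparison.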


by building internal representations of the environment, who is going to manage them? The paradox is essentially an infinite regression of computational layers. However, this only holds for hierarchical models in which there is a sharp separation between the world and the computational mechanism: in such models the sensorimotor data are purified and abstracted as the computation goes up from one level to the next one, ending up with pure symbols and/or coordinates of abstract variables. In a circular-distributed model, on the contrary, sensorimotor patterns interact in a distributed, coordinate-free way by means of their population codes, without any need of extracting (or reading-out) coordinates and/or symbols, but taking advantage of the pattern-formation properties of the overall dynamics. Dynamics also operates as a regularization mechanism, inverting in an implicit way the sensorimotor mappings. Inflow is transformed into outflow in a continuous way and the intermediate distributed patterns of activation have at the same time a sensory and a motor nature, while keeping some degree of somatotopic/ecotopic organization. Finally, we wish to observe that the distributed and analog nature of the proposed internal representations might provide a natural and simple interface between sensorimotor and cognitive processes, without any need of explicit symbolic computations. The role of the cognitive/attentional system is indeed to identify, select, mask, etc. generalized targets/obstacles, thus leaving to the sensorimotor dynamics the task of generating the outflow of motor commands and triggering the causally related inflow of sensory reafferences. At the same time, the complementary task of the cognitive system is to categorize situations/patterns in the perceptual data, extracting ‘‘symbols’’ out of the continuous flow, which guide the further selection of targets/obstacles.
In this way, the action–perception cycle is closed, externally, through the dynamics of the real world and, internally, through the categorization/selection interplay between the dynamics of the sensorimotor maps and the underlying cognitive processes.

Acknowledgements

The research was supported by the EU project SPEECHMAPS, ISS, CNR, and MURST.

References

Amari, S. (1977). Dynamics of pattern formation in lateral-inhibition type neural fields. Biological Cybernetics, 27, 77–87.
Badin, P., Gabioud, B., Beautemps, D., Lallouache, T., Bailly, G., Maeda, S., Zerling, J. P., & Block, J. (1995). Cineradiography of VCV sequences: articulatory-acoustic data for speech production model. In: International Conference on Acoustics, Trondheim, Norway (pp. 349–352).
Barhen, J., Gulati, S., & Zak, M. (1989). Neural learning of constrained nonlinear transformations. IEEE Computer, 6, 67–76.
Bullock, D., & Grossberg, S. (1989). VITE and FLETE: Neural modules for trajectory formation and postural control. In: W. A. Hershberger (Ed.), Volitional action (pp. 253–297). Amsterdam: North-Holland/Elsevier.
Georgopoulos, A. P., Caminiti, R., Kalaska, J. F., & Massey, J. T. (1983). Spatial coding of movements: A hypothesis concerning the coding of movement direction by cortical populations. Exper. Brain Research Suppl., 7, 327–336.
Calvin, W. (1995). Cortical columns, modules and hebbian cell assemblies. In: M. Arbib (Ed.), The handbook of brain theory and neural networks (pp. 269–272). Cambridge, MA: MIT Press.
Cruse, H. (1996). Neural networks as cybernetic systems. Stuttgart: G. Thieme Verlag.
Frisone, F., Perico, L., & Morasso, P. (1997). Extending the TRN model in a biologically plausible way. In: W. Gerstner, A. Germond, M. Hasler, & J. D. Nicoud (Eds.), Artificial neural networks, LNCS vol. 1327 (pp. 201–206).
Gilbert, C. D., & Wiesel, T. N. (1979). Morphology and intracortical projections of functionally identified neurons in cat visual cortex. Nature, 280, 120–125.
Grajski, K. A., & Merzenich, M. M. (1990). Hebb-type dynamics is sufficient to account for the inverse magnification rule in cortical somatotopy. Neural Computation, 2, 71–84.
Grossberg, S. (1973). Contour enhancement, short term memory, and constancies in reverberating neural networks. Studies in Applied Mathematics, 52, 213–257.
Katz, L. C., & Callaway, E. M. (1992). Development of local circuits in mammalian visual cortex. Annual Review of Neuroscience, 15, 31–56.
Kohonen, T. (1982). Self-organizing formation of topologically correct feature maps. Biological Cybernetics, 43, 59–69.
Knudsen, E. I., du Lac, S., & Esterly, S. (1987). Computational maps in the brain. Ann. Rev. Neuroscience, 10, 41–65.
Martinetz, T., & Schulten, K. (1994). Topology representing networks. Neural Networks, 7, 507–522.
Morasso, P., Sanguineti, V., & Tsuji, T. (1993). A dynamical model for the generation of curved trajectories. In: S. Gielen, & B. Kappen (Eds.), Proceedings ICANN’93, London: Springer Verlag (pp. 115–118).
Morasso, P., & Sanguineti, V. (1996). How the brain can discover the existence of external egocentric space. Neurocomputing, 12, 289–310.
Morasso, P., Sanguineti, V., & Spada, G. (1997). A computational theory of targeting movements based on force fields and topology representing networks. Neurocomputing, 15, 411–434.
Morasso, P., & Sanguineti, V. (Eds.) (1997a). Self-organization, cortical maps and motor control. Amsterdam: North Holland Elsevier.
Nicoll, A., & Blakemore, C. (1993). Patterns of local connectivity in the neocortex. Neural Computation, 5, 665–680.
Reggia, J. A., D’Autrechy, C. L., Sutton III, G. G., & Weinrich, M. (1992). A competitive distribution theory of neocortical dynamics. Neural Computation, 4, 287–317.
Sanger, T. (1994). Theoretical considerations for the analysis of population coding in motor cortex. Neural Computation, 6, 29–37.
Sanguineti, V., Laboissière, R., & Ostry, D. J. (1998). A dynamic biomechanical model for neural control of speech production. Journal of the Acoustical Society of America, 103 (3), 1615–1627.
Sanguineti, V., Laboissière, R., & Payan, Y. (1997). A control model of human tongue movements in speech. Biological Cybernetics, 77, 11–22.
Schwark, H. D., & Jones, E. G. (1989). The distribution of intrinsic cortical axons in area 3b of cat primary somatosensory cortex. Experimental Brain Research, 78, 501–513.
Singer, W. (1995). Development and plasticity of cortical processing architectures. Science, 270, 758–764.
Sirosh, J., Miikkulainen, R., & Choe, Y. (1996). Lateral interactions in the cortex. Hypertext book, www.cs.utttexas.edu/users/nn/web-pubs/htmlbook96.
