Comparison of absolute and relative learning strategies to encode sensorimotor transformations in tool-use

Raphael Braud, Alexandre Pitti and Philippe Gaussier

Abstract—We explore different strategies to overcome the problem of sensorimotor transformation that babies face during development, especially in the case of tool-use. From a developmental perspective, we investigate the computational costs and benefits of adopting a model based on absolute coordinate frames of reference, a model based on relative coordinate frames of reference, and a third, model-free strategy. In sensorimotor learning experiments with two computer simulations, of a 1 and a 4 degrees-of-freedom robot, we show that all three strategies can perform the task, although each incurs a different dimensional cost in terms of neural representation. In situations of sensorimotor transformation, such as during tool-use, this computational cost is no longer the same for the three strategies: the relative coordinate strategy is the fastest and most robust at re-adapting the neural code. We discuss this computational issue with respect to current developmental observations and propose a neural architecture.

Index Terms—Relative Encoding, Sensorimotor Transformations, Tool-Use, Neural Networks

I. INTRODUCTION

In order to use a tool, a well-known mechanism is the adaptation of the body schema [1]. The role of the body schema is to encode our body in the brain, and the question of how this is done is still open. Much work has addressed how a body schema can be encoded in a robot (see [2] for a review), and how it can be adapted for tool-use. While, according to [1], body schema adaptation to a tool appears as a fast and easily triggered mechanism, using a tool on purpose, possibly through a detour to grasp it, has proven to be a more complex skill to develop. In Fagard et al.'s experiments, children have to use a rake-like tool to reach toys presented out of reach. Their results indicate that it is only between 16 and 20 months that infants suddenly start to intentionally try to bring the toy closer with the tool, suggesting that a true understanding of the use of the tool has not been fully acquired before that age [3]. The questions we focus on are: (i) why can the adaptation to a tool be very fast [1], and (ii) why does it take so long to use a tool on purpose [3]? Question (i) deals with what is learned, i.e. whether we adapt the model of the arm or learn a model of the tool, see [4]. Question (ii) focuses on how this learning can be used. Obviously, this learning is not immediately available, but requires some additional skills or knowledge. In particular, we hypothesize that such skills include:

R. Braud, A. Pitti and P. Gaussier are with the ETIS Laboratory, UMR CNRS 8051, University of Cergy-Pontoise, ENSEA, France. E-mail: [email protected]

Fig. 1. Absolute versus relative sensorimotor encoding. Left: absolute strategy learning; right: relative strategy learning (visual coordinates versus motor coordinates).

- A slight adaptation of the body schema when using a tool, instead of a full re-learning of the arm's internal model.
- A learning of the relevant sensory context in which this adaptation occurs [5].
- An ability to trigger, on demand, a previously encoded context, e.g. "having the tool in the hand", to achieve a task, e.g. "hitting an out-of-reach target".

The motor equivalence problem states that a redundant arm has an infinite number of ways to reach a given target. More generally, it arises for any effector system with a higher dimensionality than the target specification: a defined target can then be reached using multiple motor trajectories. We propose that, during a reaching task, this feature also applies to tool use. More precisely, perceiving a tool can be seen as a new possibility, among others, to reach a target. To tackle this ill-posed problem, there are two main strategies for sensorimotor encoding between the motor coordinates θ and the end-effector coordinates that can achieve a reaching task, see [6] for more details. First, one can learn a mapping between those coordinates. An effector system can then reach a given target X_B by activating the motor coordinates θ_B associated with the desired end-effector coordinates, see fig. 1.

θ_B = f(X_B)    (1)

In case of a continuous sequence of end-effector coordinates (forming an "8", for example), it can generate a continuous sequence of motor configurations. Throughout this paper we will call this strategy the absolute strategy (AS), because an absolute mapping is learned. Second, one can learn a mapping between each spatial direction of the end-effector and the change in joint angles that causes movement in that spatial direction. Typical approaches involve a direct model, expressed through the Jacobian matrix J:


Ẋ = J(θ)θ̇    (2)

Throughout this paper we will call this strategy the relative strategy (RS). An effector system can then reach a given target not immediately, as in the absolute strategy, but through the direction from the end-effector X_A to the target X_B, which is then associated with the learned change in joint angles.

θ̇_B = f(X_B − X_A)    (3)

Such a model is usually estimated through dedicated inverse learning algorithms. Typical methods compute a generalized inverse of the Jacobian matrix [7], [8], [9] to determine a particular solution among the alternative means available to perform a defined task.

θ̇ = J⁺(θ)Ẋ    (4)

Fig. 2. Experimental setup with the Katana arm, a camera, a tool and a ball as the target. In white overlay, the simulation setup with 4 DOF in 3D space.
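For concreteness, the following Python sketch gives a minimal numerical illustration of the pseudo-inverse control law of eq. (4) for a planar 3-link (redundant) arm; this toy example is not taken from the paper, and the link lengths and joint values are arbitrary assumptions.

import numpy as np

l = np.array([1.0, 0.8, 0.5])  # arbitrary link lengths of a planar 3-link (redundant) arm

def jacobian(theta):
    """2x3 Jacobian of the planar 3-link forward kinematics at configuration theta."""
    c = np.cumsum(theta)                      # absolute link angles
    J = np.zeros((2, 3))
    for i in range(3):
        # column i: effect of joint i on the end-effector (x, y) velocity
        J[0, i] = -np.sum(l[i:] * np.sin(c[i:]))
        J[1, i] =  np.sum(l[i:] * np.cos(c[i:]))
    return J

theta = np.array([0.3, 0.6, -0.4])      # current joint configuration
x_dot_desired = np.array([0.1, 0.0])    # desired end-effector velocity X_dot

# Eq. (4): theta_dot = J^+(theta) X_dot, with J^+ the Moore-Penrose pseudo-inverse,
# which selects the minimum-norm joint velocity among the infinitely many solutions.
theta_dot = np.linalg.pinv(jacobian(theta)) @ x_dot_desired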

The main advantage of this relative strategy is that it does not directly learn the target sensor configuration but the whole diversity of means available to the system to reach it. In a second phase, this learning can be reused by any action selection mechanism as a repertoire of valid paths. We thus find in this strategy a clear separation between what is learned (i) and how this learning can be used (ii). In this paper, we investigate different strategies to encode sensorimotor information. In the first section, we introduce the reaching task that the different strategies have to achieve. In the same section we also introduce a simple reinforcement model able to perform the task but without any sensorimotor learning. In the two following sections, the absolute and the relative strategies are detailed, together with the experiments on the reaching task.

II. GENERAL METHODS AND EXPERIMENTS

We present in the next sections the absolute and relative models (AS and RS) and the related experiments. In addition, both models are compared with a very simple reinforcement strategy model which does not learn a sensorimotor mapping but only adapts to any situation. It gives us a lower reference that the other models have to surpass, and we can imagine that such a mechanism is always present in a robot as a fallback in case of failure of the other models. In this reinforcement model, based on [10], a noise activates a random motor activation configuration θ̇(t) at discrete time step t. (Here and throughout this paper, vectors are represented with bold upper-case symbols.) This motor configuration is learned depending on a reward signal R. The reward signal is a function of the vision coordinates, given by X^H(t) for the end-effector, i.e. the hand coordinates, and by X^T(t) for the target coordinates. In our case, the reward is the diminution of the distance between the end-effector and the target. For the simple reinforcement strategy model we then have the following equations:

θ̇(t) = H(Σ_{i=1}^{N} W_i · e(t) + noise) = {−1, 0, 1}    (5)

e(t) = 1    (6)

dW = λ · θ̇(t−1) · e(t−1) · ΔR(t)    (7)

ΔR(t) = R(t) − R(t−1)    (8)

R(t) = |X^T(t−1) − X^H(t−1)| − |X^T(t) − X^H(t)|    (9)

The output motor activation θ̇ is given by a classical computation of the learned weights W, the inputs e and an additive uniform noise noise ∈ [0, 1], with the Heaviside function H. The weights are adapted with a learning rate λ.
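As an illustration only, here is a minimal Python sketch of this reinforcement update (eqs. 5-9) for a single joint; the forward model simulate_vision, the thresholds mapping the noisy activity onto {−1, 0, 1}, and all numerical values are hypothetical assumptions, not the authors' implementation.

import numpy as np

def simulate_vision(theta):
    """Hypothetical 1-DOF forward model: hand position X^H in visual coordinates."""
    return float(np.clip(theta, 0.0, 1.0))

rng = np.random.default_rng(0)
lam = 0.1                      # learning rate lambda (eq. 7)
W, e = 0.0, 1.0                # learned weight and constant input (eq. 6)
theta, x_target = 0.0, 0.8     # joint position and target X^T

prev_R, prev_dtheta, prev_dist = None, 0.0, None
for t in range(500):
    # Eq. 5 (one possible reading): weighted input plus uniform noise,
    # thresholded into a motor activation in {-1, 0, 1}.
    act = W * e + rng.uniform(0.0, 1.0)
    dtheta = 1.0 if act > 0.66 else (-1.0 if act < 0.33 else 0.0)

    theta += 0.01 * dtheta                       # apply the motor activation
    dist = abs(x_target - simulate_vision(theta))

    if prev_dist is not None:
        R = prev_dist - dist                     # eq. 9: decrease of the distance
        if prev_R is not None:
            dR = R - prev_R                      # eq. 8
            W += lam * prev_dtheta * e * dR      # eq. 7: reinforce theta_dot(t-1)
        prev_R = R
    prev_dist, prev_dtheta = dist, dtheta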

Two series of experiments are carried out in this paper. First, experiments are done with a one degree-of-freedom (DOF) arm whose end-effector moves along a one-dimensional visual perceptive field. We implement the three models just described, visualize how each model learns through motor babbling, and characterize their specificities, their reactions when a tool is added, and their ability to act in a reaching task. Second, we perform the same experiments in 3D Cartesian space with a 4-DOF arm. All the experiments are performed in simulation, so that the three models can be run in parallel on the same data for a better comparison. For the 4-DOF effector, we also experiment with the real Katana arm we have in our lab; see fig. 2 for the reaching task experimental setup.

III. LEARNING AN ABSOLUTE STRATEGY

A. Model

In this model, a sensor configuration X (vision in our case) is categorized and associated with a motor pattern θ (a proprioception pattern in our case). After learning, such an association allows this model to activate the corresponding motor pattern when a desired sensor configuration is given as input. The algorithm is composed of (1) a categorization layer of the sensor space, (2) a filtering layer that selects the winning category, and (3) a learning algorithm that associates a motor output to the elicited sensor values.


Fig. 3. Visuomotor associative model: SAW categorization, WTA filtering, and LMS visuomotor associations (one-to-one unconditional links, one-to-all conditional links).

ART neural networks can learn sensorimotor categories and deal with contextual changes or noisy inputs. We therefore use an ART-like neural network inspired by the Adaptive Resonance Theory [11], called Selective Adaptive Winner (SAW), to categorize our sensors (step (1)). The visual input X, of dimension N, is encoded on the weights of the input links of recruited neurons. The activity (eq. 10) and learning (eq. 11) of those neurons are given by the following equations:

A_k = 1 − (1/N) Σ_j |w_kj − e_j|    (10)

Δw_kj = ε^SAW (e_j − w_kj)    (11)

ε^SAW = 1 if A_k > vigilance, 0 otherwise    (12)

where A_k is the activity of the k-th neuron of the output layer, ε^SAW the learning rate, and w_kj the weight of the connection between an output neuron k and the j-th neuron of the input layer, whose activity is e_j. The vigilance is a recruitment threshold; its value determines the granularity of the states in the categorized space. This categorization step is followed by a Winner Takes All (WTA) neural network for the filtering layer (step (2)). The resulting winner then learns the desired output, i.e. the joint configuration θ, through a learning algorithm based on the Least Mean Square algorithm (LMS, step (3)). The output activity is given by the following equations:

O_i = Σ_k W_ik · e_k    (13)

ΔW_ik = ε^LMS · (θ_i − O_i) · e_k    (14)

ε^LMS = ε^SAW    (15)

where O_i is the output of the LMS and e_k is the input coming from the WTA. The learning rule adapts the weights W_ik in order to match the output with the desired unconditional stimulus θ_i. For the sake of simplicity, we only use one-shot learning here: once a category is recruited by the SAW, it is not adapted, and the LMS learns the related joint configuration in one shot (ε^LMS = ε^SAW = 1). This absolute strategy algorithm can be seen as a homeostatic system. After the learning of its visual categories, even in the presence of noise or changes, any conflict in its visual inputs will force the system to stabilize its output to an equilibrium point. Such a mechanism is used here only with visual inputs, but those categories could, for example, be built as visuo-motor attractors with both vision and proprioceptive inputs, see [12], [13].
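To make the pipeline concrete, the following Python sketch implements a stripped-down version of the absolute strategy described above (SAW categorization, WTA filtering, one-shot LMS association, eqs. 10-15); the class name, the vigilance value and the helper simulate_vision used in the usage comment are illustrative assumptions, not the authors' code.

import numpy as np

class AbsoluteStrategy:
    """Minimal sketch of the AS: SAW categories over the visual input,
    a WTA over their activities, and a one-shot LMS visuomotor association."""

    def __init__(self, vigilance=0.95):
        self.vigilance = vigilance
        self.prototypes = []   # SAW input weights w_kj (one visual prototype per neuron)
        self.motor = []        # joint configuration associated with each category (LMS weights)

    def activities(self, x):
        # Eq. 10: A_k = 1 - (1/N) sum_j |w_kj - e_j|
        return np.array([1.0 - np.mean(np.abs(w - x)) for w in self.prototypes])

    def learn(self, x, theta):
        A = self.activities(np.array(x, dtype=float))
        if A.size == 0 or A.max() <= self.vigilance:
            # Recruitment rule (cf. eq. 12): no existing category is active enough,
            # so recruit a new prototype and learn theta in one shot (eqs. 13-15).
            self.prototypes.append(np.array(x, dtype=float))
            self.motor.append(np.array(theta, dtype=float))

    def recall(self, x_desired):
        # WTA over the SAW activities, then read out the associated motor
        # pattern (eq. 13 with a binary winner e_k), i.e. theta_B = f(X_B) of eq. 1.
        A = self.activities(np.array(x_desired, dtype=float))
        return self.motor[int(np.argmax(A))]

# Usage sketch (simulate_vision stands for the hypothetical 1-DOF forward model):
# model = AbsoluteStrategy(vigilance=0.95)
# for theta in np.linspace(0.0, 1.0, 200):
#     model.learn([simulate_vision(theta)], [theta])   # motor babbling
# theta_B = model.recall([0.7])                        # reach the visual position 0.7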

B. Experiments

We will use this property to perform reaching experiments, first with a 1-DOF arm in a 1D visual space, then with a 4-DOF arm in 3D Cartesian space. In the first experiment, we simulate a one degree-of-freedom arm, with a joint position given by θ and a visual input x giving the position of the end-effector of the arm. In this simulation we bound x so that for any motor activation we have 0 ≤ x ≤ 1. In this range, we model the following rule: x = 0.5 θ if θ ≤ 0.5, and x = 1.5 θ if θ > 0.5. To see how this Absolute Strategy (AS) encodes information, we observe how neurons encode the visual and joint information together. In fig. 4 a) we show, during a babbling phase, the real position of the simulated visual input and the position encoded by the AS model. In fig. 4 c), we show the total number of recruited neurons over time. In the AS (in blue), neurons are recruited each time the visual input is far enough from the positions encoded by all other neurons (see eq. 12). Since our babbling goes back and forth from 0 to 1, in the AS all neurons are recruited by the time the visual input reaches 1 for the first time. Categorization neurons encode the position the sensor had when they were recruited. The Winner Takes All network following the categorization layer (SAW) explains why, in this implementation of the AS, the encoded positions vary step by step after recruitment. However, the steps can be as small as we wish if we increase the vigilance threshold in the SAW, that is, if we recruit more neurons.

IV. LEARNING A RELATIVE STRATEGY

A. Model

The third algorithm we present in this paper, the one we propose, is the Dynamic Sensory-Motor (DSM) model based on a relative strategy (RS). The DSM model is composed of two parts. The first part is the sensorimotor law encoder (SLE), which attempts to encode a sensor variation as a function of motor activity (according to a sensor categorization). It thus follows a co-variation principle. Doing so, it predicts sensory variations based on motor activity. For the SLE, the rationale is that instead of approximating a function between sensors and motors, we attempt to predict the relative sensor variations with respect to the current motor activity. The SLE receives as inputs a sensor velocity to predict, here positions in 3D space Ẋ_j, a motor activity, here a joint velocity θ̇, and a joint configuration θ to contextualize the learned laws.


Fig. 5. Each panel plots the predicted sensor movement against the sensor value, for an arbitrary motor input set to 1 (otherwise the prediction would be 0). We show the evolution after each recruitment: red bars mark where a new neuron is recruited, black bars mark where the other neurons were recruited in previous steps.

Fig. 4. Encoding of the sensor position (the real one in dashed black) by the AS model (in blue) and the RS model (in green). a) Positions of the sensor encoded by the AS model. b) Positions of the sensor encoded by the RS model. c) Number of recruited neurons for both models.

It predicts as output the vision velocity Ẋ_j^pred. The categorization layer and the filtering layer are strictly the same in the SLE model as in the absolute strategy model, with the SAW and WTA algorithms. The same equations are used, except for eq. (12): the recruitment of a new neuron of the SAW (and hence the learning of the LMS) does not depend on the input space, through a vigilance threshold on the activity A_i of the SAW, but only on an external signal, a threshold on the prediction error.

Fig. 6. To predict, an SLE encodes the sensory information θ in a categorization layer (SAW), then takes the winner using a Winner Takes All (WTA) neural network. The recognized binarized category is multiplied by a scalar corresponding to a motor activity θ̇_i. Finally, using a Least Mean Square (LMS) layer, the SLE computes the prediction Ẋ_j^pred through the law associated with this category, modulated by the motor activity.

Yet, the main difference comes from two points. First, the multiplication stage between a joint velocity θ̇_i and the winning category of the joint configuration θ. Second, it does not learn a pattern but a velocity, i.e. Ẋ_j, which is then the output of the LMS. There is one predictive SLE per sensor dimension, and as many categorization layers (steps (1) and (2)) as joints. So with N this dimension and j ∈ N, each SLE computes the following equations:


ẋ_j^pred(t) = Σ_k W_jk · e_k · θ̇_i(t)    (16)

ΔW_jk = ε^LMS · (ẋ_j − O_j) · e_k    (17)

x_j^pred(t) = x_j(t−1) + ẋ_j^pred(t)    (18)

ε^LMS = ε^SAW    (19)

ε^SAW = 1 if |x_j^pred(t) − x_j(t)| > vigilance, 0 otherwise    (20)

where e_k is computed through the same equations as in the SAW and the LMS. Taken all together, the SLEs predict the vector Ẋ^H_pred(t), and doing so the DSM model can compute

X^H_pred(t+1) = X^H(t) + Ẋ^H_pred(t)    (21)

The learning consists in controlling the categorization of the input space θ by recruiting new receptive fields in the SAW layer every time the prediction X^H_pred(t+1) fails, i.e. its error exceeds a given threshold value. Otherwise, the LMS updates its weights to map the sensorimotor law h; see Fig. 6. In a situation of failure (e.g., during sensorimotor changes), the algorithm does not modify what it has already learnt, as an ART network would, but instead augments its repertoire with a new sensorimotor law to optimize h, by recruiting a new categorization neuron in the SAW. Our RS model is akin to those of Bullock et al. (see [6], [14], [15]). The main difference lies in the fact that we propose to learn only the direct model, through the SLE, which is able to predict sensor variations; then, instead of learning the inverse model ([16]), we use the direct model for acting, via our Dynamic Sensori-Motor (DSM) mechanism. So our model only learns predictive sensorimotor laws of co-variation between sensors and motors, and this predictive mechanism is then used for action ([17]) through a mechanism independent from the learning. Furthermore, in contrast to Bullock et al., we do not have one categorization layer for all sensor variations, but one per sensor. Moreover, the categorization layer recruitment is here triggered uniquely through prediction errors. The main specificity of our approach, namely that the DSM does not learn any inverse model, requires another mechanism in order to act. This mechanism is based on action simulation. It uses an actor-critic paradigm with no learning and a critic based not on real actions but on simulated ones. As in the reinforcement strategy model, a reward signal R(t) is computed, following the same equation as in the reinforcement model (see eq. 9). The equation relies on X^H and X^T at times t and t−1. But instead of using ΔR(t) after a performed action, the DSM uses the predictive SLEs to compute a predicted reward ΔR^pred(t+1). Then, in order to act, for each joint θ_i our model follows these equations:

θ̇_i*(t) = arg max(ΔR^pred(t+1))    (22)

θ̇_i(t) = {−1, 0, 1}    (23)

ΔR^pred(t+1) = R^pred(t+1) − R(t)    (24)

R^pred(t+1) = |X^T(t) − X^H(t)| − |X^T(t+1) − X^H_pred(t+1)|    (25)–(26)

             = |X^T(t) − X^H(t)| − |X^T(t+1) − f(θ(t), θ̇_i(t))|    (27)–(28)

It assumes X^T(t+1) = X^T(t), and computes X^H_pred(t+1) thanks to eq. (21). Therefore, there is no adaptation or learning of the sensorimotor laws specific to any new given task. If the function R giving the reward changes, the SLE can compute it on the fly without any new learning. Then, for a given reward function, at each iteration a new result is computed and the model launches a motor activity until the task is reached.
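To summarize the mechanism, the Python sketch below implements a stripped-down SLE for one sensor dimension and one joint (eqs. 16-20) and the DSM action selection by simulated micro-actions (eqs. 21-28); the one-shot setting of the gain, the error threshold and the function names are simplifying assumptions, not the exact implementation used in the paper.

import numpy as np

class SLE:
    """Sensorimotor Law Encoder sketch: categories over the joint configuration
    theta, each storing the locally linear gain between theta_dot and x_dot."""

    def __init__(self, error_threshold=0.1):
        self.threshold = error_threshold
        self.prototypes = []   # SAW prototypes over theta
        self.gains = []        # per-category gain (LMS weight W_jk)

    def _winner(self, theta):
        return int(np.argmin([abs(p - theta) for p in self.prototypes]))

    def predict(self, theta, theta_dot):
        # Eq. 16: x_dot_pred = sum_k W_jk * e_k * theta_dot, with e_k the binary
        # winner of the WTA over the theta categories.
        if not self.prototypes:
            return 0.0
        return self.gains[self._winner(theta)] * theta_dot

    def learn(self, theta, theta_dot, x_dot):
        # Eqs. 18-20: recruit a new category only when the prediction error
        # exceeds the threshold; the gain is set in one shot (a reading of eq. 17).
        if abs(self.predict(theta, theta_dot) - x_dot) > self.threshold and theta_dot != 0.0:
            self.prototypes.append(theta)
            self.gains.append(x_dot / theta_dot)

def dsm_select_action(sle, theta, x_hand, x_target, candidates=(-1.0, 0.0, 1.0)):
    """DSM action selection: simulate each candidate motor activity with the SLE
    and keep the one maximizing the predicted decrease of the hand-target
    distance (eqs. 21-28); no inverse model is learned."""
    dist_now = abs(x_target - x_hand)
    best_dtheta, best_gain = 0.0, -np.inf
    for dtheta in candidates:                              # eq. 23
        x_pred = x_hand + sle.predict(theta, dtheta)       # eq. 21
        gain = dist_now - abs(x_target - x_pred)           # eqs. 24-26 (predicted reward)
        if gain > best_gain:
            best_dtheta, best_gain = dtheta, gain          # eq. 22: arg max
    return best_dtheta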

B. Experiments

As for the AS strategy, we redo the experiments with a 1-DOF arm, this time with the RS strategy. In fig. 4 b) we show, during a babbling phase, the real position of the simulated visual input and the position encoded by the RS model. In fig. 4 c), we show the total number of recruited neurons over time. In the DSM model, neurons are recruited each time the prediction error is beyond a given threshold. But since each neuron codes for a linear law between a motor activity and a sensor variation, the total number of recruited neurons does not depend on the error threshold but only on the nonlinearity of the law to learn. Each neuron encodes a sensorimotor law, which is applied until another neuron of the SAW wins. In our example the law between X and θ is linear on [0, 0.5], nonlinear at θ = 0.5, and then linear again on [0.5, 1]. It is also nonlinear at 0 and 1, since 0 ≤ X ≤ 1. So theoretically, since there are three different linear laws, at least three neurons must be recruited. In our experiment, we observe that the first neuron is recruited when X = 0.1. The prediction was X^pred = 0, since all neurons in the SAW were at 0 (no recruitment before babbling). The error between X and X^pred is beyond the threshold (0.1 in our example). The SAW then recruits a new neuron, which codes for a linear law between θ̇ and Ẋ. This law is then generalized as long as the same neuron of the SAW wins after the WTA layer. This single neuron is thus enough to encode the law while the babbling is performed between θ = 0 and θ = 0.5 (where the same linear law applies). Then, an error occurs after θ = 0.5 (in our example when θ = 0.65) because the previous law gives a prediction beyond the threshold (in our example at θ = 0.55). The model then recruits another neuron to encode the new law. By the same mechanism, it encodes a new law when the sensor reaches 1, because it cannot move anymore. But which neuron of the SAW wins depends on the topology. The problem is that each neuron encodes a locally linear law, but the region where this law gives predictions with an error above the threshold is independent of the topology of the input space θ.


Then, the emerging frontiers of the learned laws, after a recruitment, do not necessarily fit the frontiers of the real sensorimotor law, and more neurons have to be recruited to define those frontiers more precisely. In our example, when the end-effector performs exactly the same movement a second time, the first neuron (recruited when X = 0.1) wins until the second one (recruited when X = 0.65) wins, i.e. when X = (0.1 + 0.65)/2 = 0.375. After this point, the second linear law is predicted while it is actually still the first one that applies in reality. This is why another neuron is recruited to correct it. We can observe more precisely how this mechanism works in fig. 5. We put an arbitrary motor activity input at 1 and observe the prediction made by our SLE along the whole proprioception (θ) input range. Since it gives the predicted variation of the visual input at all positions, it gives the derivative of the law it has learned. Given our simulation, it should learn +0.5 when 0 ≤ θ ≤ 0.5, +1.5 when 0.5 ≤ θ < 1, and 0 when θ = 1. We can observe the evolution of the learned law each time a new neuron is recruited, with the generalization consequences each neuron has. We can observe how the law is learned and how the frontiers move, until it fits the real law closely enough (according to the defined threshold). A side effect of this recruitment is a very large generalization. With just one neuron, the robot is able to act correctly since it predicts a movement in the good direction. But at steps 2 and 3, it predicts that no movement is possible around 0.8 ≤ θ ≤ 1, which penalizes it for moving until it learns the law more precisely. In the 4-DOF experiment in 3D space, we performed a batch of experiments to test the evolution of the reaching-task performance over time. To do so, the 3 simulated robots run in parallel, one per model. They all perform the same motor babbling, and the AS and DSM models learn. Regularly, we stop the babbling and the learning to perform the same reaching task (with no learning). The reaching task is a continuous sequence of target coordinates in 3D space that the 3 models have to achieve. The sequence is always the same and is the record of the end-effector coordinates of the arm during a random exploration performed in another experiment. This reaching task lasts 5000 iterations. During this reaching task, the target stops moving 120 times, and we measure the Euclidean distance of each end-effector to the target each time the target stops. At the end of the reaching task, we then have a mean value of the distances to the target for each model. We perform this task (in which there is no learning) regularly during the babbling and learning phase. In fig. 7 a), we can see the evolution of those values in time. We note that the reinforcement strategy, in red, remains quite constant since there is no learning. The AS model decreases rapidly at first and then slowly once almost all of the reaching space has been explored. The DSM model decreases faster than the AS model. To test the robustness to the tool, we extend, during a reaching task, the length of the final segment of the simulated Katana arm in every simulation, and we also uniformly extend the positions of the target coordinates.

Fig. 7. In red the reinforcement strategy, in blue the AS, and in green the DSM (RS). a) Mean error for the reaching task during development. b) Mean error for the reaching task after learning, when the geometry of the robot changes (tool).

We perform the reaching task with this geometry perturbation after a babbling phase of 21000 iterations, once both models have reached fairly constant performances. Results are shown in fig. 7 b). They clearly indicate the robustness of the RS, while the AS shows a clear drop in performance right after the geometry change. The difference is explained by the fact that, for the AS, the target position is associated with a motor configuration learned with the previous geometry of the robot. Since the geometry has changed, performance decreases. But for the RS, the distance between the end-effector and the target is computed, so even if the sensorimotor laws change a little, the co-variation directions between the end-effector and the joints remain almost the same. The evolution of the reaching-task performance in time with the geometry modification shows that the AS model needs the same amount of time to adapt as during the first babbling phase. This shows that this model has to learn every association again to adapt to the tool, while the RS model does not have to learn again to be efficient in the reaching task. Considering the number of recruited neurons, the problem with the DSM mechanism is that it encodes a new motor pattern each time the prediction is not good enough, given a threshold. But the 4-DOF Katana arm end-effector has a huge number of distinct laws in 3D space (the direct model is based on sinusoids, which are highly nonlinear).


Fig. 8. In blue the AS and in green the DSM. a) Number of recruited neurons over time. b) Performance as a function of the number of recruited neurons.

In order to have good predictions, it therefore recruits a lot of neurons, while the performance for action remains unchanged. Moreover, the time needed to explore all the movements in every motor configuration during random motor babbling is very long. In fig. 8 a), we see the number of recruited neurons over time for both models. In fig. 8 b), we see the performance of each model as a function of the number of recruited neurons. We can see that the DSM model needs fewer neurons to be accurate during reaching but, to satisfy the prediction performance, it continues to recruit more neurons.

V. DISCUSSION AND CONCLUSION

In this paper, we investigate how sensorimotor information is encoded through an absolute and a relative strategy, and how "contextual" information is encoded and what it codes for. Experiments with 3 simulated 4-DOF Katana arms during a reaching task in 3D space, each with an absolute, relative or reinforcement strategy, indicate a good robustness of the RS when adding a tool, whereas the AS can adapt to the tool only by re-learning its whole visuomotor mapping. However, the number of categories required is very large, and furthermore with no impact on the performance. The reinforcement strategy, with constant performance, seems a good backup solution in case of failure of the other models. We propose that the three strategies are always present during development, since they have different properties and can be complementary.


We see how each strategy can adapt to a tool for the reaching task. The RS is robust to the extension of the arm, and its performance is immediately as good as it can be. The AS is slower to adapt to a tool, but can still achieve reaching through new learning. Iriki et al. (see [1]) show that this adaptation needs to be quite fast; this result tends to make the RS more relevant for tool use. However, we propose that both the AS and RS models can learn in parallel. Indeed, an AS can achieve a task not immediately feasible by the RS: for example, if the direction from the current position to the target position is not practicable, a detour is not possible with this simple implementation of the RS, since it would imply increasing the distance to the target for a while. But it can be achieved by the AS if it has already learned the target position. It can also be possible with more elaborate mechanisms, see [15] for an obstacle-avoidance reaching task with an RS. And in case of failure, or a dead end of both strategies, we can imagine that the reinforcement strategy is always present.

Encoding of tool perception. In our experiments we see how the different strategies can learn and adapt to a tool (i), but question (ii), about why tools take so long to be used on purpose (see [3]), remains unanswered. If both models can adapt, then where does the difficulty of exploiting the tool come from? We can suppose, for example, a problem due to an inhibition mechanism. But within a very rich environment, and with a body with many DOF, the way the tool and the body can be encoded seems an issue. And precisely, the perception of the tool, and the unclear mechanism through which the arm appears to be "extended", is a critical question. One hypothesis is, for example, that the perception of the tool can directly bias the encoding of the hand. One can imagine a gain-field mechanism which operates a transformation on the visual inputs (or motor outputs) and thus corrects the inputs of an AS model. The model can then have consistent associations, with no further learning, adapted to the tool. We note that with such a visual treatment, interesting generalization properties can also emerge if the tool changes. For example, we can imagine that having 2 end-effectors, in the case of a rake-like tool, may lead to a transformation adapted for both directions.

On-demand context switching. The question of how the object is perceived and then encoded can have different effects on both models. But in order to request a tool for achieving a task, it seems that a context "having the tool in the hand" should be learned in any case, so that the tool does not appear like a distractor that the infant throws away during the reaching task, but as a context to "request". The question is then how such a context can be requested, and whether one strategy seems better than another to achieve this request mechanism. For the time being, "contexts" in our models were reduced to the SAW and WTA categorization layers. They encode vision X (for the AS) or proprioception θ (for the RS). But those "contexts" can be both vision and proprioception, and within the developmental path, we suggest that a context should also be: "having the tool in the hand". Independently of what contexts encode, we propose to focus on the question: what do contexts code for?


Since in the AS absolute positions are coded, the transformation induced by the tool can be coded as a new absolute position (as in our experiment), or perhaps as a shift of the visual inputs (with a gain field, for example, as discussed above). But in the RS, "contexts" code for variations, or sensorimotor laws. The shift introduced by the tool can then be encoded as one variation among others. Considering the RS, the mechanism used to act in our DSM model is based on 2 pieces of information: (1) as in Bullock et al. [6], the direction from the hand to the target, and (2) the motor activity the robot can perform. Point (1) means that the way the RS acts is based on reducing the distance to a desired sensory state, i.e. the target coordinates. So if the context "having a tool in the hand" is defined precisely enough, it can constitute a sensory state that the DSM model can try to reach. Then, since our DSM model decides to act based on the motor activity the robot can perform (2), we can imagine that, through the same mechanism, the DSM model can also decide to act based on context, i.e. on sensory states the robot can achieve. This can be achieved by adding random sensory states to the arg max computation of eq. 22, for example.

Combinatorial issues. The more precisely the context is defined, the more efficient such a mechanism would be. But for the moment, patterns of all sensory inputs are recruited. This leads to a combinatorial explosion of the number of recruited neurons in the SAW. Indeed, especially with the RS model, co-variations of sensors and motors can lead to a huge number of combinations of just the motor patterns with 4 DOF (see fig. 8 a); in [14], 77175 contexts are recruited to properly encode the co-variations of the 7 DOF of their head-neck-eye robot with an RS. Even though we propose a different recruitment law, based not on the categorized input space (as in classical ART-like neural networks) but only on a prediction error (see fig. 5), we still recruit too many neurons considering their role. Since they code for linear laws, they should only be recruited around the frontiers of those laws, in order to reduce the number of recruited neurons and also to increase performance. In further work, we will investigate how such a recruitment can be made possible, and also how we can make requests on those contexts with the DSM model.

Relation to common coding. This paper raises the importance of having a clear separation between what is learned (i) and how this learning can be used (ii). Indeed, in our proposed Dynamic Sensori-Motor model we separate the learning mechanism that predicts sensor variations (X) from the action selection mechanism. By learning sensorimotor laws, directions to the target are encoded as motor activities. This equivalence between sensor variations and motor activities is highly related to common-coding theories [18]. Building on the separation between what is learned and the action selection mechanism, the system can somehow "request" previously encountered sensory contexts exactly as we did in this paper with motor activities. In further work we will see how, through this mechanism and by improving the recruitment mechanism, we can achieve a tool-use task with the DSM model. Finally, we can test whether those 2 strategies are relevant or efficient for imitation behaviours.


Are they based on postural imitation or on dynamical imitation? Here again we suppose that both strategies have interesting properties (see [19] for an example with an AS) and must be combined, but the RS may be more efficient for converting imitated tasks into motor commands, see [20].

REFERENCES

[1] A. Maravita and A. Iriki, "Tools for the body (schema)," Trends in Cognitive Sciences, vol. 8, no. 2, pp. 79-86, 2004.
[2] M. Hoffmann, H. G. Marques, A. Hernandez Arieta, H. Sumioka, M. Lungarella, and R. Pfeifer, "Body schema in robotics: a review," IEEE Transactions on Autonomous Mental Development, vol. 2, no. 4, pp. 304-324, 2010.
[3] L. Rat-Fischer, J. K. O'Regan, and J. Fagard, "The emergence of tool use during the second year of life," Journal of Experimental Child Psychology, vol. 113, no. 3, pp. 440-446, 2012.
[4] J. Kluzik, J. Diedrichsen, R. Shadmehr, and A. J. Bastian, "Reach adaptation: what determines whether we learn an internal model of the tool or adapt the model of our arm?" Journal of Neurophysiology, vol. 100, no. 3, pp. 1455-1464, 2008.
[5] N. Cothros, J. Wong, and P. Gribble, "Are there distinct neural representations of object and limb dynamics?" Experimental Brain Research, vol. 173, no. 4, pp. 689-697, 2006.
[6] D. Bullock, S. Grossberg, and F. H. Guenther, "A self-organizing neural model of motor equivalent reaching and tool use by a multijoint arm," Journal of Cognitive Neuroscience, vol. 5, no. 4, pp. 408-435, 1993.
[7] J. Baillieul, J. Hollerbach, and R. Brockett, "Programming and control of kinematically redundant manipulators," in Proc. 23rd IEEE Conference on Decision and Control, Dec. 1984, pp. 768-774.
[8] F. A. Mussa-Ivaldi and N. Hogan, "Integrable solutions of kinematic redundancy via impedance control," The International Journal of Robotics Research, vol. 10, no. 5, pp. 481-491, 1991.
[9] A. D'Souza, S. Vijayakumar, and S. Schaal, "Learning inverse kinematics," in Proc. 2001 IEEE/RSJ International Conference on Intelligent Robots and Systems, vol. 1, 2001, pp. 298-303.
[10] R. S. Sutton and A. G. Barto, Introduction to Reinforcement Learning. MIT Press, 1998.
[11] G. A. Carpenter and S. Grossberg, Adaptive Resonance Theory. Springer, 2010.
[12] P. Gaussier and S. Zrehen, "Perac: A neural architecture to control artificial animals," Robotics and Autonomous Systems, vol. 16, no. 2, pp. 291-320, 1995.
[13] A. de Rengervé, P. Andry, and P. Gaussier, "Online learning and control of attraction basins for the development of sensorimotor control strategies," Biological Cybernetics, pp. 1-20, 2015.
[14] N. Srinivasa and S. Grossberg, "A head-neck-eye system that learns fault-tolerant saccades to 3-d targets using a self-organizing neural model," Neural Networks, vol. 21, no. 9, pp. 1380-1391, 2008.
[15] N. Srinivasa, R. Bhattacharyya, R. Sundareswara, C. Lee, and S. Grossberg, "A bio-inspired kinematic controller for obstacle avoidance during reaching tasks with real robots," Neural Networks, vol. 35, pp. 54-69, 2012.
[16] M. Kawato, "Internal models for motor control and trajectory planning," Current Opinion in Neurobiology, vol. 9, no. 6, pp. 718-727, 1999.
[17] J. R. Flanagan, P. Vetter, R. S. Johansson, and D. M. Wolpert, "Prediction precedes control in motor learning," Current Biology, vol. 13, no. 2, pp. 146-150, 2003.
[18] B. Hommel, J. Musseler, G. Aschersleben, and W. Prinz, "The theory of event coding (TEC): A framework for perception and action planning," Behavioral and Brain Sciences, vol. 24, pp. 849-937, 2001.
[19] A. de Rengervé, S. Boucenna, P. Andry, and P. Gaussier, "Emergent imitative behavior on a robotic arm based on visuo-motor associative memories," in IROS, 2010, pp. 1754-1759.
[20] S. Schaal, A. Ijspeert, and A. Billard, "Computational approaches to motor learning by imitation," Philosophical Transactions of the Royal Society B: Biological Sciences, vol. 358, no. 1431, pp. 537-547, 2003.
