Approaching Evolutionary Robotics Through Population-Based Incremental Learning

F. Southey* & F. Karray**

*Dept. of Systems Design Engineering, University of Waterloo, Waterloo, ON, N2L 3G1, Canada
[email protected]

**Dept. of Systems Design Engineering, University of Waterloo, Waterloo, ON, N2L 3G1, Canada
[email protected]

ABSTRACT

Population-Based Incremental Learning (PBIL) [1] is a recently developed evolutionary computing technique based on concepts found in genetic algorithms and competitive learning-based artificial neural networks. PBIL and a traditional genetic algorithm are compared on the task of evolving a neural network-based controller for a simulated robotic agent. In particular, this paper examines the performance of FP-PBIL, a variant of PBIL developed for this task that works with floating-point representations rather than bit-strings. Results are presented showing the superior performance of FP-PBIL. This advantage, combined with lower memory and processing requirements, indicates that the technique is well-suited to developing online, evolutionary controllers for autonomous robotic agents.

1. INTRODUCTION

The domain of evolutionary robotics applies evolution-inspired optimization techniques to the design and control of robotic systems (for overviews see [2] and [3]). Both sides of this union are substantial research areas in their own right, and their combination is intuitively appropriate: robotics, in a crude fashion, attempts to mimic living creatures, while evolutionary computation attempts to mimic the processes that brought those creatures into being. Some researchers have constructed online evolutionary controllers [2]. These controllers evolve while the agent is performing its task, evaluating the performance of many different configurations as they work. Systems like these may exhibit both the optimizing and adaptive capabilities of natural evolutionary systems. This paper explores a means for implementing evolutionary control systems on an autonomous robotic agent that can both overcome the slow progress of traditional evolutionary techniques

and reduce the on-board resources required. The research presented here is a preliminary investigation comparing the performance of a relatively new evolutionary technique known as Population-Based Incremental Learning (PBIL) with that of a traditional genetic algorithm (GA) when evolving a simple neural network controller for a simulated robotic agent.

This paper provides a brief introduction to PBIL and makes some comparisons to traditional genetic algorithms. This is followed by a description of the simulated agent, the neural network controller, the environment, and the task. Some details of the algorithms are also described. The next sections present the experiments and results demonstrating the advantages of the new approach, and the remainder is devoted to a brief discussion and conclusion.

2. BACKGROUND

2.1 Population-Based Incremental Learning

Population-Based Incremental Learning, introduced around 1994 by Baluja [4], is a technique derived by combining aspects of genetic algorithms and competitive learning. This algorithm and its variants have been shown by Baluja to significantly outperform standard GA approaches on a variety of stationary optimization problems, ranging from toy problems designed specifically for GAs to NP-complete problems [5]. It has also been applied to various real-world applications, including autonomous highway vehicle navigation.

PBIL's most significant difference from standard genetic algorithms is the removal of the population found in GAs. A GA's population can be thought of as implicitly representing a probability distribution of alleles over genes. PBIL takes this implicit distribution and makes it explicit, dispensing with the population and instead maintaining a set of probabilities for alleles over genes. In the common case

of binary genes (two alleles per gene, 0 and 1), this set is simply a vector containing, for each gene, the probability that the allele is a 1. Learning in PBIL consists of using the current probability distribution to create N individuals. These individuals are evaluated according to the objective function. The "best" individual is used to update the probability vector, increasing the probability of producing solutions similar to the current best individual. The update rule is essentially the same as that used in Learning Vector Quantization (LVQ) [6]. This process of generation, evaluation, and update is repeated until some stopping criterion is met. Details of the algorithm and possible variations can be found in [4].

When considered for use in an online, evolutionary controller for an autonomous agent, PBIL could offer two distinct advantages. The first is the improved performance described above. The second lies in the fact that the algorithm requires less memory and processing power, resources that are frequently precious in autonomous robots. Where a GA requires storage for all solutions in its population, with population sizes typically ranging from tens to hundreds of solutions, PBIL only requires storage for two solutions (the current best solution and the solution being evaluated) and the probability distribution. In the research presented here, the distribution required the same amount of memory as a single solution, so PBIL used the space of three solutions as compared to tens or hundreds. Furthermore, PBIL's memory requirements do not increase with increasing population size. On the processing side, PBIL does not have the crossover and fitness-proportional selection operations of a classic GA. Additionally, PBIL's "mutation" operation is performed only once per generation and only on the probability distribution.
While an instruction-by-instruction analysis has not been performed, experience in executing both algorithms indicates that the overhead for GA operations is significantly higher than for PBIL. Finally, because PBIL does not use the fitness of all sampled solutions for any of its calculations, but only uses the best solution for updating its model, evaluation of any solution can be aborted once it becomes clear that it will not exceed the value of the current best solution. While not used in these experiments, this curtailed evaluation technique could yield significant improvements in evaluation time and should be investigated further.
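The generate-evaluate-update loop described above can be sketched for the binary case as follows. This is a minimal illustration, not the authors' implementation: the function and parameter names are ours, and mutation of the probability vector is omitted for brevity.

```python
import random

def pbil(evaluate, n_genes, n_samples=10, lr=0.1, generations=100):
    """Minimal binary PBIL sketch (hypothetical names/defaults).

    evaluate: maps a list of 0/1 genes to a fitness (higher is better).
    """
    probs = [0.5] * n_genes          # explicit allele-probability vector
    best, best_fit = None, float("-inf")
    for _ in range(generations):
        # Sample N individuals from the current distribution.
        samples = [[1 if random.random() < p else 0 for p in probs]
                   for _ in range(n_samples)]
        # Only the generation's best individual is used for learning.
        gen_best = max(samples, key=evaluate)
        if evaluate(gen_best) > best_fit:
            best, best_fit = gen_best, evaluate(gen_best)
        # LVQ-style update: pull each probability towards the best sample.
        probs = [p + lr * (g - p) for p, g in zip(probs, gen_best)]
    return best, best_fit
```

On a trivial "one-max" objective (`evaluate=sum`), the probability vector is quickly pulled towards all-ones, illustrating how the explicit distribution replaces a stored population.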

3. PROBLEM DESIGN

The problem presented here is intended to provide proof of concept and a simple basis for comparison of the two techniques. It is not intended for use as a real-world controller. As an evolutionary controller design, it is far from unique (see [7] for a similar controller), and Baluja has even applied the standard PBIL algorithm to the task of evolving a neural network controller for highway navigation [8]. However, this latter work did not provide any comparison with existing techniques.

3.1 The Simulated Agent

The agent in the 2-dimensional simulation was circular, with differential steering and three sensors: two distance sensors, similar to sonar in operation, and a target heading sensor giving the orientation of the agent's target relative to its own orientation (Figure 1). The distance sensors have a limited range and report a fixed distance when nothing is within their range. The agent moves at a constant speed throughout the simulation and can only control the angle of its turn at each time step.

Figure 1: The simulated robotic agent.

3.2 The Controller

The controlling mechanism is a simple, multilayer, feedforward neural network with three inputs, four neurons in its hidden layer, and a single output (Figure 2). The input and hidden layers are fully connected, as are the hidden and output layers. A standard bipolar sigmoid is used in both the hidden-layer and output-layer neurons. The output is interpreted as a change in heading for the agent, ranging from -π to π radians relative to its current orientation. The three inputs are simply the values of the sensors, fed unaltered into the network.

Figure 2: The topology of the neural network controller.
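A forward pass through this 3-4-1 topology might look as follows. This is a sketch: the weight ordering is an assumption of ours, chosen so that a flat list matches the sixteen weights mentioned in section 3.4, and bias terms are assumed absent (consistent with that count).

```python
import math

def bipolar_sigmoid(x):
    # Standard bipolar sigmoid: tanh-shaped, output in (-1, 1).
    return 2.0 / (1.0 + math.exp(-x)) - 1.0

def controller(sensors, weights):
    """Forward pass for the 3-4-1 controller network (hypothetical layout).

    `weights` is a flat list of the 16 evolved values:
    12 input-to-hidden followed by 4 hidden-to-output.
    """
    w_ih = [weights[i * 3:(i + 1) * 3] for i in range(4)]  # 4 hidden x 3 inputs
    w_ho = weights[12:16]
    hidden = [bipolar_sigmoid(sum(w * s for w, s in zip(row, sensors)))
              for row in w_ih]
    out = bipolar_sigmoid(sum(w * h for w, h in zip(w_ho, hidden)))
    return out * math.pi  # change of heading in [-pi, pi] radians
```

Because the output activation lies in (-1, 1), scaling by π guarantees the commanded turn stays within the stated ±π range.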

3.3 The Task

Two different tasks were assigned to the agent. In the first, the agent is in an unobstructed, rectangular environment with four walls. A goal is located in the bottom-right corner of this environment. The agent must make its way to the goal as quickly as possible. In the second task, the agent and the goal start in the same configuration, but four obstacles have been added to the environment, obstructing paths along the walls and the direct path from origin to goal. An image of the latter environment, taken from the simulator, is shown in Figure 4. The goal is the circle shown in the bottom-right corner.

The objective function used for training was simple. The simulation was run for 400 steps, or until the agent reached the goal or hit an obstacle or wall, whichever occurred first. If the agent did not reach the goal, either by running out of time or by colliding with something, its reward was based on its proximity to the goal. If the agent did reach the goal, it received all of the reward for being close to the goal, plus a reward based on how quickly it reached the goal. Thus, maximum reward was received for reaching the goal in the shortest possible time (the same as using the shortest path in this case), and any agent reaching the goal always received more reward than any agent that did not. A reward of over 560 indicates that the agent reached the goal within the allotted time.

3.4 The Learning

All learning consisted of creating sets of weights for the neural net controller and running a simulation to see how well it performed. The topology of the network was fixed, so it was only necessary to learn the sixteen weights in the network.
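The shape of this objective can be sketched as below. The exact constants were not given in the text, so `proximity_scale` and `speed_scale` are hypothetical values, chosen only to reproduce the stated properties: any agent reaching the goal scores above 560 and outscores any agent that does not.

```python
def reward(final_dist, start_dist, steps_used, reached,
           max_steps=400, proximity_scale=560.0, speed_scale=200.0):
    """Hypothetical reconstruction of the paper's objective function.

    final_dist / start_dist: distance to goal at the end / start of the run.
    reached: whether the agent entered the goal area.
    """
    # Proximity reward grows as the agent ends closer to the goal,
    # but stays strictly below proximity_scale unless the goal is reached.
    proximity = proximity_scale * (1.0 - final_dist / start_dist)
    if not reached:
        return proximity
    # Reaching the goal grants the full proximity reward plus a speed bonus,
    # so a fast, direct run earns the maximum.
    speed_bonus = speed_scale * (max_steps - steps_used) / max_steps
    return proximity_scale + speed_bonus
```

Under these assumptions, a reaching agent always scores at least 560 while a non-reaching agent always scores below it, matching the threshold quoted in the text.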

Two algorithms were used, based on those in Baluja's experiments. The first was "floating-point PBIL" (FP-PBIL). The second was a "floating-point standard genetic algorithm" (FP-SGA). Excepting the modifications described below, these algorithms were the same as Baluja's "standard PBIL" (S-PBIL) and "standard GA" (SGA) [4].

The weights in the neural network were represented using floating-point values. In order to apply evolutionary techniques to determining the weights, it was necessary to construct a representation for a set of floating-point values. One way to do this is simply to use a set of bits for each value. This approach was used by Baluja in his neural net training and other experiments. However, this representation causes problems when used in a GA with crossover operators, since crossovers will not necessarily occur on the boundaries of the values. A solution is to use crossover operators that act only on value boundaries. Instead, however, it was decided to represent the values directly as floating-point values and to adapt both the SGA and S-PBIL systems to work with floating-point.

3.4.1 FP-SGA

Only one change was necessary to make the SGA algorithm work with the floating-point representation. Instead of flipping bits, the mutation operator adds a value randomly selected between -0.5 and +0.5 to the gene. Unless otherwise specified, the FP-SGA used the following operators and parameters:

• fitness-proportional selection
• 2-point crossover with a crossover probability of 1.0 (all individuals are crossover children in each generation)
• a "per gene" mutation probability of 0.001
• an elitist scheme (the worst individual in the current generation is replaced by the best in the last generation)
• a population of 10
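The floating-point operators listed above might be implemented as follows. This is a sketch with our own function names; the selection routine assumes non-negative fitness values.

```python
import random

def mutate(genes, rate=0.001):
    # FP-SGA mutation: with small probability, perturb a gene by a value
    # drawn uniformly from [-0.5, +0.5] instead of flipping bits.
    return [g + random.uniform(-0.5, 0.5) if random.random() < rate else g
            for g in genes]

def two_point_crossover(a, b):
    # Swap the segment between two cut points. Acting directly on float
    # genes means cuts always fall on value boundaries.
    i, j = sorted(random.sample(range(len(a) + 1), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]

def fitness_proportional_select(population, fitnesses):
    # Roulette-wheel selection: probability of selection proportional
    # to fitness (fitnesses assumed non-negative).
    total = sum(fitnesses)
    r = random.uniform(0.0, total)
    acc = 0.0
    for individual, fit in zip(population, fitnesses):
        acc += fit
        if acc >= r:
            return individual
    return population[-1]
```

Note that two-point crossover on a float representation conserves the combined multiset of parental genes, which is exactly the boundary-respecting behavior the bit-string encoding lacked.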

3.4.2 FP-PBIL

For PBIL, the use of floating-point values was more problematic. It was now necessary to change the representation used for the probability distribution. Work has been done on non-binary alphabets [9], but the range of possible values in a 64-bit floating-point value renders the discrete approach inefficient. Instead, the distribution was described using a set of independent Gaussian distributions, one

for each value. When updating the distribution based on the best sampled solution, the mean of the Gaussian for each value is moved towards the observed value, increasing the probability of producing a similar value. The update is shown in equation (1).

µ_i′ = µ_i + α(v_i − µ_i)    (1)

where µi is the mean of the ith Gaussian, vi is the ith floating-point value of the best solution, and " is the learning rate. The means and variances were initialized to 0.0 and 1.0 respectively. The variances were not adjusted during training but using such adjustments may prove useful and speed convergence towards good solutions. This would be a good direction for future research. The FP-PBIL learner is essentially the same as S-PBIL in that it updates based on the best solution found in a generation and uses no “negative learning” (moving the probability vector away from bad solutions). Mutation had to be changed to accommodate the new representation and is accomplished by randomly determining for each mean whether or not to mutate and then randomly deciding to add or subtract a small fixed amount from the mean. It is also possible to apply an elitist strategy in PBIL. In this case, the distribution is sampled as usual, but when selecting the solution to use for updating, the best solution from the previous generation is also considered. Unless otherwise specified, FP-PBIL used the following parameters: • a learning rate of 0.1 • a mutation probability of 0.02 • a mutation shift of +/-0.05 • no elitism • a “population” of 10

4. EXPERIMENT DESIGN

4.1 Measurements

In describing the performance measurements, a run is defined as the execution of an experiment involving some fixed number of generations with constant parameters. The best measurement of success for an individual controller is, of course, the objective value. In evaluating the performance of the algorithms, several observations are possible. Of primary interest is the best solution within a run and the speed with which good solutions were found. The average of the solution values at each generation is of little interest except as a means for evaluating the overall progress of the population (for FP-SGA) and the distribution (for FP-PBIL). Since this research is primarily concerned with the speed at which high-quality solutions are generated, and not with the overall state of the algorithms, the results will focus on the best results achieved in a given run and the generation in which certain levels of objective value were reached. In particular, solutions with a value of over 600 are noted, and solutions with a value of over 700 are of particular interest since they represent the very highest values found and are qualitatively the best solutions (the agent moves directly to the target whilst avoiding obstacles).

4.2 Schedule of Experiments

All experiments were executed and the results averaged over four runs. Each run consisted of 200 generations and used the parameters described in section 3.4, except where noted as different. Experiments were concentrated on the obstructed environment, which is the more interesting task.

Unobstructed environment:
- FP-PBIL
- FP-SGA

Environment with four obstacles:
- FP-PBIL
- FP-PBIL with elitism
- FP-SGA
- FP-SGA with mutation rate: 0.005
- FP-SGA with mutation rate: 0.01
- FP-SGA with crossover rate: 0.7

5. RESULTS

Experiments with FP-SGA using varying parameters showed little or no improvement in performance. Therefore, only the results using the parameters from section 3.4.1 are shown here.

5.1 Unobstructed Environment

Algorithm | Avg Best of Run | Best of All | Avg Gen of > 600 | Avg Gen of > 700
FP-PBIL   | 757             | 761         | 19               | 19
FP-SGA    | 658             | 712         | 22†              | 5††

Table 1: Overall results in environment with no obstacles.
† Only 3 of the 4 runs made it over 600.
†† Only 1 of the 4 runs made it over 700.

Figure 5: Typical best solution from FP-SGA (521).

5.2 Environment with Four Obstacles

Algorithm       | Avg Best of Run | Best of All | Avg Gen of > 600 | Avg Gen of > 700
FP-PBIL         | 732             | 753         | 61               | 93**
FP-PBIL Elitist | 736             | 749         | 25               | 96
FP-SGA          | 572             | 687         | 48*              | ***

Table 2: Overall results in environment with four obstacles.
* Only 1 of the 4 runs made it over 600.
** Only 3 of the 4 runs made it over 700.
*** None of the runs made it over 700.

Figure 3: Overall best solution from FP-PBIL (753).
Figure 4: Overall best solution from FP-SGA (687).

6. DISCUSSION

It is clear from the numbers shown in Table 1 and Table 2 that FP-PBIL outperforms FP-SGA in terms of both quality of solution and time taken to discover good solutions. In fact, FP-PBIL regularly achieved solution qualities that FP-SGA rarely or never achieved.

The overall best solutions for FP-SGA and FP-PBIL differ qualitatively as well. As one can see in Figure 4, the FP-SGA-generated agent uses a "wall-following" strategy. Essentially, the agent has learned to avoid obstacles, but always turns in towards the wall. The best solution does appear to use the target heading sensor to a small extent, but the majority of solutions found by FP-SGA are similar to that shown in Figure 5. These solutions do not appear to use the target heading sensor at all and never reach the goal, only approaching close to it. The location of the goal in a corner seems to make the task of actually entering the goal area difficult for agents that rely largely on collision avoidance and do not use the target heading sensor. By contrast, the FP-PBIL solution shown in Figure 3 is clearly using the target heading sensor to decide its overall direction and only uses its distance sensors to avoid the obstacles.

The solutions found by FP-PBIL do not leave much room for improvement. However, elitist FP-PBIL produces top-quality results more reliably than standard FP-PBIL. Whether this reliability comes with a risk of premature convergence to weaker solutions is a question that should be explored further.

The results demonstrate that PBIL can yield better results than traditional GAs when applied to evolutionary robotics. However, there are a multitude of variations on the classic GA which have not been

examined here. It is important to note, though, that PBIL as applied in this research involved no adjustment of parameters or selection of operators, aside from the simple choice regarding elitism; the parameters used were taken directly from earlier research. For the GA, several different parameter values were tried, including some not presented here, and many alternate operators remain to be explored, each with its own associated parameters. In terms of results for time invested, PBIL yielded a much higher payoff than the GA. PBIL is also simpler to implement and, as noted above, requires fewer resources, rendering it highly suited to autonomous robotics.

7. CONCLUSION

The research presented here has explored the use of FP-PBIL, a variant of PBIL adapted for use with floating-point values. Simulation results have shown that, in evolving a simple neural network-based controller for a navigation and collision-avoidance task, FP-PBIL outperforms an implementation of the traditional genetic algorithm.

8. ACKNOWLEDGEMENTS

Many thanks to Relu Patrascu for the simulation framework used in this research. Finnegan Southey was funded during this research by a Natural Sciences and Engineering Research Council (NSERC) Postgraduate Scholarship.

9. REFERENCES

[1] S. Baluja and R. Caruana, "Removing the genetics from the standard genetic algorithm", Proceedings of the Twelfth International Conference on Machine Learning, Morgan Kaufmann, San Francisco, CA, USA, 1995, pp. 38-46.

[2] J. Meyer, P. Husbands, and I. Harvey, "Evolutionary Robotics: A Survey of Applications and Problems", Evolutionary Robotics, First European Workshop, Proceedings of EvoRobot98, eds. P. Husbands and J. Meyer, Springer-Verlag, Berlin, Germany, 1998, pp. 1-21.

[3] J. A. Meyer, "Evolutionary approaches to neural control in mobile robots", 1998 IEEE International Conference on Systems, Man, and Cybernetics, IEEE, New York, NY, USA, 1998, vol. 3, pp. 2418-2423.

[4] S. Baluja, "Population-Based Incremental Learning: A Method for Integrating Genetic Search Based Function Optimization and Competitive Learning", Technical Report CMU-CS-94-163, Carnegie Mellon University, June 1994.

[5] S. Baluja, "Genetic algorithms and explicit search statistics", Advances in Neural Information Processing Systems 9, Proceedings of the 1996 Conference, 1997, p. 319.

[6] T. Kohonen, Self-Organization and Associative Memory (3rd Ed.), Springer-Verlag, Berlin, 1989.

[7] D. Floreano and F. Mondada, "Evolutionary neurocontrollers for autonomous mobile robots", Neural Networks, vol. 11, no. 7-8, Oct.-Nov. 1998, pp. 1461-1478.

[8] S. Baluja, "Evolution of an artificial neural network based autonomous land vehicle controller", IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 26, no. 3, June 1996, pp. 450-463.

[9] M. P. Servais, G. de Jager, and J. R. Greene, "Function optimisation using multiple-base population based incremental learning (MB-PBIL)", Proceedings of the Eighth Annual South African Workshop on Pattern Recognition (PRASA '97), Rhodes University, November 1997, pp. 6-11.
