GENETIC PROGRAMMING TECHNIQUES THAT EVOLVE RECURRENT NEURAL NETWORK ARCHITECTURES FOR SIGNAL PROCESSING

Anna I. Esparcia-Alcázar & Kenneth C. Sharman
Dept. of Electronics & Electrical Engineering
The University of Glasgow
Glasgow G12 8LT, Scotland, UK
e-mail: [email protected], [email protected]

ABSTRACT

We propose a novel design paradigm for recurrent neural networks. This employs a two-stage Genetic Programming / Simulated Annealing hybrid algorithm to produce a neural network which satisfies a set of design constraints. The Genetic Programming part of the algorithm is used to evolve the general topology of the network, along with specifications for the neuronal transfer functions, while the Simulated Annealing component of the algorithm adapts the network's connection weights in response to a set of training data. Our approach offers important advantages over existing methods for automated network design. Firstly, it allows us to develop recurrent network connections; secondly, we are able to employ neurones with arbitrary transfer functions; and thirdly, the approach yields an efficient and easy to implement on-line training algorithm. The procedures involved in using the GP/SA hybrid algorithm are illustrated by using it to design a neural network for adaptive filtering in a signal processing application.

1. INTRODUCTION

In the past, Neural Networks (NNs) have been successfully applied to Digital Signal Processing (DSP) problems [5]. In general the procedure involves selecting a topology for the NN and then adapting the parameters to the observed data. We believe, however, that this may lead to poor performance if the chosen structure is not well matched to the problem. We present here a technique for adapting both the topology and the weights of NNs suited for DSP applications.

The problem of automatically obtaining the topology of a neural network has recently been tackled by Evolutionary Algorithms, such as genetic algorithms (GAs) [2] and evolutionary programming (EP). These are population-based search methods which do not constrain the final architectures


that can be reached. In GAs, both the structure and parameters of a network are represented as a (usually) fixed-length string, and a population of such strings (or genotypes) is evolved, mainly by recombination. One problem with this approach, however, is the size of the required genotype bit string [6]: for a fully connected network with N neurones, there must be at least N² genetic components to specify the connection weights. This large genome size leads to impractically long convergence times. This problem has been addressed by Gruau [3], who has developed a compact cellular growth (constructive) algorithm based on symbolic S-expressions that are evolved by Genetic Programming. Although this method can evolve very elaborate structures, we have observed that it takes very long to converge to an optimum, which makes it unsuitable for certain applications.

On the other hand, in standard EP there is no such dual representation where a genotype maps to a network phenotype. EP algorithms evolve by mutation only, operating directly on the network components. It has been suggested [1] that this approach is much better suited to network design than GAs. The basis for this claim is that the dual gene/phene relationship used in GAs abstracts the physical network topology and leads to deceptive evolution. We believe, however, that any problems are due to the representations and codings used, and not to the genetic techniques themselves, and that, with suitable coding schemes, it is possible to develop non-deceptive GA-based design algorithms.

We present here a recombination-based hybrid algorithm combining Genetic Programming and Simulated Annealing. The basic idea is to separate the evolution of the structures from the evolution of the weights: the weights are encoded as a vector of node gains and are not part of the genotype. The weight vector is initialised whenever a new genotype is created. The genotypes (the structures) are then evolved by GP and the weights by SA. This GP/SA hybrid algorithm is described in sections 2 and 3 below and represents a powerful automated design technique for a wide variety of neural network systems. Its main features are:

1. It easily caters for both recurrent (feedback) and non-recurrent network architectures.
2. It allows for the use of arbitrary neuronal transfer functions and is not restricted to the weighted-sum sigmoid of classical systems.
3. The training algorithm is automatically provided as part of the evolution.

A sketch of the two-stage loop follows.
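As an illustration of this scheme, the following Python sketch shows the overall control flow under stated assumptions: the tree type and the helpers `fitness`, `random_tree`, `crossover`, `mutate` and `train_weights` are hypothetical placeholders, not the authors' implementation.

```python
import random

def gp_sa_hybrid(pop_size, generations, fitness, random_tree,
                 crossover, mutate, train_weights):
    """Two-stage GP/SA hybrid: GP evolves the tree structures (genotypes),
    while train_weights (simulated annealing, see section 3) adapts each
    tree's node-gain vector, which is not part of the genotype."""
    def new_individual(tree):
        gains = [1.0] * tree.num_links     # fresh gain vector for a new genotype
        return tree, train_weights(tree, gains, fitness)

    population = [new_individual(random_tree()) for _ in range(pop_size)]
    for _ in range(generations):
        # GP stage: recombination-based evolution of the structures.
        ranked = sorted(population, key=lambda ind: fitness(*ind), reverse=True)
        parents = ranked[:pop_size // 2]
        children = []
        for _ in range(pop_size - len(parents)):
            (a, _), (b, _) = random.sample(parents, 2)
            # SA stage runs inside new_individual on each fresh genotype.
            children.append(new_individual(mutate(crossover(a, b))))
        population = parents + children
    return max(population, key=lambda ind: fitness(*ind))
```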

2. REPRESENTING A RECURRENT NEURAL NETWORK AS A SET OF EXPRESSION TREES

2.1 Functional Specification of Neural Networks

In this section we describe the basis of our approach, which is to specify networks and their dynamics as discrete-time functions. These discrete-time functions are then represented as a set of expression trees which are evolved by


Genetic Programming. A Simulated Annealing algorithm operates on the parameters of these expression trees to implement on-line training.

We consider discrete-time networks described by the following pair of equations:

    y_n = F(x_n, s_n)        (1)
    s_{n+1} = G(x_n, s_n)    (2)

where y_n ∈ ℝ^N and x_n ∈ ℝ^M, n = 0, 1, ..., are the N outputs from and M inputs to the network respectively. The vector s_n ∈ ℝ^L represents a set of L internal state variables, which are required to describe recurrent connections within the network. The function

    F: ℝ^M × ℝ^L → ℝ^N       (3)

describes the input/output mapping of the system, while

    G: ℝ^M × ℝ^L → ℝ^L       (4)

describes the internal network dynamics in terms of the state variables. These two functions can be written as a list of single-output/multi-input expressions

    F = {f_1, f_2, ..., f_N} and G = {g_1, g_2, ..., g_L},    (5)

where f_i and g_i are single-valued functions of the inputs and state variables. These sets of individual functions are the objects which we evolve using Genetic Programming. To do this we write each function as an expression tree, which in turn can be written as a variable-length string of symbols in Polish (prefix) notation. The following section discusses in detail the components of these expression trees.
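In code, equations (1) and (2) amount to iterating the two functions over the input sequence; the following minimal sketch (our names, not the paper's) makes the recurrence explicit.

```python
import numpy as np

def run_network(F, G, x_seq, s0):
    """Iterate y_n = F(x_n, s_n) and s_{n+1} = G(x_n, s_n) (eqs. 1-2)."""
    s = np.asarray(s0, dtype=float)   # the L internal state variables
    ys = []
    for x in x_seq:                   # x is the M-vector of inputs at time n
        ys.append(F(x, s))            # N-vector of outputs
        s = G(x, s)                   # advance the internal state
    return ys

# Example: a single-state recurrent system whose output is fed back
# through the state, y_n = tanh(x_n + 0.5 s_n), s_{n+1} = y_n.
F = lambda x, s: np.tanh(x[0] + 0.5 * s[0])
G = lambda x, s: np.array([F(x, s)])
print(run_network(F, G, [np.array([v]) for v in (1.0, 0.0, -1.0)], [0.0]))
```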

2.2 Components of the Expression Trees

In this section we introduce a class of nodes as the primitive elements of expression trees which are suited to designing neural networks. A node is a primitive function with one output and zero or more inputs. Every node also has a real-valued output gain, w, which is adjusted by the Simulated Annealing algorithm (see section 3). The basic nodes we will be using are:

Input/output nodes, xN and yN. The former represent the system input and the latter represent the system output, both delayed by N samples¹.

Non-linear transfer function, nlN, implements a sigmoidal function whose amount of non-linearity β, β ∈ [β_lo ... β_hi], is a linear function of the parameter N, as follows. The range [β_lo ... β_hi] is partitioned into maxInt equally spaced subintervals that provide maxInt+1 possible values for β; the parameter N simply addresses each of these values, e.g. for nl0, β = β_lo, and for nl_maxInt, β = β_hi.

Delay node, z, returns the value of its argument delayed by one time sample. This is implemented by using a memory stack with as many positions as z nodes appear in a particular tree (up to a certain maximum). When a particular z is found, the value at the associated position of the stack is returned as the output of the node and its input is stored in its place.

Function node, fN, executes the Nth subroutine tree. These, also called automatically defined functions (or ADFs [8]), are in every respect the same as any other function used by the main tree, except that they can have a variable number of arguments. Subroutine trees are intrinsic to a particular main tree and are created and evolve together with it, not being accessible by any other trees. Thus, an expression tree is properly defined as the set of a main tree and all its associated subroutine trees, if any. Function nodes are important in addressing the problem of scalability (i.e. the growth in the size of the expression trees as the size of the networks increases). By providing a compact way of expressing repetitive tasks, bigger networks can be expressed as small trees.

Average node, avgN, returns the average of its N inputs.

Constant node, cN, returns the Nth entry of a constant table, whose values can be randomly initialised or preselected by the user.

Together with these, other node types are used to implement mathematical operations. These are +, −, *, /, +1, −1 and /2. It is interesting to note that we are not restricting the networks to have sigmoidal cells with a weighted sum of inputs.

¹ The index N represents an integer in the range [0 ... maxInt].
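For concreteness, the mapping from the node index N to the non-linearity β can be sketched as follows; the numeric range, the resolution and the tanh form of the sigmoid are our assumptions, since the paper's exact transfer function is not reproduced here.

```python
import math

BETA_LO, BETA_HI, MAX_INT = 0.1, 10.0, 15   # assumed range and resolution

def beta(N):
    # N in [0 .. MAX_INT] addresses maxInt+1 equally spaced values,
    # so nl0 yields beta_lo and nl_maxInt yields beta_hi.
    return BETA_LO + N * (BETA_HI - BETA_LO) / MAX_INT

def nl(N, x):
    # Sigmoidal transfer function with non-linearity beta(N);
    # tanh is a placeholder for the paper's (unspecified) sigmoid.
    return math.tanh(beta(N) * x)
```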

2.3 Local Recursion nodes

In the previous section we have shown how to achieve recursion from the output of the tree. Here we present an important feature for representing DSP algorithms: a class of nodes that allow for local recursion within the network. These are called psh and stkN; the former pushes the value of its associated subtree onto a stack and the latter returns the value of the Nth position in the stack. In order to maintain data coherency, two stacks are used, whose size equals the maximum number of psh nodes allowed in a tree. In a particular evaluation of the tree (e.g. for the nth input) one of the stacks is used for writing whenever a psh node is encountered; the other stack is used for reading whenever an stk is encountered. In the next evaluation (for the (n+1)th input) the two stacks are swapped and what was written before is now read.

Internal recursion is important for developing modular solutions to certain problems. For example, the biquadratic digital filter section in canonical form is described by the coupled equations [10]:

    p_n = x_n + c_1 p_{n−1} + c_2 p_{n−2}
    y_n = p_n + c_3 p_{n−1} + c_4 p_{n−2}


A possible genotype coding for this using psh and stkN nodes is:

    (+ (psh (+ x0 (+ (* c1 stk0) (* c2 (z stk0))))) (+ (* c3 stk0) (* c4 (z stk0))))

In this expression tree, the sub-tree rooted at the psh node evaluates the term p_n and pushes this value onto the stack memory, ready for the next cycle. The stk0 node therefore returns the value p_{n−1}, which can be delayed by the z node to get p_{n−2}. It could be argued that an equivalent result can be achieved by using z nodes only. In digital signal processing practice, however, this would imply a loss of significant digits in the obtained parameters, which does not occur when internal recursion is used.

An alternative way of achieving internal recursion would be to keep track of the output of every node in the tree in each evaluation and to define a node type that would address this information in the next evaluation. We believe, though, that the amount of memory required would be too high, especially considering that not all the information stored would be required. A sketch of the two-stack mechanism is given below.
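The following Python sketch (our illustration, not the authors' code) shows the two-stack mechanism and uses it to implement the biquad section encoded by the genotype above.

```python
class LocalRecursion:
    """Two-stack scheme for psh/stkN: reads return values pushed
    during the previous evaluation of the tree."""
    def __init__(self, size):
        self.read_stack = [0.0] * size   # values pushed in the last cycle
        self.write_stack = []            # values pushed in this cycle

    def psh(self, value):
        self.write_stack.append(value)   # store for the next cycle
        return value                     # psh passes its argument through

    def stk(self, n):
        return self.read_stack[n]        # value from the previous cycle

    def next_cycle(self):
        # Swap roles: what was written is now read.
        pad = self.read_stack[len(self.write_stack):]
        self.read_stack = self.write_stack + pad
        self.write_stack = []

def biquad(xs, c1, c2, c3, c4):
    """Canonical biquad via one psh slot: stk0 gives p_{n-1}, and a
    one-sample delay (the z node) applied to stk0 gives p_{n-2}."""
    m, z_mem, ys = LocalRecursion(1), 0.0, []
    for x0 in xs:
        p_nm1 = m.stk(0)                          # stk0 -> p_{n-1}
        p_nm2, z_mem = z_mem, p_nm1               # (z stk0) -> p_{n-2}
        p = m.psh(x0 + c1 * p_nm1 + c2 * p_nm2)   # push p_n
        ys.append(p + c3 * p_nm1 + c4 * p_nm2)    # y_n
        m.next_cycle()
    return ys
```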

2.4 An example

In Figure 1 a three-cell RNN and a possible tree representation for it are shown. Here, each cell in the network is represented by its own function (f1, f2 and f3), which are invoked by the main tree. The main tree executes the tree associated with each function, but only returns the value of the output node, which is represented by function f1 (if c0 = 0). The psh nodes in each function tree store the computed cell outputs on the stack, and these are accessed from the previous cycle using the stkN nodes. Therefore, x0 represents the input at the current instant of time, stk0 represents f1 at the previous instant, and so on. The only difference between f1, f2 and f3 are the values of their respective node gains and the amount of non-linearity introduced by the nl node.

[Main tree]  (+ f1 (* c0 (+ f2 f3)))
[f1]  (psh (nl1 (avg4 x0 stk0 stk1 stk2)))
[f2]  (psh (nl2 (avg4 x0 stk0 stk1 stk2)))
[f3]  (psh (nl2 (avg4 x0 stk0 stk1 stk2)))

Figure 1 A Fully Recurrent Neural Network. Each processing cell, labelled P1-P3, implements a sigmoidal transfer function on the average of the cell's input values. The connecting links have independent strengths, labelled wij. The system output is taken from cell P3 and the input is applied to each cell simultaneously.


In this representation the weights wij have been omitted for simplicity. We will see how to include them in section 3.

2.5 Review of Genetic Programming

Genetic Programming (GP) is concerned with the simulated evolution of populations of executable computer programs. This is in contrast to the standard Genetic Algorithm (GA) [2], whose aim is to evolve data, usually in the form of fixed-length strings of bits or other symbols. The individuals in a GP population are the expression trees described above. The symbol string that encodes the tree is usually referred to as a genotype, and the associated function (what the string means) is called the phenotype. A fitness value is associated with each phenotype. This measures the performance of the function coded by the expression tree in solving the problem at hand. To evaluate the fitness of a network several error measures can be used: mean squared error (MSE), maximum absolute error, average absolute error, average exponential error, etc. The evolution of tree structures proceeds as follows. An initial population of trees is generated at random and this is then evolved by means of genetic operators: selection, crossover and mutation. A sketch of the crossover operator on prefix trees is given below.
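In the sketch below (ours, with trees as nested Python tuples in prefix form, mirroring the Polish-notation strings above), crossover swaps a randomly chosen subtree of one parent for a randomly chosen subtree of the other.

```python
import random

# A tree is a terminal string or a tuple (operator, child1, child2, ...),
# e.g. ('+', 'x0', ('*', 'c1', 'stk0')).

def subtrees(tree, path=()):
    """Yield (path, subtree) for every node in the tree."""
    yield path, tree
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from subtrees(child, path + (i,))

def replace_at(tree, path, new):
    """Return a copy of tree with the subtree at `path` replaced by `new`."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace_at(tree[i], path[1:], new),) + tree[i + 1:]

def crossover(a, b):
    """One offspring: a random subtree of `a` is replaced by a random
    subtree of `b`."""
    path_a, _ = random.choice(list(subtrees(a)))
    _, sub_b = random.choice(list(subtrees(b)))
    return replace_at(a, path_a, sub_b)
```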

3. SIMULATED ANNEALING: TRAINING BY EVOLUTION

3.1 Node gains

One of the main problems in GP is obtaining values for the parameters in a system. In general, this has been tackled by the introduction of special nodes; references [4] and [7] provide different approaches to this problem. Here we describe another way of addressing the parameter representation, by means of what we call node gains. A gain value is assigned to each link between a pair of nodes in the tree, so that the data are modified as they pass from node to node through the tree. These gain values are optimised using a simulated annealing algorithm. Let us consider the link between the output of a node labelled P and the input to a following node labelled Q. The link has a strength α_pq (a real number), and the relationship between the value at the output of node P, x, and the input to node Q, y, is y = α_pq · x.

Fitness maximisation with respect to the node gains is accomplished using an annealing algorithm [9] which is applied after a new tree has been produced by crossover or mutation during evolution.


Let g(i) be the vector of the values of all the node gains in a tree at iteration i during the annealing process, and let f(i) be the fitness of the tree using this set of node gains. The annealing algorithm is summarised in Table 1.

Table 1: THE ANNEALING ALGORITHM FOR ADAPTING NODE GAINS

while (not terminated) do
  1. Perturb g(i) to get g'(i).
  2. Evaluate the fitness f'(i) using these perturbed parameters.
  3. If f'(i) ≥ f(i), then accept the perturbation, g(i+1) = g'(i), and continue.
     Else accept the perturbation with probability e^{(f'(i) − f(i))/T} and continue.
  4. Reduce the temperature T according to an annealing schedule.
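A direct transcription of Table 1 in Python follows; the Gaussian perturbation and the geometric cooling schedule are our assumptions, since the paper does not specify them.

```python
import math, random

def anneal_gains(gains, fitness, T0=1.0, cooling=0.95, steps=200, sigma=0.1):
    """Maximise `fitness` over the node-gain vector, following Table 1."""
    g, f, T = list(gains), fitness(gains), T0
    for _ in range(steps):
        # 1. Perturb g(i) to get g'(i) (Gaussian moves are an assumption).
        g_new = [w + random.gauss(0.0, sigma) for w in g]
        # 2. Evaluate the fitness f'(i) with the perturbed parameters.
        f_new = fitness(g_new)
        # 3. Accept if no worse; else accept with probability e^{(f'-f)/T}.
        if f_new >= f or random.random() < math.exp((f_new - f) / T):
            g, f = g_new, f_new
        # 4. Reduce the temperature according to the annealing schedule.
        T *= cooling
    return g
```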

Two interesting features of the simulated annealing algorithm are, first, that no knowledge of the derivatives is required and, second, that the algorithm does not get stuck in local optima. Another characteristic is what we consider to be the main attraction of the Simulated Annealing algorithm: it is similar to the learning process in nature. When an individual is young, it can accept changes that decrease its fitness with relatively high probability; the older the individual gets, the smaller the probability. An additional attraction is the simplicity of implementation.

4. RESULTS

We applied the techniques presented in this paper to a commonly encountered signal processing task, namely the channel equalisation problem. This consists of restoring a sequence of symbols (a signal), s, which has been distorted by a non-linear communications channel and corrupted by additive gaussian noise, n, and thus converted into a set of observations, o.


Figure 2 The Channel Equalisation problem

The aim is to find a function F(·) such that ŝ = F(o), where ŝ is the estimate of the signal and F(·) is the so-called equalising filter. The classical approach to


this problem involves sending an initial known signal and finding the filter that minimises the mean square output error, E{|ŝ − s|²}. It has been shown [5] that recurrent neural networks are very proficient at tackling this problem; this mode of operation is known as trained adaptation.

An example is given in Figure 3. The symbol string (the values in brackets being the node gains) is represented as an expression tree or as a neural network, as shown. The procedure to obtain this tree was as follows. The GP was trained with 100 samples of the observations; the termination criterion for the run was obtaining a tree (the solution tree) for which the number of misclassified symbols was zero. This solution was then tested with a further 10000 samples of the observations; the number of misclassified symbols was 4, giving a bit error rate of 0.04%. Other solutions obtained gave a lower bit error rate, but their structures were not similar to that of a neural network. This suggests that a neural network may not be the best possible structure for this problem. A sketch of how such an experiment can be set up is given below.
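The experiment can be reproduced in outline as follows. The channel model here follows our reading of Figure 3's caption and should be treated as an assumption, as should the handling of the SNR; `equaliser` stands for whatever evolved filter is under test.

```python
import numpy as np

rng = np.random.default_rng(0)

def channel_data(n, snr_db=30.0):
    """Binary source through the (assumed) non-linear channel of Figure 3:
    c_k = 0.5 c_{k-1} + s_k + 0.6 s_{k-1},  x_k = c_k + 0.1 c_k^2 + n_k."""
    s = rng.choice([-1.0, 1.0], size=n)
    c = np.zeros(n)
    for k in range(n):
        c[k] = 0.5 * (c[k - 1] if k else 0.0) + s[k] \
               + 0.6 * (s[k - 1] if k else 0.0)
    noise_var = np.mean(c ** 2) / 10 ** (snr_db / 10.0)  # SNR w.r.t. c (assumed)
    x = c + 0.1 * c ** 2 + rng.normal(0.0, np.sqrt(noise_var), n)
    return x, s

def bit_error_rate(equaliser, n=10000):
    """Fraction of misclassified symbols on fresh test data;
    4 errors in 10000 samples corresponds to the reported 0.04%."""
    x, s = channel_data(n)
    s_hat = np.sign(equaliser(x))   # equaliser maps observations to estimates
    return float(np.mean(s_hat != s))
```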

5. CONCLUSIONS

We have shown how Recurrent Neural Networks can be generated and trained by means of a combination of Genetic Programming and Simulated Annealing techniques. Genetic Programming is a recombination-based evolutionary algorithm which has been used here to evolve the structure of the networks, represented by expression trees. For the tree representation we have introduced node types that are well suited to tackling discrete-time systems in general and digital signal processing problems in particular. Of special interest are the nodes that allow for time recursion, because by this means recurrent neural networks can be easily represented. An added feature of the RNNs thus obtained is that they are more general than classic NNs, in the sense that different kinds of operations are allowed in their cells.

Simulated Annealing is used to adapt the weights of each network, which are represented as a vector of node gains. SA has the advantages over gradient-based methods of not needing information about the derivatives and of not getting stuck in local optima. Other characteristics are the simplicity of implementation and the similarity to the learning process in nature. We are aware, however, that the implementation of the hybrid GP/SA is very computationally expensive and therefore not suited to real-time applications. Further research remains to be done in this domain.


[Figure 3 shows the evolved solution in three panels: the expression tree, the neural network (phenotype) and the symbol string (genotype), with the node gains given in brackets.]

Figure 3 An equalising filter for the channel x_k = c_k + 0.1·c_k² + n_k, where c_k = 0.5·c_{k−1} + s_k + 0.6·s_{k−1}, n_k is the additive gaussian noise and s_k is the original signal to be restored. The variance of the noise was chosen so that the resulting signal-to-noise ratio was 30 dB. This solution was obtained in one of the runs and is depicted here in three representations: tree, neural network and symbol string. All the links, both in the tree and the NN, are weighted (i.e. there is a gain value associated with them). In the NN, x is the input, y is the output and Δ represents a unit time delay; c6 is a constant with a predefined value equal to −0.1. No function nodes were used.


REFERENCES

[1] Angeline PJ, Saunders GM and Pollack JB, "An Evolutionary Algorithm that Constructs Recurrent Neural Networks", IEEE Trans. on Neural Networks, vol. 5, Jan 1994.
[2] Goldberg DE, Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley, 1989.
[3] Gruau F, "Genetic micro programming of Neural Networks", in Advances in Genetic Programming, The MIT Press, 1994.
[4] Howard LM and D'Angelo DJ, "The GA-P: a Genetic Algorithm and Genetic Programming hybrid", IEEE Expert, Aug 1995, pp. 11-15.
[5] Kechriotis G, Zervas E and Manolakos ES, "Using Recurrent Neural Networks for Adaptive Communication Channel Equalization", IEEE Trans. on Neural Networks, vol. 5, Jan 1994.
[6] Kitano H, "Designing Neural Networks Using Genetic Algorithms with Graph Generation System", Complex Systems, vol. 4, 1990.
[7] Koza JR, Genetic Programming: On the Programming of Computers by Means of Natural Selection, The MIT Press, 1992.
[8] Koza JR, Genetic Programming II: Automatic Discovery of Reusable Programs, The MIT Press, 1994.
[9] Press WH, Flannery BP, Teukolsky SA and Vetterling WT, Numerical Recipes in C: The Art of Scientific Computing, Cambridge University Press, 1988.
[10] Proakis JG and Manolakis DG, Digital Signal Processing: Principles, Algorithms and Applications, Macmillan, 1992.
[11] Sharman KC, Esparcia-Alcázar AI and Li Y, "Evolving Signal Processing Algorithms by Genetic Programming", Proceedings of IEE/IEEE Genetic Algorithms in Engineering Systems: Innovations and Applications (GALESIA), 1995.
[12] Sharman KC and Esparcia-Alcázar AI, "Genetic Evolution of Symbolic Signal Models", Proceedings of the 2nd Workshop on Natural Algorithms in Signal Processing, 1993.
