Invasive Connectionist Evolution

Paulito P. Palmes and Shiro Usui

RIKEN Brain Science Institute, 2-1 Hirosawa, Wako, Saitama 351-0198 Japan
[email protected],
[email protected]
Abstract. The typical automatic way to search for an optimal neural network is to combine structure evolution by evolutionary computation with weight adaptation by backpropagation. In this model, since structure and weight optimization are carried out by two different algorithms, each using its own search space, every change in network topology during structure evolution requires relearning of the entire set of weights by backpropagation. Because of this inefficiency, we propose that the evolution of network structure and weights should be purely stochastic and tightly integrated, so that good weights and structures are not relearned but propagated from generation to generation. Since this model does not depend on gradient information, the entire process allows more flexibility in the implementation of its evolution and in the formulation of its fitness function. This study demonstrates how invasive connectionist evolution can easily be implemented using particle swarm optimization (PSO), evolutionary programming (EP), and differential evolution (DE), with good performance on cancer and glass classification tasks.
1 Introduction
Artificial Neural Networks (ANNs) have been a popular tool in many fields of study due to their general applicability to problem domains that require intelligent processing, such as classification, recognition, clustering, prediction, and generalization. The most popular algorithm for ANN learning is backpropagation (BP), which minimizes the error surface by gradient descent. Since BP is a local search algorithm, it converges quickly but can easily be trapped in local optima. Moreover, choosing the optimal architecture for a particular problem remains an active area of research because of BP's tendency to overfit or underfit the training data due to its sensitivity to the choice of architecture.

A typical approach to help BP find an appropriate architecture is to evolve its structure. Many studies have been conducted on how to carry out structure evolution by evolutionary computation. A comprehensive review of papers related to evolutionary neural networks can be found in [1]; recent insights and techniques for effective evolution strategies are found in [2,3]. The most typical approach is non-invasive [4]. This type of evolution uses a dual representation: one for stochastic or rule-based structure evolution and the other for gradient-based weight adaptation.

L. Wang, K. Chen, and Y.S. Ong (Eds.): ICNC 2005, LNCS 3612, pp. 1119–1127, 2005. © Springer-Verlag Berlin Heidelberg 2005

While non-invasive evolution makes the hybridization process straightforward, there is no tight integration between structure evolution and weight adaptation. Hence, every time the network structure evolves, the entire set of weights must be relearned by BP. In a typical evolutionary model, optimal parameter values are not relearned but propagated to succeeding generations; this is not possible, however, with gradient-based weight adaptation.

The alternative approach we propose belongs to the class of "invasive evolutionary models" [4], which rely on purely stochastic evolution of the network structure and weights. Invasive evolution uses a network representation in which weights and structure are tightly integrated, such that changes to the former bring corresponding changes to the latter, and vice versa. It avoids relearning good weights and structures by propagating them to succeeding generations. Since invasive connectionist evolution uses a direct representation and does not rely on fixed rules or heuristics, it can easily utilize the evolution process of other evolutionary models such as particle swarm optimization (PSO) [5], differential evolution (DE) [6], and evolutionary programming (EP) [7].

Dynamic adaptation is important, since fixed rules or parameter values optimized for one problem domain become useless for another [8]. What is needed is to let the processes of mutation, crossover, adaptation, and selection filter the most appropriate set of rules, traits, and parameters for the problem under consideration. It is important, therefore, to avoid developing evolutionary systems that rely on fixed rules or heuristics. We believe that a purely stochastic implementation with proper adaptation strategies is essential for robust connectionist evolution.
2 Invasive Connectionist Model
ANN learning can be considered a form of optimization whose main objective is to find the network structure and weights with optimal generalization performance. Performance is measured using a quality function $Q_{fit}$, which measures the distance of the ANN's output $F(X, S, W)$ from the target output $T(X)$:

$$Q_{fit} = \| T(X_i), F(X_i, S_i, W_i) \|_{\theta} \qquad (1)$$

where $X$, $S$, and $W$ are the network's input, structure, and weights, respectively, and $\|x\|_{\theta}$ is a similarity metric or error function. The main objective is to evolve the appropriate structure and weights so that the output of the function $F$ is as close as possible to the target $T$. The function $F$ uses the typical feedforward computation of an $n$-layered network:

$$o_i = f\Big(\sum_{j} w_{ij}\, x_j\Big) \qquad (2)$$

$$f(x) = \frac{1}{1 + e^{-x}} \qquad (3)$$
[Fig. 1. Subnetwork Nodule — a nodule with input, hidden, and output units defined by two weight matrices; each unit computes $o_i = f\big(\sum_{j \in col} W_{ij}\big)$ with $f(x) = 1/(1 + e^{-x})$.]
Figure 1 shows the invasive connectionist's building-block component, which is composed of two weight matrices. The first weight matrix contains the topology, connection strengths, and threshold values between the input and the hidden layer. Similarly, the second weight matrix describes the topology, threshold values, and connection strengths between the hidden layer and the output layer. While each nodule can be considered a complete network capable of performing neural computation or learning, more complex structures can easily be created by combining several of these nodules to address more challenging problems in machine learning. Figure 2 shows an example of a complex network formed by combining a population of nodules. Evolutionary operators such as mutation and crossover can operate independently on the weight matrices of each nodule to improve the fitness of the entire network. In the next section, we discuss several ways to induce invasive evolution on a swarm of networks using PSO, DE, and EP.
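As an illustration, the nodule of Fig. 1 can be sketched as follows. This is a minimal sketch only; the class name, layer sizes, and bias handling are our own assumptions, not from the paper.

```python
import numpy as np

def sigmoid(x):
    # Logistic activation of Eq. (3): f(x) = 1 / (1 + e^{-x})
    return 1.0 / (1.0 + np.exp(-x))

class Nodule:
    """Subnetwork nodule: two weight matrices plus threshold (bias) vectors."""

    def __init__(self, n_in, n_hidden, n_out, rng=None):
        rng = rng or np.random.default_rng()
        # Random weights in [-1, 1], matching the PSO variant's initialization
        self.W1 = rng.uniform(-1, 1, (n_hidden, n_in))
        self.b1 = rng.uniform(-1, 1, n_hidden)
        self.W2 = rng.uniform(-1, 1, (n_out, n_hidden))
        self.b2 = rng.uniform(-1, 1, n_out)

    def forward(self, x):
        # Eq. (2) applied layer by layer: o_i = f(sum_j w_ij x_j)
        h = sigmoid(self.W1 @ x + self.b1)
        return sigmoid(self.W2 @ h + self.b2)

nodule = Nodule(n_in=9, n_hidden=4, n_out=2, rng=np.random.default_rng(0))
out = nodule.forward(np.ones(9))  # sigmoid outputs lie strictly in (0, 1)
```

Because evolution operates directly on `W1`, `W2`, and the bias vectors, pruning a connection and changing a weight are the same kind of edit to the same representation.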
3 Invasive Connectionist Algorithm
Figure 3 shows a neural-network swarm model. Each independent nodule optimizes its structure and weights through its interactions with other nodules in its neighborhood. In this example, the degree of overlapping is set to two; hence, every network pair has two neighboring pairs. This model can be reduced to the commonly used single-population model by considering just one neighborhood. From the implementation point of view, the multi-neighborhood representation is a generalization of the single-neighborhood representation. This allows us to develop both single- and multiple-neighborhood
[Fig. 2. Invasive Connectionist Model — a complex network formed from a population of nodules, each defined by its weight matrices and threshold vectors.]

[Fig. 3. Connectionist Swarm — network swarms organized into overlapping neighborhoods.]
approaches without changes to the representation of the base component network. The invasive connectionist evolution algorithm is summarized in Fig. 4. For the PSO implementation, the update of a component's position relative to its best neighbor and personal best has the following formulation [9,5]:
[Fig. 4. Invasive Connectionist Algorithm — flow chart: Start → Initialize Structures and Weights, Initialize Neighborhood Assignment → Compute Fitness → Update Position of BestNeighbor and PersonalBest → Fly components / Perform Differential Evolution / Perform Evolutionary Programming (the update rules shown in the chart are Eqs. (4), (7), and (8)-(10) in the text) → Stop? If NO, return to Compute Fitness; if YES, Test component with best validation → End.]
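The flow of Fig. 4 can be sketched as a generic loop. All names here are illustrative placeholders, not the paper's implementation; `update` stands for whichever stochastic operator is plugged in (PSO fly, DE recombination, or EP mutation), and lower fitness is taken to be better.

```python
def evolve(population, fitness, update, stopped, max_gen=500):
    """Skeleton of the loop in Fig. 4: evaluate fitness, track bests,
    apply a stochastic operator, and stop early when the criterion fires."""
    personal_best = list(population)
    best_fit = [fitness(p) for p in population]
    for gen in range(max_gen):
        fits = [fitness(p) for p in population]
        for i, f in enumerate(fits):          # update personal bests
            if f < best_fit[i]:
                best_fit[i], personal_best[i] = f, population[i]
        g_best = min(range(len(population)), key=lambda i: fits[i])
        # "Fly" every component using its personal and neighborhood bests
        population = [update(p, personal_best[i], population[g_best])
                      for i, p in enumerate(population)]
        if stopped(gen, best_fit):            # e.g. overfitting detected
            break
    return personal_best[min(range(len(best_fit)), key=lambda i: best_fit[i])]

# Toy usage: scalar "networks" pulled toward the fittest component
best = evolve([0.0, 1.0, 5.0],
              fitness=lambda w: abs(w - 3.0),
              update=lambda p, pb, gb: p + 0.5 * (gb - p),
              stopped=lambda gen, bf: min(bf) < 1e-9)  # best converges to 3.0
```

The neighborhood structure of Fig. 3 would replace the single `g_best` with one best per neighborhood; the loop itself is unchanged.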
$$w_i = w_i + \Delta w_i^{s} \cdot sf \cdot U(0,1) + \Delta w_i^{p} \cdot if \cdot U(0,1) \qquad (4)$$

such that:

$$\Delta w_i^{s} = (w_i - w_i^{s}) \qquad (5)$$

$$\Delta w_i^{p} = (w_i - w_i^{p}) \qquad (6)$$

where sf = 1.0 and if = 1.0 refer to the component's sociability and individuality factors, respectively. Components with higher sf than if have a greater tendency to converge towards the best component in their neighborhood; components with higher if than sf tend to converge towards their personal best. It is through these interactions, governed by sociability and individuality, that the population performs both local and global search of the weight and structure spaces. All weights are randomly initialized in the range −1 to 1. The invasive connectionist model also supports the incorporation of other evolutionary approaches such as differential evolution (DE) [6] and evolutionary programming (EP) [7]. The DE and EP in the current implementation operate
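A minimal sketch of the position update of Eq. (4) follows. The array names are our own; the differences are written in the conventional PSO attraction form (best minus current), and the paper's `if` is renamed `indf` since `if` is a Python keyword.

```python
import numpy as np

def fly(w, w_best_neighbor, w_personal_best, sf=1.0, indf=1.0, rng=None):
    """PSO-style position update of Eq. (4) applied to a weight matrix.

    sf is the sociability factor and indf the individuality factor;
    the paper sets both to 1.0."""
    rng = rng or np.random.default_rng()
    dw_s = w_best_neighbor - w   # social term toward the best neighbor, Eq. (5)
    dw_p = w_personal_best - w   # individual term toward the personal best, Eq. (6)
    return w + dw_s * sf * rng.uniform(0, 1) + dw_p * indf * rng.uniform(0, 1)

w = np.zeros((2, 3))  # zeros keep the example simple
w = fly(w, np.ones((2, 3)), 0.5 * np.ones((2, 3)), rng=np.random.default_rng(0))
```

Because the same update applies element-wise to a whole weight matrix, topology (zero entries) and weights move together, which is exactly the tight integration invasive evolution requires.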
on the entire population, although it can also be applied to each neighborhood. The feasibility of the latter scheme will be studied in the future.

The weight update of DE roughly resembles that of PSO. It randomly selects three neighbors (p1, p2, p3) from the entire population as bases for changing the weights and structure of a component. Equation (7) is a modification of the DE implementation. There are two main operations, namely exploitation and discovery. The exploitation part uses information from the three randomly selected parents to form a new set of weights, while the discovery part introduces new weights by Gaussian perturbation:

$$w_i = \begin{cases} w_i^{p1} + \alpha \, (w_i^{p2} - w_i^{p3}) & \text{if } U(0,1) < cr \\ w_i + \rho(0, \sigma) & \text{otherwise} \end{cases} \qquad (7)$$

where cr = 0.99 is the probability of exploitation and 1 − cr is the probability of discovery; U is the uniform distribution; ρ is the Gaussian distribution with mean 0 and standard deviation σ; and α is a scaling factor. Network initialization starts from zero weights, and the only way for components to acquire new weights is through the discovery operation in (7). The purpose of setting the probability cr close to 1 is to give the population more time to exploit the existing weight space before dealing with the new weights slowly added by the discovery operation. Selection follows the standard DE policy, where only new components with better fitness replace their corresponding parents.

The EP implementation [4], on the other hand, uses uniform crossover, rank-based selection, and adaptation of the step-size parameter (ssp) during mutation:

$$ssp_i = U(0,1)\left(\beta + \frac{Q_{fit_i}}{Q_{tot}}\right) \qquad (8)$$

$$m_i = m_i + \alpha\, \rho(0,1)\, ssp_i \qquad (9)$$

$$\omega_i = \omega_i + \rho(0, m_i) \quad \text{if } U(0,1) < mp \qquad (10)$$

where α = 0.25 and β = 0.5 are arbitrary constants that minimize the occurrences of too-large and too-weak mutations, respectively; Q_fit and Q_tot refer to the component's fitness and the total fitness, respectively; U is the uniform random function, which minimizes large ssp occurrences; mp = 0.01 is the mutation probability; ρ is the Gaussian; and ω refers to weights and threshold values. The parameter m accumulates the net change in mutation strength over time. It is expected that the networks surviving into later generations have the appropriate m that enabled them to adapt their structure and weights better than the other networks. The EP implementation uses an elitist replacement policy to avoid losing the best traits found so far. All networks start with no connections, which ensures that new connections and weights are introduced gradually by stochastic mutation.

All algorithms use the stopping criterion described in our previous papers [10,11,4]. It monitors for overfitting using validation performance and stops training as soon as overfitting becomes apparent.
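The DE and EP weight updates of Eqs. (7)-(10) can be sketched as follows. This is a hedged sketch: the `alpha` and `sigma` defaults of `de_update` are illustrative (the paper leaves them open), and the Gaussian of Eq. (10) uses |m| as its scale since m may drift negative.

```python
import numpy as np

def de_update(w, w_p1, w_p2, w_p3, alpha=0.5, cr=0.99, sigma=1.0, rng=None):
    """Element-wise DE update of Eq. (7): exploit three randomly chosen
    parents with probability cr, otherwise discover new weights by
    Gaussian perturbation."""
    rng = rng or np.random.default_rng()
    exploit = rng.uniform(0, 1, w.shape) < cr
    return np.where(exploit,
                    w_p1 + alpha * (w_p2 - w_p3),       # exploitation
                    w + rng.normal(0, sigma, w.shape))  # discovery

def ep_mutate(w, m, q_fit, q_tot, alpha=0.25, beta=0.5, mp=0.01, rng=None):
    """EP mutation of Eqs. (8)-(10): draw a step-size parameter, update
    the mutation strength m, and perturb each weight with probability mp."""
    rng = rng or np.random.default_rng()
    ssp = rng.uniform(0, 1) * (beta + q_fit / q_tot)  # Eq. (8)
    m = m + alpha * rng.normal() * ssp                # Eq. (9)
    mutate = rng.uniform(0, 1, w.shape) < mp          # Eq. (10), per weight
    return np.where(mutate, w + rng.normal(0, abs(m), w.shape), w), m
```

With zero-initialized weights, setting `cr` near 1 (or `mp` small) means most entries are recombined or left untouched each generation, so new structure enters only through the occasional Gaussian draw, as the text describes.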
4 Simulations and Results
The quality or fitness function used in this study considers two major criteria, namely classification error and normalized mean-squared error:

$$Q_{fit} = \alpha \cdot Q_{acc} + \beta \cdot Q_{nmse} \qquad (11)$$

$$Q_{acc} = 100 \cdot \left(1 - \frac{correct}{total}\right) \qquad (12)$$

$$Q_{nmse} = \frac{100}{N P} \sum_{j=1}^{P} \sum_{i=1}^{N} (T_{ij} - O_{ij})^2 \qquad (13)$$

where N and P refer to the number of samples and outputs, respectively; Q_acc is the percentage classification error; Q_nmse is the percentage normalized mean-squared error (NMSE); and α = 0.7 and β = 0.3 are user-defined constants that control the strength of influence of their respective factors.

Simulation results include comparisons of the performance of four types of connectionist evolution, namely connectionist EP (cEP), connectionist DE (cDE), connectionist PSO (cPSO), and connectionist PSO-DE (cPSO-DE). These four variants were tested on the cancer and glass classification tasks from the UCI repository [12]. The datasets for each task were taken from the experiments of Prechelt and divided into 50% training, 25% validation, and 25% testing [13]. Results from Prechelt's manually optimized pivot BP architecture (pBP) were also included for benchmarking. Table 1 summarizes the main features of the different variants. Analysis of variance (ANOVA) and Tukey's HSD test at the α = 0.05 level of significance were used for significance and multiple-comparison testing.

Figure 5 shows the means and standard deviations of the different variants on the cancer and glass problems, respectively. A line connecting two or more means indicates no significant difference within that group. The ANOVA for the cancer problem indicates no significant difference in mean classification error among the five approaches. The ANOVA for the glass problem, however, indicates significant differences in performance. A closer analysis using Tukey's HSD indicates that cEP has the best

Table 1. Main Features of Invasive Connectionist Variants
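The fitness criterion of Eqs. (11)-(13) can be sketched as follows; the argmax decoding of the class label when counting `correct` is our own assumption.

```python
import numpy as np

def fitness(targets, outputs, alpha=0.7, beta=0.3):
    """Q_fit of Eqs. (11)-(13): weighted sum of percentage classification
    error and percentage normalized MSE. `targets` and `outputs` are
    (N samples x P outputs) arrays."""
    n, p = targets.shape
    correct = np.sum(np.argmax(outputs, axis=1) == np.argmax(targets, axis=1))
    q_acc = 100.0 * (1.0 - correct / n)                            # Eq. (12)
    q_nmse = (100.0 / (n * p)) * np.sum((targets - outputs) ** 2)  # Eq. (13)
    return alpha * q_acc + beta * q_nmse                           # Eq. (11)
```

Weighting classification error at α = 0.7 makes the discrete error dominate, while the NMSE term (β = 0.3) still rewards confident, well-calibrated outputs among networks with equal accuracy.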
[Fig. 5. Mean Classification Error Performance — mean classification error (%) of pBP, cEP, cPSO, cDE, and cPSO-DE on a) Cancer (means roughly between 1.5% and 2.0%) and b) Glass (means roughly between 33% and 44%).]
performance. Its performance, however, is not significantly different from that of cPSO, cDE, and pBP.
5 Conclusion
All variants performed as well as the manually optimized BP architecture despite using a relatively large hidden layer (see Table 1). These preliminary results demonstrate the feasibility of invasive connectionist evolution. Furthermore, this study showed several advantages of invasive evolution, such as a high degree of flexibility in formulation and ease of implementation, which makes incorporating and combining other stochastic evolutionary techniques trivial.

The swarm model of neural networks is one way of combining several nodules to achieve complexity based on their collective behavior. This nodule organization has great potential for expert ensembling. The idea is to have several swarms specializing on different parts of the solution space; finding the best solution then requires identifying which swarm to use for evaluation. The degree of overlapping can be minimized to increase the specialization or search localization of each swarm. This may provide better identification or discrimination in noisy classification or clustering tasks. This concept will be investigated further in the near future.
References

1. Yao, X.: Evolving artificial neural networks. Proceedings of the IEEE 87 (1999) 1423–1447
2. Mitchell, M.: An Introduction to Genetic Algorithms. MIT Press, Cambridge, MA (1998)
3. Tan, K.C., Lim, M., Yao, X., Wang, L., eds.: Recent Advances in Simulated Evolution and Learning. World Scientific, Singapore (2004)
4. Palmes, P., Hayasaka, T., Usui, S.: Mutation-based genetic neural network. IEEE Transactions on Neural Networks 16 (2005)
5. Eberhart, R.C., Kennedy, J.: A new optimizer using particle swarm theory. In: Proceedings of the Sixth International Symposium on Micro Machine and Human Science, Nagoya, Japan (1995) 39–43
6. Storn, R., Price, K.: Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization 11 (1997) 341–359
7. Fogel, D.: Evolutionary Computation: Toward a New Philosophy of Machine Intelligence. IEEE Press, Piscataway, NJ (1995)
8. Wolpert, D.H., Macready, W.G.: No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation 1 (1997) 67–82
9. Kennedy, J., Eberhart, R.C.: Particle swarm optimization. In: Proceedings of the IEEE International Conference on Neural Networks, Piscataway, NJ (1995) 1942–1948
10. Palmes, P., Hayasaka, T., Usui, S.: Evolution and adaptation of neural networks. In: Proceedings of the International Joint Conference on Neural Networks (IJCNN). Volume II, Portland, Oregon, USA, IEEE Computer Society Press (2003) 397–404
11. Palmes, P., Hayasaka, T., Usui, S.: SEPA: Structure evolution and parameter adaptation. In Cantú-Paz, E., ed.: Proceedings of the Genetic and Evolutionary Computation Conference. Volume 2, Chicago, Illinois, USA, Morgan Kaufmann (2003) 223
12. Murphy, P.M., Aha, D.W.: UCI Repository of Machine Learning Databases. University of California, Department of Information and Computer Science, Irvine, CA (1994)
13. Prechelt, L.: Proben1 – A set of neural network benchmark problems and benchmarking rules. Technical report, Fakultät für Informatik, Universität Karlsruhe, Karlsruhe, Germany (1994)