Invasive Connectionist Evolution

Paulito P. Palmes and Shiro Usui

RIKEN Brain Science Institute, 2-1 Hirosawa, Wako, Saitama 351-0198 Japan
[email protected], [email protected]

Abstract. The typical automatic way to search for an optimal neural network is to combine structure evolution by evolutionary computation with weight adaptation by backpropagation. In this model, structure and weight optimization are carried out by two different algorithms, each using its own search space, so every change in network topology during structure evolution requires relearning all the weights by backpropagation. Because of this inefficiency, we propose that the evolution of network structure and weights be purely stochastic and tightly integrated, such that good weights and structures are not relearned but propagated from generation to generation. Since this model does not depend on gradient information, it allows more flexibility in the implementation of its evolution and in the formulation of its fitness function. This study demonstrates how invasive connectionist evolution can easily be implemented using particle swarm optimization (PSO), evolutionary programming (EP), and differential evolution (DE), with good performance in cancer and glass classification tasks.

1 Introduction

Artificial neural networks (ANNs) have been a popular tool in many fields of study due to their general applicability to problem domains that require intelligent processing, such as classification, recognition, clustering, prediction, and generalization. The most popular algorithm in ANN learning is backpropagation (BP), which minimizes the error surface by gradient descent. Since BP is a local search algorithm, it converges fast but can easily be trapped in local optima. Moreover, choosing the optimal architecture for a particular problem remains an active area of research because BP tends to overfit or underfit the training data, owing to its sensitivity to the choice of architecture.

A typical approach to help BP find the appropriate architecture is to evolve its structure. Many studies have examined how to carry out structure evolution by evolutionary computation. A comprehensive review of papers related to evolutionary neural networks can be found in [1]. Recent insights and techniques for effective evolution strategies are found in [2,3]. The most typical approach is non-invasive [4]. This type of evolution uses a dual representation: one for stochastic or rule-based structure evolution and the other for gradient-based weight adaptation. While non-invasive evolution makes the hybridization process straightforward, there is no tight integration between its structure evolution and weight adaptation. Hence, every time the network structure evolves, all the weights must be relearned by BP. In a typical evolutionary model, optimal parameter values are not relearned but propagated to the succeeding generations. This is not possible, however, with gradient-based weight adaptation.

The alternative approach we propose belongs to the class of “invasive evolutionary models” [4], which rely on purely stochastic evolution of the network structure and weights. Invasive evolution uses a network representation where weights and structures are tightly integrated, such that changes to the former bring corresponding changes to the latter, and vice versa. It avoids relearning good weights and structures by propagating them to the succeeding generations. Since invasive connectionist evolution uses a direct representation and does not rely on fixed rules or heuristics, it can easily utilize the evolution process of other evolutionary models such as particle swarm optimization (PSO) [5], differential evolution (DE) [6], and evolutionary programming (EP) [7].

Dynamic adaptation is important since fixed rules or parameter values optimized for one problem domain become useless for another [8]. What is needed is to let the processes of mutation, crossover, adaptation, and selection filter the most appropriate set of rules, traits, and parameters to solve the problem under consideration. It is important, therefore, to avoid developing evolutionary systems that rely on fixed rules or heuristics. We believe that a purely stochastic implementation with proper adaptation strategies is important for robust connectionist evolution.

L. Wang, K. Chen, and Y.S. Ong (Eds.): ICNC 2005, LNCS 3612, pp. 1119–1127, 2005. © Springer-Verlag Berlin Heidelberg 2005

2 Invasive Connectionist Model

ANN learning can be considered a form of optimization whose main objective is to find the network structure and weights that have optimal generalization performance. This performance is measured using a quality function $Q_{fit}$, which measures the distance of the ANN's output $F(X, S, W)$ from the target output $T(X)$:

$$Q_{fit} = \| T(X_i),\, F(X_i, S_i, W_i) \|_{\theta} \qquad (1)$$

where $X$, $S$, and $W$ are the network's input, structure, and weights, respectively, and $\|x\|_{\theta}$ is a similarity metric or error function. The main objective is to evolve the appropriate structure and weights so that the output of the function $F$ is as close as possible to the output of the target $T$. The function $F$ uses the typical feedforward computation commonly used in n-layered networks:

$$o_i = F\Big( \sum_{j} w_{ij} \Big) \qquad (2)$$

$$F(x) = \frac{1}{1 + e^{-x}} \qquad (3)$$

[Figure 1: a nodule with node labels x, y, l, m, n, a, b and its two weight matrices; each unit computes $o_i = f\big(\sum_{j \in col} W_{ij}\big)$ with $f(x) = \frac{1}{1 + e^{-x}}$]

Fig. 1. Subnetwork Nodule

Figure 1 shows the invasive connectionist's building-block component, which is composed of two weight matrices. The first weight matrix contains the topology, connection strengths, and threshold values between the input and the hidden layer. Similarly, the second weight matrix describes the topology, threshold values, and connection strengths between the hidden layer and the output layer. While each nodule can be considered a complete network capable of performing neural computation or learning, more complex structures can easily be created by combining several of these nodules to address more challenging problems in machine learning. Figure 2 shows an example of a complex network formed by combining a population of nodules. Evolutionary operators such as mutation and crossover can operate independently on the weight matrices of each nodule to improve the fitness of the entire network. In the next section, we discuss several ways to induce invasive evolution on a swarm of networks using PSO, DE, and EP.
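As a rough illustration of the nodule described above, the following sketch stores a nodule as two weight matrices with threshold vectors and applies equations (2) and (3) layer by layer. The names `Nodule` and `layer`, the treatment of a zero weight as "no connection", the addition of the threshold inside the activation, and the weighting of each incoming signal by its connection strength are assumptions made for this example; the paper does not spell out these details.

```python
import math

def sigmoid(x):
    """Logistic activation of equation (3)."""
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, thresholds):
    """Apply equation (2) to one layer.

    weights[j][i] is the strength of the connection from input j to unit i.
    A value of 0.0 is treated as "no connection" (assumption of this sketch).
    """
    return [
        sigmoid(sum(inputs[j] * weights[j][i] for j in range(len(inputs)))
                + thresholds[i])
        for i in range(len(thresholds))
    ]

class Nodule:
    """Hypothetical nodule: one weight matrix and threshold vector per layer."""

    def __init__(self, n_in, n_hidden, n_out):
        self.w1 = [[0.0] * n_hidden for _ in range(n_in)]   # input -> hidden
        self.t1 = [0.0] * n_hidden
        self.w2 = [[0.0] * n_out for _ in range(n_hidden)]  # hidden -> output
        self.t2 = [0.0] * n_out

    def forward(self, x):
        hidden = layer(x, self.w1, self.t1)
        return layer(hidden, self.w2, self.t2)

nodule = Nodule(n_in=2, n_hidden=3, n_out=2)
empty_output = nodule.forward([0.5, -1.0])  # no connections yet
nodule.w1[0][1] = 0.8    # connect input 0 to hidden unit 1
nodule.w2[1][0] = -1.2   # connect hidden unit 1 to output 0
output = nodule.forward([0.5, -1.0])
```

Because topology and weights live in the same matrices, setting or clearing an entry changes the structure and the weight in one step, which is the kind of tight integration the invasive model relies on.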

3 Invasive Connectionist Algorithm

[Figure 2: nodules, each with weight matrices and threshold vectors, combined into a network]

Fig. 2. Invasive Connectionist Model

[Figure 3: network swarms]

Fig. 3. Connectionist Swarm

Figure 3 shows a neural network swarm model. Each independent nodule optimizes its structure and weights through its interactions with other nodules in the neighborhood. In this example, the degree of overlapping is set to two. Hence, every network pair has two neighboring pairs. This model can be reduced to the commonly used single-population model by considering just one neighborhood. From the implementation point of view, the multi-neighborhood representation is a generalization of the single-neighborhood representation. This allows us to develop both single- and multiple-neighborhood approaches without changes to the representation of the base component network.

The invasive connectionist evolution algorithm is summarized in Fig. 4. For the PSO implementation, the update of a component's position relative to its best neighbor and personal best has the following formulation [9,5]:

[Figure 4: flowchart. Start; initialize structures and weights, initialize neighborhood assignment; compute fitness, update position of BestNeighbor and PersonalBest; fly components (PSO update); perform differential evolution; perform evolutionary programming; Stop? (NO: repeat from the fitness computation; YES: test the component with best validation); End]

Fig. 4. Invasive Connectionist Algorithm

$$w_i = w_i + \Delta w_{is} \cdot sf \cdot U(0,1) + \Delta w_{ip} \cdot if \cdot U(0,1) \qquad (4)$$

such that:

$$\Delta w_{is} = (w_i - w_{is}) \qquad (5)$$

$$\Delta w_{ip} = (w_i - w_{ip}) \qquad (6)$$

where $sf = 1.0$ and $if = 1.0$ refer to the component's sociability and individuality factors, respectively. More sociable components have a higher $sf$ than $if$ and a greater tendency to converge towards the best component in their neighborhood. Components with a higher $if$ than $sf$, on the other hand, have a greater tendency to converge towards their personal best. The interactions of the components, based on their sociability and individuality, allow the entire population to perform both local and global searching of the weight and structure spaces. All weights are randomly initialized in the range from -1 to 1.

The invasive connectionist model also supports the incorporation of other evolutionary approaches such as differential evolution (DE) [6] and evolutionary programming (EP) [7]. The DE and EP in the current implementation operate on the entire population, although they can also be applied to each neighborhood. The feasibility of the latter scheme will be studied in the future.

The weight update of DE roughly resembles that of PSO. It randomly selects three neighbors (p1, p2, p3) from the entire population as bases for changing the weights and structure of a component. Equation (7) is a modification of the DE implementation. There are two main operations: exploitation and discovery. The exploitation part uses information from three randomly selected parents to form a new set of weights, while the discovery part introduces new weights by Gaussian perturbation:

$$w_i = \begin{cases} w_{ip1} + \alpha \cdot (w_{ip2} - w_{ip3}) & \text{if } U(0,1) < cr \\ w_i + \rho(0, \sigma) & \text{otherwise} \end{cases} \qquad (7)$$

where $cr = 0.99$ is the probability of exploitation and $1 - cr$ is the probability of discovery; $U$ is a uniform distribution; $\rho$ is the Gaussian distribution with mean 0 and standard deviation $\sigma$; and $\alpha$ is a scaling factor. Network initialization starts from zero weights, and the only way for the components to acquire new weights is through the discovery operation in (7). The probability $cr$ is set close to 1 to give the population more time to exploit the existing weight space before dealing with the new weights slowly added by the discovery operation. Selection follows the standard DE policy, where only new components with better fitness replace their corresponding parents.

The EP implementation [4], on the other hand, uses uniform crossover, rank-based selection, and adaptation of the step size parameter (ssp) during mutation:

$$ssp_i = U(0,1)\Big(\beta + \frac{Q_{fit_i}}{Q_{tot}}\Big) \qquad (8)$$

$$m_i = m_i + \alpha\, \rho(0,1)\, ssp_i \qquad (9)$$

$$\omega_i = \omega_i + \rho(0, m_i) \quad \text{if } U(0,1) < mp \qquad (10)$$

where $\alpha = 0.25$ and $\beta = 0.5$ are arbitrary constants that minimize the occurrence of mutations that are too large and too weak, respectively; $Q_{fit}$ and $Q_{tot}$ refer to the component's fitness and the total fitness, respectively; $U$ is the uniform random function, which minimizes large ssp occurrences; $mp = 0.01$ is the mutation probability; $\rho$ is the Gaussian; and $\omega$ refers to weights and threshold values. The parameter $m$ accumulates the net change in mutation strength over time. Networks that survive into the later generations are expected to have an appropriate $m$ that enabled them to adapt their structure and weights better than the other networks. The EP implementation uses an elitist replacement policy to avoid losing the best traits found so far. All networks start with no connections. This ensures that new connections and weights are introduced gradually by stochastic mutation. All algorithms use the stopping criterion described in our previous papers [10,11,4]. It monitors overfitting using validation performance and stops training as soon as overfitting becomes apparent.
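The three update rules of this section, equations (4) to (10), can be sketched in a few lines of Python. This is an illustrative sketch under several assumptions that the paper does not state: weights are held in flat lists and every rule is applied per weight; the PSO differences are taken as (best - current), the sign used in standard PSO, so that a step moves a component towards its best neighbor and personal best; the DE defaults `alpha=0.5` and `sigma=0.1` are placeholders, since the text gives no values for them; and one mutation strength m is kept per weight, with its absolute value used as the standard deviation in (10). All function names are hypothetical.

```python
import random

def pso_update(w, w_best_neighbor, w_personal_best, sf=1.0, if_=1.0, rng=random):
    """PSO step in the spirit of equations (4)-(6)."""
    new_w = []
    for wi, ws, wp in zip(w, w_best_neighbor, w_personal_best):
        # Assumed sign: (best - current), so each term pulls towards a best.
        social = (ws - wi) * sf * rng.random()
        personal = (wp - wi) * if_ * rng.random()
        new_w.append(wi + social + personal)
    return new_w

def de_update(w, p1, p2, p3, alpha=0.5, cr=0.99, sigma=0.1, rng=random):
    """Equation (7): exploitation with probability cr, otherwise discovery."""
    new_w = []
    for wi, a, b, c in zip(w, p1, p2, p3):
        if rng.random() < cr:
            new_w.append(a + alpha * (b - c))         # exploitation
        else:
            new_w.append(wi + rng.gauss(0.0, sigma))  # discovery
    return new_w

def ep_mutate(w, m, q_fit, q_tot, alpha=0.25, beta=0.5, mp=0.01, rng=random):
    """Equations (8)-(10): adapt the mutation strength, then perturb weights."""
    new_w, new_m = [], []
    for wi, mi in zip(w, m):
        ssp = rng.random() * (beta + q_fit / q_tot)   # equation (8)
        mi = mi + alpha * rng.gauss(0.0, 1.0) * ssp   # equation (9)
        if rng.random() < mp:                         # equation (10)
            # abs() keeps the standard deviation non-negative (assumption).
            wi = wi + rng.gauss(0.0, abs(mi))
        new_w.append(wi)
        new_m.append(mi)
    return new_w, new_m

rng = random.Random(0)
w = [0.2, -0.4, 0.0]
best = [1.0, 1.0, 1.0]
moved = pso_update(w, best, best, rng=rng)
exploited = de_update(w, [1.0] * 3, [2.0] * 3, [0.0] * 3, cr=1.0, rng=rng)
kept, strengths = ep_mutate(w, [0.0] * 3, q_fit=10.0, q_tot=100.0, mp=0.0, rng=rng)
```

In the usage lines, `cr=1.0` forces pure exploitation and `mp=0.0` disables the weight perturbation, which makes the behavior of each branch easy to inspect in isolation.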

4 Simulations and Results

The quality or fitness function used in this study considers two major criteria, namely classification error and normalized mean-squared error:

$$Q_{fit} = \alpha \cdot Q_{acc} + \beta \cdot Q_{nmse} \qquad (11)$$

$$Q_{acc} = 100 \cdot \Big(1 - \frac{correct}{total}\Big) \qquad (12)$$

$$Q_{nmse} = \frac{100}{N P} \sum_{j=1}^{P} \sum_{i=1}^{N} (T_{ij} - O_{ij})^2 \qquad (13)$$

where $N$ and $P$ refer to the number of samples and outputs, respectively; $Q_{acc}$ is the percentage error in classification; $Q_{nmse}$ is the percentage of normalized mean-squared error (NMSE); and $\alpha = 0.7$ and $\beta = 0.3$ are user-defined constants that control the strength of influence of their respective factors.

Simulation results include comparisons of the performance of four types of connectionist evolution, namely connectionist EP (cEP), connectionist DE (cDE), connectionist PSO (cPSO), and connectionist PSO-DE (cPSO-DE). These four variants were tested using the cancer and glass classification tasks from the UCI repository [12]. The datasets for each task were copied from the experiments of Prechelt. They were divided into 50% training, 25% validation, and 25% testing [13]. Results from Prechelt's manually optimized pivot BP architecture (pBP) were also included for benchmarking purposes. Table 1 summarizes the main features of the different variants.

Table 1. Main Features of Invasive Connectionist Variants

Analysis of variance (ANOVA) and Tukey's HSD test at the α = 0.05 level of significance were used for significance and multiple comparison testing. Figure 5 shows the plots of means and standard deviations of the different variants in the cancer and glass problems, respectively. A line connecting two or more means indicates no significant difference within this group of means.

[Figure 5: mean classification error (%) with standard deviations. Panel a) Cancer, x-axis: pBP, cEP, cPSO, cDE, cPSO-DE; data labels: 1.47, 1.55, 1.72, 1.72, 1.95. Panel b) Glass, x-axis: cEP, cDE, cPSO, pBP, cPSO-DE; data labels: 33.4, 35.9, 38.5, 39.0, 44.3]

Fig. 5. Mean Classification Error Performance

The ANOVA for the cancer problem indicates no significant difference in the mean classification error among the five approaches. However, the ANOVA for the glass problem indicates a significant difference in their performance. A closer analysis using Tukey's HSD indicates that cEP has the best performance. Its performance, however, is not significantly different from that of cPSO, cDE, and pBP.
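The fitness function of equations (11) to (13) can be written down directly. The sketch below is a minimal version that assumes targets and outputs are given as lists of per-sample output vectors and that a sample counts as correct when the largest output falls on the same unit as the largest target (a winner-takes-all rule, which the paper does not specify). The function names are hypothetical.

```python
def q_acc(targets, outputs):
    """Equation (12): percentage of misclassified samples."""
    correct = sum(
        1 for t, o in zip(targets, outputs)
        if t.index(max(t)) == o.index(max(o))  # winner-takes-all (assumption)
    )
    return 100.0 * (1.0 - correct / len(targets))

def q_nmse(targets, outputs):
    """Equation (13): squared error averaged over N samples and P outputs, times 100."""
    n, p = len(targets), len(targets[0])
    total = sum(
        (t - o) ** 2
        for t_row, o_row in zip(targets, outputs)
        for t, o in zip(t_row, o_row)
    )
    return 100.0 / (n * p) * total

def q_fit(targets, outputs, alpha=0.7, beta=0.3):
    """Equation (11): weighted sum of both error terms (lower is better)."""
    return alpha * q_acc(targets, outputs) + beta * q_nmse(targets, outputs)

targets = [[1.0, 0.0], [0.0, 1.0]]
outputs = [[0.9, 0.1], [0.6, 0.4]]  # the second sample is misclassified
fitness = q_fit(targets, outputs)   # 0.7 * 50.0 + 0.3 * 18.5, about 40.55
```

Since no gradient is needed, either term could be swapped for another measure without touching the evolution step, which is the flexibility in fitness formulation that the invasive approach offers.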

5 Conclusion

All variants performed as well as the manually optimized BP architecture in spite of using a relatively large hidden layer (see Table 1). These two preliminary results demonstrate the feasibility of invasive connectionist evolution. Furthermore, this study showed several advantages of invasive evolution, such as a high degree of flexibility in formulation and ease of implementation, such that incorporating and combining other stochastic evolutionary techniques becomes trivial. The swarm model of neural networks is one way of combining several nodules to achieve complexity based on their collective behavior. This nodule organization has great potential for expert ensembling. The idea is to have several swarms specializing in different parts of the solution space. Finding the best solution requires identifying which swarm will be used for evaluation. The degree of overlapping can be minimized to increase the specialization or search localization of each swarm. This may provide better identification or discrimination in noisy classification or clustering tasks. This concept will be further investigated in the near future.

References

1. Yao, X.: Evolving artificial neural networks. Proceedings of the IEEE 87 (1999) 1423–1447
2. Mitchell, M.: An Introduction to Genetic Algorithms. MIT Press, Cambridge, MA (1998)
3. Tan, K.C., Lim, M., Yao, X., Wang, L., eds.: Recent Advances in Simulated Evolution and Learning. World Scientific, Singapore (2004)
4. Palmes, P., Hayasaka, T., Usui, S.: Mutation-based genetic neural network. IEEE Transactions on Neural Networks 16 (2005)
5. Eberhart, R.C., Kennedy, J.: A new optimizer using particle swarm theory. In: Proceedings of the Sixth International Symposium on Micromachine and Human Science, Nagoya, Japan (1995) 39–43
6. Storn, R., Price, K.: Differential evolution - a simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization 11 (1997) 341–359
7. Fogel, D.: Evolutionary Computation: Toward a New Philosophy of Machine Intelligence. IEEE Press, Piscataway, NJ (1995)
8. Wolpert, D.H., Macready, W.G.: No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation 1 (1997) 67–82
9. Kennedy, J., Eberhart, R.C.: Particle swarm optimization. In: Proceedings of IEEE International Conference on Neural Network. Volume 8., Piscataway, NJ (1995) 1942–194
10. Palmes, P., Hayasaka, T., Usui, S.: Evolution and adaptation of neural networks. In: Proceedings of the International Joint Conference on Neural Networks, IJCNN. Volume II., Portland, Oregon, USA, IEEE Computer Society Press (2003) 397–404
11. Palmes, P., Hayasaka, T., Usui, S.: SEPA: Structure evolution and parameter adaptation. In Paz, E.C., ed.: Proceedings of the Genetic and Evolutionary Computation Conference. Volume 2., Chicago, Illinois, USA, Morgan Kaufmann (2003) 223
12. Murphy, P.M., Aha, D.W.: UCI Repository of machine learning databases. University of California, Department of Information and Computer Science, Irvine, CA (1994)
13. Prechelt, L.: Proben1 - a set of neural network benchmark problems and benchmarking. Technical report, Fakultät für Informatik, Univ. Karlsruhe, Karlsruhe, Germany (1994)
