A multiple-population evolutionary approach to gate matrix layout

A. MENDES† and A. LINHARES‡*

This paper deals with a Very-Large-Scale Integrated systems design problem that belongs to the NP (Nondeterministic Polynomial)-hard class. The Gate Matrix Layout problem has numerous applications in the chip-manufacturing industry and in other industrial settings. A memetic algorithm is employed to solve a set of benchmark instances, and numerical comparisons with a highly competitive method, a microcanonical optimization approach, are performed. Beyond the effectiveness of the method, shown by the results obtained for these instances, an additional goal of this work is to study how the performance of the algorithm is affected by the use of multiple populations and of different individual-migration policies between such populations. The results signal a strong performance improvement of multiple populations over single-population approaches. Finally, the proposed algorithm presents several refinements, such as structured populations and a specially tailored local search.

Received 29 April 2002. Revised 12 September 2003. Accepted 3 December 2003.
† School of Electrical Engineering and Computer Science, University of Newcastle, Callaghan, NSW, 2308, Australia. e-mail: [email protected]
‡ EBAPE/FGV, Praia de Botafogo, 190, 22.250-900 Rio de Janeiro, RJ, Brazil.
* To whom correspondence should be addressed. e-mail: [email protected]

International Journal of Systems Science ISSN 0020–7721 print/ISSN 1464–5319 online © 2004 Taylor & Francis Ltd http://www.tandf.co.uk/journals DOI: 10.1080/00207720310001657054

1. Introduction

With applications in fields as distinct as fuzzy modelling (Xiong 2001), autonomous robot behaviour (Luk et al. 2001), back-propagation learning (Foo et al. 1999), and multicriteria optimization (Viennet et al. 1996), evolutionary methods have become an indispensable tool for systems scientists. In this arena, an interesting emerging issue is the use of multiple populations, which is gaining momentum from the conjunction of two technologies. On the hardware side, computer networks, multiprocessor computers and distributed processing systems (such as workstation clusters) are becoming increasingly widespread. On the software side, the introduction of PVM (Parallel Virtual Machine), and later MPI (Message-Passing Interface), as well as web-enabled object-oriented languages (such as Java), have also played their role. As most evolutionary algorithms (EAs) are inherently parallel methods, the distribution of the tasks is relatively easy for most applications. The workload can be distributed at an individual or a population level, the final choice depending on the complexity of the computations involved. In this work, we do not use parallel computers or networks of workstations. The proposed algorithm runs sequentially on a single processor, but populations evolve separately, simulating the behaviour of a parallel environment.

Species evolve in nature grouped in populations, with boundaries defined by specific features like distance or geographical barriers. The role of closed (or nearly closed) populations in evolution is extremely important. Consider, for instance, the Galápagos Islands, which famously inspired Darwin's ideas during his voyage aboard HMS Beagle (Darwin 1993). A set of islands separated by several kilometres of water can be colonized by a single species of birds. At the beginning of such a colonization, all animals will share the same characteristics (and genetic pool), but as evolution takes place, the groups concentrated on each island will start to differentiate by adapting themselves to the particular characteristics of that island (Weiner 1995). This independent adaptation may eventually lead to the emergence of different species after a sufficient number of generations, given that very little or no migration exists between the islands. Even if the islands share the same characteristics, different species might arise, due to the relative isolation and the 'genetic drift' phenomenon (Weiner 1995). Clearly, if two similar populations are separated and submitted to equal conditions, due to the random nature of the processes involved in evolution, they may still follow different evolutionary paths and become different species after a large number of generations.

Analogously, it is not unusual in EAs to see that if the same algorithm is run twice, it may generate different final solutions. Usually, this is viewed as a setback, but it can be very useful when multiple populations are used. With several populations evolving in parallel, larger portions of the search space can be sampled, and any particularly important genetic information found can be spread among them through migration of individuals. This mechanism makes the parallel search potentially more powerful than when a single population is employed (Cantú-Paz 1999, 2000).

2. A VLSI optimization problem: Gate Matrix Layout

The Gate Matrix Layout problem is an NP-hard problem (Lengauer 1990, Linhares et al. 1999, Nakatani et al. 1986) that arises in the context of the physical layout of Very-Large-Scale Integration (VLSI) systems. There are numerous stages in the design of VLSI systems, the last being the physical design stage, in which the underlying logic function has been previously transformed to a set of interconnected wires. The architecture known as gate matrices (Wing et al. 1985; Möhring 1990) is part of a set of problems in which the physical design demands a linear arrangement of gates to minimize the circuit area (and thus minimize production costs and maximize performance).

Mathematically, it can be stated as follows. Suppose that there are g gates (vertical wires) and n nets (horizontal wires) on a gate matrix layout circuit. Gates can be described as vertical wires holding transistors at specific positions, with nets interconnecting all the distinct gates that share transistors at the same position. An instance can be represented as a 0-1 matrix, with g columns and n rows. A value of 1 in position (i, j) means that a transistor must be implemented at gate i and net j; 0 means that no such connection is required.

Two characteristics are fundamental to the problem. First, all transistors in the same net must be interconnected. Second, the sequence of the gates does not alter the underlying circuit logic and is, thus, amenable to optimization of the design. As long as a specific wire connects all transistors in the same net, the circuit logic is stable. However, different net wires cannot overlap, and should two different nets share the same gate, they must be implemented over two separate physical tracks, adding to the overall circuit size. This superposition of interconnections defines the number of tracks needed to build the circuit. The mathematical objective is to find a permutation of the g columns so that the superposition of interconnections is minimal, thus minimizing the number of required physical tracks and the overall circuit area (the number of gates is fixed, so the circuit area is proportional to the number of tracks). Numerous nets may share the same track if there is no wire superposition between them.

Figure 1 shows, clockwise from top left, an instance of the problem described by the net-gate matrix. In this particular (identity) sequence of gates, four tracks are required: the reader may notice that gate 2 requires different tracks for nets 1, 4, and 5. Moreover, net 2 demands the fourth track because its wire runs from gate 1 to gate 7. The following gate sequence changes this in a significant manner, as nets 2 and 5 and nets 3 and 4 do not overlap and thus can share tracks. The layout then requires only three tracks.

Figure 1. Translation from a given instance's solution into the real circuit.

If, in each net and for a given gate sequence, we change to 1 every 0 lying between the leftmost and the rightmost 1, then the column sum gives the number of tracks required at each gate. The number of tracks required for the whole circuit is given by the maximum column sum, and this is the function to be minimized here. A final note: the process of mapping the nets from a particular gate sequence to a specific track assignment is computable in polynomial time by the so-called left-edge algorithm (Hashimoto and Stevens 1971).

In the example, the permutation of the columns is ⟨2-4-3-1-5-7-6⟩. After the interconnection of all transistors, represented by the horizontal lines, we calculate the number of tracks needed to build each gate. This number is given by the sum of positions used in each column, and the number of tracks required to build the circuit is its maximum. In the example, this value is 3. The lower-left diagram shows the circuit layout after the grouping of the connections, and the use of only three tracks.

More detailed information on this problem, including other industrial settings in which it arises, can be found in Linhares (1999), Linhares et al. (1999), Linhares and Yanasse (2002a), and Yanasse (1997). The reader should be aware that this is not just a 'regular' NP-hard problem: it was in fact the first problem identified as being fixed-parameter tractable, and this result eventually led to the creation of a new, large class of problems falling under the label FPT, for fixed-parameter tractability (Downey and Fellows 1995, Fellows and Langston 1987). In the next section, we present a new memetic algorithm for the gate matrix layout problem.
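The objective function described in this section (fill each net between its extreme 1s, then take the maximum column sum) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `count_tracks`, the 0-based gate indices, the rows-are-nets layout and the toy instance `TOY` are all assumptions made for the example, and the toy matrix is not the instance of Figure 1.

```python
from typing import List, Sequence


def count_tracks(matrix: Sequence[Sequence[int]], order: Sequence[int]) -> int:
    """Number of tracks needed when the gates are placed in `order`.

    matrix[net][gate] is 1 when the net has a transistor on that gate
    (assumed layout: one row per net, one column per gate, 0-based).
    """
    column_load: List[int] = [0] * len(order)
    for net in matrix:
        # Positions (in the permuted sequence) of the gates used by this net.
        positions = [p for p, gate in enumerate(order) if net[gate] == 1]
        if not positions:
            continue
        # The net's wire occupies every position between its extremes.
        for p in range(min(positions), max(positions) + 1):
            column_load[p] += 1
    return max(column_load, default=0)


# Hypothetical toy instance: 3 nets (rows) over 4 gates (columns).
TOY = [
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
]

if __name__ == "__main__":
    print(count_tracks(TOY, [0, 1, 2, 3]))  # identity sequence: 3 tracks
    print(count_tracks(TOY, [3, 0, 1, 2]))  # reordered sequence: 2 tracks
```

In the toy instance, moving gate 3 to the front shortens the wire of the first net, so the maximum column sum drops from 3 to 2.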

3. Memetic algorithms

Since the publication of John Holland's book, Adaptation in Natural and Artificial Systems, the field of Genetic Algorithms (GA) and the broader field of Evolutionary Computation were clearly established as new research areas. However, other pioneering works could also be cited, as they became increasingly conspicuous in many engineering fields and in industrial problems. In the mid-1980s, a new class of 'knowledge-augmented GAs', also called 'hybrid GAs', started to appear in the computer science and engineering literature. The main idea supporting these methods is that of making use of other forms of 'knowledge', i.e. other solution methods already available for the problem at hand. As a consequence, the resulting algorithms had little resemblance to biological evolution analogies. Recognizing important differences and similarities with other population-based approaches, some of them were categorized as memetic algorithms (MAs) in 1989 (Moscato 1989, Moscato and Norman 1992), emanating from the term 'meme' introduced by Dawkins (1976). The field of 'cultural evolution' was suggested as being more relevant, as a working metaphor, to understand the performance and find inspiration sources to improve these new methods. The main features of the implemented MA are described below.

3.1. Population structure

It is illustrative to show how some MAs resemble the cooperative problem-solving techniques that can be found in some organizations. For instance, in our approach, we use a hierarchically structured population based on a complete ternary tree. In contrast with a non-structured population, the complete ternary tree can also be understood as a set of overlapping sub-populations (which we will refer to as clusters). The use of this population structure, together with a recombination scheme that always selects parents belonging to the same cluster, introduces a 'multiple-population' character within each population (see Section 3.5).

The choice of the ternary tree structure was based mainly on empirical aspects. The first is motivated by the fact that any hierarchical tree behaves like a set of overlapping clusters, as said before. Therefore, the dynamics are similar to several populations evolving in parallel (each cluster acts as an independent population) and exchanging individuals at a given rate. This exchange of individuals comes as a consequence of the tree-restructuring phase, carried out to maintain a specific hierarchical consistency (see Section 3.6). Now, consider the use of trees with other degrees of complexity. A binary tree-based population, for instance, would be formed by three-individual clusters only, with only two recombination possibilities. This would degrade the 'multiple population' character of the tree structure.
Trees of a greater order (quaternary or more) increase the multiple-population character, but initial tests indicated that the performance does not improve at all and, moreover, the number of individuals rapidly jumps to prohibitive levels in terms of computational effort. The best trade-off points to the selected ternary tree structure.

In figure 2, we can see that each cluster consists of one single leader and three supporter agents. Any leader agent in an intermediate layer has both leader and supporter roles. The leader agent always contains the best solution, considering the number of tracks, of all agents in the cluster. The number of agents in the population is equal to the number of nodes in the ternary tree, i.e. we need 13 individuals to make a ternary tree with three levels and 40 individuals to have four levels. For this work, we fixed the population size to be 13. This value might seem too low at first glance, but after tests with 40 and 121 individuals, we concluded that 13 individuals are sufficient to make the algorithm keep its convergence speed under control. The use of 40 or more individuals does not deteriorate the algorithm's behaviour, but the computational effort increases considerably, as well as the CPU time. We must emphasize that the use of structured populations allows a reduction in population size without any loss of search power. In comparison, non-structured populations would require more than 100 individuals to achieve a similar performance, as reported in other works (França et al. 2001, Mendes et al. 2002).

Figure 2. Population structure.

3.2. Representation and crossover

Representation of the problem is quite intuitive. A solution is represented as a 'chromosome' in which the alleles assume different integer values in the [1, n] interval, where n is the number of columns of the associated matrix. These values define the sequence (permutation) of the gates.

The crossover tested is a variant of the well-known Order Crossover (OX) called Block Order Crossover (BOX). After choosing two parents, several fragments of the chromosome from one of them are randomly selected and copied into the offspring. In the second phase, the offspring's empty positions are sequentially filled according to the chromosome of the other parent. The BOX resembles the second variant of the OX crossover presented in Syswerda (1991). The procedure tends to perpetuate the relative order of the columns, although some alterations might appear (see figure 3). In figure 3, Parent A contributes two pieces of its chromosome to the offspring. These parts are copied to the same positions they occupy in the parent. The blank spaces are then filled with the information of Parent B, going from left to right. The values in Parent B already present in the offspring are skipped, with only the new ones being copied.
The percentage to be 'contributed' from each parent is set at 50%. This means that the offspring will be created from information inherited in equal proportion from both parents. In each generation, we create 26 new solutions, twice the number of agents present in the population. This number for the crossover rate, higher than that in other EAs, is due to the offspring acceptance policy. The acceptance rule means that several new individuals are discarded, in a high-infant-mortality selection process. Thus, after several tests, with values varying from 0.5 to 2.5, we decided to use 2.0. The insertion of new solutions in the population is discussed in Section 3.6.

Figure 3. Block Order Crossover (BOX) example.
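The two phases of the BOX operator (copy fragments of Parent A in place, then fill the gaps with Parent B's unused values from left to right) might look like the sketch below. The text does not say how many fragments are taken or how long they are, so the `block_len` parameter and the way fragment starts are drawn are assumptions of this example; only the 50% share and the left-to-right fill come from the description above.

```python
import random
from typing import List, Optional, Sequence


def block_order_crossover(
    parent_a: Sequence[int],
    parent_b: Sequence[int],
    share: float = 0.5,          # fraction of positions inherited from parent A
    block_len: int = 2,          # assumed maximum fragment length
    rng: Optional[random.Random] = None,
) -> List[int]:
    rng = rng or random.Random()
    n = len(parent_a)
    child: List[Optional[int]] = [None] * n
    target = round(share * n)
    copied = 0
    # Phase 1: copy random fragments of parent A to the same positions.
    while copied < target:
        free = [i for i in range(n) if child[i] is None]
        start = rng.choice(free)
        for i in range(start, min(start + block_len, n)):
            if copied == target or child[i] is not None:
                break
            child[i] = parent_a[i]
            copied += 1
    # Phase 2: fill the blanks left to right with parent B's values,
    # skipping those already present in the offspring.
    used = {gate for gate in child if gate is not None}
    fill = (gate for gate in parent_b if gate not in used)
    return [gate if gate is not None else next(fill) for gate in child]


if __name__ == "__main__":
    a = [1, 2, 3, 4, 5, 6, 7]
    b = [7, 6, 5, 4, 3, 2, 1]
    print(block_order_crossover(a, b, rng=random.Random(1)))
```

Whatever fragments are drawn, the offspring is always a valid permutation, because every value missing after phase 1 appears exactly once in Parent B.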

3.3. Mutation

In our implementation, a traditional mutation strategy based on the swapping of columns was used. Two positions are selected uniformly at random, and their values are swapped. This mutation procedure is applied (on average) to 10% of all new individuals in every generation. We have also implemented a heavy mutation procedure that executes the swap move 10·g times (where g is the number of gates) on each individual except the best one. This procedure is executed every time the population diversity is considered to be low, i.e. when the population has converged to individuals that are too similar (see Section 3.6).
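A minimal sketch of the two mutation procedures is given below. The function names are illustrative, and drawing two distinct positions is an assumption (the text only says the positions are chosen uniformly at random).

```python
import random
from typing import List, Sequence


def swap_mutation(order: Sequence[int], rng: random.Random) -> List[int]:
    """Return a copy of `order` with two positions exchanged.

    Assumption: the two positions are distinct.
    """
    i, j = rng.sample(range(len(order)), 2)
    mutated = list(order)
    mutated[i], mutated[j] = mutated[j], mutated[i]
    return mutated


def maybe_mutate(order: Sequence[int], rng: random.Random,
                 rate: float = 0.10) -> List[int]:
    """Light mutation: applied on average to 10% of the new individuals."""
    return swap_mutation(order, rng) if rng.random() < rate else list(order)


def heavy_mutation(population: Sequence[Sequence[int]], best_index: int,
                   rng: random.Random, factor: int = 10) -> List[List[int]]:
    """Apply factor * g swap moves to every individual except the best one."""
    result = []
    for idx, individual in enumerate(population):
        mutated = list(individual)
        if idx != best_index:
            for _ in range(factor * len(individual)):
                mutated = swap_mutation(mutated, rng)
        result.append(mutated)
    return result
```

Keeping the best individual untouched is what lets a converged population restart from one high-quality solution after the heavy mutation.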

3.4. Local search

Local search algorithms for combinatorial optimization problems generally rely on a neighbourhood definition that establishes a relationship between solutions in the configuration space. In this work, two neighbourhood definitions were chosen. The first one was the all-pairs neighbourhood. This consists of swapping pairs of columns from a given solution. A hill-climbing algorithm can be defined by reference to this neighbourhood; i.e. starting with an initial permutation of all columns, every time a proposed swap reduces the number of tracks utilized, it is confirmed, and another cycle of swaps takes place, until no further improvement can be achieved. The second neighbourhood implemented was the insertion one. This involves removing a column from one position and inserting it in another position (which could include any point between a pair of gates, or the leftmost or rightmost extremes of the permutation).

The hill-climbing iterative procedure is the same regardless of neighbourhood definition. For small instances, it is possible to evaluate every swap or insertion possibility, since the complexity remains low. As the instance size increases, however, it becomes imperative to implement neighbourhood reductions. For this problem, a simple neighbourhood-reduction scheme is to test only the k-nearest neighbours of each column for swap and insertion movements. For instance, if k = 20, each column will be tested for swap and insertion with its 10 neighbours to the left and to the right. This procedure was implemented and greatly reduced the number of possibilities and the associated computational effort. Nevertheless, since the standard problem instances can have up to 141 gates (matrix columns), the complexity was still too large, and reducing the k value had a dramatic impact on the algorithm's performance. The minimum k values necessary to reach optimal solutions for the larger instances were still too high, however, making the search process too time-consuming. A complementary procedure was then tailored.

In this paper, we introduce a new local search reduction scheme, which discards useless swap and insertion attempts. The reduction is based on the position of the so-called critical columns. Critical columns define the maximum number of tracks required by the solution. Figure 4 shows an example identifying such columns. The critical column is given by gate number 2, requiring six tracks to be implemented. The local search reduction policy works by prohibiting any exchange or insertion that cannot affect the critical column. In the example, movements involving any pair of columns extracted from the set {1, 8, 7, 4, 9} are not allowed, since they cannot decrease the number of tracks required by column 2. Likewise, movements between columns belonging to the set {3, 6, 5} are also prohibited.

In the example, the number of possibilities of swaps and insertions without reduction is 108. Under the reduction scheme, this number drops to 49. If there is more than one critical column, the prohibited movements are those for which both columns lie before the leftmost critical column, or both lie after the rightmost one. As the computational complexity of the local search neighbourhoods is O(n^2), the reduction becomes particularly appealing in the larger instances. The motivation for this scheme is that any exchange between columns that does not affect a critical column improves the number of tracks only locally, i.e. between the columns being moved, not globally.

The reduction strongly improved the MA performance, making it reach better values with a much lower number of individuals' evaluations. The use of critical-column information has also allowed an increase in the number of nearest neighbours to be tested (the k value). This ensured a generally better performance, consistently surpassing the previous approach to the problem (see below).

The use of a local search in this problem is critical and had to be thoroughly studied. At first, we tested the application of the local search on all new individuals created. The resulting algorithm's performance did not improve considerably, despite the additional requirement for CPU time. An additional experiment applied a local search only to a small percentage of the new individuals. The results improved a little but were still disappointing, regardless of the percentage utilized (tests ranged between 10 and 90% of new individuals, over 10% of all steps). The very best results were obtained when the local search was applied only to the best individual of the population, after its convergence (the definition of population convergence is given in Section 3.6). The use of only one local search at the end of an evolutionary cycle, from the population creation until its convergence, apparently has balanced the exploration/exploitation characters of the algorithm better.
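The interplay of the k-nearest-neighbour restriction and the critical-column filter can be sketched for the swap neighbourhood as follows. This is an illustration under stated assumptions, not the authors' code: it covers swaps only (insertions are omitted), re-evaluates the whole solution after each trial move, and uses hypothetical names (`column_loads`, `swap_is_useful`, `hill_climb`) and a hypothetical 0-based, rows-are-nets matrix layout.

```python
from typing import List, Sequence


def column_loads(matrix: Sequence[Sequence[int]], order: Sequence[int]) -> List[int]:
    """Tracks needed at each position of the gate sequence `order`."""
    loads = [0] * len(order)
    for net in matrix:
        pos = [p for p, gate in enumerate(order) if net[gate]]
        if pos:
            for p in range(min(pos), max(pos) + 1):
                loads[p] += 1
    return loads


def swap_is_useful(i: int, j: int, loads: Sequence[int]) -> bool:
    """False when both positions lie on the same side of all critical columns."""
    peak = max(loads)
    critical = [p for p, load in enumerate(loads) if load == peak]
    left, right = critical[0], critical[-1]
    both_before = i < left and j < left
    both_after = i > right and j > right
    return not (both_before or both_after)


def hill_climb(matrix: Sequence[Sequence[int]], order: Sequence[int],
               k: int = 20) -> List[int]:
    """Swap-based hill climbing restricted to k/2 neighbours on each side."""
    order = list(order)
    improved = True
    while improved:
        improved = False
        loads = column_loads(matrix, order)
        best = max(loads)
        for i in range(len(order)):
            for j in range(i + 1, min(i + k // 2, len(order) - 1) + 1):
                if not swap_is_useful(i, j, loads):
                    continue  # cannot lower the load of a critical column
                order[i], order[j] = order[j], order[i]
                new_loads = column_loads(matrix, order)
                if max(new_loads) < best:
                    loads, best, improved = new_loads, max(new_loads), True
                else:
                    order[i], order[j] = order[j], order[i]  # undo the move
    return order
```

The filter is safe because swapping two columns that both sit to the left (or both to the right) of a position does not change which gates lie on either side of it, so the load at that position stays the same.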

Figure 4. Description of the critical column for a problem with nine gates and eight tracks.

3.5. Selection for recombination

The recombination of solutions in the hierarchically structured population can only involve a leader and one of its supporters within the same cluster. The recombination procedure selects any leader uniformly at random, and then it chooses—also uniformly at random—one of the three supporters. As mentioned before, the use of this selection scheme and the population structure introduces a ‘multiple-population’ character within each population. The dynamics of the population is similar to several populations evolving in parallel and exchanging individuals when the tree restructuring takes place (see Section 3.6).
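The selection rule can be illustrated on the 13-individual tree of Section 3.1. The array layout used here (node 0 is the root, and the supporters of node i are nodes 3i+1, 3i+2 and 3i+3) is an assumption chosen for the sketch; the paper does not state how the tree is stored.

```python
import random
from typing import List, Tuple

TREE_SIZE = 13             # complete ternary tree with three levels
LEADERS = [0, 1, 2, 3]     # nodes that have supporters below them


def supporters(leader: int) -> List[int]:
    """Indices of the three supporters of a leader (assumed array layout)."""
    return [3 * leader + 1, 3 * leader + 2, 3 * leader + 3]


def select_parents(rng: random.Random) -> Tuple[int, int]:
    """Pick a leader uniformly at random, then one of its supporters."""
    leader = rng.choice(LEADERS)
    supporter = rng.choice(supporters(leader))
    return leader, supporter
```

Nodes 1, 2 and 3 act as supporters of the root and as leaders of their own clusters, which is what makes the four clusters overlap.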


3.6. Offspring insertion into the population

Once the leader and one supporter are selected, the processes of recombination, mutation, and local search take place, and an offspring is generated. The acceptance of the offspring follows two rules:

(1) The offspring is inserted into the population replacing the supporter that took part in the recombination that generated it.
(2) The replacement occurs only if the fitness of the new individual is better than that of the supporter.

This is an extremely elitist and restrictive policy, which generates a very fast loss of diversity. The positive side is that the algorithm becomes more 'focused' early on and evolves faster. In order to deal with the accelerated loss of diversity, a more sensitive population-convergence check had to be employed. Generally, population convergence is evaluated by the degree of similarity of the individuals' chromosomes and/or fitness. In this work, we concluded that if, during the recombination phase, no individual was accepted for insertion, the population had converged, and accordingly, the heavy mutation procedure would be applied.

Finally, after the recombination phase, the population is restructured. The hierarchy described in Section 3.1 states that the degree of fitness of the leader of a cluster must be lower than the fitness of the leader of the cluster just above it. Following this rule, the higher clusters will have leaders with a better fitness, and the best solution will be the leader of the root cluster. The adjustment is done by comparing the supporters of each cluster with the leader. If any supporter turns out to be better than its respective leader, they swap places. In our problem, the higher the position that an individual occupies in the tree, the fewer tracks it utilizes.
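The acceptance rule and the restructuring step can be sketched on the same 13-node layout (node 0 as root, supporters of node i at 3i+1 to 3i+3), which is an assumption of this example. Repeating the leader/supporter comparison until no swap occurs is also an assumption; the text does not specify the order in which clusters are adjusted. Fitness is taken to be the number of tracks, so lower is better.

```python
from typing import List, Sequence, Tuple

LEADERS: Tuple[int, ...] = (0, 1, 2, 3)


def supporters_of(leader: int) -> Tuple[int, int, int]:
    return (3 * leader + 1, 3 * leader + 2, 3 * leader + 3)


def accept_offspring(population: List[Sequence[int]], tracks: List[int],
                     supporter: int, child: Sequence[int],
                     child_tracks: int) -> bool:
    """Replace the parent supporter only if the child uses fewer tracks."""
    if child_tracks < tracks[supporter]:
        population[supporter] = child
        tracks[supporter] = child_tracks
        return True
    return False


def restructure(population: List[Sequence[int]], tracks: List[int]) -> None:
    """Swap leaders with better supporters until every leader is the best
    individual of its cluster (assumed: repeat until no swap is needed)."""
    swapped = True
    while swapped:
        swapped = False
        for leader in LEADERS:
            best = min(supporters_of(leader), key=lambda s: tracks[s])
            if tracks[best] < tracks[leader]:
                population[leader], population[best] = population[best], population[leader]
                tracks[leader], tracks[best] = tracks[best], tracks[leader]
                swapped = True
```

Under this sketch, a generation in which `accept_offspring` never returns `True` would be treated as converged and would trigger the heavy mutation.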

4. Migration policies

For the study with multiple populations, we had to define how individuals migrate from one population to another. Three major population migration policies deserve mention:

(1) 0-Migrate: No migration is used, and all populations evolve in parallel without any kind of communication or solution exchange.
(2) 1-Migrate: Populations are arranged in a ring structure. Migration occurs in all populations, and the best individual from each migrates to the population right next to it, replacing a randomly chosen individual. Every population receives only one new individual.
(3) 2-Migrate: Populations are also arranged in a ring structure. Migration also occurs in all populations, but the best individual of each migrates to both populations connected to it, replacing randomly chosen individuals. Every population receives two new individuals.

Figure 5 shows the migration policies. The five populations are placed in a ring structure. In 1-Migrate, we have only one individual being received by each population. In 2-Migrate, this number rises to two individuals. The migration phase always occurs after all populations have converged and the heavy mutation procedure has been applied. As the heavy mutation always preserves the best individual, it can migrate to the neighbouring population, carrying its genetic information. The comparison between the policies is, in fact, a comparison between no, weak, and strong migration, given the difference in communication intensity. In 0-Migrate, every population will restart from the heavy mutation with just one high-quality individual, namely the best one prior to the heavy mutation process. In the 1-Migrate policy, every population will restart with two high-quality individuals; the second individual is received from a neighbouring population. In 2-Migrate, the number of high-quality individuals increases to three.

Figure 5. Two migration policies in an example with five populations arranged in a ring structure.
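The three policies can be sketched as a single routine parameterized by the number of neighbours that receive each population's best individual. The names and data layout are hypothetical. Two details are assumptions made so that each population really restarts with two or three high-quality individuals, as described above: an immigrant never overwrites the receiving population's own best, and two immigrants never land in the same slot.

```python
import random
from typing import List


def migrate(populations: List[List[List[int]]], fitness: List[List[int]],
            policy: int, rng: random.Random) -> None:
    """Ring migration in place; policy is 0, 1 or 2 (lower fitness is better)."""
    n = len(populations)
    if policy == 0 or n < 2:
        return  # 0-Migrate: populations stay isolated
    # Snapshot every population's best before anything is overwritten.
    best = [min(range(len(f)), key=f.__getitem__) for f in fitness]
    emigrants = [(list(populations[p][best[p]]), fitness[p][best[p]])
                 for p in range(n)]
    for dst in range(n):
        sources = {(dst - 1) % n}          # 1-Migrate: previous ring neighbour
        if policy == 2:
            sources.add((dst + 1) % n)     # 2-Migrate: both ring neighbours
        # Assumption: the destination's own best is never replaced.
        slots = [s for s in range(len(populations[dst])) if s != best[dst]]
        for src, slot in zip(sorted(sources), rng.sample(slots, len(sources))):
            populations[dst][slot] = list(emigrants[src][0])
            fitness[dst][slot] = emigrants[src][1]
```

With only two populations, both ring neighbours coincide, so the set of sources collapses to one and each population receives a single immigrant even under 2-Migrate.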

5. Computational tests

The computational tests were divided over two experiments. The first tested the influence of the number of populations on overall performance. For this evaluation, the number of populations varied from one to five. The second experiment evaluated the influence of migration on the algorithm's performance: for each number of populations, the three migration policies were tested, totalling 15 configurations. The tests were applied to five instances, and for each instance, we tested the whole set of configurations, 10 times for each. The MA was also tested against the GA, i.e. the MA without a local search, to evaluate the influence of the local search on the results found. The stop criterion was a time limit, fixed as follows: 10 s for instance W2; 30 s for V4470 and X0; 90 s for W3; and 40 min for W4. The difference in maximum CPU times is due to the dimension of the instances and takes into account the average time to find high-quality solutions.

The reader should note that these instances are taken from real industrial circuits, and each implements a specific logic: for example, in these sets there are logic functions for decoders (v4050, v4090), 4-bit comparators (v4470), full-adders (wan), ALUs (W3), ITT1s and ITT2s (W2, W4). We refer the reader to Hu and Chen (1990) for details. Table 1 lists information on the instances examined in this work: one small, three medium-sized, and one large.

Table 1. Information on the instances

Instance name   Number of gates   Number of nets   Best-known solution
W2              33                48               14
V4470           47                37               9
X0              48                40               11
W3              70                84               18
W4              141               202              27

There are very few large instances in the literature on which to test methods. Linhares et al. (1999) presented the most extensive computational tests, with 25 instances in total. However, most of them were too small and easy to solve either to optimality or to the best-known solution with the algorithm. Considering the instances' sizes, only V4470, X0, W2, W3, and W4 had more than 30 gates, and for this reason, we concentrated the study on them. We must emphasize that the best MA results, in terms of the number of tracks required to build the circuits, are the same as those found by the microcanonical optimization approach. The only performance difference was the higher efficiency, measured in terms of number of evaluations, in finding such values.

Next, we show the experimental results. Four numbers describe the results for each configuration (see figure 6). In clockwise order, we have in boldface the best number of tracks found for that instance. Next in the sequence, we display the number of times this solution was found in 10 tries, the worst value found for the configuration, and finally, in the lower-left part of the cell, the average value found for the number of tracks. All tests were carried out on a Pentium 366-MHz Celeron computer, using Sun JDK 2.0 Java language running in a Windows environment.
We estimate that the performance of a Pentium Celeron processor lies somewhere between a Pentium and a Pentium II processor.

Figure 6. Data fields for each configuration.

Instance W2 reached the optimal value in all configurations of migration policy and number of populations in the MA. For the GA, the algorithm found the optimal solution in all configurations, although not in all trials. The best configuration for both MA and GA was the 1-Migrate with four populations because it required the lowest number of individuals' evaluations. The other instances have shown more noticeable differences in performance, and their results are listed in tables 2–5.

Since the tests take into account two parameters, migration policy and number of populations, let us begin by evaluating the first one. At first, two aspects of randomized search algorithms must be pointed out: exploitation and exploration (see also the debate in Linhares and Yanasse (2002b)). Exploitation is the property of the algorithm to thoroughly explore a specific region of the search space, looking for any improvement in the best currently available solution(s). Exploration is the property to explore wide portions of the search space, looking for promising regions, where exploitation procedures should be employed. With no migration, we observed more instability in the answers, expressed by the worst solutions and averages found for instances W3 and W4. In the smaller circuits, this feature was not so clear. By contrast, the 1-Migrate apparently balanced exploitation and exploration better, returning good average and worst-solution values. The algorithm was also more stable, with high-quality solutions usually being found. The 2-Migrate policy did not perform so well, with a clear degradation of both the average and the worst solution. This might have been caused by an overly strong exploitation, to the detriment of exploration. Thus, analysing only the migration policies, we concluded that migration should be set at medium levels, represented by the 1-Migrate.

The second aspect to be analysed is the number of populations.
Although it is not clear which configuration was best, a single population is clearly not the best choice, since several multi-population configurations returned better values. Based on the results, the conclusion is that when multiple populations are utilized, at least three should be employed. With only two populations, the algorithm does not seem to take advantage of the ‘genetic drift’ effect. From the optimization point of view, the use of multiple populations allows much larger portions of the search space to be explored, but both the MA and the GA require at least three populations to make the genetic drift effect noticeable.

Table 2. Results for the V4470 instance. Parameters: k = 20; maximum CPU time = 30 s. The best configuration was 1-Migrate with three populations for both MA and GA. Each entry gives best solution / average / number of times the best was found / worst solution.

                                    Number of populations
               1              2              3              4              5
V4470-MA
  0-Migrate    9/10.1/2/11    9/10.1/1/11    9/9.6/4/10     9/10.0/1/11    9/9.5/5/10
  1-Migrate    -              9/9.9/2/11     9/9.5/6/11     9/9.6/5/11     9/9.6/4/10
  2-Migrate    -              9/9.5/5/10     9/9.7/4/11     9/9.8/2/10     9/9.5/5/10
V4470-GA
  0-Migrate    10/10.5/5/11   10/10.4/6/11   10/10.3/7/11   10/10.4/6/11   10/10.4/6/11
  1-Migrate    -              10/10.1/9/11   9/10.5/2/11    10/10.1/9/11   9/10.2/1/11
  2-Migrate    -              9/10.3/1/11    9/10.3/1/11    10/10.3/7/11   10/10.2/8/11

Table 3. Results for the X0 instance. Parameters: k = 20; maximum CPU time = 30 s. The best configuration was 1-Migrate with four populations for the MA and 2-Migrate with four populations for the GA. Each entry gives best solution / average / number of times the best was found / worst solution.

                                    Number of populations
               1              2              3              4              5
X0-MA
  0-Migrate    11/11.6/6/13   11/11.1/9/12   11/11.2/8/12   11/11.1/9/12   11/11.1/9/12
  1-Migrate    -              11/11.0/10/11  11/11.0/10/11  11/11.0/10/11  11/11.1/9/12
  2-Migrate    -              11/11.2/8/12   11/11.1/9/12   11/11.2/8/12   11/11.1/9/12
X0-GA
  0-Migrate    11/11.8/2/12   11/11.5/5/12   11/11.4/6/12   11/11.5/5/12   11/11.6/4/12
  1-Migrate    -              11/11.1/9/12   11/11.2/8/12   11/11.2/8/12   11/11.2/8/12
  2-Migrate    -              11/11.1/9/12   11/11.1/9/12   11/11.0/10/11  11/11.1/9/12

The other characteristic of the method was its high efficiency. The number of individuals’ evaluations was very low. Compared with the previous work of Linhares et al. (1999), the improvement is striking. The microcanonical optimization presented in that work is based on a fast variant of the well-known simulated annealing approach, which divides the search into two alternating phases: initiation and sampling. These phases have dual objectives: in initiation, the system
strives to rapidly obtain a new locally optimal solution, while in the sampling phase, the system moves out of the local optimum while retaining similar cost values (as controlled by parameters analogous to the temperature in simulated annealing). The algorithm proposed there was able to outperform five previous approaches in all previously tested instances. For details, see Linhares et al. (1999). Table 6 compares both methods. Both the memetic algorithm and the microcanonical optimization either matched or outperformed the previous heuristics in the literature (simulated annealing, microcanonical annealing, GM-Plan, GM-Learn, and constructive heuristics) in terms of the quality of solutions obtained. Using the memetic algorithm, however, it was possible to obtain those high-quality solutions with a fraction of the effort required by microcanonical optimization. The MA yielded better values than the microcanonical approach. It is worth mentioning that the reduction in effort ranges between 80 and 90%, depending on the instance size.

There are at least three reasons for these outstanding results. First, the local search embedded in the algorithm does not consider all possible movements, as a neighbourhood reduction enables considerable computational gains (without any major loss of solution quality). Second, discarding all possible movements that do not affect critical columns (and hence cannot improve solution quality) is also a major step towards greater efficiency. Finally, a local search is carried out only for the best individuals of each population. These factors, taken in combination, help to explain the superior performance of this algorithm.

Table 4. Results for the W3 instance. Parameters: k = 30; maximum CPU time = 90 s. The best configuration was 1-Migrate with five populations for the MA and 2-Migrate with five populations for the GA. Each entry gives best solution / average / number of times the best was found / worst solution.

                                    Number of populations
               1              2              3              4              5
W3-MA
  0-Migrate    18/20.0/3/23   18/20.0/3/23   18/19.1/5/20   18/18.4/7/20   18/18.8/5/20
  1-Migrate    -              18/20.0/3/22   18/18.9/5/20   18/18.6/6/21   18/18.1/9/19
  2-Migrate    -              18/20.1/4/23   18/18.6/7/21   18/18.5/6/20   18/18.9/4/22
W3-GA
  0-Migrate    18/21.2/1/23   18/20.7/1/22   18/20.5/1/22   19/20.4/5/22   19/20.4/4/22
  1-Migrate    -              19/20.5/3/22   19/20.3/5/23   18/20.5/1/22   19/20.6/3/23
  2-Migrate    -              18/20.8/1/22   18/21.2/1/23   18/20.9/1/22   18/20.8/2/22

Table 5. Results for the W4 instance. Parameters: k = 60; maximum CPU time = 40 min. The best configuration was 1-Migrate with four populations for both MA and GA. Each entry gives best solution / average / number of times the best was found / worst solution.

                                    Number of populations
               1              2              3              4              5
W4-MA
  0-Migrate    29/30.5/4/34   28/30.9/1/36   28/30.6/1/34   28/30.4/2/35   28/30.3/2/34
  1-Migrate    -              29/30.9/2/34   28/30.6/2/35   27/29.4/2/34   27/30.2/1/34
  2-Migrate    -              28/31.2/1/36   28/30.0/4/34   28/29.7/2/32   28/29.4/3/30
W4-GA
  0-Migrate    29/31.5/2/35   29/31.1/4/36   29/31.5/3/34   29/31.5/3/33   29/31.4/4/34
  1-Migrate    -              29/31.0/4/34   29/31.3/3/36   29/30.5/5/33   29/31.2/3/35
  2-Migrate    -              29/31.4/2/34   29/31.4/2/35   29/31.2/3/34   29/31.3/2/34

The direct comparison between the MA and the GA shows the importance of the local search, especially
when the instances become larger. Considering the W4, for example, the GA was not able to find the value of 28 tracks even once. This means it missed the best value by two tracks. We believe that such failures tend to increase as the instances become larger because of the critical reduction of search power in the absence of a local search operator. Table 7 presents some MA-related statistics to give more information about the speed of the algorithm.

Table 6. Comparison of the number of individuals’ evaluations between the MA and the microcanonical approach presented in Linhares et al. (1999)

             Memetic algorithm                    Microcanonical approach
Instance     Min        Ave        Max            Min         Ave          Max
W2           3125       3523       5398           12892       19839        26541
V4470        32509      176631     451377         102976      1109036      2714220
X0           18136      43033      117384         52126       95253        187335
W3           79089      203892     495306         1700667     7143872      21289846
W4           3213532    9428591    15643651       24192291    167986282    405324093

Table 7. Statistics on the MA related to the number of individuals’ evaluations (figures are averages)

Instance     k value    Evaluations per local search    Evaluations per second
W2           10         379                             6906
V4470        20         548                             5702
X0           20         585                             5557
W3           30         749                             3374
W4           60         3242                            662

6. Conclusions

This work presented a multiple-population evolutionary approach to solve the Gate Matrix Layout problem. We used a memetic algorithm and a genetic algorithm as search engines. The numerical results, obtained from a standard set of industrial-sized instances, were very encouraging, and the best multi-population configurations require at least three populations evolving in parallel and exchanging individuals at a medium rate. The MA was successful in solving a very complex systems science problem, obtaining solutions with a quality that rivals that of the previous methods, while dramatically decreasing the computational effort involved. The number of evaluations was reduced by at least a factor of eight. The GA, however, was not able to match the MA performance in the larger instances.

Future studies should include the use of parallel techniques to distribute the populations and/or individuals through a computer network. Moreover, comprehensive studies on the local search theme must be carried out, since it is a critical part of the MA, and its influence on the general performance is very significant. This algorithm is included in a framework for general optimization called NP-Opt (Mendes et al. 2001). Although the MA utilizes special features, like the new local search, it runs in this general optimization environment, which facilitates the programming of optimization methods but reduces code flexibility and generally does not allow the data-structure tricks that are useful and common in processing-intensive tasks like optimization. We believe that further development and study of such frameworks may ultimately provide an invaluable tool to the field of systems science.
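The efficiency argument for the local search relies on the notion of critical columns. As a rough illustration, the sketch below computes the number of tracks of a layout and its critical columns, assuming the usual Gate Matrix Layout formulation in which each net spans every column between its leftmost and rightmost gate and the number of tracks equals the largest number of nets crossing any column. The function names and the tiny instance are hypothetical and do not come from NP-Opt.

```python
from typing import List, Sequence, Tuple


def column_loads(nets: Sequence[Sequence[int]], order: Sequence[int]) -> List[int]:
    """Number of nets crossing each column position of the layout.

    nets[i] lists the gates connected by net i; order is the left-to-right
    gate permutation. A net occupies every position between its leftmost
    and rightmost gate (inclusive).
    """
    position = {gate: pos for pos, gate in enumerate(order)}
    loads = [0] * len(order)
    for net in nets:
        if not net:
            continue                      # a net with no gates uses no track
        cols = [position[gate] for gate in net]
        for pos in range(min(cols), max(cols) + 1):
            loads[pos] += 1
    return loads


def tracks_and_critical(nets: Sequence[Sequence[int]],
                        order: Sequence[int]) -> Tuple[int, List[int]]:
    """Return (number of tracks, positions whose load equals that number)."""
    loads = column_loads(nets, order)
    tracks = max(loads, default=0)
    critical = [pos for pos, load in enumerate(loads) if load == tracks]
    return tracks, critical


if __name__ == "__main__":
    nets = [[0, 2], [1, 3]]               # two nets over four gates
    print(tracks_and_critical(nets, [0, 1, 2, 3]))
    print(tracks_and_critical(nets, [0, 2, 1, 3]))
```

Because the track count is the maximum load, a move can reduce it only if it lowers the load on every critical column; this is the intuition behind skipping moves that leave those columns untouched.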

Acknowledgements

This work was supported by the NBI program of the University of Newcastle, ‘Fundação de Amparo à Pesquisa do Estado de São Paulo’ (FAPESP, Brazil) and the PROPESQUISA program of the Getulio Vargas Foundation.

References

CANTÚ-PAZ, E., 1999, Topologies, Migration Rates and Multi-Population Parallel Genetic Algorithms. Technical Report No. 97007, Illinois Genetic Algorithms Laboratory (ILLIGAL), University of Illinois, USA.
CANTÚ-PAZ, E., 2000, Efficient and Accurate Parallel Genetic Algorithms (Dordrecht: Kluwer Academic).
DARWIN, C. R., 1993, The Origin of the Species (Random House).
DAWKINS, R., 1976, The Selfish Gene (Oxford: Oxford University Press).
DOWNEY, R. G., and FELLOWS, M. R., 1995, Fixed parameter tractability and completeness 1. Basic results. SIAM Journal on Computing, 24, 873–921.
FELLOWS, M. R., and LANGSTON, M. A., 1987, Non-constructive advances in polynomial-time complexity. Information Processing Letters, 26, 157–162.
FOO, S. K., SARATCHANDRAN, P., and SUNDARARAJAN, N., 1999, An evolutionary algorithm for parallel mapping of backpropagation learning on heterogeneous processors. International Journal of Systems Science, 30, 309–321.
FRANÇA, P. M., MENDES, A., and MOSCATO, P., 2001, A memetic algorithm for the total tardiness single machine scheduling problem. European Journal of Operational Research, 132, 224–242.
HASHIMOTO, A., and STEVENS, J., 1971, Wire routing by optimising channel assignment within large apertures. Proceedings of the 8th Design Automation Conference, pp. 155–169.
HU, Y. H., and CHEN, S. J., 1990, GM_Plan: A gate matrix layout algorithm based on artificial intelligence planning techniques. IEEE Transactions on Computer-Aided Design, 9, 836–845.
LENGAUER, T., 1990, Combinatorial Algorithms for Integrated Circuit Layout (New York: John Wiley).
LINHARES, A., 1999, Synthesizing a predatory search strategy for VLSI layouts. IEEE Transactions on Evolutionary Computation, 3, 147–152.
LINHARES, A., and YANASSE, H. H., 2002a, Connections between cutting-pattern sequencing, VLSI design, and flexible machines. Computers & Operations Research, 29, 1759–1772.
LINHARES, A., and YANASSE, H. H., 2002b, Local search intensity versus local search diversity: a false trade-off? Manuscript submitted for publication.
LINHARES, A., YANASSE, H., and TORREÃO, J., 1999, Linear Gate Assignment: a fast statistical mechanics approach. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 18, 1750–1758.
LUK, B. L., GALT, S., and CHEN, S., 2001, Using genetic algorithms to establish efficient walking gaits for an eight-legged robot. International Journal of Systems Science, 32, 703–713.
MENDES, A. S., FRANÇA, P. M., and MOSCATO, P., 2001, NP-Opt: an optimisation framework for NP problems. Proceedings of POM2001 - International Conference of the Production and Operations Management Society, pp. 82–89.
MENDES, A. S., MÜLLER, F. M., FRANÇA, P. M., and MOSCATO, P., 2002, Comparing meta-heuristic approaches for parallel machine scheduling problems. Production Planning & Control, 13, 143–154.
MÖHRING, R. H., 1990, Graph problems related to gate matrix layout and PLA folding. Computing, 7, 17–51.
MOSCATO, P., 1989, On evolution, search, optimisation, genetic algorithms and martial arts: towards memetic algorithms. Caltech Concurrent Computation Program, C3P Report 826.
MOSCATO, P., and NORMAN, M. G., 1992, A ‘memetic’ approach for the Travelling Salesman Problem. Implementation of a computational ecology for combinatorial optimisation on message-passing systems. In M. Valero, E. Onate, M. Jane, J. L. Larriba, and B. Suarez (eds.) Parallel Computing and Transputer Applications (Amsterdam: IOS Press), pp. 187–194.
NAKATANI, K., FUJII, T., KIKUNO, T., and YOSHIDA, N., 1986, A heuristic algorithm for gate matrix layout. Proceedings of International Conference of Computer-Aided Design, pp. 324–327.
SYSWERDA, G., 1991, Schedule optimization using genetic algorithms. In Handbook of Genetic Algorithms (New York: Van Nostrand Reinhold), pp. 332–349.
VIENNET, R., FONTEIX, C., and MARC, I., 1996, Multicriteria optimisation using a genetic algorithm for determining a pareto set. International Journal of Systems Science, 27, 255–260.
WEINER, J., 1995, The Beak of the Finch (New York: Vintage Books).
WING, O., HUANG, S., and WANG, R., 1985, Gate matrix layout. IEEE Transactions on Computer-Aided Design, 4, 220–231.
XIONG, N., 2001, Evolutionary learning of rule promises for fuzzy modelling. International Journal of Systems Science, 32, 1109–1118.
YANASSE, H. H., 1997, On a pattern-sequencing problem to minimize the number of open stacks. European Journal of Operational Research, 100, 454–463.