The Role of Social Interaction in the Evolution of Learning

Rory Smead
Northeastern University

April 29, 2013

Abstract

It is generally thought that cognition evolved to help us navigate complex environments. Social interactions make up one part of a complex environment and some have argued that social settings are crucial to the evolution of cognition. This paper uses the methods of evolutionary game theory to investigate the effect of social interaction on the evolution of cognition very broadly construed as strategic learning or plasticity. I delineate the conditions under which social interaction alone, apart from any additional external environmental variation, can provide the selective pressure necessary for the initial evolution of learning. Furthermore, it is argued that, in the context of social interactions, we should not expect traditional learners that "best-respond" to dominate the population. Consequently, it may be important to consider non-traditional learners when modeling social evolution.

1 Introduction

Cognitive systems are biologically expensive, delicate, and complex. How did such systems evolve? The primary function of cognitive systems is to enable adaptation, so it is natural to assume that complex environments will provide the best setting for their evolution. This is the environmental complexity thesis advanced by Godfrey-Smith ([1996], [2002a], [2002b]).


The role of complex environments in the evolution of learning and cognition has been recognized in many settings (see e.g. Real [1991]; Sterelny [2003]; Borenstein et al. [2008]). Social interactions form one part of a complex environment and some have argued they are particularly important for the evolution of cognition: intelligence evolved to navigate our social world (Humphrey [1976]; Byrne [1996]; Byrne and Bates [2007]). The potential importance of social interaction for the evolution of learning has also been investigated across many disciplines (see e.g. Harley [1981]; Maynard Smith [1982]; Rogers [1988]; Kirchkamp [1999]; Richerson and Boyd [2000]; Moyano and Sánchez [2009]; Zollman and Smead [2010]). However, there is not a broad consensus on the role that social interaction plays in the evolution of learning.

Questions around the evolution of cognition and intelligence have obvious appeal. Here, however, we will focus on the broader notion of behavioral plasticity—exemplified by behavioral learning.¹ Phrased in these terms, the question becomes: what role does social interaction play in the evolution of plasticity? To answer this question, we will use the tools of evolutionary game theory, where plasticity is represented as a kind of behavioral flexibility within strategic interactions. There are some recent studies that have used game-theoretic models to show that social interaction in certain settings can lead to the evolution of learning (Hamblin and Giraldeau [2009]; Dubois et al. [2010]; Arbilly et al. [2010]; Katsnelson et al. [2012]). However, these studies have largely been focused on specific games with a specific type of learning in mind. The models I will present below are more general in that they will apply to a broad class of games and a broad class of learning rules.
This paper will focus on determining if and when social interaction alone—in the form of a game, apart from other external complexity—can allow for the invasion and evolution of strategic plasticity. Despite the sole focus on game-theoretic interactions, the issue is a complex one, since the best way for one agent to learn will depend on the way that other agents are learning. Nevertheless, a precise characterization of the conditions for the evolution of learning in some general strategic settings is possible. After briefly presenting some preliminaries, the subsequent analysis is broken into two parts based on how we characterize the learning situation: learners adapting to a population and learners adapting to other specific individuals. The central finding is that situations involving unstable or polymorphic populations—populations composed of multiple distinct types—are most conducive to the evolution of learning. Additionally, if learners are able to respond to specific

individuals, they can thrive in social interactions. Furthermore, I will argue that there is good reason to expect settings of social interaction will select for "non-traditional" learning rules—e.g. learners that do not seek out best responses. As a consequence, we should not expect evolved populations to consist of individuals that uniformly learn in ways that produce best-response behavior.

1.1 Formal preliminaries

In the models presented here, social interaction will take the form of some two-player symmetric game (labeled G), which consists of a set of strategies S for each player and a payoff function π : S × S → ℝ which specifies the payoffs for every pair of strategies. A mixed strategy σ is a probability distribution over pure strategies. The set of mixed strategies will be represented as ∆S, and payoffs with respect to a mixed σ ∈ ∆S will be calculated as expected utilities. Let Br(σ) identify the set of best responses against σ. A strategy profile (σ, σ′) is a Nash equilibrium (NE) just in case each strategy is a best response to the other.

It will be helpful to analyze the game from the perspective of the population (Sandholm [2010]). We will assume an infinite population of randomly mixing individuals that each have some fixed strategy of G. For a game with n strategies, a population state can be represented with a vector x = (x_1, ..., x_n), where x_i represents the frequency of strategy i in the population. Note that playing against a random member of population x is equivalent to facing an individual playing the mixed strategy σ = x, where strategy i is played with probability x_i. We can calculate the fitness of strategy (or population) x against population y as follows:

F(x, y) = Σ_{j∈S} u(x, j) y_j = Σ_{i∈S} Σ_{j∈S} π(i, j) x_i y_j,

where x_i and y_j represent the frequencies of strategies i and j present in populations x and y respectively. A population x will be said to be at a NE just in case all the strategies present in x are best responses to x. The concept of evolutionary stability can be applied to populations. Suppose we introduce some proportion of new individuals into a population x, thus perturbing x and creating x′. If x does better relative to the perturbed state x′ than x′ does relative to itself, then x is evolutionarily stable:


Definition 1. A population x is an evolutionarily stable state (ES state) if and only if either

1. F(x, x) > F(x′, x) for all x′ ≠ x, or

2. F(x, x) = F(x′, x) and F(x, x′) > F(x′, x′) for all x′ ≠ x within some neighborhood of x.

With a single, infinite, and randomly mixing population, an ES state is equivalent to the well-known evolutionarily stable strategy (Maynard Smith and Price [1973]).² Since we will be specifically interested in the invasion of strategies, we can use the intuitions behind these stability concepts to introduce the notions of weak and strong invasiveness.

Definition 2. (i) A strategy s is weakly invasive with respect to x if and only if F(s, x) ≥ F(x, x), and (ii) s is strongly invasive if F(s, x) > F(x, x).³

The stability concepts above are important, but to get a full evolutionary picture, evolutionary dynamics need to be considered as well. Given the complexity of the models considered here, however, we will forego a full dynamical analysis and instead consider some illustrative examples that involve simulations with the discrete-time version of the replicator dynamic.⁴
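These definitions translate directly into code. The sketch below uses illustrative Hawk-Dove payoffs (V = 2, C = 4 — an assumption, since no specific game is fixed at this point) and includes the discrete-time replicator map mentioned above.

```python
import numpy as np

# Illustrative two-strategy symmetric game: Hawk-Dove with V = 2, C = 4
# (an assumed example; the definitions apply to any payoff matrix PI).
PI = np.array([[-1.0, 2.0],   # pi(hawk, hawk), pi(hawk, dove)
               [ 0.0, 1.0]])  # pi(dove, hawk), pi(dove, dove)

def fitness(x, y, pi=PI):
    """F(x, y) = sum_i sum_j pi(i, j) x_i y_j."""
    return x @ pi @ y

def at_nash(x, pi=PI, tol=1e-9):
    """x is at a NE iff every strategy present in x is a best response to x."""
    payoffs = pi @ x  # payoff of each pure strategy against the population
    return bool(np.all(payoffs[x > tol] >= payoffs.max() - tol))

def strongly_invasive(s, x, pi=PI):
    """s strongly invades x iff F(s, x) > F(x, x)."""
    return fitness(s, x, pi) > fitness(x, x, pi)

def replicator_step(x, pi=PI, offset=2.0):
    """One step of the discrete-time replicator map; `offset` shifts all
    payoffs positive, which the discrete form requires."""
    f = pi @ x + offset
    return x * f / (x @ f)

x_star = np.array([0.5, 0.5])  # the mixed ES state of this Hawk-Dove game
```

Iterating `replicator_step` from any interior state of this game converges to `x_star`, while an all-hawk population is not at a NE and is strongly invaded by dove.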

2 The model

The aim of the model is to specify the conditions under which learning individuals can invade populations of non-learners. Learners are capable of using different strategies in different circumstances on the basis of some response rule. For generality, we will say very little about the character of the response rules unless we are considering specific examples.⁵ In each model, we will assume an infinite population of randomly mixing individuals that either use pure strategies or are learners. The learners will be introduced to the game G by creating a "learning game" G^L which expands the strategy set to S^L, consisting of both learners (L_1, ..., L_m) and pure strategies (s_1, ..., s_n). The payoffs π^L of G^L will be a function of the payoffs of G and the description of the learning situation—described in detail within the models below. The evolutionary analysis will be applied to G^L using the standard concepts of evolutionary stability and the related concepts of invasion by mutants. We will also assume that each type of learner L_i has an associated cost c_i.⁶ The costs associated with learning or cognition may be due to a number of factors including metabolic costs, time required for learning, error in action, etc.⁷

Finally, we will be considering the relationships between two games at the population level with different numbers of types: a two-player symmetric game G and the learning game G^L. It will be convenient to extend a population x—which specifies a distribution of strategies in G—to the new game G^L in a unique way. To do this, we will simply let a population in G be the corresponding population in G^L where none of the learners are present. More precisely, we set x^L = (x^L_1, ..., x^L_{n+m}), where n is the number of pure strategies in G and m is the number of learners in G^L. Then set x^L_i = x_i for 1 ≤ i ≤ n and x^L_i = 0 for i > n. In other words, populations specified for game G are simply populations of G^L without any learners. Since we will never reduce a population from x^L to x, for notational simplicity, we will usually suppress the superscript and refer to a population x while assuming the straightforward extension.

There are two extremes in considering how agents will be learning to behave in social-interaction settings: responding to a population or responding to specific individuals. In the former case, agents cannot distinguish specific individuals and must learn what to do when playing "the field"—a sequence of plays against randomly chosen individuals. In the latter case, agents face each opponent repeatedly, learning what to do within each interaction, and may learn different behaviors for different interaction partners. These two alternatives represent the extreme points of a continuum where agents learn to interact with larger or smaller subsets of a population—or where agents have less or more discriminatory ability.
Section 3 considers the case where learners are adapting to a population and Section 4 considers the case where learners are adapting to other individuals.
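The extension from x to x^L is mechanical; a minimal sketch:

```python
import numpy as np

def extend_population(x, m):
    """Extend a population x over the n pure strategies of G to a population
    of the learning game G^L with m learner types, none of them present:
    x^L_i = x_i for 1 <= i <= n and x^L_i = 0 for i > n."""
    return np.concatenate([np.asarray(x, dtype=float), np.zeros(m)])
```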

3 Adapting to the population

Suppose individuals are facing others from a population and learners can only adapt their behavior with respect to the behavior of the whole population. This may be because agents cannot distinguish individuals, have no memory of specific interactions, or simply do not have repeated interactions with specific individuals. We assume learners have a large number of interactions with others and have an adaptive rule which specifies some behavior given the history of their interactions. We will also make the idealizing assumption that these individuals adapt very quickly relative to their life-span, so that the payoffs they receive are relatively stable—i.e. the learned behavior stabilizes quickly relative to evolutionary time and the early interactions used for learning do not have a large bearing on the resulting payoffs from the learned behavior.

To model this situation, we will suppose an infinite, randomly mixing population of individuals are paired to play a one-shot game G with one another. This pairing is repeated, with new partners, many times, each partner chosen independently and at random. There are pure-strategy types in the population, which always behave the same way, as well as adaptive individuals L_i, which behave according to some adaptive rule. Since we are modeling learners as responding to the population, they do not have a simple fixed behavior, and it becomes necessary to distinguish between the behavior of the population and the distribution of types in the population. If G has n strategies, the distribution of behavior in a population can be represented with a vector y = (y_1, ..., y_n), where y_i represents the overall frequency with which strategy s_i is used by both learners and non-learners. Without learners in the population, the behavioral distribution is identical to the population distribution, since strategy types uniquely identify behavior. However, with learners, this will not be the case. We will let β(x) = y represent the long-run behavioral distribution for a population x according to the strategy types in x and the nature of the learners present in x.⁸ Each learner L_i will adopt some pure or mixed strategy of game G in response to the distribution of behavior in the rest of the population. Let r_i(y) = σ represent the adaptive function for L_i, such that L_i adopts strategy σ in response to population behavior y; r_i(β(x)) = σ means that L_i adopts σ in response to the behavior β(x).
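Because learners respond to a behavior that their own play helps determine, β is characterized by a fixed-point condition. The sketch below solves it by damped iteration for a single learner type; the logit response rule and the Hawk-Dove payoffs are illustrative assumptions, not choices made in the paper.

```python
import numpy as np

# Assumed example game: Hawk-Dove with V = 2, C = 4.
PI = np.array([[-1.0, 2.0],
               [ 0.0, 1.0]])

def logit_response(y, tau=0.5):
    """A smoothed best response to population behavior y (illustrative)."""
    w = np.exp(PI @ y / tau)
    return w / w.sum()

def beta(x_pure, x_learner, response=logit_response, iters=1000):
    """Long-run behavior y = beta(x) for a population with pure-strategy
    frequencies x_pure and a single learner type of total frequency
    x_learner: y must solve y = x_pure + x_learner * response(y)."""
    n = len(x_pure)
    y = x_pure + x_learner / n  # start: the learner behaves uniformly
    for _ in range(iters):
        # damped fixed-point iteration
        y = 0.5 * y + 0.5 * (x_pure + x_learner * response(y))
    return y
```

For example, `beta(np.array([0.4, 0.1]), 0.5)` returns the stabilized behavioral distribution for a population of 40% fixed hawks, 10% fixed doves, and 50% learners.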
Because r_i and β are interdependent, the payoffs of G^L will not be fixed. The payoffs will change, possibly in very complex ways, with respect to the behavior and composition of the population. This is unusual with respect to the traditional analysis of evolutionary games, in which the payoffs between strategies remain constant and only the composition of the population changes. The fitness functions for G^L will be:

F^L(s, x) = F(s, β(x)) for any s ∈ S,
F^L(L_i, x) = F(r_i(β(x)), β(x)) − c_i for any L_i,

where c_i represents the cost of learning rule L_i. This makes a full evolutionary analysis of populations containing learners impossible without specifying

the response rules. However, we are still able to consider whether or not learners might be able to invade a population x devoid of learners. We will refer to the situation described by this model as "learners responding to the population." It is now possible to determine the conditions for an invasion of learning with respect to a non-learning population. The pair of propositions below shows that the key condition for the invasion of learners in this setting is the instability of the population.

Proposition 1. If learners respond to the population, c > 0, population x has no learners, and x is at a NE of G, then no learner L is weakly invasive with respect to x.

Proof. Suppose L responds to the population, c > 0, and x is without learners and is at a NE of G. Since x has no learners, β(x) = x. Without loss of generality, let i ∈ S be a strategy in the support of x and j ∈ S be a strategy in the support of r_L(x). Since x is at a NE, F(i, x) ≥ F(s, x) for all s ∈ S; hence F(i, x) ≥ F(j, x) for any j in the support of r_L(x). It follows that F(i, x) ≥ F(r_L(x), x). We now have the following:

F^L(L, x) < F^L(i, x)
iff F(r_L(x), x) − c < F(i, x)
iff c > F(r_L(x), x) − F(i, x).

Any c > 0 will satisfy the right side of the above inequality, so L is not weakly invasive with respect to x.

This proposition shows that if a population is at a NE with respect to G it cannot be invaded by any learner in G^L. This implies that any x which is evolutionarily stable with respect to G cannot be invaded by a learning type. The next proposition shows that some L will be able to invade a population that is not at a NE.

Proposition 2. If learners respond to the population, c > 0 is sufficiently small, x has no learners, and x is not at a NE of G, then there is some L that is strongly invasive with respect to x.

Proof. Suppose L responds to the population, c > 0, and x has no learners, so β(x) = x, and is not at a NE of G. By hypothesis, there is some s in the support of x that is not a best response to x and hence some t ∈ S such that F(s, x) < F(t, x). Let σ = (y_1, ..., y_n) ∈ ∆S be such that y_i = x_i for i ≠ s

and y_t = x_s; σ is the mixed strategy just like x except that it plays t in place of s. By virtue of F(s, x) < F(t, x), we have F(x, x) < F(σ, x). Thus, for any L such that r_L(x) = σ we have the following:

F^L(L, x) > F^L(x, x)
iff F(r_L(x), x) − c > F(x, x)
iff c < F(σ, x) − F(x, x).

Since F(x, x) < F(σ, x), we know that there are sufficiently small c > 0 such that L will be strongly invasive with respect to x.

Taken together, these two propositions show that if learners are to invade a population in this setting, the population must be unstable in the strong sense of not being at a Nash equilibrium.⁹ These results, however, say nothing about whether or not selection will be able to sustain learners in stable populations. And, without specifying the character of the response functions, we cannot know β(x) for any x that includes learners; hence we will not be able to analyze such populations. Informally, there is good reason to suspect that learners will not be sustained by selection in this setting. Since learners always pay some positive cost and are restricted to the strategies in ∆S, an individual who adopts the best pure strategy in the support of r_L will presumably do just as well as the learners but without paying the cost, and hence be able to invade. This intuition is corroborated by Smead ([2012]) for any learning rule that always leads the population to an equilibrium. This result is proven within a model of evolving learning rules similar to the one presented above. The reason for the instability of learners is that learning is only advantageous to individuals when the population is not in equilibrium. If a learning rule that brings the population to an equilibrium is pervasive, then other individuals that partially mimic equilibrium behavior are quickly accommodated by the native population and do just as well—or better if learning is costly. On the other hand, if a learning rule leads to a state that is not an equilibrium, there are behavioral responses available to potential invaders that would be above average in the population. In either case, there is reason to think a single learning type will not be evolutionarily stable unless the learning rule generates continuous change in behavior.
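Proposition 2's invasion threshold is easy to exhibit numerically. The sketch below uses assumed Hawk-Dove payoffs (V = 2, C = 4): an all-hawk population is not at a NE, so a learner whose response to it is the all-dove strategy σ invades whenever its cost falls below the fitness gap.

```python
import numpy as np

# Assumed Hawk-Dove payoffs (V = 2, C = 4); not a game fixed by the paper.
PI = np.array([[-1.0, 2.0],
               [ 0.0, 1.0]])

def F(x, y):
    """Population-level fitness F(x, y) = x . PI . y."""
    return x @ PI @ y

x = np.array([1.0, 0.0])            # all-hawk population; beta(x) = x
sigma = np.array([0.0, 1.0])        # r_L(x): play dove against this population
threshold = F(sigma, x) - F(x, x)   # L is strongly invasive iff c < threshold
```

Here the threshold equals 1: any learner whose cost is below the hawk-vs-hawk fitness deficit is strongly invasive.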


3.1 Learning an equilibrium

Maynard Smith ([1982]) suggests that we ought to begin the investigation of the evolution of learning rules by looking at which rules do well when "playing the field," and then apply these rules to repeated contests between individuals (p. 56).¹⁰ Furthermore, both he and Harley ([1981]) argue that any evolutionarily stable learning rule, in this population-level setting, will be one that takes the population behavior to the evolutionarily stable strategy of the game.¹¹ The model above has given us reasons to doubt both of these claims. The first claim is dubious because no stable population will allow for the invasion of learners who are "playing the field" and, as we will see in the next section, there is no such restriction when learning occurs in contests between individuals. Consequently, there is no obvious reason to think "playing the field" is a setting that is particularly conducive to the evolution of learning. The second claim, that learning will lead to the evolutionarily stable strategy, is also dubious. As mentioned above, learning rules which lead the population to an equilibrium, including the evolutionarily stable strategy, can be invaded by other learning rules, or by non-learners that adopt similar behaviors.

In summary, population-level interactions, if they are unstable, can provide the selection pressure necessary for the initial invasion, but probably not the complete evolution, of learning. Furthermore, these settings do not necessarily suggest anything about the character of the learners that evolve. The only conclusion we can draw in this regard is that they need not be rules that lead to equilibria of games, and that rules which lead to continuous change in behavior may be important. However, there are more ways that social interaction could influence the evolution of learners, particularly if learners are responsive to other specific individuals rather than average population behavior.
As we will see below, such a setting is more conducive to the evolution of learning.

4 Adapting to individuals

Suppose now that agents can adapt specifically to each opponent they face. Agents may adopt different behaviors within different interactions. This case may represent a setting where agents have a good memory of specific


individuals and have repeated interactions with them. Alternatively, this model could represent a setting where agents can somehow identify the type of their opponent and adopt a type-specific behavior. Learners may adopt a different behavior for each individual, or type, with whom they interact. To model this, we will assume an infinite population of individuals who are randomly paired to play a game G that is repeated. In this repeated game, individuals use either a fixed pure strategy or some learning rule L_i. These learning rules specify a behavior on the first round of the game and provide a method for the individual to modify her behavior as the game proceeds. Each learner L_i will also be assumed to carry an additional fitness cost c_i > 0.

We will assume that play continues indefinitely and that for any two strategies i, j ∈ S^L there is a well-defined function α_{ij} : S × S → [0, 1] which specifies a probability distribution over pure-strategy profiles in G given an individual of type i and an individual of type j. α_{ij} gives us the long-run average frequency of game play between any two individuals. The character of α_{ij} will be left largely unspecified for learners and will leave the pure strategies unchanged. The constraints on α_{ij} are as follows:

1. α_{st}(s, t) = 1 if s, t ∈ S,

2. Σ_{s∈S} α_{it}(s, t) = 1 if t ∈ S (and similarly with i, t reversed),

3. Σ_{s∈S} Σ_{t∈S} α_{ij}(s, t) = 1.

To calculate the payoffs of different kinds of learners, we can create a learning game G^L, which will be a type-game derived from G and the character of each learner. The payoff for two strategies in S^L will be the payoff of their long-run interactions in the repeated G.¹² The payoff function for G^L can be defined as follows:

π^L(i, j) = Σ_{s∈S} Σ_{t∈S} α_{ij}(s, t) π(s, t).
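Once α is specified, π^L is a straightforward weighted sum. A minimal sketch, where both the payoffs (Hawk-Dove, V = 2, C = 4) and the particular α (a learner meeting a fixed hawk and ending up always playing dove) are illustrative assumptions:

```python
import numpy as np

PI = np.array([[-1.0, 2.0],   # strategies ordered (hawk, dove)
               [ 0.0, 1.0]])

def pi_L(alpha, pi=PI):
    """pi^L(i, j) = sum_s sum_t alpha_ij(s, t) * pi(s, t), where
    alpha[s, t] is the long-run frequency of the profile (s, t)."""
    assert abs(alpha.sum() - 1.0) < 1e-9  # constraint 3: a distribution
    return float((alpha * pi).sum())

# Learner vs a fixed hawk: all long-run mass on (dove, hawk). This also
# satisfies constraint 2: the hawk column sums to 1.
alpha_L_vs_hawk = np.array([[0.0, 0.0],
                            [1.0, 0.0]])
```

For two pure strategies, α puts all mass on their single profile, so `pi_L` reduces to the stage-game payoff.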

For pure strategies s and t, π^L(s, t) = π(s, t). But for learners, the payoff is some weighted sum of the possible payoffs in G. The associated utility and fitness functions can then be defined in the usual way for G^L. We will refer to this model as learners "responding to individuals." We would like to consider whether or not this setting allows for possibilities for the invasion and evolution of learners that were not seen in the previous model. The first thing we can show is that there will be a restriction on the sorts of populations that learners can invade: learners cannot invade stable monomorphic populations.

Proposition 3. If learners respond to individuals, c > 0, x has no learners, and x is playing a pure-strategy NE of G, then no L is weakly invasive with respect to x.

Proof. Suppose that learners respond to individuals, c > 0, and population x is playing a pure-strategy NE of G. Let s be the unique strategy represented in x, so x_s = 1. It suffices to show that F^L(L, s) < F^L(s, s). Since L and s are pure strategies of G^L, we have

F^L(L, s) < F^L(s, s)
iff π^L(L, s) − c < π^L(s, s)
iff c > Σ_{t∈S} Σ_{v∈S} α_{Ls}(t, v) π(t, v) − π(s, s)
iff c > Σ_{t∈S} α_{Ls}(t, s) π(t, s) − π(s, s)

by the definition of π^L and constraint 2 on α. By hypothesis π(t, s) ≤ π(s, s) for all t ∈ S, hence Σ_{t∈S} α_{Ls}(t, s) π(t, s) ≤ π(s, s). Therefore, for any c > 0, F^L(L, s) < F^L(s, s).

It follows from Proposition 3 that the extension of any monomorphic ES state x of G will be a monomorphic ES state of G^L. This proposition also means that the strategic settings most conducive to the evolution of learners are games without symmetric pure-strategy Nash equilibria, because these games cannot have evolutionarily stable monomorphic populations. This class of games includes "competitive" interactions such as Hawk-Dove. On the other hand, games with symmetric pure-strategy equilibria, such as coordination games, the Stag Hunt, or the Prisoner's Dilemma, will not be conducive to the invasion of learning strategies. It is possible to identify precisely a broad class of games where learning can invade a population of non-learners.

Proposition 4. For any game G and any polymorphic population x without learners such that for some s represented in x and some t ∈ S, π(s, s) < π(t, s): if learners respond to individuals and c is sufficiently small, then there exists an L that is strongly invasive with respect to x.


Proof. Suppose learners respond to individuals, and x has no learners and is polymorphic such that for some s represented in x and some t ∈ S, π(s, s) < π(t, s). It suffices to construct an L such that F^L(L, x) > F^L(x, x). Let L play the best response to any pure-strategy opponent, so that π^L(L, v) = π(Br(v), v) for each v ∈ S. We now have F^L(L, x) = Σ_{i∈S} (π(Br(i), i) − c) x_i and F^L(x, x) = Σ_{i∈S} u(x, i) x_i. Then

F^L(L, x) > F^L(x, x)
iff (Σ_{i∈S} π(Br(i), i) x_i) − c > Σ_{i∈S} u(x, i) x_i
iff c < Σ_{i∈S} π(Br(i), i) x_i − Σ_{i∈S} u(x, i) x_i.

By definition π(Br(i), i) ≥ u(x, i) for all i ∈ S. And for i = s, by hypothesis there is a t ∈ S such that π(t, s) > π(s, s), so s is not a best response to itself; since s is in the support of x, x_s > 0 and u(x, s) < π(Br(s), s). Thus Σ_{i∈S} π(Br(i), i) x_i > Σ_{i∈S} u(x, i) x_i. Therefore, for a sufficiently small c > 0, F^L(L, x) > F^L(x, x) and L is strongly invasive with respect to x.

This proposition suggests that polymorphic populations will provide a good setting for the invasion of learners, even if these polymorphic populations are evolutionarily stable with respect to G. The condition for the successful invasion of learning is very weak: if at least one strategy s present in the population is not a best response to itself, s ∉ Br(s), learning can invade. Furthermore, the way the invading learner is constructed in the proof of Proposition 4 suggests that a traditional "best-response" sort of learning may be important for the invasion of learners.¹³ However, we should be cautious with respect to this second point because there is also reason to believe learners that seek best responses will not be able to dominate the population.
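Proposition 4's threshold can be computed directly. In the Hawk-Dove game with assumed payoffs V = 2, C = 4, the polymorphic ES state x = (1/2, 1/2) has π(h, h) < π(d, h), so a learner that best-responds to each individual opponent invades for small enough c:

```python
import numpy as np

# Assumed Hawk-Dove payoffs (V = 2, C = 4), strategies ordered (hawk, dove).
PI = np.array([[-1.0, 2.0],
               [ 0.0, 1.0]])
x = np.array([0.5, 0.5])  # the polymorphic ES state of this game

# sum_i pi(Br(i), i) x_i : what the best-response learner earns, gross of c
br_payoff = sum(PI[:, i].max() * x[i] for i in range(2))
# sum_i u(x, i) x_i : what the native population earns against itself
native = sum((x @ PI[:, i]) * x[i] for i in range(2))
threshold = br_payoff - native  # the learner invades whenever c < threshold
```

With these payoffs the threshold works out to 1/2, so the learner invades the evolutionarily stable polymorphism for any c < 0.5.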

4.1 Learning a best-response and non-traditional learners

Many of the learning rules considered in game theory adopt a best response, or at the very least, tend toward best responses in some way or another. This is true of learning rules ranging from simple reinforcement learning (Roth


and Erev [1995]; Beggs [2005]) to more sophisticated Bayesian learning (Fudenberg and Levine [1998]). As we saw in the proof of Proposition 4, such learning rules may be good at invading populations. However, this does not mean that we should expect these learning rules to fixate in the population. Suppose we have a population that consists entirely of learners that always adopt some best response against their opponent. If learners pay a cost, all that is required for a pure strategy to invade such a population is to play a strategy that does as well against the learners as learners do against one another. Many games will have such a strategy. More precisely, any game with a strict Nash equilibrium that is optimal for one of the players has such a strategy. To see this, note that a strict Nash equilibrium is one in which each player would do strictly worse by changing her strategy. Furthermore, a Nash equilibrium is optimal for one player if and only if that player does at least as well in that Nash equilibrium as in any other. In this case, the strategy that does the best against its best response will be able to invade a population of best-response learners. Such a strategy is part of a Nash equilibrium that will generate the highest possible payoff against its best response, matching the best possible payoff that a learner could earn against another learner. And, since non-learners that have such a strategy will not pay the costs of learning, they will do strictly better than the natives in a population of learners. In other words, when other players are learning a best response, it is better to deterministically use a strategy that does well against its best response and let the other players do the learning (see Smead and Zollman [2009] for a formal analysis of this case).

This point is perhaps best illustrated by an example. Consider a Hawk-Dove game with learning. There are three strategies in the population: h, d, and L.
The payoffs between the pure strategies h and d are simply those of a standard Hawk-Dove game. We assume that L will adopt a best response to her opponent—h against d and d against h. When two learners play each other, we assume that they reach one of the two pure-strategy Nash equilibria, (h, d) and (d, h), with equal probability. The type-game G^L can be expressed in the matrix form shown in Table 1.¹⁴ We will use computer simulations with the discrete-time replicator dynamic to explore the properties of this game. For c = 0.01, every simulation results in a mixture of h and L. The global dynamics are shown in Figure 1. Learning invades non-learning populations and is also maintained by selection. However, learning does not dominate the population, and is only

sustained in a polymorphic mix. This is very different from what one would see in the corresponding game in the adapting-to-populations model. If learners adapt to the population in this case and adopt a best response to the population, they are able to invade some populations but are always eliminated by selection (see Smead [2012] for details on the case where learners adapt to the population). In both cases, learning rules which learn best responses do not go to fixation in populations playing Hawk-Dove games. Given these insights and Proposition 4 above, we should expect that if learners are to dominate the population it will most likely involve learning rules that do not lead to best responses. In other words, when investigating whether learning is likely to evolve in settings of social interaction, it is important to consider learning rules that are "non-traditional"—learning rules that do not adopt best responses to their opponent's behavior. The nature of these non-traditional learners is difficult to predict without specifying some space of possible learning rules. This places a complete investigation of this issue beyond the scope of the current paper. However, using the models presented here, we can provide some illustrative examples that demonstrate some evolutionary differences between non-traditional learners and best-response learners.

Consider the following learning rule, which we will call "competitive response": choose an initial strategy at random, observe your payoff and the payoff of your opponent, and switch to a new strategy if and only if your opponent received a higher payoff. Let L_cr denote this type of learner. This rule often will not lead to best responses. For instance, in the Hawk-Dove game considered above, this rule would adopt the strategy h in response to an opponent playing h.
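The competitive-response rule is simple enough to simulate directly. A minimal sketch, assuming Hawk-Dove payoffs V = 2, C = 4 (an illustrative choice, not the paper's Table 1):

```python
import random

# Assumed Hawk-Dove payoffs: pi[(my_move, opponent_move)].
PI = {('h', 'h'): -1, ('h', 'd'): 2, ('d', 'h'): 0, ('d', 'd'): 1}

def play_cr_vs_cr(rounds=50, rng=random):
    """Two competitive-responders: each starts at random and switches
    strategy iff the opponent's last payoff was strictly higher."""
    a, b = rng.choice('hd'), rng.choice('hd')
    for _ in range(rounds):
        pa, pb = PI[(a, b)], PI[(b, a)]
        a_next = ('d' if a == 'h' else 'h') if pb > pa else a  # toggle iff beaten
        b_next = ('d' if b == 'h' else 'h') if pa > pb else b
        a, b = a_next, b_next
    return a, b
```

In this game the rule absorbs quickly: (h, d) and (d, h) each collapse to (h, h) in one step, while (h, h) and (d, d) give equal payoffs and never change.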
Two competitive-responders playing against one another in the Hawk-Dove game will eventually end up playing (h, h) or (d, d), neither of which corresponds to a Nash equilibrium of the game. Despite this, the competitive-response rule does very well when placed in that game along with the non-learning strategies and best-response learners. With all four types included and c = 0.1, the non-learners go extinct and the evolved population is a stable polymorphism of competitive-responders and best-responders. Such stable polymorphic populations with multiple types of learners also regularly occur in models of individual and social learning (see e.g. Rogers [1988]; Richerson and Boyd [1999]). Furthermore, this same competitive-response learning rule can do very well in situations where best-response learners do poorly. Consider the

Prisoner’s Dilemma game with the two pure strategies c and d and the competitive-response learning rule Lcr. The payoff matrix for GL is presented in Table 2. In this case, the initial invasion of learners is difficult. However, if there are enough competitive-response learners in the population, they can drive the non-learners to extinction. The global dynamics of this game with the discrete-time replicator dynamics and c = 0.1 are shown in Figure 2. A population consisting entirely of Lcr will be evolutionarily stable for any c < 0.25. Placing best-response learners in a similar game generates very different results: all populations result in fixation of the non-learning pure strategy d. If all four types are placed in the same population, there are two possible outcomes: a uniform population of d or a uniform population of Lcr. This shows that there may be cases where non-traditional learning rules are evolutionarily stable and traditional learning rules are not.

In summary, there is reason to think that best-response type learning rules are unlikely to fully evolve in the context of social interaction. However, this does not mean that other kinds of learning rules cannot. Consequently, if learners are to dominate the population, some of them will not be traditional best-response type learners.

This conclusion may have methodological implications for modeling learning in the social sciences and perhaps the study of strategic plasticity more generally. For instance, it has been observed by Skyrms ([2004]) that different learning mechanisms will result in different social contracts. The examples above suggest that we should expect some portion of evolved populations to learn in ways that do not directly track best-responses. Consequently, selecting an appropriate learning rule for modeling purposes may be both more difficult and more significant than previously thought.
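The c < 0.25 threshold can be read directly off Table 2: a resident population of Lcr earns 1.25 − c against itself, while a mutant c or d earns at most 1 against Lcr. A minimal check, written against the Table 2 payoffs (the function names are mine, for illustration only):

```python
# Checks when an all-L_cr population resists c- and d-mutants in the
# Prisoner's Dilemma learning game of Table 2 (payoffs to the row player,
# with learning cost `cost`). Strategy 'L' stands for L_cr.

def payoffs(cost):
    return {
        ('c', 'c'): 2.0, ('c', 'd'): 0.0, ('c', 'L'): 1.0,
        ('d', 'c'): 3.0, ('d', 'd'): 1.0, ('d', 'L'): 1.0,
        ('L', 'c'): 2.5 - cost, ('L', 'd'): 1.0 - cost, ('L', 'L'): 1.25 - cost,
    }

def lcr_is_stable(cost):
    """L_cr strictly outscores every mutant against a resident L_cr
    population iff 1.25 - cost > 1, i.e. cost < 0.25."""
    u = payoffs(cost)
    return all(u[('L', 'L')] > u[(m, 'L')] for m in ('c', 'd'))

print(lcr_is_stable(0.1), lcr_is_stable(0.3))   # → True False
```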
Furthermore, certain learning rules that are widely applied in modeling social evolution, such as Roth-Erev reinforcement learning (see e.g. Skyrms [2010]), may not be the kind of learning rules we would expect to evolve in the context of social interaction. Thus, the use of these learning rules in modeling social interaction may not be appropriate without some additional justification—theoretical or empirical.

5

Conclusion

We have examined two models of the invasion or non-invasion of learners with respect to social interaction: when learners are responding to other individuals and when they are responding to a population. These represent

two extremes on a continuum of learning situations and included some substantial idealizing assumptions—to be discussed below. Nevertheless, we are able to draw several conclusions from the results of these two models.

1. Under certain conditions—regarding the game being played and the composition of the population—social interaction alone is able to provide the selective pressure required for the invasion of learning.

2. The ability to adapt to specific individuals is much more conducive to the evolution of learning than the ability to adapt to population behavior. Hence, this distinction is important for understanding the role of social interaction in the evolution of learning.

3. The games that are conducive to the invasion of learners are those associated with unstable or polymorphic populations.

4. If learning is to dominate the population in settings of social interaction, it will likely involve “non-traditional” learners that do not directly seek best-responses.

More generally, the results presented above suggest that social interaction may have an important role to play in the evolution of learning, one that is not a straightforward generalization of environmental complexity. The interaction of learning rules becomes significant: the best way for an individual to learn depends, very strongly, on the way others are learning. A further and more detailed investigation into the evolution of learning will require specifying the learning rules of interest so that their specific behavior—and payoffs—can be determined. Furthermore, relatively little has been said about the effects of social interaction on the character of the learners that might evolve in those settings. I have argued that we should expect “non-traditional” learning rules to be important in social interaction, but what varieties of non-traditional learning rules are important and how they might work has not been specified apart from an illustrative example.
It is also important to keep in mind the limitations of the models presented here. We have assumed that populations face a single fixed game and that learners have only the ability to change behavior over time. However, real learning does more than simply offer behavioral flexibility in games. For instance, learning may invent entirely new strategies and thus change the underlying interaction. In this respect, the learners considered

here are relatively limited, and it would be worthwhile to consider more sophisticated learners. Additionally, it would be possible to consider models where multiple games are played simultaneously, or models where the game changes over time (Hashimoto and Kumagai [2003]; Bednar and Page [2007]; Zollman [2008]). Such settings are richer and more interesting than the cases considered here, and models that include these possibilities may offer important results. Despite these limitations, the models here have provided a set of clear results that can form a starting point for a detailed investigation of the role of social interaction in the evolution of learning.

Acknowledgements

I would like to thank Brian Skyrms, Jeffrey Barrett, Simon Huttegger, Louis Narens, P. Kyle Stanford, Elliott Wagner, Kevin Zollman, and two anonymous referees for helpful comments and suggestions on several earlier drafts of this paper.

Department of Philosophy & Religion
Northeastern University
360 Huntington Ave.
Boston, MA 02115, USA
[email protected]

Notes

1. We will use the terms “learning”, “cognition”, and “plasticity” interchangeably: the models provided below are intended to be applicable to any variety of social plasticity. Whether or not the term “learning” or “cognition” is appropriate in a particular situation will most likely depend on the way in which plasticity is achieved, which is not specified in the models considered below. See Christensen ([2010]) for a recent discussion of the evolution of cognition.

2. For a discussion and analysis of when these two concepts come apart, see Bergstrom and Godfrey-Smith ([1998]); Thomas ([1984]).

3. Note that even if a strategy is weakly invasive it may not be able to invade—this will depend on the second condition for an ES state relative to the population. But if a strategy is not weakly invasive, it cannot invade. Weak and strong invasiveness are similar and related to the concept of strong uninvadability introduced by Bomze and Weibull ([1995]).


4. For a discussion of the discrete-time replicator dynamic, see Weibull ([1995]). This dynamic and its continuous-time version (Taylor and Jonker [1978]) have close connections to the concepts set out above (see e.g. Hofbauer and Sigmund [1998]). It should be noted, however, that the presence of learners in the population complicates matters and the usual connections may no longer hold.

5. We will also say very little about the sort of input/information available to the learners. Instead, we simply specify the range of behaviors available to the learners, which may depend on a number of factors, including what information the learners use.

6. Where only one type of learning is considered, we will omit the associated index.

7. The general nature of the costs associated with plasticity is still not well understood (for a discussion of such costs generally, see DeWitt et al. [1998]; Auld et al. [2010]).

8. It is important to note that we are assuming that there is such a function β(·), but that may not be the case for all games and all types of learners—it is possible that learners exhibit chaotic behavior and hence do not have a determinate long-run behavioral distribution (see e.g. Wagner [2012]).

9. For instance, a population that is currently in evolutionary transition would provide an ideal setting for the invasion of learners.

10. For a numerical study of evolutionarily stable learning rules in a somewhat similar setting, see Josephson ([2008]).

11. Harley ([1981]) also attempted to examine the limiting form of ES learning rules and the properties of the relative-payoff-sum learning rule. However, this study has been called into doubt (for a dialogue concerning the validity of these claims, see Harley [1983], [1987]; Houston [1983]; Houston and Sumida [1987]; Tracy and Seaman [1995]).

12. This idealizing assumption allows us to create a fixed payoff matrix for GL, making the analysis much simpler.

13. “Best-response” should be read very loosely here. Given the idealizations above, all that is required is that the learners eventually converge on some best response in G—a property held by many learning dynamics studied in game theory.

14. Since the game is symmetric, only the payoffs for the row player are shown.

References

Arbilly, M., Motro, U., Feldman, M. W., and Lotem, A. [2010]: ‘Co-evolution of learning complexity and social foraging strategies’, Journal of Theoretical Biology 267, pp. 573–581.

Auld, J. R., Agrawal, A. A., and Relyea, R. A. [2010]: ‘Re-evaluating the costs and limits of adaptive phenotypic plasticity’, Proc. R. Soc. B 277, pp. 503–511.

Bednar, J. and Page, S. [2007]: ‘Can Game(s) Theory Explain Culture? The Emergence of Cultural Behavior within Multiple Games’, Rationality and Society 19, pp. 65–97.

Beggs, A. W. [2005]: ‘On the convergence of reinforcement learning’, Journal of Economic Theory 122, pp. 1–36.


Bergstrom, C. T. and Godfrey-Smith, P. [1998]: ‘On the evolution of behavioral heterogeneity in individuals and populations’, Biology and Philosophy 13, pp. 205–231.

Bomze, I. M. and Weibull, J. W. [1995]: ‘Does neutral stability imply Lyapunov stability?’, Games and Economic Behavior 11, pp. 173–192.

Borenstein, E., Feldman, M. W., and Aoki, K. [2008]: ‘Evolution of learning in fluctuating environments: When selection favors both social and exploratory individual learning’, Evolution 62, pp. 568–602.

Byrne, R. W. [1996]: ‘Machiavellian intelligence’, Evolutionary Anthropology 5, pp. 172–180.

Byrne, R. W. and Bates, L. A. [2007]: ‘Sociality, evolution and cognition’, Current Biology 17, pp. R714–R723.

Christensen, W. [2010]: ‘The decoupled representation theory of the evolution of cognition–a critical assessment’, British Journal for the Philosophy of Science 61, pp. 361–405.

DeWitt, T. J., Sih, A., and Wilson, D. S. [1998]: ‘Costs and limits of phenotypic plasticity’, Trends in Ecology & Evolution 13, pp. 77–81.

Dubois, F., Morand-Ferron, J., and Giraldeau, L. A. [2010]: ‘Learning in a game context: strategy choice by some keeps learning from evolving in others’, Proc. R. Soc. B 277, pp. 3609–3616.

Fudenberg, D. and Levine, D. K. [1998]: The Theory of Learning in Games, MA: MIT Press.

Godfrey-Smith, P. [1996]: Complexity and the Function of Mind in Nature, Cambridge: Cambridge University Press.

Godfrey-Smith, P. [2002a]: ‘Environmental complexity, signal detection and the evolution of cognition’, in M. Bekoff, C. Allen and G. M. Burghardt (eds), 2002, The Cognitive Animal, MA: MIT Press, pp. 135–141.

Godfrey-Smith, P. [2002b]: ‘Environmental complexity and the evolution of cognition’, in R. Sternberg and J. Kaufman (eds), 2002, The Evolution of Intelligence, Mahwah: Lawrence Erlbaum, pp. 233–249.

Hamblin, S. and Giraldeau, L. A. [2009]: ‘Finding the evolutionarily stable learning rule for frequency-dependent foraging’, Animal Behaviour 78, pp. 1343–1350.

Harley, C. B. [1981]: ‘Learning the evolutionarily stable strategy’, Journal of Theoretical Biology 89, pp. 611–633.


Harley, C. B. [1983]: ‘When do animals learn the evolutionarily stable strategy?’, Journal of Theoretical Biology 105, pp. 179–181.

Harley, C. B. [1987]: ‘Learning rules, optimal behaviour, and evolutionary stability’, Journal of Theoretical Biology 127, pp. 377–379.

Hashimoto, T. and Kumagai, Y. [2003]: ‘Meta-Evolutionary Game Dynamics for Mathematical Modelling of Rules Dynamics’, in W. Banzhaf, T. Christaller and J. Ziegler (eds), 2003, Advances in Artificial Life: 7th European Conference ECAL, Springer, pp. 107–117.

Hofbauer, J. and Sigmund, K. [1998]: Evolutionary Games and Population Dynamics, Cambridge: Cambridge University Press.

Houston, A. I. [1983]: ‘Comments on “Learning the evolutionarily stable strategy”’, Journal of Theoretical Biology 105, pp. 175–178.

Houston, A. I. and Sumida, B. H. [1987]: ‘Learning rules, matching, and frequency dependence’, Journal of Theoretical Biology 126, pp. 289–308.

Humphrey, N. K. [1976]: ‘The social function of intellect’, in P. P. G. Bateson and R. A. Hinde (eds), 1976, Growing Points in Ethology, Cambridge: Cambridge University Press, pp. 303–317.

Josephson, J. [2008]: ‘A numerical analysis of the evolutionary stability of learning rules’, Journal of Economic Dynamics and Control 32, pp. 1569–1599.

Katsnelson, E., Motro, U., Feldman, M. W., and Lotem, A. [2012]: ‘Evolution of learned strategy choice in a frequency-dependent game’, Proc. R. Soc. B 279, pp. 1176–1184.

Kirchkamp, O. [1999]: ‘Simultaneous evolution of learning rules and strategies’, Journal of Economic Behavior & Organization 40, pp. 295–312.

Maynard Smith, J. [1982]: Evolution and the Theory of Games, Cambridge: Cambridge University Press.

Maynard Smith, J. and Price, G. R. [1973]: ‘The logic of animal conflict’, Nature 246, pp. 15–18.

Moyano, L. G. and Sánchez, A. [2009]: ‘Evolving learning rules and emergence of cooperation in spatial prisoner’s dilemma’, Journal of Theoretical Biology 259, pp. 84–95.

Real, L. A. [1991]: ‘Animal choice behavior and the evolution of cognitive architecture’, Science 253, pp. 980–986.

Richerson, P. J. and Boyd, R. [2000]: ‘Climate, culture, and the evolution of cognition’, in C. Heyes and L. Huber (eds), 2000, Evolution of Cognition, MA: MIT Press, pp. 329–346.


Rogers, A. R. [1988]: ‘Does biology constrain culture?’, American Anthropologist 90, pp. 819–831.

Roth, A. and Erev, I. [1995]: ‘Learning in extensive form games: Experimental data and simple dynamical models in the intermediate term’, Games and Economic Behavior 8, pp. 164–212.

Sandholm, W. H. [2010]: Population Games and Evolutionary Dynamics, MA: MIT Press.

Skyrms, B. [2004]: The Stag Hunt and the Evolution of Social Structure, Cambridge: Cambridge University Press.

Skyrms, B. [2010]: Signals: Evolution, Learning, and the Flow of Information, Oxford: Oxford University Press.

Smead, R. [2012]: ‘Game theoretic equilibria and the evolution of learning’, Journal of Experimental and Theoretical Artificial Intelligence 24, pp. 301–313.

Smead, R. and Zollman, K. J. S. [2009]: ‘The stability of strategic plasticity’, Carnegie Mellon Technical Report, Paper 182.

Sterelny, K. [2003]: Thought in a Hostile World, Oxford: Blackwell Publishing.

Taylor, P. and Jonker, L. [1978]: ‘Evolutionary stable strategies and game dynamics’, Mathematical Biosciences 40, pp. 145–156.

Thomas, B. [1984]: ‘Evolutionary stability: States and strategies’, Theoretical Population Biology 26, pp. 49–67.

Tracy, N. D. and Seaman, J. W. [1995]: ‘Properties of evolutionarily stable learning rules’, Journal of Theoretical Biology 177, pp. 193–198.

Wagner, E. O. [2012]: ‘Deterministic Chaos and the Evolution of Meaning’, The British Journal for the Philosophy of Science 63, pp. 547–575.

Weibull, J. W. [1995]: Evolutionary Game Theory, MA: MIT Press.

Zollman, K. J. S. [2008]: ‘Explaining fairness in complex environments’, Politics, Philosophy & Economics 7, pp. 81–98.

Zollman, K. J. S. and Smead, R. [2010]: ‘Plasticity and language: An example of the Baldwin effect?’, Philosophical Studies 147, pp. 7–21.


Table 1: The payoff matrix for GL where G is a Hawk-Dove game and L is a best-response learner.

        h       d       L
h       0       3       3
d       1       2       1
L     1 − c   3 − c   2 − c
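The entries of Table 1 that involve a fixed opponent can be cross-checked directly: a best-response learner facing pure h settles on d (earning 1, minus the cost c), and facing pure d settles on h (earning 3 − c), while the fixed opponent earns the corresponding Hawk-Dove payoff. A quick sketch (names are mine; the L-vs-L entry depends on how the two learners interact and is not checked here):

```python
# Cross-check of the Table 1 entries involving a fixed opponent, using
# the Hawk-Dove payoffs from the non-learner rows of the table.

HD = {('h', 'h'): 0, ('h', 'd'): 3, ('d', 'h'): 1, ('d', 'd'): 2}

def best_response(opponent):
    """The pure strategy with the highest payoff against `opponent`."""
    return max(('h', 'd'), key=lambda s: HD[(s, opponent)])

def learner_payoff(opponent, cost):
    """A best-response learner's long-run payoff against a fixed opponent."""
    return HD[(best_response(opponent), opponent)] - cost

c = 0.1
assert best_response('h') == 'd' and learner_payoff('h', c) == 1 - c
assert best_response('d') == 'h' and learner_payoff('d', c) == 3 - c
# The fixed opponent, in turn, gets the corresponding matrix entry:
assert HD[('h', best_response('h'))] == 3   # h vs L
assert HD[('d', best_response('d'))] == 1   # d vs L
```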

Table 2: The payoff matrix for GL where G is a Prisoner’s Dilemma and L is a competitive-response learner.

        c         d         Lcr
c       2         0         1
d       3         1         1
Lcr   2.5 − c   1 − c     1.25 − c
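The Lcr entries of Table 2 (before the cost c is subtracted) can be recovered by iterating the competitive-response rule to its absorbing state and averaging over the random initial strategies, since the rule is deterministic once the initial profile is fixed. A sketch under those assumptions (function names are mine):

```python
# Recovering the L_cr entries of Table 2 (gross of the cost c) for the
# Prisoner's Dilemma by iterating the competitive-response rule.

PD = {('c', 'c'): 2, ('c', 'd'): 0, ('d', 'c'): 3, ('d', 'd'): 1}
flip = lambda s: 'd' if s == 'c' else 'c'

def absorb(a, b, a_learns=True, b_learns=True, updates=10):
    """Run the rule: each learning side switches iff the other earned more."""
    for _ in range(updates):
        ua, ub = PD[(a, b)], PD[(b, a)]
        na = flip(a) if a_learns and ub > ua else a
        nb = flip(b) if b_learns and ua > ub else b
        a, b = na, nb
    return a, b

def mean_payoff(opponent):
    """Long-run payoff of L_cr vs a fixed opponent, averaged over inits."""
    total = 0.0
    for init in ('c', 'd'):
        a, b = absorb(init, opponent, b_learns=False)
        total += PD[(a, b)]
    return total / 2

def selfplay_payoff():
    """Long-run payoff of L_cr vs L_cr, averaged over both random inits."""
    total = 0.0
    for ia in ('c', 'd'):
        for ib in ('c', 'd'):
            a, b = absorb(ia, ib)
            total += PD[(a, b)]
    return total / 4

print(mean_payoff('c'), mean_payoff('d'), selfplay_payoff())
# → 2.5 1.0 1.25, matching the L_cr row of Table 2 before subtracting c
```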


Figure 1: The global dynamics of the Hawk-Dove learning game with Best-Responders and c = 0.01.


Figure 2: The global dynamics of the Prisoner’s Dilemma learning game with Competitive-Responders.

