
Inferences about the structure and history of populations: coalescents and intraspecific phylogeography


JOHN WAKELEY
Department of Organismic and Evolutionary Biology, Harvard University, Cambridge

The Evolution of Population Biology, ed. R. S. Singh and M. K. Uyenoyama. Published by Cambridge University Press. © Cambridge University Press 2003.

10.1 Introduction

Population geneticists and phylogeneticists view tree structures differently. To the phylogeneticist, tree structures are the objects of study and the branching patterns a tree displays are inherently significant. Phylogeneticists are interested in the relationships among species or other taxa, and these histories are tree-like structures. To the population geneticist, particularly to the student of coalescent theory, individual tree structures are usually not of interest. Instead, attention is focused on the characteristics of populations or species, and intraspecific trees, or gene genealogies, are a stepping stone on the path to such knowledge. This difference in approach divides workers who study current and historical population structure into two groups: those who ascribe significance to single gene trees and those who focus on summary properties of gene trees over many loci. The purpose of this chapter is to give some perspective on this division and to suggest ways of identifying the domain of application of coalescents and intraspecific phylogeography in terms of the histories of populations or species. This is not meant to be divisive. In the not too distant future, we can hope that these complementary approaches will be unified, as models catch up with data and a science of population genomics is realized.

10.1.1 Population genetics history

Theoretical population genetics was born out of the tension between Biometricians (or Darwinians) and Mendelians in the early decades of the last century. We often trace our field back to the famous paper of Fisher (1918), which settled this dispute; see Provine (1971). In short, the Biometricians, represented by W. F. R. Weldon and Karl Pearson, had for decades been measuring quantitative traits and considering such things as the correlation of traits between parents and offspring. They maintained that natural selection acted on these continuous characters and that change in these was slow; discrete variation was unimportant to evolution. After the rediscovery of Mendel's laws in 1900, William Bateson, Hugo de Vries, and other Mendelians argued for the importance of discrete variations in evolution. Their views were directly opposed to those of the Biometricians: selection on continuous variation could not result in significant evolutionary steps, which were discontinuous. In hindsight we might say that the Biometricians' mistake was to confuse the continuity of traits with that of the underlying variation, and the Mendelians' error was to equate the mechanism of inheritance with that of evolution itself. In any case, the two camps agreed on only one point: continuous variation and Mendelian inheritance were incompatible. This fundamental conflict was resolved mathematically by Fisher (1918), who showed that continuous variation could be explained by the action of many Mendelian loci of small effect. In the decade or so after this remarkable start, the major results of this new branch of science, which was called theoretical population genetics, were laid down by Fisher (1930), Haldane (1932), and Wright (1931).

Following the birth of theoretical population genetics, the mathematical theory was extended and the facts of genetics were reconciled with Darwin's theory of evolution. During the Modern Synthesis, these avenues of research were merged into the neo-Darwinian theory of evolution, providing a series of well-justified, more or less qualitative explanations of patterns of speciation, adaptation, and geographic variation. Two of the major architects of the Modern Synthesis were Dobzhansky (1937) and Mayr (1942). Our modern understanding of evolution is grounded in neo-Darwinism. During the next few decades, many workers contributed to the theory, although Malécot (1948) and Kimura (1955a,b) certainly stand out.
By 1960 the mathematical theory of population genetics had developed a very high degree of sophistication, although for the most part, as Lewontin (1974) notes, this was in the absence of genetic data. It was not until the mid 1960s that population genetics finally confronted genetic data (Harris 1966, Lewontin and Hubby 1966). Since then, we have seen a grand shift in population genetics from the forward-looking view of the classical theory of Fisher, Haldane, and Wright to the backward-looking view of the coalescent or genealogical approach; see Ewens (1990) for a review of this transformation. The modern approach focuses on inferences from samples of genetic data and, often to great advantage, recasts theoretical problems in terms of genealogies. Significant works along the path to this include Ewens (1972), which describes the distribution of the counts of alleles in a moderate-sized sample from a large population, and Watterson (1975), which describes the distribution of the number of polymorphic nucleotide sites in either a moderate or a large sample from a large population. The retrospective approach came fully to life in the early 1980s with the introduction of the coalescent process by Kingman (1982a,b,c), Hudson (1983b), and Tajima (1983). The present relative lack of concern for the structures of particular gene genealogies traces back to the constant-size, single-population, neutral coalescent model described in these works, which is discussed in detail in Section 10.2 below.

10.1.2 Phylogenetics and intraspecific phylogeography

Charles Darwin's famous book contains just one figure: a hypothetical phylogenetic tree. Long before Darwin (1859) and Wallace (1858) put forward the idea of descent with modification, biologists had employed trees to depict the relationships among species and higher taxa. Tree structures are a natural way to represent such affinities, which are groups nested within other groups. Prior to Darwin and Wallace, however, trees had been employed strictly as convenient organizational tools to represent systematic affinities. For example, the classification system put forward by Linnaeus (1735) is a branching structure which delineates relationships, yet Linnaeus rejected the idea of evolution. When the idea of descent with modification gained acceptance as the explanation for biological diversity, these tree structures gained a new significance. They were no longer an expedient, but rather represented the actual histories of groups of species. The development of phylogenetics since Darwin and Wallace has been strongly influenced by the concept of trees as histories [...]

approach have been made in response to seeing the blind application of the parsimony method to data which have not been subject to the careful prior study Hennig envisioned, and which are more labile than complex morphological features. Thus, with the introduction of model-based approaches, like the maximum likelihood method of Felsenstein (1981), the recent history of phylogenetics has been a progressive acceptance of the mathematical and statistical theory. However, this process of acceptance is still ongoing.

Coincident with the emergence of the backward-looking, genealogical approach to population genetics, phylogenetic methods began to be applied to intraspecific data. This was greatly facilitated by the nonrecombining nature of the first molecule examined, animal mitochondrial (mt) DNA, and by the growing technical ability during the 1970s and 1980s to assay samples of mtDNA from natural populations. The result was a new and active subfield of evolutionary biology called intraspecific phylogeography, or just phylogeography (Avise et al. 1987, Avise 1989, 2000). A number of new methods of historical inference have resulted from this approach (Neigel et al. 1991, Neigel and Avise 1993, Templeton et al. 1995, Templeton 1998). The hallmark of phylogeography is that inferences are drawn from intraspecies or organismal gene trees which are reconstructed from data. The focus on gene trees as indicators of population structure, population history, and speciation has provided a much needed bridge between phylogenetics and population genetics (Hey 1994, Avise 2000). However, there is still a gulf between workers schooled in population genetics and those who favor traditional phylogenetics or cladistics. Bluntly put, the latter group tends to place too much emphasis on single gene genealogies, whereas the former group places too little.
Drawing conclusions from single genealogies can be problematic because each is only a single point in the space of all possible genealogies. Under some kinds of population histories, this will cause serious errors in inference. Conversely, focusing too much on the standard, structure-less, history-less coalescent model gives a picture of the utility of single gene trees that is too discouraging.

10.2 Gene genealogies and the coalescent

In the early 1980s, the ancestral process known as the coalescent was described. Kingman (1982a,b,c) provided a mathematical proof of the result. Hudson (1983b) and Tajima (1983) introduced this genealogical approach to population geneticists and derived many biologically relevant results. Nordborg (2001) provides a recent review; see also Hudson (1990) and Donnelly and Tavaré (1995). Kingman found a simple ancestral process to hold for samples from a wide variety of different types of populations, in the limit of large population size and provided that the genetic lineages in the population are exchangeable (Cannings 1974). Exchangeable lineages are ones whose predicted properties are unchanged if they are relabeled or permuted (Kingman 1982b, Aldous 1985). With the assumption that all variation is neutral, the familiar Wright-Fisher model (Fisher 1930, Wright 1931) of a population with nonoverlapping generations fits this criterion, as does the overlapping-generation model of Moran (1958). Different populations will differ in how the actual population size is related to the effective population size that determines the rate of the coalescent process. The standard coalescent involves two very important assumptions besides exchangeability. For this model to hold, the population must be of constant effective size over time and there must be no population subdivision. When time is measured in units of $2N_e$ generations for a population of diploid organisms, or in units of $N_e$ generations for a population of haploid organisms, the time to a coalescent event is exponentially distributed with mean

$$E[T_k] = \frac{2}{k(k-1)} \qquad (10.1)$$

where $k$ is the number of ancestral lineages present. Under the coalescent model, each of the $\binom{k}{2} = k(k-1)/2$ possible pairs of lineages coalesces with rate 1. Without recombination, which will be treated later, a sample of size $n$ will go through exactly $n - 1$ coalescent events to reach the common ancestor of the entire sample. Thus, every genealogy has $n - 1$ coalescent intervals, beginning with the most recent, $k = n$, and ending with the most ancient, $k = 2$. Figure 10.1 shows an average coalescent genealogy; that is, one with the lengths of the coalescent intervals drawn in proportion to Equation 10.1. The more recent coalescent intervals tend to be much shorter than the ancient ones, and on average the final coalescent interval represents more than half of the total time from the present back to the most recent common ancestor of the sample. Because the rate of coalescence depends inversely on $N_e$, we expect genealogies to be longer when the effective size is larger. As we trace the ancestry of the lineages back in time, each pair that exists has the same rate of coalescence, so when a common ancestor event happens each pair is equally likely to be the one that coalesces. The structure of trees under the coalescent is determined by this process of joining random pairs of lineages. The result, if we think in forward time for the moment, starting at the root of the tree, is a random-bifurcating tree topology. This results from the fact that there is no structure to the coalescent process (all lineages are exchangeable), and the resulting trees are likewise unstructured.
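A minimal Python sketch of Equation 10.1, $E[T_k] = 2/(k(k-1))$ in units of $2N_e$ generations: it sums the expected intervals with exact fractions to check the statement that, on average, the final interval ($k = 2$) makes up more than half of the time back to the most recent common ancestor. The function names are illustrative only.

```python
from fractions import Fraction

def expected_interval(k):
    """E[T_k] = 2 / (k(k - 1)): mean waiting time while k lineages remain,
    in units of 2N_e generations (Equation 10.1)."""
    return Fraction(2, k * (k - 1))

def expected_tmrca(n):
    """Expected time back to the most recent common ancestor of n lineages:
    the sum of E[T_k] over the n - 1 intervals k = n, n - 1, ..., 2."""
    return sum(expected_interval(k) for k in range(2, n + 1))

n = 9
tmrca = expected_tmrca(n)
share_of_last_interval = expected_interval(2) / tmrca
print(tmrca, share_of_last_interval)  # 16/9 9/16
```

The sum telescopes to $2(1 - 1/n)$, so the last interval's expected share is $n/(2(n-1))$, which exceeds one half for every sample size.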
Figure 10.1. A hypothetical coalescent genealogy of a sample of size $n = 9$ (tips A through I). The lengths of the coalescent intervals, $T_9$ through $T_2$, are drawn in proportion to their expected values given by Equation 10.1.

Without intralocus recombination, all the sites at a single genetic locus will share the same genealogy. Loci that segregate independently of each other will have uncorrelated genealogies, both in terms of the coalescent times and topology. Considering topological structure, if we took a sample of three items and labeled them A, B, and C, then each of the three possible rooted tree topologies, ((A, B), C), ((A, C), B), and ((B, C), A), is equally likely to occur. If we take a large sample of independently segregating loci, we expect to observe equal numbers of each of these three trees.
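The equal-frequency prediction for three-tip topologies can be checked by simulation. The Python sketch below (an illustration, with function names of our own choosing) builds rooted trees by repeatedly joining a uniformly random pair of lineages, as the coalescent does, and tallies which pair forms the cherry in samples of three; each of ((A, B), C), ((A, C), B), and ((B, C), A) should appear about one third of the time.

```python
import random
from collections import Counter

def random_coalescent_topology(labels, rng):
    """Join uniformly chosen pairs of lineages until one remains.
    Internal nodes are nested tuples; tips are the labels themselves."""
    lineages = list(labels)
    while len(lineages) > 1:
        i, j = sorted(rng.sample(range(len(lineages)), 2))
        right = lineages.pop(j)  # remove the higher index first so i stays valid
        left = lineages.pop(i)
        lineages.append((left, right))
    return lineages[0]

def cherry_of_three(tree):
    """For a three-tip rooted tree, return the two tips that coalesce first."""
    left, right = tree
    inner = left if isinstance(left, tuple) else right
    return "".join(sorted(inner))

rng = random.Random(2003)
reps = 30000
counts = Counter(cherry_of_three(random_coalescent_topology("ABC", rng))
                 for _ in range(reps))
for cherry in sorted(counts):
    print(cherry, round(counts[cherry] / reps, 3))  # each should be near 1/3
```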

10.3 The axes of genealogical variation: tree size and branching pattern

As a starting point in talking about demographic history, we can take the standard coalescent process as a null model. The underlying, exchangeable population genetic models, such as the Wright-Fisher model, are familiar to most biologists and their use as null models is not uncommon. This establishes predictions for what we should observe in a sample of sequences from a population. With reference to the discussion of the coalescent above, we are interested in two kinds of genealogical variation: (1) variation in the total length of the tree, and (2) variation in the branching pattern. The length of a genealogy is the sum of the lengths of all its branches. Under the standard coalescent model, this is given by the sum of $n - 1$ independent exponential times with different parameters. We expect this distribution to be realized when a large number of independent loci are sampled. The branching pattern of a genealogy specifies $2n - 3$ partitions of the $n$ sampled sequences, tips, or leaves of the tree. That is, each branch in the genealogy divides the members of the sample into two groups, the ones on either side of the branch. The genealogy or branching pattern at each sampled locus will be a random draw from the rather large universe of all possible random-bifurcating trees.

It is very important to note that our ability to observe the length and topology of genealogies is mediated by mutation. Even without any variation, genealogies will come in different sizes and shapes; we just won't know it. We rely on mutations occurring along the branches of the tree to produce the sequence polymorphisms that provide clues about history. The rate of mutation per locus is typically very small, somewhere around $10^{-4}$ to $10^{-6}$ per generation, and mutation events in different generations are independent. Therefore, the number of mutations that occur along a genetic lineage of length $t$ generations will be Poisson distributed with expectation $tu$, where $u$ is the mutation rate per generation. When time is rescaled as in the coalescent, this becomes $T\theta/2$, where $T = t/(2N_e)$ and $\theta = 4N_e u$. In the standard coalescent model, the parameter $\theta$ is equal to the expected number of nucleotide differences between two randomly chosen gene copies. The randomness of the mutation process is an important factor in determining among-locus variation in the observable indicator of tree length: the number of polymorphic sites in the sample. The letter $S$ is used to denote the number of these segregating sites in a sample. Even when the genealogies at different loci are all identical in size, there will be Poisson variation around the expectation due to the randomness of the mutation process. This imposes a lower bound on the variation in $S$ among loci, namely that the variance will be at least equal to the mean.
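A minimal Python sketch of how $S$ arises under the assumptions just stated: draw a standard-coalescent tree, take its total length $L$ in units of $2N_e$ generations, and place a Poisson number of mutations with mean $\theta L/2$ on it (infinite sites). Averaged over trees this gives $E[S] = \theta a_n$ with $a_n = \sum_{i=1}^{n-1} 1/i$, the relation behind Watterson's (1975) estimator. The parameter values and function names are arbitrary illustrative choices.

```python
import random
import statistics

def total_tree_length(n, rng):
    """Total branch length L of a standard-coalescent genealogy (units of
    2N_e generations): while k lineages remain, the interval T_k is
    exponential with rate k(k - 1)/2 and contributes k * T_k to L."""
    return sum(k * rng.expovariate(k * (k - 1) / 2) for k in range(2, n + 1))

def poisson(mean, rng):
    """Poisson draw: count rate-1 exponential arrivals falling in [0, mean]."""
    count, t = 0, rng.expovariate(1.0)
    while t <= mean:
        count += 1
        t += rng.expovariate(1.0)
    return count

def segregating_sites(n, theta, rng):
    """Infinite-sites S for one locus: Poisson with mean theta * L / 2."""
    return poisson(theta * total_tree_length(n, rng) / 2, rng)

rng = random.Random(1975)
n, theta, loci = 10, 5.0, 20000
s_values = [segregating_sites(n, theta, rng) for _ in range(loci)]
a_n = sum(1 / i for i in range(1, n))  # E[S] = theta * a_n
print(statistics.mean(s_values), theta * a_n)
print(statistics.variance(s_values), statistics.mean(s_values))
```

The first line should show two nearly equal numbers. In the second, the variance exceeds the mean by a wide margin: the Poisson mutation process alone would make them equal, and variation in tree length among loci adds to that floor.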
Our ability to uncover genealogical topology also depends on mutation. We become aware of particular branches in the tree when mutations occur on them. When the mutation rate at each nucleotide site at a genetic locus is small, and recombination is absent or very unlikely, the infinite-sites mutation model of Watterson (1975) is a good approximation to the mutation process. Under this model, each time a new mutation occurs, it happens at a previously unmutated site. The assumption of no recombination guarantees that all sites in a sample of DNA sequences will share the same bifurcating topology, but this is not the most important aspect of Watterson's (1975) model. If each site mutates at most once in the history of the sample, then each polymorphism is the result of a single mutation event on some branch in the tree, and the partitions of the sample made by the branch and by the polymorphism are identical. Correlation in genealogical topologies among loci will be represented in sequence data by the repetition of such site frequency patterns at many loci.
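The correspondence between branches and polymorphisms can be sketched in a few lines of Python. Assuming a rooted tree written as nested tuples (a representation chosen here only for illustration), each branch splits the sample into the tips below it and the rest; under the infinite-sites model a mutation on that branch produces exactly that site pattern. Counting the distinct splits recovers the $2n - 3$ partitions noted earlier in this section, because the two branches leaving the root define the same split.

```python
def tips(node):
    """Tip labels below a node; tips are strings, internal nodes are tuples."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(tips(child) for child in node))

def branch_partitions(tree):
    """Distinct bipartitions of the sample induced by the branches of a
    rooted tree. Under infinite sites, a mutation on a branch produces a
    polymorphism that splits the sample in exactly this way."""
    everything = tips(tree)
    parts = set()

    def visit(node):
        below = tips(node)
        if below != everything:  # the root has no branch above it
            parts.add(frozenset([below, everything - below]))
        if not isinstance(node, str):
            for child in node:
                visit(child)

    visit(tree)
    return parts

tree = (("A", "B"), ("C", "D"))
for split in branch_partitions(tree):
    print(sorted("".join(sorted(side)) for side in split))
print(len(branch_partitions(tree)))  # 5, i.e. 2n - 3 for n = 4
```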


10.4 The effects of population structure and population history on genealogies

This section describes the effects on the size and shape of genealogies of deviations from the assumptions of the standard coalescent model, particularly changes in effective size over time and two kinds of population subdivision. These effects are summarized in Figure 10.2. The thin lines in the figure represent population boundaries, and thin, dashed lines indicate incomplete barriers to the movement of individuals. The genealogies of these samples of size four are drawn using thick lines. For each historical scenario, (a) through (d), hypothetical genealogies are shown for samples from two independently segregating loci. This illustrates the effects of population structure and population history on the sizes and shapes of genealogies. Note that "shape" here refers only to topological structure and not to the relative lengths of different parts of a tree. In brief, changes in population size through time change the distribution of tree sizes by making the coalescence rate time dependent, but do not affect the topology of trees. Population subdivision alters the distribution of tree lengths, but it also can have dramatic effects on the shape of trees because it makes some common ancestor events much more likely than others.

Figure 10.2. Two hypothetical genealogies for a sample of size four at two independently segregating loci under the four population models discussed in the text: (a) population growth, (b) population decline, (c) equilibrium migration, and (d) isolation without gene flow. Thin lines indicate population boundaries.

10.4.1 Population growth

If one population is twice as big as another, the former has one half the rate of coalescence of the latter. On average, trees will be twice as big in the larger population as in the smaller one. When a single population has grown in size, the rate of coalescence responds proportionately. Looking back in time, the rate of coalescence will be low until the time of growth; then it will increase. The predictions of the standard coalescent for the relative sizes of recent and ancient coalescent intervals pictured in Figure 10.1 will no longer hold. Instead, the more recent intervals will be relatively longer and the more ancient intervals will be relatively shorter. If growth is rapid and relatively recent, genealogies will tend to be star shaped, that is, to have small internal branches (Slatkin and Hudson 1991). Population growth by itself will not alter the probabilities of genealogical topologies, because when a coalescent event occurs, each pair of lineages still has an equal chance of being the one that coalesces.

If population growth is rapid enough, it is well approximated by a single abrupt change in population size. In this case, the ancestral process has two additional parameters: $T_G$, the time of the change in population size measured in units of $2N_e$ (current effective size) generations, and $Q = N_A/N_e$, the ratio of the ancestral to the current effective population size. Between the present and $T_G$, each pair of lineages coalesces with rate equal to one, whereas before time $T_G$ the rate is $1/Q$ per pair of lineages. Of course, this model also describes population decline, which is discussed in Section 10.4.2 below. If $Q < 1$, growth has occurred and the more recent coalescent times will be relatively long, and if $Q > 1$, decline has occurred and the most recent coalescent times will be relatively short.

Figure 10.2(a) shows the genealogies of samples from two hypothetical, independently segregating loci for the case of an abrupt growth event. In both cases, the sample of four lineages traces all the way back to the change in size without experiencing a single coalescent event. Because the recent effective size is large, the expected time back to the first coalescent event is much greater than the time back to the growth event. When the lineages arrive in the much smaller ancestral population, they experience a great increase in the rate of coalescence, and the common ancestor of the sample is reached quickly. Therefore, most trees will be about the same size, and variation among them will be much less than in Kingman's coalescent. However, the distribution of tree topologies will be the same as in the standard, constant-size model, and trees at different loci will differ in branching pattern. Thus, the two genealogies in Figure 10.2(a) have different structures: on the left, samples A and B are the first to coalesce, and on the right it is B and C which are first.

10.4.2 Population decline

Turning rapid growth on its head, we have the case of rapid decline in Figure 10.2(b). Here there will be a relatively higher rate of coalescence during the recent part of the history, up until the time of the decline in effective size. As above, the event is assumed to be abrupt, simply for ease of explanation. Samples at some loci, like the one on the left in Figure 10.2(b), will trace back to a most recent common ancestor before reaching the event. These trees will be short.
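Both abrupt-change scenarios, growth (Figure 10.2(a)) and decline (Figure 10.2(b)), can be explored with one small simulation. The Python sketch below assumes these conventions for the model (our reading, stated here so the code stands alone): time in units of $2N_e$ current generations, coalescence at rate 1 per pair before the change time and $1/Q$ per pair afterwards, with $Q = N_A/N_e$. The sample size of four, the change time of 0.05, the three values of $Q$, and the function names are arbitrary illustrative choices. It estimates the among-locus variance of the time to the most recent common ancestor, which should collapse after growth ($Q < 1$) and inflate after a decline ($Q > 1$) relative to a constant size.

```python
import random
import statistics

def tmrca_abrupt_change(n, t_change, q, rng):
    """Time to the MRCA, in units of 2N_e (current) generations, when each
    pair coalesces at rate 1 before t_change and at rate 1/q afterwards,
    with q = N_A / N_e the ancestral-to-current size ratio (assumed here)."""
    t, k = 0.0, n
    while k > 1:
        pairs = k * (k - 1) / 2
        if t < t_change:
            wait = rng.expovariate(pairs)
            if t + wait >= t_change:
                # No event before the size change; exponential waits are
                # memoryless, so restart the clock at t_change.
                t = t_change
                continue
        else:
            wait = rng.expovariate(pairs / q)
        t += wait
        k -= 1
    return t

def variance_of_tmrca(q, n=4, t_change=0.05, reps=20000, seed=7):
    """Among-locus variance of the TMRCA for a given size ratio q."""
    rng = random.Random(seed)
    return statistics.variance(
        [tmrca_abrupt_change(n, t_change, q, rng) for _ in range(reps)])

for label, q in [("growth", 0.01), ("constant", 1.0), ("decline", 10.0)]:
    print(label, round(variance_of_tmrca(q), 4))
```

With a constant size the printed variance should sit near $1 + 1/9 + 1/36 \approx 1.14$, the sum of the squared expected intervals for a sample of four. The growth case is orders of magnitude smaller and the decline case much larger.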
If multiple lineages trace their ancestry back to the decline in size, then the rate of coalescence for those remaining lineages decreases in proportion to the magnitude of the change in size. The ancient coalescent intervals will be much elongated in this case, which is depicted on the right in Figure 10.2(b). Therefore, there will be a lot of variation in the size of genealogies among loci, more than in the standard, constant-$N_e$ coalescent. In terms of tree structure, again because the lineages are exchangeable, genealogies will be random-bifurcating trees and there will be the same very low level of correlation in branching pattern at independent loci that is seen in the standard coalescent. Thus, as in Figure 10.2(a), the genealogies in Figure 10.2(b) are different at two independent loci.

10.4.3 Equilibrium migration

Population subdivision introduces structure to genealogies, structure that may correlate with geography, and causes the tree topologies at different loci to be correlated. Subdivision will also affect variation in the sizes of genealogies among loci, but the direction of this effect depends on whether migration can occur among subpopulations or demes, as this section supposes, or not, as in Section 10.4.4 below. For simplicity, assume that a population is subdivided into $D$ demes and conforms to the symmetric island model of Wright (1931). The demes are of equal size, $N$, and the fraction of each deme that is replaced by migrants each generation is the same and equal to $m$. This is by far the most commonly employed model of a subdivided population in both empirical and theoretical studies. The term equilibrium migration refers to the fact that this constant-rate migration is supposed to have been ongoing for long enough that the effects of any prior history are erased. In Wright's island model, migrants are equally likely to come from any deme in the population. Thus, this model does not include explicit geography. Populations that adhere to the assumptions of the island model will not display the correlation between geography and genetic variation known as isolation by distance (Wright 1943). They will show different levels of polymorphism within vs. between demes, and powerful nonparametric tests to detect subdivision have been developed (Hudson et al. 1992). In the case of just two populations, the island model can be considered an explicit model of geography. This simple case is considered here in order to illustrate the effects of equilibrium migration on genealogies. The parameters that determine the pattern of genetic variation in a sample of $n_1$ sequences from one deme and $n_2$ sequences from another are $\theta$ and $M = 4Nm$. If $\pi_w$ and $\pi_b$ are the average numbers of pairwise nucleotide differences within and between populations, respectively, then for the $D$-deme island model we have

P9,

Ii(n):De(+*)

(10.2) (10.3)

(Li 1976). For the two-deme model, we put $D = 2$ in Equations 10.2 and 10.3. There are two surprising aspects of these equations. First, the expected value of $\pi_w$ does not depend on the rate of migration (Slatkin 1987, Strobeck 1987). This is a special property of the symmetric island model: the tendencies of within-deme pairwise coalescence times to be short if neither of the pair is a migrant and to be long if one of them is a migrant average out perfectly to give Equation 10.2. If any asymmetries are introduced into the model, this result no longer holds. Second, the effect of subdivision depends on the product of the deme size and the migration rate, which is captured in the scaled migration rate $M$. As $M$ grows large, the expectation of $\pi_b$ converges on that of $\pi_w$, and the population will appear panmictic. This surprising result traces back to Wright (1931), and explains why populations that are obviously not panmictic sometimes show no evidence of subdivision. That is, $M$ can be large even when the per-generation rate of migration, $m$, is small: with $N = 10^4$, for example, $m = 10^{-3}$ already gives $M = 40$. Equations for the variances of $\pi_w$ and $\pi_b$ both within and among loci can be found (Wakeley 1996a,b), and these both depend on the scaled migration rate. When $M$ is large, the variances become those expected in a large, panmictic population, and as $M$ decreases the variances of pairwise differences grow. The predictions of Equations 10.2 and 10.3 can be extended to larger samples. Under equilibrium migration, levels of genetic variation will be larger on average for multi-deme samples than for single-deme samples, and this effect will be greater when $M$ is small. In a sample ($n_1$, $n_2$) from two demes, coalescent times among the $n_1$ sequences from deme one, and among the $n_2$ sequences from deme two, will tend to be shorter than coalescent times between sequences from different demes. This means that the topological structure of genealogies will no longer be the random-bifurcating tree predicted by the standard coalescent. There will be a tendency towards trees which have a branch that divides the sample exactly into the $n_1$ and $n_2$ sequences taken from each deme, for example trees in which the demic samples are reciprocally monophyletic. Again, this tendency will be more pronounced if the scaled migration rate between the two demes is small. Thus, the genealogies for two independent loci on the right and left of Figure 10.2(c) both show this kind of topology. In addition, variation in levels of polymorphism among loci will depend inversely on the scaled migration rate, $M$ (see, for example, Hey 1991), so for the same average level of polymorphism under equilibrium migration, some loci will have very short and some very long histories. This is also displayed in Figure 10.2(c).

10.4.4 Isolation without gene flow

Equilibrium migration is just one of a multitude of possible explanations for the occurrence of subdivision. In fact, it is probably uncommon for a population to remain stably subdivided, both in the sizes of demes and in the rates and patterns of migration, for long enough to reach equilibrium. One of the earliest tenets of phylogeographic studies is that most species appear to have experienced dramatic shifts in demography over time (Avise 1989). Confining ourselves for the moment to models with discrete demes, the polar opposite of equilibrium migration is isolation and divergence without genetic exchange. The isolation model posits an ancestral population that splits into two descendant populations at some time, $T_D$, in the past (measured in units of $2N$ generations); after that time the two populations do not exchange migrants. The isolation model can be compared with the migration model in Section 10.4.3 to illustrate the striking differences between equilibrium and nonequilibrium population subdivision. Each population in the isolation model might be of a different size, and we would have $\theta_1 = 4N_1u$, $\theta_2 = 4N_2u$, and $\theta_A = 4N_Au$ as parameters (Wakeley and Hey 1997). However, for purposes of comparison with the equilibrium migration model of Section 10.4.3, we assume that $\theta_1 = \theta_2 = \theta_A$. In this case, the average numbers of pairwise differences within and between demes have expected values

$$E(\pi_w) = \theta \qquad (10.4)$$

$$E(\pi_b) = \theta(1 + T_D) \qquad (10.5)$$

(Li 1977). Aside from a constant scaling factor ($D$), equilibrium migration and isolation without gene flow make identical predictions about average levels of genetic variation within and between demes when $T_D = 1/(2M)$. In other words, if $\pi_w$ and $\pi_b$ are measured from data, then both models could be fit and their parameters estimated, but $\pi_w$ and $\pi_b$ would not serve to distinguish between migration and isolation. The most obvious difference between the two models is in the interpretation of the pattern of polymorphism. Under the isolation model, genetic variation between demes in a sample is a snapshot for a particular $T_D$. If the population were sampled again at a later date, $T_D' > T_D$, the level of divergence would be greater. Equation 10.3, in contrast, holds for all time, and represents a dynamic balance achieved between ongoing genetic drift and migration. In addition to this difference in interpretation, the variation among loci in levels of genetic variation will be different under migration and isolation even when the average levels are the same (Li 1976, 1977, Takahata and Nei 1985, Wakeley 1996a). The variances are larger under migration than under isolation, and the difference grows with $T_D = 1/(2M)$. This results from the fact that under migration, coalescent events between samples from different demes can occur at any time, mediated by migration, whereas under isolation there can be no interdeme coalescent events until the lineages trace back into the ancestral population. In the extreme of a very long divergence time in the isolation model ($T_D \gg 1$), the difference between $E(\pi_b)$ and $\theta T_D$ will be negligible. In this case the distribution of the number of segregating sites among loci will approach a Poisson distribution, with mean and variance equal to $\theta T_D$. In contrast, in the extreme of a very low migration rate in the migration model, the variance of the number of segregating sites among loci will be much greater than the mean (Wakeley 1996a).
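The contrast between Equations 10.3 and 10.5 can be checked numerically. The Python sketch below is a minimal illustration, not the simulation code used in this chapter. It samples one sequence from each of two demes and simulates their coalescence time, assuming time is measured in units of $2N$ generations, that two lineages in the same deme coalesce at rate 1, that each lineage migrates at rate $M/2$ under migration, and that the number of differences is Poisson with mean $\theta$ times the coalescence time. All function names are hypothetical.

```python
import math
import random
import statistics


def poisson(rng, mean):
    """Poisson draw by Knuth's method, applied in chunks of at most 100 so that
    exp(-chunk) never underflows; a sum of independent Poissons is Poisson."""
    total = 0
    while mean > 0:
        chunk = min(mean, 100.0)
        mean -= chunk
        limit = math.exp(-chunk)
        p = rng.random()
        while p > limit:
            total += 1
            p *= rng.random()
    return total


def time_migration(rng, M):
    """Coalescence time of one lineage from each of two demes (two-deme island model).
    Assumes M > 0; time is in units of 2N generations."""
    t, same_deme = 0.0, False
    while True:
        if same_deme:
            # Coalescence (rate 1) competes with either lineage migrating (total rate M).
            t += rng.expovariate(1.0 + M)
            if rng.random() < 1.0 / (1.0 + M):
                return t
            same_deme = False
        else:
            # Either lineage migrating (total rate M) puts both in the same deme.
            t += rng.expovariate(M)
            same_deme = True


def time_isolation(rng, T_D):
    """No coalescence before the split at T_D, then rate 1 in the ancestral population."""
    return T_D + rng.expovariate(1.0)


def between_deme_differences(model, param, theta, loci, rng):
    """Mean and variance among loci of differences between one sequence per deme."""
    draw = time_migration if model == "migration" else time_isolation
    counts = [poisson(rng, theta * draw(rng, param)) for _ in range(loci)]
    return statistics.mean(counts), statistics.variance(counts)


def expected_pi_b_migration(theta, M):
    """Equation 10.3 with D = 2."""
    return 2.0 * theta * (1.0 + 1.0 / (2.0 * M))


def expected_pi_b_isolation(theta, T_D):
    """Equation 10.5."""
    return theta * (1.0 + T_D)


if __name__ == "__main__":
    rng = random.Random(1)
    for model, param, theta in [("migration", 0.25, 5.0), ("isolation", 2.0, 10.0)]:
        mean, var = between_deme_differences(model, param, theta, 20000, rng)
        print(f"{model:9s} mean {mean:6.1f}  variance {var:7.1f}")
    print(expected_pi_b_migration(5.0, 0.25), expected_pi_b_isolation(10.0, 2.0))
```

With $M = 0.25$ ($\theta = 5$) and $T_D = 2.0$ ($\theta = 10$), both models have an expected between-deme difference of 30, and the within-deme expectations also agree (both 10). A short calculation under the same assumptions gives variances of about 730 under migration and 130 under isolation, so the simulated means should agree while the migration variance should come out several times larger.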
Thus, the trees for two independent loci under isolation in Figure 10.2(d) are more similar in size than those shown in Figure 10.2(c) for migration. Equilibrium migration and isolation without gene flow share the prediction that genealogical trees will tend towards reciprocal monophyly, and this is also displayed in Figure 10.2(d).

10.5 Domains of application: coalescents and phylogeography

The above discussion illustrates some general principles about the effect of population structure and population history on the sizes and shapes of genealogies. To summarize:


1. population growth/decline tends to decrease/increase variation in tree size among loci but does not affect variation in tree shape relative to the standard coalescent model,
2. both equilibrium and nonequilibrium population subdivision (migration vs. isolation above) alter the structure of genealogies such that genealogies at independently segregating loci will tend to share topological features, and
3. migration increases variation in tree size among loci whereas isolation decreases it.

This section investigates how the strengths of these trends depend on the parameters of a population. The goal is to identify population histories for which the analysis of single gene genealogies is likely to be fruitful and those for which it will be less useful to refer to any specific genealogy. Simulations are used to determine the distribution of tree size and shape among loci. The parameters are those discussed above in Section 10.4, and the quantities used to measure variation in the size and shape of genealogies are described below.

10.5.1 Measures of variation in tree size

The most straightforward measure of the size of a genealogy is the number of segregating sites, $S$. A sample from any population will have some expected value of $S$ and some variance. For example, in the case of a sample of $n$ sequences under the standard, constant-size, unstructured coalescent with infinite-sites mutation,

$$E(S) = \theta \sum_{i=1}^{n-1} \frac{1}{i} \qquad (10.6)$$

$$V(S) = \theta \sum_{i=1}^{n-1} \frac{1}{i} + \theta^2 \sum_{i=1}^{n-1} \frac{1}{i^2} \qquad (10.7)$$

(Watterson 1975). When we sample a large number of loci, we should find that the mean and variance among them conform to Equations 10.6 and 10.7. This, of course, assumes that the sample size, $n$, and the mutation parameter, $\theta$, are the same at every locus. However, this assumption is made only as a matter of convenience in comparing different population structures and histories below; it would be straightforward to allow for differences in $\theta$ and $n$ among loci. There are many ways in which we could compare levels of variation in $S$, our measure of tree size, among loci. The standardized measure

$$Q = \frac{\hat{V}(S) - \bar{S}}{\hat{V}(S)} \qquad (10.8)$$

will be used here, in which $\bar{S}$ is the average number of segregating sites and $\hat{V}(S)$ is the observed variance of $S$ among loci. Given a multilocus data set, $Q$ is easy to compute. The expectation of $Q$ is given approximately by

$$E(Q) \approx \frac{V(S) - E(S)}{V(S)} \qquad (10.9)$$

The number of segregating sites, $S$, is a compound random variable (see Section 10.3). Thus we can intuitively partition $V(S)$ into contributions due (1) to variation in tree size and (2) to variation in the mutation process. If there is no variation in the size of genealogies among loci, then all of the variation in $S$ will be due to the Poisson mutation process and the expected value of $Q$ will be zero. Instead, if the variation in tree size among loci is much greater than the mean, then $V(S)$ will be large and $Q$ will be close to its upper bound of one. Thus, $Q$ is a normalized measure which can be compared under different assumptions about the population. Our null model, the standard coalescent, predicts a fairly high value of $Q$, depending of course on $\theta$ and $n$. If $\theta = 10$ and $n = 20$, which are the values used in simulations below, Equation 10.9 gives $E(Q) \approx 0.82$.
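As a check on the value $E(Q) \approx 0.82$, the Python sketch below evaluates Equations 10.6, 10.7, and 10.9 directly and compares the result with $Q$ computed from Equation 10.8 on segregating-site counts simulated under the standard coalescent. It is a minimal illustration, not the simulation program used in this chapter; it assumes time measured in units of $2N$ generations and infinite-sites mutation at rate $\theta/2$ per lineage, and its function names are hypothetical.

```python
import math
import random
import statistics


def poisson(rng, mean):
    """Poisson draw by Knuth's method, in chunks of at most 100 so exp(-chunk) never underflows."""
    total = 0
    while mean > 0:
        chunk = min(mean, 100.0)
        mean -= chunk
        limit = math.exp(-chunk)
        p = rng.random()
        while p > limit:
            total += 1
            p *= rng.random()
    return total


def expected_q(theta, n):
    """E(Q) from Equation 10.9, with E(S) and V(S) from Equations 10.6 and 10.7."""
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    mean_s = theta * a1
    var_s = theta * a1 + theta ** 2 * a2
    return (var_s - mean_s) / var_s


def q_hat(seg_sites):
    """Q from Equation 10.8: (observed variance - mean) / observed variance of S among loci."""
    v = statistics.variance(seg_sites)
    return (v - statistics.mean(seg_sites)) / v


def simulate_locus(theta, n, rng):
    """Segregating sites at one locus under the standard coalescent.
    With k lineages the waiting time is exponential with rate k(k-1)/2 and adds
    k times that to the total branch length L; mutations fall on L at rate
    theta/2, so S is Poisson with mean theta * L / 2."""
    length = 0.0
    for k in range(n, 1, -1):
        length += k * rng.expovariate(k * (k - 1) / 2.0)
    return poisson(rng, theta * length / 2.0)


if __name__ == "__main__":
    rng = random.Random(1)
    theta, n, loci = 10.0, 20, 20000
    sample = [simulate_locus(theta, n, rng) for _ in range(loci)]
    print(f"E(Q) from Eq. 10.9: {expected_q(theta, n):.3f}")
    print(f"Q from {loci} simulated loci: {q_hat(sample):.3f}")
```

For $\theta = 10$ and $n = 20$ the analytic value is about 0.818, and the Monte Carlo estimate from many loci should land close to it. With far fewer loci, such as the 100 used in the simulations of Section 10.5.3, single estimates of $Q$ scatter around this value.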

10.5.2 Measures of correlation in branching pattern


There is also a multitude of ways we could compare genealogical topologies among loci. If we knew the true trees, or if we were very confident about our trees reconstructed from data, then we could use a tree comparison metric like that of Robinson and Foulds (1981). Alternatively, if we are not confident about our reconstructed trees or do not wish to make explicit reference to them, we could use some measure of the correlation in haplotype patterns among loci, such as the coefficient of linkage disequilibrium (Lewontin and Kojima 1960). This measures gametic associations between alleles at two loci, but multilocus statistics are also possible (Smouse 1974). Here, because of the focus on simple two-deme models of subdivision, we will instead consider the co-occurrence of identical data partitions among loci, that is, the observation of identical patterns of polymorphism among members of the sample at several loci. This presupposes that the same individuals were assayed at all genetic loci. Assuming that the infinite-sites mutation model holds, each polymorphic site in a sample divides the members of the sample into two groups: ones which retain the ancestral base at the site and ones which have inherited the mutant base. As noted in Section 10.3, the one-to-one correspondence between mutation events and polymorphic sites in the sample, and the observation of a pattern in the data, guarantee the existence of a branch in the genealogy of the sample, one that divides the sample exactly as the polymorphism does. For example, a mutation event on the shortest internal branch in the genealogy in Figure 10.1, the one which exists only during $T_5$, would make a polymorphic site at which samples E, F, and G would show the mutant base and samples A, B, C, D, H, and I would show the ancestral base.


In the standard coalescent model, we would not expect to see this pattern repeated at another, independent locus sequenced in the same individuals, because the fraction of random-bifurcating trees that contain such a branch is very small. However, all genealogies contain $n$ external branches, on which singleton polymorphisms can arise, so we would expect to see these partitions, i.e., all $n$ kinds of singletons, repeated at many loci. Thus, there is a negative correlation between the allele frequency at a polymorphic site and the chance that the same pattern will be found at other loci. In a sample from a subdivided population, we expect sites which divide the sequences along deme-sample lines to tend to be repeated at multiple loci. There might be a fairly low overall concordance of whole tree topologies among loci, because of the variability of within-deme patterns of common ancestry, but some branches would tend to be repeated. For the simple two-deme models considered here, these repeated branches will be the ones that divide the sample into the $n_1$ and $n_2$ sequences sampled from demes one and two. A statistic that will be sensitive to the co-occurrence of single partitions across loci is $\max(p_i)$, in which $p_i$ is the fraction of loci that show at least one polymorphic site with partition $i$. Singleton partitions are excluded in the calculation of $\max(p_i)$ because all loci are expected to show these regardless of population structure and history. This measure will be sensitive to the effects of subdivision as it is modeled here. As the level of subdivision increases, the partition most frequently observed across loci will be the one that corresponds exactly to the two demes' samples, and $\max(p_i)$ will approach one. We take the null distribution of $\max(p_i)$ to be that found under the standard coalescent. This will depend on the sample size and on $\theta$. For $\theta = 10$ and $n = 20$, used in the simulations below, the standard coalescent gives $\max(p_i) \approx 0.04$.
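The statistic $\max(p_i)$ is straightforward to compute from aligned multilocus data. The Python sketch below is one possible implementation, assuming biallelic sites coded as 0/1 strings over the same ordered set of individuals at every locus. It treats partitions as unordered splits, so it does not need to know which base is ancestral. The function names and data layout are illustrative assumptions.

```python
from collections import Counter


def canonical_split(pattern):
    """Write a 0/1 site pattern as an unordered split of the sample by flipping
    the coding, if necessary, so the first individual is always on the '0' side."""
    if pattern[0] == "1":
        pattern = pattern.translate(str.maketrans("01", "10"))
    return pattern


def max_p(loci):
    """max(p_i): the largest fraction of loci showing at least one polymorphic
    site with the same split i. Singleton splits and invariant columns are ignored."""
    counts = Counter()
    for sites in loci:
        splits = set()
        for pattern in sites:
            split = canonical_split(pattern)
            if min(split.count("0"), split.count("1")) < 2:
                continue  # singleton or monomorphic: expected at every locus anyway
            splits.add(split)
        counts.update(splits)  # each split counted at most once per locus
    return max(counts.values()) / len(loci) if counts else 0.0


if __name__ == "__main__":
    # Hypothetical data: three individuals from each deme, four loci.
    loci = [
        ["000111", "100000"],  # deme split plus a singleton
        ["111000", "001000"],  # same deme split, coded the other way round
        ["110000"],
        ["000110"],
    ]
    print(max_p(loci))  # 0.5: the deme split appears at 2 of the 4 loci
```

Counting each split at most once per locus matches the definition of $p_i$ as a fraction of loci rather than of sites.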

10.5.3 Simulations of population structure and population history

The usual coalescent simulations were performed (Hudson 1990), adding a change in size, cf. Hudson (1990), or migration/isolation, cf. Wakeley (1996b), as indicated. The statistics $Q$ and $\max(p_i)$ were computed for each simulation replicate. In addition to simulations under the standard coalescent model, a small set of parameter values was chosen to illustrate the effects of population structure and population history on the joint distribution of $Q$ and $\max(p_i)$. The sample size was $n = 20$ when there was no structure, and $n_1 = n_2 = 10$ under migration and isolation. Only one case each of growth and decline is presented: $\theta = 100.0$ with a size-change factor of 0.01, and $\theta = 0.25$ with a size-change factor of 100.0, with the time parameter equal to 0.1 in both cases. These were selected to represent extreme growth and extreme decline, respectively, and the values of $\theta$ were chosen so that the average number of polymorphic sites per locus would be the same under both models. Several levels of subdivision were investigated for equilibrium migration and isolation without gene flow. Under migration these were $M = 0.5$, 0.25, and 0.01 with $\theta = 5.0$, and under isolation they were $T_D = 1.0$, 2.0, and 50.0 with $\theta = 10.0$. These parameter sets were chosen in consideration of Equations 10.2 through 10.5, so that the expected numbers of pairwise differences within and between the two demes would be equivalent in the two models for three different levels of differentiation. One hundred independent loci were surveyed in the sampled individuals.

[Figure 10.3: scatterplot with $\max(p_i)$ on the horizontal axis and $Q$ on the vertical axis, both running from 0 to 1. Legend: Coalescent; Migration, $M = 0.5$; Migration, $M = 0.25$; Migration, $M = 0.01$; Isolation, $T_D = 1.0$; Isolation, $T_D = 2.0$; Isolation, $T_D = 50.0$; Size change, 0.01; Size change, 100.0.]

Figure 10.3. The results of the simulations described in the text. Each point in the scatterplot is the pair of ($\max(p_i)$, $Q$) values for a single simulation replicate.

The results are shown in Figure 10.3. Only ten simulation replicates were performed for each set of parameters, as this was enough to distinguish the cases, and the results of all replicates are plotted in Figure 10.3. Simulations under the standard coalescent model cluster around the values $Q \approx 0.82$ and $\max(p_i) \approx 0.04$ mentioned above. Under population growth and decline, the value of $\max(p_i)$ is nearly unchanged from the constant-size case, but the value of $Q$ changes drastically. This accords well with the discussion in Section 10.4 above. The minor differences in $\max(p_i)$ between these and the standard coalescent result from the fact that singleton polymorphisms are ignored in computing $\max(p_i)$, and there are a lot more singletons under population growth than under population decline. This is essentially the same as the mutation rate effect on $Q$ that can be seen for the standard coalescent from Equations 10.6 and 10.7: as $\theta$ grows, so does the expected value of $Q$. In sum, under this model of dramatic growth we expect the size of even a single genealogy to accurately represent the history of the population but, because there is no structure to the population, the topology of the tree contains little or no information about historical demography. Under decline, neither the size nor the shape of a single genealogy will be informative about history.

Subdivided populations vary both in $Q$ and in $\max(p_i)$. Under both equilibrium and nonequilibrium subdivision, the repetition of genealogical topologies across loci provides information about the structure of the population. That is, migration and isolation converge on $\max(p_i) = 1$ when $M$ becomes small and $T_D$ becomes large, respectively. Two interesting aspects of this are evident in Figure 10.3. First, the rates of convergence to this extreme are different under migration and isolation. For example, when we expect the average number of pairwise differences between demes to be twice as big as that within demes ($M = 0.5$ or $T_D = 1.0$; see Equations 10.2 to 10.5), simulations give $\max(p_i) \approx 0.18$ under migration and $\max(p_i) \approx 0.45$ under isolation. This is expected from previous work on genealogical topologies under the two models (Tajima 1983, Takahata and Slatkin 1990, Wakeley 1996b). In the present context it means that, other things being equal, single gene trees will be more informative about population structure under isolation than under migration. The second point is related to this; that is, subdivision has to be quite strong under migration for $\max(p_i)$ to approach one. Even when the average number of pairwise differences between demes is 50 times that within demes, about four out of 100 loci will not show the ($n_1$, $n_2$) partition that defines the samples. That equilibrium migration is a highly variable process can also be seen in values for $Q$, which approach one as $M$ decreases. In contrast, as $T_D$ increases between two isolated demes, $Q$ decreases, but a very long divergence time is required for $Q$ to be close to zero. The measures $Q$ and $\max(p_i)$ appear to distinguish well among the models. In addition, they serve to illustrate how single gene trees might or might not be representative of population structure and population history in terms of the parameters of the models. The broad empty area of Figure 10.3, for lower values of $Q$ and intermediate values of $\max(p_i)$, is an artifact of the simplicity of the models considered here.
Populations that follow the isolation model but have a small value of $\theta_A$ relative to $\theta_1$ and $\theta_2$ can produce values in this range.

10.6 Conclusions

While reconstructing a genealogy is not a necessary step in population genetic inference, it can be quite informative under some circumstances. There is a difference of approach in this regard between workers who use coalescent techniques and those who practise intraspecific phylogeography. While this dichotomy is far from complete, it is real enough. Coalescent technicians do not usually make reference to particular gene trees. This is part of the culture of coalescents: that gene trees are unobservable random quantities which certainly shape genetic variation but whose branching patterns do not contain much information about population history. This view is most reasonable when populations conform to the standard coalescent model. When trees are referred to explicitly, it is typical to "integrate" over them in making inferences (Kuhner et al. 1995, Griffiths and Tavaré 1996). In contrast, the first step in a phylogeographic analysis is to reconstruct a gene tree from data, and inferences are based upon this inferred tree. This sensibility about the significance of inferred trees was received and adapted from the field of phylogenetics. At the intraspecific level, roughly speaking, the circumstances favorable to using inferred gene trees are those in which random genetic drift is relatively unimportant compared with nonequilibrium factors like the splitting of populations.

Only the simplest nonequilibrium model was considered here: a single population that split into two isolated demes at some time in the past. This kind of history has the qualities necessary for the single-tree approach to be most fruitful; that is, small $Q$ and large $\max(p_i)$. However, most of the branches in the genealogies under this model, those for the intrademe patterns of common ancestry, will be discordant among loci. A more ideal scenario for the single-gene-tree approach is the stepping-stone model of range expansion considered by Slatkin (1993), which is a history of multiple isolation events. If a single sample were taken from each subpopulation, then we might expect the population tree to be reproduced at many loci. Of course, this too will depend upon the population splits being separated enough in time for the effect of drift to be negligible. Otherwise, even without migration, a gene tree may be different from the population tree (Neigel and Avise 1986, Pamilo and Nei 1988). This will be an issue as well for continuously distributed populations that have undergone range expansions; the movement of individuals will have to be restricted for historical structure to be evident in gene tree topologies.

This treatment has assumed no recombination within loci and free recombination between loci. Intralocus recombination will decouple sites' histories. Multiple genealogies will be realized in the history of a single locus and these will be correlated along the sequence (Hudson 1983a, Kaplan and Hudson 1985).
Restricted interlocus recombination will make genealogies across sampled loci correlated. Both of these processes should tend to increase $\max(p_i)$. Intralocus recombination increases the number of chances a locus has to realize a given partition, and restricted recombination between loci will cause branches to be shared across loci. They should have opposite effects on $Q$, though. Intralocus recombination will lower the variation in tree sizes because there will be more independence among sites. The increased correlation among loci caused by restricted interlocus recombination, conversely, will increase the variance of tree size. Intralocus recombination is quite problematic for inferred gene-tree approaches since the genealogy is no longer a bifurcating tree (Hein 1993). It also represents a significant computational hurdle to coalescent inference methods which make explicit use of linkage patterns (Griffiths and Marjoram 1996).

The entire field of population genetics will benefit from increased exchange between coalescents and phylogeography. There is growing overlap already. On the one hand, the importance of coalescent approaches is evident in Avise's (2000) book about phylogeography. On the other, one of the currently most used coalescent inference programs, GENETREE (Bahlo and Griffiths 2000), produces an inferred genealogy. The future availability of multilocus genetic data will serve as a further bridge between these two approaches.

10.7 Acknowledgments

It has been my pleasure over the past few years to be a colleague of Dick Lewontin. I am thankful to him for the inspiration to do good work, and to Rama Singh for the invitation to contribute to this volume in Dick's honor. I also thank Monty Slatkin for comments on the manuscript. This work was supported by grant DEB-9815367 from the National Science Foundation.

REFERENCES

Aldous, D. J. (1985). Exchangeability and related topics. In A. Dold and B. Eckmann (eds) École d'Été de Probabilités de Saint-Flour XIII-1983, pp. 1-198. Vol. 1117 of Lecture Notes in Mathematics. Berlin: Springer-Verlag.

Avise, J. C. (1989). Gene trees and organismal histories: a phylogenetic approach to population biology. Evolution 43:1192-208.

Avise, J. C. (2000). Phylogeography: the History and Formation of Species. Cambridge, MA: Harvard University Press.

Avise, J. C., Arnold, J., Ball, R. M., Bermingham, E., Lamb, T., Neigel, J. E., Reeb, C. A., and Saunders, N. C. (1987). Intraspecific phylogeography: the mitochondrial DNA bridge between population genetics and systematics. Annu. Rev. Ecol. Syst. 18:489-522.

Bahlo, M., and Griffiths, R. C. (2000). Inference from gene trees in a subdivided population. Theor. Popul. Biol. 57:79-95.

Cannings, C. (1974). The latent roots of certain Markov chains arising in genetics: a new approach. I. Haploid models. Adv. Appl. Prob. 6:260-90.

Darwin, C. (1859). On the Origin of Species. London: Murray.

Dobzhansky, T. (1937). Genetics and the Origin of Species. New York: Columbia University Press.

Donnelly, P., and Tavaré, S. (1995). Coalescents and genealogical structure under neutrality. Annu. Rev. Genet. 29:401-21.

Ewens, W. J. (1972). The sampling theory of selectively neutral alleles. Theor. Popul. Biol. 3:87-112.

Ewens, W. J. (1990). Population genetics theory - the past and the future. In S. Lessard (ed.) Mathematical and Statistical Developments of Evolutionary Theory, pp. 177-227. Amsterdam: Kluwer Academic Publishers.

Felsenstein, J. (1981). Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol. 17:368-76.

Fisher, R. A. (1918). The correlation between relatives on the supposition of Mendelian inheritance. Trans. R. Soc. Edin. 52:399-433.

Fisher, R. A. (1930). The Genetical Theory of Natural Selection. Oxford: Clarendon.

Griffiths, R. C., and Marjoram, P. (1996). Ancestral inference from samples of DNA sequences with recombination. J. Comp. Biol. 3:479-502.

Griffiths, R. C., and Tavaré, S. (1996). Monte Carlo inference methods in population genetics. Math. Comput. Modelling 23:141-58.

Haldane, J. B. S. (1932). The Causes of Evolution. London: Longmans Green.

Harris, H. (1966). Enzyme polymorphism in man. Proc. R. Soc. Lond. Ser. B 164:298-310.

Hein, J. (1993). A heuristic method to reconstruct the history of sequences subject to recombination. J. Mol. Evol. 36:396-405.

Hennig, W. (1965). Phylogenetic systematics. Annu. Rev. Entomol. 10:97-116.

Hennig, W. (1966). Phylogenetic Systematics. Urbana: University of Illinois Press.

Hey, J. (1991). A multi-dimensional coalescent process applied to multi-allelic selection models and migration models. Theor. Popul. Biol. 39:30-48.

Hey, J. (1994). Bridging phylogenetics and population genetics with gene tree models. In B. Schierwater, G. P. Wagner, and R. DeSalle (eds) Molecular Ecology and Evolution: Approaches and Applications, pp. 435-49. Basel, Switzerland: Birkhäuser Verlag.

Hudson, R. R. (1983a). Properties of a neutral allele model with intragenic recombination. Theor. Popul. Biol. 23:183-201.

Hudson, R. R. (1983b). Testing the constant-rate neutral allele model with protein sequence data. Evolution 37:203-17.

Hudson, R. R. (1990). Gene genealogies and the coalescent process. In D. J. Futuyma and J. Antonovics (eds) Oxford Surveys in Evolutionary Biology, pp. 1-44. Vol. 7. Oxford: Oxford University Press.

Hudson, R. R., Boos, D. D., and Kaplan, N. L. (1992). A statistical test for detecting geographic subdivision. Mol. Biol. Evol. 9:138-51.

Kaplan, N. L., and Hudson, R. R. (1985). The use of sample genealogies for studying a selectively neutral m-loci model with recombination. Theor. Popul. Biol. 28:382-96.

Kimura, M. (1955a). Solution of a process of random genetic drift with a continuous model. Proc. Natl. Acad. Sci. USA 41:144-50.

Kimura, M. (1955b). Stochastic processes and the distribution of gene frequencies under natural selection. Cold Spring Harbor Symp. Quant. Biol. 20:33-53.

Kingman, J. F. C. (1982a). The coalescent. Stochastic Process. Appl. 13:235-48.

Kingman, J. F. C. (1982b). Exchangeability and the evolution of large populations. In G. Koch and F. Spizzichino (eds) Exchangeability in Probability and Statistics, pp. 97-112. Amsterdam: North-Holland.

Kingman, J. F. C. (1982c). On the genealogy of large populations. J. Appl. Prob. 19A:27-43.

Kuhner, M. K., Yamato, J., and Felsenstein, J. (1995). Estimating effective population size and mutation rate from sequence data using Metropolis-Hastings sampling. Genetics 140:1421-30.

Lewontin, R. C. (1974). The Genetic Basis of Evolutionary Change. New York: Columbia University Press.

Lewontin, R. C., and Hubby, J. L. (1966). A molecular approach to the study of genic heterozygosity in natural populations. II. Amount of variation and degree of heterozygosity in natural populations of Drosophila pseudoobscura. Genetics 54:595-609.

Lewontin, R. C., and Kojima, K. (1960). The evolutionary dynamics of complex polymorphisms. Evolution 14:450-72.

Li, W.-H. (1976). Distribution of nucleotide difference between two randomly chosen cistrons in a subdivided population: the finite island model. Theor. Popul. Biol. 10:303-8.

Li, W.-H. (1977). Distribution of nucleotide difference between two randomly chosen cistrons in a finite population. Genetics 85:331-7.

Linnaeus, K. (1735). Systema Naturae.

Malécot, G. (1948). Les Mathématiques de l'Hérédité. Paris: Masson. Extended translation: The Mathematics of Heredity. San Francisco: W. H. Freeman (1969).

Mayr, E. (1942). Systematics and the Origin of Species. New York: Columbia University Press.

Moran, P. A. P. (1958). Random processes in genetics. Proc. Camb. Phil. Soc. 54:60-71.

Neigel, J. E., and Avise, J. C. (1986). Phylogenetic relationships of mitochondrial DNA under various demographic models of speciation. In E. Nevo and S. Karlin (eds) Evolutionary Processes and Theory, pp. 515-34. New York: Academic Press.

Neigel, J. E., and Avise, J. C. (1993). Application of a random walk model to geographic distribution of animal mitochondrial DNA variation. Genetics 135:1209-20.

Neigel, J. E., Ball, R. M., and Avise, J. C. (1991). Estimation of single generation migration distances from geographic variation in animal mitochondrial DNA. Evolution 45:423-32.

Nordborg, M. (2001). Coalescent theory. In D. J. Balding, M. J. Bishop, and C. Cannings (eds) Handbook of Statistical Genetics. Chichester: John Wiley.

Pamilo, P., and Nei, M. (1988). The relationships between gene trees and species trees. Mol. Biol. Evol. 5:568-83.

Provine, W. B. (1971). The Origins of Theoretical Population Genetics. Chicago: University of Chicago Press.

Robinson, D. F., and Foulds, L. R. (1981). Comparison of phylogenetic trees. Math. Biosci. 53:131-47.

Slatkin, M. (1987). The average number of sites separating DNA sequences drawn from a subdivided population. Theor. Popul. Biol. 32:42-49.

Slatkin, M. (1993). Isolation by distance in equilibrium and non-equilibrium populations. Evolution 47:264-79.

Slatkin, M., and Hudson, R. R. (1991). Pairwise comparisons of mitochondrial DNA sequences in stable and exponentially growing populations. Genetics 129:555-62.

Smouse, P. E. (1974). Likelihood analysis of recombination disequilibrium in multiple-locus gametic frequencies. Genetics 76:557-65.

Strobeck, C. (1987). Average number of nucleotide differences in a sample from a single subpopulation: a test for population subdivision. Genetics 117:149-53.

Tajima, F. (1983). Evolutionary relationship of DNA sequences in finite populations. Genetics 105:437-60.

Takahata, N., and Nei, M. (1985). Gene genealogy and variance of interpopulational nucleotide differences. Genetics 110:325-44.

Takahata, N., and Slatkin, M. (1990). Genealogy of neutral genes in two partially isolated populations. Theor. Popul. Biol. 38:331-50.

Templeton, A. R. (1998). Nested clade analysis of phylogeographic data: testing hypotheses about gene flow and population history. Mol. Ecol. 7:381-97.

Templeton, A. R., Routman, E., and Phillips, C. (1995). Separating population structure from population history: a cladistic analysis of the geographical distribution of mitochondrial DNA haplotypes in the tiger salamander, Ambystoma tigrinum. Genetics 140:767-82.

Wakeley, J. (1996a). The variance of pairwise nucleotide differences in two populations with migration. Theor. Popul. Biol. 49:39-57.

Wakeley, J. (1996b). Distinguishing migration from isolation using the variance of pairwise differences. Theor. Popul. Biol. 49:369-86.

Wakeley, J., and Hey, J. (1997). Estimating ancestral population parameters. Genetics 145:847-55.

Wallace, A. R. (1858). On the tendency of varieties to depart indefinitely from the original type. Proc. Linn. Soc. Lond. 3:53-62.

Watterson, G. A. (1975). On the number of segregating sites in genetical models without recombination. Theor. Popul. Biol. 7:256-76.

Wright, S. (1931). Evolution in Mendelian populations. Genetics 16:97-159.

Wright, S. (1943). Isolation by distance. Genetics 28:114-38.

Wakeley 2003 Inferences about the structure and history of ...
