phylogenetic information, and implications for tropical forests.

Franck Jabot and Jérôme Chave

Laboratoire Evolution et Diversité Biologique, CNRS / Université Paul Sabatier, Bâtiment 4R3, 31062 Toulouse cedex 4, France

Short running title: Inferring neutral parameters with phylogenies

Keywords: neutral theory, parameter inference, Approximate Bayesian Computation, phylogenetic imbalance, community phylogenetics, dispersal limitation, regional species pool.

Number of words in the abstract: 150
Number of words in the manuscript: 4678
Number of references: 48
Number of Tables: 1
Number of Figures: 6

Correspondence:
Franck Jabot
Laboratoire Evolution et Diversité Biologique
CNRS / Université Paul Sabatier
Bâtiment 4R3, 31062 Toulouse cedex 4, France
Tel: 05.61.55.67.60
Fax: 05.61.55.73.27
E-mail: [email protected]

Abstract

We develop a statistical method to infer the parameters of Hubbell’s neutral model of biodiversity using data on local species abundances and their phylogenetic relatedness. This method uses the Approximate Bayesian Computation (ABC) approach, in which the data are summarized by a small number of informative summary statistics. We used three statistics: the number of species in the sample, Shannon’s H index of evenness, and Shao and Sokal’s B1 index of phylogenetic tree imbalance. Our approach outperformed previous methods, illustrating the potential of ABC methods in ecology. When we applied it to four large tropical forest tree datasets, the best-fit immigration rates m were two orders of magnitude smaller, and the regional diversities θ larger, than previously reported for the same data. This implies that neutral-compatible regional pools of tropical trees should extend over continental scales, and that in this context m mostly measures the frequency of long-distance dispersal events.

INTRODUCTION

As phylogenetic trees become more widely available through molecular methods, ecologists have an invaluable new dimension of information with which to investigate the mechanisms of community assembly (Losos 1992; Webb et al. 2002; Pennington et al. 2006). It has long been suggested that ecological processes may leave a distinctive fingerprint on the patterns of evolutionary relatedness of coexisting species (Hutchinson 1959; Losos 1992; Wiens & Donoghue 2004). However, most models of species coexistence ignore the phylogenetic structure of a species assemblage, as they are usually restricted to the scale at which individuals interact physically, thus neglecting larger spatial and temporal scales (Ricklefs 2004). One exception is Hubbell’s (2001) neutral theory of biodiversity and biogeography, which includes both local and regional processes in a single conceptual framework. Hubbell’s neutral model assumes that a local community is connected to a larger pool of species through dispersal, much as in classic island biogeography (MacArthur & Wilson 1967). A single parameter θ, the number of species arising per generation through speciation in the regional species pool, summarizes the diversity of this pool. A second parameter m, the fraction of recruits coming from the source pool into the local community, describes the magnitude of dispersal limitation in the local community.

Many recent applications of the neutral theory have used the species abundance distribution of a local assemblage to test the theory. Of particular relevance here is the work of Latimer et al. (2005), who assessed the evolutionary consequences of the neutral assumption by analyzing plant species abundance distributions in the South African fynbos. They inferred the neutral parameters and found neutral migration rates two orders of magnitude smaller than those found in tropical forest tree communities. Latimer et al. (2005) also found that parameter θ, as inferred from their dataset, was much higher than in previous reports. They interpreted this result as the signature of an unusually high speciation rate in the fynbos flora. However, Etienne et al. (2006) critically reassessed this finding. They pointed out that the neutral parameters θ and m, when inferred from species abundance data, often have two nearly equally likely values. For the tropical forest trees of Barro Colorado Island (BCI), Etienne et al. (2006) found that maximum likelihood estimation of the neutral parameters (θ, m) yields two different but almost equally likely maxima. This finding is a serious theoretical challenge for the neutral theory, because it suggests that these parameters cannot be estimated accurately from species abundance data alone. For the BCI tree dataset, it has been assumed that the parameters are close to θ = 47.7 and m = 0.093 (Leigh 2007). The second combination of neutral parameters (θ = 241.9, m = 0.003; Etienne et al. 2006) would imply that the BCI forest is far more dispersal-limited, and that the regional diversity is much higher, than previously imagined.

Aside from Latimer et al. (2005), attempts to examine the neutral theory at evolutionary timescales are scarce (but see e.g. Lande et al. 2003; Lavin et al. 2004). Many studies of macroevolutionary patterns are based on simple models of lineage diversification, such as the Yule model, which produces tree topologies under the assumption that all species have the same chance of giving rise to a new species (Yule 1924). Simulated trees are often compared with empirical ones using the balance of the tree topology, that is, the extent to which nodes define subgroups of equal size (balanced trees are also called ‘symmetrical’, while imbalanced ones are sometimes called ‘pectinate’; see Mooers & Heard 1997). It turns out that the Yule model produces trees that are generally more balanced than those reconstructed from biological data (Mooers & Heard 1997). Recently, Mooers et al. (2007) used Hubbell’s neutral model to simulate trees, and found that it produced trees less balanced than empirical ones. Hubbell (2001) had already noticed that larger θ values were associated with a more even distribution of speciation events among lineages, hence neutral trees simulated with larger θ should be more balanced. Thus, there are reasons to believe that the trees produced by Hubbell’s model encompass a range of balance values if θ is allowed to vary over a wider range than in the study of Mooers et al. (2007). Turning this argument around, we speculate that the balance of real phylogenetic trees may provide a direct way to infer the neutral parameter θ, independently of species abundance distributions.

Here, we develop a new sampling theory of Hubbell’s neutral model that takes into account not only the species abundance distribution in a sample, but also the phylogenetic relatedness of the co-occurring species. We first assess whether phylogenetic trees predicted by Hubbell’s neutral model are a reasonable fit to real phylogenies, in particular with respect to phylogenetic imbalance. Next, we use simulated data to show that phylogenies enable us to infer the neutral parameters more precisely. Finally, we apply our new inference method to tropical tree datasets from the Neotropics and South-East Asia, and interpret the new parameter estimates in light of dispersal biology and biogeographical patterns.

METHODS

Approximate Bayesian Computation

In a classical likelihood framework, inferring the parameters of the neutral model using phylogenies would require deriving an exact likelihood function for these data. Currently, there is no simple formula for such a likelihood function. Instead, this function can be approximated through simulation, using a method called Approximate Bayesian Computation (ABC). This method is commonly used in population genetics (Tavaré et al. 1997; Beaumont et al. 2002; Marjoram & Tavaré 2006) but, to our knowledge, has not yet been employed in ecology.

The ABC method is useful when the likelihood function of a model cannot be computed exactly but can be approximated by a large number of independent simulations, run across a wide range of parameter values drawn from a set of prior distributions. The empirical data are summarized into a set of informative statistics, called summary statistics. These summary statistics are then computed for each simulated dataset and compared with the values observed in the empirical dataset. Only the simulations whose summary statistics are close enough to the observed values are retained. The corresponding parameter values, in our case θ and m, then form a sample from the approximate joint posterior distribution of the parameters. A computer-intensive approach such as ABC relies on the ability to simulate samples from the model quickly. For Hubbell’s neutral model, this is made possible by the coalescent approach (Wakeley 2007). Details of our simulation algorithm are provided in Appendix S1.
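Why coalescent-style simulation is fast can be illustrated with a minimal sketch of a two-stage urn scheme that is commonly used to draw species abundances in a sample of J individuals under Hubbell’s model, given θ and the scaled immigration rate I defined in the next section. This is an assumption-laden illustration, not the algorithm of Appendix S1: it generates abundances only (no phylogeny), and the function name `sample_neutral_abundances` is hypothetical.

```python
import random


def sample_neutral_abundances(J, theta, I, rng=None):
    """Draw species abundances for a local sample of J individuals under
    Hubbell's neutral model (hypothetical helper; abundances only, no tree)."""
    if rng is None:
        rng = random.Random()
    # Stage 1 (local community): individual j (0-based) descends from a new
    # immigrant ancestor with probability I / (I + j); otherwise it shares the
    # ancestor of a uniformly chosen earlier individual.
    ancestor_of = []
    n_ancestors = 0
    for j in range(J):
        if rng.random() < I / (I + j):
            ancestor_of.append(n_ancestors)
            n_ancestors += 1
        else:
            ancestor_of.append(ancestor_of[rng.randrange(j)])
    # Stage 2 (regional pool): immigrant ancestors are assigned to species by
    # Ewens' urn: ancestor a founds a new species with probability
    # theta / (theta + a); otherwise it copies the species of an earlier ancestor.
    species_of = []
    n_species = 0
    for a in range(n_ancestors):
        if rng.random() < theta / (theta + a):
            species_of.append(n_species)
            n_species += 1
        else:
            species_of.append(species_of[rng.randrange(a)])
    # Count individuals per species.
    abundances = [0] * n_species
    for anc in ancestor_of:
        abundances[species_of[anc]] += 1
    return sorted(abundances, reverse=True)


if __name__ == "__main__":
    ab = sample_neutral_abundances(J=20000, theta=50.0, I=2000.0, rng=random.Random(1))
    print(len(ab), sum(ab))  # number of species S, then J
```

Each sampled community costs a single pass over the J individuals, so drawing many simulated assemblages is cheap. A phylogeny for the sampled species would additionally require simulating the genealogy of species in the regional pool, which this sketch omits.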

Choice of summary statistics

Various diversity indices may be used to summarize the species abundance distribution (Magurran 2004). Likewise, phylogeny-based diversity indices and statistics describing the tree topology may be used to summarize the phylogeny of the species occurring locally in the community (Mooers & Heard 1997). We included a total of 24 candidate summary statistics in our preliminary tests (Table S1). To assess which statistics were the most informative, we simulated 1,000,000 local communities, each of size J = 20,000, where J is the number of individuals in the local community. We sampled the neutral parameters from uniform prior distributions on ln(θ) (0 ≤ ln(θ) ≤ 7) and on ln(I) (0 ≤ ln(I) ≤ 10), where I is the scaled immigration rate, defined as I = m(J − 1)/(1 − m) (Etienne 2005). These priors encompass the range of neutral parameters previously estimated for tropical forests (Chave et al. 2006).
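The prior sampling and the change of variables between m and I can be written as in the following minimal sketch. Only the formula I = m(J − 1)/(1 − m) and the prior bounds come from the text; the helper names are hypothetical.

```python
import math
import random


def I_from_m(m, J):
    """Scaled immigration rate I = m (J - 1) / (1 - m) (Etienne 2005)."""
    return m * (J - 1) / (1 - m)


def m_from_I(I, J):
    """Inverse of I_from_m: m = I / (I + J - 1)."""
    return I / (I + J - 1)


def draw_prior(rng, ln_theta_max=7.0, ln_I_max=10.0):
    """One draw from uniform priors on ln(theta) in [0, ln_theta_max]
    and ln(I) in [0, ln_I_max]; returns (theta, I) on the natural scale."""
    theta = math.exp(rng.uniform(0.0, ln_theta_max))
    I = math.exp(rng.uniform(0.0, ln_I_max))
    return theta, I


if __name__ == "__main__":
    J = 20000
    rng = random.Random(42)
    theta, I = draw_prior(rng)
    print(f"theta={theta:.1f}, I={I:.1f}, m={m_from_I(I, J):.5f}")
```

With J = 20,000, the prior range 0 ≤ ln(I) ≤ 10 corresponds to m between 1/20,000 = 5 × 10⁻⁵ (I = 1) and about 0.52 (I = e¹⁰ ≈ 22,026).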

The number of species S in the local community sample was retained as the first summary statistic, because in the limiting case where m = 1, S is a sufficient statistic for θ (Ewens 1972). We then tested which of the other statistics were the most informative in predicting the neutral parameters. Constraining the simulation outputs to a fixed value of S (in practice, S = 200 species), we regressed the 23 remaining summary statistics against ln(θ). All statistics of tree imbalance were monotonically correlated with ln(θ), hence we compared their relative predictive power based on the regression coefficient of a linear model. The summary statistics of the species abundance distribution peaked at intermediate values of ln(θ) (Fig. 1), so we compared their predictive performance based on the adjusted R² of a quadratic model. Among these statistics, two were found to be the most correlated with ln(θ) (see Results): Shannon’s index of diversity H, defined as $H = -\sum_i p_i \ln(p_i)$, where $p_i$ is the relative abundance of species i in the sample, and Shao & Sokal’s (1990) phylogenetic tree balance statistic B1, defined as $B_1 = \sum_i 1/M_i$, where $M_i$ is the maximal number of nodes between interior node i and the terminal species of the subtree rooted at node i, the summation being over all interior nodes except the root (Shao & Sokal 1990). More balanced phylogenetic trees have higher B1 values. The definitions of the other trial statistics are reported in Table S1. Regressions of the statistics against ln(I) would be very similar to those against ln(θ), because the two parameters θ and I are strongly negatively correlated.
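The two retained statistics can be computed as in the minimal sketch below. Trees are encoded as nested Python tuples with string tips (a hypothetical encoding chosen for brevity), and M_i is read as the number of edges on the longest path from node i down to a tip of its subtree, equivalently the number of interior nodes on that path counting node i itself, so that a cherry has M_i = 1. This reading of “maximal number of nodes” is an assumption.

```python
import math


def shannon_H(abundances):
    """Shannon index H = -sum_i p_i ln(p_i), where p_i is the relative abundance of species i."""
    n = sum(abundances)
    return -sum((a / n) * math.log(a / n) for a in abundances if a > 0)


def b1_index(tree):
    """Shao & Sokal's B1 = sum of 1/M_i over interior nodes i other than the root.

    `tree` is a nested tuple with string tips. M_i is taken as the number of edges
    on the longest path from node i down to a tip (a cherry has M_i = 1).
    """
    total = 0.0

    def height(node, is_root):
        nonlocal total
        if isinstance(node, str):  # a tip
            return 0
        h = 1 + max(height(child, False) for child in node)
        if not is_root:
            total += 1.0 / h
        return h

    height(tree, True)
    return total


if __name__ == "__main__":
    balanced = (("a", "b"), ("c", "d"))
    pectinate = ((("a", "b"), "c"), "d")
    print(b1_index(balanced), b1_index(pectinate))  # 2.0 1.5
    print(round(shannon_H([10, 10]), 4))  # 0.6931
```

For four species, the fully balanced tree scores B1 = 2 while the pectinate tree scores 1.5, consistent with more balanced trees having higher B1 values. The recursion depth grows with tree depth, so very deep pectinate trees (around a thousand tips or more) would need an iterative version.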

We also assessed whether the range of phylogenetic imbalance that the neutral theory can predict encompasses the imbalance of real phylogenies. We retrieved the first 2000 published phylogenies in TreeBASE (http://www.treebase.org/), measured their imbalance statistic B1, and compared these values with those obtained from phylogenies simulated with Hubbell’s metacommunity model (see Appendix S3 for details).

Test of the ABC method using simulated data

There is no general way to ensure that the chosen set of summary statistics is optimal for inferring the parameters by ABC (Marjoram et al. 2003). However, it is possible to check that the selected summary statistics capture most of the information present in the data, and thus lead to an efficient estimation method. We simulated 300 neutral datasets with various parameter values, and then estimated the neutral parameters by ABC. To quantify estimation performance, we computed a mean standard error (MSE) and a bias on the estimated parameters, defined as

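$$\mathrm{MSE} = \left\langle \frac{\left(\ln\theta_{\mathrm{estimated}} - \ln\theta_{\mathrm{simul}}\right)^2}{\ln^2\theta_{\mathrm{simul}}} \right\rangle^{1/2} + \left\langle \frac{\left(\ln I_{\mathrm{estimated}} - \ln I_{\mathrm{simul}}\right)^2}{\ln^2 I_{\mathrm{simul}}} \right\rangle^{1/2}$$

$$\mathrm{Bias} = \left\langle \frac{\ln\theta_{\mathrm{estimated}} - \ln\theta_{\mathrm{simul}}}{\ln\theta_{\mathrm{simul}}} \right\rangle + \left\langle \frac{\ln I_{\mathrm{estimated}} - \ln I_{\mathrm{simul}}}{\ln I_{\mathrm{simul}}} \right\rangle$$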

where the brackets denote a mean over the 300 simulated communities. Both statistics describe the estimation efficiency of the ABC method. For comparison, we also computed these statistics for the estimates obtained with an exact maximum-likelihood approach based on the species abundance distribution only (Etienne 2005).
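The two performance statistics can be transcribed directly, reading MSE as the sum, over θ and I, of the square root of the mean squared relative log-error, and Bias as the sum of the mean relative log-errors. This is a sketch under that reading; the function names are hypothetical, and simulated values equal to 1 must be avoided because ln(1) = 0 appears in the denominators.

```python
import math


def _relative_log_errors(estimates, truths):
    """(ln est - ln true) / ln true for each replicate; truths must differ from 1."""
    return [(math.log(e) - math.log(t)) / math.log(t) for e, t in zip(estimates, truths)]


def _mean(xs):
    return sum(xs) / len(xs)


def mse_and_bias(theta_est, theta_sim, I_est, I_sim):
    """Return (MSE, Bias) averaged over replicate simulated communities."""
    r_theta = _relative_log_errors(theta_est, theta_sim)
    r_I = _relative_log_errors(I_est, I_sim)
    # Root of the mean squared relative log-error, summed over the two parameters.
    mse = math.sqrt(_mean([r * r for r in r_theta])) + math.sqrt(_mean([r * r for r in r_I]))
    # Mean relative log-error, summed over the two parameters.
    bias = _mean(r_theta) + _mean(r_I)
    return mse, bias


if __name__ == "__main__":
    # One replicate: theta overestimated (e^3 instead of e^2), I estimated exactly.
    print(mse_and_bias([math.e ** 3], [math.e ** 2], [10.0], [10.0]))
```

In this single-replicate example, both statistics equal (3 − 2)/2 = 0.5, up to floating-point rounding.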

Application to four tropical forest tree datasets

The ABC method was used to infer the neutral parameters from four large tropical forest tree datasets (data from Condit et al. 2006). These datasets correspond to full censuses of trees greater than 10 cm in trunk diameter at breast height (dbh) in plots of 25-52 ha in Central Panama (Barro Colorado Island, BCI), Colombia (La Planada), Peninsular Malaysia (Pasoh), and Sarawak, Malaysia (Lambir). More details on these study sites may be found in Losos & Leigh (2004).

For each site, a maximally resolved phylogenetic tree subtending the local community was generated using the software Phylomatic (Webb & Donoghue 2005). For the BCI site, we also included additional published data to produce an improved phylogenetic tree (see Appendix S2). About 69% of the nodes were resolved in this improved BCI tree, compared with 53% using the default options of Phylomatic. The remaining polytomies were resolved randomly.
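The text does not specify the scheme used to resolve polytomies randomly. One simple possibility, sketched below under that assumption, repeatedly joins two randomly chosen children of a polytomous node until the node is bifurcating; calling it repeatedly yields independent random resolutions, such as the 100 used in the analyses. Trees are nested tuples with string tips (a hypothetical encoding), and all function names are hypothetical.

```python
import random


def resolve_polytomies(tree, rng):
    """Return a bifurcating copy of a nested-tuple tree (tips are strings).

    Each node with more than two children is resolved by repeatedly joining two
    randomly chosen children until only two remain. Every interior node is
    assumed to have at least two children.
    """
    if isinstance(tree, str):
        return tree
    children = [resolve_polytomies(child, rng) for child in tree]
    while len(children) > 2:
        i, j = rng.sample(range(len(children)), 2)
        joined = (children[i], children[j])
        children = [c for k, c in enumerate(children) if k not in (i, j)]
        children.append(joined)
    return tuple(children)


def leaves(tree):
    """Tip labels of a nested-tuple tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [tip for child in tree for tip in leaves(child)]


def is_bifurcating(tree):
    """True if every interior node has exactly two children."""
    if isinstance(tree, str):
        return True
    return len(tree) == 2 and all(is_bifurcating(child) for child in tree)


if __name__ == "__main__":
    rng = random.Random(0)
    family = (("sp1", "sp2", "sp3", "sp4"), "sp5")  # one unresolved genus
    for _ in range(3):
        print(resolve_polytomies(family, rng))
```

Other schemes, for example drawing uniformly among all possible bifurcating resolutions, give different distributions of tree shapes and hence potentially of B1.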

For each dataset, we ran the ABC method using 200,000 simulated species assemblages, with uniform prior distributions on ln(θ) (0 ≤ ln(θ) ≤ 10) and on ln(I) (0 ≤ ln(I) ≤ 10). To control for possible bias due to the incomplete resolution of the phylogenies, we repeated the random resolution procedure 100 times (see Discussion). For each of the 100 random resolutions of the polytomies in the observed phylogenetic tree, we selected the 200 outputs for which the simulated values of S, H and B1 were closest to the observed ones, as measured by the Euclidean distance in the space of summary statistics (S, H, B1). We thus obtained 200 × 100 = 20,000 points in the plane (ln(θ), ln(I)), from which we computed an approximate posterior distribution.
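The acceptance step and the pooling over random resolutions can be sketched as follows. The distance is the raw Euclidean distance in (S, H, B1) space, as described; rescaling each statistic before computing distances is a common variant in ABC, but the text does not say whether it was used. Function names and the data layout (pairs of parameters and statistics) are hypothetical.

```python
import heapq
import math


def abc_reject(observed, simulations, k):
    """Parameters of the k simulations whose summary statistics are closest,
    in Euclidean distance, to the observed statistics.

    observed:    tuple of statistics, e.g. (S, H, B1)
    simulations: list of (params, stats) pairs, e.g. ((ln_theta, ln_I), (S, H, B1))
    """
    nearest = heapq.nsmallest(k, simulations, key=lambda ps: math.dist(ps[1], observed))
    return [params for params, _ in nearest]


def pooled_posterior_sample(observed_per_resolution, simulations, k=200):
    """Accept k simulations for each random resolution of the observed tree
    (S and H are unchanged across resolutions; only B1 varies) and pool them."""
    pooled = []
    for observed in observed_per_resolution:
        pooled.extend(abc_reject(observed, simulations, k))
    return pooled
```

With 100 observed B1 values (one per resolution) and k = 200, the pooled list holds 200 × 100 = 20,000 parameter pairs, matching the count given in the text.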

Analyses using Etienne’s (2005) maximum likelihood estimation (MLE) were performed with the TeTame freeware (http://www.edb.ups-tlse.fr/equipe1/chave/tetame.htm). Post-processing of the ABC simulations (determination of the approximate posterior distribution and its mode) was carried out with the R software version 2.7.0 (http://www.r-project.org/), using the routine “bkde2D” of the library “KernSmooth”. All R scripts are available upon request.
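The R routine bkde2D computes a binned two-dimensional kernel density estimate. As a rough Python analogue (a swapped-in technique, not the authors’ scripts), the sketch below evaluates an unbinned product-Gaussian kernel density on a regular grid over the accepted (ln(θ), ln(I)) points and returns the grid point with the highest density as the approximate posterior mode. The bandwidths are placeholder values, and the function name is hypothetical.

```python
import numpy as np


def posterior_mode_2d(x, y, bw=(0.2, 0.2), gridsize=101):
    """Grid point maximizing an unbinned product-Gaussian kernel density estimate
    of the points (x[k], y[k]); a rough stand-in for bkde2D followed by argmax."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    gx = np.linspace(x.min(), x.max(), gridsize)
    gy = np.linspace(y.min(), y.max(), gridsize)
    # Gaussian kernel weight of every point at every grid coordinate, axis by axis.
    kx = np.exp(-0.5 * ((gx[:, None] - x[None, :]) / bw[0]) ** 2)  # shape (gridsize, n)
    ky = np.exp(-0.5 * ((gy[:, None] - y[None, :]) / bw[1]) ** 2)  # shape (gridsize, n)
    # density[i, j] is proportional to sum_k kx[i, k] * ky[j, k].
    density = kx @ ky.T
    i, j = np.unravel_index(np.argmax(density), density.shape)
    return gx[i], gy[j]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for 20,000 accepted (ln theta, ln I) pairs, not real results.
    ln_theta = rng.normal(5.5, 0.4, 20_000)
    ln_I = rng.normal(3.0, 0.6, 20_000)
    print(posterior_mode_2d(ln_theta, ln_I))
```

The grid resolution limits the precision of the mode. With many accepted points, a binned estimator such as bkde2D is much cheaper than this direct evaluation.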

RESULTS

We found that the summary statistic of species abundances with the largest correlation with ln(θ) was Shannon’s index H, a measure of the evenness of the species abundance distribution. The phylogenetic summary statistic with the largest correlation with ln(θ) was Shao and Sokal’s (1990) B1 imbalance statistic, which measures the degree of symmetry of the phylogenetic tree subtending the local community (Fig. 1). All statistics based on branch lengths were poorly correlated with ln(θ) and were thus uninformative in our inferential framework. Our simulations showed that large θ values lead to more balanced phylogenetic trees (larger B1 values) than small θ values (Fig. 1). We also found that the observed levels of phylogenetic imbalance of the 2,000 published trees examined were always within the range of Hubbell’s model predictions (Fig. 2).

When only species abundance data were used in the ABC method, i.e. only the summary statistics S and H, inference by ABC was nearly as efficient as Etienne’s (2005) exact MLE method. The mean standard error on the parameters was 30% with the ABC method, compared with 25% for the MLE method. However, our inference method was more biased than the MLE method (Bias = 22% with the ABC method versus 9% with the MLE method).

In contrast, when we estimated the neutral model parameters with the ABC method using all three statistics S, H, and B1, i.e. including information on the phylogenies, the mean standard error of the inferred parameters was 17% (versus 25% with the MLE), and the bias was 2% (versus 9% with the MLE). Hence, phylogenies add relevant information that improves the quality of parameter inference.

Based on species abundance data only, the likelihood function for the BCI dataset has two alternative maxima, the low-θ, high-m combination being slightly more likely (Fig. 3a; Etienne et al. 2006). Similarly, the ABC method yields two alternative maxima when based on the two species abundance statistics (Fig. 3b). In contrast, our new method based on all three summary statistics unambiguously selects the high-θ, low-m combination (Fig. 3c,d). The selected θ is one order of magnitude larger than in previous analyses, and the selected m is two orders of magnitude smaller (Table 1). With the BCI dataset, we could also test whether this result was sensitive to the resolution of the phylogeny, since a more refined phylogenetic hypothesis is available for this site. We found that our result was independent of the choice of phylogeny (Fig. 3c,d).

The ABC method was also applied to the La Planada, Pasoh, and Lambir datasets (Fig. 4a,b,c, respectively). For all three datasets, the ABC method yielded a single mode of the approximate posterior distribution of the neutral parameters. In La Planada, the most likely value of θ was 345, versus 30 for the MLE, and m was 0.003, versus 0.28 for the MLE (Table 1). In Pasoh and Lambir, the values of θ were four-fold larger than with the MLE, and m was equal to 0.01 at both sites (Table 1). In sum, adding phylogenetic information to infer the parameters of Hubbell’s model led to strikingly larger values of θ, and lower values of m, than previous inference methods in all four tropical forest datasets tested here.

DISCUSSION

Neutrality and parameter inference

Etienne (2005) previously showed that the neutral model of biodiversity is endowed with an exact sampling theory, like its counterparts in population genetics (Ewens 2004; Wakeley 2007). This sampling theory relates the full species abundance distribution of a community sample to the neutral model parameters. However, the species abundance distribution contains a limited amount of information, and it is not sufficient to jointly estimate both parameters (Etienne et al. 2006; Fig. 6). Using simulated data, we first showed that the ABC method based on species abundances only provides results almost as good as Etienne’s (2005) exact MLE. The great advantage of the ABC method is that it is easily generalized through the inclusion of further summary statistics. By adding phylogenetic information via the imbalance statistic B1, we inferred the neutral parameters of simulated datasets more precisely than any previous inference method. Since B1 is monotonically related to the neutral parameters, it successfully discriminated between the two alternative maxima in the likelihood profile (Fig. 6).

More generally, the challenge of estimating the parameters of ecological models from real data has motivated much recent research (Clark 2005). To our knowledge, our study is the first to use Approximate Bayesian Computation to solve an ecological problem. Widely used in population genetics (Marjoram & Tavaré 2006), such computer-intensive inference techniques make it possible to investigate complex models. This should provide an opportunity to expand the range of data used in tests of ecological theories (McGill et al. 2007).

Neutrality and phylogenetic imbalance

Hubbell’s neutral model has often been rejected outright because it was considered far too simplistic to reflect, even remotely, the complex evolutionary dynamics of species assemblages (e.g. Ricklefs 2006). Although our work does not address this point directly (see Lande et al. 2003; Allen & Savage 2007), we found that Hubbell’s model is able to predict phylogenetic tree imbalance. This suggests that, although crude, this model may be sufficient to capture basic features of the speciation-extinction balance. Classic diversification models such as the Yule and Hey models, which assume equal branching probabilities among lineages, consistently predict phylogenies that are too balanced (Mooers & Heard 1997). In contrast, Hubbell’s model, which assumes that the branching probability of a lineage is proportional to its abundance, produces levels of phylogenetic balance consistent with observed ones (Fig. 2, Appendix S3). This suggests that the neutral theory’s assumption of a speciation rate proportional to species abundance might be a less crude diversification model than the Yule model (Webb & Pitman 2002). Since in Hubbell’s model regional pools with larger θ have more even regional species abundances (Hubbell 2001), the corresponding phylogenies are more balanced (Fig. 5). This explains why Mooers et al. (2007) found that Hubbell’s model predicted phylogenies that were too imbalanced compared with observed ones: they only used a small value of θ (θ = 10) in their simulations.

Our result sheds light on studies of phylogenetic tree shape in other species groups. For instance, Heard & Cox (2007) recently compared primate phylogenies across continents, and found that the phylogeny of New World primates was more balanced than that of Old World primates. They explained this pattern as a consequence of different biogeographic histories: repeated speciation events associated with stepwise dispersal, or massive extinctions, may lead to less balanced phylogenies, whereas vicariance events should lead to more balanced phylogenies. However, it may also be argued that South America has a richer regional pool of primate species (S = 80) than Asia (S = 57), Africa (S = 52), and Madagascar (S = 28). Hence, the more balanced phylogeny observed in South America compared with the other regional phylogenies is consistent with neutral expectations, even in the absence of differential speciation or extinction mechanisms.

Regional assembly of tropical rainforest trees

We found values of θ in four large tropical forest tree plots that were up to one order of magnitude larger than estimates based on previous methods (Table 1). In Hubbell’s model, θ is the product of the regional pool size and the speciation rate. Larger values of θ therefore mean either that the regional pool is larger than previously thought, or that the speciation rate is higher. Latimer et al. (2005) found remarkably similar values of θ in the South African fynbos. To interpret their result, they reasoned that the regional pool of the fynbos is roughly the Cape Floristic Region, which extends over 50,000 km². This extent leads to a regional pool size of about 1.3 × 10¹¹ individuals (Appendix S4). Using the same logic, our new estimates of θ imply that the regional pools of our neotropical tree plots should extend over areas the size of the entire Neotropics, and that the South-East Asian tree pool should extend over an area of the order of the former Sunda Shelf (Appendix S4). An alternative interpretation would be that speciation rates are extraordinarily high for trees. While limited evidence supports this claim in a few species-rich groups (Richardson et al. 2001), this pattern does not seem to hold universally across tropical plant lineages (Pennington & Dick 2004; Pennington et al. 2006).

Is a continental extent for the regional pool of tropical trees a biologically sound inference? By definition, the regional pool of an ecological community is the ensemble of species likely to immigrate into the local community. In tropical forests, long-distance dispersal events have been documented from floristic evidence (Pennington & Dick 2004) and with molecular tools (Dick et al. 2008). Although rare, these long-distance dispersal events mix tropical forest pools over wide geographical scales. Further, a regional species pool extending over continental scales is consistent with the fact that numerous Amazonian tree species have wider distributions than previously thought. For instance, many tree species in the family Sapotaceae that were previously reported as having narrow distributions are now recognized as pan-Amazonian (T.C. Pennington, pers. comm.). Finally, we found comparable values of θ across sites within the same continent, suggesting that these sites indeed share the same pool. These values were also comparable across continents, suggesting that regional tree diversity is similar across continents, as confirmed by independent evidence (Gentry 1988; Fine & Ree 2006). In contrast, previous estimates of θ were one order of magnitude larger in Asia than in South America (Chave et al. 2006).

A possible limitation of our analysis is that the phylogenies used for parameter inference were not fully resolved. To apply our inference method, we had to resolve these phylogenies randomly. This may have led to a bias towards high θ values, because randomly branched trees are more balanced than real ones (Mooers & Heard 1997). However, this potential bias is unlikely to explain the high estimated values of θ, because increasing the resolution of the BCI phylogeny did not increase the probability of selecting the low-θ peak (Table 1, Fig. 3c,d).

Local assembly of tropical rainforest trees

Another finding of our study is that the immigration rate m is much smaller than previously reported for tropical forests (Table 1). This result is a direct consequence of the large regional diversities θ measured with our new method: if the regional pool has more species, then the local community has to be more dispersal-limited from this pool to maintain the same level of local diversity. Does this result make sense biologically? To answer this question, we first emphasize that an immigration event in Hubbell’s model is not equivalent to an observed immigration event in real, continuous landscapes (Alonso et al. 2006). In Hubbell’s model, immigrants come from anywhere in the regional pool, so parameter m measures the amount of sampling of this regional pool. In real landscapes, immigrants mostly come from the close surroundings of the focal community, and only a small fraction of the regional pool is actually sampled by short-distance dispersal. In contrast, long-distance dispersal events draw on the entire regional pool, as assumed in Hubbell’s model. Consequently, in real landscapes one must distinguish short-distance immigration events, which constitute the bulk of immigration but contribute little to the sampling of the regional pool, from long-distance immigration events, which, while rare, are likely to contribute much more to the sampling of the regional pool (Nathan 2006). In this light, our estimates of m suggest that long-distance dispersal events contribute at most 0.2 to 1% of within-site recruitment.

Assessments of the consistency between the neutral parameter m and field data have often relied on average dispersal distances inferred from seed trap data, i.e. on short-distance dispersal. Using seed trap counts on BCI, the cross-species mean seed dispersal distance from the parent tree to the propagule’s arrival site was estimated at 39 m (Condit et al. 2002). Using this value of mean dispersal distance, Etienne (2005) computed m as the proportion of seeds in the plot coming from outside the plot, and found that this parameter should be close to 0.1 with a Gaussian seed dispersal kernel. However, as mentioned above, local dispersal should be a poor predictor of the immigration rate m. Hence, the apparent contradiction between average dispersal distances measured from seed trap data and our estimates of m is simply resolved by the fact that the latter measure long-distance dispersal. This quantity is of crucial importance in the context of global change, because long-distance dispersal will determine the overall ability of tropical forest species to track environmental changes.
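The seed-trap argument above can be made concrete with a small Monte Carlo sketch. It assumes, for illustration only, a homogeneous landscape, an isotropic two-dimensional Gaussian dispersal kernel whose mean dispersal distance is 39 m (the BCI value quoted above), and a 1000 m × 500 m rectangular plot (the dimensions of the 50-ha BCI plot), and it estimates the fraction of recruits in the plot whose parent lies outside it. This is not Etienne’s (2005) calculation, whose exact assumptions are not given here, and the function name is hypothetical.

```python
import math
import random


def fraction_immigrant_recruits(mean_distance, width, height, n=200_000, seed=0):
    """Monte Carlo estimate of the share of recruits whose parent lies outside a
    width x height plot, under an isotropic 2-D Gaussian dispersal kernel in a
    homogeneous landscape (hypothetical helper)."""
    rng = random.Random(seed)
    # For a 2-D isotropic Gaussian, dispersal distance is Rayleigh-distributed
    # with mean sigma * sqrt(pi / 2); solve for the per-axis sigma.
    sigma = mean_distance / math.sqrt(math.pi / 2)
    outside = 0
    for _ in range(n):
        # Recruit position, uniform in the plot.
        x, y = rng.uniform(0, width), rng.uniform(0, height)
        # Parent position = recruit position minus the dispersal displacement.
        px, py = x - rng.gauss(0, sigma), y - rng.gauss(0, sigma)
        if not (0 <= px <= width and 0 <= py <= height):
            outside += 1
    return outside / n


if __name__ == "__main__":
    print(round(fraction_immigrant_recruits(39.0, 1000.0, 500.0), 3))
```

Under these assumptions the estimate is about 0.07, of the same order as the value of 0.1 obtained by Etienne (2005) and well above the ABC estimates of m (0.003 to 0.01) reported in the Results.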

Perspectives

Deviations from the point-mutation model of speciation assumed here may also contribute to the observed patterns of phylogenetic tree balance. If a new lineage instead starts with more than one individual, as in Hubbell’s fission model, where populations are randomly split in two during speciation events (Hubbell 2001), then phylogenetic balance will be higher than under the point-mutation model (Mooers et al. 2007). Unfortunately, speciation models without point mutation do not map simply onto a coalescent, so the present approach cannot be straightforwardly extended to more general speciation models. We hope to return to this question in the future.

Our study builds a bridge between community modeling and studies of phylogenetic structure. This theme has received a great deal of attention in the recent literature, and tests have been devised to compare the phylogenetic structure of local species assemblages with that of randomly assembled (null) communities drawn from a species pool (Webb 2000; Webb et al. 2002). As an illustration, Webb (2000) assumed that the species pool was simply the set of all species encountered in a surrounding area, because these were considered potential immigrants into the focal community. Yet this approach depends strongly on the size of the hypothetical regional pool (Swenson et al. 2006) and on the choice of test statistics (Hardy & Senterre 2007). Further, it makes no use of local species abundances, although species abundances may be informative for testing ecological mechanisms.

A significant improvement of these tests requires a null theory built at the individual level rather than at the species level. This theory needs to be endowed with a proper sampling theory, making no explicit reference to a regional pool, for which information on abundances is seldom available. It also needs to take into account the consequences of dispersal limitation on species abundances. Finally, it needs to incorporate demographic stochasticity and to be based on several sampling units. Our work supports the view that the dispersal-limited neutral theory may be used in this research program (Pennington et al. 2006). It could be extended to include multiple plots simultaneously (Jabot et al. 2008). Then, by looking at patterns that have not been used to fit the model parameters, such as phylogenetic and taxonomic similarity, one could assess the biological relevance of additional non-neutral processes. Such a model would provide consistent null scenarios to test the hypotheses of community phylogenetics.

Acknowledgements: We thank Lounès Chikhi for sharing his expertise on ABC methods and for comments on a previous version of this manuscript. We also thank Mark Beaumont, Michaël Blum, Nathan Kraft and Christophe Thébaud for useful comments on a previous version of this manuscript. We thank the Editor and three anonymous reviewers for suggestions that greatly improved this article. FJ was funded by the French Ministry of Agriculture. This work was funded by the ANR-Biodiversité grant BRIDGE, by a CNRS-AMAZONIE grant, and by the Egide Alliance grant n° 12130ZG.

References

Allen, A.P. & Savage, V.M. (2007). Setting the absolute tempo of biodiversity dynamics. Ecol. Lett., 10, 637-646.
Alonso, D., Etienne, R.S. & McKane, A.J. (2006). The merits of neutral theory. Trends Ecol. Evol., 21, 451-457.
Beaumont, M.A., Zhang, W.Y. & Balding, D.J. (2002). Approximate Bayesian Computation in population genetics. Genetics, 162, 2025-2035.
Chave, J., Alonso, D. & Etienne, R.S. (2006). Theoretical biology - Comparing models of species abundance. Nature, 441, E1-E1.
Clark, J.S. (2005). Why environmental scientists are becoming Bayesians. Ecol. Lett., 8, 2-14.
Condit, R., Pitman, N., Leigh, E.G.Jr., Chave, J., Terborgh, J., Foster, R.B., Nuñez, P.V., Aguilar, S., Valencia, R., Villa, G., et al. (2002). Beta-diversity in tropical forest trees. Science, 295, 666-669.
Condit, R., Ashton, P., Bunyavejchewin, S., Dattaraja, H.S., Davies, S., Esufali, S., Ewango, C., Foster, R., Gunatilleke, I.A.U.N., Hall, P. et al. (2006). The importance of demographic niches to tree diversity. Science, 313, 98-101.
Dick, C.W., Hardy, O.J., Jones, F.A. & Petit, R.J. (2008). Spatial scales of pollen and seed-mediated gene flow in tropical rain forest trees. Trop. Plant Biol., 1, 20-33.
Etienne, R.S. (2005). A new sampling formula for neutral biodiversity. Ecol. Lett., 8, 253-260.
Etienne, R.S., Latimer, A.M., Silander, J.A. & Cowling, R.M. (2006). Comment on "Neutral ecological theory reveals isolation and rapid speciation in a biodiversity hot spot". Science, 311, 610b.
Ewens, W.J. (1972). The sampling theory of selectively neutral alleles. Theor. Popul. Biol., 3, 87-112.
Ewens, W.J. (2004). Mathematical Population Genetics. I. Theoretical Introduction. 2nd edn. Springer, New York, 417 pp.
Fine, P.V.A. & Ree, R.H. (2006). Evidence for a time-integrated species-area effect on the latitudinal gradient in tree diversity. Am. Nat., 168, 796-804.
Gentry, A.H. (1988). Changes in plant community diversity and floristic composition on environmental and geographical gradients. Ann. Miss. Bot. Gard., 75, 1-34.
Hardy, O.J. & Senterre, B. (2007). Characterizing the phylogenetic structure of communities by an additive partitioning of phylogenetic diversity. J. Ecol., 95, 493-506.
Heard, S.B. & Cox, G.H. (2007). The shapes of phylogenetic trees of clades, faunas, and local assemblages: exploring spatial pattern in differential diversification. Am. Nat., 169, E107-E118.
Hubbell, S.P. (2001). The Unified Neutral Theory of Biodiversity and Biogeography. Princeton University Press, Princeton, NJ.
Hutchinson, G.E. (1959). Homage to Santa Rosalia or why are there so many kinds of animals? Am. Nat., 93, 145-159.
Jabot, F., Etienne, R.S. & Chave, J. (2008). Reconciling neutral community models and environmental filtering: theory and an empirical test. Oikos, 117, 1308-1320.
Lande, R., Engen, S. & Saether, B.-E. (2003). Stochastic Population Dynamics in Ecology and Conservation. Oxford Series in Ecology and Evolution, Oxford University Press, Oxford UK, 212 pp.
Latimer, A.M., Silander, J.A.Jr. & Cowling, R.M. (2005). Neutral ecological theory reveals isolation and rapid speciation in a biodiversity hot spot. Science, 309, 1722-1725.
Lavin, M., Schrire, B.P., Lewis, G., Pennington, R.T., Delgado-Salinas, A., Thulin, M., Hughes, C.E., Matos, A.B. & Wojciechowski, M.F. (2004). Metacommunity process rather than continental tectonic history better explains geographically structured phylogenies in legumes. Phil. Trans. Roy. Soc. Lond. B, 359, 1509-1522.
Leigh, E.G.Jr. (2007). Neutral theory: a historical perspective. J. Evol. Biol., 20, 2075-2091.
Losos, J.B. (1992). The evolution of convergent structure in Caribbean Anolis communities. Syst. Biol., 41, 403-420.
Losos, E.C. & Leigh, E.G.Jr. (2004). Tropical Forest Diversity and Dynamism. Findings from a large-scale plot network. University of Chicago Press, Chicago.
MacArthur, R.H. & Wilson, E.O. (1967). The Theory of Island Biogeography. Princeton University Press, Princeton NJ, 224 pp.
Magurran, A.E. (2004). Measuring Biological Diversity. Blackwell, Oxford UK, 256 pp.
Marjoram, P., Molitor, J., Plagnol, V. & Tavaré, S. (2003). Markov chain Monte Carlo without likelihoods. Proc. Natl. Acad. Sci. USA, 100, 15324-15328.
Marjoram, P. & Tavaré, S. (2006). Modern computational approaches for analyzing molecular genetic variation data. Nat. Rev. Gen., 7, 759-770.
McGill, B.J., Etienne, R.S., Gray, J.S., Alonso, D., Anderson, M.J., Benecha, H.K., Dornelas, M., Enquist, B.J., Green, J.L., He, F. et al. (2007). Species abundance distributions: moving beyond single prediction theories to integration within an ecological framework. Ecol. Lett., 10, 995-1015.
Mooers, A.O. & Heard, S.B. (1997). Inferring evolutionary process from phylogenetic tree shape. Quat. Rev. Biol., 72, 31-54.
Mooers, A.O., Harmon, L.J., Blum, M.G.B., Wong, D.H.J. & Heard, S.B. (2007). Some models of phylogenetic tree shape. In: Reconstructing Evolution: new mathematical and computational advances (eds Gascuel, O. & Steel, M.). Oxford University Press, Oxford, pp 149-170.
Nathan, R. (2006). Long-distance dispersal of plants. Science, 313, 786-788.
Pennington, R.T. & Dick, C.W. (2004). The role of immigrants in the assembly of the American rainforest tree flora. Phil. Trans. Roy. Soc. B, 359, 1611-1622.
Pennington, R.T., Richardson, J.E. & Lavin, M. (2006). Insights into the historical construction of species-rich biomes from dated plant phylogenies, neutral ecological theory and phylogenetic community structure. New Phytol., 172, 605-616.
Richardson, J.E., Pennington, R.T., Pennington, T.D. & Hollingsworth, P.M. (2001). Rapid diversification of a species-rich genus of neotropical rainforest trees. Science, 293, 2242-2245.
Ricklefs, R.E. (2004). A comprehensive framework for global patterns in biodiversity. Ecol. Lett., 7, 1-15.
Ricklefs, R.E. (2006). The unified neutral theory of biodiversity: do the numbers add up? Ecology, 87, 1424-1431.
Shao, K.T. & Sokal, R.R. (1990). Tree balance. Syst. Zool., 39, 266-276.
Swenson, N.G., Enquist, B.J., Pither, J., Thompson, J. & Zimmerman, J.K. (2006). The problem and promise of scale dependency in community phylogenetics. Ecology, 87, 2418-2424.
Tavaré, S., Balding, D.J., Griffiths, R.C. & Donnelly, P. (1997). Inferring coalescence times from DNA sequence data. Genetics, 145, 505-518.
Wakeley, J. (2007). Coalescent theory. An Introduction. Roberts & Company Publishers, Greenwood Village.
Webb, C.O. (2000). Exploring the phylogenetic structure of ecological communities: an example for rain forest trees. Am. Nat., 156, 145-155.
Webb, C.O., Ackerly, D.D., McPeek, M.A. & Donoghue, M.J. (2002). Phylogenies and community ecology. Annu. Rev. Ecol. Sys., 33, 475-505.
Webb, C.O. & Donoghue, M.J. (2005). Phylomatic: tree assembly for applied phylogenetics. Mol. Ecol. Notes, 5, 181-183.
Webb, C.O. & Pitman, N.C.A. (2002). Phylogenetic balance and ecological evenness. Syst. Biol., 51, 898-907.
Wiens, J.J. & Donoghue, M.J. (2004). Historical biogeography, ecology and species richness. Trends Ecol. Evol., 19, 639-644.
Yule, G.U. (1924). A mathematical theory of evolution based on the conclusions of Dr. J.C. Willis. Phil. Trans. Roy. Soc. Lond. B, 213, 21-87.

Supplementary Material

The following supplementary material is available for this article:
Table S1 Summary statistics explored for the ABC method.
Appendix S1: ABC algorithm.
Appendix S2: Compilation of phylogenies for the BCI plot.
Appendix S3: Compatibility of neutral theory with 2,000 published phylogenies.
Appendix S4: Computation of the regional pool sizes for the four tropical tree plots.

Table 1 Neutral parameter estimates inferred from local species abundances and phylogeny in four tropical forest plots: Barro Colorado Island (BCI, Panama), La Planada (Colombia), Pasoh and Lambir (Malaysia).

Site | J | S | θ1 | m1 | θ2 | m2 | W | θE | mE
BCI* | 20,788 | 236 | 724 | 0.002 | 43 | 0.32 | 0.78 | 48 | 0.14
BCI† | 20,788 | 236 | 571 | 0.002 | 44 | 0.36 | 0.88 | 48 | 0.14
La Planada | 14,100 | 164 | 345 | 0.003 | 31 | 0.39 | 0.8 | 30 | 0.28
Pasoh | 29,257 | 674 | 534 | 0.01 | – | – | 1 | 194 | 0.07
Lambir | 29,890 | 990 | 2491 | 0.008 | – | – | 1 | 282 | 0.13

J and S are the number of individuals and the number of species in the sample, respectively. (θ1, m1) are the neutral parameters estimated by the ABC method. A second, lower peak is often observed in the posterior distributions; its location is given by (θ2, m2). W is the relative weight of the mode (θ1, m1) compared with this second peak, defined as the number of retained ABC simulations falling in the confidence interval of the mode divided by the total number of retained ABC simulations. (θE, mE) are the estimates obtained with the exact likelihood formula, which uses species abundances only (Etienne 2005).

*Phylogeny based on the compilation of phylogenies produced by the software Phylomatic; 53% of the nodes of this tree are resolved.

†Phylogeny based on an additional compilation of phylogenies for the species encountered in BCI; 69% of the nodes of this revised tree are resolved.
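The weight W defined in the notes above is a simple ratio of counts. The Python sketch below computes it under an assumption the text does not state: that the confidence interval of the mode is an axis-aligned box in the (θ, m) plane. The function name `mode_weight` and the retained samples are hypothetical toy values, not the posterior samples behind Table 1.

```python
def mode_weight(samples, theta_range, m_range):
    """Fraction of retained ABC samples that fall inside the region around the main mode.

    samples: list of (theta, m) pairs retained by the ABC step.
    theta_range, m_range: (low, high) bounds of the region taken here as the
    confidence interval of the mode (an assumed rectangular region).
    """
    if not samples:
        raise ValueError("no retained samples")
    lo_t, hi_t = theta_range
    lo_m, hi_m = m_range
    inside = sum(1 for theta, m in samples
                 if lo_t <= theta <= hi_t and lo_m <= m <= hi_m)
    return inside / len(samples)

# Toy retained samples: three near a high-theta/low-m mode, one near a low-theta/high-m peak.
retained = [(700.0, 0.002), (650.0, 0.003), (45.0, 0.30), (720.0, 0.002)]
print(mode_weight(retained, (500.0, 900.0), (0.001, 0.005)))
```

Here three of the four toy samples fall in the box, so W = 0.75. A value of 1 would mean that every retained simulation lies around the main mode.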

Figure 1 Four summary statistics in simulated datasets, plotted against the regional diversity index on a log scale, ln(θ). The goodness of fit between each summary statistic and ln(θ) was measured by the adjusted R², and the retained statistics correspond to the best-fitting ones. (a) Shao and Sokal's B1 imbalance index. (b) Shannon's diversity index H. (c) Mean number of nodes in the phylogeny between two randomly chosen individuals, Dist_node. (d) Simpson's index 1-D.

Figure 2 Compatibility of 2,000 published phylogenies with Hubbell's neutral theory in terms of phylogenetic tree shape, measured by the imbalance statistic B1. Each dot represents a published phylogeny. The lines show the minimum and maximum B1 values obtained in simulated neutral phylogenies.

Figure 3 Posterior distributions of the neutral parameters for the BCI tropical tree dataset. (a) Likelihood profile for Hubbell's neutral model based on species abundances only, computed with Etienne's sampling formula. Likelihood values are color-coded. Solid lines: 95% confidence limits of the parameters. (b) Posterior distribution for the neutral model using the ABC method with species abundances only (i.e. based on the two summary statistics S and H only). Solid lines: density levels (approximate 95% confidence intervals). (c) Posterior distribution for the neutral model using the ABC method with both species abundances and a phylogenetic hypothesis based on the Angiosperm Phylogeny Group; 53% of the nodes of this phylogeny are resolved. Three summary statistics, S, H and B1, were used. (d) Same as (c), but with a better-resolved phylogeny in which 69% of the nodes are dichotomous.

Figure 4 Posterior distributions of the neutral parameters in three tropical forest sites, obtained with the ABC method. (a) Posterior distribution of the neutral parameters at La Planada. Solid lines: density levels (approximate 95% confidence intervals). (b) Posterior distribution of the neutral parameters at Pasoh. (c) Posterior distribution of the neutral parameters at Lambir.

Figure 5 Schematic depiction of the information contained in phylogenies. Each species is denoted by a different symbol. Regional pools are connected to local communities by immigration (grey arrows). When θ is large, regional pools are species-rich and have more balanced phylogenies. Based on species abundances only, communities 1 and 2 cannot be distinguished. However, the phylogenetic imbalance of community 1 is greater than that of community 2.

Figure 6 Phylogenies improve the estimation of the neutral theory's parameters. (a) Exact likelihood profile of the neutral parameters for the BCI tree species abundance data (see Fig. 3a). Parameter inference leads to a confidence interval containing two equally likely maxima: one at low θ and high m, the other at high θ and low m. (b) Variation of evenness (measured by Shannon's H) in simulated neutral communities of the same size as the BCI dataset. The species richness S determines a maximum-likelihood ridge in the parameter space (θ, m). The evenness H carries additional information about the position of the most likely parameters along this ridge. However, because evenness is a unimodal function of θ, two parameter combinations yield the same evenness. (c) The phylogenetic tree imbalance statistic B1 of a local community is positively related to θ. Combining the statistics H and B1 yields a single most likely parameter set (θ, m).

Figure 1

Figure 2

Figure 3

Figure 4

Figure 5

Figure 6

Table S1 Summary statistics and their correlation (adjusted R²) with the neutral parameter ln(θ).

Summary statistic | Definition | Reference | Adjusted R²
S | Species richness. | Magurran (2003) | –
B1 | B1 = Σ(1/Mi), where, for each node i except the root, Mi is the maximal number of nodes between node i and the terminal species of the tree subtended by node i. | Shao & Sokal (1990) | 0.54
H | Shannon's index. H = -Σ(Ni ln(Ni)) + N ln(N), where Ni is the abundance of species i and N is the total abundance of the sample. | Magurran (2003) | 0.45
Dist_node | Mean number of nodes connecting two individuals in the subtending phylogenetic tree. | This paper | 0.43
1-D | Simpson's index. 1-D = 1 - Σ((Ni/N)²). | Magurran (2003) | 0.35
Inv(Ni) | Σ(1/Ni). | This paper | 0.37
Var(Ni) | Variance of Ni. | This paper | 0.35
Inv(Ni²) | Σ(1/Ni²). | This paper | 0.31
Dist_node_spec | Similar to Webb (2000)'s mean pairwise nodal distance, except that only the phylogeny of the sample is considered. | Webb (2000) | 0.21
Var(node) | Variance of the number of nodes connecting two individuals in the subtending phylogenetic tree. | This paper | 0.19
Var(node)_spec | Variance of the number of nodes connecting two species in the phylogenetic tree of the community. | This paper | 0.17
σ(Nbar) | Standard deviation of Hi, where Hi is the number of internal nodes between species i and the root. | Sackin (1972) | 0.14
Fourth(Ni) | (Σ((Ni - mean(Ni))^4)/S)^(1/4). | This paper | 0.11
IColless | Σ(ri - si), where, for each node i, ri and si are the numbers of terminal species in the two subtrees connected by node i (with ri greater than si). | Colless (1982) | 0.11
Nbar | Mean(Hi), where Hi is the number of internal nodes between species i and the root. | Sackin (1972) | 0.09
Dist_neighbour | Mean phylogenetic distance between two individuals of sister species. | This paper | 0.06
Dist_neighbour_spec | Mean phylogenetic distance between two sister species. | Webb (2000) | 0.06
PD | Sum of the branch lengths (lengths are normalized so that the tree height equals 1). | Faith (1992) | <0.01
Var(Dist_neigh_spec) | Variance of the phylogenetic distance between sister species. | This paper | <0.01
Δ+ | Mean phylogenetic distance between two species. | Clarke & Warwick (1998) | <0.01
Var(Dist_neighbour) | Variance of the phylogenetic distance between individuals belonging to sister species. | This paper | <0.01
D | Mean phylogenetic distance between two individuals. | Chave et al. (2007) | <0.01
Λ+ | Variance of the phylogenetic distance between species. | Clarke & Warwick (2001) | <0.01
Var(Dist) | Variance of the phylogenetic distance between individuals. | This paper | <0.01
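Three of the statistics in Table S1 are simple enough to sketch directly. The Python code below computes B1, H and 1-D as defined in the table. Trees are stored as nested tuples with species names as tips, a representation chosen only for this illustration. The code assumes that Mi counts the edges on the longest path from node i down to a tip, so that a cherry has Mi = 1, because the table's wording leaves this convention open. Note also that H as printed in the table equals N times the familiar Shannon index -Σ pi ln pi.

```python
import math

def b1(tree):
    """Shao & Sokal's B1: sum of 1/M_i over all internal nodes except the root.

    Trees are nested tuples with string tips. M_i is taken here as the height
    (in edges) of the subtree rooted at node i, so a cherry has M_i = 1."""
    total = 0.0

    def height(node, is_root):
        nonlocal total
        if not isinstance(node, tuple):
            return 0                      # a tip
        h = 1 + max(height(child, False) for child in node)
        if not is_root:
            total += 1.0 / h
        return h

    height(tree, True)
    return total

def shannon_h(abundances):
    """H = -sum(N_i ln N_i) + N ln N, as written in Table S1.
    Dividing by N gives the usual Shannon index -sum(p_i ln p_i)."""
    n = sum(abundances)
    return -sum(x * math.log(x) for x in abundances if x > 0) + n * math.log(n)

def simpson(abundances):
    """Simpson's index 1 - D = 1 - sum((N_i / N)^2)."""
    n = sum(abundances)
    return 1.0 - sum((x / n) ** 2 for x in abundances)

caterpillar = ((("a", "b"), "c"), "d")
balanced = (("a", "b"), ("c", "d"))
print(b1(caterpillar), b1(balanced))  # 1.5 2.0
```

As expected for this index, the fully balanced four-species tree scores higher than the caterpillar tree with the same number of tips.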

References:

Chave, J., Chust, G. & Thébaud, C. (2007). The importance of phylogenetic structure in biodiversity studies. In: Scaling Biodiversity (eds Storch, D., Marquet, P. & Brown, J.H.). Santa Fe Institute Editions, pp 151-167.
Clarke, K.R. & Warwick, R.M. (1998). A taxonomic distinctness index and its statistical properties. J. Appl. Ecol., 35, 523-531.
Clarke, K.R. & Warwick, R.M. (2001). A further biodiversity index applicable to species lists: variation in taxonomic distinctness. Mar. Ecol. Progr. Ser., 216, 265-278.
Colless, D.H. (1982). Phylogenetics: the theory and practice of phylogenetic systematics. Syst. Zool., 31, 100-104.
Faith, D.P. (1992). Conservation evaluation and phylogenetic diversity. Biol. Cons., 61, 1-10.
Magurran, A.E. (2003). Measuring biological diversity. Blackwell Science Ltd.
Sackin, M.J. (1972). “Good” and “bad” phenograms. Syst. Zool., 21, 225-226.
Shao, K.T. & Sokal, R.R. (1990). Tree balance. Syst. Zool., 39, 266-276.
Webb, C.O. (2000). Exploring the phylogenetic structure of ecological communities: an example for rain forest trees. Am. Nat., 156, 145-155.

Appendix S1: Simulation algorithm.

Neutral theory and phylogenies

In its original formulation, Hubbell's model assumes that speciation events are point-wise mutations. At each recruitment event in the regional pool, an individual has a small probability of belonging to an entirely new species. This is equivalent to the infinite-alleles Moran model with mutation in population genetics, in which backward mutations are not allowed (Ewens 2004). The infinite-alleles Moran model does not keep track of the evolutionary relationships among alleles. In our model, we do keep track, for each speciating individual, of the species from which it descends. This enables us to reconstruct the evolutionary relationships among species.

Simulation algorithm

We use a modified version of Etienne's (2005) algorithm to reconstruct the subtending phylogenies of local communities from the neutral parameter $\theta$ and the scaled immigration rate $I = m(J-1)/(1-m)$, where $m$ is the immigration rate and $J$ is the local community size. We start by computing the number of immigrating ancestors and recording their numbers of descendants. Individuals are drawn one by one. The $j$th individual descends from a newly immigrating ancestor with probability $I/(j+I-1)$, and from an already recorded ancestor with probability $(j-1)/(j+I-1)$. In the latter case, one of the $j-1$ already tagged individuals is selected at random, and its ancestor becomes the ancestor of the $j$th individual. Applying this algorithm reconstructs the immigration history of the local community of size $J$.

Once the number $A$ of ancestors is known, the forward-in-time algorithm of Stephens (2000) is used to build a dated phylogeny for the $A$ ancestors. The algorithm starts with two lineages of the same species. Events occur at times $T_k$, where $T_k - T_{k-1}$ is exponentially distributed with rate $\lambda_k = k(k-1+\theta)/2$ and $k$ is the current number of lineages. At each event, a lineage is chosen at random among the $k$ existing ones and is split in two with probability $p = (k-1)/(k-1+\theta)$. If it is not split (probability $1-p$), it speciates, that is, it becomes a lineage of a new species. This is repeated until there are $A+1$ lineages; the last event (a lineage split) is then deleted, so that only $A$ lineages remain at the end (Stephens 2000). This procedure makes the forward algorithm equivalent to the backward coalescent. If the algorithm were stopped as soon as there are $A$ lineages, the tree would necessarily end with a lineage split, whereas with this procedure it can end either with a speciation event or with a lineage split.
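The two steps described above can be sketched in Python as follows. This is a minimal illustration and not the authors' implementation; all function names are hypothetical. The first function draws the immigration history with the probabilities I/(j+I-1) and (j-1)/(j+I-1). The second runs the forward-in-time Stephens (2000) scheme with event rate λk = k(k-1+θ)/2 and split probability (k-1)/(k-1+θ), and it discards the split that would create A+1 lineages. For brevity, it records only species labels and species-level ancestry rather than full dated trees. Pairing ancestor a with final lineage a is an additional choice made here, which is harmless because the lineages are exchangeable. The parameter values in the usage lines are arbitrary.

```python
import random

def scaled_immigration(m, J):
    """Scaled immigration rate I = m (J - 1) / (1 - m)."""
    return m * (J - 1) / (1 - m)

def immigration_ancestors(J, I, rng):
    """Draw the J local individuals one by one (requires I > 0).

    Returns counts, where counts[a] is the number of local descendants of
    immigrating ancestor a; len(counts) is the number A of ancestors."""
    ancestor_of = []                     # ancestor index of each individual drawn so far
    counts = []
    for j in range(1, J + 1):
        if rng.random() < I / (j + I - 1):
            counts.append(0)             # newly immigrating ancestor
            a = len(counts) - 1
        else:
            a = rng.choice(ancestor_of)  # ancestor of a random earlier individual
        counts[a] += 1
        ancestor_of.append(a)
    return counts

def stephens_forward(A, theta, rng):
    """Forward-in-time genealogy of A >= 2 ancestors (Stephens 2000 scheme).

    Returns the species label of each of the A final lineages, the species-level
    ancestry {new species: parent species}, the event log, and the time of the
    discarded (A+1)th split, used here as the sampling time."""
    if A < 2:
        raise ValueError("A must be at least 2")
    lineages = [0, 0]                    # two lineages of the same species
    parent_species = {0: None}
    events, t, next_label = [], 0.0, 1
    while True:
        k = len(lineages)
        t += rng.expovariate(k * (k - 1 + theta) / 2.0)  # waiting time, rate lambda_k
        i = rng.randrange(k)                              # lineage hit by the event
        if rng.random() < (k - 1) / (k - 1 + theta):      # lineage split
            if k == A:                   # would give A + 1 lineages: delete it and stop
                return lineages, parent_species, events, t
            lineages.append(lineages[i])
            events.append((t, "split", i))
        else:                                             # speciation
            parent_species[next_label] = lineages[i]
            lineages[i] = next_label
            next_label += 1
            events.append((t, "speciation", i))

rng = random.Random(42)
counts = immigration_ancestors(1000, scaled_immigration(0.1, 1000), rng)
species, parents, events, t_end = stephens_forward(len(counts), 50.0, rng)

# Local abundances: ancestor a is paired with final lineage a (lineages are exchangeable).
abundance = {}
for n_desc, sp in zip(counts, species):
    abundance[sp] = abundance.get(sp, 0) + n_desc
print(len(counts), len(abundance), sum(abundance.values()))
```

Because the first individual always founds a new ancestor (its probability is I/I = 1), the urn is well defined for any I > 0. The species-level ancestry dictionary is the information that the infinite-alleles Moran model discards.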

References:

Etienne, R.S. (2005). A new sampling formula for neutral biodiversity. Ecol. Lett., 8, 253-260.
Ewens, W.J. (2004). Mathematical Population Genetics. Springer, Berlin.
Stephens, M. (2000). Times on trees and the age of an allele. Theor. Popul. Biol., 57, 109-119.

Appendix S2: Improving the resolution of the phylogeny of Barro Colorado Island's tropical tree species.

Phylogenetic trees are given in Newick format, with additional square brackets [] indicating remaining polytomies and $ symbols indicating the partial resolution of certain clades. C++ code for randomly resolving this phylogeny while respecting these partial resolutions is available upon request.

(([([((((Schizolobium_parahybum,(Prioria_copaifera,((Senna_dariensis,(((Enterolobium_schomburgkii,(Abarema_macradenia,[($(((Inga_laurina,Inga_ruiziana),Inga_punctata),(Inga_oerstediana,(Inga_nobilis,Inga_sapindoides)))$,Inga_acuminata,Inga_goldmanii,Inga_marginata,Inga_pezizifera,Inga_spectabilis,Inga_umbellifera)])),Acacia_melanoceras),Tachigalia_versicolor)),((((Erythrina_costaricensis,Lonchocarpus_heptaphyllus),((((Pterocarpus_belizensis,Pterocarpus_rohrii),Platypodium_elegans),Platymiscium_pinnatum),Andira_inermis)),[(Ormosia_amazonica,Ormosia_coccinea,Ormosia_macrocalyx)]),((Dipteryx_oleifera,Myrospermum_frutescens),(Swartzia_simplex_gra,Swartzia_simplex_och)))))),((([((Brosimum_alicastrum,Brosimum_guianense),[(Ficus_bullenei,Ficus_colubrinae,Ficus_costaricana,Ficus_insipida,Ficus_maxima,Ficus_obtusifolia,Ficus_popenoei,Ficus_tonduzii,Ficus_trigonata,Ficus_yoponensis)],Maquira_guianensis,Perebea_xanthochyma,Poulsenia_armata,Sorocea_affinis)],[((Cecropia_insignis,Cecropia_obtusifolia),Pourouma_bicolor,(Trophis_caucana,Trophis_racemosa))]),(Celtis_schippii,Trema_micrantha)),Colubrina_glandulosa)),[([([(Croton_billbergianus,((Alchornea_costaricensis,Alchornea_latifolia),(Adelia_triloba,(Acalypha_diversifolia,Acalypha_macrostachya))),((Sapium_glandulosum,Sapium_%27broadleaf%27),Hura_crepitans),Hyeronima_alchorneoides,Margaritaria_nobilis)],((Vismia_baccifera,Vismia_macrophylla),((Marila_laxiflora,Calophyllum_longifolium),(Chrysochlamis_eclipes,(Symphonia_globulifera,(Garcinia_intermedia,Garcinia_madruno))))),[([(Casearia_aculeata,Casearia_arborea,Casearia_commersoniana,Casearia_guianensis,Casearia_sylvestris)],Hasseltia_floribunda,Lacistema_aggregatum,(Laetia_procera,Laetia_thamnia),Lindackeria_laurina,Lozania_pittieri,Tetrathylacium_johansenii,Zuelania_guidonia)],(Cassipourea_elliptica,Erythroxylum_macrophyllum),Cespedesia_spathulata,Drypetes_standleyi,((Hirtella_americana,Hirtella_triandra),(Licania_hypoleuca,Licania_platypus)),(Hybanthus_prunifolius,Rinorea_sylvatica),Spachea_membranacea)],Maytenus_schippii,Sloanea_terniflora)]),([([(Allophylus_psilospermus,[(Cupania_cinerea,Cupania_latifolia,Cupania_rufescens,Cupania_seemannii)],(Talisia_nervosa,Talisia_princeps))],(((Anacardium_excelsum,Astronium_graveolens),(Spondias_mombin,Spondias_radlkoferi)),[(((Protium_costaricense,Protium_panamense),Protium_tenuifolium),Tetragastris_panamensis,Trattinnickia_aspera)]),(((Cedrela_odorata,((Trichilia_pallida,Trichilia_tuberculata),(Guarea_grandifolia,(Guarea_guidonia,Guarea_sp)))),[(Zanthoxylum_acuminatum,Zanthoxylum_ekmanii,Zanthoxylum_panamense,Zanthoxylum_setulosum)]),(Picramnia_latifolia,(Quassia_amara,Simarouba_amara))))],(((([(Cavanillesia_platanifolia,Ceiba_pentandra,Pseudobombax_septenatum,(Pachira_sessilis,Pachira_quinata))],(Quararibea_asterolepis,Hampea_appendiculata)),Ochroma_pyramidale),Sterculia_apetala),((Guazuma_ulmifolia,Theobroma_cacao),((Trichospermum_galeottii,Luehea_seemannii),((Apeiba_membranacea,Apeiba_hybrid),Apeiba_tibourbou))))),((([(Chamguava_schippii,[(Eugenia_coloradoensis,Eugenia_galalonensis,Eugenia_nesiotica,Eugenia_oerstediana)],Myrcia_gatunensis,Psidium_friedrichsthalianum)],Vochysia_ferruginea),[(Miconia_affinis,Miconia_argentea,Miconia_elata,Miconia_hondurensis)]),(Lafoensia_punicifolia,(Terminalia_amazonia,Terminalia_oblonga))),Turpinia_occidentalis)],(([(([(Aegiphila_panamensis,(Jacaranda_copaia,(Tabebuia_guayacan,Tabebuia_rosea)),Trichanthera_gigantea)],Solanum_hayesii),((Psychotria_grandis,(Coussarea_curvigemmia,Faramea_occidentalis)),((Hamelia_axillaris,Guettarda_foliacea),[((Pentagonia_macrophylla,(Alseis_blackiana,Macrocnemum_roseum)),[(Randia_armata,Genipa_americana,Amaioua_corymbosa,Alibertia_edulis,Tocoyena_pittieri)],Posoqueria_latifolia)])),(Aspidosperma_spruceanum,(Thevetia_ahouai,(Lacmellea_panamensis,(Tabernaemontana_arborea,Stemmadenia_grandiflora)))),((Cordia_alliodora,Cordia_bicolor),Cordia_lasiocalyx))],(Dendropanax_arboreus,Schefflera_morototoni)),[((Ardisia_fendleri,Stylogyne_turbacensis),(((Chrysophyllum_argenteum,Chrysophyllum_cainito),((Pouteria_fossicola,Pouteria_reticulata),Pouteria_stipitata)),Gustavia_superba),Diospyros_artanthifolia)]),(((Coccoloba_coronata,Coccoloba_manzinellensis),Triplaris_cumingiana),Guapira_standleyana),(Heisteria_acuminata,Heisteria_concinna))],(((((Annona_spraguei,(Guatteria_dumetorum,Xylopia_macrantha)),(Desmopsis_panamensis,(Mosannona_garwoodii,Unonopsis_pittieri))),((Virola_multiflora,Virola_sebifera),Virola_surinamensis)),([(Beilschmiedia_pendula,Cinnamomum_triplinerve,[(Nectandra_cissiflora,Nectandra_lineata,Nectandra_purpurea,Nectandra_%27fuzzy%27)],[(Ocotea_cernua,Ocotea_oblonga,Ocotea_puberula,Ocotea_whitei)])],(Siparuna_guianensis,Siparuna_pauciflora))),(Piper_cordulatum,Piper_reticulatum))),((((Attalea_butyracea,Elaeis_oleifera),Astrocaryum_standleyanum),Oenocarpus_mapora),Socratea_exorrhiza));
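In standard Newick, square brackets enclose comments, so most parsers would silently drop the bracketed polytomies in the tree above. The minimal Python sketch below assumes that the [ ] and $ markers can simply be deleted. This keeps the topology but loses the information on which nodes are to be resolved at random. The sketch then parses the result into nested tuples and counts the remaining polytomies. The helper names are hypothetical, and the parser handles only tip names and parentheses, with no branch lengths or quoted labels. As an example, it uses the Annonaceae subtree listed below.

```python
def strip_annotations(newick):
    """Delete the [ ] polytomy brackets and $ markers, keeping the topology.
    Remaining polytomies become ordinary multifurcating nodes."""
    return "".join(ch for ch in newick if ch not in "[]$")

def parse_newick(s):
    """Parse a bare Newick string (tip names and parentheses only) into nested tuples."""
    s = s.strip().rstrip(";")
    pos = 0

    def node():
        nonlocal pos
        if s[pos] == "(":
            pos += 1
            children = [node()]
            while pos < len(s) and s[pos] == ",":
                pos += 1
                children.append(node())
            if pos >= len(s) or s[pos] != ")":
                raise ValueError("unbalanced parentheses at position %d" % pos)
            pos += 1
            return tuple(children)
        start = pos
        while pos < len(s) and s[pos] not in ",()":
            pos += 1
        return s[start:pos]              # a tip name

    tree = node()
    if pos != len(s):
        raise ValueError("unexpected characters after the tree")
    return tree

def tips(tree):
    return [tree] if isinstance(tree, str) else [t for c in tree for t in tips(c)]

def polytomies(tree):
    """Number of internal nodes with more than two children."""
    if isinstance(tree, str):
        return 0
    return int(len(tree) > 2) + sum(polytomies(c) for c in tree)

annonaceae = ("([(Annona_spraguei,Guatteria_dumetorum,Xylopia_macrantha)],"
              "(Desmopsis_panamensis,(Mosannona_garwoodii,Unonopsis_pittieri)))")
tree = parse_newick(strip_annotations(annonaceae))
print(len(tips(tree)), polytomies(tree))  # 6 1
```

The single bracketed group in this subtree survives as one three-way polytomy, which a random resolution step would then have to split.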

References:

Apocynaceae: (Aspidosperma_spruceanum,(Thevetia_ahouai,(Lacmellea_panamensis,(Tabernaemontana_arborea,Stemmadenia_grandiflora))))

Ref: Potgieter & Albert 2001. NB: Stemmadenia placed next to Tabernaemontana because it belongs to the same tribe.

Rubiaceae: ((Psychotria_grandis,(Coussarea_curvigemmia,Faramea_occidentalis)),((Hamelia_axillaris,Guettarda_foliacea),[((Pentagonia_macrophylla,(Alseis_blackiana,Macrocnemum_roseum)),[(Randia_armata,Genipa_americana,Amaioua_corymbosa,Alibertia_edulis,Tocoyena_pittieri)],Posoqueria_latifolia)]))

Ref: Rova et al. 2002. Bremer & Manen 2000. Persson 2000.

NB: Macrocnemum placed next to Alseis because it belongs to the same tribe.

Arecaceae: ((((Attalea_butyracea,Elaeis_oleifera),Astrocaryum_standleyanum),Oenocarpus_mapora),Socratea_exorrhiza)

Ref: Hahn 2002.

Annonaceae: ([(Annona_spraguei,Guatteria_dumetorum,Xylopia_macrantha)],(Desmopsis_panamensis,(Mosannona_garwoodii,Unonopsis_pittieri)))

Ref: Pirie et al. 2006.

Anacardiaceae: ((Anacardium_excelsum,Astronium_graveolens),(Spondias_mombin,Spondias_radlkoferi))

Ref: Pell 2004, p. 66.

Bombacaceae: (((([(Cavanillesia_platanifolia,Ceiba_pentandra,Pseudobombax_septenatum,(Pachira_sessilis,Pachira_quinata))],(Quararibea_asterolepis,Hampea_appendiculata)),Ochroma_pyramidale),Sterculia_apetala),((Guazuma_ulmifolia,Theobroma_cacao),((Trichospermum_galeottii,Luehea_seemannii),[(Apeiba_membranacea,Apeiba_hybrid,Apeiba_tibourbou)])))

Ref:
-Baum et al. 2004.
-Alverson et al. 1999.

Clusiaceae: ((Vismia_baccifera,Vismia_macrophylla),((Marila_laxiflora,Calophyllum_longifolium),(Chrysochlamis_eclipes,(Symphonia_globulifera,(Garcinia_intermedia,Garcinia_madruno)))))

Ref: Gustafsson et al. 2002.

Euphorbiaceae: [(Croton_billbergianus,((Alchornea_costaricensis,Alchornea_latifolia),(Adelia_triloba,(Acalypha_diversifolia,Acalypha_macrostachya))),((Sapium_glandulosum,Sapium_broadleaf),Hura_crepitans),Hyeronima_alchorneoides,Margaritaria_nobilis)]

Ref: Wurdack et al. 2005.

Fabaceae: (Schizolobium_parahybum,(Prioria_copaifera,((Senna_dariensis,(((Enterolobium_schomburgkii,(Abarema_macradenia,[($(((Inga_laurina,Inga_ruiziana),Inga_punctata),(Inga_oerstediana,(Inga_nobilis,Inga_sapindoides)))$,Inga_acuminata,Inga_goldmanii,Inga_marginata,Inga_pezizifera,Inga_spectabilis,Inga_umbellifera)])),Acacia_melanoceras),Tachigalia_versicolor)),((((Erythrina_costaricensis,Lonchocarpus_heptaphyllus),((((Pterocarpus_belizensis,Pterocarpus_rohrii),Platypodium_elegans),Platymiscium_pinnatum),Andira_inermis)),[(Ormosia_amazonica,Ormosia_coccinea,Ormosia_macrocalyx)]),((Dipteryx_oleifera,Myrospermum_frutescens),(Swartzia_simplex_gra,Swartzia_simplex_och))))))

Ref:
-Wojciechowski, Lavin & Sanderson 2004.
-Inga: Richardson et al. 2001.

Meliaceae: (Cedrela_odorata,((Trichilia_pallida,Trichilia_tuberculata),[(Guarea_grandifolia,Guarea_guidonia,Guarea_sp)]))

Ref: Muellner et al. 2003.

Moraceae: (Sorocea_affinis,((((Perebea_xanthochyma,Poulsenia_armata),Maquira_guianensis),[(Ficus_bullenei,Ficus_colubrinae,Ficus_costaricana,Ficus_insipida,Ficus_maxima,Ficus_obtusifolia,Ficus_popenoei,Ficus_tonduzii,Ficus_trigonata,Ficus_yoponensis)]),(Brosimum_alicastrum,Brosimum_guianense)))

Ref: Datwyler & Weiblen 2004.

References:

834 836 838 840 842 844 846 848 850 852 854 856 858 860 862 864 866 868

Alverson, W.S., Whitlock, B.A., Nyffeler, R., Bayer, C. & Baum, D.A. (1999). Phylogeny of the core Malvales: evidence from ndhF sequence data. Am. J. Bot., 86, 1474-1486.

Baum, D.A., Dewitt Smith, S., Yen, A., Alverson, W.S., Nyffeler, R., Whitlock, B.A. & Oldham, R.L. (2004). Phylogenetic relationships of Malvatheca (Bombacoideae and Malvoideae; Malvaceae sensu lato) as inferred from plastid DNA sequences. Am. J. Bot., 91, 1863-1871.

Bremer, B. & Manen, J.-F. (2000). Phylogeny and classification of the subfamily Rubioideae (Rubiaceae). Plant Syst. Evol., 225, 43-72.

Datwyler, S.L. & Weiblen, G.D. (2004). On the origin of the fig: phylogenetic relationship of Moraceae from ndhF sequences. Am. J. Bot., 91, 767-777.

Gustafsson, M.H.G., Bittrich, V. & Stevens, P.F. (2002). Phylogeny of Clusiaceae based on rbcL sequences. Int. J. Plant Sci., 163, 1045-1054.

Hahn, W.J. (2002). A phylogenetic analysis of the Arecoid line of palms based on plastid DNA sequence data. Mol. Phyl. Evol., 23, 189-204.

Muellner, A.N., Samuel, R., Johnson, S.A., Cheek, M., Pennington, T.D. & Chase, M.W. (2003). Molecular phylogenetics of Meliaceae (Sapindales) based on nuclear and plastid DNA sequences. Am. J. Bot., 90, 471-480.

Pell, S.K. (2004). Molecular systematics of the cashew family (Anacardiaceae). PhD thesis. Louisiana State University.

Persson, C. (2000). Phylogeny of the Neotropical Alibertia group (Rubiaceae), with emphasis on the genus Alibertia, inferred from ITS and 5S ribosomal DNA sequences. Am. J. Bot., 87, 1018-1028.

Pirie, M.D., Chatrou, L.W., Mols, J.B., Erkens, R.H.J. & Oosterhof, J. (2006). ‘Andean-centred’ genera in the short-branch clade of Annonaceae: testing biogeographical hypotheses using phylogeny reconstruction and molecular dating. J. Biogeog., 33, 31-46.

Potgieter, K. & Albert, V.A. (2001). Phylogenetic relationships within Apocynaceae s.l. based on trnL Intron and trnL-F Spacer and propagule characters. Ann. Miss. Bot. Gard., 88, 523-549.

Richardson, J.E., Pennington, R.T., Pennington, T.D. & Hollingsworth, P.M. (2001). Rapid diversification of a species-rich genus of neotropical rain forest trees. Science, 293, 2242-2245.

Rova, J.H.E., Delprete, P.G., Andersson, L. & Albert, V.A. (2002). A trnL-F cpDNA sequence study of the Condamineeae-Rondeletieae-Sipaneeae complex with implications on the phylogeny of the Rubiaceae. Am. J. Bot., 89, 145-159.

Wojciechowski, M.F., Lavin, M. & Sanderson, M.J. (2004). A phylogeny of legumes (Leguminosae) based on analysis of the plastid matK gene resolves many well-supported subclades within the family. Am. J. Bot., 91, 1846-1862.

Wurdack, K.J., Hoffmann, P. & Chase, M.W. (2005). Molecular phylogenetic analysis of uniovulate Euphorbiaceae (Euphorbiaceae sensu stricto) using plastid rbcL and trnL-F DNA sequences. Am. J. Bot., 92, 1397-1420.


Appendix S3: Compatibility with neutral theory of the imbalance levels of 2,000 published phylogenies.

Treebase extraction and phylogenetic tree preprocessing:

We extracted 2,000 published phylogenies from Treebase (http://www.treebase.org/) using the R package apTreeshape (Bortolussi et al. 2006). We used accession numbers 705 to 2704, because the first 704 trees in Treebase are numbered differently, which makes their automatic retrieval more difficult. Following the method of Blum and François (2006), we automatically removed the putative outgroups used to reconstruct these phylogenies, by detecting subtrees that descend directly from the root and contain only one or two species (R scripts available upon request). Polytomies, where present, were randomly resolved using the routine “multi2di” of the R package APE (Paradis et al. 2004).
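The two preprocessing steps above can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the authors' R scripts: trees are written as nested tuples whose leaves are species names, outgroup removal is a single pass over the children of the root, and polytomies are resolved by repeatedly joining two randomly chosen children, which need not reproduce the exact randomization performed by APE's multi2di. All function names are hypothetical.

```python
import random

# Hypothetical representation: a tree is a nested tuple; a leaf is a species-name string.


def n_tips(node):
    """Number of species (leaves) descending from a node."""
    if not isinstance(node, tuple):
        return 1
    return sum(n_tips(child) for child in node)


def is_binary(node):
    """True if every internal node has exactly two children."""
    if not isinstance(node, tuple):
        return True
    return len(node) == 2 and all(is_binary(child) for child in node)


def remove_putative_outgroups(tree):
    """Drop subtrees attached directly to the root that hold only one or two species.

    Assumed single-pass rule: if no root subtree has more than two species,
    the tree is returned unchanged.
    """
    if not isinstance(tree, tuple):
        return tree
    kept = tuple(child for child in tree if n_tips(child) > 2)
    if not kept:
        return tree
    # A single remaining subtree becomes the new root.
    return kept[0] if len(kept) == 1 else kept


def resolve_polytomies(node, rng):
    """Randomly resolve multifurcations by repeatedly joining two random children."""
    if not isinstance(node, tuple):
        return node
    children = [resolve_polytomies(child, rng) for child in node]
    while len(children) > 2:
        i, j = sorted(rng.sample(range(len(children)), 2))
        joined = (children[i], children[j])
        del children[j]  # delete the larger index first so that i stays valid
        del children[i]
        children.append(joined)
    return tuple(children)


if __name__ == "__main__":
    tree = ("outgroup", (("a", "b", "c"), ("d", "e"), "f"))
    ingroup = remove_putative_outgroups(tree)
    resolved = resolve_polytomies(ingroup, random.Random(1))
    print(n_tips(resolved), is_binary(resolved))  # 6 True
```

Passing an explicit `random.Random` instance keeps each random resolution reproducible for a given seed.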

886

Out of these 2,000 trees, 1,660 contained more than 5 species, and these were used to compute the statistic B1.
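Shao and Sokal's B1 statistic sums, over all internal nodes j other than the root, the quantity 1/M_j, where M_j is the maximum number of nodes between node j and its descendant tips. A minimal sketch, assuming fully resolved trees in a hypothetical nested-tuple format and taking M_j as the number of edges on the longest path from j down to a tip (so a node subtending two tips has M_j = 1):

```python
def b1_index(tree):
    """Shao and Sokal's B1: sum of 1/M_j over internal nodes j other than the root.

    M_j is the number of edges on the longest path from node j down to a tip.
    Trees are nested tuples whose leaves are species names (hypothetical format).
    """
    total = 0.0

    def height(node, is_root):
        nonlocal total
        if not isinstance(node, tuple):
            return 0  # a tip has height 0
        h = 1 + max(height(child, False) for child in node)
        if not is_root:
            total += 1.0 / h  # the root is excluded from the sum
        return h

    height(tree, True)
    return total


print(b1_index(((("a", "b"), "c"), "d")))  # 1.5, fully pectinate tree
print(b1_index((("a", "b"), ("c", "d"))))  # 2.0, fully balanced tree
```

For four species, the pectinate tree gives B1 = 1 + 1/2 = 1.5 and the balanced tree gives B1 = 1 + 1 = 2, so larger values indicate more balanced trees.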

Neutral simulations:

For each observed value S of the species richness in the 1,660 trees, we simulated neutral trees of the same richness under Hubbell’s model with various θ values (C++ code available upon request). As θ decreases, the regional pool size necessary to produce a given species richness increases. To avoid saturating computer memory, we capped this size at 1,000,000 individuals and stopped the simulations once this limit was reached. Specifically, we started by simulating 30 phylogenies with θ = 1,000. We then repeated this simulation step after dividing θ by 2, until the 1,000,000 limit was reached. For each of these simulated neutral trees, we computed the B1 statistic. For each richness value, the simulated B1 values together defined the range consistent with the neutral hypothesis.
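The halving schedule for θ can be illustrated as follows. The sketch does not simulate phylogenies; it only lists which θ values would be tried for a given richness S, and the pool size each would require. As a stand-in for that pool size, it assumes the smallest J at which the expected number of species in a neutral regional pool, the Ewens sampling expectation Σ_{i=0}^{J−1} θ/(θ+i), reaches S; the authors' C++ code may determine this size differently. Function names are hypothetical.

```python
def pool_size_needed(S, theta, limit):
    """Smallest J whose expected neutral richness sum_{i<J} theta/(theta+i) reaches S.

    Returns None if J would have to exceed `limit` (assumed stand-in for the
    pool size required by the simulations).
    """
    expected = 0.0
    J = 0
    while expected < S:
        if J >= limit:
            return None
        expected += theta / (theta + J)  # expected gain from the (J+1)-th individual
        J += 1
    return J


def theta_schedule(S, theta_start=1000.0, limit=1_000_000):
    """θ values tried for richness S: start at theta_start, halve until the cap is hit."""
    schedule = []
    theta = theta_start
    while True:
        J = pool_size_needed(S, theta, limit)
        if J is None:
            break  # the required pool would exceed the memory cap
        schedule.append((theta, J))
        theta /= 2.0
    return schedule


print(pool_size_needed(6, 1000.0, 1_000_000))  # 7
```

For S = 6 and θ = 1,000, almost every individual belongs to a new species, so seven individuals suffice on average; as θ is halved, the required pool size grows until it passes the cap.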


References:

Blum, M.G.B. & François, O. (2006). Which random processes describe the tree of life? A large-scale study of phylogenetic tree imbalance. Syst. Biol., 55, 685-691.

Bortolussi, N., Durand, E., Blum, M.G.B. & François, O. (2006). apTreeshape: statistical analysis of phylogenetic tree shape. Bioinformatics, 22, 363-364.

Paradis, E., Claude, J. & Strimmer, K. (2004). APE: Analyses of Phylogenetics and Evolution in R language. Bioinformatics, 20, 289-290.


Appendix S4: Computation of the regional pool sizes for the four tropical tree plots.

This computation is based on a comparison with the results of Latimer et al. (2005). They consider the regional pool for the fynbos to be the Cape Floristic Region, which extends over 50,000 km². They further report densities of 0.1, 0.25, 4 and 8 individuals per m² for trees, large shrubs, shrubs and shrublets, respectively. Thus, if one assumes that trees, large shrubs, shrubs and shrublets each occupy one quarter of the area, this leads to an average density of 2.6 individuals per m², and hence to a regional pool size of Jfynbos = 1.3 × 10¹¹ individuals.
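The pool size follows from multiplying the area of the Cape Floristic Region, A, by the average density ρ (symbols introduced here only for this calculation):

```latex
J_{\mathrm{fynbos}} = A\,\rho
  = \left(5 \times 10^{4}\ \mathrm{km}^{2} \times 10^{6}\ \mathrm{m}^{2}\,\mathrm{km}^{-2}\right)
    \times 2.6\ \mathrm{m}^{-2}
  = 1.3 \times 10^{11}\ \text{individuals}
```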

920

According to Latimer et al. (2005), speciation rates should be higher in the fynbos than among tropical trees. Because θ is proportional to the product of the regional pool size J and the per-birth speciation probability ν (θ = 2Jν in Hubbell’s formulation), a lower speciation rate for tropical trees implies that the regional pool size for the tropical trees observed in a plot should be larger than (θplot / θfynbos) × Jfynbos. We used the value θfynbos = 697 reported in the text of Latimer et al. (2005). To convert these pool sizes from numbers of individuals into areas, we used a density for tropical trees of 500 individuals per ha.

Plot          θplot    Minimal pool size (× 1,000 km²)
BCI           571      2130
La Planada    345      1287
Pasoh         534      1992
Lambir        2491     9292

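For comparison, Fine & Ree (2006) report an area of 9,220,000 km² for the Neotropics and of 5,903,000 km² for the Asian tropics. The minimal pool areas in the table are a substantial fraction of these areas, and for Lambir exceed the whole of the Asian tropics. This means that regional pools for tropical trees extend over continental scales.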
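The minimal pool sizes in the table can be recomputed from the quantities given in this appendix. A minimal sketch in Python, with all constants taken from the text above (Jfynbos = 1.3 × 10¹¹ individuals, θfynbos = 697, 500 trees per ha); the constant and function names are ours:

```python
# Constants from Appendix S4.
J_FYNBOS = 1.3e11                  # regional pool size of the fynbos (individuals)
THETA_FYNBOS = 697.0               # θ for the fynbos (Latimer et al. 2005)
TREES_PER_KM2 = 500 * 100          # 500 individuals per ha, and 1 km² = 100 ha

THETA_PLOT = {"BCI": 571, "La Planada": 345, "Pasoh": 534, "Lambir": 2491}


def minimal_pool_area_km2(theta_plot):
    """Lower bound on the regional pool area (km²) implied by a plot's θ."""
    individuals = theta_plot / THETA_FYNBOS * J_FYNBOS  # (θplot / θfynbos) × Jfynbos
    return individuals / TREES_PER_KM2


for plot, theta in THETA_PLOT.items():
    # Areas are printed in units of 1,000 km², as in the table.
    print(f"{plot:<11}{theta:>6}{minimal_pool_area_km2(theta) / 1000:>8.0f}")
```

Rounded to the nearest 1,000 km², the printed values are 2130, 1287, 1992 and 9292, matching the table.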
