Inferring the parameters of the neutral theory of biodiversity using phylogenetic information, and implications for tropical forests.
Franck Jabot and Jérôme Chave
Laboratoire Evolution et Diversité Biologique, CNRS / Université Paul Sabatier, Bâtiment 4R3, 31062 Toulouse cedex 4, France
Short running title: Inferring neutral parameters with phylogenies
Keywords: neutral theory, parameter inference, Approximate Bayesian Computation, phylogenetic imbalance, community phylogenetics, dispersal limitation, regional species pool.
Number of words in the abstract: 150
Number of words in the manuscript: 4678
Number of references: 48
Number of Tables: 1
Number of Figures: 6
Correspondence: Franck Jabot, Laboratoire Evolution et Diversité Biologique, CNRS / Université Paul Sabatier, Bâtiment 4R3, 31062 Toulouse cedex 4, France
Tel: 05 61 55 67 60
Fax: 05 61 55 73 27
E-mail: [email protected]
Abstract
We develop a statistical method to infer the parameters of Hubbell’s neutral model of biodiversity using data on local species abundances and their phylogenetic relatedness. This method uses the Approximate Bayesian Computation (ABC) approach, where the data are summarized into a small number of informative summary statistics. We used three statistics: the number of species in the sample, Shannon’s H index of evenness, and Shao and Sokal’s B1 index of phylogenetic tree imbalance. Our approach was found to outperform previous methods, illustrating the potential of ABC methods in ecology. When applied to four large tropical forest tree datasets, the method yielded best-fit immigration rates m two orders of magnitude smaller, and regional diversities θ larger, than previously reported for the same data. This implies that neutral-compatible regional pools of tropical trees should extend over continental scales, and that m measures, in this context, mostly the frequency of long-distance dispersal events.
INTRODUCTION
As phylogenetic trees become more widely available through molecular methods, ecologists have an invaluable new dimension of information with which to investigate the mechanisms of community assembly (Losos 1992; Webb et al. 2002; Pennington et al. 2006). It has long been suggested that ecological processes may have a distinctive fingerprint on the patterns of evolutionary relatedness of coexisting species (Hutchinson 1959; Losos 1992; Wiens & Donoghue 2004). However, most models of species coexistence fail to take into account the phylogenetic structure of a species assemblage, as they are usually restricted to the scale at which individuals interact physically, thus neglecting larger spatial and temporal scales (Ricklefs 2004). One exception is Hubbell’s (2001) neutral theory of biodiversity and biogeography, which includes both local and regional processes in a single conceptual framework. Hubbell’s neutral model assumes that a local community is connected to a larger pool of species through dispersal, much like in classic island biogeography (MacArthur & Wilson 1967). A single parameter θ, the number of species arising per generation through speciation in the regional species pool, summarizes the diversity in this pool. A second parameter m, the fraction of recruits coming from the source pool into the local community, describes the magnitude of dispersal limitation in the local community.
In many of the recent applications of the neutral theory, the species abundance distribution of a local assemblage has been used to test the theory. Of particular relevance here is the work of Latimer et al. (2005), who assessed the evolutionary consequences of the neutral assumption by analyzing plant species abundance distributions in the South African fynbos. They inferred the neutral parameters and found neutral migration rates to be two orders of magnitude smaller than those found in tropical forest tree communities. Latimer et al. (2005) also found that parameter θ, as inferred from their dataset, was much higher than in previous reports. They interpreted this result as a signature of a peculiarly high speciation rate of the fynbos flora. However, Etienne et al. (2006) critically reassessed this finding. They commented that the inferred neutral parameters θ and m often have two nearly equally likely values when they are inferred from species abundance data. For the tropical forest trees of Barro Colorado Island (BCI), Etienne et al. (2006) found that the maximum likelihood estimation of the neutral parameters (θ, m) yields two different but almost equally likely maxima. This finding is a serious theoretical challenge for the neutral theory, because it suggests that these parameters cannot be estimated accurately from species abundance data alone. For the BCI tree dataset, it has been assumed that the parameters are close to θ = 47.7 and m = 0.093 (Leigh 2007). The second combination of neutral parameters (θ = 241.9, m = 0.003; Etienne et al. 2006) would imply that the BCI forest is far more dispersal-limited, and that the regional diversity is much higher than previously imagined.
Aside from Latimer et al. (2005), attempts to examine the neutral theory at evolutionary timescales are scarce (but see e.g. Lande et al. 2003; Lavin et al. 2004). Many studies of macroevolutionary patterns are based on simple models of lineage diversification, such as the Yule model, which produces tree topologies assuming that all species have the same chance to give rise to an altogether new species (Yule 1924). Simulated trees are often compared with empirical ones using the balance of the tree topology, that is, the extent to which nodes define subgroups of equal size (balanced trees are also called ‘symmetrical’, while imbalanced ones are sometimes called ‘pectinate’; see Mooers & Heard 1997). It turns out that the Yule model produces trees that are generally more balanced than those reconstructed from biological data (Mooers & Heard 1997). Recently, Mooers et al. (2007) used Hubbell’s neutral model to produce simulated trees, and they found that it produced trees less balanced than empirical ones. Hubbell (2001) had already noticed that larger θ values were associated with a more even distribution of speciation events among lineages, hence neutral trees simulated with larger θ should be more balanced. Thus, there are reasons to believe that the trees produced by Hubbell’s model encompass a range of balance values, if the parameter θ is free to vary more than in the study of Mooers et al. (2007). Turning this argument around, we speculate that the balance of real phylogenetic trees may provide a direct way to infer the neutral parameter θ, independently of species abundance distributions.
Here, we develop a new sampling theory of Hubbell’s neutral model, which takes into account not only the species abundance distribution in a sample, but also the phylogenetic relatedness of these co-occurring species. We first assess whether phylogenetic trees predicted by Hubbell’s neutral model are a reasonable fit for real phylogenies, in particular with respect to phylogenetic imbalance. Next, we use simulated data to show that phylogenies enable us to infer the neutral parameters more precisely. We finally apply our new inference method to tropical tree datasets in the Neotropics and in South-East Asia. New parameter estimates are reinterpreted in light of dispersal biology and biogeographical patterns.
METHODS
Approximate Bayesian Computation
In a classical likelihood framework, inferring parameters of the neutral model using phylogenies would entail the derivation of an exact likelihood function for these data. Currently, there is no simple formula for such a likelihood function. Instead, it is possible to approximate this function through simulation, using a method called Approximate Bayesian Computation (ABC). This method is commonly used in population genetics (Tavaré et al. 1997; Beaumont et al. 2002; Marjoram & Tavaré 2006) but, to our knowledge, it has not been employed in ecology.
The ABC method is useful when the exact likelihood function of a model can be approximated by a large number of independent simulations, across a wide range of model parameter values drawn from a set of prior distributions. Empirical data are summarized into a set of informative statistics, called summary statistics. These summary statistics are then computed for each of the simulated datasets, and compared with the values observed in the empirical dataset. Only the simulations whose summary statistics are close enough to the observed values are retained. The corresponding parameter values, in our case θ and m, then form an approximate joint posterior distribution for the parameters. A computer-intensive approach like the ABC method relies on the ability to quickly simulate samples under the model considered. For Hubbell’s neutral model, this is made possible by using the coalescent approach (Wakeley 2007). Details on our simulation algorithm are provided in Appendix S1.
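The rejection scheme described above can be sketched in a few lines of Python. This is only an illustration on a toy model, not the simulation code of Appendix S1: the function names (`abc_rejection`, `toy_prior`, `toy_simulate`, `toy_summarize`), the coin-flip model, and the choice to retain a fixed number of closest simulations are all assumptions made for the example.

```python
import math
import random

def abc_rejection(observed_stats, sample_prior, simulate, summarize,
                  n_sim=5000, n_keep=100, rng=None):
    """Keep the n_keep prior draws whose simulated summary statistics
    are closest (Euclidean distance) to the observed ones."""
    rng = rng or random.Random()
    scored = []
    for _ in range(n_sim):
        params = sample_prior(rng)                    # draw from the prior
        stats = summarize(simulate(params, rng))      # simulate, then summarize
        scored.append((math.dist(stats, observed_stats), params))
    scored.sort(key=lambda pair: pair[0])             # nearest simulations first
    return [params for _, params in scored[:n_keep]]

# Toy model (hypothetical): infer a success probability p from 200 coin flips.
def toy_prior(rng):
    return rng.random()                               # p ~ Uniform(0, 1)

def toy_simulate(p, rng, n=200):
    return [1 if rng.random() < p else 0 for _ in range(n)]

def toy_summarize(data):
    return (sum(data) / len(data),)                   # one summary statistic

rng = random.Random(1)
observed = toy_simulate(0.3, rng)
posterior = abc_rejection(toy_summarize(observed), toy_prior,
                          toy_simulate, toy_summarize, rng=rng)
posterior_mean = sum(posterior) / len(posterior)
```

The retained values in `posterior` play the role of the approximate posterior sample; in the neutral setting, `params` would be the pair (θ, m) and the summary statistics would replace the single proportion used here.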
Choice of summary statistics
Various diversity indices may be used to summarize the species abundance distribution (Magurran 2004). Likewise, phylogeny-based diversity indices, and statistics describing the tree topology, may be used to summarize the phylogeny of species occurring locally in the community (Mooers & Heard 1997). We included a total of 24 candidate summary statistics in our preliminary tests (Table S1). To assess which statistics were the most informative, we simulated 1,000,000 local communities each of size J = 20,000, where J is the number of individuals in the local community. We sampled the neutral parameters with a uniform prior distribution on ln(θ) (0 ≤ ln(θ) ≤ 7), and on ln(I) (0 ≤ ln(I) ≤ 10), where I is the scaled immigration rate defined as I = m(J − 1)/(1 − m) (Etienne 2005). These priors contain the range of neutral parameters that have been previously estimated for tropical forests (Chave et al. 2006).
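As an illustration of this simulation step, the sketch below draws (θ, I) from the priors stated above, converts I back to m by inverting I = m(J − 1)/(1 − m), and generates species abundances with the sequential urn scheme associated with Etienne’s (2005) sampling formula. It is an assumption-laden stand-in: it produces abundances only and no phylogeny, so it is not the coalescent algorithm of Appendix S1, and the function names and the reduced community size are chosen for the example.

```python
import math
import random
from collections import Counter

def sample_prior(rng, ln_theta_max=7.0, ln_I_max=10.0):
    """Draw (theta, I) with uniform priors on ln(theta) and ln(I)."""
    theta = math.exp(rng.uniform(0.0, ln_theta_max))
    I = math.exp(rng.uniform(0.0, ln_I_max))
    return theta, I

def immigration_rate(I, J):
    """Invert I = m (J - 1) / (1 - m) to recover m."""
    return I / (I + J - 1)

def simulate_abundances(theta, I, J, rng):
    """Two-level urn scheme; returns a Counter {species_id: abundance}."""
    ancestor_species = []   # species label of each immigrating ancestor
    local = []              # ancestor index of each local individual
    n_species = 0
    for j in range(J):
        if rng.random() < I / (I + j):
            # New immigrant ancestor, drawn from the regional pool.
            a = len(ancestor_species)
            if rng.random() < theta / (theta + a):
                ancestor_species.append(n_species)    # species new to the sample
                n_species += 1
            else:
                ancestor_species.append(rng.choice(ancestor_species))
            local.append(a)
        else:
            # Descendant of an individual already in the local sample.
            local.append(rng.choice(local))
    return Counter(ancestor_species[a] for a in local)

rng = random.Random(42)
theta, I = sample_prior(rng)
J = 2000  # reduced from the J = 20,000 of the text to keep the example fast
abundances = simulate_abundances(theta, I, J, rng)
m = immigration_rate(I, J)
```

The number of species in the simulated sample is `len(abundances)`, and the abundance values can be passed directly to any abundance-based summary statistic.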
The number of species S in the local community sample was retained as the first summary statistic, because in the limiting case where m = 1, S is a sufficient statistic for θ (Ewens 1972). We then tested which of the other statistics were the most informative in predicting the neutral parameters. Constraining the simulation outputs to a fixed value of S (in practice, S = 200 species), we regressed the 23 remaining summary statistics against ln(θ). All statistics of tree imbalance were monotonically correlated with ln(θ), hence we compared their relative predictive power based on the regression coefficient of a linear model. The summary statistics for the species abundance distribution presented a mode for intermediate values of ln(θ) (Fig. 1). We therefore compared their predictive performance based on the adjusted R² of a quadratic model. Among these statistics, two were found to be the most correlated with ln(θ) (see Results): Shannon’s index of diversity H, defined as $H = -\sum_i p_i \ln(p_i)$, where $p_i$ is the relative abundance of species i in the sample, and Shao & Sokal’s (1990) phylogenetic tree balance statistic B1, defined as $B_1 = \sum_i 1/M_i$, where $M_i$ is the maximal number of nodes between the interior node i and the terminal species of the subtree rooted at node i, the summation being over all interior nodes except the root (Shao & Sokal 1990). More balanced phylogenetic trees have higher B1 values. The definitions of the other trial statistics are reported in Table S1. Regressions of the statistics against ln(I) would be very similar to those against ln(θ), because the two parameters θ and I are strongly negatively correlated.
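Both retained statistics are simple to compute, as the following sketch suggests. The nested-tuple tree representation and the function names are assumptions made for the example, and M_i is counted here as the number of edges between node i and its deepest tip (so that a two-tip clade has M_i = 1), which is one reading of the definition above.

```python
import math

def shannon_index(abundances):
    """H = -sum_i p_i ln(p_i), from a list of species abundances."""
    total = sum(abundances)
    return -sum((n / total) * math.log(n / total) for n in abundances if n > 0)

def b1_statistic(tree):
    """B1 = sum of 1/M_i over all interior nodes except the root.

    `tree` is a nested tuple; anything that is not a tuple is a tip.
    M_i is taken as the number of edges from node i to its deepest tip
    (an assumption about the counting convention).
    """
    total = 0.0

    def height(node, is_root):
        nonlocal total
        if not isinstance(node, tuple):
            return 0                                   # tips have height 0
        h = 1 + max(height(child, False) for child in node)
        if not is_root:
            total += 1.0 / h                           # interior, non-root node
        return h

    height(tree, True)
    return total

balanced = (("a", "b"), ("c", "d"))
pectinate = ((("a", "b"), "c"), "d")
b1_balanced = b1_statistic(balanced)
b1_pectinate = b1_statistic(pectinate)
h_even = shannon_index([5, 5])
```

On these four-tip trees the balanced topology receives the larger B1 value, in line with the statement that more balanced trees have higher B1.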
We also assessed whether the range of phylogenetic imbalance that the neutral theory is able to predict encompassed realistic values of phylogenetic imbalance. We retrieved the first 2000 published phylogenies in TreeBASE (http://www.treebase.org/), measured the imbalance statistic B1, and compared this statistic to that obtained from phylogenies simulated with Hubbell’s metacommunity model (see Appendix S3 for more details).
Test of the ABC method using simulated data
It is impossible to ensure that the chosen set of summary statistics is optimal to infer the parameter in the ABC method (Marjoram et al. 2003). However, it is possible to check that the selected summary statistics do summarize most of the information present in the data, and thus lead to an efficient estimation method. We simulated 300 neutral datasets with various parameter values, and then estimated the neutral parameters by ABC. More precisely, we computed a mean standard error and a bias on the estimated parameters, defined as
$$\mathrm{MSE} = \left\langle \frac{\left(\ln\theta_{\mathrm{estimated}} - \ln\theta_{\mathrm{simul}}\right)^2}{\ln^2\theta_{\mathrm{simul}}} + \frac{\left(\ln I_{\mathrm{estimated}} - \ln I_{\mathrm{simul}}\right)^2}{\ln^2 I_{\mathrm{simul}}} \right\rangle^{1/2}$$
$$\mathrm{Bias} = \left\langle \frac{\ln\theta_{\mathrm{estimated}} - \ln\theta_{\mathrm{simul}}}{\ln\theta_{\mathrm{simul}}} + \frac{\ln I_{\mathrm{estimated}} - \ln I_{\mathrm{simul}}}{\ln I_{\mathrm{simul}}} \right\rangle^{1/2}$$
where the brackets represent a mean over the 300 simulated communities. Both statistics describe the estimation efficiency of the ABC method. For comparison, we also computed these statistics with the estimates obtained from an exact maximum-likelihood approach based on the species abundance distribution only (Etienne 2005).
Application to four tropical forest tree datasets
The ABC method was used to infer the neutral parameters from four large tropical forest tree datasets (data from Condit et al. 2006). These datasets correspond to a full census of trees greater than 10 cm in trunk diameter at breast height (dbh) in plots of 25-52 ha in size, in Central Panama (Barro Colorado Island, BCI), Colombia (La Planada), peninsular Malaysia (Pasoh), and Sarawak, Malaysia (Lambir). More details on these study sites may be found in Losos & Leigh (2004).
For each site, a maximally resolved phylogenetic tree subtending the local community was generated using the software Phylomatic (Webb & Donoghue 2005). For the BCI site, we also included additional published data to produce an improved phylogenetic tree (see Appendix S2). For the BCI phylogeny, about 69% of nodes were resolved in this improved phylogenetic tree, compared to 53% with the default options of Phylomatic. The remaining polytomies were resolved randomly.
For each dataset, we ran the ABC method using 200,000 simulated species assemblages, with a uniform prior distribution for ln(θ) (0 ≤ ln(θ) ≤ 10), and for ln(I) (0 ≤ ln(I) ≤ 10). To control for the possible bias due to the incomplete resolution of the phylogenies, we repeated the random resolution procedure 100 times (see Discussion). For each of the 100 random resolutions of the polytomies in the observed phylogenetic tree, we selected the 200 outputs for which the simulated values of S, H and B1 were closest to the real ones, as measured by the Euclidean distance in the space of summary statistics (S, H, B1). We thus obtained 200 × 100 = 20,000 points in the plane (ln(θ), ln(I)), from which we computed an approximate posterior distribution.
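The selection and pooling step can be sketched as follows. The function names and the toy numbers are hypothetical, and the sketch uses the raw Euclidean distance exactly as stated; whether the statistics were rescaled before computing distances is not specified in the text, so no rescaling is applied here.

```python
import math

def closest_simulations(sim_stats, sim_params, observed, k):
    """Return the parameters of the k simulations nearest to `observed`
    in plain Euclidean distance over the summary statistics."""
    order = sorted(range(len(sim_stats)),
                   key=lambda i: math.dist(sim_stats[i], observed))
    return [sim_params[i] for i in order[:k]]

def pooled_posterior_sample(sim_stats, sim_params, observed_by_resolution, k):
    """Pool the k nearest simulations over all random polytomy resolutions."""
    pooled = []
    for observed in observed_by_resolution:
        pooled.extend(closest_simulations(sim_stats, sim_params, observed, k))
    return pooled

# Hypothetical toy values: (S, H, B1) and (ln theta, ln I) for four simulations.
sim_stats = [(200, 4.0, 60.0), (210, 4.2, 65.0), (150, 3.0, 40.0), (205, 4.1, 62.0)]
sim_params = [(5.0, 3.0), (5.5, 2.5), (3.0, 6.0), (5.2, 2.8)]
# Two random resolutions of the same observed tree: same S and H, different B1.
observed_by_resolution = [(204, 4.1, 61.0), (204, 4.1, 63.0)]
pooled = pooled_posterior_sample(sim_stats, sim_params, observed_by_resolution, k=2)
```

With k = 200 and 100 resolutions, `pooled` would contain the 20,000 points in the (ln(θ), ln(I)) plane described above; the same simulation may appear several times if it is close to several resolutions.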
Analyses using Etienne’s (2005) maximum likelihood estimation (MLE) were performed using the TeTame freeware (http://www.edb.ups-tlse.fr/equipe1/chave/tetame.htm). Post-processing of the ABC simulations (determination of the approximate posterior distribution and mode) was carried out with the R software version 2.7.0 (http://www.r-project.org/), using the routine “bkde2D” of the library “KernSmooth”. All R scripts are available upon request.
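For readers without access to the R scripts, the idea of locating the posterior mode from the retained points can be approximated very crudely in Python. The sketch below swaps the kernel smoothing of “bkde2D” for simple two-dimensional binning, so it is a stand-in rather than an equivalent; the function name, the grid size and the toy points are assumptions, and all points are assumed to lie within the stated ranges.

```python
from collections import Counter

def histogram_mode(points, x_range, y_range, bins=20):
    """Return the centre of the most populated cell of a bins x bins grid."""
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    dx = (x_max - x_min) / bins
    dy = (y_max - y_min) / bins
    counts = Counter()
    for x, y in points:
        i = min(int((x - x_min) / dx), bins - 1)   # clamp the upper edge
        j = min(int((y - y_min) / dy), bins - 1)
        counts[(i, j)] += 1
    (i, j), _ = counts.most_common(1)[0]
    return x_min + (i + 0.5) * dx, y_min + (j + 0.5) * dy

# Hypothetical retained points in the (ln theta, ln I) plane.
points = [(5.1, 2.9), (5.2, 2.8), (5.3, 2.7), (1.0, 8.0)]
mode = histogram_mode(points, (0.0, 10.0), (0.0, 10.0), bins=10)
```

A kernel density estimate gives a smoother surface and a less grid-dependent mode, which is presumably why a kernel routine was preferred for the analyses reported here.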
RESULTS
We found that the summary statistic of species abundances with the largest correlation with ln(θ) was Shannon’s index H, a measure of the evenness of the species abundance distribution. The phylogenetic summary statistic with the largest correlation with ln(θ) was found to be Shao and Sokal’s (1990) B1 imbalance statistic, which measures the level of symmetry in the phylogenetic tree subtending the local community (Fig. 1). All statistics based on branch lengths were poorly correlated with ln(θ) and were thus found to be uninformative in our inferential framework. Our simulations showed that large θ values lead to more balanced phylogenetic trees (larger B1 values) than small θ values (Fig. 1). We also found that observed levels of phylogenetic imbalance of the 2,000 published trees examined were always within the range of Hubbell’s model predictions (Fig. 2).
When only species abundance data were used in the ABC method (i.e. using solely the summary statistics S and H), we found that inference by ABC was nearly as efficient as Etienne’s (2005) exact MLE method. The mean standard error on the parameters was equal to 30% with the ABC method, as compared to 25% for the MLE method. However, our inference method was more biased than the MLE method (Bias = 22% with the ABC method versus 9% with the MLE method).
In contrast, when we estimated the neutral model parameters with the ABC method using all three statistics S, H, and B1 (i.e. including information on the phylogenies), the mean standard error of the inferred parameters with ABC was equal to 17% (versus 25% with the MLE), and the bias was 2% (versus 9% with the MLE). Hence, phylogenies do add relevant information that improves the quality of parameter inference.
Based on species abundance data only, the likelihood function for the BCI dataset has two alternative maxima, the low θ and high m value being slightly more likely (Fig. 3a, Etienne et al. 2006). Similarly, the ABC method yields two alternative maxima when based on the two species abundance statistics (Fig. 3b). In contrast, our new method based on all three summary statistics unambiguously selects the high θ and low m value (Fig. 3c,d). The parameter θ that was selected is one order of magnitude larger than in previous analyses, and the selected m parameter is two orders of magnitude smaller (Table 1). We were able to test whether this result was sensitive to the resolution of the phylogeny with the BCI dataset, since a more refined phylogenetic hypothesis is available for this site. We found that our result was independent of the choice of the phylogeny (Fig. 3c,d).
The ABC method was also applied to the La Planada, Pasoh, and Lambir datasets (Fig. 4a, b, c, respectively). For all three datasets, the ABC method yielded a single maximum likelihood estimate of the neutral parameters. In La Planada, the most likely value of θ was 345, versus 30 for the MLE, and m was equal to 0.003, versus 0.28 for the MLE (Table 1). In Pasoh and Lambir the values of θ were four-fold larger than in the MLE, and those of m consistently equal to 0.01 (Table 1). In sum, the addition of phylogenetic information to infer the parameters of Hubbell’s model led to strikingly larger values for θ, and lower values for m as compared with previous inference methods, in all four tropical forest datasets tested here.
DISCUSSION
Neutrality and parameter inference
Etienne (2005) previously showed that the neutral model of biodiversity is endowed with an exact sampling theory, like its counterparts in population genetics (Ewens 2004; Wakeley 2007). This sampling theory relates the full species abundance distribution of one community sample to the neutral model parameters. However, the species abundance distribution contains a limited amount of information, and it is not sufficient to jointly estimate both parameters (Etienne et al. 2006, Fig. 6). Using simulated data, we first showed that the ABC method based on species abundance only provides results almost as good as Etienne’s (2005) exact MLE. The great advantage of the ABC method is that it is easily amenable to generalizations, through the inclusion of additional summary statistics. Adding phylogenetic information via the imbalance statistic B1, we inferred the neutral parameters of simulated datasets more precisely than any previous inference method. Since B1 is monotonically related to the neutral parameters, it was successful at discriminating between the two alternative maxima in the likelihood profile (Fig. 6).
More generally, the challenge of estimating the parameters of ecological models based on real data has motivated much recent research (Clark 2005). To our knowledge, our study is the first to make use of Approximate Bayesian Computation to solve an ecological problem. Widely used in population genetics (Marjoram & Tavaré 2006), such computer-intensive inference techniques can allow complex models to be investigated. This should provide an opportunity to expand the range of data used in tests of ecological theories (McGill et al. 2007).
Neutrality and phylogenetic imbalance
Hubbell’s neutral model has often been rejected outright because it was felt that it is much too simplistic to even remotely reflect the complex evolutionary dynamics of species assemblages (e.g. Ricklefs 2006). Although our work does not address this point directly (see Lande et al. 2003; Allen & Savage 2007), we found that Hubbell’s model is able to predict realistic levels of phylogenetic tree imbalance. This suggests that, although crude, this model may be sufficient to capture basic features of the speciation-extinction balance. Classic diversification models like the Yule and Hey models, which assume an equal branching probability among lineages, consistently predict phylogenies that are too balanced (Mooers & Heard 1997). In contrast, Hubbell’s model, which assumes that the branching probability of a lineage is proportional to its abundance, produces levels of phylogenetic balance consistent with observed ones (Fig. 2, Appendix S3). This suggests that the neutral theory’s assumption of a speciation rate proportional to species abundance might be a less crude diversification model than the Yule model (Webb & Pitman 2002). Since in Hubbell’s model, regional pools with larger θ have more even regional species abundances (Hubbell 2001), the corresponding phylogenies are more balanced (Fig. 5). This explains why Mooers et al. (2007) found that Hubbell’s model predicted phylogenies that were too imbalanced compared with the observed ones, since they only used a small value of θ (θ = 10) in their simulations.
Our result sheds light on studies of the phylogenetic tree shape in other species groups. For instance, Heard & Cox (2007) recently compared primate phylogenies across continents, and they found that the phylogeny of New World primates was more balanced than that of Old World primates. They explained this pattern as a consequence of different biogeographic histories: repeated speciation events in connection with stepwise dispersal, or massive extinctions, may lead to less balanced phylogenies. Conversely, vicariance events should lead to more balanced phylogenies. However, it may also be argued that South America is a richer regional pool of primate species (S = 80) than Asia (S = 57), Africa (S = 52), and Madagascar (S = 28). Hence the more balanced phylogeny observed in South America as compared to other regional phylogenies is consistent with neutral expectations, even in the absence of differential speciation or extinction mechanisms.
Regional assembly of tropical rainforest trees
We found values of θ in four large tropical forest tree plots that were up to one order of magnitude larger than estimates based on previous methods (Table 1). In Hubbell’s model, θ is the product of the regional pool size and of the speciation rate. Larger values of θ therefore mean that the regional pool size is larger than previously thought, or that the speciation rate is larger. Latimer et al. (2005) found remarkably comparable values of θ in the South African fynbos that they studied. To interpret their result, they reasoned that the regional pool of the fynbos is roughly the Cape Floristic Region, which extends over 50,000 km². This extent leads to a regional pool size of about 1.3 × 10^11 individuals (Appendix S4). Using the same logic, our new estimates of θ imply that the regional pools of our neotropical tree plots should extend over areas of the size of the entire Neotropics, and the South-East Asian tree pool should extend over an area of the order of the former Sunda Shelf (Appendix S4). An alternative interpretation would be that speciation rates should be extraordinarily high for trees. While limited evidence would support this claim in a few species-rich groups (Richardson et al. 2001), this pattern does not seem to hold universally across tropical plant lineages (Pennington & Dick 2004; Pennington et al. 2006).
Is a continental extent for the regional pool of tropical trees a biologically sound inference? By definition the regional pool of an ecological community is the ensemble of species likely to immigrate into the local community. In tropical forests, long-distance dispersal events have been reported based on floristic evidence (Pennington & Dick 2004), and using molecular tools (Dick et al. 2008). Although rare, these long-distance dispersal events stir tropical forest pools over wide geographical scales. Further, a regional species pool extending over continental scales is consistent with the fact that numerous Amazonian tree species have a wider distribution than previously thought. For instance, many of the tree species in the family Sapotaceae that were previously reported as having a narrow distribution are now recognized as being pan-Amazonian species (T.C. Pennington, pers. comm.). Finally, we found comparable values of θ across sites within the same continent, suggesting that these sites indeed share the same pool, and these values were also comparable across continents, suggesting that their regional diversity in tree species is comparable, as confirmed by independent evidence (Gentry 1988; Fine & Ree 2006). In contrast, previous estimates of θ were one order of magnitude larger in Asia than in South America (Chave et al. 2006).
A possible limitation in our analysis is due to the fact that the phylogenies used for parameter inference were not fully resolved. In order to use our inference method, we had to resolve these phylogenies randomly. This may have led to a bias towards high θ values, because randomly branched trees are more balanced than real ones (Mooers & Heard 1997). However, this potential bias is unlikely to lead to the high estimated value of θ because increasing the resolution of the phylogeny at BCI did not increase the probability of selecting the low-θ peak (Table 1, Fig. 3c,d).
Local assembly of tropical rainforest trees
Another finding of our study is that the immigration rate m is much smaller than previously reported for tropical forests (Table 1). This result is a direct consequence of the large regional diversities θ measured with our new method. If the regional pool has more species, then the local community has to be more dispersal-limited from this pool to maintain the same level of local diversity. Does this result make sense biologically? To answer this question, we first emphasize that in Hubbell’s model, an immigration event is not equivalent to an observed immigration event in real continuous landscapes (Alonso et al. 2006). In Hubbell’s model, immigration events come from anywhere in the regional pool, so parameter m measures the amount of sampling of this regional pool. In real landscapes, immigration events mostly come from close surroundings of the focal community, and only a small fraction of the regional pool is actually sampled by short-distance dispersal. In contrast, long-distance dispersal events are contributed by the entire regional pool, as assumed in Hubbell’s model. Consequently, in real landscapes, one must distinguish short-distance immigration events, which constitute the bulk of immigration but do not contribute much to the sampling of the regional pool, from long-distance immigration events which, while rare, are likely to contribute much more to the sampling of the regional pool (Nathan 2006). In this light, our estimate of m predicts that long-distance dispersal events contribute at best 0.2 to 1% of the within-site recruitment.
Assessments of the consistency between the neutral parameter m and field data have often relied on average dispersal distances inferred from seed trap data, i.e. short-distance dispersal. Using seed trap counts on BCI, a cross-species mean seed dispersal distance from the parent tree to the propagule’s arrival site was estimated to be 39 m (Condit et al. 2002). Using this value of mean dispersal distance, Etienne (2005) computed m as the proportion of seeds in the plot coming from outside the plot, and he found that this parameter should be close to 0.1 with a Gaussian seed dispersal kernel. However, as mentioned above, local dispersal should be a poor predictor of the immigration rate m. Hence, the apparent contradiction between average dispersal distances measured by seed trap data, and our estimates of m, is simply resolved by the fact that the latter measure long-distance dispersal. This quantity is of crucial importance in the context of global change, because long-distance dispersal is what will determine the overall ability of tropical forest species to track environmental changes.
Perspectives
Deviations from the point-wise mutation model assumed here may also contribute to the observed patterns of phylogenetic tree balance. If the new assumption is that a new lineage starts with more than one individual, as in Hubbell’s fission model where populations are randomly split into two during speciation events (Hubbell 2001), then phylogenetic balance will be higher than with the point-wise mutation model (Mooers et al. 2007). Unfortunately, models of speciation with non-point-wise mutation do not possess a simple mapping with a coalescent, so the present approach cannot be straightforwardly extended to more general speciation models. We hope to return to this question in the future.
Our study paves the road between community modeling and studies of phylogenetic structure. This theme has received a great deal of attention in the recent literature, and tests have been devised to compare the phylogenetic structure of local species assemblages to randomly assembled (null) communities from a species pool (Webb 2000; Webb et al. 2002). As an illustration, Webb (2000) assumed that the species pool was simply the sum of all species encountered in a surrounding area, because these were considered as potential immigrants into the focal community. Yet this approach strongly depends on the size of the hypothetical regional pool (Swenson et al. 2006), and on the choice of the test statistics (Hardy & Senterre 2007). Further, it makes no use of local species abundances, although species abundances may be informative for testing ecological mechanisms.
A significant improvement of these tests requires building a null theory based at the individual level, rather than at the species level. This theory needs to be endowed with a proper sampling theory, making no explicit reference to a regional pool, for which information on abundances is seldom available. It needs further to take into account the consequences of dispersal limitation on species abundances. Finally, it needs to incorporate demographic stochasticity and to be based on several sampling units. Our work supports the view that the dispersal-limited neutral theory may be used in this research program (Pennington et al. 2006). We could extend it to include multiple plots simultaneously (Jabot et al. 2008). Then, by looking at patterns that have not been used to fit the model parameters, such as phylogenetic and taxonomic similarity, one could assess the biological relevance of additional non-neutral processes. Such a model would provide consistent null scenarios to test the hypothesis of community phylogenetics.
Acknowledgements: We thank Lounès Chikhi for sharing his expertise on ABC methods, and for comments on a previous version of this manuscript. We also thank Mark Beaumont, Michaël Blum, Nathan Kraft and Christophe Thébaud for useful comments on a previous version of this manuscript. We thank the Editor and three anonymous reviewers for suggestions that greatly improved this article. FJ was funded by the French Ministry of Agriculture. This work was funded by the ANR-Biodiversité grant BRIDGE, by a CNRS-AMAZONIE grant, and by the Egide Alliance grant n° 12130ZG.
References
Allen, A.P. & Savage, V.M. (2007). Setting the absolute tempo of biodiversity dynamics. Ecol. Lett., 10, 637-646.
Alonso, D., Etienne, R.S. & McKane, A.J. (2006). The merits of neutral theory. Trends Ecol. Evol., 21, 451-457.
Beaumont, M.A., Zhang, W.Y. & Balding, D.J. (2002). Approximate Bayesian Computation in population genetics. Genetics, 162, 2025-2035.
Chave, J., Alonso, D. & Etienne, R.S. (2006). Theoretical biology - Comparing models of species abundance. Nature, 441, E1-E1.
Clark, J.S. (2005). Why environmental scientists are becoming Bayesians. Ecol. Lett., 8, 2-14.
Condit, R., Pitman, N., Leigh, E.G.Jr., Chave, J., Terborgh, J., Foster, R.B., Nuñez, P.V., Aguilar, S., Valencia, R., Villa, G., et al. (2002). Beta-diversity in tropical forest trees. Science, 295, 666-669.
Condit, R., Ashton, P., Bunyavejchewin, S., Dattaraja, H.S., Davies, S., Esufali, S., Ewango, C., Foster, R., Gunatilleke, I.A.U.N., Hall, P. et al. (2006). The importance of demographic niches to tree diversity. Science, 313, 98-101.
Dick, C.W., Hardy, O.J., Jones, F.A. & Petit, R.J. (2008). Spatial scales of pollen and seed-mediated gene flow in tropical rain forest trees. Trop. Plant Biol., 1, 20-33.
Etienne, R.S. (2005). A new sampling formula for neutral biodiversity. Ecol. Lett., 8, 253-260.
Etienne, R.S., Latimer, A.M., Silander, J.A. & Cowling, R.M. (2006). Comment on "Neutral ecological theory reveals isolation and rapid speciation in a biodiversity hot spot". Science, 311, 610b.
Ewens, W.J. (1972). The sampling theory of selectively neutral alleles. Theor. Popul. Biol., 3, 87-112.
Ewens, W.J. (2004). Mathematical Population Genetics. I. Theoretical Introduction. 2nd edn. Springer, New York, 417 pp.
Fine, P.V.A. & Ree, R.H. (2006). Evidence for a time-integrated species-area effect on the latitudinal gradient in tree diversity. Am. Nat., 168, 796-804.
Gentry, A.H. (1988). Changes in plant community diversity and floristic composition on environmental and geographical gradients. Ann. Miss. Bot. Gard., 75, 1-34.
Hardy, O.J. & Senterre, B. (2007). Characterizing the phylogenetic structure of communities by an additive partitioning of phylogenetic diversity. J. Ecol., 95, 493-506.
Heard, S.B. & Cox, G.H. (2007). The shapes of phylogenetic trees of clades, faunas, and local assemblages: exploring spatial pattern in differential diversification. Am. Nat., 169, E107-E118.
Hubbell, S.P. (2001). The Unified Neutral Theory of Biodiversity and Biogeography. Princeton University Press, Princeton, NJ.
Hutchinson, G.E. (1959). Homage to Santa Rosalia or why are there so many kinds of animals? Am. Nat., 93, 145-159.
Jabot, F., Etienne, R.S. & Chave, J. (2008). Reconciling neutral community models and environmental filtering: theory and an empirical test. Oikos, 117, 1308-1320.
Lande, R., Engen, S. & Saether, B.-E. (2003). Stochastic Population Dynamics in Ecology and Conservation. Oxford Series in Ecology and Evolution, Oxford University Press, Oxford UK, 212 pp.
Latimer, A.M., Silander, J.A.Jr. & Cowling, R.M. (2005). Neutral ecological theory reveals isolation and rapid speciation in a biodiversity hot spot. Science, 309, 1722-1725.
Lavin, M., Schrire, B.P., Lewis, G., Pennington, R.T., Delgado-Salinas, A., Thulin, M., Hughes, C.E., Matos, A.B. & Wojciechowski, M.F. (2004). Metacommunity process rather than continental tectonic history better explains geographically structured phylogenies in legumes. Phil. Trans. Roy. Soc. Lond. B, 359, 1509-1522.
Leigh, E.G.Jr. (2007). Neutral theory: a historical perspective. J. Evol. Biol., 20, 2075-2091.
Losos, J.B. (1992). The evolution of convergent structure in Caribbean Anolis communities. Syst. Biol., 41, 403-420.
Losos, E.C. & Leigh, E.G.Jr. (2004). Tropical Forest Diversity and Dynamism. Findings from a large-scale plot network. University of Chicago Press, Chicago.
MacArthur, R.H. & Wilson, E.O. (1967). The Theory of Island Biogeography. Princeton University Press, Princeton NJ, 224 pp.
Magurran, A.E. (2004). Measuring Biological Diversity. Blackwell, Oxford UK, 256 pp.
Marjoram, P., Molitor, J., Plagnol, V. & Tavaré, S. (2003). Markov chain Monte Carlo without likelihoods. Proc. Natl. Acad. Sci. USA, 100, 15324-15328.
Marjoram, P. & Tavaré, S. (2006). Modern computational approaches for analyzing molecular genetic variation data. Nat. Rev. Gen., 7, 759-770.
McGill, B.J., Etienne, R.S., Gray, J.S., Alonso, D., Anderson, M.J., Benecha, H.K., Dornelas, M., Enquist, B.J., Green, J.L., He, F. et al. (2007). Species abundance distributions: moving beyond single prediction theories to integration within an ecological framework. Ecol. Lett., 10, 995-1015.
Mooers, A.O. & Heard, S.B. (1997). Inferring evolutionary process from phylogenetic tree shape. Quat. Rev. Biol., 72, 31-54.
Mooers, A.O., Harmon, L.J., Blum, M.G.B., Wong, D.H.J. & Heard, S.B. (2007). Some models of phylogenetic tree shape. In: Reconstructing Evolution: new mathematical and computational advances (eds Gascuel, O. & Steel, M.). Oxford University Press, Oxford, pp 149-170.
Nathan, R. (2006). Long-distance dispersal of plants. Science, 313, 786-788.
Pennington, R.T. & Dick, C.W. (2004). The role of immigrants in the assembly of the American rainforest tree flora. Phil. Trans. Roy. Soc. B, 359, 1611-1622.
Pennington, R.T., Richardson, J.E. & Lavin, M. (2006). Insights into the historical construction of species-rich biomes from dated plant phylogenies, neutral ecological theory and phylogenetic community structure. New Phytol., 172, 605-616.
Richardson, J.E., Pennington, R.T., Pennington, T.D. & Hollingsworth, P.M. (2001). Rapid diversification of a species-rich genus of neotropical rainforest trees. Science, 293, 2242-2245.
Ricklefs, R.E. (2004). A comprehensive framework for global patterns in biodiversity. Ecol. Lett., 7, 1-15.
Ricklefs, R.E. (2006). The unified neutral theory of biodiversity: do the numbers add up? Ecology, 87, 1424-1431.
Shao, K.T. & Sokal, R.R. (1990). Tree balance. Syst. Zool., 39, 266-276.
Swenson, N.G., Enquist, B.J., Pither, J., Thompson, J. & Zimmerman, J.K. (2006). The problem and promise of scale dependency in community phylogenetics. Ecology, 87, 2418-2424.
Tavaré, S., Balding, D.J., Griffiths, R.C. & Donnelly, P. (1997). Inferring coalescence times from DNA sequence data. Genetics, 145, 505-518.
Wakeley, J. (2007). Coalescent theory. An Introduction. Roberts & Company Publishers, Greenwood Village.
Webb, C.O. (2000). Exploring the phylogenetic structure of ecological communities: an example for rain forest trees. Am. Nat., 156, 145-155.
Webb, C.O., Ackerly, D.D., McPeek, M.A. & Donoghue, M.J. (2002). Phylogenies and community ecology. Annu. Rev. Ecol. Sys., 33, 475-505.
Webb, C.O. & Donoghue, M.J. (2005). Phylomatic: tree assembly for applied phylogenetics. Mol. Ecol. Notes, 5, 181-183.
Webb, C.O. & Pitman, N.C.A. (2002). Phylogenetic balance and ecological evenness. Syst. Biol., 51, 898-907.
Wiens, J.J. & Donoghue, M.J. (2004). Historical biogeography, ecology and species richness. Trends Ecol. Evol., 19, 639-644.
Yule, G.U. (1924). A mathematical theory of evolution based on the conclusions of Dr. J.C. Willis. Phil. Trans. Roy. Soc. Lond. B, 213, 21-87.
Supplementary Material
The following supplementary material is available for this article:
Table S1 Summary statistics explored for the ABC method.
Appendix S1: ABC algorithm.
Appendix S2: Compilation of phylogenies for the BCI plot.
Appendix S3: Compatibility of neutral theory with 2,000 published phylogenies.
Appendix S4: Computation of the regional pool sizes for the four tropical tree plots.
Table 1 Neutral parameter estimates inferred from local species abundances and phylogeny in four tropical forest plots: Barro Colorado Island (BCI, Panama), La Planada (Colombia), Pasoh and Lambir (Malaysia).
Site | J | S | θ1 | m1 | θ2 | m2 | W | θE | mE
BCI* | 20,788 | 236 | 724 | 0.002 | 43 | 0.32 | 0.78 | 48 | 0.14
BCI† | 20,788 | 236 | 571 | 0.002 | 44 | 0.36 | 0.88 | 48 | 0.14
La Planada | 14,100 | 164 | 345 | 0.003 | 31 | 0.39 | 0.8 | 30 | 0.28
Pasoh | 29,257 | 674 | 534 | 0.01 | – | – | 1 | 194 | 0.07
Lambir | 29,890 | 990 | 2491 | 0.008 | – | – | 1 | 282 | 0.13
J and S are the number of individuals and the number of species in the sample, respectively. The neutral parameters estimated by the ABC method are (θ1, m1). A second, lower peak is often observed in the posterior distributions, located at (θ2, m2). The relative weight of the mode (θ1, m1) compared to the other peak is W, defined as the number of ABC simulations in the confidence interval of the mode divided by the total number of retained ABC simulations. (θE, mE) are the estimates obtained by the exact likelihood formula that uses only species abundances (Etienne 2005).
*Phylogeny based on the compilation of phylogenies made by the software Phylomatic. In this tree, 53% of the nodes are resolved.
†Phylogeny based on additional compilation of phylogenies for species encountered in BCI. In this revised tree, 69% of the nodes are resolved.
Figure 1 Four summary statistics in simulated datasets for different values of the regional diversity index, ln(θ). The goodness of fit between the summary statistics and ln(θ) was measured by the adjusted R², and the retained statistics correspond to the best fit results. (a) Shao and Sokal’s B1 imbalance index. (b) Shannon’s diversity index H. (c) Number of nodes in the phylogeny between two randomly chosen individuals Dist_Node. (d) Simpson’s index 1-D.

Figure 2 Compatibility of 2,000 published phylogenies with Hubbell’s neutral theory in terms of phylogenetic tree shape. Phylogenetic tree shape is measured by the statistic of imbalance B1. Each dot represents a published phylogeny. The lines correspond to minimum and maximum B1 values obtained in simulated neutral phylogenies.

Figure 3 Posterior distributions of the neutral parameters for the BCI tropical tree dataset. (a) Likelihood profile for Hubbell’s neutral model based on species abundances only, and based on Etienne’s sampling formula. Likelihood values are color-coded. Solid lines: 95% confidence limits of the parameters. (b) Posterior distribution for the neutral model using the ABC method with species abundances only (i.e. based on the two summary statistics S and H only). Solid lines: density levels (approximate 95% confidence intervals). (c) Posterior distribution for the neutral model using the ABC method with both species abundances and a phylogenetic hypothesis based on the Angiosperm Phylogeny Group. In this phylogeny, 53% of the nodes are resolved. Three summary statistics, S, H, and B1 were used. (d) Same as in (c), but with a better resolved phylogeny where 69% of the nodes are dichotomous.

Figure 4 Posterior distributions of the neutral parameters in three tropical forest sites using the ABC method. (a) Posterior distribution of the neutral parameters at La Planada. Solid lines: the density levels (approximate 95% confidence intervals). (b) Posterior distribution of the neutral parameters at Pasoh. (c) Posterior distribution of the neutral parameters at Lambir.

Figure 5 Schematic depiction of the information contained in phylogenies. Each species is denoted by a different symbol. Regional pools are connected to local communities by immigration (grey arrows). When θ is large, regional pools are species rich, and have more balanced phylogenies. Based on species abundances only, communities 1 and 2 cannot be distinguished. However, the phylogenetic imbalance of community 1 is greater than that of community 2.

Figure 6 Phylogenies improve the estimation of the neutral theory’s parameters. (a) Exact likelihood profile of the neutral parameters using the BCI tree species abundance data (see Fig. 3a). Parameter inference leads to a confidence interval containing two equally likely maxima: one at low θ and high m, the other at high θ and low m. (b) Variation of the evenness (measured by Shannon’s H) in simulated neutral communities of the same size as in the BCI dataset. The species richness S determines a maximum-likelihood ridge in the parameter space (θ,m). The evenness H contains additional information about the position of the most likely parameters along this ridge. However, since this evenness is unimodally correlated to θ, two parameter combinations yield the same evenness. (c) The phylogenetic tree imbalance statistic B1 of a local community is positively related to θ. A combination of the statistics H and B1 yields a single most likely parameter set (θ,m).
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Table S1 Summary statistics, and their correlations with the neutral parameter ln(θ).
Summary statistic | Definition | Reference | Adjusted R²
S | Species richness. | Magurran (2003) | –
B1 | B1=Σ(1/Mi), where for each node i except the root, Mi = maximal number of nodes between the node i and the terminal species of the tree subtended by node i. | Shao & Sokal (1990) | 0.54
H | Shannon's Index. H=-Σ(Ni * ln(Ni))+N * ln(N), where Ni is the abundance of species i and N is the total abundance of the sample. | Magurran (2003) | 0.45
Dist_node | Mean number of nodes connecting two individuals in the subtending phylogenetic tree. | This paper | 0.43
1-D | Simpson's index. 1-D=1-Σ((Ni/N)²). | Magurran (2003) | 0.35
Inv(Ni) | Σ(1/Ni). | This paper | 0.37
Var(Ni) | Variance of Ni. | This paper | 0.35
Inv(Ni²) | Σ(1/Ni²). | This paper | 0.31
Dist_node_spec | Similar to Webb (2000)’s mean pairwise nodal distance, except that the phylogeny of the sample alone is considered. | Webb (2000) | 0.21
Var(node) | Variance of the number of nodes connecting two individuals in the subtending phylogenetic tree. | This paper | 0.19
Var(node)_spec | Variance of the number of nodes connecting two species in the phylogenetic tree of the community. | This paper | 0.17
σ(Nbar) | Standard deviation of Hi, where Hi is the number of internal nodes between species i and the root. | Sackin (1972) | 0.14
Fourth(Ni) | (Σ((Ni-mean(Ni))^4)/S)^(1/4). | This paper | 0.11
IColless | Σ(ri-si), where for each node i, ri and si are the numbers of terminal species in the two subtrees connected by node i (with ri greater than si). | Colless (1982) | 0.11
Nbar | Mean(Hi), where Hi is the number of internal nodes between species i and the root. | Sackin (1972) | 0.09
Dist_neighbour | Mean phylogenetic distance between two individuals of sister species. | This paper | 0.06
Dist_neighbour_spec | Mean phylogenetic distance between two sister species. | Webb (2000) | 0.06
PD | Sum of the branch lengths (lengths are normalized so that the tree height equals 1). | Faith (1992) | <0.01
Var(Dist_neigh_spec) | Variance of the phylogenetic distance between sister species. | This paper | <0.01
∆+ | Mean phylogenetic distance between two species. | Clarke & Warwick (1998) | <0.01
Var(Dist_neighbour) | Variance of the phylogenetic distance between individuals belonging to sister species. | This paper | <0.01
D | Mean phylogenetic distance between two individuals. | Chave et al. (2007) | <0.01
Λ+ | Variance of the phylogenetic distance between species. | Clarke & Warwick (2001) | <0.01
Var(Dist) | Variance of the phylogenetic distance between individuals. | This paper | <0.01
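
As an illustration of the abundance-based statistics defined in Table S1, the following minimal Python sketch computes S, H, 1-D, Inv(Ni), Var(Ni), Inv(Ni²) and Fourth(Ni) from a list of species abundances. It is not the code used in this study: the function name is hypothetical, H follows the (unnormalized) expression written in Table S1, and Var(Ni) is assumed here to be a population variance (division by S).

```python
import math


def abundance_statistics(abundances):
    """Abundance-based summary statistics, following the definitions of Table S1.

    `abundances` is a list of positive integer counts Ni, one per species.
    """
    counts = [n for n in abundances if n > 0]
    total = sum(counts)      # N, total abundance of the sample
    richness = len(counts)   # S, species richness
    mean = total / richness

    # H as written in Table S1: -sum(Ni ln Ni) + N ln N
    shannon = -sum(n * math.log(n) for n in counts) + total * math.log(total)
    # Simpson's index 1-D = 1 - sum((Ni/N)^2)
    simpson = 1.0 - sum((n / total) ** 2 for n in counts)
    inv = sum(1.0 / n for n in counts)
    inv_sq = sum(1.0 / n ** 2 for n in counts)
    # Assumption: population variance (divide by S, not S-1)
    variance = sum((n - mean) ** 2 for n in counts) / richness
    # Fourth(Ni) = (sum((Ni - mean)^4) / S)^(1/4)
    fourth = (sum((n - mean) ** 4 for n in counts) / richness) ** 0.25

    return {
        "S": richness,
        "H": shannon,
        "1-D": simpson,
        "Inv(Ni)": inv,
        "Var(Ni)": variance,
        "Inv(Ni2)": inv_sq,
        "Fourth(Ni)": fourth,
    }
```

For example, `abundance_statistics([4, 2, 1, 1])` describes a sample of eight individuals in four species.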
References:
Chave, J., Chust, G. & Thébaud, C. (2007). The importance of phylogenetic structure in biodiversity studies. In: Scaling Biodiversity (eds Storch, D., Marquet, P. & Brown, J.H.). Santa Fe Institute Editions, pp 151-167.
Clarke, K.R. & Warwick, R.M. (1998). A taxonomic distinctness index and its statistical properties. J. Appl. Ecol., 35, 523-531.
Clarke, K.R. & Warwick, R.M. (2001). A further biodiversity index applicable to species lists: variation in taxonomic distinctness. Mar. Ecol. Progr. Ser., 216, 265-278.
Colless, D.H. (1982). Phylogenetics: the theory and practice of phylogenetic systematics. Syst. Zool., 31, 100-104.
Faith, D.P. (1992). Conservation evaluation and phylogenetic diversity. Biol. Cons., 61, 1-10.
Magurran, A.E. (2003). Measuring biological diversity. Blackwell Science Ltd.
Sackin, M.J. (1972). “Good” and “bad” phenograms. Syst. Zool., 21, 225-226.
Shao, K.T. & Sokal, R.R. (1990). Tree balance. Syst. Zool., 39, 266-276.
Webb, C.O. (2000). Exploring the phylogenetic structure of ecological communities: an example for rain forest trees. Am. Nat., 156, 145-155.
Appendix S1: Simulation algorithm.

Neutral theory and phylogenies
In its original formulation, Hubbell’s model considers that speciation events are point-wise mutations, that is, at each recruitment event in the regional pool, individuals have a small probability of belonging to an altogether new species. This is equivalent to the infinite-alleles Moran model with mutation in population genetics, where backward mutations are not allowed (Ewens 2004). The infinite-alleles Moran model does not keep track of the evolutionary relationships among alleles. In our model, for each speciating individual, we do keep track of the species identity from which it descends. This enables us to construct evolutionary relationships among species.

Simulation algorithm
We use a modified version of Etienne's (2005) algorithm to reconstruct the subtending phylogenies of local communities from the knowledge of the neutral parameters θ and the scaled immigration rate I = m(J-1)/(1-m), where m is the immigration rate, and J is the local community size. We start by computing the number of immigrating ancestors and recording their numbers of descendants. Individuals are drawn one by one. The jth individual has a probability I/(j+I-1) of descending from a newly immigrating ancestor and a probability (j-1)/(j+I-1) of descending from an already recorded ancestor. In the latter case, one of the j-1 already tagged individuals is selected at random, and its ancestor is the ancestor of the jth individual. Applying this algorithm, the immigration history for the local community of size J can be reconstructed. Once the number A of ancestors is known, the forward-in-time algorithm of Stephens (2000) is used to build a dated phylogeny for the A ancestors. This algorithm starts with two lineages of the same species. At every timestep T_k, where T_k - T_{k-1} is an exponentially distributed time with rate parameter λ_k = k(k-1+θ)/2 and k is the number of lineages, a lineage is chosen at random from the existing k, and it is split in two with probability p = (k-1)/(k-1+θ). If it is not split (probability 1-p), it speciates into a lineage of another species. This algorithm is repeated until there are A+1 lineages, and the last event (a lineage split) is deleted so that there are only A lineages at the end (Stephens 2000). This procedure ensures that this forward coalescent algorithm is equivalent to the backward coalescent: if the algorithm were stopped when there are A lineages, then the coalescent tree would necessarily end with a lineage split, whereas here it ends either with a speciation event or with a lineage split.
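
The two steps described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the code used in this study: the function names are hypothetical, the second function only records species labels and an event log (time, event type, lineage index) instead of building the full dated tree, and it assumes A ≥ 2 and I > 0.

```python
import random


def sample_ancestors(J, I, rng):
    """Draw J individuals one by one; return the number of descendants
    of each immigrating ancestor (assumes I > 0)."""
    ancestor_of = []   # index of the ancestor of each individual drawn so far
    descendants = []   # descendants[a] = number of descendants of ancestor a
    for j in range(1, J + 1):
        if rng.random() < I / (j + I - 1):
            # The jth individual descends from a newly immigrating ancestor.
            descendants.append(1)
            ancestor_of.append(len(descendants) - 1)
        else:
            # Copy the ancestor of one of the j-1 already tagged individuals.
            a = ancestor_of[rng.randrange(j - 1)]
            descendants[a] += 1
            ancestor_of.append(a)
    return descendants


def forward_lineages(A, theta, rng):
    """Forward-in-time events until A lineages remain (assumes A >= 2).

    Returns the species label of each of the A lineages and the event log.
    """
    if A < 2:
        raise ValueError("this sketch starts from two lineages, so A must be >= 2")
    species = [0, 0]       # two lineages of the same species
    next_species = 1
    t = 0.0
    events = []
    while True:
        k = len(species)
        t += rng.expovariate(k * (k - 1 + theta) / 2.0)  # rate lambda_k
        i = rng.randrange(k)                             # lineage chosen at random
        if rng.random() < (k - 1) / (k - 1 + theta):
            if k == A:
                # This split would create lineage A+1: it is the deleted last event.
                break
            species.append(species[i])
            events.append((t, "split", i))
        else:
            species[i] = next_species   # speciation into another species
            next_species += 1
            events.append((t, "speciation", i))
    return species, events
```

A typical use would first call `sample_ancestors(J, I, rng)` with `rng = random.Random(seed)`, then pass the number of ancestors obtained as `A` to `forward_lineages`.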
References:
Etienne, R.S. (2005). A new sampling formula for neutral biodiversity. Ecol. Lett., 8, 253-260.
Ewens, W.J. (2004). Mathematical Population Genetics. Springer, Berlin.
Stephens, M. (2000). Times on trees and the age of an allele. Theor. Popul. Biol., 57, 109-119.
Appendix S2: Improving the resolution of the phylogeny of Barro Colorado Island’s tropical tree species.
Phylogenetic trees are in Newick format, with additional brackets [] indicating remaining polytomies, and $ symbols indicating the partial resolution of certain clades. C++ code for randomly resolving this phylogeny while respecting these partial resolutions is available upon request.
(([([((((Schizolobium_parahybum,(Prioria_copaifera,((Senna_dariensis,(((Enterolobium_schomburgkii,(Abarema_macradenia,[($(((Inga_laurina,Inga_ruiziana),Inga_punctata),(Inga_oerstediana,(Inga_nobilis,Inga_sapindoides)))$,Inga_acuminata,Inga_goldmanii,Inga_marginata,Inga_pezizifera,Inga_spectabilis,Inga_umbellifera)])),Acacia_melanoceras),Tachigalia_versicolor)),((((Erythrina_costaricensis,Lonchocarpus_heptaphyllus),((((Pterocarpus_belizensis,Pterocarpus_rohrii),Platypodium_elegans),Platymiscium_pinnatum),Andira_inermis)),[(Ormosia_amazonica,Ormosia_coccinea,Ormosia_macrocalyx)]),((Dipteryx_oleifera,Myrospermum_frutescens),(Swartzia_simplex_gra,Swartzia_simplex_och)))))),((([((Brosimum_alicastrum,Brosimum_guianense),[(Ficus_bullenei,Ficus_colubrinae,Ficus_costaricana,Ficus_insipida,Ficus_maxima,Ficus_obtusifolia,Ficus_popenoei,Ficus_tonduzii,Ficus_trigonata,Ficus_yoponensis)],Maquira_guianensis,Perebea_xanthochyma,Poulsenia_armata,Sorocea_affinis)],[((Cecropia_insignis,Cecropia_obtusifolia),Pourouma_bicolor,(Trophis_caucana,Trophis_racemosa))]),(Celtis_schippii,Trema_micrantha)),Colubrina_glandulosa)),[([([(Croton_billbergianus,((Alchornea_costaricensis,Alchornea_latifolia),(Adelia_triloba,(Acalypha_diversifolia,Acalypha_macrostachya))),((Sapium_glandulosum,Sapium_%27broadleaf%27),Hura_crepitans),Hyeronima_alchorneoides,Margaritaria_nobilis)],((Vismia_baccifera,Vismia_macrophylla),((Marila_laxiflora,Calophyllum_longifolium),(Chrysochlamis_eclipes,(Symphonia_globulifera,(Garcinia_intermedia,Garcinia_madruno))))),[([(Casearia_aculeata,Casearia_arborea,Casearia_commersoniana,Casearia_guianensis,Casearia_sylvestris)],Hasseltia_floribunda,Lacistema_aggregatum,(Laetia_procera,Laetia_thamnia),Lindackeria_laurina,Lozania_pittieri,Tetrathylacium_johansenii,Zuelania_guidonia)],(Cassipourea_elliptica,Erythroxylum_macrophyllum),Cespedesia_spathulata,Drypetes_standleyi,((Hirtella_americana,Hirtella_triandra),(Licania_hypoleuca,Licania_platypus)),(Hybanthus_prunifolius,Rinorea_sylvatica),Spachea_membranacea)],Maytenus_schippii,Sloanea_terniflora)]),([([(Allophylus_psilospermus,[(Cupania_cinerea,Cupania_latifolia,Cupania_rufescens,Cupania_seemannii)],(Talisia_nervosa,Talisia_princeps))],(((Anacardium_excelsum,Astronium_graveolens),(Spondias_mombin,Spondias_radlkoferi)),[(((Protium_costaricense,Protium_panamense),Protium_tenuifolium),Tetragastris_panamensis,Trattinnickia_aspera)]),(((Cedrela_odorata,((Trichilia_pallida,Trichilia_tuberculata),(Guarea_grandifolia,(Guarea_guidonia,Guarea_sp)))),[(Zanthoxylum_acuminatum,Zanthoxylum_ekmanii,Zanthoxylum_panamense,Zanthoxylum_setulosum)]),(Picramnia_latifolia,(Quassia_amara,Simarouba_amara))))],(((([(Cavanillesia_platanifolia,Ceiba_pentandra,Pseudobombax_septenatum,(Pachira_sessilis,Pachira_quinata))],(Quararibea_asterolepis,Hampea_appendiculata)),Ochroma_pyramidale),Sterculia_apetala),((Guazuma_ulmifolia,Theobroma_cacao),((Trichospermum_galeottii,Luehea_seemannii),((Apeiba_membranacea,Apeiba_hybrid),Apeiba_tibourbou))))),((([(Chamguava_schippii,[(Eugenia_coloradoensis,Eugenia_galalonensis,Eugenia_nesiotica,Eugenia_oerstediana)],Myrcia_gatunensis,Psidium_friedrichsthalianum)],Vochysia_ferruginea),[(Miconia_affinis,Miconia_argentea,Miconia_elata,Miconia_hondurensis)]),(Lafoensia_punicifolia,(Terminalia_amazonia,Terminalia_oblonga))),Turpinia_occidentalis)],(([(([(Aegiphila_panamensis,(Jacaranda_copaia,(Tabebuia_guayacan,Tabebuia_rosea)),Trichanthera_gigantea)],Solanum_hayesii),((Psychotria_grandis,(Coussarea_curvigemmia,Faramea_occidentalis)),((Hamelia_axillaris,Guettarda_foliacea),[((Pentagonia_macrophylla,(Alseis_blackiana,Macrocnemum_roseum)),[(Randia_armata,Genipa_americana,Amaioua_corymbosa,Alibertia_edulis,Tocoyena_pittieri)],Posoqueria_latifolia)])),(Aspidosperma_spruceanum,(Thevetia_ahouai,(Lacmellea_panamensis,(Tabernaemontana_arborea,Stemmadenia_grandiflora)))),((Cordia_alliodora,Cordia_bicolor),Cordia_lasiocalyx))],(Dendropanax_arboreus,Schefflera_morototoni)),[((Ardisia_fendleri,Stylogyne_turbacensis),(((Chrysophyllum_argenteum,Chrysophyllum_cainito),((Pouteria_fossicola,Pouteria_reticulata),Pouteria_stipitata)),Gustavia_superba),Diospyros_artanthifolia)]),(((Coccoloba_coronata,Coccoloba_manzinellensis),Triplaris_cumingiana),Guapira_standleyana),(Heisteria_acuminata,Heisteria_concinna))],(((((Annona_spraguei,(Guatteria_dumetorum,Xylopia_macrantha)),(Desmopsis_panamensis,(Mosannona_garwoodii,Unonopsis_pittieri))),((Virola_multiflora,Virola_sebifera),Virola_surinamensis)),([(Beilschmiedia_pendula,Cinnamomum_triplinerve,[(Nectandra_cissiflora,Nectandra_lineata,Nectandra_purpurea,Nectandra_%27fuzzy%27)],[(Ocotea_cernua,Ocotea_oblonga,Ocotea_puberula,Ocotea_whitei)])],(Siparuna_guianensis,Siparuna_pauciflora))),(Piper_cordulatum,Piper_reticulatum))),((((Attalea_butyracea,Elaeis_oleifera),Astrocaryum_standleyanum),Oenocarpus_mapora),Socratea_exorrhiza));
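
To illustrate what randomly resolving the remaining polytomies can look like, here is a minimal Python sketch. It is not the C++ code mentioned above: the function name and the nested-tuple tree representation (tips as strings, internal nodes as tuples) are assumptions made for the example. Partially resolved clades are simply nested tuples here, so they are preserved. Joining two randomly chosen children at a time is one simple choice, and it does not sample all dichotomous resolutions with equal probability.

```python
import random


def resolve_polytomies(node, rng):
    """Return a copy of a nested-tuple tree in which every polytomy
    has been randomly resolved into successive dichotomies."""
    if not isinstance(node, tuple):
        return node  # a tip (species name)
    children = [resolve_polytomies(child, rng) for child in node]
    while len(children) > 2:
        # Pick two distinct children at random and join them under a new node.
        i, j = sorted(rng.sample(range(len(children)), 2))
        right = children.pop(j)   # pop the larger index first
        left = children.pop(i)
        children.append((left, right))
    return tuple(children)
```

For example, `resolve_polytomies(("a", ("b", "c", "d", "e"), "f", "g"), random.Random(0))` returns a tree with the same seven tips in which every internal node has exactly two children.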

References:

Apocynaceae:
(Aspidosperma_spruceanum,(Thevetia_ahouai,(Lacmellea_panamensis,(Tabernaemontana_arborea,Stemmadenia_grandiflora))))
Ref: Potgieter & Albert 2001.
NB: Stemmadenia placed next to Tabernaemontana because it belongs to the same tribe.

Rubiaceae:
((Psychotria_grandis,(Coussarea_curvigemmia,Faramea_occidentalis)),((Hamelia_axillaris,Guettarda_foliacea),[((Pentagonia_macrophylla,(Alseis_blackiana,Macrocnemum_roseum)),[(Randia_armata,Genipa_americana,Amaioua_corymbosa,Alibertia_edulis,Tocoyena_pittieri)],Posoqueria_latifolia)]))
Ref: Rova et al. 2002. Bremer & Manen 2000. Persson 2000.
NB: Macrocnemum placed next to Alseis because it belongs to the same tribe.

Arecaceae:
((((Attalea_butyracea,Elaeis_oleifera),Astrocaryum_standleyanum),Oenocarpus_mapora),Socratea_exorrhiza)
Ref: Hahn 2002.

Annonaceae:
([(Annona_spraguei,Guatteria_dumetorum,Xylopia_macrantha)],(Desmopsis_panamensis,(Mosannona_garwoodii,Unonopsis_pittieri)))
Ref: Pirie et al. 2006.

Anacardiaceae:
((Anacardium_excelsum,Astronium_graveolens),(Spondias_mombin,Spondias_radlkoferi))
Ref: Pell 2004, page 66.

Bombacaceae:
(((([(Cavanillesia_platanifolia,Ceiba_pentandra,Pseudobombax_septenatum,(Pachira_sessilis,Pachira_quinata))],(Quararibea_asterolepis,Hampea_appendiculata)),Ochroma_pyramidale),Sterculia_apetala),((Guazuma_ulmifolia,Theobroma_cacao),((Trichospermum_galeottii,Luehea_seemannii),[(Apeiba_membranacea,Apeiba_hybrid,Apeiba_tibourbou)])))
Ref: Baum et al. 2004. Alverson et al. 1999.

Clusiaceae:
((Vismia_baccifera,Vismia_macrophylla),((Marila_laxiflora,Calophyllum_longifolium),(Chrysochlamis_eclipes,(Symphonia_globulifera,(Garcinia_intermedia,Garcinia_madruno)))))
Ref: Gustafsson et al. 2002.

Euphorbiaceae:
[(Croton_billbergianus,((Alchornea_costaricensis,Alchornea_latifolia),(Adelia_triloba,(Acalypha_diversifolia,Acalypha_macrostachya))),((Sapium_glandulosum,Sapium_broadleaf),Hura_crepitans),Hyeronima_alchorneoides,Margaritaria_nobilis)]
Ref: Wurdack et al. 2005.

Fabaceae:
(Schizolobium_parahybum,(Prioria_copaifera,((Senna_dariensis,(((Enterolobium_schomburgkii,(Abarema_macradenia,[($(((Inga_laurina,Inga_ruiziana),Inga_punctata),(Inga_oerstediana,(Inga_nobilis,Inga_sapindoides)))$,Inga_acuminata,Inga_goldmanii,Inga_marginata,Inga_pezizifera,Inga_spectabilis,Inga_umbellifera)])),Acacia_melanoceras),Tachigalia_versicolor)),((((Erythrina_costaricensis,Lonchocarpus_heptaphyllus),((((Pterocarpus_belizensis,Pterocarpus_rohrii),Platypodium_elegans),Platymiscium_pinnatum),Andira_inermis)),[(Ormosia_amazonica,Ormosia_coccinea,Ormosia_macrocalyx)]),((Dipteryx_oleifera,Myrospermum_frutescens),(Swartzia_simplex_gra,Swartzia_simplex_och))))))
Ref: Wojciechowski, Lavin & Sanderson 2004. Inga: Richardson et al. 2001.

Meliaceae:
(Cedrela_odorata,((Trichilia_pallida,Trichilia_tuberculata),[(Guarea_grandifolia,Guarea_guidonia,Guarea_sp)]))
Ref: Muellner et al. 2003.

Moraceae:
(Sorocea_affinis,((((Perebea_xanthochyma,Poulsenia_armata),Maquira_guianensis),[(Ficus_bullenei,Ficus_colubrinae,Ficus_costaricana,Ficus_insipida,Ficus_maxima,Ficus_obtusifolia,Ficus_popenoei,Ficus_tonduzii,Ficus_trigonata,Ficus_yoponensis)]),(Brosimum_alicastrum,Brosimum_guianense)))
Ref: Datwyler & Weiblen 2004.
832
References:
834 836 838 840 842 844 846 848 850 852 854 856 858 860 862 864 866 868
Alverson, W.S., Whitlock, B.A., Nyffeler, R., Bayer, C. & Baum, D.A. (1999). Phylogeny of the core Malvales: evidence from ndhF sequence data. Am. J. Bot., 86, 1474-1486. Baum, D.A., Dewitt Smith, S., Yen, A., Alverson, W.S., Nyffeler, R., Whitlock, B.A. & Oldham, R.L. (2004). Phylogenetic relationships of Malvatheca (Bombacoideae and Malvoideae; Malvaceae sensu lato) as inferred from plastid DNA sequences. Am. J. Bot., 91, 1863-1871. Bremer, B. & Manen, J.-F. (2000). Phylogeny and classification of the subfamily Rubioideae (Rubiaceae). Plant Syst. Evol., 225, 43-72. Datwyler, S.L. & Weiblen, G.D. (2004). On the origin of the fig: phylogenetic relationship of Moraceae from ndhF sequences. Am. J. Bot., 91, 767-777. Gustafsson, M.H.G., Bittrich, V. & Stevens, P.F. (2002). Phylogeny of Clusiaceae based on rbcL sequences. Int. J. Plant. Sci., 163, 1045-1054. Hahn, W.J. (2002). A phylogenetic analysis of the Arecoid line of palms based on plastid DNA sequence data. Mol. Phyl. Evol., 23, 189-204. Muellner, A.N., Samuel, R., Johnson, S.A., Cheek, M., Pennington, T.D. & Chase, M.W. (2003). Molecular phylogenetics of Meliaceae (Sapindales) based on nuclear and plastid DNA sequences. Am. J. Bot., 90, 471-480. Pell, S.K. (2004). Molecular systematics of the cashew family (Anacardiaceae). PhD thesis. Louisiana State University. Persson, C. (2000). Phylogeny of the Neotropical Alibertia group (Rubiaceae), with emphasis on the genus Alibertia, inferred from ITS and 5S ribosomal DNA sequences. Am. J. Bot., 87, 1018-1028. Pirie, M.D., Chatrou, L.W., Mols, J.B., Erkens, R.H.J. & Oosterhof, J. (2006). ‘Andeancentred’ genera in the short-branch clade of Annonaceae: testing biogeographical hypotheses using phylogeny reconstruction and molecular dating. J. Biogeog., 33, 31-46. Potgieter, K. & Albert, V.A. (2001). Phylogenetic relationships within Apocynaceae s.l. based on trnL Intron and trnL-F Spacer and propagule characters. Ann. Miss. Bot. Gard., 88, 523549. 
Richardson, J.E., Pennington, R.T., Pennington, T.D. & Hollingsworth, P.M. (2001). Rapid diversification of a species-rich genus of neotropical rain forest trees. Science, 293, 22422245. Rova, J.H.E., Delprete, P.G., Andersson, L. & Albert, V.A. (2002). A trnL-F cpDNA sequence study of the Condamineeae-Rondeletieae-Sipaneeae complex with implications on the phylogeny of the Rubiaceae. Am. J. Bot., 89, 145-159. Wojciechowski, M.F., Lavin, M. & Sanderson, M.J. (2004). A phylogeny of legumes (Leguminosae) based on analysis of the plastid matK gene resolves many well-supported subclades within the family. Am. J. Bot., 91, 1846-1862.
Wurdack, K.J., Hoffmann, P. & Chase, M.W. (2005). Molecular phylogenetic analysis of uniovulate Euphorbiaceae (Euphorbiaceae sensu stricto) using plastid rbcL and trnL-F DNA sequences. Am. J. Bot., 92, 1397-1420.
Appendix S3: Compatibility with neutral theory of the imbalance levels of 2,000 published phylogenies.
Treebase extraction and phylogenetic tree preprocessing:
We extracted 2,000 published phylogenies from Treebase (http://www.treebase.org/) using the R package apTreeshape (Bortolussi et al. 2006). We used accession numbers 705 to 2704, because the first 704 trees in Treebase are numbered differently, which makes their automatic retrieval harder. Following the method of Blum and François (2006), we automatically removed putative outgroups used to reconstruct these phylogenies. This was done by detecting subtrees that descend directly from the root and contain only one or two species (R scripts available upon request). Trees containing polytomies were randomly resolved using the routine “multi2di” of the R package APE (Paradis et al. 2004). Out of these 2,000 trees, 1,660 contained more than 5 species, and these were used to compute the statistic B1.
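The preprocessing steps described above can be illustrated with a minimal Python sketch. It is not the R code used for the analysis: trees are represented here as nested tuples with species names as strings, the function names are hypothetical, and the random resolution is a simple stand-in for “multi2di” rather than a reimplementation of it. B1 is computed assuming the usual definition of Shao and Sokal, i.e. the sum over all internal nodes except the root of 1/M_i, where M_i is the maximum number of nodes between internal node i and the tips it subtends.

```python
import random

def n_tips(tree):
    """Number of species (leaves) below a node; leaves are plain strings."""
    if isinstance(tree, str):
        return 1
    return sum(n_tips(child) for child in tree)

def strip_outgroup(tree, max_outgroup=2):
    """Drop subtrees attached to the root that hold at most `max_outgroup` species.

    Assumption: a single pass at the root; the tree is returned unchanged
    if nothing (or everything) would be removed.
    """
    if isinstance(tree, str):
        return tree
    kept = [child for child in tree if n_tips(child) > max_outgroup]
    if not kept or len(kept) == len(tree):
        return tree
    return kept[0] if len(kept) == 1 else tuple(kept)

def resolve_polytomies(tree, rng):
    """Randomly turn every node with more than two children into binary splits."""
    if isinstance(tree, str):
        return tree
    children = [resolve_polytomies(child, rng) for child in tree]
    while len(children) > 2:
        # Join two randomly chosen children under a new internal node.
        a = children.pop(rng.randrange(len(children)))
        b = children.pop(rng.randrange(len(children)))
        children.append((a, b))
    return tuple(children)

def b1(tree):
    """B1 imbalance index: sum of 1/M_i over internal nodes, root excluded."""
    total = 0.0

    def height(node, is_root):
        nonlocal total
        if isinstance(node, str):
            return 0
        m = 1 + max(height(child, False) for child in node)
        if not is_root:
            total += 1.0 / m
        return m

    height(tree, True)
    return total

# Hypothetical example: one outgroup species and an unresolved ingroup.
raw = ("outgroup", ("A", "B", "C", "D", "E", "F"))
ingroup = strip_outgroup(raw)
binary = resolve_polytomies(ingroup, random.Random(1))
print(n_tips(binary), b1(binary))
```

With this definition, a fully balanced four-species tree has B1 = 2 and a fully unbalanced one has B1 = 1.5, so larger values indicate more balanced trees.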
Neutral simulations:
For each observed value S of the species richness in the 1,660 trees, we simulated neutral trees of the same richness produced under Hubbell’s model with various θ values (C++ code available upon request). As θ decreases, the regional pool size necessary to produce a given species richness increases. To avoid computer memory saturation, we capped this size at 1,000,000 and stopped the simulations once this limit was reached. Specifically, we started by simulating 30 phylogenies with θ equal to 1,000. We then repeated this simulation step after dividing θ by 2, until the 1,000,000 limit was reached. For each of these simulated neutral trees, we computed the B1 statistic. For each richness value, the B1 values of all these simulated neutral trees formed a range consistent with the neutral assumption.
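The θ-halving schedule and its stopping rule can be sketched as follows. This is only an illustration, not the C++ code used for the simulations: instead of simulating trees, it uses Ewens' formula for the expected number of species in a pool of J individuals, E[S] = Σ_{i=0}^{J-1} θ / (θ + i), as a deterministic stand-in for the pool size needed to reach a given richness. The function names and this criterion are assumptions made for the example.

```python
def pool_size_for_richness(theta, richness, limit=1_000_000):
    """Smallest pool size J whose expected richness reaches `richness`.

    Uses Ewens' formula E[S] = sum_{i=0}^{J-1} theta / (theta + i).
    Returns None if J would exceed `limit` (assumed memory cap).
    """
    expected = 0.0
    for j in range(1, limit + 1):
        expected += theta / (theta + j - 1)
        if expected >= richness:
            return j
    return None

def theta_schedule(richness, theta_start=1000.0, limit=1_000_000):
    """List of (theta, pool size) pairs, halving theta until the cap is hit."""
    schedule = []
    theta = theta_start
    while True:
        size = pool_size_for_richness(theta, richness, limit)
        if size is None:
            break  # pool size limit reached: stop, as described in the text
        schedule.append((theta, size))
        theta /= 2.0
    return schedule

# Hypothetical example for a tree with S = 10 species. In the procedure
# described above, 30 neutral phylogenies would be simulated at each theta.
for theta, size in theta_schedule(10):
    print(f"theta = {theta:g}, pool size = {size}")
```

The printed pool sizes grow as θ is halved, which is why the schedule terminates after a finite number of steps for any richness value.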
References:
Blum, M.G.B. & François, O. (2006). Which random processes describe the tree of life? A large-scale study of phylogenetic tree imbalance. Syst. Biol., 55, 685-691.
Bortolussi, N., Durand, E., Blum, M.G.B. & François, O. (2006). apTreeshape: statistical analysis of phylogenetic tree shape. Bioinformatics, 22, 363-364.
Paradis, E., Claude, J. & Strimmer, K. (2004). APE: Analyses of Phylogenetics and Evolution in R language. Bioinformatics, 20, 289-290.
Appendix S4: Computation of the regional pool sizes for the four tropical tree plots.
This computation is based on a comparison with the results of Latimer et al. (2005). They consider the regional pool for fynbos to be the Cape Floristic Region, which extends over 50,000 km². They further report densities of 0.1, 0.25, 4 and 8 individuals per m² for trees, large shrubs, shrubs and shrublets respectively. Thus, if one assumes that trees, large shrubs, shrubs and shrublets each occupy one quarter of the area, this leads to an average density of 2.6 individuals per m², and eventually to a regional pool size of J_fynbos = 1.3 × 10¹¹ individuals.
According to Latimer et al. (2005), speciation rates should be larger in the fynbos than in tropical trees. Because θ is proportional to the product of the regional pool size and the speciation rate, this implies that the regional pool size for the tropical trees observed in a plot should be larger than (θ_plot / θ_fynbos) × J_fynbos. We used the value of 697 reported in the text of Latimer et al. (2005) for θ_fynbos. To convert these sizes from numbers of individuals into areas, we used a density for tropical trees of 500 individuals per ha.

Plot          θ_plot    Minimal pool size (× 1000 km²)
BCI           571       2130
La Planada    345       1287
Pasoh         534       1992
Lambir        2491      9292
For comparison, Fine & Ree (2006) report a value of 9,220,000 km² for the Neotropics, and of 5,903,000 km² for the Asian Tropics. This means that regional pools for tropical trees extend over continental scales.
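The minimal pool sizes in the table above follow directly from the quantities given in the text, as the short Python sketch below shows. The constant and function names are hypothetical; the fynbos density of 2.6 individuals per m² is taken as reported in the text rather than recomputed.

```python
FYNBOS_AREA_KM2 = 50_000          # Cape Floristic Region
FYNBOS_DENSITY_PER_M2 = 2.6       # average density, as reported in the text
THETA_FYNBOS = 697                # value from Latimer et al. (2005)
TREE_DENSITY_PER_HA = 500         # tropical tree density used for conversion

M2_PER_KM2 = 1_000_000
HA_PER_KM2 = 100

def fynbos_pool_size():
    """Regional pool size of the fynbos, in individuals (about 1.3e11)."""
    return FYNBOS_AREA_KM2 * M2_PER_KM2 * FYNBOS_DENSITY_PER_M2

def minimal_pool_area_km2(theta_plot):
    """Lower bound on the regional pool area (km²) for a tropical tree plot."""
    # Lower bound on the pool size in individuals.
    individuals = theta_plot / THETA_FYNBOS * fynbos_pool_size()
    # Convert individuals to area using the tropical tree density.
    return individuals / (TREE_DENSITY_PER_HA * HA_PER_KM2)

theta_by_plot = {"BCI": 571, "La Planada": 345, "Pasoh": 534, "Lambir": 2491}
for plot, theta in theta_by_plot.items():
    print(f"{plot}: {minimal_pool_area_km2(theta) / 1000:.0f} x 1000 km2")
```

Rounded to the nearest 1000 km², the four values match the table: 2130, 1287, 1992 and 9292.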