Inferring the parameters of the neutral theory of biodiversity using phylogenetic information, and implications for tropical forests.
Franck Jabot and Jérôme Chave
Laboratoire Evolution et Diversité Biologique CNRS / Université Paul Sabatier Bâtiment 4R3 31062 Toulouse cedex 4 France
Short running title: Inferring neutral parameters with phylogenies
Keywords: neutral theory, parameter inference, Approximate Bayesian Computation, phylogenetic imbalance, community phylogenetics, dispersal limitation, regional species pool.
Number of words in the abstract: 150
Number of words in the manuscript: 4678
Number of references: 48
Number of Tables: 1
Number of Figures: 6
Correspondence: Franck Jabot, Laboratoire Evolution et Diversité Biologique, CNRS / Université Paul Sabatier, Bâtiment 4R3, 31062 Toulouse cedex 4, France. Tel: 05 61 55 67 60. Fax: 05 61 55 73 27. E-mail: [email protected]
We develop a statistical method to infer the parameters of Hubbell’s neutral model of biodiversity using data on local species abundances and their phylogenetic relatedness. This
method uses the Approximate Bayesian Computation (ABC) approach, where the data are summarized into a small number of informative summary statistics. We used three statistics:
the number of species in the sample, Shannon H index of evenness, and Shao and Sokal’s B1 index of phylogenetic tree imbalance. Our approach was found to outperform previous
methods, illustrating the potential of ABC methods in ecology. Applying it to four large tropical forest tree datasets, the best-fit immigration rates m were two orders of magnitude
smaller and regional diversities θ larger than previously reported for the same data. This implies that neutral-compatible regional pools of tropical trees should extend over continental
scales, and that m measures, in this context, mostly the frequency of long distance dispersal events.
INTRODUCTION
As phylogenetic trees become more widely available through molecular methods, ecologists
have an invaluable new dimension of information with which to investigate the mechanisms of community assembly (Losos 1992; Webb et al. 2002; Pennington et al. 2006). It has long
been suggested that ecological processes may have a distinctive fingerprint on the patterns of evolutionary relatedness of coexisting species (Hutchinson 1959; Losos 1992; Wiens &
Donoghue 2004). However, most models of species coexistence fail to take into account the phylogenetic structure of a species assemblage, as they are usually restricted to the scale at
which individuals interact physically, thus neglecting larger spatial and temporal scales (Ricklefs 2004). One exception is Hubbell’s (2001) neutral theory of biodiversity and
biogeography, which includes both local and regional processes in a single conceptual framework. Hubbell’s neutral model assumes that a local community is connected to a larger
pool of species through dispersal, much like in classic island biogeography (MacArthur & Wilson 1967). A single parameter θ, the number of species arising per generation through
speciation in the regional species pool, summarizes the diversity in this pool. A second parameter m, the fraction of recruits coming from the source pool into the local community,
describes the magnitude of dispersal limitation in the local community.
In many of the recent applications of the neutral theory, the species abundance distribution of a local assemblage has been used to test the neutral theory. Of particular relevance here is the
work of Latimer et al. (2005), who assessed the evolutionary consequences of the neutral assumption by analyzing plant species abundance distributions in the South African fynbos.
They inferred the neutral parameters and found neutral migration rates to be two orders of magnitude smaller than those found in tropical forest tree communities. Latimer et al. (2005)
also found that parameter θ, as inferred from their dataset, was much higher than in previous reports. They interpreted this result as a signature of a peculiarly high speciation rate of the
fynbos flora. However, Etienne et al. (2006) critically reassessed this finding. They commented that the inferred neutral parameters θ and m often have two nearly equally likely
values when they are inferred from species abundance data. For the tropical forest trees of Barro Colorado Island (BCI), Etienne et al. (2006) found that the maximum likelihood
estimation of the neutral parameters (θ, m) yields two different but almost equally likely maxima. This finding is a serious theoretical challenge for the neutral theory, because it
suggests that there is no way to estimate these parameters accurately. For the BCI tree dataset, it has been assumed that the parameters are close to θ=47.7 and m=0.093 (Leigh 2007). The
second combination of neutral parameters (θ = 241.9, m = 0.003, Etienne et al. 2006) would imply that the BCI forest is far more dispersal-limited, and that the regional diversity is much
higher than previously imagined.
Aside from Latimer et al. (2005), attempts to examine the neutral theory at evolutionary timescales are scarce (but see e.g. Lande et al. 2003, Lavin et al. 2004). Many studies of
macroevolutionary patterns are based on simple models of lineage diversification, such as the Yule model, which produces tree topologies assuming that all species have the same chance
to give rise to an altogether new species (Yule 1924). Simulated trees are often compared with empirical ones using the balance of the tree topology, that is, the extent to which nodes define
subgroups of equal size (balanced trees are also called ‘symmetrical’, while imbalanced ones are sometimes called ‘pectinate’, see Mooers and Heard 1997). It turns out that the Yule
model produces trees that are generally more balanced than those reconstructed from biological data (Mooers and Heard 1997). Recently, Mooers et al. (2007) used Hubbell’s
neutral model to produce simulated trees, and they found that it produced trees less balanced
than empirical ones. Hubbell (2001) had already noticed that larger θ values were associated
with a more even distribution of speciation events among lineages, hence neutral trees simulated with larger θ should be more balanced. Thus, there are reasons to believe that the
trees produced by Hubbell’s model encompass a wide range of balance values if the parameter θ is allowed to vary more widely than in the study of Mooers et al. (2007). Turning this argument around, we
speculate that the balance of real phylogenetic trees may provide a direct way to infer the neutral parameter θ, independently from species abundance distributions.
Here, we develop a new sampling theory of Hubbell’s neutral model, which takes into
account not only the species abundance distribution in a sample, but also the phylogenetic relatedness of these co-occurring species. We first assess whether phylogenetic trees predicted
by Hubbell’s neutral model are a reasonable fit for real phylogenies, in particular with respect to phylogenetic imbalance. Next, we use simulated data to show that phylogenies enable us to
infer the neutral parameters more precisely. We finally apply our new inference method to tropical tree datasets in the Neotropics and in South-East Asia. New parameter estimates are
reinterpreted in light of dispersal biology and biogeographical patterns.
METHODS
Approximate Bayesian Computation
In a classical likelihood framework, inferring parameters of the neutral model using phylogenies would entail the derivation of an exact likelihood function for these data.
Currently, there is no simple formula for such a likelihood function. Instead, it is possible to approximate this function through simulation, using a method called Approximate Bayesian
Computation (ABC). This method is commonly used in population genetics (Tavaré et al.
1997; Beaumont et al. 2002; Marjoram & Tavaré 2006), but to our knowledge, has not been
employed in ecology.
The ABC method is useful when the exact likelihood function of a model can be approximated by a large number of independent simulations, across a wide range of model
parameter values drawn from a set of prior distributions. Empirical data are summarized into a set of informative statistics, called summary statistics. These summary statistics are then
computed for each of the simulated datasets, and compared with the values observed in the empirical dataset. Only the simulations whose summary statistics are close enough to the
observed values are retained. The corresponding parameter values, in our case θ and m, then form an approximate joint posterior distribution for the parameters. A computer-intensive
approach like the ABC method relies on the ability to quickly simulate samples under the model considered. For Hubbell’s neutral model, this is made possible by using the coalescent
approach (Wakeley 2007). Details on our simulation algorithm are provided in Appendix S1.
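As an illustration, the rejection step described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the algorithm of Appendix S1: it simulates species abundances only (no phylogeny) with a two-level urn scheme, uses only S and H as summary statistics, and measures distances on the raw, unscaled statistics. All function names are hypothetical.

```python
import math
import random


def simulate_abundances(theta, immigration, n_individuals, rng):
    """Species abundances in a neutral local sample (two-level urn scheme).

    Each individual either descends from a new immigrant ancestor, with
    probability I / (I + j) when j individuals are already placed, or shares
    the ancestor of a randomly chosen earlier individual. Each new ancestor
    founds a new species with probability theta / (theta + a) when a
    ancestors already exist, or else copies the species of a random ancestor.
    """
    ancestor_species = []     # species label of each immigrant ancestor
    individual_ancestor = []  # ancestor index of each sampled individual
    n_species = 0
    for j in range(n_individuals):
        if rng.random() < immigration / (immigration + j):
            a = len(ancestor_species)
            if rng.random() < theta / (theta + a):
                ancestor_species.append(n_species)
                n_species += 1
            else:
                ancestor_species.append(rng.choice(ancestor_species))
            individual_ancestor.append(a)
        else:
            individual_ancestor.append(rng.choice(individual_ancestor))
    counts = [0] * n_species
    for a in individual_ancestor:
        counts[ancestor_species[a]] += 1
    return counts


def summarize(counts):
    """Return (S, H): species richness and Shannon index of a sample."""
    total = sum(counts)
    shannon = -sum(c / total * math.log(c / total) for c in counts)
    return len(counts), shannon


def abc_rejection(observed, n_individuals, n_sims, n_keep, rng):
    """Keep the n_keep prior draws whose (S, H) lie closest to `observed`."""
    draws = []
    for _ in range(n_sims):
        ln_theta = rng.uniform(0.0, 7.0)   # flat prior on ln(theta)
        ln_i = rng.uniform(0.0, 10.0)      # flat prior on ln(I)
        stats = summarize(simulate_abundances(
            math.exp(ln_theta), math.exp(ln_i), n_individuals, rng))
        draws.append((math.dist(stats, observed), ln_theta, ln_i))
    draws.sort()
    return [(ln_theta, ln_i) for _, ln_theta, ln_i in draws[:n_keep]]


# Toy run: a small pseudo-observed sample and a short simulation budget.
rng = random.Random(1)
target = summarize(simulate_abundances(50.0, 20.0, 200, rng))
posterior = abc_rejection(target, 200, n_sims=500, n_keep=25, rng=rng)
```

The retained pairs (ln θ, ln I) play the role of the approximate joint posterior sample. Because S is numerically much larger than H, an unscaled Euclidean distance is dominated by S; rescaling the statistics before computing distances is a design choice that the sketch leaves open.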
Choice of summary statistics
Various diversity indices may be used to summarize the species abundance distribution
(Magurran 2004). Likewise, phylogeny-based diversity indices, and statistics describing the tree topology may be used to summarize the phylogeny of species occurring locally in the
community (Mooers & Heard 1997). We included a total of 24 candidate summary statistics in our preliminary tests (Table S1). To assess which statistics were the most informative, we
simulated 1,000,000 local communities each of size J=20,000, where J is the number of individuals in the local community. We sampled the neutral parameters with a uniform prior
distribution on ln(θ) (0 ≤ ln(θ) ≤ 7), and on ln(I) (0 ≤ ln(I) ≤ 10), where I is the scaled immigration rate defined as I=m(J-1)/(1-m) (Etienne 2005). These priors contain the range of
neutral parameters that have been previously estimated for tropical forests (Chave et al. 2006).
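To make the prior on ln(I) easier to read in terms of m, the definition of I can be inverted to give m = I/(I + J − 1). The short sketch below applies both mappings for J = 20,000; the function names are ours and purely illustrative.

```python
import math


def scaled_immigration(m, n_individuals):
    """I = m (J - 1) / (1 - m), defined for 0 <= m < 1."""
    return m * (n_individuals - 1) / (1.0 - m)


def immigration_rate(scaled_i, n_individuals):
    """Inverse mapping: m = I / (I + J - 1)."""
    return scaled_i / (scaled_i + n_individuals - 1)


J = 20000
# Bounds of the flat prior on ln(I), expressed as immigration rates m.
m_low = immigration_rate(math.exp(0.0), J)
m_high = immigration_rate(math.exp(10.0), J)
```

For this community size, the prior 0 ≤ ln(I) ≤ 10 corresponds to immigration rates between 5 × 10⁻⁵ and roughly 0.52.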
The number of species S in the local community sample was retained as the first summary
statistic, because in the limiting case where m=1, S is a sufficient statistic for θ (Ewens 1972). We then tested which of the other statistics were the most informative in predicting the
neutral parameters. Constraining the simulation outputs to a fixed value of S (in practice, for S=200 species), we regressed the 23 remaining summary statistics against ln(θ). All statistics
of tree imbalance were monotonically correlated with ln(θ), hence we compared their relative predictive power based on the regression coefficient of a linear model. The summary statistics
for the species abundance distribution presented a mode for intermediate values of ln(θ) (Fig. 1). We then compared their predictive performance based on the adjusted-R² of a quadratic
model. Among these statistics, two were found to be the most correlated with ln(θ) (see Results): Shannon’s index of diversity H, defined as $H = -\sum_i p_i \ln(p_i)$, where $p_i$ is the
relative abundance of species i in the sample, and Shao & Sokal’s (1990) phylogenetic tree balance statistic B1, defined as $B_1 = \sum_i 1/M_i$, where $M_i$ is the maximal number of nodes
between the interior node i and the terminal species of the subtree rooted at node i, the summation being over all interior nodes except the root (Shao & Sokal 1990). More balanced
phylogenetic trees have higher B1 values. The definitions of the other trial statistics are reported in Table S1. Regressions of the statistics against ln(I) would be very similar to those against
ln(θ), because the two parameters θ and I are strongly negatively correlated.
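The two retained statistics are simple to compute. The sketch below is illustrative only: trees are represented as nested tuples with string tips (our choice of data structure), and M_i is taken to be the number of branches on the longest path from interior node i down to a tip, so that a two-species clade has M_i = 1. This convention is an assumption on our part.

```python
import math


def shannon_index(counts):
    """H = -sum_i p_i ln(p_i) over the species abundances in `counts`."""
    total = sum(counts)
    return -sum(c / total * math.log(c / total) for c in counts if c > 0)


def b1_balance(tree):
    """Shao & Sokal's B1 for a rooted tree given as nested tuples.

    Tips are strings; an interior node is a tuple of its children.
    """
    inverse_heights = []

    def height(node, is_root):
        if not isinstance(node, tuple):
            return 0                      # a tip
        m_i = 1 + max(height(child, False) for child in node)
        if not is_root:                   # the root is excluded from the sum
            inverse_heights.append(1.0 / m_i)
        return m_i

    height(tree, True)
    return sum(inverse_heights)


balanced = (("a", "b"), ("c", "d"))
pectinate = ((("a", "b"), "c"), "d")
print(b1_balance(balanced), b1_balance(pectinate))  # prints 2.0 1.5
```

With four species, the fully balanced tree scores higher than the pectinate one, in line with the statement that more balanced trees have higher B1 values.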
We also assessed whether the range of phylogenetic imbalance that the neutral theory is able to predict encompassed realistic values in phylogenetic imbalance. We retrieved the first 2000
published phylogenies in TreeBASE (http://www.treebase.org/), computed the imbalance statistic B1 for each of them, and compared this statistic to that obtained from phylogenies simulated with
Hubbell’s metacommunity model (see Appendix S3 for more details).
Test of the ABC method using simulated data
It is impossible to ensure that the chosen set of summary statistics is optimal for inferring the
parameters in the ABC method (Marjoram et al. 2003). However, it is possible to check that the selected summary statistics do summarize most of the information present in the data, and
thus lead to an efficient estimation method. We simulated 300 neutral datasets with various parameter values, and then estimated the neutral parameters by ABC. More precisely, we
computed a mean standard error and a bias on the estimated parameters, defined as
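$$\mathrm{MSE} = \left\langle \frac{\left(\ln\theta_{\mathrm{estimated}} - \ln\theta_{\mathrm{simul}}\right)^2}{\ln^2\theta_{\mathrm{simul}}} + \frac{\left(\ln I_{\mathrm{estimated}} - \ln I_{\mathrm{simul}}\right)^2}{\ln^2 I_{\mathrm{simul}}} \right\rangle$$

$$\mathrm{Bias} = \left\langle \frac{\ln\theta_{\mathrm{estimated}} - \ln\theta_{\mathrm{simul}}}{\ln\theta_{\mathrm{simul}}} + \frac{\ln I_{\mathrm{estimated}} - \ln I_{\mathrm{simul}}}{\ln I_{\mathrm{simul}}} \right\rangle$$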
where the brackets represent a mean over the 300 simulated communities. Both statistics
describe the estimation efficiency of the ABC method. For comparison, we also computed these statistics with the estimates obtained from an exact maximum-likelihood approach
based on the species abundance distribution only (Etienne 2005).
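For concreteness, the two efficiency measures can be computed as in the sketch below. It follows our reading of the definitions above (relative errors on the log scale, summed over θ and I, then averaged over datasets); the function name and input layout are hypothetical.

```python
import math


def relative_errors(estimated, simulated):
    """Mean relative squared error and mean relative bias on the log scale.

    Both arguments are lists of (theta, I) pairs, one per simulated dataset.
    """
    squared, signed = [], []
    for (theta_hat, i_hat), (theta, i) in zip(estimated, simulated):
        d_theta = (math.log(theta_hat) - math.log(theta)) / math.log(theta)
        d_i = (math.log(i_hat) - math.log(i)) / math.log(i)
        squared.append(d_theta ** 2 + d_i ** 2)
        signed.append(d_theta + d_i)
    n = len(squared)
    return sum(squared) / n, sum(signed) / n
```

Note that both quantities are undefined when ln(θ) or ln(I) of a simulated dataset equals zero, so simulated parameter values of exactly 1 would have to be avoided.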
Application to four tropical forest tree datasets
The ABC method was used to infer the neutral parameters from four large tropical forest tree
datasets (data from Condit et al. 2006). These datasets correspond to a full census of trees greater than 10 cm in trunk diameter at breast height (dbh) in plots of 25-52 ha in size, in
Central Panama (Barro Colorado Island, BCI), Colombia (La Planada), peninsular Malaysia
(Pasoh), and Malaysia, Sarawak (Lambir). More details on these study sites may be found in
Losos & Leigh (2004).
For each site, a maximally resolved phylogenetic tree subtending the local community was generated using the software Phylomatic (Webb & Donoghue 2005). For the BCI site, we also
included additional published data to produce an improved phylogenetic tree (see Appendix S2). For the BCI phylogeny, about 69 % of nodes were resolved in this improved
phylogenetic tree, compared to 53 % with the default options of Phylomatic. The remaining polytomies were resolved randomly.
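One simple way to resolve polytomies at random is sketched below. The text does not specify which randomization scheme was used, so this is only one possible choice: at each unresolved node, two randomly chosen children are repeatedly joined under a new interior node until the node is bifurcating. Trees are nested tuples with string tips, and all names are hypothetical.

```python
import random


def resolve_polytomies(tree, rng):
    """Return a strictly bifurcating copy of `tree` (nested tuples, string tips).

    Any node with more than two children is resolved by repeatedly joining
    two randomly chosen children under a new interior node.
    """
    if not isinstance(tree, tuple):
        return tree
    children = [resolve_polytomies(child, rng) for child in tree]
    while len(children) > 2:
        first = children.pop(rng.randrange(len(children)))
        second = children.pop(rng.randrange(len(children)))
        children.append((first, second))
    return tuple(children)


def tips(tree):
    """Sorted list of the tip labels of a nested-tuple tree."""
    if not isinstance(tree, tuple):
        return [tree]
    return sorted(label for child in tree for label in tips(child))


def is_binary(tree):
    """True if every interior node has exactly two children."""
    if not isinstance(tree, tuple):
        return True
    return len(tree) == 2 and all(is_binary(child) for child in tree)


rng = random.Random(0)
unresolved = (("a", "b", "c", "d"), "e", ("f", "g"))
resolved = resolve_polytomies(unresolved, rng)
```

Calling the function repeatedly with the same random generator yields different resolutions of the same partially resolved tree, which is what a repeated random-resolution procedure requires.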
For each dataset, we ran the ABC method using 200,000 simulated species assemblages, with
a uniform prior distribution for ln(θ) (0 ≤ ln(θ) ≤ 10), and for ln(I) (0 ≤ ln(I) ≤ 10). To control for the possible bias due to the incomplete resolution of the phylogenies, we repeated the
random resolution procedure 100 times (see Discussion). For each of the 100 random resolutions of the polytomies in the observed phylogenetic tree, we selected the 200 outputs
for which the simulated values of S, H and B1 were closest to the observed ones, as measured by the Euclidean distance in the space of summary statistics (S, H, B1). We thus obtained
200*100=20,000 points in the plane (ln(θ),ln(I)), from which we computed an approximate posterior distribution.
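The last step, locating the mode of the cloud of accepted points, can be illustrated with a plain Gaussian kernel density estimate evaluated on a regular grid. This is a simplified stand-in for a binned two-dimensional kernel estimator, and the grid step, bandwidth, and the five example points are arbitrary values chosen for illustration only.

```python
import math


def posterior_mode(points, lower, upper, step, bandwidth):
    """Grid point maximizing a Gaussian kernel density estimate in 2D.

    `points` holds accepted (ln_theta, ln_I) pairs; `lower` and `upper` are
    the (x, y) bounds of the evaluation grid.
    """
    n_x = int(round((upper[0] - lower[0]) / step)) + 1
    n_y = int(round((upper[1] - lower[1]) / step)) + 1
    best_density, best_point = -1.0, None
    for ix in range(n_x):
        x = lower[0] + ix * step
        for iy in range(n_y):
            y = lower[1] + iy * step
            density = sum(
                math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2 * bandwidth ** 2))
                for px, py in points)
            if density > best_density:
                best_density, best_point = density, (x, y)
    return best_point


# Hypothetical accepted points: a tight cluster plus one outlier.
accepted = [(5.4, 3.9), (5.5, 4.0), (5.6, 4.1), (5.5, 4.1), (3.8, 7.0)]
mode = posterior_mode(accepted, (0.0, 0.0), (10.0, 10.0), 0.1, 0.3)
```

The cost of this brute-force evaluation grows with the number of grid cells times the number of points, so for tens of thousands of accepted points a binned estimator is the more practical choice.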
Analyses using Etienne’s (2005) maximum likelihood estimation (MLE) were performed
using the TeTame freeware (http://www.edb.ups-tlse.fr/equipe1/chave/tetame.htm). Postprocessing of the ABC simulations (determination of the approximate posterior distribution
and mode) was carried out with the R software version 2.7.0 (http://www.r-project.org/),
using the routine “bkde2D” of the library “KernSmooth”. All R scripts are available upon request.
RESULTS
We found that the summary statistic of species abundances with the largest correlation with
ln(θ) was Shannon’s index H, a measure of the evenness of the species abundance distribution. The phylogenetic summary statistic with the largest correlation with ln(θ) was
found to be Shao and Sokal’s (1990) B1 imbalance statistic, which measures the level of symmetry in the phylogenetic tree subtending the local community (Fig. 1). All statistics
based on branch lengths were poorly correlated with ln(θ) and were thus found to be uninformative in our inferential framework. Our simulations showed that large θ values lead
to more balanced phylogenetic trees (larger B1 values) than small θ values (Fig. 1). We also found that observed levels of phylogenetic imbalance of the 2,000 published trees examined
were always within the range of Hubbell’s model predictions (Fig. 2).
When only species abundance data were used in the ABC method – by using solely the summary statistics S and H, we found that inference by ABC was nearly as efficient as
Etienne’s (2005) exact MLE method. The mean standard error on the parameters was equal to 30 % with the ABC method, as compared to 25 % for the MLE method. However, our
inference method was more biased than the MLE method (Bias = 22 % with the ABC method versus 9 % with the MLE method).
In contrast, when we estimated the neutral model parameters with the ABC method using all
three statistics S, H, and B1 – i.e. including information on the phylogenies, the mean standard error of the inferred parameters with ABC was equal to 17 % (versus 25 % with the MLE),
and the bias was 2 % (versus 9 % with the MLE). Hence, phylogenies do add relevant information that improves the quality of parameter inference.
Based on species abundance data only, the likelihood function for the BCI dataset has two
alternative likelihood maxima, the low θ and high m value being slightly more likely (Fig. 3a, Etienne et al. 2006). Similarly, the ABC method yields two alternative maxima when based
on the two species abundance statistics (Fig. 3b). In contrast, our new method based on all three summary statistics unambiguously selects the high θ and low m value (Fig. 3c,d). The
parameter θ that was selected is one order of magnitude larger than in previous analyses, and the selected m parameter is two orders of magnitude smaller (Table 1). We were able to test
whether this result was sensitive to the resolution of the phylogeny with the BCI dataset, since a more refined phylogenetic hypothesis is available for this site. We found that our result was
independent of the choice of the phylogeny (Fig. 3c,d).
The ABC method was also applied to the La Planada, Pasoh, and Lambir datasets (Fig. 4a,b,c, respectively). For all three datasets, the ABC method yielded a single maximum likelihood
estimate of the neutral parameters. In La Planada, the most likely value of θ was 345, versus 30 for the MLE, and m was equal to 0.003, versus 0.28 for the MLE (Table 1). In Pasoh and
Lambir the values of θ were four-fold larger than in the MLE, and those of m consistently equal to 0.01 (Table 1). In sum, the addition of phylogenetic information to infer the
parameters of Hubbell’s model led to strikingly larger values for θ, and lower values for m as compared with previous inference methods, in all four tropical forest datasets tested here.
DISCUSSION
Neutrality and parameter inference
Etienne (2005) previously showed that the neutral model of biodiversity is endowed with an exact sampling theory, like its counterparts in population genetics (Ewens 2004; Wakeley
2007). This sampling theory relates the full species abundance distribution of one community sample to the neutral model parameters. However, the species abundance distribution contains
a limited amount of information, and it is not sufficient to jointly estimate both parameters (Etienne et al. 2006, Fig. 6). Using simulated data, we first showed that the ABC method
based on species abundance only provides results almost as good as Etienne’s (2005) exact MLE. The great advantage of the ABC method is that it is easily amenable to generalizations,
through the inclusion of additional summary statistics. Adding phylogenetic information via the imbalance statistic B1, we inferred the neutral parameters of simulated datasets more
precisely than any previous inference method. Since B1 is monotonically related to the neutral parameters, it was successful at discriminating between the two alternative maxima in the
likelihood profile (Fig. 6).
More generally the challenge of estimating the parameters of ecological models based on real data has motivated much recent research (Clark 2005). To our knowledge, our study is the
first to make use of Approximate Bayesian Computation to solve an ecological problem. Widely used in population genetics (Marjoram & Tavaré 2006), such computer-intensive
inference techniques can allow complex models to be investigated. This should provide an opportunity to expand the range of data used in tests of ecological theories (McGill et al.
Neutrality and phylogenetic imbalance
Hubbell’s neutral model has often been rejected outright because it was felt that it is much too
simplistic to even remotely reflect the complex evolutionary dynamics of species assemblages (e.g. Ricklefs 2006). Although our work does not address this point directly (see Lande et al.
2003; Allen & Savage 2007), we found that Hubbell’s model is able to predict phylogenetic tree imbalance. This suggests that although crude, this model may be sufficient to capture
basic features of the speciation-extinction balance. Classic diversification models like the Yule and Hey models, which assume an equal branching probability among lineages,
consistently predict phylogenies that are too balanced (Mooers & Heard 1997). In contrast, Hubbell’s model, which assumes that the branching probability of a lineage is proportional to its
abundance, produces levels of phylogenetic balance consistent with observed ones (Fig. 2, Appendix S3). This suggests that the neutral theory’s assumption of a speciation rate proportional to
species abundance might be a less crude diversification model than the Yule model (Webb & Pitman 2002). Since, in Hubbell’s model, regional pools with larger θ have more even regional
species abundances (Hubbell 2001), the corresponding phylogenies are more balanced (Fig. 5). This explains why Mooers et al. (2007) found that Hubbell’s model predicted phylogenies that were too
imbalanced compared with the observed ones: they used only a small value of θ (θ=10) in their simulations.
Our result sheds light on studies of the phylogenetic tree shape in other species groups. For
instance, Heard & Cox (2007) recently compared primate phylogenies across continents, and they found that the phylogeny of New World primates was more balanced than that of Old
World primates. They explained this pattern as a consequence of different biogeographic histories: repeated speciation events in connection with stepwise dispersal, or massive
extinctions, may lead to less balanced phylogenies. Conversely, vicariance events should lead to more balanced phylogenies. However it may also be argued that South America is a richer
regional pool of primate species (S=80) than Asia (S=57), Africa (S=52), and Madagascar (S=28). Hence the more balanced phylogeny observed in South America as compared to other
regional phylogenies is consistent with neutral expectations, even in the absence of differential speciation or extinction mechanisms.
Regional assembly of tropical rainforest trees
We found values of θ in four large tropical forest tree plots that were up to one order of magnitude larger than estimates based on previous methods (Table 1). In Hubbell’s model, θ
is the product of the regional pool size and of the speciation rate. Larger values of θ therefore mean that the regional pool size is larger than previously thought, or that the speciation rate is
larger. Latimer et al. (2005) found remarkably comparable values of θ in the South African fynbos that they studied. To interpret their result, they reasoned that the regional pool of the
fynbos is roughly the Cape Floristic Region, which extends over 50,000 km². This extent leads to a regional pool size of about 1.3 × 10¹¹ individuals (Appendix S4). Using the same logic, our
new estimates of θ imply that the regional pools of our neotropical tree plots should extend over areas of the size of the entire Neotropics, and the South-East Asian tree pool should
extend over an area of the order of the former Sunda Shelf (Appendix S4). An alternative interpretation would be that speciation rates should be extraordinarily high for trees. While
limited evidence would support this claim in a few species-rich groups (Richardson et al. 2001), this pattern does not seem to hold universally across tropical plant lineages
(Pennington & Dick 2004; Pennington et al. 2006).
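As a rough check on the orders of magnitude involved, the regional pool size and the area quoted above for the Cape Floristic Region together imply a plant density of a few individuals per square metre. This is simple arithmetic on the two figures given in the text, not the calculation of Appendix S4, which is not reproduced here.

```latex
\frac{1.3 \times 10^{11}\ \text{individuals}}
     {5 \times 10^{4}\ \text{km}^2 \times 10^{6}\ \text{m}^2\,\text{km}^{-2}}
= 2.6\ \text{individuals per m}^2
```

Because trees above 10 cm dbh occur at far lower densities than this, reaching a regional pool of comparable size for tropical trees requires a correspondingly larger area, which is the logic behind the continental extents inferred above.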
Is a continental extent for the regional pool of tropical trees a biologically sound inference? By definition the regional pool of an ecological community is the ensemble of species likely
to immigrate into the local community. In tropical forests, long distance dispersal events have
been reported based on floristic evidence (Pennington & Dick 2004), and using molecular
tools (Dick et al. 2008). Although rare, these long distance dispersal events stir tropical forest pools over wide geographical scales. Further, a regional species pool extending over
continental scales is consistent with the fact that numerous Amazonian tree species have a wider distribution than previously thought. For instance, many of the tree species in the
family Sapotaceae that were previously reported as having a narrow distribution are now recognized as being pan-Amazonian species (T.C. Pennington, pers. comm.). Finally, we
found comparable values of θ across sites within the same continent, suggesting that these sites indeed share the same pool, and these values were also comparable across continents
suggesting that their regional diversity in tree species is comparable as confirmed by independent evidence (Gentry 1988; Fine & Ree 2006). In contrast, previous estimates of θ
were one order of magnitude larger in Asia than in South America (Chave et al. 2006).
A possible limitation in our analysis is due to the fact that the phylogenies used for parameter inference were not fully resolved. In order to use our inference method, we had to resolve
these phylogenies randomly. This may have led to a bias towards high θ values, because randomly branched trees are more balanced than real ones (Mooers & Heard 1997). However,
this potential bias is unlikely to lead to the high estimated value of θ because increasing the resolution of the phylogeny at BCI did not increase the probability of selecting the low-θ peak
(Table 1, Fig. 3c,d).
Local assembly of tropical rainforest trees
Another finding of our study is that the immigration rate m is much smaller than previously
reported for tropical forests (Table 1). This result is a direct consequence of the large regional diversities θ measured with our new method. If the regional pool has more species, then the
local community has to be more dispersal-limited from this pool to maintain the same level of local diversity. Does this result make sense biologically? To answer this question, we first
emphasize that in Hubbell’s model, an immigration event is not equivalent to an observed immigration event in real continuous landscapes (Alonso et al. 2006). In Hubbell’s model,
immigration events come from anywhere in the regional pool, so parameter m measures the amount of sampling of this regional pool. In real landscapes, immigration events mostly come
from close surroundings of the focal community, and only a small fraction of the regional pool is actually sampled by short-distance dispersal. In contrast, long-distance dispersal
events are contributed by the entire regional pool, as assumed in Hubbell’s model. Consequently, in real landscapes, one must distinguish short-distance immigration events
which constitute the bulk of immigration, but do not contribute much to the sampling of the regional pool, and long distance immigration events which, while rare, are likely to contribute
much more to the sampling of the regional pool (Nathan 2006). In this light, our estimate of m predicts that long distance dispersal events contribute at best 0.2 to 1% of the within-site
Tests of the consistency between the neutral parameter m and field data have often relied on average dispersal distances inferred from seed trap data, i.e. on short-distance dispersal. Using
seed trap counts on BCI, a cross-species mean seed dispersal distance from the parent tree to the propagule’s arrival site was estimated to be 39 m (Condit et al. 2002). Using this value of
mean dispersal distance, Etienne (2005) computed m as the proportion of seeds in the plot coming from outside the plot, and he found that this parameter should be close to 0.1 with a
Gaussian seed dispersal kernel. However, as mentioned above, local dispersal should be a poor predictor of the immigration rate m. Hence, the apparent contradiction between average
dispersal distances measured by seed trap data, and our estimates of m, is simply resolved by
the fact that the latter measures long-distance dispersal. This quantity is of crucial importance
in the context of global change, because long-distance dispersal is what will determine the overall ability of tropical forest species to track environmental changes.
Deviations from the point-wise mutation model assumed here may also contribute to the observed patterns of phylogenetic tree balance. If one instead assumes that a new lineage
starts with more than one individual, as in Hubbell’s fission model, where populations are randomly split into two during speciation events (Hubbell 2001), then phylogenetic balance
will be higher than with the point-wise mutation model (Mooers et al. 2007). Unfortunately, models of speciation with non point-wise mutation do not possess a simple mapping with a
coalescent, so the present approach cannot be straightforwardly extended to more general speciation models. We hope to return to this question in the future.
Our study builds a bridge between community modeling and studies of phylogenetic structure.
This theme has received a great deal of attention in the recent literature, and tests have been devised to compare the phylogenetic structure of local species assemblages to randomly
assembled (null) communities from a species pool (Webb 2000; Webb et al. 2002). As an illustration, Webb (2000) assumed that the species pool was simply the sum of all species
encountered in a surrounding area, because these were considered as potential immigrants into the focal community. Yet this approach strongly depends on the size of the hypothetical
regional pool (Swenson et al. 2006), and on the choice of the test statistics (Hardy & Senterre 2007). Further, it makes no use of local species abundances, although species abundances
may be informative for testing ecological mechanisms.
A significant improvement of these tests requires building a null theory based at the individual level, rather than at the species level. This theory needs to be endowed with a
proper sampling theory, making no explicit reference to a regional pool, for which information on abundances is seldom available. It needs further to take into account the
consequences of dispersal limitation on species abundances. Finally, it needs to incorporate demographic stochasticity and to be based on several sampling units. Our work supports the
view that the dispersal-limited neutral theory may be used in this research program (Pennington et al. 2006). We could extend it to include multiple plots simultaneously (Jabot et
al. 2008). Then, by looking at patterns that have not been used to fit the model parameters, such as phylogenetic and taxonomic similarity, one could assess the biological relevance of
additional non-neutral processes. Such a model would provide consistent null scenarios to test the hypotheses of community phylogenetics.
Acknowledgements: We thank Lounès Chikhi for sharing his expertise on ABC methods,
and for comments on a previous version of this manuscript. We also thank Mark Beaumont, Michaël Blum, Nathan Kraft and Christophe Thébaud for useful comments on a previous
version of this manuscript. We thank the Editor and three anonymous reviewers for suggestions that greatly improved this article. FJ was funded by the French Ministry of
Agriculture. This work was funded by the ANR-Biodiversité grant BRIDGE, by a CNRSAMAZONIE grant, and by the Egide Alliance grant n° 12130ZG.
References
Allen, A.P. & Savage, V.M. (2007). Setting the absolute tempo of biodiversity dynamics. Ecol. Lett., 10, 637-646.
Alonso, D., Etienne, R.S. & McKane, A.J. (2006). The merits of neutral theory. Trends Ecol. Evol., 21, 451-457.
Beaumont, M.A., Zhang, W.Y. & Balding, D.J. (2002). Approximate Bayesian Computation in population genetics. Genetics, 162, 2025-2035.
Chave, J., Alonso, D. & Etienne, R.S. (2006). Theoretical biology - Comparing models of species abundance. Nature, 441, E1-E1.
Clark, J.S. (2005). Why environmental scientists are becoming Bayesians. Ecol. Lett., 8, 2-14.
Condit, R., Pitman, N., Leigh, E.G.Jr., Chave, J., Terborgh, J., Foster, R.B., Nuñez, P.V., Aguilar, S., Valencia, R., Villa, G., et al. (2002). Beta-diversity in tropical forest trees. Science, 295, 666-669.
Condit, R., Ashton, P., Bunyavejchewin, S., Dattaraja, H.S., Davies, S., Esufali, S., Ewango, C., Foster, R., Gunatilleke, I.A.U.N., Hall, P. et al. (2006). The importance of demographic niches to tree diversity. Science, 313, 98-101.
Dick, C.W., Hardy, O.J., Jones, F.A. & Petit, R.J. (2008). Spatial scales of pollen and seed-mediated gene flow in tropical rain forest trees. Trop. Plant Biol., 1, 20-33.
Etienne, R.S. (2005). A new sampling formula for neutral biodiversity. Ecol. Lett., 8, 253-260.
Etienne, R.S., Latimer, A.M., Silander, J.A. & Cowling, R.M. (2006). Comment on "Neutral ecological theory reveals isolation and rapid speciation in a biodiversity hot spot". Science, 311, 610b.
Ewens, W.J. (1972). The sampling theory of selectively neutral alleles. Theor. Popul. Biol., 3, 87-112.
Ewens, W.J. (2004). Mathematical Population Genetics. I. Theoretical Introduction. 2nd edn. Springer, New York, 417 pp.
Fine, P.V.A. & Ree, R.H. (2006). Evidence for a time-integrated species-area effect on the latitudinal gradient in tree diversity. Am. Nat., 168, 796-804.
Gentry, A.H. (1988). Changes in plant community diversity and floristic composition on environmental and geographical gradients. Ann. Miss. Bot. Gard., 75, 1-34.
Hardy, O.J. & Senterre, B. (2007). Characterizing the phylogenetic structure of communities by an additive partitioning of phylogenetic diversity. J. Ecol., 95, 493-506.
Heard, S.B. & Cox, G.H. (2007). The shapes of phylogenetic trees of clades, faunas, and local assemblages: exploring spatial pattern in differential diversification. Am. Nat., 169, E107-E118.
Hubbell, S.P. (2001). The Unified Neutral Theory of Biodiversity and Biogeography. Princeton University Press, Princeton, NJ.
Hutchinson, G.E. (1959). Homage to Santa Rosalia or why are there so many kinds of animals? Am. Nat., 93, 145-159.
Jabot, F., Etienne, R.S. & Chave, J. (2008). Reconciling neutral community models and environmental filtering: theory and an empirical test. Oikos, 117, 1308-1320.
Lande, R., Engen, S. & Saether, B.-E. (2003). Stochastic Population Dynamics in Ecology and Conservation. Oxford Series in Ecology and Evolution, Oxford University Press, Oxford UK, 212 pp.
Latimer, A.M., Silander, J.A.Jr. & Cowling, R.M. (2005). Neutral ecological theory reveals isolation and rapid speciation in a biodiversity hot spot. Science, 309, 1722-1725.
Lavin, M., Schrire, B.P., Lewis, G., Pennington, R.T., Delgado-Salinas, A., Thulin, M., Hughes, C.E., Matos, A.B. & Wojciechowski, M.F. (2004). Metacommunity process rather than continental tectonic history better explains geographically structured phylogenies in legumes. Phil. Trans. Roy. Soc. Lond. B, 359, 1509-1522.
Leigh, E.G.Jr. (2007). Neutral theory: a historical perspective. J. Evol. Biol., 20, 2075-2091.
Losos, J.B. (1992). The evolution of convergent structure in Caribbean Anolis communities. Syst. Biol., 41, 403-420.
Losos, E.C. & Leigh, E.G.Jr. (2004). Tropical Forest Diversity and Dynamism. Findings from a large-scale plot network. University of Chicago Press, Chicago.
MacArthur, R.H. & Wilson, E.O. (1967). The Theory of Island Biogeography. Princeton University Press, Princeton NJ, 224 pp.
Magurran, A.E. (2004). Measuring Biological Diversity. Blackwell, Oxford UK, 256 pp.
Marjoram, P., Molitor, J., Plagnol, V. & Tavaré, S. (2003). Markov chain Monte Carlo without likelihoods. Proc. Natl. Acad. Sci. USA, 100, 15324-15328.
Marjoram, P. & Tavaré, S. (2006). Modern computational approaches for analyzing molecular genetic variation data. Nat. Rev. Gen., 7, 759-770.
McGill, B.J., Etienne, R.S., Gray, J.S., Alonso, D., Anderson, M.J., Benecha, H.K., Dornelas, M., Enquist, B.J., Green, J.L., He, F. et al. (2007). Species abundance distributions: moving beyond single prediction theories to integration within an ecological framework. Ecol. Lett., 10, 995-1015.
Mooers, A.O. & Heard, S.B. (1997). Inferring evolutionary process from phylogenetic tree shape. Quat. Rev. Biol., 72, 31-54.
Mooers, A.O., Harmon, L.J., Blum, M.G.B., Wong, D.H.J. & Heard, S.B. (2007). Some models of phylogenetic tree shape. In: Reconstructing Evolution: new mathematical and computational advances (eds Gascuel, O. & Steel, M.). Oxford University Press, Oxford, pp 149-170.
Nathan, R. (2006). Long-distance dispersal of plants. Science, 313, 786-788.
Pennington, R.T. & Dick, C.W. (2004). The role of immigrants in the assembly of the American rainforest tree flora. Phil. Trans. Roy. Soc. B, 359, 1611-1622.
Pennington, R.T., Richardson, J.E. & Lavin, M. (2006). Insights into the historical construction of species-rich biomes from dated plant phylogenies, neutral ecological theory and phylogenetic community structure. New Phytol., 172, 605-616.
Richardson, J.E., Pennington, R.T., Pennington, T.D. & Hollingsworth, P.M. (2001). Rapid diversification of a species-rich genus of neotropical rainforest trees. Science, 293, 2242-2245.
Ricklefs, R.E. (2004). A comprehensive framework for global patterns in biodiversity. Ecol. Lett., 7, 1-15.
Ricklefs, R.E. (2006). The unified neutral theory of biodiversity: do the numbers add up? Ecology, 87, 1424-1431.
Shao, K.T. & Sokal, R.R. (1990). Tree balance. Syst. Zool., 39, 266-276.
Swenson, N.G., Enquist, B.J., Pither, J., Thompson, J. & Zimmerman, J.K. (2006). The problem and promise of scale dependency in community phylogenetics. Ecology, 87, 2418-2424.
Tavaré, S., Balding, D.J., Griffiths, R.C. & Donnelly, P. (1997). Inferring coalescence times from DNA sequence data. Genetics, 145, 505-518.
Wakeley, J. (2007). Coalescent theory. An Introduction. Roberts & Company Publishers, Greenwood Village.
Webb, C.O. (2000). Exploring the phylogenetic structure of ecological communities: an example for rain forest trees. Am. Nat., 156, 145-155.
Webb, C.O., Ackerly, D.D., McPeek, M.A. & Donoghue, M.J. (2002). Phylogenies and community ecology. Annu. Rev. Ecol. Sys., 33, 475-505.
Webb, C.O. & Donoghue, M.J. (2005). Phylomatic: tree assembly for applied phylogenetics. Mol. Ecol. Notes, 5, 181-183.
Webb, C.O. & Pitman, N.C.A. (2002). Phylogenetic balance and ecological evenness. Syst. Biol., 51, 898-907.
Wiens, J.J. & Donoghue, M.J. (2004). Historical biogeography, ecology and species richness. Trends Ecol. Evol., 19, 639-644.
Yule, G.U. (1924). A mathematical theory of evolution based on the conclusions of Dr. J.C. Willis. Phil. Trans. Roy. Soc. Lond. B, 213, 21-87.
Supplementary Material
The following supplementary material is available for this article:
Table S1 Summary statistics explored for the ABC method.
Appendix S1: ABC algorithm.
Appendix S2: Compilation of phylogenies for the BCI plot.
Appendix S3: Compatibility of neutral theory with 2,000 published phylogenies.
Appendix S4: Computation of the regional pool sizes for the four tropical tree plots.
Table 1 Neutral parameter estimates inferred from local species abundances and phylogeny in four tropical forest plots: Barro Colorado Island (BCI, Panama), La Planada (Colombia), Pasoh
and Lambir (Malaysia). Site
J and S are the number of individuals and the number of species in the sample, respectively. The neutral parameters estimated by the ABC method are (θ1, m1). A lower peak is often observed in
posterior distributions, whose values are (θ2, m2). The relative weight of the mode (θ1, m1) compared to the other peak is W, defined as the number of ABC simulations in the confidence interval of the mode
divided by the total number of retained ABC simulations. (θE, mE) are the estimates obtained by the exact likelihood formula that uses only species abundances (Etienne 2005).
*phylogeny based on the compilation of phylogenies made by the software Phylomatic. In this tree, 53% of the nodes are resolved.
†phylogeny based on additional compilation of phylogenies for species encountered in BCI. In this revised tree, 69% of the nodes are resolved.
Figure 1 Four summary statistics in simulated datasets for different values of the regional diversity index, ln(θ). The goodness of fit between the summary statistics and ln(θ) was measured by
the adjusted R², and the retained statistics correspond to the best fit results. (a) Shao and Sokal’s B1 imbalance index. (b) Shannon’s diversity index H. (c) Number of nodes in the phylogeny between two
randomly chosen individuals Dist_Node. (d) Simpson’s index 1-D.
Figure 2 Compatibility of 2,000 published phylogenies with Hubbell’s neutral theory in terms of phylogenetic tree shape. Phylogenetic tree shape is measured by the statistic of imbalance B1. Each
dot represents a published phylogeny. The lines correspond to minimum and maximum B1 values obtained in simulated neutral phylogenies.
Figure 3 Posterior distributions of the neutral parameters for the BCI tropical tree dataset. (a) Likelihood profile for Hubbell’s neutral model based on species abundances only, computed with Etienne’s sampling formula. Likelihood values are color-coded. Solid lines: 95% confidence limits of the parameters. (b) Posterior distribution for the neutral model using the ABC method with species abundances only (i.e. based on the two summary statistics S and H only). Solid lines: density levels (approximate 95% confidence intervals). (c) Posterior distribution for the neutral model using the ABC method with both species abundances and a phylogenetic hypothesis based on the Angiosperm Phylogeny Group. In this phylogeny, 53% of the nodes are resolved. Three summary statistics, S, H, and B1 were used. (d) Same as in (c), but with a better resolved phylogeny where 69% of the nodes are resolved.
Figure 4 Posterior distributions of the neutral parameters in three tropical forest sites using the ABC method. (a) Posterior distribution of the neutral parameters at La Planada. Solid lines: the
density levels (approximate 95% confidence intervals). (b) Posterior distribution of the neutral parameters at Pasoh. (c) Posterior distribution of the neutral parameters at Lambir.
Figure 5 Schematic depiction of the information contained in phylogenies. Each species is denoted by a different symbol. Regional pools are connected to local communities by immigration (grey arrows). When θ is large, regional pools are species rich, and have more balanced phylogenies. Based on species abundances only, communities 1 and 2 cannot be distinguished. However, the phylogenetic imbalance of community 1 is greater than that of community 2.
Figure 6 Phylogenies improve the estimation of the neutral theory’s parameters. (a) Exact likelihood profile of the neutral parameters using the BCI tree species abundance data (see Fig. 3a). Parameter inference leads to a confidence interval containing two equally likely maxima: one at low θ and high m, the other at high θ and low m. (b) Variation of the evenness (measured by Shannon’s H) in simulated neutral communities of the same size as in the BCI dataset. The species richness S determines a maximum-likelihood ridge in the parameter space (θ,m). The evenness H contains additional information about the position of the most likely parameters along this ridge. However, since this evenness is a unimodal function of θ, two parameter combinations yield the same evenness. (c) The phylogenetic tree imbalance statistic B1 of a local community is positively related to θ. A combination of the statistics H and B1 yields a single most likely parameter set (θ,m).
Table S1 Summary statistics, and their correlations with the neutral parameter ln(θ).
Summary statistic: definition. Reference. Adjusted R².
B1: B1=Σ(1/Mi), where for each node i except the root, Mi = maximal number of nodes between the node i and the terminal species of the tree subtended by node i. Shao & Sokal (1990).
H: Shannon's index. H=-Σ(Ni * ln(Ni))+N * ln(N), where Ni is the abundance of species i and N is the total abundance of the sample. Magurran (2003).
Dist_node: Mean number of nodes connecting two individuals in the subtending phylogenetic tree. This paper.
1-D: Simpson's index. 1-D=1-Σ((Ni/N)²).
Variance of Ni.
Dist_node_spec: Similar to Webb (2000)’s mean pairwise nodal distance, except that the phylogeny of the sample alone is considered.
Var(node): Variance of the number of nodes connecting two individuals in the subtending phylogenetic tree.
Var(node)_spec: Variance of the number of nodes connecting two species in the phylogenetic tree of the community.
σ(Nbar): Standard deviation of Hi, where Hi is the number of internal nodes between species i and the root.
IColless: Σ(ri-si), where for each node i, ri and si are the numbers of terminal species in the two subtrees connected by node i (with ri greater than si).
Nbar: Mean(Hi), where Hi is the number of internal nodes between species i and the root.
Dist_neighbour: Mean phylogenetic distance between two individuals of sister species.
Dist_neighbour_spec: Mean phylogenetic distance between two sister species.
PD: Sum of the branch lengths (lengths are normalized so that the tree height equals 1).
Var(Dist_neigh_spec): Variance of the phylogenetic distance between sister species.
∆: Mean phylogenetic distance between two species. Clarke & Warwick (1998). <0.01
Var(Dist_neighbour): Variance of the phylogenetic distance between individuals belonging to sister species.
Mean phylogenetic distance between two
Chave et al.
Variance of the phylogenetic distance between
Variance of the phylogenetic distance between Var(Dist)
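The three statistics retained for the ABC method (B1, H and, for comparison, 1-D) can be illustrated with a minimal Python sketch. It is not the authors' code: the function names and the representation of a rooted tree as nested tuples with string leaves are assumptions made here for illustration. Mi is taken to be the height of node i, so that a node subtending two terminal species has Mi = 1, which is one reading of the definition in Table S1.

```python
import math

def b1_imbalance(tree):
    """B1 = sum of 1/Mi over internal nodes except the root.

    `tree` is a nested tuple whose leaves are strings (an assumed format).
    Mi is computed as the height of node i: 1 + the largest child height.
    """
    total = 0.0

    def height(node, is_root):
        nonlocal total
        if not isinstance(node, tuple):
            return 0  # a terminal species
        h = 1 + max(height(child, False) for child in node)
        if not is_root:
            total += 1.0 / h
        return h

    height(tree, True)
    return total

def shannon_table_form(abundances):
    """H as written in Table S1: N*ln(N) - sum(Ni*ln(Ni)).

    Dividing this value by N gives the usual Shannon index -sum(pi*ln(pi)).
    """
    n = sum(abundances)
    return n * math.log(n) - sum(a * math.log(a) for a in abundances)

def simpson_index(abundances):
    """Simpson's index 1-D = 1 - sum((Ni/N)^2)."""
    n = sum(abundances)
    return 1.0 - sum((a / n) ** 2 for a in abundances)

balanced = (("a", "b"), ("c", "d"))
caterpillar = ((("a", "b"), "c"), "d")
print(b1_imbalance(balanced), b1_imbalance(caterpillar))
```

For four species, the fully balanced tree has two non-root nodes of height 1, while the fully unbalanced tree has one node of height 1 and one of height 2, so B1 is larger for the balanced tree.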
References:
Chave, J., Chust, G. & Thébaud, C. (2007). The importance of phylogenetic structure in biodiversity studies. In: Scaling Biodiversity (eds Storch, D., Marquet, P. & Brown, J.H.). Santa Fe Institute Editions, pp 151-167.
Clarke, K.R. & Warwick, R.M. (1998). A taxonomic distinctness index and its statistical properties. J. Appl. Ecol., 35, 523-531.
Clarke, K.R. & Warwick, R.M. (2001). A further biodiversity index applicable to species lists: variation in taxonomic distinctness. Mar. Ecol. Progr. Ser., 216, 265-278.
Colless, D.H. (1982). Phylogenetics: the theory and practice of phylogenetic systematics. Syst. Zool., 31, 100-104.
Faith, D.P. (1992). Conservation evaluation and phylogenetic diversity. Biol. Cons., 61, 1-10.
Magurran, A.E. (2003). Measuring biological diversity. Blackwell Science Ltd.
Sackin, M.J. (1972). “Good” and “bad” phenograms. Syst. Zool., 21, 225-226.
Shao, K.T. & Sokal, R.R. (1990). Tree balance. Syst. Zool., 39, 266-276.
Webb, C.O. (2000). Exploring the phylogenetic structure of ecological communities: an example for rain forest trees. Am. Nat., 156, 145-155.
Appendix S1: Simulation algorithm.
Neutral theory and phylogenies
In its original formulation, Hubbell’s model considers that speciation events are point-wise mutations, that is, at each recruitment event in the regional pool, individuals have a small probability of belonging to an altogether new species. This is equivalent to the infinite-alleles Moran model with mutation in population genetics, where backward mutations are not allowed (Ewens 2004). The infinite-alleles Moran model does not keep track of the evolutionary relationships among alleles. In our model, for each speciating individual, we do keep track of the species identity from which it descends. This enables us to construct evolutionary relationships among species.
Simulation algorithm
We use a modified version of Etienne's (2005) algorithm to reconstruct the subtending phylogenies of local communities from the knowledge of the neutral parameters, θ and the scaled immigration rate I=m(J-1)/(1-m), where m is the immigration rate, and J is the local community size. We start by computing the number of immigrating ancestors and recording their numbers of descendants. Individuals are drawn one by one. The jth individual has a probability I/(j+I-1) of descending from a newly immigrating ancestor and a probability (j-1)/(j+I-1) of descending from an already recorded ancestor. In the latter case, one of the j-1 already tagged individuals is selected at random, and its ancestor is the ancestor of the jth individual. Applying this algorithm, the immigration history for the local community of size J can be reconstructed. Once the number A of ancestors is known, the forward-in-time algorithm of Stephens (2000) is used to build a dated phylogeny for the A ancestors. This algorithm starts with two lineages of the same species. Events occur at successive times T_k, where the waiting time T_k - T_{k-1} is exponentially distributed with rate parameter λ_k=k(k-1+θ)/2, and k is the current number of lineages. At each event, a lineage is chosen at random from the existing k, and it is split in two with probability p=(k-1)/(k-1+θ). If it is not split (probability 1-p), it speciates into a lineage of another species. This step is repeated until there are A+1 lineages, and the last event (a lineage split) is then deleted so that only A lineages remain (Stephens 2000). This procedure ensures that the forward algorithm is equivalent to the backward coalescent: if the algorithm were stopped as soon as there are A lineages, the coalescent tree would necessarily end with a lineage split, whereas here it ends either with a speciation event or with a lineage split.
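The two steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' C++ implementation: the function names, the returned data structures, and the restriction to I > 0 and A ≥ 2 are choices made here. Stopping at the split that would create lineage A+1 is used as a shortcut for "run until A+1 lineages, then delete the last split", since both leave the same state.

```python
import random

def scaled_immigration(m, J):
    """Scaled immigration rate I = m(J-1)/(1-m)."""
    return m * (J - 1) / (1.0 - m)

def draw_ancestors(J, I, rng):
    """Return the number of descendants of each immigrating ancestor.

    Individual j descends from a new ancestor with probability I/(j+I-1),
    otherwise it copies the ancestor of a random earlier individual.
    Assumes I > 0.
    """
    ancestor_of = []  # ancestor index of each individual drawn so far
    counts = []       # descendants per ancestor
    for j in range(1, J + 1):
        if rng.random() < I / (j + I - 1):
            counts.append(1)
            ancestor_of.append(len(counts) - 1)
        else:
            a = ancestor_of[rng.randrange(j - 1)]
            counts[a] += 1
            ancestor_of.append(a)
    return counts

def stephens_forward(A, theta, rng):
    """Forward-in-time sketch returning the species of the A lineages.

    Also returns, for each species, the species it arose from and its
    time of origin, so that species relationships can be rebuilt.
    """
    if A < 2:
        raise ValueError("this sketch assumes A >= 2")
    lineages = [0, 0]   # species label carried by each lineage
    parent = {0: None}  # species -> species it descends from
    origin = {0: 0.0}   # species -> time of origin
    t, next_label = 0.0, 1
    while True:
        k = len(lineages)
        t += rng.expovariate(k * (k - 1 + theta) / 2.0)
        i = rng.randrange(k)
        if rng.random() < (k - 1) / (k - 1 + theta):
            if k == A:
                break  # this split would be the deleted last event
            lineages.append(lineages[i])
        else:
            parent[next_label] = lineages[i]
            origin[next_label] = t
            lineages[i] = next_label
            next_label += 1
    return lineages, parent, origin

rng = random.Random(1)
counts = draw_ancestors(200, scaled_immigration(0.1, 200), rng)
print(len(counts), "ancestors for", sum(counts), "individuals")
```

Each ancestor would then be assigned the species of one of the A lineages, and its descendants counted under that species, to obtain local abundances together with the relationships among species.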
References:
Etienne, R.S. (2005). A new sampling formula for neutral biodiversity. Ecol. Lett., 8, 253-260.
Ewens, W.J. (2004). Mathematical Population Genetics. Springer, Berlin.
Stephens, M. (2000). Times on trees and the age of an allele. Theor. Popul. Biol., 57, 109-119.
Appendix S2: Improving the resolution of the phylogeny of Barro Colorado Island’s tropical tree species.
Phylogenetic trees are in Newick format, with additional brackets indicating remaining polytomies, and $ symbols indicating the partial resolution of certain clades. A C++ code for randomly resolving this phylogeny while respecting these partial resolutions is available upon request.
Ref: Potgieter & Albert 2001. NB: Stemmadenia placed next to Tabernaemontana because it belongs to the same tribe.
Ref: Rova et al. 2002.
Bremer & Manen 2000. Persson 2000.
NB: Macrocnemum placed next to Alseis because it belongs to the same tribe.
Ref: Hahn 2002.
Ref: Pirie et al. 2006.
Ref: Pell 2004, page 66.
Bombacaceae:
Ref:
-Baum et al. 2004.
-Alverson et al. 1999.
Clusiaceae:
Ref: Gustafsson et al. 2002.
Euphorbiaceae:
Ref: Wurdack et al. 2005.
For Fabaceae: (Schizolobium_parahybum,(Prioria_copaifera,((Senna_dariensis,(((Enterolobium_schomburg
Ref: -Wojciechowski, Lavin & Sanderson 2004.
-Inga: Richardson et al. 2001.
Ref: Muellner et al. 2003.
Ref: Datwyler & Weiblen 2004.
References:
Alverson, W.S., Whitlock, B.A., Nyffeler, R., Bayer, C. & Baum, D.A. (1999). Phylogeny of the core Malvales: evidence from ndhF sequence data. Am. J. Bot., 86, 1474-1486.
Baum, D.A., Dewitt Smith, S., Yen, A., Alverson, W.S., Nyffeler, R., Whitlock, B.A. & Oldham, R.L. (2004). Phylogenetic relationships of Malvatheca (Bombacoideae and Malvoideae; Malvaceae sensu lato) as inferred from plastid DNA sequences. Am. J. Bot., 91, 1863-1871.
Bremer, B. & Manen, J.-F. (2000). Phylogeny and classification of the subfamily Rubioideae (Rubiaceae). Plant Syst. Evol., 225, 43-72.
Datwyler, S.L. & Weiblen, G.D. (2004). On the origin of the fig: phylogenetic relationship of Moraceae from ndhF sequences. Am. J. Bot., 91, 767-777.
Gustafsson, M.H.G., Bittrich, V. & Stevens, P.F. (2002). Phylogeny of Clusiaceae based on rbcL sequences. Int. J. Plant. Sci., 163, 1045-1054.
Hahn, W.J. (2002). A phylogenetic analysis of the Arecoid line of palms based on plastid DNA sequence data. Mol. Phyl. Evol., 23, 189-204.
Muellner, A.N., Samuel, R., Johnson, S.A., Cheek, M., Pennington, T.D. & Chase, M.W. (2003). Molecular phylogenetics of Meliaceae (Sapindales) based on nuclear and plastid DNA sequences. Am. J. Bot., 90, 471-480.
Pell, S.K. (2004). Molecular systematics of the cashew family (Anacardiaceae). PhD thesis. Louisiana State University.
Persson, C. (2000). Phylogeny of the Neotropical Alibertia group (Rubiaceae), with emphasis on the genus Alibertia, inferred from ITS and 5S ribosomal DNA sequences. Am. J. Bot., 87, 1018-1028.
Pirie, M.D., Chatrou, L.W., Mols, J.B., Erkens, R.H.J. & Oosterhof, J. (2006). ‘Andean-centred’ genera in the short-branch clade of Annonaceae: testing biogeographical hypotheses using phylogeny reconstruction and molecular dating. J. Biogeog., 33, 31-46.
Potgieter, K. & Albert, V.A. (2001). Phylogenetic relationships within Apocynaceae s.l. based on trnL Intron and trnL-F Spacer and propagule characters. Ann. Miss. Bot. Gard., 88, 523-549.
Richardson, J.E., Pennington, R.T., Pennington, T.D. & Hollingsworth, P.M. (2001). Rapid diversification of a species-rich genus of neotropical rain forest trees. Science, 293, 2242-2245.
Rova, J.H.E., Delprete, P.G., Andersson, L. & Albert, V.A. (2002). A trnL-F cpDNA sequence study of the Condamineeae-Rondeletieae-Sipaneeae complex with implications on the phylogeny of the Rubiaceae. Am. J. Bot., 89, 145-159.
Wojciechowski, M.F., Lavin, M. & Sanderson, M.J. (2004). A phylogeny of legumes (Leguminosae) based on analysis of the plastid matK gene resolves many well-supported subclades within the family. Am. J. Bot., 91, 1846-1862.
Wurdack, K.J., Hoffmann, P. & Chase, M.W. (2005). Molecular phylogenetic analysis of uniovulate Euphorbiaceae (Euphorbiaceae sensu stricto) using plastid rbcL and trnL-F DNA sequences. Am. J. Bot., 92, 1397-1420.
Appendix S3: Compatibility with neutral theory of the imbalance levels of 2,000 published phylogenies.
Treebase extraction and phylogenetic trees preprocessing:
We extracted 2,000 published phylogenies from Treebase (http://www.treebase.org/) using the R package apTreeshape (Bortolussi et al. 2006). We used accession numbers 705 to 2704, because the first 704 trees in Treebase are numbered differently, which makes their automatic retrieval harder. Following the method of Blum and François (2006), we automatically removed putative outgroups used to reconstruct these phylogenies. This was done by detecting subtrees descending directly from the root and having only one or two species (R scripts available upon request). If the trees contained polytomies, they were randomly resolved using the routine “multi2di” of the R package APE (Paradis et al. 2004). Out of these 2,000 trees, 1,660 contained more than 5 species, and these were used to compute the statistic B1.
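The outgroup-removal rule can be sketched as follows. This Python sketch is only an illustration of the rule stated above, not the R scripts used for the analysis: the nested-tuple tree format, the function names, the single pass at the root, and the choice to leave a tree unchanged when every root subtree is small are all assumptions made here.

```python
def count_tips(node):
    """Number of terminal species below a node (leaves are strings)."""
    if not isinstance(node, tuple):
        return 1
    return sum(count_tips(child) for child in node)

def prune_putative_outgroups(tree, max_outgroup_size=2):
    """Drop subtrees attached to the root that hold one or two species.

    If a single subtree remains, it becomes the new root. If nothing
    would remain, the tree is returned unchanged (an assumption).
    """
    if not isinstance(tree, tuple):
        return tree
    kept = tuple(c for c in tree if count_tips(c) > max_outgroup_size)
    if not kept or len(kept) == len(tree):
        return tree
    return kept[0] if len(kept) == 1 else kept

tree = ("outgroup", (("a", "b"), ("c", ("d", "e"))))
print(prune_putative_outgroups(tree))
```

In this example the single species attached directly to the root is removed and the five-species ingroup becomes the whole tree.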
Neutral simulations:
For each observed value S of the species richness in the 1,660 trees, we simulated neutral trees of the same richness produced under Hubbell’s model with various θ values (C++ code
available upon request). As θ decreases, the regional pool size necessary to produce a given species richness increases. To avoid computer memory saturation, we fixed 1,000,000 as a
limit to this size, and stopped simulations when we reached this limit. Specifically, we started by simulating 30 phylogenies with θ equal to 1,000. We then repeated this simulation step
after dividing θ by 2, until the 1,000,000 limit was reached. For each of these simulated neutral trees, we computed the B1 statistic. For each richness value, all these neutral simulated
values of B1 formed a range consistent with the neutral assumption.
References: Blum, M.G.B. & François, O. (2006). Which random processes describe the tree of life? A
large-scale study of phylogenetic tree imbalance. Syst. Biol., 55, 685-691. Bortolussi, N., Durand, E., Blum, M.G.B. & François, O. (2006). apTreeshape: statistical
analysis of phylogenetic tree shape. Bioinformatics, 22, 363-364. Paradis, E., Claude, J. & Strimmer, K. (2004). APE: Analyses of Phylogenetics and Evolution
in R language. Bioinformatics, 20, 289-290.
Appendix S4: Computation of the regional pool sizes for the four tropical tree plots.
This computation is based on a comparison with the results of Latimer et al. (2005). They consider the regional pool for fynbos to be the Cape Floristic Region, which extends over 50,000 km². They further report a density of 0.1, 0.25, 4 and 8 individuals per m² for trees, large shrubs, shrubs and shrublets respectively. Thus, if one assumes that trees, large shrubs, shrubs and shrublets occupy one quarter of the area each, this leads to an average density of 2.6 individuals per m², and eventually to a regional pool size of Jfynbos = 1.3 × 10^11 individuals.
According to Latimer et al. (2005), speciation rates should be larger in the fynbos than in tropical trees. Because θ is proportional to the product of the regional pool size and the speciation rate, this implies that the regional pool sizes for tropical trees observed in a plot should be larger than (θplot / θfynbos) * Jfynbos. We used the value reported in the text of Latimer et al. (2005) of 697 for θfynbos. To convert these sizes, expressed in numbers of individuals, into areas, we used a density for tropical trees of 500 individuals per ha.
Plot
Minimal Pool Size ( * 1000 km²)
For comparison, Fine & Ree (2006) report a value of 9,220,000 km² for the Neotropics, and of 5,903,000 km² for the Asian Tropics. This means that regional pools for tropical trees extend over continental scales.
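The arithmetic of this appendix can be retraced with a short Python sketch. The constants are those quoted in the text (fynbos area, the stated average density of 2.6 individuals per m², θfynbos = 697, and 500 trees per ha); the function names are hypothetical, and the θplot value used in the example is purely illustrative, not one of the estimates of Table 1.

```python
FYNBOS_AREA_KM2 = 50_000        # Cape Floristic Region
FYNBOS_DENSITY_PER_M2 = 2.6     # average density stated in the text
THETA_FYNBOS = 697              # value taken from Latimer et al. (2005)
TREE_DENSITY_PER_HA = 500       # tropical trees
M2_PER_KM2 = 1_000_000
HA_PER_KM2 = 100

def fynbos_pool_size():
    """Jfynbos in individuals: area (m²) times density (per m²)."""
    return FYNBOS_AREA_KM2 * M2_PER_KM2 * FYNBOS_DENSITY_PER_M2

def minimal_pool(theta_plot):
    """Lower bound on the regional pool of a plot.

    Returns (individuals, area in km²), using
    J_min = (theta_plot / theta_fynbos) * Jfynbos.
    """
    individuals = theta_plot / THETA_FYNBOS * fynbos_pool_size()
    area_km2 = individuals / (TREE_DENSITY_PER_HA * HA_PER_KM2)
    return individuals, area_km2

# Illustrative only: a plot whose theta equals the fynbos value.
individuals, area_km2 = minimal_pool(697.0)
print(f"{individuals:.2e} individuals, {area_km2:.2e} km2")
```

With θplot equal to θfynbos, the bound is Jfynbos itself, which at 500 trees per ha corresponds to an area of a few million km², the same order of magnitude as the continental areas reported by Fine & Ree (2006).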