High Probabilities in Biology Do Not (As a Rule) Explain Better

Hayley Clatterbuck
University of Rochester, Department of Philosophy
[email protected]

1. Introduction

Over the past decade, philosophers of science have raised several purported scandals for materialist Darwinism, all of which share a common structure. First, it is claimed that the theory confers a very low probability on an outcome of interest, such as the origin of life or the evolution of intelligence. Then, it is argued that a good scientific explanation of an outcome should show that it was highly probable[1] (White 2007). From this, it is concluded that there is something lacking in the Darwinian explanation: either it is flat-out false or it needs to be supplemented (Nagel 2012). Within biological practice too, the fact that a theory - such as group selection, the neutral theory of evolution, or Wright's shifting balance theory - makes observed outcomes highly contingent has sometimes been taken as grounds for rejecting those theories in favor of more deterministic, and thus purportedly more explanatory, competitors. These challenges to and within Darwinism rest on the claim that high probabilities explain better,[2] in such a way as to constitute grounds for preferring theories that confer high probabilities on observed outcomes. However intuitive, this claim seems at odds with typical explanatory standards within evolutionary biology.

[1] The threshold for improbability is often left unspecified. A good explanation of an event like the origin of life should show that it “was not vanishingly improbable but a significant likelihood given the laws of nature and the composition of the universe” (Nagel 2012, 33), that it “should be seen to be quite likely, or at least not very surprising” (White 2007, 453).

[2] I take this locution from Strevens (2000).


Rather than despair at the contingency of biological explanations, some theorists have argued that the acceptance of low probability is part and parcel of Darwinism itself (Gould 1989; Beatty 2006). In order to evaluate the complaint against Darwinism, and to understand the nature of evolutionary explanations, it will be necessary to evaluate explanatory “elitism”, the claim that outcomes which are shown to have a high probability by an explanation are explained better than those that are shown to have a low probability.[3] My focus will be on the claim that high probability explanations have an advantage which constitutes a reason to favor or accept the theories that provide those explanations.[4] Elitism and egalitarianism straightforwardly fall out of various accounts of explanation,[5] and therefore, one could argue for or against elitism by arguing for a particular account. While this is an important project, we might also hope to evaluate the merits of explanatory elitism independently; its plausibility, and the plausibility of the verdicts it renders about the quality of explanations delivered by various scientific theories, might serve as a crucial datum for arbitrating among accounts of explanation. Here, I will focus on one such direct argument for elitism from Strevens (2000, 2008). Strevens claims that unless elitism is true, we cannot make sense of the explanatory success and acceptance of statistical mechanics (SM), “perhaps the best developed, most deeply probabilistic science we have” (Strevens 2000, 371).

[3] The “elitist” versus “egalitarian” terminology for describing the explanatory role of probabilities is due to Strevens (2000, 2008). Strevens (2008) distinguishes between elitism about the absolute probability conferred on the explanandum by the explanans (the “size debate”) and about the amount of difference the explanans makes to the probability of the explanandum (the “change debate”). Here, I will primarily be concerned with the size debate.

[4] I should note here that White and Nagel endorse a fairly limited elitism, according to which only special, interesting, or “marvelous” outcomes, like the origin of life or intelligence, demand high probability explanations (White 2007, 467). For a somewhat different defense of elitism, see White (2005).

[5] See Strevens (2008, 347-355) for a summary of the implications of various accounts of explanation for the elitism debates. Strevens's defense of elitism takes place within the context of his kairetic account of explanation, the details of which I cannot do justice to here.


He argues that SM had an explanatory advantage, and was thus favored over alternative theories, in virtue of the high probabilities that it assigned to individual outcomes. Further, this is the best defense that one could give of elitism, as “the most important source of evidence concerning our explanatory practice is the sum total of the explanations regarded as scientifically adequate in their day, together with an understanding of the background against which they seemed adequate” (Strevens 2008, 37). This type of argument suggests a background assumption about the relationship between the explanations provided by a theory and the reasons for its acceptance. I will call this the Parity Principle, according to which the particular outcomes that a theory best explains are those that most favor the theory over its alternatives. We can interpret this principle in various ways and in various degrees of strength. A Parity Principle might state that an outcome favors a theory only if the theory gives a good explanation of it, or we might take it to assert a weaker ceteris paribus claim. By “favoring”, we might mean something normative or descriptive: the outcomes that a theory best explains are those that give us good reasons for favoring that theory over alternatives, or those that, as a matter of historical fact, led to its acceptance.[6] We also might focus on different senses of “favoring”: the outcomes that a theory best explains are those that most confirm the theory, make it most probable, or best enhance its explanatory power (if this is a distinct virtue from the former two). Here, I will remain relatively neutral regarding the correct notion of favoring and what its relation to explanation is, though I will occasionally adopt a generally Bayesian approach.[7]

[6] Strevens states that his account of explanation is purely descriptive (2008, 37). However, see Hartmann and Schupbach (2010) for a discussion of the normative components of his work.


Whatever its precise formulation, I will argue that an investigation of explanation in biological practice shows that the Parity Principle is incompatible with explanatory elitism. More specifically, I will evaluate and reject each of the following:

- T well explains O, such that this provides pro tanto ground for favoring T, only if T confers a high probability on O.

- If theory T1 confers a higher probability on O than does T2, then T1 explains O better than does T2, such that this provides pro tanto ground for favoring T1 to T2.[8]

- If T confers a higher probability on O1 than on O2, then T explains O1 better than O2, such that O1 provides better pro tanto ground for favoring T over alternatives than does O2.

I will argue that in many cases in the history of biological practice, the particular observations that played a crucial role in favoring probabilistic biological theories were ones that those theories showed to have a low probability of occurring. Therefore, if the Parity Principle is right, then it is false that theories only or best explain those observations that they show to be highly probable. On the other hand, if explanatory elitism is true, then there is something wrong with the Parity Principle. This result is enough to head off the particular argument from scientific practice that Strevens (2000) uses to support the claim that high probabilities explain better. However, I will suggest two stronger conclusions that we ought to draw from this conflict. First, if we must reject at least one of Parity or elitism, elitism should be the first to go. Second, in the cases I will examine, the Parity Principle is not correct as a merely descriptive claim. Rather, in each case, a low probability outcome provided a good reason to favor the theory in question.

[7] For a recent discussion of the relationships among explanation, Bayesian confirmation, and theory acceptance, see Cabrera (2015).

[8] In subsequent discussion, contrastive interpretations will often be ignored, as they do not play a significant role in Strevens's defense of elitism.


In the next section, I will reconstruct Strevens's argument for elitism with respect to SM and possible responses on behalf of the egalitarian in the biological domain. Then, I will construct a plausible elitist interpretation of probabilistic explanations in biology. Arbitrating this debate requires a brief foray into the question of what, exactly, the explananda of probabilistic biological explanations are, which I will undertake in Section 3. In Sections 4-6, I will focus on several episodes in the history of the acceptance of Mendelism which show that the elitist Parity Principle gives a historically (and normatively) inaccurate characterization of explanation in biology. I do not purport to give a thorough history of the theory's acceptance; instead I will handpick a (representative, I hope) sample of cases in which outcomes that Mendel's theory showed to have a low probability were some of the most important observations leading to the acceptance of the theory, in direct contravention of the elitist Parity Principle.

2. Elitism in Statistical Mechanics and Biology

Since the modern synthesis, evolutionary biology has been rife with probabilistic claims. Mendelian genetics assigns probabilities to various offspring genotypes and phenotypes given those of the parents. Population genetics uses probabilistic models to specify the change in the distribution of alleles in a population given facts about population size, selection coefficients, rates of inbreeding, and other variables. Mutation, the ultimate source of the variation on which evolution acts, is a chance process, and the probabilities of various mutations have been used as a molecular clock to make inferences about the past.


Present distributions of traits and species are considered to be the result of biological and geographic processes that are contingent upon historical initial conditions, a fact that has been used to infer patterns of ancestry. Though these manifestations of probability have interesting differences from one another, it is undeniable that increased probabilification has been an explanatory goal of modern biological science. With respect to the question of elitism, then, we might ask whether the success of such probabilifications derives from the goal of constructing biological explanations that confer high probabilities on observed outcomes. A brief and naïve look at several common types of biological explanation casts doubt on the claim that this is so, making elitism prima facie implausible with respect to the biological domain. In later sections, I will examine several episodes in the historical acceptance of Mendel's probabilistic theory of inheritance in particular.[9] Before delving into the details, however, it will be helpful to first examine Strevens's argument for elitism in statistical mechanics and possible differences between that case and the biological ones under examination.

In the late nineteenth century, physicists possessed a large body of observations regarding the flow of heat – hot irons placed in cold water cool until an equilibrium temperature is reached; when a partition separating a gas from a vacuum is removed, the gas fills the entire chamber, etc. – and a mechanical theory of molecular movement that explained these observations. However, according to Strevens, this mechanical theory was not accepted until it met its statistical counterpart, which probabilified initial conditions and resulting outcomes of thermodynamic processes.

[9] For examinations of the epistemology of contingent explanations, especially in historical biological sciences, see Currie and Turner (2016) and the references cited therein.


The probabilities that SM assigned to our previously observed outcomes were exceedingly high: a hot iron placed in water has an enormously high probability of cooling, a gas will almost certainly evenly fill an open chamber, etc. According to Strevens, these high probabilities were the special novel contribution of the statistical part of SM (2000, 373); had SM not shown known outcomes to be highly probable, it would not have added anything at all to the molecular theory. The argument can be reconstructed as follows (Strevens 2000, 374). For some outcome e explained by SM, such as the cooling of a hot iron:

1. SM showed that e was highly probable.
2. Prior to SM, the molecular theory showed that e might possibly occur.
3. Parity Principle: SM was accepted because it improved upon the explanation of e provided by the molecular theory.
4. If egalitarianism is correct, then an explanation showing that an event e might possibly occur is just as good as an explanation showing that e is highly probable.
5. If egalitarianism is correct, then SM did not improve upon the explanation of e provided by the molecular theory.

Conclusion 1: Therefore, egalitarianism cannot account for the acceptance of SM.

Conclusion 2: Therefore, egalitarianism does not give a correct account of scientific explanation.

Conclusion 3: Explanatory Elitism: It was in virtue of showing that e was highly probable that SM had an explanatory advantage and was accepted.

For the sake of argument, I will grant premises (1)-(3). One possible egalitarian response is to deny the move from Conclusion 1 to Conclusion 2. That is, we might grant that Strevens is entirely correct about explanatory practice within physics but deny that his conclusions generalize to other sciences. This is compatible with a pluralistic or disunified account of scientific explanation, according to which what makes something a good explanation is different in physics than in biology.


While this might be the correct stance to take, I think an even stronger reply is possible. The diagnosis for why the move from Conclusion 1 to Conclusion 2 is faulty is that SM is not a probative test case for arbitrating between explanatory elitism and egalitarianism. In particular, it cannot distinguish between the following proposals for how SM improved upon the explanation provided by the molecular theory:

(a) SM improved upon the explanation of e provided by the molecular theory by conferring a high probability on e.

(b) SM improved upon the explanation of e provided by the molecular theory by conferring the correct probability on e.

It is true that all of the actual observed outcomes that SM explained, and which led to its acceptance, were ones that were shown to be highly probable by SM. However, the objective chances of the observed outcomes falling under its purview are so high that the chance of the opposite outcome occurring is “so unlikely one would not expect it to occur even once in the lifetime of the universe” (2000, 372). Thus, historical practice does not afford us any actual observed outcomes that were improbable by SM's lights with which to distinguish between (a) and (b) as explanations of the Parity Principle's verdict in this case.

This alternative explanation suggests another response on behalf of the egalitarian: to deny Strevens's premise (4), that if egalitarianism is correct, then an explanation showing that an event e might possibly occur is just as good as an explanation showing that e is highly probable. There is a difference between a theory's entailing that an event is “merely possible” and entailing that the event will occur with some specific, perhaps low, probability. On all prominent egalitarian theories, an explanation would be lacking if it entailed merely that e could possibly happen while remaining silent about what the extent of this “possibility” is.


On this suggestion, the special achievement of probabilistic theories is in turning vague, mere possibility into precise predictions of probability. Hence, theories like population genetics and Mendelian genetics were explanatory improvements over their predecessors because they assigned precise probabilities to evolutionary outcomes that could then be verified by further observation. The plausibility of this as a historical account of the explanatory contribution and acceptance of Mendelism is illustrated by a series of passages from Bateson, one of its earliest and most ardent defenders. In his Materials for the Study of Variation, written in 1894 before the rediscovery of Mendel's work, Bateson argues that without an account of the origin of variation, Darwin's theory is at risk of becoming explanatorily inert:

In these discussions we are continually stopped by such phrases as “if such and such a variation then took place and was favorable” or “we may easily suppose circumstances in which such and such a variation if it occurred might be beneficial,” and the like. The whole argument is based on such assumptions as these—assumptions which, were they found in the arguments of Paley or of Butler, we could not too scornfully ridicule… If we had before us the facts of Variation, there would be a body of evidence to which in these matters of doubt we could appeal. We should no longer say “if Variation take place in such a way,” or “if such a variation were possible;” we should on the contrary be able to say “since Variation does, or at least may take place in such a way,” “since such and such a Variation is possible,” and we should be expected to quote a case or cases of such occurrence as an observed fact (Bateson 1894, v-vi).

Crucially, Bateson took this challenge for Darwin's theory to be adequately resolved by Mendel's theory. In Mendel's Principles of Heredity, written in 1909 after the rediscovery, he effuses that with the general law that Mendel's “long lost papers” provided, “we have reached a point from which classes of phenomena hitherto proverbial for their seeming irregularity can be recognized as parts of a consistent whole” (Bateson 1909, v). As a result, “a new world of intricate order previously undreamt of is disclosed. We are thus endowed with an instrument of peculiar range and precision” (ibid., 17).


Notably, in his thorough explication and defense of the theory, Bateson does not argue that Mendelism adds to the explanatory value of Darwinism by showing that the traits we see today were highly probable. Indeed, he often argues that it will be the rarer Mendelian events, such as the emergence of recessive phenotypes when heterozygotes in a large population meet, that are the most evolutionarily important. Mendelism did not make these “capricious” events more probable but rather explained their occurrence by subsuming disparate phenomena under a single probabilistic regularity which then entails the correct probability of the explananda under its purview. This is consistent with an egalitarian interpretation of the Parity Principle: the particular outcomes on which a theory confers the correct (and precise) probability are those that most favor the theory over alternatives. The plausibility of this interpretation depends on two claims: first, that one better explains O by conferring on O a low probability than by showing that it is merely possible, and second, that conferring the correct probability – not necessarily a high one – is the important explanatory advantage that leads to the acceptance of such theories.

The elitist may respond that while probabilifying outcomes is indeed explanatorily important, it was only those outcomes on which the theories conferred high probabilities that were important for the favoring of those theories. On this response, Mendelism also showed that some outcomes had a low probability of occurring, but these were merely along for the ride, unimportant explanatorily and for the acceptance of the theory.


This is closer to the picture of probabilistic explanation provided by Strevens (2008). On his account, the optimal explanation will achieve the best balance of accuracy, cohesion, and generality. He stipulates that the accuracy of a probabilistic explanation for some event, e, is equal to the probability of e given the explanans (Strevens 2008, 365). Thus, if accuracy were the only explanatory virtue, this would entail a stringent elitism, on which higher probabilities always explain better.[10] However, an explanation also improves to the extent that it covers a more general class of causally similar (cohesive) events. Take Strevens's example of a wheel of fortune with 90 red wedges and 10 black, and the event e of the wheel's landing on red. We could give a deterministic explanation of e, citing the force of the wheel's spin, its paint distribution, etc., which would entail with probability 1 that the wheel would land on one of its red wedges.[11] This explanation would include irrelevant details about this particular wheel and spin, whereas a probabilistic explanation – citing the causal properties of wheels of fortune and the fact that this wheel of fortune has 90% red wedges – would still be highly accurate but more general and cohesive. In part, this generality is purchased by subsuming events of type e (red outcomes) under a more general class of events (outcomes of a wheel of fortune), where this class includes low probability events (black outcomes). These latter events always act as a drag on the explanation's overall accuracy, and, as low probability events, they are not well-explained by the probabilistic regularity cited in the explanans.[12]

[10] It would also seem to recommend that deterministic explanations, when possible, are always better than probabilistic ones. Strevens wants to make room for the possibility that the best explanations, even for deterministic systems, are often probabilistic.

[11] We could also give a deterministic explanation for why it landed on this red wedge in particular.


According to Strevens, as the probability of e sinks (say, going from .9 red to .6 red), so does the quality of the probabilistic explanation, for “as the strike ratio (probability) for an outcome decreases, the region of initial conditions producing that outcome decreases, and the accuracy of the basing generalization (probabilistic regularity) decreases along with it. At some point the basing generalization's accuracy sinks so low as to begin to become a net explanatory liability” (Strevens 2008, 429, parentheticals added).[13] On this view, then, including the low probability events adds to the generality of the explanation, but only the high probability events add to its accuracy and are well explained by it. Hence, for the elitist, the Parity Principle gets the right result if it is the case that among the events a theory probabilifies, it is the high probability events that contributed the most to favoring the theory over its alternatives.

I will argue that this picture is wrong, both historically and normatively. In biological practice, low probability events have sometimes been more important than high probability events in favoring theories, including Mendel's, and indeed, they ought to have been. To illustrate how this is possible, and to motivate the elitist's argument to the contrary, it will be necessary to specify what it is, exactly, that probabilistic regularities explain.

[12] This seems to have the odd consequence that adding a generality to the explanans that subsumes an alternative, non-e event, e* (that could have happened but did not) improves our explanation of e, but at the same time, this regularity would not explain e*, or at least not explain it well, if it had occurred.

[13] This suggests that the nature of the process being explained places an upper bound on the quality of probabilistic explanations. For wheels of fortune (or any chance process), the more that the probability distribution is evenly spread among various types of outcomes, the lower the accuracy, and hence the quality, of probabilistic explanations of individual outcomes of that chance process. If this were the case, then perhaps certain kinds of biological phenomena are incapable, by their nature, of being well explained probabilistically. However, this is at odds with biological practice and also seems to tell against some of Strevens's own examples of successful probabilistic explanations in biology, such as drift in small populations (2008, 339-340).


3. What do probabilistic theories explain?

In the previous section, I sketched a point on which the disagreement between the elitist and egalitarian turns: among the various outcomes that a theory probabilifies, are the high probability outcomes those that most favor the theory? Answering this question is somewhat tricky, as there is an ambiguity in the literature regarding the proper explananda of probabilistic explanations; do probabilistic generalizations explain probabilities of outcomes or outcomes themselves, and if the latter, which ones? Most accounts of explanation agree that the explanans contains a regularity or law along with a statement of initial conditions. What, then, can these explain? Consider several candidates for the explanandum, here illustrated by a Mendelian explanation of why Anna, the daughter of two brown-eyed parents, has blue eyes:

1. Probabilistic regularity and statement of initial conditions. Mendel's Law of Segregation: each parent has two alleles for trait T, and these alleles randomly segregate during meiosis (such that each gamete has a .5 probability of containing each of the alleles). Initial conditions: the parental genotypes are Bb x Bb.

2. Probabilities of outcomes covered by (1). Offspring of these parents will have genotypes with the following probabilities: Pr(BB) = .25, Pr(Bb) = .5, Pr(bb) = .25 (derived by enumeration in the sketch after this list).

3. Frequencies of outcomes that are assigned probabilities in (2). The expected frequency of genotypes in offspring of these parents is: Freq(BB) = ¼, Freq(Bb) = ½, Freq(bb) = ¼.

4. Single-case outcomes. An offspring of these parents, Anna, has genotype bb.
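To make the move from stage (1) to stage (2) concrete, here is a minimal sketch (my illustration, not anything from the paper) that derives the stage-(2) probabilities by enumerating the four equiprobable gamete pairings licensed by the Law of Segregation:

from itertools import product
from collections import Counter

# Each Bb parent passes one of its two alleles at random (probability .5 each).
mother, father = "Bb", "Bb"
counts = Counter("".join(sorted(m + f)) for m, f in product(mother, father))
probs = {genotype: n / 4 for genotype, n in counts.items()}
print(probs)  # {'BB': 0.25, 'Bb': 0.5, 'bb': 0.25}

Anna's blue eyes are the Pr(bb) = .25 case: a well-defined, but not high, probability.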


Most accounts agree that the regularity in stage one explains the probabilities in stage two. On some accounts of explanation, this is where probabilistic explanation ends. For example, Papineau (1985) argues that a theory's conferring a (high or accurate) probability on an explanandum outcome is neither necessary nor sufficient for explaining that outcome, and thus the probability itself is the only thing that is guaranteed to be explained.[14] According to Woodward and Hitchcock's (2003a,b) influential invariance account, probabilities of events might be the only explananda of probabilistic explanations, though the authors express ambivalence about that conclusion.[15]

Strevens argues that such views make probabilistic theories explanatorily inert. SM was favored because it explained actual outcomes, not merely their probabilities. On Strevens's proposal, probabilistic generalizations explain individual outcomes by way of explaining their probabilities; first, a generalization explains the probability of an outcome, and second, this probability explains the outcome itself.[16] Which outcomes, long-run frequencies or single-case outcomes, does Strevens have in mind? He argues that there is no real difference between these two types of explananda or what it takes to explain them well:

The single red outcome can be treated as a kind of degenerate frequency, that is, as a frequency of 100% red outcomes from a ‘series’ of trials consisting of just one spin… There is an upper limit on the accuracy of such an explanatory model. Accuracy is, you will recall, proportional to the probability ascribed to the frequency. When the ‘frequency’ is the occurrence of a single outcome, accuracy is therefore proportional to the probability ascribed to the outcome itself (Strevens 2008, 425).

Statistical mechanics explains the non-probabilistic fact that heat never flows from cold to hot in exactly the same way that it explains the particular event of, say, a hot iron cooling in cold water (Strevens 2000, 375).

[14] For Papineau, a factor f explains an outcome e if and only if f is an actual cause of e. However, a probabilistic regularity and the initial conditions cited in the explanans can both be true and raise the probability of an outcome without explaining it. In his example, even if it is true that Joe is a smoker, that smoking raises the probability of cancer, and that Joe develops lung cancer, smoking might not have been the cause of Joe's lung cancer and would thus not explain that outcome. The increased probability is evidence of a real causal-explanatory relation but does not suffice for it. According to Papineau, the explanans only explains why Joe had a higher probability of getting cancer and not why he actually did.

[15] “Discretion being the better part of valor, we will remain neutral on the issue of whether it is [the outcome] per se, or merely the probability of [the outcome] that is explained on any particular occasion” (fn 2).

[16] For a similar view, see Emery (2015).


In the remaining sections, I will argue that several versions of the elitist Parity Principle face serious problems with respect to both frequencies and single-case outcomes.

4. How low probability frequencies may favor theories

With respect to observed frequencies, the elitist Parity Principle predicts that, historically, we should see that frequencies to which a high probability was assigned were those that most contributed to the acceptance of scientific theories. Strevens himself (2008) acknowledges a problem with this view:

[One] situation in which a microconstant model may explain a low-probability frequency is a case in which the frequency matches the strike ratio but the explanandum puts very narrow bounds on the fact about the frequency that is to be explained. Suppose, for example, that you want to explain why the frequency of red in 500 spins on the wheel of fortune was between 49% and 51%. (The probability of such an event is about .38)… I conclude, very provisionally, that low complex probabilities may occasionally explain (Strevens 2008, 415).

In general, the observed frequency that is most probable, and that would typically seem to best confirm a theory, is the one that has a maximal likelihood, that is, the one in which the observed frequencies of outcomes match their probabilities as closely as possible.[17] The problem is that for any probabilistic hypothesis that predicts heterogeneous frequencies, as the sample size increases, the probability of the maximum likelihood frequency decreases.[18]
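Both the .38 figure and the shrinking probability of the maximum likelihood frequency are easy to check with exact binomial arithmetic. A minimal sketch (mine, not Strevens's; it assumes a strike ratio of .5 in his example, since the frequency there is said to match the strike ratio):

from math import comb

def binom_pmf(k, n, p):
    # exact probability of k successes in n independent trials
    return comb(n, k) * p**k * (1 - p)**(n - k)

# The maximum likelihood count for a .9 wheel becomes ever less probable as n grows:
for n in (10, 100, 1000):
    print(n, binom_pmf(int(0.9 * n), n, 0.9))  # ~0.39, ~0.13, ~0.04

# Strevens's case: between 49% and 51% red in 500 spins of a .5 wheel
print(sum(binom_pmf(k, 500, 0.5) for k in range(245, 256)))  # ~0.38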

[17] This will often fail to be the case if evidential favoring is measured by the Law of Likelihood. For example, if one is comparing the hypothesis (H1) that the probability of red is .2 against the hypothesis (H2) that the probability of red is .9, then observing 15% red will have a lower likelihood than observing 20% red conditional on H1, but the former will be stronger evidence favoring H1 over H2. This suggests an alternative way to argue against the elitist Parity Principle, but it is not one I can explore here.

[18] Notice that in Strevens's favored example of statistical mechanics, given the extremely high probability of each individual event it predicts, the conjunction of those individual events is still extremely probable.


While Strevens notes this possible exception to his elitist account, I think he far underestimates its importance. It is not a mere accident that low probability frequencies sometimes explain. In fact, given that the probability of the maximum likelihood frequency decreases with increasing sample size, and that increasing sample sizes should constitute stronger evidence for or against a theory, the absolute probability of a frequency and its relevance for favoring or accepting a theory will often be inversely related. For example, suppose you are trying to test the hypothesis that the wheel's probability of landing on red is .9. Observing 450 red out of 500 spins should provide you with much greater reason to favor or accept that theory than observing 9 red out of 10 spins, despite the fact that it is much less probable (.06 vs .39). Here, an increase in confirmational power corresponds to a decrease in the absolute probability of the explanandum given the explanans.[19]

As Strevens notes, one way to increase the probability is to make the explanandum less precise; for example, perhaps the fact to be explained is that the frequency fell between .85 and .95 red. By logically weakening the statement of the evidence, this move throws out relevant information, violating the principle of total evidence. It also negates the ability of the evidence to distinguish between different probabilistic hypotheses that both predict that the observed frequency will fall within that range (for example, if we wanted to use the observed frequency to evaluate whether the wheel had a probability of .9 versus .92 of landing red).
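These probabilities, and the effect of coarsening the explanandum, can be verified directly (again a sketch of mine, using exact binomial arithmetic):

from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

print(binom_pmf(9, 10, 0.9))     # ~0.39: 9 red in 10 spins
print(binom_pmf(450, 500, 0.9))  # ~0.06: 450 red in 500 spins

# Coarsening "90% red" to "between 85% and 95% red" buys a high probability
# at the cost of discarding information:
print(sum(binom_pmf(k, 500, 0.9) for k in range(425, 476)))  # ~0.9999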

[19] Sober (2008) makes a similar argument against what he calls “Probabilistic Modus Tollens” and in favor of the Law of Likelihood.


Each of these points is illustrated by episodes in the history of Mendelism; I will consider the first here and the second in the next section.

Mendel's work was groundbreaking not only in his development of a novel probabilistic model of inheritance but also in the extensive data he provided in its support. In his experiments with pea plants, Mendel recorded long-run trait frequencies extremely close to those that would be maximally likely given his laws of segregation. For example, for the F2 results of crosses of pure lines for yellow (GG) and green (gg) seeds – for which his model predicts yellow and green seeds with probabilities .75 and .25, since yellow is dominant – Mendel reported an observed frequency of 6022:2001. This remarkably close fit for such a large sample size was highly influential in persuading scientists of the truth of his theory. However, such close fits throughout Mendel's data also cast doubt on his methods, leading Fisher to suspect that the numbers had somehow been cooked. The resulting “Mendel-Fisher controversy” has persisted to the present day (for a summary, see Franklin 2008). The crux of the allegation is that such observed frequencies are highly improbable and thus cannot be explained by Mendel's theory alone.[20]
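The worry can be quantified. A minimal sketch (my reconstruction of the style of calculation, not Fisher's actual analysis, which aggregated all of Mendel's experiments): under the 3:1 model, how probable is a deviation from expectation at least as small as the one reported?

from math import exp, lgamma, log

def log_binom_pmf(k, n, p):
    # log-space binomial pmf; direct evaluation underflows for n this large
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))

n = 6022 + 2001             # total F2 seeds
expected = 0.75 * n         # 6017.25 yellow seeds expected
dev = abs(6022 - expected)  # observed deviation: only 4.75 seeds

close_fit = sum(exp(log_binom_pmf(k, n, 0.75))
                for k in range(n + 1)
                if abs(k - expected) <= dev)
print(close_fit)  # ~0.1: a fit this tight is itself a lowish-probability event

A single result like this proves nothing on its own; Fisher's complaint concerned the conjunction of comparably tight fits across Mendel's many experiments.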

[20] It is possible to interpret the allegation as a likelihood argument. The probability of getting such a close match is much higher given that there was scientific impropriety than if the data were collected in a process with normal amounts of error and statistical variation; therefore, the close fit is evidence for the former.


Of relevance here is a prominent defense of Mendel from Sturtevant, in his (1965) A History of Genetics, who argues that this elitist line of argumentation leads to absurdity:

If I report that I tossed 1000 coins and got exactly 500 heads and 500 tails, a statistician will raise his eyebrows, though this is the most probable exactly specified result. If I report 480 heads and 520 tails, the statistician will say that this is about what one would expect—though this result is less probable than the 500:500 one. He will arrive at this by adding the probabilities for all results between 480:520 and 520:480, whereas for the exact agreement he will consider only the probability of 500:500 itself. If now I report that I tossed 1000 coins ten times, and got 500:500 every time, our statistician will surely conclude that I am lying, though this is the most probable result thus exactly specified. The argument comes perilously close to saying that no such experiment can be carried out, since every single exactly specified result has a vanishingly small probability of occurring (13).

Conjoined with either of the first two versions of the Parity Principle, this has the consequence that “single exactly specified results” should not be important in favoring a theory. Worse, larger sample sizes and logically stronger statements of the explanandum, which are commonly taken to improve its confirmational value, reliably correspond to decreases in its absolute probability given the explanans. Thus, Strevens's reserved endorsement that “low complex probabilities may occasionally explain” seems far too cautious (Strevens 2008, 415).
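Sturtevant's arithmetic checks out; a quick sketch (mine, not his):

from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

exactly_500 = binom_pmf(500, 1000, 0.5)
print(exactly_500)  # ~0.025: the most probable exactly specified result
print(sum(binom_pmf(k, 1000, 0.5) for k in range(480, 521)))  # ~0.81: "about what one would expect"
print(exactly_500 ** 10)  # ~1e-16: ten 500:500 results in a row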

While this point undermines the first two formulations of the Parity Principle, the elitist may retreat to the third formulation, according to which:

If theory T1 confers a higher probability on O than does T2, then T1 explains O better than does T2, such that this provides pro tanto ground for favoring T1 to T2.

On this view, it is not the absolute value of the probability that matters; what matters for the acceptance of an explanation is whether it makes the outcome more probable than its competitors do. However, even this seemingly sensible position encounters difficulties when we turn from long-run frequencies as explananda to single-case outcomes. This is relevant for two reasons. First, it undermines Strevens's view that the same explanatory principles hold for long-run frequencies as for single-case outcomes. Second, it undermines those challenges to Darwinism that claim that it is explanatorily deficient in virtue of its assigning low probabilities to particular single-case outcomes, such as the origin of intelligent life.

5. How low probability single-case outcomes may favor theories

According to the elitist Parity Principle that survived the previous section, if a theory confers a higher probability on an outcome than an alternative does, it explains it better, in a way that provides a reason to favor that theory. For now, I will grant that this is true with respect to long-run frequencies. Indeed, this has strong Bayesian bona fides; if we pay attention to our total evidence and our set of evidence is sufficiently large and probative, then the theory that makes the evidence most probable is most strongly confirmed by it. What happens, though, if we restrict our focus to a limited subset of the total evidence, such as a single-case outcome that is a part of a larger set of frequency data? Suppose that a set of such outcomes, O1-On, comprises the frequency data, F, that are used to evaluate a theory, T. Is it the case that among those outcomes, those Oi that are shown to be highly probable are those that best favor T over alternatives?

Relative to a background containing no observations, a high probability event may “contribute” more to indirectly favoring T by providing evidence that the frequency will be one that is assigned a high probability. However, this reasoning does not generalize, for obvious reasons. For example, suppose you want to test H1: the probability of red = .9 against H2: the probability of red = .95. You have already spun the wheel 9 times, with it landing red every time. Relative to this background, a black outcome would confirm the hypothesis that the probability of red is .9 more so than another red would.[21]
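A sketch of the relevant likelihoods (my numbers, chosen to match the example):

# After 9 reds, compare what a 10th spin contributes to H1 (p = .9) vs. H2 (p = .95).
# Ratios greater than 1 favor H1.
p1, p2 = 0.90, 0.95
print((1 - p1) / (1 - p2))  # black: likelihood ratio 2.0, favoring H1
print(p1 / p2)              # red: ~0.95, weakly favoring H2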


Likewise, if Anna's parents had 20 offspring, the first 19 of which had brown eyes, Anna's having blue eyes could carry significant weight in favoring the hypothesis that eye color resulted from Mendelian processes.

A closer look at the experimental results that were taken to be the key outcomes favoring Mendel's theory shows that they were very often ones that were improbable by its lights. In an influential paper, Bateson cites seven phenomena for which Mendelism provides “simple and convincing explanations of many facts hitherto paradoxical” (Bateson 1902, 7). Notably, none of these phenomena are ones that Mendel's theory showed to be highly probable. One phenomenon that puzzled breeders was the reemergence of “rogue” traits among even the best strains. For example, after many generations of selection for the “beardless” trait in wheat, these strains would occasionally give rise to bearded offspring. Bateson argues that the hypothesis “that such a ‘rogue’ is a recessive form may give a complete explanation of this phenomenon in many cases”, for if beardlessness is dominant, “the chances are that it will always produce a certain proportion of bearded plants” (Bateson 1902, 10). Mendel's theory does not state that recurrences of rogues are likely in any particular generation. However, because it predicts that they will occur with some determinate low probability,[22] the observation of a rare rogue strain, relative to a background of relative constancy, may constitute strong confirmation of the theory.

[21] This is a version of Royall's (1997) urn example.

[22] I will introduce a complication for this example in the next section.


This feature is particularly important when comparing two “close” probabilistic hypotheses. Since the probabilities and long-run frequencies they predict are very similar to one another, the confirmational battle will often be fought on the margins, as is illustrated by the historical battle between Mendel’s theory of inheritance and its closest competitor, Galton’s biometric theory.

6. Low probability single-case outcomes favored Mendel's theory over Galton's

Galton (1897) stated his Law of Ancestral Heredity, which was to become the main alternative to Mendelism in the early twentieth century, as follows:

The two parents contribute between them on the average one-half, or (0.5) of the total heritage of the offspring; the four grandparents, one-quarter, or (0.5)^2; the eight great-grandparents, one-eighth, or (0.5)^3, and so on. Thus, the sum of the ancestral contributions is expressed by the series {(0.5) + (0.5)^2 + (0.5)^3, &c.}, which, being equal to 1, accounts for the whole heritage.

This statistical law predicts the offspring phenotype[23] as a function of the phenotypes of its ancestors. The causal theory of inheritance underlying Galton's law differs from Mendel's in a few key ways. For Galton, hereditary material consists of particulate elements in the germ plasm, half of which are contributed by the father and half by the mother; however, there is an indefinite number of such elements giving rise to a particular trait, instead of just two on the Mendelian picture (Bulmer 1998; Leuridan 2007). Some of these elements are expressed in the phenotype (patent elements) while others remain unexpressed (latent elements).
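The law's arithmetic is easy to verify; a small sketch (the per-ancestor shares are my gloss on Galton's statement):

# Generation i (i = 1 for parents) contributes (0.5)**i of the heritage in total,
# split evenly among its 2**i members.
total = sum(0.5**i for i in range(1, 60))
print(total)  # -> 1.0 (to float precision): the series accounts for the whole heritage
for i in (1, 2, 3):
    print(i, 0.5**i / 2**i)  # per-ancestor share: 0.25, 0.0625, 0.015625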

[23] Galton believed that his law applied both to the total value of the character and to its deviation from the mean population value. However, Provine (1971) points out that the regression formulae for absolute phenotype values and their deviations from the mean are inconsistent. This inconsistency is explored more clearly in Bulmer (1998).


There are two central differences between the two theories; with respect to each, the theories nevertheless predict extremely similar long-run trait frequencies. Despite the “numerous examples where the arithmetical results predictable by either system are nearly or quite the same”, which raised serious obstacles for experimentally distinguishing between them, Bateson argues that “further breeding would of course reveal that even in these cases the applicability of the Galtonian method was only superficial” (1909, 56). What, then, were the key predictive differences that were exploited by Mendelians? The key was to notice that the two theories make different predictions regarding higher-order features of those frequencies. In each case, single-case outcomes were relevant to the favoring of the theory, and in each, the theory that assigned the right probability, not the highest, was favored.

First, though patency is similar to Mendelian dominance, the latter is fixed for each trait,[24] while Galton assumes that within a generation, each element “coding” for a trait has an equal chance, p, of being patent. Galton's theory also assumes that the probability of patency, p, is the same for all traits, whereas on Mendel's theory, the value of p will vary due to dominance. Bulmer (1998, 581) notes that Galton's predicted ancestral correlation (0.5)^i p is equivalent to the Mendelian correlation if p is replaced by the narrow heritability, h^2. For quantitative traits (for which the phenotypic character is a function of many independent alleles) or for population-level frequencies which average over many traits, h^2 will average out to give an estimate roughly similar to Galton's p.

[24] Here, I am omitting cases of incomplete dominance, epistasis, and quantitative traits, all of which posed major challenges for testing Mendelism against the biometric theory. See Bateson (1902) for a treatment of such problem cases and an early attempt to accommodate them within a Mendelian framework.


Bateson and other early Mendelians argued that one of the key facts to be explained was the discontinuous and varied nature of inheritance. Any practical breeder knows that “while he can rapidly fix some characters, some never come true at all, and others will not come true with any certainty after long selection; the expectation after simple selection is, in fact, quite different for different characters” (1902, 38). For example, selection for heterozygotes never produces a pure strain, regardless of how many heterozygous ancestors a strain has; on Galton's theory, by contrast, the number of ancestors with a trait will always increase the probability that the offspring will have that trait. Similarly, rogue strains sometimes reemerge when selection on the dominant trait has failed to completely weed out the recessive character, an outcome that Galton's theory also predicts will occur with some low probability. However, Galton's theory predicts not only that this is possible but that it is possible for any strain that resulted from mixed ancestry; if there is a bearded ancestor anywhere down the line, there is a non-zero probability of a bearded offspring. In contrast, Mendel's theory predicts that pure strains (in which rogues will not crop up) are possible; if the founding cross was between two homozygous dominants, recessive traits among ancestors will not raise the probability of recessives reemerging above 0.[25] The Mendelians' claim is that their theory better explains the emergence of rogues when they occur, despite the fact that Galton's theory assigned a higher probability to that outcome. Simply put, Galton's theory predicts too much homogeneity in the patterns of inheritance. On the other hand, Mendel's predicts heterogeneity, albeit of a regular sort.

[25] I am ignoring the possibility of mutation here. One complication is that the presence of ancestors with the recessive trait may raise the probability that the recessive trait will reemerge through mutation, for perhaps they provide evidence that the recessive trait is easily attainable by mutation.


By assigning the same probability across all traits and lineages, Galton's theory assigns both a too-high and a too-low probability to certain outcomes.

A second predictive difference arises from the fact that, for Galton, the probability that a trait will be patent in an offspring, given that it was patent in an ancestor i generations back,[26] is (0.5)^i p. Because every ancestor makes a diminishing but non-zero probabilistic contribution to the offspring phenotype, the traits of an individual and its ancestors are always probabilistically dependent, and there is no fact about the parents that screens off prior generations[27] (Leuridan 2007). In contrast, Mendelian genotype evolution is a Markov process, in which the genotype of the parents screens off facts about the genotypes of past generations. Two problems arise for distinguishing between the two theories here. First, unlike genotype evolution, Mendelian phenotype evolution is not a Markov process (due to dominance, epistasis, etc.), and the phenotype is all that could be directly observed. Second, the probabilistic dependencies between generations that Galton's theory predicts will be largely accurate when averaging over different possible ancestral histories, and we typically do not have information about the actual pattern of ancestry very far into the past. However, early Mendelians exploited the strong dependence of offspring on ancestral traits predicted by Galton's theory to test it. On that theory, the probability that an individual has a trait is dependent on the number of ancestors with that trait; for example, the probability that Anna would have blue eyes always increases as the number of her ancestors with blue eyes increases.
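To make the screening-off contrast concrete, here is a minimal simulation sketch (my illustration, not part of the historical debate): conditional on Bb x Bb parents, the grandparental cross makes no difference to the offspring distribution, exactly as the Markov property requires:

import random

def child(p1, p2):
    # one allele from each parent, chosen at random (Mendelian segregation)
    return "".join(sorted(random.choice(p1) + random.choice(p2)))

def pr_bb_given_Bb_parents(grandparent_cross, trials=200_000):
    hits = total = 0
    for _ in range(trials):
        # both parents drawn from the same grandparental cross, for simplicity
        mother, father = child(*grandparent_cross), child(*grandparent_cross)
        if (mother, father) == ("Bb", "Bb"):  # condition on the parental genotypes
            total += 1
            hits += child(mother, father) == "bb"
    return hits / total

print(pr_bb_given_Bb_parents(("Bb", "Bb")))  # ~0.25
print(pr_bb_given_Bb_parents(("BB", "bb")))  # ~0.25: grandparents screened off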

[26] Where i = 1 for a parent, i = 2 for a grandparent, etc.

[27] This is only true for individuals with mixed ancestry. If a line is pure, such that no ancestor had an alternative trait, then the parental generation will screen off prior generations (trivially).


Bateson demonstrates that this dependence does not always hold by drawing on data from horse breeders:

For instance, in the breeding of thoroughbred race-horses the heredity of chestnut colour is that of an ordinary recessive, though the various colours, bay, brown, and chestnut have been indiscriminately united together in the breed. No difference is manifested between colour-inheritance of chestnuts which have had many chestnut ancestors in recent generations, and those that have no chestnut progenitor in the nearer degrees (1909, 49).

Once more, Bateson adverts to a higher-level feature of the statistical data – that the probability of offspring traits given parental traits is in this case independent of (or at least not as strongly dependent on) prior ancestral traits – which suggests that genotype inheritance is a Markov process, as entailed by Mendel's theory.

7. Some Elitist Responses and Conclusion

In the last few sections, I have surveyed some of the observations that led to the acceptance of Mendel's theory. These fell into three main classes: Mendel's long-run frequency data, which closely matched the probabilities predicted by the theory; regular heterogeneity in patterns of inheritance across traits; and patterns of probabilistic (in)dependence across generations indicative of Markovian genotype inheritance. In none of these cases was the absolute probability of any outcome used as an argument in favor of Mendel's explanatory advantage.

This accords with biological practice more generally, with its toleration and even embrace of low probability explanations. The elitist who wants to appeal to scientific practice to support his view must account for the fact that low probability explanations in biology have been regarded as scientifically adequate. Indeed, many currently accepted probabilistic theories in biology have supplanted ones which made various outcomes more expected (for example, mixed and neutral theories versus naïve adaptationism).


My analysis is that these theories really were explanatorily superior to their predecessors and competitors. Unlike Strevens's favored case of statistical mechanics, biology is a messy and heterogeneous affair. The best explanations in that domain will not be ones that show that observed phenomena were highly probable but ones that show why they were predictably and projectably irregular. The explanatory advantage of probabilification here is to turn mere, unlawful “possibility” into regular, law-like probability.

The elitist here has several available responses: first, that these theories did not actually have an explanatory advantage over their competitors; second, that the important outcomes in favoring these theories were actually ones that were shown to have a high probability; third, that the Parity Principle is false, and the observations that a theory best explains are not necessarily those that led to its acceptance.

This second strategy seems to be a particularly promising avenue for the elitist. In particular, he might argue that what contingent biological theories predict and explain is not single-case outcomes at all. For example, Mendelism does not explain why Anna has blue eyes; it explains why her trait stands in a 1:3 expected ratio with her siblings. This view is aptly developed by Helgeson (2013), who argues that “Darwin's common ancestry hypothesis has nothing to say about where any particular organism should be found on the globe, or what morphological characteristics an organism should display” but nevertheless that a theory that does not predict or explain single-case outcomes may “stick its neck out when it comes to certain abstract, ‘high level’ features of the same set of observations” (Helgeson 2013; 59, 37).


This view may very well be correct, and it offers a way out for the elitist who fancies the Parity Principle. Nevertheless, this is a form of elitism that gives no ammunition to the critics of materialist Darwinism who claim that it assigns low probabilities to certain single-case outcomes of interest, that it therefore cannot adequately explain them, and that this is grounds for rejecting or supplementing the theory. Either high probabilities do not explain better in a way that provides grounds for theory acceptance (the elitist Parity Principle is false), or Darwinism and other contingent biological theories are just not in the business of explaining single-case outcomes, and assigning low probabilities to them does not inhibit their explanatory advantage with respect to the higher-order patterns that are their proper explananda.

References

Bateson, W. (1894). Materials for the Study of Variation. Macmillan, London.
---- (1902). The facts of heredity in light of Mendel's discovery. Reports to the Evolution Committee of the Royal Society, I, 125-160.
---- (1909). Mendel's Principles of Heredity. University Press, Cambridge.
Beatty, J. (2006). Replaying life's tape. The Journal of Philosophy, 103(7), 336-362.
Bulmer, M. (1998). Galton's law of ancestral heredity. Heredity, 81, 579-585.
Cabrera, F. (2015). Can there be a Bayesian explanationism? On the prospects of a productive partnership. Synthese, 1-28.
Clatterbuck, H. (2015). Drift beyond Wright–Fisher. Synthese, 192(11), 3487-3507.
Currie, A., & Turner, D. (2016). Introduction: Scientific knowledge of the deep past. Studies in History and Philosophy of Science, 55, 43-46.
Emery, N. (2015). Chance, possibility, and explanation. The British Journal for the Philosophy of Science, 66(1), 95-120.
Franklin, A. (2008). The Mendel-Fisher controversy. In Franklin, A., et al. (eds.), Ending the Mendel-Fisher Controversy, pp. 1-77. University of Pittsburgh Press, Pittsburgh.
Galton, F. (1897). The average contribution of each several ancestor to the total heritage of the offspring. Proceedings of the Royal Society of London, 61, 401-413.
Gould, S.J. (1989). Wonderful Life: The Burgess Shale and the Nature of History. W.W. Norton, New York.


Hartmann, S., & Schupbach, J. (2010). Review of the book Depth: An Account of Scientific Explanation, by M. Strevens, 2008. Notre Dame Philosophical Reviews, 6(38).
Helgeson, C. (2013). Diverse Evidence, Independent Evidence, and Darwin's Arguments from Anatomy and Biogeography. PhD dissertation, University of Wisconsin–Madison.
Hempel, C.G. (1965). Aspects of Scientific Explanation. Free Press, New York.
Hitchcock, C., & Woodward, J. (2003b). Explanatory generalizations, part II: Plumbing explanatory depth. Noûs, 37(2), 181-199.
Leuridan, B. (2007). Galton's blinding glasses: Modern statistics hiding causal structure in early theories of inheritance. In Russo, F., & Williamson, J. (eds.), Causality and Probability in the Sciences, Texts in Philosophy series, pp. 243-262. College Publications, London.
Mendel, G. (1866). Experiments in plant hybridization. Bateson, W. (trans.), Verhandlungen des naturforschenden Vereines in Brünn, Bd. IV für das Jahr 1865, Abhandlungen, 3-47.
Nagel, T. (2012). Mind and Cosmos: Why the Materialist Neo-Darwinian Conception of Nature Is Almost Certainly False. Oxford University Press, Oxford.
Provine, W.B. (1971). The Origins of Theoretical Population Genetics. University of Chicago Press, Chicago.
Roche, W., & Sober, E. (2013). Explanatoriness is evidentially irrelevant, or inference to the best explanation meets Bayesian confirmation theory. Analysis, 73(4), 659-668.
Royall, R. (1997). Statistical Evidence: A Likelihood Paradigm (Vol. 71). CRC Press.
Salmon, W. (1970). Statistical explanation. In Colodny, R.G. (ed.), The Nature and Function of Scientific Theories, pp. 173-231. University of Pittsburgh Press, Pittsburgh.
---- (1984). Explanation and the Causal Structure of the World. Princeton University Press, Princeton, NJ.
Sober, E. (1984). The Nature of Selection: Evolutionary Theory in Philosophical Focus. University of Chicago Press, Chicago.
---- (2008). Evidence and Evolution: The Logic Behind the Science. Cambridge University Press, Cambridge.
Strevens, M. (2000). Do large probabilities explain better? Philosophy of Science, 67, 366-390.
---- (2008). Depth: An Account of Scientific Explanation. Harvard University Press, Cambridge, MA.
Sturtevant, A.H. (1965). A History of Genetics. Harper & Row, New York.
White, R. (2005). Explanation as a guide to induction. Philosopher's Imprint, 5(2), 1-29.
---- (2007). Does origins of life research rest on a mistake? Noûs, 41(3), 453-477.
Woodward, J., & Hitchcock, C. (2003a). Explanatory generalizations, part I: A counterfactual account. Noûs, 37(1), 1-24.
Wright, S. (1930). The genetical theory of natural selection: a review. Journal of Heredity, 21(8), 349-356.

