A brief reply to “A peculiar surge of incorrect conclusions about the prevalence of p-values just below .05”

Joost de Winter & Dimitra Dodou

We thank Daniel Lakens for his extensive and detailed blog post (Lakens, 2015). Although the blog post includes some highly useful and interesting modelling results, we wish to point out a few caveats.

Lakens’ explorative p-curve fitting

Lakens uses various model-fitting procedures to explain the p-values observed by De Winter and Dodou (2015). He shows that the longitudinal change of the p-value distribution could have been caused by a reduction of statistical power (from 55% in 1990 to 42% in 2013) plus an increase of publication bias (for p-values just above 0.05), combined with an increase of the prevalence of p-values in general (from 0.01% to 0.1%). Lakens further demonstrates (under a number of assumptions) that it is unlikely (albeit “not impossible”) that p-hacking alone could cause our observed p-value distribution.

Lakens’ modelling results are not surprising. When adding many unknowns to a model, any curve can be fitted with high accuracy. Numerous other variables (e.g., changes in reporting styles such as numeric versus textual reporting of p-values, changes in data fabrication, changes in types of journals and statistical methods, and changes in hypothesis generating/testing/replication practices) could in principle also be added to an (overfitted) p-curve model, and thus be invoked to explain the observed trends.

We believe that Lakens’ modelling results are interesting and a valuable complement to our paper. We fully support Lakens’ reminder that p-hacking alone probably cannot explain the observed trends, and that alternative explanations need to be considered. We particularly agree with Lakens’ emphasis on the possibility of growing publication bias over the years.
In essence, publication bias is a questionable research practice, just like p-hacking, with the difference that the former involves selective reporting of entire studies whereas the latter involves selective reporting of statistical analyses. We find it strange that Lakens embraces the possibility of a growing publication bias while he simultaneously rejects the possibility of growing p-hacking. It seems plausible to us that publication bias and p-hacking are intercorrelated because they probably have common causes (e.g., a rise of a publish-or-perish culture).

Straw men: Misrepresenting the title of our work as well as the overall message

Lakens systematically misquotes the title of our article by ignoring its second half “(but negative results are increasing rapidly too)”. This distorts the double message we intended to convey. The second half of the title was included exactly to point out that it is not all bad news, and that alternative explanations have to be considered. Note that Lakens engaged in the same straw man in a tweet posted on February 4, 2015, in which he took issue with half of the title of our article (see figure below).

In his blog post, Lakens introduces yet another straw man by attacking the idea of “a surge of p-hacking”. In actuality, we nowhere claimed that there is a surge of p-hacking, nor did we claim that the observed trends of p-values are solely or predominantly caused by p-hacking.

Our article provides descriptive evidence and leaves room for various interpretations

Our article is technically correct in the sense that there has been a surge of p-values in the range 0.041–0.049 between 1990 and 2013 (by a factor of 10.3), while p-values between 0.051 and 0.059 have increased rapidly too (by a factor of 3.6).
We tried to present a balanced and multifaceted reflection on the possible causes underlying these trends, by bringing forward both ‘negative factors’ (such as p-hacking, among others) and ‘positive factors’ (such as an increase in structured reporting, an increase of statistical power, and changes in the number of [true] hypotheses being tested). We cite much of the relevant literature, including Lakens’ (in press) commentary on Masicampo and Lalande (2012).

Recent concerns about p-hacking: are they an elusive concept that exists only in the mind of people?

Lakens accurately points out that “In recent years researchers have become more aware of how flexibility during the data-analysis can increase false positive results” (italics added). Lakens also compares the search for an increase of questionable research practices to “the search for the ether” (presumably in reference to the elusive and undetectable luminiferous aether and associated theories in physics developed in the late 19th century).

We believe that Lakens all too easily downplays the possibility of a rise of p-hacking, as if the recent concerns about p-hacking are merely about people’s ‘awareness’ rather than about reality. Although we agree with Lakens that it is very difficult to get hard and replicable numbers on this topic, there is certainly some evidence that growing p-hacking is a concern. Leggett et al. (2013), for example, found that “During the calculation of exact p
values it was found that 36 of the 93 values that were reported as being exactly equal to .05 were, in fact, greater than .05 (all from JPSP). In other words, 38.7% of p values reported as being exactly .05 had been rounded down and discussed as being significant. … The proportion of misleading p values increased from 1965 to 2005. In 2005, 42% of probability values reported as exactly .05 were rounded down compared to 19% in 1965.”

References

De Winter, J. C. F., & Dodou, D. (2015). A surge of p-values between 0.041 and 0.049 in recent decades (but negative results are increasing rapidly too). PeerJ, 3, e733.

Lakens, D. (in press). What p-hacking really looks like: A comment on Masicampo & Lalande (2012). Quarterly Journal of Experimental Psychology.

Lakens, D. (2015, February 17). A peculiar surge of incorrect conclusions about the prevalence of p-values just below .05 [blog post]. Retrieved from http://daniellakens.blogspot.nl/2015/02/a-peculiar-surge-of-incorrect.html

Leggett, N. C., Thomas, N. A., Loetscher, T., & Nicholls, M. E. (2013). The life of p: “Just significant” results are on the rise. The Quarterly Journal of Experimental Psychology, 66, 2303–2309.

Masicampo, E. J., & Lalande, D. R. (2012). A peculiar prevalence of p values just below .05. The Quarterly Journal of Experimental Psychology, 65, 2271–2279.