A little number and a big controversy: p-Values

Andrew Q. Philips
Texas A&M University
Feb. 2, 2017
IPSA-USP Summer School 2017



Definitions



Advantages of p-values



Disadvantages



What else to use?



Conclusion

https://xkcd.com/882/

Given 20 independent tests, each at a 5% significance level, the probability of at least one false positive (when every null hypothesis is true) is $1 - (1 - 0.05)^{20} \approx 64\%$
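A quick check of the 64% figure, as a minimal Python sketch that assumes the tests are independent and every null hypothesis is true (the function name is illustrative):

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive across n_tests independent
    tests, each run at level alpha, when every null hypothesis is true."""
    # P(no false positives) = (1 - alpha) ** n_tests; take the complement.
    return 1 - (1 - alpha) ** n_tests


if __name__ == "__main__":
    for k in (1, 5, 20, 100):
        print(f"{k:>3} tests: {familywise_error_rate(0.05, k):.1%}")
```

With 20 tests this prints about 64%, and the rate keeps climbing toward 100% as more tests are run.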

ASA Feb. 2014 discussion forum (Wasserstein and Lazar 2016):

“why do so many colleges and grad schools teach p = 0.05?” “because that’s still what the scientific community and journal editors use”

“why do so many people still use p = 0.05?” “because that’s what they were taught in college or grad school”

Definition:
“how frequently would I observe a result at least as extreme as the one obtained if H0 were true?” (Jackman)
“strength of evidence against the null hypothesis” (Wagenmakers)
Used to assess the statistical significance of a finding: Null-Hypothesis Significance Testing (NHST)

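$t = \hat{\beta} / \mathrm{S.E.}(\hat{\beta})$

$|t| > 1.96 \Rightarrow p < 0.05$ (two-tailed, large-sample normal approximation)

Each tail holds 0.025 of the probability: $0.025 + 0.025 = 0.05$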
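To make the t-to-p step concrete, here is a minimal Python sketch that converts a test statistic into a two-tailed p-value using the standard normal distribution. One assumption to flag: 1.96 is the normal critical value, so for a t statistic with few degrees of freedom the exact p-value from the t distribution is somewhat larger. The helper names `t_stat` and `two_tailed_p` are hypothetical.

```python
import math


def t_stat(beta_hat: float, se: float) -> float:
    """Test statistic for H0: beta = 0."""
    return beta_hat / se


def two_tailed_p(z: float) -> float:
    """Two-tailed p-value from the standard normal: P(|Z| >= |z|).

    For a t statistic this is only a large-sample approximation.
    """
    # 2 * (1 - Phi(|z|)), written with the complementary error function
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    p = two_tailed_p(1.96)
    print(f"|t| = 1.96 -> p = {p:.4f} (each tail {p / 2:.4f})")
    # Hypothetical coefficient and standard error, for illustration only.
    t = t_stat(0.5, 0.2)
    print(f"t = {t:.2f}, p = {two_tailed_p(t):.4f}")
```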

If the null hypothesis is true, the p-value is uniformly distributed on [0, 1] (exactly so for a continuous test statistic).
If the alternative hypothesis is true, the distribution of p depends on the sample size and the true value of the parameter of interest.
E.g., the two-tailed p-value for getting T T T T T in 5 flips of a fair coin is $2 \cdot (1/2)^5 = 0.0625$
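Both claims can be checked numerically. The sketch below (Python, standard library only; function names and simulation settings are illustrative assumptions) computes the exact two-sided binomial p-value for the coin example, then simulates z-tests on data where H0 really is true, so the resulting p-values should spread roughly evenly over [0, 1].

```python
import math
import random


def binomial_two_sided_p(k: int, n: int, prob: float = 0.5) -> float:
    """Exact two-sided binomial p-value: total probability of every outcome
    no more likely than the observed count k (here k = number of heads)."""
    pmf = [math.comb(n, i) * prob**i * (1 - prob) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small tolerance so exact ties are not lost to floating-point rounding.
    return sum(q for q in pmf if q <= observed * (1 + 1e-9))


def simulate_null_p_values(reps: int = 10_000, n: int = 30, seed: int = 1) -> list[float]:
    """p-values from z-tests of H0: mu = 0 on standard normal data
    (known sigma = 1), i.e. data for which H0 is actually true."""
    rng = random.Random(seed)
    ps = []
    for _ in range(reps):
        mean = sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n
        z = mean * math.sqrt(n)  # z = mean / (sigma / sqrt(n))
        ps.append(math.erfc(abs(z) / math.sqrt(2)))
    return ps


if __name__ == "__main__":
    print("T T T T T, fair coin: p =", binomial_two_sided_p(0, 5))
    ps = simulate_null_p_values()
    # Under H0 each tenth of [0, 1] should hold about 10% of the p-values.
    for lo in range(10):
        share = sum(lo / 10 <= p < (lo + 1) / 10 for p in ps) / len(ps)
        print(f"[{lo / 10:.1f}, {(lo + 1) / 10:.1f}): {share:.3f}")
```

The coin test is discrete, so its p-values are only approximately uniform under H0; the simulated z-test has a continuous statistic, so its p-values are exactly uniform in theory.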

History: Ronald Fisher (1920s)… though Pearson and Laplace discussed p-values earlier

Differs from the Neyman-Pearson (NP) framework (power, Type I and Type II error): unlike Fisher's approach, the NP approach explicitly specifies Ha

Advantages of p-values

Only need to specify the null hypothesis (i.e., the proposed model whose incompatibility with the data is summarized), e.g. $H_0: \beta = 0$

Smaller p-values correspond to greater incompatibility between the (null) model and the data: evidence against the null hypothesis.
p-values can be looked up from the relevant t/z statistics.

Disadvantages

p-values do not tell us whether the null hypothesis (or the alternative) is true.
p-values do not tell us the probability that random chance produced the observed data.
The 0.05 threshold does not sort effects into “true” and “false” ones.
“p-hacking” leads to faulty scientific progress (large increases in Type I error).
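One common form of p-hacking is optional stopping: testing as data arrive and stopping at the first p < 0.05. The sketch below simulates it under a true null, assuming normal data with known σ = 1, batches of 10 observations, and up to 10 looks (all hypothetical settings). It illustrates Type I error inflation in general, not any particular literature.

```python
import math
import random


def optional_stopping_rate(reps=5_000, batch=10, max_looks=10, alpha=0.05, seed=2):
    """Share of simulated studies that reach p < alpha when H0 (mu = 0) is true
    but the analyst tests after every batch and stops at the first 'hit'."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        total, n = 0.0, 0
        for _ in range(max_looks):
            total += sum(rng.gauss(0.0, 1.0) for _ in range(batch))
            n += batch
            z = (total / n) * math.sqrt(n)  # z-test with known sigma = 1
            if math.erfc(abs(z) / math.sqrt(2)) < alpha:
                hits += 1
                break  # stop collecting data once "significant"
    return hits / reps


if __name__ == "__main__":
    print("1 look:   ", optional_stopping_rate(max_looks=1))
    print("10 looks: ", optional_stopping_rate(max_looks=10))
```

With a single look the false-positive rate stays near the nominal 5%; with ten looks it rises well above it, even though no single test was run incorrectly.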

“My p-value is 0.01…phew; there’s only a 1% chance that the results I’m seeing are not real”

We never know the odds that the effect existed in the first place… the “plausibility of the hypothesis”
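Why plausibility matters can be shown with simple arithmetic. Assume a share π of tested hypotheses are real effects (`prior_true`) and a test power (`power`); both are hypothetical inputs that we never actually know. The expected share of p < 0.05 results that come from true nulls is then α(1 − π) / (α(1 − π) + power · π):

```python
def false_positive_share(prior_true: float, power: float, alpha: float = 0.05) -> float:
    """Among results with p < alpha, the expected share where H0 is actually true.

    prior_true: assumed share of tested hypotheses that are real effects
    power:      assumed probability of p < alpha when the effect is real
    """
    false_pos = alpha * (1 - prior_true)  # true nulls that cross the threshold
    true_pos = power * prior_true         # real effects that cross the threshold
    return false_pos / (false_pos + true_pos)


if __name__ == "__main__":
    for prior in (0.5, 0.1, 0.01):
        print(f"prior = {prior:<5} -> {false_positive_share(prior, 0.8):.0%} of p < .05 results are false")
```

For `prior_true = 0.1` and `power = 0.8` the function returns 0.36: over a third of the significant findings would come from true nulls, although every test used α = 0.05.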

“my p-value is 0.04…the alternative hypothesis is true and the null hypothesis is false”

We never know whether the null hypothesis (of no effect) is true or false. The p-value only gives the probability of observing data at least as extreme as ours if the null were true.

“When I include x1 in the model, its p-value is 0.05…but z’s is 0.06. Only x1 is affecting y”

The 0.05 cutoff is an arbitrary convention. For z, the data are simply slightly less incompatible with the null of no effect than they are for x1.
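The arbitrariness of the cutoff is easy to see on the z scale. This Python sketch (standard normal reference assumed; the helper name is illustrative) converts two-tailed p-values back into |z|. For p = 0.05 and p = 0.06 it gives roughly 1.96 and 1.88, a very small difference in the evidence.

```python
from statistics import NormalDist


def z_from_two_tailed_p(p: float) -> float:
    """|z| that yields a given two-tailed p-value under a standard normal."""
    return NormalDist().inv_cdf(1 - p / 2)


if __name__ == "__main__":
    for p in (0.04, 0.05, 0.06):
        print(f"p = {p:.2f}  ->  |z| = {z_from_two_tailed_p(p):.3f}")
```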

p-hacking

Goodhart's law: “when a measure becomes a target, it’s no longer a measure”

[Figure: Evidence of Publication Bias in the PBC Literature. Histogram of p-values (x-axis: p-value, 0 to .2; y-axis: frequency, 0 to 250). Data from Philips (2016). 622 study-model obs.]

Solutions? Alternatives to p-values?

Basic and Applied Social Psychology bans p-values

ASA statement on statistical significance and p-values

Pre-specification, clear methods, data access and transparency, robustness…

Substantive Significance

p-values say nothing about the size of the substantive effect.
As sample size increases, test power goes to 1.
Which effect matters more: 0.001* (0.0003) or 4.28 (3.10)?
Confidence intervals, predicted/expected values, and other substantive quantities of interest probably test the substantive results better.
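A sketch of both points in Python with the standard library. It assumes the numbers in parentheses above are standard errors and uses a normal approximation for the p-values; the 0.02 SD effect in the power calculation is a hypothetical value chosen to show power approaching 1 as n grows.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def two_tailed_p(estimate: float, se: float) -> float:
    """Normal-approximation two-tailed p-value for H0: coefficient = 0."""
    z = estimate / se
    return math.erfc(abs(z) / math.sqrt(2))


def z_test_power(effect: float, n: int, alpha: float = 0.05) -> float:
    """Power of a two-sided one-sample z-test (sigma = 1) against a true
    standardized effect `effect`, with n observations."""
    crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = effect * math.sqrt(n)
    return STD_NORMAL.cdf(shift - crit) + STD_NORMAL.cdf(-shift - crit)


if __name__ == "__main__":
    # The two estimates above, assuming parentheses hold standard errors.
    print(f"0.001 (SE 0.0003): p = {two_tailed_p(0.001, 0.0003):.4f}")
    print(f"4.28  (SE 3.10):   p = {two_tailed_p(4.28, 3.10):.3f}")
    # A tiny hypothetical effect becomes 'significant' given enough data.
    for n in (100, 10_000, 1_000_000):
        print(f"n = {n:>9,}: power = {z_test_power(0.02, n):.3f}")
```

The tiny coefficient is highly significant and the large one is not, which is why the p-value alone cannot answer which effect matters more.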

Confidence Interval: in repeated samples, “x”% of the intervals constructed this way would contain the true value of the coefficient
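The repeated-sampling interpretation can be checked by simulation. This minimal sketch assumes normal data with known σ = 1, a hypothetical true mean of 2.0, and n = 25. Roughly 95% of the intervals built this way should contain the true mean, while any single interval either does or does not.

```python
import math
import random


def ci_coverage(true_mean=2.0, n=25, reps=4_000, z=1.96, seed=3):
    """Share of 95% normal-theory CIs (known sigma = 1) that contain true_mean."""
    rng = random.Random(seed)
    half_width = z / math.sqrt(n)  # z * sigma / sqrt(n)
    covered = 0
    for _ in range(reps):
        mean = sum(rng.gauss(true_mean, 1.0) for _ in range(n)) / n
        if mean - half_width <= true_mean <= mean + half_width:
            covered += 1
    return covered / reps


if __name__ == "__main__":
    print(f"coverage: {ci_coverage():.3f}")
```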

Less sharp cutoff, more substantive feel

Bayesian: 95% posterior intervals

(Philips, Rutherford, and Whitten 2016)

Others

Likelihood ratios: how much more likely are the data under model M1 than under model M2?
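For two simple models the likelihood ratio is just a quotient of probabilities. Reusing the coin-flip example, this Python sketch compares a hypothetical biased coin (M1: P(tails) = 0.9) with a fair coin (M2: P(tails) = 0.5). For T T T T T the data are about 18.9 times more likely under M1.

```python
def bernoulli_likelihood(flips: str, p_tails: float) -> float:
    """Likelihood of a sequence such as 'TTTTT' given P(tails) = p_tails."""
    tails = flips.count("T")
    heads = flips.count("H")
    return p_tails**tails * (1 - p_tails) ** heads


def likelihood_ratio(flips: str, p1: float, p2: float) -> float:
    """How many times more likely the data are under M1 (p1) than under M2 (p2)."""
    return bernoulli_likelihood(flips, p1) / bernoulli_likelihood(flips, p2)


if __name__ == "__main__":
    print(f"LR(M1: 0.9 vs M2: 0.5) = {likelihood_ratio('TTTTT', 0.9, 0.5):.2f}")
```

Unlike a p-value, this comparison requires both models to be written down explicitly.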

Fully Bayesian approaches
Bootstrapping (≈ Bayesian with uninformative priors)
Bayes factors: how the data shift the relative odds of the null hypothesis vs. the alternative
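Bootstrapping can be sketched in a few lines: resample the data with replacement, recompute the statistic, and read an interval off the replicates. This Python version uses the simple percentile interval (one of several bootstrap intervals) and a simulated, purely hypothetical sample.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.fmean, reps=5_000, level=0.95, seed=4):
    """Percentile bootstrap interval: resample with replacement, recompute the
    statistic, and keep the middle `level` share of the replicates."""
    rng = random.Random(seed)
    n = len(data)
    replicates = sorted(stat(rng.choices(data, k=n)) for _ in range(reps))
    lo_idx = round((1 - level) / 2 * reps)
    hi_idx = round((1 + level) / 2 * reps) - 1
    return replicates[lo_idx], replicates[hi_idx]


if __name__ == "__main__":
    # Hypothetical sample (simulated), used only to exercise the function.
    rng = random.Random(0)
    sample = [rng.gauss(5.0, 2.0) for _ in range(40)]
    lo, hi = bootstrap_ci(sample)
    print(f"mean = {statistics.fmean(sample):.2f}, 95% bootstrap CI = ({lo:.2f}, {hi:.2f})")
```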

Bayes Factors

Does a patient’s sleep improve before vs. after taking a drug?
One-sample t-test: t = 4.0621, df = 9, p-value = 0.002833 (two-sided)

[Figure: “Does A Drug Increase Sleep?” Density of the increase in patient's sleep after receiving the drug (hours); N = 10, bandwidth = 0.7946]

Null: no effect of the drug on sleep. The p-value stops here!
Alternative: a positive effect (its distribution must be specified)

BF: Are the data (relatively) more consistent with Ha than with H0?

[Figure: Bayes factors on a log scale (1/10 to 100) for the “Alt., r=0.707” hypotheses vs. “Null, mu = 0”]
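A Bayes factor for the sleep example can be approximated from the reported t and N alone. The sketch below assumes the JZS formulation of Rouder et al. (2009): a Cauchy prior with scale r = 0.707 on the standardized effect, which matches the “Alt., r=0.707” label. It computes only the two-sided version, not the one-sided interval hypotheses in the figure, and the numerical integration (Simpson's rule on a log scale) is an illustrative choice. For t = 4.0621 and N = 10 it should return a Bayes factor well above 10, i.e. the data are considerably more consistent with Ha than with H0.

```python
import math


def jzs_bf10(t: float, n: int, r: float = math.sqrt(2) / 2, steps: int = 4_000) -> float:
    """JZS Bayes factor (Ha over H0) for a one-sample t-test with a Cauchy(0, r)
    prior on the standardized effect, integrated numerically over g = exp(u)."""
    nu = n - 1

    def integrand(g: float) -> float:
        # Likelihood of t given g under Ha, times an inverse-gamma(1/2, r^2/2)
        # prior on g; mixing over g gives effect ~ Cauchy(0, r).
        a = 1 + n * g
        like = a**-0.5 * (1 + t * t / (a * nu)) ** (-(nu + 1) / 2)
        prior = r / math.sqrt(2 * math.pi) * g**-1.5 * math.exp(-r * r / (2 * g))
        return like * prior

    lo, hi = -15.0, 15.0
    h = (hi - lo) / steps  # steps must be even for Simpson's rule
    total = 0.0
    for i in range(steps + 1):
        g = math.exp(lo + i * h)
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * integrand(g) * g  # dg = g du
    marginal_ha = total * h / 3
    marginal_h0 = (1 + t * t / nu) ** (-(nu + 1) / 2)
    return marginal_ha / marginal_h0


if __name__ == "__main__":
    print(f"BF10 = {jzs_bf10(4.0621, 10):.2f}")
```

Values above 1 favor Ha and values below 1 favor H0; with t = 0 the sketch returns a Bayes factor below 1, something a p-value of 1 cannot express as evidence for the null.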
Conclusions

p-values are not going anywhere

Useful, but often misinterpreted

Use in conjunction with other approaches
