This document is the intellectual property of Dr Raywat Deonandan, c. 2014. No part may be reproduced without his written permission ([email protected]).
2014-11-20
HSS 2381 – Stats, etc.
Today’s slides were stolen from Nick Barrowman, PhD, Senior Statistician, Clinical Research Unit, CHEO Research Institute.
Sample size determination
Outline
• Example: lowering blood pressure
• Null hypothesis
• Type-I and type-II error
• Sampling distribution of the mean
• Probability of type-I and type-II error
• Factors that affect sample size requirements
• Approximate sample size formulas
  – For a single-group study
  – For a two-group study
• Sample size calculations in the literature
Example
• Physicians design an intervention to reduce blood pressure (BP) in patients with high BP.
• But does it work? Need a study.
• How many participants are required?
• Too few:
  – May not detect an effect even if there is one.
• Too many:
  – May unnecessarily expose patients to risk.
The null hypothesis
• For intervention studies, the null hypothesis (H0) is usually this: on average, the intervention is associated with no reduction in blood pressure.
• The physicians who designed the intervention believe H0 is false.
• The study is designed to put H0 to the test.
Four possible scenarios
• In reality, either the intervention has no effect (H0 is true) or the intervention has an effect (H0 is false).
• Based on the study findings, we infer either that the intervention has no effect (accept H0) or that the intervention has an effect (reject H0).
• Crossing the two gives four possible scenarios:
  – H0 is true and we accept H0: correctly accept H0.
  – H0 is true and we reject H0: type-I error (incorrectly rejecting the null).
  – H0 is false and we accept H0: type-II error (incorrectly failing to reject the null).
  – H0 is false and we reject H0: correctly reject H0.

The study
• The population is considered to be all people who would be eligible for the intervention (this might depend on age, other medical conditions, etc.).
• Study participants are viewed as a sample from this population.
• Suppose that for each study participant we measure BP at baseline and after 6 weeks of intervention.
• The outcome is the change in BP.
• H0 is that the mean change in BP is 0.
Population vs. sample
[Figure: the population distribution of change in BP, with the population mean of the change in blood pressure and ±1 standard deviation marked. A random sample is drawn from the population, the sample mean of the change in blood pressure is calculated, and inference is made back to the population mean.]
Recall that variance is the square of the standard deviation, often written as $\sigma^2$.

Sampling distribution of the mean change in BP
[Figures: the population distribution of change in BP, and the sampling distribution of the mean change in BP for N = 1, 2, 5, and 10.]
Increasing the sample size reduces the variability of the sample mean. That variability is the standard error (SE):
$$\mathrm{SE} = \frac{\mathrm{SD}}{\sqrt{N}}$$
where SD is the standard deviation and N is the sample size.
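The relationship $\mathrm{SE} = \mathrm{SD}/\sqrt{N}$ can be checked by simulation. The sketch below assumes a normally distributed population of BP changes with a hypothetical mean of −5 mm Hg and SD of 7 mm Hg (numbers chosen for illustration only), draws many samples of size N, and compares the spread of the sample means with the formula.

```python
import random
import statistics


def empirical_se(pop_mean: float, pop_sd: float, n: int,
                 reps: int = 20_000, seed: int = 1) -> float:
    """SD of many simulated sample means, each from a sample of size n."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    means = [
        statistics.fmean(rng.gauss(pop_mean, pop_sd) for _ in range(n))
        for _ in range(reps)
    ]
    return statistics.stdev(means)


# Hypothetical population of BP changes (mm Hg), assumed normal
POP_MEAN, POP_SD = -5.0, 7.0

for n in (1, 2, 5, 10):
    theory = POP_SD / n ** 0.5  # SE = SD / sqrt(N)
    sim = empirical_se(POP_MEAN, POP_SD, n)
    print(f"N={n:2d}  simulated SE={sim:.2f}  SD/sqrt(N)={theory:.2f}")
```

The simulated and theoretical values should agree closely, and both shrink as N grows, mirroring the narrowing sampling distributions in the figures.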
Variance and sample size
• As we’ve seen, increasing the sample size is akin to reducing the variance: the variance of the sample mean is $\sigma^2/N$.
• On the other hand, if we use a more precise measurement method, we may be able to reduce the variance directly and keep the sample size the same.
• There may be a trade-off between the cost of a larger sample and the cost of a better measurement device.
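A small numeric illustration of the trade-off, under the loud assumption that a hypothetical better device halves the overall SD of the change in BP (in practice it only removes measurement error, not true biological variation). Because $\mathrm{SE} = \mathrm{SD}/\sqrt{N}$, halving the SD buys the same precision as quadrupling N.

```python
import math


def standard_error(sd: float, n: int) -> float:
    """Standard error of a sample mean: SE = SD / sqrt(N)."""
    return sd / math.sqrt(n)


# Hypothetical devices: a cheap one (SD 7 mm Hg) and a precise one (SD 3.5 mm Hg)
se_cheap_many = standard_error(7.0, 40)   # noisy device, four times the participants
se_precise_few = standard_error(3.5, 10)  # precise device, fewer participants

print(f"cheap device, N=40:   SE={se_cheap_many:.3f}")
print(f"precise device, N=10: SE={se_precise_few:.3f}")
```

Which option is cheaper then depends on the (study-specific) cost per participant versus the cost of the device.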
Hypothesis test
Reject H0 if the observed mean is far in the tails of the null distribution. The figure below illustrates a one-tailed (one-sided) test.
[Figure: the sampling distribution of the mean under H0, with the cut-off, the rejection region beyond it, and the observed mean.]
• H0: The intervention (a new drug?) has no effect on the change in mean BP.
• H1: The intervention changes mean BP by at least 10 mm Hg.
Two-sided (two-tailed) test
In practice, one-sided tests are rarely used. The figure below depicts a two-sided test. But for convenience, in the rest of the figures in this presentation, I’ll display one-sided tests.
[Figure: the null sampling distribution with a cut-off in each tail, giving a left-hand and a right-hand side of the rejection region, and the observed mean.]

Type-I error
If the null hypothesis is true, any observed mean that falls in the rejection region leads to a type-I error. The probability of type-I error is denoted α. It is represented below by the area of the red region, i.e., the area under the null distribution beyond the cut-off.
[Figure: the null distribution with the rejection region shaded red.]
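To make the cut-offs concrete, here is a sketch using the standard normal distribution from Python's `statistics.NormalDist` (a normal approximation for the test statistic is assumed). A one-sided test puts all of α in one tail; a two-sided test splits it, α/2 per tail, which pushes each cut-off further out.

```python
from statistics import NormalDist


def z_cutoffs(alpha: float = 0.05):
    """Critical z values for a one-sided (upper-tail) and a two-sided test."""
    z = NormalDist()  # standard normal: null distribution of the test statistic
    one_sided = z.inv_cdf(1 - alpha)      # all of alpha in the upper tail
    two_sided = z.inv_cdf(1 - alpha / 2)  # alpha / 2 in each tail
    return one_sided, two_sided


one, two = z_cutoffs(0.05)
print(f"one-sided cut-off: {one:.3f}")   # → 1.645
print(f"two-sided cut-off: ±{two:.3f}")  # → ±1.960

# Area under the null beyond the cut-off(s) = probability of type-I error
print(1 - NormalDist().cdf(one))
print(2 * (1 - NormalDist().cdf(two)))
```

Both printed areas come out at α = 0.05 (up to floating-point rounding), which is exactly the "red region" in the type-I error figure.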
The alternative hypothesis
• The probability of type-II error is denoted β.
• To quantify the probability of type-II error, we need to consider the alternative hypothesis.
• There is a trade-off between type-I and type-II error.
[Figure: the null and alternative sampling distributions, with the probability of type-I error = 0.05; the difference between their means (“delta”) is the change in mean BP.]

Trade-off between probability of type-I and type-II error
[Figures: the same null and alternative distributions with the probability of type-I error set to 0.05, 0.10, and 0.20.]
So the larger the probability of type-I error (alpha) we allow, the bigger the “rejection zone”, and therefore the greater the probability of finding a significant effect (and the smaller the probability of type-II error, beta).

Power
• We usually fix the probability of type-I error (alpha) at 5% and then try to minimize the probability of type-II error (beta).
• Define power = 1 – beta: the probability of rejecting H0 when the alternative hypothesis is true.
• High power means that, if there really is an effect, our findings are more likely to be statistically significant.
• The further apart the null and alternative hypothesis curves are (i.e., the greater the difference, “delta”), the higher the power.
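A minimal sketch of power for the one-sided test shown in the figures, assuming a z-test with a known SD that is the same under H0 and H1, and treating delta as the magnitude of the mean change. The SD of 7 mm Hg, N = 10, and the deltas are hypothetical. Power is the area of the alternative curve beyond the cut-off; beta is the rest.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_one_sided(delta: float, sigma: float, n: int,
                    alpha: float = 0.05) -> float:
    """Power of a one-sided z-test of H0: mean change = 0 vs H1: mean change = delta.

    Assumes a known SD (sigma), equal under H0 and H1, and a normal sample mean.
    """
    cutoff = STD_NORMAL.inv_cdf(1 - alpha)     # rejection threshold (z scale)
    shift = delta * sqrt(n) / sigma            # H1 mean, in standard-error units
    return 1 - STD_NORMAL.cdf(cutoff - shift)  # H1 area beyond the cut-off


# Trade-off: a larger alpha gives a bigger rejection zone, so beta shrinks
for alpha in (0.05, 0.10, 0.20):
    p = power_one_sided(5, 7, 10, alpha)
    print(f"alpha={alpha:.2f}  beta={1 - p:.2f}  power={p:.2f}")

# Curves further apart (larger delta) give more power at fixed alpha
for delta in (2.5, 5, 10):
    print(f"delta={delta:4.1f}  power={power_one_sided(delta, 7, 10):.2f}")
```

With delta = 0 the function returns alpha itself, a quick sanity check that the "power" of a true null is just the type-I error rate.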
The alternative hypothesis affects the probability of type-II error
[Figures: null and alternative distributions for increasingly large differences (“delta”) in the change in mean BP, with the probability of type-I error fixed at 0.05.]
So, the bigger the difference between the mean BPs (i.e., the greater the effect to be measured), the more power there is, and the lower the probability of type-II error.
Maximizing power
• Increasing the sample size or reducing the variance will increase the power to detect an effect.
Why sample size affects power
[Figures: null and alternative sampling distributions of the mean change in BP, with the probability of type-I error fixed at 0.05, at the original sample size, with the sample size doubled, and with the sample size quintupled.]
A larger sample shrinks the standard error, so both curves become narrower and overlap less: the cut-off moves closer to the null mean, more of the alternative curve lies beyond it, and power increases.
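The doubled and quintupled panels can be mimicked numerically. This sketch reuses the one-sided z-test power calculation under the same assumptions (known SD, normal sample mean); the SD of 7 mm Hg, delta of 5 mm Hg, and starting N of 6 are hypothetical values chosen only to show the trend.

```python
from math import sqrt
from statistics import NormalDist


def power_one_sided(delta: float, sigma: float, n: int,
                    alpha: float = 0.05) -> float:
    """One-sided z-test power, assuming a known SD and a normal sample mean."""
    z = NormalDist()
    return 1 - z.cdf(z.inv_cdf(1 - alpha) - delta * sqrt(n) / sigma)


SIGMA, DELTA, BASE_N = 7.0, 5.0, 6  # hypothetical scenario

results = {}
for label, n in (("original", BASE_N), ("doubled", 2 * BASE_N),
                 ("quintupled", 5 * BASE_N)):
    se = SIGMA / sqrt(n)  # narrower sampling distributions as N grows
    results[label] = power_one_sided(DELTA, SIGMA, n)
    print(f"{label:10s} N={n:2d}  SE={se:.2f}  power={results[label]:.2f}")
```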
An approximate sample size formula for a single-group study
• Suppose the variance in the change in BP, $\sigma^2$, is the same for the null and alternative hypotheses.
• Suppose alpha is fixed at 0.05 and we use two-sided tests.
• Then we will have approximately 80% power to detect a mean change in BP, $\delta$, if we enroll N participants, where
$$N = \frac{8\sigma^2}{\delta^2} \quad \text{(approximately)}$$
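One way to see where the constant 8 comes from, assuming a normal approximation with known $\sigma$ and ignoring the negligible chance of rejecting in the wrong tail: for the study to have power $1-\beta$, the cut-off for the sample mean must sit $z_{1-\alpha/2}$ standard errors above 0 and, at the same time, $z_{1-\beta}$ standard errors below $\delta$.

$$
\begin{aligned}
\delta &= \left(z_{1-\alpha/2} + z_{1-\beta}\right)\frac{\sigma}{\sqrt{N}}
\quad\Longrightarrow\quad
N = \frac{\left(z_{1-\alpha/2} + z_{1-\beta}\right)^2\sigma^2}{\delta^2},\\
\left(z_{0.975} + z_{0.80}\right)^2 &= (1.960 + 0.842)^2 \approx 7.85 \approx 8.
\end{aligned}
$$

So the "8" is a rounded-up shortcut that is only valid for alpha = 0.05 (two-sided) and 80% power.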
Example
• Suppose the standard deviation of the change in BP is anticipated to be 7 mm Hg (so the variance is 49).
• Suppose we fix alpha at 0.05 and we’d like to have approximately 80% power to detect a mean change of 5 mm Hg.
• Then we would need about 16 participants: $N = 8 \times 49 / 25 = 15.68$.
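The same arithmetic as a small helper, a sketch assuming the slide's shortcut constant of 8 by default. As an optional comparison, `z_constant` (a hypothetical helper, not from the slides) computes the unrounded normal-theory constant $(z_{1-\alpha/2} + z_{1-\beta})^2$ with `statistics.NormalDist`.

```python
import math
from statistics import NormalDist


def z_constant(alpha: float = 0.05, power: float = 0.80) -> float:
    """Unrounded constant (z_{1-alpha/2} + z_{power})^2 for a two-sided test."""
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2


def n_single_group(sigma: float, delta: float, constant: float = 8.0) -> int:
    """Approximate N for a single-group study, rounded up to whole people."""
    return math.ceil(constant * sigma ** 2 / delta ** 2)


print(n_single_group(7, 5))                # → 16  (8 × 49 / 25 = 15.68)
print(round(z_constant(), 2))              # → 7.85
print(n_single_group(7, 5, z_constant()))  # → 16  (about 15.38 before rounding)
```

Here the shortcut and the unrounded constant give the same answer after rounding up; with other inputs the shortcut can be one or two participants more conservative.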
When there are two groups
• So far, we’ve only considered a single group of study participants.
• Usually we want to compare two groups:
  – a control group receives the “standard of care” or a placebo;
  – an experimental group receives another intervention.
• Most randomized controlled trials are like this.
• The focus is now on the difference between the means of the two groups.
• For simplicity, assume the variance is the same in the two groups.
An approximate sample size formula for a two-group study
• A similar approximate formula applies, again assuming alpha = 0.05 and power = 80%:
$$N_{\text{per group}} = \frac{16\sigma^2}{\delta^2} \quad \text{(approximately)}$$
• Careful! This is the required sample size per group.
• Also, note that the constant is double what it was for the case of a single group.
• So the total sample size is 4 times as large (two groups, each needing twice as many participants).
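A sketch of the two-group calculation, assuming equal SDs, equal group sizes, and a normal approximation. The constant doubles because the difference of two independent means has variance $2\sigma^2/n$ rather than $\sigma^2/n$. The unrounded constant (about 15.70) is shown only as a comparison with the slide's shortcut of 16.

```python
import math
from statistics import NormalDist


def n_per_group(sigma: float, delta: float, constant: float = 16.0) -> int:
    """Approximate per-group N for comparing two means (equal SDs), rounded up."""
    return math.ceil(constant * sigma ** 2 / delta ** 2)


# Doubling reflects Var(mean1 - mean2) = 2 * sigma^2 / n for independent groups
z = NormalDist()
exact_constant = 2 * (z.inv_cdf(0.975) + z.inv_cdf(0.80)) ** 2  # about 15.70

per_group = n_per_group(7, 5)  # 16 × 49 / 25 = 31.36, rounded up
print(per_group, 2 * per_group)          # → 32 64
print(n_per_group(7, 5, exact_constant)) # → 31
```

The shortcut constant of 16 is slightly conservative here (32 versus 31 per group), which errs on the side of adequate power.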
Example
• Suppose we want to compare patients randomized to placebo with patients randomized to a new intervention.
• Suppose the standard deviation is again anticipated to be 7 mm Hg.
• Suppose we fix alpha at 0.05 and we’d like to have approximately 80% power to detect a difference of 5 mm Hg in the mean change.
• Then we would need about 32 participants per group, for a total sample size of 64: $N = 16 \times 49 / 25 = 31.36$ per group.
• You can’t have 0.36 of a person, so always round the sample size up.

Summary
Required sample size …
• increases with the variance
• decreases with the size of the effect to detect
• decreases with the probability of type-I error, alpha
• decreases with the probability of type-II error, beta
Sample size determination has many other aspects
• Different types of outcomes: dichotomous (e.g., mortality), time-to-event (e.g., survival time), etc.
• Different designs: observational studies (e.g., case-control), surveys, prevalence studies.
• Practical considerations: e.g., costs, feasibility of recruitment.
Next class (Wednesday)
• We will discuss the final exam.