hss2381 - sample size determination.pdf


This document is the intellectual property of Dr Raywat Deonandan c. 2014 No part may be reproduced without his written permission ([email protected])

2014-11-20

Hss 2381 – Stats, etc

Today’s slides…. • Were stolen from: – Nick Barrowman, PhD, Senior Statistician, Clinical Research Unit, CHEO Research Institute

Sample size determination

Outline
• Example: lowering blood pressure
• Null hypothesis
• Type-I and type-II error
• Sampling distribution of the mean
• Probability of type-I and type-II error
• Factors that affect sample size requirements
• Approximate sample size formulas
– For a single-group study
– For a two-group study
• Sample size calculations in the literature

Example
• Physicians design an intervention to reduce blood pressure (BP) in patients with high BP.
• But does it work? Need a study.
• How many participants are required?
• Too few:
– May not detect an effect even if there is one.
• Too many:
– May unnecessarily expose patients to risk.

The null hypothesis
• For intervention studies, the null hypothesis (H0) is usually this: on average, the intervention is associated with no reduction in blood pressure.
• The physicians who designed the intervention believe H0 is false.
• The study is designed to put H0 to the test.

Possible scenarios

Based on the study findings we infer either …
• that the intervention has no effect (accept H0)
• or that the intervention has an effect (reject H0)

Four possible scenarios

In reality, either the intervention has no effect (H0 is true) or the intervention has an effect (H0 is false). Based on the study findings we infer either that the intervention has no effect (accept H0) or that the intervention has an effect (reject H0).
• H0 is true and we accept H0: correctly accept H0
• H0 is true and we reject H0: type-I error (incorrectly rejecting the null)
• H0 is false and we accept H0: type-II error (incorrectly failing to reject the null)
• H0 is false and we reject H0: correctly reject H0

The study

• The population is considered to be all people who would be eligible for the intervention (might depend on age, other medical conditions, etc.).
• Study participants are viewed as a sample from this population.
• Suppose for each study participant we measure BP at baseline, and after 6 weeks of intervention.
• Outcome is change in BP.
• H0 is that mean change in BP is 0.


Population vs. sample

[Figure: a random sample is drawn from the population; the sample mean of the change in blood pressure is calculated from the sample and used for inference about the population mean of the change in blood pressure.]

Population distribution of change in BP

[Figure: population distribution of change in BP, marking the population mean and ± 1 standard deviation.]

Recall that variance is the square of the standard deviation, often written as σ².

Sampling distribution of mean change in BP

[Figures: sampling distribution of mean change in BP for N=1, N=2 and N=5.]


Sampling distribution of mean change in BP (N=10)

Increasing sample size reduces the variability of the sample mean.

$\mathrm{SE} = \mathrm{SD} / \sqrt{N}$

where SE is the standard error and SD is the standard deviation.
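The shrinking spread of the sampling distribution can be checked numerically, since the standard error is the standard deviation divided by the square root of N. The following is a minimal Python sketch; the helper name `standard_error` and the SD of 7 mm Hg are illustrative assumptions.

```python
import math


def standard_error(sd: float, n: int) -> float:
    """Standard error of the sample mean: SD divided by the square root of N."""
    return sd / math.sqrt(n)


SD = 7.0  # assumed standard deviation of the change in BP, in mm Hg

# Same sample sizes as the sampling-distribution figures: N = 1, 2, 5, 10.
for n in (1, 2, 5, 10):
    print(f"N={n:2d}  SE={standard_error(SD, n):.2f} mm Hg")
```

Note that quadrupling N only halves the standard error, because N enters through a square root.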


Variance and sample size
• As we’ve seen, increasing the sample size reduces the variability of the sample mean, which is akin to reducing the variance.
• On the other hand, if we use a more precise measurement method we may be able to reduce the variance directly and keep the sample size the same.
• There may be a trade-off between the cost of more samples and the cost of a better measurement device.

Hypothesis test

Sampling distribution of the mean

Reject H0 if the observed mean is far in the tails of the null distribution. The figure below illustrates a one-tailed (one-sided) test.

[Figure: sampling distribution of the mean under H0, marking the cut-off, the observed mean and the rejection region.]

H0: The intervention (a new drug?) has no effect on the change in mean BP
H1: The intervention changes mean BP by at least 10 mm Hg

Two-sided (two-tailed) test

In practice, one-sided tests are rarely used. The figure below depicts a two-sided test. But for convenience, in the rest of the figures in this presentation, I’ll display one-sided tests.

[Figure: sampling distribution of the mean under H0 with two cut-offs, marking the observed mean and the left-hand and right-hand sides of the rejection region.]

Type-I error

If the null hypothesis is true, the rejection region of the test represents type-I error. The probability of type-I error is denoted α. It is represented below by the area of the red region.
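The cut-offs and the type-I error probability can be sketched numerically. This is only an illustration under stated assumptions: the sample mean is treated as normally distributed around 0 under H0 with a known standard error, the one-sided test is taken to be upper-tailed, and the function names and the standard error of 2 mm Hg are hypothetical.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def one_sided_cutoff(se: float, alpha: float = 0.05) -> float:
    """Cut-off above which H0 (mean change = 0) is rejected in an upper-tailed test."""
    return STANDARD_NORMAL.inv_cdf(1 - alpha) * se


def two_sided_cutoffs(se: float, alpha: float = 0.05) -> tuple[float, float]:
    """Lower and upper cut-offs; alpha is split equally between the two tails."""
    z = STANDARD_NORMAL.inv_cdf(1 - alpha / 2)
    return -z * se, z * se


def type_one_error_one_sided(cutoff: float, se: float) -> float:
    """Probability that the sample mean exceeds the cut-off when H0 is true."""
    return 1 - NormalDist(mu=0.0, sigma=se).cdf(cutoff)


se = 2.0  # hypothetical standard error, in mm Hg
cutoff = one_sided_cutoff(se)
print(f"one-sided cut-off: {cutoff:.2f} mm Hg")
print("two-sided cut-offs: {:.2f}, {:.2f} mm Hg".format(*two_sided_cutoffs(se)))
print(f"area of rejection region under H0: {type_one_error_one_sided(cutoff, se):.3f}")
```

For the same alpha, the two-sided cut-offs sit further from 0 than the one-sided cut-off, because each tail holds only half of alpha.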


The alternative hypothesis
• The probability of type-II error is denoted β.
• To quantify the probability of type-II error, we need to consider the alternative hypothesis.
• There is a trade-off between type-I and type-II error.

[Figure: sampling distributions under the null and alternative hypotheses. The difference (“delta”) between them constitutes the change in mean BP. Probability of type-I error = 0.05.]

Trade-off between probability of type-I and type-II error

[Figures: the same two distributions with probability of type-I error = 0.10 and 0.20.]

So the greater the probability of type-I error we allow (alpha), the bigger the “rejection zone”, and therefore the greater the probability of finding a significant effect.

Power

• We usually fix the probability of type-I error (alpha) at 5% and then try to minimize the probability of type-II error (beta).
• Define Power = 1 – beta.
• High power means it is more likely that our findings will be statistically significant when there really is an effect.
• The further apart the null and alternative hypothesis curves are (i.e., the greater the difference, “delta”), the higher the power.
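The relationships among alpha, delta and power can be explored with a short calculation. The sketch below assumes an upper-tailed z-test with a known standard error; the function name `power_one_sided` and the numeric values are hypothetical choices for illustration.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def power_one_sided(delta: float, se: float, alpha: float = 0.05) -> float:
    """Power (1 - beta) of an upper-tailed z-test when the true mean is delta."""
    z_crit = Z.inv_cdf(1 - alpha)           # cut-off in standard-error units
    return 1 - Z.cdf(z_crit - delta / se)   # area of the alternative beyond the cut-off


se = 2.5  # hypothetical standard error, in mm Hg

# A larger alpha widens the rejection region, so power goes up.
for alpha in (0.05, 0.10, 0.20):
    print(f"alpha={alpha:.2f}  power={power_one_sided(5.0, se, alpha):.3f}")

# A larger delta moves the alternative further from the null, so power goes up.
for delta in (2.5, 5.0, 10.0):
    print(f"delta={delta:4.1f}  power={power_one_sided(delta, se):.3f}")
```

When delta is 0 the two curves coincide and the calculated power equals alpha.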


The alternative hypothesis affects the probability of type-II error

[Figures: sampling distributions under the null and alternative hypotheses, shown for increasing values of the difference (“delta”) that constitutes the change in mean BP. Probability of type-I error = 0.05 in each.]

So, the bigger the difference between the mean BPs (i.e., the greater the effect to be measured), the more power there is, and the lower the probability of type-II error.

Maximizing power • Increasing sample size or reducing variance will increase power to detect an effect.

Why sample size affects power

[Figures: sampling distributions under the null and alternative hypotheses, separated by the difference (“delta”) that constitutes the change in mean BP, with probability of type-I error = 0.05. Shown for the original sample size, with the sample size doubled, and with the sample size quintupled.]
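The effect of doubling and quintupling the sample size can also be seen by simulation. The sketch below is one possible illustration and is not taken from the slides: it assumes normally distributed changes in BP, a known SD, an upper-tailed z-test, and hypothetical values (base N of 5, delta of 5 mm Hg, SD of 7 mm Hg).

```python
import math
import random
from statistics import NormalDist, fmean


def simulated_power(n: int, delta: float, sd: float, alpha: float = 0.05,
                    trials: int = 2000, seed: int = 0) -> float:
    """Fraction of simulated studies of size n that reject H0 when the true mean is delta."""
    rng = random.Random(seed)                 # fixed seed so the run is repeatable
    z_crit = NormalDist().inv_cdf(1 - alpha)  # one-sided cut-off in SE units
    se = sd / math.sqrt(n)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(delta, sd) for _ in range(n)]
        if fmean(sample) / se > z_crit:       # observed mean falls in the rejection region
            rejections += 1
    return rejections / trials


base_n = 5  # hypothetical starting sample size
for factor in (1, 2, 5):
    n = base_n * factor
    print(f"N={n:2d}  estimated power={simulated_power(n, delta=5.0, sd=7.0):.2f}")
```

Larger samples narrow both sampling distributions, so less of the alternative curve falls short of the cut-off.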

An approximate sample size formula for a single-group study
• Suppose the variance in the change in BP, $\sigma^2$, is the same for the null and alternative hypotheses.
• Suppose alpha is fixed at 0.05 and we use two-sided tests.
• Then we will have approximately 80% power to detect a mean change in BP, $\delta$, if we enroll N participants, where

$N = 8\sigma^2 / \delta^2$ (approximately)

Example
• Suppose the standard deviation of the change in BP is anticipated to be 7 mm Hg (so the variance is 49).
• Suppose we fix alpha at 0.05 and we’d like to have approximately 80% power to detect a mean change of 5 mm Hg.
• Then we would need about 16 participants: N = 8 × 49 / 25 = 15.68.
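The single-group arithmetic is easy to script. The following minimal sketch applies the approximate formula (8 times the variance, divided by delta squared) to the example values; the function names are hypothetical, and the extra deltas are added only to show how quickly N grows as the effect to detect shrinks.

```python
import math


def single_group_n(sd: float, delta: float) -> float:
    """Approximate N for a single-group study (alpha = 0.05 two-sided, about 80% power)."""
    return 8 * sd**2 / delta**2


def participants_needed(n: float) -> int:
    """Round up, since a fraction of a participant cannot be enrolled."""
    return math.ceil(n)


raw = single_group_n(sd=7, delta=5)
print(f"N = {raw:.2f}, so enroll {participants_needed(raw)} participants")

# Halving delta quadruples the required sample size.
for delta in (10, 5, 2.5):
    print(f"delta={delta:4.1f} mm Hg  N={participants_needed(single_group_n(7, delta))}")
```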

When there are two groups
• So far, we’ve only considered a single group of study participants.
• Usually we want to compare two groups:
– control group receives “standard of care” or placebo
– experimental group receives another intervention
• Most randomized controlled trials are like this.
• The focus is now on the difference between the means of the two groups.
• For simplicity, assume the variance is the same in the two groups.

An approximate sample size formula for a two-group study
• A similar approximate formula applies, again assuming alpha=0.05 and power=80%:

$N_{\text{per group}} = 16\sigma^2 / \delta^2$ (approximately)

• Careful! This is the required sample size per group.
• Also, note that the constant is double what it was for the case of a single group.
• So the total sample size is 4 times as large.
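The per-group versus total distinction is worth checking in code. This is a small sketch with hypothetical function names, using illustrative values of SD = 7 mm Hg and a difference of 5 mm Hg between the groups.

```python
import math


def two_group_n_per_group(sd: float, delta: float) -> float:
    """Approximate N per group (alpha = 0.05 two-sided, about 80% power, equal variances)."""
    return 16 * sd**2 / delta**2


def two_group_total(sd: float, delta: float) -> int:
    """Total participants: round the per-group size up, then count both groups."""
    return 2 * math.ceil(two_group_n_per_group(sd, delta))


per_group = two_group_n_per_group(sd=7, delta=5)
print(f"per group: {per_group:.2f}, rounded up to {math.ceil(per_group)}")
print(f"total across both groups: {two_group_total(sd=7, delta=5)}")
```

Rounding is applied per group before doubling, so that each arm meets the requirement on its own.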


Example
• Suppose we want to compare patients randomized to placebo with patients randomized to a new intervention.
• Suppose the standard deviation is anticipated to again be 7 mm Hg.
• Suppose we fix alpha at 0.05 and we’d like to have approximately 80% power to detect a change of 5 mm Hg.
• Then we would need about 32 participants per group, for a total sample size of 64: N = 16 × 49 / 25 = 31.36.
• Can’t have 0.36 persons, so always round up for sample size.

Summary

Required sample size …
• increases with variance
• decreases with size of effect to detect
• decreases with probability of type-I error, alpha
• decreases with probability of type-II error, beta
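All four relationships can be read off the standard normal-approximation formula from which the constants 8 and 16 are usually derived. The block below is a sketch under the assumption of a two-sided z-test with known, equal variances; the quantile notation $z_p$ (the point below which a fraction $p$ of the standard normal distribution lies) is introduced here and does not appear on the slides.

```latex
% Single group: general form, then the constant for alpha = 0.05 and power = 80%
N \approx \frac{\left(z_{1-\alpha/2} + z_{1-\beta}\right)^2 \sigma^2}{\delta^2},
\qquad
\left(z_{0.975} + z_{0.80}\right)^2 \approx (1.96 + 0.84)^2 = 7.84 \approx 8

% Two groups: the variance of a difference of two means doubles the constant
N_{\text{per group}} \approx \frac{2\left(z_{1-\alpha/2} + z_{1-\beta}\right)^2 \sigma^2}{\delta^2},
\qquad
2 \times 7.84 = 15.68 \approx 16
```

A larger $\sigma^2$ raises N, a larger $\delta$ lowers it, and allowing a larger alpha or beta shrinks the corresponding quantile and so lowers N.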

Sample size determination has many other aspects
• Different types of outcomes: dichotomous (e.g. mortality), time-to-event (e.g. survival time), etc.
• Different designs: observational studies (e.g. case-control), surveys, prevalence studies
• Practical considerations: e.g. costs, feasibility of recruitment

Next class (Wednesday) • We will discuss the final exam
