STATISTICS AND RESEARCH DESIGN

Sample calculations for comparison of 2 means

Nikolaos Pandis
Associate Editor of Statistics and Research Design
Bern, Switzerland, and Corfu, Greece

A common question in orthodontic research is "How many patients do I need for my study?" The next articles will introduce concepts that will help readers understand how to plan the size of a trial appropriately. The objective of a clinical trial is to provide reliable evidence about whether a treatment modality has an effect. A sufficient number of participants allows the researcher to detect a difference with reasonable precision (good power) if a difference exists, or to be reasonably certain that no difference exists if the results show none. Small studies tend to be less convincing and are often inconclusive because they have low power. Recruiting more patients than necessary wastes resources and can even be unethical, since more patients than necessary could be exposed to a potentially ineffective therapy. Power and sample size are closely related: all else being equal, study power increases as the sample size increases. Ideally, a balance is struck between study power, the clinically important difference to be detected, trial feasibility, and credibility.

What is study power? Power is the probability of detecting a statistically significant difference between treatment groups when a true difference exists. A study designed to detect a clinically important difference with, say, 80% power has an 80% chance of detecting that difference if it exists and a 20% chance of missing it (a false negative). Accepting a 20% (power 80%) or a 10% (power 90%) chance of a false negative (type II error, or β) is unavoidable, since a sample calculation with 100% power (type II error of zero) would require an infinite number of participants.
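Why 100% power is out of reach can be seen from the multiplier f(α, β) in Pocock's formula (Table III). The Python sketch below assumes the common normal-approximation form f(α, β) = [z(1 − α/2) + z(1 − β)]², where z(p) is the standard normal quantile. This expression is our assumption rather than a quotation from the article, although it reproduces the Table III values to the precision shown there. The function name `pocock_multiplier` is hypothetical.

```python
from statistics import NormalDist


def pocock_multiplier(alpha: float, power: float) -> float:
    """Normal-approximation multiplier f(alpha, beta) for a 2-sided test.

    Assumption: f(alpha, beta) = (z_{1-alpha/2} + z_{1-beta})**2, where
    beta = 1 - power and z_p is the standard normal quantile.
    """
    if not (0.0 < alpha < 1.0 and 0.0 < power < 1.0):
        raise ValueError("alpha and power must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf  # standard normal quantile function
    return (z(1 - alpha / 2) + z(power)) ** 2


if __name__ == "__main__":
    # Rows of Table III: alpha = 0.05 and 0.01; power = 95%, 90%, 80%, 50%
    for alpha in (0.05, 0.01):
        print(alpha, [round(pocock_multiplier(alpha, p), 2) for p in (0.95, 0.90, 0.80, 0.50)])
    # The multiplier (and therefore the sample size) keeps growing as power -> 100%
    for power in (0.90, 0.99, 0.999, 0.99999):
        print(f"power {power}: f = {pocock_multiplier(0.05, power):.1f}")
```

Because z(1 − β) grows without bound as β approaches 0, no finite sample size gives 100% power.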
Type I error (α) refers to false-positive results: setting α = 0.05 means that we are willing to accept a 5% chance of observing a statistically significant difference when no such difference exists between the treatment groups. See Table I for descriptions of the error types and their relationship to power.
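The meaning of α and power in Table I can be checked by simulation. The sketch below repeatedly simulates a 2-arm trial and counts how often a 2-sided test is significant: with no true difference, that rate estimates the type I error; with a true difference, it estimates power. As an illustration it borrows the values of the molar width example in this article (σ = 2 mm, a true difference of 2 mm, 21 patients per arm). It assumes normally distributed outcomes with a known common SD and uses a z-test instead of a t-test for simplicity; the function name `rejection_rate` is hypothetical.

```python
import random
from statistics import NormalDist, fmean


def rejection_rate(true_diff: float, sigma: float = 2.0, n_per_arm: int = 21,
                   alpha: float = 0.05, n_trials: int = 10_000, seed: int = 2012) -> float:
    """Share of simulated 2-arm trials in which a 2-sided z-test is significant.

    Illustrative assumptions: normally distributed outcomes, a common and
    known SD (sigma), equal arm sizes, and a z-test instead of a t-test.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sigma * (2 / n_per_arm) ** 0.5  # standard error of the difference in means
    significant = 0
    for _ in range(n_trials):
        control = [rng.gauss(0.0, sigma) for _ in range(n_per_arm)]
        treated = [rng.gauss(true_diff, sigma) for _ in range(n_per_arm)]
        z = (fmean(treated) - fmean(control)) / se
        if abs(z) > z_crit:
            significant += 1
    return significant / n_trials


if __name__ == "__main__":
    # No true difference: the rejection rate estimates the type I error (about alpha)
    print("type I error ~", rejection_rate(true_diff=0.0))
    # True 2-mm difference: the rejection rate estimates power (about 90% here)
    print("power ~", rejection_rate(true_diff=2.0))
```

The two rates land close to 5% and 90%, which are the two correct-decision and error probabilities the design is meant to control.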

Am J Orthod Dentofacial Orthop 2012;141:519-21
0889-5406/$36.00
Copyright © 2012 by the American Association of Orthodontists.
doi:10.1016/j.ajodo.2011.12.010

In this article, we will perform a sample calculation for a normally distributed quantitative outcome in a 2-arm trial with a 1:1 allocation ratio (2-sided test). Sample calculations are based on assumptions, and we should aim to detect differences between treatment groups, if they exist, that are clinically important rather than merely statistically significant. Before we proceed with the sample calculation, we need to define the following:

• The research question.
• The principal outcome measure of the trial.
• μ1, the anticipated mean response for the standard or control treatment.
• μ2, the anticipated mean response for the alternative treatment, and hence the minimum clinically important difference (μ2 − μ1) between treatment arms that we would like to detect.
• The standard deviation (for continuous outcomes only).
• The degree of certainty with which we want to be able to detect the treatment difference (power) and the level of significance (type I error, or α).

We will use an example trial to illustrate the process. Pandis et al,1 in a study assessing treatment time to alignment and dental changes with self-ligating and conventional appliances, found that the molar width difference at the end of the follow-up period was 2 mm (SD, 2 mm), a statistically significant finding (Table II). This study was not randomized, and the authors used different wires. Was the 2-mm difference in molar width genuine, or was it observed because wires of different shapes were used for the treatment groups? We would like to confirm or refute these findings by adopting a randomized controlled trial design and using exactly the same wire shape and sequence for both treatment groups. As explained previously, to perform the sample calculation, we need to decide what would be a clinically important difference to detect.
We can refer to the previous study and assume that a molar width difference of 2 mm between the 2 appliances at a certain time after treatment initiation is clinically important. We can then design a randomized controlled trial with 90% power and a 5% level of significance that will detect a 2-mm difference between the treatment groups if such a difference really exists. Therefore, μ2 − μ1 = 2 mm, power = 90%, and α = 0.05; referring again to the cited study, we also assume that the standard deviation (σ) is 2 mm in both treatment arms. We will use the following formula for comparing 2 means, from Pocock:2

$$n = f(\alpha, \beta)\,\frac{2\sigma^2}{(\mu_1 - \mu_2)^2}$$

where n is the number of patients required in each treatment arm and f(α, β) is a function of the significance level and power; Table III gives the appropriate values. Substituting these values into the formula gives

$$n = 10.5 \times \frac{2 \times 2^2}{2^2} = 21$$

A total of 42 patients (21 per treatment arm) are therefore required to detect a 2-mm difference in molar width between treatment groups with a power of 90% and a 5% level of significance. If we instead use α = 0.01 and power = 90%, then

$$n = 14.9 \times \frac{2 \times 2^2}{2^2} = 29.8 \approx 30$$

(rounded up to whole patients), and a total of 60 patients are needed for both treatment arms.

Table I. Types of errors in hypothesis testing at a 5% significance level and 80% power

| In reality | Significance test result: not significant | Significance test result: significant |
|---|---|---|
| No difference exists | 1 − α (= 0.95 or 95%). Correct conclusion: not rejecting the null hypothesis (H0) when H0 is true | α (= 0.05 or 5%), type I error; α = level of significance. Incorrect conclusion: rejecting H0 when H0 is true |
| A difference exists | β, or type II error (= 0.20 or 20%); β = 1 − power. Incorrect conclusion: not rejecting H0 when the alternative hypothesis (Ha) is true | 1 − β (= 1 − 0.20 = 0.80 or 80%); 1 − β = power. Correct conclusion: rejecting H0 when Ha is true |

Table II. Intermolar width changes induced by alignment per bracket group, adapted from Pandis et al1

| Dental cast measurement | Conventional (n = 27), mean (SD) | Self-ligating (n = 27), mean (SD) |
|---|---|---|
| Initial intermolar width (mm) | 44.2 (2.5) | 44.2 (2.6) |
| Final intermolar width (mm) | 44.6 (2.7) | 46.2 (1.7) |

Table III. Values of f(α, β) for different combinations of power and level of significance, adapted from Pocock2

| α | β = 0.05 (95% power) | β = 0.1 (90% power) | β = 0.2 (80% power) | β = 0.5 (50% power) |
|---|---|---|---|---|
| 0.05 | 13.0 | 10.5 | 7.85 | 3.84 |
| 0.01 | 17.8 | 14.9 | 11.7 | 6.63 |

It would be prudent to add more participants to the calculated sample, depending on the number of patients expected to be lost during the follow-up period. In the above calculations, we assumed that the observations are independent, that the numbers of participants per trial arm are equal, and that there are no losses to follow-up. By experimenting with the formula, we can see that the required sample size increases when the difference of clinical importance decreases, the power increases, the alpha level decreases, or the standard deviation increases, and vice versa. Sample size calculations can therefore be manipulated by changing the assumptions; however, any changes should be sensible and preferably based on previous research or a pilot study. Sample sizes are often calculated by using software or by referring to tables.3

Power calculations should be made at the design stage; they have limited or no value after the trial is conducted. After data analysis is complete, power is better judged by looking at the confidence intervals of the estimates: narrow confidence intervals indicate high power and precision, and wide intervals indicate the opposite. Finally, a statement such as "the trial has 90% power" is ambiguous. A more appropriate way to describe power in our example is: "With 21 subjects per group, the trial has 90% power to detect a difference of 2 mm in molar width between conventional and self-ligating appliances at the 5% significance level." The next article will present sample calculations for proportions.

KEY POINTS

• Sample calculation should be based on clinically meaningful differences, consider previous knowledge, and balance statistical precision, trial feasibility, and credibility.
• Power is considered at the design stage, and it has no value after the trial is conducted.
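The molar width calculation can be reproduced with a short Python sketch. It transcribes the Table III multipliers, applies Pocock's formula, and rounds up to whole patients. As illustrative extras, it inflates the sample for a hypothetical 10% loss to follow-up and checks the approximate power achieved with the calculated number of patients per arm. The function names are hypothetical, and the dropout rule (enroll n / (1 − dropout)) and the power formula are common normal-approximation conventions, not methods given in this article.

```python
import math
from statistics import NormalDist

# Table III multipliers f(alpha, beta), keyed by (alpha, power), adapted from Pocock
F_TABLE = {
    (0.05, 0.95): 13.0, (0.05, 0.90): 10.5, (0.05, 0.80): 7.85, (0.05, 0.50): 3.84,
    (0.01, 0.95): 17.8, (0.01, 0.90): 14.9, (0.01, 0.80): 11.7, (0.01, 0.50): 6.63,
}


def n_per_arm(diff: float, sigma: float, alpha: float = 0.05, power: float = 0.90) -> int:
    """Patients per arm, n = f(alpha, beta) * 2 * sigma**2 / (mu1 - mu2)**2, rounded up."""
    raw = F_TABLE[(alpha, power)] * 2 * sigma ** 2 / diff ** 2
    return math.ceil(round(raw, 6))  # round first so float noise cannot add a patient


def inflate_for_dropout(n: int, dropout: float) -> int:
    """Hypothetical adjustment: enroll n / (1 - dropout) so that about n remain."""
    return math.ceil(round(n / (1 - dropout), 6))


def approx_power(diff: float, sigma: float, n: int, alpha: float = 0.05) -> float:
    """Normal-approximation power of a 2-sided test with n patients per arm.

    Ignores the negligible chance of significance in the wrong direction.
    """
    nd = NormalDist()
    se = sigma * math.sqrt(2 / n)  # standard error of the difference in means
    return nd.cdf(abs(diff) / se - nd.inv_cdf(1 - alpha / 2))


if __name__ == "__main__":
    n = n_per_arm(diff=2.0, sigma=2.0)  # molar width example: alpha 0.05, power 90%
    print("per arm:", n, "total:", 2 * n)
    print("per arm at alpha 0.01:", n_per_arm(2.0, 2.0, alpha=0.01))
    print("per arm with 10% assumed dropout:", inflate_for_dropout(n, 0.10))
    print(f"power with {n} per arm: {approx_power(2.0, 2.0, n):.3f}")
    # Smaller differences or larger SDs push the sample size up
    for d, s in [(1.0, 2.0), (1.5, 2.0), (2.0, 3.0)]:
        print(f"diff {d} mm, SD {s} mm -> {n_per_arm(d, s)} per arm")
```

Running the script gives 21 patients per arm (42 in total) for α = 0.05 and 90% power and 30 per arm at α = 0.01, matching the worked example; the approximate power with 21 per arm comes out at about 90%. The last loop shows the sample size growing as the difference shrinks or the SD increases.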

American Journal of Orthodontics and Dentofacial Orthopedics


REFERENCES

1. Pandis N, Polychronopoulou A, Eliades T. Self-ligating vs conventional brackets in the treatment of mandibular crowding: a prospective clinical trial of treatment duration and dental effects. Am J Orthod Dentofacial Orthop 2007;132:208-15.
2. Pocock SJ. Clinical trials: a practical approach. Chichester, United Kingdom: Wiley; 1983. p. 125-9.
3. Machin D, Campbell MJ, Tan SB, Tan SH. Sample size tables for clinical trials. 3rd ed. Oxford, United Kingdom: Wiley-Blackwell; 2009. p. 14-8.


April 2012, Vol 141, Issue 4
