Alden Gross
16 Aug 2012

Which ICC should we use for our functional composites study?

Analysis goals

Here, I use data provided in Shrout and Fleiss (1979) on ratings for 6 targets from 4 judges. (1) I calculate ICCs using their formulae, along the way testing what icr11.ado does. (2) I then apply the equations to a study we conducted in which 10 expert clinicians were asked to rate cognitive, physical, and independent loadings (3 sets of ratings) for each of 25 IADL/ADL items.

Summary of results

Use ICC(2,k) to describe agreement of the mean judge rating for items. Use ICC(2,1) to describe agreement of judges on a particular individual item's rating.

Background

We report that we are using an ICC(1,1) to describe the reliability of a single rater. I will show later that I think icr11.ado is actually calculating an ICC(3,1). This mistake is easy to make: the two formulas have identical forms.

ICC(1,1) describes reliability in a study in which each item (or subject) is rated by a unique set of judges (or raters); that is, not all judges necessarily rate each item. This is an aspect of study design, and ICC(1,1) can be ruled out quickly if your study did not use this design.

ICC(2,1) assumes that a random set of judges, drawn from a population of judges, has rated all of the items; for example, the same judges rate cognitive load for every item in a functional battery. This is our situation in SAGES.

ICC(3,1) is similar, but the judges are fixed effects because we have sampled all possible judges (example: all 50 states vote on a constitutional amendment; there are 50 states, and we have no need to generalize to a 51st state, so the judges, the states, are fixed effects).

Depending on one's explicit purpose, ICC(2,1) and ICC(3,1) can be calculated together: the former is better described as a measure of agreement, while the latter measures consistency across judges. ICC(3,1) is usually larger because it does not account for the judges having been randomly selected from a population.

In addition to cases 1, 2, and 3, we can describe either the reliability of an individual rater, ICC(·,1), or of the mean rating of the judges, ICC(·,k). This is an interpretive issue, but I would think we want to describe the mean rating among a set of judges in SAGES, since we will use the composites, and not individual judge ratings, later on for Thurstone scaling. Thus, ICC(2,k).

Shrout and Fleiss (1979) provide a dense description of the ICC. They note (pp. 423-424), "It is not likely that ICC(2,1) or ICC(3,1) will ever be erroneously used in a case 1 study, since the appropriate mean squares would not be available. The misuse of ICC(1,1) on data from Case 2 or Case 3 studies is more likely. A consequence of this mistake is the underestimation of the true correlation..."
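For reference, here are the Shrout and Fleiss estimators, written in the mean-square notation used in the calculations below (BMS = between-targets mean square, JMS = between-judges mean square, WMS = within-target mean square, EMS = residual mean square; k judges rate each of n targets):

    ICC(1,1) = (BMS - WMS) / (BMS + (k-1)*WMS)
    ICC(2,1) = (BMS - EMS) / (BMS + (k-1)*EMS + k*(JMS - EMS)/n)
    ICC(3,1) = (BMS - EMS) / (BMS + (k-1)*EMS)
    ICC(1,k) = (BMS - WMS) / BMS
    ICC(2,k) = (BMS - EMS) / (BMS + (JMS - EMS)/n)
    ICC(3,k) = (BMS - EMS) / BMS

Note how ICC(1,·) and ICC(3,·) share the same algebraic form, differing only in whether WMS or EMS is plugged in; this is the confusion I suspect in icr11.ado.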
Here are ICCs based on data provided in Table 2 of Shrout and Fleiss (1979) on ratings for 6 targets from 4 judges. The correct ICCs are provided in Table 4 of their paper. The calculations agree (we also validated the equations using data from Shrout's chapter in Psychiatric Epidemiology).

. webuse judges
(Ratings of targets by judges)

. anova rating judge target
                           Number of obs =      24     R-squared     =  0.9095
                           Root MSE      = 1.00968     Adj R-squared =  0.8612

                  Source |  Partial SS    df       MS           F     Prob > F
              -----------+----------------------------------------------------
                   Model |  153.666667     8   19.2083333     18.84     0.0000
                         |
                   judge |  97.4583333     3   32.4861111     31.87     0.0000
                  target |  56.2083333     5   11.2416667     11.03     0.0001
                         |
                Residual |  15.2916667    15   1.01944444
              -----------+----------------------------------------------------
                   Total |  168.958333    23   7.34601449
. * note: JMS was tough.
. * From SF1979, WMS = EMS + (JMS-EMS)/n.
. * Check out Table 1, middle column, do algebra.
. local bms = (`e(ss_2)' / `e(df_2)')

. local ems = (`e(rss)' / `e(df_r)')

. local jms = (`e(ss_1)' / `e(df_1)')

. local wms = (`e(rss)' / `e(df_r)') ///  6.26 in SF1979
>          + (`e(ss_1)' / `e(df_1)' ///
>          - `e(rss)' / `e(df_r)') / (`e(df_2)'+1)

. * ICC(1,1). Should be 0.17, per Table 4.
. display "ICC(1,1): " _c
ICC(1,1): . display (`bms' - `wms') ///
>        / (`bms' + `e(df_1)'*(`wms'))
.16574177

. * ICC(2,1). Should be 0.29, per Table 4.
. display "ICC(2,1): " _c
ICC(2,1): . display (`bms' - `ems') ///
>        / (`bms' ///
>        + `e(df_1)' * `ems' ///
>        + (`e(df_1)'+1)*(`jms' - `ems') / (`e(df_2)'+1))
.28976378

. * ICC(3,1). Should be 0.71, per Table 4.
. display "ICC(3,1): " _c
ICC(3,1): . display (`bms' - `ems') ///
>        / (`bms' ///
>        + `e(df_1)' * `ems')
.71484071

. * ICC(1,k). Should be 0.44, per Table 4.
. display "ICC(1,k): " _c
ICC(1,k): . display (`bms' - `wms') / `bms'
.44279713

. * ICC(2,k). Should be 0.62, per Table 4.
. display "ICC(2,k): " _c
ICC(2,k): . display (`bms' - `ems') ///
>        / (`bms' ///
>        + (`jms' - `ems')/(`e(df_2)'+1))
.62005055

. * ICC(3,k). Should be 0.91, per Table 4.
. display "ICC(3,k): " _c
ICC(3,k): . display (`bms' - `ems') / (`bms')
.90931554
So, what is icr11.ado doing? It appears to be calculating ICC(3,1). This is usually larger than ICC(1,1), and likely larger than, but similar in magnitude to, ICC(2,1), since ICC(2,1) carries the additional uncertainty of randomly selected raters. The mistake is easy to make by mixing up JMS with EMS in the ANOVA, because the equations are otherwise the same.

. version 10

. icr11 , rating(rating) rater(judge) case(target) anova
(Using anova)

                           Number of obs =      24     R-squared     =  1.0000
                           Root MSE      =       0     Adj R-squared =       .

                  Source |  Partial SS    df       MS           F     Prob > F
            -------------+----------------------------------------------------
                   Model |  168.958333    23   7.34601449
                         |
                  target |  56.2083333     5   11.2416667
                   judge |  97.4583333     3   32.4861111
            target*judge |  15.2916667    15   1.01944444
                         |
                Residual |           0     0            .
            -------------+----------------------------------------------------
                   Total |  168.958333    23   7.34601449

ICR(1,1) = 0.715

The intraclass correlation for a single rater [ICR(1,1)] describes the reliability of a single randomly selected rater. The result can be interpreted as the percent of the variance of a single rater's ratings that is attributable to systematic differences between cases.
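To make the mix-up concrete, here is a minimal sketch (assuming the bms, ems, and wms locals computed above are still in memory): the same algebraic form yields 0.166 with WMS and 0.715 with EMS, and the latter is exactly what icr11 reports.

. * One-way form with WMS gives ICC(1,1)  (k-1 = 3, with 4 judges):
. display (`bms' - `wms') / (`bms' + 3*`wms')   //  .16574177
. * The identical form with EMS in place of WMS gives ICC(3,1),
. * which is icr11's 0.715:
. display (`bms' - `ems') / (`bms' + 3*`ems')   //  .71484071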
What do these ICCs look like for the SAGES functional composites study?
. use $derived/fxncomp-208-ratings.dta, clear

. quietly foreach t in 1 2 3 {
    ...
    }

Composite   ICC(1,1)    ICC(2,1)    ICC(3,1)    ICC(1,k)    ICC(2,k)    ICC(3,k)
type 1      .61593358   .62128812   .7219388    .94130478   .94254623   .96291256
type 2      .66668327   .66868278   .71135592   .95238434   .95279134   .96100565
type 3      .68435392   .68594985   .72247879   .95591034   .95622109   .96300856
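The loop body is suppressed by quietly. Here is a hypothetical reconstruction that reuses the Shrout and Fleiss formulas from earlier; the variable names (nu for the rating, plus raterid, item, and type) are assumptions borrowed from the alpha example below.

foreach t in 1 2 3 {
    quietly anova nu raterid item if type==`t'
    local bms = `e(ss_2)' / `e(df_2)'              // between-items MS
    local ems = `e(rss)'  / `e(df_r)'              // residual MS
    local jms = `e(ss_1)' / `e(df_1)'              // between-raters MS
    local wms = `ems' + (`jms' - `ems') / (`e(df_2)' + 1)
    display "Composite type `t'"
    display "ICC(1,1): " (`bms'-`wms') / (`bms' + `e(df_1)'*`wms')
    display "ICC(2,1): " (`bms'-`ems') / (`bms' + `e(df_1)'*`ems' ///
        + (`e(df_1)'+1)*(`jms'-`ems') / (`e(df_2)'+1))
    display "ICC(3,1): " (`bms'-`ems') / (`bms' + `e(df_1)'*`ems')
    display "ICC(1,k): " (`bms'-`wms') / `bms'
    display "ICC(2,k): " (`bms'-`ems') / (`bms' + (`jms'-`ems')/(`e(df_2)'+1))
    display "ICC(3,k): " (`bms'-`ems') / `bms'
}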
Bonus material: Cronbach's alpha is mathematically equivalent to the ICC for the mean of multiple observations with fixed raters/items, ICC(3,k).

. foreach type in 1 2 3 {
  2.     display "Composite type `type'"
  3.     preserve
  4.     keep if type==`type'
  5.     drop stub u lab name
  6.     reshape wide nu, i(item) j(raterid)
  7.     alpha nu*
  8.     restore
  9. }
Composite type 1
(500 observations deleted)
(note: j = 1 2 3 4 5 6 7 8 9 10)

Data                               long   ->   wide
---------------------------------------------------------------------------
Number of obs.                      250   ->      25
Number of variables                   4   ->      12
j variable (10 values)          raterid   ->   (dropped)
xij variables:
                                     nu   ->   nu1 nu2 ... nu10
---------------------------------------------------------------------------

Test scale = mean(unstandardized items)

Average interitem covariance:     .0679642
Number of items in the scale:           10
Scale reliability coefficient:      0.9629

Composite type 2
(500 observations deleted)
(note: j = 1 2 3 4 5 6 7 8 9 10)

Data                               long   ->   wide
---------------------------------------------------------------------------
Number of obs.                      250   ->      25
Number of variables                   4   ->      12
j variable (10 values)          raterid   ->   (dropped)
xij variables:
                                     nu   ->   nu1 nu2 ... nu10
---------------------------------------------------------------------------

Test scale = mean(unstandardized items)

Average interitem covariance:     .0748336
Number of items in the scale:           10
Scale reliability coefficient:      0.9610

Composite type 3
(500 observations deleted)
(note: j = 1 2 3 4 5 6 7 8 9 10)

Data                               long   ->   wide
---------------------------------------------------------------------------
Number of obs.                      250   ->      25
Number of variables                   4   ->      12
j variable (10 values)          raterid   ->   (dropped)
xij variables:
                                     nu   ->   nu1 nu2 ... nu10
---------------------------------------------------------------------------

Test scale = mean(unstandardized items)

Average interitem covariance:     .1143844
Number of items in the scale:           10
Scale reliability coefficient:      0.9628
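As expected, the scale reliability coefficients (0.9629, 0.9610, 0.9628) agree with the ICC(3,k) values reported above (.96291256, .96100565, .96300856) to three decimal places.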
There's an official Stata ado, icc.ado, that calculates all the ICCs from Shrout and Fleiss and has some other options to confuse you further:

. webuse judges
(Ratings of targets by judges)

. * ICC(1,1 and k)
. icc rating target

Intraclass correlations
One-way random-effects model
Absolute agreement

Random effects: target           Number of targets =         6
                                 Number of raters  =         4

---------------------------------------------------------------
                 rating |        ICC       [95% Conf. Interval]
------------------------+--------------------------------------
             Individual |   .1657418       -.1329323    .7225601
                Average |   .4427971       -.8844422    .9124154
---------------------------------------------------------------
F test that ICC=0.00: F(5.0, 18.0) = 1.79     Prob > F = 0.165

Note: ICCs estimate correlations between individual measurements
      and between average measurements made on the same target.

. * ICC(2,1 and k)
. icc rating target judge, absolute

Intraclass correlations
Two-way random-effects model
Absolute agreement

Random effects: target           Number of targets =         6
Random effects: judge            Number of raters  =         4

---------------------------------------------------------------
                 rating |        ICC       [95% Conf. Interval]
------------------------+--------------------------------------
             Individual |   .2897638        .0187865    .7610844
                Average |   .6200505        .0711368     .927232
---------------------------------------------------------------
F test that ICC=0.00: F(5.0, 15.0) = 11.03    Prob > F = 0.000

Note: ICCs estimate correlations between individual measurements
      and between average measurements made on the same target.

. * ICC(3,1 and k)
. icc rating target judge, consistency

Intraclass correlations
Two-way random-effects model
Consistency of agreement

Random effects: target           Number of targets =         6
Random effects: judge            Number of raters  =         4

---------------------------------------------------------------
                 rating |        ICC       [95% Conf. Interval]
------------------------+--------------------------------------
             Individual |   .7148407        .3424648    .9458583
                Average |   .9093155        .6756747    .9858917
---------------------------------------------------------------
F test that ICC=0.00: F(5.0, 15.0) = 11.03    Prob > F = 0.000
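One more variant worth noting, as a hedged aside (this assumes icc also accepts a mixed option, which treats the judges as fixed, the strict Case 3 setup): the point estimates should match the consistency results just above, since the ICC(3,·) point estimators are the same either way.

. * Two-way mixed-effects model: judges fixed, i.e., the textbook ICC(3,.)
. icc rating target judge, mixed consistency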