Standard Errors and Confidence Intervals for Scalability Coefficients in Mokken Scale Analysis Using Marginal Models

Renske E. Kuijpers, L. Andries van der Ark, and Marcel A. Croon
Tilburg University, Tilburg

October 10, 2012

First author’s address:
Renske E. Kuijpers
Department of Methodology and Statistics, FSW
Tilburg University
P.O. Box 90153
5000 LE Tilburg
The Netherlands
phone: +31 13 466 4030
email: [email protected]

ABSTRACT
Mokken scale analysis is a popular method for scaling dichotomous and polytomous items. Whether or not items form a scale is determined by three types of scalability coefficients: for pairs of items, for items, and for the entire scale. It has become standard practice to interpret the sample values of these scalability coefficients using Mokken’s guidelines, which have been available since the 1970s. For valid assessment of the scalability coefficients, the standard errors of the scalability coefficients must be taken into account. So far, standard errors were not available for scales consisting of Likert items, the most popular item type in sociology, and standard errors could only be computed for dichotomous items if the number of items was small. This study solves these two problems. First, we derived standard errors for Mokken’s scalability coefficients using a marginal modeling framework. These standard errors can be computed for all types of items used in Mokken scale analysis. Second, we proved that the method can be applied to scales consisting of large numbers of items. Third, we applied Mokken scale analysis to a set of polytomous items measuring tolerance. The analysis showed that ignoring standard errors of scalability coefficients might result in incorrect inferences.

keywords: Mokken scale analysis, standard errors, scalability coefficients, marginal models.

1 INTRODUCTION

In the social sciences, researchers often use surveys or questionnaires for measuring the trait or attitude of interest, such as religiosity, tolerance, or social capital. Typically, respondents react to a set of indicators of the trait. The indicators are generally referred to as items, and a set of items pertaining to the same trait is referred to as a scale. The respondents receive a score on each item. A summary of a respondent’s item scores, most often the sum of the item scores, produces an estimate of his or her trait level. The sums of the item scores can only be used meaningfully as estimates of the respondents’ trait levels if the scores on the items in the scale are unidimensional and have discrimination power to distinguish trait levels. Mokken scale analysis (Mokken 1971; Sijtsma and Molenaar 2002) is a popular method that can be used to partition a set of items into one or more unidimensional scales, possibly leaving some items unscalable. Some recent sociological studies that used Mokken scale analysis to construct scales investigated topics such as opinions on genetically modified foods (Loner 2008), religious and spiritual beliefs (Gow et al. 2011), political knowledge and media use (Hendriks Vettehen, Hagemann, and Van Snippenburg 2004), social capital (Webber and Huxley 2007), and attitudes toward illegal immigration (Ommundsen et al. 2002).

In Mokken scale analysis, three types of scalability coefficients are used both as criteria for the item partitioning and as diagnostics for the strength of the scales. The coefficients are $H_{ij}$, a coefficient for the scalability of item pair $(i, j)$; $H_j$, a coefficient for the scalability of item $j$; and $H$, a coefficient for the scalability of the entire scale. Details of the scalability coefficients are discussed in Section 2. Mokken (1971:184) advocated that items form a scale if and only if

$$\rho_{ij} > 0 \ \text{(which is equivalent to } H_{ij} > 0\text{) for all } i < j, \tag{1}$$

and

$$H_j \geq c \ \text{for all } j, \tag{2}$$

where $\rho$ is the product-moment correlation and $c$ is some positive lower bound specified by the researcher. He proposed to choose the lower bound $c$ to be at least equal to .3, in order to keep nondiscriminating items and weakly discriminating items out of the scale (Sijtsma and Molenaar 2002). He also advocated that $H$ should be at least .3 and considered a scale to be weakly scalable if $.3 \leq H < .4$, moderately scalable if $.4 \leq H < .5$, and strongly scalable if $H \geq .5$ (Mokken 1971:185), whereas $H < .3$ meant that the items are unscalable. For example, for the 6-item scale Personal Skills ($N = 279$), Webber and Huxley (2007) found that all $H_{ij}$s were positive, the values of $H_j$ ranged between .32 and .45, and $H = .37$. They concluded that Personal Skills had “sufficient scale H values to be useful”. We argue that researchers should take into account the uncertainty of the estimated scalability coefficients when applying Mokken’s heuristic guidelines. The uncertainty is quantified by the standard errors of the estimated values. If the standard error of $H$ is small, then Webber and Huxley’s conclusion is justified, but if the standard error is large (e.g., .08) then there is a reasonable chance that the population value of $H$ is less than .3, and that the set of items that constitute Personal Skills is in fact unscalable following Mokken’s guidelines. A similar line of reasoning applies when $H_{ij}$ and $H_j$ are evaluated.
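
The Personal Skills example can be made concrete with a simple Wald-type interval. The following Python sketch assumes that the estimate of $H$ is approximately normally distributed; it takes $H = .37$ and the standard error .08 from the text, whereas the "small" standard error .02 and the function name `wald_ci` are hypothetical choices made only for illustration.

```python
from statistics import NormalDist

def wald_ci(estimate, se, level=0.95):
    """Two-sided Wald confidence interval: estimate +/- z * se."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return estimate - z * se, estimate + z * se

H_HAT = 0.37        # sample value of H reported for Personal Skills
LOWER_BOUND = 0.30  # Mokken's minimum value for a scale

for se in (0.02, 0.08):  # 0.02 is a hypothetical small SE; 0.08 is the text's example
    lo, hi = wald_ci(H_HAT, se)
    verdict = "excludes" if lo > LOWER_BOUND else "includes"
    print(f"SE = {se:.2f}: 95% CI = [{lo:.3f}, {hi:.3f}], which {verdict} values below .30")
```

With the larger standard error the interval extends below .3, which is the situation described above. A Wald interval is not range-preserving, so it serves here only as a first approximation.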

Although some studies derived standard errors for scalability coefficients, none yielded standard errors for all scalability coefficients that could also be applied to reasonable or large numbers of items. Mokken (1971:164-169) derived asymptotic standard errors of $H$ in case of dichotomous items. Van Onna (2004) used several computer-intensive methods to compute confidence intervals for $H$ both for dichotomous and polytomous items, and advocated using the nonparametric bootstrap for computing a range-preserving confidence interval for $H$. Van der Ark, Croon and Sijtsma (2008) used marginal modelling as a framework for testing specific hypotheses about scalability coefficients $H_{ij}$, $H_j$, and $H$. Within this framework they also derived standard errors for $H_{ij}$, $H_j$, and $H$. However, their approach could only be applied to small sets of dichotomous items. A practical problem is that none of the methods has been implemented in software, which makes the methods unavailable for applied researchers. As a result, standard errors of scalability coefficients are never reported in applications of Mokken scale analysis.

In this paper, we address all of the limitations mentioned. We generalize the marginal modelling approach for computing standard errors of scalability coefficients to polytomous items and to large numbers of items. Furthermore, the approach is made available in the software package mokken (Van der Ark 2007). The remainder of this paper is organized as follows. First, we discuss Mokken scale analysis. Second, we discuss the general principle of obtaining standard errors of sample statistics using the marginal modelling approach; we give detailed results for the derivation of standard errors of scalability coefficients for polytomous items; and we discuss how the method can be applied to large numbers of items. Third, we estimate the scalability coefficients and their standard errors for two real-data examples. The examples demonstrate that ignoring the uncertainty of the estimated scalability coefficients may lead to incorrect inferences. Finally, the strengths and weaknesses of the approach are discussed.

2 MOKKEN SCALE ANALYSIS

2.1 The Monotone Homogeneity Model

Mokken scale analysis is based on the monotone homogeneity model (Mokken 1971, Ch. 4; Sijtsma and Molenaar 2002:22-23), which is a nonparametric item response theory (IRT) model for measuring respondents on an ordinal scale. We consider a set of $J$ items numbered $1, 2, \ldots, J$, each having $z + 1$ ordered answer categories $x = 0, 1, \ldots, z$. Let $X_j$ denote the score on item $j$ and let $X_+ = \sum_j X_j$ denote the sum of the $J$ item scores. Let $\theta$ denote a possibly multidimensional latent variable (usually referred to as latent trait); often $\theta$ values are interpreted in terms of the construct that the items measure in common. IRT models describe the relation between latent trait $\theta$ and the probabilities of item scores $x$, $P(X_j = x \mid \theta)$. The monotone homogeneity model consists of three assumptions:

Unidimensionality: The latent variable $\theta$ is unidimensional;

Local independence: The item scores are independent given $\theta$; that is, $P(X_1 = x_1, X_2 = x_2, \ldots, X_J = x_J \mid \theta) = \prod_{j=1}^{J} P(X_j = x_j \mid \theta)$;

Monotonicity: The probability of having a score of at least $x$ on item $j$, $P(X_j \geq x \mid \theta)$, is a nondecreasing function of $\theta$.

The monotone homogeneity model is a general model in the sense that all other popular unidimensional IRT models are special cases of the monotone homogeneity model (Van der Ark 2001). For practical purposes, the model allows the stochastic ordering of $\theta$ by means of $X_+$ (for details, see Van der Ark and Bergsma 2010, and references therein). Hence, only if the monotone homogeneity model fits the data well can the total scale score be used meaningfully to order respondents.

Mokken scale analysis can be regarded as a set of methods to construct scales for which the monotone homogeneity model and other nonparametric IRT models fit well. The general idea is that one investigates observable properties implied by the model. For example, under the monotone homogeneity model all scalability coefficients $H_{ij}$ must be nonnegative. Hence, if a researcher finds that for a particular scale the sample values of $H_{ij}$ are all nonnegative, then this result supports the possibility that the monotone homogeneity model is true, whereas negative $H_{ij}$ values mean that the model must be rejected.

2.2 Scalability Coefficients

2.2.1 Item Steps and Weighted Guttman Errors

Scalability coefficients $H_{ij}$, $H_j$, and $H$ are based on item steps and Guttman errors (Molenaar 1991), which are best explained by means of an example. Table 1 (see Weijmar Schultz and Van der Wiel 1991) shows a cross-classification of the scores of $N = 178$ respondents on $J = 2$ items (Item $a$ and Item $b$), each having $z + 1 = 4$ ordered answer categories. The frequencies are denoted $n_{ab}^{xy}$, $x, y = 0, \ldots, 3$, and the marginal frequencies are denoted $n_{ab}^{x+}$ and $n_{ab}^{+y}$, where the “+” indicates the sum over all categories.

Insert Table 1 about here

Item steps are Boolean statements $X_j \geq x$ ($j = 1, \ldots, J$; $x = 0, \ldots, z$), indicating whether a respondent has passed the item step ($X_j \geq x$) or not ($X_j < x$). The popularity of an item step is determined by means of the proportion of respondents that have passed the item step, $P(X_j \geq x)$. It may be noted that $P(X_j \geq 0) = 1$ by definition, and this probability thus is not informative. The ordering of the $2z$ item steps in Table 1 by descending popularity equals

$$X_a \geq 1,\; X_a \geq 2,\; X_b \geq 1,\; X_b \geq 2,\; X_a \geq 3,\; X_b \geq 3. \tag{3}$$

Respondents who did not pass any item step have item-score pattern (0, 0); respondents who have passed one item step most likely have passed the most popular item step $X_a \geq 1$, producing item-score pattern (1, 0); respondents who have passed two item steps most likely have passed $X_a \geq 1$ and $X_a \geq 2$, producing item-score pattern (2, 0), and so on. The admissible item-score patterns, that is, the patterns that are consistent with the order of the item steps, are (0,0), (1,0), (2,0), (2,1), (2,2), (3,2), and (3,3) (frequencies printed in boldface in Table 1). Each respondent that passes the $h$ most popular item steps and does not pass the remaining $2z - h$ less popular item steps has an item-score pattern that is in agreement with the Guttman (1950) model (Molenaar 1991). Such admissible patterns are called conformal patterns. Respondents having item-score pattern (0,3) passed the least popular item step $X_b \geq 3$ but did not pass the more popular item steps $X_a \geq 1$, $X_a \geq 2$, and $X_a \geq 3$. Patterns for which at least one less popular item step has been passed and one more popular item step has not been passed are called Guttman errors (Molenaar 1991). A set of items is perfectly scalable if there are no Guttman errors, and is less scalable as the number of Guttman errors increases.

Molenaar (1991) suggested weighting the frequencies of the Guttman errors depending on the degree of deviation from item-score patterns yielding a perfect scale. The weight for the frequency of a particular item-score pattern is computed as follows. We consider all pairs of item steps and we compute the weight equal to the number of pairs of item steps for which the less popular item step was passed and the more popular step was failed. For example, for item-score pattern (0,2) in Table 1, the Guttman weight equals $w_{ab}^{02} = 4$ because for four pairs of item steps, $(X_a \geq 1, X_b \geq 1)$, $(X_a \geq 1, X_b \geq 2)$, $(X_a \geq 2, X_b \geq 1)$, and $(X_a \geq 2, X_b \geq 2)$, the less popular item step was passed and the more popular step was failed (e.g., for pair $(X_a \geq 1, X_b \geq 1)$, the less popular item step $X_b \geq 1$ was passed, but the more popular item step $X_a \geq 1$ was failed). The weights are shown between parentheses in each cell of Table 1. Note that the boldface conformal item-score patterns have a weight equal to zero.

For computational purposes, we give a formula for computing the weights (also see Ligtvoet et al. 2010). Let the $2z$ item steps be ordered by descending popularity (cf. Equation 3), and let $\mathbf{q}_{ij}^{xy} = (q_{ij(1)}^{xy}, q_{ij(2)}^{xy}, \ldots, q_{ij(2z)}^{xy})$ be a vector consisting of zeroes and ones indicating for item-score pattern $(X_i = x, X_j = y)$ whether an item step has been passed (1) or not (0). Then weight $w_{ij}^{xy}$ equals

$$w_{ij}^{xy} = \sum_{u=2}^{2z} \left( q_{ij(u)}^{xy} \times \sum_{v=1}^{u-1} \left| 1 - q_{ij(v)}^{xy} \right| \right). \tag{4}$$

Equation 4 counts how often a score 0 precedes a score 1 in vector $\mathbf{q}_{ij}^{xy}$. It may be noted that, for example, for response pattern (0,2) in Table 1, the third and fourth item steps in Equation 3 are passed, and so $\mathbf{q}_{ab}^{02} = (0, 0, 1, 1, 0, 0)$. In $\mathbf{q}_{ab}^{02}$, the score 0 precedes the score 1 four times, and so the weight $w_{ab}^{02}$ equals 4. As a second example, consider the item-score pattern (2,1). Here, the first, second, and third item steps are passed, and thus $\mathbf{q}_{ab}^{21} = (1, 1, 1, 0, 0, 0)$. Here, there are no occasions on which a score 0 precedes a score 1, and thus the weight $w_{ab}^{21}$ is equal to 0.
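
Equation 4 translates almost directly into code. The sketch below is a minimal Python illustration that uses the item-step ordering of Equation 3; the function names `item_step_vector` and `guttman_weight` are hypothetical and do not come from any existing package.

```python
def item_step_vector(pattern, step_order):
    """Return q: 1 if the item step (item, threshold) is passed, else 0."""
    return [1 if pattern[item] >= threshold else 0
            for item, threshold in step_order]

def guttman_weight(q):
    """Equation 4: count how often a score 0 precedes a score 1 in q."""
    weight = 0
    for u in range(1, len(q)):                      # item steps 2, ..., 2z
        zeros_before = sum(1 - q[v] for v in range(u))
        weight += q[u] * zeros_before
    return weight

# Item steps of Equation 3, ordered by descending popularity.
STEP_ORDER = [("a", 1), ("a", 2), ("b", 1), ("b", 2), ("a", 3), ("b", 3)]

for x, y in [(0, 2), (2, 1)]:
    q = item_step_vector({"a": x, "b": y}, STEP_ORDER)
    print((x, y), q, guttman_weight(q))
```

For the two patterns discussed above, the sketch reproduces the weights 4 and 0, and every conformal pattern receives weight 0.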

2.2.2 Item Pair Scalability Coefficients

Item pair scalability coefficient $H_{ij}$ compares the sum of weighted observed frequencies of Guttman errors to the sum of weighted frequencies of Guttman errors that is expected under marginal independence of the item scores. Let

$$e_{ij}^{xy} = \frac{n_{ij}^{x+} \times n_{ij}^{+y}}{N} \tag{5}$$

be the expected bivariate frequency under marginal independence; let $F_{ij}$ and $E_{ij}$ be the sum of weighted observed and expected frequencies of Guttman errors, respectively, for item pair $(i, j)$. Then

$$H_{ij} = 1 - \frac{F_{ij}}{E_{ij}} = 1 - \frac{\sum_x \sum_y w_{ij}^{xy} n_{ij}^{xy}}{\sum_x \sum_y w_{ij}^{xy} e_{ij}^{xy}}. \tag{6}$$

If there are no Guttman errors, then $H_{ij} = 1$; if there are as many Guttman errors as there are under marginal independence, then $H_{ij} = 0$. Under the monotone homogeneity model, $H_{ij} \geq 0$. Molenaar (1991) showed that $H_{ij}$ can be written as a normed covariance. Let $\sigma_{ij}$ be the covariance between item $i$ and item $j$ and let $\sigma_{ij}^{\max}$ be the maximum covariance between item $i$ and item $j$, given the marginal distributions of both items. Given that the items both have a positive variance, $H_{ij} = \sigma_{ij} / \sigma_{ij}^{\max}$. For a set of $J$ items, let $K = \frac{1}{2} J(J - 1)$ denote the number of item pairs; hence, we have $K$ different coefficients $H_{ij}$.
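
The normed-covariance form lends itself to a short computation. The Python sketch below assumes that the maximum covariance given the two marginal distributions is attained when both score vectors are sorted in the same order, which is a standard rearrangement result; the data and the function names are hypothetical, and the sketch is not claimed to be the implementation used in any software package.

```python
def covariance(x, y):
    """Sample covariance with divisor N (the divisor cancels in the ratio)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n

def h_pair(x, y):
    """H_ij = cov(X_i, X_j) / maximum covariance given both marginals."""
    cov_max = covariance(sorted(x), sorted(y))
    if cov_max == 0:
        raise ValueError("both items need a positive variance")
    return covariance(x, y) / cov_max

# Hypothetical scores of eight respondents on four-category items.
item_i = [0, 0, 1, 1, 2, 2, 3, 3]
item_j = [0, 0, 0, 1, 1, 2, 2, 3]   # together with item_i: no Guttman errors
item_k = [1, 0, 0, 2, 1, 3, 2, 0]   # together with item_i: some Guttman errors

print(round(h_pair(item_i, item_j), 3))
print(round(h_pair(item_i, item_k), 3))
```

The pair without Guttman errors yields a coefficient of 1, and the pair with Guttman errors yields a value between 0 and 1.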

2.2.3 The Item Scalability Coefficient

Item scalability coefficient $H_j$ is a generalization of $H_{ij}$; it compares the sum of weighted observed and weighted expected frequencies of Guttman errors for an individual item:

$$H_j = 1 - \frac{\sum_{i \neq j} F_{ij}}{\sum_{i \neq j} E_{ij}} = 1 - \frac{\sum_{i \neq j} \sum_x \sum_y w_{ij}^{xy} n_{ij}^{xy}}{\sum_{i \neq j} \sum_x \sum_y w_{ij}^{xy} e_{ij}^{xy}}. \tag{7}$$

Under the monotone homogeneity model, $0 \leq H_j \leq 1$. Let $R_{(j)} = X_+ - X_j$ denote the rest score. Sijtsma and Molenaar (2002:57) showed that $H_j$ is equal to the normed covariance between $X_j$ and $R_{(j)}$; that is, $H_j = \sigma_{jR_{(j)}} / \sigma_{jR_{(j)}}^{\max}$. Hence, $H_j$ expresses the strength of the association between item $j$ and the other items in the scale, and can be viewed as the nonparametric analogue of the discrimination parameter in parametric IRT (e.g., Van Abswoude, Van der Ark, and Sijtsma 2004). To keep nondiscriminating items and weakly discriminating items out of the scale, Mokken (1971:184) proposed that all $H_j$s should be greater than some lower bound $c > 0$. It may be noted that $H_j \geq c$ with $c > 0$ is not an observable property of the monotone homogeneity model.

2.2.4 The Total-Scale Scalability Coefficient

Coefficient $H$ is a generalization of $H_{ij}$ and $H_j$; it compares the sum of weighted observed and weighted expected frequencies of Guttman errors for all $J$ items in the entire scale:

$$H = 1 - \frac{\sum\sum_{i \neq j} F_{ij}}{\sum\sum_{i \neq j} E_{ij}} = 1 - \frac{\sum\sum_{i \neq j} \sum_x \sum_y w_{ij}^{xy} n_{ij}^{xy}}{\sum\sum_{i \neq j} \sum_x \sum_y w_{ij}^{xy} e_{ij}^{xy}}. \tag{8}$$

$H$ expresses the scalability of all items in the scale. Under the monotone homogeneity model, $0 \leq H \leq 1$. Moreover, Mokken (1971:148-153; also, see Sijtsma and Molenaar 2002, Theorem 4.2) showed that under the monotone homogeneity model, the scalability coefficients are related in such a way that

$$\min_{i,j}(H_{ij}) \leq \min_{j}(H_j) \leq H \leq \max_{j}(H_j) \leq \max_{i,j}(H_{ij}).$$
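
Equations 6, 7, and 8 differ only in which item pairs are pooled before the ratio is taken. The Python sketch below illustrates this with hypothetical values of $F_{ij}$ and $E_{ij}$ for $J = 3$ items; the numbers and names are assumptions chosen for illustration and are not taken from any data set discussed in this paper.

```python
from itertools import combinations

# Hypothetical weighted error sums (F_ij observed, E_ij expected) for J = 3 items.
F = {(1, 2): 10.0, (1, 3): 30.0, (2, 3): 24.0}
E = {(1, 2): 40.0, (1, 3): 50.0, (2, 3): 30.0}
ITEMS = (1, 2, 3)

def ratio_coefficient(pairs):
    """1 minus the ratio of pooled observed to pooled expected weighted errors."""
    return 1.0 - sum(F[p] for p in pairs) / sum(E[p] for p in pairs)

all_pairs = list(combinations(ITEMS, 2))

# Equation 6: one pair at a time.
H_ij = {p: ratio_coefficient([p]) for p in all_pairs}
# Equation 7: all pairs that involve item j.
H_j = {j: ratio_coefficient([p for p in all_pairs if j in p]) for j in ITEMS}
# Equation 8: all pairs. Summing over i != j counts each pair twice in both
# numerator and denominator, so the ratio equals the one over unordered pairs.
H = ratio_coefficient(all_pairs)

print(H_ij, H_j, round(H, 4))
```

Because $H_j$ and $H$ pool the same positive quantities that define the $H_{ij}$, the computed values respect the ordering of minima and maxima given above.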

2.3 Methods in Mokken Scale Analysis

Mokken scale analysis contains an automated item selection procedure that partitions the set of items into one or more unidimensional scales. A scale is considered a Mokken scale if it satisfies the two criteria as stated in Equations 1 and 2. Moreover, Mokken scale analysis provides several methods for the additional investigation of the assumptions of the monotone homogeneity model and other nonparametric IRT models. A description of these methods is beyond the scope of this paper, and we refer the interested reader to, for example, Mokken (1971) and Sijtsma and Molenaar (2002).

3 STANDARD ERRORS OF SCALABILITY COEFFICIENTS

In marginal modelling of categorical data (e.g., see Bergsma, Croon, and Hagenaars 2009, and references therein), a two-step method is used to compute standard errors of sample statistics. We describe this method for the scalability coefficients. The first step is to write the scalability coefficients as a function of the frequencies of the observed item-score patterns in the data. A set of $J$ items each with $z + 1$ ordered answer categories $(0, 1, \ldots, z)$ produces $L = (z + 1)^J$ possible item-score patterns. Without loss of generality, we assume that item-score patterns are in lexicographic order: going from $00 \ldots 0$ to $zz \ldots z$ with the last digit changing fastest, and the digit in the first column changing slowest. The observed frequencies of the $L$ possible item-score patterns can be collected in a vector $\mathbf{n}$. For example, a set of $J = 3$ items (denoted by $a$, $b$, and $c$) each with $(z + 1) = 3$ answer categories has $L = 3^3 = 27$ possible item-score patterns; hence vector $\mathbf{n}$ equals

$$\mathbf{n} = \begin{pmatrix} n_{abc}^{000} \\ n_{abc}^{001} \\ n_{abc}^{002} \\ n_{abc}^{010} \\ n_{abc}^{011} \\ \vdots \\ n_{abc}^{220} \\ n_{abc}^{221} \\ n_{abc}^{222} \end{pmatrix}. \tag{9}$$
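
The lexicographic ordering and the construction of the frequency vector can be sketched in a few lines of Python. The data below are hypothetical and the function name `pattern_frequencies` is an assumption made for illustration.

```python
from collections import Counter
from itertools import product

def pattern_frequencies(data, n_categories):
    """Return all L item-score patterns in lexicographic order and their counts."""
    n_items = len(data[0])
    # itertools.product varies the last position fastest, as required.
    patterns = list(product(range(n_categories), repeat=n_items))
    counts = Counter(tuple(row) for row in data)
    return patterns, [counts[p] for p in patterns]

# Hypothetical scores of five respondents on items a, b, c (categories 0, 1, 2).
data = [(0, 0, 0), (0, 0, 2), (2, 2, 1), (0, 0, 2), (2, 2, 2)]
patterns, n = pattern_frequencies(data, n_categories=3)

print(len(patterns), patterns[:5], patterns[-3:])
```

With three items and three categories the vector has 27 elements, most of which are zero in this small example.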

Vector $\mathbf{n}$ in Equation 9 is used throughout to illustrate the approach. Let vector $\mathbf{H}_{ij} = (H_{12}, H_{13}, \ldots, H_{J-1,J})^T$ (the superscript $T$ denotes the transpose) contain all $K$ scalability coefficients $H_{ij}$, and let vector $\mathbf{H}_j = (H_1, H_2, \ldots, H_J)^T$ contain all $J$ scalability coefficients $H_j$. Also, let $\mathbf{g}$ and $\mathbf{g}^{\dagger}$ be vector-valued functions, and let $g^{\ddagger}$ be a scalar function. We show that the scalability coefficients can be written as a function of $\mathbf{n}$; that is,

$$\mathbf{H}_{ij} = \mathbf{g}(\mathbf{n}), \tag{10}$$

$$\mathbf{H}_j = \mathbf{g}^{\dagger}(\mathbf{n}), \tag{11}$$

$$H = g^{\ddagger}(\mathbf{n}). \tag{12}$$

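The second step is to use the delta method to obtain the asymptotic standard errors for the scalability coefficients. Let $\mathbf{V}_{\mathbf{n}}$ and $\mathbf{V}_{\mathbf{g}(\mathbf{n})}$ be the asymptotic variance-covariance matrices of $\mathbf{n}$ and $\mathbf{g}(\mathbf{n})$, respectively; let $N$ be the total sample size; and let $\mathbf{D}(\mathbf{x})$ be a diagonal matrix with the elements of vector $\mathbf{x}$ on the diagonal.

If $\mathbf{n}$ is sampled from a multinomial distribution, then

$$\mathbf{V}_{\mathbf{n}} = \mathbf{D}(\mathbf{n}) - \mathbf{n} N^{-1} \mathbf{n}^T$$

(e.g., Agresti 2007:6). Now, if $\mathbf{G} = \mathbf{G}(\mathbf{n})$ is the Jacobian, which is the matrix of first partial derivatives of $\mathbf{g}(\mathbf{n})$ with respect to $\mathbf{n}$, then according to the delta method

$$\begin{aligned} \mathbf{V}_{\mathbf{g}(\mathbf{n})} &= \mathbf{G} \mathbf{V}_{\mathbf{n}} \mathbf{G}^T \\ &= \mathbf{G} \left( \mathbf{D}(\mathbf{n}) - \mathbf{n} N^{-1} \mathbf{n}^T \right) \mathbf{G}^T \\ &= \mathbf{G} \mathbf{D}(\mathbf{n}) \mathbf{G}^T - \mathbf{G} \mathbf{n} N^{-1} \mathbf{n}^T \mathbf{G}^T. \end{aligned} \tag{13}$$
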
In most applications of marginal models, the functions g() are homogeneous of order 0: that
0674 0675
is, the value of g() does not change when the values of its arguments are all multiplied by the
0676 0677
same constant t:
0678 0679
g(tn) = g(n).
0680 0681 0682
For such functions it does not matter whether n represents the observed frequencies or the
0683 0684 0685
observed probabilities. Functions g(n) (Equation 10), g† (n) (Equation 11), and g ‡ (n) (Equa-
0686 0687
tion 12) are also homogeneous functions. Euler’s homogeneous function theorem (e.g., Weisstein
0688 0689
2011) now implies that Gn = 0. As a result, Equation 13 reduces to
0690 0691 0692
Vg(n) = GD(n)GT .
0693
(14)
0694 0695 0696
Taking the square root of the diagonal of Vg(n) produces the required standard errors.
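
The mechanics of Equations 13 and 14 can be checked on a function that is much simpler than a scalability coefficient. The Python sketch below uses a sample proportion as a stand-in for $\mathbf{g}$; this choice, the frequencies, and the function names are assumptions made purely for illustration.

```python
import math

def g(n):
    """A function that is homogeneous of order 0: the proportion of pattern 0."""
    return n[0] / (n[0] + n[1])

def jacobian(n):
    """Analytic first partial derivatives of g with respect to n[0] and n[1]."""
    total = n[0] + n[1]
    return [n[1] / total**2, -n[0] / total**2]

n = [30.0, 70.0]   # hypothetical observed frequencies
G = jacobian(n)

# Euler's theorem for functions homogeneous of order 0: G n = 0.
euler_check = sum(G_l * n_l for G_l, n_l in zip(G, n))

# Equation 14 for a single-row Jacobian: G D(n) G^T = sum of G_l^2 * n_l.
variance = sum(G_l * G_l * n_l for G_l, n_l in zip(G, n))
standard_error = math.sqrt(variance)

# Familiar standard error of a proportion, for comparison.
p = g(n)
textbook_se = math.sqrt(p * (1 - p) / sum(n))

print(euler_check, standard_error, textbook_se)
```

In this toy case the delta-method standard error coincides with the familiar standard error of a proportion, which suggests that the simplification from Equation 13 to Equation 14 behaves as described.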

We demonstrate how to obtain $\mathbf{g}(\cdot)$ (Equation 10), $\mathbf{g}^{\dagger}(\cdot)$ (Equation 11), and $g^{\ddagger}(\cdot)$ (Equation 12). The notation used in these derivations is called the generalized exp-log notation (Bergsma 1997; Kritzer 1977). Moreover, we also show how to obtain the matrix of first partial derivatives for these functions.

3.1 Generalized Exp-Log Notations for the Three Scalability Coefficients

Let $\mathbf{A}_1$, $\mathbf{A}_2$, $\mathbf{A}_3$, $\mathbf{A}_4$, and $\mathbf{A}_5$ be design matrices to be explained below. Matrix $\mathbf{A}_1$ is explained in detail to give the reader more insight into the generalized exp-log notation. The construction of the other design matrices is relegated to Appendix A. The generalized exp-log notation for $\mathbf{H}_{ij}$ (Equation 10) is

$$\mathbf{H}_{ij} = \mathbf{g}(\mathbf{n}) = \mathbf{A}_5 \exp(\mathbf{A}_4 \log(\mathbf{A}_3 \exp(\mathbf{A}_2 \log(\mathbf{A}_1 \mathbf{n})))). \tag{15}$$
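
The idea behind the exp-log notation is that products and ratios of sums of frequencies become linear operations after taking logarithms. The Python sketch below shows one inner layer, $\exp(\mathbf{A}_2 \log(\mathbf{A}_1 \mathbf{n}))$, for a toy 2 × 2 table in which it yields a single expected frequency as in Equation 5. The matrices here are illustrative assumptions and are not the design matrices of Appendix A.

```python
import math

def matvec(A, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

# Hypothetical frequencies n^00, n^01, n^10, n^11 of two dichotomous items.
n = [20.0, 10.0, 5.0, 15.0]

A1 = [[1, 1, 0, 0],    # marginal frequency n^{0+}
      [1, 0, 1, 0],    # marginal frequency n^{+0}
      [1, 1, 1, 1]]    # sample size N
A2 = [[1, 1, -1]]      # log n^{0+} + log n^{+0} - log N

margins = matvec(A1, n)
log_margins = [math.log(m) for m in margins]
expected_00 = [math.exp(v) for v in matvec(A2, log_margins)][0]

print(margins, expected_00)
```

Here the result equals $n^{0+} n^{+0} / N$, the expected frequency of pattern (0,0) under marginal independence.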

The notation $\exp(\mathbf{X})$ and $\log(\mathbf{X})$ denotes the exponential and logarithmic functions, applied element-wise to the elements of $\mathbf{X}$.

Let $\mathbf{n}_{ij}$ be the vector containing the $(z + 1)^2$ bivariate frequencies of item pair $(i, j)$. For $K$ item pairs, the total number of bivariate frequencies equals $B = K(z + 1)^2$. Let $\mathbf{n}_j$ be the vector containing the $(z + 1)$ univariate frequencies of item $j$. For $J$ items the total number of univariate frequencies equals $U = J(z + 1)$. For example, for Equation 9,

$$\mathbf{n}_{ab} = \begin{pmatrix} n_{abc}^{00+} \\ n_{abc}^{01+} \\ n_{abc}^{02+} \\ n_{abc}^{10+} \\ n_{abc}^{11+} \\ n_{abc}^{12+} \\ n_{abc}^{20+} \\ n_{abc}^{21+} \\ n_{abc}^{22+} \end{pmatrix} \quad \text{and} \quad \mathbf{n}_{a} = \begin{pmatrix} n_{abc}^{0++} \\ n_{abc}^{1++} \\ n_{abc}^{2++} \end{pmatrix}.$$

The $(B + U + 1) \times L$ design matrix $\mathbf{A}_1$ consists of three submatrices:

$$\mathbf{A}_1 = \begin{pmatrix} \mathbf{B} \\ \mathbf{U} \\ \mathbf{1}_L^T \end{pmatrix}. \tag{16}$$

The $B \times L$ submatrix $\mathbf{B}$ is necessary to obtain the $B$ observed bivariate frequencies. The first $(z + 1)^2$ rows correspond to the first item pair (item 1, item 2); the next $(z + 1)^2$ rows correspond to the second item pair (item 1, item 3), and so on; the $L$ columns correspond to the $L$ item-score patterns. Element $(b, l)$ equals 1 if the $l$-th item-score pattern contributes to the $b$-th bivariate frequency, and element $(b, l)$ equals 0 otherwise. For example, for the vector of observed frequencies in Equation 9, the first row of $\mathbf{B}$, which pertains to observed bivariate frequency $n_{abc}^{00+} = n_{abc}^{000} + n_{abc}^{001} + n_{abc}^{002}$, equals

(1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0).

The $U \times L$ submatrix $\mathbf{U}$ is necessary to obtain the $U$ observed univariate frequencies. The first $(z + 1)$ rows correspond to item 1; the next $(z + 1)$ rows correspond to item 2, and so on. Element $(u, l)$ equals 1 if the $l$-th item-score pattern contributes to the $u$-th observed univariate frequency, and element $(u, l)$ equals 0 otherwise. For example, for the vector of observed frequencies in Equation 9, the first row of $\mathbf{U}$, which pertains to observed univariate frequency $n_{abc}^{0++}$, equals

(1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0).

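Vector $\mathbf{1}_L^T$ is the $1 \times L$ unit vector (a row of ones). For the vector of observed frequencies in Equation 9,

$$\mathbf{A}_1 \mathbf{n} = \begin{pmatrix} \mathbf{B} \\ \mathbf{U} \\ \mathbf{1}_L^T \end{pmatrix} \mathbf{n} = \begin{pmatrix} \mathbf{n}_{ab} \\ \mathbf{n}_{ac} \\ \mathbf{n}_{bc} \\ \mathbf{n}_{a} \\ \mathbf{n}_{b} \\ \mathbf{n}_{c} \\ N \end{pmatrix}. \tag{17}$$

Design matrices $\mathbf{A}_2$, $\mathbf{A}_3$, $\mathbf{A}_4$, and $\mathbf{A}_5$ are constructed in a similar way (see Appendix A).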
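
The description of $\mathbf{A}_1$ can be turned into a small constructive sketch. The Python code below builds the full (unreduced) matrix for the running example of $J = 3$ items with three categories; the function name `build_A1` is hypothetical, and the dense construction is meant only to illustrate the layout of Equation 16.

```python
from itertools import combinations, product

def build_A1(n_items, n_categories):
    """Stack the bivariate rows (B), the univariate rows (U), and a row of ones."""
    patterns = list(product(range(n_categories), repeat=n_items))
    scores = range(n_categories)
    rows = []
    for i, j in combinations(range(n_items), 2):      # submatrix B, pair by pair
        for x, y in product(scores, repeat=2):
            rows.append([int(p[i] == x and p[j] == y) for p in patterns])
    for j in range(n_items):                           # submatrix U, item by item
        for x in scores:
            rows.append([int(p[j] == x) for p in patterns])
    rows.append([1] * len(patterns))                   # unit vector 1_L^T
    return rows

A1 = build_A1(n_items=3, n_categories=3)
print(len(A1), len(A1[0]))   # (B + U + 1) rows and L columns
print(A1[0])                 # first row of B
print(A1[27])                # first row of U
```

The first row of the bivariate block and the first row of the univariate block match the two rows printed above in the text.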

The generalized exp-log notation for $\mathbf{H}_j$ (Equation 11) is

$$\mathbf{H}_j = \mathbf{g}^{\dagger}(\mathbf{n}) = \mathbf{A}_5^{\dagger} \exp(\mathbf{A}_4^{\dagger} \log(\mathbf{A}_3^{\dagger} \exp(\mathbf{A}_2 \log(\mathbf{A}_1 \mathbf{n})))). \tag{18}$$

Note that $\mathbf{A}_1$ and $\mathbf{A}_2$ in Equation 18 are equal to those in Equation 15. Design matrices $\mathbf{A}_3^{\dagger}$, $\mathbf{A}_4^{\dagger}$, and $\mathbf{A}_5^{\dagger}$ are derived in Appendix B.

The generalized exp-log notation for $H$ (Equation 12) is

$$H = g^{\ddagger}(\mathbf{n}) = \mathbf{A}_5^{\ddagger} \exp(\mathbf{A}_4^{\ddagger} \log(\mathbf{A}_3^{\ddagger} \exp(\mathbf{A}_2 \log(\mathbf{A}_1 \mathbf{n})))). \tag{19}$$

Note that $\mathbf{A}_1$ and $\mathbf{A}_2$ in Equation 19 are equal to those in Equation 15. Design matrices $\mathbf{A}_3^{\ddagger}$, $\mathbf{A}_4^{\ddagger}$, and $\mathbf{A}_5^{\ddagger}$ are derived in Appendix C. Once the design matrices have been constructed, the matrix of partial derivatives $\mathbf{G}$ can be derived (Appendix D). Inserting $\mathbf{G}$ into Equation 14 produces the required standard errors.

3.2 Standard Errors for Scales Consisting of Large Numbers of Items

A practical problem is that the proposed method for deriving standard errors for scalability coefficients cannot be applied directly to large numbers of items (cf. Van der Ark et al. 2008). Even for relatively small scales, $L$ can be so large that vector $\mathbf{n}$ and the $(B + U + 1) \times L$ matrix $\mathbf{A}_1$ (Equation 16) cannot be stored in computer memory. For large numbers of items, $B$ may also be too large to store $\mathbf{A}_2$ and $\mathbf{A}_3$. For example, for $J = 10$ Likert items with $z + 1 = 5$ ordered answer categories, $L = 5^{10} = 9{,}765{,}625$ and $B = \binom{10}{2} 5^2 = 1{,}125$. Two modifications in the generalized exp-log notation reduce the computational burden considerably, so that standard errors of scalability coefficients can be computed for up to approximately 100 items and up to approximately 100,000 respondents. However, for larger data sets, computation may be slow.
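
The size of the unreduced problem follows directly from the definitions of $L$, $B$, and $U$. The short Python sketch below reproduces the numbers for the example of 10 five-category items; the memory estimate assumes, hypothetically, a dense matrix with 8 bytes per element.

```python
from math import comb

def problem_size(n_items, n_categories):
    """Return L, B, and U as defined in Section 3 for the unreduced problem."""
    L = n_categories ** n_items
    B = comb(n_items, 2) * n_categories ** 2
    U = n_items * n_categories
    return L, B, U

L, B, U = problem_size(n_items=10, n_categories=5)
dense_bytes = (B + U + 1) * L * 8   # assumption: 8 bytes per matrix element

print(L, B, U, dense_bytes / 1e9)
```

Under that assumption a dense $\mathbf{A}_1$ would require roughly 92 gigabytes, which makes clear why a reduction is needed.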

The largest contribution to reducing the computational burden is using only the nonzero frequencies in $\mathbf{n}$, which pertain to item-score patterns that are observed in the data, and collecting them in vector $\mathbf{n}^*$. So, all elements of $\mathbf{n}^*$ are positive and the size of $\mathbf{n}^*$, denoted $L^*$, cannot exceed the sample size $N$. Let a matrix superscripted with an asterisk indicate a reduced matrix, which means that the rows and/or columns pertaining to zero frequencies have been deleted. Thus, when only the nonzero observed frequencies are used, expression $\mathbf{A}_1 \mathbf{n}$ in Equations 15, 18, and 19 is replaced by $\mathbf{A}_1^* \mathbf{n}^*$, and expression $\mathbf{G} \mathbf{D}(\mathbf{n}) \mathbf{G}^T$ is replaced by $\mathbf{G}^* \mathbf{D}(\mathbf{n}^*) \mathbf{G}^{*T}$. Other matrices used in this paper remain unchanged. Because typically $L^*$ is much smaller than $L$, the reduced vectors and matrices are small enough to be stored in computer memory. We show that using reduced matrices does not affect the computation of the scalability coefficients and their standard errors.

First, we show that $\mathbf{A}_1 \mathbf{n} = \mathbf{A}_1^* \mathbf{n}^*$, which means that Equations 15, 18, and 19 are unaffected by using reduced matrices.

Proof. Let $\sum_{l=1}^{L} A_{i,l} n_l$ be the $i$-th element in vector $\mathbf{A}_1 \mathbf{n}$. If $n_l = 0$ then $A_{i,l} n_l$ has no contribution to the $i$-th element in $\mathbf{A}_1 \mathbf{n}$, and the $l$-th column of $\mathbf{A}_1$ and the $l$-th element of $\mathbf{n}$ can be removed without consequences. □

Second, we show that $\mathbf{G} \mathbf{D}(\mathbf{n}) \mathbf{G}^T = \mathbf{G}^* \mathbf{D}(\mathbf{n}^*) \mathbf{G}^{*T}$, which means that the computation of the standard errors in Equation 14 is unaffected by using reduced matrices.

Proof. Let $\mathbf{G}_l$ denote the $l$-th column of $\mathbf{G}$. Hence, $\mathbf{G} \mathbf{D}(\mathbf{n}) \mathbf{G}^T = \sum_{l=1}^{L} \mathbf{G}_l \mathbf{G}_l^T n_l$. If $n_l = 0$ then $\mathbf{G}_l \mathbf{G}_l^T n_l = \mathbf{0}$, so neither the $l$-th column of $\mathbf{G}$ nor the $l$-th element of $\mathbf{n}$ contributes to $\mathbf{G} \mathbf{D}(\mathbf{n}) \mathbf{G}^T$, and both can be removed without consequences. □
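
Both proofs can be checked numerically on a small example. In the Python sketch below, the frequency vector and the matrices `A` and `G` are arbitrary hypothetical values (in an actual analysis $\mathbf{G}$ would be derived from $\mathbf{n}$); the point is only that deleting the columns that belong to zero frequencies leaves both products unchanged.

```python
def matvec(A, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def gdgt(G, n):
    """G D(n) G^T, computed as the sum over l of G_l G_l^T n_l."""
    size = len(G)
    return [[sum(G[r][l] * G[c][l] * n[l] for l in range(len(n)))
             for c in range(size)] for r in range(size)]

def reduce_columns(M, keep):
    """Keep only the columns of M whose indices are listed in keep."""
    return [[row[l] for l in keep] for row in M]

n = [4.0, 0.0, 7.0, 0.0, 0.0, 2.0]   # hypothetical frequencies, L = 6
A = [[1, 1, 1, 0, 0, 0],
     [0, 0, 0, 1, 1, 1],
     [1, 1, 1, 1, 1, 1]]
G = [[0.5, -1.0, 0.25, 2.0, 0.0, -0.75],
     [1.5, 3.0, -0.5, 0.1, 0.2, 0.3]]

keep = [l for l, n_l in enumerate(n) if n_l > 0]   # observed patterns only
n_star = [n[l] for l in keep]
A_star = reduce_columns(A, keep)
G_star = reduce_columns(G, keep)

print(matvec(A, n), matvec(A_star, n_star))
print(gdgt(G, n), gdgt(G_star, n_star))
```

The reduced vector has length $L^* = 3$ instead of $L = 6$, and both pairs of results agree.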

In general, direct computation of the design matrices $\mathbf{A}_1^*$, $\mathbf{A}_2$, and $\mathbf{A}_3$ is unnecessary and can be avoided, which is convenient when the number of observed bivariate frequencies $B$ is large. The procedure is described in Appendix D.

4 MOKKEN SCALE ANALYSIS OF DATA MEASURING TOLERANCE

The use of marginal modelling for the derivation of standard errors and the accompanying confidence intervals is illustrated by means of data from the 2008 European Values Study (EVS 2011). This large-scale cross-national survey provides insight into the basic values, preferences, attitudes, and opinions that people all over Europe have about, for instance, life, work, family, sexual behavior, gender roles, politics, religion, well-being, and tolerance. We analyze data pertaining to the tolerance scale. The tolerance scale consists of 20 items, where one part of the items measures tolerance with respect to material issues, and the other part measures tolerance with respect to interpersonal issues. Each item pertains to a particular controversial behavior, and the respondents had to indicate the degree to which they consider the behavior to be justified. Examples are “Do you justify adultery?”, “Do you justify euthanasia?”, and “Do you justify prostitution?”. In the original data set, the answer categories ranged from 1 (never) to 10 (always). The more extreme response categories were almost never chosen by respondents, and so the corresponding cell frequencies were close to or equal to zero. For this article, the answer categories were recoded into three categories, with the scores 1 to 3 being recoded into 1, the scores 4 to 7 into 2, and the scores 8 to 10 into 3.
Mokken scale analyses were performed on the data obtained in The Netherlands (N = 1,554), presumably a rather liberal country with respect to tolerance, and the former Soviet republic Georgia (N = 1,500), presumably a rather conservative country (for the computer syntax, see Appendix E). These two countries were chosen to show that in some cases standard errors do affect the conclusions, and in other cases they do not. Since no or almost no cases were in the third category, for the Georgian sample, three items (i.e., items 3, 4, and 16) were deleted from the tolerance scale. Note that for the analyses we used the same items for both samples. However, the scales discussed hereunder are not identical.
For the Dutch sample, the automated item selection procedure (see Section 2.3) produced three scales; only the first scale will be considered here. The first scale consisted of 12 items, and measured tolerance with respect to interpersonal issues. The items included in the scale were: “Do you justify . . . taking soft drugs (item 4); adultery (item 6); homosexuality (item 8); abortion (item 9); divorce (item 10); euthanasia (item 11); suicide (item 12); having casual sex (item 14); avoiding a fare on public transport (item 15); prostitution (item 16); experiments on human embryos (item 17); and in vitro fertilization (item 19)”.
Table 2 shows the sample values of $H_{ij}$ and $H_j$ plus their asymptotic standard errors for the first scale of the Dutch sample. To assess whether the item pair scalability coefficients were significantly greater than zero, 95% confidence intervals were obtained using $\hat{H}_{ij} \pm 1.96 \times se(\hat{H}_{ij})$. For none of the 66 sample $H_{ij}$s was the value zero included in the confidence interval, so all $\hat{H}_{ij}$s were significantly greater than zero. Similarly, 95% confidence intervals were created for $H_j$. Only for item 15 ($\hat{H}_{15} = .303$; s.e. $= .024$) did the confidence interval include the criterion value $c = .3$, so we do not have sufficient evidence that item 15 satisfies the second property of a Mokken scale (i.e., $H_j \geq c$ for all $j$) and thus it may be considered for removal from the scale. Following Mokken’s guidelines, the items form a scale of moderate strength ($\hat{H} = .479$; s.e. $= .012$).
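As a minimal sketch of this computation (not the authors' code), the following R lines build the Wald-type interval, estimate $\pm\ 1.96 \times$ standard error, for item 15 and for the total scale from the values reported above. The helper name `wald_ci` is introduced here only for illustration.

```r
# Hypothetical helper: Wald-type confidence interval, estimate +/- z * se.
wald_ci <- function(estimate, se, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)   # approximately 1.96 for level = 0.95
  c(lower = estimate - z * se, upper = estimate + z * se)
}

ci_item15 <- wald_ci(0.303, 0.024)  # item 15, Dutch sample
ci_total  <- wald_ci(0.479, 0.012)  # total-scale H, Dutch sample

# Does the interval for item 15 contain the criterion value c = .3?
crit <- 0.3
contains_c <- ci_item15["lower"] <= crit && crit <= ci_item15["upper"]

round(ci_item15, 3)   # lower 0.256, upper 0.350
contains_c            # TRUE
```

Because the lower limit for item 15 falls below .3, the data do not rule out $H_{15} < c$, which is the reason the item is flagged in the text.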
Insert Table 2 about here
For the Georgian sample, the automated item selection procedure produced three scales. Only the longest scale, which is the most similar to the Dutch scale, will be considered here. The scale consisted of eight items, measuring tolerance with respect to interpersonal issues. The items included in this scale were: “Do you justify . . . adultery (item 6); divorce (item 10); euthanasia (item 11); having casual sex (item 14); prostitution (item 16); experiments on human embryos (item 17); manipulation of food (item 18); and in vitro fertilization (item 19)”. All item pairs had positive $\hat{H}_{ij}$ values. However, item 16 (prostitution) had an $\hat{H}_j$ value which was lower than the generally accepted lower bound value .3 (i.e., $\hat{H}_{16} = .269$; s.e. $= .066$) and was thus removed from the scale. The fact that an item with an $H_j$ value lower than lower bound $c$ was selected into the scale is an artefact of the method: at the moment that the item was selected into the scale, its $H_j$ value with respect to the items already selected at that point was in excess of $c$, and once an item has been selected, it cannot be deselected anymore (Sijtsma and Molenaar 2002:79-80).
Insert Table 3 about here
A second Mokken scale analysis was performed on the remaining seven items, and Table 3 shows the sample values of $H_{ij}$ and $H_j$, and their asymptotic standard errors. To assess whether the item pair scalability coefficients were greater than zero, 95% confidence intervals were obtained in a similar way to the Dutch sample. For none of the 21 $\hat{H}_{ij}$s was zero included in the confidence interval, so all $\hat{H}_{ij}$s were significantly greater than zero. Also, 95% confidence intervals were created for $H_j$. For items 6 ($\hat{H}_6 = .333$; s.e. $= .039$) and 14 ($\hat{H}_{14} = .345$; s.e. $= .035$) the confidence intervals included the criterion value $c = .3$. So we do not have sufficient evidence that these two items satisfy the second property of a Mokken scale, and thus they may be considered for removal from the scale. The sample value for coefficient $H$ was equal to .402 with a standard error of .028. Although the sample value of $H$ suggests that the items are moderately scalable according to Mokken’s guidelines, using the standard errors suggests that we can only claim that the items are weakly scalable.
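The last step of this reasoning, comparing both the point estimate and the lower confidence limit with Mokken's cut-offs, can be sketched in R as follows. The cut-offs .3, .4, and .5 for weak, moderate, and strong scales are the values usually attributed to Mokken (1971); they are an assumption here because they are not restated in this section, and `classify_H` is a hypothetical helper.

```r
# Hypothetical helper: label a value of H using the assumed cut-offs
# [.3, .4) = weak, [.4, .5) = moderate, [.5, Inf) = strong.
classify_H <- function(h) {
  as.character(cut(h,
                   breaks = c(-Inf, 0.3, 0.4, 0.5, Inf),
                   labels = c("unscalable", "weak", "moderate", "strong"),
                   right  = FALSE))
}

# Total-scale H and standard errors reported in the text.
H_hat <- c(Dutch = 0.479, Georgian = 0.402)
se    <- c(Dutch = 0.012, Georgian = 0.028)

# Lower limit of the 95% Wald-type confidence interval.
lower <- H_hat - qnorm(0.975) * se

data.frame(H_hat,
           label_point = classify_H(H_hat),
           lower       = round(lower, 3),
           label_lower = classify_H(lower))
```

Under these assumptions the Dutch scale is labelled moderate by both criteria, whereas the Georgian scale drops from moderate (point estimate) to weak (lower confidence limit), in line with the conclusion above.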
5 DISCUSSION
For many sample statistics, for example, correlation coefficients, sample means, and regression parameters, standard errors are vital for the interpretation of the size of the effect of the estimated value. This is also true for scalability coefficients, but until recently their standard errors could not be computed. This paper showed how to derive these standard errors. Although the derivation may be technically difficult, in practice the computation of the standard errors is accomplished by means of the R package mokken (Van der Ark 2007), which is free of charge.
In general, it is well-known that standard errors decrease as the sample size $N$ increases (e.g., Tabachnick and Fidell 2007). However, the standard errors of the scalability coefficients are not only functions of the sample size, but also of the skewness of the item-score distributions. The more skewed the item-score distributions are, the larger the standard errors (Agresti 2007:110); this is due to estimates of certain coefficients becoming less accurate as the estimated item step proportions get closer to 0 or 1. So, even with a large sample size standard errors can be large. This makes it even more important to consider standard errors when interpreting scalability coefficients.
In our data analysis, we argued that sample values of the scalability coefficients should be significantly greater than the desired criterion, and we investigated each scalability coefficient separately without correction for multiple testing. These two decisions may be open to debate. In statistical hypothesis testing, the null hypothesis usually states the opposite of what one wants to prove (note that this is not the case in, e.g., model selection tests in structural equation modelling). We wish to test whether the item scalability coefficients are greater than .3, and so the null hypothesis is $H_j \leq .3$. If the burden of proof is reversed, researchers may be tempted to use very small samples (yielding very wide confidence intervals) so that even for low values of $H_j$ and $H$, the guidelines are met.
When the number of items is large, there will also be a large number of item pair and item scalability coefficients. If confidence intervals are constructed simultaneously for all these $H_{ij}$s and $H_j$s, the probability of incorrectly rejecting at least one true null hypothesis (i.e., $H_{ij} \leq 0$ or $H_j \leq c$), that is, of making a Type I error, is much larger than when testing one hypothesis at a time. A correction for this multiple hypothesis testing might be used, for example, the Holm-Bonferroni correction (Holm 1979), which is suited for correlated tests. This results in wider confidence intervals (e.g., 99% or 99.9%), but it may be noted that wider confidence intervals result in lower power.
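To illustrate how such a correction could be applied, the sketch below tests $H_j \leq .3$ against $H_j > .3$ for the seven items of the Georgian scale, using the estimates and standard errors from Table 3, and adjusts the one-sided p-values with the Holm method via the base R function `p.adjust`. The normal approximation for the test statistic is an assumption made here for illustration; it mirrors the Wald-type intervals of Section 4.

```r
# Item coefficients H_j and standard errors for the Georgian sample (Table 3).
Hj <- c("6" = .333, "10" = .476, "11" = .416, "14" = .345,
        "17" = .394, "18" = .436, "19" = .392)
se <- c(.039, .031, .031, .035, .038, .042, .028)
crit <- .3

# One-sided Wald-type tests of H_j <= c against H_j > c (normal approximation).
z_stat <- (Hj - crit) / se
p_raw  <- pnorm(z_stat, lower.tail = FALSE)

# Holm's sequentially rejective correction (Holm 1979).
p_holm <- p.adjust(p_raw, method = "holm")

round(cbind(Hj, se, z = z_stat, p_raw, p_holm), 4)

# Items for which H_j is still significantly greater than c after correction.
names(p_holm)[p_holm < .05]
```

Holm-adjusted p-values are never smaller than the raw ones, which is the loss of power mentioned above.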
An issue that remains to be solved is that the order of the $2z$ item steps (Equation 3) is obtained from the data. In most cases, it is assumed that the ordering of the item steps in the data is identical to the ordering of the item steps in the population. However, when the popularities of two item steps are almost equal in the population, the ordering may be reversed in the sample. This affects the Guttman weights in matrix $\mathbf{A}_3$, because the number of Guttman errors for each item-score pattern depends on the ordering of the item steps. As a result, the reversal may affect the estimates of the scalability coefficients and their standard errors. Investigating the effect of differences in the ordering of item steps between sample and population on the estimates of the scalability coefficients and their standard errors is a topic for future research.
Another topic for future research is to investigate how standard errors affect the automated item selection procedure in Mokken scale analysis. Currently, items are selected into a scale if all sample values of $H_j \geq c$, but as our example showed, this may be too liberal because not all sample values of $H_j$ are significantly greater than $c$.
APPENDIX A. Derivation of Design Matrices for Item Pair Scalability Coefficients
The $2B \times (B + U + 1)$ design matrix $\mathbf{A}_2$ in Equation 15 is used for constructing the expected bivariate frequencies (Equation 5). $\mathbf{A}_2$ consists of several submatrices:

$$
\mathbf{A}_2 = \begin{pmatrix} \mathbf{I}_B & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{P} & -\mathbf{1}_B \end{pmatrix}.
$$

Matrix $\mathbf{I}_B$ is an identity matrix of order $B$; multiplying with $\mathbf{I}_B$ leaves the observed bivariate frequencies unchanged. The $B \times U$ submatrix $\mathbf{P}$ is necessary to obtain the $B$ products of observed univariate frequencies (numerator on the right-hand side of Equation 5). The first $(z + 1)^2$ rows correspond to the first item pair (item 1, item 2); the next $(z + 1)^2$ rows correspond to the second item pair (item 1, item 3), and so on; the $U$ columns correspond to the $U$ observed univariate frequencies. Element $(p, u)$ equals 1 if the $u$-th observed univariate frequency contributes to the $p$-th product of observed univariate frequencies, and element $(p, u)$ equals 0 otherwise. Vector $-\mathbf{1}_B$ is used for dividing the product of observed univariate frequencies (obtained using matrix $\mathbf{P}$) by $N$; this results in the expected bivariate frequencies under independence (Equation 5). Let $\mathbf{e}_{ij} = (e_{ij}^{00}, e_{ij}^{01}, \ldots, e_{ij}^{zz})^T$ contain the $(z + 1)^2$ expected bivariate frequencies pertaining to item $i$ and item $j$. Substituting $\mathbf{A}_1\mathbf{n}$ by the right-hand side of Equation 17, we find that for the vector of observed frequencies in Equation 9 $\exp(\mathbf{A}_2 \log(\mathbf{A}_1\mathbf{n}))$ equals

$$
\exp\left( \begin{pmatrix} \mathbf{I}_B & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{P} & -\mathbf{1}_B \end{pmatrix} \cdot \log \begin{pmatrix} \mathbf{n}_{ab} \\ \mathbf{n}_{ac} \\ \mathbf{n}_{bc} \\ \mathbf{n}_a \\ \mathbf{n}_b \\ \mathbf{n}_c \\ N \end{pmatrix} \right) = \begin{pmatrix} \mathbf{n}_{ab} \\ \mathbf{n}_{ac} \\ \mathbf{n}_{bc} \\ \mathbf{e}_{ab} \\ \mathbf{e}_{ac} \\ \mathbf{e}_{bc} \end{pmatrix}. \tag{20}
$$

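The $(2K + 1) \times 2B$ design matrix $\mathbf{A}_3$ is used to compute the weighted observed and expected frequencies; it has the following form:

$$
\mathbf{A}_3 = \begin{pmatrix} \mathbf{c}_1^T & \mathbf{0} \\ \mathbf{W} & \mathbf{0} \\ \mathbf{0} & \mathbf{W} \end{pmatrix}. \tag{21}
$$

Let $\mathbf{w}_{ij} = (w_{ij}^{00}, w_{ij}^{01}, \ldots, w_{ij}^{zz})^T$ contain the $(z + 1)^2$ Guttman weights (Equation 4) pertaining to item-pair $(i, j)$, then the $K \times B$ matrix

$$
\mathbf{W} = \begin{pmatrix}
\mathbf{w}_{12}^T & \mathbf{0} & \mathbf{0} & \cdots & \mathbf{0} \\
\mathbf{0} & \mathbf{w}_{13}^T & \mathbf{0} & \cdots & \mathbf{0} \\
\mathbf{0} & \mathbf{0} & \mathbf{w}_{14}^T & \cdots & \mathbf{0} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\mathbf{0} & \mathbf{0} & \mathbf{0} & \cdots & \mathbf{w}_{J-1,J}^T
\end{pmatrix}
$$

is a block-diagonal matrix. Vector $\mathbf{c}_1^T$ is a copy of the first row of $\mathbf{W}$; duplicating this row is necessary for constructing the scalar 1 in Equation 6. Substituting $\exp(\mathbf{A}_2 \log(\mathbf{A}_1\mathbf{n}))$ by the right-hand side of Equation 20, we find that for the vector of observed frequencies in Equation 9 $\mathbf{A}_3 \exp(\mathbf{A}_2 \log(\mathbf{A}_1\mathbf{n}))$ equals

$$
\begin{pmatrix} \mathbf{c}_1^T & \mathbf{0} \\ \mathbf{W} & \mathbf{0} \\ \mathbf{0} & \mathbf{W} \end{pmatrix}
\begin{pmatrix} \mathbf{n}_{ab} \\ \mathbf{n}_{ac} \\ \mathbf{n}_{bc} \\ \mathbf{e}_{ab} \\ \mathbf{e}_{ac} \\ \mathbf{e}_{bc} \end{pmatrix}
= \begin{pmatrix} \mathbf{w}_{ab}^T\mathbf{n}_{ab} \\ \mathbf{w}_{ab}^T\mathbf{n}_{ab} \\ \mathbf{w}_{ac}^T\mathbf{n}_{ac} \\ \mathbf{w}_{bc}^T\mathbf{n}_{bc} \\ \mathbf{w}_{ab}^T\mathbf{e}_{ab} \\ \mathbf{w}_{ac}^T\mathbf{e}_{ac} \\ \mathbf{w}_{bc}^T\mathbf{e}_{bc} \end{pmatrix}
= \begin{pmatrix} F_{ab} \\ F_{ab} \\ F_{ac} \\ F_{bc} \\ E_{ab} \\ E_{ac} \\ E_{bc} \end{pmatrix}. \tag{22}
$$

Note that $F_{ij}$ and $E_{ij}$ were introduced in Equation 6.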
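The $(K + 1) \times (2K + 1)$ design matrix $\mathbf{A}_4$ is a concatenation of several submatrices,

$$
\mathbf{A}_4 = \begin{pmatrix} 1 & -1 & \mathbf{0}_{K-1}^T & \mathbf{0}_K^T \\ \mathbf{0}_K & \multicolumn{2}{c}{\mathbf{I}_K} & -\mathbf{I}_K \end{pmatrix}. \tag{23}
$$

Substituting $\mathbf{A}_3 \exp(\mathbf{A}_2 \log(\mathbf{A}_1\mathbf{n}))$ by the right-hand side of Equation 22, we find that for the vector of observed frequencies in Equation 9 $\exp(\mathbf{A}_4 \log(\mathbf{A}_3 \exp(\mathbf{A}_2 \log(\mathbf{A}_1\mathbf{n}))))$ equals

$$
\exp\left( \begin{pmatrix}
1 & -1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & -1 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & -1 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & -1
\end{pmatrix}
\log \begin{pmatrix} F_{ab} \\ F_{ab} \\ F_{ac} \\ F_{bc} \\ E_{ab} \\ E_{ac} \\ E_{bc} \end{pmatrix} \right)
= \begin{pmatrix} 1 \\ F_{ab}/E_{ab} \\ F_{ac}/E_{ac} \\ F_{bc}/E_{bc} \end{pmatrix}. \tag{24}
$$

The $K \times (K + 1)$ design matrix $\mathbf{A}_5$ is a concatenation of a vector of ones of length $K$, and the negative of an identity matrix of order $K$, that is,

$$
\mathbf{A}_5 = \begin{pmatrix} \mathbf{1}_K & -\mathbf{I}_K \end{pmatrix}. \tag{25}
$$
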
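Substituting $\exp(\mathbf{A}_4 \log(\mathbf{A}_3 \exp(\mathbf{A}_2 \log(\mathbf{A}_1\mathbf{n}))))$ by the right-hand side of Equation 24, we find that for the vector of observed frequencies in Equation 9 $\mathbf{A}_5 \exp(\mathbf{A}_4 \log(\mathbf{A}_3 \exp(\mathbf{A}_2 \log(\mathbf{A}_1\mathbf{n}))))$ equals

$$
\begin{pmatrix} 1 & -1 & 0 & 0 \\ 1 & 0 & -1 & 0 \\ 1 & 0 & 0 & -1 \end{pmatrix}
\begin{pmatrix} 1 \\ F_{ab}/E_{ab} \\ F_{ac}/E_{ac} \\ F_{bc}/E_{bc} \end{pmatrix}
= \begin{pmatrix} 1 - F_{ab}/E_{ab} \\ 1 - F_{ac}/E_{ac} \\ 1 - F_{bc}/E_{bc} \end{pmatrix}
= \begin{pmatrix} H_{ab} \\ H_{ac} \\ H_{bc} \end{pmatrix}.
$$
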
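The last two steps of this chain (Equations 23 to 25) can be traced numerically. The R sketch below assumes three items ($K = 3$ item pairs) and uses made-up values for the weighted frequencies $F_{ij}$ and $E_{ij}$; it illustrates the matrix layout only and is not the implementation used in the mokken package.

```r
K <- 3  # number of item pairs for three items a, b, c

# Hypothetical output of Equation 22: (F_ab, F_ab, F_ac, F_bc, E_ab, E_ac, E_bc).
f3 <- c(12, 12, 20, 8, 30, 40, 32)

# Design matrix A4 (Equation 23): first row creates the scalar 1,
# the remaining rows create the ratios F_ij / E_ij on the log scale.
A4 <- rbind(c(1, -1, rep(0, K - 1), rep(0, K)),
            cbind(rep(0, K), diag(K), -diag(K)))

# Design matrix A5 (Equation 25): subtracts each ratio from 1.
A5 <- cbind(rep(1, K), -diag(K))

f4 <- exp(A4 %*% log(f3))   # (1, F_ab/E_ab, F_ac/E_ac, F_bc/E_bc)
H  <- as.vector(A5 %*% f4)  # (H_ab, H_ac, H_bc)

H   # 0.60 0.50 0.75
```

The duplicated $F_{ab}$ in the first two positions of `f3` is what allows the first row of `A4` to produce $\exp(\log F_{ab} - \log F_{ab}) = 1$.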
APPENDIX B. Derivation of Design Matrices for Item Scalability Coefficients
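Matrix $\mathbf{A}_3^\dagger$ can be obtained by pre-multiplying matrix $\mathbf{A}_3$ (Equation 21) by a $(2J + 1) \times (2K + 1)$ matrix $\mathbf{S}^\dagger$. For $i = 1, 2, \ldots, J - 1$, let $\mathbf{J}_{i,J}$ be the $J \times (J - i)$ matrix

$$
\mathbf{J}_{i,J} = \begin{pmatrix} \mathbf{0}_{(i-1) \times (J-i)} \\ \mathbf{1}^T_{1 \times (J-i)} \\ \mathbf{I}_{J-i} \end{pmatrix},
$$

and let $\mathbf{J} = (\mathbf{J}_{1,J}\ \mathbf{J}_{2,J}\ \ldots\ \mathbf{J}_{J-1,J})$; then

$$
\mathbf{S}^\dagger = \begin{pmatrix} 0 & \mathbf{c}_1^T & \mathbf{0} \\ \mathbf{0} & \mathbf{J} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{J} \end{pmatrix}.
$$

Vector $\mathbf{c}_1^T$ is a copy of the first row of $\mathbf{J}$. Matrix $\mathbf{S}^\dagger$ is required in order to add up over the appropriate coefficients $F_{ij}$ and $E_{ij}$ (Equation 7). Substituting $\mathbf{A}_3 \exp(\mathbf{A}_2 \log(\mathbf{A}_1\mathbf{n}))$ by the right-hand side of Equation 22, we find that for the vector of observed frequencies in Equation 9 $\mathbf{S}^\dagger \mathbf{A}_3 \exp(\mathbf{A}_2 \log(\mathbf{A}_1\mathbf{n}))$ equals

$$
\begin{pmatrix}
0 & 1 & 1 & 0 & 0 & 0 & 0 \\
0 & 1 & 1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 1 & 1
\end{pmatrix}
\begin{pmatrix} F_{ab} \\ F_{ab} \\ F_{ac} \\ F_{bc} \\ E_{ab} \\ E_{ac} \\ E_{bc} \end{pmatrix}
= \begin{pmatrix}
\sum_{i \neq a} F_{ia} \\ \sum_{i \neq a} F_{ia} \\ \sum_{i \neq b} F_{ib} \\ \sum_{i \neq c} F_{ic} \\ \sum_{i \neq a} E_{ia} \\ \sum_{i \neq b} E_{ib} \\ \sum_{i \neq c} E_{ic}
\end{pmatrix}.
$$

Design matrices $\mathbf{A}_4^\dagger$ and $\mathbf{A}_5^\dagger$ are very similar to $\mathbf{A}_4$ (Equation 23) and $\mathbf{A}_5$ (Equation 25), respectively. The only difference is that the sizes of the submatrices are $J$ rather than $K$.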
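The block structure of $\mathbf{J}$ and $\mathbf{S}^\dagger$ generalizes to any number of items. The R sketch below builds both matrices from the definitions above and applies $\mathbf{S}^\dagger$ to made-up weighted frequencies for three items; the function names `build_J` and `build_S_dagger` and the numerical values are hypothetical.

```r
# Build the J x K matrix J = (J_{1,J} J_{2,J} ... J_{J-1,J}).
build_J <- function(J) {
  blocks <- lapply(seq_len(J - 1), function(i) {
    rbind(matrix(0, nrow = i - 1, ncol = J - i),  # rows of items before item i
          matrix(1, nrow = 1,     ncol = J - i),  # item i is part of every pair (i, j)
          diag(J - i))                            # partner items j = i + 1, ..., J
  })
  do.call(cbind, blocks)
}

# Assemble S-dagger: a leading zero column, a copy of the first row of J on top,
# and J applied separately to the F part and the E part.
build_S_dagger <- function(J) {
  Jmat  <- build_J(J)
  K     <- ncol(Jmat)
  zeros <- matrix(0, nrow = J, ncol = K)
  rbind(c(0, Jmat[1, ], rep(0, K)),
        cbind(0, Jmat,  zeros),
        cbind(0, zeros, Jmat))
}

# Hypothetical weighted frequencies (F_ab, F_ab, F_ac, F_bc, E_ab, E_ac, E_bc).
f3   <- c(12, 12, 20, 8, 30, 40, 32)
sums <- as.vector(build_S_dagger(3) %*% f3)

# Item coefficients: one minus summed F over summed E, per item.
H_items <- 1 - sums[2:4] / sums[5:7]
round(H_items, 3)   # H_a, H_b, H_c
```

For three items, `build_S_dagger(3)` reproduces the 7 by 7 matrix displayed above; for four items the result has 9 rows and 13 columns, matching the stated order $(2J + 1) \times (2K + 1)$.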
APPENDIX C. Derivation of Design Matrices for the Total-Scale Scalability Coefficient
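Matrix $\mathbf{A}_3^\ddagger$ can be obtained by pre-multiplying matrix $\mathbf{A}_3$ (Equation 21) by a $3 \times (2K + 1)$ matrix $\mathbf{S}^\ddagger$,

$$
\mathbf{S}^\ddagger = \begin{pmatrix} 0 & \mathbf{1}_K^T & \mathbf{0}_K^T \\ 0 & \mathbf{1}_K^T & \mathbf{0}_K^T \\ 0 & \mathbf{0}_K^T & \mathbf{1}_K^T \end{pmatrix}.
$$

Matrix $\mathbf{S}^\ddagger$ is required in order to add up over the appropriate coefficients $F_{ij}$ and $E_{ij}$ (Equation 8). Substituting $\mathbf{A}_3 \exp(\mathbf{A}_2 \log(\mathbf{A}_1\mathbf{n}))$ by the right-hand side of Equation 22, we find that for the vector of observed frequencies in Equation 9 $\mathbf{S}^\ddagger \mathbf{A}_3 \exp(\mathbf{A}_2 \log(\mathbf{A}_1\mathbf{n}))$ equals

$$
\begin{pmatrix}
0 & 1 & 1 & 1 & 0 & 0 & 0 \\
0 & 1 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 & 1
\end{pmatrix}
\begin{pmatrix} F_{ab} \\ F_{ab} \\ F_{ac} \\ F_{bc} \\ E_{ab} \\ E_{ac} \\ E_{bc} \end{pmatrix}
= \begin{pmatrix} \sum\sum_{i \neq j} F_{ij} \\ \sum\sum_{i \neq j} F_{ij} \\ \sum\sum_{i \neq j} E_{ij} \end{pmatrix}.
$$

Using design matrices

$$
\mathbf{A}_4^\ddagger = \begin{pmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \end{pmatrix}
\quad \text{and} \quad
\mathbf{A}_5^\ddagger = \begin{pmatrix} 1 & -1 \end{pmatrix}
$$

in Equation 19 yields coefficient $H$.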
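For completeness, the same kind of numerical trace can be made for the total-scale coefficient. The R sketch below assumes three items ($K = 3$) and made-up weighted frequencies; only the matrix shapes follow the text, and the variable names are hypothetical.

```r
K <- 3

# Hypothetical weighted frequencies (F_ab, F_ab, F_ac, F_bc, E_ab, E_ac, E_bc).
f3 <- c(12, 12, 20, 8, 30, 40, 32)

# S-double-dagger: sums all F_ij (twice) and all E_ij, skipping the duplicated F_ab.
S_total <- rbind(c(0, rep(1, K), rep(0, K)),
                 c(0, rep(1, K), rep(0, K)),
                 c(0, rep(0, K), rep(1, K)))

A4_total <- rbind(c(1, -1,  0),
                  c(0,  1, -1))
A5_total <- matrix(c(1, -1), nrow = 1)

# H = 1 - sum(F_ij) / sum(E_ij), obtained through the log/exp chain.
H_total <- as.vector(A5_total %*% exp(A4_total %*% log(S_total %*% f3)))
H_total   # 1 - 40/102, approximately 0.608
```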
APPENDIX D. Deriving the Matrix of Partial Derivatives
The Jacobian $\mathbf{G}$ is derived by means of a recursive procedure that requires the design matrices derived in Appendices A, B, and C. First, let $\phi(x)$ be a function that either indicates an exponential ($\phi(x) = \exp(x)$, $\phi'(x) = \exp(x)$), a logarithm ($\phi(x) = \log(x)$, $\phi'(x) = 1/x$), or a translation ($\phi(x) = x + c$, where $c$ is some constant value, $\phi'(x) = 1$). Second, let $f_0(\mathbf{n}), f_1(\mathbf{n}), f_2(\mathbf{n}), \ldots, f_q(\mathbf{n})$ be a series of $q + 1$ functions, in which

$$
f_0(\mathbf{n}) = \mathbf{n}, \qquad f_i(\mathbf{n}) = \phi[\mathbf{A}_i f_{i-1}(\mathbf{n})]; \quad \text{for } i = 1, \ldots, q. \tag{26}
$$

The last function in Equation 26 is

$$
f_q(\mathbf{n}) = g(\mathbf{n}).
$$

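For example, for coefficient $H_{ij}$ in Equation 15, $f_0(\mathbf{n}) = \mathbf{n}$, $f_1(\mathbf{n}) = \log(\mathbf{A}_1 f_0(\mathbf{n})) = \log(\mathbf{A}_1\mathbf{n})$, $f_2(\mathbf{n}) = \exp(\mathbf{A}_2 f_1(\mathbf{n})) = \exp(\mathbf{A}_2 \log(\mathbf{A}_1\mathbf{n}))$, and so forth until $f_5(\mathbf{n}) = \mathbf{A}_5 f_4(\mathbf{n}) = g(\mathbf{n})$. Third, the following recursive relationship can be derived for the partial derivatives of the functions $f_i(\mathbf{n})$:

$$
\frac{\partial f_0(\mathbf{n})}{\partial \mathbf{n}^T} = \mathbf{I},
$$

and

$$
\frac{\partial f_i(\mathbf{n})}{\partial \mathbf{n}^T} = \mathbf{D}[\phi'(\mathbf{A}_i f_{i-1})]\, \mathbf{A}_i\, \frac{\partial f_{i-1}(\mathbf{n})}{\partial \mathbf{n}^T}; \quad \text{for } i = 1, \ldots, q. \tag{27}
$$

Note that if $\phi$ indicates an exponential, then Equation 27 equals

$$
\frac{\partial f_i(\mathbf{n})}{\partial \mathbf{n}^T} = \mathbf{D}[\exp(\mathbf{A}_i f_{i-1})]\, \mathbf{A}_i\, \frac{\partial f_{i-1}(\mathbf{n})}{\partial \mathbf{n}^T};
$$

if $\phi$ indicates a logarithm, then Equation 27 equals

$$
\frac{\partial f_i(\mathbf{n})}{\partial \mathbf{n}^T} = \mathbf{D}^{-1}(\mathbf{A}_i f_{i-1})\, \mathbf{A}_i\, \frac{\partial f_{i-1}(\mathbf{n})}{\partial \mathbf{n}^T};
$$

and if $\phi$ indicates a translation, then Equation 27 equals

$$
\frac{\partial f_i(\mathbf{n})}{\partial \mathbf{n}^T} = \mathbf{A}_i\, \frac{\partial f_{i-1}(\mathbf{n})}{\partial \mathbf{n}^T}.
$$

Fourth, the Jacobian can be obtained as

$$
\mathbf{G} = \frac{\partial f_q(\mathbf{n})}{\partial \mathbf{n}^T}.
$$

For example, to obtain the Jacobian of $H_{ij}$ in Equation 15, Equation 27 is applied recursively for $i = 1, 2, 3, 4, 5$.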
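The recursion can be written as a short loop. The R sketch below is a generic illustration under simplifying assumptions: each step is described by a design matrix and the name of $\phi$ (`"log"`, `"exp"`, or `"identity"` for a translation with $c = 0$), and the toy function $g(\mathbf{n}) = n_1 / n_2$ replaces the actual chain of Equation 15. The function name `jacobian_chain` is hypothetical.

```r
# Evaluate f_q(n) and its Jacobian by applying the recursion step by step.
jacobian_chain <- function(n, steps) {
  f <- n
  G <- diag(length(n))                       # d f_0 / d n^T = I
  for (s in steps) {
    x <- as.vector(s$A %*% f)                # A_i f_{i-1}
    d <- switch(s$phi,                       # phi'(A_i f_{i-1})
                exp      = exp(x),
                log      = 1 / x,
                identity = rep(1, length(x)))
    G <- (d * s$A) %*% G                     # D[phi'(.)] A_i (d f_{i-1} / d n^T)
    f <- switch(s$phi, exp = exp(x), log = log(x), identity = x)
  }
  list(value = f, jacobian = G)
}

# Toy example: g(n) = exp(log(n1) - log(n2)) = n1 / n2.
steps <- list(list(A = diag(2),                     phi = "log"),
              list(A = matrix(c(1, -1), nrow = 1),  phi = "exp"))
res <- jacobian_chain(c(6, 3), steps)

res$value      # 2
res$jacobian   # 1/3 and -2/3, i.e. (1/n2, -n1/n2^2)
```

Multiplying `d * s$A` scales row $r$ of $\mathbf{A}_i$ by the $r$-th derivative, which equals pre-multiplication by the diagonal matrix $\mathbf{D}[\phi'(\cdot)]$ without forming that matrix explicitly.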
The recursive procedure in Equation 27 for $i = 1, 2, 3$ can be avoided by computing $f_3$ and $\partial f_3(\mathbf{n}^*) / \partial \mathbf{n}^{*T}$ directly from the data. This has the advantage that the first three design matrices need not be computed separately. In the recursive procedure described above, for $i = 3$ and for the reduced vector $\mathbf{n}^*$, Equation 27 equals

$$
\frac{\partial f_3(\mathbf{n}^*)}{\partial \mathbf{n}^{*T}} = \mathbf{D}[\phi'(\mathbf{A}_3 f_2)]\, \mathbf{A}_3\, \frac{\partial f_2(\mathbf{n}^*)}{\partial \mathbf{n}^{*T}} = \mathbf{A}_3\, \mathbf{D}[\exp(\mathbf{A}_2 \log(\mathbf{A}_1^* \mathbf{n}^*))]\, \mathbf{A}_2\, \mathbf{D}^{-1}[\mathbf{A}_1^* \mathbf{n}^*]\, \mathbf{A}_1^*. \tag{28}
$$

Let $\mathbf{M}^*$ be a $B \times L^*$ matrix relating the $B$ bivariate frequencies to the $L^*$ observed response patterns. Suppose that the $b$-th row of $\mathbf{M}^*$ pertains to bivariate frequency $n_{ij}^{xy}$; then element $(b, l)$ equals $e_{ij}^{xy}/n_i^x + e_{ij}^{xy}/n_j^y - 1$ if in the $l$-th response pattern the score on item $i$ equals $x$ and the score on item $j$ equals $y$; element $(b, l)$ equals $e_{ij}^{xy}/n_i^x - 1$ if in the $l$-th response pattern the score on item $i$ equals $x$ and the score on item $j$ does not equal $y$; element $(b, l)$ equals $e_{ij}^{xy}/n_j^y - 1$ if in the $l$-th response pattern the score on item $i$ does not equal $x$ and the score on item $j$ equals $y$; and element $(b, l)$ equals $-1$ if in the $l$-th response pattern the score on item $i$ does not equal $x$ and the score on item $j$ does not equal $y$. Let $\mathbf{B}^*$ (of order $B \times L^*$) be the reduced version of matrix $\mathbf{B}$ introduced in Equation 16, and let $\mathbf{W}$ be the $K \times B$ matrix of Guttman weights (see Appendix A, Equation 21). Tedious yet straightforward algebra shows that Equation 28 is equal to

$$
\frac{\partial f_3(\mathbf{n}^*)}{\partial \mathbf{n}^{*T}} = \begin{pmatrix} \mathbf{c}_1^T \\ \mathbf{W}\mathbf{B}^* \\ \mathbf{W}\mathbf{M}^* \end{pmatrix},
$$

where $\mathbf{c}_1^T$ is a copy of the first row of $\mathbf{W}\mathbf{B}^*$. The proof can be obtained from the first author. For the subsequent steps in the recursive procedure described in this Appendix, $f_3 = \mathbf{A}_3 \exp(\mathbf{A}_2 \log(\mathbf{A}_1^* \mathbf{n}^*))$ equals $(F_{12}, F_{12}, F_{13}, \ldots, F_{J-1,J}, E_{12}, E_{13}, \ldots, E_{J-1,J})$ (cf. Equation 22 in Appendix A), which can be computed directly from the data. Note that $\mathbf{c}_1^T$ yields a duplication of $F_{12}$.
APPENDIX E. Data and R Code of Examples
The data used in the real-data example were collected in the 2008 wave of the European Values Study (EVS 2011). From these data, two data sets have been made available: Data set EVS2008.NL contains the scores on the 12 tolerance items pertaining to the largest Mokken scale for the Dutch sample, and EVS2008.GE contains the scores on the 7 tolerance items pertaining to the largest Mokken scale for the Georgian sample. In both data sets the items have been recoded from ten into three categories, and cases with missing values have been deleted. The R code installs the R package mokken, reads the data, and computes the scalability coefficients and their standard errors. Following R conventions, R> indicates the R prompt.

R> # Install mokken package if necessary.
R> if(is.na(packageDescription("mokken")[[1]])) install.packages("mokken")
R> library(mokken)

R> # Read data
R> EVS2008.NL <- read.table(file="http://spitswww.uvt.nl/~s544594/Data/EVS08NL.txt")
R> EVS2008.GE <- read.table(file="http://spitswww.uvt.nl/~s544594/Data/EVS08GE.txt")

R> # Compute scalability coefficients and standard errors.
R> coefH(EVS2008.NL)
R> coefH(EVS2008.GE)
REFERENCES

Agresti, Alan. 2007. An Introduction to Categorical Data Analysis (2nd ed.). Hoboken, NJ: John Wiley & Sons.
Bergsma, Wicher P. 1997. Marginal Models for Categorical Data. Tilburg, The Netherlands: Tilburg University Press. Retrieved September 1, 2012, from: http://stats.lse.ac.uk/bergsma/pdf/bergsma_phdthesis.pdf.
Bergsma, Wicher P., Marcel A. Croon, and Jacques A. Hagenaars. 2009. Marginal Models for Dependent, Clustered, and Longitudinal Categorical Data. New York: Springer.
EVS 2011: European Values Study 2008, 4th wave, Integrated Dataset (EVS 2008). GESIS Data Archive, Cologne, Germany, ZA4800 Data File Version 3.0.0 (2012-04-10), doi:10.4232/1.11004.
Gow, Alan J., Roger Watson, Martha Whiteman, and Ian J. Deary. 2011. “A Stairway to Heaven? Structure of the Religious Involvement Inventory and Spiritual Well-Being Scale.” Journal of Religion and Health 50:5-19.
Guttman, Louis. 1950. “The Basis for Scalogram Analysis.” Pp. 60-90 in Measurement and Prediction, Studies in Social Psychology in World War II, Vol. 4, edited by Samuel A. Stouffer et al. Princeton, NJ: Princeton University Press.
Hendriks Vettehen, Paul G. J., C. P. M. Hagemann, and L. B. Van Snippenburg. 2004. “Political Knowledge and Media Use in the Netherlands.” European Sociological Review 20:415-424.
Holm, Sture. 1979. “A Simple Sequentially Rejective Multiple Test Procedure.” Scandinavian Journal of Statistics 6:65-70.
Kritzer, Herbert M. 1977. “Analyzing Measures of Association Derived from Contingency Tables.” Sociological Methods and Research 5:35-50.
Ligtvoet, Rudy, L. Andries van der Ark, Janneke M. te Marvelde, and Klaas Sijtsma. 2010. “Investigating Invariant Item Ordering for Polytomously Scored Items.” Educational and Psychological Measurement 70:578-595.
Loner, Enzo. 2008. “The Importance of Having a Different Opinion. Europeans and GM Foods.” European Journal of Sociology 49:31-63.
Mokken, Robert J. 1971. A Theory and Procedure of Scale Analysis. The Hague/Berlin: Mouton/De Gruyter.
Molenaar, Ivo W. 1991. “A Weighted Loevinger H-Coefficient Extending Mokken Scaling to Multicategory Items.” Kwantitatieve Methoden 37:97-117.
Ommundsen, Reidar, Sven Mörch, Tony Hak, Knud S. Larsen, and Kees Van der Veer. 2002. “Attitudes Toward Illegal Immigration: A Cross-National Methodological Comparison.” The Journal of Psychology 136:103-110.
Sijtsma, Klaas and Ivo W. Molenaar. 2002. Introduction to Nonparametric Item Response Theory. Thousand Oaks, CA: Sage.
Tabachnick, Barbara G. and Linda S. Fidell. 2007. Using Multivariate Statistics (5th ed.). Boston, MA: Pearson Education.
Van Abswoude, Alexandra A. H., L. Andries van der Ark, and Klaas Sijtsma. 2004. “A Comparative Study of Test Data Dimensionality Assessment Procedures Under Nonparametric IRT Models.” Applied Psychological Measurement 28:3-24.
Van der Ark, L. Andries. 2001. “Relationships and Properties of Polytomous Item Response Theory Models.” Applied Psychological Measurement 25:273-282.
Van der Ark, L. Andries. 2007. “Mokken Scale Analysis in R.” Journal of Statistical Software 20:1-19.
Van der Ark, L. Andries and Wicher P. Bergsma. 2010. “A Note on Stochastic Ordering of the Latent Trait Using the Sum of Polytomous Item Scores.” Psychometrika 75:272-279.
Van der Ark, L. Andries, Marcel A. Croon, and Klaas Sijtsma. 2008. “Mokken Scale Analysis for Dichotomous Items Using Marginal Models.” Psychometrika 73:183-208.
Van Onna, Marieke J. H. 2004. “Estimates of the Sampling Distribution of Scalability Coefficient H.” Applied Psychological Measurement 28:427-449.
Webber, Martin P. and Peter J. Huxley. 2007. “Measuring Access to Social Capital: The Validity and Reliability of the Resource Generator-UK and its Association with Common Mental Disorder.” Social Science and Medicine 65:481-492.
Weijmar Schultz, Willibrord C. M. and Harry B. M. van der Wiel. 1991. Sexual Functioning after Gynaecological Cancer Treatment. Groningen, The Netherlands: Dijkhuizen Van Zanten B. V.
Weisstein, Eric W. 2011. “Euler’s Homogeneous Function Theorem.” From: MathWorld, A Wolfram Web Resource. http://mathworld.wolfram.com/EulersHomogeneousFunctionTheorem.html.
Table 1: Cross-Tabulation of Scores on Item a and Item b for N = 178 Respondents; Guttman Weights Are Between Parentheses.

                               X_b
X_a              0         1         2         3       n_ab^{x+}   P(X_a >= x)
0              3 (0)     0 (2)     0 (4)     0 (7)         3          1.000
1              4 (0)     7 (1)     3 (2)     0 (4)        14          0.983
2             10 (0)    22 (0)    34 (0)     3 (1)        69          0.904
3              9 (2)    17 (1)    40 (0)    26 (0)        92          0.517
n_ab^{+y}       26        46        77        29         178
P(X_b >= y)    1.000     0.854     0.596     0.163

Note: Frequencies of response patterns that are not Guttman errors (Guttman weight 0) are printed boldface.
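The Guttman weights and the resulting item pair coefficient for Table 1 can be recomputed from the frequencies alone. The R sketch below assumes the usual weighting scheme in which the item steps are ordered by popularity and, for each score pattern, every passed step is charged with the number of more popular steps that were failed; Equations 4 to 6 are not repeated in this part of the paper, so the code is an illustration under that assumption. It reproduces the weights printed in Table 1.

```r
# Observed frequencies from Table 1 (rows: X_a = 0..3, columns: X_b = 0..3).
n <- matrix(c( 3,  0,  0,  0,
               4,  7,  3,  0,
              10, 22, 34,  3,
               9, 17, 40, 26), nrow = 4, byrow = TRUE)
N <- sum(n)
z <- 3   # highest item score, so each item has z item steps

# Popularities P(X_a >= x) and P(X_b >= y) for x, y = 1, ..., z.
pa <- rev(cumsum(rev(rowSums(n))))[-1] / N
pb <- rev(cumsum(rev(colSums(n))))[-1] / N
ord <- order(c(pa, pb), decreasing = TRUE)   # most popular step first

# Guttman weight of score pattern (x, y): for every passed item step,
# count the more popular item steps that were failed.
guttman_weight <- function(x, y) {
  passed <- c(seq_len(z) <= x, seq_len(z) <= y)[ord]
  sum(cumsum(!passed)[passed])
}
w <- outer(0:z, 0:z, Vectorize(guttman_weight))

# Expected frequencies under independence and the weighted sums F and E.
e    <- outer(rowSums(n), colSums(n)) / N
F_ab <- sum(w * n)
E_ab <- sum(w * e)
H_ab <- 1 - F_ab / E_ab

w                 # same weights as between parentheses in Table 1
round(H_ab, 3)    # 0.474
```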
Table 2: Scalability Coefficients Hij and Hj and their Standard Errors (between Brackets) for 12 Items Measuring Tolerance with Respect to Interpersonal Issues for the Dutch Sample.

Item                     H_j    H_ij with item
                                 4      6      8      9      10     11     12     14     15     16     17
4: Soft Drugs            .436
                        (.019)
6: Adultery              .412   .325
                        (.022) (.036)
8: Homosexuality         .584   .652   .539
                        (.018) (.044) (.060)
9: Abortion              .554   .432   .460   .662
                        (.015) (.033) (.035) (.024)
10: Divorce              .573   .518   .475   .733   .750
                        (.015) (.037) (.040) (.023) (.021)
11: Euthanasia           .544   .502   .490   .588   .715   .635
                        (.016) (.040) (.047) (.027) (.022) (.024)
12: Suicide              .448   .437   .426   .596   .533   .524   .531
                        (.016) (.034) (.037) (.032) (.028) (.029) (.025)
14: Casual Sex           .510   .585   .526   .643   .557   .544   .530   .471
                        (.016) (.029) (.035) (.037) (.029) (.027) (.031) (.029)
15: Fare Public Transp.  .303   .282   .234   .457   .313   .345   .367   .273   .398
                        (.024) (.036) (.036) (.062) (.036) (.040) (.046) (.036) (.035)
16: Prostitution         .492   .458   .428   .522   .520   .639   .587   .455   .609   .336
                        (.016) (.032) (.036) (.029) (.027) (.026) (.027) (.028) (.027) (.036)
17: Human Embryos        .370   .297   .338   .414   .514   .429   .436   .313   .333   .134   .356
                        (.019) (.037) (.039) (.036) (.029) (.032) (.029) (.029) (.032) (.039) (.029)
19: IVF                  .430   .363   .340   .539   .465   .489   .430   .338   .453   .264   .439   .408
                        (.020) (.050) (.055) (.028) (.031) (.030) (.029) (.032) (.038) (.056) (.030) (.029)
Table 3: Scalability Coefficients Hij and Hj and their Standard Errors (between Brackets) for 7 Items Measuring Tolerance with Respect to Interpersonal Issues for the Georgian Sample.

Item                     H_j    H_ij with item
                                 6      10     11     14     17     18
6: Adultery              .333
                        (.039)
10: Divorce              .476   .399
                        (.031) (.058)
11: Euthanasia           .416   .324   .595
                        (.031) (.054) (.040)
14: Casual Sex           .345   .531   .451   .362
                        (.035) (.055) (.058) (.049)
17: Human Embryos        .394   .253   .419   .418   .230
                        (.038) (.053) (.058) (.052) (.048)
18: Manip. Food          .436   .278   .484   .410   .318   .556
                        (.042) (.059) (.060) (.064) (.061) (.057)
19: IVF                  .392   .254   .452   .364   .275   .462   .514
                        (.028) (.057) (.039) (.038) (.048) (.044) (.056)
ACKNOWLEDGEMENTS

We thank Klaas Sijtsma and two anonymous reviewers for the helpful comments on earlier versions of this manuscript.