Standard Errors and Confidence Intervals for Scalability Coefficients in Mokken Scale Analysis Using Marginal Models

Renske E. Kuijpers, L. Andries van der Ark, and Marcel A. Croon
Tilburg University, Tilburg

October 10, 2012

First author’s address:
Renske E. Kuijpers
Department of Methodology and Statistics, FSW
Tilburg University
P.O. Box 90153
5000 LE Tilburg
The Netherlands
phone: +31 13 466 4030
email: [email protected]

ABSTRACT

Mokken scale analysis is a popular method for scaling dichotomous and polytomous items. Whether or not items form a scale is determined by three types of scalability coefficients: for pairs of items, for items, and for the entire scale. It has become standard practice to interpret the sample values of these scalability coefficients using Mokken’s guidelines, which have been available since the 1970s. For valid assessment of the scalability coefficients, the standard errors of the scalability coefficients must be taken into account. So far, standard errors were not available for scales consisting of Likert items, the most popular item type in sociology, and standard errors could only be computed for dichotomous items if the number of items was small. This study solves these two problems. First, we derived standard errors for Mokken’s scalability coefficients using a marginal modeling framework. These standard errors can be computed for all types of items used in Mokken scale analysis. Second, we proved that the method can be applied to scales consisting of large numbers of items. Third, we applied Mokken scale analysis to a set of polytomous items measuring tolerance. The analysis showed that ignoring standard errors of scalability coefficients might result in incorrect inferences.

keywords: Mokken scale analysis, standard errors, scalability coefficients, marginal models.

1 INTRODUCTION

In the social sciences, researchers often use surveys or questionnaires for measuring the trait or attitude of interest, such as religiosity, tolerance or social capital. Typically, respondents react to a set of indicators of the trait. The indicators are generally referred to as items, and a set of items pertaining to the same trait is referred to as a scale. The respondents receive a score on each item. A summary of a respondent’s item scores, most often the sum of the item scores, produces an estimate of his or her trait level. The sums of the item scores can only be used meaningfully as estimates of the respondents’ trait levels if the scores on the items in the scale are unidimensional and have discrimination power to distinguish trait levels. Mokken scale analysis (Mokken 1971; Sijtsma and Molenaar 2002) is a popular method that can be used to partition a set of items into one or more unidimensional scales, possibly leaving some items unscalable. Some recent sociological studies that used Mokken scale analysis to construct scales investigated topics such as opinions on genetically modified foods (Loner 2008), religious and spiritual beliefs (Gow et al. 2011), political knowledge and media use (Hendriks Vettehen, Hagemann, and Van Snippenburg 2004), social capital (Webber and Huxley 2007), and attitudes toward illegal immigration (Ommundsen et al. 2002).

In Mokken scale analysis, three types of scalability coefficients are used both as criteria for the item partitioning and as diagnostics for the strength of the scales. The coefficients are $H_{ij}$, a coefficient for the scalability of item pair (i, j); $H_j$, a coefficient for the scalability of item j; and H, a coefficient for the scalability of the entire scale. Details of the scalability coefficients are discussed in Section 2. Mokken (1971:184) advocated that items form a scale if and only if

\[ \rho_{ij} > 0 \ \text{(which is equivalent to } H_{ij} > 0\text{)} \quad \text{for all } i < j, \text{ and} \tag{1} \]

\[ H_j \geq c \quad \text{for all } j, \tag{2} \]

where ρ is the product-moment correlation and c some positive lower bound specified by the researcher. He proposed to choose the lower bound c to be at least equal to .3, in order to keep nondiscriminating items and weakly discriminating items out of the scale (Sijtsma and Molenaar 2002). He also advocated that H should be at least .3 and considered a scale to be weakly scalable if .3 ≤ H < .4, moderately scalable if .4 ≤ H < .5, and strongly scalable if H ≥ .5 (Mokken 1971:185), whereas H < .3 meant that the items are unscalable. For example, for the 6-item scale Personal Skills (N = 279), Webber and Huxley (2007) found that all $H_{ij}$s were positive, the values of $H_j$ ranged between .32 and .45, and H = .37. They concluded that Personal Skills had “sufficient scale H values to be useful”. We argue that researchers should take into account the uncertainty of the estimated scalability coefficients when applying Mokken’s heuristic guidelines. The uncertainty is quantified by the standard errors of the estimated values. If the standard error of H is small, then Webber and Huxley’s conclusion is justified, but if the standard error is large (e.g., .08) then there is a reasonable chance that the population value of H is less than .3, and that the set of items that constitute Personal Skills is in fact unscalable following Mokken’s guidelines. A similar line of reasoning applies when $H_{ij}$ and $H_j$ are evaluated.
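The argument about H = .37 can be made concrete with a small numerical sketch. It assumes a symmetric normal-approximation (Wald) interval, which is only one possible way of turning a standard error into a confidence interval; the function name `wald_interval` and the "small" standard error of .02 are illustrative choices, whereas .37 and .08 are the values used in the text.

```python
from statistics import NormalDist


def wald_interval(estimate, se, level=0.95):
    """Symmetric normal-approximation confidence interval (an assumed interval type)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * se, estimate + z * se


h_hat = 0.37  # sample value of H reported by Webber and Huxley (2007)

# 0.02 is a hypothetical "small" standard error; 0.08 is the "large" value from the text.
for se in (0.02, 0.08):
    lower, upper = wald_interval(h_hat, se)
    covers = lower < 0.3 < upper
    print(f"se = {se:.2f}: 95% interval = ({lower:.3f}, {upper:.3f}); contains .3: {covers}")
```

With the larger standard error the interval extends below Mokken's lower bound of .3, which is the situation in which the conclusion that the scale is useful is not supported by the data.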

Although some studies derived standard errors for scalability coefficients, none yielded standard errors for all scalability coefficients that could also be applied to reasonable or large numbers of items. Mokken (1971:164-169) derived asymptotic standard errors of H in case of dichotomous items. Van Onna (2004) used several computer-intensive methods to compute confidence intervals for H both for dichotomous and polytomous items, and advocated using the nonparametric bootstrap for computing a range-preserving confidence interval for H. Van der Ark, Croon and Sijtsma (2008) used marginal modelling as a framework for testing specific hypotheses about scalability coefficients $H_{ij}$, $H_j$, and H. Within this framework they also derived standard errors for $H_{ij}$, $H_j$, and H. However, their approach could only be applied to small sets of dichotomous items. A practical problem is that none of the methods has been implemented in software, which makes the methods unavailable for applied researchers. As a result, standard errors of scalability coefficients are never reported in applications of Mokken scale analysis.

In this paper, we address all the limitations mentioned. We generalize the marginal modelling approach for computing standard errors of scalability coefficients to polytomous items and to large numbers of items. Furthermore, the approach is made available in the software package mokken (Van der Ark 2007). The remainder of this paper is organized as follows. First, we discuss Mokken scale analysis. Second, we discuss the general principle of obtaining standard errors of sample statistics using the marginal modelling approach, we give detailed results for the derivation of standard errors of scalability coefficients for polytomous items, and we discuss how the method can be applied to large numbers of items. Third, we estimate the scalability coefficients and their standard errors for two real-data examples. The examples demonstrate that ignoring the uncertainty of the estimated scalability coefficients may lead to incorrect inferences. Finally, the strengths and weaknesses of the approach are discussed.

2 MOKKEN SCALE ANALYSIS

2.1 The Monotone Homogeneity Model

Mokken scale analysis is based on the monotone homogeneity model (Mokken 1971, Ch. 4; Sijtsma and Molenaar 2002:22-23), which is a nonparametric item response theory (IRT) model for measuring respondents on an ordinal scale. We consider a set of J items numbered 1, 2, ..., J, each having z + 1 ordered answer categories x = 0, 1, ..., z. Let $X_j$ denote the score on item j and let $X_+ = \sum_j X_j$ denote the sum of the J item scores. Let θ denote a possibly multidimensional latent variable (usually referred to as latent trait); often θ values are interpreted in terms of the construct that the items measure in common. IRT models describe the relation between latent trait θ and the probabilities of item scores x, $P(X_j = x \mid \theta)$. The monotone homogeneity model consists of three assumptions:

Unidimensionality: The latent variable θ is unidimensional;

Local independence: The item scores are independent given θ; that is, $P(X_1 = x_1, X_2 = x_2, \ldots, X_J = x_J \mid \theta) = \prod_{j=1}^{J} P(X_j = x_j \mid \theta)$;

Monotonicity: The probability of having a score of at least x on item j, $P(X_j \geq x \mid \theta)$, is a nondecreasing function of θ.

The monotone homogeneity model is a general model in the sense that all other popular unidimensional IRT models are special cases of the monotone homogeneity model (Van der Ark 2001). For practical purposes, the model allows the stochastic ordering of θ by means of $X_+$ (for details, see Van der Ark and Bergsma 2010, and references therein). Hence, only if the monotone homogeneity model fits the data well can the total scale score be used meaningfully to order respondents.

Mokken scale analysis can be regarded as a set of methods to construct scales for which the monotone homogeneity model and other nonparametric IRT models fit well. The general idea is that one investigates observable properties implied by the model. For example, under the monotone homogeneity model all scalability coefficients $H_{ij}$ must be nonnegative. Hence, if a researcher finds that for a particular scale the sample values of $H_{ij}$ are all nonnegative, then this result supports the possibility that the monotone homogeneity model is true, whereas negative $H_{ij}$ values mean that the model must be rejected.

2.2 Scalability Coefficients

2.2.1 Item Steps and Weighted Guttman Errors

Scalability coefficients $H_{ij}$, $H_j$, and H are based on item steps and Guttman errors (Molenaar 1991), which are best explained by means of an example. Table 1 (see Weijmar Schultz and Van der Wiel 1991) shows a cross-classification of the scores of N = 178 respondents on J = 2 items (Item a and Item b), each having z + 1 = 4 ordered answer categories. The frequencies are denoted $n_{ab}^{xy}$, x, y = 0, ..., 3, and the marginal frequencies are denoted $n_{ab}^{x+}$ and $n_{ab}^{+y}$, where the “+” indicates the sum over all categories.

Insert Table 1 about here

Item steps are Boolean statements $X_j \geq x$ (j = 1, ..., J; x = 0, ..., z), indicating whether a respondent has passed the item step ($X_j \geq x$) or not ($X_j < x$). The popularity of an item step is determined by means of the proportion of respondents who have passed the item step, $P(X_j \geq x)$. It may be noted that $P(X_j \geq 0) = 1$ by definition, and this probability thus is not informative. The ordering of the 2z item steps in Table 1 by descending popularity equals

\[ X_a \geq 1,\; X_a \geq 2,\; X_b \geq 1,\; X_b \geq 2,\; X_a \geq 3,\; X_b \geq 3. \tag{3} \]

Respondents who did not pass any item step have item-score pattern (0, 0); respondents who have passed one item step most likely have passed the most popular item step $X_a \geq 1$, producing item-score pattern (1, 0); respondents who have passed two item steps most likely have passed $X_a \geq 1$ and $X_a \geq 2$, producing item-score pattern (2, 0), and so on. The admissible item-score patterns, that is, the patterns that are consistent with the order of the item steps, are (0,0), (1,0), (2,0), (2,1), (2,2), (3,2), and (3,3) (frequencies printed in boldface in Table 1). Each respondent who passes the h most popular item steps and does not pass the remaining 2z − h less popular item steps has an item-score pattern that is in agreement with the Guttman (1950) model (Molenaar 1991). Such admissible patterns are called conformal patterns. Respondents having item-score pattern (0,3) passed the least popular item step $X_b \geq 3$ but did not pass the more popular item steps $X_a \geq 1$, $X_a \geq 2$, and $X_a \geq 3$. Patterns for which at least one less popular item step has been passed and one more popular item step has not been passed are called Guttman errors (Molenaar 1991). A set of items is perfectly scalable if there are no Guttman errors, and is less scalable as the number of Guttman errors increases.

Molenaar (1991) suggested weighting the frequencies of the Guttman errors depending on the degree of deviation from item-score patterns yielding a perfect scale. The weight for the frequency of a particular item-score pattern is computed as follows. We consider all pairs of item steps and we compute the weight equal to the number of pairs of item steps for which the less popular item step was passed and the more popular step was failed. For example, for item-score pattern (0,2) in Table 1, the Guttman weight equals $w_{ab}^{02} = 4$ because for four pairs of item steps, $(X_a \geq 1, X_b \geq 1)$, $(X_a \geq 1, X_b \geq 2)$, $(X_a \geq 2, X_b \geq 1)$, and $(X_a \geq 2, X_b \geq 2)$, the less popular item step was passed and the more popular step was failed (e.g., for pair $(X_a \geq 1, X_b \geq 1)$, the less popular item step $X_b \geq 1$ was passed, but the more popular item step $X_a \geq 1$ was failed). The weights are shown between parentheses in each cell of Table 1. Note that the boldface conformal item-score patterns have a weight equal to zero.

For computational purposes, we give a formula for computing the weights (also see Ligtvoet et al. 2010). Let the 2z item steps be ordered by descending popularity (cf. Equation 3), and let $\mathbf{q}_{ij}^{xy} = (q_{ij(1)}^{xy}, q_{ij(2)}^{xy}, \ldots, q_{ij(2z)}^{xy})$ be a vector consisting of zeroes and ones indicating for item-score pattern $(X_i = x, X_j = y)$ whether an item step has been passed (1) or not (0). Then weight $w_{ij}^{xy}$ equals

\[ w_{ij}^{xy} = \sum_{u=2}^{2z} \left( q_{ij(u)}^{xy} \sum_{v=1}^{u-1} \left| 1 - q_{ij(v)}^{xy} \right| \right). \tag{4} \]

Equation 4 counts how often a score 0 precedes a score 1 in vector $\mathbf{q}_{ij}^{xy}$. It may be noted that, for example, for response pattern (0,2) in Table 1, the third and fourth item steps in Equation 3 are passed, and so $\mathbf{q}_{ab}^{02} = (0, 0, 1, 1, 0, 0)$. In $\mathbf{q}_{ab}^{02}$, the score 0 precedes the score 1 four times, and so the weight $w_{ab}^{02}$ equals 4. As a second example, consider the item-score pattern (2,1). Here, the first, second, and third item steps are passed, and thus $\mathbf{q}_{ab}^{21} = (1, 1, 1, 0, 0, 0)$. Here, there are no occasions on which a score 0 precedes a score 1, and thus the weight $w_{ab}^{21}$ is equal to 0.
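The counting rule of Equation 4 can be sketched in a few lines of Python. The names `STEP_ORDER`, `step_vector`, and `guttman_weight` are illustrative and not part of any published software; the item-step ordering is the one given in Equation 3 for Table 1.

```python
# Item steps of Table 1 ordered by descending popularity (Equation 3).
STEP_ORDER = [("a", 1), ("a", 2), ("b", 1), ("b", 2), ("a", 3), ("b", 3)]


def step_vector(x_a, x_b):
    """Vector q: 1 if the item step X_item >= step is passed, 0 otherwise."""
    score = {"a": x_a, "b": x_b}
    return [int(score[item] >= step) for item, step in STEP_ORDER]


def guttman_weight(q):
    """Equation 4: count how often a 0 precedes a 1 in q."""
    weight = 0
    zeros_seen = 0
    for passed in q:
        if passed:
            weight += zeros_seen  # every earlier failed step forms one error pair
        else:
            zeros_seen += 1
    return weight


print(step_vector(0, 2), guttman_weight(step_vector(0, 2)))
print(step_vector(2, 1), guttman_weight(step_vector(2, 1)))
```

The single pass over q is equivalent to the double sum in Equation 4, because for each passed step u the inner sum is simply the number of failed steps that come before it.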

2.2.2 Item Pair Scalability Coefficients

Item pair scalability coefficient $H_{ij}$ compares the sum of weighted observed frequencies of Guttman errors to the sum of weighted frequencies of Guttman errors that is expected under marginal independence of the item scores. Let

\[ e_{ij}^{xy} = \frac{n_{ij}^{x+} \times n_{ij}^{+y}}{N} \tag{5} \]

be the expected bivariate frequency under marginal independence; let $F_{ij}$ and $E_{ij}$ be the sum of weighted observed and expected frequencies of Guttman errors, respectively, for item pair (i, j). Then

\[ H_{ij} = 1 - \frac{F_{ij}}{E_{ij}} = 1 - \frac{\sum_x \sum_y w_{ij}^{xy} n_{ij}^{xy}}{\sum_x \sum_y w_{ij}^{xy} e_{ij}^{xy}}. \tag{6} \]

If there are no Guttman errors, then $H_{ij} = 1$; if there are as many Guttman errors as there are under marginal independence, then $H_{ij} = 0$. Under the monotone homogeneity model, $H_{ij} \geq 0$. Molenaar (1991) showed that $H_{ij}$ can be written as a normed covariance. Let $\sigma_{ij}$ be the covariance between item i and item j and let $\sigma_{ij}^{\max}$ be the maximum covariance between item i and item j, given the marginal distributions of both items. Given that the items both have a positive variance, $H_{ij} = \sigma_{ij} / \sigma_{ij}^{\max}$. For a set of J items, let $K = \tfrac{1}{2} J(J - 1)$ denote the number of item pairs; hence, we have K different coefficients $H_{ij}$.
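Equations 4 through 6 can be combined into a short sketch that computes the item-pair coefficient from a cross-classification of two items. The function names and the example tables are hypothetical. The sketch assumes both items have the same number of categories and positive variance, and it breaks ties in item-step popularity by insertion order, which is an assumption of this example rather than a rule stated in the text.

```python
import numpy as np


def guttman_weight(q):
    """Equation 4: number of times a failed step (0) precedes a passed step (1)."""
    zeros_seen = weight = 0
    for passed in q:
        if passed:
            weight += zeros_seen
        else:
            zeros_seen += 1
    return weight


def pair_scalability(table):
    """H_ij (Equation 6); rows are scores on item i, columns are scores on item j."""
    table = np.asarray(table, dtype=float)
    n_total = table.sum()
    z = table.shape[0] - 1
    row = table.sum(axis=1)  # marginal frequencies n^{x+}
    col = table.sum(axis=0)  # marginal frequencies n^{+y}

    # Popularity of each item step X >= x (x = 1..z), then sort by descending popularity.
    steps = []
    for x in range(1, z + 1):
        steps.append((row[x:].sum(), 0, x))
        steps.append((col[x:].sum(), 1, x))
    steps.sort(key=lambda s: -s[0])  # stable sort: ties keep insertion order

    expected = np.outer(row, col) / n_total  # Equation 5
    F = E = 0.0
    for x in range(z + 1):
        for y in range(z + 1):
            score = (x, y)
            q = [int(score[item] >= step) for _, item, step in steps]
            w = guttman_weight(q)
            F += w * table[x, y]
            E += w * expected[x, y]
    return 1.0 - F / E


# Hypothetical 2 x 2 table: 10 observed Guttman errors against 20 expected.
print(pair_scalability([[30, 10], [20, 40]]))
```

For dichotomous items the result can be checked against the normed covariance: in the hypothetical table the covariance is .1 and the maximum covariance given the margins is .2, which gives the same value of .5.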

2.2.3 The Item Scalability Coefficient

Item scalability coefficient $H_j$ is a generalization of $H_{ij}$; it compares the sum of weighted observed and weighted expected frequencies of Guttman errors for an individual item:

\[ H_j = 1 - \frac{\sum_{i \neq j} F_{ij}}{\sum_{i \neq j} E_{ij}} = 1 - \frac{\sum_{i \neq j} \sum_x \sum_y w_{ij}^{xy} n_{ij}^{xy}}{\sum_{i \neq j} \sum_x \sum_y w_{ij}^{xy} e_{ij}^{xy}}. \tag{7} \]

Under the monotone homogeneity model, $0 \leq H_j \leq 1$. Let $R_{(j)} = X_+ - X_j$ denote the rest score. Sijtsma and Molenaar (2002:57) showed that $H_j$ is equal to the normed covariance between $X_j$ and $R_{(j)}$; that is, $H_j = \sigma_{jR_{(j)}} / \sigma_{jR_{(j)}}^{\max}$. Hence, $H_j$ expresses the strength of the association between item j and the other items in the scale, and can be viewed as the nonparametric analogue of the discrimination parameter in parametric IRT (e.g., Van Abswoude, Van der Ark, and Sijtsma 2004). To keep nondiscriminating items and weakly discriminating items out of the scale, Mokken (1971:184) proposed that all $H_j$s should be greater than some lower bound c > 0. It may be noted that a lower bound c > 0 is not an observable property implied by the monotone homogeneity model.

2.2.4 The Total-Scale Scalability Coefficient

Coefficient H is a generalization of $H_{ij}$ and $H_j$; it compares the sum of weighted observed and weighted expected frequencies of Guttman errors for all J items in the entire scale:

\[ H = 1 - \frac{\sum\sum_{i \neq j} F_{ij}}{\sum\sum_{i \neq j} E_{ij}} = 1 - \frac{\sum\sum_{i \neq j} \sum_x \sum_y w_{ij}^{xy} n_{ij}^{xy}}{\sum\sum_{i \neq j} \sum_x \sum_y w_{ij}^{xy} e_{ij}^{xy}}. \tag{8} \]

H expresses the scalability of all items in the scale. Under the monotone homogeneity model, $0 \leq H \leq 1$. Moreover, Mokken (1971:148-153; also, see Sijtsma and Molenaar 2002, Theorem 4.2) showed that under the monotone homogeneity model, the scalability coefficients are related in such a way that

\[ \min_{i,j}(H_{ij}) \leq \min_{j}(H_j) \leq H \leq \max_{j}(H_j) \leq \max_{i,j}(H_{ij}). \]
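Equations 6, 7, and 8 all use the same ingredients, the pairwise sums $F_{ij}$ and $E_{ij}$, and differ only in how these are pooled. The sketch below makes that explicit; the function name and the numerical values of F and E are hypothetical and serve only as an illustration.

```python
import numpy as np


def scalability_from_errors(F, E):
    """Return (H_ij matrix, H_j vector, H) from symmetric J x J arrays of F_ij and E_ij.

    Diagonal entries are ignored, matching the restriction i != j in Equations 7 and 8.
    """
    F = np.array(F, dtype=float)
    E = np.array(E, dtype=float)
    off_diag = ~np.eye(F.shape[0], dtype=bool)
    F = np.where(off_diag, F, 0.0)
    E = np.where(off_diag, E, 0.0)

    H_ij = np.full(F.shape, np.nan)
    H_ij[off_diag] = 1.0 - F[off_diag] / E[off_diag]  # Equation 6, per item pair
    H_j = 1.0 - F.sum(axis=0) / E.sum(axis=0)         # Equation 7, pooled over i != j
    H = 1.0 - F.sum() / E.sum()                       # Equation 8, pooled over all pairs
    return H_ij, H_j, H


# Hypothetical weighted error sums for J = 3 items.
F = [[0, 4, 6], [4, 0, 9], [6, 9, 0]]
E = [[0, 10, 12], [10, 0, 15], [12, 15, 0]]
H_ij, H_j, H = scalability_from_errors(F, E)
print(np.round(H_ij, 3), np.round(H_j, 3), round(H, 3))
```

Because each pooled coefficient is a ratio of sums with positive denominators, the hypothetical values reproduce the ordering of minima and maxima stated above.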

2.3 Methods in Mokken Scale Analysis

Mokken scale analysis contains an automated item selection procedure that partitions the set of items into one or more unidimensional scales. A scale is considered a Mokken scale if it satisfies the two criteria as stated in Equations 1 and 2. Moreover, Mokken scale analysis provides several methods for the additional investigation of the assumptions of the monotone homogeneity model and other nonparametric IRT models. A description of these methods is beyond the scope of this paper, and we refer the interested reader to, for example, Mokken (1971) and Sijtsma and Molenaar (2002).

3 STANDARD ERRORS OF SCALABILITY COEFFICIENTS

In marginal modelling of categorical data (e.g., see Bergsma, Croon, and Hagenaars, 2009, and references therein), a two-step method is used to compute standard errors of sample statistics. We describe this method for the scalability coefficients. The first step is to write the scalability coefficients as a function of the frequencies of the observed item-score patterns in the data. A set of J items each with z + 1 ordered answer categories (0, 1, ..., z) produces $L = (z + 1)^J$ possible item-score patterns. Without loss of generality, we assume that item-score patterns are in lexicographic order: going from 00...0 to zz...z with the last digit changing fastest, and the digit in the first column changing slowest. The observed frequencies of the L possible item-score patterns can be collected in a vector $\mathbf{n}$. For example, a set of J = 3 items (denoted by a, b, and c) each with (z + 1) = 3 answer categories has $L = 3^3 = 27$ possible item-score patterns; hence vector $\mathbf{n}$ equals

\[ \mathbf{n} = \begin{pmatrix} n_{abc}^{000} \\ n_{abc}^{001} \\ n_{abc}^{002} \\ n_{abc}^{010} \\ n_{abc}^{011} \\ \vdots \\ n_{abc}^{220} \\ n_{abc}^{221} \\ n_{abc}^{222} \end{pmatrix}. \tag{9} \]
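The lexicographic ordering of item-score patterns can be illustrated with a short sketch that builds the frequency vector of Equation 9 from raw item scores. The function name `pattern_frequencies` and the four-respondent data set are hypothetical.

```python
from collections import Counter
from itertools import product


def pattern_frequencies(data, z):
    """Return all (z + 1)^J patterns in lexicographic order and their observed frequencies."""
    n_items = len(data[0])
    counts = Counter(tuple(row) for row in data)
    # itertools.product varies the last position fastest, as required by the text.
    patterns = list(product(range(z + 1), repeat=n_items))
    n = [counts[p] for p in patterns]
    return patterns, n


# Hypothetical scores of four respondents on J = 3 items with z + 1 = 3 categories.
data = [(0, 0, 0), (0, 0, 1), (0, 0, 1), (2, 2, 2)]
patterns, n = pattern_frequencies(data, z=2)
print(len(n), patterns[:5], n[:5])
```

Most entries of the resulting vector are zero even in this tiny example, which anticipates the storage problem that arises for realistic numbers of items.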

Vector $\mathbf{n}$ in Equation 9 is used throughout to illustrate the approach. Let vector $\mathbf{H}_{ij} = (H_{12}, H_{13}, \ldots, H_{J-1,J})^T$ (the superscript T denotes the transpose) contain all K scalability coefficients $H_{ij}$, and let vector $\mathbf{H}_j = (H_1, H_2, \ldots, H_J)^T$ contain all J scalability coefficients $H_j$. Also, let $\mathbf{g}$ and $\mathbf{g}^{\dagger}$ be vector-valued functions, and let $g^{\ddagger}$ be a scalar function. We show that the scalability coefficients can be written as a function of $\mathbf{n}$; that is,

\[ \mathbf{H}_{ij} = \mathbf{g}(\mathbf{n}), \tag{10} \]

\[ \mathbf{H}_j = \mathbf{g}^{\dagger}(\mathbf{n}), \tag{11} \]

\[ H = g^{\ddagger}(\mathbf{n}). \tag{12} \]

The second step is to use the delta method to obtain the asymptotic standard errors for the scalability coefficients. Let $\mathbf{V}_{\mathbf{n}}$ and $\mathbf{V}_{\mathbf{g}(\mathbf{n})}$ be the asymptotic variance-covariance matrices of $\mathbf{n}$ and $\mathbf{g}(\mathbf{n})$, respectively; let N be the total sample size; and let $\mathbf{D}(\mathbf{x})$ be a diagonal matrix with the elements of vector $\mathbf{x}$ on the diagonal.

If $\mathbf{n}$ is sampled from a multinomial distribution, then

\[ \mathbf{V}_{\mathbf{n}} = \mathbf{D}(\mathbf{n}) - \mathbf{n} N^{-1} \mathbf{n}^T \]

(e.g., Agresti 2007:6). Now, if $\mathbf{G} = \mathbf{G}(\mathbf{n})$ is the Jacobian, which is the matrix of first partial derivatives of $\mathbf{g}(\mathbf{n})$ with respect to $\mathbf{n}$, then according to the delta method

\[ \begin{aligned} \mathbf{V}_{\mathbf{g}(\mathbf{n})} &= \mathbf{G} \mathbf{V}_{\mathbf{n}} \mathbf{G}^T \\ &= \mathbf{G} \left( \mathbf{D}(\mathbf{n}) - \mathbf{n} N^{-1} \mathbf{n}^T \right) \mathbf{G}^T \\ &= \mathbf{G} \mathbf{D}(\mathbf{n}) \mathbf{G}^T - \mathbf{G} \mathbf{n} N^{-1} \mathbf{n}^T \mathbf{G}^T. \end{aligned} \tag{13} \]

In most applications of marginal models, the functions $\mathbf{g}(\cdot)$ are homogeneous of order 0: that is, the value of $\mathbf{g}(\cdot)$ does not change when the values of its arguments are all multiplied by the same constant t:

\[ \mathbf{g}(t\mathbf{n}) = \mathbf{g}(\mathbf{n}). \]

For such functions it does not matter whether $\mathbf{n}$ represents the observed frequencies or the observed probabilities. Functions $\mathbf{g}(\mathbf{n})$ (Equation 10), $\mathbf{g}^{\dagger}(\mathbf{n})$ (Equation 11), and $g^{\ddagger}(\mathbf{n})$ (Equation 12) are also homogeneous of order 0. Euler’s homogeneous function theorem (e.g., Weisstein 2011) now implies that $\mathbf{G}\mathbf{n} = \mathbf{0}$. As a result, the second term in Equation 13 vanishes and Equation 13 reduces to

\[ \mathbf{V}_{\mathbf{g}(\mathbf{n})} = \mathbf{G} \mathbf{D}(\mathbf{n}) \mathbf{G}^T. \tag{14} \]

Taking the square root of the diagonal of $\mathbf{V}_{\mathbf{g}(\mathbf{n})}$ produces the required standard errors.
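The two-step logic can be checked numerically on a deliberately small case. The sketch below uses two dichotomous items, for which the only Guttman error cell is assumed to be (0, 1), and it approximates the Jacobian by central finite differences instead of the analytical derivatives of Appendix D. The function names and the frequencies are hypothetical.

```python
import numpy as np


def h_2x2(n):
    """H_ij for two dichotomous items, n = (n00, n01, n10, n11), error cell (0, 1)."""
    n00, n01, n10, n11 = n
    total = n00 + n01 + n10 + n11
    expected_error = (n00 + n01) * (n01 + n11) / total
    return np.array([1.0 - n01 / expected_error])


def jacobian(g, n, eps=1e-6):
    """Central finite-difference approximation of the matrix of first partial derivatives."""
    n = np.asarray(n, dtype=float)
    columns = []
    for l in range(n.size):
        step = np.zeros_like(n)
        step[l] = eps
        columns.append((g(n + step) - g(n - step)) / (2 * eps))
    return np.column_stack(columns)


n = np.array([30.0, 10.0, 20.0, 40.0])  # hypothetical frequencies
G = jacobian(h_2x2, n)

V_n = np.diag(n) - np.outer(n, n) / n.sum()  # multinomial covariance matrix
V_full = G @ V_n @ G.T                       # Equation 13
V_short = G @ np.diag(n) @ G.T               # Equation 14
se = np.sqrt(np.diag(V_short))

print("G n      :", G @ n)
print("Eq. 13/14:", V_full, V_short)
print("H, se    :", h_2x2(n), se)
```

Multiplying all frequencies by a constant leaves `h_2x2` unchanged, so the product of the Jacobian and the frequency vector is numerically zero and the two variance expressions agree up to rounding.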

We demonstrate how to obtain $\mathbf{g}(\cdot)$ (Equation 10), $\mathbf{g}^{\dagger}(\cdot)$ (Equation 11), and $g^{\ddagger}(\cdot)$ (Equation 12). The notation used in these derivations is called the generalized exp-log notation (Bergsma 1997; Kritzer 1977). Moreover, we also show how to obtain the matrix of first partial derivatives for these functions.

3.1 Generalized Exp-Log Notations for the Three Scalability Coefficients

Let $\mathbf{A}_1$, $\mathbf{A}_2$, $\mathbf{A}_3$, $\mathbf{A}_4$, and $\mathbf{A}_5$ be design matrices to be explained below. Matrix $\mathbf{A}_1$ is explained in detail to give the reader more insight into the generalized exp-log notation. The construction of the other design matrices is relegated to Appendix A. The generalized exp-log notation for $\mathbf{H}_{ij}$ (Equation 10) is

\[ \mathbf{H}_{ij} = \mathbf{g}(\mathbf{n}) = \mathbf{A}_5 \exp(\mathbf{A}_4 \log(\mathbf{A}_3 \exp(\mathbf{A}_2 \log(\mathbf{A}_1 \mathbf{n})))). \tag{15} \]

The notations exp(X) and log(X) denote the exponential and logarithmic functions, applied element-wise to the elements of X.

Let $\mathbf{n}_{ij}$ be the vector containing the $(z + 1)^2$ bivariate frequencies of item pair (i, j). For K item pairs, the total number of bivariate frequencies equals $B = K(z + 1)^2$. Let $\mathbf{n}_j$ be the vector containing the (z + 1) univariate frequencies of item j. For J items the total number of univariate frequencies equals U = J(z + 1). For example, for Equation 9

\[ \mathbf{n}_{ab} = \begin{pmatrix} n_{abc}^{00+} \\ n_{abc}^{01+} \\ n_{abc}^{02+} \\ n_{abc}^{10+} \\ n_{abc}^{11+} \\ n_{abc}^{12+} \\ n_{abc}^{20+} \\ n_{abc}^{21+} \\ n_{abc}^{22+} \end{pmatrix} \quad \text{and} \quad \mathbf{n}_a = \begin{pmatrix} n_{abc}^{0++} \\ n_{abc}^{1++} \\ n_{abc}^{2++} \end{pmatrix}. \]

The (B + U + 1) × L design matrix $\mathbf{A}_1$ consists of three submatrices:

\[ \mathbf{A}_1 = \begin{pmatrix} \mathbf{B} \\ \mathbf{U} \\ \mathbf{1}_L^T \end{pmatrix}. \tag{16} \]

The B × L submatrix $\mathbf{B}$ is necessary to obtain the B observed bivariate frequencies. The first $(z + 1)^2$ rows correspond to the first item pair (item 1, item 2); the next $(z + 1)^2$ rows correspond to the second item pair (item 1, item 3), and so on; the L columns correspond to the L item-score patterns. Element (b, l) equals 1 if the l-th item-score pattern contributes to the b-th bivariate frequency, and element (b, l) equals 0 otherwise. For example, for the vector of observed frequencies in Equation 9, the first row of $\mathbf{B}$, which pertains to observed bivariate frequency $n_{abc}^{00+} = n_{abc}^{000} + n_{abc}^{001} + n_{abc}^{002}$, equals

(1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0).

The U × L submatrix $\mathbf{U}$ is necessary to obtain the U observed univariate frequencies. The first (z + 1) rows correspond to item 1; the next (z + 1) rows correspond to item 2, and so on. Element (u, l) equals 1 if the l-th item-score pattern contributes to the u-th observed univariate frequency, and element (u, l) equals 0 otherwise. For example, for the vector of observed frequencies in Equation 9, the first row of $\mathbf{U}$, which pertains to observed univariate frequency $n_{abc}^{0++}$, equals

(1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0).

Vector $\mathbf{1}_L^T$ is the 1 × L unit vector, that is, a row vector of ones. For the vector of observed frequencies in Equation 9

\[ \mathbf{A}_1 \mathbf{n} = \begin{pmatrix} \mathbf{B} \\ \mathbf{U} \\ \mathbf{1}_L^T \end{pmatrix} \mathbf{n} = \begin{pmatrix} \mathbf{n}_{ab} \\ \mathbf{n}_{ac} \\ \mathbf{n}_{bc} \\ \mathbf{n}_a \\ \mathbf{n}_b \\ \mathbf{n}_c \\ N \end{pmatrix}. \tag{17} \]

Design matrices $\mathbf{A}_2$, $\mathbf{A}_3$, $\mathbf{A}_4$, and $\mathbf{A}_5$ are constructed in a similar way (see Appendix A).
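The construction of the first design matrix can be sketched directly from the description of its three submatrices. The function name `design_matrix_a1` is hypothetical, and the order of the score pairs within an item pair is assumed to be lexicographic, in line with the example vector of bivariate frequencies given earlier. The dense representation is used only for readability and would not scale to realistic numbers of items.

```python
import numpy as np
from itertools import combinations, product


def design_matrix_a1(n_items, z):
    """Stack the bivariate block B, the univariate block U, and a row of ones (Equation 16)."""
    patterns = list(product(range(z + 1), repeat=n_items))  # lexicographic order
    rows = []
    # Block B: item pairs (1,2), (1,3), ...; within a pair, score pairs (0,0), (0,1), ...
    for i, j in combinations(range(n_items), 2):
        for x, y in product(range(z + 1), repeat=2):
            rows.append([int(p[i] == x and p[j] == y) for p in patterns])
    # Block U: items 1, 2, ...; within an item, scores 0, 1, ..., z.
    for j in range(n_items):
        for x in range(z + 1):
            rows.append([int(p[j] == x) for p in patterns])
    # Final row of ones yields the sample size N.
    rows.append([1] * len(patterns))
    return np.array(rows)


A1 = design_matrix_a1(n_items=3, z=2)
print(A1.shape)
print(A1[0])   # first row of B
print(A1[27])  # first row of U
```

Multiplying this matrix by a frequency vector in the order of Equation 9 returns the stacked bivariate margins, univariate margins, and sample size shown in Equation 17.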

The generalized exp-log notation for $\mathbf{H}_j$ (Equation 11) is

\[ \mathbf{H}_j = \mathbf{g}^{\dagger}(\mathbf{n}) = \mathbf{A}_5^{\dagger} \exp(\mathbf{A}_4^{\dagger} \log(\mathbf{A}_3^{\dagger} \exp(\mathbf{A}_2 \log(\mathbf{A}_1 \mathbf{n})))). \tag{18} \]

Note that $\mathbf{A}_1$ and $\mathbf{A}_2$ in Equation 18 are equal to those in Equation 15. Design matrices $\mathbf{A}_3^{\dagger}$, $\mathbf{A}_4^{\dagger}$, and $\mathbf{A}_5^{\dagger}$ are derived in Appendix B.

The generalized exp-log notation for H (Equation 12) is

\[ H = g^{\ddagger}(\mathbf{n}) = \mathbf{A}_5^{\ddagger} \exp(\mathbf{A}_4^{\ddagger} \log(\mathbf{A}_3^{\ddagger} \exp(\mathbf{A}_2 \log(\mathbf{A}_1 \mathbf{n})))). \tag{19} \]

Note that $\mathbf{A}_1$ and $\mathbf{A}_2$ in Equation 19 are equal to those in Equation 15. Design matrices $\mathbf{A}_3^{\ddagger}$, $\mathbf{A}_4^{\ddagger}$, and $\mathbf{A}_5^{\ddagger}$ are derived in Appendix C. Once the design matrices have been constructed, the matrix of partial derivatives $\mathbf{G}$ can be derived (Appendix D). Substituting $\mathbf{G}$ into Equation 14 produces the required standard errors.
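To see why alternating matrix products with element-wise logarithms and exponentials is expressive enough, consider the expected frequencies of Equation 5 for a single 2 × 2 table. A product and a quotient of margins become sums and differences after taking logarithms, so they can be encoded by a matrix with entries 1, 0, and −1. The two small matrices below are constructed for this illustration only; they are not the design matrices of Appendix A, and the frequencies are hypothetical.

```python
import numpy as np

# Hypothetical 2 x 2 table, flattened as n = (n00, n01, n10, n11).
n = np.array([30.0, 10.0, 20.0, 40.0])

# First layer: margins and sample size, as linear functions of n.
A1 = np.array([
    [1, 1, 0, 0],  # n^{0+}
    [0, 0, 1, 1],  # n^{1+}
    [1, 0, 1, 0],  # n^{+0}
    [0, 1, 0, 1],  # n^{+1}
    [1, 1, 1, 1],  # N
])

# Second layer: log e^{xy} = log n^{x+} + log n^{+y} - log N.
A2 = np.array([
    [1, 0, 1, 0, -1],  # e^{00}
    [1, 0, 0, 1, -1],  # e^{01}
    [0, 1, 1, 0, -1],  # e^{10}
    [0, 1, 0, 1, -1],  # e^{11}
])

expected = np.exp(A2 @ np.log(A1 @ n))
print(expected)
```

Further layers of the same kind can then form weighted sums of observed and expected frequencies and their ratio. Note that the logarithm requires every margin to be positive.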

3.2 Standard Errors for Scales Consisting of Large Numbers of Items

A practical problem is that the proposed method for deriving standard errors for scalability coefficients cannot be applied to large numbers of items (cf. Van der Ark et al. 2008). Even for relatively small scales, L can be so large that vector $\mathbf{n}$ and the (B + U + 1) × L matrix $\mathbf{A}_1$ (Equation 16) cannot be stored in computer memory. For large numbers of items, B may also be too large to store $\mathbf{A}_2$ and $\mathbf{A}_3$. For example, for J = 10 Likert items with z + 1 = 5 ordered answer categories, $L = 5^{10} = 9{,}765{,}625$ and $B = \binom{10}{2} 5^2 = 1125$. Two modifications in the generalized exp-log notation reduce the computational burden considerably, so that standard errors of scalability coefficients can be computed for up to approximately 100 items and up to approximately 100,000 respondents. However, for larger data sets, computation may be slow.
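The sizes quoted in this example follow directly from the definitions of L, K, B, and U, as the following short sketch shows. The function name `problem_size` is hypothetical.

```python
from math import comb


def problem_size(n_items, z):
    """Return (L, K, B, U) for n_items items with z + 1 ordered categories each."""
    L = (z + 1) ** n_items    # number of possible item-score patterns
    K = comb(n_items, 2)      # number of item pairs
    B = K * (z + 1) ** 2      # number of bivariate frequencies
    U = n_items * (z + 1)     # number of univariate frequencies
    return L, K, B, U


L, K, B, U = problem_size(n_items=10, z=4)
print(L, K, B, U)
print("entries in a dense A1:", (B + U + 1) * L)
```

The number of entries of a dense first design matrix grows with the product of its row and column counts, which is why storing it in full quickly becomes infeasible.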

The largest contribution to reducing the computational burden is using only the nonzero frequencies in $\mathbf{n}$, which pertain to item-score patterns that are observed in the data, and collecting them in vector $\mathbf{n}^*$. So, all elements of $\mathbf{n}^*$ are positive and the size of $\mathbf{n}^*$, denoted $L^*$, cannot exceed the sample size N. Let a matrix superscripted with an asterisk indicate a reduced matrix, which means that the rows and/or columns pertaining to zero-frequencies have been deleted. Thus, when only the nonzero observed frequencies are used, expression $\mathbf{A}_1 \mathbf{n}$ in Equations 15, 18, and 19 is replaced by $\mathbf{A}_1^* \mathbf{n}^*$, and expression $\mathbf{G} \mathbf{D}(\mathbf{n}) \mathbf{G}^T$ is replaced by $\mathbf{G}^* \mathbf{D}(\mathbf{n}^*) \mathbf{G}^{*T}$. Other matrices used in this paper remain unchanged. Because typically $L^*$ is much smaller than L, the reduced vectors and matrices are small enough to be stored in computer memory. We show that using reduced matrices does not affect the computation of the scalability coefficients and their standard errors.

0885

First, we show that A1 .n = A∗1 .n∗ , which means that Equations 15, 18, and 19 are unaffected

0886 0887 0888

by using reduced matrices.

0889 0890 0891

Proof. Let

PL

l=1

Ai,l nl be the i-th element in vector A1 .n. If nl = 0 then Ai,l nl has no

0892 0893

contribution to the i-th element in A1 .n, and the l-th column of A1 and the l-th element of n

0894 0895

can be removed without consequences. 2

Second, we show that $\mathbf{G}\mathbf{D}(\mathbf{n})\mathbf{G}^T = \mathbf{G}^*\mathbf{D}(\mathbf{n}^*)\mathbf{G}^{*T}$, which means that the computation of the standard errors in Equation 14 is unaffected by using reduced matrices.

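Proof. Let $\mathbf{G}_l$ denote the l-th column of $\mathbf{G}$. Hence, $\mathbf{G}\mathbf{D}(\mathbf{n})\mathbf{G}^T = \sum_{l=1}^{L} \mathbf{G}_l \mathbf{G}_l^T n_l$. If $n_l = 0$ then $\mathbf{G}_l \mathbf{G}_l^T n_l = \mathbf{0}$; so neither the l-th column of $\mathbf{G}$ nor the l-th element of $\mathbf{n}$ contributes to $\mathbf{G}\mathbf{D}(\mathbf{n})\mathbf{G}^T$, and both can be removed without consequences. $\Box$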
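The two proofs can be checked numerically on a toy example. The following Python sketch is an illustration only and is not the code of the mokken package; the frequencies and the matrices standing in for $\mathbf{A}_1$ and $\mathbf{G}$ are made-up numbers, and the helper names are hypothetical.

```python
# Toy check: dropping zero-frequency patterns changes neither A1 n nor G D(n) G^T.
# All numbers are invented for illustration.

def matvec(A, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def gdgt(G, n):
    """G D(n) G^T computed as sum_l n_l * G_l G_l^T (G_l = l-th column of G)."""
    rows = len(G)
    return [[sum(G[r][l] * n[l] * G[s][l] for l in range(len(n)))
             for s in range(rows)] for r in range(rows)]

def drop_columns(M, keep):
    """Keep only the columns whose indices are listed in `keep`."""
    return [[row[l] for l in keep] for row in M]

n = [4, 0, 7, 0, 2]                      # L = 5 patterns, two of them unobserved
A1 = [[1, 1, 0, 0, 1],
      [0, 1, 1, 1, 0]]
G = [[0.5, -1.0, 2.0, 3.0, 0.0],
     [1.5, 4.0, -0.5, 1.0, 2.0]]

keep = [l for l, freq in enumerate(n) if freq > 0]
n_star = [n[l] for l in keep]            # reduced frequency vector, L* = 3
A1_star = drop_columns(A1, keep)
G_star = drop_columns(G, keep)

full_An, reduced_An = matvec(A1, n), matvec(A1_star, n_star)
full_V, reduced_V = gdgt(G, n), gdgt(G_star, n_star)
print(full_An == reduced_An, full_V == reduced_V)
```

In this sketch only the columns belonging to observed patterns are stored, which is the reason the reduced matrices fit in memory when $L^*$ is much smaller than $L$.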

In general, direct computation of the design matrices $\mathbf{A}_1^*$, $\mathbf{A}_2$, and $\mathbf{A}_3$ is unnecessary and can be avoided, which is convenient when the number of observed bivariate frequencies $B$ is large. The procedure is described in Appendix D.

4 MOKKEN SCALE ANALYSIS OF DATA MEASURING TOLERANCE

The use of marginal modelling for the derivation of standard errors and the accompanying confidence intervals is illustrated by means of data from the 2008 European Values Study (EVS 2011). This large-scale cross-national survey provides insight into the basic values, preferences, attitudes, and opinions that people all over Europe have about, for instance, life, work, family, sexual behavior, gender roles, politics, religion, well-being, and tolerance. We analyze data pertaining to the tolerance scale. The tolerance scale consists of 20 items, where one part of the items measures tolerance with respect to material issues, and the other part measures tolerance with respect to interpersonal issues. Each item pertains to a particular controversial behavior, and the respondents had to indicate the degree to which they consider the behavior to be justified. Examples are “Do you justify adultery?”, “Do you justify euthanasia?”, and “Do you justify prostitution?”. In the original data set, the answer categories ranged from 1 (never) to 10 (always). The more extreme response categories were almost never chosen by respondents, and so the corresponding cell frequencies were close to or equal to zero. For this article, the answer categories were recoded into three categories, with the scores 1 to 3 being recoded into 1, the scores 4 to 7 into 2, and 8 to 10 into 3.

Mokken scale analyses were performed on the data obtained in The Netherlands (N = 1,554), presumably a rather liberal country with respect to tolerance, and the former Soviet republic Georgia (N = 1,500), presumably a rather conservative country (for the computer syntax, see Appendix E). These two countries were chosen to show that in some cases standard errors do affect the conclusions, and in other cases they do not. Since no or almost no cases were in the third category, for the Georgian sample, three items (i.e., items 3, 4, and 16) were deleted from the tolerance scale. Note that for the analyses we used the same items for both samples. However, the scales discussed hereunder are not identical.

For the Dutch sample, the automated item selection procedure (see Section 2.3) produced three scales; only the first scale will be considered here. The first scale consisted of 12 items, and measured tolerance with respect to interpersonal issues. The items included in the scale were: “Do you justify ... taking soft drugs (item 4); adultery (item 6); homosexuality (item 8); abortion (item 9); divorce (item 10); euthanasia (item 11); suicide (item 12); having casual sex (item 14); avoiding a fare on public transport (item 15); prostitution (item 16); experiments on human embryos (item 17); and in vitro fertilization (item 19)”.

Table 2 shows the sample values of $H_{ij}$ and $H_j$ plus their asymptotic standard errors for the first scale of the Dutch sample. To assess whether the item pair scalability coefficients were significantly greater than zero, 95% confidence intervals were obtained using $\hat{H}_{ij} \pm 1.96 \cdot se(\hat{H}_{ij})$. For none of the 66 sample $H_{ij}$s the value zero was included in the confidence interval, so all $\hat{H}_{ij}$s were significantly greater than zero. Similarly, 95% confidence intervals were created for $H_j$. Only for item 15 ($\hat{H}_{15} = .303$; s.e. = .024) the confidence interval included the criterion value $c = .3$, so we do not have sufficient evidence that item 15 satisfies the second property of a Mokken scale (i.e., $H_j \ge c$ for all $j$) and thus it may be considered for removal from the scale. Following Mokken’s guidelines, the items form a scale of moderate strength ($\hat{H} = .479$; s.e. = .012).
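The interval computation used above can be sketched in a few lines. The Python snippet below is illustrative only (the function names are hypothetical and it is not part of the mokken package); it plugs in the estimates and standard errors reported above for item 15 and for the total scale of the Dutch sample.

```python
def wald_ci(estimate, se, z=1.96):
    """Return (lower, upper) for the interval estimate +/- z * se."""
    return estimate - z * se, estimate + z * se

def contains(interval, value):
    """True if `value` lies inside the closed interval."""
    lower, upper = interval
    return lower <= value <= upper

ci_item15 = wald_ci(0.303, 0.024)   # H_15, Dutch sample
ci_scale = wald_ci(0.479, 0.012)    # H, Dutch sample

# The criterion c = .3 lies inside the interval for item 15,
# but below the interval for the total scale.
print([round(b, 3) for b in ci_item15], contains(ci_item15, 0.3))
print([round(b, 3) for b in ci_scale], contains(ci_scale, 0.3))
```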

Insert Table 2 about here

For the Georgian sample, the automated item selection procedure produced three scales. Only the longest scale, which is the most similar to the Dutch scale, will be considered here. The scale consisted of eight items, measuring tolerance with respect to interpersonal issues. The items included in this scale were: “Do you justify ... adultery (item 6); divorce (item 10); euthanasia (item 11); having casual sex (item 14); prostitution (item 16); experiments on human embryos (item 17); manipulation of food (item 18); and in vitro fertilization (item 19)”. All item pairs had positive $\hat{H}_{ij}$ values. However, item 16 (prostitution) had an $\hat{H}_j$ value which was lower than the generally accepted lower bound value .3 (i.e., $\hat{H}_{16} = .269$; s.e. = .066) and was thus removed from the scale. The fact that an item with an $H_j$ value lower than lower bound $c$ was selected into the scale is an artefact of the method: at the moment that the item was selected into the scale, its $H_j$ value with respect to the items already selected at that point was in excess of $c$. Once an item has been selected, it cannot be deselected anymore (Sijtsma and Molenaar 2002:79-80).

Insert Table 3 about here

A second Mokken scale analysis was performed on the remaining seven items, and Table 3 shows the sample values of $H_{ij}$ and $H_j$, and their asymptotic standard errors. To assess whether the item pair scalability coefficients were greater than zero, 95% confidence intervals were obtained in a similar way to the Dutch sample. For none of the 21 $\hat{H}_{ij}$s zero was included in the confidence interval, so all $\hat{H}_{ij}$s were significantly greater than zero. Also, 95% confidence intervals were created for $H_j$. For items 6 ($\hat{H}_6 = .333$; s.e. = .039) and 14 ($\hat{H}_{14} = .345$; s.e. = .035) the confidence intervals included the criterion value $c = .3$. So we do not have sufficient evidence that both items satisfy the second property of a Mokken scale, and thus they may be considered for removal from the scale. The sample value for coefficient $H$ was equal to .402 with a standard error of .028. Although the sample value of $H$ suggests that the items are moderately scalable according to Mokken’s guidelines, using the standard errors suggests that we can only claim that the items are weakly scalable.

5 DISCUSSION

For many sample statistics, for example, correlation coefficients, sample means, and regression parameters, standard errors are vital for the interpretation of the size of the effect of the estimated value. This is also true for scalability coefficients, but until recently their standard errors could not be computed. This paper showed how to derive these standard errors. Although the derivation may be technically difficult, in practice the computation of the standard errors is accomplished by means of the R package mokken (Van der Ark 2007), which is free of charge.

In general, it is well-known that standard errors decrease as the sample size N increases (e.g., Tabachnick and Fidell 2007). However, the standard errors of the scalability coefficients are not only functions of the sample size, but also of the skewness of the item-score distributions. The more skewed the item-score distributions are, the larger the standard errors (Agresti 2007:110); this is due to estimates of certain coefficients becoming less accurate as the estimated item step proportions get closer to 0 or 1. So, even with a large sample size standard errors can be large. This makes it even more important to consider standard errors when interpreting scalability coefficients.

In our data analysis, we argued that sample values of the scalability coefficients should be significantly greater than the desired criterion, and we investigated each scalability coefficient separately without correction for multiple testing. These two decisions may be open to debate. In statistical hypothesis testing, the null hypothesis usually states the opposite of what one wants to prove (note that this is not the case in, e.g., model selection tests in structural equation modelling). We wish to test whether the item scalability coefficients are greater than .3, and so the null hypothesis is $H_j \le .3$. If the burden of proof is reversed, researchers may be tempted to use very small samples (yielding very wide confidence intervals) so that even for low values of $H_j$ and $H$, the guidelines are met.

When the number of items is large, there will also be a large number of item pair and item scalability coefficients. If confidence intervals are constructed simultaneously for all these $H_{ij}$s and $H_j$s, the probability of incorrectly rejecting at least one true null hypothesis (i.e., $H_{ij} \le 0$ or $H_j \le c$), that is, of making a Type I error, is much larger than when testing one hypothesis at a time. A correction for this multiple hypothesis testing might be used, for example, the Holm-Bonferroni correction (Holm 1979), which is suited for correlated tests. This results in wider confidence intervals (e.g., 99% or 99.9%), but it may be noted that wider confidence intervals result in a smaller power.
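As an illustration of the correction mentioned above, the following Python sketch implements Holm's step-down rule on a handful of p-values. The p-values are hypothetical, the function name is an assumption of this sketch, and the code is not taken from the mokken package.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm's step-down procedure; returns one boolean per test (True = reject)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda k: p_values[k])   # indices, smallest p first
    reject = [False] * m
    for rank, k in enumerate(order):
        # The (rank + 1)-th smallest p-value is compared with alpha / (m - rank).
        if p_values[k] <= alpha / (m - rank):
            reject[k] = True
        else:
            break   # all remaining (larger) p-values are retained as well
    return reject

p_values = [0.001, 0.04, 0.03, 0.012]   # hypothetical one-sided p-values
print(holm_reject(p_values))
```

The smallest p-value is tested at level alpha/m, as in a plain Bonferroni correction, and each subsequent test uses a less strict level, which is why the procedure loses less power than applying alpha/m to every test.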

An issue that remains to be solved is that the order of the $2z$ item steps (Equation 3) is obtained from the data. In most cases, it is assumed that the ordering of the item steps in the data is identical to the ordering of the item steps in the population. However, when the popularity of two item steps is almost equal in the population, the ordering may be reversed in the sample. This affects the Guttman weights in matrix $\mathbf{A}_3$, because the number of Guttman errors for each item-score pattern depends on the ordering of the item steps. As a result, the reversal may affect the estimates of the scalability coefficients and their standard errors. Investigating the effect of differences in the ordering of item steps between sample and population on the estimates of the scalability coefficients and their standard errors is a topic for future research.

Another topic for future research is to investigate how standard errors affect the automated item selection procedure in Mokken scale analysis. Currently, items are selected into a scale if all sample values of $H_j \ge c$, but as our example showed, this may be too liberal, as not all sample values of $H_j$ are significantly greater than $c$.

APPENDIX A. Derivation of Design Matrices for Item Pair Scalability Coefficients

The $2B \times (B + U + 1)$ design matrix $\mathbf{A}_2$ in Equation 15 is used for constructing the expected bivariate frequencies (Equation 5). $\mathbf{A}_2$ consists of several submatrices:

\[
\mathbf{A}_2 = \begin{pmatrix} \mathbf{I}_B & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{P} & -\mathbf{1}_B \end{pmatrix}.
\]

Matrix $\mathbf{I}_B$ is an identity matrix of order $B$; multiplying with $\mathbf{I}_B$ leaves the observed bivariate frequencies unchanged. The $B \times U$ submatrix $\mathbf{P}$ is necessary to obtain the $B$ products of observed univariate frequencies (numerator on the right-hand side of Equation 5). The first $(z+1)^2$ rows correspond to the first item pair (item 1, item 2); the next $(z+1)^2$ rows correspond to the second item pair (item 1, item 3), and so on; the $U$ columns correspond to the $U$ observed univariate frequencies. Element $(p, u)$ equals 1 if the $u$-th observed univariate frequency contributes to the $p$-th product of observed univariate frequencies, and element $(p, u)$ equals 0 otherwise. Vector $-\mathbf{1}_B$ is used for dividing the product of observed univariate frequencies (obtained using matrix $\mathbf{P}$) by $N$; this results in the expected bivariate frequencies under independence (Equation 5).

Let $\mathbf{e}_{ij} = (e_{ij}^{00}, e_{ij}^{01}, \ldots, e_{ij}^{zz})^T$ contain the $(z+1)^2$ expected bivariate frequencies pertaining to item $i$ and item $j$. Substituting $\mathbf{A}_1\mathbf{n}$ by the right-hand side of Equation 17, we find that for the vector of observed frequencies in Equation 9 $\exp(\mathbf{A}_2 \log(\mathbf{A}_1\mathbf{n}))$ equals

\[
\exp\left( \begin{pmatrix} \mathbf{I}_B & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{P} & -\mathbf{1}_B \end{pmatrix} \cdot \log \begin{pmatrix} \mathbf{n}_{ab} \\ \mathbf{n}_{ac} \\ \mathbf{n}_{bc} \\ \mathbf{n}_a \\ \mathbf{n}_b \\ \mathbf{n}_c \\ N \end{pmatrix} \right) = \begin{pmatrix} \mathbf{n}_{ab} \\ \mathbf{n}_{ac} \\ \mathbf{n}_{bc} \\ \mathbf{e}_{ab} \\ \mathbf{e}_{ac} \\ \mathbf{e}_{bc} \end{pmatrix}. \tag{20}
\]

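The $(2K + 1) \times 2B$ design matrix $\mathbf{A}_3$ is used to compute the weighted observed and expected frequencies; it has the following form:

\[
\mathbf{A}_3 = \begin{pmatrix} \mathbf{c}_1^T & \mathbf{0} \\ \mathbf{W} & \mathbf{0} \\ \mathbf{0} & \mathbf{W} \end{pmatrix}. \tag{21}
\]

Let $\mathbf{w}_{ij} = (w_{ij}^{00}, w_{ij}^{01}, \ldots, w_{ij}^{zz})^T$ contain the $(z+1)^2$ Guttman weights (Equation 4) pertaining to item-pair $(i, j)$, then the $K \times B$ matrix $\mathbf{W}$ is a block-diagonal matrix:

\[
\mathbf{W} = \begin{pmatrix}
\mathbf{w}_{12}^T & \mathbf{0} & \mathbf{0} & \cdots & \mathbf{0} \\
\mathbf{0} & \mathbf{w}_{13}^T & \mathbf{0} & \cdots & \mathbf{0} \\
\mathbf{0} & \mathbf{0} & \mathbf{w}_{14}^T & \cdots & \mathbf{0} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\mathbf{0} & \mathbf{0} & \mathbf{0} & \cdots & \mathbf{w}_{J-1,J}^T
\end{pmatrix}.
\]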

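Vector $\mathbf{c}_1^T$ is a copy of the first row of $\mathbf{W}$; duplicating this row is necessary for constructing the scalar 1 in Equation 6. Substituting $\exp(\mathbf{A}_2 \log(\mathbf{A}_1\mathbf{n}))$ by the right-hand side of Equation 20, we find that for the vector of observed frequencies in Equation 9 $\mathbf{A}_3 \exp(\mathbf{A}_2 \log(\mathbf{A}_1\mathbf{n}))$ equals

\[
\begin{pmatrix} \mathbf{c}_1^T & \mathbf{0} \\ \mathbf{W} & \mathbf{0} \\ \mathbf{0} & \mathbf{W} \end{pmatrix}
\begin{pmatrix} \mathbf{n}_{ab} \\ \mathbf{n}_{ac} \\ \mathbf{n}_{bc} \\ \mathbf{e}_{ab} \\ \mathbf{e}_{ac} \\ \mathbf{e}_{bc} \end{pmatrix}
= \begin{pmatrix} \mathbf{w}_{ab}^T\mathbf{n}_{ab} \\ \mathbf{w}_{ab}^T\mathbf{n}_{ab} \\ \mathbf{w}_{ac}^T\mathbf{n}_{ac} \\ \mathbf{w}_{bc}^T\mathbf{n}_{bc} \\ \mathbf{w}_{ab}^T\mathbf{e}_{ab} \\ \mathbf{w}_{ac}^T\mathbf{e}_{ac} \\ \mathbf{w}_{bc}^T\mathbf{e}_{bc} \end{pmatrix}
= \begin{pmatrix} F_{ab} \\ F_{ab} \\ F_{ac} \\ F_{bc} \\ E_{ab} \\ E_{ac} \\ E_{bc} \end{pmatrix}. \tag{22}
\]

Note that $F_{ij}$ and $E_{ij}$ were introduced in Equation 6.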

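The $(K + 1) \times (2K + 1)$ design matrix $\mathbf{A}_4$ is a concatenation of several submatrices,

\[
\mathbf{A}_4 = \begin{pmatrix} 1 & -1 & \mathbf{0}_{K-1}^T & \mathbf{0}_K^T \\ \mathbf{0}_K & \mathbf{I}_K & & -\mathbf{I}_K \end{pmatrix}. \tag{23}
\]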

Substituting $\mathbf{A}_3 \exp(\mathbf{A}_2 \log(\mathbf{A}_1\mathbf{n}))$ by the right-hand side of Equation 22, we find that for the vector of observed frequencies in Equation 9 $\exp(\mathbf{A}_4 \log(\mathbf{A}_3 \exp(\mathbf{A}_2 \log(\mathbf{A}_1\mathbf{n}))))$ equals

\[
\exp\left( \begin{pmatrix}
1 & -1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & -1 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & -1 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & -1
\end{pmatrix} \log \begin{pmatrix} F_{ab} \\ F_{ab} \\ F_{ac} \\ F_{bc} \\ E_{ab} \\ E_{ac} \\ E_{bc} \end{pmatrix} \right)
= \begin{pmatrix} 1 \\ F_{ab}/E_{ab} \\ F_{ac}/E_{ac} \\ F_{bc}/E_{bc} \end{pmatrix}. \tag{24}
\]

The $K \times (K + 1)$ design matrix $\mathbf{A}_5$ is a concatenation of a unit vector of length $K$, and the negative of an identity matrix of order $K$, that is,

\[
\mathbf{A}_5 = \begin{pmatrix} \mathbf{1}_K & -\mathbf{I}_K \end{pmatrix}. \tag{25}
\]

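Substituting $\exp(\mathbf{A}_4 \log(\mathbf{A}_3 \exp(\mathbf{A}_2 \log(\mathbf{A}_1\mathbf{n}))))$ by the right-hand side of Equation 24, we find that for the vector of observed frequencies in Equation 9 $\mathbf{A}_5 \exp(\mathbf{A}_4 \log(\mathbf{A}_3 \exp(\mathbf{A}_2 \log(\mathbf{A}_1\mathbf{n}))))$ equals

\[
\begin{pmatrix} 1 & -1 & 0 & 0 \\ 1 & 0 & -1 & 0 \\ 1 & 0 & 0 & -1 \end{pmatrix}
\begin{pmatrix} 1 \\ F_{ab}/E_{ab} \\ F_{ac}/E_{ac} \\ F_{bc}/E_{bc} \end{pmatrix}
= \begin{pmatrix} 1 - F_{ab}/E_{ab} \\ 1 - F_{ac}/E_{ac} \\ 1 - F_{bc}/E_{bc} \end{pmatrix}
= \begin{pmatrix} H_{ab} \\ H_{ac} \\ H_{bc} \end{pmatrix}.
\]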
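The last two steps of the chain can be traced numerically. The Python sketch below is an illustration only: the weighted error counts are invented, the matrices standing in for $\mathbf{A}_4$ and $\mathbf{A}_5$ are written out for $K = 3$ item pairs, and the code is not the implementation used in the mokken package.

```python
import math

def matvec(A, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

# Hypothetical weighted Guttman error counts for the pairs (a,b), (a,c), (b,c).
F = [12.0, 20.0, 9.0]    # observed, F_ij
E = [30.0, 25.0, 18.0]   # expected under independence, E_ij

# Vector (F_ab, F_ab, F_ac, F_bc, E_ab, E_ac, E_bc), as in Equation 22.
f3 = [F[0]] + F + E

A4 = [[1, -1, 0, 0, 0, 0, 0],
      [0, 1, 0, 0, -1, 0, 0],
      [0, 0, 1, 0, 0, -1, 0],
      [0, 0, 0, 1, 0, 0, -1]]
A5 = [[1, -1, 0, 0],
      [1, 0, -1, 0],
      [1, 0, 0, -1]]

# exp(A4 log(f3)) turns differences of logs into ratios: (1, F_ab/E_ab, ...).
f4 = [math.exp(x) for x in matvec(A4, [math.log(x) for x in f3])]
# A5 f4 subtracts each ratio from the leading 1, giving H_ij = 1 - F_ij / E_ij.
H = matvec(A5, f4)
print([round(h, 3) for h in H])
```

The duplicated first element of `f3` exists only so that the first row of the stand-in for $\mathbf{A}_4$ can produce the constant 1 as a ratio of two equal quantities.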





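APPENDIX B. Derivation of Design Matrices for Item Scalability Coefficients

Matrix $\mathbf{A}_3^\dagger$ can be obtained by pre-multiplying matrix $\mathbf{A}_3$ (Equation 21) by a $(2J + 1) \times (2K + 1)$ matrix $\mathbf{S}^\dagger$. For $i = 1, 2, \ldots, J - 1$, let $\mathbf{J}_{i,J}$ be the $J \times (J - i)$ matrix

\[
\mathbf{J}_{i,J} = \begin{pmatrix} \mathbf{0}_{(i-1) \times (J-i)} \\ \mathbf{1}_{1 \times (J-i)}^T \\ \mathbf{I}_{J-i} \end{pmatrix},
\]

and let $\mathbf{J} = (\mathbf{J}_{1,J}\; \mathbf{J}_{2,J}\; \ldots\; \mathbf{J}_{J-1,J})$; then

\[
\mathbf{S}^\dagger = \begin{pmatrix} 0 & \mathbf{c}_1^T & \mathbf{0} \\ \mathbf{0} & \mathbf{J} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{J} \end{pmatrix}.
\]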

Vector $\mathbf{c}_1^T$ is a copy of the first row of $\mathbf{J}$. Matrix $\mathbf{S}^\dagger$ is required in order to add up over the appropriate coefficients $F_{ij}$ and $E_{ij}$ (Equation 7). Substituting $\mathbf{A}_3 \exp(\mathbf{A}_2 \log(\mathbf{A}_1\mathbf{n}))$ by the right-hand side of Equation 22, we find that for the vector of observed frequencies in Equation 9 $\mathbf{S}^\dagger \mathbf{A}_3 \exp(\mathbf{A}_2 \log(\mathbf{A}_1\mathbf{n}))$ equals

\[
\begin{pmatrix}
0 & 1 & 1 & 0 & 0 & 0 & 0 \\
0 & 1 & 1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 1 & 1
\end{pmatrix}
\begin{pmatrix} F_{ab} \\ F_{ab} \\ F_{ac} \\ F_{bc} \\ E_{ab} \\ E_{ac} \\ E_{bc} \end{pmatrix}
= \begin{pmatrix}
\sum_{i \neq a} F_{ia} \\ \sum_{i \neq a} F_{ia} \\ \sum_{i \neq b} F_{ib} \\ \sum_{i \neq c} F_{ic} \\ \sum_{i \neq a} E_{ia} \\ \sum_{i \neq b} E_{ib} \\ \sum_{i \neq c} E_{ic}
\end{pmatrix}.
\]

Design matrices $\mathbf{A}_4^\dagger$ and $\mathbf{A}_5^\dagger$ are very similar to $\mathbf{A}_4$ (Equation 23) and $\mathbf{A}_5$ (Equation 25), respectively. The only difference is that the sizes of the submatrices are $J$ rather than $K$.

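APPENDIX C. Derivation of Design Matrices for the Total-Scale Scalability Coefficient

Matrix $\mathbf{A}_3^\ddagger$ can be obtained by pre-multiplying matrix $\mathbf{A}_3$ (Equation 21) by a $3 \times (2K + 1)$ matrix $\mathbf{S}^\ddagger$,

\[
\mathbf{S}^\ddagger = \begin{pmatrix} 0 & \mathbf{1}_K^T & \mathbf{0}_K^T \\ 0 & \mathbf{1}_K^T & \mathbf{0}_K^T \\ 0 & \mathbf{0}_K^T & \mathbf{1}_K^T \end{pmatrix}.
\]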

Matrix $\mathbf{S}^\ddagger$ is required in order to add up over the appropriate coefficients $F_{ij}$ and $E_{ij}$ (Equation 8). Substituting $\mathbf{A}_3 \exp(\mathbf{A}_2 \log(\mathbf{A}_1\mathbf{n}))$ by the right-hand side of Equation 22, we find that for the vector of observed frequencies in Equation 9 $\mathbf{S}^\ddagger \mathbf{A}_3 \exp(\mathbf{A}_2 \log(\mathbf{A}_1\mathbf{n}))$ equals

\[
\begin{pmatrix}
0 & 1 & 1 & 1 & 0 & 0 & 0 \\
0 & 1 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 & 1
\end{pmatrix}
\begin{pmatrix} F_{ab} \\ F_{ab} \\ F_{ac} \\ F_{bc} \\ E_{ab} \\ E_{ac} \\ E_{bc} \end{pmatrix}
= \begin{pmatrix} \sum\sum_{i \neq j} F_{ij} \\ \sum\sum_{i \neq j} F_{ij} \\ \sum\sum_{i \neq j} E_{ij} \end{pmatrix}.
\]

Using design matrices

\[
\mathbf{A}_4^\ddagger = \begin{pmatrix} 1 & -1 & 0 \\ 1 & 0 & -1 \end{pmatrix}
\quad \text{and} \quad
\mathbf{A}_5^\ddagger = \begin{pmatrix} 1 & -1 \end{pmatrix}
\]

in Equation 19 yields coefficient $H$.

APPENDIX D. Deriving the Matrix of Partial Derivatives

The Jacobian $\mathbf{G}$ is derived by means of a recursive procedure that requires the design matrices derived in Appendices A, B, and C. First, let $\phi(x)$ be a function that indicates either an exponential ($\phi(x) = \exp(x)$, $\phi'(x) = \exp(x)$), a logarithm ($\phi(x) = \log(x)$, $\phi'(x) = 1/x$), or a translation ($\phi(x) = x + c$, where $c$ is some constant value, $\phi'(x) = 1$). Second, let $f_0(\mathbf{n}), f_1(\mathbf{n}), f_2(\mathbf{n}), \ldots, f_q(\mathbf{n})$ be a series of $q + 1$ functions, in which

\[
f_0(\mathbf{n}) = \mathbf{n}, \qquad f_i(\mathbf{n}) = \phi[\mathbf{A}_i f_{i-1}(\mathbf{n})]; \text{ for } i = 1, \ldots, q. \tag{26}
\]

The last function in Equation 26 is

\[
f_q(\mathbf{n}) = g(\mathbf{n}).
\]

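For example, for coefficient $H_{ij}$ in Equation 15, $f_0(\mathbf{n}) = \mathbf{n}$, $f_1(\mathbf{n}) = \log(\mathbf{A}_1 f_0(\mathbf{n})) = \log(\mathbf{A}_1\mathbf{n})$, $f_2(\mathbf{n}) = \exp(\mathbf{A}_2 f_1(\mathbf{n})) = \exp(\mathbf{A}_2 \log(\mathbf{A}_1\mathbf{n}))$, and so forth until $f_5(\mathbf{n}) = \mathbf{A}_5 f_4(\mathbf{n}) = g(\mathbf{n})$. Third, the following recursive relationship can be derived for the partial derivatives of the functions $f_i(\mathbf{n})$:

\[
\frac{\partial f_0(\mathbf{n})}{\partial \mathbf{n}^T} = \mathbf{I},
\]

and

\[
\frac{\partial f_i(\mathbf{n})}{\partial \mathbf{n}^T} = \mathbf{D}[\phi'(\mathbf{A}_i f_{i-1})]\, \mathbf{A}_i\, \frac{\partial f_{i-1}(\mathbf{n})}{\partial \mathbf{n}^T}; \text{ for } i = 1, \ldots, q. \tag{27}
\]

Note that if $\phi$ indicates an exponential, then Equation 27 equals

\[
\frac{\partial f_i(\mathbf{n})}{\partial \mathbf{n}^T} = \mathbf{D}[\exp(\mathbf{A}_i f_{i-1})]\, \mathbf{A}_i\, \frac{\partial f_{i-1}(\mathbf{n})}{\partial \mathbf{n}^T};
\]

if $\phi$ indicates a logarithm, then Equation 27 equals

\[
\frac{\partial f_i(\mathbf{n})}{\partial \mathbf{n}^T} = \mathbf{D}^{-1}(\mathbf{A}_i f_{i-1})\, \mathbf{A}_i\, \frac{\partial f_{i-1}(\mathbf{n})}{\partial \mathbf{n}^T};
\]

and if $\phi$ indicates a translation, then Equation 27 equals

\[
\frac{\partial f_i(\mathbf{n})}{\partial \mathbf{n}^T} = \mathbf{A}_i\, \frac{\partial f_{i-1}(\mathbf{n})}{\partial \mathbf{n}^T}.
\]

Fourth, the Jacobian can be obtained as

\[
\mathbf{G} = \frac{\partial f_q(\mathbf{n})}{\partial \mathbf{n}^T}.
\]

For example, to obtain the Jacobian of $H_{ij}$ in Equation 15, Equation 27 is applied recursively for $i = 1, 2, 3, 4, 5$.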
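The recursion can be illustrated on a small exp-log chain. The following Python sketch is a toy example under stated assumptions, not the mokken implementation: the three design matrices and the frequency vector are invented so that $g(\mathbf{n}) = (n_1 - n_2)/N$, whose derivatives are easy to verify by hand, and all function names are hypothetical.

```python
import math

def matvec(A, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def matmul(A, B):
    """Matrix-matrix product for matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def scale_rows(d, M):
    """D[d] M: multiply row i of M by d[i]."""
    return [[d_i * x for x in row] for d_i, row in zip(d, M)]

# (phi, phi') per step type; "identity" is the translation with c = 0.
PHI = {
    "log": (math.log, lambda x: 1.0 / x),
    "exp": (math.exp, math.exp),
    "identity": (lambda x: x, lambda x: 1.0),
}

def g_and_jacobian(n, steps):
    """Evaluate f_q(n) and its Jacobian by applying Equation 27 step by step."""
    f = list(n)
    jac = [[1.0 if i == j else 0.0 for j in range(len(n))] for i in range(len(n))]
    for A, kind in steps:
        phi, dphi = PHI[kind]
        u = matvec(A, f)                                   # A_i f_{i-1}
        jac = scale_rows([dphi(x) for x in u], matmul(A, jac))
        f = [phi(x) for x in u]                            # f_i
    return f, jac

n = [3.0, 5.0, 2.0]                                        # toy frequencies, N = 10
steps = [
    ([[1, 0, 0], [0, 1, 0], [1, 1, 1]], "log"),            # log(n1), log(n2), log(N)
    ([[1, 0, -1], [0, 1, -1]], "exp"),                     # n1/N, n2/N
    ([[1, -1]], "identity"),                               # n1/N - n2/N
]
g, G = g_and_jacobian(n, steps)
print(g, G)
```

For this toy chain the hand-derived partial derivatives are $1/N - (n_1 - n_2)/N^2$, $-1/N - (n_1 - n_2)/N^2$, and $-(n_1 - n_2)/N^2$, which gives a direct check on the single row of `G`.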

The recursive procedure in Equation 27 for $i = 1, 2, 3$ can be avoided by computing $f_3$ and $\partial f_3(\mathbf{n}^*) / \partial \mathbf{n}^{*T}$ directly from the data. This has the advantage that the first three design matrices need not be computed separately. In the recursive procedure described above, for $i = 3$ and for the reduced vector $\mathbf{n}^*$, Equation 27 equals

\[
\begin{aligned}
\frac{\partial f_3(\mathbf{n}^*)}{\partial \mathbf{n}^{*T}}
&= \mathbf{D}[\phi'(\mathbf{A}_3 f_2)]\, \mathbf{A}_3\, \frac{\partial f_2(\mathbf{n}^*)}{\partial \mathbf{n}^{*T}} \\
&= \mathbf{A}_3\, \mathbf{D}[\exp(\mathbf{A}_2 \log(\mathbf{A}_1^* \mathbf{n}^*))]\, \mathbf{A}_2\, \mathbf{D}^{-1}[\mathbf{A}_1^* \mathbf{n}^*]\, \mathbf{A}_1^*.
\end{aligned} \tag{28}
\]

Let $\mathbf{M}^*$ be a $B \times L^*$ matrix relating the $B$ bivariate frequencies to the $L^*$ observed response patterns. Suppose that the $b$-th row of $\mathbf{M}^*$ pertains to bivariate frequency $n_{ij}^{xy}$, then element $(b, l)$ equals $e_{ij}^{xy}/n_i^x + e_{ij}^{xy}/n_j^y - 1$ if in the $l$-th response pattern the score on item $i$ equals $x$ and the score on item $j$ equals $y$; element $(b, l)$ equals $e_{ij}^{xy}/n_i^x - 1$ if in the $l$-th response pattern the score on item $i$ equals $x$ and the score on item $j$ does not equal $y$; element $(b, l)$ equals $e_{ij}^{xy}/n_j^y - 1$ if in the $l$-th response pattern the score on item $i$ does not equal $x$ and the score on item $j$ equals $y$; and element $(b, l)$ equals $-1$ if in the $l$-th response pattern the score on item $i$ does not equal $x$ and the score on item $j$ does not equal $y$. Let $\mathbf{B}^*$ (of order $B \times L^*$) be the reduced version of matrix $\mathbf{B}$ introduced in Equation 16, and let $\mathbf{W}$ be the $K \times B$ matrix of Guttman weights (see Appendix A, Equation 21). Tedious yet straightforward algebra shows that Equation 28 is equal to

\[
\frac{\partial f_3(\mathbf{n}^*)}{\partial \mathbf{n}^{*T}} = \begin{pmatrix} \mathbf{c}_1^T \\ \mathbf{W}\mathbf{B}^* \\ \mathbf{W}\mathbf{M}^* \end{pmatrix},
\]

where $\mathbf{c}_1^T$ is a copy of the first row of $\mathbf{W}\mathbf{B}^*$. The proof can be obtained from the first author. For the subsequent steps in the recursive procedure described in this Appendix, $f_3 = \mathbf{A}_3 \exp(\mathbf{A}_2 \log(\mathbf{A}_1^* \mathbf{n}^*))$ equals $(F_{12}, F_{12}, F_{13}, \ldots, F_{J-1,J}, E_{12}, E_{13}, \ldots, E_{J-1,J})$ (cf. Equation 22 in Appendix A), which can be computed directly from the data. Note that $\mathbf{c}_1^T$ yields a duplication of $F_{12}$.

APPENDIX E. Data and R Code of Examples

The data used in the real-data example were collected in the 2008 wave of the European Values Study (EVS 2011). From these data, two data sets have been made available: Data set EVS2008.NL contains the scores on the 12 tolerance items pertaining to the largest Mokken scale for the Dutch sample, and EVS2008.GE contains the scores on the 7 tolerance items pertaining to the largest Mokken scale for the Georgian sample. In both data sets the items have been recoded from ten into three categories, and cases with missing values have been deleted. The R code installs the R package mokken, reads the data, and computes the scalability coefficients and their standard errors. Following R conventions, R> indicates the R prompt.

R> # Install mokken package if necessary.
R> if(is.na(packageDescription("mokken")[[1]])) install.packages("mokken")
R> library(mokken)

R> # Read data
R> EVS2008.NL <- read.table(file="http://spitswww.uvt.nl/~s544594/Data/EVS08NL.txt")
R> EVS2008.GE <- read.table(file="http://spitswww.uvt.nl/~s544594/Data/EVS08GE.txt")

R> # Compute scalability coefficients and standard errors.
R> coefH(EVS2008.NL)
R> coefH(EVS2008.GE)

REFERENCES

Agresti, Alan. 2007. An Introduction to Categorical Data Analysis (2nd ed.). Hoboken, NJ: John Wiley & Sons.

Bergsma, Wicher P. 1997. Marginal Models for Categorical Data. Tilburg, The Netherlands: Tilburg University Press. Retrieved September 1, 2012, from: http://stats.lse.ac.uk/bergsma/pdf/bergsma_phdthesis.pdf.

Bergsma, Wicher P., Marcel A. Croon, and Jacques A. Hagenaars. 2009. Marginal Models for Dependent, Clustered, and Longitudinal Categorical Data. New York: Springer.

EVS 2011: European Values Study 2008, 4th wave, Integrated Dataset (EVS 2008). GESIS Data Archive, Cologne, Germany, ZA4800 Data File Version 3.0.0 (2012-04-10), doi:10.4232/1.11004.

Gow, Alan J., Roger Watson, Martha Whiteman, and Ian J. Deary. 2011. “A Stairway to Heaven? Structure of the Religious Involvement Inventory and Spiritual Well-Being Scale.” Journal of Religion and Health 50:5-19.

Guttman, Louis. 1950. “The Basis for Scalogram Analysis.” Pp. 60-90 in Measurement and Prediction, Studies in Social Psychology in World War II, Vol. 4, edited by Samuel A. Stouffer et al. Princeton, NJ: Princeton University Press.

Hendriks Vettehen, Paul G. J., C. P. M. Hagemann, and L. B. Van Snippenburg. 2004. “Political Knowledge and Media Use in the Netherlands.” European Sociological Review 20:415-424.

Holm, Sture. 1979. “A Simple Sequentially Rejective Multiple Test Procedure.” Scandinavian Journal of Statistics 6:65-70.

Kritzer, Herbert M. 1977. “Analyzing Measures of Association Derived from Contingency Tables.” Sociological Methods and Research 5:35-50.

Ligtvoet, Rudy, L. Andries van der Ark, Janneke M. te Marvelde, and Klaas Sijtsma. 2010. “Investigating Invariant Item Ordering for Polytomously Scored Items.” Educational and Psychological Measurement 70:578-595.

Loner, Enzo. 2008. “The Importance of Having a Different Opinion. Europeans and GM Foods.” European Journal of Sociology 49:31-63.

Mokken, Robert J. 1971. A Theory and Procedure of Scale Analysis. The Hague/Berlin: Mouton/De Gruyter.

Molenaar, Ivo W. 1991. “A Weighted Loevinger H-Coefficient Extending Mokken Scaling to Multicategory Items.” Kwantitatieve Methoden 37:97-117.

Ommundsen, Reidar, Sven Mörch, Tony Hak, Knud S. Larsen, and Kees Van der Veer. 2002. “Attitudes Toward Illegal Immigration: A Cross-National Methodological Comparison.” The Journal of Psychology 136:103-110.

Sijtsma, Klaas and Ivo W. Molenaar. 2002. Introduction to Nonparametric Item Response Theory. Thousand Oaks, CA: Sage.

Tabachnick, Barbara G. and Linda S. Fidell. 2007. Using Multivariate Statistics (5th ed.). Boston, MA: Pearson Education.

Van Abswoude, Alexandra A. H., L. Andries van der Ark, and Klaas Sijtsma. 2004. “A Comparative Study of Test Data Dimensionality Assessment Procedures Under Nonparametric IRT Models.” Applied Psychological Measurement 28:3-24.

Van der Ark, L. Andries. 2001. “Relationships and Properties of Polytomous Item Response Theory Models.” Applied Psychological Measurement 25:273-282.

Van der Ark, L. Andries. 2007. “Mokken Scale Analysis in R.” Journal of Statistical Software 20:1-19.

Van der Ark, L. Andries and Wicher P. Bergsma. 2010. “A Note on Stochastic Ordering of the Latent Trait Using the Sum of Polytomous Item Scores.” Psychometrika 75:272-279.

Van der Ark, L. Andries, Marcel A. Croon, and Klaas Sijtsma. 2008. “Mokken Scale Analysis for Dichotomous Items Using Marginal Models.” Psychometrika 73:183-208.

Van Onna, Marieke J. H. 2004. “Estimates of the Sampling Distribution of Scalability Coefficient H.” Applied Psychological Measurement 28:427-449.

Webber, Martin P. and Peter J. Huxley. 2007. “Measuring Access to Social Capital: The Validity and Reliability of the Resource Generator-UK and its Association with Common Mental Disorder.” Social Science and Medicine 65:481-492.

Weijmar Schultz, Willibrord C. M. and Harry B. M. van der Wiel. 1991. Sexual Functioning after Gynaecological Cancer Treatment. Groningen, The Netherlands: Dijkhuizen Van Zanten B. V.

Weisstein, Eric W. 2011. “Euler’s Homogeneous Function Theorem.” From: MathWorld--A Wolfram Web Resource. http://mathworld.wolfram.com/EulersHomogeneousFunctionTheorem.html.

Table 1: Cross-Tabulation of Scores on Item a and Item b for N=178 Respondents; Guttman Weights Are Between Parentheses.

                               X_b
X_a               0         1         2         3       n_ab^{x+}   P(X_a ≥ x)
0               3 (0)     0 (2)     0 (4)     0 (7)         3         1.000
1               4 (0)     7 (1)     3 (2)     0 (4)        14         0.983
2              10 (0)    22 (0)    34 (0)     3 (1)        69         0.904
3               9 (2)    17 (1)    40 (0)    26 (0)        92         0.517
n_ab^{+y}        26        46        77        29         178
P(X_b ≥ y)     1.000     0.854     0.596     0.163

Note: Frequencies of response patterns that are not Guttman errors are printed boldface.


Table 2: Scalability Coefficients Hij and Hj and their Standard Errors (between Brackets) for 12 Items Measuring Tolerance with Respect to Interpersonal Issues for the Dutch Sample.

                         Hij (columns indexed by item number)
Item                     4            6            8            9            10           11           12           14           15           16           17           Hj
4: Soft Drugs                                                                                                                                                                .436 (.019)
6: Adultery              .325 (.036)                                                                                                                                         .412 (.022)
8: Homosexuality         .652 (.044)  .539 (.060)                                                                                                                            .584 (.018)
9: Abortion              .432 (.033)  .460 (.035)  .662 (.024)                                                                                                               .554 (.015)
10: Divorce              .518 (.037)  .475 (.040)  .733 (.023)  .750 (.021)                                                                                                  .573 (.015)
11: Euthanasia           .502 (.040)  .490 (.047)  .588 (.027)  .715 (.022)  .635 (.024)                                                                                     .544 (.016)
12: Suicide              .437 (.034)  .426 (.037)  .596 (.032)  .533 (.028)  .524 (.029)  .531 (.025)                                                                        .448 (.016)
14: Casual Sex           .585 (.029)  .526 (.035)  .643 (.037)  .557 (.029)  .544 (.027)  .530 (.031)  .471 (.029)                                                           .510 (.016)
15: Fare Public Transp.  .282 (.036)  .234 (.036)  .457 (.062)  .313 (.036)  .345 (.040)  .367 (.046)  .273 (.036)  .398 (.035)                                              .303 (.024)
16: Prostitution         .458 (.032)  .428 (.036)  .522 (.029)  .520 (.027)  .639 (.026)  .587 (.027)  .455 (.028)  .609 (.027)  .336 (.036)                                 .492 (.016)
17: Human Embryos        .297 (.037)  .338 (.039)  .414 (.036)  .514 (.029)  .429 (.032)  .436 (.029)  .313 (.029)  .333 (.032)  .134 (.039)  .356 (.029)                    .370 (.019)
19: IVF                  .363 (.050)  .340 (.055)  .539 (.028)  .465 (.031)  .489 (.030)  .430 (.029)  .338 (.032)  .453 (.038)  .264 (.056)  .439 (.030)  .408 (.029)  .430 (.020)

Table 3: Scalability Coefficients Hij and Hj and their Standard Errors (between Brackets) for 7 Items Measuring Tolerance with Respect to Interpersonal Issues for the Georgian Sample.

                     Hij
Item                 6      10     11     14     17     18     Hj
6: Adultery                                                    .333
                                                              (.039)
10: Divorce          .399                                      .476
                    (.058)                                    (.031)
11: Euthanasia       .324   .595                               .416
                    (.054) (.040)                             (.031)
14: Casual Sex       .531   .451   .362                        .345
                    (.055) (.058) (.049)                      (.035)
17: Human Embryos    .253   .419   .418   .230                 .394
                    (.053) (.058) (.052) (.048)               (.038)
18: Manip. Food      .278   .484   .410   .318   .556          .436
                    (.059) (.060) (.064) (.061) (.057)        (.042)
19: IVF              .254   .452   .364   .275   .462   .514   .392
                    (.057) (.039) (.038) (.048) (.044) (.056) (.028)
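The standard errors reported in Table 3 can be turned into approximate confidence intervals. The sketch below assumes a normal-approximation (Wald) interval, Hj ± z × SE, and compares its lower limit with the conventional lower bound of .3 from Mokken's guidelines. The helper name `wald_interval`, the constant `LOWER_BOUND`, and the 95% level are illustrative assumptions and not necessarily the procedure the authors prescribe.

```python
from statistics import NormalDist

# Item coefficients Hj and their standard errors, copied from Table 3.
table3_hj = {
    "6: Adultery": (0.333, 0.039),
    "10: Divorce": (0.476, 0.031),
    "11: Euthanasia": (0.416, 0.031),
    "14: Casual Sex": (0.345, 0.035),
    "17: Human Embryos": (0.394, 0.038),
    "18: Manip. Food": (0.436, 0.042),
    "19: IVF": (0.392, 0.028),
}

LOWER_BOUND = 0.3  # assumed cut-off, following Mokken's guidelines


def wald_interval(estimate, se, level=0.95):
    """Two-sided normal-approximation interval: estimate +/- z * se."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * se, estimate + z * se


for item, (h, se) in table3_hj.items():
    lower, upper = wald_interval(h, se)
    verdict = "above" if lower > LOWER_BOUND else "not clearly above"
    print(f"{item:<18} Hj = {h:.3f}  CI = ({lower:.3f}, {upper:.3f})  {verdict} {LOWER_BOUND}")
```

For item 6, for example, the interval runs from roughly .26 to .41, so it includes .3 even though the point estimate exceeds it.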


ACKNOWLEDGEMENTS

We thank Klaas Sijtsma and two anonymous reviewers for the helpful comments on earlier versions of this manuscript.
