Teachers and Cheaters. Just an Anagram? Santiago Pereda-Fernández∗ Banca d’Italia January 14, 2017

Abstract In this paper I study the manipulation of test scores in the Italian education system. Using an experiment consisting in the random assignment of external monitors to classrooms, I apply a new methodology to study the nature and extent of manipulation of test scores at different levels in primary and secondary education, and I propose a correction method. The results show frequent manipulation, which is not associated with an increase in the correlation of the answers after I control for mean test scores. The manipulation is concentrated in the South and Islands region, and it tends to favor female and immigrant students. Finally, the negative correlation between the amount of manipulation and the number of missing answers in open ended questions relative to multiple choice questions suggests that teachers are more responsible for the manipulation than students.

Keywords: Cheating Correction, Copula, Discrimination, Gender, Nonlinear Panel Data, Test Scores Manipulation
JEL classification: C23, C25, I21, I28, J24

∗ Banca d’Italia, Via Nazionale 91, 00184 Roma, Italy. This paper was previously circulated with the name A New Method for the Correction of Test Scores Manipulation. I would like to thank Alessandro Belmonte, Stéphane Bonhomme, Nicola Curci, Domenico Depalo, Patrizia Falzetti, Raquel Fernández, Iván Fernández-Val, Guzmán González-Torres, Caroline Hoxby, Andrea Ichino, Claudio Michelacci, Marco Savegnago, Paolo Sestito, Martino Tasso, Jeffrey Wooldridge, Paolo Zacchia, Stefania Zotteri, and seminar participants at Banca d’Italia, EUI, IMT Lucca, Universidad de Alicante, Universidad de Cantabria and the 2nd IAAE for helpful comments and suggestions. All remaining errors are my own. The views presented in this paper do not necessarily reflect those of the Banca d’Italia. I can be reached via email at [email protected].

1

Introduction

A policy maker interested in evaluating the education system requires a comparable measure of academic achievement across students. Standardized tests permit the comparison of students’ knowledge, and are often used to evaluate teachers (Hanushek, 1971; Rockoff, 2004; Aaronson et al., 2007) and principals (Grissom et al., 2014), although the reliability of these estimates has been called into question (Rothstein, 2010, 2015; Chetty et al., 2014). A major threat to the comparability of these tests is the manipulation of the scores, which alters students’ recorded performance.1 There is ample evidence that these tests are susceptible to manipulation, either by teachers grading unfairly (Jacob and Levitt, 2003; Dee et al., 2011; Angrist et al., 2014; Diamond and Persson, 2016), by students copying each other (Levitt and Lin, 2015), or even by principals who alter the pool of students who take the exam (Figlio, 2006; Cullen and Reback, 2006; Hussain, 2015), which, despite not being a manipulation of the individual test scores, affects their overall distribution.

In this paper I study this phenomenon and make the following contributions. First, I study the extent of test score manipulation by taking advantage of a natural experiment in the Italian education system that randomly assigned external monitors to proctor some tests. In addition to already known results, I find that the manipulation systematically favors female over male students, and immigrants over natives. Second, I propose a method to detect and correct manipulated test scores based on how likely the actual results are to happen at random. Methods that classify tests as either manipulated or fair face two potential sources of misclassification: mistaking fair tests for manipulated ones (type I misclassification), and mistaking manipulated tests for fair ones (type II misclassification).
Type I misclassification is particularly unfair, since some of the statistics used to detect cheating are usually similar for high-achieving tests without manipulation and for manipulated tests. For example, both are likely to have high class means or correlated test scores, which could merely reflect effective teaching practices. Moreover, empirical studies on education often rely on raw test scores as a measure of students’ achievement, which are frequently standardized to have zero mean and unit standard deviation. However, this may not be the most appropriate approach to detect cheating: the answers to every single test item allow one to consider a richer correlation structure of the results, which can be more informative for detecting test score manipulation. This correlation stems from factors that operate in different ways and can be classified into three main categories: individual characteristics, which only affect a single student; class characteristics, which affect every student in the same classroom; and question characteristics, which affect every student, though only in each specific question. Hence, when a question is difficult, only a small fraction of students is likely to answer it correctly, creating a high correlation in the answers for that particular question, both within and between classrooms.

To overcome these challenges, the method I propose compares the likelihood of the results of two groups: one in which test scores are assumed to be fair (treatment group), and another one in which they might have been manipulated (control group), analogously to a comparison between blind graded and non-blind graded exams (e.g. Lavy (2008) or Hinnerich et al. (2011)). Hence, the results in the treatment group allow me to estimate the probability of obtaining the observed test scores at random without manipulation. If the frequency of such results in the control group is larger than in the treatment group, then it indicates the existence of manipulated test scores, and the larger the difference, the more widespread the manipulation.

1 Throughout this paper I refer to test score manipulation and cheating as any action taken by the students or the teachers that results in a variation of the test scores, usually an increase. This could take place before the test (alteration of the pool of students), during the test (students copying from one another, teachers turning a blind eye or telling the answers), or after the test (unfair grading, including leniency).
This likelihood function accounts for all the previously mentioned effects which create correlation patterns in students’ answers without manipulation.2 Under the assumption that the estimates of the group with an external monitor are not manipulated, differences between the two sets of estimates reflect the amount of manipulation for each demographic group and question.3 The estimates from the treatment group are subsequently used to calculate the probability of obtaining the observed results with manipulation. This constitutes the basis for the correction method, which applies a larger reduction of test scores the more unlikely the results and the higher the test scores are.

The data I use stem from a set of recently introduced low stakes standardized tests in the Italian education system in primary, lower secondary, and upper secondary education. Students take these exams in their own schools, proctored by a teacher from their school who was not their teacher during the academic year. These teachers are also responsible for grading, transcribing the test scores, and sending them back to the National Institute for the Evaluation of the Education System (INVALSI). However, a set of randomly selected classrooms have an external monitor, who is responsible for the same tasks, but had no prior connection to the school. This constitutes a large scale natural experiment to study test score manipulation in the absence of an external monitor.

Previous work used the results from preceding years of the primary and lower secondary tests.4 They found that having an internal monitor is associated with higher, more correlated test scores (Bertoni et al., 2013), which could be the result of students’ interactions (Lucifora and Tonello, 2015), or of teachers’ shirking at grading (Angrist et al., 2014). Moreover, the amount of manipulation is much larger in the South & Islands of Italy, which is strongly correlated with other measures of social capital (Paccagnella and Sestito, 2014).

I find substantial test score manipulation, which is heterogeneous in various dimensions. Apart from the already known geographical patterns, I find that female and immigrant students benefit from this manipulation more than their male and native peers. In particular, females have more manipulated test scores than males. This result holds for all exams, and the manipulation is higher in mathematics exams, in which it can amount to up to 2.1%, whereas in the Italian exams it is at most 1.2%. Regarding differences between ethnic groups, I find that immigrant students in Italy tend to be favored relative to natives. This manipulation is larger in Italian exams, and it can be up to 2.7%.

2 The setup is similar to those considered in Item Response Theory: they model the result to each question using an individual latent trait which is constant across questions, and questions are allowed to vary in difficulty. See Bacci et al. (2014) for an example applied to the INVALSI tests.

3 Note that this does not imply that the test scores of every student in the control group were manipulated, nor that the manipulation was of the same magnitude for students with the same characteristics.

4 In particular, Bertoni et al. (2013) focused on grades 2 and 5 for the 2010 tests, Angrist et al. (2014) and Battistin et al. (2014) on grades 2 and 5 for the 2010-12 tests, and Lucifora and Tonello (2015) on grade 6 for the 2010 tests.

If students were responsible for the manipulation, the correlation in their answers would increase. However, once I control for the mean scores, the correlation is not substantially different when the monitor is internal or external, but it is larger than the correlation found when students come from different classrooms. Hence, rather than manipulation, the correlation in students’ answers most likely reflects a combination of teacher quality, peer effects, and sorting of students. Also, the larger the amount of manipulation in open ended questions relative to multiple choice questions, the smaller the fraction of missing answers to open ended questions relative to multiple choice questions. These patterns are the opposite of what would arise if students copied each other during the exam.

Even though these exams have no formal consequences for teachers (e.g. their wages are not linked to the results), teachers may have incentives to manipulate the results if they perceive that they are or could be evaluated in the future: if they were to be paid based on the performance of their students, or if principals used the results internally.5 Hence, manipulation could be a means to invalidate the comparability of the results and so prevent their students’ test scores from having consequences for them.

The rest of the paper is organized as follows: the institutional details of the test and some descriptive statistics are presented in section 2. The empirical strategy and the correction methods are explained in section 3. Section 4 shows the results of the estimation, while section 5 shows the class-level correction in practice. Section 6 concludes.

2

Italian National Evaluation Test

INVALSI is the Italian institute responsible for the design and administration of annual standardized tests for Italian students. It was created in 1999, and in the academic year 2008/09 these tests acquired nationwide status. All students enrolled in certain grades are required to take two tests, one in mathematics and another one in Italian language. Even though the Italian Ministry of Education stated the necessity of establishing a system of evaluation of teachers and schools based on students’ performance, the tests have been low stakes for all grades, with the exception of the 8th (III media), which corresponds to the end of compulsory secondary education, in which the results of the test account for a sixth of students’ final marks.

These exams are taken in the classroom, and they are proctored by either an internal or an external monitor, who is also responsible for grading, transcribing the result of each student to a sheet and sending it to INVALSI.6 Internal monitors are teachers of a different class in the same school, while external monitors are teachers and principals who had not worked in the town of the school to which they were assigned for at least two years before the exam.7 External monitors are randomly assigned to classes with the same selection mechanism used by the IEA-TIMSS survey. In a first stage, a fixed number of schools from each region are selected at random. In a second stage, depending on the number of classrooms existing in the selected schools, one or two of them are selected at random by INVALSI.8 Students in these classes constitute the treatment group.

Teachers, unlike external monitors, may have incentives to manipulate test scores: despite the low stakes nature of the exam, and although their salaries are not linked to the exam results, they may perceive that they are evaluated based on the results. INVALSI sends the results to principals, who can make them public to entice parents to enroll their children in their school. However, anecdotal evidence suggests that the results are discussed in front of all teachers, having internal consequences, such as the assignment of troublesome students. This, coupled with the possibility that in the future principals might be able to pay teachers based on their performance, may give incentives to teachers to manipulate their students’ scores.9 In this context, manipulation of the test scores could be used as a tool to invalidate the comparability of the results and consequently prevent linking students’ performance to teachers’ pay. Hence, teachers whose students perform worse would have more incentives to manipulate the test scores.

5 These concerns, among others, have led to important boycotts of the 2014/15 and 2015/16 tests: in some of the exams, up to 10% of the students did not participate. See http://www.invalsi.it/invalsi/doc_evidenza/2015/Comunicato_stampa_Prove_INVALSI_2015_07_05.pdf and http://www.invalsi.it/invalsi/doc_evidenza/2016/Com_Stampa_INVALSI_II_SEC_SEC_GRADO.pdf.

6 Only some of the questions are multiple choice, so this task cannot be automatically done by a machine.

7 Some of the external monitors are retired teachers, while others are precari, i.e. teachers with no tenure position. They are paid between 100 and 200 EUR for the job, and can be asked in the subsequent years to monitor more exams, giving them incentives to grade fairly.

8 The 2013 tests were the first in which the assignment was done by INVALSI by public procedure. Previously, it was done by the selected schools. The recent changes in the assignment of external monitors have made it possible to reduce the number of treated classrooms.

9 200 million euros have been assigned to principals to distribute among their teachers. The criteria to distribute this money include teaching quality, which could be measured by the results of the INVALSI tests. See https://labuonascuola.gov.it/documenti/LA_BUONA_SCUOLA_SINTESI_SCHEDE.pdf?v=0b45ec8.

2.1

Data and Descriptive Statistics

As shown in table 1, over 2.3 million students were tested during the academic year 2012/13, of which over 143,000 were assigned an external monitor. It also shows the mean percentage of correct answers for students with either an internal or an external monitor in all ten exams, which was higher when the monitor was internal. The difference between the two groups varies across grades and is larger for the mathematics exam.

Table 1: Size of the groups, academic year 2012/13

Grade  Group        N       C      S   % Correct (Math)   % Correct (Ita)
2nd    EX       25070    1424    737   53.87 (20.68)      59.90 (17.39)
2nd    IN      437479   25346   6451   61.20 (21.58)      64.76 (17.84)
5th    EX       24773    1426    736   54.79 (18.87)      74.36 (16.12)
5th    IN      424046   25559   6422   59.52 (19.25)      76.82 (15.52)
6th    EX       27504    1457    732   44.53 (16.80)      64.25 (16.74)
6th    IN      410332   21756   5143   45.25 (16.70)      64.40 (16.87)
8th    EX       28153    1464   1416   50.83 (18.98)      72.44 (14.96)
8th    IN      360528   19041   4537   52.48 (19.02)      73.12 (14.78)
10th   EX       38273    2203   1094   42.09 (17.72)      64.20 (16.20)
10th   IN      270262   15339   3276   45.13 (18.39)      65.92 (17.00)

Notes: N, C and S respectively denote the number of students, classrooms and schools, and EX and IN respectively denote the groups with the external and the internal monitor. Classes with an internal monitor in schools that had at least one class with an external monitor are excluded. Standard deviations in parentheses.

Table 2 shows the mean and standard deviation of the covariates I use in this paper. Similarly to previous editions of the test, some of the variables were not perfectly balanced across the two groups. In particular, the mean class size is slightly larger in those classrooms proctored by an external monitor in more than half of the exams, and they have a slightly higher presence of male and immigrant students in the upper secondary exams. Finally, the geographic stratification led to an over-representation of students coming from regions in which test scores were more manipulated in previous years (Bertoni et al., 2013).

Table 2: Mean and standard deviation of covariates

Grade  Group  Class size      Male           Native         North          Center         South & Isles
2nd    EX     17.61* (4.69)   0.51 (0.50)    0.95 (0.21)    0.39* (0.49)   0.19* (0.40)   0.42* (0.49)
2nd    IN     17.26 (5.22)    0.51 (0.50)    0.95 (0.21)    0.46 (0.50)    0.18 (0.39)    0.36 (0.48)
5th    EX     17.37* (4.75)   0.50 (0.50)    0.94 (0.24)    0.38* (0.49)   0.19* (0.39)   0.43* (0.50)
5th    IN     16.59 (5.04)    0.50 (0.50)    0.94 (0.24)    0.44 (0.50)    0.18 (0.38)    0.38 (0.48)
6th    EX     18.88 (4.20)    0.51 (0.50)    0.92 (0.26)    0.43* (0.49)   0.19* (0.39)   0.38 (0.49)
6th    IN     18.86 (4.54)    0.51 (0.50)    0.92 (0.26)    0.45 (0.50)    0.17 (0.38)    0.39 (0.49)
8th    EX     19.23* (4.49)   0.51 (0.50)    0.91* (0.28)   0.41* (0.49)   0.20* (0.40)   0.39 (0.49)
8th    IN     18.93 (4.54)    0.50 (0.50)    0.92 (0.27)    0.43 (0.50)    0.18 (0.38)    0.39 (0.49)
10th   EX     17.37 (5.49)    0.51* (0.50)   0.90* (0.29)   0.41* (0.49)   0.18* (0.39)   0.40* (0.49)
10th   IN     17.62 (5.97)    0.49 (0.50)    0.91 (0.28)    0.45 (0.50)    0.16 (0.37)    0.38 (0.49)

Notes: EX and IN respectively denote the groups with the external and the internal monitor. Standard deviations in parentheses. The asterisk denotes that the difference between the two groups is significantly different from zero at the 95% confidence level.

For expositional brevity, I focus my analysis on the 10th graders’ mathematics exam, since they constitute the largest treatment group, and the percentage of manipulation, measured as the difference in the percentage of correct answers between the two groups, is larger for the mathematics exam. Given that the amount of manipulation varied substantially by exam, pooling all the exams together to detect cheating patterns would be counter-productive, as it would cover several manipulation patterns. Regardless, the regressions for all exams are shown in appendix B, and those results that differ across exams are also reported in the paper. A total of 38,273 students in 2,203 classes were assigned an external monitor, whereas 270,262 students in 15,339 classrooms were assigned an internal monitor in schools without external monitors.10 The total number of questions in this exam was 50.

The left panel in figure 1 shows the proportion of students who answered each question correctly. Even though there is a lot of variability across answers, students proctored by external monitors scored worse than the rest in all but three of the questions, suggesting that the scores were manipulated. Both difficult and easy questions can have large or small differences between the two groups, although there is a weak correlation between the difficulty of a question, measured as the proportion of correct answers for the treatment group, and the difference between the two groups.11 Similarly, the right graph shows that there is a change in the distribution of the total number of correct answers, with the mean, the median and the mode all increasing when the examiner is internal. The majority of the change takes place around the center of the distribution, whereas the tails show a change much smaller in magnitude. Since this is a low stakes exam, there are no jumps at a cut-off grade and the change is quite smooth.

10 Since Bertoni et al. (2013) found that the manipulation was less severe in non-treated classrooms in treated schools, I exclude them from the main analysis. The manipulation patterns in these classrooms are similar to those from non-treated schools. These results are available upon request.

Figure 1: 10th grade mathematics exam results

The left graph depicts the proportion of correct answers by question (questions are sorted by how frequently they were correctly answered by students proctored by an external monitor); the right graph depicts the students’ distribution of test scores. EX, IN, DIF, and r respectively denote the groups with the external and the internal monitor, the difference between them, and the number of correct answers.

Apart from the mean test score, other measures used to identify cheating (Jacob and Levitt, 2003; Quintano et al., 2009) are based on the correlation of the answers. However, if the mean test scores of classrooms in the treatment group differ from those in the control group, then the correlation in the answers will be different in both groups, even if there is no manipulation.12 This presents a comparability problem which can be resolved by appropriately controlling for the mean test scores. To illustrate this point, consider the following alternative statistic to the within class correlation of test scores: the mean number of correct answers in common between two students, $s$, conditional on the two students having correctly answered $r$ and $r'$ questions, respectively. This is estimated by

$$ E_n(s \mid r, r') \equiv \frac{\sum_{s=0}^{Q} \sum_{c=1}^{C} \sum_{i=1}^{N_c} \sum_{j \neq i} 1(r_i = r, r_j = r', r_{ij} = s) \, s}{\sum_{c=1}^{C} \sum_{i=1}^{N_c} \sum_{j \neq i} 1(r_i = r, r_j = r')} \tag{1} $$

where $N_c$ is the number of students in classroom $c$, $C$ is the number of classrooms, $Q$ is the number of questions, and $r_{ij}$ is the number of questions that students $i$ and $j$ both answered correctly. This statistic mirrors the Oaxaca-Blinder decomposition: because both groups have different means, by controlling for them it is possible to see, in a comparable manner, if the answers were more homogeneous for students proctored by an internal monitor. If cheating increases answers’ homogeneity, then the manipulation would affect both the mean test scores and the conditional homogeneity.13

Figure 2 shows the values of this statistic for different values of $(r, r')$, both for students in the same classroom in each of the two groups, and for students who are in different classrooms.14 As expected, this conditional mean is uniformly larger for students in the same classroom relative to students in different classrooms. However, the conditional mean number of correct answers in common is roughly the same whether the monitor is internal or external. Hence, it could be argued that the amount of homogeneity in students’ answers is the same in both groups once we control for their mean test scores, and it may reflect spillovers or the teacher effect.

11 This result is, however, not consistent across exams, and for some of them there is no correlation at all.

12 To see this, if every student in a class got the maximum grade, then the correlation in the answers would be one by construction. On the other hand, if all of them got one half of the answers right, the correlation could be equal to one, but also equal to zero.
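As a rough illustration of how the statistic in equation 1 could be computed, the Python sketch below loops over classrooms, counts the answers each ordered pair of classmates got right in common, and averages over the pairs whose total scores equal the two conditioning values. The function name `mean_common_correct` and the input layout (one 0/1 array of shape students by questions per classroom) are assumptions made for this example, not something taken from the paper or the INVALSI data files.

```python
import numpy as np

def mean_common_correct(classrooms, r, r_prime):
    """Mean number of answers two classmates both got right, over ordered
    pairs (i, j), i != j, with total scores r_i == r and r_j == r_prime.

    classrooms: iterable of 0/1 arrays, each of shape (N_c, Q) (assumed layout).
    Returns nan when no pair satisfies the conditioning event.
    """
    total, count = 0, 0
    for y in classrooms:
        y = np.asarray(y)
        scores = y.sum(axis=1)          # r_i for every student in the class
        common = y @ y.T                # common[i, j] = answers i and j both got right
        mask = (scores[:, None] == r) & (scores[None, :] == r_prime)
        np.fill_diagonal(mask, False)   # drop the i == j pairs
        total += common[mask].sum()
        count += mask.sum()
    return total / count if count else float("nan")

# Toy classroom with three students and four questions (made-up data).
toy_class = np.array([[1, 1, 0, 0],
                      [1, 0, 1, 0],
                      [1, 1, 1, 0]])
print(mean_common_correct([toy_class], 2, 2))
print(mean_common_correct([toy_class], 2, 3))
```

Passing pairs of students drawn from different classrooms instead of classmates would give the between-classroom benchmark that the same-classroom values are compared against.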

3

Empirical Methodology

13 This would happen if students copied each other, or if teachers graded in a systematic way.

14 Because of the number of possible combinations of $r$ and $r'$ I show a representative selection of them. Full results are available upon request.

Figure 2: Mean number of correct answers in common, 10th grade mathematics exam
[Four panels, for $r' = 10$, $15$, $20$ and $35$; horizontal axis: $r$; series: EX, IN, IND.]

EX, IN and IND respectively denote the mean number of correct answers in common of two students with $r$ and $r'$ correct answers (equation 1) when they are in the same class and the examiner is external, in the same class and the examiner is internal, or in different classes in either group.

Questions in the INVALSI tests are graded on a right/wrong basis, so let $y_{icq}$ equal one if student $i$ in classroom $c$ correctly answered question $q$ in the exam, and zero otherwise. This variable can be modeled with a latent variable, $y^*_{icq}$, which is the sum of three effects: a student-class effect, $\eta_{ic}$, a question effect, $\xi_q$, and a specific student-class-question iid shock, $\varepsilon_{icq}$. The student-class effect measures the ability of a student, and more able students have a higher probability of answering each question correctly. On the other hand, the question effect measures the difficulty of each particular question, and the harder the question, the more negative the effect.15 Formally,

$$ y_{icq} = 1(y^*_{icq} \geq 0) \tag{2} $$

$$ y^*_{icq} = x_{ic}' \beta + \eta_{ic} + \xi_q + \varepsilon_{icq} \tag{3} $$

where, from an econometric perspective, the number of questions ($Q$) is fixed, the number of classrooms is large, and the number of students per classroom is small but not fixed. Because of the incidental parameter problem, it is impossible to obtain consistent estimates of the student-class effects, but it is possible to consistently estimate the question effects. The latter are parameters in the regression, while the former are treated as random effects. Denote by $y_c \equiv (y_{1c1}, ..., y_{1cQ}, ..., y_{N_c cQ})$ the vector with the results of all students in classroom $c$, assume that the distribution of the unobservables is given by $\varepsilon_{icq} \sim \text{Logistic}(0, 1)$ and $\eta_{ic} \sim N(0, \sigma_\eta^2)$, and let $\theta \equiv (\xi', \sigma_\eta^2)'$. For expositional clarity, let the student-class effects be independent of each other. This model is a random effects logit with normally distributed random effects, whose likelihood is given by

$$ L(\theta) = \sum_{c=1}^{C} \sum_{i=1}^{N_c} \log \left[ \int_{\mathbb{R}} \frac{\exp\left( \sum_{q=1}^{Q} y_{icq} (x_{ic}' \beta + \eta_{ic} + \xi_q) \right)}{\prod_{q=1}^{Q} \left( 1 + \exp(x_{ic}' \beta + \eta_{ic} + \xi_q) \right)} \, d\Phi\left( \frac{\eta_{ic}}{\sigma_\eta} \right) \right] \tag{4} $$

15 The question effect may also capture the location of the question in the exam. However, several versions of the exam were provided, with the only difference among them being the ordering of the questions. Unfortunately, the version assigned to each student is not recorded in the dataset, and hence it is not possible to estimate if questions asked at the beginning of the exam are answered correctly more often.
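The integral over the normal random effect in equation 4 has no closed form, so in practice it has to be approximated numerically. The sketch below is one possible way to do so, using Gauss-Hermite quadrature for the case without covariates; the function name `re_logit_loglik`, the number of quadrature nodes, and the choice of quadrature itself are assumptions for illustration and are not claimed to be the procedure used in the paper.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def re_logit_loglik(y, xi, sigma_eta, n_nodes=40):
    """Random effects logit log-likelihood (equation 4, no covariates).

    y: 0/1 array of shape (n_students, Q); xi: question effects, shape (Q,).
    The integral over eta ~ N(0, sigma_eta^2) is approximated by quadrature.
    """
    y = np.asarray(y, dtype=float)
    xi = np.asarray(xi, dtype=float)
    nodes, weights = hermgauss(n_nodes)            # rule for the weight exp(-x^2)
    eta = np.sqrt(2.0) * sigma_eta * nodes         # rescaled nodes, shape (K,)
    w = weights / np.sqrt(np.pi)                   # rescaled weights, sum to one
    index = eta[:, None] + xi[None, :]             # eta_k + xi_q, shape (K, Q)
    log_denominator = np.logaddexp(0.0, index).sum(axis=1)   # sum_q log(1 + exp(.))
    # log P(y_i | eta_k) = sum_q y_iq * (eta_k + xi_q) - log_denominator_k
    logp = y @ index.T - log_denominator[None, :]  # shape (n_students, K)
    # Integrate over eta with a log-sum-exp for numerical stability.
    m = logp.max(axis=1, keepdims=True)
    return float(np.sum(m[:, 0] + np.log(np.exp(logp - m) @ w)))

# Made-up example: two students, two questions.
xi_example = np.array([0.5, -0.5])
y_example = np.array([[1, 0], [1, 1]])
print(re_logit_loglik(y_example, xi_example, sigma_eta=1.0))
```

Maximizing this function over the question effects and the standard deviation (for instance with a generic numerical optimizer) would give estimates in the spirit of the RE logit, under the independence assumption stated above.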

Independence of the student-class effects is highly unrealistic, as sorting of students, peer effects, or sharing the same teacher can cause correlation in these effects. Despite that, equation 4 can be used to consistently estimate the vector of parameters $\theta$, though not efficiently (Pereda-Fernández, 2016). A convenient way to model the correlation of the student-class effects is using a copula.16 Copulas are multivariate functions that capture the correlation structure of a vector of random variables. They depend on the ranks of the individual effects, $u_{ic} \equiv \Phi(\eta_{ic} / \sigma_\eta)$, which are invariant to the parameters of the marginal distribution of $\eta_{ic}$.17 Denote by $\eta_c$ and $u_c$ the $N_c$-dimensional vectors of the individual effects and their ranks in class $c$, and let the copula be a Clayton, denoted by $C(u_c; \rho)$, where $\rho$ is the parameter that models the correlation intensity.18 The Copula-Based Random Effects (CBRE) estimator maximizes the following likelihood function:

$$ L(\theta) = \sum_{c=1}^{C} \log \left[ \int_{[0,1]^{N_c}} \frac{\exp\left( \sum_{i=1}^{N_c} \sum_{q=1}^{Q} y_{icq} (x_{ic}' \beta + \eta_{ic} + \xi_q) \right)}{\prod_{i=1}^{N_c} \prod_{q=1}^{Q} \left( 1 + \exp(x_{ic}' \beta + \eta_{ic} + \xi_q) \right)} \, dC(u_c; \rho) \right] \tag{5} $$
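To make the role of the copula concrete, the following sketch draws the student-class effects of one classroom so that their ranks follow a Clayton copula while each effect keeps its normal marginal distribution. It uses the standard gamma-frailty (Marshall-Olkin) sampling scheme, and it assumes that the correlation parameter is the usual Clayton dependence parameter with positive values; the function name `sample_clayton_effects` is hypothetical and nothing here is claimed to reproduce the paper's own computations.

```python
import numpy as np
from statistics import NormalDist

def sample_clayton_effects(n_students, rho, sigma_eta, rng):
    """Draw ranks u_ic from a Clayton copula (rho > 0, assumed parametrization)
    and map them to effects eta_ic with N(0, sigma_eta^2) marginals."""
    v = rng.gamma(shape=1.0 / rho, scale=1.0)        # frailty shared by the class
    e = rng.exponential(scale=1.0, size=n_students)  # one shock per student
    u = (1.0 + e / v) ** (-1.0 / rho)                # Clayton ranks in (0, 1)
    u = np.clip(u, 1e-12, 1.0 - 1e-12)               # keep the inverse cdf finite
    z = np.array([NormalDist().inv_cdf(p) for p in u])
    return u, sigma_eta * z

rng = np.random.default_rng(0)
ranks, effects = sample_clayton_effects(n_students=5, rho=2.0, sigma_eta=1.0, rng=rng)
print(ranks)
print(effects)
```

Averaging the conditional likelihood of a classroom over many such draws is one simple Monte Carlo way to approximate the integral with respect to the copula in equation 5.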

The covariates used in this paper have finite support. As shown in Chernozhukov et al. (2013), the distribution of the individual effects is not nonparametrically identified, and therefore neither is the copula (Pereda-Fernández, 2016). While the parametric assumption on their distribution seems strong, the conditional (fixed effects) logit estimates (Chamberlain, 1980), which do not impose any distributional assumption on the student-class effects, have a correlation with the random effects estimates almost equal to one.19 Also, I am ruling out class-question effects, which could matter if some teachers were more able to teach the material relevant for some of the questions. A way to avoid this issue would be to run the regression using a single student per classroom. This analysis is reported in appendix D, and the results indicate that there is no substantial bias.

16 As proved by Sklar (1959), any multivariate cdf can be written as a copula whose arguments are the marginal distributions, i.e. $P(X_1 \leq x_1, ..., X_d \leq x_d) = C(F_1(x_1), ..., F_d(x_d))$. See Nelsen (2013) for an introduction to copulas.

17 Equation 4 implicitly assumes that the copula of the individual effects for students in the same classroom is independent, i.e. $C(\Phi(\eta_{1c} / \sigma_\eta), ..., \Phi(\eta_{N_c c} / \sigma_\eta)) = \prod_{i=1}^{N_c} \Phi(\eta_{ic} / \sigma_\eta)$.

18 The Clayton copula is convenient from a computational point of view.

19 However, the conditional logit estimator cannot be applied to work with the cheating detection method proposed in this paper. See appendix C for a more detailed discussion and the results.

3.1

Cheating Correction

As I argued in section 2, and also motivated by the findings in Angrist et al. (2014), teachers may play an important role in cheating, and therefore I consider the classroom as the unit of analysis. Using equation 5, if students $i = 1, ..., N_c$ in classroom $c$ got a total number of correct answers of $r_c \equiv (r_{1c}, ..., r_{N_c c})$, it is possible to estimate the probability of each of them obtaining at least that many correct answers, $R$, denoted by $P(R \geq r_c)$. This probability presents a problem of comparability across classes, since class size is not constant. To overcome this problem, I use its geometric mean to estimate how likely those results are.20 This probability is computed for each classroom in the sample by plugging in the estimates from the treatment group:

$$ \hat{l}_c = \left[ \sum_{b_1 \in B_{r_{1c}}} \cdots \sum_{b_{N_c} \in B_{r_{N_c c}}} \int_{[0,1]^{N_c}} \frac{\exp\left( \sum_{i=1}^{N_c} \sum_{q=1}^{Q} b_{iq} (x_{ic}' \hat{\beta} + \eta_{ic} + \hat{\xi}_q) \right)}{\prod_{i=1}^{N_c} \prod_{q=1}^{Q} \left( 1 + \exp(x_{ic}' \hat{\beta} + \eta_{ic} + \hat{\xi}_q) \right)} \, dC(u_c; \hat{\rho}) \right]^{1/N_c} \tag{6} $$

where $B_{r_{ic}} \equiv \{ b_i \in \{0, 1\}^Q : \sum_{q=1}^{Q} b_{iq} \geq r_{ic} \}$, i.e. all the possible combinations of correct answers that would yield a final test score of at least $r_{ic}$.

If a demographic group, e.g. female students, scored higher relative to another group, e.g. male students, with an internal monitor than with an external monitor, then equation 6 would yield a smaller likelihood for a classroom in which female students score better than males than for a classroom in which the result was the opposite, even if the overall distribution of the test scores were the same in both classrooms. The same reasoning applies to classrooms in which students’ answers display too high or too low a degree of correlation relative to what happens in the treatment group.

20 See appendix A for the details on the computation of the sum of all possible permutations.
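The sum over all answer patterns in equation 6 looks daunting, but conditional on a student's effect the number of correct answers is a sum of independent Bernoulli variables, so the probability of scoring at least a given total can be obtained with a simple recursion. The sketch below takes that route and then averages over simulated draws of the classroom effects before taking the power 1/N_c. This recursion, the Monte Carlo averaging, the omission of covariates, and the function names are all assumptions for illustration; the paper's own computation is described in its appendix A and may differ.

```python
import numpy as np

def score_distribution(p):
    """pmf of the number of correct answers when question q is answered
    correctly with probability p[q], independently across questions."""
    pmf = np.array([1.0])
    for pq in p:
        pmf = (np.concatenate((pmf * (1.0 - pq), [0.0]))
               + np.concatenate(([0.0], pmf * pq)))
    return pmf

def class_likelihood(scores, eta_draws, xi):
    """Monte Carlo version of equation 6 without covariates.

    scores: observed totals r_ic of the N_c students in the class.
    eta_draws: simulated effects, shape (D, N_c), e.g. drawn with a copula.
    xi: estimated question effects, shape (Q,).
    """
    eta_draws = np.atleast_2d(np.asarray(eta_draws, dtype=float))
    xi = np.asarray(xi, dtype=float)
    joint = np.ones(eta_draws.shape[0])
    for d, eta in enumerate(eta_draws):
        for r_i, eta_i in zip(scores, eta):
            p = 1.0 / (1.0 + np.exp(-(eta_i + xi)))      # logit success probabilities
            joint[d] *= score_distribution(p)[r_i:].sum()  # P(R_i >= r_i | eta_i)
    # Average over draws, then take the geometric mean across students.
    return float(joint.mean() ** (1.0 / len(scores)))

# Made-up example: two students, two equally difficult questions, one draw.
print(class_likelihood([1, 2], [[0.0, 0.0]], [0.0, 0.0]))
```

Taking the power 1/N_c is what makes the resulting values comparable across classrooms of different sizes, as discussed above.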

The correction for cheating proposed in this paper is split into two steps. The first step changes the estimated likelihood of each classroom in the control group so that the resulting distribution matches the distribution of the treatment group. The second step uses this likelihood and the observed test scores to calculate the correction.

Denote by $F_{L,j}(l)$ the cdf of the likelihood for the treatment ($j = 1$) and control ($j = 0$) groups. The corrected likelihood is given by $\check{l}_c \equiv F_{L,1}^{-1}\left(F_{L,0}\left(\hat{l}_c\right)\right)$. In words, the cdf of the corrected likelihood of the classes in the control group equals the cdf of the treatment group by construction. Graphically, it involves a nonlinear horizontal shift of the cdf for the control group. The second step corrects the test scores based on the corrected likelihood, for which I use the following assumption:

Assumption 1. Distribution of test score manipulation
Let $r_c^*$ denote the observed mean test score of classroom $c$ with an internal monitor. This score is decomposed into the sum of the score without manipulation, $r_c$, and the manipulation, $\alpha_c$. These two components are mutually independent and the distribution of the manipulation is given by an exponential($\lambda$) distribution.

This assumption allows me to estimate the expected fair test score, conditional on the observed test score and the corrected likelihood, $E\left[r | r^*, \check{l}\right]$, which is the corrected test score.21 The idea is similar to Wei and Carroll (2009), whose estimator of quantile regression with measurement error is adapted to the current framework:

$$E\left[r | r^*, \check{l}\right] = \frac{\int_0^{r^*} r f\left(\check{l} | r\right) \lambda \exp\left(-\lambda\left(r^* - r\right)\right) dF(r)}{\int_0^{r^*} f\left(\check{l} | r\right) \lambda \exp\left(-\lambda\left(r^* - r\right)\right) dF(r)} \qquad (7)$$

where the equality follows by Bayes’ theorem and the independence between test scores and manipulation stated in assumption 1. Equation 7 suggests the following sample analogue to estimate the corrected test scores:

$$\tilde{r} \equiv \frac{\frac{1}{\sum_{c=1}^{C_0} 1\left(r_c \le r^*\right)} \sum_{c=1}^{C_0} r_c \hat{f}\left(\check{l} | r_c\right) \hat{\lambda} \exp\left(-\hat{\lambda}\left(r^* - r_c\right)\right)}{\frac{1}{\sum_{c=1}^{C_0} 1\left(r_c \le r^*\right)} \sum_{c=1}^{C_0} \hat{f}\left(\check{l} | r_c\right) \hat{\lambda} \exp\left(-\hat{\lambda}\left(r^* - r_c\right)\right)} \qquad (8)$$

where

$$\hat{f}\left(\check{l} | r\right) = \sum_{k=1}^{K} \frac{\tau_{k+1} - \tau_k}{\hat{Q}_L\left(\tau_{k+1} | r\right) - \hat{Q}_L\left(\tau_k | r\right)} 1\left(\hat{Q}_L\left(\tau_k | r\right) < \check{l} \le \hat{Q}_L\left(\tau_{k+1} | r\right)\right),$$

$\hat{Q}_L(\tau | r)$ is estimated by linear quantile regression on a polynomial of $r$, applying the rearrangement of Chernozhukov et al. (2010), and $\hat{\lambda}$ is estimated using the method of moments.

Assumption 1 is not likely to hold in practice if manipulation of the test scores has a strategic component, and the test scores from some classrooms in the control group may also be free of manipulation. If this assumption were relaxed, one would still need to make an assumption on the distribution of the amount of manipulation conditional on the fair test score. Nevertheless, it allows me to explicitly derive a correction in closed form that has a desirable property: the smaller the likelihood of the results and the higher the mean test scores, the higher the correction.22 Therefore, this correction is higher when the mean classroom test scores are higher and their likelihood is smaller in the control group, relative to the treatment group. This approach explicitly acknowledges the existence of effective teachers and students in classrooms with internal monitors, it controls for the difficulty of each question of the test, it efficiently uses the information coming from the answers to every item, and it allows one to identify which demographic groups benefit from the manipulation.

21 Using a parametric distribution with positive support, such as the exponential distribution, ensures that the correction does not result in an increase of the test scores.
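The two steps of the correction can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the paper: the function names are hypothetical, the empirical cdf and empirical quantile function stand in for $F_{L,0}$ and $F_{L,1}^{-1}$, the conditional density values $\hat{f}(\check{l}|r_c)$ are taken as given rather than estimated by quantile regression, the sum is restricted to $r_c \le r^*$ to mirror the integration range of equation 7, and the moment condition $E[\alpha] = 1/\lambda = E[r^*] - E[r]$ is one natural reading of "method of moments" under assumption 1.

```python
import numpy as np


def corrected_likelihood(l_control, l_treated):
    """Step 1: map control-group likelihoods through F_{L,1}^{-1}(F_{L,0}(.))."""
    l_control = np.asarray(l_control, dtype=float)
    l_treated = np.asarray(l_treated, dtype=float)
    # Empirical cdf of the control group, evaluated at its own observations.
    ranks = np.searchsorted(np.sort(l_control), l_control, side="right")
    u = ranks / l_control.size
    # Empirical quantile function of the treatment group at those probabilities.
    return np.quantile(l_treated, u)


def lambda_method_of_moments(mean_control, mean_treated):
    """Assumed moment condition: E[alpha] = 1 / lambda = E[r*] - E[r]."""
    gap = mean_control - mean_treated
    if gap <= 0:
        raise ValueError("control-group mean must exceed treatment-group mean")
    return 1.0 / gap


def corrected_score(r_star, r_fair, dens, lam):
    """Step 2: weighted average of fair scores, as in equation 8.

    r_fair: mean scores treated as draws from the fair-score distribution.
    dens:   values of f_hat(l_check | r_c), supplied by the caller.
    """
    r_fair = np.asarray(r_fair, dtype=float)
    dens = np.asarray(dens, dtype=float)
    keep = r_fair <= r_star  # assumed restriction, from the range [0, r*] in eq. 7
    if not keep.any():
        return float(r_star)  # nothing to average over: leave the score unchanged
    w = dens[keep] * lam * np.exp(-lam * (r_star - r_fair[keep]))
    return float(np.sum(r_fair[keep] * w) / np.sum(w))


# Hypothetical toy inputs, for illustration only.
l_check = corrected_likelihood([0.1, 0.2, 0.3, 0.4], [0.2, 0.4, 0.6, 0.8])
lam_hat = lambda_method_of_moments(mean_control=30.0, mean_treated=25.0)
r_tilde = corrected_score(30.0, [20.0, 25.0, 35.0], [1.0, 1.0, 1.0], lam_hat)
print(l_check, lam_hat, r_tilde)
```

Because the weights are positive only for fair scores no larger than the observed score, the corrected score in this sketch never exceeds the observed one, which is the property footnote 21 attributes to the exponential assumption.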

4 Results

Figure 3 shows the RE logit estimates (equation 4) of the question effects when no covariates are included. The results show that for 38 out of the 50 questions, the coefficient for the control group is significantly larger than for the treatment group, and for 6 out of the remaining 12, they are not significantly different. Ignoring unobserved heterogeneity results in significantly biased estimates, as shown in table 8 in appendix B.

Figure 3: RE logit estimates, 10th grade mathematics exam
[Figure: EX, IN, and DIF series with 95% confidence intervals; horizontal axis: question (0 to 50); vertical axis: estimate (−3 to 3).]
This figure shows the RE logit estimates from equation 4. EX, IN, and DIF respectively denote the groups with the external and the internal monitor, and the difference between them. They are reported with the 95% confidence intervals, and are sorted by how frequently they were correctly answered by students proctored by an external monitor.

Table 3 shows the Average Partial Effects (APE) of the different covariates and the estimates of $(\sigma_\eta, \rho)$ for several specifications.23 The first four columns are the standard RE logit estimates (equation 4), whereas the latter two are the CBRE logit estimates (equation 5). The regressions include question effects, a female dummy interacted with each question, geographical dummies for the Center and South & Islands regions, and a dummy for classrooms whose size is smaller than or equal to the median size. The coefficients for the group with an external monitor from column (6) show that students from the Center and South & Islands regions respectively scored on average 5% and 12% fewer correct answers than students from the North. Female students also scored lower than their male counterparts, scoring around 6% less, whereas native Italians outperformed immigrant students, with a difference of 5% correct answers. Students in small classrooms scored worse than those in large classrooms, although this coefficient does not have a causal interpretation. Finally, the estimate of the copula correlation coefficient indicates that there is substantial within-classroom correlation in the unobserved individual-class effect.24

Table 3: RE & CBRE logit estimates, 10th grade mathematics exam

                            (1)               (2)               (3)               (4)               (5)               (6)
External
  $\hat{\xi}$           -0.07*** (0.00)   -0.04*** (0.00)    0.02*** (0.00)    0.01*** (0.00)   -0.09*** (0.00)    0.04*** (0.00)
  FE                          -           -0.07*** (0.00)   -0.04*** (0.00)   -0.07*** (0.00)         -           -0.06*** (0.00)
  CE                          -                 -           -0.05*** (0.00)   -0.05*** (0.00)         -           -0.05*** (0.00)
  SI                          -                 -           -0.12*** (0.00)   -0.11*** (0.00)         -           -0.12*** (0.00)
  IT                          -                 -                 -            0.06*** (0.00)         -            0.05*** (0.00)
  SMALL                       -                 -                 -           -0.08*** (0.00)         -           -0.10*** (0.00)
  $\hat{\sigma}_\eta$    0.95*** (0.01)    0.96*** (0.01)    0.90*** (0.01)    0.87*** (0.01)    0.90*** (0.01)    0.76*** (0.00)
  $\hat{\rho}$                -                 -                 -                 -            4.11*** (0.09)    2.95*** (0.03)
Internal
  $\hat{\xi}$           -0.04*** (0.00)   -0.02*** (0.00)    0.02*** (0.00)   -0.01*** (0.00)   -0.02*** (0.00)    0.02*** (0.00)
  FE                          -           -0.04*** (0.00)   -0.05*** (0.00)   -0.05*** (0.00)         -           -0.04*** (0.00)
  CE                          -                 -           -0.04*** (0.00)   -0.03*** (0.00)         -           -0.03*** (0.00)
  SI                          -                 -           -0.06*** (0.00)   -0.06*** (0.00)         -           -0.07*** (0.00)
  IT                          -                 -                 -            0.07*** (0.00)         -            0.04*** (0.00)
  SMALL                       -                 -                 -           -0.08*** (0.00)         -           -0.09*** (0.00)
  $\hat{\sigma}_\eta$    1.00*** (0.00)    0.99*** (0.00)    0.98*** (0.00)    0.95*** (0.00)    0.92*** (0.00)    0.91*** (0.00)
  $\hat{\rho}$                -                 -                 -                 -            1.81*** (0.00)    1.60*** (0.00)
Difference
  $\hat{\xi}$           -0.03*** (0.00)   -0.02*** (0.00)    0.01*** (0.00)    0.02*** (0.00)   -0.06*** (0.00)    0.01*** (0.00)
  FE                          -           -0.03*** (0.00)    0.00**  (0.00)   -0.01*** (0.00)         -           -0.02*** (0.00)
  CE                          -                 -           -0.01*** (0.00)   -0.01*** (0.00)         -           -0.02*** (0.00)
  SI                          -                 -           -0.06*** (0.00)   -0.05*** (0.00)         -           -0.05*** (0.00)
  IT                          -                 -                 -           -0.01*** (0.00)         -            0.01*** (0.00)
  SMALL                       -                 -                 -            0.01*** (0.00)         -           -0.01*** (0.00)
  $\hat{\sigma}_\eta$   -0.05*** (0.01)   -0.03*** (0.01)   -0.08*** (0.01)   -0.08*** (0.01)   -0.02*** (0.01)   -0.15*** (0.00)
  $\hat{\rho}$                -                 -                 -                 -            2.31*** (0.09)    1.35*** (0.03)

Notes: $\hat{\xi}$ is the mean question effect; FE is the mean of the interaction between a dummy for female students and the question dummies; CE, SI, IT, and SMALL are dummies for Center region, South & Islands region, natives, and small class. Columns 1-4 show the APE of the covariates and the estimates of $\sigma_\eta$ with the RE logit estimator (equation 4); columns 5-6 show the same estimates and those of $\rho$ with the CBRE estimator (equation 5). *, **, and *** denote statistical significance at the 90%, 95%, and 99% level, respectively; standard errors in parentheses.

22 If manipulation was higher when test scores are low, then the correction should be higher for higher test scores. This would also penalize more heavily the fair test scores of high-performing students, which could be mistaken for manipulated ones.
23 Further specifications including a polynomial of class size, or an interaction between small class and regional dummies, yield similar results and are omitted, but are available upon request.
24 ρ is not interpreted as the linear correlation coefficient, and for the Clayton copula, the independence case happens when ρ = 1. Using the relation between the Clayton and Gaussian copulas with Kendall’s τ statistic, the linear correlation for this group is approximately 0.81.

The amount of manipulation and how much it benefited each demographic group can be measured by looking at the difference between the coefficients of the two groups. As shown in column (6), students from the Center and South & Islands respectively had an average of 2% and 5% more extra correct answers than their northern counterparts. Similarly, female students had an extra 2% of correct answers due to manipulation, whereas for immigrant students it implied an extra 1%. Also, the manipulation was on average larger by 1% in small classrooms. Finally, the copula correlation coefficient was smaller in the control group, indicating that the correlation in the unobserved effects did not increase because of the manipulation.

Most of the results apply to all exams in the sample. Table 4 summarizes the differences in performance between students with an internal and an external monitor for all exams, and it represents the percentage of manipulation, measured as the difference between the APE, in favor of each demographic group. The single most important variable is the dummy for the South & Islands region, which is significantly negative in all exams, indicating that it was in this region where the largest amount of manipulation took place, ranging between 1% and 12%. There is also more manipulation in the Center than in the North in eight of the exams, although it is of smaller magnitude than in the South & Islands.

Test scores of female students were also more manipulated than those of their male counterparts in every exam, and this difference was on average 1%. The amount of manipulation varied between mathematics and Italian exams, being larger in the former, in which females scored worse than males when the monitor was external. However, females outperformed males in Italian exams with an external monitor, and yet the manipulation further increased this difference in performance.
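The conversion mentioned in footnote 24 can be retraced numerically. The sketch below assumes the common Clayton parametrization in which Kendall's τ equals θ/(θ + 2), together with the Gaussian copula relation r = sin(πτ/2); both the parametrization and the function names are assumptions made for this illustration. Applied to the column (6) estimates of table 3 (2.95 with an external monitor and 1.60 with an internal monitor), this reading gives roughly 0.81 for the external group, in line with the figure quoted in the footnote.

```python
import math


def clayton_to_kendall_tau(theta):
    """Kendall's tau implied by a Clayton parameter (assumed parametrization)."""
    return theta / (theta + 2.0)


def kendall_tau_to_gaussian_corr(tau):
    """Linear correlation of the Gaussian copula with the same Kendall's tau."""
    return math.sin(math.pi * tau / 2.0)


# Copula estimates from column (6) of table 3.
for label, theta in (("external", 2.95), ("internal", 1.60)):
    tau = clayton_to_kendall_tau(theta)
    corr = kendall_tau_to_gaussian_corr(tau)
    print(f"{label}: tau = {tau:.3f}, implied Gaussian correlation = {corr:.2f}")
# external: tau = 0.596, implied Gaussian correlation = 0.81
# internal: tau = 0.444, implied Gaussian correlation = 0.64
```

Under this reading the implied correlation is lower with an internal monitor, which matches the statement that the correlation in the unobserved effects did not increase because of the manipulation.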
The existence of persistent differences in academic performance by gender is well documented (Machin and Pekkarinen, 2008; Lavy and Sand, 2015), and consistently with the results of this paper, Lavy (2008) finds that male students face discrimination with respect to females in every subject. In contrast, Diamond and Persson (2016) find that teachers’ grading leniency was not different for male and female students.

Similarly, test scores of immigrant students were more manipulated in most exams, with an average of 1.3% extra correct answers. Interestingly, this difference was larger in the Italian exams, which could mean that teachers were trying to compensate for the handicap immigrants face by having to learn the local language. These results contrast with Diamond and Persson (2016), who find no discrimination between natives and immigrants, Sprietsma (2013), who finds that German teachers discriminate against students with Turkish names, and Hanna and Linden (2012), who find that Indian teachers discriminate against lower caste students.

Finally, the exams were more manipulated in classrooms of size smaller than the median, although this result is not homogeneous, and in the sixth grade Italian exam the manipulation was larger in large classrooms. This result largely coincides with those found in Angrist et al. (2014), but it is worth noting that the grades of the exams considered in that study (2nd and 5th grades) displayed the first and third largest amount of manipulation in small classrooms of all exams. In contrast, this manipulation pattern is less marked in the 8th grade, and actually reversed in the 6th grade.

Table 4: Summary CBRE logit estimates

          2nd grade          5th grade          6th grade          8th grade          10th grade
          M        I         M        I         M        I         M        I         M        I
FE       -0.9*    -0.4*     -1.9**   -0.3**    -1.0**   -1.2**    -0.8**   -0.3**    -2.1**   -1.0**
CE        0.8     -2.6**    -5.8**   -0.5**    -0.1     -0.3**    -2.0**    0.0      -1.7**   -2.0**
SI       -4.5**   -6.5**   -12.0**   -3.2**    -0.9**   -0.8**    -3.8**   -0.8**    -5.1**   -6.5**
IT        0.3      1.9**     1.0**    2.7**     1.4**    1.4**     0.3      1.2**     0.9**    1.7**
SMALL    -4.3**   -1.0**    -7.0**   -1.6*      0.2      0.5**    -0.1     -1.1**    -1.4**   -4.4**

Notes: FE, CE, SI, IT, and SMALL are the differences between externally and internally monitored students of the APE for females, Center region, South & Islands region, natives, and small class, as reported in column 6 of tables 3 and 9 to 17, expressed in %. M and I denote the mathematics and Italian exams. * and ** respectively denote statistical significance at the 95% and 99% confidence level.
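The summary statements made about table 4 can be recomputed directly from its entries. The short script below is only a convenience check: the dictionary transcribes the point estimates of the table (significance stars dropped), and the variable names are chosen for this illustration.

```python
import numpy as np

# Columns ordered as in table 4: 2nd, 5th, 6th, 8th, 10th grade, each M then I.
exams = ["2M", "2I", "5M", "5I", "6M", "6I", "8M", "8I", "10M", "10I"]
table4 = {
    "FE": [-0.9, -0.4, -1.9, -0.3, -1.0, -1.2, -0.8, -0.3, -2.1, -1.0],
    "CE": [0.8, -2.6, -5.8, -0.5, -0.1, -0.3, -2.0, 0.0, -1.7, -2.0],
    "SI": [-4.5, -6.5, -12.0, -3.2, -0.9, -0.8, -3.8, -0.8, -5.1, -6.5],
    "IT": [0.3, 1.9, 1.0, 2.7, 1.4, 1.4, 0.3, 1.2, 0.9, 1.7],
    "SMALL": [-4.3, -1.0, -7.0, -1.6, 0.2, 0.5, -0.1, -1.1, -1.4, -4.4],
}

is_math = np.array([name.endswith("M") for name in exams])
mean_by_row = {row: float(np.mean(vals)) for row, vals in table4.items()}
n_center_negative = int(np.sum(np.array(table4["CE"]) < 0))
fe = np.array(table4["FE"])
it = np.array(table4["IT"])

print("mean FE difference:", round(mean_by_row["FE"], 2))            # -0.99
print("mean IT difference:", round(mean_by_row["IT"], 2))            # 1.28
print("exams with negative CE difference:", n_center_negative)       # 8
print("FE, mathematics vs Italian:",
      round(float(fe[is_math].mean()), 2), round(float(fe[~is_math].mean()), 2))
print("IT, mathematics vs Italian:",
      round(float(it[is_math].mean()), 2), round(float(it[~is_math].mean()), 2))
```

The averages of about 1% for female students and 1.3% for immigrant students, the eight exams with more manipulation in the Center, and the larger gender gap in mathematics and larger native gap in Italian all follow from the table entries.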

INVALSI tests comprise two types of questions: multiple choice and open ended. Multiple choice questions require minimal effort to grade and transcribe, and at the same time students would find it easier to copy from one another. Open ended questions may involve an elaborate answer which takes more time to grade and which students may find harder to copy, and even though INVALSI provides a correction grid, teachers have more room for interpretation in judging whether the answer to an open ended question is right or wrong.

Table 5 shows that, although both types of questions suffer from manipulation, open ended questions were more manipulated.25 However, the pattern for missing answers is the opposite, as shown in figure 4: for the control group, the proportion of missing answers decreases more for the open ended questions than for the multiple choice questions. If students had copied each other during the exam, it would have led to the opposite result. Moreover, students are less likely to be responsible for the manipulation because there were several versions of each exam with the same questions but in a different order. Hence, this evidence supports the hypothesis that teachers were more responsible than students for the manipulation of the test scores.

Table 5: Multiple choice vs open ended questions

                       2nd grade         5th grade         6th grade         8th grade         10th grade
                       M       I         M       I         M       I         M       I         M       I
$\Delta_{APE,MC}$     -5.29   -6.35     -1.70   -3.01      1.76   -1.57     -1.81   -0.79     -1.92   -3.05
$\Delta_{APE,OE}$     -6.37  -10.36     -5.62   -4.46     -0.22   -3.74     -2.02   -0.54     -5.03   -4.98
$DID_{APE}$            1.08    4.02      3.92    1.44      1.99    2.17      0.21   -0.25      3.11    1.93
$\Delta_{M,MC}$        1.19    1.07      0.22    0.52     -0.11   -0.08      0.24   -0.04     -0.11   -0.29
$\Delta_{M,OE}$        2.21    3.34      0.84    1.00     -0.37   -0.58      0.28   -0.12      2.50    0.29
$DID_{M}$             -1.02   -2.27     -0.63   -0.48      0.26    0.50     -0.05    0.08     -2.61   -0.58

Notes: $\Delta_{APE,MC}$ and $\Delta_{APE,OE}$ respectively denote the mean difference between the treatment and control groups of the mean question APE of the CBRE logit estimates (first row of table 3) for multiple choice and open ended questions; $DID_{APE}$ denotes the difference between these two; $\Delta_{M,MC}$ and $\Delta_{M,OE}$ respectively denote the mean difference between the treatment and control groups in the percentage of missing answers for multiple choice and open ended questions; $DID_{M}$ denotes the difference between these two. All numbers are reported as %.
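The difference-in-differences rows of table 5, and the negative relation between them that figure 4 displays, can be recomputed from the Δ rows. The script below is a sketch for checking the arithmetic: the arrays transcribe the table, and the DID is taken as the multiple choice row minus the open ended row, which is the convention that reproduces the reported DID entries up to rounding.

```python
import numpy as np

# Columns: 2nd, 5th, 6th, 8th, 10th grade, each mathematics (M) then Italian (I).
d_ape_mc = np.array([-5.29, -6.35, -1.70, -3.01, 1.76, -1.57, -1.81, -0.79, -1.92, -3.05])
d_ape_oe = np.array([-6.37, -10.36, -5.62, -4.46, -0.22, -3.74, -2.02, -0.54, -5.03, -4.98])
d_m_mc = np.array([1.19, 1.07, 0.22, 0.52, -0.11, -0.08, 0.24, -0.04, -0.11, -0.29])
d_m_oe = np.array([2.21, 3.34, 0.84, 1.00, -0.37, -0.58, 0.28, -0.12, 2.50, 0.29])

# DID rows as reported in the table.
did_ape_reported = np.array([1.08, 4.02, 3.92, 1.44, 1.99, 2.17, 0.21, -0.25, 3.11, 1.93])
did_m_reported = np.array([-1.02, -2.27, -0.63, -0.48, 0.26, 0.50, -0.05, 0.08, -2.61, -0.58])

# Recompute the DID rows as multiple choice minus open ended.
did_ape = d_ape_mc - d_ape_oe
did_m = d_m_mc - d_m_oe
max_gap = max(np.max(np.abs(did_ape - did_ape_reported)),
              np.max(np.abs(did_m - did_m_reported)))

# Correlation across the ten exams between the two DID measures (cf. figure 4).
corr = float(np.corrcoef(did_ape_reported, did_m_reported)[0, 1])

print("largest gap between recomputed and reported DID:", round(float(max_gap), 2))
print("correlation between DID APE and DID missing:", round(corr, 2))
```

The recomputed values differ from the reported ones only by rounding, and the correlation between the two DID measures is negative.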

5 Cheating Correction

The distribution of the estimated likelihood from equation 6, based on the estimates of column (6) from table 3, is shown in figure 5. Unsurprisingly, the two distributions do not coincide. The right tail, representing those classes with more likely results, is approximately the same for both distributions, but the left tail of the distribution of the group with internal monitors has more probability mass. This indicates that there is an excessive number of unlikely results relative to the amount there would have been without manipulation.26

25 The proportion of open ended questions ranged between 21% and 50%, depending on the exam.
26 Consistently with the estimation results, the difference between the two distributions is largely explained by the difference in the South & Islands. See appendix B.

Figure 4: Open ended vs multiple choice questions
[Scatter plot. Horizontal axis: DID APE; vertical axis: DID % Missing.]
The horizontal axis represents the difference between open ended and multiple choice questions of the difference between the two groups of the mean APE, whereas the vertical axis represents the mean difference between the two types of questions of the difference between the two groups of the proportion of missing questions.

Figure 5: Distribution of the likelihood, 10th grade mathematics exam
[Two panels, CDF and PDF, each plotting the EX and IN series against the likelihood (0 to 1).]
Distribution of the estimated likelihood of the class scores (equation 6). EX and IN respectively denote the groups with the external and the internal monitor.

Given the large regional differences in test score manipulation, the correction method proposed in section 3.1 is applied to each class using only data from its region. Each dot in figure 6 represents the correction applied to a single class and relates it to its actual mean test score and estimated likelihood, showing that a higher correction is applied to higher, less likely test scores. Since the majority of the test scores with unlikely results are located in the South & Islands region, the correction is higher there. Consequently, the class and regional rankings change once the correction is applied.27 This is seen in the maps in figure 7, which show the mean test scores in every Italian province, both before and after the correction is applied.

Figure 6: Correction for cheating, test scores, and likelihood, 10th grade mathematics exam
The upper and lower figures respectively show the scatter plot of the mean correction to the classes with an internal monitor, with the estimated likelihood of the test scores of each class (equation 6) and with the class mean test scores.

Figure 7: Correction for cheating, provincial variation, 10th grade mathematics exam

27 Specifically, the Kendall’s τ rank correlation equals 0.88 out of a maximum of 1.
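To illustrate how a rank correlation such as the one in footnote 27 summarizes the change in rankings, the following sketch computes Kendall's τ between class mean scores before and after a correction. The five classes and their scores are hypothetical numbers invented for the example, not data from the paper, and the function is a plain τ-a that ignores ties in the count of concordant and discordant pairs.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(list(zip(x, y)), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) / 2
    return (concordant - discordant) / n_pairs


# Hypothetical class mean scores before and after a correction.
before = [32.0, 28.5, 41.0, 25.0, 36.5]
after = [32.0, 28.0, 35.0, 25.0, 35.5]

print(kendall_tau(before, after))   # 0.8: one of the ten pairs switches order
print(kendall_tau(before, before))  # 1.0: identical rankings
```

A value of 1 means the correction leaves the ranking untouched, so a value below 1 indicates that some classes exchange positions once their scores are corrected.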

5.1 Comparison with Quintano (2009) Correction

The method currently used by INVALSI to correct for cheating follows Quintano et al. (2009). It is a fuzzy clustering approach that depends on four statistics: within class mean test scores, within class standard deviation of test scores, within class average percentage of missing answers, and within class index of answer homogeneity. Thus, if the mean test scores of a classroom are high relative to those in the treatment group, which are assumed to be free of manipulation, they are more likely to be classified as manipulated. The same happens if their standard deviation, the average percentage of missing answers, or the index of answer homogeneity are low.28 Related to this strategy is the work of Battistin et al. (2014), who propose a way to bound the amount of manipulation in the population using a binary indicator that classifies a classroom’s test scores as manipulated based on these four statistics.

While manipulation can be reflected in those four statistics, this method suffers from two problems: comparability and the existence of confounders. To see the first problem, notice that the distributions of these statistics, and even their support, depend on the number of questions and students in the class. Therefore, the same within class standard deviation conveys different information if the classrooms are of different size, or if the tests have a different number of questions.29 Moreover, as I argued in section 2.1, the distributions of these statistics depend on each other in a nontrivial way. Hence, a low within class variance of test scores is more informative when the mean test score is close to a half of the maximum than when it is close to the maximum, since by construction it is always low in the latter case.

Regarding the second problem, these statistics may not reflect cheating accurately: a high mean could imply that students’ ability is high, and a high correlation could be caused by sorting or peer effects, and one should allow for these possibilities when comparing the two groups. Also, if these statistics are not sufficient, some information is missing and additional statistics could improve the detection of cheating.

In practice, the two corrections are applied to different sets of classrooms and the amount of correction is different, although they are positively correlated (the linear correlation coefficient equals 0.52). Figure 8 shows the distribution of both corrections: the one proposed in this paper leaves almost 20% of the test scores unchanged, and a correction of less than 3 points (out of a maximum of 50) is applied to nearly 90% of them. On average, the correction equals 1.4 points. In contrast, the Quintano et al. (2009) correction leaves about twice as many test scores uncorrected, but the average correction for the remaining ones is much larger, with at least 10% of the test scores having a correction of at least 10 points, and an average correction of 4 points.

Figure 8: Distribution of correction for cheating, 10th grade mathematics exam
[Figure: empirical cdfs of the two corrections; horizontal axis: correction (0 to 10); vertical axis: cumulative proportion (0 to 1).]
$\hat{\alpha}$ and Quintano et al. respectively denote the empirical cdf of the correction methods presented in this paper and the one proposed by Quintano et al. (2009).

28 The index of answer homogeneity takes value zero when every student’s answers coincide, and takes higher values, the more heterogeneous they are.
29 For example, if the number of questions equals 2, and the number of students equals 2, the variance of the test scores can take values {0, 1/4, 1}, but if the number of students equals 3, then it can take values {0, 2/9, 6/9, 8/9}.
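The example in footnote 29 can be verified by enumeration. The sketch below assumes that a test score is the number of correct answers and that the variance divides by the number of students (population variance), which is the convention that reproduces the values in the footnote; the function name is hypothetical.

```python
from fractions import Fraction
from itertools import combinations_with_replacement


def variance_support(n_students, n_questions):
    """All values the within class variance of test scores can take."""
    values = set()
    possible_scores = range(n_questions + 1)  # number of correct answers
    for scores in combinations_with_replacement(possible_scores, n_students):
        mean = Fraction(sum(scores), n_students)
        variance = sum((s - mean) ** 2 for s in scores) / n_students
        values.add(variance)
    return sorted(values)


print(variance_support(2, 2))  # the values 0, 1/4 and 1
print(variance_support(3, 2))  # the values 0, 2/9, 2/3 (= 6/9) and 8/9
```

Adding a single student changes the set of attainable variances, which is the comparability problem described above: the same observed variance does not carry the same information in classes of different size.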

Another way of comparing the two corrections is to look at the mean correction applied in each region and compare it to the actual changes in mean test scores between the two groups. Figure 9 shows that both corrections lead to a change in the regional rankings, and regions where the test scores were more manipulated are those in which the correction was the highest. However, they greatly differ in their fit: Quintano et al. (2009) consistently overestimate the average correction, resulting in a larger reduction of the mean test scores for students with an internal monitor. The correction proposed in this paper matches the mean difference between the two groups by region better.

Figure 9: Correction for cheating, regional variation, 10th grade mathematics exam
[Figure: $r_{IN} - r_{EX}$, $\hat{\alpha}$, and Quintano et al. series by region; horizontal axis: VDA, PIE, LIG, LOM, TAA, VEN, FVG, ER, TOS, UMB, MAR, LAZ, ABR, MOL, CAM, PUG, BAS, CAL, SIC, SAR; vertical axis: −2 to 8.]
For each region, $r_{IN} - r_{EX}$ denotes the mean difference in test scores between students with an internal and an external monitor, $\hat{\alpha}$ denotes the mean correction of the method presented in this paper, and Quintano et al. denotes the mean correction of the method proposed by Quintano et al. (2009).

6 Conclusion

In this paper I propose a novel approach to detect test score manipulation and correct for it, based on the comparison of a group of test scores suspected of having been manipulated with a group of test scores that are assumed to be fair. Taking advantage of a natural experiment in the Italian education system, I apply nonlinear panel data regression methods to describe patterns in test score manipulation, and based on these estimates, I calculate the corrected test scores.

The findings are consistent with the conjecture that teachers are responsible for the manipulation, showing a negative correlation in the difference between open ended and multiple choice questions between the amount of manipulation and the number of missing answers. The manipulation is limited in the North of Italy, frequent in the Center and widespread in the South & Islands. Moreover, it tends to favor female students in every exam, and immigrant students in Italian exams. Unobserved heterogeneity accounts for an important share of the total variation, and it exhibits a substantial level of correlation within classrooms, reflecting a combination of teacher effects, sorting of students, and peer effects.

The correction method I propose allows the results of a classroom to be good or highly correlated because of factors unrelated to manipulation, such as effective teachers or able students. The correction then depends on how likely the observed results are to happen without manipulation, and the higher and more unlikely the results are, the higher the correction. For the majority of the classrooms the correction is quite modest or even zero, and it displays a large regional variation.

References

Aaronson, D., L. Barrow, and W. Sander (2007). Teachers and student achievement in the Chicago public high schools. Journal of Labor Economics 25 (1), 95–135.
Angrist, J. D., E. Battistin, and D. Vuri (2014). In a small moment: Class size and moral hazard in the Mezzogiorno. Technical report, National Bureau of Economic Research.
Bacci, S., F. Bartolucci, and M. Gnaldi (2014). A class of multidimensional latent class IRT models for ordinal polytomous item responses. Communications in Statistics-Theory and Methods 43 (4), 787–800.
Battistin, E., M. De Nadai, and D. Vuri (2014). Counting rotten apples: Student achievement and score manipulation in Italian elementary schools. Technical report, IZA Discussion Papers.
Bertoni, M., G. Brunello, and L. Rocco (2013). When the cat is near, the mice won’t play: The effect of external examiners in Italian schools. Journal of Public Economics 104, 65–77.
Bonhomme, S. (2012). Functional differencing. Econometrica 80 (4), 1337–1385.
Chamberlain, G. (1980). Analysis of covariance with qualitative data. The Review of Economic Studies 47 (1), 225–238.
Chernozhukov, V., I. Fernández-Val, and A. Galichon (2010). Quantile and probability curves without crossing. Econometrica 78 (3), 1093–1125.
Chernozhukov, V., I. Fernández-Val, J. Hahn, and W. Newey (2013). Average and quantile effects in nonseparable panel models. Econometrica 81 (2), 535–580.

Chetty, R., J. N. Friedman, and J. E. Rockoff (2014, September). Measuring the impacts of teachers I: Evaluating bias in teacher value-added estimates. American Economic Review 104 (9), 2633–79.
Cullen, J. B. and R. Reback (2006). Tinkering toward accolades: School gaming under a performance accountability system, Volume 14. Emerald Group Publishing Limited.
Dee, T. S., B. A. Jacob, J. McCrary, and J. Rockoff (2011). Rules and discretion in the evaluation of students and schools: The case of the New York Regents examinations. Unpublished working paper.
Diamond, R. and P. Persson (2016). The long-term consequences of teacher discretion in grading of high-stakes tests. Technical report, National Bureau of Economic Research.
Figlio, D. N. (2006). Testing, crime and punishment. Journal of Public Economics 90 (4), 837–851.
Grissom, J. A., D. Kalogrides, and S. Loeb (2014). Using student test scores to measure principal performance. Educational Evaluation and Policy Analysis XX (X), 1–26.
Hanna, R. N. and L. L. Linden (2012). Discrimination in grading. American Economic Journal: Economic Policy 4 (4), 146–168.
Hanushek, E. (1971). Teacher characteristics and gains in student achievement: Estimation using micro data. The American Economic Review 61 (2), 280–288.
Hinnerich, B. T., E. Höglin, and M. Johannesson (2011). Are boys discriminated in Swedish high schools? Economics of Education Review 30 (4), 682–690.
Hussain, I. (2015). Subjective performance evaluation in the public sector evidence from school inspections. Journal of Human Resources 50 (1), 189–221.
Jacob, B. A. and S. D. Levitt (2003). Rotten apples: An investigation of the prevalence and predictors of teacher cheating. The Quarterly Journal of Economics 118 (3), 843–877.
Lavy, V. (2008). Do gender stereotypes reduce girls’ or boys’ human capital outcomes? Evidence from a natural experiment. Journal of Public Economics 92 (10), 2083–2105.
Lavy, V. and E. Sand (2015). On the origins of gender human capital gaps: Short and long term consequences of teachers’ stereotypical biases. Technical report, National Bureau of Economic Research.
Levitt, S. D. and M.-J. Lin (2015). Catching cheating students. Technical report, National Bureau of Economic Research.
Lucifora, C. and M. Tonello (2015). Cheating and social interactions: Evidence from a randomized experiment in a national evaluation program. Journal of Economic Behavior and Organization 115 (C), 45–66.

Machin, S. and T. Pekkarinen (2008). Global sex differences in test score variability. Science 322 (5906), 1331–1332.
Nelsen, R. B. (2013). An introduction to copulas, Volume 139. Springer Science & Business Media.
Paccagnella, M. and P. Sestito (2014). School cheating and social capital. Education Economics 22 (4), 367–388.
Pereda-Fernández, S. (2016). Copula-based random effects models for clustered data. Technical report, Bank of Italy Temi di Discussione (Working Paper) No 1092.
Quintano, C., R. Castellano, and S. Longobardi (2009). A fuzzy clustering approach to improve the accuracy of Italian student data: An experimental procedure to correct the impact of outliers on assessment test scores. Statistica Applicata 7 (2), 149–171.
Rockoff, J. E. (2004). The impact of individual teachers on student achievement: Evidence from panel data. American Economic Review 94 (2), 247–252.
Rothstein, J. (2010). Teacher quality in educational production: Tracking, decay, and student achievement. The Quarterly Journal of Economics 125 (1), 175–214.
Rothstein, J. (2015). Revisiting the impacts of teachers. Unpublished working paper.
Sklar, M. (1959). Fonctions de répartition à n dimensions et leurs marges. Publications de l’Institut de Statistique de l’Université de Paris 8, 229–231.
Sprietsma, M. (2013). Discrimination in grading: Experimental evidence from primary school teachers. Empirical Economics 45 (1), 523–538.
Wei, Y. and R. J. Carroll (2009). Quantile regression with measurement error. Journal of the American Statistical Association 104 (487), 1129–1143.

Appendix

A Some linear algebra results

Let $z$ be a vector of dimension $T$, $Z$ be the matrix whose main diagonal contains the elements of vector $z$ and whose off-diagonal elements all equal zero, $\iota_T$ a vector of ones of dimension $T$, and $G$ be a $T \times T$ matrix whose $(i, j)$ element equals $1(i < j)$, i.e. the elements above the main diagonal equal one, and the remaining elements equal zero. Then, the sum over all combinations of $r \le T$ distinct elements from $z$ of their product is given by 0 for $r = 0$, and

$$\sum_{k_1=1}^{T-r+1} \cdots \sum_{k_r=k_{r-1}+1}^{T} \prod_{j=1}^{r} z_{k_j} = \iota_T' (ZG)^{r-1} Z \iota_T$$

for $1 \le r \le T$.

Now consider equation 5. If the distribution of $\varepsilon_{icq}$ is logistic, then the probability of a particular result, $b = (b_1, ..., b_{N_c})$, can be written as

$$P(b) = \int_{[0,1]^{N_c}} \frac{\exp\left(\sum_{i=1}^{N_c} \sum_{q=1}^{Q} b_{iq} \left(\eta_{ic} + \xi_q\right)\right)}{\prod_{i=1}^{N_c} \prod_{q=1}^{Q} \left(1 + \exp\left(\eta_{ic} + \xi_q\right)\right)} dC\left(u_c; \rho\right)$$

To compute $P\left(R_1 \ge r_1, ..., R_{N_c} \ge r_{N_c}\right)$, i.e. the probability that each student in class $c$ gets at least as many correct answers as they actually got, the preceding trick can be combined with the numerical approximation of the integral with respect to the copula to obtain an estimate of the aforementioned probability, which would be exact if not for the integral. Formally,

$$P\left(R_1 \ge r_1, ..., R_{N_c} \ge r_{N_c}\right) = \sum_{b_1 \in B_{r_1}} \cdots \sum_{b_{N_c} \in B_{r_{N_c}}} \int_{[0,1]^{N_c}} \frac{\exp\left(\sum_{i=1}^{N_c} \sum_{q=1}^{Q} b_{iq} \left(\eta_{ic} + \xi_q\right)\right)}{\prod_{i=1}^{N_c} \prod_{q=1}^{Q} \left(1 + \exp\left(\eta_{ic} + \xi_q\right)\right)} dC\left(u_c; \rho\right)$$

$$= \int_{[0,1]^{N_c}} \prod_{i=1}^{N_c} \frac{\sum_{s=r_{ic}}^{Q} \iota_Q' \left(Z_{ic} G\right)^{s-1} Z_{ic} \iota_Q}{\prod_{q=1}^{Q} \left(1 + \exp\left(\eta_{ic} + \xi_q\right)\right)} dC\left(u_c; \rho\right)$$

$$\approx \frac{1}{N_1} \sum_{j=1}^{N_1} \prod_{i=1}^{N_c} \left[ \frac{1}{N_2} \sum_{h=2}^{N_2} \frac{\sum_{s=r_{ic}}^{Q} \iota_Q' \left(Z_{icjh} G\right)^{s-1} Z_{icjh} \iota_Q}{\prod_{q=1}^{Q} \left(1 + \exp\left(\eta_{jh} + \xi_q\right)\right)} \right]$$

where $Z_{ic}$ and $Z_{icjh}$ are the diagonal matrices whose $(q, q)$ elements equal $\exp\left(\eta_{ic} + \xi_q\right)$ and $\exp\left(\eta_{jh} + \xi_q\right)$, respectively.30 The approximation in the last row uses the algorithm presented in Pereda-Fernández (2016), which evaluates the integral at a set of points that depend on $N_1$ and $N_2$.

30 Inclusion of covariates is straightforward and is achieved by letting $z_q = \exp\left(\eta + \xi_q + x_{1ic}' \beta + x_{2icq}' \zeta_q\right)$, and substituting the denominator by $\prod_{q=1}^{Q} \left(1 + \exp\left(\eta + \xi_q + x_{1ic}' \beta + x_{2icq}' \zeta_q\right)\right)$.
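The matrix identity above, and the way it yields the probability that a student answers at least $r$ questions correctly, can be checked numerically. The sketch below compares the matrix formula with brute-force enumeration for a single student and a fixed value of $\eta$; it does not reproduce the integration with respect to the copula, the function names are hypothetical, and the case $r = 0$ is left out.

```python
from itertools import combinations, product

import numpy as np


def sum_of_products(z, r):
    """iota' (Z G)^(r-1) Z iota, for 1 <= r <= T."""
    z = np.asarray(z, dtype=float)
    T = z.size
    Z = np.diag(z)
    G = np.triu(np.ones((T, T)), k=1)  # ones where row index < column index
    iota = np.ones(T)
    return float(iota @ np.linalg.matrix_power(Z @ G, r - 1) @ Z @ iota)


def sum_of_products_brute(z, r):
    """Same quantity, by listing every set of r distinct elements."""
    return float(sum(np.prod(c) for c in combinations(z, r)))


def prob_at_least(eta, xi, r):
    """P(R >= r) for one student with logistic errors, given eta (r >= 1)."""
    z = np.exp(eta + np.asarray(xi, dtype=float))
    numerator = sum(sum_of_products(z, s) for s in range(r, z.size + 1))
    return numerator / float(np.prod(1.0 + z))


def prob_at_least_brute(eta, xi, r):
    """Same probability, by enumerating all 2^Q answer vectors."""
    p = 1.0 / (1.0 + np.exp(-(eta + np.asarray(xi, dtype=float))))
    total = 0.0
    for b in product((0, 1), repeat=p.size):
        if sum(b) >= r:
            total += float(np.prod(np.where(np.array(b) == 1, p, 1.0 - p)))
    return total


z_example = [0.5, 1.0, 2.0, 3.0]
print(sum_of_products(z_example, 2), sum_of_products_brute(z_example, 2))  # 14.0 14.0

xi_example = [-1.0, 0.0, 0.5, 1.0]  # hypothetical question effects
print(prob_at_least(0.3, xi_example, 2), prob_at_least_brute(0.3, xi_example, 2))
```

The enumeration visits $2^Q$ answer vectors, whereas the matrix formula needs only matrix products, which is what makes the computation feasible for exams with dozens of questions.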

B Full Results

Table 6: RE logit estimates

                                        2nd grade    5th grade    6th grade    8th grade    10th grade
                                        M     I      M     I      M     I      M     I      M     I
$\hat{\xi}_{EX} > \hat{\xi}_{IN}$       0     0      4     0     29     5      2     9      3     0
$\hat{\xi}_{EX} < \hat{\xi}_{IN}$      32    39     37    82     10    54     36    37     40    88
$\hat{\xi}_{EX} = \hat{\xi}_{IN}$       0     0      6     0      9    12      7    32      7     0

Notes: EX and IN respectively denote the groups with the external and the internal monitor. A coefficient is considered as larger than the other if it is significantly larger at the 95% confidence level, and equal if none is statistically larger than the other.

Table 7: Correlation between RE logit estimates and conditional FE logit estimates

         2nd grade       5th grade       6th grade       8th grade       10th grade
         M      I        M      I        M      I        M      I        M      I
EX      1.00   0.87     1.00   0.98     1.00   0.97     0.97   1.00     1.00   0.95
IN      0.99   1.00     0.98   0.95     1.00   1.00     1.00   1.00     0.99   1.00
∆       0.98   0.07     0.33   0.62     0.94   0.38    -0.91  -0.51     0.98   0.03

Notes: EX and IN respectively denote the groups with the external and the internal monitor.

Table 8: Comparison between RE logit estimates and logit estimates

          2nd grade    5th grade    6th grade    8th grade    10th grade
          M     I      M     I      M     I      M     I      M     I
EX, ≠    30    35     40    78     33    70     37    74     43    83
EX, =     2     4      7     4     15     1      8     4      7     5
IN, ≠    32    39     46    82     45    71     45    77     47    88
IN, =     0     0      1     0      3     0      0     1      3     0

Notes: EX and IN respectively denote the groups with the external and the internal monitor; = and ≠ respectively denote that the coefficients are significantly equal or different at the 95% level of confidence. The quantities represent the number of questions that fit into each category for each exam.

Table 9: RE & CBRE logit estimates, 2nd grade mathematics exam (1)

(2)

(3)

(4)

(5)

(6)

FE

0.04*** (0.00) -

SI

-

-

IT

-

-

0.06*** (0.00) -0.01*** (0.00) 0.00 (0.00) -0.02*** (0.00) -

SMALL

-

-

-

σ ˆη

1.05*** (0.01) -

1.09*** (0.01) -

1.10*** (0.01) -

0.00 (0.01) -0.02*** (0.00) 0.00 (0.00) -0.04*** (0.00) 0.08*** (0.01) -0.01*** (0.00) 1.12*** (0.01) -

0.07*** (0.01) -

CE

0.05*** (0.00) -0.01*** (0.00) -

-0.02 (0.01) -0.01*** (0.00) 0.05*** (0.01) 0.04*** (0.01) 0.07*** (0.01) -0.04*** (0.01) 1.62*** (0.02) 0.39*** (0.02)

FE

0.10*** (0.00) -

SI

-

-

IT

-

-

0.10*** (0.00) -0.01*** (0.00) 0.02*** (0.00) 0.05*** (0.00) -

SMALL

-

-

-

σ ˆη

1.21*** (0.00) -

1.26*** (0.00) -

1.24*** (0.00) -

0.04*** (0.00) -0.01*** (0.00) 0.03*** (0.00) 0.06*** (0.00) 0.06*** (0.00) 0.00*** (0.00) 1.24*** (0.00) -

0.17*** (0.00) -

CE

0.12*** (0.00) 0.00*** (0.00) -

FE

-0.06*** (0.00) -

CE

-

-0.07*** (0.00) -0.01*** (0.00) -

SI

-

-

IT

-

-

-0.04*** (0.00) -0.01*** (0.00) -0.02*** (0.00) -0.07*** (0.00) -

SMALL

-

-

-

σ ˆη

-0.16*** (0.01) -

-0.16*** (0.01) -

-0.14*** (0.01) -

ˆξ

External

ρˆ ˆξ

Internal

ρˆ ˆξ

Difference

ρˆ

-0.04*** (0.01) -0.01*** (0.00) -0.03*** (0.00) -0.10*** (0.00) 0.02*** (0.01) -0.02*** (0.00) -0.12*** (0.01) -

1.04*** (0.02) 2.65*** (0.13)

1.02*** (0.00) 1.20*** (0.00) -0.10*** (0.01) 0.03 (0.02) 1.45*** (0.13)

0.03*** (0.00) 0.00*** (0.00) 0.04*** (0.00) 0.09*** (0.00) 0.07*** (0.00) 0.00*** (0.00) 1.12*** (0.00) 0.73*** (0.00) -0.05*** (0.01) -0.01** (0.00) 0.01 (0.01) -0.04*** (0.01) 0.00 (0.01) -0.04*** (0.01) 0.51*** (0.02) -0.35*** (0.02)

Notes: FE is the mean of the interaction between a dummy for female students and the question dummies; CE, SI, IT, and SMALL are dummies for Center region, South & Islands region, natives, and small class. Columns 1-4 show the APE of the covariates and the estimates of ση with the RE logit estimator (equation 4); columns 5-6 show the same estimates and those of ρ with the CBRE estimator (equation 5). *, **, and *** denote statistical significance at the 90%, 95%, and 99% level, respectively; standard errors in parentheses.

Table 10: RE & CBRE logit estimates, 5th grade mathematics exam (1)

(2)

(3)

(4)

(5)

(6)

FE

0.06*** (0.00) -

SI

-

-

IT

-

-

0.08*** (0.00) -0.03*** (0.00) -0.01*** (0.00) -0.04*** (0.00) -

SMALL

-

-

-

σ ˆη

0.87*** (0.01) -

0.96*** (0.01) -

0.95*** (0.01) -

0.01*** (0.01) -0.04*** (0.00) -0.01*** (0.00) -0.06*** (0.00) 0.08*** (0.00) -0.02*** (0.00) 0.93*** (0.01) -

0.06*** (0.01) -

CE

0.06*** (0.00) -0.03*** (0.00) -

0.10*** (0.01) -0.04*** (0.00) -0.02*** (0.00) -0.07*** (0.00) 0.08*** (0.00) -0.09*** (0.00) 0.92*** (0.01) 2.14*** (0.08)

FE

0.09*** (0.00) -

CE

-

0.10*** (0.00) -0.03*** (0.00) -

SI

-

-

IT

-

-

0.10*** (0.00) -0.03*** (0.00) 0.00*** (0.00) 0.00*** (0.00) -

SMALL

-

-

-

σ ˆη

1.03*** (0.00) -

1.12*** (0.00) -

1.10*** (0.00) -

FE

-0.03*** (0.00) -

CE

-

-0.04*** (0.00) -0.01** (0.00) -

SI

-

-

IT

-

-

-0.02*** (0.00) 0.00 (0.00) -0.01*** (0.00) -0.04*** (0.00) -

SMALL

-

-

-

σ ˆη

-0.15*** (0.01) -

-0.15*** (0.01) -

-0.15*** (0.01) -

ˆξ

External

ρˆ ˆξ

Internal

ρˆ ˆξ

Difference

ρˆ

0.03*** (0.00) -0.03*** (0.00) 0.00*** (0.00) -0.01*** (0.00) 0.07*** (0.00) 0.00*** (0.00) 1.10*** (0.00) -0.02*** (0.01) -0.01*** (0.00) -0.02*** (0.00) -0.05*** (0.00) 0.01 (0.01) -0.01*** (0.00) -0.16*** (0.01) -

0.84*** (0.01) 3.89*** (0.15) 0.11*** (0.00) 0.92*** (0.00) 1.96*** (0.01) -0.05*** (0.01) -0.09*** (0.01) 1.93*** (0.15)

0.03*** (0.00) -0.02*** (0.00) 0.04*** (0.00) 0.05*** (0.00) 0.07*** (0.00) -0.02*** (0.00) 1.06*** (0.00) 0.83*** (0.01) 0.07*** (0.01) -0.02*** (0.00) -0.06*** (0.00) -0.12*** (0.00) 0.01*** (0.00) -0.07*** (0.00) -0.13*** (0.01) 1.31*** (0.08)

Notes: FE is the mean of the interaction between a dummy for female students and the question dummies; CE, SI, IT, and SMALL are dummies for Center region, South & Islands region, natives, and small class. Columns 1-4 show the APE of the covariates and the estimates of ση with the RE logit estimator (equation 4); columns 5-6 show the same estimates and those of ρ with the CBRE estimator (equation 5). *, **, and *** denote statistical significance at the 90%, 95%, and 99% level, respectively; standard errors in parentheses.

Table 11: RE & CBRE logit estimates, 6th grade mathematics exam (1)

(2)

(3)

(4)

(5)

(6)

FE

-0.05*** (0.00) -

SI

-

-

IT

-

-

0.03*** (0.00) -0.02*** (0.00) -0.04*** (0.00) -0.13*** (0.00) -

SMALL

-

-

-

σ ˆη

0.82*** (0.01) -

0.81*** (0.01) -

0.56*** (0.00) -

-0.13*** (0.00) -0.04*** (0.00) -0.02*** (0.00) -0.05*** (0.00) 0.13*** (0.00) -0.03*** (0.00) 0.53*** (0.00) -

-0.06*** (0.00) -

CE

-0.03*** (0.00) -0.03*** (0.00) -

-0.12*** (0.00) -0.04*** (0.00) -0.02*** (0.00) -0.05*** (0.00) 0.13*** (0.00) -0.04*** (0.00) 0.51*** (0.00) 2.93*** (0.03)

FE

-0.06*** (0.00) -

SI

-

-

IT

-

-

0.03*** (0.00) -0.02*** (0.00) -0.04*** (0.00) -0.11*** (0.00) -

SMALL

-

-

-

σ ˆη

0.61*** (0.00) -

0.82*** (0.00) -

0.54*** (0.00) -

-0.12*** (0.00) -0.03*** (0.00) -0.01*** (0.00) -0.04*** (0.00) 0.12*** (0.00) -0.03*** (0.00) 0.52*** (0.00) -

-0.06*** (0.00) -

CE

-0.03*** (0.00) -0.02*** (0.00) -

FE

0.01*** (0.00) -

SI

-

-

IT

-

-

0.00*** (0.00) 0.00 (0.00) 0.00* (0.00) -0.02*** (0.00) -

SMALL

-

-

-

σ ˆη

0.21*** (0.01) -

-0.01 (0.01) -

0.01*** (0.00) -

0.00* (0.00) -0.01*** (0.00) 0.00 (0.00) -0.01*** (0.00) 0.01*** (0.00) 0.00*** (0.00) 0.00 (0.00) -

0.00 (0.00) -

CE

0.00 (0.00) -0.01*** (0.00) -

ˆξ

External

ρˆ ˆξ

Internal

ρˆ ˆξ

Difference

ρˆ

0.69*** (0.00) 3.08*** (0.04)

0.59*** (0.00) 2.78*** (0.01)

0.10*** (0.00) 0.30*** (0.05)

-0.12*** (0.00) -0.03*** (0.00) -0.02*** (0.00) -0.04*** (0.00) 0.12*** (0.00) -0.04*** (0.00) 0.51*** (0.00) 2.62*** (0.01) 0.00* (0.00) -0.01*** (0.00) 0.00 (0.00) -0.01*** (0.00) 0.01*** (0.00) 0.00 (0.00) 0.00 (0.00) 0.30*** (0.04)

Notes: FE is the mean of the interaction between a dummy for female students and the question dummies; CE, SI, IT, and SMALL are dummies for Center region, South & Islands region, natives, and small class. Columns 1-4 show the APE of the covariates and the estimates of ση with the RE logit estimator (equation 4); columns 5-6 show the same estimates and those of ρ with the CBRE estimator (equation 5). *, **, and *** denote statistical significance at the 90%, 95%, and 99% level, respectively; standard errors in parentheses.

Table 12: RE & CBRE logit estimates, 8th grade mathematics exam (1)

(2)

(3)

(4)

(5)

(6)

FE

0.01*** (0.00) -

SI

-

-

IT

-

-

0.03*** (0.00) -0.04*** (0.00) 0.00 (0.00) -0.02*** (0.00) -

SMALL

-

-

-

σ ˆη

0.91*** (0.01) -

0.97*** (0.01) -

0.96*** (0.01) -

-0.05*** (0.00) -0.04*** (0.00) -0.01* (0.00) -0.03*** (0.00) 0.10*** (0.00) -0.02*** (0.00) 0.94*** (0.01) -

0.05*** (0.00) -

CE

0.03*** (0.00) -0.04*** (0.00) -

-0.01** (0.00) -0.04*** (0.00) 0.01 (0.01) 0.00 (0.00) 0.08*** (0.00) -0.07*** (0.00) 0.84*** (0.00) 0.99*** (0.02)

FE

0.03*** (0.00) -

SI

-

-

IT

-

-

0.03*** (0.00) -0.03*** (0.00) 0.00** (0.00) 0.01*** (0.00) -

SMALL

-

-

-

σ ˆη

0.83*** (0.00) -

0.93*** (0.00) -

0.93*** (0.00) -

-0.04*** (0.00) -0.03*** (0.00) 0.00 (0.00) 0.00 (0.00) 0.09*** (0.00) -0.02*** (0.00) 0.92*** (0.00) -

0.03*** (0.00) -

CE

0.03*** (0.00) -0.03*** (0.00) -

FE

-0.02*** (0.00) -

SI

-

-

IT

-

-

0.00 (0.00) -0.01*** (0.00) 0.00 (0.00) -0.02*** (0.00) -

SMALL

-

-

-

σ ˆη

0.07*** (0.01) -

0.04*** (0.01) -

0.03*** (0.01) -

-0.01 (0.01) -0.01*** (0.00) -0.01* (0.00) -0.03*** (0.00) 0.01** (0.00) 0.00 (0.00) 0.02* (0.01) -

0.02*** (0.00) -

CE

-0.01*** (0.00) -0.01*** (0.00) -

ˆξ

External

ρˆ ˆξ

Internal

ρˆ ˆξ

Difference

ρˆ

0.71*** (0.00) 1.97*** (0.01)

0.72*** (0.00) 2.19*** (0.01)

-0.01*** (0.00) -0.21*** (0.01)

-0.02*** (0.00) -0.03*** (0.00) 0.03*** (0.00) 0.04*** (0.00) 0.08*** (0.00) -0.07*** (0.00) 0.86*** (0.00) 1.31*** (0.01) 0.01** (0.00) -0.01*** (0.00) -0.02*** (0.01) -0.04*** (0.00) 0.00 (0.00) 0.00 (0.00) -0.01*** (0.00) -0.33*** (0.02)

Notes: FE is the mean of the interaction between a dummy for female students and the question dummies; CE, SI, IT, and SMALL are dummies for Center region, South & Islands region, natives, and small class. Columns 1-4 show the APE of the covariates and the estimates of ση with the RE logit estimator (equation 4); columns 5-6 show the same estimates and those of ρ with the CBRE estimator (equation 5). *, **, and *** denote statistical significance at the 90%, 95%, and 99% level, respectively; standard errors in parentheses.

Table 13: RE & CBRE logit estimates, 2nd grade Italian exam (1)

(2)

(3)

(4)

(5)

(6)

FE

0.10*** (0.00) -

SI

-

-

IT

-

-

0.12*** (0.00) 0.02*** (0.00) -0.01* (0.00) -0.02*** (0.00) -

SMALL

-

-

-

σ ˆη

0.78*** (0.01) -

0.85*** (0.01) -

0.83*** (0.01) -

0.05*** (0.01) 0.02*** (0.00) -0.01** (0.00) -0.03*** (0.00) 0.08*** (0.00) -0.01*** (0.00) 0.81*** (0.01) -

0.13*** (0.00) -

CE

0.11*** (0.00) 0.02*** (0.00) -

0.04*** (0.00) 0.01*** (0.00) -0.01* (0.00) -0.01*** (0.00) 0.08*** (0.00) -0.02*** (0.00) 0.77*** (0.00) 0.24*** (0.01)

FE

0.17*** (0.00) -

CE

-

0.16*** (0.00) 0.02*** (0.00) -

SI

-

-

IT

-

-

0.15*** (0.00) 0.02*** (0.00) 0.01*** (0.00) 0.04*** (0.00) -

SMALL

-

-

-

σ ˆη

0.95*** (0.00) -

0.93*** (0.00) -

0.91*** (0.00) -

FE

-0.07*** (0.00) -

CE

-

-0.06*** (0.00) -0.01*** (0.00) -

SI

-

-

IT

-

-

-0.02*** (0.00) 0.00 (0.00) -0.02*** (0.00) -0.06*** (0.00) -

SMALL

-

-

-

σ ˆη

-0.16*** (0.01) -

-0.08*** (0.01) -

-0.08*** (0.01) -

ˆξ

External

ρˆ ˆξ

Internal

ρˆ ˆξ

Difference

ρˆ

0.09*** (0.00) 0.02*** (0.00) 0.01*** (0.00) 0.04*** (0.00) 0.05*** (0.00) 0.00** (0.00) 0.90*** (0.00) -0.04*** (0.01) 0.00 (0.00) -0.02*** (0.00) -0.06*** (0.00) 0.03*** (0.00) -0.01*** (0.00) -0.09*** (0.01) -

0.66*** (0.00) 0.50*** (0.01) 0.23*** (0.00) 0.72*** (0.00) 0.94*** (0.00) -0.10*** (0.00) -0.06*** (0.00) -0.45*** (0.01)

0.09*** (0.00) 0.02*** (0.00) 0.02*** (0.00) 0.06*** (0.00) 0.06*** (0.00) -0.01*** (0.00) 0.81*** (0.00) 0.53*** (0.00) -0.05*** (0.00) 0.00** (0.00) -0.03*** (0.00) -0.06*** (0.00) 0.02*** (0.00) -0.01*** (0.00) -0.03*** (0.00) -0.29*** (0.01)

Notes: FE is the mean of the interaction between a dummy for female students and the question dummies; CE, SI, IT, and SMALL are dummies for Center region, South & Islands region, natives, and small class. Columns 1-4 show the APE of the covariates and the estimates of ση with the RE logit estimator (equation 4); columns 5-6 show the same estimates and those of ρ with the CBRE estimator (equation 5). *, **, and *** denote statistical significance at the 90%, 95%, and 99% level, respectively; standard errors in parentheses.

Table 14: RE & CBRE logit estimates, 5th grade Italian exam (1)

(2)

(3)

(4)

(5)

(6)

FE

0.22 (0.00) -

SI

-

-

IT

-

-

0.25*** (0.00) 0.02*** (0.00) 0.00 (0.00) -0.03*** (0.00) -

SMALL

-

-

-

σ ˆη

1.07*** (0.01) -

1.08*** (0.01) -

1.07*** (0.01) -

0.21*** (0.00) 0.03*** (0.00) 0.00* (0.00) -0.04*** (0.00) 0.04*** (0.00) -0.01*** (0.00) 1.05*** (0.01) -

0.23*** (0.00) -

CE

0.23*** (0.00) 0.02*** (0.00) -

0.20*** (0.00) 0.03*** (0.00) 0.00*** (0.00) -0.04*** (0.00) 0.05*** (0.00) -0.03*** (0.00) 1.01*** (0.01) 3.51*** (0.02)

FE

0.26*** (0.00) -

SI

-

-

IT

-

-

0.26*** (0.00) 0.03*** (0.00) 0.00*** (0.00) -0.01*** (0.00) -

SMALL

-

-

-

σ ˆη

1.09*** (0.00) -

1.09*** (0.00) -

1.08*** (0.00) -

0.24*** (0.00) 0.03*** (0.00) 0.00*** (0.00) -0.01*** (0.00) 0.02*** (0.00) 0.00*** (0.00) 1.03*** (0.00) -

0.26*** (0.00) -

CE

0.26*** (0.00) 0.02*** (0.00) -

FE

-0.03*** (0.00) -

CE

-

-0.03*** (0.00) 0.00*** (0.00) -

SI

-

-

IT

-

-

-0.02*** (0.00) -0.01*** (0.00) 0.00 (0.00) -0.02*** (0.00) -

SMALL

-

-

-

σ ˆη

-0.02** (0.01) -

-0.01 (0.01) -

-0.01 (0.01) -

ˆξ

External

ρˆ ˆξ

Internal

ρˆ ˆξ

Difference

ρˆ

-0.03*** (0.00) 0.00*** (0.00) -0.01*** (0.00) -0.03*** (0.00) 0.02*** (0.00) 0.00*** (0.00) 0.02*** (0.01) -

1.03*** (0.01) 3.80*** (0.02)

1.04*** (0.00) 2.84*** (0.01) -0.03*** (0.00) -0.01 (0.01) 0.96*** (0.02)

0.25*** (0.00) 0.03*** (0.00) 0.00*** (0.00) -0.01*** (0.00) 0.02*** (0.00) -0.01*** (0.00) 0.99*** (0.00) 2.77*** (0.00) -0.05*** (0.00) 0.00*** (0.00) -0.01*** (0.00) -0.03*** (0.00) 0.03*** (0.00) -0.02*** (0.00) 0.02*** (0.01) 0.74*** (0.02)

Notes: FE is the mean of the interaction between a dummy for female students and the question dummies; CE, SI, IT, and SMALL are dummies for Center region, South & Islands region, natives, and small class. Columns 1-4 show the APE of the covariates and the estimates of ση with the RE logit estimator (equation 4); columns 5-6 show the same estimates and those of ρ with the CBRE estimator (equation 5). *, **, and *** denote statistical significance at the 90%, 95%, and 99% level, respectively; standard errors in parentheses.

Table 15: RE & CBRE logit estimates, 6th grade Italian exam (1)

(2)

(3)

(4)

(5)

(6)

FE

0.14*** (0.00) -

SI

-

-

IT

-

-

0.17*** (0.00) 0.01*** (0.00) 0.00* (0.00) -0.03*** (0.00) -

SMALL

-

-

-

σ ˆη

1.06*** (0.01) -

1.05*** (0.01) -

0.98*** (0.01) -

0.10*** (0.00) 0.02*** (0.00) -0.01*** (0.00) -0.05*** (0.00) 0.07*** (0.00) -0.01*** (0.00) 0.84*** (0.01) -

0.21*** (0.00) -

CE

0.15*** (0.00) 0.01*** (0.00) -

0.11*** (0.00) 0.02*** (0.00) -0.01*** (0.00) -0.05*** (0.00) 0.08*** (0.00) -0.03*** (0.00) 0.79*** (0.00) 0.03*** (0.00)

FE

0.16*** (0.00) -

CE

-

0.15*** (0.00) 0.02*** (0.00) -

SI

-

-

IT

-

-

0.16*** (0.00) 0.03*** (0.00) -0.01*** (0.00) -0.03*** (0.00) -

SMALL

-

-

-

σ ˆη

0.96*** (0.00) -

0.91*** (0.00) -

0.95*** (0.00) -

FE

-0.02*** (0.00) -

CE

-

0.00** (0.00) -0.01*** (0.00) -

SI

-

-

IT

-

-

0.01*** (0.00) -0.01*** (0.00) 0.00 (0.00) 0.00 (0.00) -

SMALL

-

-

-

σ ˆη

0.10*** (0.01) -

0.14*** (0.01) -

0.02*** (0.01) -

ˆξ

External

ρˆ ˆξ

Internal

ρˆ ˆξ

Difference

ρˆ

0.10*** (0.00) 0.03*** (0.00) -0.01*** (0.00) -0.05*** (0.00) 0.07*** (0.00) -0.01*** (0.00) 0.89*** (0.00) 0.00 (0.00) -0.02*** (0.00) 0.00 (0.00) 0.00 (0.00) 0.00 (0.00) 0.00 (0.00) -0.05*** (0.01) -

1.89*** (0.01) 0.03*** (0.00) 0.16*** (0.00) 0.91*** (0.00) 2.90*** (0.01) 0.05*** (0.00) 0.99*** (0.01) -2.87*** (0.01)

0.12*** (0.00) 0.03*** (0.00) -0.01*** (0.00) -0.04*** (0.00) 0.07*** (0.00) -0.03*** (0.00) 0.84*** (0.00) 2.87*** (0.01) -0.01*** (0.00) -0.01*** (0.00) 0.00*** (0.00) -0.01*** (0.00) 0.01*** (0.00) 0.00*** (0.00) -0.05*** (0.00) -2.84*** (0.01)

Notes: FE is the mean of the interaction between a dummy for female students and the question dummies; CE, SI, IT, and SMALL are dummies for Center region, South & Islands region, natives, and small class. Columns 1-4 show the APE of the covariates and the estimates of ση with the RE logit estimator (equation 4); columns 5-6 show the same estimates and those of ρ with the CBRE estimator (equation 5). *, **, and *** denote statistical significance at the 90%, 95%, and 99% level, respectively; standard errors in parentheses.

Table 16: RE & CBRE logit estimates, 8th grade Italian exam (1)

(2)

(3)

(4)

(5)

(6)

FE

0.23*** (0.00) -

SI

-

-

IT

-

-

0.22*** (0.00) 0.04*** (0.00) 0.00** (0.00) -0.01*** (0.00) -

SMALL

-

-

-

σ ˆη

0.89*** (0.01) -

0.93*** (0.01) -

0.92*** (0.01) -

0.14*** (0.00) 0.04*** (0.00) 0.00 (0.00) -0.02*** (0.00) 0.08*** (0.00) -0.02*** (0.00) 0.88*** (0.01) -

0.23*** (0.00) -

CE

0.21*** (0.00) 0.03*** (0.00) -

0.15*** (0.00) 0.04*** (0.00) 0.01*** (0.00) -0.01*** (0.00) 0.08*** (0.00) -0.03*** (0.00) 0.82*** (0.00) 3.35*** (0.04)

FE

0.24*** (0.00) -

CE

-

0.22*** (0.00) 0.03*** (0.00) -

SI

-

-

IT

-

-

0.22*** (0.00) 0.04*** (0.00) 0.01*** (0.00) 0.00 (0.00) -

SMALL

-

-

-

σ ˆη

0.88*** (0.00) -

0.92*** (0.00) -

0.91*** (0.00) -

FE

-0.01*** (0.00) -

CE

-

-0.01*** (0.00) 0.00 (0.00) -

SI

-

-

IT

-

-

0.00 (0.00) 0.00** (0.00) 0.00 (0.00) -0.01*** (0.00) -

SMALL

-

-

-

σ ˆη

0.01 (0.01) -

0.01 (0.01) -

0.01 (0.01) -

ˆξ

External

ρˆ ˆξ

Internal

ρˆ ˆξ

Difference

ρˆ

0.15*** (0.00) 0.04*** (0.00) 0.00*** (0.00) -0.01*** (0.00) 0.07*** (0.00) -0.02*** (0.00) 0.88*** (0.00) -0.01*** (0.00) 0.00** (0.00) 0.00* (0.00) -0.01*** (0.00) 0.01*** (0.00) 0.00 (0.00) 0.00 (0.01) -

0.86*** (0.00) 3.86*** (0.04) 0.24*** (0.00) 0.87*** (0.00) 2.88*** (0.00) -0.01*** (0.00) 0.00 (0.00) 0.98*** (0.04)

0.16*** (0.00) 0.04*** (0.00) 0.00*** (0.00) 0.00*** (0.00) 0.06*** (0.00) -0.02*** (0.00) 0.85*** (0.00) 2.76*** (0.00) -0.01*** (0.00) 0.00*** (0.00) 0.00 (0.00) -0.01*** (0.00) 0.01*** (0.00) -0.01*** (0.00) -0.03*** (0.00) 0.59*** (0.04)

Notes: FE is the mean of the interaction between a dummy for female students and the question dummies; CE, SI, IT, and SMALL are dummies for Center region, South & Islands region, natives, and small class. Columns 1-4 show the APE of the covariates and the estimates of ση with the RE logit estimator (equation 4); columns 5-6 show the same estimates and those of ρ with the CBRE estimator (equation 5). *, **, and *** denote statistical significance at the 90%, 95%, and 99% level, respectively; standard errors in parentheses.

Table 17: RE & CBRE logit estimates, 10th grade Italian exam (1)

(2)

(3)

(4)

(5)

(6)

FE

0.08*** (0.00) -

SI

-

-

IT

-

-

0.83*** (0.00) 0.11*** (0.00) -0.11*** (0.01) -0.24*** (0.01) -

SMALL

-

-

-

σ ˆη

0.97*** (0.01) -

0.92*** (0.01) -

0.93*** (0.01) -

0.88*** (0.00) 0.08*** (0.00) -0.20*** (0.01) -0.45*** (0.01) 0.42*** (0.01) -0.07*** (0.00) 0.81*** (0.00) -

0.10*** (0.00) -

CE

0.60*** (0.00) 0.05*** (0.00) -

0.20*** (0.00) 0.01*** (0.00) -0.05*** (0.00) -0.10*** (0.00) 0.07*** (0.00) -0.10*** (0.00) 0.73*** (0.00) 3.14*** (0.01)

FE

0.67*** (0.00) -

CE

-

0.77*** (0.00) 0.04*** (0.00) -

SI

-

-

IT

-

-

0.90*** (0.00) 0.10*** (0.00) -0.07*** (0.00) -0.10*** (0.00) -

SMALL

-

-

-

σ ˆη

1.04*** (0.00) -

1.00*** (0.00) -

1.01*** (0.00) -

FE

-0.58*** (0.00) -

CE

-

-0.17*** (0.00) 0.00** (0.00) -

SI

-

-

IT

-

-

-0.07*** (0.00) 0.00*** (0.00) -0.04*** (0.01) -0.14*** (0.01) -

SMALL

-

-

-

σ ˆη

-0.07*** (0.01) -

-0.08*** (0.01) -

-0.08*** (0.01) -

ˆξ

External

ρˆ ˆξ

Internal

ρˆ ˆξ

Difference

ρˆ

0.92*** (0.00) 0.12*** (0.00) -0.13*** (0.00) -0.17*** (0.00) 0.20*** (0.00) -0.04*** (0.00) 0.94*** (0.00) -0.05*** (0.00) -0.04*** (0.00) -0.08*** (0.01) -0.28*** (0.01) 0.22*** (0.01) -0.03*** (0.00) -0.12*** (0.00) -

0.91*** (0.00) 3.72*** (0.02) 0.14*** (0.00) 1.03*** (0.00) 1.53*** (0.00) -0.04*** (0.00) -0.12*** (0.00) 2.19*** (0.02)

0.14*** (0.00) 0.02*** (0.00) -0.03*** (0.00) -0.04*** (0.00) 0.06*** (0.00) -0.05*** (0.00) 1.01*** (0.00) 0.88*** (0.00) 0.06*** (0.00) -0.01*** (0.00) -0.02*** (0.00) -0.06*** (0.00) 0.02*** (0.00) -0.04*** (0.00) -0.27*** (0.00) 2.25*** (0.01)

Notes: FE is the mean of the interaction between a dummy for female students and the question dummies; CE, SI, IT, and SMALL are dummies for Center region, South & Islands region, natives, and small class. Columns 1-4 show the APE of the covariates and the estimates of ση with the RE logit estimator (equation 4); columns 5-6 show the same estimates and those of ρ with the CBRE estimator (equation 5). *, **, and *** denote statistical significance at the 90%, 95%, and 99% level, respectively; standard errors in parentheses.

Table 18: Linear correlation equivalent of the copula estimates, all exams

          2nd grade       5th grade       6th grade       8th grade       10th grade
           M      I        M      I        M      I        M      I        M      I
EX       0.25   0.17     0.73   0.84     0.80   0.02     0.50   0.83     0.81   0.82
IN       0.41   0.32     0.45   0.79     0.78   0.80     0.58   0.79     0.64   0.46

Notes: EX and IN respectively denote the groups with the external and the internal monitor. The coefficients equal the linear correlation of a Gaussian copula that yields the same value of Kendall's τ statistic as the estimates of the Clayton copula parameter.
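The conversion described in the notes to table 18 can be illustrated with a short sketch. It relies on two standard identities: Kendall's τ of a Clayton copula with parameter θ is θ/(θ + 2), and Kendall's τ of a Gaussian copula with correlation r is (2/π) arcsin(r). The function names are assumptions introduced for this illustration.

```python
import math


def clayton_to_kendall_tau(theta):
    """Kendall's tau implied by a Clayton copula with parameter theta > 0."""
    return theta / (theta + 2.0)


def kendall_tau_to_gaussian_corr(tau):
    """Linear correlation of the Gaussian copula with the given Kendall's tau."""
    return math.sin(math.pi * tau / 2.0)


def clayton_to_gaussian_corr(theta):
    """Gaussian-copula correlation matching the Clayton copula's Kendall's tau."""
    return kendall_tau_to_gaussian_corr(clayton_to_kendall_tau(theta))
```

For instance, the external-monitor estimate ρ̂ = 0.39 reported in table 9 maps to a linear correlation of about 0.25, in line with the EX entry for the 2nd grade mathematics exam in table 18.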

Figure 10: Distribution of the likelihood by regions, 10th grade mathematics exam

[Figure: panels for North, South & Isles, and Center; rows show the CDF and the PDF; horizontal axis: Likelihood.]

EX and IN respectively denote the groups with the external and the internal monitor.

Figure 11: Correction for cheating, provincial variation, 2nd grade mathematics exam

Figure 12: Correction for cheating, provincial variation, 5th grade mathematics exam

Figure 13: Correction for cheating, provincial variation, 6th grade mathematics exam

Figure 14: Correction for cheating, provincial variation, 8th grade mathematics exam

Figure 15: Correction for cheating, provincial variation, 2nd grade Italian exam

Figure 16: Correction for cheating, provincial variation, 5th grade Italian exam

Figure 17: Correction for cheating, provincial variation, 6th grade Italian exam

Figure 18: Correction for cheating, provincial variation, 8th grade Italian exam

Figure 19: Correction for cheating, provincial variation, 10th grade Italian exam

C Conditional Fixed Effects Approach

Consider the model given by equation 3. If εicq is logistically distributed, one can follow Chamberlain (1980) to overcome the incidental parameter problem and obtain estimates of the question fixed effects.31 Notice, however, that because of multicollinearity it is necessary to exclude one of the question effects for each group. The remaining Q − 1 question effects are then interpreted as the difficulty of question q relative to the excluded question. In other words, we normalize the excluded question, q̃, to have ξq̃ = 0. Let $B_r$ be defined as the set of permutations of y such that the total number of correct answers is r, i.e. $B_r \equiv \{ b : \sum_{q=1}^{Q} b_q = r \}$.32 Under the assumption of no cheating, once the student-class effects are accounted for, the answers of two students are independent. Hence, the log-likelihood function is given by

$$L(\xi) = \sum_{c=1}^{C} \sum_{i=1}^{N_c} \log\left[ P(y_{ic} \mid r_{ic}) \right] = \sum_{c=1}^{C} \sum_{i=1}^{N_c} y_{ic}' \xi - \sum_{c=1}^{C} \sum_{i=1}^{N_c} \log\left[ \sum_{b \in B_{r_{ic}}} \exp(b' \xi) \right] \tag{9}$$

C.1 Results

Figure 20 shows the estimates of ξ for the mathematics exam of 10th graders.33 Similarly to figure 1, there is a weak pattern, as more difficult questions tend to have slightly larger differences between the treatment and control group estimates. Further, the estimates of ξq are significantly different for the treatment and the control groups for 34 out of 49 questions, of which 29 show that the coefficient for the treatment group is significantly smaller. Moreover, although the coefficients are not directly comparable to the estimates shown in figure 3, the relation between the two is almost linear, with a correlation coefficient of approximately one for this exam, suggesting that the parametric assumption does not play a big role in determining the value of the coefficients. These results are robust to most exams, as shown in table 19.

31 As usual in this kind of setup, identification relies on a parametric assumption about an unobservable variable that is not verifiable. As recently shown by Bonhomme (2012), it is possible to estimate the question fixed effects even if the parametric distribution of εicq is not logistic. However, given the large size of the data set, both in terms of number of students and of number of questions in an exam, assuming a distribution other than the logistic is computationally impractical.

32 The total number of permutations equals $\binom{Q}{r}$.

33 Since I had to exclude one of the questions to avoid multicollinearity, and in order to make the estimates as interpretable as possible, I excluded the question that was most frequently answered correctly.

Figure 20: Conditional FE logit estimates, 10th grade mathematics exam

[Figure: EX and IN estimates by question; vertical axis from −6 to 0, horizontal axis from 0 to 50.]

EX and IN respectively denote the groups with the external and the internal monitor. They are reported with the 95% confidence intervals, and are sorted by the proportion of students who answered them correctly in the treatment group.

Table 19: Conditional FE logit estimates

                2nd grade     5th grade     6th grade     8th grade     10th grade
                 M     I       M     I       M     I       M     I       M     I
ξ̂EX > ξ̂IN        4     3       1     2       0     2       2     1       6     4
ξ̂EX < ξ̂IN        6    18      32    18      39     8       6    37      29    13
ξ̂EX = ξ̂IN       21    17      13    61       8    60      36    39      14    70

Notes: EX and IN respectively denote the groups with the external and the internal monitor. A coefficient is considered larger than the other if it is significantly larger at the 95% confidence level, and equal if neither is statistically larger than the other.
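As a computational aside on the conditional log-likelihood of this appendix: the inner sum over $B_r$ has $\binom{Q}{r}$ terms, but it equals the elementary symmetric polynomial of degree r in $\exp(\xi_1), \dots, \exp(\xi_Q)$, which a simple recursion computes in O(Q²) operations. The sketch below illustrates this standard device; it is not taken from the paper, and the function names and the NumPy-based layout (one row of 0/1 answers per student) are assumptions.

```python
import itertools

import numpy as np


def log_elementary_symmetric(xi):
    """Return log e_r for r = 0..Q, where e_r = sum over b in B_r of exp(b'xi)."""
    Q = len(xi)
    log_e = np.full(Q + 1, -np.inf)
    log_e[0] = 0.0  # e_0 = 1 (the empty answer pattern)
    for q in range(Q):
        # Descend in r so that question q enters each pattern at most once.
        for r in range(q + 1, 0, -1):
            log_e[r] = np.logaddexp(log_e[r], log_e[r - 1] + xi[q])
    return log_e


def cond_loglik(Y, xi):
    """Conditional logit log-likelihood; Y is (students x questions), entries 0/1.

    The excluded question is represented by a zero entry in xi.
    """
    Y = np.asarray(Y, dtype=float)
    xi = np.asarray(xi, dtype=float)
    r = Y.sum(axis=1).astype(int)  # number of correct answers per student
    log_e = log_elementary_symmetric(xi)
    return float(np.sum(Y @ xi - log_e[r]))


def cond_loglik_bruteforce(Y, xi):
    """Same quantity by explicit enumeration of B_r; only feasible for small Q."""
    xi = np.asarray(xi, dtype=float)
    Q = len(xi)
    total = 0.0
    for y in np.asarray(Y, dtype=float):
        r = int(y.sum())
        denom = sum(
            np.exp(sum(xi[q] for q in combo))
            for combo in itertools.combinations(range(Q), r)
        )
        total += float(y @ xi) - np.log(denom)
    return total
```

Students who answer every question correctly, or none, contribute zero to this function, so they carry no information about ξ in the conditional approach.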

Another alternative is to estimate the same coefficients for different demographic groups, such as gender. The comparison between the treatment and control groups for each gender is very similar to that of the whole population. However, comparing the male and female estimates within each group (table 20) shows that, even in the absence of manipulation, there are remarkable gender differences in performance, with male students performing relatively better than females for 17 questions, and the other way around for 16 questions. For the control group these differences are increased (26 and 19, respectively), which could reflect both the manipulation of the test scores and the increase in the precision of the estimates derived from the increased sample size.

Table 20: Conditional FE logit estimates by gender

                      2nd grade     5th grade     6th grade     8th grade     10th grade
                       M     I       M     I       M     I       M     I       M     I
ξ̂EX,MA > ξ̂EX,FE        5    25      24    16      14    12      18    39      17    13
ξ̂EX,MA < ξ̂EX,FE       12     1       0     1      17     5       5     0      16    55
ξ̂EX,MA = ξ̂EX,FE       14    12      22    64      16    53      21    38      16    19
ξ̂IN,MA > ξ̂IN,FE        6    38      26    50      19    32      28    40      26    18
ξ̂IN,MA < ξ̂IN,FE       16     0       8    12      25    20      11    18      17    62
ξ̂IN,MA = ξ̂IN,FE        9     0      12    19       3    18       5    19       6     7

Notes: EX and IN respectively denote the groups with the external and the internal monitor, whereas MA and FE denote male and female students. A coefficient is considered larger than the other if it is significantly larger at the 95% confidence level, and equal if neither is statistically larger than the other.

This result is robust to all exams, but not to all possible categories, as shown in tables 21 and 22. In particular, splitting the sample by class size leads to almost no differences in the estimates in the treatment group, but significant differences in the control group for most exams.34 If we consider the three macro regions of Italy, we observe large differences between the estimates for the control groups in the North and South & Islands regions.35

34 I consider two groups: those with a class size equal to or larger than the median for each grade (LARGE), and those with a smaller one (SMALL).

35 I consider three macro regions: North (Emilia Romagna, Friuli-Venezia Giulia, Liguria, Lombardia, Piemonte, Trentino-Alto Adige, Valle d'Aosta, and Veneto), Center (Lazio, Marche, Toscana, and Umbria), and South and Islands (Abruzzo, Basilicata, Calabria, Campania, Molise, Puglia, Sardegna, and Sicilia).

Table 21: Conditional FE logit estimates by class size

                      2nd grade     5th grade     6th grade     8th grade     10th grade
                       M     I       M     I       M     I       M     I       M     I
ξ̂EX,SM > ξ̂EX,LA        0     3      15     0       2     0       4     0      15    11
ξ̂EX,SM < ξ̂EX,LA        5     0       0     0       0     0       2     0       6    32
ξ̂EX,SM = ξ̂EX,LA       26    35      41    81      45    70      38    77      28    44
ξ̂IN,SM > ξ̂IN,LA       14    26      22    64      20     5      31    16      28    21
ξ̂IN,SM < ξ̂IN,LA        4     2       1     0       9    22       5     3       6    48
ξ̂IN,SM = ξ̂IN,LA       13    10      23    17      18    43       8    58      15    18

Notes: EX and IN respectively denote the groups with the external and the internal monitor, whereas SM and LA denote that the students were in classrooms of size smaller or equal to the median, and larger. A coefficient is considered larger than the other if it is significantly larger at the 95% confidence level, and equal if neither is statistically larger than the other.

Table 22: Conditional FE logit estimates by region

                      2nd grade     5th grade     6th grade     8th grade     10th grade
                       M     I       M     I       M     I       M     I       M     I
ξ̂EX,NO > ξ̂EX,SI        1     1       1     0       1     0       1     0       0     0
ξ̂EX,NO < ξ̂EX,SI        1     3       0    16       6     2      14    10      14    30
ξ̂EX,NO = ξ̂EX,SI       29    34      44    65      40    68      29    67      35    57
ξ̂IN,NO > ξ̂IN,SI        7     8       1     0       1     5       0     0       3     2
ξ̂IN,NO < ξ̂IN,SI       16    12      35    76      38    20      43    65      39    71
ξ̂IN,NO = ξ̂IN,SI        8    18      10     5       8    45       1    12       7    14

Notes: EX and IN respectively denote the groups with the external and the internal monitor, whereas NO and SI denote that the students were from the North and South & Islands regions. A coefficient is considered larger than the other if it is significantly larger at the 95% confidence level, and equal if neither is statistically larger than the other.

D Heterogeneous Question Fixed Effects

Equation 4 is based on the assumption that once the combined student-class effects are controlled for, there is no correlation in students' answers, i.e. the question effects are homogeneous across all students. This assumption could be violated if teachers are more skilled at teaching some topics than others, which would create correlation in the question effects among students within classrooms, even without manipulation. A way to overcome this would be to use the observations of a randomly chosen student from each classroom. Since the correlation is caused by the teacher, students from different classrooms would be affected by a set of independent effects. Moreover, to avoid making any distributional assumption on the individual effects, I use the conditional fixed effects logit estimator, whose likelihood function is given by

$$L(\xi) = \sum_{c=1}^{C} \log P\left( y_{i(c)c} \mid r_{i(c)c} \right) = \sum_{c=1}^{C} y_{i(c)c}' \xi - \sum_{c=1}^{C} \log\left[ \sum_{b \in B_{r_{i(c)c}}} \exp(b' \xi) \right] \tag{10}$$

where i(c) denotes a randomly chosen student from class c. This strategy, unlike the previous one, does not provide a unique estimator, since there are as many estimators as possible selections of students: $\prod_{c=1}^{C} N_c$. Given the large number of possible estimates, I randomly select one student from each classroom M = 1000 times and then report the median estimate across repetitions. For the confidence intervals, I use the 2.5 and 97.5 percentiles. The results are shown in table 23. For the treatment group, they are roughly the same as the ones obtained by using all the students in each classroom, and only for one of the questions in the 10th grade Italian exam are the estimates significantly different. For the control group this is not always the case, and in two of the exams (8th and 10th grade Italian exams) the coefficients for the majority of the questions were significantly different. This reflects the manipulation, as well as the larger sample size for the control group, which tightens the confidence intervals. However, since the correction is based on the estimates with an external monitor, the possibility of having heterogeneous question fixed effects would have a modest impact on its reliability.

Table 23: Conditional FE logit estimates, one student per classroom

          2nd grade     5th grade     6th grade     8th grade     10th grade
           M     I       M     I       M     I       M     I       M     I
EX, ≠      0     0       0     0       0     0       0     0       0     1
EX, =     31    46      47    44      49    38      81    70      77    86
IN, ≠      7     7       0     6      12    12       2    37       1    70
IN, =     24    39      47    38      37    26      79    33      76    17

Notes: EX and IN respectively denote the groups with the external and the internal monitor; = and ≠ respectively denote the number of coefficients whose 95% confidence intervals overlapped or did not overlap.
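The resampling procedure of this appendix (draw one student per classroom, re-estimate, repeat M times, then report the median and the 2.5 and 97.5 percentiles) can be sketched as follows. This is only an illustration under stated assumptions: `estimate_fn` is a hypothetical callback standing in for the conditional FE logit estimator, and the function names, the seed, and the overlap check mirroring the notes to table 23 are introduced here rather than taken from the paper.

```python
import numpy as np


def one_student_per_class_summary(class_sizes, estimate_fn, n_draws=1000, seed=0):
    """Median and 2.5/97.5 percentiles of estimates over random student draws.

    class_sizes: number of students N_c in each classroom.
    estimate_fn: hypothetical callback that takes one student index per
        classroom and returns a 1-D array of question-effect estimates.
    """
    rng = np.random.default_rng(seed)
    class_sizes = np.asarray(class_sizes)
    draws = []
    for _ in range(n_draws):
        # One uniformly drawn student index in [0, N_c) for every classroom.
        picks = rng.integers(0, class_sizes)
        draws.append(np.atleast_1d(np.asarray(estimate_fn(picks), dtype=float)))
    draws = np.vstack(draws)
    median = np.median(draws, axis=0)
    lower = np.percentile(draws, 2.5, axis=0)
    upper = np.percentile(draws, 97.5, axis=0)
    return median, lower, upper


def intervals_overlap(lower_a, upper_a, lower_b, upper_b):
    """True where two confidence intervals overlap (the '=' category)."""
    return (np.asarray(lower_a) <= np.asarray(upper_b)) & (
        np.asarray(lower_b) <= np.asarray(upper_a)
    )
```

Taking the median across draws avoids committing to any single one of the many possible selections of students, at the cost of re-estimating the model once per draw.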
