Gender-Related Differences in Academically Talented Students’ Scores and Use of Time on Tests of Spatial Ability

Heinrich Stumpf

Institute for the Academic Advancement of Youth (IAAY), The Johns Hopkins University

ABSTRACT Gender-related differences in test scores of spatial ability have been ascribed to a tendency of females to take more time in working on such tasks, which is believed to be to their disadvantage in time-limited tests. This hypothesis was examined in a population of academically talented students who took four computerized subtests of the Spatial Test Battery of the Institute for the Academic Advancement of Youth. Males had higher scores on three of these tests, females on the fourth, which was a measure of visual memory. Females tended to take more time to work on the tests, even when their scores were higher than those of males, but this difference was substantial only for two of the tests. The time taken to work on the items was positively correlated with the scores on two of the tests. These results indicate that the amount of time taken cannot explain gender-related score differences on spatial tests in general, nor is the habit of using more time necessarily detrimental to test performance. The habit of working quickly or slowly on spatial tests appeared to be a fairly general characteristic. It seems to be different from speed of cognitive processing. When asked to give ratings on their performance on the tests, females tended to estimate their scores more modestly than males, although females, like males, tended to overestimate their performance on two of the measures.

Spatial ability is a domain of cognitive functioning in which gender-related differences (in favor of males) have been found rather often, although the differences vary depending on the population tested and the subskill of the ability examined or the types of test items used (e.g., Linn & Petersen; Maier, 1994). Numerous subfactors of spatial ability have been found in factor-analytic research, such as spatial perception, spatial visualization, mental rotation, spatial relations, spatial orientation, and visual memory (e.g., Carroll, 1993; Lohman, 1979; Maier, 1994). These subfactors differ considerably in the amount of gender-related differences found on them, with some mental rotation tasks showing some of the largest differences in favor of males in the cognitive domain (e.g., Linn & Petersen) and some visual memory tasks showing advantages for females (Stumpf & Eliot, 1995; Stumpf & Haldimann, 1997). Similar observations have been made with academically talented students (e.g., Stumpf, 1993, 1995). In such populations, a strong general factor (termed “k

PUTTING THE RESEARCH TO USE The present research shows that gender-related differences in spatial ability in academically talented students cannot be uniformly explained by the habit of females to take more time for working on spatial tests on average than males. On the contrary, the habit of working carefully on such tests can help to improve scores in some contexts. There appear to be gender-related differences in one’s confidence in being able to solve spatial tasks, but this confidence is exaggerated in both males and females for some tests. There are obvious consequences of these findings for psychological and educational counseling. Advice given to students on how to take tests, such as in attributional retraining, should not focus exclusively on boosting confidence and working quickly. Rather, the objective should be to find the optimal way of using one’s time budget based on the general working speed and degree of confidence of a given person. The allocation of time to working on the items of a spatial test should not follow a rigid schedule, but depend on the amount of time available, the person’s working habits, and the nature and difficulty of each task.

Downloaded from gcq.sagepub.com by guest on May 10, 2016 157

factor”) was found to underlie performance on a large variety of tasks. This factor explains a large share of the gender-related differences on spatial tasks, although there are variance components orthogonal to it that also account for differences in performance between females and males (Stumpf & Eliot, 1995). It should be noted, however, that the observation of a k factor is not inconsistent with the existence of the subfactors mentioned above. Taken together, these findings reflect the hierarchical organization of spatial ability, with k on the top and the subfactors at a lower level (also see Cattell, 1987).

The hypotheses suggested to explain gender differences in spatial ability range from environmentalist approaches, stressing the role of differential socialization experiences of males and females, to biological approaches highlighting genetic or hormonal factors or differential cortical lateralization in females and males (see Halpern, 1992, and Maier, 1994, for overviews). Other approaches stress the differences in the roles played by females and males in human evolution.

Gender-related differences in spatial ability are interesting not only in their own right, but also because spatial ability has been assumed to be important for educational and professional success in many natural sciences and mathematics. The many interdependencies between mathematical problem solving and spatial ability have recently been summarized by Maier (1994), while the study by Casey, Nuttal, and Pezaris (1997) has reemphasized the importance of spatial ability in explaining gender-related differences in mathematical reasoning.

Among the environmentalist approaches to explain gender-related differences in scores on spatial tests is the view that females have less confidence in their ability to solve spatial tasks due to their socialization. Therefore, they tend to work more cautiously and more slowly on them and, thus, have a disadvantage on time-limited tests. In fact, some authors have constructed a causal chain from less confidence in females, to a slower pace of working, to lower scores on spatial tests. This disadvantage on spatial tests, though, is believed to be of little practical consequence in everyday life, because here the lower speed can be compensated more easily than in psychometric tests (e.g., Goldstein, Haldane, & Mitchell, 1990).

Assessing the speed of working in practice, however, has been difficult with conventional paper-and-pencil tests. The approaches most often used are the administration of tests with generous time limits, or no time limits at all, and specific ways of scoring the tests, such as computing the ratio of items solved to items attempted (Goldstein et al., 1990). The latter procedure, however, is subject to serious problems affecting ratio scores and has yielded very mixed results (Delgado & Prieto, 1996; Prinzel & Freeman, 1995; Stumpf, 1993). Less problematic techniques of correction for guessing have also provided little support for the case (Ben-Shakhar & Sinai, 1991). The experimental procedure of using the same test with different time allowances has also produced conflicting results (Goldstein et al.; Delgado & Prieto; Gallagher & Johnson, 1992) and is open to the criticism that the administration of tests with very generous time limits is not representative of the way abilities are tested in general (although this way of testing abilities may, in turn, not be typical of the way abilities are used in education, work, and everyday life).

The present research is an attempt to (a) ascertain whether there are gender differences in confidence and working speed on spatial tests in a population of academically talented students taking spatial tests on the computer and (b) examine how the use of time in working on such tests is related to performance. The research was performed in two studies with the computerized Spatial Test Battery (STB) of the Institute for the Academic Advancement of Youth (IAAY), located at Johns Hopkins University in Baltimore. In the first study, the subjects took the STB under the experimental conditions of a test evaluation research; in the second study, the STB was used under the competitive conditions of eligibility testing.

Instrument The computerized STB has four subtests: Surface Development, Block Rotation, Visual Memory, and Perspectives. All items included in the battery are newly constructed, although of the same type as tasks used previously (Eliot & Smith, 1983). The computerized STB was derived in several steps from a very large paper-pencil version, which had 14 subtests representing most of the 16 item types of Eliot’s classification of figural spatial tests (Eliot, 1980, 1983; Eliot, Stumpf, & Tissot, 1992). The three-dimensional item types of the manipulation division in Eliot’s classification tended to have higher k-loadings and higher predictive validity with respect to academic success in IAAY mathematics and science courses than the two-dimensional items from the recognition division (Stumpf, 1994, 1995; Stumpf & Eliot, 1995). Therefore, mainly item types from the manipulation division were included in the computerized STB as an instrument for the identification of talent for IAAY science and mathematics courses. The Visual Memory test is used to represent the important category of visual memory in the recognition division and the subfactor mentioned above.

As far as reliability, predictive validity, and gender-related differences in the scores are concerned, the computerized STB was expected to yield results similar to the findings obtained with the respective subtests of the paper-pencil versions. Aside from the facts that the items were new and presented in a different mode, the main difference between the paper-pencil and the computerized versions lies in the tutorials for the subtests, which are much more detailed and include moving images in the computerized mode.

In the Surface Development test of the computerized STB, a flat shape, similar to a piece of paper, is presented, with lines indicating its edges and creases, where it can be folded. Beside the flat shape, a three-dimensional object is shown, which results when the flat shape is folded. The subject has to find out what lines in the flat shape correspond to what edges in the folded object. The test has a total of seven pairs of flat shapes and folded objects with five lines to be identified in each pair. The time allowance was 15 minutes in the first and 12 minutes in the second study.

In the Block Rotation test, an irregularly shaped model block is shown together with five answer choices. One of the five choices is the same object as the model block, but rotated in space. The other choices are similar, though different, blocks. The subject has to identify the object that is the same as the model block among the answer choices. The number of items is 22; the time allowances were 16 minutes in Study 1 and 12 minutes in Study 2.

The Visual Memory test has a learning and a reproduction phase. In the learning phase, 20 simple shapes are presented for eight minutes. A part of each shape is blackened. About half an hour later, in the reproduction phase, the subject has to identify the shapes presented earlier among other similar, though different, shapes. The time allowed for the reproduction phase is eight minutes.

In the Perspectives test, two pictures of an object such as a cable are presented. The first picture shows the front view of the object. The subject has to find out from what perspective the object is seen in the second picture: from the right, from the left, from behind, from above, or from below. The time allowed to complete the 22 items is 15 minutes.

In the STB, the tests are embedded in a software context that ensures a completely automatic presentation of the battery. Each test is preceded by a tutorial describing the content of the items and specifying the number of items in the test and the time allowed to solve the tasks. All tutorials, except the one for the Visual Memory test, include moving displays of sample items. The STB also contains four ratings. In these ratings, the student is asked to estimate the percentages of the tasks in the four subtests he or she believes to have solved correctly. These estimates are gathered after the subject has taken the four tests. For each item, the software registers the response of the subject given in multiple-choice form and the time, in seconds, the item is exposed on the screen (this is referred to as “Item Exposure Time” [IET] below). The subject is allowed to move forward and backward freely from item to item within a given test and to change answers given previously. For the Surface Development test, though, the times recorded are the durations spent on each of the seven objects (comprising five items each). Therefore, some of the IET measures described below were not available for this test.
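Because subjects may return to an item and change their answer, the IET of an item is naturally thought of as a running total over all visits. The following is a minimal sketch of such bookkeeping, not a description of the actual STB software; the class names, the method `log_visit`, and the timestamp arguments are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ItemRecord:
    """Hypothetical per-item record: last response and cumulative exposure."""
    response: Optional[str] = None  # last multiple-choice answer, if any
    exposure_s: float = 0.0         # total seconds the item was on screen


@dataclass
class TestSession:
    """Hypothetical container for one subject working on one subtest."""
    n_items: int
    records: Dict[int, ItemRecord] = field(default_factory=dict)

    def __post_init__(self):
        # Items are numbered 1..n_items, as in a printed test.
        self.records = {i: ItemRecord() for i in range(1, self.n_items + 1)}

    def log_visit(self, item, shown_at, hidden_at, response=None):
        """Add one screen visit; times are seconds since the test started."""
        record = self.records[item]
        record.exposure_s += hidden_at - shown_at
        if response is not None:
            record.response = response  # a later answer replaces an earlier one


session = TestSession(n_items=3)
session.log_visit(1, shown_at=0, hidden_at=12, response="B")
session.log_visit(2, shown_at=12, hidden_at=20)
session.log_visit(1, shown_at=20, hidden_at=25, response="C")  # revisit item 1
```

In this sketch the IET of item 1 is the sum of both visits, and only the final answer is kept, which matches the freedom to move between items and revise answers described above.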

Study 1

Subjects A total of 1,283 seventh- and eighth-graders (553 females and 730 males) took the computerized STB as part of a test evaluation study in the spring of 1996. The STB was administered at about 250 test centers throughout the United States. The students had taken part in an IAAY Talent Search and were offered the opportunity to take the STB at no cost in a pilot testing. As a reward for their participation, the students received a detailed score report. The IAAY Talent Searches are open to students who have scored at or above the 97th percentile of a nationally normed ability or achievement test such as the California Achievement Test. The sample is, thus, well above the national average as far as scholastic ability and achievement are concerned. About 71% of the students were white; approximately 17% reported their ethnicity as Asian, with the remainder of the sample belonging to other ethnic groups. The mean age of the subjects was 13.23 years (SD = 0.46). Participation in the study was voluntary. The students were allowed to skip whole subtests if they wished, but only very few of them did so. Therefore, not all of the students completed every subtest. Ten males and five females skipped Block Rotation, one male and four females skipped Visual Memory, and five males and three females skipped Perspectives.


For 896 of these students, complete data on the STB as well as scaled scores on the SAT I were available.

The means, standard deviations, and intercorrelations of these scores are summarized in Table 1. The mean SAT I scores show that, on average, the present sample was intellectually highly advanced for its mean age. The pattern of intercorrelations between the STB and the SAT I is familiar from a study on the STB and other tests of verbal and mathematical reasoning ability (Stumpf & Haldimann, 1997), with the scores on the STB subtests (except Visual Memory) and on the STB as a whole being more highly correlated with the mathematical section of the SAT I than with the verbal part. These observations are consistent with the research mentioned in the introduction.

Procedure In the testing, an item was considered to be attempted if the student had made it appear on the screen for at least one second. As expected, the distributions of the IETs on most of the items across students were skewed positively. Therefore, as suggested in the literature (e.g., Czarnolewski, 1996; Ulrich & Miller, 1993, 1994), the natural logarithm of each IET was computed in addition to the raw time. The mean comparisons reported below were performed with both the raw and the logarithmic IETs. The conclusions reached were essentially the same for the raw and log times so, to conserve space, only log IETs are reported in this article. For every student and every test, three measures derived from the IETs were obtained: (1) the average time spent on the items (if at least two items had been attempted), (2) the average time spent on each item that was solved correctly (if at least two items had been solved correctly), and (3) the average time spent on the items not solved correctly (if at least two incorrect answers were given). These measures were based on at least two items rather than one to increase their reliability. Thus, a total of three IET measures were available for each test except Surface Development (if the criteria specified above were met).

For every test, the split-half reliability of the total time spent on the items was estimated. The log IETs for every odd- and even-numbered item (or object in the case of Surface Development) were summed separately. From the correlations of these sums, the reliability of the total log IET was estimated through the Spearman-Brown “prophecy formula.”

The average log IETs for each subtest were correlated across subjects. An iterative principal axis factor analysis was performed based on these correlations, using squared multiple correlations as initial communality estimates. The first factor was extracted. Factor scores were computed using the regression method and compared across gender.
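The three log-IET measures and the split-half reliability estimate described above can be illustrated with a short sketch. It is a simplified reading of the procedure, not the author's code: the function names are hypothetical, every attempted item that is not marked correct is treated as unsolved, and the Spearman-Brown step uses the standard two-halves form r_full = 2r / (1 + r).

```python
import math


def iet_measures(iets_s, correct, min_items=2):
    """Average log IETs for one student on one test.

    iets_s:  exposure times in seconds for the attempted items (each >= 1 s)
    correct: one boolean per attempted item
    A measure is None when fewer than `min_items` items are available.
    """
    logs = [math.log(t) for t in iets_s]
    solved = [x for x, ok in zip(logs, correct) if ok]
    unsolved = [x for x, ok in zip(logs, correct) if not ok]

    def avg(values):
        return sum(values) / len(values) if len(values) >= min_items else None

    return {"all": avg(logs), "solved": avg(solved), "unsolved": avg(unsolved)}


def pearson_r(x, y):
    """Plain Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_brown(r_half):
    """Step a half-test correlation up to full test length."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(log_iets):
    """log_iets: one row per student, one log IET per item in test order."""
    odd = [sum(row[0::2]) for row in log_iets]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in log_iets]  # items 2, 4, 6, ...
    return spearman_brown(pearson_r(odd, even))
```

For example, a half-test correlation of .50 is stepped up to about .67 by `spearman_brown(0.5)`.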

The numbers of items attempted, the number-correct scores, the ratio scores (numbers of items solved divided by numbers of items attempted), and the students’ estimates of the percentages of items solved were recorded. Based on these measures, summary statistics for the whole sample were computed, as well as correlations among subsets of them. The means of these measures for males versus females were compared using the effect size estimate d (Cohen, 1988). For the classification of the mean differences between males and females in the present sample, a conservative interpretation of Cohen’s criteria for evaluating d coefficients was adopted: ds with an absolute value

Table 1

Means, Standard Deviations, and Intercorrelations of the STB and SAT I Scores (N = 896; Study 1)


Figure 1. Average Logarithmic IETs of the Block Rotation Test

below .20 were regarded to be negligible, ds between .20 and .49 (and between -.20 and -.49) were interpreted as small, though systematic, and ds with absolute values from .50 through .79 were classified as “medium.” Higher absolute-value d coefficients were regarded as large. For certain measures, intercorrelations were computed separately for males and females and compared using the q index (Cohen). The students’ estimates of the percentages of items they had solved correctly were analyzed both for the whole sample of Study 1 and for males and females separately. The means of these estimates as well as their correlations with the test scores were compared. In a way analogous to the factor analyses of the IETs, the estimates were factored, and the factor scores were compared across the gender groups.
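The effect size conventions used here can be written down compactly. The sketch below is an illustration under stated assumptions rather than the computation actually used in the study: d is taken as the male-minus-female mean difference divided by the pooled within-group standard deviation, q is taken as the difference between Fisher z-transformed correlations (Cohen, 1988), and the function names are hypothetical.

```python
import math


def cohens_d(mean_m, sd_m, n_m, mean_f, sd_f, n_f):
    """Male-minus-female mean difference in pooled-SD units (assumed form)."""
    pooled_var = ((n_m - 1) * sd_m ** 2 + (n_f - 1) * sd_f ** 2) / (n_m + n_f - 2)
    return (mean_m - mean_f) / math.sqrt(pooled_var)


def classify_d(d):
    """Labels following the cut-offs stated in the text, applied to |d|."""
    size = abs(d)
    if size < 0.20:
        return "negligible"
    if size < 0.50:
        return "small"
    if size < 0.80:
        return "medium"
    return "large"


def cohens_q(r_1, r_2):
    """Difference between two correlations after Fisher's z transformation."""
    return math.atanh(r_1) - math.atanh(r_2)
```

With these definitions, `classify_d(0.49)` returns `"small"` and `classify_d(-0.15)` returns `"negligible"`; the sign of d only indicates which group has the higher mean.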

Results

Properties of the Tests. The numbers of cases differ across tests because students were allowed to skip entire subtests. With about 71% and 76% of the items solved on average, the Surface Development and Block Rotation tests, respectively, turned out to be fairly easy for the present population. The respective percentage for Perspectives is 54, indicating that this test had a medium level of difficulty, while Visual Memory, with 25%, proved to be very hard. Overall, the time allowances for the four tests turned out to be rather generous. In Surface Development, only 20.4 percent of the males and 18.6 percent of the females used the whole time available and were timed out. For Block Rotation, Visual Memory, and Perspectives, these percentages were 12.3 (male) and 17.4 (female), 3.3 (male) and 7.1 (female), and 8.2 (male) and 7.6 (female), respectively.

Properties and Distributions of the IETs. The total log times the subjects spent working on the four subtests had a high degree of instrumental reliability, except for the Surface Development test. Spearman-Brown-corrected reliability coefficients of .63, .84, .87, and .84 were obtained for the four tests. In the case of Surface Development, it should be taken into consideration that the estimate was based on only seven addends (corresponding to the objects in the test) compared with 20 or 22 addends for the other tests. Despite these observations, the times the students spent on the tasks are not evenly distributed across items. Using Block Rotation as an example, Figure 1 shows the average (across subjects) logarithmic IETs of the items in a test. The figure shows that a relatively large amount of time is spent on the first two items. Responses are given much faster for the remainder of the tasks, with a slight trend to shorter IETs the more items have been completed. Items 5 and 8, which are relatively difficult, however, deviate from this general pattern. The shapes of the IET functions for males and females are very similar, as highlighted by a correlation of .98 between the average log IETs for males and females across items. Overall, there is a high negative correlation between item difficulty (expressed in the p-index) and average log IET; this correlation is -.72 for males and -.67 for females.

Table 2 summarizes various descriptive statistics of the IET measures described above: the average time spent on all attempted items in each test, on the items that were solved correctly, and on the items for which the answer given was wrong. The distributions of the raw IETs of the individual items across subjects were often very skewed. For Block Rotation, Visual Memory, and Perspectives, however, most of the indices based on the logarithms of the IETs roughly approximate normal distributions, at least as far as the second and third moments are concerned.

Table 2

Means, Standard Deviations, Skewness and Kurtosis Indices of the Test Scores and IET Measures (Study 1)

Table 3

Intercorrelations and Factor Loadings of the Log IETs (N = 1144; Study 1)

Table 4

Correlations of Test Scores with IET Measures and Performance Estimates (Study 1)

Note. The numbers of cases for each variable are given in Table 7.

In the Block Rotation test, the subjects took about half a minute on average to work on one item. The average time spent on items solved is shorter than the time taken to work on items that could not be solved correctly. In the latter case, in particular, the standard deviation of the IETs is much larger than in the former. This, as a general pattern, can be observed for the Visual Memory and Perspectives tests as well, although, here, the differences are smaller. While the average time spent on a Perspectives item is still longer than 20 seconds, the subjects dealt with the Visual Memory tasks in a much faster way.


The intercorrelations of the log times spent on the four subtests, which are displayed in Table 3, are higher than the correlations among the scores on these tests (see Table 6). A principal axis factor analysis based on these correlations yielded a strong general factor, which accounted for 45.6% of the variance.

Table 5

Means and Standard Deviations of Test Performance and Estimates of Test Performance (Percentages of Items Solved; Study 1)

Note. The numbers of cases for each variable are given in Table 7.
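The factoring of the log IETs reported above (iterative principal axis factoring with squared multiple correlations as initial communality estimates, extraction of the first factor, and regression-method factor scores) can be sketched as follows. This is a generic NumPy illustration and not the software used in the study; the function names, the convergence tolerance, and the iteration cap are assumptions made for the example.

```python
import numpy as np


def principal_axis_first_factor(R, max_iter=200, tol=1e-6):
    """Loadings and communalities of the first principal-axis factor.

    R is a correlation matrix. Communalities start at the squared multiple
    correlations, 1 - 1 / diag(R^-1), and are re-estimated until stable.
    """
    R = np.asarray(R, dtype=float)
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    loadings = np.zeros(R.shape[0])
    for _ in range(max_iter):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)            # communalities on the diagonal
        eigvals, eigvecs = np.linalg.eigh(reduced)
        top_val, top_vec = eigvals[-1], eigvecs[:, -1]  # largest eigenvalue last
        loadings = top_vec * np.sqrt(max(top_val, 0.0))
        if loadings.sum() < 0:                   # fix the arbitrary sign
            loadings = -loadings
        new_h2 = loadings ** 2
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break
    return loadings, h2


def regression_factor_scores(Z, R, loadings):
    """Regression-method scores: Z @ R^-1 @ loadings, Z being standardized."""
    weights = np.linalg.solve(np.asarray(R, dtype=float), loadings)
    return np.asarray(Z, dtype=float) @ weights


# Toy check: four variables that all intercorrelate .50 fit a single factor
# with loadings of sqrt(.50) on every variable.
R_toy = np.full((4, 4), 0.5)
np.fill_diagonal(R_toy, 1.0)
toy_loadings, toy_h2 = principal_axis_first_factor(R_toy)
```

The toy matrix is chosen because its one-factor solution is known in closed form, so the iteration can be checked by hand; real intercorrelations such as those in Table 3 would simply replace `R_toy`.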

Correlations Between IETs and Performance. The correlations between the IET measures and performance on the four tests are presented in Table 4. For Block Rotation, the average IET is unrelated to the scores attained, whereas for Surface Development, Visual Memory, and Perspectives, there are positive, though not very high, correlations between the average IET and test performance. The scores on these tests tend to be higher the more time the students invest in working on the items. The IETs based on those items that were solved correctly were unrelated to test performance in Block Rotation, but showed positive correlations with the scores on Visual Memory and Perspectives. Interestingly, the highest correlations of IET measures with test scores were found for the time spent on items that the students did not solve correctly in the end. This relation is particularly strong for the Perspectives test. It appears to be in part an outgrowth of the degree of effort the subjects were willing to invest in working on the tests. All other things being equal, a high degree of effort is likely to result in more items solved and in more time and energy spent on those tasks for which the solution cannot be found.

Table 6

Intercorrelations of the Test Scores and Performance Estimates (N = 1204; Study 1)


Properties of the Estimates. Table 5 shows the means and standard deviations of the scores on the four tests and of the percentages of items the students estimated to have solved on them. In this table, the test scores are expressed as percentages of the maximum number of correct scores that can be attained.

As far as Surface Development is concerned, the students tended to underestimate their performance. For Block Rotation, test performance and estimates are similar, both with respect to the means and the standard deviations. On Visual Memory and Perspectives, however, the students considerably overestimated their performance, by about half a standard deviation for the memory test and about two thirds of a standard deviation for the Perspectives test. For Visual Memory, the variation of the estimates is substantially larger than the variation of the test scores.

Table 6 summarizes the correlations among the scores on the four tests and the estimates. Within the two sets of variables, correlations are higher among the estimates than among the test scores. A principal axis factor analysis performed in the same way as the factoring of the response times reported above revealed that a strong general factor underlies the estimates, accounting for 52.1% of the variance. There is a substantial gender-related mean difference on the scores on this factor, more than one third of a standard deviation in size. The correlation of scores between the first factor underlying the IETs and the first factor of the estimates is .11. The correlations between the estimates and the test scores show that the impressions the students had about their performance did not correspond particularly well with the test scores. Ranging from .27 to .41, these correlations are all positive, but rather moderate in size.

Table 7

Gender-Related Differences in the Test Scores, IETs, and Performance Estimates (Study 1)

Gender-Related Differences. Table 7 summarizes the findings on gender-related differences in the test scores, IET measures, and performance estimates. In this table (and the others), positive ds indicate that the mean for males is higher than that for females; negative ds indicate the opposite. The gender-related differences found on the test scores, IET measures, and estimates are small compared to the amounts of variance within the sexes. The highest d-value found (.49) falls just short of indicating a medium effect size. Still, quite a number of the differences cannot be considered negligible or trivial.
As far as the test scores are concerned, males have an advantage (corresponding to a small effect size) on the Block Rotation and Perspectives tests and females score slightly higher than males on Visual Memory. The advantage of males on Surface Development is not large enough to be classified as systematic. Although many students did not use the full time allowed to work on the tests, most of them inspected every item. There is little difference between the sexes with respect to this variable. Table 7 also lists the average ratio scores by gender. Given the fact that males and females did not differ much with respect to the numbers of items attempted, it is no surprise that the effect size estimates for the ratio scores are very similar to the d coefficients for the respective number-correct scores.

On the various IET measures, there is a tendency for females to take more time for working on the items. All mean differences indicate longer average IETs for females, although only a part of the differences is large enough to be called systematic. As far as the Surface Development test is concerned, the average of the time taken to work on the items does not differ much based on gender. On the test showing the largest gender-related differences in the scores, Block Rotation, females take considerably more time to work on the tasks as a whole and on those items they solve correctly. These differences amount to about a third of a standard deviation. The difference on the solved items appears to be particularly noteworthy. The fact that there is essentially a zero correlation between this IET measure and test performance suggests that females use less effective strategies. Probably, due to their lower confidence in being able to solve these tasks (see below), they perform more checks on their answers, including checks that are unnecessary. On the Visual Memory test, females take more time, although on these tasks their performance is somewhat better on the average than that of males. IETs are longer on the average for females, whether the items are solved or not. These differences amount to roughly a third of a standard deviation. In this case, however, the extra amount of time spent is likely to be well invested, because, on this test, there are positive correlations between the IETs and test performance (see Table 4). Thus, here, it appears to be the better strategy to take time and perform many checks. As far as the Perspectives test is concerned, differences in IETs are rather small and do not offer much evidence for explaining gender-related score variance. Table 7 also summarizes the students’ estimates of the percentages of items they answered correctly.
The mean estimates show that, on all tests except Visual Memory, females rate their performance more modestly than males. On Surface Development, Perspectives, and especially Block Rotation, there are systematic mean differences, implying that males are more confident in retrospect with respect to their performance. There is practically no such difference on Visual Memory. As far as Surface Development is concerned, the average rating of males (69.61) roughly parallels the mean of their test performance (71.89), while the females tend to underestimate their scores on the average, rating their score at 61.94, while the actual mean is 69.74. For Block Rotation, the gender-related difference in the estimates roughly parallels the difference in the scores. For the Perspectives test, the effect sizes for the estimates and scores are fairly similar, but in this case, both males and females overestimate their actual performance. The slight score advantage of females on Visual Memory is not reflected in a difference in the mean estimates. Both males and females overestimate their performance on this test, but males do so to a somewhat larger extent. The absence of a gender-related mean difference in the estimates with respect to Visual Memory shows that the purported higher confidence of males with respect to solving spatial tasks is not a universal phenomenon. It differs in extent depending on the nature of the tasks and may sometimes be zero.

Table 4 provides a comparison of the correlations of the test scores with the IET measures and estimates for males and females. There is only one systematic difference between the sexes in these correlations, if a q value of .10 is accepted as the minimal criterion for such a difference (cf. Cohen, 1988). The correlations between the estimates and the test scores, which are also reported in the table, do not show systematic differences between the sexes, except for the Surface Development test, where the correlation is higher for males than for females. Thus, overall, the accuracy with which males and females estimate their level of performance is roughly the same.

Study 2

The findings of Study 2 are presented here to provide an impression of how the results obtained in a test evaluation study generalize to the context in which the test takers compete for eligibility by taking the STB.

Instrument In Study 2, four different forms of the STB were used. The numbers of items in the four tests were the same as in Study 1, as were most of the items themselves. While 18 tasks in Block Rotation and Perspectives, six objects in Surface Development, and all 20 items in Visual Memory were the same across these forms, four Block Rotation and Perspectives items and one Surface Development object varied across forms. These items were inserted to collect calibration data on them. They were not included in the test scores that were used for decisions on eligibility, but they were counted in this study, because they showed good psychometric properties in general. The performance estimates were no longer administered. The time allowances for Surface Development and Block Rotation were shortened because these two tests had turned out to be rather easy in Study 1.

Subjects The subject sample consisted of 1,175 participants (755 males and 420 females) in the 1996/1997 Talent Search of IAAY. The criteria for qualifying for the Talent Search were the same as in Study 1. All the subjects were students, and 95.7% of them attended the seventh grade. The students completed the STB as a part of their testing for eligibility for IAAY summer programs. Participation was voluntary, and the score on the STB could improve students’ chances of being eligible for summer programs. The STB was administered at about 250 test centers all over the United States from early October 1996 through mid-April 1997. Sixty-five percent of the students described themselves as White, 13% as Asian, with the remainder of the sample belonging to other ethnic groups. The mean age of the sample was 12.28 years (SD = 0.55). There is no overlap between the subject samples of Studies 1 and 2.

Procedure The responses and IETs were recorded and analyzed as in the previous study, separately for the four forms. Only a part of these analyses is reported here to avoid repetition of findings that are similar to those of Study 1. Also, to conserve space, some of the analyses will be reported only for Form 1.

Results Table 8 shows the numbers of students taking the four tests in the four forms. These numbers differ slightly across tests within forms because some of the test administrations had to be interrupted for technical reasons. The data of the students affected by such problems were not included in the present analysis. Table 8 also shows the means and standard deviations of the scores on the tests, separately for the sexes and test forms. As in Study 1, Surface Development and Block Rotation turned out to be rather easy. Visual Memory was easier than in the previous study, though not an easy test overall. The Perspectives test also proved to be easier in the new population of subjects than in Study 1. Split-half reliability coefficients of the log IETs are presented here only for Form 1. They are .61, .85, .94, and .90 for the four tests.
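As a rough illustration of how a split-half reliability coefficient for log IETs could be computed, the sketch below averages the natural logs of each person's item times separately for odd- and even-numbered items, correlates the two halves, and applies the Spearman-Brown step-up. The article does not state how the halves were formed or whether this correction was applied, so the odd-even split, the correction, the function names, and the input layout are all assumptions made for the example.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def split_half_reliability(times):
    """times: one list of positive item times per test taker (assumed layout)."""
    odd_means, even_means = [], []
    for person in times:
        logs = [math.log(t) for t in person]  # natural-log transformation
        odd, even = logs[0::2], logs[1::2]    # assumed odd-even split of items
        odd_means.append(sum(odd) / len(odd))
        even_means.append(sum(even) / len(even))
    r_half = pearson(odd_means, even_means)
    # Spearman-Brown step-up to the full test length (assumed, not from the article).
    return 2 * r_half / (1 + r_half)
```

With real data, each inner list would hold one student's times on the items of a single test, and the function would be run once per test and form.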

Table 8


Table 9 shows the correlations of the IET measures with test performance for Form 1. For Surface Development, the correlation of the average IET on all items attempted and the test score is low. Unlike in the first study, however, the correlation is negative, though not very marked, for the Block Rotation test. For both Visual Memory and Perspectives, the correlations for the time spent on the solved and unsolved items and test performance are positive and substantial.

Table 9

Correlations of the Log IET Measures With Test Performance (Form 1, Study 2)

Note. The q indices refer to the comparison of the correlations for males and the correlations for females.

Table 8 summarizes the means and standard deviations of males and females on the test scores and IET measures as well as the associated statistics for the differences between the sexes on these variables. As far as the scores are concerned, gender-related differences in Surface Development are rather small. Block Rotation shows differences in favor of males that, with the exception of Form 1, can be classified as systematic in the sense specified above. The female advantage in Visual Memory, on average, has increased somewhat in comparison to the previous study. The Perspectives test, again, shows systematic advantages for males, which are almost half a standard deviation in size for Forms 2 and 3.

The average log IETs for all items attempted on the tests are all longer for females than for males, but the size of the differences varies considerably from one test to the other. Differences are largest for Block Rotation and Visual Memory and considerably smaller for the other two tests. The average log times spent on those items that were solved correctly are also all longer for females than for males. These differences are systematic in the case of Block Rotation and Visual Memory, but negligible for three of the forms of the Perspectives test. The differences between the sexes in the times that were spent on the items not solved correctly are rather small for Perspectives, but sizable for Visual Memory and Block Rotation.

The correlations of the IET measures with test performance are summarized in Table 9. These correlations are fairly similar across the gender groups. The largest difference was found for the overall IET on the Block Rotation test. Here, females appear to be more penalized than males by their habit of taking more time to work on the items. Block Rotation, however, is the only test where such an effect may have played an important role, because for the other parts of the STB the respective correlations are positive.

Discussion The gender-related differences in the scores on the Surface Development, Block Rotation, and Visual Memory tests reported here confirm earlier findings with paper-and-pencil versions of the STB in similar
populations (Stumpf, 1993, 1995; Stumpf & Eliot, 1995). As far as the Perspectives test is concerned, similar differences have been found in less select populations with other perspectives tasks (Stumpf & Fay, 1981, 1983, 1987; Stumpf & Klieme, 1989). The size of the differences, however, is rather small as compared to the previous findings. This can, in part, be explained by the fact that, in the present study, detailed tutorials with moving images preceded three of the tests. These tutorials are likely to provide a considerable amount of mental practice, which has been found in some studies to reduce gender-related differences on spatial tasks (see, e.g., Brinkmann, 1966; Connor, Shackman, & Serbin, 1978; Goldstein & Chance, 1965; Lord, 1987; Vasta, Knott, & Gaze, 1996). The slight advantages for females on the Visual Memory test are also consistent with previous findings (Stumpf & Eliot; Stumpf & Haldimann, 1997). Overall, the pattern of differences found here is in keeping with earlier observations on spatial ability in academically talented students. These observations suggest that the construct has a hierarchical structure in such a population that allows for gender differences in general spatial ability in one direction and differences in more specific subskills in the opposite direction (Stumpf & Eliot, 1995).

The data obtained on the performance estimates suggest that there are gender-related differences in the confidence in one’s performance on some, though not all, spatial tasks. This confidence shows a pattern similar to that of the habit of giving answers in a short amount of time. There is a general factor underlying it, and there is a gender-related difference in it, but this difference is not the same for every spatial test. As far as the Visual Memory test is concerned, it is very small. Results obtained on this test show that differences in confidence, although related to differences in performance on some tests, cannot explain differences in spatial ability per se. The pattern of results obtained with the estimates, especially the overconfidence observed, might be viewed as an outgrowth of the high ability level and educational success of the present population, but it should be kept in mind that similar instances of overconfidence have been observed in other samples as well (e.g., Lundeberg, Fox, & Puncochar, 1994).

The average time spent on working on spatial items appears to be a characteristic having a high degree of reliability in a psychometric sense. A general factor underlies this characteristic as it is manifested in different spatial tests: a general habit of giving answers quickly or slowly. Overall, females tend to spend more time on such tasks than do males. Despite the generality of the factor across the spatial tasks used here, females do not tend to use more time than males on all cognitive test items and across all types of tasks. This is documented by the advantage females tend to have in the domains of perceptual speed (e.g., Born, Bleichrodt, & van der Flier, 1987; Hedges & Nowell, 1995) and clerical speed (e.g., Feingold, 1988; Stanley, Benbow, Brody, Dauber, & Lupkowski, 1992; Stumpf & Jackson, 1994). Recent findings (Härnqvist, 1997) suggest that this advantage is a more general phenomenon than previously believed and may not be limited to these two types of tasks, although there is evidence suggesting that males, on average, show a faster rate of mental rotation than females (Kail, Carter, & Pellegrino, 1979).

Aside from personality characteristics and cognitive styles, gender differences in the habit of taking little or much time could be due to the perceived difficulty of the tasks. Females might, on average, view spatial tasks as more difficult than do males and, therefore, work more slowly on such items. Thus, the first part of the hypothetical causal chain (lower confidence-more time taken-lower scores) mentioned above might be operating. Clearly, however, more data are needed to examine this hypothesis, in particular estimates of task difficulty collected before or while the subjects are working on the items. For the time being, the empirical evidence provides little support for such a view. The correlation of the general factors underlying the performance estimates and IETs is low, and the Visual Memory test showed relatively large differences in the amount of time taken. Based on actual performance, differences between males and females in perceived difficulty should be smallest in this case.

Given the findings on the IETs, it appears to be important to distinguish between speed in performing the cognitive processes required to solve a problem and the overall time taken to do so. Two persons performing the solution steps at the same pace may still differ in the time they spend on the task, because one of them repeats steps of processing more often than the other or performs more checks of the correctness of intermediate and final results. Thus, the findings presented here do not imply gender differences in cognitive speed in general. Possibly, the characteristic underlying the IETs is less related to cognitive speed than to test-taking styles or cognitive styles, such as reflection-impulsivity.

Despite the importance of the general factor mentioned above, the mean differences in the time spent on spatial tasks between males and females are not the same across all tests. The examples of Surface Development and Perspectives show that they can be quite small in specific cases. The data on the Visual Memory and Perspectives tests indicate that the difference in IETs is not necessarily related to the difference in scores on the respective tests. In Visual Memory, the IET difference is relatively large, but females have an advantage in performance. On Perspectives, the difference is small, but females have a disadvantage in the scores.

It is obvious that the habit of taking much time for working on items is detrimental to test performance when there is not enough time to look at every item and is, thus, likely to explain gender-related differences in scores on spatial tests in a number of cases. According to the percentages of students who were timed out while working on the tests and the correlations of IETs with the test scores in the present research, however, this factor appears to have contributed to gender-related score differences only on the Block Rotation test. Even when the time is sufficient for considering every item, though, the habit of taking much time might be detrimental to performance because persons doing so might feel time pressure in the final phases of test taking. This argument could contribute to explaining the negative correlations between the IETs and performance on the Block Rotation test in Study 2, especially the correlation observed for the group of females, but no indication of such an effect was found for the other tests.

Interestingly, however, the habit of taking much time is not necessarily detrimental to performance, as shown by the positive correlations of the IETs and the scores on Visual Memory and Perspectives. In these cases, the extra time spent appears to be well invested. The habit of taking relatively much time cannot explain gender-related differences on tests of spatial ability in the present population in general. In fact, the positive correlations between the IETs and the scores on Visual Memory and Perspectives found under the competitive circumstances of Study 2 even suggest that, when the test is fairly difficult, but enough time is available to work on every item, males do not live up to their potential of performance because they deal with the tasks too quickly. Thus, in such cases, males rather than females might be put at a disadvantage by the way they use their time.

If the habit of taking much time for working on items is different from low speed of performing the elementary steps of processing, then factors other than speed of processing probably explain the differences between the sexes in the working time taken. Potential factors that might be relevant for such an explanation are (a) that steps of processing are repeated more often by females than by males and (b) that additional steps (such as checking the correctness of solution steps or the answer given as a whole) are performed more often. Performing steps of the solution repeatedly and checking the correctness of solution steps can be beneficial because it helps to avoid giving incorrect answers, but it can also introduce an element of redundancy into processing spatial tasks. The low or negative correlations of the average IETs and the scores on the Block Rotation tests suggest that the additional steps performed are not very helpful in obtaining a better test score. The critical element in this context does not seem to be working slowly or quickly in general, but using one’s time budget effectively by allocating as much time to a task as is required.

In conclusion, the results presented here for a population of academically talented students show that there are gender-related differences in confidence with respect to some, but not all, spatial tasks, and that females tend to take more time on average than males for working on a number of tasks. There is, however, no simple causal chain between these variables and performance. Taking much time is likely to be detrimental to performance when there is much time pressure, but the present data show that it can be beneficial when sufficient time is available. Thus, it cannot explain gender differences in spatial ability in general; it might even contribute to reducing such differences in some cases. The relations among these variables are rather complex. The effects of the amount of working time taken on performance depend on the nature and content of the tasks and the time available to solve them.

References

Ben-Shakhar, A. R., & Sinai, Y. (1991). Gender differences in multiple-choice tests: The role of differential guessing tendencies. Journal of Educational Measurement, 28, 23-35.
Born, M. P., Bleichrodt, N., & van der Flier, H. (1987). Cross-cultural comparison of sex related differences on intelligence tests. Journal of Cross-cultural Psychology, 18, 283-314.
Brinkmann, E. H. (1966). Programmed instruction as a technique for improving spatial visualization. Journal of Applied Psychology, 50, 179-184.
Carroll, J. B. (1993). Human cognitive abilities. Cambridge, England: Cambridge University Press.
Casey, M. B., Nuttal, R., & Pezaris, E. (1997). Mediators of gender differences in mathematics college entrance test scores: A comparison of spatial skills with internal beliefs and anxieties. Developmental Psychology, 33, 669-680.
Cattell, R. B. (1987). Intelligence: Its structure, growth, and action. Amsterdam, The Netherlands: North-Holland.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
Connor, J. M., Shackman, M., & Serbin, L. (1978). Sex-related differences in response to practice on a visual-spatial test and generalization to a related test. Child Development, 49, 24-29.
Czarnolewski, M. Y. (1996, September). An empirical validation of the natural log transformation of reaction time. Paper presented at the 40th Annual Meeting of the Human Factors and Ergonomics Society, Philadelphia, PA.
Delgado, A. R., & Prieto, G. (1996). Sex differences in visuospatial ability: Do performance factors play such an important role? Memory & Cognition, 24, 504-510.
Eliot, J. (1980). Classification of figural spatial tests. Perceptual and Motor Skills, 51, 847-851.
Eliot, J. (1983). The classification of spatial tests. In J. Eliot & I. M. Smith (Eds.), An international directory of spatial tests (pp. 11-15). Windsor, England: NFER Nelson.
Eliot, J., & Smith, I. M. (1983). An international directory of spatial tests. Windsor, England: NFER Nelson.
Eliot, J., Stumpf, H., & Tissot, S. L. (1992). CTY Spatial Test Battery: Administrator’s handbook. Baltimore, MD: Center for Talented Youth, Johns Hopkins University.
Feingold, A. (1988). Cognitive gender differences are disappearing. American Psychologist, 43, 95-103.
Gallagher, S. A., & Johnson, E. S. (1992). Effects of time limits on performance on mental rotations by gifted adolescents. Gifted Child Quarterly, 36, 19-22.
Goldstein, A. G., & Chance, J. E. (1965). Effects of practice on sex-related differences in performance on embedded figures. Psychonomic Science, 3, 361-362.
Goldstein, D., Haldane, D., & Mitchell, C. (1990). Sex differences in visual-spatial ability: The role of performance factors. Memory & Cognition, 18, 546-550.
Halpern, D. F. (1992). Sex differences in cognitive abilities (2nd ed.). Hillsdale, NJ: Erlbaum.
Härnqvist, K. (1997). Gender and grade differences in latent ability variables. Scandinavian Journal of Psychology, 38, 55-62.
Hedges, L. V., & Nowell, A. (1995). Sex differences in mental test scores, variability, and numbers of high-scoring individuals. Science, 269, 41-45.
Kail, R., Carter, P., & Pellegrino, J. (1979). The locus of sex differences in spatial ability. Perception and Psychophysics, 26, 182-186.
Linn, M. C., & Petersen, A. C. (1986). Emergence and characterization of sex differences in spatial ability: A meta-analysis. Child Development, 56, 1479-1498.
Lohman, D. F. (1979). Spatial ability: A review and reanalysis of the correlational literature (Technical Report No. 8). Stanford, CA: Stanford University, School of Education.
Lord, T. R. (1987). A look at spatial abilities in undergraduate women science majors. Journal of Research in Science Teaching, 24, 757-767.
Lundeberg, M. A., Fox, P., & Puncochar, J. (1994). Highly confident but wrong: Gender differences and similarities in confidence judgments. Journal of Educational Psychology, 86, 114-121.
Maier, P. H. (1994). Räumliches Vorstellungsvermögen [Spatial ability]. Frankfurt a. M., Germany: Lang.
Maier, P. H. (1996). Geschlechtsspezifische Differenzen im räumlichen Vorstellungsvermögen [Gender-related differences in spatial ability]. Psychologie in Erziehung und Unterricht, 43, 245-265.
Prinzel, L. J., & Freeman, F. G. (1995). Sex differences in visuo-spatial ability: Task difficulty, speed-accuracy tradeoff, and other performance factors. Canadian Journal of Experimental Psychology, 49, 530-539.
Stanley, J. C., Benbow, C. P., Brody, L. E., Dauber, S., & Lupkowski, A. E. (1992). Gender differences on eighty-six nationally standardized aptitude and achievement tests. In N. Colangelo, S. G. Assouline, & D. L. Ambroson (Eds.), Talent development: Vol. 1 (pp. 42-65). Unionville, NY: Trillium Press.
Stumpf, H. (1993). Performance factors and gender-related differences in spatial ability: Another assessment. Memory & Cognition, 21, 828-836.
Stumpf, H. (1994). Subskills of spatial ability and their relationships to success in accelerated mathematics courses. In K. A. Heller & E. A. Hany (Eds.), Competence and responsibility: The third European conference of the European Council for High Ability (pp. 286-297). Seattle, WA: Hogrefe and Huber.
Stumpf, H. (1995). Development of a talent search and related programs for scientific innovation among youth (Technical Report No. 12). Baltimore, MD: Johns Hopkins University, Center for Talented Youth.
Stumpf, H., & Eliot, J. (1995). Gender-related differences in spatial ability and the k factor of general spatial ability in a population of academically talented students. Personality and Individual Differences, 19, 33-45.
Stumpf, H., & Fay, E. (1981). Entwicklung und Erprobung eines neuartigen Aufgabentyps zur Erfassung des räumlichen Vorstellungsvermögens [Development and evaluation of a new type of tasks for assessing spatial ability]. Diagnostica, 27, 157-174.
Stumpf, H., & Fay, E. (1983). Schlauchfiguren: Ein Test zur Beurteilung des räumlichen Vorstellungsvermögens [Cube Perspectives: A test of spatial ability]. Göttingen, Germany: Hogrefe.
Stumpf, H., & Fay, E. (1987). Neue Befunde zu Reliabilität, Validität und Normierung der "Schlauchfiguren" [New evidence on the reliability, validity, and standardization of the Cube Perspectives]. Diagnostica, 33, 156-163.
Stumpf, H., & Haldimann, M. (1997). Spatial ability and academic success in sixth grade students at international schools. School Psychology International, 18, 245-259.
Stumpf, H., & Jackson, D. N. (1994). Gender-related differences in cognitive abilities: Evidence from a medical school admissions program. Personality and Individual Differences, 17, 335-344.
Stumpf, H., & Klieme, E. (1989). Sex-related differences in spatial ability: More evidence for convergence. Perceptual and Motor Skills, 69, 915-921.
Ulrich, R., & Miller, J. (1993). Information processing models generating lognormally distributed reaction times. Journal of Mathematical Psychology, 37, 513-525.
Ulrich, R., & Miller, J. (1994). Effects of truncation on reaction time analysis. Journal of Experimental Psychology: General, 123, 34-80.
Vasta, R., Knott, J. A., & Gaze, C. E. (1996). Can spatial training erase gender differences on the water-level task? Psychology of Women Quarterly, 20, 549-567.

Author’s Note The author is indebted to Mark Y. Czarnolewski, Diane Halpern, Lesley Mackay, Carol Mills, Julian Stanley, and two anonymous reviewers for helpful comments on earlier drafts of this article. Correspondence should be sent to Heinrich Stumpf, Wichheimer Str. 253, 51067 Köln, Germany; e-mail: [email protected].

May 1, 2010 - This article first discusses a statistical test for investigating whether or not the pattern of missing scores in a respondent-by-item data matrix is random. Since this is an asymptotic test, we investigate whether it is useful in small