Journal of Experimental Psychology: General 2011, Vol. 140, No. 4, 605–621

© 2011 American Psychological Association 0096-3445/11/$12.00 DOI: 10.1037/a0024014

Recollection-Based Prospective Metamemory Judgments Are More Accurate Than Those Based on Confidence: Judgments of Remembering and Knowing (JORKs)

David P. McCabe and Nicholas C. Soderstrom
Colorado State University

Five experiments were conducted to examine whether the nature of the information that is monitored during prospective metamemory judgments affected the relative accuracy of those judgments. We compared item-by-item judgments of learning (JOLs), which involved participants determining how confident they were that they would remember studied items, with judgments of remembering and knowing (JORKs), which involved participants determining whether studied items would later be accompanied by contextual details (i.e., remembering) or would not (i.e., knowing). JORKs were more accurate than JOLs when remember–know or confidence judgments were made at test and when cued recall was the outcome measure, but not for yes–no recognition. We conclude that the accuracy of metamemory judgments depends on the nature of the information monitored during study and test and that metamemory monitoring can be improved if participants are asked to base their judgments on contextual details rather than on confidence. These data support the contention that metamemory decisions can be based on qualitatively distinct cues, rather than an overall memory strength signal.

Keywords: metacognition, metamemory, memory predictions, judgments of learning

This article was published Online First June 27, 2011. David P. McCabe and Nicholas C. Soderstrom, Department of Psychology, Colorado State University. David P. McCabe passed away unexpectedly on January 11, 2011. His death is a great loss to everyone who knew him; he will be missed dearly. Correspondence concerning this article should be addressed to Nicholas C. Soderstrom, Department of Psychology, Colorado State University, Fort Collins, CO 80523-1876. E-mail: [email protected]

Judgments about the likelihood of future events are often inaccurate because they are biased by information that is currently available to consciousness (Camerer, Loewenstein, & Weber, 1989; Castel, McCabe, & Roediger, 2007; Gilbert, Pinel, Wilson, Blumberg, & Wheatley, 1998; Koriat & Bjork, 2005; Schkade & Kahneman, 1998). For example, Camerer, Loewenstein, and Weber (1989) reported that stock traders lost money when they had inside information about companies because they believed that others would make investment decisions as if they had the same information. Similarly, when asked to predict the likelihood that others would be able to solve anagrams, research participants judged solved anagrams as more likely to be solved by others relative to new anagrams, despite no differences in the objective difficulty of the items (Kelley & Jacoby, 1996). Indeed, currently available information causes errors or reduces accuracy in many domains, including children’s reasoning (Birch & Bloom, 2004), memory for political forecasts (Blank, Fischer, & Erdfelder, 2003), affective forecasting (Gilbert et al., 1998), planning for the future (Buehler, Griffin, & Ross, 1994), and medical decision making (Anderson, Stumpf, & Schulkin, 2009), to name but a few examples.

Particularly relevant to the current study are situations in which judgments about one’s own memory performance (i.e., metamemory) are less accurate because currently available information is not diagnostic of later memory performance (Castel et al., 2007; Koriat & Bjork, 2005, 2006; Rhodes & Castel, 2008). Experimental research aimed at understanding this aspect of metacognitive monitoring suggests that when participants are asked to assess their mastery of to-be-remembered information, they focus on characteristics of the encoding experience, such as how fluently or easily it is processed (e.g., Koriat, Bjork, Sheffer, & Bar, 2004; Rhodes & Castel, 2008), rather than focusing on the potential future retrieval environment. Thus, participants are likely to use cues that are most salient or most available during study, perhaps because these cues are typically associated with remembering information. However, the cues that are typically used as a basis for metamemory predictions are often not diagnostic with respect to later memory performance (Koriat, 1997), and it appears that participants have difficulty escaping the influence of currently available information on metamemory predictions (Castel et al., 2007; Koriat & Bjork, 2005). Although currently available but nondiagnostic information has a strong influence on metamemory predictions, this bias can be reduced by providing participants with information regarding how the cognitive system operates (Koriat & Bjork, 2006). In the current study, we introduced a novel procedure to determine whether judgments that individuals use to predict their own learning can be modified to encourage research participants to focus less on characteristics that are not diagnostic of later retrieval and more on the characteristics that are, thereby increasing the accuracy of metacognitive predictions.

Thus, the primary thesis motivating the current research was that information that is diagnostic of later retrieval is available during study, and the accuracy of metamemory judgments is determined by the extent to which judgments are based on that information. The issue of improving the accuracy of metacognitive predictions is not simply a theoretical exercise, but rather, procedures that could potentially remediate cognitive biases have implications for improving metacognition in educational settings (Thiede & Anderson, 2003), memory rehabilitation (Bewick, Raymond, Malia, & Bennett, 1995), medical diagnostics (Croskerry, 2003), and other domains.

Predicting Memory Performance: Judgments of Learning

Judgments that have been used to investigate metamemory can be classified into two general types: prospective metamemory judgments in which research participants are asked to predict future memory performance, and retrospective metamemory judgments in which research participants are asked to determine the veracity of retrieved information (Nelson & Narens, 1990). With respect to predictions of memory performance, judgments of learning (JOLs) have received considerable attention in recent years. JOLs require participants to determine how confident they are that they will later remember information that is being studied.

The focus of the current study was on the relative accuracy of these predictions, which we assessed by examining resolution. Resolution measures the extent to which participants can distinguish between items that will be remembered later, as compared with those that will not. A real-world analogue to resolution is the situation in which a student attempts to distinguish between information that will or will not be remembered for an upcoming test. If items that are given high-confidence judgments are later remembered and items given low-confidence judgments are not, resolution would be considered high, whereas if items that are given high-confidence judgments are not remembered later and items given low-confidence judgments are remembered, resolution would be considered low. Resolution is typically measured by gamma correlations, a nonparametric measure of association (cf. correlation coefficient), which is calculated for each participant and then averaged across participants (see Nelson, 1984).

Although JOLs made immediately after participants have studied to-be-remembered items are typically positively correlated with memory performance, this correlation is typically far from perfect (Nelson & Dunlosky, 1992). Thus, any theoretical explanation of the mechanisms underlying JOLs must explain their weak-to-moderate accuracy under most circumstances. As a point of comparison for the aforementioned inaccuracy of immediate JOLs, average gamma correlations become considerably stronger, sometimes approaching 1.0, when JOLs are made after a delay (see Dunlosky & Nelson, 1992; Rhodes & Tauber, 2011).
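To make the resolution measure concrete, the following is a minimal sketch of a Goodman–Kruskal gamma computed for one participant. It is an illustration under stated assumptions, not the authors' analysis code: the function name, the data, and the coding (higher numbers mean a more confident prediction, 1 means the item was later remembered) are all hypothetical.

```python
from itertools import combinations


def goodman_kruskal_gamma(predictions, outcomes):
    """Return (C - D) / (C + D) over all item pairs, or None if undefined.

    C counts concordant pairs (the item with the higher prediction also has
    the higher outcome); D counts discordant pairs. Pairs tied on either
    variable are ignored, which is what distinguishes gamma from other
    rank correlations.
    """
    concordant = 0
    discordant = 0
    for (p1, o1), (p2, o2) in combinations(zip(predictions, outcomes), 2):
        product = (p1 - p2) * (o1 - o2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    total = concordant + discordant
    if total == 0:
        return None  # every pair is tied, so gamma cannot be computed
    return (concordant - discordant) / total


# Hypothetical data for one participant (assumed coding, see lead-in).
judgments = [3, 3, 2, 1, 1, 2]   # 3 = most confident prediction
remembered = [1, 0, 1, 0, 0, 1]  # 1 = remembered at test, 0 = not
gamma = goodman_kruskal_gamma(judgments, remembered)
print(gamma)  # 6 concordant and 2 discordant pairs give 0.5
```

In a design like the one described here, such a value would be computed separately for each participant and the per-participant values would then be averaged within each condition.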

Unidimensional and Multidimensional Theories of Prospective Metamemory Judgments

Theoretical frameworks proposed to explain JOLs fall into two broad classes: unidimensional and multidimensional (Jang & Nelson, 2005; Koriat, 1997). Unidimensional theories of JOLs, such as trace-access theories (Hart, 1967; Jang & Nelson, 2005) or memory-strength theories (Busey, Tunnicliff, Loftus, & Loftus, 2000), propose that JOLs and recall are based on the same information. Thus, during study, participants assess the memory strength of a given item and base their JOLs on the given level of memory strength. Memory strength also determines the probability that the item will be recalled or recognized during a test. According to memory-strength theories, there are several possible reasons that JOLs are not perfectly predictive of later recall. For example, the fineness of the judgment scale may be limited (Jang & Nelson, 2005), such that participants cannot make their judgments with sufficient precision, or JOLs may be influenced by irrelevant information available in working memory that adds noise to measures of resolution (Dunlosky & Nelson, 1992; Sikström & Jönsson, 2005). Nonetheless, strength theories predict a positive relation between JOLs and memory performance, but this pattern is not always found (e.g., Benjamin, Bjork, & Schwartz, 1998; Koriat, 1997).

Multidimensional frameworks of JOLs propose that multiple, distinct cues provide the basis for JOLs and determine their accuracy (Koriat, 1997; Metcalfe & Finn, 2008). For example, Koriat’s (1997) cue utilization framework suggests that the relative inaccuracy of immediate JOLs reflects a discrepancy between the cues used to inform JOLs and the cues that determine retrieval success. In particular, Koriat et al. (2004) suggested that “on-line [immediate] JOLs are based predominantly—perhaps exclusively—on the subjective experience associated with processing fluency” (p. 653). Thus, one of the primary reasons for the less-than-perfect accuracy of immediate JOLs appears to be an overreliance on the fluency with which items are processed, which is often not predictive of future memory performance (Benjamin et al., 1998; Castel et al., 2007; Hertzog, Dunlosky, Robinson, & Kidder, 2003; Koriat, 1997; Rhodes & Castel, 2008).

Although encoding fluency appears to be a powerful cue influencing JOLs, this influence is not inevitable. For example, when immediate JOLs are made for the same items during repeated study–test cycles, resolution improves (Koriat & Bjork, 2006). Similarly, researchers can reduce participants’ reliance on fluency as a basis for JOLs by explaining how the retrieval environment will differ from the study environment (Koriat & Bjork, 2006).

In the current study, we introduce a procedure intended to shift the basis of prospective metamemory judgments from characteristics that are unlikely to be strongly associated with retrieval (e.g., encoding fluency) to characteristics that should be more closely associated with later retrieval. Specifically, we explained the distinction between remembering (i.e., recollection of specific details) and knowing (i.e., memory devoid of contextual details; Gardiner, 1988; Tulving, 1985) to participants prior to their prospective metamemory judgments and asked them to base their judgments on whether they expected to remember or know the items in the future. The remember–know procedure heretofore has only been used to assess retrieval experiences. Thus, in the current study, we employed a procedure in which predictions of later memory performance are made based on the types of information that are associated with remember–know judgments. These prospective metamemory judgments will be referred to as JORKs (judgments of remembering and knowing), and their accuracy will be compared to prospective judgments based on confidence (i.e., JOLs).

Unidimensional and Multidimensional Explanations of Remember–Know Judgments

As is the case with prospective metamemory judgments, there are both unidimensional and multidimensional theories of retrospective metamemory judgments (Dunn, 2008; Gardiner, 2001; McCabe & Geraci, 2009a; Parks & Yonelinas, 2007; Rajaram, 1998; Wixted & Mickes, 2010). Unidimensional theories assume that remember–know judgments measure overall memory strength (Donaldson, 1996; Dunn, 2008). Although some unidimensional models propose that a single process underlies memory performance (e.g., Donaldson, 1996; Dunn, 2008) and others propose that multiple processes underlie memory performance (e.g., Rotello, Macmillan, & Reeder, 2004; Wixted & Mickes, 2010), these memory-strength theories can all be considered single-state theories because all share the assumption that participants only monitor a single memory-strength signal. Indeed, regardless of whether memory-strength theories posit a single-memory process or two processes, they share the assumption that remember–know judgments and confidence judgments are based on the same underlying information and should show empirical patterns that are similar.

In contrast to unidimensional theories of remember–know judgments, multidimensional theories of remember–know judgments (specifically, those referred to as dual-state explanations) treat these judgments as valid, though imperfect, metacognitive measures of two distinct states of conscious awareness (Gardiner, 2001; McCabe & Geraci, 2009a; McCabe, Roediger, McDaniel, & Balota, 2009; Rajaram, 1998). According to dual-state explanations, participants have extensive experience in distinguishing between remembering (i.e., contextual retrieval of details from their personal past) and knowing (retrieval devoid of these contextual details) in their daily lives and can effectively characterize experiences in an experiment as falling into either of these categories (see McCabe & Geraci, 2009b). These dual-state explanations assume that remember–know judgments are based on different information than are confidence judgments, and under some circumstances, this difference will become apparent. Remember–know judgments are also often aligned with dual-process explanations that treat these judgments as indices of recollection and familiarity (e.g., Parks & Yonelinas, 2007), although our dual-state explanation is agnostic with respect to the potential memory processes that provide the bases for these judgments.

In order to explain how memory strength is used when making remember–know judgments, Wixted (2007) has used the analogy of a juror considering evidence of guilt, suggesting that a juror considers the overall strength of accumulated evidence in a given case, rather than making a decision based on a single piece of evidence. Using Wixted’s juror analogy, we contend that jurors likely weight evidence differentially, according to whether the evidence is more or less diagnostic of guilt. For example, if jurors were determining the guilt of a defendant, DNA evidence at a crime scene would be more diagnostic of guilt than evidence that a defendant was seen in the vicinity of the crime scene, and presumably their decision could be based exclusively on the DNA evidence. Indeed, medical diagnoses, political forecasting, and legal decisions are often more efficient when made on the basis of the most diagnostic information rather than on the most information (Gigerenzer & Goldstein, 1996).

Unidimensional and Multidimensional Theory Predictions

The theoretical rationale for the current study was novel in that dual-state accounts of remember–know judgments were used to inform the predictions for the accuracy of prospective metamemory judgments. In particular, on the basis of multidimensional theories of prospective metamemory judgments (e.g., Koriat, 1997), coupled with the assumptions of dual-state retrieval theories (McCabe & Geraci, 2009a), we expected JORKs to be more accurate in predicting memory performance than JOLs because JORKs encourage judgments to be made on the basis of cues that are more diagnostic of retrieval experience (i.e., cues associated with episodic retrieval). Thus, asking participants to base their predictions on the type of details that are associated with episodic retrieval should confer a benefit that predictions based on confidence do not. By contrast, according to unidimensional accounts of prospective metamemory judgments (e.g., Jang & Nelson, 2005), memory strength is monitored at study and test, and there is no mechanism to account for the use of qualitatively distinct cues. Accordingly, the accuracy of JORKs and JOLs should be similar.

In the current study, we also investigated the type of retrospective metacognitive judgments or memory tests that were employed, as the accuracy of prospective metamemory judgments is necessarily dependent on the outcome measure with which those prospective metamemory judgments are correlated. Thus, across the various experiments we report, we used remember–know judgments, confidence judgments, yes–no recognition, or cued recall. According to the dual-state theory guiding our hypotheses, the accuracy of prospective metacognitive judgments would depend on whether the outcome measure allows participants to distinguish between recollective experiences or the lack thereof. We expected that prospective metamemory accuracy would be greatest when remember–know judgments were made at test because these judgments explicitly require participants to judge whether contextual details would be retrieved or not (i.e., remembering and knowing, respectively).

Confidence judgments are similar to remember–know judgments in that high-confidence judgments are probably based to a great extent on recollective details, and low-confidence responses are based on a lack of recollective details (see Roediger, Rajaram, & Geraci, 2007). However, confidence judgments are a cruder measure of recollective experience because participants are not instructed regarding the types of information that are associated with episodic retrieval, but rather are free to determine the types of information that they believe are indicative of confidence. Indeed, previous research indicates that high-confidence judgments are considerably more liberal than remember responses, suggesting that high-confidence judgments conflate the experiences of remembering and knowing (Gardiner & Java, 1990; Geraci, McCabe, & Guillory, 2009; Rajaram, Hamilton, & Bolton, 2002). Note that because of the same rationale mentioned here, we expected that JORKs would not be more accurate than JOLs when a yes–no recognition test was used as the outcome measure because a binary yes–no response provides no basis on which to distinguish between recollection of details or the lack thereof.

To summarize, the experiments that we will report are as follows: The predictive accuracy of JORKs and JOLs made at study was compared for recognition tests that included either remember–know judgments (Experiment 2), both remember–know and confidence judgments (Experiment 1), or yes–no judgments (Experiment 3). Two final experiments (Experiments 4a and 4b) were conducted to examine the relative accuracy of JORKs and JOLs with a paired-associate learning paradigm. Thus, the experiments examined the accuracy of JORKs and JOLs with respect to predicting memory performance (i.e., paired-associate recall), hits on a recognition test, and retrospective metamemory judgments (i.e., remember–know and confidence judgments). The primary question of interest was whether monitoring distinct details (i.e., remembering and knowing) at study would lead to more accurate metamemory predictions than monitoring confidence at study.

Experiment 1

In Experiment 1, we crossed two prospective metamemory judgments (JORKs and JOLs) with two retrospective metamemory judgments (remember–know and confidence). JORKs were made on a 3-point scale (“Will you recollect, know, or forget this item?”), as were JOLs (rated on the scale Will remember < 1—2—3 > Won’t remember). These study judgment conditions were orthogonally crossed with analogous test conditions, such that participants received either a remember–know recognition test with a 3-point rating scale (i.e., 1 = recollect, 2 = know, 3 = new) or a confidence recognition test (Studied < 1—2—3 > Not studied). We expected that JORKs would be more accurate than JOLs because the former involve participants’ basing their judgments on information that is more closely associated with episodic retrieval. We also expected prospective metamemory judgments to be more accurate when a remember–know test (as compared with a confidence test) was used as an outcome measure, because the former requires a judgment that is based on whether retrieval experiences will be accompanied by contextual details or a lack thereof, whereas confidence judgments are less precise in this respect.

Method

Participants. One hundred seventy-six undergraduates from Colorado State University were randomly assigned to one of the four conditions (44 in each of the four conditions). They received course credit for their participation.

Materials. In each condition, a homogeneous set of 132 concrete nouns that were equated in terms of their frequency (Log Hyperspace Analogue to Language [HAL] frequency M = 8.42, SD = 0.87, range = 6.51–10.95; Balota et al., 2007), number of letters (M = 5.77, SD = 0.75, range = 5–7), and number of syllables (M = 1.73, SD = 0.60, range = 1–3) were used. Two different sets of 60 items (Set A and Set B), equated for frequency, concreteness, length, and number of syllables, were generated for purposes of counterbalancing. Half of the participants studied Set A with Set B items serving as the distractors on the recognition test, whereas for the other half of the participants, this was reversed. The study list consisted of 72 items, which included the 60 studied items from Set A or B, depending on the counterbalance, and six filler words each at the beginning and end of the studied list as buffers (these filler words were the same for all conditions). Between the study and test phases, participants engaged in several paper-and-pencil-based distractor tasks for 20 min. These tasks included a brief demographic questionnaire (age, sex, and so on), a matrix reasoning task, and a word search puzzle.

Procedure. Participants were tested in groups of up to three. Prospective (JOLs or JORKs) and retrospective (remember–know or confidence) metamemory judgments were factorially crossed, resulting in the following experimental conditions: (a) JORKs with a remember–know test, (b) JORKs with a confidence test, (c) JOLs with a remember–know test, and (d) JOLs with a confidence test. After signing informed consent, participants were read instructions that described the study phase of the experiment. Depending on the condition, either JORK or JOL study instructions were read aloud by the experimenter (see Appendix for instructions, which were based on McCabe & Geraci, 2009a). Subsequent instructions for the distractor tasks and test phases were also read aloud; questions were allowed after both task and test phases.

During the study phase, participants studied 72 words presented one at a time on the computer screen. Each word was presented for 2 s, during which participants were asked to think of whatever association came to mind. In both JORK conditions, JORKs were made immediately following the study of each word. A screen reading “1 = recollect, 2 = know, or 3 = forget” prompted them for their judgment and remained on the screen until a judgment was made. Here, participants were asked to determine whether they would recollect the word, know the word, or forget the word on a later recognition test. Responses were given by pressing the 1–3 keys on the right side of the keyboard, which were labeled R, K, or F. After each JORK was given, the next word-judgment pair was presented, and this sequence was repeated until all 72 words had been studied and given a judgment.

The study phase for both the JOL conditions was similar to that for the JORK conditions, with the key difference being that instead of making JORKs after each word, participants made JOLs. A screen reading “Will remember < 1—2—3 > Won’t remember” prompted them for their judgment after they had studied each word and remained until a judgment was made. Here, participants were asked to think of how confident they were that the previously studied word would be remembered on a later memory test (see Appendix for instructions). Responses were given by pressing the 1–3 keys on the right side of the keyboard.

Immediately following the study phase, distractor tasks were completed for all participants. Participants completed these tasks in the same order and were stopped after 20 min regardless of how much of the distractor activities had been completed.

The next phase of the experiment was the recognition test, which consisted of 60 studied words and 60 new words, presented one at a time, randomly intermixed for each participant. Participants who would be completing the remember–know tests were given the remember–know instructions. Note that this was the first time that the participants in the JOL with remember–know test condition were exposed to the remember–know distinction, and the first time that the participants in the JORK with confidence condition were exposed to the confidence distinction. For each item on the test, participants were instructed to indicate whether they would recollect the word, know the word, or judge the word to be new to the experiment. Before the test phase for these conditions began, a paper flap attached to the top of each computer monitor reading “1 = recollect, 2 = know, 3 = new” was flipped down, allowing participants to refer to this scale when making their responses. Responses were given by pressing the 1–3 keys at the top of the keyboard. Once a response was given, the next word appeared. This sequence continued until all 120 words were given a response.


For participants completing the confidence tests, conditions were similar to the previously described conditions, with the key difference being that instead of making recollect/know/new responses after each word, participants made responses on the basis of their confidence. Participants were given instructions regarding the use of the confidence scale (see Appendix) and asked to indicate how confident they were that the word had been presented earlier. Before the test phase began, a paper flap attached to the top of each computer monitor reading “Studied < 1—2—3 > Not studied” was flipped down, so participants could refer to this scale when making the responses on the 1–3 keys at the top of the keyboard. The test phase was completed once all 120 words had been given a response. SuperLab software (Cedrus Corp., San Pedro, CA) was used for the presentation of all stimuli and the recording of responses. After all the participants had completed the recognition test, participants were debriefed and dismissed from the experiment.

Results

An alpha level of .05 was used for all statistical tests. F values, mean-square error (MSE), and effect sizes (partial eta squared, ηp²) are reported for each analysis. Cohen’s d is reported for comparisons of JORKs and JOLs and for comparisons of remember–know and confidence tests in Table 1 as well.

Gamma correlations as a function of prospective and retrospective judgment type. The primary analyses of interest concerned the differences in relative accuracy of prospective metamemory judgments (i.e., JOLs or JORKs). Relative accuracy reveals the extent to which participants were able to use prospective metamemory judgments to distinguish between items that would and would not be remembered. Following the convention in the literature, we assessed relative accuracy using gamma correlations, which are nonparametric correlations appropriate for use with categorical data (Nelson, 1984). In the current study, this involved correlating the JOLs (ranging from 1–3) or JORKs (ranging from 1–3, corresponding to R, K, or F judgments) with confidence ratings (ranging from 1–3) or remember/know/new judgments (ranging from 1–3, corresponding to R, K, or N judgments). Thus, positive gamma correlations were associated with greater accuracy of prospective judgment. A gamma correlation was calculated for each participant, and then the mean gamma correlations were compared across participants in each condition.

In order to determine the relative accuracy of prospective metamemory judgments and to determine whether potential differences were associated with retrospective judgment type, we conducted a 2 (prospective judgment type: JOLs, JORKs) × 2 (test judgment type: confidence, remember–know) between-subjects analysis of variance (ANOVA). The results are shown in Figure 1. This analysis revealed a main effect of study judgment type, such that mean gammas for JOLs (γ = .38) were lower than mean gammas for JORKs (γ = .53), F(1, 172) = 11.84, MSE = 0.99, ηp² = .06, and a main effect of test type, such that mean gamma correlations when confidence tests were used (γ = .37) were lower than mean gamma correlations when remember–know tests were used (γ = .54), F(1, 172) = 15.46, MSE = 1.29, ηp² = .08. The interaction between Prospective Judgment Type × Test Type was not significant, F < 1. Because we were also interested in the accuracy of prospective metamemory judgments as a function of test judgment type separately, we analyzed gammas for each test type separately (see Table 1). This analysis revealed that JORKs were more accurate than JOLs for both the remember–know tests, F(1, 86) = 7.89, MSE = 0.59, ηp² = .08, and confidence tests, F(1, 86) = 4.43, MSE = 0.41, ηp² = .05.

Recognition memory as a function of prospective and retrospective judgment type. We also examined whether the prospective judgment type or test type influenced overall recognition memory performance (d′). We calculated d′ by collapsing remember and know judgments for the remember–know tests and the two

Table 1
Gamma Correlations and Effect Sizes (Cohen’s d) for Prospective and Retrospective Metamemory Judgments and Cued Recall for Experiments 1–4

                                  Judgments of     Judgments of remembering   Cohen’s d for type of
                                  learning         & knowing                  prospective judgment
Experiment                         γ      SD        γ      SD
Experiment 1
  Confidence judgments            .30    .26       .43    .34                 .43*
  Remember–know judgments         .45    .29       .62    .26                 .63*
  Effect size for test type
    (Cohen’s d)                   .56*             .63*
Experiment 2
  Remember–know judgments         .41    .34       .76    .18                 1.35*
Experiment 3
  Yes–no judgments                .31    .42       .37    .43                 .14
Experiment 4a
  Cued recall                     .40    .36       .60    .25                 .67*
  Remember–know judgments         .42    .34       .58    .24                 .55*
Experiment 4b
  Cued recall                     .31    .28       .46    .20                 .52*
  Remember–know judgments         .30    .38       .48    .18                 .62*

* p < .05.
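As a rough check on how the effect sizes in Table 1 relate to the means and standard deviations, the sketch below computes Cohen's d with a pooled standard deviation for two equal-sized groups. This is an assumption about the formula, since the article does not state which variant was used, and the function name is hypothetical. The inputs are the Experiment 1 confidence-test values (JOL γ = .30, SD = .26; JORK γ = .43, SD = .34).

```python
import math


def cohens_d_equal_n(mean_a, sd_a, mean_b, sd_b):
    """Cohen's d for two independent groups of equal size.

    Assumes the pooled SD is the root mean square of the two group SDs,
    which holds only when both groups have the same n.
    """
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_b - mean_a) / pooled_sd


# Experiment 1, confidence test: JOLs (.30, .26) vs. JORKs (.43, .34).
d = cohens_d_equal_n(0.30, 0.26, 0.43, 0.34)
print(round(d, 2))  # 0.43
```

Under this assumed formula the result agrees with the .43 reported for that comparison; small discrepancies for other rows would be expected from rounding of the tabled means and standard deviations.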


[Figure 1: bar graph. Y-axis: gamma correlation (.00 to .70); x-axis: test type (remember/know, confidence); bars: JORK, JOL.]

Figure 1. Mean gamma correlations for judgments of remembering and knowing (JORKs) and judgments of learning (JOLs) for the remember–know tests (left) and confidence tests (right) in Experiment 1. Error bars reflect standard errors of the mean.

highest confidence judgments for the confidence tests into “old” responses (hits for targets; false alarms for lures). Note that d′ could also be compared for the remember and highest confidence (a rating of 1) hit and false-alarm rates, but floor effects for false alarms for these response categories make these comparisons tenuous (Diana, Reder, Arndt, & Park, 2006; McCabe & Balota, 2007). We converted hit rates and false-alarm rates using the correction factor recommended by Snodgrass and Corwin (1988, p. 35) in order to account for hit rates of 1 and false-alarm rates of 0, which are undefined in d′ calculations.

In order to statistically evaluate recognition memory, we conducted a 2 (prospective judgment type: JOLs, JORKs) × 2 (test type: confidence, remember–know) between-subjects ANOVA on d′ values. The results are shown in Table 2. This analysis revealed no main effect of prospective judgment type, F < 1, a main effect of test type, F(1, 172) = 11.27, MSE = 5.86, ηp² = 0.06, but no interaction, F < 1. Participants were more accurate when they completed a remember–know test as compared with a confidence test. A closer analysis of the test responses in Table 2 reveals that the difference in recognition memory (d′) as a function of test type was driven by higher levels of false alarms for new items. To confirm this observation regarding the false-alarm rates, we performed a 2 (prospective judgment type: JOLs, JORKs) × 2 (test type: confidence, remember–know) between-subjects ANOVA of the hit rate, which revealed no main effect of prospective judgment type, F < 1; no main effect of test type, F(1, 172) = 1.33, MSE = 0.01,

Table 2
Recognition Memory Judgments and Recognition Accuracy (d′) for Experiments 1–3 Following Judgments of Remembering and Knowing (JORKs) or Judgments of Learning (JOLs)

Prospective judgment Judgments of learning Experiment and test type Experiment 1 Remember–know test Studied remember Studied know Hits New remember New know False alarms Recognition accuracy Confidence test Studied/high confidence (1) Studied/medium confidence (2) Hits New/high confidence (1) New/medium confidence (2) False alarm Recognition accuracy Experiment 2 Remember–know test Studied remember Studied know Hits New remember New know False alarms Recognition accuracy Experiment 3 Old-new test Studied New Recognition accuracy

Proportion

SD

.62 .26 .87 .03 .11 .15

.19 .15 .12 .03 .13 .14

Overall d′

2.56 .81 .09 .90 .10 .17 .27

Proportion

SD

.46 .43 .89 .03 .10 .13

.21 .20 .07 .06 .09 .12

0.81 .84 .06 .90 .09 .16 .25

.40 .50 .90 .01 .09 .09

.89 .12 2.28

0.68

SD

2.54

0.67

2.23

0.67

2.91

0.74

2.65

0.68

.12 .13 .08 .01 .08 .08

0.70

.10 .08

Overall d′

.15 .06 .13 .07 .14 .16

0.73

.19 .21 .13 .04 .11 .13 2.02

.83 .13

SD

.14 .11 .08 .10 .16 .18 2.15

.39 .43 .82 .02 .18 .20

Judgments of remembering & knowing

.08 .09


ηp² = .01; and no interaction, F < 1. An identical ANOVA of false-alarm rates revealed no main effect of prospective judgment type, F < 1, and no interaction, F < 1, but there was a main effect of test type, F(1, 172) = 28.29, MSE = 0.65, ηp² = .14, indicating that false alarms were greater for the confidence tests than the remember–know tests.¹ Despite the differences in memory performance for the different test types, the statistical significance of the previous analyses of gamma correlations was unchanged when d′ was added as a covariate to those analyses. Thus, the increased relative accuracy of JORKs as compared with JOLs was independent of memory accuracy. We also note that the gamma correlations reported are between studied items and targets on the recognition test and thus do not reflect false alarms. Therefore, differences in the false-alarm rate would not be expected to affect gamma accuracy.

Distribution of study responses as a function of study condition. The distribution of response categories during study can have dramatic effects on resolution (i.e., gamma correlations; Koriat & Goldsmith, 1996; Weaver & Kelemen, 1997). In particular, when responses are polarized, such that only the ends of a scale are used, accuracy can be inflated relative to when responses are distributed across the entire response scale. Thus, if there were differences in the distribution of JORK or JOL responses at study, this could potentially affect the differences in the accuracy of gamma correlations that we have reported. In order to examine this issue, we conducted a 2 (prospective judgment type: JOLs, JORKs) × 3 (study response type: 1/remember, 2/know, 3/forget) between-subjects ANOVA. These data are presented in Table 3. Note that the responses were not labeled the same in the JOL and JORK conditions, but each scale had three levels, and thus the responses were compared directly. Moreover, because the responses sum to 1 for the study response type, main effects could not be calculated for prospective judgment type (the between-subjects variable), and thus, the interest is in the main effect of study response type (i.e., the distribution of the responses over the three response categories) and the interaction between the variables. The ANOVA revealed a small but significant main effect of study response type, F(1, 172) = 3.09, MSE = 0.15, ηp² = .02, indicating that the judgments were not equally distributed across the three response categories. More important, there was no interaction with prospective judgment type (F < 1), indicating that the distribution of responses was similar for JOLs and JORKs. Thus, the increased accuracy of JORKs as compared with JOLs was not caused by the differential distribution of those responses during study.

Experiment 1 revealed several findings of interest. First, JORKs were more accurate than JOLs. Thus, having participants focus on details associated with studied items led to benefits in metacognitive accuracy relative to having them make judgments of confidence (i.e., JOLs). Second, this difference in accuracy occurred regardless of test type. The lack of a Prospective Judgment Type × Retrospective Judgment Type interaction rules out the possibility that participants were simply matching their prospective and retrospective judgments (e.g., “If I predicted knowing at study, I will give a ‘know’ response at test”). Third, the greater accuracy of JORKs as compared with JOLs was not due to differences in the distributions of prospective metamemory judgments; for both JORKs and JOLs, all three response categories were used nearly equally. Fourth, prospective metamemory judgments were more accurate when remember–know tests were used than when confidence tests were used. Finally, recognition memory (d′) was more accurate when remember–know tests were used as compared with when confidence tests were used.
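The corrected d′ computation used for the recognition analyses can be illustrated with a minimal Python sketch. It assumes the commonly cited form of the Snodgrass and Corwin (1988) correction, in which 0.5 is added to each response count and 1 is added to the number of trials. The function names and the trial counts are hypothetical and are not taken from the experiments reported here.

```python
from statistics import NormalDist


def corrected_rate(count, n_trials):
    """Assumed Snodgrass & Corwin (1988) correction: (count + 0.5) / (n + 1).

    Keeps rates strictly between 0 and 1 so that the z transform is defined.
    """
    return (count + 0.5) / (n_trials + 1)


def d_prime(hits, n_targets, false_alarms, n_lures):
    """d' = z(corrected hit rate) - z(corrected false-alarm rate)."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    hit_rate = corrected_rate(hits, n_targets)
    fa_rate = corrected_rate(false_alarms, n_lures)
    return z(hit_rate) - z(fa_rate)


# Hypothetical participant with a perfect hit rate and no false alarms.
# Without the correction, z(1) and z(0) would be undefined.
print(d_prime(hits=40, n_targets=40, false_alarms=0, n_lures=40))
```

Because the correction is applied to every participant, not only to those with extreme rates, all d′ values in a sample are computed on the same footing.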

Experiment 2

In the subsequent experiments, we focused on replicating, extending, and gaining a better understanding of the finding that JORKs were more accurate than JOLs. Experiment 2 was conducted with two methodological changes intended to confirm that the results of the first experiment were not an artifact of some methodological differences between the prospective or retrospective procedures. The first change was to modify the scale used for JOLs. For JORKs in Experiment 1, the end of the scale denoting the lowest likelihood of remembering an item in the future was labeled Forget, whereas for JOLs, the corresponding end of the scale was labeled Will not remember. Because using the word forget to frame metamemory judgments has been associated with changes in responding (Finn, 2008; but see Rhodes & Castel, 2008), we decided to replace the anchor Will not remember on the JOL scale with Will forget. This allowed us to examine whether framing both JORKs and JOLs in terms of forgetting might eliminate the greater accuracy of JORKs as compared with JOLs.

The second change we made for Experiment 2 was to modify the instructions given for the remember–know test in order to try to equate the use of remember responses for studied items following JORKs and JOLs. Recall that more remember responses had been given to targets in the JOL condition of Experiment 1 than in the JORK condition (see Table 2), indicating that prospective judgments were influencing retrospective judgments. The reason for this is unclear; regardless, we wanted to equate the use of remember responses for studied items between the JOL and JORK conditions in order to confirm that this difference was not affecting the difference in relative accuracy between the JORK and JOL conditions. Thus, in Experiment 2, test instructions highlighted the idea that remember and know judgments were not to be confused with confidence judgments and that participants can have high-confidence know responses without recollecting contextual details (cf. Rajaram, 1993). We also included instructions explaining that remember responses should only be given if participants could report to the experimenter the specific contextual details associated with that item (Yonelinas, 2001) in order to ensure participants only used remember responses when they could access contextual details associated with test items (see Parks & Yonelinas, 2007, for detailed discussion of this issue). We did not include a confidence test in Experiment 2 because it was not necessary in order to address these methodological issues. Thus, the primary question of interest concerned whether the same patterns would replicate with the scale anchor (i.e., Will forget) and test instructions altered.

¹ A caveat regarding this finding should be noted. Specifically, for these analyses, we collapsed remember and know judgments and confidence ratings of 1 and 2 into hit responses. It is clear that know judgments constitute “old” responses; however, this is less clear regarding a confidence rating of 2. That is, we coded a confidence rating of 2 as a hit, but we acknowledge that one may question whether this rating should, in fact, be regarded as an old response. We thank a reviewer for pointing this out.


Table 3
Distributions of Responses for Judgments of Remembering and Knowing and Judgments of Learning for Experiments 1–4

Type of judgment                      Experiment 1   Experiment 2   Experiment 3   Experiment 4a   Experiment 4b
Judgments of learning
  1 (high confidence)                     .31            .33            .27            .33             .32
  2 (medium confidence)                   .38            .38            .42            .35             .36
  3 (low confidence)                      .31            .29            .31            .30             .32
Judgments of remembering & knowing
  Remember                                .35            .36            .31            .33             .31
  Know                                    .34            .39            .37            .30             .32
  Forget                                  .31            .25            .32            .38             .37

Note. For the judgments of learning, 1 = will remember on the scale in all experiments; 3 = will not remember on the scale in Experiment 1 and 3 = forget in Experiments 2, 3, and 4.

Method

Participants. Forty-eight undergraduates from Colorado State University were randomly assigned to one of the two prospective metamemory judgment conditions (24 in each of the two conditions). They received course credit for their participation.

Materials and procedure. Materials were the same as in Experiment 1. The procedure was also identical to that of Experiment 1 with the following exceptions. First, the JOL scale was changed to Will remember < 1–2–3 > Will forget. Second, the remember–know instructions were modified to include the statements, “Only give a recollect response if you can explain the reason for it,” and “It is important that you realize the difference between a recollect and know response is NOT just how confident you are. You can be very confident for either a recollect or know response.” These changes were intended to make participants more conservative with respect to the use of the remember response during test. Finally, no confidence test condition was included. Thus, Experiment 2 only included two between-subjects groups: JOLs with a remember–know test and JORKs with a remember–know test.

Results

Gamma correlations as a function of prospective judgment type. We examined the relative accuracy of prospective metamemory judgments by conducting a one-way ANOVA with prospective judgment type (JOLs, JORKs) as a between-subjects variable and mean gamma correlations as the dependent measure. The results are shown in Table 1. This analysis revealed a main effect of judgment type, such that mean gammas for JOLs (γ = .42) were lower than mean gammas for JORKs (γ = .76), F(1, 46) = 19.19, MSE = 1.41, ηp² = .29. Thus, labeling low-confidence responses for JOLs with the anchor Will forget and using more conservative remember–know instructions (cf. Yonelinas, 2001) did not reduce the greater accuracy of JORKs as compared with the accuracy of JOLs and actually led to a stronger effect than in Experiment 1 (see effect sizes in Table 1).

Recognition memory as a function of prospective judgment type. In order to examine whether prospective judgments influenced recognition memory, we conducted a one-way ANOVA with prospective judgment type (JOLs, JORKs) as a between-subjects variable and d′ (measured as in Experiment 1) as the dependent measure. As shown in Table 2, this analysis revealed that recognition memory (d′) was greater following JORKs than following JOLs, F(1, 46) = 16.82, MSE = 8.09, ηp² = .19. However, there was no difference in remember responses across the JORK and JOL conditions, F < 1 (see Table 2). Thus, the greater accuracy of JORKs as compared with JOLs replicated even when the use of remember responses for studied items was statistically equivalent across JORK and JOL conditions. As was the case with Experiment 1, gamma correlations for JORKs were still more accurate than those for JOLs when d′ was entered as a covariate in an ANOVA. Thus, the difference in memory accuracy following JORKs and JOLs did not account for the differences in metacognitive accuracy between groups.

Distribution of study responses as a function of study condition. As with Experiment 1, we examined the distribution of prospective metamemory judgments to determine whether this could have affected prospective metamemory accuracy by conducting a 2 (prospective judgment type: JOLs, JORKs) × 3 (study response type: 1/remember, 2/know, 3/forget) between-subjects ANOVA. The ANOVA revealed a main effect of study response type, F(2, 92) = 5.02, MSE = 0.15, ηp² = .10, but there was no interaction with prospective judgment type (Fs < 1). The lack of an interaction indicates that the distribution of metamemory judgments was similar for JOLs and JORKs, but the main effect of study response type indicates that responses were not equally distributed across the different response categories. Specifically, although there were no significant differences in the percentage of 1/remember or 2/know responses, F(1, 47) = 1.24, MSE = 0.08, ηp² = .03, p < .10, there was a lower percentage of 3/forget responses than remember responses, F(1, 47) = 4.09, MSE = 0.13, ηp² = .08, or know responses, F(1, 47) = 9.78, MSE = 0.30, ηp² = .17 (see Table 3).

Experiment 2 replicated the finding that JORKs were more accurate than JOLs in predicting remember–know judgments, in this case when both scales were anchored with the term Forget and when remember–know instructions encouraged more conservative use of remember responses. As was the case for Experiment 1, there were no differences in the distribution of prospective metamemory judgments for the JORK and JOL conditions, but making JORKs led to more accurate memory performance (d′). However, differences in memory performance did not influence differences in metacognitive accuracy between the JORK and JOL conditions.
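Relative accuracy in these experiments is indexed by the Goodman–Kruskal gamma correlation between prospective judgments and test outcomes. As a rough illustration of the statistic, and not a reproduction of the analysis pipeline used here, gamma for a single participant could be computed as in the Python sketch below. The numeric coding of judgments and outcomes, the function name, and the example data are all assumptions made for the sketch.

```python
from itertools import combinations


def goodman_kruskal_gamma(judgments, outcomes):
    """Gamma = (C - D) / (C + D) over all item pairs.

    C counts concordant pairs (the item with the higher judgment also has
    the higher outcome); D counts discordant pairs. Pairs tied on either
    variable are ignored. Returns None when no untied pairs exist.
    """
    concordant = 0
    discordant = 0
    for (j1, o1), (j2, o2) in combinations(zip(judgments, outcomes), 2):
        direction = (j1 - j2) * (o1 - o2)
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    untied = concordant + discordant
    if untied == 0:
        return None
    return (concordant - discordant) / untied


# Hypothetical coding: study judgments 3 = remember, 2 = know, 1 = forget;
# test outcomes 2 = remember, 1 = know, 0 = miss.
study_judgments = [3, 3, 2, 2, 1, 1]
test_outcomes = [2, 1, 1, 0, 0, 1]
print(goodman_kruskal_gamma(study_judgments, test_outcomes))
```

Because tied pairs are dropped, a participant who uses only one response category (or who obtains the same outcome for every item) has no defined gamma, which is one reason the distribution of study responses is examined separately.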


Experiment 3

We have suggested that JORKs should be more accurate than JOLs because JORKs involve discriminating between items that will later include recollection of contextual details (i.e., remembering) and items that will not (i.e., knowing or forgetting). A corollary of this explanation is that the greater accuracy of JORKs depends, to a great extent, on the outcome measure requiring retrieval of contextual details. That is, if making JORKs during study encourages participants to distinguish between items that will later include contextual details and those that will not, then accuracy (i.e., gamma correlations) would depend on having an outcome measure in which items that include contextual details are distinguished from those that do not. For example, if a participant in the JORK condition gives a remember response to Item A and gives a know response to Item B, the accuracy of these judgments will depend, to a great extent, on those items being judged on similar information on the outcome test. However, if the outcome measure does not allow this sort of judgment to be made, the accuracy of prospective metamemory judgments will be constrained considerably.

In Experiment 3, we addressed this prediction directly by using a yes–no recognition test as the outcome measure, instead of remember–know or confidence tests. In both remember–know and confidence tests, participants are asked to judge targets on the test according to the type of information that they retrieved or their level of certainty. Previously, we suggested that confidence tests provided a cruder discrimination between items that included retrieval of contextual details and those that did not, as compared with the remember–know test, which was why prospective metamemory judgments were less accurate for confidence tests.
On the basis of this logic, we reasoned that yes–no recognition tests, in which participants are asked simply to determine whether an item was studied or not, would provide little precision as an outcome measure, thus constraining the accuracy of metamemory judgments. Stated differently, yes–no recognition tests do not allow one to make judgments on the basis of the amount or type of information retrieved, which should reduce the likelihood that JORKs would be more accurate than JOLs.

Method

Participants. Seventy-two undergraduates from Colorado State University were randomly assigned to one of the two prospective metamemory judgment conditions (36 in each of the two conditions). They received course credit for their participation.

Materials and procedure. Materials and procedure for the study phase were identical to those of Experiment 2. However, in Experiment 3, we used a yes–no recognition test in which participants were asked simply to determine, for each test item, whether it had been studied or was new. Participants gave responses by pressing one of two sticker-marked keys on the keyboard corresponding to “yes” and “no” (i.e., one key was marked Y; the other was marked N).

Results

Gamma correlations as a function of prospective judgment type. The primary analysis of interest in Experiment 3, as with previous experiments, concerned the relative accuracy of participants’ prospective metamemory judgments (i.e., JOLs or JORKs), in this case, when a yes–no recognition test was used. These data are presented in Table 1. A between-subjects ANOVA for prospective judgment type (JOLs, JORKs) revealed no main effect of prospective judgment type, with similar gammas for JOLs (γ = .31) and JORKs (γ = .37), F < 1. Thus, when test judgments did not allow discrimination between retrieval accompanied by contextual details or the lack thereof, making JORKs at study was not significantly more accurate than making JOLs at study.

Recognition memory as a function of prospective judgment type. In order to examine the accuracy of recognition memory, we conducted a one-way ANOVA with prospective judgment type (JOLs, JORKs) as a between-subjects variable and d′ as the dependent measure. This analysis revealed a significant difference in d′, F(1, 70) = 5.39, MSE = 2.50, ηp² = .07, such that memory accuracy was greater following JORKs than following JOLs (see Table 2). This difference in accuracy resulted from the hit rate being higher following JORKs as compared with the hit rate following JOLs, F(1, 70) = 7.05, MSE = 0.06, ηp² = .09, but there were no differences in the false-alarm rates, F < 1. Note again that the relative accuracy of JORKs and JOLs (analyzed in the previous section of the Results) was unchanged, in terms of the significance of statistical tests, when memory performance was included as a covariate.

Distribution of study responses as a function of study condition. As with Experiments 1 and 2, we examined the distribution of responses during prospective metamemory judgments by conducting a 2 (prospective judgment type: JOLs, JORKs) × 3 (study response type: 1/remember, 2/know, 3/forget) between-subjects ANOVA. These data are presented in Table 3. The ANOVA revealed a main effect of study response type, F(1, 70) = 7.54, MSE = 0.21, ηp² = .10, but no interaction, F(1, 70) = 1.66, MSE = 0.05, ηp² = .02. The lack of an interaction indicates that the distribution of responses was similar for JOLs and JORKs, but the main effect of study response type indicates that responses were not equally distributed across the different response categories. Instead, although there was no significant difference in the percentage of 1/remember or 3/forget responses, F < 1, there was a higher percentage of 2/know responses than 1/remember responses, F(1, 70) = 17.70, MSE = 0.38, ηp² = .20, or 3/forget responses, F(1, 70) = 7.15, MSE = 0.23, ηp² = .09.

Experiment 3 demonstrated that JORKs will not always be more accurate than JOLs, and these results were consistent with the idea that making judgments about items at retrieval on the basis of amount or type of information that is available is important to the accuracy of metamemory judgments.
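The partial eta squared values reported alongside each F test can be related to F and its degrees of freedom. The short Python sketch below uses the standard identity for a between-subjects effect, ηp² = (F × df_effect) / (F × df_effect + df_error); the function name is hypothetical, and the sketch is offered only as a reading aid for the reported statistics.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F ratio and its degrees of freedom.

    Follows from eta_p^2 = SS_effect / (SS_effect + SS_error) together with
    F = (SS_effect / df_effect) / (SS_error / df_error).
    """
    weighted_f = f_value * df_effect
    return weighted_f / (weighted_f + df_error)


# Experiment 3, effect of prospective judgment type on d': F(1, 70) = 5.39
print(round(partial_eta_squared(5.39, 1, 70), 2))  # 0.07, matching the text

# Experiment 3, effect of prospective judgment type on hit rate: F(1, 70) = 7.05
print(round(partial_eta_squared(7.05, 1, 70), 2))  # 0.09, matching the text
```

The same identity makes clear why a modest F with 70 error degrees of freedom corresponds to a small effect: the error term dominates the denominator.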

Experiments 4a and 4b

One limitation to our using single-item recognition tests in the first two experiments was that JORKs and JOLs were not correlated with memory accuracy per se but rather were correlated with metamemory judgments. Proper assessment of memory accuracy in recognition requires that response bias (i.e., the distance between target and distractor distributions) is accounted for, but it was not possible to correlate prospective metamemory judgments with memory accuracy in this fashion in the first three experiments because only studied items were given JOLs, whereas new items were not.


In order to examine whether the greater accuracy of JORKs as compared with JOLs would generalize beyond recognition testing and whether this would generalize to memory performance proper rather than metamemory judgments, we conducted Experiments 4a and 4b to compare the accuracy of JORKs and JOLs using a paired-associate recall paradigm. Paired-associate recall involves having participants study word pairs (e.g., couch–pen, glass–street) and then attempt to retrieve the target word (e.g., pen) when given the cue word (e.g., couch). Because most research on JOLs has used paired-associate recall, this experiment also provides a bridge between the current study and the majority of studies reported in the JOL literature. In Experiment 4a, we used a procedure in which remember–know judgments were explained to participants prior to the cued-recall test, and these judgments were made concurrently with recall. Because we were concerned that making remember–know judgments during the cued-recall test might affect retrieval strategies, we conducted an additional experiment (4b) in which the cued-recall test was completed prior to the completion of the remember–know judgments, and participants provided remember–know judgments for items that were already retrieved.

Method: Experiments 4a and 4b

Participants. In each of the experiments (4a and 4b), 72 undergraduates from Colorado State University were randomly assigned to either the JORK or JOL condition (36 in each of the two conditions). They received course credit for their participation.

Materials and procedure. Studied items included 48 unrelated word pairs (from Castel et al., 2007). Four of the pairs (the first two and the last two) were fillers and were not tested. The procedure was similar to that of Experiment 2 with the exception that studied items included word pairs and the memory test was a cued-recall test. Thus, both JOLs and JORKs were made with regard to participants’ expected memory for each presented pair (i.e., “Will you remember the target word later when given the cue word?”). We modified the instructions included in the Appendix by simply replacing the word “word” with “word pair.” Participants were told that each pair included a cue and a target and that they would be judging whether they would be able to retrieve the target, given the cue word, approximately 10 min after the study phase. Each pair was presented for 4 s, after which time JOLs or JORKs were made. Once all 48 pairs had been given a judgment, a brief demographic questionnaire was completed to serve as a distractor.

In Experiment 4a, after completing the distractor task, participants were given remember–know instructions based on those used in Experiment 2, with minor changes included to customize the instructions for cued recall (i.e., the word “word” was replaced with “word pair”). Next to each cue word and blank space (for the potentially recalled target) were the letters R and K. Participants were asked to circle either R (for remember) or K (for know) for each target that was recalled. Two different paper-and-pencil test sheets were created with different random orders. Each test sheet included all of the cue words, half of which were presented on the left side of the sheet and half of which were presented on the right side.

In Experiment 4b, rather than having participants complete remember–know judgments during recall, we provided the remember–know instructions after recall was completed, and participants were asked to circle R or K for each target that had already been recalled.

Results

Experiment 4a.

Gamma correlations as a function of prospective judgment type. The primary analyses of interest in Experiment 4 concerned the relative accuracy of participants’ prospective metamemory judgments (i.e., JOLs or JORKs). Thus, we conducted a one-way ANOVA with prospective judgment type (JOLs, JORKs) as a between-subjects variable and mean gamma correlations as the dependent measure. The results are shown in Figure 2 (the effect sizes are reported in Table 1). This analysis revealed a main effect of prospective judgment type, such that mean gammas for JORKs (γ = .60) were greater than for JOLs (γ = .40) when cued recall was the outcome measure (i.e., a binary coding of recalled or not recalled), F(1, 70) = 7.41, MSE = 0.71, ηp² = .10. The same outcome replicated in predictions of remember–know judgments, such that JORK gammas (γ = .58) were greater than JOL gammas (γ = .42), F(1, 70) = 5.59, MSE = 0.50, ηp² = .08. Although the similar values for gamma correlations for remember–know and cued recall may seem inconsistent with our prior suggestion that less precision in the outcome measure should constrain the accuracy of JORKs, there are important differences between cued-recall and recognition tests that may explain this discrepancy. This issue will be discussed in more detail in the General Discussion.

Cued-recall and remember–know judgments as a function of prospective judgment type. In order to examine whether completing JORKs or JOLs at study affected memory performance differentially, we conducted a one-way ANOVA with prospective judgment type (JOLs, JORKs) as a between-subjects variable and proportion of cued recall as the dependent measure. These data are presented in Table 4. This analysis revealed no difference in cued-recall performance for the JORK and JOL conditions, F(1, 70) = 2.15, MSE = 0.06, ηp² = .03. Similarly, when remember–know judgments were considered as a measure of memory performance, there were no differences in remember responses following JORKs compared with JOLs, F(1, 70) = 1.49, MSE = 0.04, ηp² = .02, and no differences in know responses following JORKs compared with JOLs either, F < 1. Thus, in contrast to Experiments 2 and 3, which showed superior recognition memory (d′) following JORKs as compared with JOLs, paired-associate recall was not superior following JORKs. If anything, cued recall was nominally lower following JORKs as compared with that following JOLs, although in the subsequent experiment (4b) this nominal difference was not apparent.

Distribution of study responses as a function of study condition. There were no differences in the distribution of prospective metamemory judgments during study (see Table 3). We confirmed this by conducting a 2 (prospective judgment type: JOLs, JORKs) × 3 (study response type: 1/remember, 2/know, 3/forget) between-subjects ANOVA, which revealed no main effect of study response type, F < 1, and no interaction, F(1, 70) = 1.31, MSE = 0.05, ηp² = .02.

[Figure 2 appears here: bar graph of gamma correlation (y-axis, .00 to .70) for JORK and JOL by condition (x-axis: RK During Recall, RK After Recall).]

Figure 2. Mean gamma correlations for judgments of remembering and knowing (JORKs) and judgments of learning (JOLs) predicting cued recall in Experiments 4a (left) and 4b (right). In Experiment 4a, the remember–know judgments were made for each item while participants were recalling them, and in Experiment 4b, the remember–know judgments were made after cued recall was completed. Error bars reflect standard errors of the mean.

Table 4
Cued Recall and Remember–Know Responses for Experiments 4a and 4b

                                           Judgments of RK     Judgments of learning
Experiment and response                      M       SD            M       SD
Experiment 4a (RK during cued recall)
  Overall recall                            .24     .18           .29     .15
  Remember                                  .18     .17           .24     .14
  Know                                      .06     .08           .06     .07
Experiment 4b (RK after cued recall)
  Overall recall                            .30     .17           .30     .19
  Remember                                  .22     .13           .20     .13
  Know                                      .08     .07           .10     .10

Note. RK = remember–know responses.

Experiment 4b.

Gamma correlations as a function of prospective judgment type. For Experiment 4b, a one-way ANOVA with prospective judgment type (JOLs, JORKs) as a between-subjects variable revealed that mean gammas with cued recall as the outcome measure were greater for JORKs (γ = .46) than for JOLs (γ = .31), F(1, 70) = 4.42, MSE = 0.41, ηp² = .06 (see Figure 2). The same outcome replicated when remember–know judgments were the outcome measure, with relative accuracy being higher for JORKs (γ = .48) than for JOLs (γ = .31), F(1, 70) = 6.12, MSE = 0.53, ηp² = .08.

Cued-recall and remember–know judgments as a function of prospective judgment type.
Consistent with Experiment 4a, the recall data for Experiment 4b indicated that there were no differences between JORKs and JOLs in terms of cued recall, remember responses, or know responses, Fs < 1 (see Table 4). Indeed, unlike Experiment 4a, which showed a nominal (though not statistically significant) decrease in cued recall following JORKs as compared with JOLs, in Experiment 4b the overall level of cued recall was identical. Regardless, in both experiments, the greater accuracy of JORKs as compared with JOLs was found, and the strength of the effect was similar (see Table 1 and Figure 2).

Distribution of study responses as a function of study condition. There were no differences in the distribution of prospective metamemory judgments during study (see Table 3). We confirmed this by conducting a 2 (prospective judgment type: JOLs, JORKs) × 3 (study response type: 1/remember, 2/know, 3/forget) between-subjects ANOVA, which revealed no main effect of study response type, F < 1, and no interaction, F(1, 142) = 1.65, MSE = 0.05, ηp² = .01.

Comparison of Experiments 4a and 4b

Because Experiments 4a and 4b were identical except with respect to whether remember–know judgments were provided during cued recall (4a) or after recall (4b), we compared the experiments to determine if this difference affected gamma correlations or influenced cued-recall performance. A one-way ANOVA with experiment (Experiments 4a and 4b) as a between-subjects variable revealed no effect of experiment on cued recall, F(1, 142) = 1.65, MSE = 0.05, ηp² = .01, but gamma correlations were more accurate in Experiment 4a than in Experiment 4b in predictions of cued recall, F(1, 142) = 1.39, MSE = 0.03, ηp² = .02, and remember–know judgments, F(1, 142) = 1.39, MSE = 0.03, ηp² = .02. It is unclear why participants’ prospective judgments were better predictors of remember–know judgments given during the cued-recall task (Experiment 4a) than those given after the cued-recall task had been completed (Experiment 4b), but we speculate that retrieval strategies may have been affected by the procedural difference.

General Discussion The primary aim in the current study was to examine whether we could enhance the accuracy of prospective metamemory judgments by asking people to base these judgments on the likelihood that they would retrieve contextual details on a later test, as compared with making judgments based on confidence. The results from four experiments (Experiments 1, 2, 4a, and 4b) provided support for this hypothesis. JORKs were more accurate than JOLs when participants were predicting retrospective remember– know judgments, retrospective confidence judgments, and cued recall; when overall memory accuracy (d⬘) was equivalent following JORKs and JOLs (Experiments 1, 4a, and 4b); and when memory accuracy differed (Experiment 2). Thus, the greater metamemory accuracy for JORKs as compared with JOLs was robust, replicating across different instructions and materials, outcome measures, and types of retrospective judgments. Before discussing the theoretical implications of the results we have reported, we should discount explanations of these results based on demand characteristics or statistical artifacts. For example, JORKs were not more accurate than JOLs because participants were more likely to “match” judgments at test with those they gave at study in the JORK conditions (e.g., “If I responded K at study, I will respond K at test”). If this sort of demand characteristic were influencing accuracy, JORKs should not have been more accurate than JOLs when a confidence test was used in Experiment 1 or in predictions of cued recall, particularly in Experiment 4b, when remember– know judgments were provided after cued recall was


complete. We also can rule out the idea that JORKs might have been more accurate than JOLs because these responses were distributed differently for the two types of judgments. In all of the experiments, response distributions were similar for JORKs and JOLs. Thus, the greater accuracy demonstrated for JORKs as compared with JOLs in the current study appears to reflect a genuine difference in the nature of these judgments.

Multidimensional Theories of Prospective and Retrospective Metamemory Judgments

The data in the experiments reported here provide support for the idea that participants can base their judgments on different types of information during study, depending on the type of judgment that is required. Specifically, when participants monitored whether studied items would be accompanied by contextual details, metacognitive predictions were more accurate than when based on whatever information provided the basis for confidence judgments. Thus, JOLs, which are based on confidence, are less effective at providing participants with a basis for distinguishing between information that will and will not be remembered later. Indeed, implicit in the instructions to make confidence judgments is the assumption that the information used to make those judgments is continuously distributed rather than based on qualitatively distinct cues. By contrast, JORKs explicitly encouraged the use of distinct cues when making prospective metamemory judgments (cf. Koriat, 1997), and because these cues were more diagnostic with respect to later retrieval, these cues were useful in predicting later memory performance or distinguishing between items using metamemory judgments.

Others have examined related issues or proposed theoretical ideas consistent with the dual-state account proposed here (Carroll, Mazzoni, Andrews, & Pocock, 1999; Daniels, Toth, & Hertzog, 2009; Hicks & Marsh, 2002; Metcalfe & Finn, 2008; Son & Metcalfe, 2005). For example, Daniels et al. (2009) recently reported a study in which they examined the relative accuracy of JOLs for recognition and cued-recall tests that included remember–know judgments. Results showed that remember judgments were associated with higher JOLs at study than know or new judgments in recognition. Daniels et al.
suggested that this result may reflect the extent to which participants based their JOLs on distinctive characteristics of the study experience, which later supported the assignment of remember responses at test. Indeed, this interpretation is consistent with the dual-state account proposed here, and the idea that remember–know judgments distinguish between distinct and nondistinct retrieval experiences (McCabe & Balota, 2007; Rajaram, 1998).

The distinction between processing fluency (akin to the subjective experience of knowing) and retrieval of contextual information (akin to the subjective experience of remembering) has been a mainstay of theoretical and empirical contributions in metamemory research (Koriat, 1993; Koriat & Levy-Sadot, 2001; Metcalfe & Finn, 2008; Reder & Ritter, 1992). For example, Metcalfe and colleagues (Metcalfe & Finn, 2008; Son & Metcalfe, 2005) have proposed an explanation of delayed JOLs that suggests that both familiarity and target retrievability (which is similar to recollection) influence delayed JOLs. As mentioned in the introduction, delayed JOLs are considerably more accurate than immediate JOLs (see Rhodes & Tauber, 2011, for a review), and one

explanation of this increased accuracy has been that participants covertly recall items after a delay in order to determine if they will recall them on a later test (Nelson & Dunlosky, 1991). Metcalfe and colleagues have shown that in addition to target retrievability, the fluency with which items are retrieved also influences delayed JOLs. From this perspective, JORKs can be considered judgments that focus participants’ attention on information that is more closely associated with target retrievability than are immediate JOLs. Thus, the JORK data presented here suggest that immediate judgments can be improved by having participants monitor information that is closely associated with target retrievability.

Another area of research relevant to the current study involves judgments of source (JOSs), in which participants are asked to determine if they will be able to retrieve source-specifying information associated with studied items later, such as whether an object was seen or imagined (Carroll et al., 1999). Thus, as in studies of JORKs, participants in JOS studies are asked to monitor the availability of contextual details associated with studied information. A critical difference between JORKs and JOSs is that in the case of JOSs, participants are determining the likelihood that a specific type of source-specifying information will be available, and this information is not necessarily relevant to whether the item itself would be retrieved. For example, later remembering whether a picture of a lollipop was actually seen or was imagined does not necessarily determine whether one remembers that a lollipop was included at study. Supporting this idea, Carroll et al. (1999) reported that gamma correlations for JOSs were not above chance, and the accuracy of JOSs was significantly lower than the accuracy of JOLs for the same items (see also Kelly, Carroll, & Mazzoni, 2002).
It is possible that judgments at study that caused the participants to focus on the characteristics that are associated with source retrieval—such as the distinctiveness of the source characteristics—might improve the accuracy of these predictions. Thus, it may be that similar to the findings from the current study, information that could be effective for predicting source memory is available when participants are making JOSs, but it is not accessed because the judgments do not focus attention on that information. Although our study focused on the relative accuracy of metacognitive predictions such that we were interested in whether participants could distinguish between items that would or would not be remembered later, metacognitive accuracy can also be addressed by examining absolute accuracy, a term that refers to whether or not participants’ prospective metamemory judgments over- or underpredict memory performance. One problem with respect to evaluating absolute accuracy in the current study is that because we crossed judgments based on confidence and remembering and knowing at both study and test, there is ambiguity involved in interpreting across judgments that were labeled differently and were based on different information. However, generally speaking, both remember judgments and high-confidence judgments at test were underpredicted by both JORKs and JOLs for recognition, whereas for cued recall, there seemed to be slight overconfidence. However, these results appeared to be driven primarily by a type of anchoring for JORKs and JOLs in the current experiments (Connor, Dunlosky, & Hertzog, 1997; Scheck, Meeter, & Nelson, 2004). Anchoring refers to participants’ typically anchoring their judgments at the midpoint of a given scale (e.g., around 50% on a 0%–100% scale), and adjusting judgments


from this anchor. For JORKs and JOLs in the current study, which involved a 3-point scale, participants’ judgments were typically equally distributed among the three response alternatives, such that each judgment was used about a third of the time (see Table 3). Interesting issues for future research include examining the relation between absolute and relative accuracy of JORKs, and determining whether repeated study–test trials lead to adjustments in the predictions of remember–know responses that cause them to become more accurate. Indeed, it has been shown that JOL accuracy improves over repeated study–test trials in large part because anchoring effects change with practice (e.g., Koriat & Bjork, 2006).
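The distinction between absolute and relative accuracy, and the way anchoring can produce both under- and overprediction, can be illustrated with a small sketch. It assumes predictions on the 0%–100% scale used as the example in the text (not the 3-point scale used in the current experiments), and the function name and data are hypothetical. Absolute accuracy is computed here as the mean prediction minus the proportion of items actually remembered, so negative values indicate underprediction and positive values indicate overprediction.

```python
def calibration_bias(predicted_percent, recalled):
    """Mean prediction (as a proportion) minus the proportion recalled.

    Negative values indicate underprediction of memory performance;
    positive values indicate overprediction.
    """
    mean_prediction = sum(predicted_percent) / len(predicted_percent) / 100
    proportion_recalled = sum(recalled) / len(recalled)
    return mean_prediction - proportion_recalled


# Hypothetical predictions anchored near the midpoint of the scale.
anchored_predictions = [50, 60, 40, 50, 50, 50]

# The same predictions paired with a high-performance test and a
# low-performance test (1 = remembered, 0 = not remembered).
easy_test = [1, 1, 1, 1, 0, 1]
hard_test = [0, 0, 1, 0, 0, 0]

bias_easy = calibration_bias(anchored_predictions, easy_test)
bias_hard = calibration_bias(anchored_predictions, hard_test)
```

With these hypothetical values, the same midpoint-anchored predictions underpredict performance on the easier test (bias of about -.33) and overpredict it on the harder test (bias of about +.33), even though nothing about the judgments themselves has changed.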

Unidimensional Theories of Prospective Metamemory Accuracy

As outlined in the introduction, the dual-state account proposed here can be contrasted with unidimensional theories of prospective metamemory judgments, which suggest that participants only have access to and monitor a unidimensional memory-strength signal. It is not clear how unidimensional theories of prospective metamemory judgments, such as trace-access theory (Busey et al., 2000; Jang & Nelson, 2005), would account for the current results. Indeed, unidimensional theories suggest that participants base both prospective and retrospective metamemory judgments on the same underlying information, which is a summed memory-strength signal output from cognitive processing. As such, the most straightforward prediction from unidimensional theories is that the effectiveness of monitoring should not be affected by instructions to monitor different aspects of retrieval experience during study (i.e., JORKs). That is, unidimensional theories make no distinction between categorically distinct types of retrieval experiences or types of cues that can be used to draw inferences regarding future retrieval experiences and, thus, predict no differences in the accuracy of JORKs compared with JOLs.

Some unidimensional theories have suggested that information available in the study environment, but not during later retrieval, adds noise to the predictive accuracy of prospective judgments (Nelson & Dunlosky, 1992; Sikström & Jönsson, 2005). On the basis of this reasoning, one might argue that JORKs are simply reducing the amount of noise that influences immediate judgments, thereby increasing their accuracy relative to JOLs. However, the idea that noise affects metacognitive judgments does not explain the basis for those judgments, whereas the dual-state explanation that we provide explains the empirical patterns found here for study and test judgments and is based on the same principles. It is unclear if auxiliary assumptions might provide a means by which unidimensional accounts could explain the data we report, but the findings from the current study certainly would not be predicted by unidimensional theories of prospective metamemory judgments.

The Role of Test Type in Prospective Metamemory Accuracy

The finding that the relative accuracy of prospective metamemory judgments was influenced by the type of test used to measure episodic memory is also consistent with the dual-state account proposed here. In particular, taking a remember–know test was associated with greater prospective memory accuracy than taking a confidence test.


From the memory-strength perspective, this result is not only surprising but very difficult to explain. Indeed, many proponents of the memory-strength perspective have maintained that remember–know and confidence tests measure the same underlying continuum of memory strength (Dunn, 2004, 2008; Inoue & Bellezza, 1998). Of course, regardless of the process or processes involved in remember–know and confidence judgments, as determined by model fits (e.g., signal detection models), the two judgment types (confidence and remember–know) may still assess different aspects of subjective experience (see McCabe & Geraci, 2009b, for more detailed discussion of this issue). Our point here is simply that issues related to the process or processes influencing metamemory judgments, and the subjective experiences associated with memory retrieval, are distinct issues and should be treated as such.

The data we report support the idea that remember–know, confidence, and yes–no recognition tests rely to a great extent on different types of cues, or at the very least, weight the retrieved information differently. Thus, instructing participants regarding the type of information that is indicative of accurate episodic memory retrieval using the remember–know procedure is useful with respect to having participants efficiently monitor episodic memory. Conversely, in the confidence condition, when participants were free to determine the types of information or cues associated with accurate episodic memory retrieval, they based their responses on cues that were less diagnostic of accurate episodic retrieval. For yes–no recognition, there was no decision to be made in terms of the type or amount of information that was retrieved in support of a positive recognition decision, which drastically constrained the accuracy of prospective metamemory judgments.

Limitations and Future Directions

One unexpected but notable finding from the current study was that memory performance (d′) was better following JORKs than following JOLs in Experiments 1 and 2. The reason for this effect is difficult to determine, given the procedural differences across experiments (e.g., differences in test instructions, labeling of scales). Generally speaking, perhaps the most straightforward interpretation of this finding is that encouraging participants to monitor cues that were associated with accurate episodic retrieval in the JORK condition led to more effective encoding and subsequent discrimination at test than in the JOL condition. Although this finding did not replicate across all experiments or tests (e.g., cued recall), determining why this effect occurred could be useful, particularly with respect to potential educational applications. Specifically, if the type of information monitored could potentially improve learning, training students to focus on monitoring different types of information during study could lead to improvement of both their metamemory and memory performance. For present purposes though, it is still important to note that differences in memory performance could not explain why JORKs were more accurate than JOLs in the current study.

It is interesting to note that there were no differences in cued recall following JORKs and JOLs in Experiments 4a and 4b. Thus, there was no evidence that the type of processing involved in associative recall was affected by the prospective judgments. One curious finding related to the cued-recall experiments we report was that predictive accuracy (i.e., gamma correlations) was roughly equivalent for cued-recall and remember–know judgments in both experiments. At first glance, this seems inconsistent with our contention that making judgments during retrieval on the basis of the amount or type of information that is retrieved is important for constraining prospective memory accuracy. That is, there was no benefit to predictive accuracy when JORKs and JOLs were correlated with remember, know, or not recalled as the outcome measure, as compared with recalled or not recalled as the outcome measure. A likely reason for this result has to do with the type of information that supports cued recall, as compared with recognition. Specifically, cued recall is determined by relational processing of the two items within a pair, whereas recognition is primarily based on item-specific information (Hunt & McDaniel, 1993). Moreover, recognition is considerably more dependent on familiarity than is cued recall (Yonelinas, 2002). Indeed, on recognition tests in the current study, an average of 46% of the items were given know responses, whereas for the cued-recall tests, know judgments only accounted for 26% of the responses. Thus, it seems likely that the type of information that is retrieved to support a remember–know judgment in recall and recognition is distinct, which may explain differences between the tasks.
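For readers less familiar with the d′ measure of recognition accuracy referred to in this section, the following is a minimal sketch of the standard signal detection computation, d′ = z(hit rate) - z(false-alarm rate). The function name and the counts are hypothetical. The sketch applies the correction described by Snodgrass and Corwin (1988), adding 0.5 to each count and 1 to each total so that rates of 0 or 1 remain finite; whether that correction was used in the present experiments is an assumption, not something stated in this section.

```python
from statistics import NormalDist


def d_prime(hits, n_old, false_alarms, n_new):
    """Return d' = z(hit rate) - z(false-alarm rate).

    Counts are corrected by adding 0.5 to each frequency and 1 to each
    total (assumed here; see lead-in) so that rates of 0 or 1 do not
    produce infinite z scores.
    """
    hit_rate = (hits + 0.5) / (n_old + 1)
    false_alarm_rate = (false_alarms + 0.5) / (n_new + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)


# Hypothetical recognition data: 40 of 50 studied words called "old",
# 10 of 50 unstudied lures called "old".
sensitivity = d_prime(hits=40, n_old=50, false_alarms=10, n_new=50)
```

A value of 0 indicates no discrimination between studied and unstudied items, and larger values indicate better discrimination.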

Conclusion

The current study demonstrated that prospective judgments that involve the monitoring of contextual details (i.e., JORKs) are more accurate than those based on confidence (i.e., JOLs). As stated previously, if immediate JOLs rely almost exclusively on encoding fluency (Koriat et al., 2004), then the current data indicate that some factor other than encoding fluency is used as the basis for JORKs. The dual-state account of these findings provided here combines a theory that previously had been used to explain retrospective metamemory judgments (e.g., McCabe & Geraci, 2009a), with those that had been proposed to explain prospective metamemory judgments (e.g., Koriat, 1997). We believe that asking people to monitor the types of information that will later be associated with learning provides an effective method of improving the accuracy of immediate metamemory monitoring, and this idea is worthy of further research.

References

Anderson, B., Stumpf, P. G., & Schulkin, J. (2009). Medical error reporting, patient safety, and the physician. Journal of Patient Safety, 5, 176–179. doi:10.1097/PTS.0b013e3181b320b0
Balota, D. A., Yap, M. J., Cortese, M. J., Hutchison, K. A., Kessler, B., Loftis, B., . . . Treiman, R. (2007). The English Lexicon Project. Behavior Research Methods, 39, 445–459. doi:10.3758/BF03193014
Benjamin, A. S., Bjork, R. A., & Schwartz, B. L. (1998). The mismeasure of memory: When retrieval fluency is misleading as a metamnemonic index. Journal of Experimental Psychology: General, 127, 55–68. doi:10.1037/0096-3445.127.1.55
Bewick, K. C., Raymond, M. J., Malia, K. B., & Bennett, T. L. (1995). Metacognition as the ultimate executive: Techniques and tasks to facilitate executive functions. Neurorehabilitation, 5, 367–375. doi:10.1016/1053-8135(95)00136-0
Birch, S. A., & Bloom, P. (2004). Understanding children’s and adult’s limitations in mental state reasoning. Trends in Cognitive Sciences, 8, 255–260. doi:10.1016/j.tics.2004.04.011
Blank, H., Fischer, V., & Erdfelder, E. (2003). Hindsight bias in political elections. Memory, 11, 491–504. doi:10.1080/09658210244000513
Buehler, R., Griffin, D., & Ross, M. (1994). Exploring the “planning fallacy”: Why people underestimate their task completion times. Journal of Personality and Social Psychology, 67, 366–381. doi:10.1037/0022-3514.67.3.366
Busey, T. A., Tunnicliff, J., Loftus, G. R., & Loftus, E. F. (2000). Accounts of the confidence-accuracy relation in recognition memory. Psychonomic Bulletin & Review, 7, 26–48. doi:10.3758/BF03210724
Camerer, C., Loewenstein, G., & Weber, M. (1989). The curse of knowledge in economic settings: An experimental analysis. Journal of Political Economy, 97, 1232–1254. doi:10.1086/261651
Carroll, M., Mazzoni, G., Andrews, S., & Pocock, P. (1999). Monitoring the future: Object and source memory for real and imagined events. Applied Cognitive Psychology, 13, 373–390. doi:10.1002/(SICI)1099-0720(199908)13:4<373::AID-ACP605>3.0.CO;2-F
Castel, A. D., McCabe, D. P., & Roediger, H. L. (2007). Illusions of competence and overestimation of associative memory for identical items: Evidence from judgments of learning. Psychonomic Bulletin & Review, 14, 107–111. doi:10.3758/BF03194036
Connor, L. T., Dunlosky, J., & Hertzog, C. (1997). Age-related differences in absolute but not relative metamemory accuracy. Psychology and Aging, 12, 50–71. doi:10.1037/0882-7974.12.1.50
Croskerry, P. (2003). The importance of cognitive errors in diagnosis and strategies to minimize them. Academic Medicine, 78, 775–780. doi:10.1097/00001888-200308000-00003
Daniels, K. A., Toth, J. P., & Hertzog, C. (2009). Aging and recollection in the accuracy of judgments of learning. Psychology and Aging, 24, 494–500. doi:10.1037/a0015269
Diana, R. A., Reder, L. M., Arndt, J., & Park, H. (2006). Models of recognition: A review of arguments in favor of a dual-process account. Psychonomic Bulletin & Review, 13, 1–21. doi:10.3758/BF03193807
Donaldson, W. (1996). The role of decision processes in remembering and knowing. Memory & Cognition, 24, 523–533. doi:10.3758/BF03200940
Dunlosky, J., & Nelson, T. O. (1992). The importance of the kind of cue for judgments of learning (JOL) and the delayed-JOL effect. Memory & Cognition, 20, 374–380. doi:10.3758/BF03210921
Dunn, J. C. (2004). Remember–know: A matter of confidence. Psychological Review, 111, 524–542. doi:10.1037/0033-295X.111.2.524
Dunn, J. C. (2008). The dimensionality of the remember–know task: A state–trace analysis. Psychological Review, 115, 426–446. doi:10.1037/0033-295X.115.2.426
Finn, B. (2008). Framing effects on metacognitive monitoring and control. Memory & Cognition, 36, 813–821. doi:10.3758/MC.36.4.813
Gardiner, J. M. (1988). Functional aspects of recollective experience. Memory & Cognition, 16, 309–313. doi:10.3758/BF03197041
Gardiner, J. M. (2001). Episodic memory and autonoetic consciousness: A first-person account. Philosophical Transactions of the Royal Society London, 356, 1351–1361. doi:10.1098/rstb.2001.0955
Gardiner, J. M., & Java, R. I. (1990). Recollective experience in word and nonword recognition. Memory & Cognition, 18, 23–30. doi:10.3758/BF03202642
Geraci, L., McCabe, D. P., & Guillory, J. J. (2009). On interpreting the relationship between remember–know judgments and confidence: The role of instructions. Consciousness and Cognition, 18, 701–709. doi:10.1016/j.concog.2009.04.010
Gigerenzer, G., & Goldstein, D. G. (1996). Reasoning the fast and frugal way: Models of bounded rationality. Psychological Review, 103, 650–669. doi:10.1037/0033-295X.103.4.650
Gilbert, D. T., Pinel, E. C., Wilson, T. D., Blumberg, S. J., & Wheatley, T. P. (1998). Immune neglect: A source of durability bias in affective forecasting. Journal of Personality and Social Psychology, 75, 617–638. doi:10.1037/0022-3514.75.3.617
Hart, J. T. (1967). Memory and the memory-monitoring process. Journal of Verbal Learning and Verbal Behavior, 6, 685–691. doi:10.1016/S0022-5371(67)80072-0
Hertzog, C., Dunlosky, J., Robinson, A. E., & Kidder, D. P. (2003). Encoding fluency is a cue used for judgments about learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 29, 22–34. doi:10.1037/0278-7393.29.1.22
Hicks, J. L., & Marsh, R. L. (2002). On predicting the future states of awareness for recognition of unrecallable items. Memory & Cognition, 30, 60–66. doi:10.3758/BF03195265
Hunt, R. R., & McDaniel, M. A. (1993). The enigma of organization and distinctiveness. Journal of Memory and Language, 32, 421–445. doi:10.1006/jmla.1993.1023
Inoue, C., & Bellezza, F. S. (1998). The detection model of recognition using know and remember judgments. Memory & Cognition, 26, 299–308. doi:10.3758/BF03201141
Jang, Y., & Nelson, T. O. (2005). How many dimensions underlie judgments of learning and recall? Evidence from state–trace methodology. Journal of Experimental Psychology: General, 134, 308–326. doi:10.1037/0096-3445.134.3.308
Kelley, C. M., & Jacoby, L. L. (1996). Adult egocentrism: Subjective experience versus analytic bases for judgment. Journal of Memory and Language, 35, 157–175. doi:10.1006/jmla.1996.0009
Kelly, A., Carroll, M., & Mazzoni, G. (2002). Metamemory and reality monitoring. Applied Cognitive Psychology, 16, 407–428. doi:10.1002/acp.803
Koriat, A. (1993). How do we know that we know? The accessibility model of the feeling of knowing. Psychological Review, 100, 609–639. doi:10.1037/0033-295X.100.4.609
Koriat, A. (1997). Monitoring one’s own knowledge during study: A cue-utilization approach to judgments of learning. Journal of Experimental Psychology: General, 126, 349–370. doi:10.1037/0096-3445.126.4.349
Koriat, A., & Bjork, R. A. (2005). Illusions of competence in monitoring one’s knowledge during study. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 187–194. doi:10.1037/0278-7393.31.2.187
Koriat, A., & Bjork, R. A. (2006). Mending metacognitive illusions: A comparison of mnemonic-based and theory-based procedures. Journal of Experimental Psychology: Learning, Memory, and Cognition, 32, 1133–1145. doi:10.1037/0278-7393.32.5.1133
Koriat, A., Bjork, R. A., Sheffer, L., & Bar, S. K. (2004). Predicting one’s own forgetting: The role of experience-based and theory-based processes. Journal of Experimental Psychology: General, 133, 643–656. doi:10.1037/0096-3445.133.4.643
Koriat, A., & Goldsmith, M. (1996). Monitoring and control processes in the strategic regulation of memory accuracy. Psychological Review, 103, 490–517. doi:10.1037/0033-295X.103.3.490
Koriat, A., & Levy-Sadot, R. (2001). The combined contributions of the cue-familiarity and accessibility heuristics to feelings of knowing. Journal of Experimental Psychology: Learning, Memory, and Cognition, 27, 34–53. doi:10.1037/0278-7393.27.1.34
McCabe, D. P., & Balota, D. A. (2007). Context effects on remembering and knowing: The expectancy heuristic. Journal of Experimental Psychology: Learning, Memory, and Cognition, 33, 536–549. doi:10.1037/0278-7393.33.3.536
McCabe, D. P., & Geraci, L. (2009a). The influence of instructions and terminology on the accuracy of remember–know judgments. Consciousness and Cognition, 18, 401–413. doi:10.1016/j.concog.2009.02.010
McCabe, D. P., & Geraci, L. (2009b). The role of extra-list associations in false remembering: A source misattribution account. Memory & Cognition, 37, 130–142.
McCabe, D. P., Roediger, H. L., McDaniel, M. A., & Balota, D. A. (2009). Aging decreases veridical remembering but increases false remembering: Neuropsychological test correlates of remember/know judgments. Neuropsychologia, 47, 2164–2173. doi:10.1016/j.neuropsychologia.2008.11.025
Metcalfe, J., & Finn, B. (2008). Familiarity and retrieval processes in delayed judgments of learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 34, 1084–1097. doi:10.1037/a0012580
Nelson, T. O. (1984). A comparison of current measures of the accuracy of feeling-of-knowing predictions. Psychological Bulletin, 95, 109–133. doi:10.1037/0033-2909.95.1.109
Nelson, T. O., & Dunlosky, J. (1991). When people’s judgments of learning (JOLs) are extremely accurate at predicting subsequent recall: The “delayed-JOL effect.” Psychological Science, 2, 267–270. doi:10.1111/j.1467-9280.1991.tb00147.x
Nelson, T. O., & Dunlosky, J. (1992). How shall we explain the delayed-judgment-of-learning effect? Psychological Science, 3, 317–318. doi:10.1111/j.1467-9280.1992.tb00681.x
Nelson, T. O., & Narens, L. (1990). Metamemory: A theoretical framework and some new findings. In G. H. Bower (Ed.), The psychology of learning and motivation (pp. 125–173). New York, NY: Academic Press.
Parks, C. M., & Yonelinas, A. P. (2007). Moving beyond pure signal-detection models: Comment on Wixted (2007). Psychological Bulletin, 114, 188–202.
Rajaram, S. (1993). Remembering and knowing: Two means of access to the personal past. Memory & Cognition, 21, 89–102. doi:10.3758/BF03211168
Rajaram, S. (1998). The effects of conceptual salience and perceptual distinctiveness on conscious recollection. Psychonomic Bulletin & Review, 5, 71–78. doi:10.3758/BF03209458
Rajaram, S., Hamilton, M., & Bolton, A. (2002). Distinguishing states of awareness from confidence during retrieval: Evidence from amnesia. Cognitive, Affective, & Behavioral Neuroscience, 2, 227–235. doi:10.3758/CABN.2.3.227
Reder, L. M., & Ritter, F. (1992). What determines initial feeling of knowing? Familiarity with question terms, not with the answer. Journal of Experimental Psychology: Learning, Memory, and Cognition, 18, 435–451. doi:10.1037/0278-7393.18.3.435
Rhodes, M. G., & Castel, A. D. (2008). Memory predictions are influenced by perceptual information: Evidence for metacognitive illusions. Journal of Experimental Psychology: General, 137, 615–625. doi:10.1037/a0013684
Rhodes, M. G., & Tauber, S. K. (2011). The influence of delaying judgments of learning (JOLs) on metacognitive accuracy: A meta-analytic review. Psychological Bulletin, 137, 131–148. doi:10.1037/a0021705
Roediger, H. L., Rajaram, S., & Geraci, L. (2007). Accessing memories: Three forms of consciousness. In M. Moscovitch, P. Zelazo, & E. Thompson (Eds.), Cambridge Handbook of consciousness (pp. 251–288). New York, NY: Cambridge University Press.
Rotello, C. M., Macmillan, N. A., & Reeder, J. A. (2004). Sum-difference theory of remembering and knowing: A two-dimensional signal detection model. Psychological Review, 111, 588–616. doi:10.1037/0033-295X.111.3.588
Scheck, P., Meeter, M., & Nelson, T. O. (2004). Anchoring effects in the absolute accuracy of immediate versus delayed judgments of learning. Journal of Memory and Language, 51, 71–79. doi:10.1016/j.jml.2004.03.004
Schkade, D. A., & Kahneman, D. (1998). Does living in California make people happy? A focusing illusion in judgments of life satisfaction. Psychological Science, 9, 340–346. doi:10.1111/1467-9280.00066
Sikström, S., & Jönsson, F. (2005). A model for stochastic drift in memory strength to account for judgments of learning. Psychological Review, 112, 932–950. doi:10.1037/0033-295X.112.4.932
Snodgrass, J. G., & Corwin, J. (1988). Pragmatics of measuring recognition memory: Applications to dementia and amnesia. Journal of Experimental Psychology: General, 117, 34–50. doi:10.1037/0096-3445.117.1.34


Son, L. K., & Metcalfe, J. (2005). Judgments of learning: Evidence for a two-stage model. Memory & Cognition, 33, 1116–1129. doi:10.3758/BF03193217
Thiede, K. W., & Anderson, M. C. M. (2003). Summarizing can improve metacomprehension accuracy. Contemporary Educational Psychology, 28, 129–160. doi:10.1016/S0361-476X(02)00011-5
Tulving, E. (1985). Memory and consciousness. Canadian Psychology/Psychologie canadienne, 26, 1–12. doi:10.1037/h0080017
Weaver, C. A., III, & Kelemen, W. L. (1997). Judgments of learning at delays: Shifts in response patterns or increased metamemory accuracy? Psychological Science, 8, 318–321. doi:10.1111/j.1467-9280.1997.tb00445.x

Wixted, J. T. (2007). Dual-process theory and signal-detection theory of recognition memory. Psychological Review, 114, 152–176. doi:10.1037/0033-295X.114.1.152
Wixted, J. T., & Mickes, L. (2010). A continuous dual-process model of remember/know judgments. Psychological Review, 117, 1025–1054. doi:10.1037/a0020874
Yonelinas, A. P. (2001). Consciousness, control, and confidence: The three Cs of recognition memory. Journal of Experimental Psychology: General, 130, 361–379. doi:10.1037/0096-3445.130.3.361
Yonelinas, A. P. (2002). The nature of recollection and familiarity: A review of 30 years of research. Journal of Memory and Language, 46, 441–517. doi:10.1006/jmla.2002.2864

Appendix Instructions for Experiment 1 Judgments of Remembering and Knowing (JORK) Study Instructions A little later in the experiment, you are going to study a list of words, and you will take a test for those words. When people remember things, they can experience them in different ways. You are going to distinguish between two different types of memory experiences on the test that you’ll be taking. These two types of memory are called recollection and knowing (that’s knowing with a K, as on the screen in front of you). I am going to explain the difference between these two types of memory in some detail now. Please listen carefully. Recollection is a type of memory that is accompanied by the ability to recall details associated with a past event. For example, if I asked you to remember breakfast this morning, you’d likely be able to recollect where you were, what you ate, whom you ate with, what you talked about, what you were thinking about, and other details. Another way to explain recollection is that it involves mentally traveling back to the moment that an event occurred. In this experiment, the events we’re talking about remembering are going to be words that you’ll study. When you recollect a word, you may be able to recall a specific thought that came to mind when you studied the word or a mental image that came to mind when you studied it. Or you may remember a personal association you made for a word or your emotional reaction to a word. The important point is that recollection in this experiment involves bringing to mind some details of what happened or what was experienced at the time a word was originally studied. Knowing is a type of memory where you recognize something as a memory, but you can’t remember any specific details about the experience. This is like when you see someone on campus, and you know you’ve met that person before, but you have no idea where and can’t remember anything else about him (or her). 
In this experiment, when you believe you studied a word but you cannot consciously recollect any specific details from when you studied the word earlier, that’s the experience of knowing. In other words, when you know a word, you recognize it as having been studied, but you do not re-experience the exact details of what you were thinking or feeling when you studied it.

If you look to the right side of the keyboard on the number pad, you will see keys marked R, K, and F with stickers. You are going to use these keys to respond during the study phase of the experiment. The R, K, and F keys stand for recollect, know, and forget.

The way the study phase is going to work is as follows: you are going to see words presented on the computer screen, one at a time, for 2 s each. For each word you see, I want you to just think of whatever pops into your head related to that word. Spend the full 2 s that the word is on the screen thinking about whatever pops into your mind about the word.

Immediately after you have studied each word, a screen (just like the screen in front of you right now) will come up with the words, “Recollect, know, or forget?” For each word, you will try to predict whether later on the test you take, you will be able to recollect the word, you will just know the word was studied earlier, or you will forget the word later. In other words, for each word, you’ll be predicting what your future memory for that word will be like.

If you believe you’ll be able to recollect specific details from when you studied a word, like the specific thought that came to mind, your emotional reaction, a mental image, or some personal association you made for that word, you should press the R key to predict that you will recollect the word. If you do not think you’ll be able to recall these sorts of details, but you still believe you’ll be able to recognize the word as one you studied, you should press K to indicate that you’ll know the word later. If you think that you won’t be able to recognize the word as one you studied at all, press the F key to indicate you believe you’ll forget the word.

As soon as you make your response, the next studied word will appear, you’ll study it for 2 s, and then decide recollect, know, or forget for that word too, and so on.


Judgments of Learning Study Instructions

A little later in the experiment you are going to study a list of words, and you will take a test for those words. The way the study phase is going to work is as follows: You are going to see words presented on the computer screen, one at a time, for 2 s each. For each word you see, I want you to just think of whatever pops into your head related to that word. Spend the full 2 s that the word is on the screen thinking about whatever pops into your mind about the word.

Immediately after you have studied each word, a screen (just like the screen in front of you) will come up with a 1–3 judgment scale. Once this screen is presented, you should think whether on a later test you’ll be able to remember the word. So, for each word, you’ll be predicting whether you’ll remember that word later. If you are completely sure that you’ll be able to remember the word later, you should press the 1 key. If you are completely sure that you will not remember the word, press the 3 key. You can press any number from 1–3 depending on how sure you are that you will remember that word later.

As soon as you make your response, the next studied word will appear, you’ll study it for 2 s and then decide how sure you are that you’ll remember that word too, and so on.

Remember–Know Test Phase Instructions

If you look at the top of the computer screen, you will see a scale from 1 to 3 [Note: The actual scale read, “1 = recollect, 2 = know, 3 = new”]. This is the scale that you will use during the memory test. On the memory test, you are going to see words presented on the computer screen one at a time. Some will be words that you studied; others will be words you didn’t study, called new words. If a word is new to the experiment, meaning you had not studied it, you should press the 3 key to indicate that the word is new. If a word on the test was a word you studied, you should press either the 1 or the 2 key to indicate that you either recollect the word or know the word.

Press the 1 key to indicate you recollect a word if you can mentally travel back to the moment you studied the word by recalling the specific thought that came to mind when you studied it, a mental image, an emotional reaction, or some personal association you made for the word. You may also recollect thinking about what prediction you should give the word after you studied it. If you can recall these types of details, press 1. If you recognize a word as one you studied, but you can’t recall any specific details, press the 2 key to indicate that you just know you studied it. If you believe a word is New, press 3 to indicate you did not study it.

[Note: In the condition in which JOLs were made at study, the bases for recollect and know responses were explained in detail at test, as in the JORK Study Instructions.]

Confidence Test Phase Instructions

If you look at the top of the computer screen, you will see a scale from 1 to 3. This is the scale that you will use during the memory test. On the memory test, you are going to see words presented on the computer screen one at a time. Some will be words that you studied; others will be words you didn’t study, called new words. You will be making judgments on a scale from 1 to 3 as to how sure you are that you studied the presented word. Pressing 1 indicates that you are completely sure that you studied that word earlier. Pressing 3 indicates that you are completely sure that you did not study that word earlier, meaning it is a new word. You can press any number from 1 to 3 depending on how sure you are that you studied that word.

Received April 12, 2010
Revision received April 4, 2011
Accepted April 5, 2011

