Knowledge Examination in Multi-Session Tasks
Jingjing Liu
Southern Connecticut State University, 501 Crescent Street, New Haven, CT 06515
Nicholas J. Belkin
Rutgers University, 4 Huntington Street, New Brunswick, NJ 08901
ABSTRACT
We report findings on the patterns of change in users’ self-rated topic knowledge in multi-session search tasks. Data came from a 3-session lab experiment with 24 participants, each working on one sub-task of a general task per session, searching for information and writing reports on hybrid cars. The general task had either a parallel or a dependent structure. We examined users’ knowledge of both the general tasks and the sub-tasks, both before and after the working sessions. Results demonstrated that users’ knowledge of general tasks and that of sub-tasks had different values and patterns of change. We also found that some attributes of users’ knowledge varied between task types. These findings further our understanding of users’ knowledge in information tasks and are helpful for information retrieval research and system design.
Categories and Subject Descriptors H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval – relevance feedback, search process.
General Terms Measurement, Performance, Experimentation, Human Factors.
Keywords Pre-task knowledge, post-task knowledge, multi-session task, task structure, dependent task, parallel task
1. INTRODUCTION
Information tasks involve users searching for information to resolve tasks for which they are in an Anomalous State of Knowledge (ASK) [1]. It is reasonable to expect that users gain knowledge as they work on their tasks. Multi-session tasks are common in everyday life, and studies have examined the relationship between users’ knowledge and their search behaviors in multi-stage tasks (e.g., [6][7]), assuming that users’ knowledge increases in later stages.
A number of methods have been employed to assess users’ knowledge and/or to assign users to groups with different levels of knowledge. For example, users’ knowledge levels have been differentiated according to the stage they had reached in a certain training program or course (e.g., [6][7]). Users’ knowledge levels have also been judged by the accuracy or correctness of their answers to a set of questions or problems (e.g., [3]). Another knowledge assessment method is to ask users to rate their familiarity with terms in a thesaurus for a specific domain (e.g., [2]). In addition, some studies have asked users to self-rate their knowledge levels on Likert scales (e.g., [4]). Cole et al. [2] found that users’ self-rated knowledge levels correlate highly with their ratings of thesaurus terms, and that self-rating is therefore as reliable as thesaurus term rating.
Despite the assumption that users gain knowledge by working on information tasks, so that in later stages they are more knowledgeable about their tasks, there has been no research, to our knowledge, that closely examines users’ knowledge and their knowledge change in information tasks, especially in multi-session tasks. Our study therefore aimed to explore the following issues in multi-session tasks:
1. How do users’ knowledge of the general task and their knowledge of the sub-tasks differ from each other?
2. How does users’ knowledge change within and across sessions?
3. How do different tasks (or task conditions) affect users’ knowledge and knowledge change?
2. METHOD
Data came from a 3-session lab experiment designed to examine users’ behavioral and performance changes while searching for information to accomplish a task. A total of 24 college students participated in the study, each coming 3 times within a 2-week period to work on one assigned general task that had 3 sub-tasks. Each participant was asked to write a 3-section article on hybrid cars, with each section finished in one session based on one sub-task of his/her choice.
Two tasks were designed with different task structures, similar to Toms [5], one being a parallel task (PT) and the other a dependent task (DT). Half of the participants worked on the PT, which asked them to prepare an article comparing the Honda Civic, Nissan Altima and Toyota Camry hybrid cars. The 3 sub-tasks were parallel to each other. The other half worked on the DT, which asked them to a) explore which car manufacturers produce hybrid cars, b) identify 3 mid-priced hybrid cars, and c) compare the three cars’ pros and cons. The accomplishment of some sub-tasks was assumed to be dependent upon that of others. Participants self-determined the order of the 3 sub-tasks in either task. The two tasks were in the same subject domain and had the same general requirements; their only difference was in task structure, one being parallel and the other dependent.
At the beginning of each session, participants were asked to rate, on a 7-point scale (1=not, 7=very), how much knowledge they had of the general task and of the sub-task that they chose for that session. At the end of each session, they were again asked to rate how much knowledge they had at that point of the general task and of the sub-task they worked on in that session.
3. RESULTS
3.1 An overview of general task knowledge
Examination of the distributions of the knowledge variables in our study, including pre- and post-session general task knowledge and pre- and post-session sub-task knowledge, found that they were not normal, so non-parametric tests were generally used in our analyses unless otherwise specified. Although the knowledge variables were not normally distributed, it still helps intuitively to look at their means and standard deviations (SDs). Figure 1 shows users’ pre- and post-session general task topic knowledge in the 3 sessions when both tasks were considered together.
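As a minimal sketch of how Mean (SD) summaries of this kind can be computed, the following Python snippet summarizes a set of 7-point ratings. The ratings are hypothetical, not the study's data, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the paper does not state which form was used.

```python
from statistics import mean, stdev

# Hypothetical ratings on the 7-point scale (1 = not, 7 = very).
# These are illustrative values, NOT the data collected in the study.
pre_session = [2, 3, 1, 4, 2, 5, 3, 2]
post_session = [4, 5, 3, 5, 4, 6, 4, 5]


def mean_sd(ratings):
    """Format ratings as 'mean (SD)'.

    Assumption: SD is the sample standard deviation (n - 1 denominator).
    """
    return f"{mean(ratings):.2f} ({stdev(ratings):.2f})"


print("Pre-session: ", mean_sd(pre_session))
print("Post-session:", mean_sd(post_session))
```

Because the ratings are ordinal and, as noted above, not normally distributed, such summaries are descriptive aids only; the inferential comparisons rely on rank-based tests.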
The within-session comparison of pre- and post-session general task knowledge (Table 1) showed that when the 3 sessions were combined, users’ post-session general task knowledge was, on average, higher than their pre-session knowledge. The same pattern was found in sessions 1 and 2 individually. In session 3, the descriptive data showed the same tendency, but the difference was not significant.
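The paper reports the paired pre- vs. post-session comparisons as Z statistics but does not name the test. One common non-parametric choice that yields a Z value is the Wilcoxon signed-rank test with a normal approximation; the sketch below assumes that test, drops zero differences, and applies the usual tie correction. The function names and the ratings are hypothetical and are not taken from the study.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_signed_rank(pre, post):
    """Return (W_plus, z, two_sided_p) using the normal approximation."""
    diffs = [b - a for a, b in zip(pre, post) if b != a]  # drop zero differences
    n = len(diffs)
    if n == 0:
        return 0.0, 0.0, 1.0
    abs_diffs = [abs(d) for d in diffs]
    ranks = average_ranks(abs_diffs)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean_w = n * (n + 1) / 4
    # Tie correction: subtract sum(t^3 - t) / 48 over groups of tied |differences|.
    tie_term = sum(t**3 - t for t in (abs_diffs.count(v) for v in set(abs_diffs))) / 48
    var_w = n * (n + 1) * (2 * n + 1) / 24 - tie_term
    z = (w_plus - mean_w) / math.sqrt(var_w)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p from the standard normal
    return w_plus, z, p


# Hypothetical paired ratings for one session (NOT the study's data).
pre = [2, 3, 1, 4, 2, 5, 3, 2]
post = [4, 5, 3, 5, 4, 6, 4, 5]
w_plus, z, p = wilcoxon_signed_rank(pre, post)
print(f"W+ = {w_plus}, Z = {z:.3f}, p = {p:.3f}")
```

With samples as small as 12 or 24 participants, an exact test may be preferable to the normal approximation; the approximation is used here only because the paper reports Z values.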
Figure 1. Pre- and post-session general task topic knowledge

Table 1. General task topic knowledge ratings in 3 sessions (Mean (SD))
                                   Session 1      Session 2      Session 3      3 sessions together   H(p) of 3-session comparison
Pre-session                        2.75 (1.51)    3.79 (1.53)    4.75 (1.80)    3.76 (1.80)           14.89 (.001)
Post-session                       4.25 (1.03)    4.96 (1.12)    5.42 (1.02)    4.88 (1.15)           13.90 (.001)
Z(p) of pre- vs. post-comparison   3.323 (.001)   3.685 (.000)   1.620 (.105)   5.045 (.000)          /

Comparison of users’ knowledge among sessions (Table 1) found significant differences for the pre-session general task knowledge (Kruskal-Wallis H(2, N=72)=14.89, p<.005) and the post-session general task knowledge (H(2, N=72)=13.90, p<.005). Post-hoc analysis using the Tukey test revealed that the differences in both types of knowledge were between sessions 1 and 3. In session 3, not only did users have significantly more pre-session knowledge than in session 1, but they also reached significantly higher levels of knowledge after the session than after session 1. In addition, as one would expect, users’ post-session knowledge in session 3 differed significantly from their pre-session knowledge in session 1 (H(2, N=24)=4.27, p=.000). Together, these results demonstrate that users did learn in the process of completing their tasks.
Another point to note is that users’ pre-session general task knowledge was somewhat lower than the previous session’s post-session general task knowledge. This is reasonable considering that when coming back for a later session, participants may have forgotten some of what they had learned in the previous session.

3.2 General task knowledge in two tasks
Figure 2 shows users’ pre- and post-session general task knowledge in the 3 sessions in the two task conditions (PT and DT). Table 2 shows the within-session comparison in the individual tasks. As can be seen, when all 3 sessions are considered together, users’ post-session general task knowledge was higher than their pre-session knowledge, and this was found in both tasks. In individual sessions, for the DT condition, post-session task knowledge was always higher than pre-session knowledge; in the PT condition, however, post-session task knowledge was significantly higher than pre-session knowledge only in session 2. In sessions 1 and 3, no significant differences were found between pre- and post-session knowledge, although descriptively the latter was higher than the former in session 1 and equal to it in session 3.
Figure 2. Pre- and post-session general task topic knowledge in three sessions in two tasks

Table 2. Paired comparison between pre- vs. post-session general task topic knowledge in two tasks (Mean (SD))
                                   Session 1      Session 2      Session 3      All 3 sessions   H(p) of 3-session comparison
Dependent task
Pre-session                        2.33 (1.07)    3.58 (1.44)    4.08 (1.68)    3.33 (1.57)      7.964 (.019)
Post-session                       4.17 (0.39)    4.75 (0.62)    5.58 (0.90)    4.83 (0.88)      16.386 (.000)
Z(p) of pre- vs. post-comparison   2.969 (.003)   2.401 (.016)   2.388 (.017)   4.451 (.000)     /
Parallel task
Pre-session                        3.17 (1.80)    4.00 (1.65)    5.42 (1.73)    4.19 (1.93)      8.152 (.017)
Post-session                       4.33 (1.44)    5.17 (1.47)    5.42 (1.17)    4.97 (1.40)      3.802 (.149)
Z(p) of pre- vs. post-comparison   1.810 (.070)   2.889 (.004)   .000 (1.00)    3.073 (.002)     /
Table 2 also shows the between-session comparison of the pre- and the post-session general task knowledge in the two task conditions. As can be seen, in the DT, users’ ratings of both the pre-session general task knowledge (H(2, N=72)=7.97, p<.05) and the post-session general task knowledge (H(2, N=72)=16.39, p<.001) differed among the 3 sessions. Post-hoc analysis using the Tukey test revealed that the difference for pre-session knowledge was between sessions 1 and 3, and the differences for post-session knowledge were between sessions 1 and 2, and between sessions 2 and 3. In the PT, users’ ratings of the pre-session general task knowledge also differed among the 3 sessions (H(2, N=72)=8.15, p<.05), specifically between sessions 1 and 3 as revealed by the Tukey test. However, the post-session general task knowledge did not significantly increase across sessions (H(2, N=72)=3.80, p>.05).
We did further analysis using the General Linear Model (GLM) for the effects of task and session on pre- and post-session general task knowledge. Results (Table 3) show that, in general, users’ knowledge increased across the 3 sessions, and this happened with both the pre-task knowledge (F(2, 70)=9.601, p<.001) and the post-task knowledge (F(2, 70)=8.181, p=.001). As for the effect of task, results show that pre-session general task knowledge in the PT was greater than that in the DT (F(2, 70)=5.336, p<.05). However, post-session general task knowledge in the two tasks did not differ statistically. This means that although users showed higher baseline knowledge in the PT than in the DT, after the sessions they had equal levels of knowledge in the two task conditions.

Table 3. GLM analysis of the effects of session and task on pre- and post-session general task knowledge
               Pre-task general task knowledge F(p)   Post-task general task knowledge F(p)
Session        9.601 (.000)                           8.181 (.001)
Task           5.336 (.024)                           0.301 (.585)
Session*task   0.505 (.606)                           0.446 (.642)
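The between-session comparisons in Sections 3.1 and 3.2 are reported as Kruskal-Wallis H statistics. The following is a minimal sketch of that test with the standard tie correction, which matters for 7-point ratings because ties are frequent. The function names and the ratings are hypothetical and not taken from the study; the p-value helper covers only the 2-degrees-of-freedom case (three groups), where the chi-square survival function reduces to exp(-H/2).

```python
import math


def kruskal_wallis(groups):
    """Return (H, df) for a list of groups, with the standard tie correction."""
    pooled = sorted(x for g in groups for x in g)
    n = len(pooled)
    # Average rank of each distinct value in the pooled sample.
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + j) / 2 + 1
        i = j + 1
    rank_sum_term = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_sum_term - 3 * (n + 1)
    # Tie correction: divide H by 1 - sum(t^3 - t) / (n^3 - n).
    ties = sum(t**3 - t for t in (pooled.count(v) for v in rank_of))
    correction = 1 - ties / (n**3 - n)
    if correction == 0:  # every observation identical: no evidence of a difference
        return 0.0, len(groups) - 1
    return h / correction, len(groups) - 1


def p_value_df2(h):
    """Chi-square upper-tail probability for df = 2 only (three groups)."""
    return math.exp(-h / 2)


# Hypothetical ratings for three sessions (NOT the study's data).
session_1 = [2, 3, 1, 4, 2, 3]
session_2 = [3, 4, 4, 5, 3, 4]
session_3 = [5, 6, 4, 5, 6, 5]
h, df = kruskal_wallis([session_1, session_2, session_3])
print(f"H({df}) = {h:.2f}, p = {p_value_df2(h):.4f}")
```

Note that the Kruskal-Wallis test treats the groups as independent samples. In a design like this one, the same participants are rated in every session, so a repeated-measures alternative such as the Friedman test could also be considered; that is an observation about the general technique, not a description of the analysis actually run in the study.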
3.3 An overview of sub-task knowledge
Figure 3 shows the tendency of the changes in pre- and post-session sub-task knowledge in the 3 sessions. Table 4 shows the results of the paired comparison between pre- and post-session sub-task topic knowledge. As can be seen, in general, users’ post-session knowledge was higher than their pre-session knowledge, indicating that they did gain knowledge of the sub-task topics.

Figure 3. Pre- and post-session sub-task topic knowledge across sessions

Table 4. Paired comparison between pre- vs. post-session sub-task topic knowledge (Mean (SD))
                               Pre-session    Post-session   Z(p) of pre- vs. post-comparison
Session 1                      2.75 (1.51)    4.29 (1.46)    3.00 (.003)
Session 2                      2.75 (1.70)    5.00 (0.93)    3.97 (.000)
Session 3                      3.00 (1.91)    5.38 (1.28)    4.04 (.000)
All 3 sessions                 2.83 (1.70)    4.89 (1.31)    6.31 (.000)
H(p) of 3-session comparison   0.17 (0.921)   8.18 (0.017)   /

Table 4 also shows the comparison of the pre- and post-session sub-task knowledge among the 3 sessions. Unlike the pre-session general task knowledge, which increased across sessions, pre-session sub-task knowledge did not differ among the 3 sessions (H(2, N=72)=0.165, p>.05). This is reasonable considering that the sub-tasks were different in each session, and that users had equal levels of baseline knowledge of the different sub-tasks. On the other hand, users’ post-session sub-task knowledge did differ among sessions (H(2, N=72)=8.18, p<.05). Post-hoc analysis using the Tukey test found that the difference was between sessions 1 and 3. This means that although users had equal levels of baseline knowledge of the different sub-tasks, after working on them they reached a higher level of knowledge in session 3 than in session 1. It is possible that users learned more about the sub-task in the 3rd session because of their experience in the previous sessions with the other sub-tasks of the same general task.

3.4 Sub-task knowledge in two tasks
Figure 4 shows the pattern of pre- and post-session sub-task knowledge change in the 3 sessions in the two task conditions. The comparison between pre- and post-session sub-task knowledge (Table 5) showed that, in general, users’ post-session sub-task knowledge was higher than their pre-session knowledge, and this applied to both tasks, in each of the 3 sessions, as well as when the 3 sessions were considered together.

Figure 4. Pre- and post-session sub-task topic knowledge across sessions
Table 5. Paired comparison between pre- vs. post-session sub-task topic knowledge in two tasks (Mean (SD))
                                   Session 1      Session 2      Session 3      All 3 sessions   H(p) of 3-session comparison
Dependent task
Pre-session                        2.50 (1.38)    2.50 (1.51)    2.75 (1.77)    2.58 (1.52)      0.11 (0.95)
Post-session                       4.33 (1.16)    4.67 (0.65)    5.25 (1.36)    4.75 (1.13)      5.03 (0.08)
Z(p) of pre- vs. post-comparison   2.325 (.020)   2.971 (.003)   2.947 (.003)   4.716 (.000)     /
Parallel task
Pre-session                        3.00 (1.65)    3.00 (1.91)    3.25 (2.09)    3.08 (1.84)      0.06 (0.97)
Post-session                       4.25 (1.77)    5.33 (1.07)    5.50 (1.24)    5.03 (1.46)      3.95 (0.14)
Z(p) of pre- vs. post-comparison   2.024 (.043)   2.701 (.007)   2.821 (.005)   4.203 (.000)     /
Table 5 also shows the results of the knowledge comparison across the 3 sessions, for the pre- and the post-session sub-task knowledge respectively, in the two task conditions. No significant differences were found in either type of knowledge among the 3 sessions, in either type of task. This means that within each individual task, neither the pre- nor the post-session sub-task knowledge was higher in later sessions than in earlier ones, even though, when both tasks were combined, post-session sub-task knowledge in session 3 was greater than that in session 1 (see Table 4).

Table 6. GLM analysis of the effects of session and task on pre- and post-session sub-task knowledge
               Pre-session sub-task knowledge F(p)   Post-session sub-task knowledge F(p)
Session        .166 (.847)                           4.632 (.013)
Task           1.496 (.226)                          .886 (.350)
Session*task   .000 (1.000)                          .540 (.585)

GLM analysis of the effects of task and session on pre- and post-session sub-task knowledge (Table 6) showed that task was not a significant factor for either type of knowledge. As for the effect of session, users’ pre-session sub-task knowledge did not differ among sessions (F(2, 70)=0.166, p>.05), but their post-session sub-task knowledge did (F(2, 70)=4.632, p<.05). These results are consistent with those displayed in Table 4.
4. DISCUSSION & CONCLUSIONS
Our study yielded several interesting findings about users’ knowledge and knowledge change in multi-session tasks, which further our understanding of users’ knowledge in information tasks. These findings also have implications for information retrieval research and/or system design.
First, in multi-session tasks, users’ knowledge of the general task and their knowledge of the sub-tasks in individual sessions are clearly different variables that measure different things. Users’ knowledge of a general task in different sessions evaluates the same task with little or no variation, whereas their knowledge of the sub-tasks evaluates different foci with much greater variation. Our results showed that users’ pre-session knowledge of the general task increased across sessions, indicating that at the beginning of later sessions they retained knowledge gained in previous sessions. However, users’ pre-session knowledge of the sub-tasks did not vary across sessions, demonstrating that they had the same levels of baseline knowledge of each sub-task. When measuring user knowledge in multi-session tasks, care should be taken regarding which type(s) of knowledge would be appropriate to use.
Second, in general, users’ knowledge of the tasks and that of the sub-tasks were found to increase within a session, but there were exceptions. It makes intuitive sense that users gained knowledge of a general task or a sub-task after searching for information and working on it. However, for the general task, in session 3, users did not show significantly greater knowledge after the session than before, indicating that users may have reached a plateau in their knowledge of the general task after two sessions of working on it. A closer look at the different tasks found that this did not happen in the DT, but it did in the PT, in session 3 as well as in session 1. In session 3, users even showed descriptively equal levels of knowledge before and after the session. One can see that users’ general task knowledge in the PT was relatively high in session 3, with a mean rating of 5.42 on a 7-point scale. Again, a possible explanation is that users had gained a high enough level of knowledge of the PT after two sessions of work, although not of the DT. With regard to session 1, one can notice that users showed a pre-session knowledge level that was not very low, with a mean of 3.17 (in comparison, that in the DT was only 2.33). This could have led to the non-significant difference from their post-session knowledge level, which did not turn out to be very high (a mean score of 4.33). These findings seem to indicate that, in general, users gained knowledge after a working session, but if they had relatively high levels of baseline knowledge, they may not have had significantly higher levels of knowledge afterwards. This has implications for information retrieval task design: tasks in research studies should be designed to avoid the ceiling effect, especially when examining the effects of user knowledge. Following these findings, it would be interesting to explore in future studies the relationship between this absence of change in self-assessed pre- and post-session knowledge levels and users’ interaction with systems, and to see what systems could do to help users gain more knowledge, or better accomplish their tasks, even when they did not think they had gained knowledge, or when they already had relatively high levels of knowledge.
Third, our results demonstrate that task type had effects on some aspects of users’ knowledge, but not others. Users did not show significant differences in sub-task knowledge between the two task conditions; however, they showed differences in general task knowledge. Specifically, they had significantly higher pre-session general task knowledge in the PT than in the DT, but their post-session general task knowledge did not differ between the two task conditions. This indicates that although users may have felt they had more knowledge of the PT before working on it, after working on it they ended up with the same levels of knowledge in both tasks. Again, it would be interesting to further examine in future studies users’ searching behaviors and interactions with the systems in both tasks, and to explore how and why they ended up with equal levels of knowledge despite the different task structures (parallel or dependent).
5. REFERENCES
[1] Belkin, N. J. (1980). Anomalous states of knowledge as a basis for information retrieval. Canadian Journal of Information Science, 5, 133-143.
[2] Cole, M. J., Zhang, X., Liu, J., Liu, C., Belkin, N. J., Bierig, R., & Gwizdka, J. (2010). Are self-assessments reliable indicators of topic knowledge? ASIS&T ’10.
[3] Duggan, G. B., & Payne, S. J. (2008). Knowledge in the head and on the web: Using topic expertise to aid search. CHI '08, 39-48.
[4] Kelly, D. (2006). Measuring online information-seeking context, part 1. Background and method. J. Am. Soc. Inf. Sci. Tec. 57(13), 1729-1739.
[5] Toms, E., MacKenzie, T., Jordan, C., O’Brien, H., Freund, L., Toze, S., et al. (2007). How task affects information search. In Workshop Pre-proceedings in Initiative for the Evaluation of XML Retrieval (INEX), 337-341.
[6] Vakkari, P., & Hakala, N. (2000). Changes in relevance criteria and problem stages in task performance. Journal of Documentation, 56(5), 540-562.
[7] Wildemuth, B. (2004). The effects of domain knowledge on search tactic formulation. J. Am. Soc. Inf. Sci. Tec., 55(3), 246-258.