Speak to Me: A Wizard of Oz Study on a Spoken Language Interface

Xiaojun Yuan
College of Computing and Information, University at Albany, SUNY
Albany, NY
+1 518 591 8746
[email protected]

Nicholas J. Belkin
School of Communication, Information & Library Studies, Rutgers University
New Brunswick, NJ
+1 732 932 7500
[email protected]

Ning Sa
College of Computing and Information, University at Albany, SUNY
Albany, NY
[email protected]

ABSTRACT
This paper reports on an experiment comparing user behavior with a spoken language search interface, which allows only spoken queries and touch input, against behavior with a generic keyboard-based baseline interface. In the experiment, 48 subjects each searched on 12 topics of three task types. Using the spoken interface resulted in significantly less interaction than using the baseline system, in terms of number of iterations, number of viewed documents, and number of clicks.

Author Keywords
Design, Experimentation, Human Factors, Interactive Information Retrieval
ACM Classification Keywords
H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval – search process.

General Terms
Human Factors; Design; Measurement

INTRODUCTION
As the variety of users of information systems increases, the problem of making access both easy and effective for this wide variety of users becomes ever more difficult. We propose to address the problem of encouraging effective interaction between the searcher and the information system by moving from keyboard-based interaction to spoken language and gestural interaction. Du and Crestani [2] have investigated the effectiveness of spoken queries, as well as their length, but in simulated rather than real interaction. In this study, we were interested in how the mode of query input affects user information search behavior.

CONDUCT OF EXPERIMENT
Wizard of Oz
A speak-to-me Wizard of Oz experiment (cf. [1]) was conducted. We implemented the experimental system with a human who, sitting in a control room, listened to the subjects' spoken input, observed their gestures, interpreted what was said and pointed to, and typed that interpretation into the interface with which the subjects were interacting.

SYSTEM DESIGN AND IMPLEMENTATION
Both the Baseline and Experimental systems used the same underlying information retrieval and presentation system, constructed using the Apache Lucene information retrieval toolkit (http://lucene.apache.org/). Functions provided by the Lucene core were used to build the index, analyze the queries, and perform the searches. Figure 1 shows the interface.

In the Baseline system, subjects used a standard keyboard and mouse to conduct searches. The input mode was a search query box, with instructions to the searchers to provide a natural language statement of their information problems. The search results were presented as a list of retrieved documents, each with a title and brief description. Clicking on a title led to the full text of that article, and relevant articles were saved to a window in the interface by clicking a "save" button.

The interface of the Experimental system was identical to that of the Baseline system: it had the same query box, presented the search results in the same way, and had a window for saved documents. The only difference was that there was no keyboard and, rather than a mouse, there was a touch-screen monitor on which the subjects could point out full documents, or parts of documents, that they liked, did not like, or had any opinion on. These actions were used to modify their queries, using standard relevance feedback and query modification techniques.

Copyright held by the authors. HCIR '13, Vancouver, BC, Canada, Oct 3-4, 2013.
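The feedback step can be illustrated with a toy sketch. The code below is purely hypothetical (it does not use Lucene, and the corpus, function names, and parameter values are all invented for illustration): a tiny TF-IDF index with cosine ranking, plus a Rocchio-style query update standing in for the "standard relevance feedback and query modification techniques" mentioned above, where a touch gesture marks a document as relevant.

```python
import math
from collections import Counter

# Toy corpus standing in for the indexed collection (hypothetical data).
DOCS = {
    "d1": "organic food benefits and farming practices",
    "d2": "genetically modified food safety debate",
    "d3": "raw food diet health effects",
}

def build_idf(docs):
    """Inverse document frequency for each term in the corpus."""
    n = len(docs)
    df = Counter(t for d in docs.values() for t in set(d.split()))
    return {t: math.log(n / df[t]) + 1.0 for t in df}

def tf_idf(text, idf):
    """TF-IDF weight vector for a text; out-of-vocabulary terms get weight 0."""
    tf = Counter(text.split())
    return {t: c * idf.get(t, 0.0) for t, c in tf.items()}

def cosine(q, d):
    """Cosine similarity between two sparse weight vectors."""
    dot = sum(w * d.get(t, 0.0) for t, w in q.items())
    nq = math.sqrt(sum(w * w for w in q.values()))
    nd = math.sqrt(sum(w * w for w in d.values()))
    return dot / (nq * nd) if nq and nd else 0.0

def rocchio(query_vec, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio update: move the query toward documents marked relevant
    and away from those marked non-relevant; keep only positive weights."""
    new = {t: alpha * w for t, w in query_vec.items()}
    for vecs, weight in ((relevant, beta), (nonrelevant, -gamma)):
        for vec in vecs:
            for t, w in vec.items():
                new[t] = new.get(t, 0.0) + weight * w / max(len(vecs), 1)
    return {t: w for t, w in new.items() if w > 0}

idf = build_idf(DOCS)
index = {name: tf_idf(text, idf) for name, text in DOCS.items()}

query = tf_idf("organic food", idf)
ranked = sorted(index, key=lambda n: cosine(query, index[n]), reverse=True)

# Simulate a touch gesture marking the top-ranked document as relevant.
query2 = rocchio(query, [index[ranked[0]]], [])
reranked = sorted(index, key=lambda n: cosine(query2, index[n]), reverse=True)
```

In the study's setup, the "relevant" and "non-relevant" sets would be populated by the wizard's interpretation of what the subject pointed to, rather than by explicit clicks.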
Figure 1: System Interface Screenshot

Experimental Design
The experiment used a within-subject design, in which participants performed searches using each of the two systems, first one system and then the other. For each system, subjects first performed a search on a training topic, then searched on six different topics. These six topics belonged to three task categories. The first test topic was of the same task type as the training topic, the second topic was of another task type, and so on. The order of the task types and topics was rotated across participants, and the experiment was replicated by exchanging the order of the two systems. This design required 48 subjects.

Tasks and Topics
Twelve different search tasks were given to the subjects to perform. The tasks are categorized into three types following the analysis of Kim [3]: factual tasks, interpretive tasks, and exploratory tasks. According to Kim, factual tasks ask for facts and are close-ended. Interpretive and exploratory tasks are open-ended and involve evaluation, inference, and comparison, the former being more focused and goal-oriented than the latter. Below, we give an example topic for each task type.

Task Type 1 (Factual Task):
Topic: Recently, a friend told you about a movie that he liked. It starred a French actress. The title was a woman's name. It was an award-winning film that came out in 2001. This actress also played the lead role in a movie with Tom Hanks in 2006.
Task: Find the name of the actress and the names of both movies. Save the document(s) where you have found the information required.

Task Type 2 (Interpretive Task):
Topic: A friend of yours insists that you must only buy and eat organic foods. She has been warning you about genetically modified foods and their harmful effects. You have also heard of people who eat only raw foods in their diet. You have decided you need to find some information on organic food, genetically modified food, and raw food to be able to discuss this further.
Task: Find the benefits and/or harmful effects of each type of food. Save the document(s) where you have found the information required.

Task Type 3 (Exploratory Task):
Topic: You have friends who are vegetarians and some who are vegans. You want to have a better understanding of why your friends have chosen not to eat meat and/or animal products.
Task: You want to understand why some people eat meat and some do not. You want to know if it is a cultural thing or something else. Save the document(s) where you have found the information required.

Task               No.   Topic
Factual Task       FT1   The largest cruise ship in 2006
                   FT2   NASA imaging program
                   FT3   Hypermarket in Spain
                   FT4   The name of an actress and a movie
Interpretive Task  IT1   Differences between organic food, genetically modified food, and raw food
                   IT2   Eating disorders in high school
                   IT3   Charter school, public school, and private school
                   IT4   Growing fruits
Exploratory Task   ET1   Lead paint poison
                   ET2   Reasons for vegetarians and vegans
                   ET3   Beer brewing at home
                   ET4   Feline intelligence, temperament, behaviors
Table 1. Tasks and Topics

Table 1 shows a brief summary of the tasks.

RESULTS
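The between-system comparisons in this section were assessed with two-sample t-tests; the fractional degrees of freedom reported are consistent with Welch's unequal-variance form. As a minimal illustration (the sample data below are made up for the example, not taken from the study):

```python
import math

def welch_t(a, b):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)  # sample variance of a
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)  # sample variance of b
    se2a, se2b = va / na, vb / nb                  # squared standard errors
    t = (ma - mb) / math.sqrt(se2a + se2b)
    df = (se2a + se2b) ** 2 / (se2a ** 2 / (na - 1) + se2b ** 2 / (nb - 1))
    return t, df

# Hypothetical per-task iteration counts for two systems (illustrative only).
baseline = [7, 5, 9, 6, 8, 4, 7, 6]
experimental = [5, 4, 6, 5, 4, 6, 5, 5]
t, df = welch_t(baseline, experimental)
```

Unlike Student's t-test, this form does not assume equal variances in the two samples, which is appropriate when per-system behavior spreads differ, as the standard deviations in Table 2 suggest.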
Behavior Measures
The measures of user behavior were the time taken to complete the task, and characteristics of user interaction with the system and of user effort, measured in terms of the number of iterations in a search (i.e., number of queries per task), the total number of documents saved per task, the total number of documents viewed per task, the total number of clicks per task, and the mean query length. Table 2 displays the results. Significance was tested with t-tests.

Behavior Measure                     Baseline           Experimental       t, sig.
Time of task completion (seconds)    322.96 (159.46)    359.32** (174.65)  t=2.61, df=569.31, p=0.009
Number of iterations                 6.66** (4.52)      5.05 (3.03)        t=-5.04, df=501.96, p<0.001
Number of final saved documents      3.16 (2.13)        2.87 (1.98)        t=-1.68, df=570.70, p=0.093
Number of viewed documents           6.07** (3.87)      4.89 (3.02)        t=-4.08, df=541.77, p<0.001
Number of clicks                     46.10** (29.47)    37.15 (20.84)      t=-4.21, df=516.57, p<0.001
Query length                         3.55 (2.37)        3.78* (2.34)       t=2.18, df=1831.55, p=0.029
Table 2. Behavior measures, mean (SD) (* significant at <.05 level, ** significant at <.01 level)

User Perception
In the post-system questionnaire, we asked participants' opinions of the systems. User perception of the systems was measured in terms of ease of learning to use the system, ease of use of the system, understanding of the system, and usefulness of the system (all on 7-point scales, 1=low; 7=high).

Results show that participants found the baseline system (Mean=6.00, SD=1.11) significantly easier to learn to use than the experimental system (Mean=5.43, SD=1.18), W=831.5, p=0.015. There were no significant differences on the remaining variables. To better understand participants' opinions and behaviors, we transcribed their comments during use of the systems from the Morae video and audio. Participants felt that the baseline system was easier to learn to use because "I am a little more biased towards typing, I am practicing that for a longer period. Searching is a lot easier with the typing." and "We are used to this kind of search. It is almost as same as we used the web browse Google, Yahoo…" However, they also commented that "it is just so much fun to talk to the computer, have it do all the work" and "…But, eventually, that (touch) is the future." It may take time for users to get used to this new method of interaction and to start using it in their daily lives.

DISCUSSION AND CONCLUSIONS
There were significant differences in favor of the experimental system on the interaction measures: the number of iterations (i.e., queries) per search, the number of viewed documents, and the number of clicks were significantly lower, while the mean query length was significantly higher. We plan to further explore the impact of task types on interaction and other behavior measures. The results of this project will be an evaluation of a new method of interaction with information systems, as compared to normal keyboard-based interaction, and design criteria for a spoken language and gesture input interface to information retrieval systems.

ACKNOWLEDGMENTS
Our thanks to Institute of Museum and Library Services (IMLS) grant #RE-04-10-0053-10.

REFERENCES
1. Akers, D. 2006. Wizard of Oz for participatory design: inventing a gestural interface for 3D selection of neural pathway estimates. In CHI '06 Extended Abstracts on Human Factors in Computing Systems (Montréal, Québec, Canada, April 22-27, 2006). ACM Press, New York, NY, 454-459.
2. Du, H. & Crestani, F. 2004. Retrieval effectiveness of written and spoken queries: an experimental evaluation. In Proceedings of the International Conference on Flexible Query Answering Systems (FQAS 2004), 376-389.
3. Kim, J.-Y. 2006. Task as a Predictable Indicator for Information Seeking Behavior on the Web. Unpublished Ph.D. dissertation, Rutgers University, New Brunswick, NJ.