Speak to Me: A Wizard of Oz Study on a Spoken Language Interface

Xiaojun Yuan
College of Computing and Information
University at Albany, SUNY
Albany, NY
+1 518 591 8746
[email protected]

Nicholas J. Belkin
School of Communication, Information & Library Studies
Rutgers University
New Brunswick, NJ
+1 732 932 7500
[email protected]

Ning Sa
College of Computing and Information
University at Albany, SUNY
Albany, NY
[email protected]

ABSTRACT

This paper reports on an experiment comparing user behavior with a spoken language search interface, which allows only spoken queries and touch input, against user behavior with a generic keyboard-based baseline interface. The experiment, with 48 subjects each searching on 12 different topics of three types, indicated that using the spoken interface resulted in significantly less interaction, in terms of number of iterations, number of viewed documents, and number of clicks, than using the baseline system.

Author Keywords

Design, Experimentation, Human Factors, Interactive Information Retrieval

ACM Classification Keywords

H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval – search process.

General Terms

Human Factors; Design; Measurement

INTRODUCTION

As the variety of users of information systems increases, the problem of making access both easy and effective for all of them becomes ever more difficult. We propose to address the problem of encouraging effective interaction between the searcher and the information system by moving from keyboard-based interaction to spoken language and gestural interaction.

Du and Crestani [2] have investigated the effectiveness of spoken queries, as well as their length, but in simulated rather than real interaction. In this study, we were interested in how the mode of query input affects user information search behavior.

CONDUCT OF EXPERIMENT

Wizard of Oz

A speak-to-me Wizard of Oz experiment (cf. [1]) was conducted. We implemented the experimental system with a human sitting in a control room, listening to the subjects’ spoken input and observing their gestures, interpreting what was said and pointed to, and typing that interpretation into the interface with which the subjects were interacting.

SYSTEM DESIGN AND IMPLEMENTATION

Both the baseline and experimental systems used the same underlying information retrieval and presentation system, constructed using the Apache Lucene information retrieval toolkit.¹ Functions provided by the Lucene core were used to build the index, analyze the queries, and perform the searches. Figure 1 shows the interface.

In the baseline system, subjects used a standard keyboard and mouse to conduct searches. The input mode was a search query box, with instructions to the searchers to provide a natural language statement of their information problems. The search results were presented as a list of retrieved documents, each with a title and a brief description. Clicking on a title led to the full text of that article, and relevant articles were saved to a window in the interface by clicking on a “save” button.
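The three functions attributed to the Lucene core (building the index, analyzing queries, and performing searches) can be illustrated with a minimal sketch. This is not Lucene's API and not the system used in the study: the function names, the toy analyzer, the TF-IDF scoring, and the sample documents are all assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def analyze(text):
    """Toy analyzer (assumption): lowercase, keep alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build an inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(analyze(text)).items():
            index[term][doc_id] = tf
    return index


def search(index, n_docs, query, k=10):
    """Rank documents by a simple TF-IDF sum over the analyzed query terms."""
    scores = defaultdict(float)
    for term in analyze(query):
        postings = index.get(term, {})
        if not postings:
            continue  # term does not occur in the collection
        idf = math.log(1 + n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))[:k]


# Hypothetical three-document collection.
docs = {
    "d1": "Organic food and its benefits",
    "d2": "Genetically modified food: harmful effects",
    "d3": "Brewing beer at home",
}
index = build_index(docs)
results = search(index, len(docs), "benefits of organic food")
```

In this sketch a natural language statement is reduced to its index terms by the same analyzer used at indexing time, which is why a longer spoken or typed query can still be matched against the collection.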

¹ http://lucene.apache.org/

Copyright held by the authors. HCIR '13, Vancouver, BC, Canada, Oct 3-4, 2013

The interface for the experimental system was identical to that of the baseline system, in that it had the same query box, presented the search results in the same way, and had a window for saved documents. The only difference was that there was no keyboard and, rather than a mouse, there was a touch-screen monitor on which the subjects could point to full documents, or parts of documents, that they liked, did not like, or had any opinions on. Such actions were used to modify their queries, using standard relevance feedback and query modification techniques.
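The paper does not name the relevance feedback technique that was used. As one possible reading, the sketch below applies Rocchio-style re-weighting, a widely used textbook method, to a query given the text a subject pointed to as liked or disliked. The function name, the token-list representation of touched passages, and the alpha, beta, and gamma values are assumptions for illustration, not details from the study.

```python
from collections import Counter


def rocchio(query_terms, liked_docs, disliked_docs,
            alpha=1.0, beta=0.75, gamma=0.15):
    """Re-weight query terms from liked/disliked passages (Rocchio-style).

    Each passage is a list of tokens. Returns (term, weight) pairs with
    positive weight, highest weight first.
    """
    weights = Counter()
    for term in query_terms:
        weights[term] += alpha
    for docs, sign, coef in ((liked_docs, 1, beta), (disliked_docs, -1, gamma)):
        if not docs:
            continue
        for doc in docs:
            for term, tf in Counter(doc).items():
                # Average the term frequency over the passages in this group.
                weights[term] += sign * coef * tf / len(docs)
    positive = [(t, w) for t, w in weights.items() if w > 0]
    return sorted(positive, key=lambda tw: (-tw[1], tw[0]))


# Hypothetical example: one liked passage, one disliked passage.
modified = rocchio(
    ["organic", "food"],
    liked_docs=[["organic", "food", "benefits"]],
    disliked_docs=[["fast", "food"]],
)
terms = [t for t, _ in modified]
```

Here the term from the liked passage ("benefits") enters the modified query, while the term that occurs only in the disliked passage ("fast") is dropped because its weight is negative.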

Figure 1: System Interface Screenshot

Experimental Design

The experiment was carried out in a within-subject design, in which participants performed searches using each of the two systems, first one system, then the other. For each system, subjects first performed a search on a training topic, then searched on six different topics. These six topics belonged to three task categories. The first test topic was of the same task type as the training topic, the second topic was of the other task type, and so on. The order of the task types and topics was rotated across participants, and the experiment was replicated by exchanging the order of the two systems. This design led to 48 subjects.

Tasks and Topics

Twelve different search tasks were given to the subjects to perform. The tasks were categorized into three types following the analysis by Kim [3]: factual tasks, interpretive tasks, and exploratory tasks. According to Kim, factual tasks are close-ended and collect facts by asking questions. Interpretive tasks and exploratory tasks are open-ended and include evaluation, inference, and comparison; the former are more focused and goal oriented than the latter. Below, we give an example topic for each task type.

Task Type 1 (Factual Task):
Topic: Recently, a friend told you about a movie that he liked. It starred a French actress. The title was a woman's name. It was an award winning film that came out in 2001. This actress also played the lead role in a movie with Tom Hanks in 2006.
Task: Find the name of the actress and the name of both movies. Save the document(s) where you have found the information required.

Task Type 2 (Interpretive Task):
Topic: A friend of yours insists that you must only buy and eat organic foods. She has been warning you about genetically modified foods and their harmful effects. You have also heard of people who only eat raw foods in their diet. You have decided you need to find some information on organic food, genetically modified food and raw food to be able to discuss this further.
Task: Find the benefits and/or harmful effects of each type of food. Save the document(s) where you have found the information required.

Task Type 3 (Exploratory Task):
Topic: You have friends who are vegetarians and some who are vegans. You want to have a better understanding why your friends have chosen to not eat meat and/or animal products.
Task: You want to understand why some people eat meat and some do not. You want to know if it is a cultural thing or something else. Save the document(s) where you have found the information required.

Table 1 shows a brief summary of the tasks.

Table 1. Tasks and Topics

Task                No.   Topic
Factual Task        FT1   The largest cruise ship in 2006
Factual Task        FT2   NASA imaging program
Factual Task        FT3   Hypermarket in Spain
Factual Task        FT4   The name of an actress and a movie
Interpretive Task   IT1   Differences between Organic food, genetically modified food, and raw food
Interpretive Task   IT2   Eating disorders in high school
Interpretive Task   IT3   Charter school, public school, and private school
Interpretive Task   IT4   Growing fruits
Exploratory Task    ET1   Lead paint poison
Exploratory Task    ET2   Reasons for vegetarians and vegans
Exploratory Task    ET3   Beer brewing at home
Exploratory Task    ET4   Feline intelligence, temperament, behaviors,

RESULTS

Behavior Measures

The measures of user behavior were the time taken to complete the task, and the characteristics of user interaction with the system and of user effort. The latter were measured in terms of the number of iterations in a search (i.e., the number of queries per task), the total number of documents saved per task, the total number of documents viewed per task, the total number of clicks per task, and mean query length. Table 2 displays the results. Significance was tested with t-tests.

Table 2. Behavior measures (* significant at <.05 level, ** significant at <.01 level)

Behavior Measure                    Baseline           Experimental        t, sig.
Time of task completion (seconds)   322.96 (159.46)    359.32** (174.65)   t=2.61, df=569.31, p=0.009
Number of iterations                6.66** (4.52)      5.05 (3.03)         t=-5.04, df=501.96, p<0.00
Number of final saved documents     3.16 (2.13)        2.87 (1.98)         t=-1.68, df=570.70, p=0.093
Number of viewed documents          6.07** (3.87)      4.89 (3.02)         t=-4.08, df=541.77, p<0.00
Number of clicks                    46.10** (29.47)    37.15 (20.84)       t=-4.21, df=516.57, p<0.00
Query Length                        3.55 (2.37)        3.78* (2.34)        t=2.18, df=1831.55, p=0.029

User Perception

In the post-system questionnaire, we asked for participants’ opinions of the systems. User perception of the systems was measured in terms of ease of learning to use the system, ease of use of the system, understanding of the system, and usefulness of the system (all on 7-point scales, 1=low; 7=high).

Results show that participants found the baseline system (Mean=6.00, SD=1.11) significantly easier to learn to use than the experimental system (Mean=5.43, SD=1.18), W=831.5, p=0.015. There were no significant differences on the other variables. We transcribed the participants’ comments during use of the systems from the Morae video and audio recordings to help us understand more about both their opinions and their behaviors. Participants felt that the baseline system was easier to learn to use because “I am a little more biased towards typing, I am practicing that for a longer period. Searching is a lot easier with the typing.” and “We are used to this kind of search. It is almost as same as we used the web browse Google, Yahoo…” However, they also commented that “it is just so much fun to talk to the computer, have it do all the work” and “…But, eventually, that (touch) is the future.” It may take time for users to get used to this new method of interaction and start using it in their daily lives.

DISCUSSION AND CONCLUSIONS

There were significant differences in favor of the experimental system on interaction measures. The number of iterations (that is, queries) per search, the number of viewed documents, and the number of clicks were significantly lower, and the mean query length was significantly higher. We plan to further explore the impact of task type on interaction and other behavior measures. The results of the project will be an evaluation of a new method of interaction with information systems, as compared to normal keyboard-based interaction, and design criteria for a spoken language and gesture input interface to information retrieval systems.

ACKNOWLEDGMENTS

Our thanks to the Institute of Museum and Library Services (IMLS), grant #RE-04-10-0053-10.

REFERENCES

1. Akers, D. 2006. Wizard of Oz for participatory design: inventing a gestural interface for 3D selection of neural pathway estimates. In CHI '06 Extended Abstracts on Human Factors in Computing Systems (Montréal, Québec, Canada, April 22-27, 2006). CHI '06, ACM Press, New York, NY, 454-459.

2. Du, H. and Crestani, F. 2004. Retrieval effectiveness of written and spoken queries: an experimental evaluation. In International Conference on Flexible Query Answering Systems (FQAS 2004), 376-389.

3. Kim, J.-Y. 2006. Task as a Predictable Indicator for Information Seeking Behavior on the Web. Unpublished Ph.D. dissertation, Rutgers University, New Brunswick.
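As a reader's check on the significance tests in Table 2, the sketch below recomputes one row from its summary statistics. Several things here are assumptions rather than statements from the paper: that the parenthesized values are standard deviations, that the fractional degrees of freedom indicate an unequal-variance (Welch) t-test, and that each system contributed 48 subjects × 6 topics = 288 searches. The function name is hypothetical.

```python
import math


def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch's unequal-variance t statistic and degrees of freedom.

    The sign follows (group b - group a), matching a table that reports
    experimental minus baseline.
    """
    var_a = sd_a ** 2 / n_a  # squared standard error of group a
    var_b = sd_b ** 2 / n_b  # squared standard error of group b
    t = (mean_b - mean_a) / math.sqrt(var_a + var_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1)
    )
    return t, df


# Assumed sample size per system: 48 subjects x 6 topics.
N = 48 * 6

# "Number of iterations" row: baseline 6.66 (4.52), experimental 5.05 (3.03).
t, df = welch_t(6.66, 4.52, N, 5.05, 3.03, N)
```

Under these assumptions the recomputed values land close to the reported t=-5.04 and df=501.96, with the small gap presumably due to rounding of the published means and standard deviations. The query length row cannot be checked this way, because it is computed per query and the number of queries per system is not reported.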
