A User Study on Improving the Effectiveness of a Spoken Language Interface to Information Systems

Xiaojun Yuan College of Computing and Information, University at Albany, State University of New York 135 Western Avenue, Albany, NY 12222 [email protected]

Nicholas Belkin School of Communication and Information, Rutgers University 4 Huntington Street, New Brunswick, NJ 08901 [email protected]

Chris Jordan College of Computing and Information, University at Albany, State University of New York 135 Western Avenue, Albany, NY 12222 [email protected]

Catherine Dumas College of Computing and Information, University at Albany, State University of New York 135 Western Avenue, Albany, NY 12222 [email protected]

ABSTRACT
Research has shown that users of digital libraries and other information systems typically carry out searches with very short queries, on the order of two words or so. This makes it very difficult for the systems to disambiguate their queries and identify potentially relevant documents, resulting in sub-optimal retrieval performance. We hypothesize that users will provide better and more useful descriptions of their information problems if they are able to speak to the system and easily indicate, through speech and gesture, those documents and aspects of documents which they find useful or not useful. In this paper, we introduce a spoken interface and describe a planned Wizard of Oz study.

Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval – relevance feedback, search process.

General Terms
Measurement, Performance, Experimentation, Human Factors.

Keywords
User studies, searching, spoken interface, user performance

1. INTRODUCTION
Existing search engines do a fine job on simple tasks. For example, to answer “What is the capital of China?”, users can simply go to a search engine site and type in “capital China.” The answer can be found in the snippet of the top-ranked search result. However, current search engines do not do a good job in more complicated situations, because of the complexity of human information behavior and needs. Users of information systems typically carry out searches with very short queries, on the order of two words or so [11]. This makes it very difficult for the systems to disambiguate their queries and identify potentially relevant documents, and leads to sub-optimal retrieval performance. Instead of simply returning a ranked list of documents in response to such a short query, a better search system or interface is needed to assist users in locating needed information when completing complex tasks. Such systems should assist users during their entire search process and reduce the degree of user-perceived task complexity, by iteratively constructing a complex query or search strategy at each search stage, and by progressively integrating the partial answers into a single coherent one at the later stages. In this age of information overload, we urgently need a search engine that can “interpret a user’s question, extract facts from all the information on the web, and select an appropriate answer” [10]. Current search engines cannot take on this responsibility. As Etzioni suggested, “The big search engines have taken tiny steps in the right direction” [10]. To achieve the above goals, we need to find appropriate and effective ways to elicit clear information needs from users, understand more about their questions, and allow them to choose the right answers effectively.

As digital information systems become increasingly ubiquitous and important in people’s lives, issues regarding ease of access to, and effective use of, such systems become correspondingly more important. As the variety of users of these systems increases, the problem of how to make access both easy and effective for all of them is becoming even more difficult. One of the major mismatches between how users address information systems and what information systems need to operate effectively is the tendency of users to initiate their information seeking episodes with very short queries, whereas typical “best match” retrieval algorithms need long queries to match information needs to documents effectively. The tendency of users of information systems to begin their searches with brief queries is likely attributable to two factors: the general inability of people to specify precisely which documents they require to resolve their information problems (cf. [1]); and the difficulty people have in finding terms that are appropriate for describing their information problems and that also match the terms used to describe the documents in the database they are interacting with.

To address these two problems, a variety of approaches have been proposed and investigated. One approach has been to devise interface techniques that encourage searchers to input longer queries [12]; another has been to enhance the initial query automatically, without the searchers’ intervention, or through query expansion based on thesauri or similar tools [9]; a third is to offer searchers, based on their initial queries, terms that could be used to enhance those queries [4][15]. Although these approaches have been shown to afford some benefit in retrieval effectiveness, none of them has involved searchers in developing an understanding of their information problems or in finding better ways to express their information “needs”, nor have they succeeded in substantially improving either retrieval effectiveness or searcher satisfaction with the interaction [13].

We propose to address the problem of encouraging effective interaction of the searcher with information systems by moving from keyboard-based interaction to spoken language and gestural interaction of the searcher with the information system. This approach originates in Taylor’s research on question negotiation between users and librarians in special libraries [21], and in the experience of eliciting verbal descriptions of searchers’ information problems in studies of Anomalous State of Knowledge (ASK)-based information systems [1]. Subsequently, Saracevic, Spink and Wu [19] showed that there was substantial direct commentary by both searcher and intermediary on results retrieved with respect to a query put to the system. The major arguments against taking the spoken language and gesture approach to query input and interaction have been that: there has not been strong evidence that such interaction will actually produce more effective results; it is unclear that searchers will willingly engage in such interaction; and, most importantly, speech understanding technology is not robust enough to support such interaction. Our position is that: there is some evidence that the longer queries and more extensive responses to search results afforded by this mode of interaction do improve retrieval effectiveness (e.g. [3]; [13]); when encouraged to describe their information problems more fully, searchers will do so [5][13]; and spoken language interaction with information systems appears to be feasible either right now [23] or in the very near future, with commercially available speech understanding systems (e.g. Dragon). There is also evidence that speech recognition technology is already in place in mobile environments [20]. For example, Google has outlined developments in voice search, which allows users to search the Internet from a mobile phone by speaking their requests or queries to Google in Japanese, in addition to Chinese and English. Google is planning to add new languages next year.

Crestani and Du [7] have shown that asking for expression of a search need in verbal terms results in significantly longer queries than those expressed through a keyboard interface. Crestani has led a group which has considered spoken language queries and their effectiveness in a variety of contexts [7]. Some of their work has investigated the effectiveness of spoken queries, as well as their length, but in simulated rather than real interaction. The following section describes a spoken interface and briefly outlines a user experiment to investigate the utility of such interaction.

2. METHODOLOGY

2.1 Systems
As shown in Figure 1, the interface (implemented on a touch-screen monitor) has a query entry box, a list of saved documents and a ranked list of retrieval results. Query input is through voice. The system shows the query, as it is interpreted, in the query entry box. Subjects can view the retrieved documents by pointing at their titles, or by saying which document they would like to view. Subjects save useful documents by touching the “Save” button to the right of each retrieved document, or, again, by saying which document(s) should be saved. This system, called “Speak to Me”, uses the Lucene information retrieval toolkit for indexing, retrieving and ranking results. For the experiments that will be conducted to evaluate the usefulness of this speech/gesture interface, a second system, our Baseline system, uses the same interface and retrieval techniques, but requires keyboard entry of queries, and mouse clicks for viewing, saving and scrolling.

Figure 1. Speak to Me interface

2.2 Experiment Design
We will conduct a Wizard of Oz experiment to investigate the usefulness of an information retrieval system which accepts spoken and gestural input, compared to one which accepts typed input. Our hypothesis is that the system which allows the user to speak about her/his information need, and to indicate through speech and pointing good and bad things about the system’s response, will collect more useful information about the person’s information needs than the system which allows interaction only through typing and mouse movement. The purpose of collecting more information is to provide the data required to build a useful language model (cf. [8]) of the user’s information need, and to make inferences about the user’s intent in the search. These will then be used to predict which documents are most likely to be useful to the searcher. There is ample evidence that query language models increase in effectiveness and accuracy with increased data, i.e. number of words (see e.g. various papers in [8]). Also, allowing or encouraging natural query formulation has been shown to result in linguistic expressions indicative of intention (e.g. [5] [13] [18]).
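The observation in Section 2.2 that query language models become more accurate as the query grows can be illustrated with a small sketch. The paper does not specify which language model will be built, so the code below swaps in a standard technique as an assumption: query-likelihood ranking with a Dirichlet-smoothed unigram model. The two toy documents, the whitespace tokenizer, the function names and the value of the smoothing parameter `mu` are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

# Hypothetical two-document collection in which "jaguar" is ambiguous.
DOCS = [
    "jaguar car dealer prices and new car models",
    "jaguar habitat in the rain forest and its prey",
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def query_log_likelihood(query, doc, collection, mu=10.0):
    """Return log P(query | doc) under a Dirichlet-smoothed unigram model.

    mu is the smoothing weight; the default here is an arbitrary assumption.
    Query terms that never occur in the collection are skipped.
    """
    doc_counts = Counter(tokenize(doc))
    doc_len = sum(doc_counts.values())
    coll_counts = Counter(t for d in collection for t in tokenize(d))
    coll_len = sum(coll_counts.values())

    score = 0.0
    for term in tokenize(query):
        p_coll = coll_counts[term] / coll_len
        if p_coll == 0.0:
            continue
        p_term = (doc_counts[term] + mu * p_coll) / (doc_len + mu)
        score += math.log(p_term)
    return score


def rank(query, collection, mu=10.0):
    """Return document indices ordered from most to least likely."""
    scored = [
        (query_log_likelihood(query, doc, collection, mu), index)
        for index, doc in enumerate(collection)
    ]
    return [index for _, index in sorted(scored, reverse=True)]


if __name__ == "__main__":
    for query in ("jaguar", "jaguar habitat rain forest prey"):
        scores = [query_log_likelihood(query, d, DOCS) for d in DOCS]
        print(query, rank(query, DOCS), scores)
```

With the one-word query the two documents receive nearly identical scores, separated only by their lengths, whereas the longer, more descriptive query clearly prefers the second document. This is the kind of effect that a spoken, more verbose description of the information need is hoped to produce.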

The experimental design is within-subjects. There will be two systems in the laboratory, one running the Baseline System and the other running the Speak to Me System. The Baseline System will be equipped with a keyboard and mouse, while the Speak to Me System will have no keyboard, and will have a touch-screen monitor. The on-screen interfaces of the Speak to Me and Baseline systems are identical. These systems are located side-by-side, and the participants will alternate search tasks between the two systems. There will be two experimenters. One experimenter will interact with the participant and administer the experiment, while a second experimenter will play “The Wizard of Oz” and input the queries and query modification commands that are issued by the participant when using the Speak to Me System. The second experimenter will be situated in a control room, behind an observation mirror. The interaction between the subject and the system will be recorded by TechSmith Morae 2.11.

2.3 Tasks
Different search tasks will be given to the subjects to perform. These tasks will be described using scenarios that attempt to involve the subjects in the context of the search, according to the concept of “work tasks” proposed by Borlund [6]. We will investigate the impact of tasks using two different task classification schemes, in two successive phases.

2.3.1 Task Phase 1: Factual, Interpretive and Exploratory
We aim to test the impact of different task types on user information behavior in a spoken language environment. This is done to give some degree of ecological validity to the experiment. The tasks are categorized into three types following the analysis by Kim [14]: factual, interpretive and exploratory. According to Kim, factual tasks collect facts by asking and are closed-ended. Interpretive tasks and exploratory tasks are open-ended and include evaluation, inference, and comparison; the former are more focused and goal-oriented than the latter.

2.3.2 Task Phase 2: Faceted Classification Tasks
In this phase, we aim to associate variable features of tasks with user information behavior. Tasks are categorized by facet, following the faceted classification scheme designed by Li [16][17].

In information science research, the ways in which tasks have been identified and characterized vary widely. Li’s classification scheme [16] attempts to identify and integrate the various aspects or facets of task in a single framework. The highlight of this classification scheme is that the values of the different facets can be controlled during the construction of work and search tasks, so that we can associate dependent behavioral variables with a relatively small set of independent task variables. Li’s classification scheme has fifteen facets or sub-facets of work or search task. Here, work tasks can lead users to engage in information seeking behavior, and search tasks are defined as the specific information seeking activities themselves.

2.4 Procedure
For each topic, subjects will be provided with a Topic Description. This description indicates the topic on which they will search for information, and will be presented together with a “pre-search” questionnaire, asking scalar questions to assess the participant’s familiarity with the topic and task, and the predicted difficulty of accomplishing the task. Subjects are to respond to these questions before searching commences. After completing each search, the subjects will be asked to complete a post-search questionnaire concerning their experience with the system and their evaluation of their performance of the task. After the first search, subjects will receive a tutorial and then a training task for the second system, and will follow the same procedure for the first topic on that system. They will then alternate searches between the two systems until all topics on both systems are completed. Subjects will then participate in an exit interview, in which they will be asked to compare the two systems on several criteria, as well as to characterize their experiences with each system. This general experimental procedure has been used successfully in a long series of experiments in interactive information retrieval at Rutgers (e.g. [22]; [2]). TechSmith Morae software will be used to record all the interactions between the subjects and the system, and the subjects’ voices and facial expressions during the whole experiment. Each subject will be paid $50 after completing the experiment.

1 http://www.techsmith.com/morae.asp

3. CONCLUSIONS AND FUTURE WORK
This is an on-going project. The project will produce an evaluation of a new method of interaction with digital libraries and other information systems, compared with conventional keyboard-based interaction, and design criteria for a spoken language and gesture input interface to information retrieval systems. Although this experiment will be carried out with desktop computers, it is clear that the mode of interaction that we are testing is especially applicable to search with mobile devices. We also hope that this kind of research will draw attention in the IR and HCI communities to improving search system effectiveness and the user experience of such systems.

4. ACKNOWLEDGEMENTS
This research was sponsored by the Institute of Museum and Library Services (IMLS) grant RE-04-10-0053-10.

5. REFERENCES
[1] Belkin, N.J. (1980). Anomalous States of Knowledge as a basis for information retrieval. Canadian Journal of Information Science, 5, 133-143.
[2] Belkin, N.J., Cool, C., Kelly, D., Lin, S.-J., Park, S.Y., Perez-Carballo, J., Sikora, C. (2001). Iterative exploration, design and evaluation of support for query reformulation in interactive information retrieval. Information Processing and Management, 37(3), 403-434.
[3] Belkin, N.J. & Kwasnik, B.H. (1986). Using structural representations of anomalous states of knowledge for choosing document retrieval strategies. In: SIGIR '86: Proceedings of the 1986 ACM SIGIR International Conference on Research and Development in Information Retrieval, Pisa, Italy. Pisa, ACM: 11-22.
[4] Belkin, N.J., Marchetti, P.G. & Cool, C. (1993). BRAQUE: Design of an interface to support user interaction in information retrieval. Information Processing and Management, 29, 325-344.
[5] Belkin, N.J., Cool, C., Kelly, D., Kim, G., Kim, J.-Y., Lee, H.-J., Muresan, G., Tang, M.-C., Yuan, X.-J. (2003). Query Length in Interactive Information Retrieval. In: SIGIR ’03: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 205-212). New York: ACM.
[6] Borlund, P. (2003). The IIR evaluation model: a framework for evaluation of interactive information retrieval systems. Information Research, 8, 3-8.
[7] Crestani, F. & Du, H. (2006). Written versus spoken queries: A qualitative and quantitative comparative analysis. Journal of the American Society for Information Science and Technology, 57, 881-890.
[8] Croft, W.B. & Lafferty, J. (2003). Language Modeling for Information Retrieval. Amsterdam: Kluwer Academic Publishers.
[9] Efthimiadis, E.N. (1996). Query expansion. In: Williams, Martha E. (Ed.), Annual Review of Information Systems and Technology (ARIST), 31 (pp. 121-187).
[10] Etzioni, O. (2011). Search needs a shake-up. Nature, 476 (4 August 2011), 25-26.
[11] Jansen, B.J., Spink, A. & Saracevic, T. (2000). Real life, real users, and real needs: A study and analysis of user queries on the web. Information Processing and Management, 36(2), 207-227.
[12] Karlgren, J. & Franzén, K. (1997). Verbosity and interface design. http://www.ling.su.se/staff/franzen/irinterface.html.
[13] Kelly, D., Dollu, V.J. & Fu, X. (2005). The loquacious user: A document-independent source of terms for query expansion. In: Proceedings of the 28th Annual ACM International Conference on Research and Development in Information Retrieval (SIGIR '05) (pp. 457-464). Salvador, Brazil: ACM.
[14] Kim, J.-Y. (2006). Task as a predictable indicator for information seeking behavior on the web. Unpublished Ph.D. dissertation, Rutgers University, New Brunswick.
[15] Koenemann, J. & Belkin, N.J. (1996). A case for interaction: A study of interactive information retrieval behavior and effectiveness. In: Proceedings of the ACM SIGCHI Conference on Human Factors in Computing Systems (SIGCHI '96) (pp. 205-212). New York: ACM.
[16] Li, Y. (2008). Relationships among work tasks, search tasks, and interactive information searching behavior. Unpublished dissertation, Rutgers University.
[17] Li, Y. (2009). Exploring the relationships between work task and search task in information search. Journal of the American Society for Information Science and Technology, 60, 275-291.
[18] Murdock, V., Kelly, D., Croft, W.B., Belkin, N.J. & Yuan, X.-J. (2007). Identifying and improving retrieval for procedural questions. Information Processing and Management, 43(1), 181-203.
[19] Saracevic, T., Spink, A. & Wu, M.-M. (1997). Users and intermediaries in information retrieval: What are they talking about? In: User Modeling: Proceedings of the Sixth International Conference (UM '97) (pp. 43-57-4). New York: Springer.
[20] Stone, B. (2009). Google Adds Live Updates to Results. New York Times, December 8 Issue. Retrieved from http://www.nytimes.com/2009/12/08/technology/companies/08google.html
[21] Taylor, R.S. (1968). Question negotiation and information seeking in libraries. College and Research Libraries, 29, 178-194.
[22] Yuan, X.-J. (2007). Supporting Multiple Information-Seeking Strategies in a Single System Framework. Ph.D. Dissertation, Rutgers University, New Brunswick.
[23] Zue, V. et al. (2000). JUPITER: A Telephone-Based Conversational Interface for Weather Information. IEEE Transactions on Speech and Audio Processing, 8(1), January 2000.
