User Study Insights for Improved Meta-index Searching


Michael Huggett and Edie Rasmussen
University of British Columbia
470 – 1961 East Mall, Vancouver, Canada
1-604-822-2404
{m.huggett, edie.rasmussen}@ubc.ca

ABSTRACT

We have created meta-indexes for collections of digital books that promote searching, navigation and browsing within a topic area. Because the meta-index is a new knowledge structure, we have performed preliminary user studies to collect information on users’ perception and use of meta-indexes. The study consisted of individually-assigned tasks using our Meta-Dex User Interface (MUI), followed by a focus group interview and discussion. Users’ responses were positive, and their suggestions led to specific improvements in how meta-indexes are constructed and in the MUI, which will be tested in subsequent user studies.

Categories and Subject Descriptors H.3.7 [Information Systems]: Information Storage and Retrieval – Digital libraries

General Terms Algorithms, Design, Human Factors.

Keywords Digital libraries, digital books, digital collections, indexes.

1. INTRODUCTION

Several large-scale studies on the use of digital books in the academic environment have found heavy use at every level [1, 2]. Given their ubiquity and heavy use, digital books offer the potential to change the way scholars interact with texts. The way in which readers reported interacting with digital books is interesting: users did not read entire books, or even chapters, but dipped in and out of them [2]. Henry and Smith [3] suggest that “[t]he challenge before scholars now is to make connections among and within huge sets of digitized data and to create new knowledge from them” (p. 108). As a conceptual knowledge structure manually extracted from a book’s text, the back-of-book index (BoBI) has exactly this potential: to make connections within the collection and, through analysis of index entries, to create new knowledge.

In the Indexer's Legacy Project [4, 5] we are examining the role of the back-of-book index in supporting exploration of collections of digital monographs. For this work, we created a test collection of seven corpora of at least 100 books, each in a specific domain, as defined by a subject area or theme in the Arts (Art History, Music), Humanities (Economics, Cooking), and Sciences (Geology, Anatomy, Darwin). We have created a new knowledge structure, the meta-index: an aggregation of the indexes of all books within a digital domain, providing summarization of and access to the domain’s structured knowledge. In our current work, we are examining how users perceive and interact with digital collections using the meta-index structure.

Copyright held by the author(s). HCIR '13, Vancouver, BC, Canada, Oct 3-4, 2013.

2. USER INTERACTION WITH INDEXES

The back-of-book index is a ubiquitous and familiar knowledge structure, with appearances in some of the earliest printed books [6], but it is relatively unstudied as a tool providing access to scholarly works. This may be in part because the back-of-book index represents closed-system indexing, in which the index is created for a single document, using a vocabulary, indexing style and format considered appropriate for that document [7]. There are many guides to indexing (for example, Mulvany’s Indexing Books [8]), ISO [9] and NISO [10] standards, and numerous publishers’ guides to house style, but they mostly provide general guidelines on the creation of index entries, with more formal constraints on the actual presentation of the index (indented vs. run-on style, for example). Criteria for a ‘good index’, such as those for the Wheatley medal awarded by The Society of Indexers [11], are also expressed in general terms, e.g. “The terms must be chosen consistently”; “The layout must be clear and help the user”. The lack of easily measurable quality criteria makes it difficult to evaluate back-of-book indexes and, by extension, automatically generated book indexes as well. Csomai and Mihalcea [12] describe the process by which they created a small (56 item) test-bed of indexes as a ‘gold standard’ for comparison of automatically generated indexes, using relatively simple metrics such as recall and precision. Since the function of the index is to provide access to the text, it is reasonable to evaluate an index on this functionality. However, only a few researchers have examined the characteristics and use of the back-of-book index as a navigational tool for printed books (see for example [13, 14, 15, 16, 17, 18]).
Similarly, only a few studies have examined index use within a digital text; these studies generally take the form of evaluating the ease of look-up of a term or concept within a single digital text, for example by comparing the BoBI to a keyword search using a search engine (for example [19, 20, 21, 22, 23]). Interesting work by Chi et al. [20, 24, 25] explored the reconfiguration of a digital index around a user’s search term. Our work extends prior work by examining index structures and use at the collection level, with the meta-index as a tool for searching and navigation over a collection of books in a similar domain. The work reported here is exploratory: we have used a small-scale user study to solicit users’ perception of the meta-index and its relevance to their information search, and to explore the types of tasks which would benefit from a meta-search of a digital domain.

3. SYSTEM DESIGN

The domain meta-indexes are compiled from digital books in the public domain. The requirements for the books were that they should be paginated, have a BoBI, be largely discursive (i.e. not rely overly on charts, diagrams, or equations), and have a legible text layer without too many OCR errors. It should be noted that native-digital books (i.e. those that begin as digital manuscripts) would be greatly preferred for their lack of OCR errors, but such books are rare in the public domain.

Figure 1: The Meta-Dex User Interface: Search View, Entry View, and Book View

After investigating a number of potential sources, we chose to use the Internet Archive [26], as it provides a great number of public domain titles across a wide range of topic areas. We downloaded hundreds of books in PDF form, with the aim of accumulating over 100 books in each of a half-dozen topic areas. Some prospective domains had to be abandoned when it became clear that not enough books could be found in that area. When a domain was fully accumulated (at a final total of 102 books per domain), we extracted the plain text of each book into separate files for content (i.e. the chapters, without front or back matter) and index. Due to inconsistencies in formatting, the indexes were then cleaned by research assistants who used a set of custom-built software tools to find and correct transcription errors. The meta-indexes are built from these sets of cleaned indexes.

When all book indexes in a domain were ready, each was expanded so that every index sub-entry appeared on its own line, fully qualified all the way up to its main entry. This gives a better sense of which terms are most important in an index, because the main entry’s terms are repeated alongside the terms of each of its sub-entries: main entries with more sub-entries are assumed to be more important. The expanded indexes were then aggregated into a single alphabetically-sorted proto-meta-index, ensuring that the same entries in different books appear alongside each other. The proto-meta-index was then compressed back into standard tabbed index format, with unique main entries and indented sub-entries. The resulting meta-indexes are large, typically over 100,000 lines.

The Meta-Dex User Interface (MUI) is a tool designed to navigate the domain meta-indexes.
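As a rough illustration of the expand, aggregate and compress steps described above, the following Python sketch works on toy data. It is not the project's own tooling: the input format (tab indentation for sub-entry depth, with a hypothetical " | " separator before a comma-separated page list), the function names and the book IDs are all assumptions made for this example.

```python
from collections import defaultdict

SEP = " | "  # hypothetical separator between a term and its page list


def expand(book_id, index_lines):
    """Yield (fully qualified path, book_id, pages) for every index line."""
    stack = []
    for raw in index_lines:
        if not raw.strip():
            continue
        depth = len(raw) - len(raw.lstrip("\t"))  # leading tabs = sub-entry depth
        term, _, page_text = raw.strip().partition(SEP)
        pages = [int(p) for p in page_text.split(",") if p.strip()]
        stack = stack[:depth] + [term]  # qualify the term up to its main entry
        yield tuple(stack), book_id, pages


def aggregate(expanded_books):
    """Merge expanded indexes so identical entries from different books meet."""
    meta = defaultdict(list)
    for entries in expanded_books:
        for path, book_id, pages in entries:
            meta[path].append((book_id, pages))
    return meta


def compress(meta):
    """Fold the sorted proto-meta-index back into tabbed index format."""
    lines, prev = [], ()
    for path in sorted(meta):
        shared = 0
        while shared < min(len(prev), len(path)) and prev[shared] == path[shared]:
            shared += 1
        # Emit only the levels that differ from the previously emitted path.
        for depth in range(shared, len(path)):
            refs = meta.get(path[:depth + 1], [])
            ref_text = "; ".join(
                f"{book}: {', '.join(map(str, pages))}" for book, pages in refs if pages
            )
            lines.append("\t" * depth + path[depth] + (SEP + ref_text if ref_text else ""))
        prev = path
    return lines


book1 = ["Darwin | 3", "\tvoyage | 10, 12", "finches | 7"]
book2 = ["Darwin | 5", "\tvoyage | 20"]
meta = aggregate([expand("b1", book1), expand("b2", book2)])
meta_index = compress(meta)
```

Representing each entry as a tuple of terms means that ordinary tuple sorting keeps every sub-entry directly beneath its main entry, which is what lets the compression step emit each heading only once.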
The MUI is provided through a dynamic interactive online Web application running on a cloud server at the University of British Columbia (http://meta-dex.noip.org). The site also provides a brief overview of the project, and links to several tables that provide bibliometric data on the topic domains and compare their language models. For comparison with the MUI, the site also provides a standard keyword-based search tool based on the Solr search engine [27].

The MUI displays three views (Figure 1). The Search View provides users with a search box with options for AND/OR boolean searches and term wild-carding. When a search is performed, all the main entries that match those terms are returned, ranked first by the number of books in which they appear, and second by the number of page references. Clicking on any of the result entries opens the Entry View, which shows the entire entry. Popular entries can be thousands of lines long. Clicking on a book ID (green numbers) or a page number (blue numbers) opens the Book View, which shows the book meta-data and the text of the selected page. Users can scroll to previous or next pages to get the page’s context.
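The two-level ranking used by the Search View can be sketched as follows. This is only an illustration under assumed data structures: the dictionary layout (main entry mapped to book IDs and their page lists), the `search` function and the shell-style `*` wildcard are hypothetical choices for this example, not a description of the MUI's implementation.

```python
import re
from fnmatch import fnmatchcase


def search(meta, terms, mode="AND"):
    """Return (entry, book count, page-reference count), best match first."""
    combine = all if mode == "AND" else any

    def hit(entry, term):
        # A term matches if any word of the main entry fits the wildcard pattern.
        words = re.findall(r"\w+", entry.lower())
        return any(fnmatchcase(word, term.lower()) for word in words)

    results = []
    for entry, refs in meta.items():
        if combine(hit(entry, term) for term in terms):
            n_books = len(refs)
            n_pages = sum(len(pages) for pages in refs.values())
            results.append((entry, n_books, n_pages))
    # Rank first by number of books, second by number of page references.
    results.sort(key=lambda r: (-r[1], -r[2], r[0]))
    return results


toy_meta = {
    "Renaissance": {"b1": [4], "b2": [5], "b3": [6]},
    "Renaissance art": {"b1": [1, 2], "b2": [3]},
    "Italian Renaissance": {"b4": [1, 2, 3], "b5": [9]},
    "art history": {"b1": [1, 2, 3, 4]},
}
wildcard_hits = search(toy_meta, ["renaiss*"])
and_hits = search(toy_meta, ["renaissance", "art"], mode="AND")
or_hits = search(toy_meta, ["renaissance", "art"], mode="OR")
```

In the toy data, "Italian Renaissance" and "Renaissance art" both appear in two books, so the page-reference count breaks the tie between them.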

4. METHODOLOGY

The Meta-Dex User Interface (MUI) was tested in two sessions of six participants each. The participants were all graduate students in the area of library, archival, and information science. An initial written questionnaire captured their experience with searching for books in libraries, and with common online digital search tools.

Following the questionnaire, participants were asked to perform a 20-minute interactive search session using the MUI. They were given the general scenario of working at a media research company on a series of client-generated tasks, rather than asked to use the MUI on topics of their own interest. This scenario was intended to justify the use of the specific topic domains that we have prepared. The users were asked to answer a series of questions in selected domains, for example (in Art History) “You are considering writing a paper on the Renaissance. How well is this topic covered in the collection? What sub-topics would you select?” Users were provided with the questions in writing on task worksheets, with space left for them to write their responses. They were allowed to work at their own pace, and were not expected to complete all questions. In this way, they were given the opportunity to become familiar with the interface while exploring a question in depth.

After the individual MUI sessions, the participants were brought together in a 20-minute focus-group discussion. The discussion was guided by a researcher in an open-ended manner, for example “what did you like (or not like) about the interface?” Participants were then free to follow threads of discussion, and build upon each other’s comments.

Data was gathered in various forms. In addition to the users’ written questionnaires, and the answers provided on their task worksheets, the users’ actions in the interface were recorded to time-stamped log files. From these logs we could see which interface affordances they had used, which they had avoided, and in what part of the search process they had moved quickly or become bogged down. During the focus-group sessions, researchers took notes, and an audio recording of the discussions was later transcribed.

Figure 2: Pop-up to display the position of a sub-entry in a very large meta-index entry (figure labels: 6000+ lines, 3000+ lines)

5. RESULTS

From the users’ written questionnaires and in discussion, we found that the users had extensive experience with digital books in online collections, such as the HathiTrust, Internet Archive, Project Gutenberg, and Google Books. They commonly used tables of contents and BoBIs to find sections in book chapters that met their needs, frequently starting with the TOC and using the index as the last word in making decisions about a source. Their search habits also differed by topic area. For instance, in the Classics they would start with general terms and information sources first, then drill down to topic-specific vocabulary (such as that found in an index). The log files that we gathered for their online sessions showed that users were expert in their use of keyword searches to find online information, as seen in their ease of query reformulation using terms returned from initial queries.

Reaction to the meta-index as a knowledge structure was positive. Users particularly liked its breadth: all of the domain’s terms are gathered and sorted in one place, versus the need to review sources one at a time. They liked the meta-index as a good source for choosing search terms within a topic area: even before any searches were made, it suggested what could be found with respect to shared vocabulary, alternative terms, and available sub-topics. Users were encouraged by the idea that the source indexes making up the meta-index were created by individual human indexers expert in the topic area, and that the meta-index therefore represented a consensus of sorts regarding appropriate terms and phrases. They remarked that using the meta-index gave them the impression of strong relevance of results, and that meta-index search seemed like a good way to review a large number of books to find the best few to actually sign out of the library.

Reaction to the Meta-Dex User Interface (MUI) was mostly positive.
Participants had many insightful comments to share, as they had a lot of experience with user interfaces and strong preferences for their design. They liked that the MUI was fast, useful, and had a good coverage of domains. They liked the ranking of entries in the Entry View that showed all of the most popular variations of the search terms within the domain. They liked that they could judge the popularity of an entry by both its document frequency and its total number of page references, which made it easier to find the best books quickly and to cross-reference. They also liked the ability to skip to previous and next pages in the Book View to read the context around a referenced page. From a design perspective, they liked the use of colour to draw attention to important interface affordances, and to highlight search terms in the results. They remarked that the design was cleaner, and more usefully hierarchical, than Google Books, and appreciated the single unified interface that abstracted and grouped search artifacts, versus having to open multiple sources in separate browser tabs.

The MUI attracted some negative comments as well. Most critical was the difficulty of working with very large entries: large entries sometimes had significant sub-entry indentation (up to 5 levels), which made it difficult to remember the trail of terms back up to the main entry. It was also often difficult for users to find their search terms within hundreds or thousands of lines of results, and they often scrolled past them despite their highlighting. Less critically, users remarked that although researchers were present to explain the MUI’s functionality, the addition of a legend would ease the adoption of the interface. Users were also distracted by occasional bad OCR artifacts in Book View, and (since the extracted text did not include images) the occasional missing figure and illustration. Compared to popular Web sites, users found the page style basic, unattractive, and “clunky”, although they conceded that functionality was probably more important at this stage of system development.

6. SYSTEM UPGRADES

In response to user comments, we have implemented changes to the interface. To deal with the problem of very large entries, large entries are now ‘folded’ to show the search terms first (to be re-expanded with a click), and mousing over a sub-entry now lists its complete path up to the main entry (Figure 2). Since the Book View formerly appeared at the top of the page, which required a lot of scrolling, it now appears next to the sub-entry that is clicked.

Although it is not a problem with the interface itself, users commented on the increased difficulty of finding items (primarily entities such as people and place names) in the meta-index where different sources had adopted different spellings. In response, we developed a synonyms mechanism that maps variations in a term or phrase to a single canonical representation. It is implemented as a simple text configuration file in which each line represents one canonical expression followed by all of its variants, each separated by a delimiter. When the meta-index is rebuilt from the source indexes, all references for the variants appear under the canonical main entry, and the variations appear as “see also…” entries that point to the canonical representation. A search using the terms of the variant returns the canonical entry, but also lists the variants in the search results.
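A minimal sketch of how such a synonyms file might be parsed and applied when a meta-index is rebuilt is given below. The paper does not specify the delimiter or the internal data structures, so the "|" delimiter, the function names and the dictionary layout are assumptions made for illustration only.

```python
def load_synonyms(config_text, delimiter="|"):
    """Parse lines of the form 'canonical | variant | variant ...'."""
    variant_to_canonical = {}
    for line in config_text.splitlines():
        if not line.strip():
            continue
        canonical, *variants = [part.strip() for part in line.split(delimiter)]
        for variant in variants:
            variant_to_canonical[variant.lower()] = canonical
    return variant_to_canonical


def rebuild(meta, variant_to_canonical):
    """Move variant references under the canonical entry; record 'see also' pointers."""
    merged, see_also = {}, {}
    for entry, refs in meta.items():
        canonical = variant_to_canonical.get(entry.lower(), entry)
        if canonical != entry:
            see_also[entry] = canonical  # variant entry points to the canonical one
        target = merged.setdefault(canonical, {})
        for book, pages in refs.items():
            target.setdefault(book, []).extend(pages)
    for refs in merged.values():
        for book in refs:
            refs[book] = sorted(set(refs[book]))
    return merged, see_also


# Hypothetical configuration line and toy meta-index (main entry -> book -> pages).
config = "Tchaikovsky | Tschaikowsky | Chaikovsky"
toy_meta = {
    "Tchaikovsky": {"b1": [10]},
    "Tschaikowsky": {"b2": [5], "b1": [12]},
    "Mozart": {"b3": [1]},
}
synonyms = load_synonyms(config)
merged, see_also = rebuild(toy_meta, synonyms)
```

In this sketch only variants that actually occur in the source indexes receive a pointer entry; a search on a variant could then be resolved through the same mapping to reach the canonical entry.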

7. CONCLUSIONS AND FUTURE WORK

From their comments, it was clear that the participants would definitely use the MUI for their daily work if it were available to them. Despite its drawbacks, users said that once they were familiar with its function, the MUI was easy to use and produced useful results in an efficient way. We believe these good preliminary results show the meta-index as a useful new knowledge structure for searching and navigating digital collections.

Future work includes running a formal user study with experts in a subset of the domains that we have collected. We intend to give them the opportunity to use the MUI for longer periods, and anticipate that their feedback will point to deeper (possibly domain-specific) issues in meta-index search.

We also intend to add personalization to the MUI. In particular, we hope to extend the synonyms mechanism to allow users to define their own matching terms on the fly, to better tailor the meta-index semantics to their needs. We would also like to make the MUI experience more engaging. Participants in the current studies indicated that they missed the library shelf-browsing experience when working with digital books. In discussion, it was suggested that we display dynamically-sorted ‘shelves’ of book spine images as a potential search result. Although this is not a purely functional approach in the style of the current MUI, participants were enthusiastic about this potential mode of navigation.

8. ACKNOWLEDGMENTS

Funding from the UBC Hampton Fund and the Social Sciences and Humanities Research Council is gratefully acknowledged. We also thank the Graduate Research Assistants who acquired and preprocessed books for the test collection.

9. REFERENCES

[1] N. Gielen (2010). Handheld E-Book Readers and Scholarship: Report and Reader Survey. ACLS Humanities E-Book White Paper No. 3. August 18, 2010.
[2] JISC (2009). JISC National E-Book Observatory Project. Key Findings and Recommendations. Final Report, November 2009. http://observatory.jiscebooks.org/reports/jisc-national-ebooks-observatory-project-key-findings-andrecommendations/
[3] C. Henry and K. Smith (2010). Ghostlier demarcations: large-scale text digitization projects and their utility for contemporary humanities scholarship. In: The Idea of Order: Transforming Research Collections for 21st Century Scholarship (CLIR Publication No. 147).
[4] M. Huggett and E. Rasmussen (2011). The Meta-Dex Suite: Generating and analyzing indexes and meta-indexes. Proceedings of the 34th Annual International ACM SIGIR Conference, July 24-28, 2011, Beijing, China. New York: ACM. Pp. 1285-1286.
[5] M. Huggett and E. Rasmussen (2012). Dynamic online views of meta-indexes. Proceedings of the 12th ACM/IEEE-CS Joint Conference on Digital Libraries. New York: ACM. Pp. 233-236.
[6] H.H. Wellisch (1986). The oldest printed indexes. The Indexer 15(2): 73-82.
[7] S. Klement (2002). Open-system versus closed-system indexing. The Indexer 23(1): 23-32.
[8] N. Mulvany (2009). Indexing Books. 2nd ed. University of Chicago Press.
[9] ISO 999:1996, Information and documentation—guidelines for the content, organization and presentation of indexes.
[10] J.D. Anderson (1997). NISO-TR02, Guidelines for indexes and related information retrieval devices. http://www.niso.org/publications/tr/tr02.pdf
[11] The Wheatley Medal: About the Wheatley Medal. http://www.indexers.org.uk/index.php?id=61
[12] A. Csomai and R. Mihalcea (). Creating a testbed for the evaluation of automatically generated back-of-the book indexes. Proceedings of the International Conference on Computational Linguistics and Intelligent Text Processing: 429-440.
[13] V. Diodato (1994). User preferences for features in back of book indexes. Journal of the American Society for Information Science 45(7): 529-536.
[14] M. Piotrowski (2010). Leveraging back-of-the-book indices to enable spatial browsing of a historical document collection. GIR’10. Proceedings of the 6th Workshop on Geographic Information Retrieval. Article No. 17.
[15] C. Jörgensen and E.D. Liddy (1996). Information access or information anxiety?—an exploratory evaluation of book index features. The Indexer 20, 64-68.
[16] E.D. Liddy and C. Jörgensen (1993). Reality check! Book index characteristics that facilitate information access. Proceedings of the 25th Annual Meeting of the American Society of Indexers, 125-138.
[17] S.C. Olason (2000). Let’s get usable! Usability studies for indexes. The Indexer 22(2): 91-95.
[18] N. Wacholder et al. (2003). Experimental study of index terms and information access. ASIST 2003, pp. 184-192.
[19] N. Abdullah and F. Gibb (2008). Using a task-based approach in evaluating the usability of BoBIs in an E-book environment. ECDL 2008, LNCS 4956, 246-257.
[20] E.H. Chi et al. (2007). ScentIndex and ScentHighlights: productive reading techniques for conceptually reorganizing subject indexes and highlighting passages. Information Visualization 6(1): 32-47.
[21] D.E. Egan et al. (1989). Formative design evaluation of Superbook. ACM Transactions on Information Systems 7(1), 30-57.
[22] V. Liesaputra, I.H. Witten, and D. Bainbridge (2009). Searching in a book. ECDL 2009, LNCS 5714, 442-446.
[23] M. Piotrowski (2010). Leveraging back-of-the-book indices to enable spatial browsing of a historical document collection. GIR’10. Proceedings of the 6th Workshop on Geographic Information Retrieval. Article No. 17.
[24] E.H. Chi et al. (2004). eBooks with indexes that reorganize conceptually. ACM Special Interest Group on Computer Human Interaction (CHI 2004), April 24-29, Vienna, Austria. Pp. 1223-1226.
[25] E.H. Chi et al. (2006). ScentIndex: Conceptually reorganizing subject indexes for reading. VAST 2006, 159-166.
[26] The Internet Archive. http://www.archive.org.
[27] Apache Solr. http://lucene.apache.org/solr/
