A Wearable Digital Library of Personal Conversations

Wei-hao Lin
Language Technologies Institute
Carnegie Mellon University
Pittsburgh, PA 15213-3890 USA
+1 412 268 4757
[email protected]

Alexander G. Hauptmann
Computer Science Department
Carnegie Mellon University
Pittsburgh, PA 15213-3890 USA
+1 412 268 1448
[email protected]

ABSTRACT

We have developed a wearable, personalized digital library system that unobtrusively records the wearer’s part of a conversation, recognizes the face of the current dialog partner and remembers his or her voice. The next time the system sees the same person and hears the same voice, it can replay parts of the last conversation in compressed form. Results from a prototype system show the effectiveness of combining face recognition and speaker identification for retrieving conversations.

Categories and Subject Descriptors

H3.1 Content Analysis and Indexing, H3.7 Digital Libraries, I5.4 Pattern Recognition Applications

General Terms

Algorithms, Measurement, Human Factors

Keywords

Personal Digital Library, conversation capture, speaker identification, face recognition, augmented human memory

Real-time information retrieval as a wearable personal memory aid

Our research aims to augment human memory through a personal digital library of experiences. The long-term vision is to allow people to capture and retrieve from a complete audio, video and textual record of their personal experiences and electronic communications. This assumes that within ten years technology will be in place for creating a continuously recorded, digital, high-fidelity record of one’s whole life in video form [2]. Wearable personal digital library units will record audio, video, GPS and electronic communications. This research fulfills the vision of Vannevar Bush’s personal Memex [1], capturing and remembering whatever is seen and heard, and quickly returning any item on request.

In the following sections we describe a first implementation of a personal digital library system for remembering people and conversations. The system has two modes: collecting conversations for a personal digital library, and retrieving summaries of conversations from the personal library.

Personal Memory Collection

The hardware is designed as a wearable device consisting of a miniature ‘spy’ camera, a cardioid lapel microphone and an omni-directional microphone, all attached to a laptop computer in a backpack. When the system is prompted to collect personal conversations, it attempts to detect the face of the person you are talking to in the video stream, and listens to the conversation on both the close-talking (wearer) audio track and the omni-directional (dialog partner) audio track.

The close-talking audio is transcribed by a speech recognition system to produce a rough, approximate transcript. The omni-directional audio stream is processed through a speaker identification module. An encoded representation of the face of your current dialog partner, the dialog partner’s speaker characteristics, and the raw audio of the current conversation are saved to a database.

The audio is further processed through audio analysis (silence removal, emphasis detection) and general speech recognition, so that only the person names and the major issues mentioned in that conversation are replayed.

Personal Memory Retrieval

In the retrieval (remembering) mode, the system searches for a face in the video stream and performs speaker identification on the omni-directional audio stream. When a face is detected or a speaker is identified, the face and speaker characteristics are matched against the instances of faces and speaker characteristics stored in the personal memory library database. Linear interpolation, which has been shown to be effective among different classifier combining strategies [4], is used to combine the probabilities from the two face recognition modules and the one speaker identification module. When a sufficiently high-scoring match is found, the system returns a brief summary of the corresponding recorded conversation with the person. Figure 1 shows the process of personal memory retrieval.
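As a minimal sketch of the score combination used in retrieval mode, the following Python fragment linearly interpolates per-conversation probabilities from the two face recognition modules and the speaker identification module, then returns the best match only if it clears a threshold. The module names, the equal weights, the threshold and the example probabilities are illustrative assumptions; the values used in the prototype are not reported here.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical interpolation weights (assumption: equal weighting).
WEIGHTS: Dict[str, float] = {"eigenface": 1 / 3, "faceit": 1 / 3, "speaker": 1 / 3}


def fuse_scores(
    module_probs: Dict[str, Dict[str, float]],
    weights: Dict[str, float] = WEIGHTS,
) -> Dict[str, float]:
    """Linearly interpolate per-conversation probabilities across modules.

    Modules that produced no output (for example, no face was detected)
    are skipped, and the remaining weights are renormalized.
    """
    active = [name for name in weights if name in module_probs]
    if not active:
        raise ValueError("no module produced any scores")
    total = sum(weights[name] for name in active)
    conversations = set()
    for name in active:
        conversations.update(module_probs[name])
    return {
        conv: sum(weights[n] * module_probs[n].get(conv, 0.0) for n in active) / total
        for conv in conversations
    }


def rank_conversations(fused: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort conversations by fused score, highest first (ties broken by name)."""
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


def best_match(fused: Dict[str, float], threshold: float) -> Optional[str]:
    """Return the top conversation if its score reaches the threshold, else None."""
    conversation, score = rank_conversations(fused)[0]
    return conversation if score >= threshold else None


# Made-up probabilities for a library holding two stored conversations.
example_probs = {
    "eigenface": {"alice": 0.6, "bob": 0.4},
    "faceit": {"alice": 0.3, "bob": 0.7},
    "speaker": {"alice": 0.8, "bob": 0.2},
}
example_fused = fuse_scores(example_probs)
print(rank_conversations(example_fused))
print(best_match(example_fused, threshold=0.5))
```

Renormalizing over the modules that actually fired is one way to handle the case where only a face or only a voice is available; other choices, such as treating a missing module as a uniform distribution, would fit the description equally well.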

Extracting and Retrieving Metadata from Conversations

Face Detection and Recognition

Face detection and matching were used in the CMU NameIt system [5], which follows the ‘eigenface’ approach. Since then, several commercial systems have offered face detection and identification, such as Visionics [7]. Our implementation uses both the Visionics FaceIt toolkit for face detection and matching, and the Schneiderman face detector [6] combined with ‘eigenfaces’ [5] for matching similar faces.
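To illustrate the ‘eigenface’ matching step in general terms, the sketch below projects a face image onto a small set of basis vectors and picks the stored face whose coefficients are closest. It is not the FaceIt toolkit or the prototype's code: the four-pixel "images", the mean face and the two hand-picked orthonormal basis vectors are hypothetical stand-ins for eigenfaces that would normally be obtained by principal component analysis of normalized training faces.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def project(face: Vector, mean_face: Vector, eigenfaces: List[Vector]) -> Vector:
    """Return the coefficients of a mean-centered face in the eigenface basis."""
    centered = [pixel - mean for pixel, mean in zip(face, mean_face)]
    return [sum(c * e for c, e in zip(centered, basis)) for basis in eigenfaces]


def nearest_face(
    query: Vector,
    library: Dict[str, Vector],
    mean_face: Vector,
    eigenfaces: List[Vector],
) -> Tuple[str, float]:
    """Find the stored face closest to the query in eigenface space."""
    query_coeffs = project(query, mean_face, eigenfaces)
    best_label, best_distance = "", math.inf
    for label, face in library.items():
        coeffs = project(face, mean_face, eigenfaces)
        distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(query_coeffs, coeffs)))
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label, best_distance


# Toy data (assumption): four-pixel images and a hand-picked orthonormal basis.
toy_mean = [0.5, 0.5, 0.5, 0.5]
toy_eigenfaces = [[0.5, 0.5, -0.5, -0.5], [0.5, -0.5, 0.5, -0.5]]
toy_library = {
    "alice": [0.9, 0.8, 0.1, 0.2],
    "bob": [0.2, 0.9, 0.1, 0.8],
}
toy_query = [0.8, 0.9, 0.2, 0.1]
print(nearest_face(toy_query, toy_library, toy_mean, toy_eigenfaces))
```

In a retrieval setting, the distance would still need to be turned into a probability or score before it can be combined with the other modules; how the prototype does this is not described.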

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. JCDL’02, July 13-17, 2002, Portland, Oregon, USA. Copyright 2002 ACM 1-58113-513-0/02/0007…$5.00.


Speaker Identification

Speaker identification is done through our own implementation of Gaussian Mixture Models [8]. The speaker identification system also uses the fundamental pitch frequency to eliminate false alarms. About four seconds of speech are required for reliable speaker identification under benign environmental conditions.

Information Summarization

The audio stream to be summarized is selectively compacted based on power. Similar to the video skims [9] and audio summarization [3], only selected portions of the audio are played. Silences are used to define ‘cuts’, and low signal-to-noise ratio segments are eliminated. We have also implemented a TF.IDF weighting scheme to rank segments based on transcript words. So far the system tends to play back too much of the conversation, forcing the user to actively interrupt the summary playback, especially when the conversation was not with the current dialog partner.

Figure 1. Process for Personal Memory Retrieval (components shown: Personal Memory; Face Detection/Recognition; Speaker ID; Summary)

Experimental Data

We collected two conversations with each of 22 people while wearing our prototype capture unit. Each conversation was analyzed for faces and speaker audio characteristics as described above. The first conversation with each person served as the example inside the personal digital library, while the second conversation was used as the query or retrieval prompt to ‘remember’ the first conversation. We used an average of about five seconds at the beginning of each conversation for creation of the library and as the retrieval query.

We used the average rank as the retrieval metric, i.e., on average, at what rank the correct conversation was found.

Retrieval Results

The results of our experiment in Table 1 show that the Schneiderman/Eigenface detection/recognition method retrieved the correct conversation at an average rank of 3.42 among the 22 possible conversation candidates. The Visionics face recognition system found the correct conversation only at an average rank of 4.5. Speaker identification proved to be slightly more reliable, with an average rank of 3.09. A linear combination of the three types of evidence found the correct conversation at an average rank of 2.59. Thus in a library of 22 personal conversations, after listening to 2 conversation summaries, you would likely find the correct one.

Table 1. Average rank of the correctly remembered conversation summary.

Retrieval Method                                                     Average Rank
Face Detection/Recognition (Schneiderman + Normalized Eigenfaces)    3.42
Face Detection/Recognition (Visionics Face Detection)                4.50
Speaker Identification                                               3.09
Combined Evidence                                                    2.59

Discussion

The main focus of our system is the integration of multimodal human experience in a personal digital library. The novelty of the system is in using the face and the audio cues to help remember essential details about the previous meeting with the same person, and automatically creating a personal digital library of information associated with the face, the voice and the words. Eventually an intelligent assistant drawing from an annotated personal history could overcome age and other limits to mental capacity and help recall the details needed in a given situation.

ACKNOWLEDGMENTS

This work was supported in part by the National Science Foundation Information Technology Research under grant number IIS-0121641.

REFERENCES

1. Bush, V. As we may think. Atlantic Monthly, Vol. 176, No. 1, pages 101-108, 1945.

2. Gray, J. What next? A few remaining problems in Information Technology. ACM Federated Research Computer Conference, Atlanta, GA, May 1999.

3. Arons, B.M. Interactively Skimming Recorded Speech. Ph.D. Dissertation, MIT, February 1994.

4. Kittler, J., Hatef, M., Duin, R., and Matas, J. On Combining Classifiers. IEEE Trans. PAMI, 20(3), 1998.

5. Satoh, S. and Kanade, T. NAME-IT: Association of Face and Name in Video. IEEE CVPR97, Puerto Rico, 1997.

6. Schneiderman, H. and Kanade, T. Probabilistic Modeling of Local Appearance and Spatial Relationships for Object Recognition. IEEE CVPR, Santa Barbara, 1998.

7. Visionics FaceIt, Face Recognition Software, www.visionics.com, 2001.

8. Schmidt, M., Golden, J., and Gish, H. GMM sample statistic log-likelihoods for text-independent speaker recognition. Eurospeech-97, Rhodes, Greece, September 1997, pp. 855-858.

9. Smith, M. and Kanade, T. Video skimming and characterization through the combination of image and language understanding techniques. IEEE CVPR97 (San Juan, Puerto Rico, 1997), 775-781.
