Automatically Finding and Recommending Resources to Support Knowledge Workers’ Activities

Jianqiang Shen*, Werner Geyer**, Michael Muller**, Casey Dugan**, Beth Brownholtz**, David R. Millen**

*School of EECS, Oregon State University, 1148 Kelley Engineering Center, Corvallis, OR 97331, USA
[email protected], Tel: +1-541-737-1646

**IBM T.J. Watson Research, One Rogers Street, Cambridge, MA 02116, USA
{werner.geyer, michael_muller, cadugan, beth_brownholtz, david_r_millen}@us.ibm.com, Tel: +1-617-693-4791

ABSTRACT

Knowledge workers perform many different activities daily. Each activity defines a distinct work context with different information needs. In this paper we leverage users’ activity representations, stored in an activity management system, to automatically recommend resources to support knowledge workers in their current activity. We developed a collaborative activity predictor to both predict the current work activity and measure a resource’s relevance to a specific activity. Relevant resources are then displayed in a contextual side bar on the desktop. We describe the system, our new activity-centric search algorithm, and experimental results based on data from 50 real users.

Author Keywords

Intelligent interface, activity-centric collaboration, task, search, recommendation, productivity, attention.

ACM Classification Keywords

I.2.1 [Artificial Intelligence]: Applications and Expert Systems – Office automation. H5.2 [Information interfaces and presentation]: User Interfaces – Graphical user interfaces.

INTRODUCTION

Knowledge workers frequently change tasks, either by choice or through interruptions [25]. With an increased number of tasks and task switches, it becomes more and more crucial to restore the context of the current task. Previous research on activity-centric collaboration (e.g. [1, 12, 18, 25, 31]) supports knowledge workers with context switching and resource rediscovery by organizing and integrating resources, tools, and people around the computational concept of a work activity. Many of these approaches have in common that they provide some structure within which all resources of an activity may be collectively located and (re)discovered. One problem with this approach is that the user is burdened with manually managing the structure by adding new records (e.g. manually adding external resources or collaboratively developing resources inside the structure). As a consequence, much information that is part of the cognitive model, otherwise related to the activity, or required to complete the activity, might not get captured or displayed in the formal representation. We call this problem the “Representation Gap” [31].

In order to satisfy their information needs beyond what is represented in an activity, knowledge workers typically conduct a search using standard web or desktop tools. However, this strategy requires the user to interrupt the work and actively seek out information. Today’s search technologies also do not take into account the user’s current activity: users need to manually encode their information needs through appropriate search terms. A few systems ([5, 11]) have been proposed to automatically find information based on the current contents of specific desktop tools (e.g. chat, document editing, etc.). However, these systems do not take into account the actual activity the user is working on. Other systems [8] identify the task the user is working on but do not recommend resources related to that task.

In this paper, we present a new approach and system for automated semantic search. We call this approach activity-centric resource recommendation because our system leverages the knowledge of a user’s activities and their rich descriptions as intermediaries to make better recommendations about resources required to complete the activity. This approach differs from existing work in that what is sensed about a user’s desktop actions is not used directly in search queries; we use that information first to find matching activities, which are then utilized to find related resources.

We use an existing activity management system, Lotus Activities [24, 29], to train an activity predictor. Lotus Activities allows users to interactively create collections of shared resources called activities. Each activity and each resource may have a name, description, and tags. The use of tags in Lotus Activities is interesting because, compared to other tagging systems, tags are shared: each activity or resource has only a single set of tags that can be modified by all users who have access to the resource. Tags are therefore socially filtered content descriptors, and we were very interested in their predictive power [21, 27, 30, 31, 35] compared to the other data fields. Our activity predictor and resource recommender prototype is integrated into a contextual user interface in a desktop side bar [31] for easy access, displaying predicted activities and search results.

The remainder of the paper is structured as follows. The next section reviews related work. Then we provide a brief overview of the activity management system used for our research and present the user interface of the recommender prototype integrated into the desktop side bar. Next, we describe the activity predictor, analyze the predictive power of different data fields, and present three algorithms for activity-centric resource recommendation. The last section presents experimental results regarding activity prediction precision and a comparison of the recommendation approaches.

RELATED WORK

Many activity-centric systems have been proposed to help knowledge workers manage their work. There has also been considerable research on helping users accomplish their tasks by implicitly recommending resources.

Activity management systems help knowledge workers manage their daily work life. They are mainly based on two assumptions: (a) the desktop user switches between different activities, and (b) each activity is associated with a set of resources relevant to that activity. Typical activities are “Writing Recommendation Paper”, “Designing Malibu” or “Planning Trip to Boston”. Resources can be documents, photographs, podcasts, email messages, web pages, RSS feeds, social bookmarks, chats and so on. Activity management systems help with task switching and resource rediscovery by providing a context for organizing and accessing related resources. Some systems focus on email by tracking tasks hidden in the user’s inbox ([3, 9, 22]); activity-centric email clients, for example, enable users to associate incoming email messages with existing tasks. Other systems support heterogeneous resources besides email messages ([18, 8]). These systems monitor the user’s activities and track resources used when carrying out the task; they automatically organize and update these resources to make them easily available to the user when the task is resumed. Finally, there are activity management systems based on shared objects and collaboration around those objects ([34, 12, 29]). In contrast to the aforementioned single-user systems, these systems provide integrated representations of diverse media that are assumed to be shared, such as email, IM and web pages. Most activity management approaches (e.g., [1, 12, 18, 22, 29, 31]) have in common that they focus on organizing resources but usually lack the ability to contextually assist the user. Our system enhances activity management approaches by proactively recommending resources related to the current activity. Our prototype is based on Lotus Activities [24, 29], a commercialized version of Activity Explorer [12]. We describe Lotus Activities in more detail in the next section.

Implicit query approaches. Traditionally, finding information is done by explicitly invoking a search engine (e.g. Google, Yahoo, etc.) and typing in a set of key words that describe what the user is looking for. In recent years, interest in implicit query generation has grown rapidly [6, 5, 11, 15, 16, 23]. Implicit query systems typically monitor what the desktop user is currently doing (e.g. documents that the user is working on) and automatically conduct searches for relevant information. They automatically construct queries based on the text of the document the user is manipulating, query search engines, and present the processed query results to the user. Some implicit query systems have focused on the email client [10, 14]: they analyze what a user is reading or composing, automatically identify important words to use in a query, and find related items in an index of personal information. Implicit query systems are typically based on object similarities. This kind of context-sensitive search saves valuable time because it allows users to maintain their task focus while taking advantage of peripherally presented suggestions. However, the same information retrieval technique as in standard search engines is used; the search process itself is merely automated. These approaches typically convert the query and candidate resources into pattern vectors based on keyword analysis and then make recommendations according to the similarity between the pattern of the query and the pattern of the resource. One problem with this approach is that the similarity between the abstracted patterns (e.g. from application title bars) does not necessarily reflect their relevance to the user’s current activity accurately. For example, assuming the user is working on the activity Malibu Patent Filing and editing the document My Cool Idea, it is unlikely that resources related to the Patent Filing Process will be highly recommended (although we actually need them), because the system has difficulty finding connections between “patent” and “cool idea”.

In our approach, we first infer the possible ongoing activities and then compute a resource’s relevance to the predicted activities, weighted with the activity likelihood. The complex data objects of activities are used as intermediaries in the resource search because they contain a richer source of information than just the window title or document content captured from a user’s desktop actions.

Figure 1. Activity web page with activity outline on left.

RESOURCE RECOMMENDER PROTOTYPE

We provide an introduction to Lotus Activities, an existing activity management product. We then describe Malibu, a desktop side bar for Lotus Activities, and how our recommender prototype is integrated into Malibu.

The Activity Management System

Our prototype leverages IBM’s activity management system Lotus Activities ([24, 29]), a product which emerged from a multi-year research effort on activity-centric computing ([12, 31]). Lotus Activities organizes and integrates resources, tools, and people around the computational concept of a work activity. Lotus Activities consists of a centralized, web-based service and defines many extensions for existing desktop applications. The design of Lotus Activities was driven by the goal of organizing work into shared or private activities. In this system an activity consists of a set of related, shared resources representing a task or project. The set of related resources is structured as a hierarchical thread called an “activity outline”, representing the content of the task at hand. The activity outline (see (A) in Figure 1) is similar to an activity thread [12], with more explicit structure than a working sphere [13], greater diversity of resources than a “thrask” [3], and more generalizability than the domain-specific structures in the ABC system [1]. Users add items to activities by posting either a response to an existing item or a new resource addition directly to the activity. Lotus Activities supports sharing of six types of resources: message, chat transcript, file, task, web link, and email. An activity and its contained resources can have simple metadata associated with them: name, description, tags, and optional due date (see (B) in Figure 1). Keyword tagging allows users to find related resources in a single activity or across activities. Each activity has a web page associated with it, so that users can see recently posted entries, navigate the activity outline, see all the entries in the activity organized by type, and see the activity history (C). The activity outline (A) is automatically structured based on the response hierarchy in an activity, but it can also be reorganized post-hoc if a user wants to create their own structure within the activity.
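To make this data model concrete, the following is a minimal sketch of the activity and resource metadata described above. The class and field names are our own illustration, not the actual Lotus Activities schema.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

# Hypothetical illustration of the metadata described above; names are ours.
RESOURCE_TYPES = {"message", "chat transcript", "file", "task", "web link", "email"}

@dataclass
class SharedResource:
    name: str
    kind: str                                        # one of RESOURCE_TYPES
    description: str = ""
    tags: List[str] = field(default_factory=list)    # a single, shared tag set
    due_date: Optional[date] = None

@dataclass
class Activity:
    name: str
    description: str = ""
    tags: List[str] = field(default_factory=list)
    outline: List[SharedResource] = field(default_factory=list)  # hierarchical in the product; flat here
```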

Malibu Personal Productivity Assistant

To assist knowledge workers in integrating activity data with other data sources, we developed the experimental Malibu client. Malibu runs as a desktop side bar (“Malibu Board”) that either slides out when users hover with their mouse at the left or right side of the screen, or remains sticky and always visible on the desktop [31]. The Malibu Board contains a series of configurable views, each one displaying items from a particular data source: My Activities (D) from Lotus Activities, Dogear Bookmarks (E) from a social bookmarking system [27], and My Feeds (F) from RSS and ATOM feeds (see Figure 2). Dynamic views are frequently updated. As such, Malibu provides peripheral access to multiple data sources while the user focuses on the main work on the desktop. With this design, Malibu becomes an always-on companion that can display information that is contextually relevant to the current desktop activity of the user, draw attention to new and important events, and provide quick access to data sources that need attention.

Figure 2. Malibu Side Bar.


The “Navigator” (G) can be used to bring any Malibu item into focus, i.e. items can be selected as pivot objects on which Malibu performs a tag-based search (“Surf in Malibu”) [31]¹. When a user pivots on an item, views are reconfigured to display contextual information related to this item, which becomes the focus of the board. The search can be refined by selecting individual tags or manually adding key words. The details box (H) shows information about the current focus item, including the social tags from that focus item (which were used to perform the tag-based search in all of the views), description, and author information. Similar to a browser, the navigation buttons and the history drop-down menu can be used to restore the search results of previous pivot objects.

¹ Tags are used to describe not only bookmarks [27] but also activity records [12] and selected feeds. Note that Malibu leverages user-generated tags from the data sources it aggregates, i.e. the tags are created in those data sources.

The Recommender Prototype in Malibu

Malibu helps users find resources relevant to their activities by aggregating information related to the user’s work from heterogeneous, discrete data sources. Users do not need to encode their search by specifying search terms because search by default is based on the tags of the activity that has been focused in the Navigator (see previous section). While this abbreviated search specification is an improvement, users still need to actively search for related content by pivoting on individual activities in the Navigator in Figure 2 (G). Tagging resources can help knowledge workers remember and organize information. However, tags are not always accurate because they might be added by a user who is not an expert on that topic. Some users may also intentionally add inappropriate tags simply to increase the visibility of some resources [21]. Thus, the results of a simple tag-based search can contain resources far from what the user actually wants. Furthermore, ranking tag-based search results from different data sources is difficult (e.g., [30]).

This paper goes beyond previously reported work [31] to make Malibu more intelligent by automatically identifying the context in which a user is currently working, and also to improve the relevance of search results by going beyond simple tag matching. We developed a new plug-in for Malibu that implements the activity predictor described in this paper. The activity predictor leverages the user’s rich activity information to automatically identify the matching activities. We then use the predicted activities to recommend other resources across the diverse Malibu data sources. This contextual semantic search process is described in more detail in the next section of this paper.

We added a desktop sensing infrastructure to Malibu that monitors applications in focus and publishes that knowledge (application name and title) to interested Malibu components. Our resource recommender plug-in captures this information and extracts keywords to use as input for our activity predictor. Keywords from multiple applications are stored in a queue of configurable size, and we use all unique keywords in the queue to make activity predictions. This process happens invisibly in the background with low computational overhead. The user simply runs the Malibu side bar with the resource recommender.

The resource recommender view provides awareness of predicted activities and recommended resources. Figure 3 shows the resource recommender integrated as an “Activity Prediction” view into Malibu (I). At any time, the list at the top of the view shows the three most likely activities a user may currently be working on (output of the activity predictor). The list is automatically updated every 15 seconds. The recommended resources supported by our system are Lotus Activities, social bookmarks from Dogear [27], and RSS feeds. Figure 3 shows resources of each type in the “Recommended Resources” list (J). The resources are shown in ranked order according to their weighted relevance to all predicted activities (shown in brackets for each item). A user can also focus on an individual activity by selecting the activity and pressing the “Surf” button (K). The activity then becomes the focus of the board (shown in the Navigator in Figure 3) and the list of recommended resources (J) is updated according to the resources’ relevance values to the selected activity, refining the activity predictions.

Figure 3. Resource Recommender.

ACTIVITY-CENTRIC RESOURCE RECOMMENDATION

In this section, we describe our recommendation algorithm in detail. Our approach is different from traditional implicit query methods. Existing implicit query systems (e.g., [10, 15]) make recommendations based on the similarities between the context (such as the title of the window in focus) and the candidate resources. They do not explicitly take into account the activity a user is working on. Our system leverages data from an activity management system to create implicit queries. Assuming that activity data exist, we leverage those data to make semantically meaningful resource recommendations. We recommend resources in a two-step approach: (1) Infer the current activity by sensing the context information (in our prototype, we extract keywords from window title bars); (2) Estimate the relevance of each resource to each activity. The final relevance estimation is weighted by the likelihood of each predicted activity. If the current activity is less likely to be activity y, then a resource’s relevance to y will contribute less in the final relevance of the resource. We use machine learning techniques to train a predictor to infer the current activity and a resource’s relevance.
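To illustrate this two-step flow, the sketch below strings the pieces together; the function names, the window-title keyword extraction, and the queue handling are our own simplifications of the process described above, not the actual prototype code. The predictor and relevance function are introduced in the following subsections.

```python
import re
from collections import deque

# Hypothetical sketch of the two-step recommendation flow described above.
# predict_activities(context) and relevance(resource, activity) stand in for
# the trained collaborative predictor described in the next sections.

def extract_keywords(window_title, stop_words=frozenset({"the", "a", "of", "and"})):
    """Very rough keyword extraction from an application window title."""
    words = re.findall(r"[a-z0-9]+", window_title.lower())
    return [w for w in words if w not in stop_words]

def recommend(window_titles, candidates, predict_activities, relevance, queue_size=5):
    # Step 0: sense context -- keep keywords from the most recent windows in focus.
    queue = deque(maxlen=queue_size)
    for title in window_titles:
        queue.append(extract_keywords(title))
    context = sorted({w for kws in queue for w in kws})

    # Step 1: infer a distribution over the user's activities from the context.
    activity_probs = predict_activities(context)   # {activity: P(activity | context)}

    # Step 2: score each candidate resource, weighting by activity likelihood.
    scores = {
        r: sum(p * relevance(r, y) for y, p in activity_probs.items())
        for r in candidates
    }
    return sorted(candidates, key=scores.get, reverse=True)
```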

Collaborative Activity Prediction

The first step in our approach is to create accurate activity predictions. The Lotus Activities system used in our experiments allows users to organize their work by creating collections of shared resources called activities [24, 29]. Each activity contains shared items (documents or other objects) that are organized into recognizable structures. We assume each user is involved in a set of activities {y1, y2, ..., ym}, and each activity yi has a collection of shared items ℜi. Given the context (e.g., the title of the window in focus) and a resource candidate, standard implicit query systems compute the relevance directly, and information about the actual activity is ignored. In our approach, we compute the relevance conditioned on {ℜ1, ℜ2, ..., ℜm}, and thus activity information is exploited to improve recommendation performance. We accomplish our goal by training a probabilistic classifier based on {ℜ1, ℜ2, ..., ℜm}, which is used to infer the current activity and the resource’s relevance to each activity.

In the Lotus Activities system, each shared item has a name, a description, and tags. We convert these text fragments into a “bag of words”, which means that the origin of a word does not influence the predictor. We then use a stop word list to eliminate words that are very common, and we apply a simple rule-based algorithm to stem English words to their roots [33]. Based on these bags of words, we train a multinomial naive Bayes predictor [26]. The predictor learns a model of the joint probability P(x, y) of the input x (i.e., a bag of words) and the activity y. It makes its predictions by applying Bayes’ rule to calculate P(y|x), i.e. the probability that observation x belongs to a specific activity y. We chose the naive Bayes predictor for two reasons: first, in most real-world tasks such as document classification, naive Bayes performs well compared to other state-of-the-art learning algorithms such as support vector machines (SVMs) [17]; second, to achieve the responsiveness of an interactive desktop application, we needed a method that is efficient in both training and inference. Naive Bayes is fast and allows for efficient incremental training.

Our approach differs from other activity predictors such as TaskPredictor [36] in that we train the predictor with shared activity items created by a group of users, as opposed to personal data only. Knowledge workers spend much time in collaborative activities, and they usually interact with similar objects when executing the same activity. Adding more activity-related items to an activity will benefit not only the creator but also other users, since everyone’s predictor has more information with which to make decisions [27, 35]. Our predictor is collaborative because more than one user contributes to the training data.
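A minimal sketch of such a predictor is shown below. It uses scikit-learn's multinomial naive Bayes and NLTK's Porter stemmer as stand-ins for the components described above; the paper does not specify these libraries, so treat the code as an assumption-laden illustration rather than the authors' implementation.

```python
from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Sketch only: scikit-learn and NLTK are our choices, not the paper's code.
stemmer = PorterStemmer()

def to_bag_of_words(item):
    """Flatten name, description, and tags into one stemmed text blob."""
    text = " ".join([item["name"], item["description"], " ".join(item["tags"])])
    return " ".join(stemmer.stem(w) for w in text.lower().split())

def train_activity_predictor(shared_items):
    """shared_items: list of dicts with 'name', 'description', 'tags', 'activity'."""
    docs = [to_bag_of_words(it) for it in shared_items]
    labels = [it["activity"] for it in shared_items]
    model = make_pipeline(CountVectorizer(stop_words="english"), MultinomialNB())
    model.fit(docs, labels)
    return model   # model.predict_proba(...) yields P(activity | bag of words)
```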

Resource Recommendation

The second step in our approach is to recommend resources related to the current activity of a user. This section describes three approaches for recommending resources. The first is based on the similarity between the context and the resources and does not directly consider the current activity of a user. The second first predicts the activity and then ranks the resources based on their relevance to that activity. The third weights the relevance with the likelihood of the activity. We found that the last algorithm gives significantly better results, as shown in the results section.

LSI-Based Recommendation

A simple recommendation method is to analyze the textual content of objects using Latent Semantic Indexing (LSI) [7, 9]. Unlike traditional bag-of-words similarity metrics based on how many words two documents have in common, LSI projects documents onto a latent semantic subspace and computes similarity as the distance between two documents in that subspace. LSI exploits the semantic relationship between words and documents by applying the singular value decomposition to reduce the dimension of the word-document space. LSI attempts to solve the synonymy and polysemy problems which other information retrieval models have found hard to overcome. Many researchers, such as Dredze et al. [9], have shown that the LSI approach yields better results than a word-based model.

We use LSI to recommend resources as follows. First we build an LSI index containing all shared items seen so far in Lotus Activities. We convert the context c (the title of the window in focus) into a semantic vector c̃ = <e1, ..., ek> using the LSI index. For each resource ri, we also convert it into a vector r̃i = <fi1, ..., fik> using the LSI index. We then rank resource candidates according to their cosine similarities with c. The cosine similarity between c and ri is computed as

Sim(c, ri) = ∑j ej fij / ( √(∑j ej²) · √(∑j fij²) )    (2)

Though LSI can effectively exploit semantic similarity by recovering from synonymy and polysemy, it ignores the activity labels of shared items: it makes no distinction between items from one activity and items from other activities. We can generate more activity-oriented recommendations if we utilize the activity information of the training items.
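For reference, here is a minimal sketch of the LSI-based baseline. Truncated SVD over TF-IDF is one standard way to realize such an index; the paper does not name a specific implementation, so the library choices and function names below are our assumptions.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Sketch only: one common way to build an LSI index and rank by Eq. (2).
def lsi_rank(context_text, resource_texts, n_components=100):
    vectorizer = TfidfVectorizer(stop_words="english")
    X = vectorizer.fit_transform(resource_texts + [context_text])
    svd = TruncatedSVD(n_components=min(n_components, X.shape[1] - 1))
    Z = svd.fit_transform(X)                 # documents in the latent semantic subspace
    c = Z[-1].reshape(1, -1)                 # the context vector
    sims = cosine_similarity(c, Z[:-1])[0]   # cosine similarity to each resource (Eq. 2)
    return np.argsort(-sims)                 # resource indices, most similar first
```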

Prediction-Based Recommendation

We compute the similarity between the context (keywords from window titles) and a resource candidate by estimating their relationship to the user’s activities. A straightforward approach is to compute the resource’s likelihood of belonging to the predicted activity. We use the collaborative predictor to predict the current activity y based on the context. We then estimate the resource’s relevance to y. Let resource ri’s relevance to y be Fy(ri). Given the resource ri, we estimate P(y|ri), the posterior probability that ri is in activity y. We let Fy(ri) = P(y|ri) and rank resources according to these relevance values.

The prediction-based method can exploit the activity information of items and generate more suitable recommendations, as shown later. Moreover, the prediction-based method uses much less CPU time than the LSI-based method. The LSI-based method spends a lot of time finding the LSI index: for example, on a Pentium 4 PC with 1 GB of memory, a training set containing 754 training items and 2170 words needs more than 6 hours to search the LSI index. The naive Bayes predictor takes about 5 seconds to train, much faster than the LSI method.

One problem with this method, though, is that recommendations rely on the prediction of the current activity. Thus, if our prediction of the current activity is wrong, our recommendations can be quite inappropriate.
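A sketch of this ranking step, reusing the naive Bayes model from the earlier sketch (the names are ours, not the prototype's):

```python
import numpy as np

# Sketch: rank candidates by P(y_hat | resource), where y_hat is the activity
# predicted from the context. `model` is the pipeline from the earlier sketch.
def prediction_based_rank(model, context_text, resource_texts):
    y_hat = model.predict([context_text])[0]                 # most likely current activity
    y_idx = list(model.classes_).index(y_hat)
    probs = model.predict_proba(resource_texts)[:, y_idx]    # F_y(r_i) = P(y_hat | r_i)
    return np.argsort(-probs)                                # indices, most relevant first
```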

Expected Relevance Recommendation

In order to overcome the weakness of the straightforward activity-centric resource recommendation, we adjusted our approach to integrate the uncertainty of the current activity prediction into a resource’s relevance. In other words, if the current activity is more likely to be y, then resources highly relevant to y should be more interesting.

For each candidate resource ri and each activity yj, we estimate ri’s relevance to yj. As we mentioned before, we let the relevance Fyj(ri) = P(yj|ri). Instead of only considering the relevance to one specific activity, we take into account the relevance values to all activities. If the current activity is less likely to be yj, ri’s relevance to yj will have a smaller weight. We weight the relevance to activity yj with the likelihood that the current activity is yj. Given the context c, we compute the weighted relevance as

F(c, ri) = ∑j P(yj|c) ⋅ Fyj(ri) = ∑j P(yj|c) ⋅ P(yj|ri)    (3)

We call this the expected relevance. Given the set of activities Y = {y1, y2, ..., ym}, the expected relevance has the following nice characteristics. First, if we are not confident in the prediction of the current activity (which means that ∀yj ∈ Y, P(yj|c) is close to 1/m), the resources most relevant to each activity will have high relevance values; thus, whatever the current activity is, the resources highly relevant to it will be ranked high. Second, if we are confident about the current activity (which means that ∃yj ∈ Y such that P(yj|c) is close to 1), the resources more relevant to it will gain higher relevance. Finally, even if we are confident about the current activity, the resources most relevant to other activities can still have a reasonable relevance value.
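A sketch of Eq. (3) with the same model as above; the matrix product below computes ∑j P(yj|c)·P(yj|ri) for every candidate at once. The function name is ours.

```python
import numpy as np

# Sketch of the expected relevance (Eq. 3): F(c, r_i) = sum_j P(y_j | c) * P(y_j | r_i).
def expected_relevance_rank(model, context_text, resource_texts):
    p_y_given_c = model.predict_proba([context_text])[0]   # shape: (num_activities,)
    p_y_given_r = model.predict_proba(resource_texts)      # shape: (num_resources, num_activities)
    scores = p_y_given_r @ p_y_given_c                      # one expected relevance per resource
    return np.argsort(-scores)                              # indices, most relevant first
```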

EXPERIMENTAL RESULTS

To evaluate our approach, we conducted experiments based on a large data set from an internal Lotus Activities deployment. Our first goal was to get a general sense of how well activity prediction works based on real-world data. In particular, we were interested in the predictive power of the different data fields used for activity prediction. Originally, we had hypothesized that tags, since they are a socially filtered description of the resources, would yield good predictions. We present a preliminary investigation and report baseline prediction performance. The second goal was to evaluate the performance of activity-centric resource recommendation by comparing it to traditional similarity-based recommendation.

Data collection

At the time of this study, Lotus Activities had been deployed within IBM for 7 months with more than 1000 registered users. We used a manageable, semi-random subset from this larger dataset. For our analysis, we used a snapshot of the Lotus Activities database from July 2006. In order to get unbiased results, we removed data from members of the Malibu and Lotus Activities teams. In order to make our estimations more reliable [2], we filtered out one-time users and users with very few activities. For our experiments, we set the size threshold for an activity to 40 shared items and we considered users only if they were members of more than 2 activities with more than 40 items. This reduced our data set to 111 Lotus Activities users. Among these 111 users, we randomly picked the data of 50 users for our analysis. On average, each user participated in 58.8 activities; each activity involved 30.0 users, and in each such activity, 13.2 users contributed at least one shared item.

Activity Prediction Results

Predictive power of fields

Preliminary analyses showed that each user commonly uses different words for tags, names, and descriptions: on average, in an activity item, only 25.36% of tag-words appear in the name, only 27.13% of tag-words appear in the description, and only 28.69% of name-words appear in the description. The small overlap suggests that using different fields can lead to quite different prediction results. We wanted to find out the predictive power of the name, description, and tag data field of an activity item. As mentioned earlier, we were particularly interested in the predictive power of tags since they are shared and applied by multiple users, representing a socially filtered description of an activity item. Previous research has demonstrated the value of social tags as a way to improve search within the enterprise [27]. We conducted a bias-variance analysis [4, 19, 20] on predictors trained with different fields. Bias-variance analysis is a powerful tool to analyze the working mechanism of learning approaches. Given the size of the training set, it decomposes the expected error of a predictor into three components: the intrinsic noise, the variance, and the bias. Intrinsic noise is a lower bound on the expected error of any predictor. Variance measures how much the predictor’s guess fluctuates for different training sets of the same size. Bias measures how closely the average estimate of the predictor is able to approximate the perfect predictor. It reflects the inherent inaccuracy of the predictor. We used the analysis approach proposed by Kohavi and Wolpert [19]. Let x be the observation, Y be the set of possible activities, YP be the random variable representing the predicted activity, and YF be the random variable representing the real activity. The bias can be expressed as:

bias²x = ∑y∈Y [P(YF = y | x) − P(YP = y | x)]²    (1)

Figure 4. Bias values of the three predictors (trained with tags, names, and descriptions, respectively).
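For concreteness, here is a small illustrative sketch of how the bias term of Eq. (1), as written, can be computed once the prediction frequencies over the sampled training sets have been collected. This is our own code, not the authors' estimation procedure.

```python
from collections import Counter

# Sketch: estimate bias^2_x for one test observation x, following Eq. (1).
# true_label is the real activity of x; predictions holds the activity predicted
# for x by each predictor trained on a different sampled training set.
def squared_bias(true_label, predictions, activities):
    n = len(predictions)
    pred_freq = Counter(predictions)
    total = 0.0
    for y in activities:
        p_f = 1.0 if y == true_label else 0.0   # P(Y_F = y | x), a point distribution here
        p_p = pred_freq.get(y, 0) / n           # P(Y_P = y | x), estimated over training sets
        total += (p_f - p_p) ** 2
    return total
```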

We chose items containing at least one tag and split the set of items into two parts by creation time, i.e., D (containing older items) and E (containing newer items). Then 80 training sets are sampled from D, each roughly half of D, to ensure the diversity of these training sets. After that, three predictors (trained with names, descriptions, and tags, respectively) are trained on each of those training sets, and the bias is estimated on E with Eq. 1. When estimating the bias, the predictions are based on the corresponding fields only. The goal is to find the field that yields the most accurate prediction results, assuming the predictor was trained only with that field. We experimented with different sizes of D (60%, 70%, 80%); the results are similar for all configurations.

Figure 4, using 60% of the older items for training, shows that prediction based on tags has the largest bias value while prediction based on names has the smallest bias value. Recall that the bias reflects the inherent inaccuracy of the predictor. This was a surprise to us, since we had thought that tags, which are carefully chosen by users, would yield high prediction accuracy. Perhaps non-expert users create inaccurate tags, users may intentionally add misleading tags in order to increase the visibility of some resources [21], or users may not use the same tags consistently [30]. Tagging provides an efficient way to let users mark a specific resource so they can remember and find it later. However, unless we have an efficient mechanism to filter out “spam” tags or reduce the noise in tag sets, classifying items based on tags could lead to poor results. This also supports our observation that the simple tag-based search in Malibu can return resources not relevant to a user’s interest.

The predictor trained with descriptions has a smaller bias, because descriptions usually contain detailed information about an item and thus provide meaningful information about the activity. However, this also causes the description-trained predictor to have a large hypothesis space, increasing the likelihood of erroneous predictions. Names seem to have the most predictive power, possibly because names of items provide a meaningful abstract containing only necessary but no redundant information.

Figure 5. Precision of the different predictors.

Precision of the collaborative activity predictor

The above analysis suggests that tags and descriptions are less predictive. However, combining all data fields for prediction can improve overall accuracy compared to using names only: the name, tags, and description fields together convey the complete meaning of an item. The recommender prototype described earlier uses a predictor trained with bags of words extracted from the name, description, and tag fields of an item.

In order to avoid distracting Malibu users unnecessarily with incorrect predictions, we decided to adopt a reject-decision approach: given context c, let ŷ = arg max_y P(y|c) be the activity with the highest predicted probability; we refresh the recommender view in Malibu to display the predicted activities and the recommendations only when P(ŷ|c) is larger than a specified threshold, otherwise the recommender keeps silent to avoid distracting the user. For each value of the threshold, we get different values of coverage and precision. Coverage is the number of predictions divided by the number of instances, and precision is the number of correct predictions divided by the number of predictions issued. High values of the threshold usually correspond to low coverage and high precision. In the next section we show that appropriate thresholds can lead to better recommendation results.

We divided the Lotus Activities data into two sets based on creation time. For each user, we trained the system with the older 70% of items and used the more recent 30% of items to evaluate the predictor. We estimated the prediction accuracy by predicting the activity based on only the names of the shared items in the test set (i.e., excluding tags and descriptions). We chose this approach in order to get a more realistic accuracy estimate, because our recommender prototype also makes inferences based only on window titles and a resource’s name; when opening documents, their names usually appear in the window title. Figure 5 shows a precision-coverage curve of the results obtained by systematically varying the threshold. By utilizing all information, our predictor is fairly accurate in predicting activities; for example, we are able to achieve a precision of 85% with a coverage of 60%. Note that when predicting the activity based on other fields or their combinations, the results are similar: training with all fields leads to the most accurate predictions.
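The reject-decision evaluation above can be sketched as a simple threshold sweep; the function and variable names below are ours, and the code assumes the per-instance confidences and correctness flags have already been collected from the predictor.

```python
import numpy as np

# Sketch: precision-coverage curve for the reject-decision approach described above.
# max_probs[i] is P(y_hat | c) for test instance i; correct[i] is True if y_hat was right.
def precision_coverage_curve(max_probs, correct, thresholds):
    max_probs = np.asarray(max_probs)
    correct = np.asarray(correct, dtype=bool)
    curve = []
    for t in thresholds:
        answered = max_probs > t                  # instances where a prediction is shown
        coverage = answered.mean()                # predictions issued / number of instances
        precision = correct[answered].mean() if answered.any() else float("nan")
        curve.append((t, coverage, precision))
    return curve
```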

Resource Recommendation Results

After the analysis of precision, we evaluated our resource recommendation algorithms in a simulated experiment. We trained the recommender with the oldest 70% of items (ordered by creation time) and tested it with the remaining 30% most recent items according to the following simulated process: we randomly picked an activity y and treated it as the current activity; we randomly picked an item c and an item e from y, and treated c as the context and e as the related resource; we then randomly picked n–1 items not belonging to y and ranked e and these n–1 items according to their relevance to c. To reduce the variance, we repeated the experiment 800 times and averaged the results. The average ranking of e given different values of n is plotted in Figure 6. A ranking l means that the relevance is the l-th highest among these n items; a smaller ranking is better, and a ranking of 1 means e was predicted as the most relevant.

Figure 6. The average ranking of the related resource given different sizes of the resource candidate set. Error bars denote 95% confidence intervals.

Figure 6 shows that leveraging the rich activity information can improve the performance of resource recommendation. Both the prediction-based and the expected relevance method yield significantly better results than the LSI-based method. The expected relevance method has the best performance by considering the prediction uncertainty, and the average ranking of e is about 3 when n is 10. This means the related item will usually be ranked among the top 3. In Figure 6 we also show the average ranking when the activity prediction of c is correct. As can be seen, high activity prediction precision leads to better recommendation performance: if the activity prediction is correct, the ranking can be decreased by more than 1 on average. In order to increase prediction accuracy, we set a high threshold value, as discussed in the previous section; if our activity prediction confidence for the context is lower than the threshold, we stop recommending. Additionally, users in Malibu can always manually select an activity and re-rank resources based on their relevance to it, which will increase recommendation performance too.

We show the distribution of the rankings of the related resource with n=10 in Figure 7. Each point on the curve represents the percentage py of related resources which have a ranking ≤ px. Again, the expected relevance method gives the best result: nearly 70% of related resources are ranked among the top 3. If the prediction of the current activity is correct, then more than 85% of related resources are ranked among the top 3.

Figure 7. Distribution of the ranking of the related resource evaluating 10 resources.

CONCLUSION AND FUTURE WORK

This paper presents a new way to automatically find and recommend resources to support knowledge workers’ activities. As a key contribution, we show that leveraging the knowledge of a user’s activity can improve existing implicit query techniques. This new approach goes beyond traditional implicit query approaches [6, 5, 11, 15, 16, 23], as well as conventional methods for monitoring/sensing the user’s desktop work [8, 11, 28], because we do not use observations of a user’s desktop actions (e.g. applications used) directly to generate search queries. Instead, we find a matching activity representation and use that data to recommend related resources. We call this activity-centric resource recommendation. The activity-centric strategy should allow our approach to be used for recommendations in other activity-centric work, e.g., email [3, 9], task management [13, 25], activity management [12, 32], and highly-developed specific domains [1, 18, 34].

We integrated this approach into the user interface of Malibu, an existing experimental application providing a front-end to an activity management system [31]. Our motivation was to improve the manual tag-based search and the search accuracy. Simulations indicate that our approach performs better than the standard implicit query approach, in particular when integrating the uncertainty about the current activity into a resource’s relevance. We analyzed the user data of Lotus Activities, an activity management system, in order to design and evaluate a “good” activity predictor, which provides the foundation for activity-centric resource recommendations. Although using social tags can return satisfying results in direct search [27], to our surprise, social tags alone did not yield high prediction accuracy.

There are a number of areas for future work. The next logical step in our research is to validate our experimental results and our prototype system with real users. We would like to deploy the resource recommender to existing Malibu users, but also to conduct controlled studies in which we ask users to blindly rate resources recommended and not recommended by our algorithm. Per default, the prior probability of activities is proportional to the number of items in an activity. An alternative approach would be to make it inversely proportional to the time since the activity was last modified, because if an activity has not been modified for a long time, it is less likely that the user is working on it. In order to make recommendations, we need a target resource set readily available, and the research results largely depend on the searchable data set. Malibu has data extension points that allow us to query each of its data sources independently. While this allows us to create a searchable data set, it is still unclear how to aggregate the best searchable data. We are exploring alternative ways of compiling the resource candidate set, e.g. by refining the results from implicit queries or external search tools.

ACKNOWLEDGMENTS

We would like to thank the Lotus Activities product team for their continuous support, Lida Li for inspirational research discussions, and all anonymous users of Activities.

REFERENCES

1. J. Bardram. Activity-based computing – lessons learned and open issues. In ECSCW 2005 workshop, Activity: From a theoretical to a computational construct.
2. R. Bekkerman, A. McCallum, and G. Huang. Automatic Categorization of Email into Folders: Benchmark Experiments on Enron and SRI Corpora. UMass CIIR Technical Report IR-418, 2004.
3. V. Bellotti, N. Ducheneaut, M. Howard, and I. Smith. Taking email to task: the design and evaluation of a task management centered email tool. Proc. CHI 2003.
4. L. Breiman. Bias, variance, and arcing classifiers. Technical Report 460, Statistics Department, UC Berkeley, 1996.
5. J. Budzik and K. Hammond. Watson: Anticipating and contextualizing information needs. Proc. ASIS, 1999.
6. M. Czerwinski, S. Dumais, G. Robertson, S. Dziadosz, S. Tiernan, and M. Dantzich. Visualizing implicit queries for information management and retrieval. Proc. CHI-99.
7. S. Deerwester, S. Dumais, T. Landauer, G. Furnas, and R. Harshman. Indexing by latent semantic analysis. JASIS, 41(6):391–407, 1990.
8. A. N. Dragunov, T. G. Dietterich, K. Johnsrude, M. McLaughlin, L. Li, and J. L. Herlocker. TaskTracer: A desktop environment to support multi-tasking knowledge workers. Proc. IUI-05.
9. M. Dredze, T. Lau, and N. Kushmerick. Automatically classifying emails into activities. Proc. IUI-06.
10. S. Dumais, E. Cutrell, R. Sarin, and E. Horvitz. Implicit queries (IQ) for contextualized search. In Proc. 27th SIGIR, 2004.
11. N. Friedman. Dashboard. http://nat.org/dashboard/.
12. W. Geyer, M. Muller, M. Moore, E. Wilcox, L.-T. Cheng, B. Brownholtz, C. Hill, and D. R. Millen. ActivityExplorer: Activity-Centric Collaboration from Research to Product. IBM Systems Journal, 45(4):713–738, 2006.
13. V. M. Gonzalez and G. Mark. “Constant, constant, multi-tasking craziness”: managing multiple working spheres. Proc. CHI 2004.
14. J. Goodman and V. Carvalho. Implicit queries for email. In Proc. CEAS 2005, July 2005.
15. M. Henzinger, B. Chang, B. Milch, and S. Brin. Query-free news search. World Wide Web, 8(2):101–126, 2005.
16. D. Hilbert, D. Billsus, and L. Denoue. Seamless Capture and Discovery for Corporate Memory. Proc. WWW 2006.
17. T. Joachims. Learning to Classify Text Using Support Vector Machines. Kluwer Academic Publishers, 2001.
18. V. Kaptelinin. UMEA: translating interaction histories into project contexts. Proc. CHI 2003.
19. R. Kohavi and D. Wolpert. Bias plus variance decomposition for zero-one loss functions. Proc. ICML-96.
20. E. B. Kong and T. G. Dietterich. Error-correcting output coding corrects bias and variance. Proc. ICML-95.
21. G. Koutrika, F. A. Effendi, Z. Gyöngyi, P. Heymann, and H. Garcia-Molina. Combating spam in tagging systems. Proc. AIRWeb '07.
22. N. Kushmerick and T. Lau. Automated email activity management: an unsupervised learning approach. Proc. IUI-05.
23. H. Lieberman. Letizia: An Agent That Assists Web Browsing. Proc. IJCAI-95.
24. Lotus Connections: Social Software for Business. http://www-142.ibm.com/software/sw-lotus/products/product3.nsf/wdocs/connectionshome.
25. G. Mark, V. M. Gonzalez, and J. Harris. No task left behind? Examining the nature of fragmented work. Proc. CHI 2005.
26. A. McCallum and K. Nigam. A comparison of event models for naive Bayes text classification. AAAI-98 Workshop on Learning for Text Categorization, 1998.
27. D. R. Millen, M. Yang, S. Whittaker, and J. Feinberg. Social bookmarking and exploratory search. Proc. ECSCW 2007.
28. T. M. Mitchell, S. H. Wang, Y. Huang, and A. Cheyer. Extracting knowledge about users’ activities from raw workstation contents. Proc. AAAI-06, 2006.
29. M. Moore, M. Estrada, T. Finley, M. Muller, and W. Geyer. Next generation activity-centric computing. Proc. CSCW 2006.
30. M. J. Muller. Comparing tagging vocabularies among four enterprise tag-based services. To appear: Proc. ACM GROUP 2007.
31. M. J. Muller, W. Geyer, B. Brownholtz, C. Dugan, D. R. Millen, and E. Wilcox. Tag-Based Metonymic Search in an Activity-Centric Aggregation Service. Proc. ECSCW 2007, Limerick, Ireland, Sept. 2007.
32. T. P. Moran. Activity: Analysis, design, and measurement. In the Symp. Found. Int. Design, 2003.
33. M. Porter. An algorithm for suffix stripping. Program, 14(3):130–137, 1980.
34. D. Quan and D. Karger. Haystack: Metadata-enabled information management. Proc. UIST 2003.
35. S. Sen, S. K. Lam, A. M. Rashid, D. Cosley, D. Frankowski, J. Osterhouse, F. M. Harper, and J. Riedl. Tagging, communities, vocabulary, evolution. Proc. CSCW 2006.
36. J. Shen, L. Li, T. Dietterich, and J. Herlocker. A hybrid learning system for recognizing user tasks from desktop activities and email messages. Proc. IUI-06.
