InfoSlim: An Ontology-Content Based Personalized ...


InfoSlim: An Ontology-Content Based Personalized Mobile News Recommendation System

Feng GAO, Yuhong LI

Li HAN, Jian MA

State Key Laboratory of Networking and Switching Technology Beijing University of Posts and Telecommunications Beijing, 100876, P.R.China [email protected]; [email protected]

Nokia Research Center Beijing Beijing, 100176, P.R.China [email protected]; [email protected]

Abstract—With the development of mobile and wireless networking technologies, users can access information online anytime and anywhere. However, this also leads to information explosion. The limited time and energy of users, the unstable nature of wireless links, and the relatively poor performance of mobile devices make it impossible for either users or mobile devices to consume such a huge amount of information. To save limited resources and free users from information overload, personalized recommendation is a good solution for both mobile devices and end users. This paper proposes a novel personalized news recommendation system named InfoSlim. The system uses semantic techniques to annotate news items and user preferences, adding rich metadata to the traditional keyword vector. As a result, the similarity between an item profile and a user profile can be measured not only by a lexical-level, cosine-based method but also by a semantic-level, ontology-based method. This recommendation method can improve the accuracy of recommendation, better reflect users' interests, and save mobile resources.

Keywords-mobile application; personalized recommendation; ontology-content based

I. INTRODUCTION

Nowadays, with the dramatic growth of the World Wide Web, the way people obtain and share information and knowledge has changed. Although books, newspapers, and encyclopedias are still valuable sources of knowledge, no one can deny that the WWW is the most efficient and fastest means of finding the information we really need. Meanwhile, mobile devices such as mobile phones, PDAs, and Internet tablets have become an important part of daily life. Many tasks, such as receiving and sending email or updating Twitter, can now be done on these devices, and it would be even more attractive if users could subscribe to and read online information anytime, anywhere. However, such ubiquitous information access faces its own problems. On the one hand, with the information explosion, the increasing rate of news updates and the variety of content types challenge the limits of human processing capability: users lack the time and energy to read all the information they receive from the WWW. On the other hand, the mobile environment is not powerful enough to handle such a huge amount of information. Wireless links are clearly inferior to fixed links: they are unstable and cannot guarantee that all information is received correctly and completely by mobile devices. Moreover, the limited performance and resources of mobile devices are also key problems. Current information services in wired environments require devices with powerful computing performance, adequate energy, and ample storage. Unfortunately, mobile devices cannot meet all of these requirements, and they must take bandwidth, network stability, battery status, and computing performance into account when receiving, storing, and presenting large amounts of information. In this problem domain, personalized recommendation has already attracted the attention of developers and end users, not only because it meets users' needs, but also because it helps overcome the limitations of mobile devices. Therefore, in this study we design and implement a novel personalized news recommendation system for mobile devices to address the problems mentioned above, using methods that differ from previous work. The system combines traditional text analysis and cosine-based similarity measurement with a semantic technique that uses an ontology to model information, exposing much more of its implicit content and relationships in a form machines can understand. In this way, we expect the novel personalized news recommendation system, InfoSlim, to solve these problems and to improve recommendation accuracy by eliminating redundant information.

The rest of the paper is organized as follows. Section II reviews previous work on personalized reader systems. Section III introduces the architecture of the whole system. Section IV discusses the representation of profiles in detail. Section V explains the algorithms we use to mine user interest, and Section VI describes how the recommendation engine works. Finally, Section VII concludes the paper and outlines future work.

II. RELATED WORK

In our study, we mainly use content-based recommendation approaches to construct the whole system. Such approaches are based on information retrieval and use many different techniques to compare news profiles with user profiles, the latter containing information about the users' content-based tastes. Many approaches to measuring similarity have been proposed. One major approach is to use the TF-IDF vector text analysis technique [1] together with a cosine-based similarity measure [1], [2]. Newsdude [3] is a personal news agent that uses separate models for short-term and long-term interests. This information is stored as TF-IDF vectors, and similarity is measured with the cosine measure. Moreover, it uses the Nearest Neighbor algorithm [4] to infer new information and make recommendations from the results of the similarity measurement. Such text analysis techniques, however, take only lexical similarity into account. Thanks to the Semantic Web [5], ontology-based representations and similarity measures were introduced into this field to provide users with semantic-level similarity measurement. The News@hand [6] system uses an ontology model to represent both item and user profiles. That study mainly addresses how to use semantic techniques to annotate information, how to populate the models with large amounts of information from Wiki, and how to analyze content and represent it with an ontology. Moreover, the system also takes context information into account to help build a semantic, context-aware recommendation engine. The ePaper [7] system also uses an ontology. Unlike News@hand, however, its main contribution is an ontology-based similarity measure. This method represents item and user profiles with a three-level ontology model and then compares the item model with the user model by a semantic method that takes the instance-class relationship between two words into account. However, because it is hard to map all real-world concepts into an ontology model, the accuracy of such ontology-based methods cannot satisfy most users.

In the InfoSlim study, we try to combine the merits of the above two methods in a composite recommendation mechanism. We revise the ontology-based algorithm in [7], use semantic techniques to model profiles as in [6], and integrate the traditional cosine-based similarity measure with the ontology-based method, so as to improve recommendation accuracy by taking both the lexical and the semantic aspects of profiles into account.

III. OVERVIEW OF INFOSLIM

InfoSlim, as a content-based news recommendation system, exploits the content a user has read to find what the user is really interested in, and then compares the user profile with news items to determine which items should be recommended. Unlike previous works, however, the system does not use a topic selection mechanism that lets users specify which news they are interested in. Instead, it provides a tag-based method. By using tags, users can narrow the information space, avoid receiving too much news they do not care about, and improve the accuracy of the recommendation results. Moreover, InfoSlim uses semantic techniques to annotate user profiles and news items, making concepts machine-readable and expressing the semantic relations between concepts in a natural-language way. Such ontology-based concept representations of user profiles and news items are also less ambiguous than traditional keyword-based or item-based models, providing an adequate grounding for coarse- to fine-grained features. Furthermore, the inference mechanism offered by the ontology can be used to enhance content retrieval. A user interested in natural disasters (the superclass of hurricanes) is also recommended items about hurricanes. Conversely, a user interested in skiing and snowboarding can be inferred, with a certain confidence, to be interested in other winter sports. Similarly, a user fascinated by the lives of actors can be recommended items in which the name Brad Pitt appears, because that person can be an instance of the class actor.

Fig. 1 shows how the whole system works. The remote server collects news data from various sources such as BBC and CNN and annotates the news data to generate item profiles. It then encapsulates each pair of raw news data and item profile into a news package and forwards it to a local server. The local server receives and stores the news packages, then broadcasts them to clients. On the client side, InfoSlim first parses each news package and separates the raw news data from the item profile. The recommendation engine then scores each news item by comparing the user profile with the item profile. Finally, InfoSlim ranks the news items by these scores and presents them to the end user in that order.

Figure 1. The Architecture of InfoSlim

IV. REPRESENTATION OF PROFILES

The core concept in InfoSlim profiles is the meta-tag, a novel representation that combines a keyword (also called a tag) with the keyword's metadata. Each tag in a profile is stored as a (meta-tag, weight) pair, and the metadata can be separated from the raw tag for special use. The following three subsections introduce the meta-tag, the item profile, and the user profile in detail.
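As an illustration, a meta-tag and a profile vector of (meta-tag, weight) pairs could be held in memory as follows. This is a minimal Python sketch, not the InfoSlim implementation. The names `MetaTag`, `Profile`, and `raw_tags` are hypothetical, the RDF file is replaced by a tuple listing the tag's ancestors from the root of its ontology tree down to the tag itself, and the ancestor path given for the tag china is invented for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class MetaTag:
    """A tag plus its ontology metadata (hypothetical in-memory form).

    `path` lists the ancestor classes from the root of the tag's ontology
    tree down to the tag itself, so the tree depth K is len(path).
    InfoSlim keeps this metadata in an RDF file; a tuple stands in for it here.
    """
    tag: str
    path: Tuple[str, ...]

    def __post_init__(self) -> None:
        if not self.path or self.path[-1] != self.tag:
            raise ValueError("path must end with the tag itself")

    @property
    def level(self) -> int:
        return len(self.path)


# An item profile V_I or a user profile V_U: a vector of (meta-tag, weight) pairs.
Profile = List[Tuple[MetaTag, float]]


def raw_tags(profile: Profile) -> List[Tuple[str, float]]:
    """Separate the metadata from the raw tags, keeping (tag, weight) pairs."""
    return [(meta.tag, weight) for meta, weight in profile]


# The paper's example (china, 0.634); the ancestor path is invented for illustration.
china = MetaTag("china", ("location", "asia", "china"))
item_profile: Profile = [(china, 0.634)]
print(raw_tags(item_profile))  # [('china', 0.634)]
```

Keeping the path next to the tag lets a lexical comparison work on `raw_tags()` while an ontology comparison walks `path`.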

A. What Is a Meta-Tag
To generate a meta-tag, each tag in an item profile or user profile is first annotated with semantic techniques. That is, the tag's parent and ancestor nodes are specified by semantic annotation and recorded in an RDF file. In InfoSlim, we assume that each tag has exactly one root ancestor, so each tag belongs to exactly one K-level ontology tree. A tag with such metadata, which can be visualized as an ontology tree, is called a meta-tag.

B. Representation of Item Profile
When InfoSlim periodically retrieves news items from well-known news sources to the remote server, the remote server uses semantic tools to annotate the news items. Typically, this process involves natural language processing, a semantic engine, model population, and so on. (Because annotation is not the main topic of this paper, its implementation details are not explained here.)

After annotation, the raw news items are converted into meta-news items that contain rich metadata about their content. The remote server then separates the metadata from each meta-news item to generate the item profile in the form V_I = ((T1, W1), (T2, W2), (T3, W3), …). Each element of the item profile contains a tag with metadata (a meta-tag) in RDF format, together with a relevance score (its weight) that reflects how important and relevant the tag is to the news item. For example, the tag china can be represented as (china, 0.634), where china is linked to an RDF file containing the tag and its metadata, and 0.634 is its relevance score, or weight, for the news item.

C. Representation of User Profile
The user profile is an essential component of a recommendation system. It consists of a set of models reflecting what the user may want to read, and it can be stored on the client, on the server, or on both. To reduce data transfer from client to server, InfoSlim stores the user profile only on the client side. The recommendation engine scores each news item according to the similarity between the user profile and the item profile. The user profile is likewise represented as a vector of (meta-tag, weight) pairs, V_U = ((T1, W1), (T2, W2), (T3, W3), …), where T is a meta-tag and W is its weight. The user profile is constructed by mining user interest, either implicitly by analyzing the Reading History model or explicitly by recording user input; the details are explained in the next section.

V. MINING USER INTEREST

A user's interest in news can be specified explicitly by tags or inferred implicitly by analyzing the user's reading history. As mentioned before, InfoSlim uses a tag mechanism to subscribe to news items. The words a user explicitly types into the system are first annotated and then added to the user profile with an initial weight of 0.5, which marks the tag as neutral for now; its weight will come to reflect the user's interest as the system observes the user's reading history.

However, a user's interest is always dynamic and cannot be completely specified by the user, so InfoSlim also infers tags from the reading history to complete this model. To deduce new tags, InfoSlim periodically calculates, while running, a weight for each candidate tag in the Reading History model and then picks the top n. The following two subsections describe the Reading History model, which monitors and analyzes what users have read, and how the user profile is constructed from it.

A. Reading History Model
User reading history is an important source of user interest: new interest tags can be deduced by mining the news items a user has read. This subsection explains how the reading history is monitored and how the score of each tag is calculated.

The structure of the Reading History model is shown in Table I. Its fields are populated each time the user reads a news item. When the user reads a news item without marking it as "no interest", the system picks out all tags in the news item whose relevance score is greater than 0.3 and adds them to the model. Each time a tag is added to the model, its frequency field is increased by one, and its score is computed as follows:

    If tag.frequency == 1 Then
        tag.score = tag.relevance_score
    Else
        tag.score = (tag.score * (tag.frequency - 1) + tag.relevance_score * newsitem.score) / tag.frequency

where newsitem.score is the final score of the news item calculated by the recommendation engine. The Reading History model also records how many news items have been analyzed; this value is used later when constructing the user profile.

TABLE I. THE STRUCTURE OF READING HISTORY MODEL

    Field       Type
    tag         varchar
    frequency   integer
    score       float

B. Updating User Profile
To update the user profile, InfoSlim periodically mines new tags and updates existing tags by analyzing the Reading History model, as follows.

Step 1. In the Reading History model, compute the weight W_i of each candidate tag i according to (1):

$$W_i = \log\frac{N}{M_i} \times \frac{S_i}{\max_{1 \le j \le N} S_j} \qquad (1)$$

where M_i is the number of times candidate tag i appears in the Reading History model, N is the total number of news items analyzed in the Reading History model, and S_i is the score of tag i. As in the TF-IDF method, the factor log(N/M_i) identifies tags that appear in nearly every document and reduces their weight in the model.

Step 2. Rank the tags by weight.

Step 3. The number of tags chosen depends on the value of M, which satisfies (2):

$$0.6 \le \log\frac{N}{M} \le 1 \qquad (2)$$
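The reading-history update rule and Steps 1 to 3 can be sketched in Python as follows. This is a minimal, hypothetical illustration rather than InfoSlim's implementation, and the class and function names are invented. It assumes that M_i in (1) equals the tag's frequency field (the number of analyzed items that contributed the tag), that the logarithm is base 10 (the paper does not state the base), that the maximum in (1) is taken over all tags in the model, and that Step 3 keeps the largest integer M satisfying (2).

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple

RELEVANCE_THRESHOLD = 0.3  # only tags with relevance > 0.3 enter the model


@dataclass
class HistoryEntry:
    """One row of the Reading History model (Table I): tag -> frequency, score."""
    frequency: int = 0
    score: float = 0.0


class ReadingHistory:
    def __init__(self) -> None:
        self.entries: Dict[str, HistoryEntry] = {}
        self.items_analyzed = 0  # N in equation (1)

    def record(self, item_tags: List[Tuple[str, float]], item_score: float) -> None:
        """Add one news item the user read without marking it "no interest".

        item_tags: (tag, relevance score) pairs from the item profile.
        item_score: the item's final score from the recommendation engine.
        """
        self.items_analyzed += 1
        for tag, relevance in item_tags:
            if relevance <= RELEVANCE_THRESHOLD:
                continue
            entry = self.entries.setdefault(tag, HistoryEntry())
            entry.frequency += 1
            if entry.frequency == 1:
                entry.score = relevance
            else:
                # running average of relevance * item score, as in the update rule
                entry.score = (entry.score * (entry.frequency - 1)
                               + relevance * item_score) / entry.frequency

    def weights(self) -> Dict[str, float]:
        """Step 1, equation (1), with M_i = frequency and a base-10 log (assumed)."""
        if not self.entries:
            return {}
        s_max = max(e.score for e in self.entries.values())
        n = self.items_analyzed
        return {
            tag: math.log10(n / e.frequency) * e.score / s_max
            for tag, e in self.entries.items()
        }


def tag_count(n_items: int) -> int:
    """Step 3 (one reading): largest M with 0.6 <= log10(N / M) <= 1, else 0."""
    upper = math.floor(n_items / 10 ** 0.6)  # from log10(N / M) >= 0.6
    lower = math.ceil(n_items / 10)          # from log10(N / M) <= 1
    return upper if upper >= lower else 0


def select_tags(history: ReadingHistory) -> List[Tuple[str, float]]:
    """Steps 2 and 3: rank candidate tags by weight and keep the top M."""
    ranked = sorted(history.weights().items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:tag_count(history.items_analyzed)]
```

Under this reading, no integer M satisfies (2) when fewer than four items have been analyzed, so `select_tags` returns an empty list until enough reading history has accumulated. For N = 100 it keeps the top 25 tags.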

Step 4. Add the tags picked from the Reading History model to the interest model. If a tag is already in the interest model, update its weight with the new value from the Reading History model. Then re-rank the interest model by tag weight.

VI. HOW RECOMMENDATION ENGINE WORKS

The traditional method that previous works use to perform recommendation is the cosine-based similarity measure in (3) [8], where c represents a user, s represents a news item, and the function u(c,s) is called the utility function. W_c and W_s are the two TF-IDF vectors representing the user profile and the item profile.

$$u(c,s) = \cos(W_c, W_s) = \frac{W_c \cdot W_s}{\lVert W_c \rVert_2 \times \lVert W_s \rVert_2} = \frac{\sum_{i=1}^{K} W_{i,c} W_{i,s}}{\sqrt{\sum_{i=1}^{K} W_{i,c}^2}\,\sqrt{\sum_{i=1}^{K} W_{i,s}^2}} \qquad (3)$$

In InfoSlim (Fig. 2), however, the introduction of ontology techniques means that the relationship between a class and an instance, or between two instances, should also be considered. Therefore, (3) is modified to (4) to improve the accuracy of the result:

$$u(c,s) = p \cdot \cos(W_c, W_s) + (1 - p) \cdot O(c,s) \qquad (4)$$

Figure 2. The Process of Recommendation

The coefficient p in (4) is the match weight, which determines how much each function contributes to the utility function. Its value depends on the degree of similarity match: weak match (p = 0.15), close match (p = 0.45), or perfect match (p = 0.8). Because the recommendation process runs the cosine-based similarity measure first, the result of the cosine function determines which level is selected: a result below 0.3 selects weak match, a result above 0.6 selects perfect match, and anything in between selects close match.

The ontology-based similarity measure, denoted O(c,s) in (4), takes the semantic relationship between meta-tags into account to determine how similar the item profile and the user profile are. It compares the meta-tags in the item profile with those in the user profile one by one, using the meta-tag comparison algorithm described below, and computes the final result according to (5), where N is the total number of tags in the item profile, M is the total number of tags in the user profile, and T^s_j and T^c_q denote the j-th meta-tag of the item profile and the q-th meta-tag of the user profile:

$$O(c,s) = \frac{\sum_{1 \le j \le N} \sum_{1 \le q \le M} S(T^{s}_{j}, T^{c}_{q})}{N \times M} \qquad (5)$$

The function S in (5) compares two meta-tags. To make the process clear, Fig. 3 shows an example of two meta-tags visualized as two trees, the I-tree and the U-tree. I_K and U_L denote nodes in the two trees, where the subscript is the node's level; the I-tree has K levels and the U-tree has L levels. The two trees are compared from top to bottom, node by node, according to the following pseudocode:

    Begin:
    For g = 1 to Max(K, L)
        If I_g != U_g then break
    If g == 1 then return 0
    If g == Max(K, L) + 1 then return 1
    If g == K + 1 then return (K / L) × 0.15
    If g == L + 1 then return 0.8
    Otherwise return (K / L) × 0.45
    End

As shown above, the process compares the two trees from top to bottom, node by node; a node missing from the shorter tree counts as a mismatch. If the roots of the two trees differ, the two tags are not similar and the result is 0; if the two trees are identical, the result is 1. If the U-tree has been completely scanned when the loop breaks, the tag in the item profile is more specific than the tag in the user profile, so the result is a perfect match: 0.8. Similarly, if the I-tree has been completely scanned, the tag in the item profile is more general than the tag in the user profile, so the result is a weak match: (K/L) × 0.15. In the remaining case, where the trees diverge below the root before either is exhausted, the result is a close match, calculated as (K/L) × 0.45.

Figure 3. Example of ontology-trees for meta-tags

An ontology-based similarity measure is the third contribution of this paper. It compares two meta-tags by comparing each node in their ontology trees. Finally, we modify the traditional utility function used for recommendation. Combining the traditional cosine-based method with the new ontology-based method improves recommendation accuracy and compensates for a limitation of the cosine-based method, which can match only identical keywords and cannot identify similar ones. In this way, the novel recommendation algorithm can better meet the user's interests.

After this two-phase similarity measurement, the recommendation engine uses the utility function to calculate the final score of each news item, then ranks all news items and presents them to the user in that order.
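The two-phase scoring described in this section can be sketched in Python as below. This is a minimal illustration under stated assumptions, not InfoSlim's code. The lexical phase works on plain dictionaries that map each tag to its TF-IDF weight, and each meta-tag is reduced to its ancestor path from the root class down to the tag. Equation (4) is taken as u(c,s) = p * cos(W_c, W_s) + (1 - p) * O(c,s). The depth factors in the weak-match and close-match branches (K/L × 0.15 and K/L × 0.45) follow one reading of the comparison pseudocode in this section and should be treated as assumptions. All function names are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

Path = Tuple[str, ...]  # hypothetical: ontology branch from root class down to the tag


def cosine(wc: Dict[str, float], ws: Dict[str, float]) -> float:
    """Equation (3): cosine similarity of two sparse TF-IDF vectors."""
    dot = sum(w * ws[t] for t, w in wc.items() if t in ws)
    norm_c = math.sqrt(sum(w * w for w in wc.values()))
    norm_s = math.sqrt(sum(w * w for w in ws.values()))
    if norm_c == 0.0 or norm_s == 0.0:
        return 0.0
    return dot / (norm_c * norm_s)


def compare_meta_tags(item: Path, user: Path) -> float:
    """Function S: compare two ontology branches top-down, node by node."""
    k, l = len(item), len(user)
    g = 1
    while g <= max(k, l):
        # a node missing from the shorter branch counts as a mismatch
        if g > k or g > l or item[g - 1] != user[g - 1]:
            break
        g += 1
    if g == 1:
        return 0.0             # different roots: unrelated tags
    if g == max(k, l) + 1:
        return 1.0             # identical branches
    if g == k + 1:
        return k / l * 0.15    # I-tree exhausted: item tag is more general
    if g == l + 1:
        return 0.8             # U-tree exhausted: item tag is more specific
    return k / l * 0.45        # branches diverge below the root


def ontology_similarity(item_tags: Sequence[Path], user_tags: Sequence[Path]) -> float:
    """Equation (5): average of S over all N x M (item tag, user tag) pairs."""
    if not item_tags or not user_tags:
        return 0.0
    total = sum(compare_meta_tags(i, u) for i in item_tags for u in user_tags)
    return total / (len(item_tags) * len(user_tags))


def match_weight(cos_value: float) -> float:
    """Pick p from the cosine result: weak, close, or perfect match."""
    if cos_value < 0.3:
        return 0.15
    if cos_value > 0.6:
        return 0.8
    return 0.45


def utility(wc: Dict[str, float], ws: Dict[str, float],
            user_tags: Sequence[Path], item_tags: Sequence[Path]) -> float:
    """Equation (4) as assumed here: u = p * cos + (1 - p) * O."""
    cos_value = cosine(wc, ws)  # phase 1: lexical similarity
    p = match_weight(cos_value)
    # phase 2: semantic similarity, weighted by 1 - p
    return p * cos_value + (1 - p) * ontology_similarity(item_tags, user_tags)


# Hypothetical example: the user follows "hurricane", the item is tagged "katrina".
user_paths = [("disaster", "hurricane")]
item_paths = [("disaster", "hurricane", "katrina")]
print(utility({"hurricane": 1.0}, {"katrina": 1.0}, user_paths, item_paths))
```

Running the lexical phase first matters because its result selects p. In the example, the two profiles share no keywords, so the cosine is 0 and p falls to 0.15. The ontology phase then supplies most of the score, which is exactly the case the semantic measure is meant to cover.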

VII. CONCLUSION AND FUTURE WORK

A. Contribution of InfoSlim
This paper describes a novel personalized news recommendation system named InfoSlim. It uses the user's reading history and semantic relations to help the system automatically learn what the user is really interested in and to infer what will attract the user. The paper makes several contributions toward improving the quality and accuracy of interest-based recommendation. The first contribution is a novel representation of profiles: we use semantic techniques to add rich metadata to profiles, making their tags machine-readable, and this metadata form is also the basis of the ontology-based similarity measure. Second, we present a new algorithm for mining user interest. InfoSlim combines the user's reading history with the user's interests to find what the user is really interested in: it mines what the user has read, computes tag weights, and ranks tags by weight to infer what will attract the user.

B. Future Plan
We have already implemented a simple prototype to test the ideas in this study. In the future, we will perform several tests on this prototype to evaluate the meta-tag form, the ontology-based similarity measure, and the novel recommendation algorithm. This evaluation should also reveal problems in our methods, and we will continue to study how to mine user interest more effectively and efficiently in the mobile environment. How to handle large volumes of information on a mobile device, including the Reading History model, the user profile, and news items, will also be addressed in future work.

REFERENCES

[1] G. Salton, Automatic Text Processing. Addison-Wesley, 1989.
[2] R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval. Addison-Wesley, 1999.
[3] D. Billsus, "A Personal News Agent that Talks, Learns and Explains," in Proc. 3rd International Conference on Autonomous Agents (Agents 1999), pp. 268-275, 1999.
[4] J. Allan, J. G. Carbonell, G. Doddington, J. Yamron, and Y. Yang, "Topic Detection and Tracking Pilot Study Final Report," in Proc. DARPA Broadcast News Transcription and Understanding Workshop, Lansdowne, Virginia, 1998.
[5] T. Berners-Lee, J. Hendler, and O. Lassila, "The Semantic Web," Scientific American, vol. 284, no. 5, pp. 34-43, May 2001.
[6] I. Cantador, A. Bellogín, and P. Castells, "A Semantic Web Approach to Recommending News," in Proc. 5th International Conference on Adaptive Hypermedia and Adaptive Web-Based Systems.
[7] V. Maidel, P. Shoval, B. Shapira, and M. Taieb-Maimon, "Evaluation of an Ontology-Content Based Filtering Method for a Personalized Newspaper," in Proc. 2008 ACM Conference on Recommender Systems.
[8] G. Adomavicius and A. Tuzhilin, "Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, June 2005.

RBPR: Role-based Bayesian Personalized Ranking for ...