A Graph-based Collaborative and Context-aware Recommendation System for TV Programs

Emrah Şamdan
Dept. of Computer Engineering, METU, Ankara, Turkey
[email protected]

Arda Taşcı
Dept. of Computer Engineering, METU, Ankara, Turkey
[email protected]

Nihan Çiçekli
Dept. of Computer Engineering, METU, Ankara, Turkey
[email protected]

ABSTRACT
With the increasing number of TV programs and the integration of broadcasting and the Internet in smart TVs, users find it difficult to select the most appealing TV program among the many available. User decisions are strongly influenced by the contextual properties of programs, such as the time of day and the genre of the program. In this paper, we present a graph-based, context-aware collaborative recommender system. To measure the effectiveness of the context variables, we evaluate both context-free and contextual graph-based methods with the same metrics. The results indicate that context yields better recommendations.

Keywords
Recommender systems, context-aware recommendation, TV program recommendation.

1. INTRODUCTION
With hundreds of channels broadcasting on today's televisions, content selection has become a difficult task for TV users. Viewers must choose the most appealing program among thousands of items that will become available within a few hours. The advent of Connected TV has enabled consumer electronics manufacturers and users to record TV usage preferences. With these recorded preferences, the problem of finding the most interesting content can be addressed by a TV recommender system. Classical recommender algorithms generally operate on the User × Item space, with ratings given for some (user, item) pairs. After the initial ratings are collected, the recommendation function estimates the unknown ratings:

R: User × Item → Rating

If the dataset contains context variables that can be embedded into the recommendation algorithm, the rating function is redefined with context as follows:

R: User × Item × Context → Rating

Contextual information can be any information related to both

the data and the targeted user. For example, the mood of the user, the genre of the program, or the time slot of the day at the time of recommendation can serve as context variables in context-aware recommender systems. Context variables can be used in recommender algorithms in three ways: to pre-filter the candidate programs, to model the user, and to re-rank or filter the recommendation list after it is generated [1]. Graph-based recommender systems are generally used for connected data, such as social media data or professional networks [2]. By taking random walks from an initial node, these algorithms converge to the most relevant nodes in the graph. By restricting the type of the initial node to a user node and the type of the final node to an item node, random walks can yield recommendations from the graph data [3]. In this paper, we present a graph-based recommendation algorithm that uses context-aware pre-filtering to generate top-10 recommendations. We construct a tri-partite (user-program-term) graph from the real channel usage data of Arçelik1 Connected TV users collected between October 2013 and January 2014, and investigate the effect of context-aware filtering by pruning the graph according to a given context. To find similar users, we take random walks from the target user on the graph. Our results show that random walks with context-aware filtering produce better results than context-free random walks. The paper is organized as follows: Section 2 discusses related work. Section 3 describes the collected data and the data pre-processing tasks. Section 4 presents the similarity functions used and the details of our approach. Section 5 presents the experiments and the evaluation results. Section 6 gives the conclusion and future work.

2. RELATED WORK
Recommender algorithms can be classified into two main groups according to the method they are based on: content-based recommender systems and recommender systems based on collaborative filtering. There are also hybrid approaches that combine the two methods in various ways [4]. All of these methods have been used in the TV domain as well. In content-based recommender systems, items are generally represented by features extracted from the content. In the television domain, it is hard to extract features from the video and audio streams because doing so requires semantic interpretation of the streams [5]. Therefore, the most commonly used sources for feature extraction are textual sources such as the EPG (Electronic Program Guide). These textual sources

1 http://www.arcelik.com.tr/

are usually transformed into feature vectors using the Bag of Words (BOW) approach. Bambini et al. used LSA (Latent Semantic Analysis) together with the bag of words approach for automatic indexing and searching of EPG documents [6]. In our work, we also use the BOW approach to represent the content; however, we place the words as term nodes in the graph connected to program nodes, whereas Bambini et al. used them as attributes in a feature vector. Collaborative recommender systems usually comprise three steps: user profiling, user clustering and collaborative filtering. In the work by Kim et al., user profiles are built with a scoring technique called CF-IUF (category frequency-inverse user frequency), a modification of the well-known information retrieval concept TF-IDF [7]. For collaborative filtering, there are various methods, such as k-nearest neighbor and the Pearson and Spearman correlation coefficients [8]. Kim et al. investigated the Pearson and Spearman correlation coefficients and found that the Spearman coefficient performs slightly better [9]. In this paper, we use the k-nearest neighbor technique, after conducting tests to find the best k for our dataset. Most current recommender systems designed for TV programs disregard the notion of context and operate only on the two-dimensional User × Item space. In the research conducted by da Silva et al., a contextual user profile is the aggregation of the user's contextual information, the user's personal data profile, and the genres of TV programs considered relevant in a certain context. They implemented a contextual filtering method similar to content-based filtering, but using contextual information such as the date, time and place of origin of the TV program and of the user. They showed that the notion of context improves the performance of the recommender system. They also pointed out that the set of contextual aspects can be enlarged further by adding contexts such as the room of the television in the house or the domain of TV usage [10]. Similar to their work, we use the genre of the program and the time of day of the program for pre-filtering. Graph-based methods have been used for the recommendation of TV programs and movies in recent years. Phuong et al. combined content-based and collaborative filtering by running a network propagation algorithm on a tri-partite graph [11], and showed that their method outperforms the baseline k-NN collaborative filtering method and a content-based method. In his ContextWalk work, Bogers embedded context variables into the graph and used finite Markov random walks to calculate the similarity between the nodes in the graph. To use this method, there must be a similarity edge between each pair of node types; in other words, the data must be pre-processed to calculate similarities between nodes even if they are not directly connected. For example, the similarity between users and actors must be calculated before constructing the graph. In his work, Bogers embedded the tag, actor and genre context variables in the graph, and claims that ContextWalk can easily be extended to include additional contextual features, such as time, social network information, and mood information [12]. In our research, we constructed our own graph and used Bogers' approach to calculate similarities between user nodes for collaborative filtering. However, because of performance issues, we did not embed the context variables.

Instead, we use our context variables for pre-filtering, eliminating the redundant nodes from the graph. We also improve Phuong et al.'s graph-based model with weighted edges between node types, such as continuous ratings between users and programs and TF-IDF values between programs and terms, instead of binary values. Moreover, we adapt Bogers' approach to our real-world channel usage data by applying contextual pre-filtering with the context variables specific to our data.

3. DATA PREPROCESSING
Throughout our research, we used channel usage data retrieved from Arçelik, Beko2 and Grundig3 TVs between October 2013 and January 2014, comprising 3,865,821 records. The channel usage data includes only the channel name and the start and end times of TV usage; it does not provide any information about the program that was watched. Therefore, we parsed the TV guide website of the newspaper Radikal [13] to obtain the proper EPG (Electronic Program Guide) information. Since the same channel appears under different names in the channel usage data, we implemented record matching rules on channel names to match the usage data with the program information. For example, the channel names EUROSPORT 2, EUROSPORT2 and EUROSPORT 2 HD all correspond to the single channel named eurosport2 in the program information on the website. After matching the channel usage data with the EPG information, 1,171,533 records were matched with 41,357 programs for 5,466 users. The average number of watched programs per user is 307. To infer ratings from program usage, we used the following formula, which yields a rating between 0 and 1:

rating = (# of minutes of watch time) / (# of minutes of program)

After calculating the rating for all (user, program) pairs, we found an average rating of 0.62. Because of performance issues, we shrank the dataset to 198 users by random sampling among the 450 users who watched between 500 and 1200 programs and whose average rating is greater than the overall average. As a result, we ended up with 198 users and 28,998 programs for use in our work.
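The implicit rating computation above can be sketched as follows; the function name and the clipping to 1.0 for watch times that overrun the program duration are our own assumptions, not details taken from the paper.

```python
def implicit_rating(watch_minutes: float, program_minutes: float) -> float:
    """Implicit rating in [0, 1]: the fraction of the program's
    duration that the user actually watched."""
    if program_minutes <= 0:
        return 0.0
    # Clip at 1.0 in case the recorded watch time slightly exceeds
    # the program duration (an assumption; the paper does not say).
    return min(watch_minutes / program_minutes, 1.0)

# A user who watched 45 minutes of a 90-minute program:
print(implicit_rating(45, 90))  # 0.5
```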

4. GRAPH-BASED MODELING
4.1 Constructing the Graph
We constructed a tri-partite graph with three node types: User, Program and Term. We denote users by U = {User1, User2, ..., User|U|}, programs by P = {Program1, Program2, ..., Program|P|} and terms by T = {Term1, Term2, ..., Term|T|}. The set T represents the terms used in the descriptions of TV programs. To select the terms, we stemmed all words occurring in the program descriptions and excluded verbs, in order to avoid the ambiguity of verb stems in Turkish. User ratings over programs are represented by the matrix UP = (up_ij) of size |U| × |P|, where

2 http://www.beko.com/
3 http://www.grundig.com.tr/

each cell up_ij takes values from 0 to 1. The formula for calculating up_ij was explained in the previous section. If a user has never watched a program, the rating is set to 0; conversely, if a user has watched the whole program, the rating is 1. We represent the program-term relation with the well-known information retrieval measure TF-IDF, a combination of two notions: term frequency (TF) and inverse document frequency (IDF). Term frequency is the number of occurrences of a term in a document; in our case, the document is the program description. Inverse document frequency measures how much information the word provides, and is calculated as:

idf(t, D) = log( N / |{d ∈ D : t ∈ d}| )

where N is the number of all documents and the denominator is the number of documents containing the term t. TF-IDF is calculated as the product of TF and IDF:

tfidf(t, d, D) = tf(t, d) × idf(t, D)

The program-term relation is represented by the matrix PT = (pt_ij) of size |P| × |T|, where each cell pt_ij holds the TF-IDF value between program p_i and term t_j. We normalized all TF-IDF values to the range [0, 1]. In our dataset, the average number of terms per program is 12.7. In addition, we use the notion of co-occurrence, which is calculated according to the following formula:

tt_ab = (# of documents containing both a and b) / (# of all documents)
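A minimal sketch of the two edge-weight computations defined above (TF-IDF for program-term edges, co-occurrence for term-term edges), assuming each program description is given as a bag of stemmed terms; the [0, 1] normalization of TF-IDF values is omitted here:

```python
import math

def tf_idf(term: str, doc: list[str], docs: list[list[str]]) -> float:
    """tf(t, d) * log(N / |{d in D : t in d}|), as in the formulas above."""
    tf = doc.count(term)
    df = sum(1 for d in docs if term in d)
    if tf == 0 or df == 0:
        return 0.0
    return tf * math.log(len(docs) / df)

def co_occurrence(a: str, b: str, docs: list[list[str]]) -> float:
    """tt_ab: fraction of descriptions containing both terms a and b."""
    together = sum(1 for d in docs if a in d and b in d)
    return together / len(docs)

# Toy program descriptions (hypothetical stemmed terms):
descriptions = [["film", "drama"], ["film", "comedy"], ["drama", "series"]]
print(tf_idf("comedy", descriptions[1], descriptions))  # log(3/1) ≈ 1.0986
print(co_occurrence("film", "drama", descriptions))     # 1/3
```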

The term-term relation matrix TT = (tt_ij) contains these co-occurrence values, which lie between 0 and 1. An example graph structure with the calculated relationship weights can be seen in Figure 1.

Figure 1: An example graph

In our graph, there is initially no relationship between users and terms. However, Bogers states that there should be a relationship matrix between each pair of node types on the graph [12]. For this reason, we have also defined program-program, user-user and user-term matrices. The program-program relation matrix PP = (pp_ij) and the user-user relation matrix UU = (uu_ij) are identity matrices, since there is no direct program-program or user-user relationship at the beginning of the random walk algorithm. The user-term relationship matrix UT = (ut_ij) has size |U| × |T|; the value ut_ij is calculated by summing, over all programs, the product of the user's rating for the program and the TF-IDF value of the term for that program. For example, the user-term relationship between User2 and Term4 in the example graph in Figure 1 is calculated as:

ut_24 = up_22 × pt_24 + up_23 × pt_34

Using these relationship matrices, we construct the transition probability matrix X by assigning each relationship matrix as a sub-matrix of X:

        | U×U  U×P  U×T |
    X = | P×U  P×P  P×T |
        | T×U  T×P  T×T |

4.2 Context-aware Pre-filtering
In our data set, each program has attributes that can be used as context variables, such as genre, broadcast time and channel. There are even more specific context variables, such as actors and directors, which are only available for certain genres. In our research, we use context-aware pre-filtering to shrink the set of candidate programs to a more reasonable size. We tested the context variables genre and time of day of broadcast both separately and in conjunction with each other, running experiments with 42 different genres and 7 different times of broadcast. In the pre-filtering step, we filter out the programs that do not have the selected context value as an attribute. For example, 6,500 of the 41,357 programs in the dataset fall in the time of day "PRIME_TIME". We then re-run our algorithm on the smaller, pruned graph, which yields more accurate results, as explained in Section 5.

4.3 Collaborative Filtering Using k-nearest Neighbor
To find the k-nearest neighbors, we use the random walk algorithm as in Bogers' work [12]. To begin the random walk over our tri-partite graph, we define the initial state vector s_0, in which the initial user node is set to 1 and all other nodes are set to 0. We find the state probabilities at the next step by multiplying s_0 with X. In general, the state probabilities after n steps follow the recurrence:

s_{n+1} = s_n X

After n steps, we sort the transition probabilities of jumping to other user nodes and select the k highest to obtain the k-nearest neighbors. Using the rule of thumb mentioned in [14], we tested k in the range

√(number of users) − (number of users)^(1/4) < k < √(number of users) + (number of users)^(1/4)

In our work, we tested path lengths from 1 to 6; we could not increase n further because of performance issues. The experimental results are shown in the next section.
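Under our reading of Section 4, the transition matrix assembly and the finite random walk can be sketched as below. Row normalization of X, so that each row is a probability distribution, is our assumption; the paper does not spell this step out.

```python
import numpy as np

def build_transition_matrix(UP, PT, TT, UT):
    """Assemble X from the sub-matrices of Section 4.1.
    PP and UU are identity matrices, as stated above."""
    nU, nP = UP.shape
    nT = TT.shape[0]
    X = np.zeros((nU + nP + nT, nU + nP + nT))
    X[:nU, :nU] = np.eye(nU)                 # U x U (identity)
    X[:nU, nU:nU + nP] = UP                  # U x P (ratings)
    X[:nU, nU + nP:] = UT                    # U x T
    X[nU:nU + nP, :nU] = UP.T                # P x U
    X[nU:nU + nP, nU:nU + nP] = np.eye(nP)   # P x P (identity)
    X[nU:nU + nP, nU + nP:] = PT             # P x T (TF-IDF)
    X[nU + nP:, :nU] = UT.T                  # T x U
    X[nU + nP:, nU:nU + nP] = PT.T           # T x P
    X[nU + nP:, nU + nP:] = TT               # T x T (co-occurrence)
    rows = X.sum(axis=1, keepdims=True)
    rows[rows == 0] = 1.0
    return X / rows                          # assumed row normalization

def k_nearest_users(X, user_idx, n_users, k, path_len=6):
    """Finite random walk s_{n+1} = s_n X from one user node; return
    the k user nodes (excluding the start) with highest probability."""
    s = np.zeros(X.shape[0])
    s[user_idx] = 1.0
    for _ in range(path_len):
        s = s @ X
    user_probs = s[:n_users].copy()
    user_probs[user_idx] = -1.0              # exclude the target user
    return list(np.argsort(user_probs)[::-1][:k])
```

A call such as `k_nearest_users(X, u, n_users, k=12, path_len=6)` corresponds to the setting selected in Section 5.1.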

After finding the k nearest neighbors, we use the same random walk technique to find the top-N program recommendations for each neighbor, with a path length of four. We obtain a top-10 list from each similar user, then sum all weights belonging to the same program across the k users to find its final weight. The result is the overall top-10 recommendation list.
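The neighbor-aggregation step just described might look like the following sketch, where each neighbor contributes a weighted recommendation list and weights are summed per program (the data layout is our assumption):

```python
from collections import defaultdict

def top_n_from_neighbors(neighbor_recs, n=10):
    """neighbor_recs: one {program_id: weight} dict per neighbor.
    Sum the weights per program across neighbors and return the
    n program ids with the largest totals."""
    totals = defaultdict(float)
    for recs in neighbor_recs:
        for pid, w in recs.items():
            totals[pid] += w
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return [pid for pid, _ in ranked[:n]]

# Two neighbors' weighted recommendations:
print(top_n_from_neighbors([{"p1": 0.5, "p2": 0.2}, {"p2": 0.4, "p3": 0.1}], n=2))
# ['p2', 'p1']  (p2: 0.6, p1: 0.5, p3: 0.1)
```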

5. EXPERIMENTS & RESULTS
To evaluate our algorithm, we tested the effect of the context variables on the random walk by pruning the graph with respect to the selected context variable, using context-aware pre-filtering. We used k-fold cross validation, the standard test mode when the goal is to estimate how accurately a predictive model will perform in practice. We report results with two classical metrics: precision and recall.
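The context-aware pre-filtering applied before each run (Section 4.2) can be sketched as a simple attribute filter; the attribute keys below are hypothetical, not the actual field names of our dataset:

```python
def prefilter(programs: dict, genre=None, time_of_day=None) -> set:
    """Return the ids of programs matching every selected context value.
    programs maps program id -> attribute dict (hypothetical schema)."""
    kept = set()
    for pid, attrs in programs.items():
        if genre is not None and attrs.get("genre") != genre:
            continue
        if time_of_day is not None and attrs.get("time_of_day") != time_of_day:
            continue
        kept.add(pid)
    return kept

programs = {
    1: {"genre": "sports", "time_of_day": "PRIME_TIME"},
    2: {"genre": "news",   "time_of_day": "MORNING"},
    3: {"genre": "sports", "time_of_day": "MORNING"},
}
print(prefilter(programs, time_of_day="PRIME_TIME"))               # {1}
print(prefilter(programs, genre="sports", time_of_day="MORNING"))  # {3}
```

Only the nodes for the surviving programs (and their incident edges) are kept when the pruned graph is rebuilt.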

5.1 Test Variables
In our experiments, we first determined the optimal value of k for k-NN and the optimal path length for the random walk. After identifying the best values on the dataset, we ran the remaining experiments with these test variables fixed.

5.1.1 k-NN Clustering
To find the best k for k-NN, we checked all k values between √(number of users) − (number of users)^(1/4) and √(number of users) + (number of users)^(1/4) on the graph that is not pruned according to any context. Since we have 198 users, we tested k from 11 to 17. As Figure 2 shows, k = 12 gives the best values for our metrics; we therefore fixed k = 12 in our experiments with context variables.

Figure 3: Change of metrics w.r.t path length of random walk

5.2 Effects of Context Filtering
To test the effect of context on recommendation, we compared the performance of the random walk algorithm on the context-free graph with its performance on graphs pruned according to context variables. We tested two contexts: the genre and the time of day of broadcast of a program. Our dataset has seven different times of day of broadcast and forty-two different genres, which we refer to as context variables in this paper. We tested the effects of the context variables both independently and in conjunction with each other. When using the two contexts together, we set the context variables to specific values. To test the effect of contextual filtering, we selected the context value that performs best for each context. As Table 1 shows, the results improve when the graph is pruned with more context variables.

Table 1. Performance of algorithm with and without context

            No context   Time of day   Genre    Both
Precision   0.0889       0.1348        0.1766   0.2230
Recall      0.0701       0.0935        0.3704   0.6606
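The precision and recall figures in Table 1 are the standard top-N metrics; for a single user they might be computed as in the sketch below (the actual evaluation averages over users and cross-validation folds):

```python
def precision_recall_at_n(recommended: list, relevant: set):
    """Precision = hits / |recommended|, recall = hits / |relevant|,
    where hits are recommended programs the user actually watched."""
    hits = sum(1 for pid in recommended if pid in relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

p, r = precision_recall_at_n(["p1", "p2", "p3", "p4"], {"p2", "p5"})
print(p, r)  # 0.25 0.5
```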

6. CONCLUSION AND FUTURE WORK

Figure 2: Change of metrics w.r.t. k value in k-NN

5.1.2 Path Length of Random Walk
In our research, we take random walks of finite length because of performance issues. We considered path lengths between 1 and 6 in the random walk experiments. As Figure 3 shows, the results improve slightly as the path length is extended. We therefore used a path length of 6.

In this research, we developed a graph-based collaborative recommendation algorithm for TV programs that traverses the graph via random walks in order to calculate similarities between users. We embedded edge weights into the graph: continuous ratings instead of binary weights, TF-IDF values between programs and terms instead of binary relations, and a novel user-term similarity composed from the user-program and program-term similarities. We also applied context-aware pre-filtering with two types of context variables: the time of day of broadcast and the genre of a program. Using these context variables, we pruned the graph, which enabled us to produce more targeted recommendations. As a result, we showed that when the target set of programs is reduced according to context variables, more successful recommendations are possible. Our results indicate that the algorithm performs even better as more context variables are activated at the same time to shrink the graph. To further improve the results, more context variables related to programs, such as the channel and the language of the program, can be added to the context-aware pre-filtering. Since we produce top-10 recommendations, context-aware post-filtering methods could also be used to re-rank the recommendation list with respect to the context variables.

Actors of a program in the movie genre, directors of a program in the movie genre, and named entities in the program description can be used for this purpose. In addition, context variables directly related to users can be used to initialize users with pre-defined similarities. If demographic information about users is known, it can serve as a similarity metric between users. Social media data might also be utilized to create links based on friendship and/or common tastes of users. Such links may improve the similarity calculation between users by aggregating social web data with program preference data before starting the random walk.

To improve the performance of our recommendation algorithm, the program information should also be improved in the future. The schedule of TV programs sometimes changes without prior notification, so the information about TV programs on the websites we crawl may not correspond to the program actually watched by the user. Thus, the data set we use may contain some wrong information, which may reduce the performance of our algorithm. Moreover, some program descriptions on the websites are too short to provide insight about the program. We expect to improve our evaluation results by improving the data set and the program information.

To achieve better results with our algorithm, the person or people watching the TV at the time of recommendation should be identified, which might enable better personalization. Moreover, the algorithm developed in this work should be adapted to large-scale environments in order to perform online experiments with real users.

7. ACKNOWLEDGMENTS
This work is supported partially by the Ministry of Science, Industry and Technology of Turkey and by Arçelik under Grant SANTEZ 1651.STZ.2012-2, and partially by the Scientific and Technical Council of Turkey under Grant TUBITAK EEEAG-112E111.

8. REFERENCES
[1] Adomavicius, G. and Tuzhilin, A. (2011). Context-aware recommender systems. In Recommender Systems Handbook (pp. 217-253). Springer US.
[2] Konstas, I., Stathopoulos, V., and Jose, J. M. (2009). On social networks and collaborative recommendation. In Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 195-202). ACM.
[3] Cheng, H., Tan, P. N., Sticklen, J., and Punch, W. F. (2007). Recommendation via query centered random walk on k-partite graph. In Seventh IEEE International Conference on Data Mining (ICDM 2007) (pp. 457-462). IEEE.
[4] Lekakos, G., and Caravelas, P. (2008). A hybrid approach for movie recommendation. Multimedia Tools and Applications, 36(1-2), 55-70.
[5] Marchand-Maillet, S. (2000). Content-based video retrieval: An overview.
[6] Bambini, R., Cremonesi, P., and Turrin, R. (2011). A recommender system for an IPTV service provider: a real large-scale production environment. In Recommender Systems Handbook (pp. 299-331). Springer US.
[7] Singhal, A. (2001). Modern information retrieval: A brief overview. IEEE Data Eng. Bull., 24(4), 35-43.
[8] Comparison of values of Pearson's and Spearman's correlation coefficients... (2007).
[9] Kim, M. W., Kim, E. J., Song, W. M., Song, S. Y., and Khil, A. R. (2012). Efficient recommendation for smart TV contents. In Big Data Analytics (pp. 158-167). Springer Berlin Heidelberg.
[10] da Silva, F. S., Alves, L. G. P., and Bressan, G. (2012). PersonalTVware: An infrastructure to support the context-aware recommendation for personalized digital TV. International Journal of Computer Theory and Engineering, 4(2), 131-135.
[11] Phuong, N. D., and Phuong, T. M. (2008). A graph-based method for combining collaborative and content-based filtering. In PRICAI 2008: Trends in Artificial Intelligence (pp. 859-869). Springer Berlin Heidelberg.
[12] Bogers, T. (2010). Movie recommendation using random walks over the contextual graph. In Proceedings of the 2nd International Workshop on Context-Aware Recommender Systems.
[13] TV Rehberi - Televizyon Programı ve Yayın Akışı Radikal'de. Radikal. Retrieved January 1, 2014, from http://www.radikal.com.tr/tvrehberi/
[14] Hall, P., Park, B. U., and Samworth, R. J. (2008). Choice of neighbor order in nearest-neighbor classification. The Annals of Statistics, 36(5), 2135-2152.
