Learning to Rank for Question Routing in Community Question Answering
Zongcheng Ji 1,2 and Bin Wang 1
1 Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
2 University of Chinese Academy of Sciences, Beijing, China
[email protected], [email protected]
ABSTRACT
This paper focuses on the problem of Question Routing (QR) in Community Question Answering (CQA), which aims to route newly posted questions to the potential answerers who are most likely to answer them. Traditional methods consider only the text similarity features between the newly posted question and the user profile, ignoring important statistical features, namely the question-specific statistical feature and the user-specific statistical features. Moreover, traditional methods are based on unsupervised learning, which makes it difficult to incorporate rich features. This paper proposes a general framework for QR based on learning to rank. A training set consisting of triples (q, asker, answerers) is first collected. Then, by exploiting the intrinsic relationship between the asker and the answerers in each CQA session to derive intrinsic labels/orders of the users with respect to their expertise on the question q, two methods, one SVM-based and one RankingSVM-based, are presented to learn models from the training set with different example creation processes. Finally, the potential answerers are ranked using the trained models. Extensive experiments on a real-world CQA dataset from Stack Overflow show that both proposed methods outperform the traditional query likelihood language model (QLLM) as well as the state-of-the-art Latent Dirichlet Allocation based model (LDA). In particular, the RankingSVM-based method achieves statistically significant improvements over the SVM-based method and obtains the best performance.
Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval

Keywords
Community Question Answering; Question Routing; Learning to Rank
1. INTRODUCTION
Community Question Answering (CQA) has become a popular type of web service where users can ask and answer questions. Examples of such CQA services include general sites like Yahoo! Answers (http://answers.yahoo.com/) and Baidu Zhidao (http://zhidao.baidu.com/), and focused sites like Stack Overflow (http://stackoverflow.com/) and TurboTax (https://ttlc.intuit.com/). Over time, CQA services have not only accumulated large archives of questions and their answers, but also attracted an increasing number of users for information seeking and knowledge sharing. In recent years, however, the efficiency of CQA services has been seriously challenged by the gap between posted questions and potential answerers, caused by the rapid growth in posted questions and the lack of an effective way for potential answerers to find interesting questions. This leads to three problems. (1) From the asker's perspective, askers have to passively wait for their questions to be answered, which may take hours or days. A previous study [5] has shown that more than 80% of new questions cannot be resolved within 48 hours. As a result, askers become reluctant to post new questions to CQA services and instead turn to other means of seeking information. (2) From the answerer's perspective, potential answerers, who know the answers to questions in a particular domain and are willing to contribute their knowledge to the community, cannot easily find the questions that interest them because they are overwhelmed by the large number of open questions. This reduces the answerers' enthusiasm for providing answers and sharing their knowledge. (3) From the CQA service's perspective, these issues degrade the service's performance, reduce users' loyalty to the system, and will inevitably cost the service many users, who are the foundation of any web service. To bridge the gap between newly posted questions and potential answerers, it is essential to automatically route newly posted questions to the potential answerers who may answer them. Linking newly posted questions with potential answerers improves user satisfaction as well as the performance of the CQA service, making askers more willing to contribute knowledge and answerers more enthusiastic about providing answers.
Question Routing (QR) in CQA is the task of routing newly posted questions to the potential answerers who are interested in the questions and are most likely to answer them [5, 6, 11-14], so that the questions get answered as soon as possible.
Figure 1: The Overall Structure of the Learning to Rank Framework for Question Routing
Traditional methods usually first build a profile for each potential answerer based on the archive of her previously answered questions, and then use information retrieval based methods to compute the text similarity between the user profile and the newly posted question. This process is also called expertise estimation [5]. Finally, all candidate answerers are ranked by their expertise scores on the newly posted question. These methods include the traditional query likelihood language model (QLLM) [5], the category-sensitive language model [6], and the Latent Dirichlet Allocation [2] based model (LDA) [8], which achieves the state-of-the-art performance. However, all of these approaches consider only the text similarity features between the newly posted question and the user profile, ignoring important statistical features, namely the question-specific statistical feature and the user-specific statistical features. Moreover, traditional approaches estimate the potential answerers' expertise for a newly posted question with unsupervised learning, which makes it difficult to incorporate rich features.
In this paper, we propose a general framework for QR based on learning to rank. To the best of our knowledge, this is the first extensive empirical study to bring learning to rank to this problem. We first collect a training set consisting of triples (q, asker, answerers), where each triple is constructed from a CQA session in which the question q is asked by the asker and answered by the answerers. Secondly, we identify eight typical features, comprising the question-specific statistical feature, the user-specific statistical features, and the text similarity features, to capture different aspects of questions, users, and their relationships. Thirdly, we present two learning methods, one SVM-based and one RankingSVM-based, to learn ranking models over the candidate answerers for a given newly posted question. We exploit the intrinsic relationship between the asker and the answerers in each CQA session to derive intrinsic labels/orders of the users with respect to their expertise on the question q: every answerer in a CQA session should have a higher degree of expertise on the question than the asker. These intrinsic labels/orders make it easy to construct training examples for supervised learning and, in turn, to incorporate rich features into the learning process. Finally, we rank the potential answerers for the newly posted question using the trained ranking models.
We conduct extensive experiments for QR on a real-world CQA dataset from Stack Overflow. The results show that both proposed methods outperform the traditional QLLM as well as the state-of-the-art LDA. In addition, the RankingSVM-based method achieves statistically significant improvements over the SVM-based method and obtains the best performance.
The rest of the paper is organized as follows. Section 2 details our proposed learning to rank framework for QR. Section 3 describes the experimental study. Finally, we conclude and discuss future work in Section 4.
2. PROPOSED APPROACH
Figure 1 shows the overall structure of the learning to rank framework for question routing. (1) We first collect a training set consisting of triples (q, asker, answerers), where each triple is constructed from a CQA session in which the question q is asked by the asker and answered by the answerers. (2) Secondly, we identify features that capture different aspects of the question (q), the users (asker and answerers), and the relationships between them. (3) Thirdly, we choose two different learning algorithms to learn the ranking model. (4) Then, according to the chosen learning algorithm, we create the training examples, which are crucial for supervised learning. Specifically, we exploit the intrinsic relationship between the asker and the answerers in each CQA session to derive intrinsic labels/orders of the users with respect to their expertise on the question q: every answerer in a CQA session should have a higher degree of expertise on the question than the asker. These intrinsic labels/orders make it easy to construct training examples for supervised learning and, in turn, to incorporate rich features into the learning process. (5) Finally, we use the learned ranking model to rank the candidate answerers by their expertise on the newly posted question. The following subsections describe the feature selection, the ranking algorithms with their example creation processes, and the ranking of candidate answerers in more detail.
2.1 Feature Selection
Given a question and a user, we use the three types of features listed in Table 1: the question-specific statistical feature (feature 1), the user-specific statistical features (features 2-5), and the text similarity features (features 6-8). Features 1-5 are statistical features, which have not previously been investigated in ranking models for question routing. Features 6-8 are the text similarity features, which are the only features considered by QLLM and LDA.
Table 1: The features capture different aspects of questions, users and their relationships. (Q: Question, Question-specific Statistical Feature; U: User, User-specific Statistical Feature; QU: Question-User Relationship, Text Similarity Feature)

#  Feature                                              Description
1  Q: Title Length                                      Length of the question title.
2  U: Percentage of best answers                        Percentage of answers selected as best answers among all answers the user has provided.
3  U: # of best answers                                 # of best answers the user has provided.
4  U: # of answers                                      # of answers the user has provided.
5  U: # of asked questions                              # of questions asked by the user.
6  QU: P(question | LM(answered questions))             The probability of generating the question from the user's answered questions with a language model.
7  QU: P(question | LM(answered and asked questions))   The probability of generating the question from the user's answered and asked questions with a language model.
8  QU: P(question | LDA(answered questions))            The probability of generating the question from the user's answered questions with an LDA model.
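To make the feature set concrete, the following is a minimal sketch of how the eight features of Table 1 might be assembled into one vector for a (question, user) pair; all names here are illustrative, not from the paper.

```python
# Hypothetical sketch: assembling the eight features of Table 1
# for one (question, user) pair.
from dataclasses import dataclass

@dataclass
class UserStats:
    n_best_answers: int   # feature 3
    n_answers: int        # feature 4
    n_asked: int          # feature 5

def feature_vector(title_len, stats, p_lm_ans, p_lm_ans_ask, p_lda_ans):
    """Return the 8-dimensional feature vector for one (question, user) pair.

    p_lm_ans     -- feature 6: P(question | LM(answered questions))
    p_lm_ans_ask -- feature 7: P(question | LM(answered and asked questions))
    p_lda_ans    -- feature 8: P(question | LDA(answered questions))
    """
    pct_best = stats.n_best_answers / stats.n_answers if stats.n_answers else 0.0
    return [
        title_len,             # 1: Q: title length
        pct_best,              # 2: U: percentage of best answers
        stats.n_best_answers,  # 3: U: # of best answers
        stats.n_answers,       # 4: U: # of answers
        stats.n_asked,         # 5: U: # of asked questions
        p_lm_ans,              # 6
        p_lm_ans_ask,          # 7
        p_lda_ans,             # 8
    ]
```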
2.1.1 Question-specific Statistical Feature
For the question-specific statistical feature, we consider the question title length (feature 1). As suggested in [1], the title length can be regarded as an important indicator of question quality, and a question of good quality is easier to get answered.

2.1.2 User-specific Statistical Features
For the user-specific statistical features, we consider the number of answers and best answers the user has provided and the number of questions the user has asked (features 2-5). If a user has provided many answers, a large share of which were selected as best answers (and thus a high percentage of best answers), we consider the user more authoritative, and more authoritative users will probably provide more authoritative answers. In addition, if a user has also asked many questions, the user is very active; active users will probably learn a lot from others and are likely to provide answers to others in the future.

2.1.3 Text Similarity Features
Features 6-8 are the text similarity features, which are the only features considered by QLLM and LDA. Feature 6 is the probability of generating the question from the user's answered questions with a language model. Feature 7 is the probability of generating the question from the user's answered and asked questions with a language model. Feature 8 is the probability of generating the question from the user's answered questions with an LDA model.
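As an illustration of features 6 and 7, here is a minimal sketch of a query likelihood score under a unigram language model. The paper does not state its smoothing method; Jelinek-Mercer smoothing against a background collection is assumed here purely for illustration, and log probabilities are used for numerical stability.

```python
import math
from collections import Counter

def log_query_likelihood(question_terms, profile_terms, collection_terms, lam=0.8):
    """Log P(question | LM(profile)) under a unigram language model.

    Jelinek-Mercer smoothing with lambda=0.8 is an assumption for
    illustration; the paper does not specify its smoothing scheme.
    """
    prof = Counter(profile_terms)
    coll = Counter(collection_terms)
    prof_len = sum(prof.values())
    coll_len = sum(coll.values())
    score = 0.0
    for t in question_terms:
        p_prof = prof[t] / prof_len if prof_len else 0.0
        p_coll = coll[t] / coll_len if coll_len else 0.0
        p = lam * p_prof + (1 - lam) * p_coll
        score += math.log(p) if p > 0 else float("-inf")
    return score
```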
2.2 Learning the Ranking Model
There are three major approaches to learning to rank: the pointwise, pairwise, and listwise approaches [7, 9]. In our work, we apply the pointwise and pairwise approaches, because it is difficult to obtain a complete order of users from the existing dataset, but it is easy to obtain a label for each user or a partial order between two users. Specifically, in the pointwise approach, we use a label for each user denoting whether the user has enough expertise to answer the question (clearly, the asker does not, so her label is -1, while an answerer does, so her label is 1) and cast the ranking of users as a classification problem; in the pairwise approach, we model the partial order between two users, which is closer to the concept of ranking than the pointwise approach.

2.2.1 SVM-based Method
We first cast the ranking of candidate answerers as a classification problem, which is a pointwise approach to learning to rank. Support Vector Machine (SVM) is a widely used approach for building a classifier from a set of labeled objects. Given a set of labeled objects, some belonging to a positive class and the others to a negative class, SVM builds a hyperplane that separates the positive objects from the negative ones with the largest margin. The resulting model can then be used to classify unknown data points in vector representation and label them as either positive or negative.
Creating Examples: To use SVM directly to train the model, the key step is to create positive and negative examples. We have collected a training set consisting of triples (q, asker, answerers). Following [13], if a user is the asker of q, we treat (q, asker) as a negative example; if a user is an answerer of q, we treat (q, answerer) as a positive example. For clarity, Table 2 lists the definition of negative and positive examples.

Table 2: Negative and positive examples for the SVM-based method

Question-User Pair   Class/Label
(q, asker)           Negative
(q, answerer)        Positive

With this definition of positive and negative examples and the training set of triples (q, asker, answerers), we create the two sets of examples used for model training.
Training Models: Before training, we normalize the values of all features in the training set to the range [-1, 1], to avoid features with larger ranges dominating those with smaller ranges [3]. The same normalization is applied when the trained model is used to classify unknown (question, user) pairs. We then train the model using LIBSVM [3], a popular implementation of SVM, with a linear kernel.
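The sketch below illustrates the example creation and training steps, using scikit-learn's SVC as a stand-in for LIBSVM (both train a linear-kernel SVM with probability estimates); the features(q, u) helper, standing for the hypothetical 8-feature extractor sketched after Table 1, and the data layout are assumptions.

```python
# A minimal sketch of the SVM-based method, with scikit-learn as a
# stand-in for LIBSVM; features(q, u) is a hypothetical helper that
# returns the 8-dimensional feature vector of Table 1.
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

def build_pointwise_examples(triples, features):
    """triples: list of (q, asker, answerers) tuples."""
    X, y = [], []
    for q, asker, answerers in triples:
        X.append(features(q, asker)); y.append(-1)   # asker: negative example
        for a in answerers:
            X.append(features(q, a)); y.append(+1)   # answerer: positive example
    return X, y

def train_svm(X, y):
    scaler = MinMaxScaler(feature_range=(-1, 1))     # normalize to [-1, 1]
    Xs = scaler.fit_transform(X)
    clf = SVC(kernel="linear", probability=True)     # probabilities for ranking
    clf.fit(Xs, y)
    return scaler, clf
```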
2.2.2 RankingSVM-based Method
Next, we introduce a pairwise approach to learning to rank for question routing. RankingSVM [4], a classical pairwise ranking algorithm, transforms ranking into pairwise classification and employs SVM technology to perform the learning task.
Creating Examples: To use RankingSVM directly to train the model, the key step is to create partial orders between pairs of users. We again use the collected training set of triples (q, asker, answerers). Clearly, in each triple, every answerer's degree of expertise on the question q should be ranked higher than the asker's, so the following partial order should hold:

expertise(answerer_i, q) > expertise(asker, q), for each answerer_i of q.

With this definition of the partial order between two users and the training set of triples (q, asker, answerers), we create the partial order set used for model training.
Training Models: Before training, we likewise normalize all features in the training set to the range [-1, 1]; the same normalization is applied to the test set. We then train the model using RankingSVM [4] with a linear kernel.
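A minimal sketch of the pairwise example creation follows, writing the triples in the SVM-light/SVMRank input format, where examples sharing a qid are compared against each other and a higher target value means a higher rank; features(q, u) is again the hypothetical extractor, assumed to return the normalized feature vector.

```python
def write_svmrank_file(triples, features, path):
    """Write training data in SVM-light/SVMRank format: within each qid,
    answerers (target 2) must be ranked above the asker (target 1)."""
    with open(path, "w") as f:
        for qid, (q, asker, answerers) in enumerate(triples, start=1):
            rows = [(1, features(q, asker))] + [(2, features(q, a)) for a in answerers]
            for rank, x in rows:
                feats = " ".join(f"{i}:{v}" for i, v in enumerate(x, start=1))
                f.write(f"{rank} qid:{qid} {feats}\n")
```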
2.3 Ranking Candidate Answerers
With the trained models at hand, from both the SVM-based and the RankingSVM-based methods, we can rank the candidate answerers for question routing and route the newly posted question to the candidates ranked highest. (1) A typical SVM classifier gives only binary outputs, but we are not interested in binary results; we want the users' expertise degrees on the newly posted question for ranking. We therefore enable the probability estimation functionality of LIBSVM, so that the trained model produces the probability that a user has enough expertise to answer the question, and these predicted probabilities are used directly for ranking. (2) When using RankingSVM for the pairwise approach, the predicted ranking scores are used directly for ranking.
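Continuing the scikit-learn stand-in from Section 2.2.1, a sketch of ranking the candidates by the predicted probability of the positive class:

```python
def rank_candidates(q, candidates, features, scaler, clf):
    """Rank candidate answerers for a new question q by the predicted
    probability of the positive (+1) class, as in the SVM-based method."""
    Xs = scaler.transform([features(q, u) for u in candidates])
    pos = list(clf.classes_).index(1)            # column of the +1 class
    scores = clf.predict_proba(Xs)[:, pos]
    return sorted(zip(candidates, scores), key=lambda t: -t[1])
```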
3. EXPERIMENTS

3.1 Experimental Setup
Dataset: The experimental dataset is based on a snapshot of the focused CQA service Stack Overflow, which hosts questions and answers on a wide range of topics in computer programming. An archive of the whole content of the website is released every three months; for our experiments, we use the January 2011 data dump (covering the period from the site's launch on July 31, 2008 to December 31, 2010).
First, we select a representative subset of the whole dataset, detailed in Table 3. It spans January 1, 2009 to January 31, 2010, exactly 13 months, the same time span used in [10]. Tags are the only elements that categorize topics in Stack Overflow, but they form a very diverse topic set, so we need a subset of the dataset that exhibits the same properties as the original. Following [10], we use the 21 reported tags to choose a subset that maintains a similar tag distribution. The representative subset is selected as follows: we pick resolved questions that have at least 2 answers and are tagged with at least 1 of the 21 selected tags. All questions are lowercased and stopwords are removed using a standard list of 418 words; only questions with at least 2 remaining words are kept. This yields 92,411 CQA sessions. We then split the subset into a training set and a test set by the timestamp at which each question was posted: the training set covers January 1, 2009 to December 31, 2009 (12 months, 81,295 sessions) and the test set covers January 1, 2010 to January 31, 2010 (1 month, 11,116 sessions).

Table 3: The subset used in our experiments. "Start" and "End" are the start and end dates used to split the January 2011 data dump from Stack Overflow.

           Start      End        #Months  #Sessions
Training   09-01-01   09-12-31   12       81,295
Test       10-01-01   10-01-31   1        11,116
Total      09-01-01   10-01-31   13       92,411

Finally, we construct the user set, the training question set and the test question set for question routing as follows (a sketch is given after Table 4). (1) We first select the users who have answered at least X questions in the training set (we choose X = 10, 15, 20 in our work) as the user set. (2) We then collect the questions whose asker, best answerer, and at least one other answerer are all in the user set, from the training set and the test set respectively, as the training question set and the test question set. The statistics of these sets for each X are shown in Table 4. Taking X = 10 as an example: 5,761 users answered at least 10 questions in the training set; for each of the 16,033 training questions, the asker, the best answerer, and at least one other answerer are among these 5,761 users; and each of the 1,150 test questions is routed to these 5,761 users.

Table 4: The statistics of the user set, the training question set and the test question set for the subset. X means to choose the users who have provided at least X answers.

X    #users   #questions (training)   #questions (test)
10   5,761    16,033                  1,150
15   3,971    11,179                  746
20   2,978    8,374                   517
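A sketch of this set construction, under the assumption that each CQA session is a dict with 'asker', 'best_answerer', 'answerers' and 'is_training' fields; these names are hypothetical.

```python
def build_sets(sessions, users_answer_counts, X=10):
    """Select the user set (users with >= X answers) and keep only the
    sessions whose asker, best answerer, and at least one other
    answerer all fall inside that user set."""
    U = {u for u, c in users_answer_counts.items() if c >= X}

    def keep(s):
        others = [a for a in s["answerers"] if a != s["best_answerer"]]
        return (s["asker"] in U and s["best_answerer"] in U
                and any(a in U for a in others))

    train = [s for s in sessions if s["is_training"] and keep(s)]
    test = [s for s in sessions if not s["is_training"] and keep(s)]
    return U, train, test
```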
Ground Truth: Following [5, 6, 12], the actual answerers of each test question are taken as the ground truth.
Baselines: To evaluate our proposed learning to rank framework for question routing, we use two baselines: (1) QLLM, the traditional query likelihood language model, which is simply the linear combination of features 6 and 7 in Table 1; and (2) LDA, the state-of-the-art LDA-based model, which considers only feature 8 in Table 1. We also combined QLLM and LDA (LDALM, a linear combination of features 6, 7 and 8) to examine the interpolated performance; however, LDA alone achieved the best performance among QLLM, LDA and LDALM in our experiments, so we report only LDA as the state-of-the-art method.
Metrics: We evaluate all ranking methods with three metrics: Mean Average Precision (MAP), Mean Reciprocal Rank (MRR) and Precision@n (P@n). We also perform significance tests using a paired t-test at the 0.05 significance level.
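For reference, minimal implementations of the three metrics, with the set of actual answerers as the relevant set; MAP and MRR are the means of average_precision and reciprocal_rank over all test questions.

```python
def average_precision(ranked, relevant):
    """AP of one ranked user list against the set of actual answerers."""
    hits, score = 0, 0.0
    for i, u in enumerate(ranked, start=1):
        if u in relevant:
            hits += 1
            score += hits / i
    return score / len(relevant) if relevant else 0.0

def reciprocal_rank(ranked, relevant):
    """1 / rank of the first actual answerer, or 0 if none is ranked."""
    for i, u in enumerate(ranked, start=1):
        if u in relevant:
            return 1.0 / i
    return 0.0

def precision_at_n(ranked, relevant, n):
    """Fraction of the top n ranked users who actually answered."""
    return sum(1 for u in ranked[:n] if u in relevant) / n
```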
3.2 Parameter Selection
Several parameters must be set in our experiments. (1) We use GibbsLDA++ (http://gibbslda.sourceforge.net/) to conduct LDA training and inference; the hyperparameters alpha and beta and the topic size are fixed across all experiments.
Table 5: Comparison of different methods for question routing. X means to choose the users who have provided at least X answers. QLLM and LDA are the baselines, with LDA the state-of-the-art method. SVM and RankingSVM are the two proposed learning to rank methods; "f7" means using seven features and "f8" means using eight features including the LDA-based feature. The best value in each row is achieved by RankingSVM (f8). "%chg" denotes the improvement in percent of the method in the preceding column over LDA. "*" indicates a statistically significant improvement (paired t-test) over LDA.

X=10
Metric  QLLM     LDA      SVM(f7)  %chg    SVM(f8)  %chg    RankingSVM(f7)  %chg    RankingSVM(f8)  %chg
MAP     0.0245   0.0386   0.0363   -6.0    0.0364   -5.7    0.0422          9.3     0.0439          13.7*
MRR     0.0504   0.0820   0.0847   3.3     0.0850   3.7     0.0958          16.8*   0.0992          21.0*
P@5     0.0143   0.0230   0.0245   6.5     0.0240   4.3     0.0289          25.7*   0.0304          32.2*
P@10    0.0124   0.0188   0.0227   20.7*   0.0225   19.7*   0.0233          23.9*   0.0252          34.0*

X=15
Metric  QLLM     LDA      SVM(f7)  %chg    SVM(f8)  %chg    RankingSVM(f7)  %chg    RankingSVM(f8)  %chg
MAP     0.0297   0.0439   0.0443   0.9     0.0444   1.1     0.0508          15.7*   0.0537          22.3*
MRR     0.0569   0.0895   0.0980   9.5*    0.0992   10.8*   0.1103          23.2*   0.1156          29.2*
P@5     0.0158   0.0268   0.0287   7.1     0.0295   10.1*   0.0316          17.9*   0.0362          35.1*
P@10    0.0160   0.0214   0.0261   22.0*   0.0253   18.2*   0.0259          21.0*   0.0272          27.1*

X=20
Metric  QLLM     LDA      SVM(f7)  %chg    SVM(f8)  %chg    RankingSVM(f7)  %chg    RankingSVM(f8)  %chg
MAP     0.0335   0.0493   0.0527   6.9     0.0524   6.3     0.0587          19.1*   0.0610          23.7*
MRR     0.0618   0.0967   0.1149   18.8*   0.1152   19.1*   0.1253          29.6*   0.1299          34.3*
P@5     0.0170   0.0279   0.0321   15.1*   0.0313   12.2*   0.0344          23.3*   0.0379          35.8*
P@10    0.0166   0.0207   0.0286   38.2*   0.0277   33.8*   0.0286          38.2*   0.0298          44.0*
(2) We use LIBSVM (http://www.csie.ntu.edu.tw/~cjlin/libsvm/) as our SVM implementation with a linear kernel, and perform 5-fold cross-validation with a grid search over the penalty parameter C to find the best value; the optimal C found is used to report all results. (3) We use SVMRank (http://www.cs.cornell.edu/People/tj/svm_light/svm_rank.html) as our RankingSVM implementation with a linear kernel, and again perform 5-fold cross-validation with a grid search over the same candidate values of C; the optimal value found is used to report all results.
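A sketch of this grid search using scikit-learn's GridSearchCV as a stand-in for the LIBSVM command-line tools; the candidate grid of C values below is illustrative only, since the exact values searched are not given here.

```python
# Hypothetical 5-fold grid search over the SVM penalty parameter C.
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

param_grid = {"C": [2 ** k for k in range(-5, 6)]}   # illustrative grid
search = GridSearchCV(SVC(kernel="linear"), param_grid, cv=5)
# search.fit(X_scaled, y)                 # X_scaled, y from the example-creation step
# best_C = search.best_params_["C"]       # the value used to report all results
```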
3.3 Experimental Results
We designed and conducted the experiments with two questions in mind: (1) How effective are our proposed learning to rank based methods, i.e., the SVM-based and RankingSVM-based methods? (2) Which method is better, the SVM-based or the RankingSVM-based method? To answer these questions, we run each of our two methods with 7 features (excluding the LDA feature) and with 8 features (including the LDA feature).
3.3.1 The Effectiveness of the Proposed Methods
Table 5 presents the comparison of the different methods for question routing. Several clear trends emerge:
(1) LDA significantly outperforms QLLM. This indicates that representing the user profile with the latent semantic topics from LDA is much better than QLLM's representation, which is why the state-of-the-art LDA achieves better performance. These results are consistent with earlier work [6, 12].
(2) The SVM-based method with 7 features significantly outperforms QLLM. Compared with LDA, it is slightly better on all metrics except MAP when X = 10, and it achieves statistically significant improvements over LDA on P@10. This indicates that incorporating the question-specific statistical feature and the user-specific statistical features into the model effectively improves question routing performance.
(3) The SVM-based method with 8 features does not improve much over the SVM-based method with 7 features, and even degrades performance on some metrics. Even so, it remains slightly better than LDA and achieves statistically significant improvements on P@10.
(4) The RankingSVM-based method with 7 features statistically outperforms LDA on almost all metrics. This demonstrates that the RankingSVM-based method can significantly outperform LDA using only the text similarity features and our introduced statistical features, without LDA-based semantics. It also indicates that the RankingSVM-based method learns a much better ranking model of the candidate answerers, owing not only to the introduced statistical features but also to the superiority of the pairwise learning to rank approach over the pointwise approach.
(5) The RankingSVM-based method with 8 features further improves performance when the LDA-based semantic feature is included, and achieves the best performance on all metrics. In contrast, incorporating the LDA-based semantic feature does not help the SVM-based method.
(6) Note that when X = 20, the best MRR in our experiments is 0.1299. Since the average rank of the first actual answerer is roughly 1/MRR, this means that on average each test question gets answered if we route it to the top 8 users, whereas the traditional QLLM would need the top 17 users (MRR = 0.0618), the state-of-the-art LDA the top 11 users (MRR = 0.0967), and the SVM-based method the top 9 users (MRR = 0.1149 or 0.1152).
3.3.2 SVM-based vs. RankingSVM-based Method
The results above show that our proposed SVM-based and RankingSVM-based methods both significantly improve question routing performance, mainly thanks to our introduced statistical features. In addition, the RankingSVM-based method is much better than the SVM-based method and achieves the best performance when using 8 features, mainly owing to the superiority of the pairwise learning to rank approach over the pointwise approach.
Table 6: Comparison of the SVM-based and RankingSVM-based methods for question routing, along two dimensions: (1) the number of features (f7: 7 features; f8: 8 features including the LDA-based semantic feature); (2) the size of the chosen candidate answerer set (X = 10, 15, 20, where X means to choose the users who have provided at least X answers). "%chg ↑SVM" denotes the improvement in percent of the RankingSVM-based method over the SVM-based method. "*" indicates a statistically significant improvement (paired t-test) over SVM.

                      X=10                              X=15                              X=20
     Method           MAP     MRR     P@5     P@10      MAP     MRR     P@5     P@10      MAP     MRR     P@5     P@10
f7   SVM              0.0363  0.0847  0.0245  0.0227    0.0443  0.0980  0.0287  0.0261    0.0527  0.1149  0.0321  0.0286
     RankingSVM       0.0422  0.0958  0.0289  0.0233    0.0508  0.1103  0.0316  0.0259    0.0587  0.1253  0.0344  0.0286
     %chg ↑SVM        16.3*   13.1*   18.0*   2.6       14.7*   12.6*   10.1*   -0.8      11.4*   9.1     7.2     0.0
f8   SVM              0.0364  0.0850  0.0240  0.0225    0.0444  0.0992  0.0295  0.0253    0.0524  0.1152  0.0313  0.0277
     RankingSVM       0.0439  0.0992  0.0304  0.0252    0.0537  0.1156  0.0362  0.0272    0.0610  0.1299  0.0379  0.0298
     %chg ↑SVM        20.6*   16.7*   26.7*   12.0*     20.9*   16.5*   22.7*   7.5       16.4*   12.8*   21.1*   7.6
Therefore, a further analysis and comparison of the two methods is warranted. Table 6 compares the SVM-based and RankingSVM-based methods along two dimensions: (1) the number of features, 7 or 8; and (2) the size of the chosen candidate answerer set, determined by the value of X (the smaller X, the more candidate answerers are chosen). The table shows that the RankingSVM-based method is much better than the SVM-based method, with statistically significant improvements in most comparisons. The improvements grow along both dimensions: (1) with more features (from 7 to 8), the improvement of the RankingSVM-based method over the SVM-based method increases; and (2) with more candidate answerers to route to (decreasing X from 20 to 10), the improvement likewise increases.
4. CONCLUSIONS
In this paper, we propose a general framework based on learning to rank for Question Routing (QR) in Community Question Answering (CQA). We conduct experiments for QR on a real-world CQA dataset from Stack Overflow. The results show that both proposed methods outperform the traditional QLLM as well as the state-of-the-art LDA. Moreover, the RankingSVM-based method achieves statistically significant improvements over the SVM-based method and obtains the best performance. Several directions remain for future work. First, more features could be introduced into our framework, such as users' badges and reputation on Stack Overflow. Second, we plan to investigate our methods on other types of CQA datasets, such as general sites like Yahoo! Answers.
5. ACKNOWLEDGMENTS This work is supported by the National Science Foundation of China under Grant No. 61070111 and the Strategic Priority Research Program of the Chinese Academy of Sciences under Grant No. XDA06030200.
6. REFERENCES
[1] Lada A. Adamic, Jun Zhang, Eytan Bakshy, and Mark S. Ackerman. Knowledge sharing and Yahoo Answers: Everyone knows something. In WWW, pages 665-674, 2008.
[2] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. Latent Dirichlet allocation. J. Mach. Learn. Res., 3: 993-1022, 2003.
[3] Chih-Chung Chang and Chih-Jen Lin. LIBSVM: A library for support vector machines. ACM Trans. Intell. Syst. Technol., 2(3): 1-27, 2011.
[4] R. Herbrich, T. Graepel, and K. Obermayer. Large margin rank boundaries for ordinal regression. Advances in Neural Information Processing Systems: 115-132, 1999.
[5] Baichuan Li and Irwin King. Routing questions to appropriate answerers in community question answering services. In CIKM, pages 1585-1588, 2010.
[6] Baichuan Li, Irwin King, and Michael R. Lyu. Question routing in community question answering: Putting category in its place. In CIKM, pages 2041-2044, 2011.
[7] Hang Li. Learning to rank for information retrieval and natural language processing. Synthesis Lectures on Human Language Technologies, 4(1): 1-113, 2011.
[8] Mingrong Liu, Yicen Liu, and Qing Yang. Predicting best answerers for new questions in community question answering. In WAIM, pages 127-138, 2010.
[9] Tie-Yan Liu. Learning to rank for information retrieval. Found. Trends Inf. Retr., 3(3): 225-331, 2009.
[10] Fatemeh Riahi, Zainab Zolaktaf, Mahdi Shafiei, and Evangelos Milios. Finding expert users in community question answering. In WWW, pages 791-798, 2012.
[11] Fei Xu, Zongcheng Ji, and Bin Wang. Dual role model for question recommendation in community question answering. In SIGIR, pages 771-780, 2012.
[12] Guangyou Zhou, Kang Liu, and Jun Zhao. Joint relevance and answer quality learning for question routing in community QA. In CIKM, pages 1492-1496, 2012.
[13] Tom Chao Zhou, Michael R. Lyu, and Irwin King. A classification-based approach to question routing in community question answering. In WWW, pages 783-790, 2012.
[14] Y. Zhou, G. Cong, B. Cui, C. S. Jensen, and J. Yao. Routing questions to the right users in online communities. In ICDE, pages 700-711, 2009.