A Large-scale Study of the Effect of Training Set Characteristics over Learning-to-Rank Algorithms
Stefan Savev, Pavel Metrikov, Virgil Pavlu, Javed Aslam, Evangelos Kanoulas
Consider the problem of the low-cost construction of effective training datasets with respect to learning-to-rank algorithms.
Each training query receives n judgments, distributed over the relevance grades. For example, with n = 8 judgments over four grades, the possible label distributions are:
(8,0,0,0), (7,1,0,0), (7,0,1,0), (7,0,0,1), (6,2,0,0), …, (0,0,0,8)
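The set of possible label distributions for a given n can be enumerated programmatically. A minimal sketch (the function name and parameters are illustrative; any enumeration of the weak compositions of n works):

```python
from itertools import combinations

def label_distributions(n, grades=4):
    """Enumerate all ways to distribute n judgments over the given
    number of relevance grades (the weak compositions of n)."""
    # Stars-and-bars: choose grades-1 divider positions among n+grades-1 slots.
    for dividers in combinations(range(n + grades - 1), grades - 1):
        counts = []
        prev = -1
        for d in dividers:
            counts.append(d - prev - 1)
            prev = d
        counts.append(n + grades - 1 - prev - 1)
        yield tuple(counts)

dists = list(label_distributions(8))
print(len(dists))  # C(11, 3) = 165 distributions for n = 8 over 4 grades
```

Each tuple sums to n, so every distribution from (8,0,0,0) to (0,0,0,8) appears exactly once.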
What effect does the distribution of labels across the different relevance grades in the training set have on the performance of learning-to-rank algorithms?
Informative summaries found:
1. The normalized cumulative gain of the judgment set
2. The variance of the labels in the judgment set
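Both summaries can be computed directly from a judgment set's labels. A sketch under stated assumptions: the poster does not spell out the gain function, so an exponential gain 2**rel − 1 normalized by the maximum attainable gain for a set of the same size is assumed here:

```python
from statistics import pvariance

MAX_GRADE = 3  # grades "3" and "4" are combined, leaving grades 0..3

def normalized_cumulative_gain(labels, max_grade=MAX_GRADE):
    """Cumulative gain of a judgment set, normalized by the gain of a set
    of the same size in which every label is the top grade.
    (Assumed gain function: 2**rel - 1; the poster does not define it.)"""
    total = sum(2**rel - 1 for rel in labels)
    best = len(labels) * (2**max_grade - 1)
    return total / best

def label_variance(labels):
    """Population variance of the relevance labels in a judgment set."""
    return pvariance(labels)

labels = [0, 0, 1, 2, 3, 3, 0, 1]   # one (3,2,1,2) distribution, n = 8
print(normalized_cumulative_gain(labels))
print(label_variance(labels))
```

A judgment set consisting only of top-grade labels has normalized cumulative gain 1.0 and variance 0.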
Methodology
1. Construct thousands of judgment sets (points on Figure 1), each with a pre-defined distribution of labels:
• combine the grades “3” and “4”
• select only queries that have at least k labeled urls in each of the relevance grades
Example of a selected query's labeled documents (doc, rel), with at least k documents in each grade:

doc: d5, d7, …, d28   rel: 0 (less relevant)   count ≥ k
doc: d1, d4, …, d8    rel: 1                   count ≥ k
doc: d2, d6, …, d9    rel: 2                   count ≥ k
doc: d3, d10, …, d22  rel: 3 (more relevant)   count ≥ k
• split the queries into training, validation, and test sets
• select a total number of judgments per query, n, with n ≤ k
• generate as many training and validation sets as there are possible distributions of the n judgments over the four grades
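The query-filtering and per-distribution sampling steps above can be sketched as follows (the function names, the toy data layout, and the choice of k are illustrative assumptions, not the study's code):

```python
import random

def eligible(query_docs, k, grades=4):
    """Keep a query only if it has at least k labeled urls in every grade."""
    return all(sum(1 for _, rel in query_docs if rel == g) >= k
               for g in range(grades))

def sample_judgment_set(query_docs, distribution, rng):
    """Draw a judgment set matching a label distribution, e.g. (3,2,1,2):
    distribution[g] documents sampled from the query's grade-g documents."""
    judged = []
    for grade, count in enumerate(distribution):
        pool = [doc for doc, rel in query_docs if rel == grade]
        judged.extend((doc, grade) for doc in rng.sample(pool, count))
    return judged

# Toy query with 8 documents in each of the grades 0..3, so k = 8 holds
rng = random.Random(0)
query_docs = [(f"d{i}", i % 4) for i in range(32)]
assert eligible(query_docs, k=8)
judged = sample_judgment_set(query_docs, (3, 2, 1, 2), rng)
print(len(judged))  # 8 judgments, distributed (3,2,1,2) over grades 0..3
```

The constraint n ≤ k guarantees that even the most extreme distribution, such as (n,0,0,0), can be sampled from any eligible query.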
2. Run learning-to-rank algorithms over the judgment sets.
3. Measure the performance in terms of nDCG@10.

Data

“WEB30k” dataset by Microsoft Research:
• ~30,000 web queries
• 136 features for each query-url pair
• 5 grades of relevance

Results
Figure 1: RankBoost, Ranking SVM, Regression performance (nDCG@10, y-axis) as a function of the normalized cumulative gain of the training dataset per query (x-axis, left) and the variance of the labels in the training dataset per query (x-axis, right).
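The evaluation measure can be computed as below; this is a standard nDCG@10 with exponential gain and log2 discount (the study's exact gain/discount variant is an assumption):

```python
from math import log2

def dcg_at_k(rels, k=10):
    """Discounted cumulative gain over the top-k ranked relevance labels."""
    return sum((2**rel - 1) / log2(i + 2) for i, rel in enumerate(rels[:k]))

def ndcg_at_k(ranked_rels, k=10):
    """nDCG@k: DCG of the ranking divided by the DCG of the ideal ranking
    (labels sorted in decreasing order of relevance)."""
    ideal = dcg_at_k(sorted(ranked_rels, reverse=True), k)
    return dcg_at_k(ranked_rels, k) / ideal if ideal > 0 else 0.0

# Relevance labels of documents in the order the ranker returned them
print(ndcg_at_k([3, 2, 3, 0, 1, 2]))
```

An ideally ordered ranking scores exactly 1.0, and a ranking with no relevant documents scores 0.0.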
Conclusions

1. Distributions that balance the number of documents in the extreme grades are to be favored.
2. The middle relevance grades play a less important role than the extreme ones.
References

[1] J. A. Aslam, E. Kanoulas, V. Pavlu, S. Savev, and E. Yilmaz. Document selection methodologies for efficient and effective learning-to-rank. In Proceedings of the 32nd ACM SIGIR, pages 468–475. ACM Press, July 2009.