A Large-scale Study of the Effect of Training Set Characteristics over Learning-to-Rank Algorithms
Stefan Savev, Pavel Metrikov, Virgil Pavlu, Javed Aslam, Evangelos Kanoulas
Consider the problem of the low-cost construction of effective training datasets with respect to learning-to-rank algorithms.
Each training query receives n judgments, distributed over the relevance grades. For example, with n = 8 judgments over four grades, the possible label distributions are:
(8,0,0,0), (7,1,0,0), (7,0,1,0), (7,0,0,1), (6,2,0,0), …, (0,0,0,8)
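The set of possible label distributions for a given n can be enumerated programmatically. A minimal sketch (the function name and parameters are illustrative; any enumeration of the weak compositions of n works):

```python
from itertools import combinations

def label_distributions(n, grades=4):
    """Enumerate all ways to distribute n judgments over the given
    number of relevance grades (the weak compositions of n)."""
    # Stars-and-bars: choose grades-1 divider positions among n+grades-1 slots.
    for dividers in combinations(range(n + grades - 1), grades - 1):
        counts = []
        prev = -1
        for d in dividers:
            counts.append(d - prev - 1)
            prev = d
        counts.append(n + grades - 1 - prev - 1)
        yield tuple(counts)

dists = list(label_distributions(8))
print(len(dists))  # C(11, 3) = 165 distributions for n = 8 over 4 grades
```

Each tuple sums to n, so every distribution from (8,0,0,0) to (0,0,0,8) appears exactly once.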
What effect does the distribution of labels across the different relevance grades in the training set have on the performance of learning-to-rank algorithms?
Informative summaries found:
1. The normalized cumulative gain of the judgment set
2. The variance of the labels in the judgment set
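Both summaries can be computed directly from a judgment set's labels. A sketch under stated assumptions: the poster does not spell out the gain function, so an exponential gain 2**rel − 1 normalized by the maximum attainable gain for a set of the same size is assumed here:

```python
from statistics import pvariance

MAX_GRADE = 3  # grades "3" and "4" are combined, leaving grades 0..3

def normalized_cumulative_gain(labels, max_grade=MAX_GRADE):
    """Cumulative gain of a judgment set, normalized by the gain of a set
    of the same size in which every label is the top grade.
    (Assumed gain function: 2**rel - 1; the poster does not define it.)"""
    total = sum(2**rel - 1 for rel in labels)
    best = len(labels) * (2**max_grade - 1)
    return total / best

def label_variance(labels):
    """Population variance of the relevance labels in a judgment set."""
    return pvariance(labels)

labels = [0, 0, 1, 2, 3, 3, 0, 1]   # one (3,2,1,2) distribution, n = 8
print(normalized_cumulative_gain(labels))
print(label_variance(labels))
```

A judgment set consisting only of top-grade labels has normalized cumulative gain 1.0 and variance 0.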
Methodology
1. Construct thousands of judgment sets (points on Figure 1), each with a pre-defined distribution of labels:
• combine the grades “3” and “4”
• select only queries that have at least k labeled urls in each of the relevance grades
Example of a selected query's labeled documents (doc, rel), with at least k documents in each grade:

doc: d5, d7, …, d28   rel: 0 (less relevant)   count ≥ k
doc: d1, d4, …, d8    rel: 1                   count ≥ k
doc: d2, d6, …, d9    rel: 2                   count ≥ k
doc: d3, d10, …, d22  rel: 3 (more relevant)   count ≥ k
• split the queries into training, validation, and test sets
• select a total number of judgments per query, n, with n ≤ k
• generate as many training and validation sets as there are possible distributions of the n judgments over the four grades
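The query-filtering and per-distribution sampling steps above can be sketched as follows (the function names, the toy data layout, and the choice of k are illustrative assumptions, not the study's code):

```python
import random

def eligible(query_docs, k, grades=4):
    """Keep a query only if it has at least k labeled urls in every grade."""
    return all(sum(1 for _, rel in query_docs if rel == g) >= k
               for g in range(grades))

def sample_judgment_set(query_docs, distribution, rng):
    """Draw a judgment set matching a label distribution, e.g. (3,2,1,2):
    distribution[g] documents sampled from the query's grade-g documents."""
    judged = []
    for grade, count in enumerate(distribution):
        pool = [doc for doc, rel in query_docs if rel == grade]
        judged.extend((doc, grade) for doc in rng.sample(pool, count))
    return judged

# Toy query with 8 documents in each of the grades 0..3, so k = 8 holds
rng = random.Random(0)
query_docs = [(f"d{i}", i % 4) for i in range(32)]
assert eligible(query_docs, k=8)
judged = sample_judgment_set(query_docs, (3, 2, 1, 2), rng)
print(len(judged))  # 8 judgments, distributed (3,2,1,2) over grades 0..3
```

The constraint n ≤ k guarantees that even the most extreme distribution, such as (n,0,0,0), can be sampled from any eligible query.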
2. Run learning-to-rank algorithms over the judgment sets.
3. Measure the performance in terms of nDCG@10.

Data

“WEB30k” dataset by Microsoft Research:
• ~30,000 web queries
• 136 features for each query-url pair
• 5 grades of relevance

Results
Figure 1: RankBoost, Ranking SVM, Regression performance (nDCG@10, y-axis) as a function of the normalized cumulative gain of the training dataset per query (x-axis, left) and the variance of the labels in the training dataset per query (x-axis, right).
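The evaluation measure can be computed as below; this is a standard nDCG@10 with exponential gain and log2 discount (the study's exact gain/discount variant is an assumption):

```python
from math import log2

def dcg_at_k(rels, k=10):
    """Discounted cumulative gain over the top-k ranked relevance labels."""
    return sum((2**rel - 1) / log2(i + 2) for i, rel in enumerate(rels[:k]))

def ndcg_at_k(ranked_rels, k=10):
    """nDCG@k: DCG of the ranking divided by the DCG of the ideal ranking
    (labels sorted in decreasing order of relevance)."""
    ideal = dcg_at_k(sorted(ranked_rels, reverse=True), k)
    return dcg_at_k(ranked_rels, k) / ideal if ideal > 0 else 0.0

# Relevance labels of documents in the order the ranker returned them
print(ndcg_at_k([3, 2, 3, 0, 1, 2]))
```

An ideally ordered ranking scores exactly 1.0, and a ranking with no relevant documents scores 0.0.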
Conclusions

1. Distributions that balance the number of documents in the extreme grades are to be favored.
2. The middle relevance grades play a less important role than the extreme ones.
References

[1] J. A. Aslam, E. Kanoulas, V. Pavlu, S. Savev, and E. Yilmaz. Document selection methodologies for efficient and effective learning-to-rank. In Proceedings of the 32nd ACM SIGIR, pages 468–475. ACM Press, July 2009.