A Large-scale Study of the Effect of Training Set Characteristics over Learning-to-Rank Algorithms

Stefan Savev, Pavel Metrikov, Virgil Pavlu and Javed Aslam

Evangelos Kanoulas

Consider the problem of the low-cost construction of effective training datasets with respect to learning-to-rank algorithms.

Diagram: TRAIN queries with n = 8 judgments each; the candidate label distributions over the four relevance grades are (8,0,0,0), (7,1,0,0), (7,0,1,0), (7,0,0,1), (6,2,0,0), …, (0,0,0,8).

What effect does the distribution of labels across the different grades of relevance in the training set have on the performance of learning-to-rank algorithms?

Informative summaries found:

1. The normalized cumulative gain of the judgment set

2. The variance over the judgment set
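As a rough illustration of the two summaries, the sketch below computes them for a judgment set described by its label distribution (the number of judgments in each grade). The poster does not spell out the formulas, so both are assumptions: the gain of grade g is taken to be 2^g - 1, the cumulative gain is divided by its largest possible value (all judgments in the top grade), and the variance is the population variance of the grade labels. The function names are hypothetical.

```python
def normalized_cumulative_gain(distribution):
    """Cumulative gain of a judgment set divided by its maximum possible value.

    `distribution[g]` is the number of judgments with grade g.
    ASSUMPTION: gain(g) = 2**g - 1; the poster does not define the gain.
    """
    n = sum(distribution)
    top_grade = len(distribution) - 1
    gain = sum(count * (2 ** grade - 1) for grade, count in enumerate(distribution))
    return gain / (n * (2 ** top_grade - 1))


def label_variance(distribution):
    """Population variance of the grade labels in a judgment set (ASSUMPTION)."""
    n = sum(distribution)
    mean = sum(grade * count for grade, count in enumerate(distribution)) / n
    return sum(count * (grade - mean) ** 2 for grade, count in enumerate(distribution)) / n


# Two judgment sets with n = 8 judgments over four grades.
extremes_only = (4, 0, 0, 4)
uniform = (2, 2, 2, 2)
summaries = {
    d: (normalized_cumulative_gain(d), label_variance(d))
    for d in (extremes_only, uniform)
}
```

Under these assumptions, the distribution (4,0,0,4) has a label variance of 2.25, while the uniform distribution (2,2,2,2) has a label variance of 1.25.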

Methodology

1. Construct thousands of judgment sets (points on Figure 1), each one having some pre-defined distribution of labels:

•  combine the grades “3” and “4”, so that the five original grades 0 to 4 become four grades 0 to 3

•  select only queries that have at least k labeled urls in each of the relevance grades

•  split the queries into training, validation and testing sets

•  select a total number of judgments per query n, with n ≤ k

•  generate as many training and validation sets as the possible distributions of the n judgments over the four grades

Diagram: an example query with documents ordered from less relevant to more relevant and at least k documents in each grade: d5, d7, …, d28 (grade 0); d1, d4, …, d8 (grade 1); d2, d6, …, d9 (grade 2); d3, d10, …, d22 (grade 3).
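The construction in step 1 can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and drawing the judgments uniformly at random within each grade is an assumption, since the poster does not say how documents are chosen. The condition n ≤ k is what guarantees that every distribution is realizable, because even the most lopsided one needs at most n documents from a single grade.

```python
import random
from collections import Counter


def merge_top_grades(grade):
    """Map the five original grades 0..4 to four grades 0..3 by merging 3 and 4."""
    return min(grade, 3)


def has_k_per_grade(labels, k, grades=4):
    """True if a query has at least k labeled documents in every grade."""
    counts = Counter(labels)
    return all(counts[g] >= k for g in range(grades))


def label_distributions(n, grades=4):
    """Yield every way to spread n judgments over `grades` relevance grades."""
    if grades == 1:
        yield (n,)
        return
    for first in range(n, -1, -1):
        for rest in label_distributions(n - first, grades - 1):
            yield (first,) + rest


def sample_judgments(docs_by_grade, distribution, rng):
    """Pick distribution[g] documents of grade g (ASSUMPTION: uniform sampling)."""
    chosen = []
    for grade, count in enumerate(distribution):
        for doc in rng.sample(docs_by_grade[grade], count):
            chosen.append((doc, grade))
    return chosen


distributions = list(label_distributions(8))
# 165 distributions for n = 8 over four grades, from (8,0,0,0) to (0,0,0,8).
print(len(distributions), distributions[0], distributions[-1])
```

Each distribution yields one training set and one validation set, so the number of judgment sets grows quickly with n.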

2. Run learning-to-rank algorithms over the judgment sets.

3. Measure the performance in terms of nDCG@10.

Data

“WEB30k” dataset by Microsoft Research:

•  ~30,000 web queries

•  136 features for each query-url pair

•  5 grades of relevance

Results

Figure 1: RankBoost, Ranking SVM, Regression performance (nDCG@10, y-axis) as a function of the normalized cumulative gain of the training dataset per query (x-axis, left) and the variance of the labels in the training dataset per query (x-axis, right).
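For reference, the y-axis metric can be computed as in the sketch below. nDCG has several variants; this one uses the common exponential gain 2^rel - 1 with a log2 position discount, which is an assumption because the poster does not state which variant was used. The function names are hypothetical.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain of the top-k grades, given in ranked order.

    ASSUMPTION: gain 2**rel - 1 and discount log2(rank + 1), with rank starting at 1.
    """
    return sum(
        (2 ** rel - 1) / math.log2(position + 2)
        for position, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(relevances, k=10):
    """DCG@k divided by the DCG@k of the ideal (descending-grade) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal == 0:
        return 0.0  # no relevant documents for this query
    return dcg_at_k(relevances, k) / ideal


# Grades of the documents for one test query, in the order a ranker returned them.
ranked_grades = [3, 0, 2, 1, 0]
score = ndcg_at_k(ranked_grades, k=10)
```

The per-query scores would then be averaged over the test queries to obtain a single point in Figure 1.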

Conclusions

1. Distributions that balance the number of documents in the extreme grades are to be favored.

2. The middle relevance grades play a less important role than the extreme ones.

