Co-optimization of Multiple Relevance Metrics in Web Search

Dong Wang1,2,*, Chenguang Zhu1,2,*, Weizhu Chen2, Gang Wang2, Zheng Chen2

1 Institute for Theoretical Computer Science, Tsinghua University, Beijing, China, 100084
{wd890415, zcg.cs60}@gmail.com

2 Microsoft Research Asia, No. 49 Zhichun Road, Haidian District, Beijing, China, 100080
{v-dongmw, v-chezhu, wzchen, gawa, zhengc}@microsoft.com

Copyright is held by the author/owner(s). WWW 2010, April 20?4, 2010, Raleigh, North Carolina, USA. ACM 978-1-60558-799-8/10/04.

ABSTRACT
Several relevance metrics, such as NDCG, precision and pSkip, have been proposed to measure search relevance, each characterizing it from a different perspective. Yet we find empirically that directly optimizing one metric does not always yield the optimal ranking under another metric. In this paper, we propose two novel relevance optimization approaches that consider different metrics jointly, with the objective of achieving an ideal tradeoff between them. To this end, we co-optimize multiple relevance metrics and show the effectiveness of the two approaches.

Categories and Subject Descriptors
H.3.3 [Information Systems]: Information Search and Retrieval

General Terms
Algorithms, Design, Experimentation, Theory.

Keywords
Learning to Rank, User Feedback, LambdaRank.

1. INTRODUCTION
Recent advances in search relevance have made it a very important aspect of information retrieval (IR). Traditional work on improving search relevance falls into two categories, according to the kind of metric used for optimization. The first aims to improve relevance from explicitly judged, labeled data by learning a ranking model that optimizes a metric such as NDCG [4]. We call this kind of metric an explicit relevance metric, since it is based on explicit data. The second looks for ways to improve search relevance by leveraging large-scale implicit user behavior logs from commercial search engines, and optimizes another kind of metric, such as CTR [2] or pSkip [5]. We call this kind of metric an implicit relevance metric, since it is based on implicit data.

However, to the best of our knowledge, previous work mostly focuses on optimizing a single metric to improve search relevance, even though explicit and implicit relevance metrics each have their own merits [3]. Yet we empirically observe that the exclusive optimization of one metric cannot always achieve the optimal ranking under another metric. For example, directly optimizing NDCG on the explicit data often results in non-optimal relevance for pSkip on the implicit data, and vice versa.

This conflict appears in many real examples. Consider a query $q$ and only three of its URLs: $u_1$, $u_2$ and $u_3$. Suppose $u_1$ and $u_2$ are both rated Excellent while $u_2$ has a higher click frequency than $u_1$. If we optimize only NDCG, NDCG is maximized by ranking $u_1 > u_2$, where $>$ means that the right-hand item is placed below the left-hand item in the search results; however, pSkip is not optimal, because $u_2$, which has the higher click frequency, is placed below $u_1$. In this extreme case, if we optimize NDCG and pSkip simultaneously, we may rank $u_2 > u_1$, so that NDCG and pSkip both achieve their optimal values; NDCG is unaffected because the two ratings are equal. In another case, $u_2$ is a duplicate of $u_1$, so most users will not click $u_2$ and will likely jump to $u_3$ if they are unsatisfied with $u_1$. If $u_1$ and $u_2$ are more relevant than $u_3$, maximizing NDCG ranks them as $u_1 > u_2 > u_3$, while optimizing pSkip ranks them as $u_1 > u_3 > u_2$ based on click frequency.

These real cases illustrate that this kind of conflict cannot be resolved if we consider only one metric in optimization. Conversely, if we take both metrics into consideration, it becomes possible to find an ideal tradeoff that optimizes both metrics simultaneously. In this paper, we propose to co-optimize the explicit relevance metric and the implicit relevance metric, with the objective of finding an ideal co-optimization approach. In particular, we aim to answer the question: how can we maximize one metric without even slightly sacrificing another? For example, we aim to find a ranking function that optimizes pSkip under the constraint that the NDCG score decreases by less than 0.1 percent.

To achieve this objective, we propose two novel methods, based on different machine learning approaches, to co-optimize multiple relevance metrics.
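The NDCG side of the two ranking examples in this section can be checked numerically. The sketch below is illustrative only and rests on assumptions not stated in the paper: the common exponential gain 2^r - 1, a base-2 logarithmic discount, and hypothetical numeric ratings (3 for the two higher-rated URLs, 1 for the third). pSkip is not computed here, since its definition is given in [5] rather than in this paper.

```python
import math

def dcg(ratings, gain=lambda r: 2 ** r - 1):
    """DCG of ratings listed in ranked order (position 1 first)."""
    return sum(gain(r) / math.log2(1 + i) for i, r in enumerate(ratings, start=1))

def ndcg(ratings, gain=lambda r: 2 ** r - 1):
    """DCG normalized by the DCG of the ideal (rating-sorted) order."""
    ideal = dcg(sorted(ratings, reverse=True), gain)
    return dcg(ratings, gain) / ideal if ideal > 0 else 0.0

# Hypothetical numeric ratings (assumption): u1 and u2 rated 3, u3 rated 1.
RATING = {"u1": 3, "u2": 3, "u3": 1}

def ndcg_of(order):
    """NDCG of a ranking given as a list of URL names, top result first."""
    return ndcg([RATING[u] for u in order])

ndcg_u1_first = ndcg_of(["u1", "u2", "u3"])   # u1 > u2 > u3
ndcg_u2_first = ndcg_of(["u2", "u1", "u3"])   # u2 > u1 > u3
ndcg_u3_second = ndcg_of(["u1", "u3", "u2"])  # u1 > u3 > u2

print(ndcg_u1_first, ndcg_u2_first, ndcg_u3_second)
```

Under these assumptions, swapping the two equally rated URLs leaves NDCG unchanged, so click behavior can decide their order at no NDCG cost, whereas promoting the less relevant URL above a more relevant one lowers NDCG. The second situation is where a tradeoff between the two kinds of metric is actually required.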

2. LEARNING MODELS
Exclusive optimization of an explicit metric cannot always achieve the optimal value of an implicit metric, and vice versa. Here we propose two combination models.

2.1 Indirect Optimization Model
First, we propose the indirect optimization model. In this model, we integrate CTR into the calculation of NDCG. To balance the two measurements, we add a tradeoff parameter $\alpha$ to the optimization function, as in (1):

$$f_{IO} = \frac{1}{f_{max}} \sum_{i} \frac{2^{r_q(i)}}{\log(1+i)} \cdot \left( \alpha\, CTR(d_q(i)) + 1 - \alpha \right) \qquad (1)$$

where $f_{max}$ is the normalizing factor, namely the ideal evaluation score, $r_q(i)$ is the rating of the document ranked at position $i$, and $CTR(d_q(i))$ is the click-through rate of the document ranked at position $i$. We use LambdaRank [1] to optimize this evaluation function. The corresponding $\lambda_{ij}$ is given in (2):

$$\lambda_{ij}^{f_{IO}} \equiv S_{ij} \left| \Delta f_{IO,ij} \frac{\partial C}{\partial o_{i,j}} \right| \qquad (2)$$

Here $S_{ij}$ equals 1 when $d_q(i)$ is more valuable than $d_q(j)$ and -1 otherwise. As in LambdaRank [1], $\Delta f_{IO,ij}$ denotes the change in $f_{IO}$ when the documents at positions $i$ and $j$ are swapped, $C$ is the pairwise cost, and $o_{i,j}$ is the difference between the scores of the two documents.

2.2 Direct Optimization Model
We also propose the direct optimization model, whose optimization function is built as in (3):

$$f_{DO} = \alpha f + (1 - \alpha)\, NDCG \qquad (3)$$

Here $f$ is an implicit evaluation function such as CTR or pSkip. During training, we can generate two $\lambda$-gradients for each pair of training documents: one from the documents' labels, in order to optimize NDCG, and the other from implicit user feedback, in order to optimize $f$. The total $\lambda$-gradient for each pair of search results is therefore (4):

$$\lambda_{ij} \equiv \alpha \lambda_{ij}^{f} + (1 - \alpha) \lambda_{ij}^{NDCG} \qquad (4)$$

More specifically, $\lambda_{ij}$ for optimizing NDCG and $f_{pSkip}$ is given in (5):

$$\lambda_{ij} \equiv \alpha S_{ij} \left| \Delta f_{pSkip,ij} \frac{\partial C}{\partial o_{i,j}} \right| + (1 - \alpha) S'_{ij} \left| \Delta NDCG_{ij} \frac{\partial C}{\partial o_{i,j}} \right| \qquad (5)$$

and $\lambda_{ij}$ for optimizing NDCG and $f_{CTR@N}$ is given in (6):

$$\lambda_{ij} \equiv \alpha S_{ij} \left| \Delta f_{CTR@N,ij} \frac{\partial C}{\partial o_{i,j}} \right| + (1 - \alpha) S'_{ij} \left| \Delta NDCG_{ij} \frac{\partial C}{\partial o_{i,j}} \right| \qquad (6)$$

Notice that $S_{ij}$ and $S'_{ij}$ may differ, since their values are determined by different evaluation functions.

3. EXPERIMENTAL RESULTS
We run two experiments to evaluate our learning models. Specifically, the experiments show that we can improve an implicit relevance metric such as CTR or pSkip without a significant drop in the explicit relevance metric NDCG, and vice versa. We compare the learning models on a large real dataset. In the figures, IO stands for the indirect optimization model and DO stands for the direct optimization model.

[Figure 1: curve generated by pSkip and NDCG@10. Plot title: pSkip vs. NDCG@10; x-axis: pSkip; y-axis: NDCG@10; series: DO_pSkip, IO_pSkip.]

Figure 1 shows that the direct optimization model and the indirect optimization model perform almost the same when pSkip is high, but the direct optimization model reaches a higher NDCG score when the pSkip score is low. Moreover, our learning models decrease the pSkip score by 2% while keeping the NDCG score the same.

[Figure 2: curve generated by CTR@10 and NDCG@10. Plot title: CTR@10 vs. NDCG@10; x-axis: CTR@10; y-axis: NDCG@10; series: DO_CTR@10, IO_CTR@10.]

Figure 2 shows the performance of our learning models when combining $f_{CTR@10}$ with NDCG. The indirect optimization model is more sensitive than the direct optimization model. Both models increase the CTR score by 4% while the NDCG score remains the same. Overall, the indirect optimization model always treats explicit relevance as an important metric, whereas the direct optimization model can achieve the optimal point for any tradeoff parameter.

4. CONCLUSION
In this paper we investigate two novel approaches to co-optimize an implicit relevance metric and an explicit relevance metric, and we evaluate our learning models by the curves generated with NDCG, CTR and pSkip as the evaluation metrics. By optimizing a combination function of these metrics, we can reach an ideal balance between the explicit relevance metric and the implicit relevance metric. In particular, we achieve a better pSkip or CTR score without a drop in the NDCG score.

5. REFERENCES
[1] Burges C.J.C., Ragno R., and Le Q.V. Learning to rank with non-smooth cost function. Proceedings of NIPS, 2006.

[2] Fox S., Karnawat K., Mydland M., Dumais S.T., and White T. Evaluating implicit measures to improve the search experience. ACM Transactions on Information Systems, 23:147–168, 2005.

[3] Huffman S.B., and Hochster M. How well does result relevance predict session satisfaction? In Proc. of SIGIR, 2007.

[4] Järvelin K., and Kekäläinen J. IR evaluation methods for retrieving highly relevant documents. Proceedings of SIGIR, 41–48, 2000.

[5] Wang K., Walker T., and Zheng Z. PSkip: Estimating relevance ranking quality from web search clickthrough data. Proceedings of KDD, 2009.

*This work was done when the first and second authors were visiting Microsoft Research Asia.
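Illustrative sketch (not part of the original paper). The direct optimization model of Section 2.2 is a weighted combination: the objective in (3) mixes an implicit metric f with NDCG using the tradeoff parameter alpha, and (4) mixes the two per-pair lambda-gradients with the same weights. The Python below is a minimal sketch of that combination step only. The function names, the dictionary layout keyed by document pair, and all numeric values are hypothetical, and the two input lambda sets are assumed to have been computed elsewhere.

```python
def combined_objective(f_value, ndcg_value, alpha):
    """Equation (3): f_DO = alpha * f + (1 - alpha) * NDCG."""
    return alpha * f_value + (1 - alpha) * ndcg_value

def combine_lambdas(lambda_f, lambda_ndcg, alpha):
    """Equation (4), applied to every document pair (i, j)."""
    if lambda_f.keys() != lambda_ndcg.keys():
        raise ValueError("both gradient sets must cover the same document pairs")
    return {
        pair: alpha * lambda_f[pair] + (1 - alpha) * lambda_ndcg[pair]
        for pair in lambda_f
    }

# Hypothetical per-pair lambda values for one query (illustrative numbers only).
# The signs may disagree for a pair, because S_ij and S'_ij come from
# different evaluation functions.
lambda_f = {(0, 1): -0.2, (0, 2): 0.3, (1, 2): 0.5}
lambda_ndcg = {(0, 1): 0.4, (0, 2): 0.6, (1, 2): 0.1}

# Sweeping alpha moves from pure NDCG optimization (0.0) to pure f optimization (1.0).
for alpha in (0.0, 0.5, 1.0):
    print(alpha, combine_lambdas(lambda_f, lambda_ndcg, alpha))
```

Training one model per value of alpha and plotting the resulting metric pairs is one plausible way to obtain tradeoff curves like those in Figures 1 and 2, although the paper does not describe its exact procedure.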
