Overview of the TREC 2016 Contextual Suggestion Track
Seyyed Hadi Hashemi, Charles Clarke, Jaap Kamps, Julia Kiseleva, and Ellen Voorhees

University of Amsterdam, University of Waterloo, NIST

TREC 2016

Overview
1. Contextual Suggestion Track Tasks and Setup
2. Test Collection
3. Results
4. Analysis
5. Summary

1. Track Tasks and Setup
Data Collection, Context, Profile, Tasks

What’s Contextual Suggestion?

Track Setup
Context: 1) City 2) Trip Type 3) Trip Duration 4) Group Type 5) Season
Profile: 1) Ratings 2) Endorsements 3) Age 4) Gender
Context and Profile are the inputs to the Contextual Suggestion System; its output is a ranked list of Attractions.

Track’s Aim
The Contextual Suggestion Track investigates search techniques for complex information needs that are highly dependent on context and user interests.

5th Year
How did the TREC 2016 Contextual Suggestion Track change based on what we learned in previous years?

Data Collection
• 2012: Open Web
• 2013: Open Web and ClueWeb12
• 2014: Open Web and ClueWeb12
• 2015: TREC CS Collection as a fixed collection
• 2016: TREC CS Web Corpus as additional data

TREC CS Web Corpus
• Web crawl
• WARC format
• Easy to recreate
• ~1M pages
• Copyright form

Overview: Context


Context
• 2012: Location and time
• 2013 and 2014: Location
• 2015 and 2016: Location, and several pieces of data about the trip:
  • A city (e.g., Gaithersburg)
  • A trip type (e.g., Business)
  • A trip duration (e.g., Weekend trip)
  • The type of group the person is travelling with (e.g., Friends)
  • The season the trip will occur in (e.g., Autumn)

Overview: Profile


Profile
• 2012, 2013 and 2014: Ratings
• 2015 and 2016: Ratings, age, and gender
• 2016: Endorsements collected by NIST assessors
  • Family friendly, outdoor activity, nature walks, parks, museums, art galleries, cocktails, and …
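The context and profile fields listed on these slides can be pictured as one request record. The following is a minimal sketch only: every class and field name is hypothetical, the rating scale and the venue-to-tags layout of endorsements are assumptions, and the track's actual request file format is not shown in these slides.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Context:
    """Trip context, using the five fields named on the slides."""
    city: str            # e.g., "Gaithersburg"
    trip_type: str       # e.g., "Business"
    trip_duration: str   # e.g., "Weekend trip"
    group_type: str      # e.g., "Friends"
    season: str          # e.g., "Autumn"


@dataclass
class Profile:
    """User profile; the layout of each field is an assumption."""
    # venue id -> rating (the scale is assumed, not taken from the slides)
    ratings: Dict[str, int] = field(default_factory=dict)
    # venue id -> endorsement tags such as "museums" or "parks"
    endorsements: Dict[str, List[str]] = field(default_factory=dict)
    age: Optional[int] = None
    gender: Optional[str] = None


@dataclass
class Request:
    """One request: a context plus the profile of the person asking."""
    request_id: int
    context: Context
    profile: Profile


example = Request(
    request_id=1,
    context=Context("Gaithersburg", "Business", "Weekend trip", "Friends", "Autumn"),
    profile=Profile(
        ratings={"venue-a": 4},
        endorsements={"venue-a": ["museums", "art galleries"]},
    ),
)
```

Keeping context and profile as separate records mirrors the two input boxes of the track setup diagram.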

Overview: Tasks


TREC 2016 CS Had 2 Phases
• Phase 1
  • A collection-based task
  • Participants had to retrieve 50 suggestions from the TREC CS Collection
• Phase 2
  • A reranking task
  • A set of candidate suggestions is provided for each request

Participants’ statistics over the last 5 years
• 2012: 14 teams - 27 runs
• 2013: 19 teams - 34 runs
• 2014: 17 teams - 21 runs
• 2015: Live (6 teams - 9 runs), and Batch (17 teams - 30 runs); 39 runs in total
• 2016: Phase 1 (8 teams - 15 runs), and Phase 2 (13 teams - 30 runs); 45 runs in total

2. Test Collection
Qrel, Test Collection Statistics

Test Collection
• # Requests: 424
• # Persons: 241
• # Cities: 215
• TREC 2016 official test collection:
  • # Requests: 61 (Phase 1) and 58 (Phase 2)
  • # Persons: 27
  • # Cities: 48 + 2 (Seed cities)
  • # Unique Judged Venues: 5,162
  • Avg # Venues / Request: 95
  • # Judgments: 5,782 unique judgments

Test Collection Statistics (figures)

Endorsements
• How many judged tags/categories? 133
• Relevance probability of each category in the qrel (figure)

3. Results
Metrics, Phase 1 Results, Phase 2 Results

Metrics
• NDCG@5
• P@5
• MRR
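For readers new to these measures, here is a minimal sketch of the three metrics for a single request. It assumes binary relevance for P@5 and MRR, and a linear gain with a log2 rank discount for NDCG; the slides do not state which gain mapping the track used, so that part is an assumption. Function names are illustrative.

```python
import math
from typing import Dict, List, Set


def precision_at_k(ranking: List[str], relevant: Set[str], k: int = 5) -> float:
    """Fraction of the top-k suggestions that are relevant."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


def reciprocal_rank(ranking: List[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant suggestion, or 0 if there is none."""
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranking: List[str], gains: Dict[str, float], k: int = 5) -> float:
    """DCG of the top k divided by the DCG of an ideal ordering.

    Assumes linear gain and a log2(rank + 1) discount.
    """
    def dcg(values: List[float]) -> float:
        return sum(g / math.log2(i + 2) for i, g in enumerate(values))

    actual = dcg([gains.get(doc, 0.0) for doc in ranking[:k]])
    ideal = dcg(sorted(gains.values(), reverse=True)[:k])
    return actual / ideal if ideal > 0 else 0.0


def mean(values: List[float]) -> float:
    """Average a per-request score over requests (e.g., RR -> MRR)."""
    return sum(values) / len(values) if values else 0.0


ranking = ["a", "b", "c", "d", "e"]
relevant = {"b", "d"}
p5 = precision_at_k(ranking, relevant)            # two of five are relevant
rr = reciprocal_rank(ranking, relevant)           # first relevant at rank 2
ndcg5 = ndcg_at_k(ranking, {"b": 1.0, "d": 1.0})
```

Each function scores one request; the reported track numbers are averages over all requests.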

Phase 1 Results


Top-3 Phase 1 Submissions
• #1: USI (Università della Svizzera italiana)
  • They crawled 600K venues from Foursquare to create positive and negative category profiles
  • They created venue taste keyword profiles
  • The final score is a linear combination of the venue category and taste keyword scores
• #2: IAPLab (Nanjing University)
• #3: ADAPT_TCD (ADAPT Centre, Trinity College Dublin)
  • Ontology-based approach
  • They created the ontology using the Foursquare Category Hierarchy
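The USI run's final score is described as a linear combination of a category score and a taste keyword score. The slides give neither the weights nor how the two scores are computed, so the sketch below is only an illustration: it assumes both scores are already available per venue and uses a single hypothetical mixing weight.

```python
from typing import Dict, List, Tuple


def combined_score(category_score: float, keyword_score: float,
                   weight: float = 0.5) -> float:
    """Weighted sum of two scores; `weight` is a hypothetical parameter."""
    return weight * category_score + (1.0 - weight) * keyword_score


def rank_venues(scores: Dict[str, Tuple[float, float]],
                weight: float = 0.5) -> List[str]:
    """Order venue ids by combined score, best first.

    `scores` maps venue id -> (category_score, keyword_score).
    """
    return sorted(
        scores,
        key=lambda venue: combined_score(scores[venue][0], scores[venue][1], weight),
        reverse=True,
    )


scores = {"a": (0.9, 0.1), "b": (0.4, 0.8)}
balanced = rank_venues(scores, weight=0.5)        # keyword evidence lifts "b"
category_heavy = rank_venues(scores, weight=0.9)  # category evidence lifts "a"
```

In practice the weight would be tuned on held-out ratings; how USI set it is not stated here.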

Phase 2 Results


Top-2 Phase 2 Submissions
• #1: DUTH (Democritus University of Thrace)
  • They used a Rocchio-like classifier with each user’s rated venues as the training set
  • They built a custom query for the user using a modified Rocchio relevance feedback method
• #2: LavalLakehead (Laval University & Lakehead University)
  • Global interests model: a regressor trained on the TREC 2015 data
  • Contextual individual preference model, using word embeddings
  • The final ranking is based on the combination of the above models
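The DUTH bullets refer to Rocchio relevance feedback. The sketch below shows the textbook Rocchio update over sparse term vectors, not DUTH's modified variant, whose details are not given in the slides. The parameter defaults (alpha, beta, gamma) and the idea of treating well-rated venues as positive examples and poorly rated ones as negative examples are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List

Vector = Dict[str, float]  # term -> weight


def centroid(vectors: List[Vector]) -> Vector:
    """Average a list of sparse vectors; an empty list gives an empty vector."""
    if not vectors:
        return {}
    total: Dict[str, float] = defaultdict(float)
    for vec in vectors:
        for term, weight in vec.items():
            total[term] += weight
    return {term: weight / len(vectors) for term, weight in total.items()}


def rocchio(query: Vector, liked: List[Vector], disliked: List[Vector],
            alpha: float = 1.0, beta: float = 0.75, gamma: float = 0.15) -> Vector:
    """Textbook Rocchio: alpha*query + beta*centroid(liked) - gamma*centroid(disliked)."""
    pos, neg = centroid(liked), centroid(disliked)
    terms = set(query) | set(pos) | set(neg)
    updated = {
        t: alpha * query.get(t, 0.0) + beta * pos.get(t, 0.0) - gamma * neg.get(t, 0.0)
        for t in terms
    }
    # Dropping non-positive weights is a common simplification, assumed here.
    return {t: w for t, w in updated.items() if w > 0}


def score(query: Vector, venue: Vector) -> float:
    """Dot product between the user query and a venue vector."""
    return sum(w * venue.get(t, 0.0) for t, w in query.items())


liked = [{"museum": 1.0, "art": 1.0}, {"museum": 1.0}]
disliked = [{"bar": 1.0}]
user_query = rocchio({}, liked, disliked)
```

Candidate venues can then be reranked by `score(user_query, venue_vector)`, which is the shape of a Phase 2 reranking run.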

4. Analysis
Multi-depth Pooling, Reusability, Overlap@N

Is the TREC CS Test Collection Reusable?


How Difficult the Reusability Problem Is
• Just 22% of suggestions are mutual

Reusability based on P@5
• Leave One Team Out (LOTO)
(figure panels: 2014, 2015, 2016)
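A Leave One Team Out test asks how a team would have scored had its runs not contributed to the judgment pool. The sketch below is one common way to set this up and is not necessarily the exact procedure behind the figures: it assumes a single fixed pool depth, binary relevance, and hypothetical data layouts for runs and qrels.

```python
from typing import Dict, List, Set

Run = Dict[str, List[str]]          # request id -> ranked venue ids
Qrels = Dict[str, Dict[str, int]]   # request id -> venue id -> relevance


def pool(runs: List[Run], depth: int) -> Dict[str, Set[str]]:
    """Union of the top-`depth` venues of every run, per request."""
    pooled: Dict[str, Set[str]] = {}
    for run in runs:
        for request, ranking in run.items():
            pooled.setdefault(request, set()).update(ranking[:depth])
    return pooled


def mean_p_at_5(run: Run, qrels: Qrels) -> float:
    """Mean P@5 over requests; unjudged venues count as non-relevant."""
    per_request = [
        sum(1 for v in ranking[:5] if qrels.get(request, {}).get(v, 0) > 0) / 5
        for request, ranking in run.items()
    ]
    return sum(per_request) / len(per_request) if per_request else 0.0


def leave_one_team_out(team_runs: Dict[str, List[Run]], qrels: Qrels,
                       held_out: str, depth: int) -> Qrels:
    """Keep only judgments that the other teams' runs would have pooled."""
    others = [run for team, runs in team_runs.items()
              if team != held_out for run in runs]
    other_pool = pool(others, depth)
    return {
        request: {v: rel for v, rel in judged.items()
                  if v in other_pool.get(request, set())}
        for request, judged in qrels.items()
    }


team_runs = {
    "A": [{"r1": ["a1", "x", "a2", "y", "z"]}],
    "B": [{"r1": ["x", "y", "z", "b1", "b2"]}],
}
qrels = {"r1": {"a1": 1, "a2": 1, "x": 1, "y": 0, "z": 0, "b1": 1, "b2": 0}}

full_score = mean_p_at_5(team_runs["A"][0], qrels)
reduced = leave_one_team_out(team_runs, qrels, held_out="A", depth=5)
loto_score = mean_p_at_5(team_runs["A"][0], reduced)
```

A large gap between `full_score` and `loto_score` signals that a non-participating group would be underestimated by the collection.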

Multi-depth Pooling
• Hard pool cut-off = 5, soft pool cut-off = 25, and very soft pool cut-off = 50
(figure: a ranked list marked with the hard pool cut-off at rank 5, the soft pool cut-off at rank 25, and the very soft pool cut-off at rank 50)

Multi-depth Pooling Cost
• We get more stable metrics at deep ranks without much additional assessment effort
• It costs even less than traditional pooling with a pool depth of 10
(figure)

Reusability of TREC 2016 CS Test Collection


Overlap@N
• Overlap@N is the fraction of judged documents in the top N of a run, averaged over requests
• Previously, overlap@N dropped dramatically after the pool cut-off
(figure panels: 2015, 2016)
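The Overlap@N definition translates directly into code. This is a minimal sketch with hypothetical data layouts; one assumption to note is that a request whose ranking is shorter than N is normalised by the number of suggestions actually returned.

```python
from typing import Dict, List

Run = Dict[str, List[str]]          # request id -> ranked venue ids
Qrels = Dict[str, Dict[str, int]]   # request id -> venue id -> relevance


def overlap_at_n(run: Run, qrels: Qrels, n: int) -> float:
    """Mean over requests of the fraction of top-n suggestions that were judged."""
    fractions = []
    for request, ranking in run.items():
        top = ranking[:n]
        if not top:
            continue  # nothing returned for this request
        judged = qrels.get(request, {})
        # A venue counts as judged whether it was judged relevant or not.
        fractions.append(sum(1 for v in top if v in judged) / len(top))
    return sum(fractions) / len(fractions) if fractions else 0.0


run = {"r1": ["a", "b", "c", "d"], "r2": ["e", "f", "g", "h"]}
qrels = {"r1": {"a": 1, "b": 0}, "r2": {"e": 1}}

overlap_2 = overlap_at_n(run, qrels, 2)
overlap_4 = overlap_at_n(run, qrels, 4)
```

In this toy example the value falls as N grows past the judged ranks, which is the drop-off pattern the slide describes for earlier years.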

Reusability of TREC 2016 CS Test Collection
• The Phase 1 test collection is reusable by a new research group if they use either the bpref or MAP metric and run statistical significance tests
• bpref: 67 / 105 (64%) of differences are significant in 2016
(figure panels: 2015, 2016)
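bpref suits incomplete judgments because it ignores unjudged documents instead of counting them as non-relevant. The sketch below follows the commonly used trec_eval-style formulation for a single request; it is an illustration with a hypothetical data layout, not the evaluation script used by the track.

```python
from typing import Dict, List


def bpref(ranking: List[str], judged: Dict[str, int]) -> float:
    """bpref for one request.

    `judged` maps document id -> relevance (> 0 means relevant).
    Documents missing from `judged` are unjudged and are skipped.
    """
    num_rel = sum(1 for rel in judged.values() if rel > 0)
    num_nonrel = sum(1 for rel in judged.values() if rel <= 0)
    if num_rel == 0:
        return 0.0

    denominator = min(num_rel, num_nonrel)
    nonrel_seen = 0
    total = 0.0
    for doc in ranking:
        if doc not in judged:
            continue  # unjudged: neither rewarded nor penalised
        if judged[doc] > 0:
            if nonrel_seen == 0:
                total += 1.0
            else:
                # Penalise by the judged non-relevant documents ranked above.
                total += 1.0 - min(nonrel_seen, num_rel) / denominator
        else:
            nonrel_seen += 1
    return total / num_rel


judged = {"a": 1, "b": 0, "c": 1, "d": 0}
with_unjudged = bpref(["a", "u", "b", "c"], judged)
without_unjudged = bpref(["a", "b", "c"], judged)
```

The two scores are equal, which shows why the measure is less sensitive to a run retrieving venues that were never pooled.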

Reusability of TREC 2016 CS Test Collection
• The Phase 1 test collection is reusable by a new research group if they use either the bpref or MAP metric and run statistical significance tests
• MAP: 57 / 105 (54%) of differences are significant in 2016
(figure panels: 2016, 2015)
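The denominator 105 is consistent with comparing every unordered pair of the 15 Phase 1 runs listed in the participant statistics. That reading is an inference rather than something the slides state; under that assumption, the arithmetic can be checked as follows.

```python
from itertools import combinations

PHASE1_RUNS = 15  # from the 2016 participant statistics

# Every unordered pair of runs is one system comparison.
run_pairs = list(combinations(range(PHASE1_RUNS), 2))
num_pairs = len(run_pairs)


def percent(significant: int, total: int) -> int:
    """Share of significant differences, rounded to a whole percent."""
    return round(100 * significant / total)


bpref_share = percent(67, num_pairs)
map_share = percent(57, num_pairs)
```

Both rounded shares match the percentages quoted for bpref and MAP.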

Phase 2 vs. Pooling Bias
• There is no pooling bias in Phase 2
• None of the runs are pooled
• Fair to use for research groups that did not participate in the track
• All the measures are practical for evaluation and system rankings

5. Summary
Summary, Discussion

Summary
• An overview of the Contextual Suggestion Track (Phase 1 and Phase 2).
• TREC CS Test Collection.
• TREC CS Phase 1 and Phase 2 results.
• The TREC CS Web Corpus and a set of endorsements were released.
• Multi-depth pooling yields a reusable test collection that is also more stable at ranks deeper than the traditional pool cut-off.
• The Phase 1 test collection is reusable with more stable metrics such as bpref and MAP.
• The Phase 2 test collection has no pooling bias and is fair to use for evaluating non-pooled runs.

Do You Want to Continue Working on Contextual Suggestion?

Come to the Task Track Planning Session

Thanks for Your Attention!

