Overview of the TREC 2016 Contextual Suggestion Track
Seyyed Hadi Hashemi, Charles Clarke, Jaap Kamps, Julia Kiseleva, and Ellen Voorhees
University of Amsterdam, University of Waterloo, NIST
TREC 2016
Overview
1. Contextual Suggestion Track Tasks and Setup
2. Test Collection
3. Results
4. Analysis
5. Summary
Part 1: Track Tasks and Setup
Data Collection, Context, Profile, Tasks
What Is Contextual Suggestion?
Track Setup
(diagram)
Context: 1) City, 2) Trip Type, 3) Trip Duration, 4) Group Type, 5) Season
Profile: 1) Ratings, 2) Endorsements, 3) Age, 4) Gender
Context + Profile → Contextual Suggestion System → Ranked list of attractions
Track’s Aim The Contextual Suggestion Track investigates search techniques for complex information needs that are highly dependent on context and user interests.
5th Year
How did the TREC 2016 Contextual Suggestion Track change based on what we learned in previous years?
Data Collection
• 2012: Open Web
• 2013: Open Web and ClueWeb12
• 2014: Open Web and ClueWeb12
• 2015: TREC CS Collection as a fixed collection
• 2016: TREC CS Web Corpus as an additional data source
TREC CS Web Corpus
• Web crawl
• WARC format
• Easy to recreate
• ~1M pages
• Copyright form
Overview: Context
Context
• 2012: Location and time
• 2013 and 2014: Location
• 2015 and 2016: Location, plus several pieces of data about the trip:
  • A city (e.g., Gaithersburg)
  • A trip type (e.g., Business)
  • A trip duration (e.g., Weekend trip)
  • The type of group the person is travelling with (e.g., Friends)
  • The season the trip will occur in (e.g., Autumn)
Overview: Profile
Profile
• 2012, 2013, and 2014: Ratings
• 2015 and 2016: Ratings, age, and gender
• 2016: Endorsements collected by NIST assessors
  • Family friendly, outdoor activity, nature walks, parks, museums, art galleries, cocktails, and …
Overview: Tasks
TREC 2016 CS Had Two Phases
• Phase 1: a collection-based task
  • Participants had to retrieve 50 suggestions from the TREC CS Collection
• Phase 2: a reranking task
  • A candidate set of suggestions is provided for each request
Participants’ Statistics over the Last 5 Years
• 2012: 14 teams - 27 runs
• 2013: 19 teams - 34 runs
• 2014: 17 teams - 21 runs
• 2015: Live (6 teams - 9 runs) and Batch (17 teams - 30 runs); 39 runs in total
• 2016: Phase 1 (8 teams - 15 runs) and Phase 2 (13 teams - 30 runs); 45 runs in total
Part 2: Test Collection
Qrels, Test Collection Statistics
Test Collection
• # Requests: 424
• # Persons: 241
• # Cities: 215
• TREC 2016 official test collection:
  • # Requests: 61 (Phase 1) and 58 (Phase 2)
  • # Persons: 27
  • # Cities: 48 + 2 (seed cities)
  • # Unique judged venues: 5,162
  • Avg # venues / request: 95
  • # Judgments: 5,782 unique judgments
Test Collection Statistics
(figures)
Endorsements
• How many judged tags/categories? 133
• Relevance probability of each category in the qrels (figure)
Part 3: Results
Metrics, Phase 1 Results, Phase 2 Results
Metrics
• NDCG@5
• P@5
• MRR
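The three measures can be sketched over one request's ranked list of graded judgments (track scores average these over requests). The example grades below are hypothetical, not track data:

```python
import math

def precision_at_k(rels, k=5):
    """P@k: fraction of the top-k results judged relevant (grade > 0)."""
    return sum(1 for r in rels[:k] if r > 0) / k

def mrr(rels):
    """Reciprocal rank of the first relevant result; 0 if none retrieved."""
    for rank, r in enumerate(rels, start=1):
        if r > 0:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(rels, k=5):
    """NDCG@k with the standard log2 discount over graded relevance."""
    dcg = lambda gs: sum(g / math.log2(i + 1) for i, g in enumerate(gs, start=1))
    ideal = dcg(sorted(rels, reverse=True)[:k])
    return dcg(rels[:k]) / ideal if ideal > 0 else 0.0

# Hypothetical graded judgments for one request's ranked suggestions.
rels = [2, 0, 1, 0, 3, 0]
```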
Phase 1 Results
(results table)
Top-3 Phase 1 Submissions
• #1: USI (Università della Svizzera italiana)
  • They crawled 600K venues from Foursquare to create positive and negative category profiles
  • They created venue taste keyword profiles
  • Final score is a linear combination of the venue category and taste keyword scores
• #2: IAPLab (Nanjing University)
• #3: ADAPT_TCD (ADAPT Centre, Trinity College Dublin)
  • Ontology-based approach
  • They created the ontology using the Foursquare Category Hierarchy
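USI's final scoring step, a linear combination of category and taste-keyword scores, can be sketched as below. The weight alpha and the profile contents are illustrative assumptions, not the team's actual parameters:

```python
def combined_score(venue, cat_profile, kw_profile, alpha=0.6):
    """Score a venue as alpha * category score + (1 - alpha) * keyword score.
    alpha = 0.6 is an assumed weight, not USI's tuned value."""
    cat_score = sum(cat_profile.get(c, 0.0) for c in venue["categories"])
    kw_score = sum(kw_profile.get(k, 0.0) for k in venue["keywords"])
    return alpha * cat_score + (1 - alpha) * kw_score

# Hypothetical profiles built from a user's liked/disliked venues.
cat_profile = {"museum": 0.8, "nightclub": -0.4}
kw_profile = {"art": 0.5, "quiet": 0.2}
venue = {"categories": ["museum"], "keywords": ["art", "quiet"]}
```

Venues are then ranked by this combined score per request.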
Phase 2 Results
(results table)
Top-2 Phase 2 Submissions
• #1: DUTH (Democritus University of Thrace)
  • They used a Rocchio-like classifier with the user's rated venues as the training set
  • They built a custom query for the user using a modified Rocchio relevance feedback method
• #2: LavalLakehead (Laval University & Lakehead University)
  • Global interests model: a regressor trained on the TREC 2015 data
  • Contextual individual preference model, using word embeddings
  • Final ranking is based on the combination of the above models
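DUTH's Rocchio-style step can be sketched as follows. The beta/gamma weights are the textbook Rocchio defaults and the three-term venue vectors are toy features, not the team's actual settings:

```python
import math

def rocchio_query(pos, neg, beta=0.75, gamma=0.15):
    """Build a user query vector from liked (pos) and disliked (neg)
    venue vectors: beta * centroid(pos) - gamma * centroid(neg)."""
    dim = len(pos[0])
    def centroid(docs):
        return [sum(d[i] for d in docs) / len(docs) if docs else 0.0
                for i in range(dim)]
    p, n = centroid(pos), centroid(neg)
    return [beta * p[i] - gamma * n[i] for i in range(dim)]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    norm = lambda v: math.sqrt(sum(x * x for x in v))
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (norm(a) * norm(b) + 1e-12)

# Toy term space [outdoor, museum, nightlife] (hypothetical features).
liked = [[1.0, 0.2, 0.0], [0.8, 0.0, 0.1]]
disliked = [[0.0, 0.1, 1.0]]
q = rocchio_query(liked, disliked)

# Rerank candidate venues by similarity to the custom query.
candidates = [[0.9, 0.1, 0.0], [0.0, 0.0, 1.0], [0.5, 0.5, 0.2]]
order = sorted(range(len(candidates)), key=lambda i: -cosine(candidates[i], q))
```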
Part 4: Analysis
Multi-depth Pooling, Reusability, Overlap@N
Is the TREC CS Test Collection Reusable?
How Difficult Is the Reusability Problem?
• Just 22% of suggestions are mutual across runs
Reusability Based on P@5
• Leave One Team Out (LOTO)
(figures for 2014, 2015, and 2016)
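The LOTO procedure can be sketched as: rebuild the qrels without one team's pooled contributions, then re-score that team's runs against the reduced judgments. The data layout (one request, qrels keyed by contributing team) and the toy P@5 evaluator below are simplifying assumptions:

```python
def p_at_5(run, qrels):
    """Toy evaluator: precision of the top 5 against the given qrels."""
    return sum(1 for d in run[:5] if qrels.get(d, 0) > 0) / 5

def loto_scores(runs_by_team, qrels_by_team, evaluate=p_at_5):
    """Leave One Team Out: score each team's runs using judgments
    pooled only from the OTHER teams' contributions."""
    scores = {}
    for team, runs in runs_by_team.items():
        reduced = {}
        for other, qrels in qrels_by_team.items():
            if other != team:
                reduced.update(qrels)
        scores[team] = {rid: evaluate(run, reduced) for rid, run in runs.items()}
    return scores

# Toy data: two teams, each contributing judgments for its own pooled docs.
runs_by_team = {"A": {"runA": ["d3", "d1"]}, "B": {"runB": ["d1", "d4", "d2"]}}
qrels_by_team = {"A": {"d1": 1, "d4": 1}, "B": {"d3": 1, "d2": 0}}
scores = loto_scores(runs_by_team, qrels_by_team)
```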
Multi-depth Pooling
• Hard pool cut-off = 5, soft pool cut-off = 25, and very soft pool cut-off = 50
(diagram: a ranked list with cut-offs at ranks 5, 25, and 50)
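The nesting idea can be sketched as below: each pool is the union of every run's top-c documents, so the hard depth-5 pool is contained in the softer 25 and 50 pools. The demo uses toy depths, and whether the softer pools are judged exhaustively or sampled is not shown here:

```python
def multi_depth_pool(runs, cutoffs=(5, 25, 50)):
    """Build nested pools at several depths: pools[c] is the union of
    every run's top-c documents, so shallower pools are subsets."""
    pools, seen = {}, set()
    for c in sorted(cutoffs):
        for run in runs:
            seen.update(run[:c])
        pools[c] = set(seen)
    return pools

# Toy ranked runs and toy depths in place of the track's 5/25/50.
runs = [["a", "b", "c", "d", "e", "f"], ["c", "x", "y", "z", "w", "v"]]
pools = multi_depth_pool(runs, cutoffs=(2, 5))
```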
Multi-depth Pooling Cost
• We get more stable metrics at deep ranks without spending much extra effort.
• It costs even less than traditional pooling with a pool depth of 10.
Reusability of TREC 2016 CS Test Collection
Overlap@N
• Overlap@N is the mean fraction of judged documents at rank N of the runs, averaged over requests
• Previously, overlap@N dropped dramatically after the pool cut-off
(figures for 2015 and 2016)
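The definition can be sketched directly: for each request and run, take the fraction of the top-N documents that appear in the judged set, then average. The data below is a toy example:

```python
def overlap_at_n(runs_per_request, judged_per_request, n):
    """Mean fraction of judged documents in the top n, averaged
    over all (request, run) pairs."""
    fracs = []
    for req, runs in runs_per_request.items():
        judged = judged_per_request[req]
        for run in runs:
            top = run[:n]
            fracs.append(sum(1 for d in top if d in judged) / len(top))
    return sum(fracs) / len(fracs)

# Toy example: one request, two runs, three judged documents.
runs = {"req1": [["a", "b", "c"], ["a", "x", "y"]]}
judged = {"req1": {"a", "b", "c"}}
```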
Reusability of TREC 2016 CS Test Collection
• Phase 1 experiments are reusable for a new research group if they use either bpref or MAP and apply a statistical significance test
• bpref: 67 / 105 (64%) of differences are significant in 2016
(figures for 2015 and 2016)
Reusability of TREC 2016 CS Test Collection (cont.)
• MAP: 57 / 105 (54%) of differences are significant in 2016
(figures for 2015 and 2016)
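bpref's robustness to incomplete judgments, the property that makes it suitable for evaluating non-pooled runs, comes from simply skipping unjudged documents. A minimal sketch following the standard TREC definition:

```python
def bpref(ranked, qrels):
    """bpref: each relevant retrieved doc is penalized by the fraction of
    judged nonrelevant docs ranked above it; unjudged docs are ignored."""
    R = sum(1 for g in qrels.values() if g > 0)   # judged relevant
    N = sum(1 for g in qrels.values() if g == 0)  # judged nonrelevant
    if R == 0:
        return 0.0
    nonrel_above, score = 0, 0.0
    for doc in ranked:
        if doc not in qrels:
            continue  # unjudged: skipped, the key property of bpref
        if qrels[doc] > 0:
            denom = min(R, N)
            score += 1.0 if denom == 0 else 1 - min(nonrel_above, denom) / denom
        else:
            nonrel_above += 1
    return score / R

# Hypothetical judgments: two relevant, two nonrelevant docs.
qrels = {"a": 1, "b": 0, "c": 1, "d": 0}
```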
Phase 2 vs. Pooling Bias
• There is no pooling bias in Phase 2.
• None of the runs are pooled.
• It is fair to be used by research groups that have not participated in the track.
• All the measures are practical for evaluation and system ranking.
Part 5: Summary
Summary, Discussion
Summary
• An overview of the Contextual Suggestion Track (Phase 1 and Phase 2).
• The TREC CS Test Collection.
• TREC CS Phase 1 and Phase 2 results.
• The TREC CS Web Corpus and a set of endorsements released.
• Multi-depth pooling yields a reusable test collection whose metrics are also more stable at ranks deeper than the traditional pool cut-off.
• The Phase 1 test collection is reusable with the more stable metrics bpref and MAP.
• The Phase 2 test collection has no pooling bias and is fair for evaluating non-pooled runs.
Do You Want to Continue Working on Contextual Suggestion?
Come to the Task Track Planning Session
Thanks for Your Attention!