Unsupervised, efficient and semantic expertise retrieval

Christophe Van Gysel, Maarten de Rijke and Marcel Worring
What is expertise retrieval?
- The task of finding the right person with the appropriate skills and knowledge w.r.t. a topic.
- For example, an area chair looking for reviewers.
- Setting: document collections where documents are associated with one or more experts.
- Given a textual topic (e.g., “information retrieval”), rank experts in descending order of expertise.
(Diagram: experts and the documents they are associated with.)

Example: an area chair looking for a review committee on “information retrieval”.
Rank experts in decreasing order of expertise.

Query: “information retrieval” → a ranked list of experts (1–5).
How to do this?

Two standard approaches over the expert-document associations (a minimal sketch of both follows below):
- Document-centric (Model 2): first score documents using language models, then aggregate the scores per expert.
- Profile-centric (Model 1): concatenate the documents associated with each expert into a pseudo-document for every expert, then perform retrieval over these pseudo-documents using language models.
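A minimal Python sketch of both approaches, assuming hypothetical inputs: `doc_scores` maps each document to its language-model score for the query, `doc_texts` holds raw document text, and `doc_experts` maps each document to its associated experts. Summing scores in the document-centric variant is a simplification of Model 2's probabilistic aggregation, not the authors' exact implementation.

```python
from collections import defaultdict

def document_centric(doc_scores, doc_experts):
    """Model 2 (simplified): score documents first, then aggregate per expert."""
    expert_scores = defaultdict(float)
    for doc_id, score in doc_scores.items():
        for expert in doc_experts.get(doc_id, ()):
            expert_scores[expert] += score
    # Experts in decreasing order of aggregated score.
    return sorted(expert_scores, key=expert_scores.get, reverse=True)

def profile_centric_pseudo_documents(doc_texts, doc_experts):
    """Model 1: concatenate each expert's documents into one pseudo-document,
    over which retrieval with a language model is then performed."""
    profiles = defaultdict(list)
    for doc_id, text in doc_texts.items():
        for expert in doc_experts.get(doc_id, ()):
            profiles[expert].append(text)
    return {expert: " ".join(parts) for expert, parts in profiles.items()}
```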
Challenges
- Queries and documents use different representations to describe the same concepts.
- Scoring the whole document collection during retrieval is costly when we are only interested in experts.
- Improve retrieval performance without requiring relevance judgments for machine-learned ranking.
How to learn representations?
- Given a query q = t_1, ..., t_k, consisting of k terms, and a set of candidate experts C.
- Each term t_i is mapped to its embedding; the embedding is transformed and a soft-max is applied, yielding a distribution over experts P(C | t_i).

(Diagram: term t_i → embedding of t_i → transform and apply soft-max → distribution over experts P(C | t_i); sketched in code below.)
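A minimal numpy sketch of this per-term pipeline. The sizes and parameter names (`vocab_size`, `n_experts`, `emb_dim`, `E`, `W`, `b`) are illustrative assumptions, not the exact parameterisation of the authors' model.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, n_experts, emb_dim = 1000, 50, 64   # illustrative sizes

E = rng.normal(scale=0.1, size=(vocab_size, emb_dim))  # word embeddings
W = rng.normal(scale=0.1, size=(emb_dim, n_experts))   # linear transformation
b = np.zeros(n_experts)                                # bias of the transformation

def p_experts_given_term(term_id):
    """P(C | t_i): embed the term, transform, apply a soft-max over experts."""
    logits = E[term_id] @ W + b
    shifted = np.exp(logits - logits.max())  # numerically stable soft-max
    return shifted / shifted.sum()
```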
How to learn representations?
- Given a query q = t_1, ..., t_k, the per-term distributions are combined into a distribution for the whole query by taking their product:

  P(C | “information”) × P(C | “retrieval”) = P(C | “information retrieval”)
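Continuing the sketch above, the query-level distribution is the product of the per-term distributions; renormalising it to sum to one (which does not change the ranking) is a choice of this sketch.

```python
def p_experts_given_query(term_ids):
    """P(C | q) as the renormalised product of the per-term distributions."""
    dist = np.ones(n_experts)
    for term_id in term_ids:
        dist *= p_experts_given_term(term_id)
    return dist / dist.sum()

# E.g., a two-term query such as "information retrieval":
ranking = np.argsort(-p_experts_given_query([3, 7]))  # expert ids, best first
```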
How to learn representations?
- The embeddings and the transformation are trained using batched stochastic gradient descent.
- As a result, the word embeddings become specialised for the domain.
How to learn representations?
- The predicted distribution over experts (the product of the per-term distributions) is compared with a reference distribution using cross-entropy.
- Backpropagate errors! (A sketch of one training step follows below.)
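A minimal numpy sketch of one training step, continuing the sketches above. Two simplifications to flag: it updates on a single term rather than a window of terms processed in batches, and it assumes a uniform reference distribution over the experts associated with the document the term came from.

```python
def sgd_step(term_id, target, lr=0.1):
    """One SGD step on the cross-entropy between P(C | t_i) and `target`."""
    p = p_experts_given_term(term_id)
    grad_logits = p - target          # gradient of cross-entropy w.r.t. the logits
    grad_embedding = W @ grad_logits  # error backpropagated to the embedding
    W[:] -= lr * np.outer(E[term_id], grad_logits)
    b[:] -= lr * grad_logits
    E[term_id] -= lr * grad_embedding

# E.g., a term from a document associated with experts 4 and 9:
target = np.zeros(n_experts)
target[[4, 9]] = 0.5  # assumed uniform reference distribution over its experts
sgd_step(3, target)
```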
Experimental set-up
- Build and evaluate models on expert finding benchmarks:
  - TREC Enterprise Track (2005–2008): W3C (715 experts, 331k docs, 99 queries) and CERC (3,479 experts, 370k docs, 127 queries)
  - TU Expert Collection (977 experts, 31k docs, 1,662 queries)
- Compare the log-linear model to LSI, TF-IDF and language modelling approaches (Model 1 and Model 2).
What window size to choose?

(Figure: MAP as a function of window size (1, 2, 4, 8, 16, 32) on W3C, with 2005 and 2006 topics, and on CERC, with 2007 and 2008 topics.)
Results
- Outperforms language models on 4 out of 6 benchmarks:
  - 17% to 86% relative increase in MAP over state-of-the-art language models.
  - No significant difference on the other benchmarks.
- Compared to semantic matching methods (LSI):
  - Relative increase in MAP ranges from 83% to 1000%.
Per-topic difference in MAP w.r.t. document-centric language models

(Figure: per-topic ΔMAP of the log-linear model relative to document-centric language models, on W3C and CERC.)
What if we combine our approach with language models?
- For every topic q_i, rank experts using the log-linear model and Model 2.
- Combine these two rankings according to the reciprocal rank of experts c_j (a sketch follows below):

  rank_ensemble(c_j, q_i) ∝ (1 / rank_Model2(c_j, q_i)) · (1 / rank_log-linear(c_j, q_i))
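A minimal sketch of this reciprocal-rank combination for one topic; `ranking_lm` and `ranking_ll` are assumed to be lists of expert ids, best first, and for simplicity only experts present in both rankings are kept.

```python
def ensemble_ranking(ranking_lm, ranking_ll):
    """Fuse two rankings by the product of reciprocal ranks (best first)."""
    rank_lm = {c: r for r, c in enumerate(ranking_lm, start=1)}
    rank_ll = {c: r for r, c in enumerate(ranking_ll, start=1)}
    shared = rank_lm.keys() & rank_ll.keys()
    return sorted(shared, key=lambda c: 1.0 / (rank_lm[c] * rank_ll[c]),
                  reverse=True)

print(ensemble_ranking(["ann", "bob", "carol"], ["bob", "ann", "carol"]))
```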
What if we combine our approach with language models?
- The ensemble outperforms our approach alone on 5 out of 6 benchmarks.
- Combining our discriminative approach with generative language models yields a relative increase in MAP of 15% to 31% over the discriminative approach alone.
Can generative language models benefit from the log-linear model?
- Perform semantic query expansion using the learned word embeddings.
- For a particular query (e.g., “information retrieval”), add the k terms closest to each of its terms in embedding space (e.g., “knowledge” and “search”).
- With k = 1, instead of querying for “information retrieval”, we query for: “information retrieval knowledge search”. (A sketch follows below.)
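A minimal sketch of the expansion step, reusing the embedding matrix `E` from the earlier sketches; using cosine similarity as the notion of closeness in embedding space is an assumption of this sketch.

```python
def expand_query(term_ids, k=1):
    """Add the k terms closest to each query term in embedding space."""
    unit = E / np.linalg.norm(E, axis=1, keepdims=True)  # for cosine similarity
    expanded = list(term_ids)
    for term_id in term_ids:
        sims = unit @ unit[term_id]
        sims[term_id] = -np.inf                 # exclude the term itself
        expanded.extend(np.argsort(-sims)[:k])  # its k nearest neighbours
    return expanded
```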
Can generative language models benefit from the log-linear model?
- We observe an increase in MAP on most benchmarks when performing semantic query expansion.
- The benchmarks that did not benefit were those on which our method did not outperform language models.
- Our intuition: some benchmarks require semantic matching, while others benefit from lexical matching.
How efficient is the log-linear model compared to generative models?
- During retrieval, time complexity is linear in the number of experts: scoring a query only requires evaluating the soft-max distribution over candidate experts.
- The previous state-of-the-art is linear in the number of documents.
Code is available at https://github.com/cvangysel/sert

[cvangysel@ilps SERT] ./W3C-expert-finding.sh
Verifying W3C corpus.
Creating output directory.
Fetching topics and relevance judgments.
Constructing log-linear model on W3C collection.
Evaluating on TREC Enterprise tracks.
2005 Enterprise Track: ndcg=0.5474; map=0.2603; recip_rank=0.6209; P_5=0.4098;
2006 Enterprise Track: ndcg=0.7883; map=0.4937; recip_rank=0.8834; P_5=0.7000;
Conclusions
- Our log-linear model performs competitively with existing methods, while its retrieval time complexity is linear in the number of experts rather than the number of documents.
- An ensemble of the log-linear model and generative language models performs best.
- Word embeddings learned by the log-linear model can be used to improve retrieval with language models through query expansion.
Thank you!

Christophe Van Gysel  @cvangysel
Maarten de Rijke  @mdr
Marcel Worring  @marcelworring

Slides will be made available at http://chri.stophr.be