Part III

Semantic Search

Semantic search

- What is “semantic” search?

  - understanding intent, contextual meaning
  - finding actual answers for information needs
  - combining text and structure

- “Entity-centric search”

In this part

- Query understanding

- Semantic search tasks

- Result presentation

- My first semantic search engine

- Open challenges

Query understanding

- First step: recognize, label, and disambiguate entities in queries

  - add: attributes/aspects
  - add: types
  - add: relationships
  - add: actions/verbs, etc.

- Then: query understanding

  - what is the intent?
  - currently: mostly template-based [Agarwal et al. 2010]

Query understanding

- Adding structure to queries

- Query intents

- Interaction: recommendation, auto-completion

Adding structure to queries

- Phrases, segmentation, weighting [Bendersky et al. 2010]

- Keyword queries to structured queries [Sarkas et al. 2010; Pound et al. 2012; Mass & Sagiv 2012]

- Query templates [Agarwal et al. 2010; Szpektor et al. 2011]

Query intents

- Query intent classification

  - navigational, informational, or transactional [Broder 2002]
  - extensions to Broder [Rose & Levinson 2004; Jansen et al. 2007]
  - semantic intents [Hu et al. 2009]

- Query context (sessions, users, history, user agents)
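As a rough illustration of the three-way taxonomy of [Broder 2002], a rule-based classifier might look like the sketch below. The cue lists and the function name `classify_intent` are hypothetical and chosen for illustration only; they are not taken from any of the cited papers, and practical systems usually learn such decisions from query logs and click data.

```python
import re

# Hypothetical cue words; illustrative only, not taken from Broder (2002).
TRANSACTIONAL_CUES = {"buy", "download", "order", "tickets", "price"}
NAVIGATIONAL_CUES = {"login", "homepage", "website", "www"}
URL_LIKE = re.compile(r"\.(com|org|net|edu)\b")


def classify_intent(query: str) -> str:
    """Return 'navigational', 'transactional', or 'informational'."""
    q = query.lower()
    tokens = set(re.findall(r"[a-z0-9]+", q))
    # Navigational: the user wants to reach a particular site.
    if URL_LIKE.search(q) or tokens & NAVIGATIONAL_CUES:
        return "navigational"
    # Transactional: the user wants to perform an action.
    if tokens & TRANSACTIONAL_CUES:
        return "transactional"
    # Default: the user wants information about a topic.
    return "informational"
```

Query context (sessions, users, history) is ignored here; in this sketch it would enter as additional features next to the query tokens.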

Named Entities in Queries [Guo et al. 2009]

- Entities in 71% of queries

- Context per type (# marks the position of the entity):

  Movie          Game           Book           Music
  # movie        # games        # summary      # lyrics
  # photos       # cheats       # book         # video
  # soundtrack   lego #         # review       # song
  # pics         # download     # star         lyrics #
  # movies       # wallpaper    # synopsis     lyrics to #
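To make the idea of query contexts concrete, the sketch below replaces a known entity mention by `#` and looks the remaining context up in a small table. This is a plain dictionary lookup, not the model of [Guo et al. 2009]; the pattern-to-type assignments and the function names are assumptions made for illustration.

```python
from typing import Optional

# Context patterns use '#' as the placeholder for the entity mention.
# The pattern-to-type mapping below is an illustrative assumption.
CONTEXT_TYPES = {
    "# cheats": "Game",
    "# lyrics": "Music",
    "lyrics to #": "Music",
    "# soundtrack": "Movie",
    "# synopsis": "Book",
}


def extract_context(query: str, entity: str) -> Optional[str]:
    """Replace the entity mention in the query by '#'; None if absent."""
    q, e = query.lower().strip(), entity.lower().strip()
    if e not in q:
        return None
    # Replace only the first occurrence and normalize whitespace.
    return " ".join(q.replace(e, "#", 1).split())


def guess_type(query: str, entity: str) -> Optional[str]:
    """Look up the type suggested by the query context, if any."""
    context = extract_context(query, entity)
    if context is None:
        return None
    return CONTEXT_TYPES.get(context)
```

For example, `guess_type("lyrics to yesterday", "yesterday")` maps the context `lyrics to #` to the type Music, while a query without a known context yields `None`.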

Distribution of web search queries [Pound et al. 2010]

- Entity (“1978 cj5 jeep”): 41%
- Type (“doctors in barcelona”): 12%
- Attribute (“zip code waterville Maine”): 5%
- Relation (“tom cruise katie holmes”): 1%
- Other (“nightlife in Barcelona”): 36%
- Uninterpretable: 6%

Interaction: recommendation, auto-completion

Semantic search tasks/methods

- Ad-hoc

- Information filtering

- Finding aggregates

- Profiling and slot filling

- Exploratory search, longitudinal search tasks, and serendipity

Finding aggregates

- Question Answering over Linked Data (QALD)

- Aggregating information from multiple sources (both unstructured and structured)

  - Which German cities have more than 250000 inhabitants?
  - How many space missions have there been?
  - Who is the youngest player in the Premier League?
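Questions of this kind reduce to filtering and aggregating over structured records once the question has been interpreted. The sketch below shows that final step on a toy in-memory table; the city names and population figures are made up, and the function names are hypothetical. Interpreting the natural-language question itself is out of scope here.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class City:
    name: str
    country: str
    population: int


# Toy records with made-up names and figures, for illustration only.
CITIES = [
    City("City A", "Germany", 600_000),
    City("City B", "Germany", 120_000),
    City("City C", "France", 400_000),
    City("City D", "Germany", 250_001),
]


def cities_above(cities: Iterable[City], country: str, min_population: int) -> List[str]:
    """Names of cities in `country` with more than `min_population` inhabitants."""
    return sorted(
        c.name for c in cities
        if c.country == country and c.population > min_population
    )


def count_cities(cities: Iterable[City], country: str) -> int:
    """A 'how many' question reduces to counting the matching records."""
    return sum(1 for c in cities if c.country == country)
```

With a real knowledge base the same filter and count would typically be expressed as a structured query against the KB rather than over a Python list.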

Information filtering

- CLEF RepLab

  - Given a stream of items (tweets), identify
    1. which entity this is about
    2. how it impacts the “reputation” of that entity

- Cumulative Citation Recommendation (CCR) @TREC Knowledge Base Acceleration

  - Filter a time-ordered corpus for documents that are highly relevant to a predefined set of entities
  - For each entity, provide a ranked list of documents based on their “citation-worthiness”
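The CCR setup can be sketched as a single pass over a time-ordered stream that keeps, per target entity, the documents mentioning it. Everything below is an assumption for illustration: the entity identifiers, the surface forms, and in particular the score, which is a plain mention count standing in for “citation-worthiness”.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical target entities mapped to assumed surface forms.
TARGETS: Dict[str, List[str]] = {
    "Entity_A": ["entity a", "ent. a"],
    "Entity_B": ["entity b"],
}


def filter_stream(
    stream: Iterable[Tuple[int, str, str]],
    targets: Dict[str, List[str]],
) -> Dict[str, List[Tuple[str, int]]]:
    """For each entity, rank the documents that mention it.

    `stream` yields (timestamp, doc_id, text) in time order. The score is a
    plain mention count, a crude stand-in for citation-worthiness.
    """
    ranked: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for _timestamp, doc_id, text in stream:
        lowered = text.lower()
        for entity, forms in targets.items():
            score = sum(lowered.count(form) for form in forms)
            if score > 0:
                ranked[entity].append((doc_id, score))
    # Highest score first; the stable sort keeps stream order among ties.
    return {e: sorted(docs, key=lambda d: -d[1]) for e, docs in ranked.items()}
```

A more realistic filter would replace the substring match by entity linking and the mention count by a learned relevance model.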

CCR @TREC KBA

[Figure: CCR workflow. Labels: Streaming documents; Entity in Wikipedia; citation worthy? Yes; Wikipedia editor; Update Wikipedia]

Profiling and slot filling

- Entity profiling

  - generate a profile of an entity
  - summary (keywords/full-text)
  - timelines
  - …

- Slot filling

  - automatically fill attribute fields

Exploratory search, longitudinal search tasks, and serendipity

- Entity-driven serendipitous search system [Bordino et al. 2013]

  - Lazy random walk on entity networks extracted from Wikipedia and Yahoo! Answers
  - The entity networks are similar, but Yahoo! Answers contributes more to serendipitous browsing experience
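A lazy random walk stays at the current node with some probability and otherwise moves to a uniformly chosen neighbor. The sketch below propagates a probability distribution over a toy entity graph; the graph, the laziness value of 0.5, and the function names are assumptions for illustration and do not reproduce the setup of [Bordino et al. 2013].

```python
from typing import Dict, List

# Toy undirected entity graph with hypothetical nodes.
GRAPH: Dict[str, List[str]] = {
    "A": ["B", "C"],
    "B": ["A"],
    "C": ["A"],
}


def lazy_walk_step(
    graph: Dict[str, List[str]], dist: Dict[str, float], laziness: float = 0.5
) -> Dict[str, float]:
    """One step: stay with probability `laziness`, else move to a
    uniformly chosen neighbor."""
    new_dist = {node: laziness * p for node, p in dist.items()}
    for node, p in dist.items():
        neighbors = graph[node]
        if not neighbors:
            new_dist[node] += (1.0 - laziness) * p  # dangling node: stay put
            continue
        share = (1.0 - laziness) * p / len(neighbors)
        for nb in neighbors:
            new_dist[nb] += share
    return new_dist


def lazy_walk(
    graph: Dict[str, List[str]], start: str, steps: int, laziness: float = 0.5
) -> Dict[str, float]:
    """Distribution over nodes after `steps` lazy steps from `start`."""
    dist = {node: 0.0 for node in graph}
    dist[start] = 1.0
    for _ in range(steps):
        dist = lazy_walk_step(graph, dist, laziness)
    return dist
```

Entities with high probability after a few steps from the query entity are candidates for recommendation; the laziness keeps the walk close to its starting point.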

Result presentation

- Rich result pages (SERPs)

- Directly displaying answers and relevant information or context

Rich result pages

Direct displays

Spoiler alert

- If you plan on watching Germany – Portugal, close your eyes for a bit…

Template-based query understanding

- Rule-based approaches (editorial)

  - high precision
  - difficult to generalize
  - costly to create/maintain

- Research into more generic approaches is ongoing

Evaluation

- Do traditional metrics cover semantic search?

  - do we know the user’s intent from a keyword query?
  - haven’t users grown accustomed to not getting actual answers?
  - no interaction when the correct answer is shown

- Other “dimensions” of relevance

  - recency
  - interestingness
  - popularity
  - social

So, we have some tools now: How to go about query understanding (QU)?

- Entity linking

  - assume some knowledge base/structure/graph
  - link to the entity that is mentioned in the input
  - “recognize, label, and disambiguate”
  - leverage various signals
    - titles, redirects, anchor text, ...
    - graph information
    - machine learning: clicks/editorial data
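One simple way to use anchor text as a linking signal is “commonness”: the fraction of times a surface form links to a given entity. The sketch below combines it with greedy longest-match mention detection. The anchor counts are made up, and the names `ANCHOR_COUNTS`, `commonness`, and `link` are hypothetical; graph information and learned models are left out.

```python
from typing import Dict, List, Tuple

# Hypothetical anchor-text statistics: surface form -> {entity: link count}.
ANCHOR_COUNTS: Dict[str, Dict[str, int]] = {
    "eiffel tower": {"Eiffel_Tower": 98, "Eiffel_Tower_(Paris,_Texas)": 2},
    "paris": {"Paris": 90, "Paris_Hilton": 10},
}


def commonness(mention: str) -> Dict[str, float]:
    """P(entity | mention), estimated from the anchor counts."""
    counts = ANCHOR_COUNTS.get(mention, {})
    total = sum(counts.values())
    return {e: c / total for e, c in counts.items()} if total else {}


def link(query: str, max_len: int = 3) -> List[Tuple[str, str, float]]:
    """Greedy longest-match linking: (mention, entity, commonness)."""
    tokens = query.lower().split()
    links: List[Tuple[str, str, float]] = []
    i = 0
    while i < len(tokens):
        # Try the longest n-gram starting at token i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            mention = " ".join(tokens[i:i + n])
            candidates = commonness(mention)
            if candidates:
                best = max(candidates, key=candidates.get)
                links.append((mention, best, candidates[best]))
                i += n
                break
        else:
            i += 1  # no dictionary entry starts at this token
    return links
```

Because the longest match wins, “eiffel tower height” links the two-word mention “eiffel tower” rather than stopping at a shorter candidate.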

So, we have some tools now: How to go about query understanding (QU)?

- Entity retrieval

  - retrieve the actual answer
    - single entity
    - set/list of entities
    - attribute value(s)
    - snippet
  - from the KB, the web, a vertical, ...

- In some cases, EL/ER can be one and the same...
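As a minimal sketch of retrieving an answer from a KB, the function below splits a keyword query into an entity and an attribute and returns the stored value, which may be a single value or a list of entities. The toy knowledge base, its keys, and the function name `answer` are assumptions for illustration; a real system would rely on entity linking and ranking instead of prefix matching.

```python
from typing import Dict, List, Optional, Union

Value = Union[str, List[str]]

# Toy knowledge base; names, attributes, and values are for illustration.
KB: Dict[str, Dict[str, Value]] = {
    "eiffel tower": {"height": "330 m"},
    "woody allen": {"movies": ["Annie Hall", "Manhattan"]},
}


def answer(query: str) -> Optional[Value]:
    """Split a keyword query into entity + attribute and look up the value."""
    q = " ".join(query.lower().split())
    for entity, attributes in KB.items():
        if q.startswith(entity + " "):
            attribute = q[len(entity) + 1:]
            return attributes.get(attribute)
    return None  # no known entity at the start of the query
```

Here `answer("eiffel tower height")` returns an attribute value and `answer("woody allen movies")` returns a list of entities, matching two of the answer types listed above.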

Let’s take a step back

- Assume you want to build a “semantic” search engine

  - What would you need?
  - What would you do?
  - Which of the building blocks we have seen do we need and how do they fit together?

The life of a query

- “How tall is the eiffel tower”

- “eiffel tower height”

The life of a query

- “Which airlines fly the Airbus A380?”

- “airbus a380 airlines”

The life of a query

- “yul arrivals”

The life of a query

- “woody allen movies”

The life of a query

- “Who was the oldest person in outer space?”

- “oldest person outer space”

Open challenges

- Tail entities

- (Hyper)local

- Social

- Aggregation/Summarization

- “Provenance” ~ result explanation

- (Online) evaluation

- Freshness

Wrap-up

- Introduction

- Part 1 – Entity Linking

- Part 2 – Entity Retrieval

- Part 3 – Semantic Search

Questions?

Edgar Meij – @edgarmeij Krisztian Balog – @krisztianbalog Daan Odijk – @dodijk

See http://ejmeij.github.io/entity-linking-and-retrieval-tutorial/ for the slides, bibliography, and links.

20140615 Entity Linking and Retrieval for Semantic ... - WordPress.com
