TextInfer ⋅ July 30, 2011
Representing and resolving ambiguities in ontology-based question answering Christina Unger & Philipp Cimiano Semantic Computing Group, CITEC, Bielefeld University
Outline
▸ Ambiguities in ontology-based interpretation
▸ Representing and resolving ambiguities
  ▸ Enumeration
  ▸ Underspecification
  ▸ Ontological reasoning
▸ Results and conclusion
Ambiguities in ontology-based interpretation
Ambiguities
Ambiguities comprise all cases in which natural language expressions have more than one meaning, due to:
▸ structural properties (e.g. modifier/PP attachment sites)
  ▸ Put the box on the table by the window in the kitchen.
  ▸ We saw old elves and wizards.
▸ alternative lexical meanings
  ▸ bank
Ontology-based interpretation
Goal: Map natural language expressions (e.g. questions) to formal representations aligned to the ontology (e.g. SPARQL queries).
▸ Which city has the most inhabitants?
▸ PREFIX geo:
  SELECT ?c WHERE { ?c rdf:type geo:city . ?c geo:population ?p . } ORDER BY DESC(?p) LIMIT 1
Ontology-based interpretation
The meaning of a lexical expression is the ontology concept that this expression verbalizes.
▸ Which city has the most inhabitants?
  ▸ city → geo:city
  ▸ has inhabitants → geo:population
Ontology-based interpretation
Main challenge: Map natural language expressions to corresponding ontology concepts. This mapping need not be one-to-one.
▸ different expressions can refer to the same ontology concept
  ▸ flows through → geo:flowsThrough
  ▸ traverses → geo:flowsThrough
▸ one expression can refer to different ontology concepts
  ▸ New York → geo:new_york, geo:new_york_city
  ▸ has → geo:flowsThrough, geo:inState
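A many-to-many mapping of this kind can be sketched as a plain dictionary from expressions to lists of concepts. This is only an illustration: the names LEXICON, candidates and is_ambiguous are hypothetical and do not describe how Pythia stores its lexicon. The entries are the ones from the examples above.

```python
# Hypothetical lexicon: each expression maps to every ontology concept
# it can verbalize (entries taken from the examples on this slide).
LEXICON = {
    "flows through": ["geo:flowsThrough"],
    "traverses": ["geo:flowsThrough"],
    "New York": ["geo:new_york", "geo:new_york_city"],
    "has": ["geo:flowsThrough", "geo:inState"],
}


def candidates(expression):
    """Return all ontology concepts the expression may refer to."""
    return LEXICON.get(expression, [])


def is_ambiguous(expression):
    """An expression is ambiguous if it has more than one candidate."""
    return len(candidates(expression)) > 1
```

Under these assumptions, is_ambiguous("New York") holds, while "flows through" and "traverses" are unambiguous synonyms for the same concept.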
Ambiguities in ontology-based interpretation
Ambiguities in the context of ontology-based interpretation comprise all cases in which a natural language expression cannot be mapped uniquely to an ontology concept.
Non-overlapping alternatives
▸ What is the area of New York?
  ▸ (New York state) → geo:new_york
  ▸ (New York city) → geo:new_york_city
▸ What is the biggest city?
  ▸ SELECT ?c WHERE { ?c a geo:city . ?c geo:population ?n . } ORDER BY DESC(?n) LIMIT 1
  ▸ SELECT ?c WHERE { ?c a geo:city . ?c geo:area ?n . } ORDER BY DESC(?n) LIMIT 1
Overlapping alternatives
▸ Give me all films starring Jeff Bridges.
  ▸ only including leading roles
  ▸ also including supporting roles
▸ Which cities have more than two million inhabitants?
Context-dependency
▸ Which state has the most rivers?
  ▸ SELECT ?s (COUNT(?x) AS ?n) WHERE { ?s a geo:state . ?x a geo:river . ?x geo:flowsThrough ?s . } GROUP BY ?s ORDER BY DESC(?n) LIMIT 1
▸ Which state has the most cities?
  ▸ SELECT ?s (COUNT(?x) AS ?n) WHERE { ?s a geo:state . ?x a geo:city . ?x geo:inState ?s . } GROUP BY ?s ORDER BY DESC(?n) LIMIT 1
Due to sortal restrictions, only one of the alternatives is admissible.
Ambiguities are pervasive.
▸ QALD-1 training questions for DBpedia
  ▸ At least 16 % contain expressions that do not have a unique ontological correspondent.
▸ 880 user questions for GeoBase
  ▸ 1278 occurrences of light expressions: is/are, has/have, with, in, of
  ▸ 151 occurrences of context-dependent expressions: big, small, major
Representing and resolving ambiguities
Representing ambiguities
▸ Enumeration: constructing a different semantic representation and query for every meaning alternative
▸ Underspecification: constructing only one underspecified representation that subsumes all different interpretations
Enumeration
Constructing a different meaning representation for every possible interpretation
Enumeration
▸ How big is New York?
  ▸ SELECT ?a WHERE { geo:new_york_city geo:area ?a . }
  ▸ SELECT ?p WHERE { geo:new_york_city geo:population ?p . }
  ▸ SELECT ?a WHERE { geo:new_york geo:area ?a . }
  ▸ SELECT ?p WHERE { geo:new_york geo:population ?p . }
▸ two lexical entries for big, one referring to geo:area and one referring to geo:population
▸ two lexical entries for New York, one referring to geo:new_york and one referring to geo:new_york_city
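The four queries are the cross product of the two lexical entries for New York and the two for big. A minimal sketch of that enumeration step, assuming the entries are kept in simple dictionaries (ENTITY_ENTRIES, PROPERTY_ENTRIES and enumerate_queries are hypothetical names, and the single variable ?v stands in for ?a and ?p):

```python
from itertools import product

# Hypothetical lexical entries, taken from the example above.
ENTITY_ENTRIES = {"New York": ["geo:new_york_city", "geo:new_york"]}
PROPERTY_ENTRIES = {"big": ["geo:area", "geo:population"]}


def enumerate_queries(entity_expr, property_expr):
    """Build one SPARQL query string per combination of lexical entries."""
    combos = product(ENTITY_ENTRIES[entity_expr], PROPERTY_ENTRIES[property_expr])
    return [
        f"SELECT ?v WHERE {{ {entity} {prop} ?v . }}"
        for entity, prop in combos
    ]


queries = enumerate_queries("New York", "big")
```

The number of queries is the product of the number of entries per ambiguous expression, which is why longer questions quickly produce many interpretations.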
Enumeration
▸ Which state has the most rivers? Which state has the most cities?
▸ two lexical entries for has, one referring to geo:flowsThrough and one referring to geo:inState
▸ Problem: geo:flowsThrough is only possible if the relevant argument is a river; geo:inState is only possible if the relevant argument is a city.
▸ Solution: In order not to derive inconsistent interpretations, we need to capture sortal restrictions.
Adding sortal restrictions
Lexical entries are enriched with sortal restrictions vˆclass.
▸ has (first entry)
  [S DP1↓ [VP [V has] DP2↓]]
  geo:flowsThrough (y, x)
  (DP1 , x), (DP2 , y)
  yˆgeo:river
▸ has (second entry)
  [S DP1↓ [VP [V has] DP2↓]]
  geo:inState (y, x)
  (DP1 , x), (DP2 , y)
  yˆgeo:city
Adding sortal restrictions
When translating semantic representations into a formal query, sortal restrictions are added as a condition.
▸ Which state has the most rivers?
  ▸ SELECT ?x (COUNT(?y) AS ?c) WHERE { ?x a geo:state . ?y a geo:river . ?y geo:flowsThrough ?x . ?y a geo:river . } GROUP BY ?x ORDER BY DESC(?c) LIMIT 1
  ▸ SELECT ?x (COUNT(?y) AS ?c) WHERE { ?x a geo:state . ?y a geo:river . ?y geo:inState ?x . ?y a geo:city . } GROUP BY ?x ORDER BY DESC(?c) LIMIT 1
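The effect of appending the sortal restriction can be sketched as follows. The helpers build_where and classes_of are hypothetical and only illustrate the idea: the restriction from the lexical entry for has becomes one more type pattern on ?y, so the geo:inState reading ends up requiring ?y to be both a river and a city.

```python
def build_where(noun_class, relation, sortal_class):
    """Assemble the WHERE patterns for 'Which state has the most <noun>?'.

    noun_class comes from the noun (e.g. rivers), sortal_class from the
    lexical entry chosen for 'has'; the latter is appended as a condition.
    """
    return (
        f"?x a geo:state . ?y a {noun_class} . "
        f"?y {relation} ?x . ?y a {sortal_class} ."
    )


def classes_of(where, var):
    """List the classes asserted for var through 'var a <class>' patterns."""
    tokens = where.split()
    return [
        tokens[i + 2]
        for i in range(len(tokens) - 2)
        if tokens[i] == var and tokens[i + 1] == "a"
    ]


rivers_flow = build_where("geo:river", "geo:flowsThrough", "geo:river")
rivers_in_state = build_where("geo:river", "geo:inState", "geo:city")
```

For rivers_in_state, classes_of(rivers_in_state, "?y") lists both geo:river and geo:city, which is the conflict the enumeration strategy relies on.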
Example: major
▸ major (first entry)
  [N [ADJ major] N↓]
  geo:population (x, p), p > 150 000
  (N, x)
  xˆgeo:city
▸ major (second entry)
  [N [ADJ major] N↓]
  geo:population (x, p), p > 10 000 000
  (N, x)
  xˆgeo:state
Example: major
▸ Give me all major cities.
  ▸ SELECT ?x WHERE { ?x a geo:city . ?x geo:population ?p . ?x a geo:city . FILTER ( ?p > 150000 ) }
  ▸ SELECT ?x WHERE { ?x a geo:city . ?x geo:population ?p . ?x a geo:state . FILTER ( ?p > 10000000 ) }
Problems with enumeration
The enumeration strategy relies on a conflict that automatically filters out unwanted interpretations by constructing a query that returns no result.
▸ Extensionality: There is no way to distinguish between queries that return no result due to an inconsistency introduced by a sortal restriction and queries that return no result because there is no result (e.g. Which states border Hawaii?).
▸ Number of constructed interpretations: User questions easily lead to at least 20 or 30 different possible interpretations.
Underspecification
Constructing one underspecified representation that subsumes all different interpretations
Adding metavariables
Semantic representations are enriched with metavariables and metavariable specifications that list all possible instantiations of a metavariable given certain sortal restrictions.
Example: biggest
[N [ADJ biggest] N↓]
y
P (x, y), max(y)
(N, x)
P → geo:area (x = geo:city ⊔ geo:state)
  ∣ geo:population (x = geo:city ⊔ geo:state)
  ∣ geo:height (x = geo:mountain)
Semantically light expressions
For semantically light expressions, we assume the most underspecified representation: a metavariable without a metavariable specification.
Example: has
[S DP1↓ [VP [V has] DP2↓]]
P (y, x)
(DP1 , x), (DP2 , y)
Example: final underspecified representation
▸ Which state has the biggest city?
▸ x, y, z
  geo:state(y), geo:city(x), P (x, y), Q (x, z), max(z)
  Q → geo:area (x = geo:city ⊔ geo:state)
    ∣ geo:population (x = geo:city ⊔ geo:state)
    ∣ geo:height (x = geo:mountain)
The interpretation process thus constructs one underspecified representation that has to be specified in order to yield a specific query that can be evaluated w.r.t. a knowledge base.
Ontological reasoning
Integrating a reasoner in order to resolve metavariables and thereby construct exactly those interpretations that are possible and consistent
Resolving Q
x, y, z
geo:state(y), geo:city(x), P (x, y), Q (x, z), max(z)
Q → geo:area (x = geo:city ⊔ geo:state)
  ∣ geo:population (x = geo:city ⊔ geo:state)
  ∣ geo:height (x = geo:mountain)
Check satisfiability of the intersection of x's type information with the sortal restrictions:
▸ geo:city ⊓ (geo:city ⊔ geo:state): true
▸ geo:city ⊓ geo:mountain: false
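The check can be sketched in a simplified form. The sketch assumes that all classes are atomic, that a sortal restriction is a union of such classes, and that the ontology declares the disjointness axioms listed in DISJOINT. Those axioms and the names satisfiable, resolve and Q_SPEC are assumptions made for illustration; the slides do not show the actual reasoner interface, and a real description logic reasoner would also take the subclass hierarchy into account.

```python
# Assumed disjointness axioms (hypothetical; not listed in the slides).
DISJOINT = {
    frozenset({"geo:city", "geo:mountain"}),
    frozenset({"geo:state", "geo:mountain"}),
    frozenset({"geo:city", "geo:state"}),
}

# Metavariable specification for Q, as in the example above:
# each candidate property paired with the union of admissible classes.
Q_SPEC = [
    ("geo:area", {"geo:city", "geo:state"}),
    ("geo:population", {"geo:city", "geo:state"}),
    ("geo:height", {"geo:mountain"}),
]


def satisfiable(type_class, restriction):
    """Check whether type_class ⊓ (c1 ⊔ ... ⊔ cn) is satisfiable.

    The intersection is satisfiable if at least one class in the union
    is not declared disjoint with type_class.
    """
    return any(
        frozenset({type_class, cls}) not in DISJOINT for cls in restriction
    )


def resolve(spec, type_class):
    """Keep only the instantiations whose restriction fits type_class."""
    return [prop for prop, restriction in spec if satisfiable(type_class, restriction)]


remaining = resolve(Q_SPEC, "geo:city")
```

With x typed as geo:city, only geo:area and geo:population survive; geo:height is discarded before any query is constructed.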
Resolving Q
x, y, z
geo:state(y), geo:city(x), P (x, y), Q (x, z), max(z)
Q → geo:area (x = geo:city ⊔ geo:state)
  ∣ geo:population (x = geo:city ⊔ geo:state)
Resolving Q
▸ x, y, z
  geo:state(y), geo:city(x), P (x, y), geo:area (x, z), max(z)
▸ x, y, z
  geo:state(y), geo:city(x), P (x, y), geo:population (x, z), max(z)
Example: Resolving P
▸ x, y, z
  geo:state(y), geo:city(x), P (x, y), geo:area (x, z), max(z)
▸ x, y, z
  geo:state(y), geo:city(x), P (x, y), geo:population (x, z), max(z)
The ontology is searched for a relation that admits geo:city (or a superclass) as domain and geo:state (or a superclass) as range.
▸ geo:inState
Example: Resolving P
▸ x, y, z
  geo:state(y), geo:city(x), geo:inState (x, y), geo:area (x, z), max(z)
▸ x, y, z
  geo:state(y), geo:city(x), geo:inState (x, y), geo:population (x, z), max(z)
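The search for a suitable relation can be sketched as a lookup over domain and range declarations. The ontology fragment below is an assumption made for illustration: the domain and range pairs are inferred from the sortal restrictions used earlier, no superclasses are declared, and the names RELATIONS, PARENT, ancestors and resolve_relation are hypothetical.

```python
# Assumed domain/range declarations (hypothetical ontology fragment).
RELATIONS = {
    "geo:flowsThrough": ("geo:river", "geo:state"),
    "geo:inState": ("geo:city", "geo:state"),
}

# Assumed class hierarchy: child -> parent. Left empty here; add entries
# to model superclasses.
PARENT = {}


def ancestors(cls):
    """Return cls followed by all of its superclasses."""
    chain = [cls]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def resolve_relation(domain_cls, range_cls):
    """Find relations whose domain and range admit the given classes.

    A declared domain (or range) admits a class if it is that class or
    one of its superclasses.
    """
    return [
        rel
        for rel, (dom, rng) in RELATIONS.items()
        if dom in ancestors(domain_cls) and rng in ancestors(range_cls)
    ]


p_candidates = resolve_relation("geo:city", "geo:state")
```

Under these assumptions the only candidate for P in the example is geo:inState, while a river argument would select geo:flowsThrough instead.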
Results and conclusion
Results
Of the 880 GeoBase user questions, Pythia can handle 624.

                   Enumeration   Reasoning
Total # queries    3180          2100
Avg. # queries     5.1           3.4 (-44%)
Max. # queries     96            24 (-75%)
Distribution (improved results)
[Figure: chart not recoverable from the extracted text]
Conclusion
Ambiguous expressions are everywhere. But disambiguation strategies
▸ make it possible to discard interpretations that are not consistent in the context and therefore should be filtered out
▸ make it possible to significantly reduce the number of constructed queries: the average number of queries per question can be reduced by 44%, and the maximum number of queries per question even by 75%