Using Syntactic Information for Improving Why-Question Answering
Suzan Verberne, Lou Boves, Nelleke Oostdijk, Peter-Arno Coppen
Radboud University Nijmegen
Presenter: Sai Qian
Structure • Introduction & Related work • Paragraph retrieval for why-QA • Answer re-ranking • Discussion • Future directions
Introduction & Related work • 5% of all questions posed to QA systems are why-questions
• Difference from factoid questions
– The answer cannot be stated in a single phrase – Paragraph retrieval instead of named-entity retrieval
• Improving the QA system
– Better retrieval technique (search engine) – Better ranking system
• Does syntactic knowledge about the relation between question and answer help?
Introduction & Related work • A substantial amount of work on improving QA systems by adding syntactic information
– Tiedemann, 2005 – Quarteroni et al., 2007 – Higashinaka and Isozaki, 2008
• Syntactic information gives a small but significant improvement on top of the traditional bag-of-words approach
Paragraph retrieval for why-QA • Baseline system – Wumpus search engine – Question analysis • Remove stop words • Remove punctuation • Result: the set of question content words – Ranking: QAP passage-scoring algorithm
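The question-analysis step of the baseline can be sketched as follows. The stop-word list here is a hypothetical, deliberately tiny one (the slides do not give the real list), and `question_content_words` is an illustrative name, not part of Wumpus.

```python
import string

# Hypothetical, deliberately tiny stop-word list; the real system's list is not given.
STOP_WORDS = {"why", "do", "does", "did", "is", "are", "was", "were",
              "the", "a", "an", "of", "on", "than", "to", "in"}


def question_content_words(question: str) -> list[str]:
    """Strip punctuation, lowercase, and drop stop words; what remains is the query."""
    no_punct = question.translate(str.maketrans("", "", string.punctuation))
    return [w for w in no_punct.lower().split() if w not in STOP_WORDS]


print(question_content_words("Why do women live longer than men on average?"))
# ['women', 'live', 'longer', 'men', 'average']
```

The resulting word list is what the passage scorer matches against candidate paragraphs.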
Answer re-ranking • QAP algorithm (baseline system) – Term overlap between query and passage – Passage length – Total corpus frequency for each term
• Example – Why do people sneeze? – Why do women live longer than men on average? – Why are mountain tops cold?
• The aim: find the syntactic information that discloses the relation between a question and its answer!
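The slides list the three signals QAP uses (term overlap, passage length, corpus frequency) but not its formula. The toy scorer below is a hypothetical stand-in, not QAP itself: it sums an inverse-corpus-frequency weight for each question term found in the passage and divides by a logarithmic length penalty. `toy_passage_score` and this particular weighting are our assumptions.

```python
import math


def toy_passage_score(query_terms, passage_tokens, corpus_freq, corpus_size):
    """Toy score combining overlap, corpus frequency and passage length.

    NOT the real QAP formula; it only illustrates how the three signals interact.
    """
    passage = set(passage_tokens)
    # Term overlap: each matched query term contributes more the rarer it is in the corpus.
    overlap = sum(math.log(corpus_size / corpus_freq[t])
                  for t in set(query_terms) if t in passage and t in corpus_freq)
    # Length normalisation so that long passages do not win just by containing more words.
    return overlap / math.log(2 + len(passage_tokens))


freq = {"people": 1000, "sneeze": 10}  # hypothetical corpus frequencies
query = ["people", "sneeze"]
short_hit = "people sneeze when dust irritates the nose".split()
weak_hit = "people like summer".split()
print(toy_passage_score(query, short_hit, freq, 100_000) >
      toy_passage_score(query, weak_hit, freq, 100_000))  # True
```

Matching the rare term "sneeze" outweighs the common "people", which is the intuition behind using corpus frequency at all.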
Answer re-ranking • Re-ranking system
– Idea: term overlap – Term: a subset of question terms – Feature: a set of question items and a set of answer items – Proportion:
• Defined Features: 32 in total
– F1: head; F2: modifier; F3: noun phrase; – F4: subject; F6: main verb; F10: direct object; – ……
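The "Proportion:" definition on this slide did not survive extraction. As a minimal sketch, assuming a feature's value is the fraction of question items of one type (e.g. F4 subjects) that also occur among the answer items of the same type, the computation looks like this. `feature_value` and the item sets are hypothetical.

```python
def feature_value(question_items: set[str], answer_items: set[str]) -> float:
    """Fraction of question items that also occur among the answer items.

    Assumed definition for illustration; 0.0 if the question has no items of this type.
    """
    if not question_items:
        return 0.0
    return len(question_items & answer_items) / len(question_items)


# Hypothetical items for two feature types (F4: subject, F6: main verb).
q = {"F4_subject": {"people"}, "F6_main_verb": {"sneeze"}}
a = {"F4_subject": {"irritation", "people"}, "F6_main_verb": {"cause"}}
values = {f: feature_value(q[f], a.get(f, set())) for f in q}
print(values)  # {'F4_subject': 1.0, 'F6_main_verb': 0.0}
```

Each of the 32 features would yield one such value per question-answer pair, giving the feature vector used for re-ranking.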
Answer re-ranking • Feature extraction
– Parsers • Pelican parser: more detailed • EP4IR dependency parser: more robust – Lemmatization • “sailors of the old” • Applied only to verbs
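Why lemmatizing verbs helps matching can be shown with a small sketch. The lemma table and part-of-speech tags below are hypothetical placeholders for a real morphological analyser; only tokens tagged as verbs are lemmatized, mirroring the "only to verbs" choice.

```python
# Hypothetical verb-lemma table; a real system would use a morphological analyser.
VERB_LEMMAS = {"sneezes": "sneeze", "sneezed": "sneeze", "sneezing": "sneeze",
               "lives": "live", "lived": "live", "living": "live"}


def lemmatize_verbs(tagged_tokens):
    """Lowercase every token; map verbs to their lemma and leave other POS untouched."""
    return [VERB_LEMMAS.get(word.lower(), word.lower()) if tag == "VERB" else word.lower()
            for word, tag in tagged_tokens]


# Question "Why do people sneeze?" has main verb "sneeze"; the answer uses "sneezes".
answer = [("A", "DET"), ("person", "NOUN"), ("sneezes", "VERB"),
          ("when", "ADV"), ("dust", "NOUN"), ("irritates", "VERB"),
          ("the", "DET"), ("nose", "NOUN")]
raw_verbs = {w.lower() for w, t in answer if t == "VERB"}
lemma_verbs = {w for w, (_, t) in zip(lemmatize_verbs(answer), answer) if t == "VERB"}
print("sneeze" in raw_verbs, "sneeze" in lemma_verbs)  # False True
```

Without lemmatization the main-verb feature finds no overlap; with it, "sneezes" matches the question verb "sneeze".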
• Re-ranking
– Scoring: 0-10 for each feature – Feature selection: genetic algorithm (optimizing mean reciprocal rank, MRR)
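The slide states only that each feature gets a 0-10 score and that a genetic algorithm optimizes MRR. The toy sketch below assumes the re-ranking score is a weighted sum of feature values with integer weights in 0..10; the population size, truncation selection, one-point crossover and single-weight mutation are illustrative choices, not the authors' settings, and all function names are hypothetical.

```python
import random


def mrr(ranked_relevance):
    """Mean reciprocal rank; ranked_relevance holds one list of booleans per question."""
    total = 0.0
    for flags in ranked_relevance:
        # Reciprocal rank of the first relevant answer, 0 if none is relevant.
        total += next((1.0 / (i + 1) for i, rel in enumerate(flags) if rel), 0.0)
    return total / len(ranked_relevance)


def rerank_mrr(weights, questions):
    """questions: one candidate list per question; a candidate is (features, is_relevant)."""
    ranked = []
    for candidates in questions:
        ordered = sorted(candidates,
                         key=lambda c: sum(w * f for w, f in zip(weights, c[0])),
                         reverse=True)
        ranked.append([rel for _, rel in ordered])
    return mrr(ranked)


def genetic_search(questions, n_features, pop_size=20, generations=30, seed=0):
    """Toy GA over integer weights in 0..10 that maximises training MRR."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 10) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda w: rerank_mrr(w, questions), reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection: keep the fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features) if n_features > 1 else 0
            child = a[:cut] + b[cut:]            # one-point crossover
            child[rng.randrange(n_features)] = rng.randint(0, 10)  # mutate one weight
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda w: rerank_mrr(w, questions))


# Hypothetical training data: feature 0 signals relevance, feature 1 is noise.
TOY = [
    [((0.0, 1.0), False), ((1.0, 0.0), True)],
    [((0.0, 0.9), False), ((0.8, 0.1), True), ((0.1, 0.5), False)],
]
best = genetic_search(TOY, n_features=2)
print(best, rerank_mrr(best, TOY))
```

Setting a weight to 0 effectively deselects that feature, which is one way a weight search doubles as feature selection.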
Answer re-ranking • Result
• Features that substantially contribute to the ranking score
Discussion • Error analysis – No effect: 35/93 • 25/35: no relevant answer • 10/35: relevant answer already at rank 1 (RR = 1) – Improved: 40/93 – Deteriorated: 18/93 – 11 drop out of the top 10, 22 enter the top 10
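The error-analysis buckets follow from comparing each question's reciprocal rank (RR) before and after re-ranking. A minimal sketch, assuming ranks are 1-based and `None` means no relevant answer was retrieved; `reciprocal_rank` and `effect` are hypothetical helper names.

```python
def reciprocal_rank(rank):
    """RR of the highest-ranked relevant answer; rank is 1-based, None if none retrieved."""
    return 0.0 if rank is None else 1.0 / rank


def effect(rank_before, rank_after):
    """Classify a question as 'improved', 'deteriorated' or 'no effect' by its change in RR."""
    before, after = reciprocal_rank(rank_before), reciprocal_rank(rank_after)
    if after > before:
        return "improved"
    if after < before:
        return "deteriorated"
    return "no effect"


# Questions with no relevant answer, or with RR = 1 already, cannot improve.
print(effect(None, None), effect(1, 1), effect(4, 2), effect(3, 12))
# no effect no effect improved deteriorated

# The slide's counts add up: 25 + 10 "no effect" cases, and 35 + 40 + 18 questions in total.
assert 25 + 10 == 35 and 35 + 40 + 18 == 93
```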
Discussion • Example of deteriorated QA pairs
– Why do neutral atoms have the same number of protons as electrons? (answer in “Oxidation number”) – Why do flies walk on food? (answer in “Insect Habitat”) – Why is Wisconsin called the Badger State? (answer in “Wisconsin”)
• Reason
– No lexical overlap between the question focus and the document title – Feature 28 & Feature 13
Discussion • Feature selection analysis
– QAP: baseline system – Cue words: because, since, therefore, in order to, due to…… – Main verbs: lemmatization leads to more matches – Question focus & Document title
• Parser comparison
– Only EP4IR is applied to the answer documents
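The cue-word feature from the feature selection analysis can be sketched as a simple count. This assumes the feature counts occurrences of the cue words in the answer paragraph; only the five cue words shown on the slide are used (the slide's list continues), and `cue_word_count` is a hypothetical name.

```python
import re

# The five cue words shown on the slide; the original list is longer.
CUE_WORDS = ["because", "since", "therefore", "in order to", "due to"]


def cue_word_count(paragraph: str) -> int:
    """Count cue-word occurrences, matching whole words/phrases case-insensitively."""
    text = paragraph.lower()
    return sum(len(re.findall(r"\b" + re.escape(cue) + r"\b", text)) for cue in CUE_WORDS)


print(cue_word_count("Mountain tops are cold because the air is thin, and due to the wind."))
# 2
```

The word-boundary anchors keep "since" from matching inside words such as "sincere".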
Future directions • Improving retrieval • Collecting a larger data collection to improve feature selection • Investigating information other than syntactic descriptions for why-questions • Improving constituent extraction in the EP4IR parser