Crafting a lexicon of referential expressions for NLG applications

Ariel Gutman

Alexandros A. Chaaraoui

Pascal Fleury

Google Research Europe, Switzerland

Introduction

NLG systems with wide coverage, such as personal assistants, need access to a lexicon of referential expressions (e.g. Paris, The Beatles or James Bond) in order to produce fluent text. The lexicon should minimally contain:

● Surface form(s) of the expression; in case-inflecting languages several forms are needed.
● Grammatical properties: gender, number, case etc.
● Collocational properties: locative preposition, determiner etc.

Unlike ordinary lexemes, referential expressions are open-ended (millions of entities).

For some languages we only have a corpus without any grammatical annotation. For these, we rely on a pre-defined list of functional words to inform us about the grammatical and collocational properties of nearby or adjacent referential expressions.
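The functional-word heuristic for unannotated corpora can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `infer_features`, the whitespace tokenization, the majority vote and the three-word French list are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical functional-word list (French definite determiners, for illustration).
FUNCTIONAL_WORDS = {
    "le": {"gender": "masc", "number": "sg"},
    "la": {"gender": "fem", "number": "sg"},
    "les": {"number": "pl"},
}


def infer_features(sentences, entity):
    """Vote on features using the functional word directly before `entity`."""
    entity_tokens = entity.lower().split()
    n = len(entity_tokens)
    votes = {}
    for sentence in sentences:
        tokens = sentence.lower().split()  # naive whitespace tokenization (assumption)
        # Start at 1 so that there is always a preceding token to inspect.
        for i in range(1, len(tokens) - n + 1):
            if tokens[i:i + n] != entity_tokens:
                continue
            features = FUNCTIONAL_WORDS.get(tokens[i - 1], {})
            for name, value in features.items():
                votes.setdefault(name, Counter())[value] += 1
    # Keep the most frequent value for each feature that received any vote.
    return {name: counter.most_common(1)[0][0] for name, counter in votes.items()}
```

For instance, calling `infer_features` on a few sentences in which "la" precedes "tour eiffel" would assign the entity feminine gender and singular number, while an entity only ever preceded by "les" would receive a number but no gender.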

Name          Preposition   Determiner
Zurich        in            -
Lake Como     at            -
Isle of Man   on            the
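As an illustration of how such entries might be consumed by a generator, the sketch below stores the three rows as lexicon entries and fills a location slot in a sentence template. The class `LexiconEntry`, the dictionary `LEXICON`, the function `realize` and the `{location}` placeholder syntax are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LexiconEntry:
    name: str
    preposition: str
    determiner: Optional[str] = None  # None when the name takes no determiner

    def locative_phrase(self) -> str:
        """Join preposition, optional determiner and name into one phrase."""
        parts = [self.preposition, self.determiner, self.name]
        return " ".join(part for part in parts if part)


# The three example rows, encoded as entries.
LEXICON = {
    "Zurich": LexiconEntry("Zurich", "in"),
    "Lake Como": LexiconEntry("Lake Como", "at"),
    "Isle of Man": LexiconEntry("Isle of Man", "on", "the"),
}


def realize(template: str, location: str) -> str:
    """Fill the {location} slot of a template with the entity's locative phrase."""
    return template.format(location=LEXICON[location].locative_phrase())
```

With these entries, `realize("It is sunny {location}.", "Isle of Man")` yields "It is sunny on the Isle of Man."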

Example: It is sunny (Location).

● in Zurich
● on the Isle of Man
● at Lake Como

For other languages, we also have a dependency parser. In this case we can propagate grammatical and collocational features across dependency arcs. The parser also identifies morphological properties of open-class lexemes, such as adjectives.
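Propagation across dependency arcs could look roughly like the sketch below. It assumes a parse is available as a list of tokens with a head index, a dependency label and parser-assigned morphological features; the `Token` class, the `propagate` function and the Universal-Dependencies-style labels `det`, `amod` and `case` are assumptions for illustration, since the label set used by the authors is not stated.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    text: str
    head: int        # index of the head token, -1 for the root
    relation: str    # dependency label (UD-style labels assumed)
    features: dict = field(default_factory=dict)  # morphology from the parser


def propagate(tokens, entity_index):
    """Collect features for the entity from tokens that depend on it."""
    inferred = {}
    for token in tokens:
        if token.head != entity_index:
            continue
        # Determiners and adjectives agree with their head, so their
        # gender/number can be copied onto the referential expression.
        if token.relation in ("det", "amod"):
            for name, value in token.features.items():
                inferred.setdefault(name, value)
        # Collocational properties come from the dependent's surface form.
        if token.relation == "det":
            inferred["determiner"] = token.text.lower()
        elif token.relation == "case":
            inferred["preposition"] = token.text.lower()
    return inferred
```

For a parse of "dans la belle Provence" in which all three words attach to "Provence", this would record the preposition "dans", the determiner "la" and the feminine singular features carried by the determiner and the adjective.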

[Figure: lexicon-building pipeline. Components: Corpus; Named-entity identification; Low-resource languages: N-gram processor, Functional words; High-resource languages: Dependency parsing, Dependency-arcs processor; Non-linguistic knowledge base; Last-resort lexicon inference rules; Lexicon.]

Functional words (example):

Determiner   Gender   Number
le           masc.    sg.
la           fem.     sg.
les          -        pl.

If the two methods above fail, we can use a knowledge base which associates each referent with a canonical name and non-linguistic properties. These can be used heuristically to infer grammatical and collocational properties, e.g. linking islands with the preposition on.

Entity type   Preposition
city          in
lake          at
island        on
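The last-resort rule in the table can be expressed as a simple lookup that is consulted only when the corpus-based methods give no answer. The sketch below is an assumption about how such a cascade might be wired; `TYPE_TO_PREPOSITION`, `fallback_preposition` and `choose_preposition` are hypothetical names.

```python
from typing import Optional

# Entity-type rules copied from the table above.
TYPE_TO_PREPOSITION = {"city": "in", "lake": "at", "island": "on"}


def fallback_preposition(entity_type: str) -> Optional[str]:
    """Return the rule-based preposition, or None for an unknown entity type."""
    return TYPE_TO_PREPOSITION.get(entity_type)


def choose_preposition(corpus_based: Optional[str], entity_type: str) -> Optional[str]:
    """Prefer corpus evidence; fall back to the entity-type rule otherwise."""
    if corpus_based is not None:
        return corpus_based
    return fallback_preposition(entity_type)
```

Keeping the rule table separate from the cascade means that corpus evidence always wins when it exists, so an entity whose observed usage differs from the default for its type is not overridden by the heuristic.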

Results

The above techniques yield good precision. Preliminary evaluation on some European languages (French, Swedish and Czech) gives the following results:

● Above 92% precision for grammatical number (FR, SV).
● 66%-87% precision for grammatical gender (FR, SV); this is more difficult since there are fewer grammatical cues for it.
● Above 96% precision for locative preposition (CS only).

Outlook

The NLP literature hardly refers to lexical properties of referential expressions used for NLG, yet these are needed to produce eloquent texts. While conceptually simple, crafting such a lexicon is challenging due to the large scale of such expressions and the varying amount of information available in different languages.

Future work

● Selection among various expressions referring to one entity.
● Reconciling several expressions to a grammatical paradigm.
● Annotation of various types of expressions (official vs. colloquial, autonym vs. pseudonym etc.).

Select references

● D. Andor et al. 2016. Globally normalized transition-based neural networks. CoRR.
● L. Clément et al. 2004. Morphology based automatic acquisition of large-coverage lexica. LREC 2004. pp. 1841–1844.
● A. Gutman et al. 2015. Bootstrapping the syntactic bootstrapper: Probabilistic labeling of prosodic phrases. Language Acquisition 22(3), pp. 285–309.
● N. Momchev. 2010. Annotating Web Documents With Wikipedia Entities. Master’s thesis, Sofia University.
● B. Sagot. 2010. The Lefff, a freely available and large-coverage morphological and syntactic lexicon for French. LREC 2010.

We wish to acknowledge the help of Jana Strnadova and Ivan Korotkov.
