Building a lexicon of French deverbal nouns from a semantically annotated corpus
Antonio Balvet, Lucie Barque, Rafael Marín

Overview

The ongoing Nomage project aims to describe the aspectual properties of deverbal nouns empirically. It is centered on the development of two resources: a semantically annotated corpus of deverbal nouns and an electronic lexicon. Nominalizations have occupied a central place in grammatical analysis, with a focus on their morphological and syntactic aspects (Lees, 1960; Chomsky, 1970; Grimshaw, 1990). The semantics of nominalizations, and its implications for Natural Language Processing applications such as electronic ontologies or Information Retrieval, have often been neglected. The Nomage project, funded by the French National Research Agency (ANR-07-JCJC-0085-01), focuses on precisely this issue. We present the Nomage corpus and the annotations we apply to a French corpus of deverbal nouns. We then show how we build our lexicon from the semantically annotated corpus and illustrate the kind of generalizations such data support.

The Nomage corpus and annotation protocol

Using the French Treebank for corpus-driven semantics

The French Treebank (Abeillé, 2003) is our main source of deverbal nouns:
- a 1-million-word electronic corpus for French, following the model of the Penn Treebank (Marcus et al., 1993)
- a manually revised, tokenized, lemmatized, tagged and parsed corpus of news extracts from the Le Monde archives
- candidates for annotation are simple “common noun” tokens ending in one of the target suffixes (Figure 1)
- false positives (rade, page, garance, rosée) are filtered out based on their length
- total set of candidates: 10,584
- only head nouns (ca. 4,000 candidates) were semantically annotated
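The candidate selection described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual extraction script: the suffix list is taken from Figure 1, but the `NC` tag for common nouns, the `(lemma, pos)` input format and the length threshold `MIN_LENGTH` are assumptions made for the example (the poster only states that false positives are filtered by length, without giving a cutoff).

```python
# Suffixes listed in Figure 1.
SUFFIXES = ("ade", "age", "ance", "ée", "ence", "ment", "sion", "tion", "ure", "xion")

# Hypothetical cutoff: the poster does not give the actual length threshold.
MIN_LENGTH = 8


def matching_suffix(lemma):
    """Return the longest Figure 1 suffix the lemma ends with, or None."""
    hits = [s for s in SUFFIXES if lemma.endswith(s)]
    return max(hits, key=len) if hits else None


def select_candidates(tokens, min_length=MIN_LENGTH):
    """Keep common-noun lemmas that bear a target suffix and are long enough.

    `tokens` is an iterable of (lemma, pos) pairs; "NC" is assumed to be
    the tag used for common nouns.
    """
    selected = []
    for lemma, pos in tokens:
        if pos != "NC":
            continue
        suffix = matching_suffix(lemma)
        if suffix is not None and len(lemma) >= min_length:
            selected.append((lemma, suffix))
    return selected


sample = [
    ("promotion", "NC"),
    ("page", "NC"),
    ("rade", "NC"),
    ("rosée", "NC"),
    ("garance", "NC"),
    ("reconversion", "NC"),
    ("promouvoir", "V"),
]
candidates = select_candidates(sample)
print(candidates)
```

With this illustrative cutoff, the four false positives quoted above are discarded, but so would be short genuine deverbal nouns, which is why a length filter of this kind still calls for manual checking.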

Building the Nomage lexicon

- 815 potentially polysemous units
- the lexicographic description process is twofold:
  - each unit is associated with a range of semantic properties, based on a “high-level” semantic characterization proposed by the lexicographer
  - “low-level” annotation data, based on the tests presented above, are used to complement the high-level categorizations


PROMOTION#1
Definition: “Accession d’une ou plusieurs personnes à un niveau supérieur de responsabilité ou à de meilleures conditions.” (An advancement in rank or position.)
Example: C’est arrivé après sa promotion au poste de directeur financier. (’It happened after his promotion to finance director.’)
French Treebank occurrences: d1e22886, d1e22934, d1e10709 ...
Argument structure: promotion de Y à X accordée par X
Aspectual class: achievement
Source verb: PROMOUVOIR#1
  Example: Ses supérieurs hiérarchiques décident de le promouvoir au poste de responsable d’unité. (’His superiors decide to promote him to head of division.’)
  Argument structure: P0 promouvoir P1 (P2)
  Aspectual class: achievement

PROMOTION#2
Definition: “Action de provoquer le développement ou le succès de quelque chose.” (Causing the development or success of something.)
Example: Chirac va faire la promotion de son livre en plein marasme judiciaire. (’Chirac is about to engage in the promotion of his book, while several lawsuits are being filed against him.’)
French Treebank occurrences: d1e71021, d1e10706, d1e44169, d1e63654 ...
Argument structure: promotion de Y par X
Aspectual class: activity
Source verb: PROMOUVOIR#2
  Example: Le CNRS devait promouvoir la recherche scientifique. (’The CNRS was supposed to foster scientific research.’)
  Argument structure: P0 promouvoir P1
  Aspectual class: activity
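For machine readability, entries such as the two above could be stored in a structure along the following lines. This is only a sketch of one possible encoding: the class and field names (`NounSense`, `SourceVerb`, `inherits_aspect`) are hypothetical and do not reflect the actual Nomage lexicon format. The values are copied from the PROMOTION entries, and the occurrence lists contain only the identifiers quoted there.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SourceVerb:
    """Verb sense a deverbal noun sense is derived from (hypothetical schema)."""
    sense_id: str
    argument_structure: str
    aspectual_class: str


@dataclass
class NounSense:
    """One word sense of a deverbal noun (hypothetical schema)."""
    sense_id: str
    gloss: str
    argument_structure: str
    aspectual_class: str
    source_verb: SourceVerb
    treebank_occurrences: List[str] = field(default_factory=list)

    def inherits_aspect(self) -> bool:
        """True when the noun sense has the same aspectual class as its verb."""
        return self.aspectual_class == self.source_verb.aspectual_class


promotion_1 = NounSense(
    sense_id="PROMOTION#1",
    gloss="An advancement in rank or position.",
    argument_structure="promotion de Y à X accordée par X",
    aspectual_class="achievement",
    source_verb=SourceVerb("PROMOUVOIR#1", "P0 promouvoir P1 (P2)", "achievement"),
    treebank_occurrences=["d1e22886", "d1e22934", "d1e10709"],  # list is truncated in the source
)

promotion_2 = NounSense(
    sense_id="PROMOTION#2",
    gloss="Causing the development or success of something.",
    argument_structure="promotion de Y par X",
    aspectual_class="activity",
    source_verb=SourceVerb("PROMOUVOIR#2", "P0 promouvoir P1", "activity"),
    treebank_occurrences=["d1e71021", "d1e10706", "d1e44169", "d1e63654"],  # truncated in the source
)

for sense in (promotion_1, promotion_2):
    print(sense.sense_id, sense.aspectual_class, sense.inherits_aspect())
```

Keeping the source verb inside each noun sense makes the comparison between nominal and verbal aspect, which is the stated goal of the annotation, a one-line check per sense.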

Using rephrasing tests for semantic annotation

- goal of semantic annotation: assess to what extent deverbal nouns inherit semantic features from the verbs they derive from
- annotation must take contextual constraints into account
- annotation is based on (rephrasing) transformation tests
- standard semantic/aspectual tests for verbs (Dowty, 1979) do not apply to nouns
- noun-centered semantic/aspectual tests were proposed in (Huyghe & Marín, 2007; Haas et al., 2008; Barque et al., 2009) and are adapted here for a corpus-driven approach (cf. Figure 2)


The tests in Figure 2 highlight the main aspectual and referential properties of deverbal nouns.

                          STATE   EVENT    EVENT    OBJECT
                                  Durat.   Punct.
 1  Plusieurs                     +        +        +
 2  Avoir lieu                    +        +
 3  Éprouver/ressentir    +/-
 4  Un peu de             +/-
 5  Durer x temps         +/-     +
 6  Se trouver                                      +
 7  Effectuer/procéder            +        +
 8  État de               +/-
 9  Se dérouler                   +
10  Cardinal                      +        +        +

Figure 4: Lexicon entries for PROMOTION

- “high-level” word-sense distinctions and semantic properties are matched against outcomes from “low-level” annotation
- entry definitions are cross-validated with the associated “low-level” annotations
- sentences 3-6 illustrate 4 different word senses of the noun PROMOTION

(3) Les moyens à la disposition des opérateurs publics concourant à la promotion des ventes françaises au Japon augmenteront de plus de 40%.
    ’The financial incentives available to public-owned companies that actively support French business transactions in Japan will increase by over 40%.’
(4) L’infatigable patron de Lancôme (groupe L’Oréal) en Allemagne ne ménage pas son temps pour la promotion de son entreprise.
    ’The tireless chief executive of Lancôme’s (L’Oréal group) German division spares no effort in promoting his company.’
(5) C’est arrivé après sa promotion au poste de directeur financier.
    ’It happened after his promotion to finance manager.’
(6) La première promotion est sortie en 1991, à notre grande satisfaction.
    ’The first class completed their program in 1991, to our great satisfaction.’

                          SENTENCE
                          3    4    5    6
 1  Plusieurs             -    -    -    +
 2  Avoir lieu            -    -    +
 3  Éprouver/ressentir    -    -    -
 4  Un peu de             -    -    -
 5  Durer x temps         +    +    -
 6  Se trouver            -    -    -    +
 7  Effectuer/procéder    +    +    -
 8  État de               -    -    -
 9  Se dérouler           +    +    -
10  Cardinal              -    -    -    +
Figure 5: Aspectual test outcomes for 4 occurrences of noun PROMOTION
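One simple way to exploit outcomes like those in Figure 5 during entry validation is to group occurrences by the set of tests they pass. The sketch below is an illustration only, not the project's validation procedure: the function name `group_by_profile` is hypothetical, and any cell that does not show "+" in Figure 5 is assumed to be a non-positive outcome. An identical profile is a hint, not proof, that two occurrences belong to the same entry.

```python
# Tests with a "+" outcome for each sentence in Figure 5.
# Assumption: cells without "+" count as non-positive.
POSITIVE_TESTS = {
    3: {"durer x temps", "effectuer/procéder", "se dérouler"},
    4: {"durer x temps", "effectuer/procéder", "se dérouler"},
    5: {"avoir lieu"},
    6: {"plusieurs", "se trouver", "cardinal"},
}


def group_by_profile(positive_tests):
    """Group sentence numbers that pass exactly the same set of tests."""
    groups = {}
    for sentence, tests in positive_tests.items():
        groups.setdefault(frozenset(tests), []).append(sentence)
    return sorted(sorted(members) for members in groups.values())


profile_groups = group_by_profile(POSITIVE_TESTS)
print(profile_groups)
```

On the Figure 5 data, sentences 3 and 4 share one profile while sentences 5 and 6 each have their own, which is the kind of low-level evidence a lexicographer can set against the high-level sense distinctions.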

Annotating the corpus

- keep subjectivity to a minimum
- annotators are not necessarily trained in linguistics
- annotators should be as “semantically naïve” as possible
- an annotators’ guide is provided
- rephrasing strategies are outlined in the annotation guide (direct/indirect application)


- example of corpus-driven semantic annotation: reconversion and rédaction, as they appear in the following two sentences:

(1) Dus à des motifs personnels et à une reconversion dans le commerce de l’art.
    ’Owing to personal reasons and to a career switch in the art trade.’
(2) D’ailleurs, en ce soir de réveillon, la rédaction était réduite à la portion congrue.
    ’Moreover, this Christmas Eve, the editorial staff was limited to the strict minimum.’

                          reconversion   rédaction
 1  Plusieurs             +              +
 2  Avoir lieu            +
 3  Éprouver/ressentir
 4  Un peu de
 5  Durer x temps         +
 6  Se trouver                           +
 7  Effectuer/procéder    +
 8  État de
 9  Se dérouler           +
10  Cardinal              +              +
Figure 3: Test outcomes for two deverbal candidates
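To make the link between test outcomes (Figure 3) and aspectual classes (Figure 2) concrete, the sketch below compares the tests an occurrence passes with the tests expected for each class. Everything here is an illustrative assumption rather than the Nomage procedure: the class profiles are one reading of Figure 2 (tests marked "+" or "+/-" under each column, identified by test number), and Jaccard similarity is a stand-in scoring choice. In the project itself, the judgments are made by human annotators.

```python
# Expected tests per class, by test number (1 = Plusieurs ... 10 = Cardinal).
# Assumption: one reading of Figure 2, counting "+/-" cells as expected.
CLASS_PROFILES = {
    "state": {3, 4, 5, 8},
    "durative event": {1, 2, 5, 7, 9, 10},
    "punctual event": {1, 2, 7, 10},
    "object": {1, 6, 10},
}


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_classes(passed_tests):
    """Return (class, score) pairs, best match first."""
    scores = {name: jaccard(passed_tests, profile)
              for name, profile in CLASS_PROFILES.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def best_class(passed_tests):
    """Return the class whose profile is closest to the passed tests."""
    return rank_classes(passed_tests)[0][0]


# Positive outcomes from Figure 3.
reconversion = {1, 2, 5, 7, 9, 10}
redaction = {1, 6, 10}

print("reconversion:", best_class(reconversion))
print("rédaction:", best_class(redaction))
```

A ranking rather than a single label is returned by `rank_classes` so that near ties, which are likely for polysemous nouns, remain visible instead of being silently resolved.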

Using “low-level” annotations for entry validation

Figure 2: Aspectual classes and their transformation tests

Figure 4 shows the structure and content of the entry PROMOTION from the Nomage lexicon, illustrating the corresponding word senses found in the French Treebank.

PROMOTION

Suffix   Candidates
-ade             24
-age            575
-ance           716
-ée             425
-ence           521
-ment          1824
-sion          1036
-tion          4884
-ure            559
-xion            20
Figure 1: Absolute frequencies of candidates by suffix
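As a quick consistency check, the per-suffix counts in Figure 1 can be summed and compared with the total of 10,584 candidates reported for the French Treebank. The snippet below is a small sketch for that purpose; the variable and function names are illustrative.

```python
# Candidate counts per suffix, copied from Figure 1.
SUFFIX_COUNTS = {
    "-ade": 24,
    "-age": 575,
    "-ance": 716,
    "-ée": 425,
    "-ence": 521,
    "-ment": 1824,
    "-sion": 1036,
    "-tion": 4884,
    "-ure": 559,
    "-xion": 20,
}

total = sum(SUFFIX_COUNTS.values())


def share(suffix):
    """Proportion of all candidates that bear the given suffix."""
    return SUFFIX_COUNTS[suffix] / total


# Print suffixes from most to least frequent, with their share of the total.
for suffix, count in sorted(SUFFIX_COUNTS.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{suffix:<6}{count:>6}{share(suffix):>8.1%}")
print(f"total {total:>6}")
```

The counts do add up to the reported total, and nouns in -tion alone account for a little under half of all candidates.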

Structure and content of the Nomage lexicon


Further research

Main outcomes of the Nomage project are:
- a description of aspectual properties for French deverbal nouns
- an augmented version of the French Treebank, with semantic and aspectual annotations for deverbal nouns
- a semantic lexical resource, designed to be both human- and machine-readable, following the Nomlex and SIMPLE projects
- a corpus-driven semantic annotation protocol for deverbal nouns
- a framework for the semantic annotation of other categories: deadjectival nouns (e.g. fidélité, from fidèle) and non-deverbal predicative nouns (e.g. crime, meurtre)
- a framework for multilingual semantic annotation covering other languages: Spanish, English and Catalan

More information at http://nomage.recherche.univ-lille3.fr/
