Lexical chains as representations of context for the detection and correction of malapropisms
Graeme Hirst, David St-Onge
Department of Computer Science, University of Toronto

Structure  

 

Introduction & Lexical chains
WordNet as a knowledge source for a lexical chainer
Automatically detecting malapropisms
Conclusion

Introduction & Lexical chains  



Lexical cohesion
Halliday and Hasan (1976): some concepts in a text refer back to related concepts that were mentioned previously
Cohesive chains

Introduction & Lexical chains 

An example: The major potential complication of total joint replacement is infection. It may occur just in the area of the wound or deep around the prosthesis. It may occur during the hospital stay or after the patient goes home… Infections in the wound areas are generally treated with antibiotics.

Introduction & Lexical chains  



Morris and Hirst: lexical chains
Lexical chain: a cohesive chain in which the criterion for inclusion of a word is that it bears some kind of cohesive relationship to a word that is already in the chain
Thesaurus: Roget’s

Introduction & Lexical chains 

 





Two words are related in the thesaurus if:
1. Their index entries point to the same or adjacent thesaurus categories
2. The index entry of one contains the other
3. The index entry of one points to a category that contains the other
4. The index entry of one points to a category that in turn contains a pointer to a category pointed to by the index entry of the other
5. Both of their index entries contain a pointer to the same category

Introduction & Lexical chains 

An example of regular relation between two words:

WordNet as a knowledge source for a lexical chainer  





Synset
The relation between words: the relation between the synsets of both words
Weight = C - path length - k × (number of changes of direction)
Three kinds of relation:
1. extra-strong
2. strong
3. medium-strong
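The weight formula above can be written out as a small function. This is a minimal sketch: the default values for C and k are placeholders chosen only for illustration, since the slides do not state them, and the function name is hypothetical.

```python
def path_weight(path_length: int, direction_changes: int,
                c: float = 8.0, k: float = 1.0) -> float:
    """Weight = C - path length - k * (number of changes of direction).

    c and k are constants; the defaults here are illustrative
    assumptions, not values taken from the slides.
    """
    return c - path_length - k * direction_changes


# A short, straight path scores higher than a long path that turns.
short_straight = path_weight(path_length=2, direction_changes=0)
long_bent = path_weight(path_length=4, direction_changes=1)
```

Under any positive choice of k, a longer path or a path with more turns receives a lower weight, which is the property the formula encodes.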

WordNet as a knowledge source for a lexical chainer 





Extra-strong: only between a word and its literal repetition
Strong: 3 kinds
1. A synset common to two words
2. A horizontal link between a synset of each word
3. Compound word or phrase
Medium-strong: when a member of a set of allowable paths connects a synset of each word
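The extra-strong and strong cases above can be illustrated with a toy classifier. This is only a sketch under stated assumptions: the synset names, the horizontal-link table, and the function `classify` are made up for illustration and do not use the real WordNet database, and the compound-word case is read here as "one word is a compound or phrase that contains the other".

```python
from typing import Dict, FrozenSet, Optional, Set

# Hypothetical toy data standing in for WordNet.
SYNSETS: Dict[str, Set[str]] = {
    "human": {"person.n"},
    "person": {"person.n"},
    "precursor": {"precursor.n"},
    "successor": {"successor.n"},
    "school": {"school.n"},
    "private school": {"private_school.n"},
}
# Unordered pairs of synsets joined by a horizontal link (e.g. antonymy).
HORIZONTAL: Set[FrozenSet[str]] = {frozenset({"precursor.n", "successor.n"})}


def classify(w1: str, w2: str) -> Optional[str]:
    """Return 'extra-strong', 'strong', or None for the toy data above."""
    if w1 == w2:  # literal repetition
        return "extra-strong"
    s1, s2 = SYNSETS.get(w1, set()), SYNSETS.get(w2, set())
    if s1 & s2:  # 1. a synset common to both words
        return "strong"
    if any(frozenset({a, b}) in HORIZONTAL for a in s1 for b in s2):
        return "strong"  # 2. a horizontal link between their synsets
    if w1 in w2.split() or w2 in w1.split():
        return "strong"  # 3. compound word or phrase (assumed reading)
    return None  # medium-strong relations need a path search, not shown
```

For example, `classify("human", "person")` falls under the first strong case, while `classify("human", "school")` finds no extra-strong or strong relation.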

WordNet as a knowledge source for a lexical chainer 





Two rules for defining allowable patterns
R1: No other direction may precede an upward link.
R2: At most one change of direction is allowed.
R2’ (exception to R2): It is permitted to use a horizontal link to make a transition from an upward to a downward direction.
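The rules above can be checked mechanically. The following is a minimal sketch, assuming a path is encoded as a string over the letters U (upward), D (downward) and H (horizontal); this encoding and the function names are illustrative choices, and the exception R2’ is read here as allowing exactly the shape upward, horizontal, downward.

```python
def collapse(path: str) -> str:
    """Merge consecutive identical directions, e.g. 'UUDD' -> 'UD'."""
    out = []
    for d in path:
        if not out or out[-1] != d:
            out.append(d)
    return "".join(out)


def is_allowable(path: str) -> bool:
    """Check a path of 'U', 'D', 'H' links against R1, R2 and R2'."""
    if not path or set(path) - set("UDH"):
        return False  # empty path or unknown direction symbol
    shape = collapse(path)
    # R1: no other direction may precede an upward link,
    # so an upward run can only appear at the very start.
    if "U" in shape[1:]:
        return False
    # R2: at most one change of direction.
    if len(shape) - 1 <= 1:
        return True
    # R2': a horizontal link may bridge an upward and a downward run.
    return shape == "UHD"
```

With this reading, `is_allowable("UUD")` and `is_allowable("UHD")` hold, while `is_allowable("DU")` fails R1 and `is_allowable("DHD")` fails R2.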

WordNet as a knowledge source for a lexical chainer 

How to build lexical chains?

WordNet as a knowledge source for a lexical chainer



Extra-strongStrong(7)Mediumstrong(3)

Automatically detecting malapropisms 



Two kinds of word errors
1. Non-word error
2. Real-word error
Two techniques for non-word error
1. Lexicon lookup
2. N-gram analysis
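The two non-word techniques can be sketched in a few lines. This is an illustrative toy, assuming a tiny hand-made lexicon and character trigrams; real systems use much larger word lists and n-gram tables, and the function names here are hypothetical.

```python
from typing import Set


def char_ngrams(word: str, n: int = 3) -> Set[str]:
    """Character n-grams of a word, padded with '#' at both ends."""
    padded = f"#{word}#"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


# Toy lexicon (an assumption for the example).
LEXICON = {"the", "patient", "goes", "home", "wound"}
# Every trigram that occurs in some lexicon word.
KNOWN_NGRAMS: Set[str] = set().union(*(char_ngrams(w) for w in LEXICON))


def flagged_by_lookup(word: str) -> bool:
    """Lexicon lookup: flag any word that is not in the lexicon."""
    return word.lower() not in LEXICON


def flagged_by_ngrams(word: str) -> bool:
    """N-gram analysis: flag a word containing a trigram never seen before."""
    return not char_ngrams(word.lower()) <= KNOWN_NGRAMS
```

Neither check can catch a real-word error such as "word" typed for "world", because the mistyped string is itself a valid word; that gap is what the rest of the talk addresses.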

Automatically detecting malapropisms 







syntactic errors (e.g. the students are doing there homework)
semantic errors or malapropisms (e.g. he spent his summer traveling around the word)
structural errors (e.g. I need three ingredients: red wine, sugar, cinnamon, and cloves)
pragmatic errors (e.g. he studies at the University of Toronto in England and she studies at Cambridge)

Automatically detecting malapropisms 

Hypothesis

The more distant a word is semantically from all the other words of a text, the higher the probability is that it is a malapropism.

Automatically detecting malapropisms 





1. Look for non-word errors and solicit corrections from the user.
2. Construct lexical chains among the high-content words in the text.
3. For each word w that does not fit into any chain, look for a word w’ that is orthographically similar to w and does fit into one of the chains. If such a w’ exists, raise an alarm and propose replacing w with w’, based on the hypothesis above.
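The detection step can be sketched as follows. This is a simplified illustration, not the authors' implementation: "fits into a chain" is approximated by "is related to at least one other content word", orthographic similarity is taken to be one edit (insertion, deletion, substitution, or transposition), and the vocabulary and relatedness table are toy data made up for the example.

```python
import string
from typing import Callable, List, Set, Tuple


def spelling_variants(word: str, vocabulary: Set[str]) -> Set[str]:
    """Vocabulary words within one edit of `word` (assumed similarity measure)."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    transposes = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    replaces = {a + c + b[1:] for a, b in splits if b for c in letters}
    inserts = {a + c + b for a, b in splits for c in letters}
    return ((deletes | transposes | replaces | inserts) & vocabulary) - {word}


def detect(words: List[str], related: Callable[[str, str], bool],
           vocabulary: Set[str]) -> List[Tuple[str, str]]:
    """Return (suspect, suggestion) pairs for words that fit nowhere."""
    alarms = []
    for i, w in enumerate(words):
        others = words[:i] + words[i + 1:]
        if any(related(w, o) for o in others):
            continue  # w is related to the context, so leave it alone
        for cand in sorted(spelling_variants(w, vocabulary)):
            if any(related(cand, o) for o in others):
                alarms.append((w, cand))  # a variant fits where w does not
                break
    return alarms


# Toy data for "he spent his summer traveling around the word".
TOY_RELATED = {frozenset({"summer", "traveling"}),
               frozenset({"traveling", "world"})}
TOY_VOCAB = {"summer", "traveling", "word", "world", "ward", "work"}


def toy_related(a: str, b: str) -> bool:
    return a == b or frozenset({a, b}) in TOY_RELATED


alarms = detect(["summer", "traveling", "word"], toy_related, TOY_VOCAB)
```

A word is flagged only when it is isolated and a spelling variant is not, so an isolated word with no fitting variant raises no alarm.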

Automatically detecting malapropisms 

 

The experiment text: naturally occurring texts?
First drafts by non-professional writers
Inserting deliberate malapropisms into published text

Automatically detecting malapropisms 





Some examples of deliberate malapropisms:
Much of that data, he notes, is available toady [today] electronically.
Among the largest OTC issues, Farmers Group, which expects B.A.T. Industries to launch a hostile tenter [tender] offer for it, jumped 2 and half to 62 yesterday.
But most of yesterday’s popular issues were small out-of-the-limelight technology companies that slipped in price a bit last year after the crush [crash], although their earnings are on the rise.

Automatically detecting malapropisms  



“toady” is replaced by “today”
tenter IS A framework / frame INCLUDES handbarrow HAS PART handle / grip / hold INCLUDES stock
“crush” is found to be wrong, while “brush” was suggested instead of “crash”

Automatically detecting malapropisms 

Analysis of the overall result

31.4% 7.01%

89.8% 36.6%

Conclusion   



A large cost… 25.3/1000
Encouraging result!
A further-developed WordNet is desired in future to “protect” the right words
Lexical chains as context: useful for tasks that do not require a complete analysis of meaning

The End

Thanks!
