Lexical chains as representations of context for the detection and correction of malapropisms
Graeme Hirst, David St-Onge
Department of Computer Science, University of Toronto
Structure
1. Introduction & Lexical chains
2. WordNet as a knowledge source for a lexical chainer
3. Automatically detecting malapropisms
4. Conclusion
Introduction & Lexical chains
Lexical cohesion
Halliday and Hasan (1976): some concepts in a text refer back to concepts that were previously mentioned, or to other concepts related to them
Cohesive chains
Introduction & Lexical chains
An example:
The major potential complication of total joint replacement is infection. It may occur just in the area of the wound or deep around the prosthesis. It may occur during the hospital stay or after the patient goes home… Infections in the wound areas are generally treated with antibiotics.
Introduction & Lexical chains
Morris and Hirst: lexical chains
Lexical chain: a cohesive chain in which the criterion for inclusion of a word is that it bears some kind of cohesive relationship to a word that is already in the chain
Thesaurus: Roget’s
Introduction & Lexical chains
Two words are related when one of the following holds:
1. Their index entries point to the same, or adjacent, thesaurus categories
2. The index entry of one contains the other
3. The index entry of one points to a category that contains the other
4. The index entry of one points to a category that in turn contains a pointer to a category pointed to by the index entry of the other
5. Both of their index entries contain a pointer to the same category
Introduction & Lexical chains
An example of a regular relation between two words:
WordNet as a knowledge source for a lexical chainer
Synset
The relation between two words is the relation between the synsets of both words
Weight = C - path length - k * (number of changes of direction)
Three kinds of relation:
1. extra-strong
2. strong
3. medium-strong
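The weight formula above can be sketched as a small function. The slide does not give values for the constants C and k, so the defaults below are placeholders chosen only for illustration, and the function name is hypothetical.

```python
def path_weight(path_length: int, direction_changes: int,
                c: float = 8.0, k: float = 1.0) -> float:
    """Weight = C - path length - k * (number of changes of direction).

    c and k are hypothetical defaults; the slide does not state their values.
    """
    return c - path_length - k * direction_changes


# A shorter path with fewer changes of direction gets a higher weight.
short_straight = path_weight(path_length=2, direction_changes=0)
long_bent = path_weight(path_length=4, direction_changes=1)
```

Whatever constants are chosen, the formula penalises both long paths and paths that change direction, so the more direct connection between two synsets always scores higher.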
WordNet as a knowledge source for a lexical chainer
Extra-strong: only between a word and its literal repetition
Strong: 3 kinds
1. A synset common to two words
2. A horizontal link between a synset of each word
3. Compound word or phrase
Medium-strong: when a member of a set of allowable paths connects a synset of each word
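As a rough illustration of the extra-strong and strong cases listed above, the sketch below classifies a word pair against a tiny hand-made table. The synset identifiers, the horizontal link and the word list are all invented for the example and are not real WordNet data. The third strong case is read here as "one word is a compound word or phrase that includes the other", which is an assumption about the slide's shorthand. Medium-strong relations need a path search and are not covered.

```python
from typing import Dict, Optional, Set

# Toy stand-ins for WordNet data; every entry is invented for illustration.
SYNSETS: Dict[str, Set[str]] = {
    "car": {"s_auto"},
    "automobile": {"s_auto"},
    "hot": {"s_hot"},
    "cold": {"s_cold"},
    "school": {"s_school"},
    "private school": {"s_private_school"},
}
# Horizontal links (for example antonymy) stored as unordered synset pairs.
HORIZONTAL = {frozenset({"s_hot", "s_cold"})}


def classify(w1: str, w2: str) -> Optional[str]:
    """Return 'extra-strong', 'strong', or None if neither case applies."""
    if w1 == w2:
        return "extra-strong"          # literal repetition
    s1, s2 = SYNSETS.get(w1, set()), SYNSETS.get(w2, set())
    if s1 & s2:
        return "strong"                # a synset common to both words
    if any(frozenset({a, b}) in HORIZONTAL for a in s1 for b in s2):
        return "strong"                # horizontal link between synsets
    if w1 in w2.split() or w2 in w1.split():
        return "strong"                # compound or phrase containing the other word
    return None                        # medium-strong would need a path search
```

For example, `classify("car", "automobile")` falls under the shared-synset case, while `classify("car", "cold")` matches none of the cases handled here.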
WordNet as a knowledge source for a lexical chainer
Two rules for defining allowable patterns
R1: No other direction may precede an upward link.
R2: At most one change of direction is allowed.
R2’: As an exception to R2, it is permitted to use a horizontal link to make a transition from an upward to a downward direction.
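One way to read the rules above is as a check on the sequence of link directions along a path. The sketch below encodes a path as a list of "U" (upward), "D" (downward) and "H" (horizontal) steps. The encoding, the function names and the choice to collapse consecutive links of the same direction into one segment are assumptions made for illustration.

```python
from typing import List


def collapse(path: List[str]) -> List[str]:
    """Merge consecutive links with the same direction into one segment."""
    segments: List[str] = []
    for step in path:
        if not segments or segments[-1] != step:
            segments.append(step)
    return segments


def is_allowable(path: List[str]) -> bool:
    """Check a path of 'U', 'D', 'H' links against R1, R2 and R2'."""
    segments = collapse(path)
    if not segments:
        return False
    # R1: no other direction may precede an upward link,
    # so an upward segment can only be the first one.
    if "U" in segments[1:]:
        return False
    # R2: at most one change of direction.
    if len(segments) - 1 <= 1:
        return True
    # R2': a horizontal link may bridge an upward and a downward segment.
    return segments == ["U", "H", "D"]
```

Under this reading, `is_allowable(list("UUD"))` and `is_allowable(list("UHD"))` hold, while a path such as `list("DU")` is rejected by R1 and `list("DHD")` by R2.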
WordNet as a knowledge source for a lexical chainer
How to build lexical chains?
WordNet as a knowledge source for a lexical chainer
Extra-strong | Strong (7) | Medium-strong (3)
Automatically detecting malapropisms
Two kinds of word errors
1. Non-word error
2. Real-word error
Two techniques for non-word errors
1. Lexicon lookup
2. N-gram analysis
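Lexicon lookup, the first technique above, can be sketched in a few lines. The tiny lexicon and the tokenising pattern are assumptions for illustration. The example also shows why the technique cannot catch real-word errors: a misused word that is itself in the lexicon passes the check.

```python
import re
from typing import List, Set

# A hypothetical, deliberately tiny lexicon.
LEXICON: Set[str] = {"the", "students", "are", "doing", "their", "there", "homework"}


def non_word_errors(text: str, lexicon: Set[str]) -> List[str]:
    """Return the tokens of `text` that are not found in `lexicon`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in lexicon]


# "studnets" is a non-word and is flagged; "there" is a real word used
# in the wrong place, so lexicon lookup lets it through.
flagged = non_word_errors("The studnets are doing there homework", LEXICON)
```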
Automatically detecting malapropisms
syntactic errors (e.g. the students are doing there homework)
semantic errors or malapropisms (e.g. he spent his summer traveling around the word)
structural errors (e.g. I need three ingredients: red wine, sugar, cinnamon, and cloves)
pragmatic errors (e.g. he studies at the University of Toronto in England and she studies at Cambridge)
Automatically detecting malapropisms
Hypothesis
The more distant a word is semantically from all the other words of a text, the higher the probability is that it is a malapropism.
Automatically detecting malapropisms
1. The algorithm looks for non-word errors and solicits corrections from the user
2. It constructs lexical chains between the high-content words in the text
3. For a word w that does not fit in any chain, it looks for a word w’ and raises an alarm proposing w’ as a replacement, based on the hypothesis above
(PS: w’ is a word that is orthographically similar to w and does fit in one of the chains, while w does not)
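The chain-based steps above can be sketched as follows. This is a simplified illustration, not the authors' implementation. It skips the non-word step, takes the `related` predicate as given instead of querying WordNet, builds chains greedily, and treats "orthographically similar" as a single-character edit or adjacent transposition, which is an assumption. All names and the toy word group are hypothetical.

```python
from typing import Callable, Iterable, List, Set, Tuple


def build_chains(words: Iterable[str],
                 related: Callable[[str, str], bool]) -> List[List[str]]:
    """Greedy chaining: a word joins the first chain holding a related word."""
    chains: List[List[str]] = []
    for w in words:
        for chain in chains:
            if any(related(w, m) for m in chain):
                chain.append(w)
                break
        else:
            chains.append([w])
    return chains


def one_edit_apart(a: str, b: str) -> bool:
    """True for one insertion, deletion, substitution or adjacent swap."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        diff = [i for i in range(len(a)) if a[i] != b[i]]
        if len(diff) == 1:
            return True
        return (len(diff) == 2 and diff[1] == diff[0] + 1
                and a[diff[0]] == b[diff[1]] and a[diff[1]] == b[diff[0]])
    shorter, longer = sorted((a, b), key=len)
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))


def detect_malapropisms(words: List[str],
                        related: Callable[[str, str], bool],
                        lexicon: Set[str]) -> List[Tuple[str, str]]:
    """Return (w, w') pairs: w sits alone in a chain, w' would fit elsewhere."""
    chains = build_chains(words, related)
    alarms: List[Tuple[str, str]] = []
    for chain in chains:
        if len(chain) != 1:
            continue
        w = chain[0]
        context = [m for other in chains if other is not chain for m in other]
        for cand in sorted(lexicon):
            if one_edit_apart(w, cand) and any(related(cand, m) for m in context):
                alarms.append((w, cand))
                break
    return alarms


# Toy relatedness: words in the same invented group count as related.
MEDICAL = {"infection", "wound", "antibiotics", "hospital", "patient"}


def toy_related(a: str, b: str) -> bool:
    return a == b or (a in MEDICAL and b in MEDICAL)


alarms = detect_malapropisms(
    ["infection", "wound", "patent", "hospital", "antibiotics"],
    toy_related,
    {"patient", "patent", "parent"},
)
```

In this toy run "patent" ends up alone in its chain, and its spelling variant "patient" fits the chain formed by the other words, so the pair is reported. "parent" is also one edit away but is related to nothing in the context, so it is not proposed.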
Automatically detecting malapropisms
The experiment text: naturally occurring texts?
First drafts by non-professional writers
Inserting deliberate malapropisms into published text
Automatically detecting malapropisms
Some examples of deliberate malapropisms:
Much of that data, he notes, is available toady [today] electronically.
Among the largest OTC issues, Farmers Group, which expects B.A.T. Industries to launch a hostile tenter [tender] offer for it, jumped 2 and half to 62 yesterday.
But most of yesterday’s popular issues were small out-of-the-limelight technology companies that slipped in price a bit last year after the crush [crash], although their earnings are on the rise.
Automatically detecting malapropisms
“toady” is replaced by “today”
tenter IS A framework / frame INCLUDES handbarrow HAS PART handle / grip / hold INCLUDES stock
“crush” is found to be wrong, while “brush” was suggested instead of “crash”
Automatically detecting malapropisms
Analysis of the overall result
31.4% 7.01%
89.8% 36.6%
Conclusion
A large cost… 25.3/1000
Encouraging result!
A further-developed WordNet is desired in future to “protect” the right words
Lexical chains as context: tasks do not require a complete analysis of meaning