Automatic Resolution of Target Word Ambiguity

Ebony Domingo
Computer Science Department, College of Science and Information Technology
Ateneo de Zamboanga University
(063) (0927) (2472429)
[email protected]

Rachel Roxas
College of Computer Studies, De La Salle University, Professional Schools Inc.
2401 Taft Avenue, Manila, Philippines 1004
(063) (02) (5244611 loc 342)
[email protected]

ABSTRACT

An automated approach is presented for resolving target-word selection, based on the "word-to-sense" and "sense-to-word" relationships between source words and their translations, utilizing syntactic relationships (subject-verb, verb-object, adjective-noun). Translation selection proceeds from sense disambiguation of the source words, based on knowledge from a bilingual dictionary and word similarity measures from WordNet, followed by selection of a target word using statistics from a target language corpus. The approach can serve as an aid in selecting the appropriate translation of a given source word, specifically in English-to-Tagalog translation. The system was tested on 200 sentences containing ambiguous words (an average of 4 senses) in three categories: nouns, verbs, and adjectives. It achieved an overall translation-selection accuracy of 70.67%.

General Terms
Algorithms, Reliability, Languages

Keywords
Natural Language Processing, Machine Translation, Translation Selection, Target Word Disambiguation

1. INTRODUCTION

Translation selection is the process of selecting, from a set of target language words, the most appropriate one, that is, the one that conveys the correct sense of the source word [8]. For example, the correct translation of the word 'wash' into Tagalog must be chosen from among hilamos, hugas, laba, and others; the choice has to be made based on the object noun of the verb 'wash'. Several methods have been developed for target-word disambiguation over corpora and word translations of different natures. Dagan and Itai [5] presented an approach for resolving lexical ambiguities in one language using statistical data from a monolingual corpus of another language. Their algorithm identified syntactic relationships between words and mapped the alternative interpretations of these relationships to the target language using a bilingual lexicon. The preferred senses were then selected according to statistics on lexical relations in the target language. Along the lines of Dagan and Itai [5] came other approaches, such as the use of probabilistic class-based lexica for disambiguation in target-word selection [11] and target language modeling by estimating word translation probabilities [8]. The work of Prescher et al. [11] used an expectation maximization (EM)-based clustering algorithm to build the classes from distributional data of verb and noun sample pairs in different grammatical relations, gathered by parsing an unannotated corpus in the target language. Each of the induced classes has a basis in lexical semantics. Their study focused on selecting among alternative target nouns using only the information provided by the governing target verb; the criterion is a simple lexicon look-up in which the target noun with the highest estimated class-based frequency is selected. Koehn and Knight [8], on the other hand, proposed an advanced probabilistic model using the EM algorithm to estimate word-level translation probabilities. Their experiment focused on co-occurring nouns rather than on syntactic relations: nouns in the input sentences are annotated with all alternative translations found in the bilingual dictionary, and then, applying noun co-occurrence probability, the target noun with the highest frequency in the noun bigram is chosen. Although the methods of Dagan and Itai [5], Prescher et al. [11], and Koehn and Knight [8] ease the problems of corpus preparation and knowledge acquisition, they do not make use of useful information available in the source language. A term-list translation algorithm over untagged corpora [7] resolves this by using distributional clustering for word sense disambiguation in the source language, where the sense of a word is pre-determined by its co-occurring words. The process builds clusters of word usages into different "sense profiles" for the different meanings of a word, and translation selection is done through a similarity score of the translation candidates closest to the new word in context. However, that work failed to capture syntactic relationships, which became the focus of this research.

A more recent and novel approach to translation selection is the hybrid method [9], which combined multiple measures for sense disambiguation and target-word selection. Based on the "word-to-sense and sense-to-word" relationship between a source word and its translations, the method selects a translation in two stages: sense disambiguation of the source word, and selection of a target word. It makes use of three measures for translation selection: sense preference, based on knowledge from the bilingual dictionary; and sense probability and word probability, which are calculated using statistics from the target language corpus. The method presented in this study is analogous to this approach.

Other techniques worth mentioning in this field are the use of dependency triples over unrelated monolingual corpora to select among translations of a given verb [13], which assumes a strong correspondence between the main dependency relations of the source and target languages, and a more recent approach that exploits content-aligned bilingual corpora for phrasal translation, based on monolingual similarity and the translation confidence of aligned phrases of the two languages [2].

2. TRANSLATION SELECTION

The majority of translation selection methods have tried to select a target word directly from all translations of a source word. Such methods are apt to select an incorrect translation because the senses of the target words are themselves ambiguous. In the following sections, two approaches to translation selection are compared.

2.1 Word-to-Word Relationship

Most approaches to translation selection select a target word directly from a source word. Such direct mapping is referred to as a 'word-to-word' relationship. On this basis, previous approaches could easily obtain rules from statistics, and knowledge acquisition is simplified by utilizing corpora in a statistical method. Although the difficulty of knowledge acquisition is relieved, such methods for translation selection have critical flaws: they are apt to select an incorrect translation, even if the set of target words is reduced, since the ambiguity of both source and target words is not taken into consideration, as depicted in Figure 1.

Figure 1. Word-to-word relationship of the word 'break' to its Tagalog translations

2.2 Word-to-Sense and Sense-to-Word Relationship

The 'word-to-sense and sense-to-word' relationship means that a word in a source language has multiple senses and each sense can be mapped to multiple target words [9]. Using this relationship, the senses of the source words are disambiguated before a translation is selected. Since each sense covers a set of target words, as shown in Figure 2, this information can be utilized with less elaborate knowledge.

Figure 2. 'Word-to-sense and sense-to-word' relationship of 'break'

As this relationship suggests, the senses of a source word must be resolved before a target word is selected. Knowledge for resolving word ambiguities can be extracted from various machine-readable dictionaries, which provide a wide range of information about words through definitions of word senses. In this study, knowledge for word sense disambiguation was extracted from Leo English's English-Tagalog Dictionary [6], which contains the sense definitions of an English word together with a list of Tagalog translations grouped by sense, as shown in Figure 3.

Figure 3. A part of the English-Tagalog dictionary
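Since the later modules operate on these sense profiles, a small sketch may help. The schema below is illustrative only: the paper stored profiles in an MS Access database, and the field names, clue words, and Tagalog translations chosen here are assumptions for the example, not the authors' actual data.

```python
# Illustrative sketch of one sense-profile entry extracted from the
# bilingual dictionary. Field names and word lists are assumptions.
sense_profile = {
    "break": [  # one record per sense of the source word
        {
            "sense_id": 1,
            "definition_words": ["separate", "pieces", "suddenly"],  # content words from the sense definition
            "example_words": ["glass", "fell", "floor"],             # content words from example sentences
            "translations": ["basag", "sira"],                        # Tagalog target words grouped under this sense
        },
        {
            "sense_id": 2,
            "definition_words": ["stop", "rest", "work"],
            "example_words": ["lunch", "hour"],
            "translations": ["pahinga"],
        },
    ],
}

def translations_for(word: str) -> list[str]:
    """All candidate target words for a source word, across every sense."""
    return [t for sense in sense_profile.get(word, []) for t in sense["translations"]]

print(translations_for("break"))
```

The point of the grouping is that a target word is only reachable through its sense, which is what makes the two-stage selection of Section 2.2 possible.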

2.3 Measure of Semantic Relatedness for Sense Disambiguation

Word sense disambiguation is the process of assigning a meaning to a word based on the context in which it occurs. The most appropriate meaning is selected from a predefined set of possibilities, usually known as a sense inventory. Recent research in this field exploits WordNet, as a hierarchical knowledge base, for the computation of semantic relatedness between concepts. Following Budanitsky and Hirst [4], semantic similarity is a kind of relatedness that captures resemblance, while semantic relatedness covers a broader range of relationships between concepts, including similarity (or difference) as well as relationships such as is-a-kind-of, is-a-part-of, is-a-specific-example-of, and is-the-opposite-of. Some pairs of words tend to occur together more often than would be expected by chance. Even though these relationships are quite diverse, humans can judge whether one pair of words is more related than another, so it is useful to assign a value that characterizes the degree to which two words are related. The main idea behind using semantic relatedness in word sense disambiguation is that a word can be disambiguated by finding the sense that is most related to its neighbors.

Budanitsky [3] presented an extensive survey and classification of measures of semantic relatedness, varying from simple edge-counting to measures that consider link direction, relative depth, and conceptual density. These analytic methods have, however, been challenged by a number of hybrid approaches. Budanitsky and Hirst [4] experimentally compared five of these hybrid measures over WordNet by examining their performance in a real-word spelling correction system. Their study showed that Jiang and Conrath's measure gave the best results overall. The measure demonstrated practical usability in that it augments information content with the path length between concepts. The information content of a concept is simply a measure of its specificity: a concept with high information content is specific to a particular topic, while a concept with low information content is more general. It is estimated by counting the frequency of the concept in a large corpus and thereby determining its probability via a maximum likelihood estimate.
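As an illustration of these last two ideas, the sketch below computes information content from toy corpus counts and combines it with a lowest common subsumer, in the spirit of Jiang and Conrath's distance. The mini taxonomy and the frequencies are invented for the example; a real system derives them from WordNet and a large corpus.

```python
import math

# Toy is-a taxonomy and corpus counts (invented; counts propagate upward,
# so the root's count equals the corpus total).
freq = {"entity": 1000, "animal": 300, "rock": 100, "dog": 60, "cat": 40}
parent = {"dog": "animal", "cat": "animal", "animal": "entity",
          "rock": "entity", "entity": None}
total = freq["entity"]

def info_content(c):
    # IC(c) = -log P(c), with P estimated by maximum likelihood from counts
    return -math.log(freq[c] / total)

def ancestors(c):
    out = []
    while c is not None:
        out.append(c)
        c = parent[c]
    return out

def lcs(c1, c2):
    # lowest common subsumer: first shared node walking up from c1
    a2 = set(ancestors(c2))
    for a in ancestors(c1):
        if a in a2:
            return a

def jcn_distance(c1, c2):
    # Jiang-Conrath: dist = IC(c1) + IC(c2) - 2 * IC(lcs(c1, c2))
    return info_content(c1) + info_content(c2) - 2 * info_content(lcs(c1, c2))

# 'dog' and 'cat' share the specific subsumer 'animal', so they come out
# closer than 'dog' and 'rock', which meet only at the generic root.
print(jcn_distance("dog", "cat"))
print(jcn_distance("dog", "rock"))
```

Smaller distance means stronger relatedness; WordNet::Similarity (used later in this paper) exposes the same measure as a similarity score.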

3. ARCHITECTURAL DESIGN

A general overview of the system workflow is presented in the architectural design in Figure 4.

Figure 4. The architectural design

The design consists of the preprocessing of the language resources needed for this study and of modules for sense disambiguation, target word selection, and translation preference. The initial resources include the target corpora, the target lexicon, an existing bilingual dictionary (source to target) for the sense profile, and WordNet for the word similarity measures. The target corpora were gathered from various online Tagalog editorials and readings, and the Tagalog New Testament, for a total of 317,113 words. A total of 145,746 words in syntactic relationships were extracted from these target corpora with the aid of a target lexicon, using a partial parser [1]. The target lexicon was extracted from a bilingual lexicon used in the work of Tiu [12]. An existing bilingual dictionary [6] was used for building the sense profiles of source words and their translation mappings in each sense division. Words were manually encoded into the sense profile database through an MS Access interface. The sense profiles consist of entries of source words in their different senses, along with the list of translations for each sense; they also include the content words extracted from the definition and example sentences of each word sense. WordNet, an online lexical reference system, is particularly suited to the similarity measure used in this study, since it organizes nouns and verbs into hierarchies of is-a relations.

Translation selection proceeds by classifying the senses of words in the input sentences through the computation of word similarity over the WordNet hierarchy. The classified senses are then submitted to the sense probability module, which computes probabilities based on target-word co-occurrence. The probability of each candidate word translation is then calculated. Finally, an overall computation is done to produce the preferred translation of each word.

3.1 Word Sense Classifier

Before the senses of words in syntactic relations are classified, the word similarity between the co-occurring words in the input sentence and the words in the definition and example sentences of each sense division must first be computed. This shows how similar the co-occurring words are to the clues for the different senses of the word being classified.

An external tool, WordNet::Similarity [10], was used to compute word similarity with the measure proposed by Jiang and Conrath. Based on this, the sense of a word is classified as the sense with the highest total computed similarity. The classification adapts the sense preference equation from the work of Lee et al. [9]. Given that the i-th word in an input sentence is s_i (a word in syntactic relationship with another word in the sentence) and the k-th sense of s_i is s_i^k, the sense preference spf(s_i^k) is calculated by summing, over the words of SNT, each word's similarity to the clue words of the sense. Here SNT is the set of all content words in the input sentence except s_i, and DEF(s_i^k) and EX(s_i^k) are the sets of words in the definitions and example sentences of sense s_i^k, respectively.
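A minimal sketch of this sense-preference computation follows. The similarity scores that WordNet::Similarity would supply are replaced here by a stub lookup table, and the aggregation shown (summing each context word's best similarity to a sense's clue words) is one reading of the description, not the authors' exact equation.

```python
# Stub similarity table standing in for WordNet::Similarity (Jiang-Conrath).
# Pairs and scores are invented for the example.
SIM = {("dish", "plate"): 0.9, ("dish", "food"): 0.7, ("dish", "game"): 0.1}

def sim(w1, w2):
    return SIM.get((w1, w2), SIM.get((w2, w1), 0.0))

def spf(sense_clues, context_words):
    """sense_clues: content words from DEF and EX of one sense.
    context_words: SNT, the other content words of the input sentence.
    Sum each context word's best similarity to the sense's clues."""
    return sum(max((sim(w, c) for c in sense_clues), default=0.0)
               for w in context_words)

# Disambiguating 'wash' in "wash the dish": the clues of the 'clean dishes'
# sense relate to 'dish' far more than the clues of the 'cancel/wash out' sense.
clean_sense = ["plate", "food"]
cancel_sense = ["game"]
context = ["dish"]
best = max([clean_sense, cancel_sense], key=lambda s: spf(s, context))
print(best is clean_sense)
```

The sense whose clue words score highest against the sentence context wins, which is exactly the "highest total computed similarity" criterion described above.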

3.2 Sense Probability

Sense probability (sp) represents how likely target words with the same sense are to co-occur with the translations of the other words they stand in syntactic relationship with in an input sentence. The computation for this module is adapted from the work of Lee et al. [9].
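The idea can be sketched as follows; the co-occurrence counts, the Tagalog words, and the normalization used here are illustrative assumptions rather than the paper's actual statistics.

```python
# f[(t, t_other, relation)] = co-occurrence count in the target corpus
# (invented numbers for illustration).
f = {
    ("hugas", "pinggan", "verb-object"): 25,  # hugas (wash) + pinggan (dish)
    ("laba", "pinggan", "verb-object"): 1,    # laba (launder) + pinggan
    ("laba", "damit", "verb-object"): 30,     # laba + damit (clothes)
}

def n(t, other_translations, relation):
    # how frequently target word t co-occurs with any translation
    # of the syntactically related word
    return sum(f.get((t, tj, relation), 0) for tj in other_translations)

def sense_probability(sense_targets, other_translations, relation):
    # sum n(t) over the sense's target words, then normalize into a
    # probability (here crudely, by the total corpus co-occurrence mass)
    raw = sum(n(t, other_translations, relation) for t in sense_targets)
    total = sum(f.values())
    return raw / total if total else 0.0

# For 'wash the dish', the sense translated by {hugas} beats the sense
# translated by {laba} once co-occurrence with 'pinggan' is counted.
print(sense_probability(["hugas"], ["pinggan"], "verb-object"))
print(sense_probability(["laba"], ["pinggan"], "verb-object"))
```

This mirrors the intuition of Dagan and Itai [5]: target-language co-occurrence statistics discriminate between source-language senses.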

In the equation, θ(s_i) denotes the set of co-occurrences of a word s_i on a syntactic relation. In an element (s_j, m, c) of θ(s_i), s_j is a word that co-occurs with s_i in the input sentence, c is the syntactic relation between s_j and s_i, and m is the number of translations of s_j. Given that the set of translations of sense s_i^k is T(s_i^k) and a member of T(s_i^k) is t_iq^k, the frequency with which t_iq^k and t_jp co-occur in syntactic relation c is denoted f(t_iq^k, t_jp, c). The quantity n(t_iq^k) signifies how frequently t_iq^k co-occurs with translations of s_j. Summing these values over all target words in T(s_i^k) yields the sense probability sp for sense s_i^k.

3.3 Word Probability

Word probability represents the probability of selecting a target word among all other target words in the same sense division. It is denoted wp(t_iq^k), the probability of selecting t_iq^k from T(s_i^k), and is calculated from n(t_iq^k) by normalizing over the target words of the sense: wp(t_iq^k) = n(t_iq^k) / Σ_q' n(t_iq'^k).

3.4 Translation Preference

Selection of a target word among all the translations of a source word is done by computing a translation preference (tpf) for each translation, thereby merging the results of the sense classifier, sense probability, and word probability. The values from the sense classifier and sense probability are added as a sense-disambiguation score. The word-selection score applies a normalizing factor to the word probability, as suggested by Lee et al. [9], to avoid discounting the score of a word whose sense has many corresponding target words. The target word with the highest computed tpf(t_iq^k) is then selected.

4. EVALUATION

Translations of content words (adjectives, nouns, and verbs) participating in syntactic relationships (subject-verb, verb-object, adjective-noun, and subject-adjective) were obtained and evaluated over varying combinations of clues and measures. Evaluation was made on a set of 200 bilingual sentences extracted from various bilingual dictionaries and reference books for studying Tagalog. About 244 word pairs in syntactic relation were extracted from the English sentences using the Memory-Based Shallow Parser, which is available online. From these word pairs, there were 217 nouns with an average of 3 senses, 92 adjectives with an average of 5 senses, and 148 verbs with an average of 6 senses, for a total of 457 words.

4.1 Sense Disambiguation

For sense disambiguation, two measures were used: sense preference and sense probability. The accuracy of the sense preference measure is shown in Figure 5. The sense classifier made use of words in sense definitions (DEF) and words in example sentences (EX) as clues for sense disambiguation, and evaluation was done by varying these clues. The accuracy of each module was evaluated by testing whether any target word of the highest-scoring sense is identified as a translation of its source word in the target sentence.

Figure 5. Accuracy of the Sense Classifiers

The DEF row shows the accuracy of using only words found in sense definitions as clues for sense disambiguation; the EX row shows the accuracy of using the words available in example sentences; and the DEF-EX row shows the accuracy of combining both clues. Both the EX and DEF-EX rows show an overall accuracy of 51.20%, classifying the senses of 234 words out of 457. As shown in Figure 5, using clues from both the definitions and the example sentences produced better results than using clues from the sense definitions alone. Performance for verbs increased when both clue types were used, yielding correct senses for 65 out of 148 verbs (43.94%). The same holds for adjectives, at 52.17% (48 out of 92 adjectives). For nouns, however, the best result was obtained using clues from example sentences only. The computation of sense probability is based on syntactically related word co-occurrences in an input sentence; an evaluation of this module resulted in an overall accuracy of 61.27%.

4.2 Translation Preference

Figure 6 shows the results for translation preference. The evaluation varied the combination of sense preference, sense probability, and word probability, and compared these against three baselines: random selection, the first translation of the first sense (1st sense), and the most frequent translation (mft). The sense classification (sc) row shows the accuracy of selecting the first translation of the highest-scoring sense. The sense probability (sp) row shows the accuracy of selecting the first translation of the sense with the highest probability. The word probability (wp) row shows the accuracy of using word probability alone, which assumes that all translations of the source word have the same sense. The sp x wp row combines sense probability and word probability, and the sc x wp row combines sense classification and word probability. The (sc + sp) x wp row shows the result of combining all measures. Based on this, the system attains an overall accuracy of 70.67%.
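A sketch of this calculation, assuming the counts n(t) from Section 3.2 are already available. The uniform backoff for all-zero counts is our addition for the example, not part of the paper.

```python
def word_probability(counts):
    """counts: {target_word: n(t)} for the target words of ONE sense.
    Returns wp(t) = n(t) / sum of n(t') over the sense's target words."""
    total = sum(counts.values())
    if total == 0:
        # assumption: back off to a uniform distribution when no
        # co-occurrence evidence exists for any target word of the sense
        return {t: 1.0 / len(counts) for t in counts}
    return {t: c / total for t, c in counts.items()}

# Illustrative counts for two target words of one sense of 'wash'.
wp = word_probability({"hugas": 30, "banlaw": 10})
print(wp["hugas"])
```

Because wp is normalized within a single sense division, it answers a different question from sp: not "which sense is right?" but "given this sense, which of its target words is usual?".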
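A sketch of the combination step. The sense-disambiguation score spf + sp follows the text; the exact normalizing factor applied to the word probability in Lee et al. [9] is not reproduced here, so the exponent 1/n_targets below is purely a placeholder assumption.

```python
def tpf(spf_score, sp_score, wp_score, n_targets):
    sense_score = spf_score + sp_score           # sense-disambiguation part
    word_score = wp_score ** (1.0 / n_targets)   # hypothetical normalization
    return sense_score * word_score

candidates = {
    # target: (spf of its sense, sp of its sense, wp within sense, |T(sense)|)
    # Scores are invented for illustration.
    "hugas": (0.9, 0.45, 0.75, 2),
    "laba":  (0.1, 0.02, 0.60, 3),
}
best = max(candidates, key=lambda t: tpf(*candidates[t]))
print(best)
```

The normalization matters: without it, a target word belonging to a sense with many translations would be penalized merely for sharing its sense's probability mass with its siblings.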

Figure 6. Accuracy of Translation Selection

5. CONCLUSION

This research has presented a method that automatically resolves target-word ambiguity. Previous research concentrated only on statistics of word co-occurrence in the target language. The method presented here first resolves the senses of source words through word-similarity calculations, and then selects words based on the frequency of word co-occurrences in the target language. The system was evaluated on 200 sentences with highly ambiguous words, varying the combination of the measures for resolving target-word ambiguity. The method performed at a satisfactory level, yielding 70.67% of the expected translations. The method is highly dependent on the clues found in the sense profile for disambiguating source words; despite the satisfactory result, the algorithm could not properly disambiguate some source words because of inadequate clues in the sense definitions and example sentences for certain senses. Smoothing techniques could further improve results, since the zero values produced by sense probability and word probability affected the quality of translation. The translation words produced by the system are root words; the system can be further improved by integrating morphological analysis into a machine translation system. In addition, an algorithm can be developed from this method for bi-directional (Tagalog-to-English) translation.

6. REFERENCES

[1] Abney, S. Tagging and partial parsing, 1996. http://www.vinartus.net/spa/95a.pdf
[2] Aramaki, E., Kurohashi, S., Kashioka, H., and Tanaka, H. Word selection for EBMT based on monolingual similarity and translation confidence. HLT-NAACL Workshop, 2003. http://www.cs.unt.edu/~rada/wpt/papers/pdf/Aramaki.pdf
[3] Budanitsky, A. Lexical semantic relatedness and its application in natural language processing, 1999. http://citeseer.ist.psu.edu/budanitsky99lexical.html
[4] Budanitsky, A. and Hirst, G. Semantic distance in WordNet: an experimental, application-oriented evaluation of five measures. In Workshop on WordNet and Other Lexical Resources, 2nd Meeting of the North American Chapter of the Association for Computational Linguistics, 2001. http://citeseer.ist.psu.edu/budanitsky01semantic.html
[5] Dagan, I. and Itai, A. Word sense disambiguation using a second language monolingual corpus. Computational Linguistics, 1994. http://www.citeseer.nj.nec.com/dagan94.html
[6] English, L. J. English-Tagalog Dictionary. Congregation of the Most Holy Redeemer, 2003.
[7] Kikui, G. Resolving translation ambiguity using non-parallel bilingual corpora, 1999. http://www.slt.atr.co.jp/~gkikui/papers/9906kikuiACLUNS.ps.gz
[8] Koehn, P. and Knight, K. Knowledge sources for word-level translation models. Proceedings of the Empirical Methods in Natural Language Processing conference, 2001. http://www.isi.edu/~koehn/publications/emnlp2001.pdf
[9] Lee, H. A., Yoon, J., and Kim, G. C. Translation selection by combining multiple measures for sense disambiguation and word selection. International Journal of Computer Processing of Oriental Languages, Vol. 16, No. 3, 2003. http://csone.kaist.ac.kr/~halee/paper/ijcpol20003.pdf
[10] Pedersen, T., Patwardhan, S., and Michelizzi, J. WordNet::Similarity – measuring the relatedness of concepts. In Proceedings of the 5th Annual Meeting of the North American Chapter of the Association for Computational Linguistics, 2004. http://www.d.umn.edu/~pederse/pubs/naacl04-demo-similarity.pdf
[11] Prescher, D., Riezler, S., and Rooth, M. Using a probabilistic class-based lexicon for lexical ambiguity resolution. Proceedings of the 18th International Conference on Computational Linguistics, 2000. http://ims.uni-stuttgart.de/projekte/gramotron/PAPERS/COLING00/COLING00.ps
[12] Tiu, P. E. Lexicon extraction from comparable corpora. MS Thesis Proposal, College of Computer Studies, De La Salle University, 2003.
[13] Zhou, M., Ding, Y., and Huang, C. Improving translation selection with a new translation model trained by independent monolingual corpora. Journal of Computational Linguistics and Chinese Language Processing, Vol. 16, No. 1, 2001, 1-26. http://research.microsoft.com/asia/dload_files/group/nlps/2002p/improving%20translation%20selection-CLCLP.pdf
