
JOURNAL OF COMPUTER SCIENCE AND ENGINEERING, VOLUME 2, ISSUE 1, JULY 2010 32

Identification of Proverbs in Hindi Text Corpus and their Translation into Punjabi

Brahmaleen K. Sidhu, Arjan Singh and Vishal Goyal

Abstract— Hindi, the official language of India, is spoken by over 500 million people around the world. Punjabi is an Indo-Aryan language spoken by inhabitants of the historical Punjab region in Pakistan and north-western India. Punjabi, the official language of the Indian state of Punjab, is spoken as a native language by over 2.85% of Indians. This paper describes an approach for searching for proverbs in a Hindi text corpus, followed by their translation and transliteration into Punjabi. Inflected forms of proverbs are also identified.

Index Terms— Computational Linguistics, Hindi Proverbs, Machine Translation System, Natural Language Processing, Transliteration.


1 INTRODUCTION

Both Hindi and Punjabi originated from Sanskrit, one of the oldest languages. In terms of number of speakers, Hindi is the third most widely spoken language in the world and Punjabi the twelfth [21]. Hindi is spoken and used by people all over the country. Punjabi is mostly used in northern India and in some areas of Pakistan, as well as in the UK, Canada and the USA. The script of Hindi is Devanagari and that of Punjabi is Gurmukhi.

A proverb, also called a byword, adage or nayword, is a short, concrete saying that is often repeated. Such statements usually express a truth of some kind, which may be philosophical, spiritual or practical. The word proverb is said to originate from the Latin word proverbium, meaning concrete statement. Proverbs may be defined as words that, collocated together, become fossilized and fixed over time [1]. Within corpus linguistics, a collocation is defined as a sequence of words or terms which co-occur more often than would be expected by chance [2]. Proverbs can be classified into various types, such as the metaphorical proverb, the maxim and the aphorism. A proverb that describes a basic rule of conduct is known as a maxim. If a proverb is distinguished by particularly good phrasing, it may be known as an aphorism [3]. Many writers use a proverb to enhance their work, or simply to make concrete what they wish to say. A proverb is a short pithy saying in general use, held to embody a general truth. Proverbs are also called popular sayings; in Hindi a proverb is called Kahavat or Kahawat. Some of the most popular Hindi proverbs with their meanings are as follows:

Brahmaleen K. Sidhu is with the Punjabi University, Patiala, Punjab, India.
Arjan Singh is with the Baba Banda Singh Bahadur College of Engineering, Fatehgarh Sahib, Punjab, India.
Vishal Goyal is with the Punjabi University, Patiala, Punjab, India.

बंदर क्या जाने अदरक का स्वाद
Bandar kya jaane adark ka swaad
English: What does a monkey know of the taste of ginger?
Meaning: Someone who can't understand can't appreciate.
दूर के
© 2010 JCSE http://sites.google.com/site/jcseuk/


language processing. At its most basic level, machine translation performs simple substitution of words in one natural language for words in another. However, word-to-word translation of multi-word expressions, such as proverbs, in the source language produces incorrect results in the target language. Such complex translations may be attempted using dictionary-based corpus techniques that allow for better handling of differences in linguistic typology, phrase recognition, and the translation of idioms and proverbs. Behind this ostensibly simple procedure lies a complex cognitive operation. The goal of the system presented in this paper is to accurately detect all proverbs in an input Hindi corpus and produce high-quality Punjabi translations for them. The output is revised manually (post-edited). The system uses lexical information of both the source and target languages as an aid.

The rest of the paper is divided into six sections. Section 2 surveys the literature and Section 3 describes the system. Section 4 compares the approach with previous work, Section 5 presents the evaluation methodology, and Sections 6 and 7 present the results and the conclusion with future work.

2 LITERATURE SURVEY

The main technique used in this work is pattern matching. Pattern recognition is a sub-topic of machine learning: it is the act of taking in raw data and taking an action based on the category of the data [5]. The first computer programs to use pattern matching were text editors, in which regular expressions were used to specify strings for find-and-replace functions in a document [5].

Various collocation-based measures have been proposed to compute the relative compositionality of multiword expressions (MWEs) [6], [7]. Measuring the relative compositionality of MWEs is crucial to natural language processing. Sriram and Joshi [6] define collocation-based and context-based measures of the relative compositionality of MWEs of the Verb-Noun (V-N) type. They show that the correlation of these features with human ranking is much higher than that of the traditional features. The collocation-based, context-based and traditional features are integrated using a Support Vector Machine (SVM) based ranking function to rank V-N collocations by their relative compositionality.

Statistical techniques have also been used to recognize MWEs [8]. It has been shown that the properties 'distributed frequency of object' [9] and 'nearest mutual information' [10] contribute greatly to the recognition of non-compositional MWEs of the V-N type and to the ranking of V-N collocations by their relative compositionality. The distributed frequency of object is based on the idea that if an object appears with only one verb (or a few verbs) in a large corpus, the collocation is expected to be idiomatic in nature. Lin [10] states that a possible way to separate compositional phrases from non-compositional ones is to check the existence and mutual information values of similar collocations obtained by replacing one of the words with a similar word. In [8], all the existing statistical features were integrated and a range of classifiers was investigated for their suitability for recognizing non-compositional V-N collocations.

It is well known that multi-word expressions are problematic in natural language processing. Existing theories suggest that information about their degree of compositionality can be helpful in various applications, but this has not been proven empirically. Information about multi-word expressions has been used in the word-alignment task: even simple features like point-wise mutual information are useful for word alignment in English-Hindi parallel corpora [11]. The alignment error rate achieved (AER = 0.5040) is significantly better (about a 10% decrease in AER) than that of the state-of-the-art models (best AER = 0.5518) on the English-Hindi dataset [12].

Very limited work has been done towards characterizing Hindi MWEs of the Noun-Verb type. Moreover, various statistical measures used to measure the compositionality of different kinds of collocations in English cannot be applied directly to Hindi because of insufficient corpora and resources. In [13], various types of Noun-Verb expressions in Hindi are analyzed in detail and an approach is proposed to measure their relative compositionality automatically using a maximum entropy model (MaxEnt). MaxEnt integrates various measures representing the properties of Noun-Verb expressions in Hindi; some of these measures are computed by mapping the expressions to Verb-Noun expressions in English.

One class of MWEs is Complex Predicates (CPs): multi-word complexes functioning as single verbal units. CPs are particularly pervasive in Hindi and other Indo-Aryan languages, but a usage account driven by corpus-based identification of these constructs has not been possible, since single-language systems based on rules and statistical approaches require reliable tools (part-of-speech (POS) taggers, parsers, etc.) that are unavailable for Hindi [14]. A database has been developed based on the simple idea of projecting POS tags across an English-Hindi parallel corpus. The CP types considered include adjective-verb (AV), noun-verb (NV), adverb-verb (Adv-V) and verb-verb (VV) composites. CPs are hypothesized where a verb in English is projected onto a multi-word sequence in Hindi.

Identifying compound noun multiword expressions is important for applications like machine translation and information retrieval. Anoop and Damani [15] describe a system for extracting Hindi compound noun MWEs from a given corpus. Major categories of compound noun MWEs are identified based on linguistic and psycholinguistic principles. The extraction methods use various statistical co-occurrence measures to exploit the statistical idiosyncrasy of MWEs. Various lexical cues from the corpus are used to enhance accuracy.


The extraction of reduplicative expressions is addressed using lexical, semantic and phonetic knowledge.

In the age of modernization, linguists cannot forget that India is a multilingual country. The problem of multilingualism was traditionally solved by translation, but in the age of information technology the machine is a competitor of language [16]. Machine translation faces many problems, one of the major ones being homonymy disambiguation. Homonymy is a problem not only in machine translation but also in human translation, quick translation (interpretation), language acquisition by new learners and by young children of normal intelligence, and artificial intelligence, because it causes word ambiguity. Homonymous words come from original words (Tatsam, Tadbhav) of native languages or dialects; they can also be combinations of foreign-language and Hindi forms. Homophones are pronounced with the same sound but have different meanings. Because the Devanagari script is sound based (Dhwanimulaka), a difference in pronunciation also appears as a difference in spelling [17].

Recently, a Hindi to Punjabi translation system has been developed at the Advanced Centre for Technical Development of Punjabi Language, Literature and Culture, Punjabi University, Patiala. The system, entitled “Web Based Hindi to Punjabi Machine Translation System”, follows a word-to-word translation approach, which tends to produce incorrect results for the idioms and proverbs of a language [21]. The accuracy of such a system can be improved by replacing source-language proverbs with their equivalents in the target language.
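Point-wise mutual information, one of the collocation measures mentioned in this survey, compares how often two words co-occur with how often they would co-occur by chance. The following is a minimal Python sketch of the standard definition, assuming that adjacent word pairs have already been extracted from a corpus; the function name `pmi` and the toy pairs are illustrative and are not taken from the cited works.

```python
import math
from collections import Counter


def pmi(pairs):
    """Return log2( p(w1, w2) / (p(w1) * p(w2)) ) for every observed pair.

    `pairs` is a list of (w1, w2) tuples, e.g. adjacent words in a corpus.
    Probabilities are simple relative frequencies over the list.
    """
    n = len(pairs)
    pair_counts = Counter(pairs)
    left_counts = Counter(w1 for w1, _ in pairs)
    right_counts = Counter(w2 for _, w2 in pairs)
    return {
        (w1, w2): math.log2(
            (count / n) / ((left_counts[w1] / n) * (right_counts[w2] / n))
        )
        for (w1, w2), count in pair_counts.items()
    }


# Toy data (hypothetical): "c d" always occur together, "c b" rarely do.
scores = pmi([("a", "b"), ("a", "b"), ("c", "d"), ("c", "b")])
for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, round(score, 3))
```

A high score suggests that a pair behaves like a fixed expression; a score near or below zero suggests a chance combination.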

3 SYSTEM DESCRIPTION

To start with, a proverb translation system is created in which proverbs in the source-language (Hindi) text are searched for and their equivalents in the target language (Punjabi) are produced using lexical information. To achieve higher accuracy, the system also identifies the inflected forms of Hindi proverbs. The input to the system is a Hindi text file. The lexical information aiding the task takes the form of a database of 850 Hindi proverbs along with their translated and transliterated forms in Punjabi. Another database, of the various forms of verbs and their root words, is used to identify the inflected forms of proverbs in the input text. The algorithm devised is described below and illustrated in Fig. 1.

Input: Hindi text to be searched, Hindi proverbs database.

Step I. Splitting input text
(i) Split the input Hindi text into single-word units and save them in a one-dimensional array 'a'.
(ii) Replace all verbs in array 'a' with their root words.

Step II. Splitting proverbs
Split every proverb into single-word units and save it as a row in a two-dimensional array 'p'.

Step III. Comparison and searching
Compare input word 'x' against the first column of array 'p'.
(i) If a match is found at row 'r', compare the input words following 'x' with the subsequent words of the proverb until the length of the proverb is exceeded.
(a) If a match is found, highlight the input words starting at 'x' as a proverb and output their Punjabi translation and transliteration.
(b) Else, backtrack to input word 'x' and repeat comparison and searching from row 'r+1'.
(ii) If 'x' does not match any word in the first column of array 'p', move to the next input word in 'a' and repeat comparison and searching.
(iii) Go to Step IV when all input words have been scanned.

Step IV. Display result
Display the total number of proverbs found in the input text along with their frequencies, Punjabi translations and transliterations.

Fig. 1. Summarizing the framework of algorithm. (Flowchart with boxes: Hindi text; Proverbs; Splitting input text; Splitting proverbs; Comparison and searching; Proverbs found with frequencies, translation, and transliteration.)
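Steps I to IV can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `find_proverbs`, the dictionary layout, the romanized words and the placeholder Punjabi strings are all hypothetical, and tokens are split on whitespace only, so punctuation attached to a word is not handled.

```python
from collections import Counter


def find_proverbs(text, proverbs, verb_roots):
    """Return (proverb, frequency, translation, transliteration) tuples.

    proverbs:   dict mapping a proverb to (translation, transliteration)
    verb_roots: dict mapping an inflected verb form to its root word
    Both layouts are assumptions made for this sketch.
    """
    # Step I: split the input and replace verb forms with their roots.
    a = [verb_roots.get(word, word) for word in text.split()]
    # Step II: split every proverb into words (one row of 'p' each).
    p = [(key.split(), key) for key in proverbs if key.split()]

    counts = Counter()
    i = 0
    while i < len(a):
        # Step III: try the rows in order; a failed row simply falls
        # through to the next one, which is the "backtrack to r+1" case.
        for row, key in p:
            if a[i:i + len(row)] == row:
                counts[key] += 1
                i += len(row)  # continue scanning after the proverb
                break
        else:
            i += 1  # no row matched at this position
    # Step IV: report each proverb with its frequency and stored forms.
    return [(key, n) + tuple(proverbs[key]) for key, n in counts.items()]


# Hypothetical romanized data; the Punjabi strings are placeholders.
PROVERBS = {
    "sir bechna": ("PA translation 1", "PA transliteration 1"),
    "sir chadhna": ("PA translation 2", "PA transliteration 2"),
    "naag khelna": ("PA translation 3", "PA transliteration 3"),
}
VERB_ROOTS = {"khelne": "khelna"}

text = "bijli se panga naag khelne wali baat hai aur mohan ka sir chadhna theek nahin"
for found in find_proverbs(text, PROVERBS, VERB_ROOTS):
    print(found)
```

Comparing list slices keeps the sketch short. A production system would more likely index the proverbs by first word so that only candidate rows are compared at each position.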

4 COMPARISON WITH TECHNIQUES USED IN PREVIOUS WORK

Although the history of machine translation is long, the number of really successful systems is not very impressive. Recently a word-to-word translation system for Hindi to Punjabi has been developed at the Advanced Centre for Technical Development of Punjabi Language, Literature and Culture, Punjabi University [19]. However, proverbs are multi-word expressions (MWEs), for which word-to-word translation yields incorrect results, as illustrated in Table 1.

TABLE 1 COMPARISON WITH WORD TO WORD TRANSLATION

Hindi Proverb | Word to Word Translation | Correct Translation
,d vkSj ,d X;kjg gksrs gSz | fJZe s/ fJZe frnkoK j[zd/ jB | J/esk@u pVh skes j[zdh j?
cfu;s dh lyke csxjt+ ugha gksrh | pfB:/ dh ;bkw p/ro} BjhI j'sh | wsbph dk ;fjp ;bkw th ;[nkoE Gfonk j[zdk j?
cgkj ij vkuk | pjko go nkBk | c[bDk cbDk
chl iM+uk | ph; gVBk | pbtkB j'Dk
cky cky cpuk | pkb pkb puBk | w;K w;K puDk
cxysa ctkuk | prb/I pikBk | eZSK tikT[DhnK
dkVks rks cnu eas [kwu ugha | ekN' s' pdB w/I y{B BjhI | fujok cZe j' ikDk
ehu es[k fudkyuk | whB w/y fBekbBk | poheh dhnk rZbK eoBhnK
eksgjk ysuk | w'jok b/Bk | fGV ikDk
fgpj fipj djuk | fjuo fguo eoBk | Nkb wN'b eoBh
flj cspuk | f;o p/uBk | fJZis t/uDh
flj ij ikao j[k nsuk | f;o go gKt oy d/Bk | wks gkT[Dh
gok ij pyuk | jtk go ubBk | nzdk}k bkT[Dk
Hkqjdql fudyuk | G[oe[; fBebBk | u{o u{o j'Dk
jkbZ jkbZ dj Mkyuk | okJh okJh eo vkbBk | N[eV/ N[eV/ eo ;[ZND/
lc /kku ckbZl ilsjh | ;p XkB pkJh; g;/oh | Gb/ p[o/ ~ fJe' fijk
Qsj esa iM+uk | c/o w/I gVBk | X'yk ykDk

Thus, proverbs require special treatment during the translation process. Producing their equivalents in the target language is a complex task. Sentences like jktw us jke dks ukp upkuk pkgk izUrq og ,slk djus esa vlQy jgkA and eksgu dk flj p<+uk bl ckr dk gh urhtk gS A contain the standard forms of the proverbs ukp upkuk and flj p<+uk. But this is not always the case: proverbs also occur in variant forms due to inflection of the verbs involved. For example, the Hindi sentence fctyh ls iaxk ukx [ksyus okyh ckr gSA contains an inflected form of the proverb ukx [ksyuk. Table 2 gives a list of such examples.

TABLE 2 DETECTION OF INFLECTED PROVERBS

Sentences Containing Inflected Proverbs | Proverb Found
esjk rks dy gh ekFkk Budus yxk Fkk tc eSaus ml dh cnyh gqbZ lwjr ns[khA | ekFkk Buduk
jke us eksgu dks ukd HkkSa fldksM+rs gq, budkj dj fn;kA | ukd HkkSa fldksM+uk
ljdl esa ukx [ksyus dk dke djuk iMrk gSA | ukx [ksyuk
eSa rks igys gh dg jgk Fkk fd fcuk eryc fnYyh tkuk rks uhy ?kksVus okyh ckr gSA | uhy ?kksVuk
dqN fnu igys uhrw ds uk[kwu uhys gksus yxs Fks ijUrw ekfgj MkdVjksa dh Vhe us mls cpk fy;kA | uk[kwu uhys gksuk
ge lHk nkslrksa us cqjkbZ ls ikj ik;k ftl dkju vc gekjs ekrk firk gesa vPNk le>us yxs gSaA | ikj ikuk
/khjw HkkbZ vackuh us iwjs Hkkjr es vius ia[k ilkj j[ks gSaA | ia[k ilkjuk
,lh ckrsa cksyuk rks ekr djus okyh ckrksa ls vyx ugha gSA | ekr djuk
dqN lky igys Hkkjr ds dqN jkT;ksa es ejh i<+us ls dbZ yksx ej x,A | ejh i<+uk
og eq>s ns[k dj uhyk ihyk gksus s yxkA | uhyk ihyk gksus k
vius firk dh ezR;w vQlksl es xhrk lwjr cukrh gqbZ utj vkbZA | lwjr cukuk

The identification of inflected forms of proverbs in sentences is another feature of the presented approach. The system successfully identifies and translates standard forms as well as inflected forms of proverbs. An instance of the system in execution is given below. Let the contents of the system's input Hindi text file be:

fiNys fnuksa dqN fQYe vfHkusrk jktuhfr esa dwns rks dqN jktuhfr dks ck; ck; dg x;sA buesa xksfoUnk izeq[k gSaA xksfoUnk ns'k ij ikd gksuk viuk /kje letrs FksA jktuhfr mUgsa teh ugha rks og okil fQYeh nqfu;k esa ykSV vk;sA njvly ckWyhoqM esa XySej dh tks pdkpkSa/k gSa mlds vknh vfHkusrk vfHkus=h vU; {ks=ksa dh pdkpkSa/k esa Hkh [kqn dks ns[kuk pkgrs gSa vkSj blds fy, fcuk rS;kjh ds nwljs {ks=ksa esa dwn rks tkrs gSa ysfdu tc ckr ugha curh rks okil ckWyhoqM ykSV vkrs gSaA xksfoUnk ds ekeys esa Hkh ,slk gh gqvk gSA mUgksua s ckWyhoqM esa ykSVus ds lkFk gh 'kwfVax 'kq: dj nhA mUgsa fQj ls LFkkfir djus dk chM+k mBk;k gS muds fe= vkSj pgsrs funs'Z kd MsfoM /kou usA MsfoM vkSj xksfoUnk us ,d lkFk yxHkx Ms<+ ntZu fQYeksa esa dke fd;k gS vkSj ;g lHkh fQYesa fgV jghaA MsfoM vkSj xksfoUnk dh cf<+;k dSesLVªh dks ns[krs gq, mEehn Fkh fd fQYe ^Mw ukWV fMLVcZ* Hkh n'kZdksa dks xqnxqnkus esa dke;kc jgsxhA ysfdu ,slk gqvk ughaA bl fQYe esa xksfoUnk ds lkFk fjrs'k ns'keq[k] lqf"erk lsu] lksgsy [kku vkSj ykjk nRrk Hkh fn[khaA

The highlighted proverb in the text above is the output, along with the translation details given below. The system also produces a summary of the total number of proverbs detected along with their translations and frequencies.

Hindi Proverb found: ikd gksuk
Punjabi Transliteration: gke j'Bk
Punjabi Translation: fwNDk (correct)
Total number of proverbs found in input text: 01

The word-to-word translator outputs the following result:

Word to word Punjabi Translation: ;wkgs j'Dk (incorrect)
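The inflected-form detection discussed in this section depends on mapping each verb form back to its root word before matching. The sketch below shows one way such a lookup could be built, assuming the verb database is stored as root word to list of forms; that layout, the function names and the romanized example words are assumptions made for illustration, not details given in the paper.

```python
def build_root_lookup(verb_forms):
    """Invert a {root: [inflected forms]} table into {form: root}."""
    lookup = {}
    for root, forms in verb_forms.items():
        lookup[root] = root  # a root word maps to itself
        for form in forms:
            lookup[form] = root
    return lookup


def normalize(sentence, lookup):
    """Replace every known verb form in the sentence with its root."""
    return " ".join(lookup.get(word, word) for word in sentence.split())


# Hypothetical romanized entries, loosely modelled on Table 2.
VERB_FORMS = {
    "khelna": ["khelne", "khelta", "kheli"],
    "thanakna": ["thanakne", "thanka"],
}
lookup = build_root_lookup(VERB_FORMS)
print(normalize("sarkas mein naag khelne ka kaam karna padta hai", lookup))
```

After normalization the sentence contains the standard form "naag khelna", so a plain word-by-word comparison against the proverb database can find it.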

5 EVALUATION TECHNIQUES

Evaluation is without doubt a major aspect of language engineering, including machine translation (MT). It allows system developers to tell whether their system is improving, system integrators to determine the appropriate approach, and consumers to identify which system will best meet a specific set of needs. Beyond this, evaluation plays a critical role in guiding and focusing research [18]. Although the history of machine translation is long, the number of really successful systems is not very impressive, the reason being the complexity of the task itself. Since the system developed here does not implement complete translation, the available automatic MT evaluation metrics such as BLEU, NIST, WER and METEOR cannot be applied [18]. Based on previous approaches [19], the following evaluation methods and techniques are applied.

5.1 Selection of Input Text to be Searched

Input text is chosen randomly from newspaper articles, reviews, stories and sentences used in people's day-to-day conversations. All possible constructs, simple as well as complex, are included in the set. Table 3 summarizes the size and type of the input sets used for testing the system. The testing phase shows that stories contain more proverbs than the common day-to-day language of a layman, and that formal Hindi texts found in news articles and reports contain few proverbs. Fig. 2 gives an illustration.


TABLE 3 SIZE OF INPUT DATA

Measure | Stories | News Articles | Common sentences
Total inputs | 10 | 10 | 10
Total sentences | 350 | 175 | 284
Total words | 1656 | 830 | 1343

Fig. 2. Number of proverbs per input found in various text corpora. (Bar chart; y-axis: Frequency of Hindi Proverbs, 0 to 3.5; categories: Stories, News, Sentences.)

5.2 Tests

Some tests are required to analyze the system [20]. One problem with sentences that contain a proverb is that they are typically ambiguous, in the sense that a literal interpretation is generally possible. For example, the phrase iuhj tekuk (e'Jh nfijk ezw eoBk fi;dk bkG GftZy ftu ekch ;kok fwb/ in Punjabi) can really be about making cheese. A subjective test of the accuracy of the system is used to check this aspect. Such ambiguity arises in the case of rarely used proverbs. Only 30% of the Hindi proverbs were found to occur commonly. The rest are usually used as ordinary phrases and are mistaken by the system for proverbs. This finding is represented graphically in Fig. 3. The input text was manually scanned for phrases that are actually used as proverbs, and the result was compared with that produced by the system.

Some quantitative tests are also applied to check the error rate of the system. The possible errors are:
o Unidentified proverb: a proverb in the input text is not identified because it is absent from the database of commonly used proverbs.
o Un-translated proverb: no Punjabi translation is produced for an identified proverb because none is available.
o Mistranslated proverb: the meaning of the source-language proverb is not preserved in the translation.
The output of the system is compared against manual scanning to produce the following evaluation results.

6 RESULTS

The presented system successfully highlights all the proverbs used in the input text corpus and produces a Punjabi translation and transliteration for each. The error rate of the system in labeling literal phrases as proverbs is found to be nearly 2%. Proverbs are special MWEs which contain advice or state a generally accepted truth, and they rarely coincide with literal phrases. No errors were found in the identification or translation of proverbs, as the system is supported by extensive linguistic and lexical information.
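The comparison of system output against manual scanning can be sketched as a pair of set differences. This is an illustrative sketch only: the paper does not state how its error rate is normalized, so dividing by the number of system detections is an assumption, and the function name, data layout and romanized phrases are hypothetical.

```python
def evaluate(system, manual):
    """Compare system detections with manually identified proverbs.

    Both arguments are sets of (sentence_id, proverb) pairs.
    """
    unidentified = manual - system        # proverbs the system missed
    literal_as_proverb = system - manual  # literal phrases flagged as proverbs
    # Assumption: the error rate is taken over all system detections.
    error_rate = len(literal_as_proverb) / len(system) if system else 0.0
    return {
        "unidentified": unidentified,
        "literal_as_proverb": literal_as_proverb,
        "error_rate": error_rate,
    }


# Hypothetical annotations; sentence 2 really is about making cheese.
system = {(1, "naag khelna"), (2, "paneer jamana"), (3, "sir chadhna")}
manual = {(1, "naag khelna"), (3, "sir chadhna"), (4, "maatha thanakna")}
report = evaluate(system, manual)
print(report["unidentified"])
print(report["literal_as_proverb"])
print(round(report["error_rate"], 3))
```

Un-translated and mistranslated proverbs would need the translations themselves to be checked by a reader, so they are left out of this sketch.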

7 CONCLUSION AND FUTURE WORK

Proverbs are concrete and short sayings. Because most proverbs have their origins in oral tradition, they are generally worded so as to be remembered easily, and they tend to change little from generation to generation, so much so that sometimes their specific meaning is no longer relevant. The system for “Identification of Proverbs in Hindi Text Corpus and their Translation into Punjabi” is capable of identifying Hindi proverbs as well as their inflected forms. The system can be improved by including features such as the identification and translation of idioms. It could then be used to implement full machine translation from Hindi to Punjabi with much higher accuracy than existing word-to-word translators.

Fig. 3. Usage of Hindi Proverbs. (Chart of Hindi proverbs by usage; categories: Popular, Rare.)

REFERENCES
[1] John I. Saeed, “Semantics”, 2nd edition, Blackwell Publishing Ltd., 2003.
[2] Gledhill C, “Collocations in Science Writing”, Tübingen: Gunter Narr Verlag (ESP), 2000.
[3] Mieder, Wolfgang, “Proverbs in Nazi Germany: The Promulgation of Anti-Semitism and Stereotypes through Folklore”, published in Journal of American Folklore 95, No. 378, 1982, pp. 435-64.
[4] Charniak Eugene, “Introduction to artificial intelligence”, Addison-Wesley, 1984.
[5] Ritchie Dennis, “An incomplete history of the QED Text Editor”, Murray Hill: Bell Labs, 2004.
[6] Sriram V, Aravind K. Joshi, “Measuring the relative compositionality of verb-noun (V-N) collocations by integrating features”, published in Proceedings of Human Language Technology Conference/Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP) 2005, Vancouver.
[7] Sriram V, Aravind K. Joshi, “Recognition of Multi-word Expressions: A Study of Verb-Noun (V-N) Collocations”, published in the proceedings of ICON 2004, 19-22 Dec 2004, Hyderabad.
[8] Sriram V, “Relative compositionality of multi-word expressions: a study of verb-noun (V-N) collocations”, published in Proceedings of International Joint Conference on Natural Language Processing 2005, Jeju Island, Korea.
[9] Pasi Tapanainen, Jussi Piitulainen and Timo Jarvinen, “Idiomatic object usage and support verbs”, published in proceedings of 36th Annual Meeting of the Association for Computational Linguistics (ACL), 1998.
[10] Dekang Lin, “Automatic identification of non-compositional phrases”, published in proceedings of ACL-99, College Park, USA.
[11] Sriram V, “Using Information about Multi-word Expressions for the Word-Alignment Task”, published in Proceedings of Coling-Association for Computational Linguistics (ACL) 2006, Workshop on Multiword Expressions: Identifying and Exploiting Underlying Properties, Sydney.
[12] Franz Josef Och, Hermann Ney, “A Systematic Comparison of Various Statistical Alignment Models”, published in Computational Linguistics, Volume 29, Issue 1 (March 2003), pp. 19-51.
[13] Sriram V, Preeti Agrawal and Aravind K. Joshi, “Relative Compositionality of Noun Verb Multi-word Expressions in Hindi”, published in Proceedings of International Conference on Natural Language Processing (ICON) 2005, Kanpur.
[14] Amitabha Mukerjee, Ankit Soni, and Achla M Raina, “Detecting Complex Predicates in Hindi using POS Projection across Parallel Corpora”, published in proceedings of Coling/ACL Workshop on Multi-Word Expressions, Sydney, July 23, 2006.
[15] Anoop Kunchukuttan, Om P. Damani, “A System for Compound Noun Multiword Expression Extraction for Hindi”, published in proceedings of ICON-2008, Kanpur.
[16] Christopher Kasparek, “The Translator's Endless Toil”, The Polish Review, vol. XXVIII, no. 2, 1983, pp. 83-87.
[17] Kamble Prakash Abhimanyu, “Lexical Ambiguity in Hindi–Marathi Machine Translation System (In the Context of Homonymy)”, available at http://prakashblog-google.blogspot.com/2008/11/lexical-ambiguity-inhindimarathi.html.
[18] Lynette Hirschman and Henry S. Thompson, “Overview of Evaluation in Speech and Natural Language Processing”, published in proceedings of the workshop on Human Language Technology Conference, Plainsboro, NJ, 1997.
[19] Vishal Goyal, Gurpreet Singh Lehal, “Evaluation of Hindi to Punjabi Machine Translation System”, published in International Journal of Computer Science Issues, Vol. 4, No. 1, 2009.
[20] Thomas J, Mas J. A. and Casacuberta F., “A Quantitative Method for Machine Translation Evaluation”, published in workshop of 11th Conference of the European Chapter of the Association for Computational Linguistics, April 12-17, 2003, Agro Hotel, Budapest, Hungary.
[21] Vishal Goyal, Gurpreet Singh Lehal, “Web Based Hindi to Punjabi Machine Translation System”, published in Journal of Emerging Technologies in Web Intelligence, Vol. 2, No. 2, pp. 148-151, May 2010.
