A Comprehensive Roman (English)-to-Bangla Transliteration Scheme
Naushad UzZaman, Arnab Zaheen and Mumit Khan
Center for Research on Bangla Language Processing
BRAC University, Bangladesh

Naushad UzZaman

International Conference on Computer Processing of Bangla (ICCPB 2006) 17 February, 2006 Dhaka, Bangladesh

ICCPB 2006


Outline
- Introduction to transliteration
- The proposed transliteration scheme
- Performance of the proposed scheme


Introduction
- Transliteration
  - In the narrow sense: mapping of letters from one script to another
- Transcription
  - Writing the sounds of one language using the script of another language
- Transliteration
  - In the broader sense: both of the above


Why transliteration?
- You don't have a good text input system for your script
- You want to write the text phonetically in another script
  - e.g. chatting and sending SMS in Roman (English) characters
  - The language is not fully phonetic
    - dokkhin for দক্ষিণ (দ ক ◌্ ষ ◌ি ণ)


Previous work
- Roman (English) to non-European languages
  - English to Japanese, Chinese, Arabic and many other languages
- Roman (English) to Bangla
  - ITRANS (1991)
  - A few others
  - All of these use one-to-one mapping


Direct Mapping
- A trivial phonetic mapping that maps letters from one script to another
  - k - ক
  - s - স


Why another direct mapping?
- Many-to-one mapping
  - phul, phool, ful, fool - ফুল
- The same direct mapping is also used within the phonetic mapping algorithm
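A minimal sketch of a many-to-one direct mapping, assuming a greedy longest-match lookup. The table `DIRECT_MAP` and the function `direct_map` are hypothetical and cover only the letters needed for the ফুল example; they are not the authors' mapping table.

```python
# Hypothetical, deliberately tiny table: several Roman spellings share one target.
DIRECT_MAP = {
    "ph": "ফ", "f": "ফ",   # two spellings of the same consonant
    "oo": "ু", "u": "ু",   # two spellings of the same vowel sign
    "l": "ল", "k": "ক", "s": "স",
}
MAX_KEY_LEN = max(len(key) for key in DIRECT_MAP)


def direct_map(roman: str) -> str:
    """Map Roman text to Bangla, trying the longest table key first."""
    out = []
    i = 0
    while i < len(roman):
        for size in range(MAX_KEY_LEN, 0, -1):
            chunk = roman[i:i + size]
            if chunk in DIRECT_MAP:
                out.append(DIRECT_MAP[chunk])
                i += len(chunk)
                break
        else:
            # No table entry: keep the character unchanged.
            out.append(roman[i])
            i += 1
    return "".join(out)


for spelling in ("phul", "phool", "ful", "fool"):
    print(spelling, "->", direct_map(spelling))
```

A real table would also need independent vowel forms for word-initial vowels, which this sketch leaves out.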


Phonetic mapping
- Transcription + transliteration in the narrow sense
  - ottonto - অত্যন্ত (অ ত ◌্ য ন ◌্ ত)
  - shondha - সন্ধ্যা (স ন ◌্ ধ ◌্ য ◌া)
  - bebohar - ব্যবহার (ব ◌্ য ব হ ◌া র)
- Solution: phonetic encoding


Phonetic encoding
- Code a word based on its pronunciation
  - অত্যন্ত -
  - সন্ধ্যা -
  - ব্যবহার -
- Naushad UzZaman and Mumit Khan, A Double Metaphone Encoding for Bangla and its Application in Spelling Checker, Proc. IEEE NLP-KE, Wuhan, China, 2005
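To illustrate the idea of coding by pronunciation, here is a toy encoder for Roman spellings only. It is not the Double Metaphone encoding cited above, and it does not reproduce that paper's codes. The function `toy_phonetic_code` and its rewrite rules are assumptions made for this sketch.

```python
def toy_phonetic_code(roman: str) -> str:
    """Collapse spelling variants that sound alike into one code (toy rules)."""
    code = roman.lower()
    # Assumed equivalences between digraphs and single letters.
    for old, new in (("ph", "f"), ("oo", "u"), ("ee", "i")):
        code = code.replace(old, new)
    # Collapse immediately repeated letters, e.g. "tt" -> "t".
    collapsed = []
    for ch in code:
        if not collapsed or collapsed[-1] != ch:
            collapsed.append(ch)
    return "".join(collapsed).upper()


for spelling in ("phul", "phool", "ful", "fool"):
    print(spelling, "->", toy_phonetic_code(spelling))
```

All four spellings produce the same code, which is the property a lexicon lookup by phonetic code relies on.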


Algorithm for Phonetic Mapping
- Generate phonetic codes for all Bangla words in the lexicon
- Input a Bangla word using Roman (English) characters
- Generate the phonetic code of the input
- If the input's phonetic code
  - matches only one word in the lexicon: convert the input to that Bangla word; e.g. ottonto - অত্যন্ত
  - matches multiple words in the lexicon: suggest all matching Bangla words and let the user select; e.g. poddo - পদ্ম/পদ্য
  - does not match: use direct mapping; e.g. ultapalta - উলতাপালতা
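The three branches above can be sketched as a single lookup function. This is a sketch under stated assumptions: `toy_code` is a hypothetical stand-in for the real phonetic encoder, the codes in the index are hand-assigned to agree with it, and the identity function passed as `direct_map` is only a placeholder for the direct mapping.

```python
from collections import defaultdict


def toy_code(roman: str) -> str:
    """Hypothetical encoder: drop vowels, collapse repeats, uppercase."""
    consonants = [ch for ch in roman.lower() if ch.isalpha() and ch not in "aeiou"]
    collapsed = []
    for ch in consonants:
        if not collapsed or collapsed[-1] != ch:
            collapsed.append(ch)
    return "".join(collapsed).upper()


def build_index(coded_words):
    """Group Bangla words by their precomputed phonetic code."""
    index = defaultdict(list)
    for code, word in coded_words:
        index[code].append(word)
    return index


def transliterate(roman, index, encode, direct_map):
    """Return (outcome, candidates) following the three branches of the slide."""
    matches = index.get(encode(roman), [])
    if len(matches) == 1:
        return "converted", matches            # unique match
    if len(matches) > 1:
        return "suggestions", matches          # user picks one
    return "direct", [direct_map(roman)]       # fall back to direct mapping


# Hand-assigned codes, chosen to agree with toy_code (an assumption).
index = build_index([("TNT", "অত্যন্ত"), ("PD", "পদ্ম"), ("PD", "পদ্য")])

for word in ("ottonto", "poddo", "ultapalta"):
    print(word, transliterate(word, index, toy_code, direct_map=lambda s: s))
```

Keeping the encoder and the fallback as parameters means the lookup logic does not depend on any particular encoding or mapping table.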


Performance
- Problems:
  - The input word does not exist in the lexicon
  - The inflected form of a head word is missing
    - e.g. সরকার (head word), সরকারের (inflected form)
- Solution:
  - Increase the lexicon size
  - Use a morphological generator to produce inflected forms


Performance on 2500 newspaper words
- Words found in the lexicon: 68% (lexicon of more than 100,000 entries)
  - Of the words found in the lexicon, handled properly by phonetic mapping: 100%
- Words not found in the lexicon: 32%
  - Handled by direct mapping: 23%
  - Failed because the inflected form is absent: 7%
  - Not handled properly by direct mapping: 2%
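The percentages above can be cross-checked with a few lines of arithmetic. The variable names are illustrative; the numbers are the ones reported on the slide.

```python
# Shares of the 2500-word sample, in percent, as reported on the slide.
found_in_lexicon = 68        # all handled correctly by phonetic mapping
handled_by_direct = 23       # not in lexicon, direct mapping succeeded
missing_inflection = 7       # not in lexicon, inflected form absent
direct_failed = 2            # not in lexicon, direct mapping failed

not_found = handled_by_direct + missing_inflection + direct_failed
overall_accuracy = found_in_lexicon + handled_by_direct

print(not_found)         # 32
print(overall_accuracy)  # 91
```

The three failure and fallback categories add up to the 32% of words outside the lexicon, and the two successful paths add up to 91% overall.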


Cross Language Information Retrieval Application
- The user issues a query in one language to search a collection in a different language
- e.g. search for সন্ধ্যা in a Bangla document by querying shondha in Roman (English) characters
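A minimal sketch of this use case, assuming the query is transliterated first and then matched by plain substring search. The function `search`, the sample documents and the one-entry lookup table are all hypothetical; a real system would use the transliteration scheme and a proper retrieval engine.

```python
SONDHYA = "সন্ধ্যা"

# Hypothetical stand-in for the transliteration step.
ROMAN_TO_BANGLA = {"shondha": SONDHYA}

# Hypothetical Bangla document collection.
DOCUMENTS = [
    f"আজ {SONDHYA} সুন্দর",
    "সরকার",
]


def search(query_roman, documents, transliterate):
    """Return indices of documents containing the transliterated query."""
    bangla_query = transliterate(query_roman)
    if bangla_query is None:
        return []  # the query could not be transliterated
    return [i for i, text in enumerate(documents) if bangla_query in text]


print(search("shondha", DOCUMENTS, ROMAN_TO_BANGLA.get))
```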


Summary
- Transliteration
- Proposed direct mapping and phonetic mapping
- Prototype shows significant success
  - 91% accuracy on this sample set using a lexicon of more than 100,000 entries (68% via phonetic mapping + 23% via direct mapping)
  - Can achieve 100% by increasing the lexicon size
- Used in a cross-language information retrieval application


Acknowledgment
Supported in part by the PAN Localization Project (www.panl10n.net) grant from the International Development Research Center, Ottawa, Canada, and by BRAC University.

