Uyghur language processing on the Web
Dr. Waris Abdukerim Janbaz, Prof. Imad Saleh
Paragraphe Laboratory, University of Paris VIII, France
[email protected], [email protected]
http://paragraphe.univ-paris8.fr

Abstract
In this paper, we discuss some important issues related to web processing of an agglutinative Turkic language, Uyghur. In particular, we discuss the grassroots efforts on Uyghur Unicode font development, Uyghur character display, font embedding and Uyghur character input methods in environments without Uyghur support. We also introduce a multiscript conversion application that furthers the use of the Unicode standard for Uyghur language processing.

Keywords: Unicode, Font, Turkic Language, multiscript, transliteration, Arabic-Script Uyghur, Cyrillic-Script Uyghur, Latin-Script Uyghur.

1. Introduction
The Uyghurs are a Turkic-speaking ethnic group, officially about nine million people, living in Central Asia, including today's Xinjiang Uyghur Autonomous Region (hereafter: XUAR, also called Chinese Turkistan) as well as parts of Kazakhstan and urban regions in the Ferghana valley. The official writing system of the XUAR Uyghurs is Arabic-Script Uyghur¹ (hereafter: ASU), whereas Cyrillic-Script Uyghur² (hereafter: CSU) is still in use among the Uyghurs of the republics of the former Soviet Union (USSR). The newly introduced transliteration³, Latin-Script Uyghur⁴ (hereafter: LSU), has become widely accepted among Uyghurs and Uyghurologists and is a commonly used standard for the transliteration of both ASU and CSU.

The influence of web publishing started appearing in Uyghur society in the last 10 years. Since the existing platforms supply neither a Uyghur input method nor any fonts that include all the glyphs of the ASU alphabet, inputting Uyghur text into interactive web pages (in browsers) and correctly displaying Uyghur characters presented huge difficulties. In spite of the fairly passive attitude of government authorities to the development of Uyghur information technology, many individuals started creating Uyghur websites using the three above-mentioned scripts. ASU, used by the most populous segment, the XUAR Uyghurs, caused special coding problems because it uses a non-standard set of Arabic-based glyphs.

¹ See annex 2.
² See annex 1.
³ Using one writing system to represent words in another is called transliteration.
⁴ Called Uyghur Kompyutér Yéziqi (UKY) or Uyghur Latin Yéziqi (ULY) in Uyghur, meaning “Uyghur Computer Writing” or “Latin-Script Uyghur”. See http://www.ukij.org/teshwiq/UKY_Heqqide(KonaYeziq).htm

2. Background

For ASU, before 2002, two methods were very common for web publishing in Uyghur: 1) font downloading; and/or 2) image format. There is no need to explain the inconvenience of the second method. More interesting but complex problems occurred with the first. The major problem was that every web site owner created and named his or her own fonts, so visitors had to download a specific font (or several fonts) for almost every single website. No one accepted the font names and coding of the others, and no common standard was created. Most of the fonts created during this period replaced either the ASCII characters or the Unicode Arabic characters (0x600-0x6FF) with Uyghur characters, without any agreement on the replacements. Since the number of Arabic letters in the code range 0x600-0x6FF is larger than the number of ASU letters, people made different choices about which Arabic characters to replace with ASU characters. The multiplication of font names and the growing coding differences (for the same glyphs) among the fonts therefore became an obstacle to the development of ASU computer processing and web publishing. Many issues regarding non-standard fonts and their use were raised, in many different ways, with individual computer scientists. Meanwhile, many of these problems were circumvented by methods unrelated to the Unicode standard. As a result, web site creators eventually expressed a strong desire to make further use of the Unicode standard for Uyghur language processing.

In June 2002, the author developed the first Uyghur Unicode font and implemented both system-level and browser-level Input Method Editors for Windows. This was a revolutionary accomplishment, owing mostly to the fact that the new method and applications are fully Unicode-compliant (as opposed to occasionally compatible). A campaign was then launched to popularize and adopt the Unicode standard for Uyghur fonts. In this paper, we present the entire process that we have been following and developing for three years. The following sections cover the four major parts of the implementation procedure.

3. Uyghur Unicode font developing
The Uyghur (ASU) letters were derived from the Arabic alphabet. The ASU alphabet has 8 vowels⁵ and 24 consonants (see annex 1). Uyghur, just like Arabic, is written from right to left, and each letter takes a different shape depending on its position in a word. The Uyghur letters have initial, median, final and isolated forms; some letters also have conjunct forms⁶. In total, the Uyghur alphabet has 126 different glyphs. The 108 basic glyphs⁷ of the Uyghur letters have already been accepted by the Unicode Consortium/ISO, and 18 glyphs⁸ out of the 20 glyphs for composed forms were added in 1998. Unfortunately, two conjunct median forms, ﺌﯧ⁹ and ﺌﯩ¹⁰ (of the Uyghur letters ﺋﯥ and ﺋﻰ), are still absent¹¹ from the Unicode Standard's table¹² Arabic Presentation Forms-A. This gap leaves the Unicode/ISO repertoire incomplete for Uyghur and has forced people to work around it by borrowing the “supported hamze” from FBD1 and FBD2 and combining it with the median′ forms of ﺋﯥ and ﺋﻰ to generate two synthetic combined letters. The 20 conjunct glyphs can also be expressed as a sequence of two existing Unicode glyphs (as is now the case for the two missing conjunct glyphs). However, this kind of usage may cause problems such as slower text input, redundant data storage and more complicated sorting.

⁵ The Arabic alphabet has only 3 vowel letters, ﺍ, ﻭ and ﻱ, which are used for long vowels. The other vowels are not noted in normal writing. Given its phonetic characteristics, Uyghur notes down all vowels: ﺋﺎ, ﺋﻪ, ﺋﻮ, ﺋﯘ, ﺋﯚ, ﺋﯜ, ﺋﯥ, ﺋﻰ, using derivatives of traditional Arabic letters.
⁶ The initial form and, under some circumstances, the median form of all vowels is preceded by one “glottal stop sign” ﺉ or ﺌ (supported hamze), with which they form a common letter (treated by Uyghur as a single letter, see annex 2). ﻝ followed by ﺍ forms ﻼ or ﻻ depending on their position.
⁷ See http://www.oyghan.com/images/UyghurUnicodeTable.gif
⁸ See Arabic Presentation Forms-A, glyph code range: FBEA – FBFB. See also table 1.
⁹ Character name for the Unicode Standard: ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH E MEDIAN FORM. Ex: ﺑﺎﻏﺌﯧﺮﯨﻖ (Baghériq).
¹⁰ Character name for the Unicode Standard: ARABIC LIGATURE UIGHUR KIRGHIZ YEH WITH HAMZA ABOVE WITH ALEF MAKSURA MEDIAN FORM. Ex: ﻗﻪﺗﺌﯩﻲ (certainly, doubtlessly).
¹¹ The XUAR's delegation members, Prof. Hoshur Islam and Yasin Imin, who submitted the proposal, also acknowledge this omission. See also Arabic Presentation Forms-A (code range: FBEA – FBFB).
¹² http://www.unicode.org/charts/PDF/UFB50.pdf
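The statement that each conjunct glyph can also be written as a sequence of two existing characters can be checked against the Unicode character database. The following is a minimal sketch, not part of the tools described in this paper; it assumes Python's standard `unicodedata` module, and the helper names `as_sequence` and `describe` are illustrative only.

```python
import unicodedata

# The 18 conjunct (hamza + vowel) glyphs in Arabic Presentation Forms-A.
CONJUNCT_RANGE = range(0xFBEA, 0xFBFC)  # FBEA..FBFB inclusive


def as_sequence(text: str) -> str:
    """Replace presentation-form glyphs by their base-letter sequences (NFKC)."""
    return unicodedata.normalize("NFKC", text)


def describe(cp: int) -> str:
    """Return e.g. 'FBEA -> 0626 0627' for one conjunct code point."""
    parts = " ".join(f"{ord(c):04X}" for c in as_sequence(chr(cp)))
    return f"{cp:04X} -> {parts}"


if __name__ == "__main__":
    # Every conjunct glyph decomposes into U+0626 followed by a vowel letter.
    for cp in CONJUNCT_RANGE:
        print(describe(cp), "|", unicodedata.name(chr(cp)))
```

Each of the 18 encoded conjunct glyphs decomposes into U+0626 followed by one vowel letter, which is the same two-character form that has to be used for the two median conjuncts missing from the standard.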

The creation of a Unicode-based Uyghur font has become a necessity for the progress of Uyghur information processing, since the existing platforms do not supply any Uyghur font. Existing fonts (both Arabic fonts and other fonts that include Arabic letters) do not include all the necessary shapes of the Uyghur letters (see annex 2), and therefore some character sequences are displayed with substituted, incorrect shapes. For example:
1. ﺋﺎﻟەﻣﺪىﻜﻰ هەﻣﻤە ﺋىﻨﺴﺎن ﻗەﺑىﻪ ﺋەﻣەس
2. ﺋﺎﻟﻪﻣﺪﯨﻜﻰ ھﻪﻣﻤﻪ ﺋﯩﻨﺴﺎﻥ ﻗﻪﺑﯩﻬ ﺋﻪﻣﻪﺱ
(Not all human beings in the world are evil)
Rendered with existing fonts (e.g. Times New Roman, Traditional Arabic), the first sentence is an illegal character combination, because the cursive shapes of ﻯ, ھ and ﺋﻪ are not correct according to the ASU alphabet (see annex 2). It should appear as in sentence 2, in which the letters use a specific font, UKIJ Tuz Tom.

To create the right cursive connection forms for Uyghur, special measures were necessary during the creation of Uyghur fonts for the three problem-letters ﻯ, ھ and ﺋﻪ and the two “glottal stop signs” ﺉ and ﺌ (supported hamze). Without such measures, the cursive forms of the three letters cannot be displayed correctly in browsers and other application software.

ﻯ¹³: Uyghur letter i as in ishik (ﺋﯩﺸﯩﻚ, door). Its 8 different forms are listed in table 1 below. For the initial′ and median′ forms (ﯨ, ﯩ) of this letter we use the initial and median forms of the Arabic letter ﻯ 0649; for the final′ and isolated′ forms (ﻰ, ﻯ) we use the final and isolated forms of the Farsi letter ﻯ 06CC, respectively.

ﺋﻪ¹⁴: Uyghur letter e as in eyneklerde (ﺋﻪﻳﻨﻪﻛﻠﻪﺭﺩە, in the mirrors). This letter uses the final and isolated glyphs (ﻪ, ﻩ) of the Arabic letter ھ 0647 (h), in the same way as Persian does. This causes a special problem, because the glyphs of Arabic ھ¹⁵ 0647 (h) in the initial and median positions (ھ, ﻬ) correspond to those of Uyghur ھ (h as in ھﯧﻠﯩﻬﻪﻡ hélihem, even now; ﮔﯘﻧﺎھ gunah, sin or offense; ﻗﻪﺑﯩﻬ qebih, odious), which, in turn, has different final and isolated glyphs (ﻬ, ھ). To deal with this inconsistency, we have chosen to use 06D5 for the Uyghur letter ﺋﻪ and 06BE for the Uyghur letter ھ.

¹³ Character name for the Unicode Standard: ARABIC LETTER UIGHUR KAZAKH KIRGHIZ ALEF MAKSURA (represents YEH-shaped letter with no dots in any positional form), 0649.
¹⁴ Character name for the Unicode Standard: ARABIC LETTER AE (Uighur, Kazakh, Kirghiz), 06D5 (isolated form is ە).
¹⁵ See http://www.unicode.org/standard/where/ , Variant shapes of the Arabic character hah.

Column heads: iso.′, fin.′, med.′, ini.′, iso., fin., med., ini.
a (ﺋﺎ): iso.′ ﺍ, fin.′ ﺎ; conjunct forms ﯫ, ﯪ
e (ﺋﻪ): iso.′ ﻩ, fin.′ ﻪ; conjunct forms ﯭ, ﯬ
o (ﺋﻮ): iso.′ ﻭ, fin.′ ﻮ; conjunct forms ﯯ, ﯮ
u (ﺋﯘ): iso.′ ﯗ, fin.′ ﯘ; conjunct forms ﯱ, ﯰ
ö (ﺋﯚ): iso.′ ﯙ, fin.′ ﯚ; conjunct forms ﯳ, ﯲ
ü (ﺋﯜ): iso.′ ﯛ, fin.′ ﯜ; conjunct forms ﯵ, ﯴ
é (ﺋﯥ): iso.′ ې, fin.′ ﯥ, med.′ ﯧ, ini.′ ﯦ; iso. ﯶ, fin. ﯷ, med. ﺌﯧ, ini. ﯸ
i (ﺋﻰ): iso.′ ﻯ, fin.′ ﻰ, med.′ ﯩ, ini.′ ﯨ; iso. ﯹ, fin. ﯺ, med. ﺌﯩ, ini. ﯻ
h (ھ): iso. ھ, fin. ﻬ, med. ﻬ, ini. ھ

Table 1. Uyghur vowels and the three problem-letters (the one Arabic character ھ hah has four different basic shapes, which correspond to the four shapes of two different letters in Uyghur).

The glottal stop signs ﺉ and ﺌ¹⁶: the glottal stop is a phoneme that is not listed separately in the ASU alphabet but is still covered by its spelling rules. In Uyghur words, the glottal stop is not as strongly pronounced as it is in Semitic languages or in Uzbek, for example, and it has weakened to become no more than a hiatus. Marked in ASU by a hamza on top of a “tooth”, it usually appears in words of Arabic origin and replaces an original ‘ain (ع) or a hamza (ء) in a median or final position (e.g. ﺋﺎﻟﻪﻡ from Arabic ﻋﺎﻟﹶﻢ, ﺳﺎﺋﻪﺕ from Arabic ﺳﺎﻋَﺔ, ﺧﺎﺋﯩﻦ from Arabic ﺧﺎﺋِﻦ, ﺳﻮﺋﺎﻝ from Arabic ﺳﺆَال). In initial position, the same sign is considered part of the initial form of a vowel and does not have any phonetic value¹⁷. The two signs correspond to the initial and median forms of the Arabic letter ئ 0626. These Arabic glyphs are not considered different shapes of any independent letter in the Uyghur alphabet (cf. annex 2).

Since one glyph of each of the two letters ﺋﯥ and ﺋﻰ (shown in light red in the table above) is still missing in Unicode, we can use a sequence of either of these glyphs (ﺉ or ﺌ) followed by the final, isolated, median′ or final′ forms of the vowels ﺋﯥ and ﺋﻰ (shown in blue in the table above). More precisely, the other conjunct forms can be obtained by combining the Arabic letter ئ 0626 with the respective vowel. In spite of the above-mentioned limitation (two glyphs instead of one conjunct glyph for ﺋﯥ and ﺋﻰ), these conventions have now been widely accepted by the Uyghur Computer Science Association (UCSA¹⁸) and, at a later date, by the Xinjiang University branch of the 863 Research Group¹⁹.

Once the specificities of those letters are understood, it is easy to create Uyghur fonts using existing font creation software. The inclusion of the format control characters ZWNJ (zero width non-joiner; 200C), ZWJ (zero width joiner; 200D), LRM (left-to-right mark; 200E) and RLM (right-to-left mark; 200F) is also recommended in any Uyghur font. The rest of the font development work, which is time-consuming and repetitive, is exactly the same as when creating any Arabic-script font²⁰. Some Uyghur Unicode fonts are available for free at the UCSA website. Our recommended font creation tools are Font Creator²¹ and Fontographer²². Glyph substitutions, positioning lookups, shaping features and OpenType tables of Arabic fonts can be added with the help of software like Microsoft VOLT.

¹⁶ Character name for the Unicode Standard: ARABIC LETTER YEH WITH HAMZA ABOVE, 0626.
¹⁷ It is often said that the decision of Uyghur linguists to add this sign as part of the initial form of letters is a link with the old Uyghur writing system, in which all initial vowels were preceded by a tooth. The Arabic alphabet has 3 letters, ا, و and ي, which can be used to indicate long vowels. Short vowels can be indicated through the use of vowel marks above or under the consonants, but these are dispensed with in normal writing. Given its phonetic characteristics, Uyghur notes down all vowels: ﺋﺎ, ﺋﻪ, ﺋﻮ, ﺋﯘ, ﺋﯚ, ﺋﯜ, ﺋﯥ, ﺋﻰ, using derivatives of traditional Arabic letters.
¹⁸ UCSA: The Uyghur Computer Science Association (or UKIJ, Uyghur Kompyutér Ilimi Jem’iyiti in Uyghur) is a nonprofit association, founded by the author in Jan 2004. Web site: http://www.ukij.org
¹⁹ A National High-Tech Research Group, financed by the PRC government. The XJU branch is specialized in multilingual software development.
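As a rough illustration of the coverage requirements discussed in this section, the sketch below reports which required code points a font lacks. It is only an assumption-laden example: the letter list uses the base code points commonly used for Uyghur in Unicode text (including 06D5 and 06BE as chosen above), the set of code points in a font is supplied by the caller (how it is read from a font file is out of scope), and the name `missing_code_points` is hypothetical.

```python
import unicodedata

# Assumed base letters of the ASU alphabet (32 letters), in alphabetical order.
UYGHUR_LETTERS = [
    0x0627, 0x06D5, 0x0628, 0x067E, 0x062A, 0x062C, 0x0686, 0x062E,
    0x062F, 0x0631, 0x0632, 0x0698, 0x0633, 0x0634, 0x063A, 0x0641,
    0x0642, 0x0643, 0x06AF, 0x06AD, 0x0644, 0x0645, 0x0646, 0x06BE,
    0x0648, 0x06C7, 0x06C6, 0x06C8, 0x06CB, 0x06D0, 0x0649, 0x064A,
]
HAMZA_CARRIER = 0x0626  # the "supported hamze" letter
FORMAT_CONTROLS = [0x200C, 0x200D, 0x200E, 0x200F]  # ZWNJ, ZWJ, LRM, RLM

REQUIRED = set(UYGHUR_LETTERS) | {HAMZA_CARRIER} | set(FORMAT_CONTROLS)


def missing_code_points(font_code_points):
    """List (code point, Unicode name) pairs that the font does not cover."""
    have = set(font_code_points)
    return [
        (f"U+{cp:04X}", unicodedata.name(chr(cp)))
        for cp in sorted(REQUIRED - have)
    ]


if __name__ == "__main__":
    # Example: a font that lacks the Uyghur letter e and the ZWNJ control.
    incomplete = REQUIRED - {0x06D5, 0x200C}
    for code, name in missing_code_points(incomplete):
        print(code, name)
```

A check of this kind only covers the character map; the positional shapes and substitution features described above still have to be verified in the font itself.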

4. Font embedding and character displaying
Web pages can be rendered without downloading or installing any specific fonts if: 1) the fonts used in the pages are available on the user's computer, and 2) the browsers provide native support for the fonts and languages used. The second condition has already been met, but unfortunately the first has not, as no Uyghur fonts are supplied with the existing platforms installed on users' computers. Therefore, to ensure that Uyghur text is displayed correctly in web browsers, users must find a way to install on their computers the fonts that are used in the web pages. The same holds true for all the other “forgotten languages” on different platforms. The font installation requirement either causes difficulties for people who do not have much technical experience or discourages others from attempting to read the text.

These difficulties can be overcome by embedding fonts into the web pages. When a page is downloaded into a browser via the Hypertext Transfer Protocol, any fonts embedded in the page are also downloaded without any need for the user to intervene. The Microsoft Web Embedding Fonts Tool (WEFT)²³ makes it possible to create embedded font objects that can be linked to web pages. The following steps let web page developers create embedded fonts and link them to a web page:
• Create embedded fonts using Microsoft WEFT
• Prepare the web page using any fonts that are installed on the platform, and
• Link the embedded fonts to the web page.
Microsoft WEFT generates 1) embedded fonts for every web site, with a distinct file extension (.EOT), and 2) a script that links an embedded font to a web page. The disadvantage of WEFT-generated embedded fonts is that they are compatible only with Internet Explorer. This makes it highly desirable for more effort to be invested in providing cross-platform functionality for this kind of software.
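To make the linking step concrete, the sketch below builds a generic CSS `@font-face` rule that points a page at an embedded .EOT file. It is an illustration under stated assumptions, not the script that WEFT itself generates: the function names, the font file path and the fallback family are all hypothetical.

```python
def font_face_rule(family: str, eot_url: str) -> str:
    """Build a CSS @font-face rule that links an embedded .EOT font file."""
    return (
        "@font-face {\n"
        f'  font-family: "{family}";\n'
        f'  src: url("{eot_url}");\n'
        "}\n"
    )


def page_style(family: str, eot_url: str, fallback: str = "serif") -> str:
    """Font rule plus a body rule that uses the font for right-to-left text."""
    body = f'body {{ font-family: "{family}", {fallback}; direction: rtl; }}\n'
    return font_face_rule(family, eot_url) + body


if __name__ == "__main__":
    # Hypothetical file location; the font name is the one used in this paper.
    print(page_style("UKIJ Tuz Tom", "fonts/ukij-tuz-tom.eot"))
```

The generated text would go into a style sheet of the page; browsers that do not understand the .EOT format simply fall back to the next family in the list.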

²⁰ See http://www.microsoft.com/typography/OpenType%20Dev/arabic/intro.mspx for more information about developing OpenType Fonts for Arabic Script.
²¹ http://www.high-logic.com/fontcreator.html
²² http://www.fontlab.com/Font-tools/Fontographer
²³ Free software at http://www.microsoft.com/typography/web/embedding/default.htm

5. Creation of a browser-level virtual input method
As mentioned in the introduction, the existing platforms do not supply any system-level Uyghur input service. Late in 2003, the first system-level Uyghur Unicode IME for Windows was developed by the author and distributed free of charge²⁴. Six months later, the Xinjiang University branch of the 863 Research Group and some individuals joined the Uyghur Unicode popularization campaign by distributing their own Unicode-supporting IMEs. Nevertheless, it still cannot be said that all or even most Uyghur internet users are equipped with Uyghur input tools. The browser-level input method therefore still fills a great need, since it enables people to enter Uyghur letters into any text input field on a web page without having to install a system-level Uyghur IME. The basic structure of the browser-level Uyghur text input tool is shown in figure 1.
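The core idea of such a browser-level method, translating key codes into Uyghur characters only while Uyghur input is switched on, can be simulated outside a browser. The sketch below is not the JavaScript/VBScript implementation used on the web sites mentioned in this paper; the class name is hypothetical and, apart from key code 109 for م, the table entries are placeholder assumptions rather than a real keyboard layout.

```python
# Key-code table: 109 ('m') -> MEEM; the other entries are assumed placeholders.
KEYMAP = {
    109: "\u0645",  # m -> MEEM
    98: "\u0628",   # b -> BEH (assumed)
    110: "\u0646",  # n -> NOON (assumed)
}


class BrowserLevelInput:
    """Simulates capturing key events and dispatching mapped characters."""

    def __init__(self, keymap):
        self.keymap = keymap
        self.uyghur_on = False  # False: key events pass through unchanged
        self.field = []         # stands in for the active text input field

    def switch_language(self):
        """Toggle Uyghur input, i.e. capture or release the key events."""
        self.uyghur_on = not self.uyghur_on

    def on_key(self, key_code: int) -> str:
        """Map one key code and append the resulting character to the field."""
        if self.uyghur_on and key_code in self.keymap:
            char = self.keymap[key_code]
        else:
            char = chr(key_code)
        self.field.append(char)
        return char

    def text(self) -> str:
        return "".join(self.field)


if __name__ == "__main__":
    ime = BrowserLevelInput(KEYMAP)
    ime.switch_language()
    for code in (109, 98, 110):
        ime.on_key(code)
    print(ime.text())
```

Keys without an entry in the table, such as digits, pass through unchanged, which keeps the sketch usable with a partial layout.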

Figure 1. Workflow of the browser-level inputting method. Flowchart elements: Keyboard and mouse events; Input Uyghur? (yes/no); Capture K.&M. Events; Code – Char. Mapping; Dispatch Events; Switch Lang.? (yes/no); Release K.&M. Events.

As the workflow above shows, once the user selects the Uyghur input option, the “capture keyboard and mouse events” module creates a hook to monitor keyboard and mouse activity. The “code-char. mapping” module creates a keycode-to-Uyghur-character matrix to find the Uyghur character that corresponds to each key code (e.g. 109 → ﻡ). The “dispatch events” module sends the Uyghur characters from the map to the active text input field on the web page. This process repeats itself until the “release keyboard and mouse events” module frees the hook, immediately after the user decides to switch to another input language. This method has been implemented in JavaScript and VBScript, tested on different browsers, and is commonly used on some Uyghur web sites²⁵.

6. Multiscript converting
Because different writing systems (Arabic-Script Uyghur, Cyrillic-Script Uyghur and Latin-Script Uyghur) co-exist for the Uyghur language, research on a conversion tool with which people can toggle between the three scripts is needed for future information sharing. The fact that there is a one-to-one correspondence²⁶ between the letters of these three writing systems is certainly a major helping factor. For better understanding, we take as an example the Uyghur proverb “working for free is better than doing nothing” in the three scripts:
ﺑﯩﻜﺎﺭ ﻳﯜﺭﮔﯩﭽﻪ ﺑﯩﻜﺎﺭ ﺋﯩﺸﻠﻪ
бикар йүргичә бикар ишлә
bikar yürgiche bikar ishle
The following workflow explains the basic conversion process:

Figure 2. Script converting. Flowchart elements: Source text in source script; Pre-processing; Character mapping; Character converting; Disambiguation; Conversion end? (yes/no); Result in destination script.

The functionality of each module may require some clarification:
Pre-processing: this is an important step in converting. It involves preserving elements that should remain unchanged²⁷ after the conversion. For example, when converting the LSU text “Men Photoshop ni yaxshi körimen” (I love Photoshop) into ASU, we should be able to obtain “ﻣﻪﻥ Photoshop ﻧﻰ ﻳﺎﺧﺸﻰ ﻛﯚﺭﯨﻤﻪﻥ”, and vice versa.
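One possible way to realize this kind of pre-processing is to split the text into protected and convertible pieces and to convert only the latter. The sketch below is an assumption, not the released tool: the regular expressions for HTML tags and links are simplistic, the list of words to keep is supplied by the caller, and `convert` stands for any script conversion function.

```python
import re


def build_pattern(keep_words=()):
    """Compile one pattern matching everything that must stay unchanged."""
    alternatives = [r"<[^>]+>", r"https?://\S+"]      # HTML tags, links
    alternatives += [re.escape(word) for word in keep_words]
    return re.compile("(" + "|".join(alternatives) + ")")


def convert_preserving(text, convert, keep_words=()):
    """Apply `convert` only to the pieces between protected elements."""
    pieces = build_pattern(keep_words).split(text)
    # With one capturing group, odd positions hold the protected matches.
    return "".join(
        piece if index % 2 else convert(piece)
        for index, piece in enumerate(pieces)
    )


if __name__ == "__main__":
    # str.upper stands in for a real script converter.
    source = "men Photoshop ni <b>yaxshi</b> körimen"
    print(convert_preserving(source, str.upper, keep_words=["Photoshop"]))
```

In practice the list of proper names cannot be known in advance, so a real tool would need user markup or a name lexicon; the sketch only shows the mechanics of protecting the pieces.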

²⁴ More than 200,000 downloads counted since Dec 2003 from www.oyghan.com and www.bizuyghur.com/oyghan .
²⁵ See www.ukij.org , www.biliwal.com, www.oyghan.com, www.uyghurdictionary.org etc.
²⁶ The only exception is j (as in jurnal) in LSU.
²⁷ This is the case of hypertext links, HTML tags and proper names.

Character mapping: creates an “A_is_B” matrix for every script pair, or three matrices in total.
Character converting: uses the three matrices to convert between the different scripts.
Disambiguation: this module is necessary when converting from LSU to ASU and/or CSU, because of spelling mistakes or, more importantly, because the LSU diacritical marks are difficult to type on many keyboards: very commonly, the letters Ö, Ü, É, ö, ü and é are replaced by O, U, E, o, u and e. This may cause fatal errors. For example: öltürüsh (to kill) ↔ olturush (to sit, party), térim yer (cultivable land) ↔ terim yer (who eats my sweat), yétim (orphan) ↔ yetim (spelling mistake). Besides, spelling mistakes due to a poor grasp of the LSU rules are a significant problem. All these problems require intensive language processing. This functionality of the multiscript converting tool²⁸ that we have released on the internet is still under development. The following images illustrate our converting tools, which use the methods mentioned above.
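A simple starting point for the disambiguation of missing diacritics is to generate every spelling in which o, u and e may also stand for ö, ü and é, and to keep the spellings found in a word list. The sketch below is only an illustration under that assumption; the tiny lexicon is hypothetical, and choosing between several valid candidates (such as öltürüsh and olturush) still requires context.

```python
from itertools import product

# Letters that are often typed without their diacritical mark.
VARIANTS = {"o": "oö", "u": "uü", "e": "eé"}

# Hypothetical mini-lexicon; a real tool would need a full LSU word list.
LEXICON = {"öltürüsh", "olturush", "yétim", "térim", "terim"}


def candidates(word: str, lexicon=LEXICON):
    """Return all lexicon words reachable by restoring ö, ü or é in `word`."""
    options = [VARIANTS.get(ch, ch) for ch in word.lower()]
    spellings = {"".join(choice) for choice in product(*options)}
    return sorted(spellings & set(lexicon))


if __name__ == "__main__":
    for typed in ("olturush", "yetim", "terim"):
        print(typed, "->", candidates(typed))
```

The number of generated spellings doubles with every ambiguous letter, which is acceptable for single words but suggests a smarter search for long agglutinated forms.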

Image 1. Offline plug-in version for Microsoft Word

Image 2. Online demo version

²⁸ Online demo version is available at http://www.uyghurdictionary.org/tools.asp, offline plug-in version for Microsoft Word is available at http://oyghan.com/OTB/index.html

7. Conclusions and future work
Our work to date has focused mainly on the design and implementation issues related to creating Uyghur Unicode fonts, as well as on a browser-level input method and a multiscript converting application. Judging by user feedback, we feel fairly satisfied with the results of this first-ever research on Uyghur language processing.

The embeddable web fonts, generated by the third-party software WEFT, are compatible only with Internet Explorer. Therefore, we are truly looking forward to more efforts by the computer software industry to expand compatibility. We expect to improve the pre-processing module of the converting tool to make it more user-friendly. There are undoubtedly other theoretical issues to resolve, especially in the disambiguation of misspelled LSU words. Another important problem is that the agglutinative nature of Uyghur, coupled with the associated spelling changes in root words, is a major impediment to developing spell-checking functionality. This work is going to be the focus of our attention in the next stage of development. Finally, we call on software companies not to omit Uyghur from their supported language lists in the future.

8. References
[1] Waris A. Janbaz, Online Uyghur Unicode processing technique and its implementation (publication in Chinese), Xinjiang University Press, China, 2002.
[2] Abdurehim, Waris A. Janbaz, Orthographic rules of the Latin-Script Uyghur (in Uyghur), 2004, http://www.ukij.org/teshwiq/UKY_Heqqide(KonaYeziq).htm.
[3] The Unicode Consortium, The Unicode Standard, Version 4.0, Addison-Wesley Professional, ISBN: 0321185781, USA, 2003.
[4] Xinjiang University, Proceedings 2000 International Conference on Multilingual Information Processing, Ürümchi (publication in Chinese), China, 2000.
[5] The Unicode Consortium Website, http://www.unicode.org
[6] Reinhard F. Hahn, Spoken Uyghur, Washington: the University of Washington Press, ISBN: 0-295-97015-4, USA, 1991.

Annex 1: Arabic-Script Uyghur, Cyrillic-Script Uyghur and Latin-Script Uyghur Alphabets

LSU | CSU | ASU
a | а | ﺋﺎ
e | ә | ﺋﻪ
b | б | ﺏ
p | п | پ
t | т | ﺕ
j | җ | ﺝ
ch | ч | چ
x | х | ﺥ
d | д | ﺩ
r | р | ﺭ
z | з | ﺯ
j (zh) | ж | ژ
s | с | ﺱ
sh | ш | ﺵ
gh | ғ | ﻍ
f | ф | ﻑ
q | қ | ﻕ
k | к | ﻙ
g | г | گ
ng | ң | ڭ
l | л | ﻝ
m | м | ﻡ
n | н | ﻥ
h | һ | ھ
o | о | ﺋﻮ
u | у | ﺋﯘ
ö | ө | ﺋﯚ
ü | ү | ﺋﯜ
w | в | ۋ
é | е | ﺋﯥ
i | и | ﺋﻰ
y | й | ﻱ
Additional Cyrillic letters: ы ё ц э ю я
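The letter correspondences of Annex 1 translate directly into a mapping table. As a minimal sketch (not the released converting tool), the following converts lower-case LSU text to CSU with a longest-match rule for the two-letter LSU graphemes. It rests on assumptions: j is always mapped to җ and the spelling zh to ж, the sequence ng is always read as the single letter ң, and capitalization is ignored; all three would need real disambiguation.

```python
import unicodedata

# LSU -> CSU letter table taken from Annex 1 (two-letter graphemes included).
LSU_TO_CSU = {
    "ch": "ч", "gh": "ғ", "sh": "ш", "ng": "ң", "zh": "ж",
    "a": "а", "e": "\u04d9", "b": "б", "p": "п", "t": "т", "j": "җ",
    "x": "х", "d": "д", "r": "р", "z": "з", "s": "с", "f": "ф",
    "q": "қ", "k": "к", "g": "г", "l": "л", "m": "м", "n": "н",
    "h": "һ", "o": "о", "u": "у", "ö": "ө", "ü": "ү", "w": "в",
    "é": "е", "i": "и", "y": "й",
}


def lsu_to_csu(text: str) -> str:
    """Convert LSU text to CSU, trying two-letter graphemes first."""
    text = unicodedata.normalize("NFC", text.lower())
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in LSU_TO_CSU:
            out.append(LSU_TO_CSU[pair])
            i += 2
        else:
            # Unknown characters (spaces, digits, punctuation) pass through.
            out.append(LSU_TO_CSU.get(text[i], text[i]))
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(lsu_to_csu("bikar yürgiche bikar ishle"))
```

Converting to ASU follows the same pattern but additionally has to insert the supported hamze before word-initial vowels, which is why it is left out of this sketch.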

Annex 2: Arabic-Script Uyghur Alphabet with shapes
