Statistical Machine Translation

Term Project
1. Don’ts-to-Do’s English-English Decoder
2. Chinese Spell Check Decoder
3. Natural User Interface to Linggle
4. Term Translator

Jason S. Chang
National Tsing Hua University
2013-05-27

Entire contents © 2012 Jason S. Chang. All rights reserved.

Monday, May 27, 2013

3. Natural User Interface to Linggle


Using MT decoder to handle NUI input
• Scenario: convert NUI input to command
  • What are some words that describe a fork? --> $N fork --> $A fork
• Training
  – language model
    • Compute POS-word ngrams based on Web 1T 5-grams
      – $N spoon 9999, $A spoon 9999
  – translation model
    • Crawl answer.com
    • Obtain QA pairs
      – Q = What are some words that describe a spoon?
      – A = silver silver plate silverware stainless plastic wooden soup dessert utensil...
      – dominant POS in A = noun
    • add (what are some words that describe a ||| $N ||| 0.0) to data/tm
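The step from a QA pair to a data/tm entry can be sketched as follows. This is a minimal sketch, not the course code: the function name `make_tm_entry`, the `POS_TO_SYMBOL` mapping, and the idea of dropping the queried word from the end of the question are assumptions made for illustration. The dominant POS is passed in as an already-computed label.

```python
# Hypothetical mapping from a dominant-POS label to a Linggle command symbol.
POS_TO_SYMBOL = {"noun": "$N", "adj": "$A"}

def make_tm_entry(question, target_word, dominant_pos, score=0.0):
    """Build one 'source ||| target ||| score' line from a QA pair.

    Assumes the queried word is the last word of the question, so the
    remaining words form the reusable question phrase.
    """
    words = question.lower().rstrip("?'\" ").split()
    if words and words[-1] == target_word.lower():
        words = words[:-1]
    phrase = " ".join(words)
    return f"{phrase} ||| {POS_TO_SYMBOL[dominant_pos]} ||| {score}"

entry = make_tm_entry("What are some words that describe a spoon?", "spoon", "noun")
print(entry)
```

With the spoon example from the slide, the sketch reproduces the entry shown above for data/tm.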


Data and code related to decoder
• Data
  – data/input ── test sentence (NUI sentences)
  – data/lm ── language model (command ngrams)
  – data/tm ── mapping between NUI sentences and commands
• Code
  – decode.py
    ── INPUT bad English sentence (data/input)
    ── OUTPUT bad/good English sentence (data/output)
  – models.py
    ── INPUT language+translation models (data/lm+tm)
    ── OUTPUT dictionary of LM and TM
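The slides do not show models.py itself, so the following is only a sketch of how a translation-model file could be read into a dictionary. It assumes that data/tm uses the `source ||| target ||| score` line format seen in the example entries; the function name `parse_tm` and the choice of word tuples as keys are hypothetical.

```python
from collections import defaultdict

def parse_tm(lines):
    """Read 'source ||| target ||| score' lines into {source words: [(target, score)]}."""
    tm = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        source, target, score = [field.strip() for field in line.split("|||")]
        tm[tuple(source.split())].append((target, float(score)))
    return dict(tm)

# Two entries in the assumed format, taken from the slides' examples.
sample = [
    "what are some words that describe a ||| $N ||| -0.1",
    "what are some words that describe a ||| $A ||| 0.0",
]
tm = parse_tm(sample)
print(tm)
```

Keying on a tuple of words makes it easy to look up any contiguous span of the input sentence; a plain string key would work equally well for this small example.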


Training data


• Data
  – bncWordLemmaPos.txt ── BNC with lexical information
  – 2ga.txt, 3ga.txt ── Web 1T ngrams filtered (GSL+AWL)
• Training
  – language model
    • use the Web 1T to generate commands
    • e.g., sandy beach --> Linggle commands
      – $A beach
      – sandy $N
    • generate language model from Linggle commands
  – translation model
    • use question-answer pairs
    • map question phrase to dominant part of speech in answer
    • generate (question phrase, command symbol, probability)
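The language-model side (turning word bigrams into Linggle commands and accumulating their counts) might look like the sketch below. The `POS` table stands in for a lookup built from bncWordLemmaPos.txt, whose format is not shown in the slides, and the bigram counts are made-up toy numbers rather than Web 1T figures; all function names are hypothetical.

```python
from collections import Counter

# Hypothetical word -> POS letter lookup (A = adjective, N = noun).
POS = {"sandy": "A", "beach": "N", "silver": "A", "wooden": "A", "spoon": "N"}

def commands_for_bigram(w1, w2):
    """Replace each word in turn with its POS symbol, e.g. sandy beach -> $A beach, sandy $N."""
    commands = []
    if w1 in POS:
        commands.append(f"${POS[w1]} {w2}")
    if w2 in POS:
        commands.append(f"{w1} ${POS[w2]}")
    return commands

def command_counts(bigram_counts):
    """Sum the counts of all bigrams that map to the same command."""
    counts = Counter()
    for (w1, w2), n in bigram_counts.items():
        for command in commands_for_bigram(w1, w2):
            counts[command] += n
    return counts

# Toy counts for illustration only.
toy_bigrams = {("sandy", "beach"): 10, ("silver", "spoon"): 7, ("wooden", "spoon"): 5}
counts = command_counts(toy_bigrams)
print(commands_for_bigram("sandy", "beach"))
print(counts["$A spoon"])
```

The aggregated command counts would then be normalized into n-gram probabilities by whatever language-model format the decoder expects.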


Finding the Dominant POS
• Data
  – bncWordLemmaPos.txt ── BNC with lexical information
• Code
  – findAnswerPOS.py ── computes the dominant part of speech of the answer
    • A = Some words that can describe the noun spoon are: silver silver plate silverware stainless plastic wooden soup dessert utensil...
    • OUTPUT = noun
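A simple way to pick the dominant POS is a majority vote over the tagged answer words. The sketch below assumes this counting approach; it is not the actual findAnswerPOS.py, and the `WORD_POS` table is a hand-made stand-in for a lookup derived from bncWordLemmaPos.txt.

```python
from collections import Counter

# Hypothetical word -> POS table; the tags here are illustrative assumptions.
WORD_POS = {
    "silver": "noun", "plate": "noun", "silverware": "noun",
    "stainless": "adj", "plastic": "noun", "wooden": "adj",
    "soup": "noun", "dessert": "noun", "utensil": "noun",
}

def dominant_pos(answer, word_pos):
    """Return the most frequent POS among the answer words found in the lookup."""
    tags = Counter(word_pos[w] for w in answer.lower().split() if w in word_pos)
    return tags.most_common(1)[0][0] if tags else None

answer = "silver silver plate silverware stainless plastic wooden soup dessert utensil"
print(dominant_pos(answer, WORD_POS))
```

Words missing from the lookup are ignored, and an answer with no known words yields `None` so the caller can skip that QA pair.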


Using MT decoder to handle NUI input
• Modify decoder
  – modify lm
    • add $N spoon, $A spoon, $N fork, $A fork in 2-gram
    • add $N, $A in 1-gram
  – modify tm
    • what are some words that describe a ||| $N ||| -0.1
    • what are some words that describe a ||| $A ||| 0.0
  – modify input
    • add “what are some words that describe a fork?”
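To see how the modified lm and tm interact, here is a deliberately simplified scoring sketch. It is not decode.py: it matches a single question phrase at the start of the input, treats the tm values from the slide as log scores, and uses made-up bigram log probabilities in `LM`. The names `TM`, `LM`, `UNSEEN`, and `decode` are all assumptions.

```python
# tm entries from the slide, treated here as log scores (an assumption).
TM = {"what are some words that describe a": [("$N", -0.1), ("$A", 0.0)]}

# Made-up bigram log probabilities for illustration only.
LM = {"$N fork": -2.0, "$A fork": -1.0}
UNSEEN = -10.0  # hypothetical penalty for a command missing from the lm

def decode(sentence):
    """Return the (command, score) pair with the highest tm + lm score, or None."""
    text = sentence.lower().rstrip("?. ")
    best = None
    for phrase, options in TM.items():
        if not text.startswith(phrase + " "):
            continue
        rest = text[len(phrase):].strip()  # the word being asked about
        for symbol, tm_score in options:
            command = f"{symbol} {rest}"
            score = tm_score + LM.get(command, UNSEEN)
            if best is None or score > best[1]:
                best = (command, score)
    return best

result = decode("What are some words that describe a fork?")
print(result)
```

Under these toy scores the `$A` reading wins, because both its tm score and its lm score are higher; with different lm counts the `$N` reading could be preferred instead.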


