Issues in Visualizing Intercultural Dialogue Using Word2Vec and t-SNE
Heeryon Cho and Sang Min Yoon
HCI Lab., College of Computer Science, Kookmin University, Seoul, 02707, South Korea
[email protected], [email protected]

Culture & Computing 2017, 10-12 September 2017, Doshisha University, Japan

►ABSTRACT◄ One way to visualize an intercultural dialogue is to plot the keywords used by both groups of speakers and examine where the keywords lie relative to each other, with their positions signifying some kind of similarity relationship. We processed a Japanese transcription of a Korean-Japanese dialogue using Word2Vec and the t-SNE algorithm to generate various 2D plots of the nouns jointly used by the Korean and Japanese speakers. Through this visualization process, we identified some of the issues involved in generating a meaningful visualization of the nouns jointly used by intercultural speakers.

►ACKNOWLEDGMENT◄ This work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) grants funded by the Korean Ministry of Science, ICT & Future Planning (NRF-2017R1A2B4011015) and Korean Ministry of Education (NRF-2016R1D1A1B04932889).

Figure 1. 2D plots of the same twenty nouns jointly spoken by the Japanese and Korean speakers during an intercultural dialogue. Nouns with the same meaning spoken by the two countries’ speakers are linked with colored lines. Different t-SNE learning rates produce different visualizations (Fig. 1, left and right), but some words keep similar positions relative to each other, as in the case of ‘military’ and ‘criticism’. Country names such as ‘U.S.’, ‘China’, ‘Japan’, and ‘Korea’, as well as ‘history’ and ‘economy’, are also located near each other regardless of the t-SNE learning rate. Fig. 1 left: t-SNE learning rate = 800. Fig. 1 right: t-SNE learning rate = 1000.

►RESEARCH SUMMARY◄

우리들은… (Korean: “We…”)

• The Problem: How well is an intercultural dialogue taking place?

►ISSUES & FINDINGS◄

我々は… (Japanese: “We…”)

• Keyword Selection
- Jointly spoken nouns were automatically identified by taking the intersection of the nouns used by each country’s speakers
- However, selecting meaningful and interesting words was subjective and difficult to automate
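The intersection step can be sketched as follows. This is a minimal illustration, assuming each speaker group's text has already been tokenized into (word, part-of-speech) pairs. The function names, the "NOUN" tag, and the sample tokens are hypothetical and are not taken from the poster.

```python
def nouns(tagged_tokens, noun_tag="NOUN"):
    """Return the set of words whose part-of-speech tag marks them as nouns."""
    return {word for word, pos in tagged_tokens if pos == noun_tag}


def jointly_spoken_nouns(ko_tokens, ja_tokens, noun_tag="NOUN"):
    """Return the nouns used by both speaker groups, in sorted order."""
    return sorted(nouns(ko_tokens, noun_tag) & nouns(ja_tokens, noun_tag))


# Hypothetical toy input: (word, POS) pairs for each speaker group.
ko_tokens = [("history", "NOUN"), ("discuss", "VERB"), ("economy", "NOUN")]
ja_tokens = [("economy", "NOUN"), ("military", "NOUN"), ("history", "NOUN")]

shared = jointly_spoken_nouns(ko_tokens, ja_tokens)
print(shared)
```

The intersection only produces candidate words. Deciding which candidates are meaningful enough to plot remains a manual step, as noted above.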

• One Solution: Visualize intercultural dialogue by plotting nouns jointly spoken by the participants

• Word2Vec Generation
- Initially, we generated each country’s word embedding vectors using country-specific texts
- The resulting word embedding vectors varied wildly and were difficult to compare
- We instead used the unknown-speaker texts to first generate ‘pivotal’ vectors, then updated only the noun embeddings using country-specific texts
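The poster does not describe how the noun-only update was implemented. One possible reading is sketched below: start from the shared 'pivotal' vectors and apply training updates only to the rows for nouns, leaving every other word fixed. The function `masked_update`, the toy vectors, and the gradients are all hypothetical, and a real training loop would compute the gradients from the country-specific text.

```python
def masked_update(vectors, gradients, trainable_words, learning_rate=0.1):
    """Apply one gradient step, but only to words in trainable_words.

    vectors and gradients map word -> list of floats. Words outside
    trainable_words keep their pivot vector unchanged.
    """
    updated = {}
    for word, vec in vectors.items():
        if word in trainable_words and word in gradients:
            grad = gradients[word]
            updated[word] = [v - learning_rate * g for v, g in zip(vec, grad)]
        else:
            updated[word] = list(vec)
    return updated


# Hypothetical 2D pivot vectors (the poster used 100 dimensions).
pivot = {"history": [1.0, 0.0], "discuss": [0.0, 1.0]}
# Hypothetical gradients from one pass over country-specific text.
grads = {"history": [0.5, -0.5], "discuss": [1.0, 1.0]}

# Only the noun "history" is allowed to move; "discuss" stays on its pivot.
ko_vectors = masked_update(pivot, grads, trainable_words={"history"})
print(ko_vectors)
```

Because the non-noun vectors stay identical across both countries' models, they act as a common frame of reference, which is what makes the updated noun vectors comparable.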

• Ultimate Goal: Build a dynamic system that visualizes intercultural dialogue by tracking jointly spoken keywords

• Case Study
- A Japanese transcript of a Korean-Japanese dialogue discussing the present and future of Korea-Japan relations was processed, and key nouns were plotted in a 2D space using NLP technology
- We identified several visualization issues by reviewing the step-by-step visualization process we followed

• t-SNE Parameter Selection
- The t-SNE learning rate and perplexity value affected the visualization outcome
- The perplexity value was adjusted to be smaller than the number of plotted points (i.e., words)
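The perplexity constraint matters in practice because only about twenty words are plotted, which is fewer than common default perplexity values (scikit-learn's `TSNE`, for instance, defaults to 30). The helper below is a small sketch of such an adjustment. The name `choose_perplexity` and the choice of `n_points - 1` as the upper bound are assumptions, while the learning rate of 800 is one of the two values reported in Figure 1.

```python
def choose_perplexity(n_points, requested=30.0):
    """Return a perplexity strictly smaller than the number of points."""
    if n_points < 2:
        raise ValueError("t-SNE needs at least two points to plot")
    return min(float(requested), float(n_points - 1))


n_words = 20  # number of jointly spoken nouns plotted in Figure 1

# Keyword arguments that could be passed to a t-SNE implementation.
tsne_params = {
    "n_components": 2,
    "perplexity": choose_perplexity(n_words),
    "learning_rate": 800,
}
print(tsne_params)
```

With twenty words the requested perplexity of 30 is reduced to 19, while a larger vocabulary would leave it untouched.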

►VISUALIZATION PROCESS◄

Input: Japanese Transcript of Korean-Japanese Dialog

① PDF2Text Conversion → Text Format Korean-Japanese Dialog
② Manually Split Text Files → Three Files: KO / JA / UNKNOWN
③ Obtain Word & POS → Word Token & Part of Speech
④ Obtain Word Embedding → 100D Word Embedding (Word2Vec)
⑤ Reduce Dimension (2D Vis.) → 2D Visualization (t-SNE)

[Diagram: Word2Vec network with INPUT, PROJECTION (SUM), and OUTPUT layers; word positions w(t-2), w(t-1), w(t), w(t+1), w(t+2)]
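The diagram labels (INPUT, PROJECTION with SUM, OUTPUT) resemble the continuous bag-of-words (CBOW) variant of Word2Vec, in which the context words around position t are summed to predict w(t). Assuming that reading, the forward pass can be sketched in plain Python. The function name, the toy vocabulary size, and the vectors are hypothetical, and the full softmax shown here is a simplification of the approximations that practical implementations typically use.

```python
import math


def cbow_forward(context_ids, input_vectors, output_vectors):
    """Predict a distribution over the vocabulary from context word ids."""
    dim = len(input_vectors[0])
    # PROJECTION: sum the input vectors of the context words.
    hidden = [sum(input_vectors[i][d] for i in context_ids) for d in range(dim)]
    # OUTPUT: score every vocabulary word against the summed context.
    scores = [sum(h * o for h, o in zip(hidden, out)) for out in output_vectors]
    # Softmax, shifted by the maximum score for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical 5-word vocabulary with 2D vectors (the poster used 100D).
input_vectors = [[0.1, 0.2], [0.0, 0.5], [0.3, 0.3], [0.4, 0.1], [0.2, 0.2]]
output_vectors = [[0.2, 0.1], [0.5, 0.0], [0.1, 0.4], [0.3, 0.3], [0.0, 0.2]]

# Context w(t-2), w(t-1), w(t+1), w(t+2) given as word ids; w(t) is the target.
probs = cbow_forward([0, 1, 3, 4], input_vectors, output_vectors)
print(probs)
```

After training, the rows of `input_vectors` are what is usually kept as the word embeddings and passed on to the dimension-reduction step.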

