LOD2 KOREA: Towards Publishing Korean Linked Data on the Web
Key-Sun Choi

Joint work with Martin Rezk, Jungyeul Park, Yoon Yongun, Kyungtae Lim, YoungGyun Hahm

Key-Sun Choi - Personal History
• NEC C&C Lab. – PIVOT Japanese-Korean Machine Translation
• Korean Part-of-Speech Tagset, Corpus, Dictionary
• CoreNet (Korean-Chinese-Japanese) Semantic Wordnet (2004)
• KORTERM: Korea Terminology Research Center for Language and Knowledge Engineering (1998-2007), Research Center of Ministry of Culture
• KAIST Research Grand Award (1998)
• ISO/TC37/SC4 Founding member (Language Resource Management Standards)
• ISWC 2007 PC Co-Chair (International Semantic Web Conference)
• AFNLP President (2009-2010)
• DBpedia Korea http://ko.dbpedia.org/
• http://lod2.eu/ partner (EU FP7)

NLP2RDF
• Subject
• Object
• Predicate

• Extract from Sentences
• 野生種의 장미는 主로 北半球의 溫帶와 寒帶 地方에 分布한다.
• Wild roses are distributed mainly in the temperate and frigid zones of the Northern Hemisphere.

1. Subject: 장미 (rose)
2. Object: 북반구의 온대 지방, 한대 지방 (temperate and frigid zones of the Northern Hemisphere)
3. Predicate: 分布 (isDistributedAt)

Key-Sun Choi - LOD2 Korea

• Triple in Natural Language
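The subject, predicate and object extracted from the rose sentence can be held in a small data structure. The following is a minimal sketch, not the authors' implementation: the `Triple` type and the `expand` helper are hypothetical names, and splitting the coordinated object into two triples is an assumption about how such a sentence might be handled.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def expand(subject, predicate, objects):
    """Return one triple per object, so a coordinated object yields several triples."""
    return [Triple(subject, predicate, o) for o in objects]


# Values copied from the slide; the split into two objects is an assumption.
triples = expand("장미", "isDistributedAt", ["북반구의 온대 지방", "한대 지방"])
for t in triples:
    print(t)
```

Keeping one triple per object makes each statement independently addressable, which matters once the strings are later replaced by URIs.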


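[Figure: example concept graph for the embedded-software domain. Edge labels: consists_of, reside_on. Node labels (translated from Korean): manufacturing company, manufacturer, platform, system, software, development environment, operating system, application program, middleware, embedded system, embedded software, browser, embedded operating system, media player, real-time embedded operating system (RTOS), non-real-time embedded operating system, communication middleware, home appliance, digital camera, DVD player, MP3 player, set-top box, VRTX, VxWorks, pSOS, WinCE, Microsoft, Wind River.]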

NLP2RDF (based on DBpedia Ontology)
• Barack Obama: URI = dbpedia12415 (conceptually unique)
• President, United States, Democrats, ...

LOD algorithm
• Barack Obama is the President of the United States
• Barack Obama: URI = sen1word1 (unique within the document)
• NNG, ...

“KNIF” Wrapper
• The Output of NLP tools
• Sentence: ‘Barack Obama is the President of the United States’

For this work:
1. For RDF Mapping
  • String Ontology
  • Structured Sentence Ontology
  • NIF and Korean language
2. For LOD Mapping
  • URI for DBpedia entity
  • Mapping Word in Text → DBpedia

• Triples and URI
• Ontology

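Parse tree to Summary
• 물체의 낙하 거리는 시간의 제곱에 비례한다 (The falling distance of an object is proportional to the square of time.)

1. Subject • 물체의 낙하거리 (the falling distance of an object)
2. Predicate • 비례한다 (is proportional to)
3. Contents • 시간의 제곱 (the square of time)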

Why NLP? Why Syntax and Semantics?

• Advanced technology on the higher-level layers

NLP Layer Cake

Semantic Web vs. NLP layer cake

Example: 철수가 2층에 있는 세미나실을 예약한다. (John reserves the seminar room on the 2nd floor.)
Gloss: John-SUBJ 2-floor-LOC room-OBJ reserve-FIN

Layers, from top to bottom:
• Discourse: John: X1, room: L2
• Syntactic structure: subject, object, predicate
• Phrase: Room in 2nd floor
• Semantic tagging: [John: Human], [2-FL: Loc], [seminar-room: Room]
• Morph. Analysis: +가//2+층+에
• POS tagging: NPP/JOSA//Numeral/
• Tokenization: 철수가//2층에//
• String URI
• Encoding
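The Tokenization and String URI layers both depend on knowing where each eojeol starts and ends in the raw string. A minimal sketch, assuming eojeols are simply whitespace-separated (a simplification; the function name `tokenize_eojeols` is hypothetical):

```python
def tokenize_eojeols(sentence):
    """Split on whitespace and record (eojeol, begin, end) character offsets."""
    tokens = []
    pos = 0
    for word in sentence.split():
        begin = sentence.index(word, pos)  # search from the end of the previous token
        end = begin + len(word)
        tokens.append((word, begin, end))
        pos = end
    return tokens


sentence = "철수가 2층에 있는 세미나실을 예약한다."
for eojeol, begin, end in tokenize_eojeols(sentence):
    print(eojeol, begin, end)
```

The offsets are what later layers can use to build a stable identifier for each piece of text.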

How to develop parser and semantic classifier creatively?
• Open Source NLP tools
  • Rich English, Japanese open tools/resources
  • A few Korean tools
• How to adapt Korean tools to the already developed tools
• Already developed Korean language resources
  • KAIST tools/resources
  • KAIST open source on SourceForge and the web
  • Cambridge University Press: NLP Textbook (in progress)
• Linked Data – http://lod2.eu/ partner

Background
• The idea of linking data from different sources is not new:
  • Network Database Model: 1970s
  • Linked Data: today
• The goal is to facilitate sharing and reusing information.
• Linked Data aims to extend the Web with a data commons by creating typed links between data from different sources.

Background
• Each piece of data is identified with a URI.
• These links are usually modeled using the Resource Description Framework (RDF).
• The first task towards linking data is to identify which resources and which properties we want to describe.
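A typed link between two data sources is just an RDF triple whose subject and object are URIs from different datasets. The sketch below writes two such statements in N-Triples syntax. The resource names (`장미`, `Rose`) are illustrative assumptions and are not taken from the slides; `owl:sameAs` and `rdfs:label` are standard vocabulary terms.

```python
def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value, lang=None):
    """Quote a string literal, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"' + (f"@{lang}" if lang else "")


def ntriple(s, p, o):
    return f"{s} {p} {o} ."


KO = "http://ko.dbpedia.org/resource/"  # resource names below are hypothetical
EN = "http://dbpedia.org/resource/"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

lines = [
    ntriple(iri(KO + "장미"), iri(OWL_SAME_AS), iri(EN + "Rose")),
    ntriple(iri(KO + "장미"), iri(RDFS_LABEL), literal("장미", "ko")),
]
print("\n".join(lines))
```

The predicate URI is what makes the link "typed": it says how the two resources relate, not only that they are connected.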

Introduction
• NLP2RDF is a LOD2 Community project that is developing the NLP Interchange Format (NIF)
• NIF aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations
• The output of NLP tools can be converted into RDF and used in the LOD2 Stack
• http://nlp2rdf.org

NIF…
• Is based on RDF/OWL
• Enables users to annotate for several languages in a uniform way
• Enables users to query text documents with SPARQL (e.g. http://semanticweb.kaist.ac.kr/nlp2rdf/)

Sentence: 다크나이트는 미국의 영화이다. (The Dark Knight is an American film.)
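To give a feel for what querying annotated text means, the sketch below imitates a two-pattern SPARQL query with plain Python over an in-memory list of triples. It is not a SPARQL engine: the `match` helper and the `ex:` identifiers are hypothetical, and the property names are written as prefixed strings purely for readability.

```python
def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None plays the role of a SPARQL variable."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


triples = [
    ("ex:word1", "str:anchorOf", "Obama"),
    ("ex:word1", "sso:posTag", "NNP"),
    ("ex:word2", "str:anchorOf", "is"),
    ("ex:word2", "sso:posTag", "VBZ"),
]

# Roughly: SELECT ?text WHERE { ?w sso:posTag "NNP" . ?w str:anchorOf ?text }
proper_nouns = [
    text
    for (w, _, _) in match(triples, p="sso:posTag", o="NNP")
    for (_, _, text) in match(triples, s=w, p="str:anchorOf")
]
print(proper_nouns)
```

A real deployment would hand the same pattern to a SPARQL endpoint instead of scanning a list.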

NIF Wrapping
• NLP Interchange Format (NIF) is an RDF/OWL-based format that allows several Natural Language Processing (NLP) tools to be combined and chained in a flexible, lightweight way.

Source: Sebastian Hellmann, AKSW, Universität Leipzig, NLP Interchange Format (NIF)

Structure of NLP2RDF
• Interchange Layer
• Data Layer
• NLP Layer

Example of NLP Layer

English NLP
• Input Sentence
• Tokenization
• CFG Parser
• Dependency Parser

How to create RDF from NLP output

Process
• Raw Texts (example: My dog also likes eating sausage.)
• NLP Tools
• output
• NIF Wrapper (example: StanfordWrapper.Java)
• RDF

Example of NLP2RDF in ENG
• http://nlp2rdf.lod2.eu/demo.php
• Sentence: Obama is the president of the USA.

sso:oliaLink ;
sso:posTag "NNP" ;
sso:lemma "Obama" ;
str:referenceContext ;
str:anchorOf "Obama" ;
rdf:type sso:Word , str:String .
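A wrapper's last step is to turn one annotated word into a description like the one shown for "Obama". The sketch below renders such a description as Turtle text. It is an assumption-laden illustration: the function `word_to_turtle` and the `example.org` URIs are hypothetical, prefix declarations are omitted, and `sso:oliaLink` is left out because its target is not recoverable from the slide.

```python
def quote_literal(value):
    """Quote a Turtle string literal, escaping backslashes and double quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def word_to_turtle(subject, context, anchor, pos_tag, lemma):
    """Render one word annotation using the properties shown on the slide."""
    lines = [
        f"<{subject}>",
        f"    sso:posTag {quote_literal(pos_tag)} ;",
        f"    sso:lemma {quote_literal(lemma)} ;",
        f"    str:referenceContext <{context}> ;",
        f"    str:anchorOf {quote_literal(anchor)} ;",
        "    rdf:type sso:Word , str:String .",
    ]
    return "\n".join(lines)


turtle = word_to_turtle(
    subject="http://example.org/doc#word1",      # hypothetical word URI
    context="http://example.org/doc#sentence1",  # hypothetical sentence URI
    anchor="Obama",
    pos_tag="NNP",
    lemma="Obama",
)
print(turtle)
```

In practice an RDF library would build the graph and serialize it; string formatting is used here only to keep the example dependency-free.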

Korean NLP2RDF
• Resources: morphemes, words (eojeols) and sentences in Korean
• Properties: POS, grammatical roles, etc.
• Problems to solve:
  • Linguistic Modeling (OLiA)
  • Processing Korean Text (NLP)
  • How to Produce and Query RDF

Linguistic Modeling (1)
• We use OLiA (Ontologies of Linguistic Annotation) to link the Sejong tagset with language-independent reference concepts.
• The Sejong tagset is the de facto standard tagset for Korean.
• OLiA consists of three different ontologies:
  • the OLiA reference model (language-independent),
  • the OLiA annotation model (depends on the tagset),
  • the OLiA linking model (depends on the tagset).
• We developed a fragment of these last two ontologies for Korean, that is, for the Sejong tagset.

Linguistic Modeling (2)
• We use the NIF (NLP Interchange Format) to
  • standardize the input/output of the different tools to ease the connection among them, and to
  • uniquely identify (parts of) text, entities and relationships.
• NIF provides two URI schemes to identify resources
  • Offset-based
  • Hash-based
• In our application we opt for the Hash-based scheme
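The difference between the two URI schemes can be sketched as follows. The field layout loosely follows the NIF 1.0 recipes but should be read as an assumption rather than a conformant implementation; the function names, the `example.org` prefix and the context length of 10 are all hypothetical choices.

```python
import hashlib
from urllib.parse import quote


def offset_uri(prefix, text, begin, end):
    """Offset-based: the URI encodes the character positions of the substring."""
    return f"{prefix}offset_{begin}_{end}_{quote(text[begin:end][:20])}"


def hash_uri(prefix, text, begin, end, context=10):
    """Hash-based: the URI encodes a digest of the substring plus nearby context."""
    left = text[max(0, begin - context):begin]
    right = text[end:end + context]
    anchor = text[begin:end]
    digest = hashlib.md5(f"{left}({anchor}){right}".encode("utf-8")).hexdigest()
    return f"{prefix}hash_{context}_{len(anchor)}_{digest}_{quote(anchor[:20])}"


PREFIX = "http://example.org/doc#"  # hypothetical document prefix
TEXT = "Obama is the president of USA."

print(offset_uri(PREFIX, TEXT, 13, 22))
print(hash_uri(PREFIX, TEXT, 13, 22))
```

One plausible reason to prefer the hash-based scheme is stability: inserting text earlier in the document shifts every offset, while a hash over the word and its immediate context stays the same as long as that neighbourhood is unchanged.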

Korean NLP2RDF Platform
Pipeline: RAW Text → Morpheme Analyzer → Parser → Wrapper → NIF output

Morpheme Analyzer: HanNanum
• Korean Open Source Morpheme Analyzer
• Developed by SWRC, KAIST

Parser: Korean Berkeley Parser
• Training set: Modified Sejong Treebank (DongHyun Choi, Jungyeul Park, Key-Sun Choi, Korean Treebank Transformation for Parser Training, ACL - SPMRL 2012)
• F1-score: 82.12%

Wrapper: Produce triples
• Use OLiA (Ontologies of Linguistic Annotation) to link the Korean tagsets with language-independent reference concepts
• The OLiA annotation model and the OLiA linking model produce triples using the Sejong tagset

[Figure: architecture of the Korean NLP2RDF platform. Component labels: Korean NLP (Input Korean Sentence, Morph. Analyzer, CFG Parser, Dependency Parser), Korean Language information, Korean Grammar Framework, Parsed result (URI, Tag), DataBase, Mappings, Ontologies, OnTop Framework, RDF generator, RDF triples, SPARQL Query, SPARQL Query Handler.]

NIF Output
• Each piece of data is identified with a URI (Hash-based)
• Resources: Morphemes, Words (eojeols), Sentences in Korean
• Properties: POS-tag, Grammatical roles, etc.
• DEMO site: http://semanticweb.kaist.ac.kr/nlp2rdf

[Figure: parsing results and some produced triples]

NIF Output

이탈리아에서 공부하고 온 마틴은 한국을 사랑합니다.
Martin, who came back after studying in Italy, loves Korea.

Specific Issues of Korean
1. Korean Tagset
2. Linking with OLiA

Pipeline: Parser Output → NLP2RDF: Produce Triples → RDF output

Parser Output:
• String
• Word, Sentence, Phrase, ...
• Tag, ...

Ontology:
• String Ontology
• Structured Sentence Ontology (SSO)
• OLiA
• Penn
• Sejong Tag Set

Sejong tags and their superclasses

Tag | Sejong superclass (LinguisticAnnotation/Tag/) | OLiA superclass (LinguisticConcept/MorphosyntacticCategory/)
MA | Adverb | Adverb
MAJ | Adverb/ConjunctiveAdverb | Adverb and Conjunction/CoordinatingConjunction
MAG | Adverb/GeneralAdverb | Adverb
SN, XN | CardinalNumber | Quantifier/Numeral
MM | Determiner | PronounOrDeterminer/Determiner
SH, SL | ForeignWord | Residual/Foreign
IC | Interjection | Interjection
NA, NF, NN | Noun | Noun
XR | Noun/BaseMorpheme | Noun/CommonNoun
NNB, NNG | Noun/CommonNoun | Noun/CommonNoun
NNP | Noun/ProperNoun | Noun/ProperNoun
NP | Pronoun | PronounOrDeterminer/Pronoun
SE, SF, SO, SP, SS | Symbol | Punctuation
NV, V | Verb | Verb
VA | Verb/Adjective | Verb and Adjective/PredicativeAdjective
VX | Verb/AuxiliaryPredicate | Verb/AuxiliaryVerb
VC, VCN, VCP | Verb/Copula | Verb/FiniteVerb
VV | Verb/VerbalPredicate | Verb
E, JK, XP, XS | Particle | MorphologicalCategory/morpheme/
JC, JX | Particle/AuxiliaryPostposition | MorphologicalCategory/morpheme/MorphologicalParticle
JKB, JKC, JKG, JKO, JKQ, JKS, JKV | Particle/CaseMarker | MorphologicalCategory/morpheme/MorphologicalParticle
XPN | Particle/Prefix | MorphologicalCategory/morpheme/prefix
XSA, XSN, XSV | Particle/Suffix | MorphologicalCategory/morpheme/suffix
EC, EF, EP, ETM, ETN | Particle/VerbalEnding | MorphologicalCategory/morpheme/suffix
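The linking between Sejong tags and OLiA reference concepts shown on this slide amounts to a lookup. A minimal sketch, covering only a handful of rows from the slide's tag table; the dictionary and function names are hypothetical, and the real linking model is an OWL ontology rather than a Python dictionary.

```python
OLIA_ROOT = "LinguisticConcept/MorphosyntacticCategory/"

# A few rows copied from the slide's tag table; not the complete mapping.
SEJONG_TO_OLIA = {
    "MAG": "Adverb",
    "SN": "Quantifier/Numeral",
    "MM": "PronounOrDeterminer/Determiner",
    "NNG": "Noun/CommonNoun",
    "NNP": "Noun/ProperNoun",
    "NP": "PronounOrDeterminer/Pronoun",
    "SF": "Punctuation",
    "VV": "Verb",
    "VX": "Verb/AuxiliaryVerb",
}


def olia_concept(sejong_tag):
    """Return the OLiA reference concept path for a Sejong tag, or None if unmapped."""
    concept = SEJONG_TO_OLIA.get(sejong_tag)
    return OLIA_ROOT + concept if concept is not None else None


print(olia_concept("NNP"))
print(olia_concept("ZZZ"))
```

Returning `None` for an unknown tag keeps gaps in the mapping visible instead of silently guessing a category.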

Conclusions:
• We presented a framework that allows
  • processing Korean text,
  • efficiently producing RDF triples, and
  • querying the outcome of the NLP tools.
• The RDF outcome of our framework is compliant with the NIF (NLP Interchange Format) and the OLiA ontologies, to facilitate its combination with other NLP tools.
• Future:
  • complete the development of the language-dependent part of the OLiA ontologies,
  • include the missing features required by NIF,
  • allow richer SPARQL queries, and
  • disambiguate the different entities in the text and link them with Wikipedia articles.

Issues
• Josa (postposition case marker)
  • Korean specific grammatical feature
  • Sentence: 다크나이트는 미국의 영화이다. (The Dark Knight is an American movie.)
• DBpedia
  • How to link between produced triples and DBpedia triples
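The two issues interact: an eojeol such as 다크나이트는 carries a josa (는), so the surface form does not match an entity label until the josa is separated. The sketch below is a toy suffix-stripping heuristic that illustrates the problem, not the approach used in the project, where a morphological analyzer does this job. The josa list is deliberately tiny and will mis-split many real words, and `KNOWN_LABELS` is a hypothetical stand-in for a set of DBpedia labels.

```python
# Toy list, longest first; real josa analysis needs a morphological analyzer.
JOSA = ("에서", "는", "은", "이", "가", "을", "를", "의", "에")

KNOWN_LABELS = {"다크나이트", "미국"}  # hypothetical stand-in for DBpedia labels


def split_josa(eojeol):
    """Strip trailing punctuation, then split off one trailing josa if present."""
    word = eojeol.rstrip(".,!?")
    for josa in JOSA:
        if word.endswith(josa) and len(word) > len(josa):
            return word[:-len(josa)], josa
    return word, None


for eojeol in "다크나이트는 미국의 영화이다.".split():
    stem, josa = split_josa(eojeol)
    print(eojeol, stem, josa, stem in KNOWN_LABELS)
```

With this split, 다크나이트 and 미국 match the label set while the raw eojeols would not, which is the linking difficulty the slide points at.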

Source
• Demo site for Korean: http://semanticweb.kaist.ac.kr/nlp2rdf
• Demo site for English: http://nlp2rdf.lod2.eu/demo.php
• OnTop: https://babbage.inf.unibz.it/trac/obdapublic/wiki/ObdalibPluginIntro
• NLP2RDF: http://nlp2rdf.org

Key-Sun Choi, Mun-Yong Yi, In-Young Koh, Younghee Lee (CS/WebST, Knowledge Service Eng., CS/WebST, CS)
Tony Veale (Invited Professor, Computational Creativity)
Yoon, Yong-Un (research professor, NLP+DB)
Martin Rezk (postdoctoral researcher, Logic)
Park, Jung-Yeol (researcher, parser)
Lee, Jae-Sung (Professor, morphology and word)

Graduate Students: Soon-Gil Hong, Young-Gyun Hahm, Kyungtae Lim, Se-Mi Jang, Youngho Jeong, …
http://ko.dbpedia.org/
http://semanticweb.kaist.ac.kr
[email protected]
