Ontology matching

Instance-based OM

IBOMbIE

Experiments

Comparison other OM

Instance-Based Ontology Matching By Instance Enrichment
Balthasar A.C. Schopman
Supervisors: Antoine Isaac, Shenghui Wang, Stefan Schlobach
Vrije Universiteit Amsterdam

June 29, 2009

Conclusions

Outline

1. Ontology matching
2. Instance-based OM
3. IBOMbIE
4. Experiments
5. Comparison other OM
6. Conclusions

Research questions

General research questions:
- How do different algorithm design options of IBOMbIE influence the final result?
- How does the performance of IBOMbIE relate to other OM algorithms?

Questions from the audience

Crucial questions: please interrupt me. Other questions: after the presentation, please.

Introduction

Ontology

Definition of an ontology (by Euzenat and Shvaiko): an ontology typically (1) defines a vocabulary relevant in a certain domain of interest, (2) specifies the meaning of terms and (3) specifies relations between terms.

Ontologies:
- controlled vocabulary
- thesaurus
- database schema
- canonical semantic web ontology: a set of typed, interrelated concepts defined in a formal language

Ontology Matching (OM)

Ontologies ...
- facilitate interoperability between parties
- do not solve the heterogeneity problem, but raise it to a higher level: the OM level

Elementary OM techniques:
- terminological
- structure-based
- semantic-based
- instance-based

Instance-based OM (IBOM)

Variants of IBOM:
1. use dually annotated instances (DAI)
2. create DAI
3. use extension of concepts (DAI not required)

General pros and cons:
- Con: does not deduce specific relations
- Con: suitable instances rarely available
- Pro: focus on active part of ontology
- Pro: able to deal with ambiguous linguistic phenomena: synonym, homonym

Intro

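Definitions of 'instance of'-relation

Example definitions:
- Canonical semantic web definition
- Library definition

[Figure (canonical semantic web definition): RDF graph relating someone:Peter, "Peter", someone:Nate and foaf:Person through the properties foaf:name, rdf:type and foaf:knows]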

[Figure (library definition): objects o1 and o2 are each annotated with concepts c1, c2, c3, ... taken from an ontology / vocabulary]

Application

Two library scenarios: KB and TEL
- match controlled vocabularies
- data-sets: book catalogs
- multi-lingual

IBOM

IBOM: measuring similarity

[Figure: concepts c1 and c2 and their instances]

Jaccard coefficient

Jaccard coefficient:

$$J(c_1, c_2) = \frac{|i_1 \cap i_2|}{|i_1 \cup i_2|}$$

- quantifies the overlap of the extension of concepts → relatedness between concepts
- Con: no multi-sets
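
A minimal sketch of the Jaccard coefficient on concept extensions, assuming each extension is available as a plain set of instance identifiers. The function name `jaccard` and the book identifiers are hypothetical and only serve as illustration.

```python
def jaccard(instances_c1, instances_c2):
    """Jaccard coefficient of two concept extensions (sets of instance ids)."""
    i1, i2 = set(instances_c1), set(instances_c2)
    union = i1 | i2
    if not union:
        # Assumption: two empty extensions are treated as unrelated.
        return 0.0
    return len(i1 & i2) / len(union)


# Hypothetical extensions: books annotated with concept c1 and with concept c2.
c1_books = {"book1", "book2", "book3"}
c2_books = {"book2", "book3", "book4"}
print(jaccard(c1_books, c2_books))  # 2 shared instances out of 4 in total
```

Because the inputs are sets, an instance that carries the same concept twice counts only once, which is the "no multi-sets" limitation noted on the slide.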

Creating dually annotated instances (DAI)

Jaccard needs DAI. If DAI unavailable:
- exact instance matching → merge annotations
- approximate instance matching → enrich instances

Instance matching

Approximate instance matching

Instance similarity measures:
- Lucene
- vector space model (VSM)

Enriching instances

Basic instance enrichment (IE)

[Figure: instance i1 in data-set D1, annotated with a and b, is matched to instance i2 in data-set D2, annotated with A and B; i1 is then enriched with the annotations A and B]

IE parameter: topN

[Figure: instance i1 in data-set D1, annotated with a and b, has three ranked matches in data-set D2: i2 (A, B) as 1st match, i3 (D) as 2nd match and i4 (A, C) as 3rd match. The successive overlays add the annotations of i2, then i3, then i4 to i1.]

IE parameter: similarity threshold (ST)

[Figure: instance i1 in data-set D1, annotated with a and b, and three instances in data-set D2: i2 (A, B) with sim(i1,i2) = 0.8, i3 (D) with sim(i1,i3) = 0.4 and i4 (A, C) with sim(i1,i4) = 0.2. The successive overlays add the annotations of i2, then i3, then i4 to i1 as more matches pass the threshold.]
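
The effect of the two enrichment parameters can be sketched as follows, reusing the similarities 0.8, 0.4 and 0.2 from the slides. This is only an illustration: the function `enrich`, its signature, and the way topN and ST are combined (keep the N best matches, then drop those below the threshold) are assumptions, not the thesis implementation.

```python
def enrich(own_annotations, candidates, top_n=1, threshold=0.0):
    """Return own_annotations plus the annotations of the selected matches.

    candidates: list of (similarity, annotations) pairs for instances of
    the other data-set. Assumed selection rule: take the top_n most similar
    candidates and keep those whose similarity is at least `threshold`.
    """
    ranked = sorted(candidates, key=lambda c: c[0], reverse=True)
    selected = [c for c in ranked[:top_n] if c[0] >= threshold]
    enriched = set(own_annotations)
    for _, annotations in selected:
        enriched |= set(annotations)
    return enriched


# i2, i3 and i4 with their similarity to i1 and their annotations.
candidates_i1 = [(0.8, {"A", "B"}), (0.4, {"D"}), (0.2, {"A", "C"})]

basic = enrich({"a", "b"}, candidates_i1)                         # best match only
top3 = enrich({"a", "b"}, candidates_i1, top_n=3)                 # three best matches
cut = enrich({"a", "b"}, candidates_i1, top_n=3, threshold=0.3)   # i4 is filtered out
print(sorted(basic), sorted(top3), sorted(cut))
```

Note that the annotation A, contributed by both i2 and i4, ends up in the set only once.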

Experimental questions

- Instance similarity measure
- topN parameter
- ST parameter
- combining topN + ST parameters
- performance as compared to other OM algorithms

Evaluation

Alignment evaluation

Methods:
- Gold standard := good alignment
- Reindexing

Measures:
- Precision
- Recall
- f-measure

Results of experiments

Results: instance similarity measure - quality

[Figure: precision (P), recall (R) and f-measure (F) of VSM and Lucene against mapping rank (log scale); panels (a) Gold standard and (b) Reindex]

Virtually equal

Results: instance similarity measure - quality

[Figure: panels (c) Overlap and (d) Manual Evaluation, plotted against mapping rank; axes labelled overlap and performance; legend: precision VSM, precision Lucene]

Edge to VSM

Results: instance similarity measure - run-time

Time to enrich 100K instances (hrs:min):

amount indexed instances   Lucene   VSM
524K                       1:04     0:17
1,457K                     7:20     0:22
2,506K                     26:15    0:32

[Figure: increase in run-time of VSM and Lucene against the number of indexed documents (× 100K)]

Optimizations VSM:
- pre-calculate weights of indexed documents
- purge insignificant weights (35% + 50%)
- word centered indexing approach

Results: topN parameter (TEL)

As N increases, the quality of the mappings decreases.

[Figure: f-measure against mapping rank (log scale) for top1 (baseline), top2, top3, top4, top5 and top6; panels (i) Gold standard and (j) Reindex]

Results: similarity threshold parameter (KB)

- Best performance with ST: ST=µ
- Best performance: baseline (topN=1, ST=∞)

[Figure: f-measure against mapping rank (log scale) for the baseline and T = mean-1.5s, mean-s, mean-.5s, mean, mean+.5s, mean+s, mean+1.5s; panels (k) Gold standard and (l) Reindex]

Results: combining parameters

Using both parameters performs well in TEL, but not in KB. Possible explanation: a more selective IBOMbIE pays off in TEL, because vocabularies and instance annotations differ more there than in the KB scenario.

[Figure: f-measure against mapping rank (log scale) for the baseline and combinations of topN = 1, 2, 3 with ST = mu-0.5s, mu, mu+0.5s; panels (m) KB and (n) TEL; evaluation method: reindexing]

OAEI

Ontology alignment evaluation initiative (OAEI)

matcher    terminological   structure-based   semantic-based   instance-based
DSSim      ✓                ✓                 ✓                ✗
Lily       ✓                ✓                 ✓                ✗
TaxoMap    ✓                ✓                 ✓                ✗
IBOMbIE    ✗                ✗                 ✗                ✓

DSSim, Lily and TaxoMap:
- consider KB ontologies "huge"
- feature functionality to deal with large ontologies

Performance comparison: quality

[Figure: precision (P) and recall (R) of IBOMbIE (topN=1), DSSim, Lily and TaxoMap against mapping rank (0 to 10000)]

Performance comparison: resources + coverage

matcher    run-time   amount mappings
DSSim      12:00      2930
Lily       ?          2797
TaxoMap    2:40       1851
IBOMbIE    1:54       7000+

(Amount of lexically equal concepts in the KB vocabularies = 2,895)

Conclusions + discussion

IBOMbIE algorithm is quite promising:
- Relatively low run-time
- Able to deal with large ontologies
- Amount + quality of mappings
- Pros of IBOM
- Able to align ontologies using disjoint data-sets

Basic instance enrichment appears to be the best-performing method. Possible cause: the Jaccard coefficient does not support multi-sets.

Fin

Thank you... any questions?

Vocabularies

scenario   vocabulary   size
KB         GTT          35K
KB         Brinkman     5K
TEL        LCSH         340K
TEL        Rameau       155K
TEL        SWD          805K

IE parameter: similarity threshold (ST)

standard ST: µ
step-size: ½σ

scenario   D1 annotated with   D2 annotated with   µ       σ
KB         O1                  O2                  0.297   0.106
KB         O2                  O1                  0.279   0.101
TEL        O1                  O2                  0.260   0.097
TEL        O2                  O1                  0.232   0.084

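VSM

Weights are components of vectors:
- term frequency - inverse document frequency: TF-IDF
- e.g. audiovisual features

$$\mathrm{tfidf}_{w,d} = \mathrm{tf}_{w,d} \cdot \mathrm{idf}_w$$

$$\mathrm{tf}_{w,d} = \frac{\sqrt{n_{w,d}}}{|d|} \qquad \mathrm{idf}_w = \log \frac{|D|}{|\{d \in D : w \in d\}|}$$

VSM cosine similarity:

$$\mathrm{cosine\ sim}(d_1, d_2) = \frac{\vec{d_1} \cdot \vec{d_2}}{|\vec{d_1}|\,|\vec{d_2}|} = \frac{\sum_{i=1}^{n} w_{i,d_1}\, w_{i,d_2}}{\sqrt{\sum_{i} w_{i,d_1}^2}\,\sqrt{\sum_{i} w_{i,d_2}^2}}$$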
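
A minimal sketch of TF-IDF weighting and cosine similarity in the vector space model, assuming instances are given as lists of word tokens. The term-frequency form used here (square root of the word count, divided by the document length) is one reading of the slide's formula and should be treated as an assumption; the function names and sample documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """documents: list of token lists. Returns one {word: weight} dict per document."""
    n_docs = len(documents)
    # Number of documents in which each word occurs at least once.
    doc_freq = Counter(word for doc in documents for word in set(doc))
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        vectors.append({
            # tf (assumed form: sqrt(count) / |d|) times idf = log(|D| / df).
            word: (math.sqrt(n) / len(doc)) * math.log(n_docs / doc_freq[word])
            for word, n in counts.items()
        })
    return vectors


def cosine_similarity(v1, v2):
    """Cosine of the angle between two sparse weight vectors."""
    dot = sum(weight * v2.get(word, 0.0) for word, weight in v1.items())
    norm1 = math.sqrt(sum(x * x for x in v1.values()))
    norm2 = math.sqrt(sum(x * x for x in v2.values()))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0
    return dot / (norm1 * norm2)


docs = [
    ["history", "of", "amsterdam"],
    ["amsterdam", "history", "guide"],
    ["cooking", "with", "herbs"],
]
vecs = tfidf_vectors(docs)
print(cosine_similarity(vecs[0], vecs[1]))  # shares two words, so > 0
print(cosine_similarity(vecs[0], vecs[2]))  # no shared words
```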

Evaluation method: gold standard

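Gold standard := good alignment

$$P = \text{precision} = \frac{|\{\text{reference}\} \cap \{\text{retrieved}\}|}{|\{\text{retrieved}\}|}$$

$$R = \text{recall} = \frac{|\{\text{reference}\} \cap \{\text{retrieved}\}|}{|\{\text{reference}\}|}$$

$$F = \text{f-measure} = 2 \cdot \frac{P \cdot R}{P + R}$$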
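
As a small worked example of these three measures, the sketch below compares a retrieved alignment with a reference alignment, both represented as sets of concept pairs. The function name `evaluate`, the zero-division conventions and the sample mappings are assumptions made for illustration.

```python
def evaluate(reference, retrieved):
    """Return (precision, recall, f_measure) of retrieved mappings vs. a reference."""
    reference, retrieved = set(reference), set(retrieved)
    correct = len(reference & retrieved)
    # Assumption: an empty set yields a score of 0 instead of a division error.
    precision = correct / len(retrieved) if retrieved else 0.0
    recall = correct / len(reference) if reference else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical mappings between concepts of two vocabularies.
reference = {("a", "x"), ("b", "y"), ("c", "z")}
retrieved = {("a", "x"), ("b", "z")}
p, r, f = evaluate(reference, retrieved)
print(p, r, f)  # one of two retrieved mappings is correct; one of three reference mappings is found
```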

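Evaluation method: reindexing

[Figure: ontology o_1 with concepts a, b, c and ontology o_2 with concepts x, y, z. A dually annotated instance i_dual carries the annotations {a, b} and {x}; the reindexed version of i_dual is shown with {x, z} and {a, b}.]

$$P = \sum_{\text{dually annotated instances}} \frac{|\{\text{reference}\} \cap \{\text{retrieved}\}|}{|\{\text{retrieved}\}|}$$

$$R = \sum_{\text{dually annotated instances}} \frac{|\{\text{reference}\} \cap \{\text{retrieved}\}|}{|\{\text{reference}\}|}$$

Denominator: $|\{\text{reindexed instances}\}|$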

IbOM by IM algorithm overview

Whole algorithm

Start: two data-sets Dx and Dy

1. Enrich instances of Dx with annotations of instances of Dy. For every instance a:
   1. Find N best matching instances {b} in Dy
   2. Add annotations of {b} to a
2. Enrich vice versa
3. Merge data-sets into one dually annotated data-set
4. Apply Jaccard measure
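
The four steps can be sketched end to end as below. This is a toy illustration under stated assumptions, not the thesis implementation: instance similarity is a simple word-overlap stand-in for the Lucene and VSM measures, instances are dicts with hypothetical `text` and `concepts` fields, and concept names of the two vocabularies are assumed to be distinct.

```python
from itertools import product


def token_overlap(text_a, text_b):
    """Stand-in instance similarity: word overlap of two descriptions."""
    a, b = set(text_a.lower().split()), set(text_b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


def enrich_dataset(source, target, top_n=1):
    """Steps 1 and 2: add the concepts of the top_n best matches in target."""
    enriched = []
    for a in source:
        ranked = sorted(target, key=lambda b: token_overlap(a["text"], b["text"]),
                        reverse=True)
        added = set().union(*(b["concepts"] for b in ranked[:top_n]))
        enriched.append({"text": a["text"], "concepts": a["concepts"] | added})
    return enriched


def match(dx, dy, top_n=1):
    """Steps 3 and 4: merge both enriched data-sets, then score concept pairs."""
    merged = enrich_dataset(dx, dy, top_n) + enrich_dataset(dy, dx, top_n)
    extension = {}  # concept -> indices of the merged instances annotated with it
    for index, instance in enumerate(merged):
        for concept in instance["concepts"]:
            extension.setdefault(concept, set()).add(index)
    concepts_x = set().union(*(i["concepts"] for i in dx))
    concepts_y = set().union(*(i["concepts"] for i in dy))
    scores = {}
    for cx, cy in product(concepts_x, concepts_y):
        ex, ey = extension[cx], extension[cy]
        scores[(cx, cy)] = len(ex & ey) / len(ex | ey)  # Jaccard coefficient
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical book records annotated with two different vocabularies.
dx = [{"text": "history of amsterdam", "concepts": {"x:History"}},
      {"text": "cooking with herbs", "concepts": {"x:Cooking"}}]
dy = [{"text": "amsterdam history guide", "concepts": {"y:Geschiedenis"}},
      {"text": "herbs for cooking", "concepts": {"y:Koken"}}]
for pair, score in match(dx, dy):
    print(pair, score)
```

The ranked output plays the role of the "mapping rank" axis in the result plots: the highest-scoring concept pairs come first.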
