
Ontology matching

Instance-based OM

IBOMbIE

Experiments

Comparison other OM

Instance-Based Ontology Matching By Instance Enrichment
Balthasar A.C. Schopman
Supervisors: Antoine Isaac, Shenghui Wang, Stefan Schlobach
Vrije Universiteit Amsterdam

June 29, 2009

Conclusions


Outline

1. Ontology matching
2. Instance-based OM
3. IBOMbIE
4. Experiments
5. Comparison other OM
6. Conclusions


Research questions

General research questions:
- How do different algorithm design options of IBOMbIE influence the final result?
- How does the performance of IBOMbIE relate to other OM algorithms?


Questions from the audience

Crucial questions: please interrupt me. Other questions: after presentation please.


Introduction

Ontology
Definition of an ontology (by Euzenat and Shvaiko): An ontology typically (1) defines a vocabulary relevant in a certain domain of interest, (2) specifies the meaning of terms and (3) specifies relations between terms.
Ontologies:
- controlled vocabulary
- thesaurus
- database schema
- canonical semantic web ontology: a set of typed, interrelated concepts defined in a formal language


Ontology Matching (OM)
Ontologies ...
- facilitate interoperability between parties
- do not solve the heterogeneity problem, but raise it to a higher level: the OM level
Elementary OM techniques:
- terminological
- structure-based
- semantic-based
- instance-based


Instance-based OM (IBOM)
Variants of IBOM:
1. use dually annotated instances (DAI)
2. create DAI
3. use extension of concepts (DAI not required)
General pros and cons:
- Con: does not deduce specific relations
- Con: suitable instances rarely available
- Pro: focus on active part of ontology
- Pro: able to deal with ambiguous linguistic phenomena: synonym, homonym


Intro

Definitions of 'instance of'-relation
Example definitions:
- Canonical semantic web definition
- Library definition

Canonical semantic web definition:
someone:Peter rdf:type foaf:Person
someone:Peter foaf:name "Peter"
someone:Peter foaf:knows someone:Nate


Library definition:
[Figure: objects o1 and o2 are annotated with concepts (c1, c2, c3, ...) taken from an ontology / vocabulary.]


Application

Two library scenarios: KB and TEL
- match controlled vocabularies
- data-sets: book catalogs
- multi-lingual


IBOM

IBOM: measuring similarity

[Figure: concepts c1 and c2]


Jaccard coefficient

Jaccard coefficient:

$$J(c_1, c_2) = \frac{|i_1 \cap i_2|}{|i_1 \cup i_2|}$$

- quantifies the overlap of the extension of concepts → relatedness between concepts
- Con: no multi-sets
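
The Jaccard coefficient on concept extensions can be sketched as follows. This is a minimal illustration, not the implementation used in the experiments: the helper names `concept_extensions` and `jaccard` and the toy `books` data are assumptions made for the example, and returning 0.0 for two empty extensions is a convention chosen here.

```python
from collections import defaultdict


def concept_extensions(annotations):
    """Map each concept to the set of instance ids annotated with it."""
    extensions = defaultdict(set)
    for instance_id, concepts in annotations.items():
        for concept in concepts:
            extensions[concept].add(instance_id)
    return extensions


def jaccard(ext1, ext2):
    """|i1 ∩ i2| / |i1 ∪ i2|; defined here as 0.0 when both extensions are empty."""
    union = ext1 | ext2
    if not union:
        return 0.0
    return len(ext1 & ext2) / len(union)


# Hypothetical dually annotated instances (toy data, not from the KB or TEL sets).
books = {
    "book1": {"c1", "c2"},
    "book2": {"c1", "c2"},
    "book3": {"c1"},
    "book4": {"c2"},
}
ext = concept_extensions(books)
score = jaccard(ext["c1"], ext["c2"])  # 2 shared instances out of 4 in the union
```

Because the extensions are plain sets, an instance counts once per concept, which is the "no multi-sets" limitation noted on the slide.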


Creating dually annotated instances (DAI)

Jaccard needs DAI. If DAI unavailable:
- exact instance matching → merge annotations
- approximate instance matching → enrich instances
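
The exact-matching route can be sketched as below. The slides do not say which key identifies an instance across data-sets; the ISBN-like keys and the function name `merge_by_exact_match` are assumptions made purely for illustration.

```python
def merge_by_exact_match(d1, d2):
    """Merge annotations of instances that share the same key in both data-sets.

    d1 and d2 map an instance key (hypothetically an ISBN) to a set of concepts.
    Only keys present in both data-sets yield dually annotated instances.
    """
    return {key: (d1[key], d2[key]) for key in d1.keys() & d2.keys()}


# Toy data: only "isbn-1" occurs in both data-sets.
d1 = {"isbn-1": {"a", "b"}, "isbn-2": {"c"}}
d2 = {"isbn-1": {"x"}, "isbn-3": {"y"}}
dai = merge_by_exact_match(d1, d2)
```

When the two data-sets share few or no keys, this yields few or no DAI, which is the case where approximate matching and instance enrichment come in.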


Instance matching

Approximate instance matching

Instance similarity measures:
- Lucene
- vector space model (VSM)


Enriching instances

Basic instance enrichment (IE)

[Figure: instance i1 in data-set D1 (annotated with a, b) is matched to instance i2 in data-set D2 (annotated with A, B). After enrichment, i1 carries the annotations a, b, A and B.]


IE parameter: topN
[Figure: instance i1 in data-set D1 (annotated with a, b) has three ranked matches in data-set D2: i2 (A, B; 1st match), i3 (D; 2nd match) and i4 (A, C; 3rd match). The successive frames add the annotations of i2, then i3, then i4 to i1.]


IE parameter: similarity threshold (ST)
[Figure: instance i1 in data-set D1 (annotated with a, b) and three instances in data-set D2: i2 (A, B) with sim(i1,i2) = 0.8, i3 (D) with sim(i1,i3) = 0.4 and i4 (A, C) with sim(i1,i4) = 0.2. The successive frames add the annotations of i2, then i3, then i4 to i1.]


Experimental questions

- Instance similarity measure
- topN parameter
- ST parameter
- combining topN + ST parameters
- performance as compared to other OM algorithms


Evaluation

Alignment evaluation

Methods:
- Gold standard := good alignment
- Reindexing

Measures:
- Precision
- Recall
- f-measure


Results of experiments

Results: instance similarity measure - quality
[Figure: precision (P), recall (R) and f-measure (F) of VSM and Lucene against mapping rank; panels (a) Gold standard and (b) Reindex.]
Virtually equal


Results: instance similarity measure - quality (continued)
[Figure: panels (c) Overlap and (d) Manual Evaluation (precision of VSM and Lucene), both against mapping rank.]
Edge to VSM


Results: instance similarity measure - run-time

Time to enrich 100K instances (hrs:min):

amount indexed instances   Lucene   VSM
524K                       1:04     0:17
1,457K                     7:20     0:22
2,506K                     26:15    0:32

[Figure: increase in run-time against indexed documents (* 100K) for VSM and Lucene.]

Optimizations VSM:
- pre-calculate weights of indexed documents
- purge insignificant weights (35% + 50%)
- word-centered indexing approach


Results: topN parameter (TEL)
As N increases, the quality of the mappings decreases.
[Figure: f-measure against mapping rank for top1 (baseline) through top6; panels (i) Gold standard and (j) Reindex.]


Results: similarity threshold parameter (KB)
- Best performance with ST: ST = µ
- Best performance: baseline (topN=1, ST=∞)
[Figure: f-measure against mapping rank for the baseline and for T = mean-1.5s, mean-s, mean-.5s, mean, mean+.5s, mean+s, mean+1.5s; panels (k) Gold standard and (l) Reindex.]


Results: combining parameters
Using both parameters performs well in TEL, but not in KB. Possible cause: a more selective IBOMbIE pays off in TEL, because vocabularies and instance annotations differ more there than in the KB scenario.
[Figure: f-measure against mapping rank for the baseline and for combinations of topN (1 to 3) and ST (mu-0.5s, mu, mu+0.5s); panels (m) KB and (n) TEL; evaluation method: reindexing.]


OAEI

Ontology alignment evaluation initiative (OAEI)

matcher    terminological   structure-based   semantic-based   instance-based
DSSim      ✓                ✓                 ✓                ✗
Lily       ✓                ✓                 ✓                ✗
TaxoMap    ✓                ✓                 ✓                ✗
IBOMbIE    ✗                ✗                 ✗                ✓

DSSim, Lily and TaxoMap:
- consider KB ontologies "huge"
- feature functionality to deal with large ontologies


Performance comparison: quality
[Figure: precision (P) and recall (R) against mapping rank (0 to 10000) for IBOMbIE (topN=1), DSSim, Lily and TaxoMap.]


Performance comparison: resources + coverage

matcher    run-time   amount mappings
DSSim      12:00      2930
Lily       ?          2797
TaxoMap    2:40       1851
IBOMbIE    1:54       7000+

(Number of lexically equal concepts in the KB vocabularies = 2,895)


Conclusions + discussion

The IBOMbIE algorithm is quite promising:
- Relatively low run-time
- Able to deal with large ontologies
- Amount + quality of mappings
- Pros of IBOM
- Able to align ontologies using disjoint data-sets

Basic instance enrichment appears to be the best-performing method. Possible cause: the Jaccard coefficient does not support multi-sets.


Fin

Thank you... any questions ?


Vocabularies

scenario   vocabulary   size
KB         GTT          35K
KB         Brinkman     5K
TEL        LCSH         340K
TEL        Rameau       155K
TEL        SWD          805K


IE parameter: similarity threshold (ST)

standard ST: µ
step-size: ½σ

scenario   D1 annotated with   D2 annotated with   µ       σ
KB         O1                  O2                  0.297   0.106
KB         O2                  O1                  0.279   0.101
TEL        O1                  O2                  0.260   0.097
TEL        O2                  O1                  0.232   0.084
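
As a worked example of the threshold settings, the sketch below lists candidate ST values around µ in steps of ½σ. It rests on two assumptions: that the garbled step-size reads ½σ, and that the range runs from µ-1.5σ to µ+1.5σ as in the legends of the similarity-threshold plots. The function name `threshold_grid` is hypothetical.

```python
def threshold_grid(mu, sigma, steps=3):
    """Thresholds mu + k * (sigma / 2) for k = -steps .. steps."""
    return [mu + k * 0.5 * sigma for k in range(-steps, steps + 1)]


# mu and sigma reported for the first KB configuration (D1 with O1, D2 with O2).
grid = threshold_grid(0.297, 0.106)
```

With these values the grid has seven entries, from roughly 0.138 (µ-1.5σ) up to roughly 0.456 (µ+1.5σ), with µ itself in the middle.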


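VSM
Weights are components of vectors:
- e.g. audiovisual features
- term frequency - inverse document frequency: TF-IDF

$$\mathrm{tfidf}_{w,d} = \mathrm{tf}_{w,d} \cdot \mathrm{idf}_w$$

$$\mathrm{tf}_{w,d} = \frac{\sqrt{n_{w,d}}}{|d|}$$

$$\mathrm{idf}_w = \log \frac{|D|}{|\{d \in D : w \in d\}|}$$

VSM cosine similarity:

$$\mathrm{cosine\ sim}(d_1, d_2) = \frac{\vec{d_1} \cdot \vec{d_2}}{|\vec{d_1}|\,|\vec{d_2}|} = \frac{\sum_{i=1}^{n} w_{i,d_1} w_{i,d_2}}{\sqrt{\sum_{i} w_{i,d_1}^2}\,\sqrt{\sum_{i} w_{i,d_2}^2}}$$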
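
The TF-IDF weighting and cosine similarity can be sketched in a few lines. This is an illustration only, not the optimized VSM used in the experiments: the function names and the toy documents are assumptions, and the term-frequency form sqrt(n) / |d| is one reading of the slide's formula.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: list of token lists. Returns one {word: weight} dict per document."""
    n_docs = len(docs)
    doc_freq = Counter(word for doc in docs for word in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            # tf = sqrt(n_{w,d}) / |d| (assumed reading), idf = log(|D| / df_w)
            word: (math.sqrt(n) / len(doc)) * math.log(n_docs / doc_freq[word])
            for word, n in counts.items()
        })
    return vectors


def cosine_sim(v1, v2):
    """Dot product of the two sparse vectors divided by the product of their norms."""
    dot = sum(weight * v2.get(word, 0.0) for word, weight in v1.items())
    norm1 = math.sqrt(sum(w * w for w in v1.values()))
    norm2 = math.sqrt(sum(w * w for w in v2.values()))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0
    return dot / (norm1 * norm2)


# Toy instance descriptions (hypothetical), already tokenized.
docs = [
    ["ontology", "matching", "instances"],
    ["ontology", "matching", "library"],
    ["medieval", "history"],
]
vecs = tfidf_vectors(docs)
related = cosine_sim(vecs[0], vecs[1])    # shares two words
unrelated = cosine_sim(vecs[0], vecs[2])  # shares no words
```

Documents without any shared word get similarity 0, and a word that occurs in every document gets idf 0 and therefore contributes nothing.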


Evaluation method: gold standard

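Gold standard := good alignment

$$P = \text{precision} = \frac{|\{\text{reference}\} \cap \{\text{retrieved}\}|}{|\{\text{retrieved}\}|}$$

$$R = \text{recall} = \frac{|\{\text{reference}\} \cap \{\text{retrieved}\}|}{|\{\text{reference}\}|}$$

$$F = \text{f-measure} = 2 \cdot \frac{P \cdot R}{P + R}$$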
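
The three measures can be computed directly from two sets of mappings. The sketch below is illustrative: the function name, the representation of a mapping as a pair of concepts and the toy alignment are assumptions, and returning 0.0 for empty inputs is a convention chosen here.

```python
def precision_recall_f(reference, retrieved):
    """reference and retrieved are sets of mappings, e.g. (concept1, concept2) pairs."""
    correct = len(reference & retrieved)
    precision = correct / len(retrieved) if retrieved else 0.0
    recall = correct / len(reference) if reference else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical gold standard and matcher output.
reference = {("a", "x"), ("b", "y"), ("c", "z")}
retrieved = {("a", "x"), ("b", "z")}
p, r, f = precision_recall_f(reference, retrieved)
```

Here one of the two retrieved mappings is correct and one of the three reference mappings is found, so P = 1/2, R = 1/3 and F = 0.4.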


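Evaluation method: reindexing
[Figure: ontology o_1 with concepts a, b, c and ontology o_2 with concepts x, y, z. A dually annotated instance i_dual with annotations {a, b} and {x} is reindexed to {x, z} and {a, b}.]

$$P = \frac{\sum_{\text{dually annotated instances}} \frac{|\{\text{reference}\} \cap \{\text{retrieved}\}|}{|\{\text{retrieved}\}|}}{|\{\text{reindexed instances}\}|}$$

$$R = \frac{\sum_{\text{dually annotated instances}} \frac{|\{\text{reference}\} \cap \{\text{retrieved}\}|}{|\{\text{reference}\}|}}{|\{\text{reindexed instances}\}|}$$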


IbOM by IM algorithm overview

Whole algorithm
Start: two data-sets Dx and Dy
1. Enrich instances of Dx with annotations of instances of Dy. For every instance a:
   1. Find N best matching instances {b} in Dy
   2. Add annotations of {b} to a
2. Enrich vice versa
3. Merge data-sets into one dually annotated data-set
4. Apply Jaccard measure
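
The four steps can be sketched end to end as follows. This is a toy illustration under stated assumptions, not the implementation from the experiments: the data layout (features plus a set of concepts per instance), the function names, the `word_overlap` similarity and the sample data are all hypothetical, concept names of the two vocabularies are assumed not to clash, and the brute-force ranking stands in for the Lucene or VSM index.

```python
from collections import defaultdict


def enrich(source, target, sim, top_n=1):
    """Add to every source instance the annotations of its top_n best matches in target.

    source, target: {instance_id: (features, set_of_concepts)}
    sim: function(features_a, features_b) -> similarity score
    """
    enriched = {}
    for a_id, (a_feat, a_concepts) in source.items():
        ranked = sorted(target.values(), key=lambda b: sim(a_feat, b[0]), reverse=True)
        added = set().union(*(b_concepts for _, b_concepts in ranked[:top_n]))
        enriched[a_id] = a_concepts | added
    return enriched


def jaccard_mappings(dx, dy, sim, top_n=1):
    # Steps 1-3: enrich both ways, then merge into one dually annotated data-set.
    merged = {("x", k): v for k, v in enrich(dx, dy, sim, top_n).items()}
    merged.update({("y", k): v for k, v in enrich(dy, dx, sim, top_n).items()})
    # Step 4: Jaccard coefficient on the concept extensions.
    ext = defaultdict(set)
    for instance_id, concepts in merged.items():
        for c in concepts:
            ext[c].add(instance_id)
    cx = {c for _, cs in dx.values() for c in cs}
    cy = {c for _, cs in dy.values() for c in cs}
    return {
        (c1, c2): len(ext[c1] & ext[c2]) / len(ext[c1] | ext[c2])
        for c1 in cx for c2 in cy
    }


def word_overlap(f1, f2):
    """Toy similarity: number of shared words (stand-in for Lucene / VSM)."""
    return len(f1 & f2)


dx = {"d1": ({"ontology", "matching"}, {"a"}), "d2": ({"medieval", "history"}, {"b"})}
dy = {"e1": ({"ontology", "alignment"}, {"A"}), "e2": ({"history", "europe"}, {"B"})}
scores = jaccard_mappings(dx, dy, word_overlap)
```

In this toy run each instance is enriched with the annotation of its single best match, so a and A end up with identical extensions, as do b and B. Annotations are kept as sets, so a concept contributed by several matches is counted once per instance.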
