SemMatcher: A Tool for Matching Ontology-based Schemas

Carlos Eduardo Pires1, Damires Souza2, Thiago Pachêco1, Ana Carolina Salgado1

1 Centro de Informática, Universidade Federal de Pernambuco, Recife, PE, Brazil
2 Instituto Federal de Educação, Ciência e Tecnologia da Paraíba/IFPB, Brazil

{cesp, tpap, acs}@cin.ufpe.br, [email protected]

Abstract. In Peer Data Management Systems (PDMS), each peer is an autonomous source that makes available a local schema. Information exchange occurs through the establishment of schema mappings between local schemas. To help matters, ontologies have been considered as a uniform representation of local schemas (i.e., peer ontologies). Consequently, ontology matching techniques have been used to determine schema mappings between peer ontologies. This work presents SemMatcher, a tool for matching ontology-based schemas. SemMatcher allows the identification of semantic correspondences between two peer ontologies using a domain ontology as background knowledge. The tool also determines a global similarity measure between the ontologies being matched, which can be used for peer clustering.

1. Introduction

The development of communication infrastructures has led to a wide range of data sources being available through networks such as Peer Data Management Systems (PDMS) [Adjiman et al., 2007]. In these settings, each peer is an autonomous source that makes available a local schema. Schema mappings, i.e., correspondences between schema elements, are generated to allow information exchange between peers. To help matters, ontologies have been considered as a uniform representation of peer schemas (referred to here as peer ontologies) and, consequently, as a means for enhancing information integration [Xiao, 2006]. Peer ontologies are designed and developed autonomously, which entails several forms of heterogeneity between them, even between those on the same domain [Euzenat and Shvaiko, 2007]. Reconciling such ontologies and finding correspondences between their elements (concepts or properties) through ontology matching techniques is still a relevant research issue. In order to identify such correspondences, some works have used additional semantic descriptions, called background knowledge (e.g., domain ontologies) [Sabou et al., 2006]. The resulting correspondences between peer ontology elements are usually associated with a similarity value that expresses the level of confidence in the correspondence. In a PDMS in particular, it is also important to have a global measure representing the overall similarity degree between two peer ontologies (and not only between their elements). For instance, the measure can be used to cluster semantically similar peers in the overlay network [Pires, 2009].

In this light, we present SemMatcher, a semantic-based ontology matching tool. Its main contributions are twofold: i) in order to identify semantic correspondences between two peer ontologies, SemMatcher takes into account a domain ontology as background knowledge; and ii) the tool determines a global similarity measure between the ontologies being matched. This paper is organized as follows: Section 2 explains the ontology matching process. Section 3 presents an instantiation of the process running in SemMatcher. Finally, Section 4 concludes the paper and outlines future work.

2. A Process for Matching Ontology-based Schemas

In our semantic-based approach, the process for matching peer ontologies combines previously defined matching strategies. In this process, a linguistic-structural matcher and a semantic matcher are executed in parallel. The similarity values obtained by both matchers are combined through a weighted average. Each matcher receives a particular weight according to its importance in the matching process. As shown in Figure 1, the process receives as input two ontologies (O1 and O2) and a domain ontology DO to be used as background knowledge. As output, it may produce one or two alignments (i.e., the semantic alignment ASE, produced in Phase 1, and the final alignment AFI, produced in Phase 2), and a global similarity measure between the involved ontologies.
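The weighted-average combination can be illustrated with a minimal sketch. The function name `combine_similarity` and the default weights 0.4 and 0.6 are illustrative assumptions; the text does not state the weights used by SemMatcher.

```python
def combine_similarity(ls_sim: float, sem_sim: float,
                       w_ls: float = 0.4, w_sem: float = 0.6) -> float:
    """Weighted average of the linguistic-structural and semantic similarity values.

    The default weights are hypothetical placeholders, not values from the paper.
    """
    if w_ls < 0 or w_sem < 0 or w_ls + w_sem == 0:
        raise ValueError("weights must be non-negative and not both zero")
    return (w_ls * ls_sim + w_sem * sem_sim) / (w_ls + w_sem)


# Example: the hybrid matcher reports 0.5 and the semantic matcher reports 0.8
# for the same pair of elements.
combined = combine_similarity(0.5, 0.8)
```

Dividing by the sum of the weights keeps the combined value in the same range as the inputs even when the weights do not add up to 1.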

Figure 1. The overall process for matching ontology-based schemas

The main steps carried out by the semantic-based ontology matching process are:

(1) Linguistic-Structural Matching. In this step, any existing ontology matching tool including linguistic and/or structural matchers can be used. Such linguistic and structural matchers are handled as a hybrid matcher, i.e., as a fixed combination of simple matchers. The combination of their similarity values depends on the composition strategy of the ontology matching tool that is used. The alignment produced by the hybrid matcher is denoted by ALS.

(2) Semantic Matching. Using a DO, the semantic matcher applies a set of semantic rules described in Section 2.1 to derive the type of semantic correspondences between O1 and O2. The resulting alignment is denoted by ASE.

(3) Similarity Combination. The individual similarity values of the correspondences produced by the hybrid matcher and the semantic matcher are merged into a combined similarity value, through a weighted average of the values generated by the individual matchers. A weighted average is used because matchers may produce opposing similarity values.

(4) Correspondence Ranking. Correspondences containing the elements of O1 are ranked (in descending order) according to the associated similarity values.

(5) Correspondence Selection. A filter strategy is applied to choose the most suitable correspondence for each O1 element. The strategy consists of selecting the correspondence with the highest combined similarity. As a result, an alignment A12 is generated. Steps 4 and 5 are also executed in the opposite direction: correspondences containing the elements of O2 are ranked according to the associated similarity values and the same filter strategy is applied. An alignment A21 is then produced. The final alignment (AFI) is obtained as the union of the alignments A12 and A21.

At the end of the matching process, three outcomes are possible:

A. Generating only the semantic alignment (ASE): in this option (Phase 1), only the alignment produced by the semantic matcher is shown. The alignment can be used for query reformulation purposes [Souza et al., 2009];

B. Generating the final alignment (AFI): in this option (Phases 1 and 2), the resulting set of correspondences identified by the linguistic-structural matcher is combined with the correspondences produced by the semantic matcher. The final alignment can be used as a schema mapping between two peer ontologies [Pires, 2009];

C. Calculating the global similarity measure: in this case (Phases 1 and 2), the global measure between O1 and O2 is generated. The measure can be used to cluster semantically similar peers in a PDMS [Pires, 2009].

2.1. Semantic Matching

In our work, we consider domain ontologies (DO) as reliable references that are made available on the Web. We use them to bridge the conceptual differences or similarities between two peer ontologies. First, concepts and properties from the two peer ontologies are mapped to equivalent concepts/properties in the DO; then, their semantic correspondence is inferred from the existing semantic relationship between the DO elements. To specify the correspondences, we take into account four aspects: i) the semantic knowledge found in the DO; ii) whether the peer ontology concepts share super-concepts in the DO; iii) whether these super-concepts are different from the root concept; and iv) the depth of concepts measured in number of nodes.
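The idea of inferring a correspondence from the position of two concepts in the DO can be illustrated with a deliberately simplified sketch. It is not the rule set of [Souza et al., 2009]: the toy DO, its concept names, and the function names are hypothetical, only four of the correspondence types used in this paper are covered, and the depth condition (aspect iv) is omitted.

```python
from typing import Dict, List, Optional

# Toy domain ontology: each concept maps to its direct super-concept (None for the root).
# The concept names are hypothetical and not taken from UnivCsCMO.owl.
DO_PARENT: Dict[str, Optional[str]] = {
    "Thing": None,
    "Person": "Thing",
    "Student": "Person",
    "Professor": "Person",
    "Publication": "Thing",
}


def ancestors(concept: str) -> List[str]:
    """Return the super-concepts of `concept`, nearest first, ending at the root."""
    chain = []
    parent = DO_PARENT[concept]
    while parent is not None:
        chain.append(parent)
        parent = DO_PARENT[parent]
    return chain


def infer_correspondence(do_c1: str, do_c2: str, root: str = "Thing") -> Optional[str]:
    """Infer a correspondence type for two peer concepts already mapped to DO concepts."""
    if do_c1 == do_c2:
        return "isEquivalentTo"
    if do_c2 in ancestors(do_c1):
        return "isSubConceptOf"
    if do_c1 in ancestors(do_c2):
        return "isSuperConceptOf"
    # Aspects ii and iii: a shared super-concept that is not the root.
    shared = [a for a in ancestors(do_c1) if a in ancestors(do_c2) and a != root]
    if shared:
        return "isCloseTo"
    return None  # nothing derived by this simplified rule set
```

With this toy DO, `infer_correspondence("Student", "Professor")` yields `isCloseTo` because both concepts share the non-root super-concept `Person`, while `Student` and `Publication` only share the root and yield no correspondence.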

We have defined seven types of semantic correspondences [Souza et al., 2009], each one associated with a particular weight that corresponds to the level of confidence in such a correspondence, as follows: isEquivalentTo (1.0), isSubConceptOf (0.8), isSuperConceptOf (0.8), isCloseTo (0.7), isPartOf (0.3), isWholeOf (0.3), and isDisjointWith (0.0). In particular, the isCloseTo correspondence denotes a strong semantic relationship holding between two concepts. We added this correspondence type as a way to enrich queries and provide users with approximate answers.

2.2. Calculating the Global Similarity Measure

The evaluation of the overall similarity between the ontologies O1 and O2 is an additional step in the proposed ontology matching process. This step takes as input the alignments A12 and A21 (Step 5) to calculate the similarity value. The value indicates the global similarity degree between the ontologies. Existing similarity measures such as dice [Aumüller et al., 2005], weighted [Castano et al., 1998] and overlap [Rijsbergen, 1979] can be adapted to calculate the global similarity degree (Figure 2). They consider the size of the input ontologies. In this work, the size of an ontology is determined by the number of its elements and is denoted by |O|.
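Steps 4 and 5 and the computation of a global measure can be sketched together. Since the exact formulas are given only in Figure 2, the sketch shows just a Dice-style measure in an assumed form: the number of O1 elements matched in A12 plus the number of O2 elements matched in A21, divided by |O1| + |O2|. The adapted measures used by SemMatcher may differ. All function names, element names, similarity values, and ontology sizes below are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Correspondence = Tuple[str, str, float]  # (O1 element, O2 element, combined similarity)


def select_best(correspondences: List[Correspondence], side: int) -> Set[Correspondence]:
    """Steps 4 and 5: rank by similarity (descending) and keep the top-ranked
    correspondence for each element on the given side (0 for O1, 1 for O2)."""
    best: Dict[str, Correspondence] = {}
    for corr in sorted(correspondences, key=lambda c: c[2], reverse=True):
        best.setdefault(corr[side], corr)  # the first one seen is the highest ranked
    return set(best.values())


def dice(a12: Set[Correspondence], a21: Set[Correspondence],
         size_o1: int, size_o2: int) -> float:
    """Assumed Dice-style form: matched elements on both sides over |O1| + |O2|."""
    return (len(a12) + len(a21)) / (size_o1 + size_o2)


candidates = [
    ("Student", "Student", 0.95),
    ("Student", "Person", 0.60),
    ("Professor", "Faculty", 0.80),
    ("Lecturer", "Faculty", 0.70),
]
a12 = select_best(candidates, side=0)  # best correspondence per O1 element
a21 = select_best(candidates, side=1)  # best correspondence per O2 element
afi = a12 | a21                        # final alignment as the union of A12 and A21
global_sim = dice(a12, a21, size_o1=5, size_o2=4)
```

Because each alignment keeps one correspondence per element, the size of A12 (or A21) equals the number of matched O1 (or O2) elements, which is what the assumed measure counts.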

Figure 2. The measures used to determine the overall similarity between ontologies

3. SemMatcher

In the following, we describe the tool in practice through an example. In this example, we use ontologies belonging to the Education knowledge domain: Semiport.owl (O1), UnivBench.owl (O2), and UnivCsCMO.owl (DO). All ontologies can be found on the tool's web site1, along with some experimental results obtained with SemMatcher. The SemMatcher interface is presented in Figure 3. First, the user must choose the two ontologies (O1 and O2) to be matched as well as the domain ontology (DO) which will be used as background knowledge. After that, the user must press the “Run” button, and the options regarding the possible outputs (illustrated in Figure 4) are presented (semantic alignment, global similarity measure, and final alignment). In the example, all three options have been selected, indicating that the whole matching process (i.e., Phases 1 and 2) must be executed. Once the output is indicated, the matching process is executed and the selected outputs are shown. In this version, the user is allowed to use H-Match2 and Alignment API3 to generate a linguistic-structural alignment. However, s/he can also use an alignment generated by any other ontology matching tool that uses the alignment format described in [OAEI, 2009]. In the example, H-Match has been chosen. The semantic alignment is shown in Figure 3, while the final alignment and the global measure are depicted in Figure 5. For the sake of space, not all correspondences are shown.

1 http://www.cin.ufpe.br/~speed/SemMatch/index.htm
2 http://islab.dico.unimi.it/hmatch/
3 http://alignapi.gforge.inria.fr/

Figure 3. SemMatcher’s main screen

Figure 4. SemMatcher’s output options

Figure 5. SemMatcher’s results (Phase 2): final alignment and global similarity measure

All alignments generated during the matching process are saved in a folder of the operating system that can be configured in the “File Preferences Directory Options” menu. The alignments can also be stored in a MySQL database. Other parameters (e.g., the semantic correspondence's weight, database credentials, and linguistic-structural matcher) can also be configured in the Preferences menu. After the matching process is finished, the user can compare the final alignment against a reference alignment. The reference alignment must be manually generated by expert users who are knowledgeable about the corresponding domain and must be represented in the format defined by [OAEI, 2009]. The final and reference alignments are compared using the measures suggested by OAEI: Precision, Recall, Fallout, oMeasure, and fMeasure. Figure 5 depicts the results obtained for the two ontologies of our example. SemMatcher has been implemented in Java. In order to provide ontology manipulation and reasoning, we have used Jena4 and OWL API5.
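The comparison of a final alignment against a reference alignment can be illustrated with the standard definitions of precision, recall, and F-measure over sets of correspondences. The sketch treats a correspondence as an (O1 element, O2 element) pair and ignores similarity values; the function name and the sample alignments are hypothetical, and Fallout and oMeasure are not shown.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]  # (O1 element, O2 element)


def evaluate(final: Set[Pair], reference: Set[Pair]) -> Tuple[float, float, float]:
    """Return (precision, recall, f_measure) of `final` with respect to `reference`."""
    correct = len(final & reference)  # correspondences found in both alignments
    precision = correct / len(final) if final else 0.0
    recall = correct / len(reference) if reference else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


# Hypothetical alignments: two of the three proposed correspondences are correct,
# and the reference contains four correspondences in total.
final_alignment = {("Student", "Student"), ("Professor", "Faculty"), ("Course", "Work")}
reference_alignment = {
    ("Student", "Student"),
    ("Professor", "Faculty"),
    ("Course", "Course"),
    ("Paper", "Publication"),
}
precision, recall, f_measure = evaluate(final_alignment, reference_alignment)
```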

4. Conclusions

In highly dynamic environments such as PDMS, the semantics surrounding correspondences among ontologies representing peer schemas is important for tasks such as query answering or peer clustering. This work has presented a tool for matching ontology-based peer schemas, combining different matching strategies (e.g., linguistic, structural, and semantic). Furthermore, as a result of the overall process, we have introduced the determination of a global similarity measure between the ontologies being matched. This measure is used to semantically cluster peers in our PDMS. Currently, we are extending SemMatcher to consider properties both in the identification of correspondences and in the determination of the global similarity measure.

References

Adjiman, P., Goasdoué, F., Rousset, M.-C. (2007) “SomeRDFS in the Semantic Web”. In: Journal on Data Semantics, LNCS, Vol. 8, pp. 158-181.
Aumüller, D., Do, H. H., Massmann, S., Rahm, E. (2005) “Schema and Ontology Matching with COMA++”. In: International Conference on Management of Data (SIGMOD), Software Demonstration.
Castano, S., Antonellis, V., Fugini, M. G., Pernici, B. (1998) “Conceptual Schema Analysis: Techniques and Applications”. In: ACM Transactions on Database Systems, Vol. 23, No. 3, pp. 286-333.
Euzenat, J., Shvaiko, P. (2007) “Ontology Matching”. Springer-Verlag.
OAEI – Ontology Alignment Evaluation Initiative. (2009) http://oaei.ontologymatching.org/
Pires, C. E. S. (2009) “Ontology-based Clustering in a Peer Data Management System”. PhD Thesis. Federal University of Pernambuco (UFPE), Recife, PE, Brazil.
Rijsbergen, C. J. (1979) “Information Retrieval”, 2nd Edition, Stoneham, MA: Butterworths. http://www.dcs.gla.ac.uk/Keith/Preface.html.
Sabou, M., D’Aquin, M., Motta, E. (2006) “Using the Semantic Web as Background Knowledge for Ontology Mapping”. In: ISWC’06 Ontology Matching WS.
Souza, D., Arruda, T., Salgado, A. C., Tedesco, P., Kedad, Z. (2009) “Using Semantics to Enhance Query Reformulation in Dynamic Environments”. To appear in the 13th East European Conference on Advances in Databases and Information Systems, Riga, Latvia.
Xiao, H. (2006) “Query Processing for Heterogeneous Data Integration using Ontologies”. Ph.D. Thesis. University of Illinois at Chicago, Chicago, USA.

4 Jena, http://jena.sourceforge.net/
5 OWL API, http://owlapi.sourceforge.net/
