Choosing between Axioms, Rules and Queries: Experiments in Semantic Integration Techniques

Aidan Boran1, Ivan Bedini1, Christopher J. Matheus1, Peter F. Patel-Schneider2 and John Keeney3

1 Bell Labs Ireland, Alcatel-Lucent, Dublin, Ireland, {aidan.boran, ivan.bedini, chris.matheus}@alcatel-lucent.com
2 Bell Labs, Alcatel-Lucent, Murray Hill, NJ, USA, [email protected]
3 Trinity College Dublin, Ireland, [email protected]

Abstract. When using semantic technologies, developers are frequently confused about which inferencing technique is best suited to a given problem. As an initial step towards identifying "best practices" for users of semantic technologies, we are conducting a series of experiments to contrast the benefits and limitations of three approaches to inferring cross-data relationships: RDFS/OWL axioms, user-defined rules and SPARQL queries. At the highest level, our aim is to identify which approaches provide the best (or acceptable) solutions in terms of memory use, CPU cycles and developer effort, given a variety of specific problem characteristics. In this paper we describe the three semantic approaches we are investigating, identify three broad problem areas in our initial focus and summarize some preliminary results.

Keywords: semantic integration, empirical experiments, axioms, rules, queries.

1 Introduction

The work described in this paper* is part of an on-going research effort to develop a methodological approach to Semantic Data Access (SDA) that will enable application programmers to more easily use semantic techniques to access, transform, query and reason about data residing in distributed systems. One aspect of this effort focuses on specific techniques for inferring the relationships needed to integrate information across heterogeneous data sources. This paper discusses three semantic integration approaches (OWL axioms, user-defined rules and SPARQL queries) and outlines a series of empirical experiments being developed to evaluate their general applicability across a range of problem domains. The objective is to develop an understanding of the strengths and limitations of these approaches that can be turned into general design patterns to guide application developers in realistic scenarios, i.e., with a substantive number, but often not billions, of triples. Some initial results have been produced for experiments involving a “Smart Conference” scenario, which is used in this paper to provide an overview of our experimental approach and the objectives of our longer-term research involving empirical evaluation of semantic techniques.

* This work was partly funded by the Industrial Development Authority (IDA) Ireland and the Science Foundation Ireland via grant 08/SRC/I1403 (FAME).

2 Three Integration Techniques

The simplest of the data integration techniques we cover involves using RDF and OWL axioms by themselves. As will be demonstrated below, this can be achieved using ontological constructs such as rdfs:subPropertyOf, owl:sameAs, etc. without resorting to user-defined rules or SPARQL queries. One advantage of this approach is that at runtime the integration process can be accomplished using commonly available OWL reasoners alone. A major drawback is that the representational limits of OWL may not permit the formulation of the relationships needed for the intended data integration problem.

Our second integration technique is to use a rule language (e.g., SWRL, RIF) to define specific rules that perform the desired integration. This approach permits more complex data integration than OWL alone can afford, such as is often required for joining property values across multiple objects (i.e., relating an object’s property value to a property value of another object). This approach can be used in isolation or alongside OWL reasoning, but for the work described here we focus on the use of rules without an explicit OWL reasoning system.

The final technique investigated involves the use of SPARQL queries to extract and construct the desired data integration from an RDF datastore. A combination of query match clauses (i.e., the SPARQL WHERE clause) and output construction clauses (i.e., the CONSTRUCT clause) can be shown to be equivalent to a rule-based approach. This technique may also be used in conjunction with an OWL/RDF reasoning system (whereby a full or partial materialization of the inferable triples in the knowledge base is pre-stored in the RDF datastore), but for this work it is assumed that the datastore contains only the original (un-materialized) data.

3 Three Problem Domains

We have identified three broad problem domains in which to carry out specific experiments that empirically explore the use of the three integration techniques identified above on problems with different data characteristics.

The first domain involves social network information in the context of a location-based service for a “Smart Conference” scenario; imagine a conference service based around onsite registration using smartphones that automatically identifies and locates your acquaintances who are also in attendance at the conference. The data sources for this problem include a large Friend-of-a-Friend (FOAF) dataset and a semantically enhanced version of the DBLP Computer Science Bibliography (DBLP++). The integration problem is to identify “acquaintances” of a conference attendee, which are defined as either people the attendee foaf:knows or co-authors of papers the attendee has written as indicated by DBLP++. For this domain a series of experiments has been conducted and some of the high-level results are summarized below.

The second problem domain is that of wireless telecommunication networks, with an initial focus on Femto Cell networks owing to the recent establishment of a fully functional Femto Testbed at Bell Labs Ireland. The scenario under development for this domain involves the use of network information (e.g., configuration, performance, customer data) in conjunction with location (e.g., FourSquare, Google Latitude, ALU Presence Server) and context (e.g., smartphone sensors, office equipment proximity, local services) information to provide “Smart Communication” services that assist in efficiently connecting with people, information and networked objects. The integration problem will involve relating dynamic information retrieved from the network with social network and personal smartphone-based information. In this domain, most of the information (in particular all of the network data) will need to be automatically “lifted” into a semantic layer using, for example, DERI’s XSPARQL† engine. We hope to have some preliminary results to share by the time of the OWLED Workshop.

The final problem domain, which we intend to begin working with later this year, is that of sensor networks. This domain is the least well defined as we are waiting for initial developments to materialize in a sister project focused on wide-area Wireless Sensor Networks (WSN). Our main interests will be in semantically describing the sensors and the networks that they comprise, as well as providing the means to semantically access and integrate the data that a federation of WSNs would collectively gather.

4 Smart Conference Experiments

A series of initial experiments carried out in the Smart Conference domain is briefly described here; additional details can be found in a paper submitted to the 2011 IEEE/WIC/ACM International Conference on Web Intelligence.‡

Data and Tools. For this work FOAF datasets were combined from a number of sources, resulting in a dataset with 19,054,802 triples, 340,430 foaf:Person individuals identified by name, and 511,745 occurrences of the foaf:knows relationship. The DBLP dataset used in this work contains 36,439,753 triples, 850,149 person individuals identified by name, and zero occurrences of the foaf:knows relationship. The semantic tools used in these experiments included Pellet for OWL reasoning, Jena for rule processing and Jena TDB/ARQ for SPARQL querying.

Integration Problem. To identify “acquaintances” there is a key piece of implicit information within the DBLP data that we would like to make explicit and integrate with the FOAF data. Specifically, in the case of publications having more than one author (foaf:maker), we can infer that the co-authors are “acquaintances”. As neither dataset explicitly defines a mechanism to capture a symmetric “acquaintance” relationship between two conference attendees, we introduce an additional ontology in which we define a new symmetric OWL object property, “sda:acquaintance”, for our scenario. The challenge then becomes that of mapping the various relationships defined in the FOAF and DBLP datasets onto this sda:acquaintance relationship.

OWL Axioms. Mapping foaf:knows to sda:acquaintance in the FOAF dataset using OWL axioms was done by simply asserting foaf:knows as an rdfs:subPropertyOf sda:acquaintance.
In the DBLP dataset, mapping co-authorship to a “knows” relationship involved two subparts: transforming the multiple foaf:maker relationships between a document and its authors into a co-maker (co-author) relationship, and deriving the sda:acquaintance relationship from that co-maker relationship. Because foaf:maker points from a document to its author, the co-maker relationship was achieved by defining an OWL property chain for sda:co-maker that composes the inverse of foaf:maker with foaf:maker:

    SubObjectPropertyOf( ObjectPropertyChain( ObjectInverseOf( foaf:maker ) foaf:maker ) sda:co-maker )

Deriving sda:acquaintance from co-authorship was achieved by marking the sda:co-maker property chain as a sub-property of the symmetric sda:acquaintance property.

† http://xsparql.deri.org/
‡ http://liris.cnrs.fr/~wi-iat11/WI_2011/

User-defined Rules. For the FOAF dataset a simple Jena rule was defined to effectively make foaf:knows into a subproperty of the symmetric sda:acquaintance property:

    [FoafRule: (?Person2 foaf:knows ?Person1)
        -> (?Person1 sda:acquaintance ?Person2)
           (?Person2 sda:acquaintance ?Person1)]

For the DBLP dataset we infer the acquaintance (co-author) relationship using this rule:

    [AuthorRule: (?Document foaf:maker ?Person1)
                 (?Document foaf:maker ?Person2)
        -> (?Person1 sda:acquaintance ?Person2)
           (?Person2 sda:acquaintance ?Person1)]
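The effect of the two rules above can be illustrated with a small sketch in plain Python. It is neither Jena nor the code used in the experiments: the toy triples, the individuals and the helper names (`apply_foaf_rule`, `apply_author_rule`) are hypothetical, and each rule is read literally as a pattern match over (subject, predicate, object) tuples.

```python
# Illustrative only: a literal reading of FoafRule and AuthorRule over
# in-memory triples. All data and helper names are hypothetical.

KNOWS = "foaf:knows"
MAKER = "foaf:maker"
ACQ = "sda:acquaintance"


def apply_foaf_rule(triples):
    """(?p2 foaf:knows ?p1) -> acquaintance asserted in both directions."""
    derived = set()
    for s, p, o in triples:
        if p == KNOWS:
            derived.add((o, ACQ, s))
            derived.add((s, ACQ, o))
    return derived


def apply_author_rule(triples):
    """(?d foaf:maker ?p1) (?d foaf:maker ?p2) -> acquaintance both ways."""
    makers_by_doc = {}
    for s, p, o in triples:
        if p == MAKER:
            makers_by_doc.setdefault(s, set()).add(o)
    derived = set()
    for makers in makers_by_doc.values():
        for p1 in makers:
            for p2 in makers:  # the two variables may bind the same author
                derived.add((p1, ACQ, p2))
    return derived


triples = {
    ("ex:alice", KNOWS, "ex:bob"),
    ("ex:paper1", MAKER, "ex:bob"),
    ("ex:paper1", MAKER, "ex:carol"),
}
foaf_acq = apply_foaf_rule(triples)
dblp_acq = apply_author_rule(triples)
```

Read literally, the AuthorRule pattern also matches when both variables bind to the same author, so this sketch derives reflexive triples such as (ex:bob sda:acquaintance ex:bob) alongside the intended co-author pairs. Whether such triples matter depends on the application.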

Caveat: with FOAF it is common for a single person to be represented by a number of different foaf:Person instances inter-related by the owl:sameAs property. Without OWL inference, properties for each individual will not be materialized for all other relevant individuals. For this reason we extended the FOAF ruleset as follows:

    [SameAs1: (?x owl:sameAs ?y) (?x ?p ?o) -> (?y ?p ?o)]
    [SameAs2: (?x owl:sameAs ?y) (?s ?p ?x) -> (?s ?p ?y)]
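A minimal sketch of what the SameAs1 and SameAs2 rules compute, again in plain Python rather than Jena. The function name `same_as_closure` and the toy individuals are assumptions made for illustration; the loop simply re-applies both rules until no new triple appears.

```python
# Illustrative only: SameAs1 and SameAs2 applied until a fixpoint is reached.
# The data and the function name are hypothetical.

SAME_AS = "owl:sameAs"


def same_as_closure(triples):
    closed = set(triples)
    while True:
        links = [(s, o) for s, p, o in closed if p == SAME_AS]
        new = set()
        for x, y in links:
            for s, p, o in closed:
                if s == x:   # SameAs1: copy properties of x onto y
                    new.add((y, p, o))
                if o == x:   # SameAs2: copy references to x onto y
                    new.add((s, p, y))
        if new <= closed:    # nothing new was derived, so stop
            return closed
        closed |= new


data = {
    ("ex:jd1", SAME_AS, "ex:jd2"),
    ("ex:jd1", "foaf:name", "John Doe"),
    ("ex:ann", "foaf:knows", "ex:jd1"),
    ("ex:jd2", "foaf:nick", "jd"),
}
closed = same_as_closure(data)
```

In this sketch the name of ex:jd1 and the foaf:knows link pointing at it are both copied to ex:jd2. As written, the two rules propagate only from ?x to ?y, so the foaf:nick value of ex:jd2 is not copied back to ex:jd1 unless the data also contains the reverse owl:sameAs triple.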

SPARQL Queries. For the FOAF dataset SPARQL queries were constructed at runtime to work for a specific individual (in this code example, John Doe):

    CONSTRUCT { ?Person1 sda:acquaintance ?Friend .
                ?Friend sda:acquaintance ?Person1 .
                ?Friend foaf:name ?Friendname .
                ?Person1 foaf:name "John Doe" . }
    WHERE     { ?Person1 foaf:name "John Doe" .
                ?Friend foaf:knows ?Person1 .
                ?Friend foaf:name ?Friendname }
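To make the matching behaviour of this query concrete, the following plain-Python sketch emulates its WHERE and CONSTRUCT clauses over a handful of toy triples. It does not use a SPARQL engine, and the function name `construct_acquaintances` and all individuals are hypothetical.

```python
# Illustrative only: emulates the CONSTRUCT/WHERE query above in plain Python.
# The data and the function name are hypothetical.

ACQ = "sda:acquaintance"


def construct_acquaintances(triples, name):
    names = {}
    for s, p, o in triples:
        if p == "foaf:name":
            names.setdefault(s, set()).add(o)
    out = set()
    for friend, p, person in triples:
        # WHERE: ?Person1 foaf:name <name> . ?Friend foaf:knows ?Person1 .
        if p != "foaf:knows" or name not in names.get(person, ()):
            continue
        # WHERE: ?Friend foaf:name ?Friendname (friends with no name do not match)
        for friend_name in names.get(friend, ()):
            out.add((person, ACQ, friend))
            out.add((friend, ACQ, person))
            out.add((friend, "foaf:name", friend_name))
            out.add((person, "foaf:name", name))
    return out


data = {
    ("ex:jd", "foaf:name", "John Doe"),
    ("ex:ann", "foaf:name", "Ann"),
    ("ex:ann", "foaf:knows", "ex:jd"),
    ("ex:zed", "foaf:knows", "ex:jd"),   # no foaf:name, so not matched
    ("ex:bob", "foaf:knows", "ex:ann"),  # unrelated to John Doe
}
result = construct_acquaintances(data, "John Doe")
```

Two properties of the query show up in the sketch: a friend without a foaf:name is silently dropped because the name pattern is not optional, and owl:sameAs links are not followed at all.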

Unfortunately, this query needs to be extended to return not just the foaf:Person instances but all of their owl:sameAs instances. It is also necessary for the CONSTRUCT clause to fully materialize all sda:acquaintance relationships, and their inverses, for all returned foaf:Person instances and their optional owl:sameAs instances. This makes the FOAF query much more complicated than represented here. A slightly less complex SPARQL query was required for the DBLP dataset, but its details are not presented here due to space constraints.

Results. Performance-wise, SPARQL queries proved the most efficient in terms of compute time and memory requirements (i.e., triples generated) and were able to handle our largest datasets. They were, however, difficult to construct, particularly given the need to re-implement within the queries inferences that are built into OWL reasoners (e.g., owl:sameAs). User-defined rules suffer from the same problems and were arguably even more challenging to develop than queries; this was particularly true in our initial use of SWRL, where its verbose XML syntax added to rule complexity. OWL axioms were by far the easiest to construct and performed reasonably well except on the largest datasets, which could not be handled due to memory constraints.

For experienced OWL practitioners these results may not appear surprising, but for those newly exposed to semantic technologies this type of direct comparison of semantic techniques using readily available tools will hopefully provide instructive insight. Collective results from our planned future experiments should help further reveal how these semantic techniques perform relative to each other under a wider range of problem characteristics.
