Materializing and Querying Learned Knowledge

Volker Tresp, Yi Huang, Siemens Corporate Technology

Markus Bundschus, University of Munich

Achim Rettinger, Technical University of Munich

Deductive and Inductive Reasoning 

In many Semantic Web (SW) domains, a tremendous number of statements (expressed as triples) might be true, but only a small number of them are known to be true or can be inferred to be true.



• There are regularities in the data
• But: they cannot be captured by axioms
• Example: tall people tend to weigh more



The goal of this work:
• Estimate the truth values of statements by exploring regularities in the SW data with machine learning
• Store the probabilistic triples in the SW-KB
• Make those triples available for querying

IRMLES 2009

A Regular SPARQL Query

Query (including deductive inference): find all actors that act in movies that are filmed in an Italian city
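The query itself appeared as an image on the slide. As a rough sketch of what its deductive part has to do, the following Python answers it over an in-memory set of triples. The vocabulary (`ex:actsIn`, `ex:filmedIn`, `ex:locatedIn`) and all data are hypothetical, and `ex:locatedIn` is assumed to be transitive; this is an illustration, not the SPARQL engine used in the work.

```python
# Hypothetical toy knowledge base; all names are illustrative only.
TRIPLES = {
    ("ex:ActorA", "ex:actsIn", "ex:Movie1"),
    ("ex:ActorB", "ex:actsIn", "ex:Movie2"),
    ("ex:Movie1", "ex:filmedIn", "ex:Rome"),
    ("ex:Movie2", "ex:filmedIn", "ex:Boston"),
    ("ex:Rome", "ex:locatedIn", "ex:Lazio"),
    ("ex:Lazio", "ex:locatedIn", "ex:Italy"),
    ("ex:Boston", "ex:locatedIn", "ex:USA"),
}


def objects(s, p, triples):
    """All objects o such that (s, p, o) is in the triple set."""
    return {o for (s2, p2, o) in triples if s2 == s and p2 == p}


def located_in_closure(place, triples):
    """Deductive step (assumption): treat ex:locatedIn as transitive."""
    seen, frontier = set(), [place]
    while frontier:
        x = frontier.pop()
        for o in objects(x, "ex:locatedIn", triples):
            if o not in seen:
                seen.add(o)
                frontier.append(o)
    return seen


def actors_in_movies_filmed_in(country, triples):
    """Actors acting in a movie filmed in a city (transitively) located in `country`."""
    result = set()
    for (s, p, movie) in triples:
        if p != "ex:actsIn":
            continue
        for city in objects(movie, "ex:filmedIn", triples):
            if country in located_in_closure(city, triples):
                result.add(s)
    return result


print(sorted(actors_in_movies_filmed_in("ex:Italy", TRIPLES)))  # ['ex:ActorA']
```

In SPARQL 1.1, the same transitive step could be expressed with a property path such as `?city ex:locatedIn+ ex:Italy`.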


A SPARQL Query with Learned Probabilities

Query (including deduction and induction): find all actors that are likely to act in movies that are filmed in an Italian city

Annotations: learn! · integrate in query · sort by probability

Answers: Ronald Reagan, George W. Bush, … Damn … should have excluded US presidents
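A minimal sketch of the inductive extension, assuming the learned triples are stored together with a probability and that the deductive part of the query has already produced the set of movies filmed in Italian cities. All names, probabilities, and the `min_prob` threshold are hypothetical; the `exclude` set shows one way unwanted answers (such as the US presidents above) could be filtered out.

```python
from typing import NamedTuple


class ProbTriple(NamedTuple):
    s: str
    p: str
    o: str
    prob: float


# Hypothetical learned triples; the probabilities are invented for illustration.
LEARNED = [
    ProbTriple("ex:ActorA", "ex:actsIn", "ex:Movie1", 0.91),
    ProbTriple("ex:ActorC", "ex:actsIn", "ex:Movie1", 0.42),
    ProbTriple("ex:ActorD", "ex:actsIn", "ex:Movie3", 0.77),
    ProbTriple("ex:ActorE", "ex:actsIn", "ex:Movie2", 0.88),
]


def likely_actors(movies, learned, min_prob=0.5, exclude=frozenset()):
    """Return (actor, movie, prob) rows for the given movies, highest probability first."""
    rows = [(t.s, t.o, t.prob) for t in learned
            if t.p == "ex:actsIn" and t.o in movies
            and t.prob >= min_prob and t.s not in exclude]
    return sorted(rows, key=lambda row: row[2], reverse=True)


italian_movies = {"ex:Movie1", "ex:Movie3"}  # assumed output of the deductive part
for actor, movie, p in likely_actors(italian_movies, LEARNED):
    print(actor, movie, p)
```

Returning one row per (actor, movie) pair mirrors a SPARQL result set; how several matching movies per actor should be combined into one score is left open here.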

Requirements

• Machine learning should be “push-button”, requiring a minimum of user intervention
• Learning time should scale well with the size of the SW
• The statements and their probabilities predicted by machine learning should be easy to integrate into SPARQL-type querying
• Machine learning should suit the data situation on the SW, with sparse data (e.g., only a small number of persons are friends) and missing information (e.g., some people don't reveal private information)

The Key Steps
• User defines the key entity (person)
• User defines the population (persons that are actors)
• LarKC: defines the sample (a subset of the population)
• LarKC: finds all triples in which the key entity is either the subject or the property value
• Calculate aggregate features
• The (sparse, incomplete) data matrix is generated (including deduced triples)
• Pruning: columns with few ones are removed
• Learning by matrix completion methods
• The learned model makes predictions for the sample
• The learned model is applied to the population
• A subset of the estimated (probabilistic) triples is written into the triple store
• Queries can be formulated
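The matrix-generation and pruning steps can be sketched as follows, assuming triples are plain `(subject, predicate, object)` tuples and that deduced triples have been added beforehand. The column layout (one binary column per `(property, value)` pair, plus an "inverse" column when the key entity is the property value) and the `min_ones` threshold are assumptions for illustration; aggregate features are not shown.

```python
from collections import Counter


def build_matrix(triples, key_entities, min_ones=2):
    """Build a binary key-entity-by-attribute matrix from (s, p, o) triples.

    Rows follow the order of `key_entities`; each column is a (property, value)
    pair. Assumption for illustration: when the key entity is the property
    value, the column is labeled (property + "^-1", subject).
    Columns with fewer than `min_ones` ones are pruned.
    """
    keys = set(key_entities)
    row_cols = {k: set() for k in key_entities}
    for s, p, o in triples:
        if s in keys:
            row_cols[s].add((p, o))
        if o in keys:
            row_cols[o].add((p + "^-1", s))
    counts = Counter(col for cols in row_cols.values() for col in cols)
    kept = sorted(col for col, n in counts.items() if n >= min_ones)  # pruning
    matrix = [[1 if col in row_cols[k] else 0 for col in kept] for k in key_entities]
    return kept, matrix


# Hypothetical toy triples (deduced triples are assumed to be included already).
triples = {
    ("Joe", "knows", "Jack"), ("Mary", "knows", "Jack"),
    ("Joe", "residence", "Boston"), ("Mary", "residence", "Boston"),
    ("Jack", "residence", "NYC"),
}
columns, X = build_matrix(triples, ["Joe", "Jack", "Mary"])
print(columns)  # [('knows', 'Jack'), ('residence', 'Boston')]
print(X)        # [[1, 1], [0, 0], [1, 1]]
```

Keeping the row sets sparse until pruning avoids materializing columns that would be discarded anyway.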

FOAF Experiment

[Figure: example FOAF graph around the persons Joe, Jack, and Mary, with edges labeled knows, attends, residence, inRegion, locatedIn, dateOfBirth, ageGroup, post (#ofBlogs), holds (OnlineChatAccount), hasImage (image), and type; other nodes include Harvard, Ivy League, Boston, NE-US, 1980, and ThirtySomething. Annotation: some triples come from subclass relations.]

RULE: If born between 1979 and 1989, then in ageGroup ThirtySomething
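A rule like this can be applied by simple forward chaining before the data matrix is built, so that the derived ageGroup triples become ordinary columns. This sketch assumes the property is named `dateOfBirth`, that its value is a year, and that "between" includes both endpoints; all of these are assumptions for illustration.

```python
def apply_age_rule(triples):
    """Forward-chain: dateOfBirth in [1979, 1989] -> ageGroup ThirtySomething (bounds assumed inclusive)."""
    derived = {
        (s, "ageGroup", "ThirtySomething")
        for s, p, o in triples
        if p == "dateOfBirth" and 1979 <= int(o) <= 1989
    }
    return set(triples) | derived


kb = {("Joe", "dateOfBirth", "1980"), ("Mary", "dateOfBirth", "1975")}
print(sorted(apply_age_rule(kb)))
```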

Data Matrix (FOAF)

[Figure: binary data matrix with one row per person (Joe, Jack, Mary) and one column per attribute, e.g., knowsJoe, knowsJack, knowsMary, ageGroupThirtySomething, residenceInRegionNE-US, residenceBoston, residenceNYC, holdsOnlineChatAccount, hasImage]

FOAF Experiment Statistics
• We selected 636 persons with "dense" friendship information
• On average, a given person has 18 friends
• Numerical values such as date of birth or the number of blog posts were discretized
• After pruning columns with few ones, the resulting data matrix has 636 persons (rows) and 491 columns
• 462 of the 491 columns (friendship attributes) refer to the property knows
• The remaining 29 columns (general attributes) refer to general information about age, location, number of blog posts, attended school, etc.
• We can then answer queries such as:
  - Who would likely want to be Jack's friend?
  - Which female persons in the north-east US would likely want to be Jack's friends?
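Once the matrix has been completed, a query such as "Who would likely want to be Jack's friend?" amounts to ranking persons by their predicted value in the knows-Jack column. A sketch, assuming the predictions are available as a dictionary and the general attributes as sets of labels; the function name, labels, and scores are hypothetical.

```python
def rank_friend_candidates(scores, known_friends, attrs=None,
                           required=frozenset(), top_k=None):
    """Rank persons by predicted score, skipping existing friends and keeping
    only persons whose attribute labels include all labels in `required`."""
    attrs = attrs or {}
    candidates = [(person, score) for person, score in scores.items()
                  if person not in known_friends
                  and set(required) <= attrs.get(person, set())]
    candidates.sort(key=lambda ps: (-ps[1], ps[0]))  # highest score first, ties by name
    return candidates[:top_k] if top_k is not None else candidates


# Hypothetical predicted scores for the "knows Jack" column.
scores = {"Ann": 0.8, "Bob": 0.6, "Cara": 0.9, "Dan": 0.3}
attrs = {"Ann": {"female", "NE-US"}, "Bob": {"NE-US"}, "Dan": {"female", "NE-US"}}
print(rank_friend_candidates(scores, {"Cara"}, attrs, required={"female", "NE-US"}))
# [('Ann', 0.8), ('Dan', 0.3)]
```

The second example query from the slide corresponds to the `required` filter; without it, the first query is a plain ranking.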

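Learning Approaches
• SVD based: $X = U D V^T$; rank-$r$ reconstruction $\hat{X} = U D^{(r)} V^T$, where $D^{(r)}$ retains only the $r$ largest singular values
• NNMF: $X = A B^T$ with $a_{i,j} \ge 0$, $b_{i,j} \ge 0$
• LDA: $X = A B^T$ with $a_{i,k} = P(\mathrm{attr} = i \mid z = k)$ and $b_{i,k} = P(\mathrm{KE} = i \mid z = k)$ (KE: key entity)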
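To make the first two factorizations concrete, here is a small pure-Python sketch (a real system would use a numerical library). `top_singular_vectors` obtains the leading right singular vectors by power iteration with deflation, and `svd_reconstruct_row` computes $\hat{x} = x V_r V_r^T$, which for a row of $X$ equals the corresponding row of the rank-$r$ reconstruction; applied to a new row, it is one way to generalize from the sample to the population at a cost linear in the number of rows. `nnmf` uses the Lee-Seung multiplicative updates for the squared error. These are standard algorithms chosen for illustration, not necessarily the solvers used in the work; LDA is not sketched.

```python
import math
import random


def matmul(P, Q):
    """Dense product of two list-of-lists matrices."""
    return [[sum(P[i][t] * Q[t][j] for t in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def transpose(P):
    return [list(col) for col in zip(*P)]


def top_singular_vectors(X, r, iters=300, seed=0):
    """Top-r singular values and right singular vectors via power iteration with deflation."""
    rng = random.Random(seed)
    m, n = len(X), len(X[0])
    R = [row[:] for row in X]  # residual matrix
    d, V = [], []
    for _ in range(r):
        v = [rng.random() for _ in range(n)]
        for _ in range(iters):
            u = [sum(R[i][j] * v[j] for j in range(n)) for i in range(m)]  # R v
            w = [sum(R[i][j] * u[i] for i in range(m)) for j in range(n)]  # R^T R v
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                return d, V  # residual is zero: rank of X is below r
            v = [x / norm for x in w]
        u = [sum(R[i][j] * v[j] for j in range(n)) for i in range(m)]
        d.append(math.sqrt(sum(x * x for x in u)))  # singular value = ||R v||
        V.append(v)
        for i in range(m):  # deflate: R <- R - (R v) v^T
            for j in range(n):
                R[i][j] -= u[i] * v[j]
    return d, V


def svd_reconstruct_row(x, V):
    """x_hat = x V_r V_r^T; linear in the row length."""
    coeffs = [sum(xi * vi for xi, vi in zip(x, v)) for v in V]
    return [sum(c * v[j] for c, v in zip(coeffs, V)) for j in range(len(x))]


def nnmf(X, k, iters=500, seed=0, eps=1e-9):
    """X ~ A B^T with A, B >= 0, via Lee-Seung multiplicative updates."""
    rng = random.Random(seed)
    m, n = len(X), len(X[0])
    A = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    B = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    for _ in range(iters):
        num, den = matmul(transpose(X), A), matmul(B, matmul(transpose(A), A))
        B = [[B[j][c] * num[j][c] / (den[j][c] + eps) for c in range(k)] for j in range(n)]
        num, den = matmul(X, B), matmul(A, matmul(transpose(B), B))
        A = [[A[i][c] * num[i][c] / (den[i][c] + eps) for c in range(k)] for i in range(m)]
    return A, B
```

In both cases the reconstructed entries for unobserved cells can serve as scores for candidate triples.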

Experimental Results

[Figure: NDCG score for different learning approaches]
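The slides do not state which NDCG variant was used. A common definition, assumed here with binary relevance (1 if the person is actually a friend, else 0), is $\mathrm{DCG}@k = \sum_{i=1}^{k} (2^{rel_i} - 1)/\log_2(i+1)$ and $\mathrm{NDCG}@k = \mathrm{DCG}@k / \mathrm{IDCG}@k$, where IDCG is the DCG of the ideal ranking. A minimal sketch:

```python
import math


def dcg(gains, k=None):
    """Discounted cumulative gain with the (2^rel - 1) / log2(rank + 1) form."""
    return sum((2 ** g - 1) / math.log2(rank + 1)
               for rank, g in enumerate(gains[:k], start=1))


def ndcg(ranked, relevant, k=None):
    """NDCG of a ranked list against a set of relevant items (binary relevance)."""
    gains = [1 if item in relevant else 0 for item in ranked]
    ideal = [1] * len(relevant)  # best case: all relevant items ranked first
    best = dcg(ideal, k)
    return dcg(gains, k) / best if best > 0 else 0.0


print(ndcg(["b", "a"], {"a"}))  # the only relevant item at rank 2: 1 / log2(3)
```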

Persisting Probabilistic Triples

• Quadruple: PersonA foaf:knows PersonB 0.758
• Reification (simplest, but high memory cost): _:node rdf:type rdf:Statement; _:node rdf:subject PersonA; _:node rdf:predicate foaf:knows; _:node rdf:object PersonB; _:node prob 0.758
• Blank node: PersonA kp _:node; _:node foaf:knows PersonB; _:node prob 0.758
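A sketch of the first two encodings, with reification written as Turtle-style statements. The reification vocabulary (`rdf:Statement`, `rdf:subject`, `rdf:predicate`, `rdf:object`) is standard RDF; the `ex:` namespace, the property `ex:prob` (standing in for `prob` on the slide), and the `xsd:double` typing are assumptions. The blank-node variant is not sketched.

```python
def as_quadruple(s, p, o, prob):
    """One row with the probability as a fourth column (e.g. a four-column table)."""
    return (s, p, o, prob)


def as_reification(s, p, o, prob, node="_:node"):
    """Standard RDF reification plus a hypothetical ex:prob property."""
    return [
        (node, "rdf:type", "rdf:Statement"),
        (node, "rdf:subject", s),
        (node, "rdf:predicate", p),
        (node, "rdf:object", o),
        (node, "ex:prob", f'"{prob}"^^xsd:double'),
    ]


def to_turtle(triples):
    """Serialize (s, p, o) tuples as Turtle statements; prefix declarations omitted."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


print(as_quadruple("ex:PersonA", "foaf:knows", "ex:PersonB", 0.758))
print(to_turtle(as_reification("ex:PersonA", "foaf:knows", "ex:PersonB", 0.758)))
```

Reification needs five triples per learned statement, whereas the quadruple needs one row, which illustrates the memory trade-off.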

Results: Who wants to be Trelena’s Friend


Conclusion and Outlook
• We presented a novel generic learning approach for deriving probabilistic SW statements and demonstrated how these can be integrated into an extended SPARQL query
• The approach is suitable for typical situations with sparse/missing data
• The learning process is to a large degree autonomous (goal!)
• Generalization from the sample to the population is linear in the size of the population (matrix!)
• Learned statements are materialized for fast querying
• LDA showed the best performance (Bayesian averaging)
• Part of EU-FP7: LarKC

