Department of Media Technology

Techniques for efficiently storing and querying RDF

Eetu Mäkelä

Outline

● The RDF Data Model
● General Triple Store Architecture
● Examples of Variations and Optimizations
● (Further Examples of Variations and Optimizations)

The RDF Data Model

The RDF Data Model

Subject               Predicate                  Object
http://aalto.fi/Eetu  http://yso.fi/placeOfWork  http://aalto.fi/
http://aalto.fi/Eetu  http://yso.fi/hobby        http://yso.fi/Tea

The RDF Data Model

[Figure: the same data drawn as a graph. Node labels: aalto:Eetu, aalto:Eero, aalto:, yso:Tea, yso:University, yso:School. Edge labels: yso:placeOfWork, yso:hobby, rdf:type, rdfs:subClassOf.]

Sources:
○ Eero's homepage
○ Eetu's homepage
○ Aalto's homepage
○ yso.fi common ontology

The RDF Data Model - Complications: Literals and Blank Nodes

Literals:

Subject               Predicate          Object
http://aalto.fi/Eetu  foaf:name          "Eetu Mäkelä"
http://aalto.fi/      skos:prefLabel     "Aalto-yliopisto"@fi
http://aalto.fi/      skos:prefLabel     "Aalto University"@en
http://aalto.fi/      yso:establishedOn  "2010-01-01"^^xsd:date

Blank nodes:

Subject           Predicate            Object
http://aalto.fi/  vcard:hasAddress     _:a
_:a               vcard:streetAddress  "Otakaari 1B"
_:a               vcard:postalCode     "FI-00076 AALTO"

Querying RDF

SELECT ?placeOfWorkName ?placeOfWorkType WHERE {
  ?person foaf:name "Eetu Mäkelä" .
  ?person yso:placeOfWork ?placeOfWork .
  ?placeOfWork skos:prefLabel ?placeOfWorkName .
  ?placeOfWork rdf:type ?placeOfWorkType .
}

[Figure: the query matched against the example graph. Node labels: aalto:Eetu, aalto:Eero, aalto:, yso:University, yso:School, "Eetu Mäkelä", "Aalto-yliopisto"@fi, "Aalto University"@en. Edge labels: foaf:name, yso:placeOfWork, skos:prefLabel, rdf:type, rdfs:subClassOf.]

Results:
a. ?placeOfWorkName="Aalto-yliopisto"@fi ; ?placeOfWorkType=yso:University .
b. ?placeOfWorkName="Aalto University"@en ; ?placeOfWorkType=yso:University .

General Triple Store Architecture

Querying RDF, the Triple Stores' Point of View

● Need to quickly find all triples matching a single graph pattern
  ○ Helps to be space-efficient
● The graph patterns can be run in arbitrary order, so the store needs to quickly and accurately find the optimal order

  ?person foaf:name "Eetu Mäkelä" .        # get all people named Eetu
  ?person yso:placeOfWork ?placeOfWork .   # retrieve their places of work

vs

  ?person yso:placeOfWork ?placeOfWork .   # get all people's places of work
  ?person foaf:name "Eetu Mäkelä" .        # filter to only those whose name is Eetu
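Why the order matters can be sketched with a toy in-memory evaluator that counts the intermediate bindings each ordering produces. The data set (including the names given to the two extra people), and the `match` and `run` helpers, are assumptions made for illustration and not part of any triple store.

```python
# Toy data: three people with workplaces, only one of them named "Eetu Mäkelä".
# The names of Eero and Anna are invented for this example.
TRIPLES = [
    ("aalto:Eetu", "foaf:name", '"Eetu Mäkelä"'),
    ("aalto:Eero", "foaf:name", '"Eero"'),
    ("aalto:Anna", "foaf:name", '"Anna"'),
    ("aalto:Eetu", "yso:placeOfWork", "aalto:"),
    ("aalto:Eero", "yso:placeOfWork", "aalto:"),
    ("aalto:Anna", "yso:placeOfWork", "aalto:"),
]


def match(pattern, binding):
    """Yield extended bindings for one triple pattern; '?x' terms are variables."""
    for triple in TRIPLES:
        new = dict(binding)
        ok = True
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against its earlier binding.
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield new


def run(patterns):
    """Evaluate patterns left to right; return (results, bindings produced)."""
    bindings, produced = [{}], 0
    for pattern in patterns:
        bindings = [b for old in bindings for b in match(pattern, old)]
        produced += len(bindings)
    return bindings, produced


NAME = ("?person", "foaf:name", '"Eetu Mäkelä"')
WORK = ("?person", "yso:placeOfWork", "?placeOfWork")

selective_first, cost_a = run([NAME, WORK])
selective_last, cost_b = run([WORK, NAME])
```

Both orders return the same single answer, but the second one produces twice as many intermediate bindings on this tiny data set, and the gap grows with the number of people.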

Quickly Finding Triples Matching a Graph Pattern

● Triples have only three parts. Varying the bound parts gives eight different possible triple query patterns:
  ○ SPO = aalto:Eetu yso:placeOfWork aalto: . (does this triple exist?)
  ○ SP? = aalto:Eetu foaf:name ?name .
  ○ S?? = aalto:Eetu ?property ?object .
  ○ S?O = aalto:Eetu ?property aalto: .
  ○ ?PO = ?people yso:placeOfWork aalto: .
  ○ ??O = ?subject ?property yso:Tea .
  ○ ?P? = ?subject rdfs:label ?object .
  ○ ???

● Most triple stores use a configuration of three B+Tree indices:
  ○ SPO (can answer SPO, S??, SP?)
  ○ POS (can answer SPO, ?P?, ?PO)
  ○ OSP (can answer SPO, ??O, S?O)

[Figure: a B+Tree POS index with leaf entries P1,O1,S1; P1,O1,S2; P1,O1,S3; P1,O2,S1; P1,O2,S2; P1,O2,S3; P2,O1,S2. The query ?,P1,O2 is rewritten into the index's key order as P1,O2,? and becomes a lookup on the key prefix P1,O2.]
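The three-index scheme can be sketched as follows. This is a minimal illustration that uses sorted Python lists with binary search as a stand-in for B+Trees; the `find` function and its parameters are assumptions of this sketch, not the API of any particular store.

```python
from bisect import bisect_left

# The example triples from the POS index figure, in (s, p, o) order.
TRIPLES = [("S1", "P1", "O1"), ("S2", "P1", "O1"), ("S3", "P1", "O1"),
           ("S1", "P1", "O2"), ("S2", "P1", "O2"), ("S3", "P1", "O2"),
           ("S2", "P2", "O1")]

# Each index stores every triple, permuted into its own key order and sorted.
ORDERS = {"SPO": (0, 1, 2), "POS": (1, 2, 0), "OSP": (2, 0, 1)}
INDEX = {name: sorted(tuple(t[i] for i in order) for t in TRIPLES)
         for name, order in ORDERS.items()}


def find(s=None, p=None, o=None):
    """Answer a triple pattern (None = unbound) with a prefix scan on one index."""
    query = (s, p, o)
    bound = sum(term is not None for term in query)

    def bound_prefix(order):
        # How many leading key positions of this index are bound in the query?
        n = 0
        for i in order:
            if query[i] is None:
                break
            n += 1
        return n

    # Pick an index whose key order starts with all the bound positions.
    name = next(n for n, order in ORDERS.items() if bound_prefix(order) == bound)
    order = ORDERS[name]
    prefix = tuple(query[i] for i in order[:bound])
    rows = INDEX[name]
    result = []
    for row in rows[bisect_left(rows, prefix):]:
        if row[:bound] != prefix:
            break                      # left the range of matching keys
        triple = [None] * 3
        for position, value in zip(order, row):
            triple[position] = value   # permute back to (s, p, o)
        result.append(tuple(triple))
    return result
```

For example, `find(p="P1", o="O2")` is served by the POS index and `find(s="S2", o="O1")` by the OSP index; every one of the eight patterns has some index in which its bound parts form a key prefix.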

Internal Resource Identifiers and Node to ID Indices

● It is beneficial for the B+Trees to be space-efficient, but URIs and literals take up lots of space
● IDEA: use 64-bit numerical identifiers in the B+Trees, and add new indices to map from internal ids to terms and back

[Figure: the POS index rewritten with ids. Mappings in both directions: 1 <-> P1, 2 <-> O1, 3 <-> S1, ... The index entries become 1,2,3; 1,2,4; 1,2,5; 1,6,3; 1,6,4; 1,6,5; 7,2,4, and the query P1,O2,? becomes 1,6,?.]
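A minimal sketch of the id mapping idea, assuming a simple counter-based id assignment. The `Dictionary` class and the `subjects` helper are hypothetical names for this illustration, and Python integers are not fixed at 64 bits as they would be in a real store.

```python
class Dictionary:
    """Maps RDF terms to small integer ids and back (term2id / id2term)."""

    def __init__(self):
        self.term2id = {}
        self.id2term = []          # position i holds the term with id i + 1

    def encode(self, term):
        """Return the id of a term, assigning the next free id on first use."""
        if term not in self.term2id:
            self.id2term.append(term)
            self.term2id[term] = len(self.id2term)
        return self.term2id[term]

    def decode(self, ident):
        return self.id2term[ident - 1]


d = Dictionary()
triples = [
    ("http://aalto.fi/Eetu", "http://yso.fi/placeOfWork", "http://aalto.fi/"),
    ("http://aalto.fi/Eetu", "http://yso.fi/hobby", "http://yso.fi/Tea"),
]
# The POS index stores only id triples, in (p, o, s) order.
pos_index = sorted((d.encode(p), d.encode(o), d.encode(s)) for s, p, o in triples)


def subjects(p, o):
    """Answer ?PO: translate the query to ids, look up, translate results back."""
    key = (d.term2id[p], d.term2id[o])
    # A linear scan keeps the sketch short; a real index would do a range scan.
    return [d.decode(s) for (pi, oi, s) in pos_index if (pi, oi) == key]
```

The long URIs are stored once in the dictionary, while the index itself holds only compact id triples such as `(1, 2, 3)`.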

Quickly Finding Triples Matching a Graph Pattern

# Find a limited set of products matching multiple constraints and get some info on them
SELECT ?product ?label WHERE {
  ?product rdf:type bsbm:footwear .
  ?product bsbm:feature bsbm:stripes .
  ?product bsbm:feature bsbm:laces .
  ?product bsbm:priceEUR ?price .
  FILTER ( ?price < 120 )
  ?product rdfs:label ?label .
}

VS

# Enumerate the prices of all products of a certain type for averaging
SELECT ?product (AVG(?price) AS ?avgPrice) WHERE {
  ?product rdf:type bsbm:footwear .
  ?product bsbm:priceEUR ?price .
}
GROUP BY ?product

Ordering Query Patterns

● Most triple stores use B+Tree divergence depth to approximate the number of results
● Melior uses a modified B+Tree that keeps accurate counts of the triples in each subtree. It also supports runtime sampling.
● BUT! Jena TDB gets surprisingly good results using basically just a few rules, preferring patterns in this order:
  a. SPO = aalto:Eetu yso:placeOfWork aalto: . (does this triple exist?)
  b. SP? = aalto:Eetu foaf:name ?name .
  c. ?PO = ?people yso:placeOfWork aalto: .
     ■ except "? rdf:type O"
  d. Others
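The rule list above can be sketched as a static ranking function. This is only an illustration of rules a to d as written on the slide, not Jena TDB's actual code; `rank` and `order_patterns` are hypothetical names, and the sketch ignores the fact that variables become bound as earlier patterns are evaluated.

```python
def is_var(term):
    return term.startswith("?")


def rank(pattern):
    """Lower rank = evaluate earlier. Mirrors rules a to d."""
    s, p, o = pattern
    bound = (not is_var(s), not is_var(p), not is_var(o))
    if bound == (True, True, True):
        return 0                      # a. SPO
    if bound == (True, True, False):
        return 1                      # b. SP?
    if bound == (False, True, True) and p != "rdf:type":
        return 2                      # c. ?PO, except "? rdf:type O"
    return 3                          # d. others


def order_patterns(patterns):
    # sorted() is stable, so patterns with equal rank keep their query order.
    return sorted(patterns, key=rank)


query = [
    ("?product", "rdf:type", "bsbm:footwear"),
    ("?product", "bsbm:feature", "bsbm:stripes"),
    ("?product", "bsbm:priceEUR", "?price"),
    ("?product", "rdfs:label", "?label"),
]
ordered = order_patterns(query)
```

Here the `bsbm:feature` pattern moves to the front, while the `rdf:type` pattern is demoted to the "others" group by the exception in rule c.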

Examples of Variations and Optimizations

Triple Indices in Virtuoso

● Typical query patterns have, upon evaluation, either PO or SP bound:
  a. ?object rdf:type foaf:Person .
  b. ?object foaf:name ?name .

SPO         POS         OSP
S1 P1 O1    P1 O1 S1    O1 S1 P1
S1 P2 O2    P1 O1 S2    O1 S2 P1
S2 P1 O1    P2 O2 S1    O2 S1 P2

VS

PSO         POS         OP       SP
P1 S1 O1    P1 O1 S1    O1 P1    S1 P1
P1 S2 O1    P1 O1 S2    O2 P2    S1 P2
P2 S1 O2    P2 O2 S1             S2 P1

● S?? and ??O are answered by enumerating all P from the SP/OP indices and querying PS?/PO?
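How a subject-only or object-only lookup works with this index set can be sketched with nested dictionaries standing in for the PSO, POS, SP and OP indices. This is an illustrative sketch under that assumption, not Virtuoso's implementation; `by_subject` and `by_object` are hypothetical names.

```python
from collections import defaultdict

TRIPLES = [("S1", "P1", "O1"), ("S1", "P2", "O2"), ("S2", "P1", "O1")]

pso = defaultdict(lambda: defaultdict(set))   # P -> S -> {O}  (full index)
pos = defaultdict(lambda: defaultdict(set))   # P -> O -> {S}  (full index)
sp = defaultdict(set)                         # S -> {P}       (partial index)
op = defaultdict(set)                         # O -> {P}       (partial index)

for s, p, o in TRIPLES:
    pso[p][s].add(o)
    pos[p][o].add(s)
    sp[s].add(p)
    op[o].add(p)


def by_subject(s):
    """Subject bound only: list the predicates of s from SP, then query PS? for each."""
    return sorted((s, p, o) for p in sp.get(s, ()) for o in pso[p][s])


def by_object(o):
    """Object bound only: list the predicates of o from OP, then query PO? for each."""
    return sorted((s, p, o) for p in op.get(o, ()) for s in pos[p][o])
```

The partial indices are small because they hold each subject/predicate or object/predicate pair only once, at the price of one extra lookup per predicate for these less common patterns.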

Triple Indices in 4store

SPO         POS         OSP
S1 P1 O1    P1 O1 S1    O1 S1 P1
S1 P2 O2    P1 O1 S2    O1 S2 P1
S2 P1 O1    P2 O2 S1    O2 S1 P2

VS one pair of indices per predicate:

P1SO: S1 -> O1; S2 -> O1
P1OS: O1 -> S1, S2
P2SO: S1 -> O2
P2OS: O2 -> S1
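The per-predicate layout in the figure can be sketched as a map from each predicate to its own SO and OS index. This is a simplified illustration built on Python dictionaries, not 4store's actual data structures; `PredicateIndex`, `store` and `find` are hypothetical names.

```python
from collections import defaultdict


class PredicateIndex:
    """The two indices kept for one predicate: S -> objects and O -> subjects."""

    def __init__(self):
        self.so = defaultdict(list)
        self.os = defaultdict(list)


store = defaultdict(PredicateIndex)
for s, p, o in [("S1", "P1", "O1"), ("S1", "P2", "O2"), ("S2", "P1", "O1")]:
    store[p].so[s].append(o)
    store[p].os[o].append(s)


def find(s=None, p=None, o=None):
    """Answer a pattern; with P unbound, every predicate's indices are visited."""
    predicates = [p] if p is not None else sorted(store)
    out = []
    for pred in predicates:
        if pred not in store:
            continue
        index = store[pred]
        if s is not None:
            out += [(s, pred, obj) for obj in index.so.get(s, []) if o in (None, obj)]
        elif o is not None:
            out += [(sub, pred, o) for sub in index.os.get(o, [])]
        else:
            out += [(sub, pred, obj) for sub in sorted(index.so) for obj in index.so[sub]]
    return out
```

Patterns with a bound predicate touch exactly one pair of indices, while patterns with an unbound predicate have to loop over all predicates.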

Triple Index Compression

● Size: 30% of uncompressed

POS (uncompressed)    POS (compressed)
P1 O1 S1              P1 O1 S1 S2
P1 O1 S2              P2 O2 S1
P2 O2 S1

Index            Size
SPO              19.31%
POS              10.16%
OSP              14.13%
SOP              19.65%
PSO              13.44%
OPS              10.67%
SP               11.91%
OP               13.52%
SPO+POS+OSP      14.53%
SOP+PSO+OPS      14.59%
POS+PSO+OP+SP    11.83%

Packed POS+PSO+OP+SP = 9.24% of baseline SPO+POS+OSP
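The compression in the POS figure, where a repeated key prefix is stored only once, can be sketched as follows. The functions `pack`, `unpack` and `cells` are hypothetical helpers for this illustration; counting stored values is only a crude stand-in for the byte sizes reported above, and Melior's real encoding is not shown in the slides.

```python
from itertools import groupby


def pack(rows):
    """Group sorted (a, b, c) rows so each a and each (a, b) is stored once."""
    packed = []
    for a, by_a in groupby(rows, key=lambda r: r[0]):
        level2 = []
        for b, by_ab in groupby(by_a, key=lambda r: r[1]):
            level2.append((b, [r[2] for r in by_ab]))
        packed.append((a, level2))
    return packed


def unpack(packed):
    """Expand the grouped form back into full rows."""
    return [(a, b, c) for a, level2 in packed for b, cs in level2 for c in cs]


def cells(packed):
    """Number of stored values in the grouped form."""
    return sum(1 + sum(1 + len(cs) for _, cs in level2) for _, level2 in packed)


pos = [("P1", "O1", "S1"), ("P1", "O1", "S2"), ("P2", "O2", "S1")]
packed = pack(pos)
```

On the three example rows the grouped form stores 7 values instead of 9; the saving grows with the number of rows that share a prefix, which is why an index such as POS, where many subjects share one predicate and object, compresses well.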

Grouped Execution of Triples in Melior

SELECT ?product ?label WHERE {
  ?product rdf:type bsbm:footwear .
  ?product bsbm:feature bsbm:stripes .
  ?product bsbm:feature bsbm:laces .
  ?product bsbm:priceEUR ?price .
  FILTER ( ?price < 120 )
  ?product rdfs:label ?label .
}

[Figure: two diagrams of lookups on the POS index compared (VS), over the entries P1,O1,S1; P1,O1,S2; P2,O2,S1.]

Melior Benchmark Results

All Optimizations in Melior

● Efficient B+Tree index compression (1/10 space consumption)
● Efficient node2id/id2node index compression and aggressive node inlining (1/10 space consumption)
● Range queries answered directly by range scans on B+Tree indices (at times 15x speedup)
● Efficient counts (at times 40x speedup) and triple pattern ordering using count information stored in modified B+Tree
● Triple pattern ordering based on sampling
● Constraint tightening based on variable co-occurrence in other patterns
● Grouped evaluation of triple patterns
● Eager or delayed evaluation of filters
● Parallel evaluation of triple patterns, joins and filters in general

Further Examples of Variations and Optimizations

Internal Resource Identifiers and Node to ID Indices: Common Optimizations

● URI prefix compression, with p1 = http://en.wikipedia.org/wiki/
● Inlining of integer values into the id itself:
  ∀x∈[xsd:int]: id(x)=0b100...x -> id(1)=0b100..001, id(2)=0b100..010
● Compression of long literals

Id  Term                                             Stored as
1   http://en.wikipedia.org/wiki/Blank_Node          1  p1:Blank_Node
2   http://en.wikipedia.org/wiki/RDF                 2  p1:RDF
3   http://en.wikipedia.org/wiki/...                 3  p1:...
4   "1"^^xsd:integer                                 (inlined into the id)
5   "2"^^xsd:integer                                 (inlined into the id)
6   "3"^^xsd:integer                                 (inlined into the id)
7   "very long literal that just goes on forev.."    4  "gz%#!"aSDFG"
8   "another very long literal that just goes.."     5  "gz%#!CVS%?"
9   "yet another very long literal that just.."      6  "gz%#57Aa?m"

Queries shown alongside: the FILTER ( ?price < 120 ) query and the AVG(?price) ... GROUP BY ?product query used earlier.
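The inlining formula can be sketched with plain bit operations. The exact bit layout is an assumption of this sketch (top bit of a 64-bit id as the inline flag, remaining 63 bits as the value, non-negative integers only); the slides only give the general shape `0b100...x`.

```python
INLINE_FLAG = 1 << 63            # top bit set = the value is stored in the id itself
PAYLOAD_MASK = INLINE_FLAG - 1


def inline_id(value):
    """Encode a non-negative integer directly into a 64-bit id."""
    if not 0 <= value <= PAYLOAD_MASK:
        raise ValueError("value does not fit in 63 bits")
    return INLINE_FLAG | value


def is_inlined(ident):
    return bool(ident & INLINE_FLAG)


def inlined_value(ident):
    return ident & PAYLOAD_MASK


prices = [inline_id(p) for p in (150, 99, 120)]
# FILTER ( ?price < 120 ) becomes an integer comparison on the ids themselves,
# because inlined ids sort in the same order as the values they carry.
cheap = [inlined_value(i) for i in prices if i < inline_id(120)]
```

With this layout neither the filter nor an average over prices needs a lookup in the id-to-term index, since the value can be read straight out of the id.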

Node to ID Index Variations

● Jena TDB
  ○ term2id (B+Tree): MD5SUM(uri1)->1, MD5SUM(uri2)->6, MD5SUM(literal1)->11, ...
  ○ id2term (linear file): terms are stored back to back, and the id is the term's position in the file (uri1 at 1-5, uri2 at 6-10, literal1 at 11-19, ...)
● 4store
  ○ term2id: ∀x id(x)=64bitHASH(x)
  ○ id2term (hashmap): 64bitHASH(uri1)->uri1, 64bitHASH(uri2)->uri2, 64bitHASH(literal1)->literal1, ...
● Melior
  ○ Layout similar to TDB; aggressive inlining and compression result in 1/10 space usage and better colocation
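The two layouts can be contrasted in a short sketch. It is an illustration of the ideas only: a dictionary stands in for the B+Tree, a `bytearray` for the linear file, offsets start at 0 rather than 1, and truncated MD5 is an arbitrary choice for the 64-bit hash, since the slides do not name the hash function. `NodeTable` and `hash_id` are hypothetical names.

```python
import hashlib


class NodeTable:
    """TDB-like layout: term2id keyed by MD5 digest, id2term as one linear file."""

    def __init__(self):
        self.term2id = {}          # stand-in for the B+Tree
        self.data = bytearray()    # stand-in for the linear file

    def encode(self, term):
        raw = term.encode("utf-8")
        key = hashlib.md5(raw).digest()
        if key not in self.term2id:
            self.term2id[key] = len(self.data)     # id = offset of the record
            self.data += len(raw).to_bytes(4, "big") + raw
        return self.term2id[key]

    def decode(self, ident):
        # The id points straight at a length-prefixed record in the file.
        size = int.from_bytes(self.data[ident:ident + 4], "big")
        return self.data[ident + 4:ident + 4 + size].decode("utf-8")


def hash_id(term):
    """Hash-based id: derived from the term alone, with no counter or offset."""
    return int.from_bytes(hashlib.md5(term.encode("utf-8")).digest()[:8], "big")


table = NodeTable()
a = table.encode("http://aalto.fi/Eetu")
b = table.encode('"Aalto University"@en')
```

In the offset-based layout, decoding an id is a single read at a known position, whereas hash-based ids can be computed without consulting any index but need a separate map to get back to the term.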

Melior Optimizations from Pattern Index Analysis

● In the SP index, consecutive subjects often have exactly the same sets of properties
● In the PSO index, there are three cases depending on the P:
  ○ if P is akin to "rdf:type", usually there is a long list of S with the same O (?a rdf:type foaf:Person)
  ○ if P is a literal property such as "wgs84:lat", usually there is only one differing O per S
  ○ if P is akin to a keyword property such as "dc:subject", usually there are multiple O for each S
● Packed POS+PSO+OP+SP = 9.24% of baseline SPO+POS+OSP
