A Framework for Access Methods for Versioned Data

Viewer
Transcript

A Framework for Access Methods for Versioned Data B. Salzberg, L. Jiang, D. Lomet, M. Barrena, J. Shan & E. Kanoulas

Outline •  Motivation •  Introducing versions through the examples •  Versions and version ranges •  Data pages –  Page splitting and consolidation –  Efficiency guarantee

•  Index pages •  Conclusions

Motivation •  Historical archives need to be retained –  Medical, banking, …

•  Different historical versions created along different branches must be reconstructed –  Software libraries, design, …

•  Access methods for versioned data have been proposed

Motivation •  We present a framework for constructing and understanding versioned access methods •  Central point: the study of version splitting of units of data storage (disk pages) •  Main goal: to make the stabbing query efficient: “Find all data alive at this version”

The two-version example •  Records format: version v1 Set of versions for which record k2 does not change

version v2

Redundancy What k2 use does Couldifwe not for a startchange and end large number of versions labels? <{v1,v2},k2,d2> versions? <{v1,v2},k3,d3>

The three-version example branch b2

v3

v1

v2

branch b1

Key space k3 k2 k1

v1

version v2

version v3

branch b1 d3

d1

version v1

d’1

v2

d2

now

Key space k3 k2 time k1

d2

branch b2 d’2

d3 d1

v1

v3

now

time

The three-version example Key space k3 k2 k1

branch b1 d3

d1

v1

d’1

v2

d2

now

Key space k3 k2 k time 1

d2

branch b2 d’2

d3 d1

v1

<{v1, v3},k1,d1> <{v1, v2},k2,d2> <{v1, v2, v3},k3,d3>

v3

now

time

What if there k3 iskeep never When We might is the updated branch b1? branching, start andinthe weend cannot Should expresswe version on a keep unique each end updating the end version for a set of branch version for k3 as new versions versions appear on b1?

Versions •  The initial version set: V = {v1} •  New versions are obtained by updating, inserting or deleting records from old versions of V •  V can be represented by a tree: the version tree •  There is a partial order on the nodes of the version tree –  Ancestors: anc(v)={a ∈ V/ a < v} –  Descendents: desc(v) = {d ∈ V/ d > v}

Version Ranges •  Records correspond to sets of versions over which they do not change •  Such a set forms a subtree called the version range. We have: –  One start version: the root of the subtree –  A set of end versions: the leaves of the subtree (one on each branch)

•  The main objection: –  To have to update end versions for every new version for which the record does not change

•  The solution: –  To take apart end versions from the version range

Version Ranges

v1

v3

v4

v5 v2

Considerthat Assume now a record athat new RRversion isinserted updated v5 at version appears v1. Suppose but v4.RWe is that not could Rtouched remains say v3 unchanged is by an v5 end version at v2 and for R v 3.

Version Ranges

v1

v3

v4

v5 v2

v6

Later, Now We choose version any number the v3 is end no ofversion longer an for end R descendent to version be a “stop for in VR sign” R. R could remains along be a unchanged branch. created. The If these atend {v1versions version , v2, v3, vdo 54,} does not change not belong R, thetoVR the version range for automatically expands R

Version Ranges

v1

v3

v4

v5 v2

v6

Later, any number of descendent in VR could be created. If this versions do not change R, the VR expands automatically

Version Ranges • 

The version range vr = (start(vr), end(vr)), where: –  start(vr) is an individual version –  end(vr) is the minimal set of versions ev with the property: v ∈ vr iif 1. start(vr) ≤ v 2.  ∀ ev ∈ end(vr) ¬(ev ≤ v)

The three-version example revisited <{v1, v3},k1,d1> <{v1, v2},k2,d2> <{v1, v2, v3},k3,d3>

<(v1, {v2}),k1,d1> <(v2, { }),k1,d’1> <(v1, {v3}),k2,d2> <(v3, { }),k2,d’2> <(v1, { }),k3,d3>

Data pages •  Data pages (P) delimit one version range (vr) and one key range (kr) •  We define KVR(P) = (kr(P),vr(P)) –  A data page with KVR(P) = (kr,vr) stores all records such that: 1.  k ∈ kr and 2.  vr ∩ vr’ ≠ ∅

Compact record representation •  To store records in data pages we use the compact record representation <(v1, {v2}),k1,d1> <(v2, { }),k1,d’1> <(v1, {v3}),k2,d2> <(v3, { }),k2,d’2> <(v1, { }),k3,d3> Deletion events do not cause lose of content, they are stated by means of compact null records

Looking for the efficiency •  To make the stabbing query efficient, a substantial percentage of the records in an accessed page must be alive for a version v •  The splitting page policy –  When a page P gets full, a version splitting of P must be done (here current version vn is used) –  A new page P’ is allocated with VR(P’) = (vn,∅) –  Records from P can be moved or copied to P’

Page splitting policy •  Records created by vn which are not null are moved from P to P’ •  Records whose version range lie in VR (P) ∩ VR(P’) which are not null are copied to P’

Page splitting policy •  Some kind of key splits are allowed in our framework (similar to B-tree page splits) –  After a version split if the new page has more than a certain threshold value Tk (we call version-andkey split) –  When a full page has version range (current_version, ∅) (we call restricted-key split)

•  Pure key splits cannot guarantee a minimun number of records alive for a given version

Consolidation •  Delete operations may damage the stabbing query efficiency •  When the number of records alive in P at vn fall below a threshold Tc, a consolidation process is triggered •  A sparse page and a proper sibling are current-version split, and the results are combined in one page •  Transactions with a large number of deletions may generate ghost pages

Efficiency guarantee •  We start with a page D at version v1 having n alive records •  Our framework guarantees a minimum number of records in a data page D in answering a stabbing query (v ∈ VR(D)) under different scenarios

Efficiency guarantee Assertions 1.  No deletes and only version splits: at least n 2.  No deletes and only current-version or version-and-key or restricted-key splits: at least min(n,Tk/2) 3.  Any kind of transactions and version splits, version-and-key splits, restricted-key splits and node consolidation: at least min(Tc,n)

Index pages •  Index pages + data pages form a DAG •  Index pages also correspond to key-version ranges •  Index page entries contain for every child C: •  Index page splits and consolidations follow the same policy as for data pages •  Additional details about properties and treatment of index pages can be seen in the paper

Conclusions •  Version data are not trivial to deal with •  Our framework –  contributes to understand the implications of managing and retrieving version data –  gives clear cues to represent in a compact and robust way this kind of data –  supports realistic assumptions on transactions

A Framework for Access Methods for Versioned Data B. Salzberg, L. Jiang, D. Lomet, M. Barrena, J. Shan & E. Kanoulas

A Framework for Access Methods for Versioned Data

3. ,d. 3. > version v. 3 branch b. 2 branch b. 1 time. Key space v. 1 v. 3 k. 1 k. 2 k. 3 now d. 1 ..... (current_version, â) (we call restricted-key split). â¢ Pure key splits ...

Download PDF

222KB Sizes 2 Downloads 269 Views

Report

A Framework for Access Methods for Versioned Data

Recommend Documents