A Framework for Access Methods for Versioned Data B. Salzberg, L. Jiang, D. Lomet, M. Barrena, J. Shan & E. Kanoulas

Outline
•  Motivation
•  Introducing versions through examples
•  Versions and version ranges
•  Data pages
–  Page splitting and consolidation
–  Efficiency guarantee
•  Index pages
•  Conclusions

Motivation
•  Historical archives need to be retained
–  Medical, banking, …
•  Different historical versions created along different branches must be reconstructed
–  Software libraries, design, …
•  Access methods for versioned data have been proposed

Motivation
•  We present a framework for constructing and understanding versioned access methods
•  Central point: the study of version splitting of units of data storage (disk pages)
•  Main goal: to make the stabbing query efficient: “Find all data alive at this version”

The two-version example
[Figure: records over versions v1 and v2, annotated “Set of versions for which record k2 does not change”]
•  Record format: <set of versions, key, data>, e.g. <{v1,v2},k2,d2> and <{v1,v2},k3,d3>
•  Redundancy: what if k2 does not change for a large number of versions? Could we use start and end labels?

The three-version example
[Figure: version tree with root v1 and two branches: v2 on branch b1 and v3 on branch b2. For each branch, a key space (k1, k2, k3) vs. time diagram up to “now”: on branch b1, k1 changes from d1 to d’1 at v2; on branch b2, k2 changes from d2 to d’2 at v3; k3 keeps d3 throughout.]
<{v1, v3},k1,d1> <{v1, v2},k2,d2> <{v1, v2, v3},k3,d3>

•  What if k3 is never updated in the branch b1? Should we keep updating the end version for k3 as new versions appear on b1?
•  When there is branching, we cannot express a unique end version for a set of versions. We might keep the start and end version on each branch.

Versions
•  The initial version set: V = {v1}
•  New versions are obtained by updating, inserting or deleting records from old versions of V
•  V can be represented by a tree: the version tree
•  There is a partial order on the nodes of the version tree
–  Ancestors: anc(v) = {a ∈ V | a < v}
–  Descendants: desc(v) = {d ∈ V | d > v}
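The version tree and its partial order can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper: the class `VersionTree` and its methods `derive`, `anc`, `desc` and `le` are hypothetical names. Each version simply records the version it was derived from, and the usage lines rebuild the tree of the three-version example.

```python
from dataclasses import dataclass, field


@dataclass
class VersionTree:
    """Version tree: every version except the root is derived from one parent."""
    root: str
    parent: dict = field(default_factory=dict)  # child version -> parent version

    def derive(self, new: str, old: str) -> None:
        """Record that version `new` was created from existing version `old`."""
        if new == self.root or new in self.parent:
            raise ValueError(f"version {new} already exists")
        if old != self.root and old not in self.parent:
            raise ValueError(f"unknown version {old}")
        self.parent[new] = old

    def versions(self) -> set:
        return {self.root, *self.parent}

    def anc(self, v: str) -> set:
        """anc(v) = {a in V | a < v}: the proper ancestors of v."""
        result = set()
        while v in self.parent:
            v = self.parent[v]
            result.add(v)
        return result

    def desc(self, v: str) -> set:
        """desc(v) = {d in V | d > v}: the proper descendants of v."""
        return {d for d in self.versions() if v in self.anc(d)}

    def le(self, a: str, v: str) -> bool:
        """Partial order: a <= v iff a == v or a is an ancestor of v."""
        return a == v or a in self.anc(v)


# Three-version example: v2 (branch b1) and v3 (branch b2) both derive from v1.
tree = VersionTree("v1")
tree.derive("v2", "v1")
tree.derive("v3", "v1")
```

Because v2 and v3 sit on different branches, neither `tree.le("v2", "v3")` nor `tree.le("v3", "v2")` holds: the order is only partial.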

Version Ranges
•  Records correspond to sets of versions over which they do not change
•  Such a set forms a subtree called the version range, with:
–  One start version: the root of the subtree
–  A set of end versions: the leaves of the subtree (one on each branch)

•  The main objection:
–  The end versions must be updated every time a new version is created in which the record does not change

•  The solution:
–  Separate the end versions from the version range: an end version marks where the range stops and does not itself belong to the range

Version Ranges
[Figure: version tree with versions v1 to v5]
Consider a record R inserted at version v1. Suppose that R remains unchanged at v2 and v3; we could say v3 is an end version for R. Assume that R is updated at version v4. Now a new version v5 appears, but R is not touched by v5.

Version Ranges
[Figure: the same version tree, later extended with a version v6]
Now v3 is no longer an end version for R: R remains unchanged at {v1, v2, v3, v5}. We choose the end version for R to be a “stop sign” along a branch: the end version v4 does not belong to the version range for R.
Later, any number of descendant versions in VR could be created. If these versions do not change R, the version range for R expands automatically.

Version Ranges
•  The version range vr = (start(vr), end(vr)), where:
–  start(vr) is an individual version
–  end(vr) is the minimal set of versions ev with the property: v ∈ vr iff
1.  start(vr) ≤ v, and
2.  ∀ ev ∈ end(vr): ¬(ev ≤ v)
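The membership rule above can be checked by walking from v up to the root of the version tree. In this sketch, the helpers `ancestors_or_self` and `in_version_range` are hypothetical names, and the tree is a plain child-to-parent dict. The end versions act as stop signs, so a version derived later below a member joins the range without any update to `end`. The versions v4 and v5 added at the bottom are hypothetical extensions of the three-version example.

```python
def ancestors_or_self(parent: dict, v: str) -> set:
    """All versions x with x <= v: v itself plus its ancestors.

    `parent` maps each non-root version to the version it was derived from.
    """
    chain = {v}
    while v in parent:
        v = parent[v]
        chain.add(v)
    return chain


def in_version_range(parent: dict, start: str, end: set, v: str) -> bool:
    """v in vr iff start(vr) <= v and not (ev <= v) for every ev in end(vr)."""
    at_or_before_v = ancestors_or_self(parent, v)
    return start in at_or_before_v and not any(ev in at_or_before_v for ev in end)


# Three-version example: v2 (branch b1) and v3 (branch b2) derive from v1.
parent = {"v2": "v1", "v3": "v1"}
k1_d1 = ("v1", {"v2"})  # version range of <(v1, {v2}), k1, d1>

# Hypothetical later versions: v4 below v3 joins the range automatically,
# while v5 below the stop sign v2 stays outside it.
parent["v4"] = "v3"
parent["v5"] = "v2"
```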

The three-version example revisited
<{v1, v3},k1,d1> <{v1, v2},k2,d2> <{v1, v2, v3},k3,d3>

<(v1, {v2}),k1,d1> <(v2, { }),k1,d’1> <(v1, {v3}),k2,d2> <(v3, { }),k2,d’2> <(v1, { }),k3,d3>
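Once the version tree is known, going from the version-set form to the (start, end) form is mechanical. This sketch assumes that each version set is a subtree. Under that assumption, the start is the member whose parent lies outside the set, and the end versions are the non-member children of members. The function name `to_compact` is hypothetical. The records reproduce the example, including d’1 at v2 and d’2 at v3 (written `d'1` and `d'2` in code).

```python
def to_compact(parent: dict, version_set: set) -> tuple:
    """Turn a version set into a compact version range (start, end).

    Assumes `version_set` is a subtree of the version tree. The start is the
    only member whose parent is not a member; the end versions ("stop signs")
    are children of members that are not members themselves.
    """
    roots = [v for v in version_set if parent.get(v) not in version_set]
    if len(roots) != 1:
        raise ValueError("version set is not a single subtree")
    end = {child for child, p in parent.items()
           if p in version_set and child not in version_set}
    return roots[0], end


parent = {"v2": "v1", "v3": "v1"}  # v1 -> v2 (branch b1), v1 -> v3 (branch b2)
records = [
    ({"v1", "v3"}, "k1", "d1"),
    ({"v2"}, "k1", "d'1"),
    ({"v1", "v2"}, "k2", "d2"),
    ({"v3"}, "k2", "d'2"),
    ({"v1", "v2", "v3"}, "k3", "d3"),
]
compact = [(to_compact(parent, vs), k, d) for vs, k, d in records]
```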

Data pages
•  Data pages (P) delimit one version range (vr) and one key range (kr)
•  We define KVR(P) = (kr(P), vr(P))
–  A data page with KVR(P) = (kr, vr) stores all records <vr’, k, d> such that:
1.  k ∈ kr, and
2.  vr ∩ vr’ ≠ ∅
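The two storage conditions for a page can be sketched as follows. Several points are assumptions, not taken from the paper: key ranges are half-open intervals [low, high), every version range contains its own start, and the helper names are hypothetical. The intersection test relies on one property: two version ranges (both subtrees of the version tree) share a version exactly when one of them contains the other's start.

```python
def at_or_before(parent: dict, v: str) -> set:
    """All versions x with x <= v (v and its ancestors)."""
    result = {v}
    while v in parent:
        v = parent[v]
        result.add(v)
    return result


def contains(parent: dict, vr: tuple, v: str) -> bool:
    """v in vr for a version range vr = (start, end)."""
    start, end = vr
    prefix = at_or_before(parent, v)
    return start in prefix and not (set(end) & prefix)


def intersects(parent: dict, vr_a: tuple, vr_b: tuple) -> bool:
    """vr_a and vr_b share a version (both ranges assumed to contain their start)."""
    return contains(parent, vr_a, vr_b[0]) or contains(parent, vr_b, vr_a[0])


def stored_in_page(parent: dict, kr: tuple, vr: tuple, record: tuple) -> bool:
    """Does a page with KVR(P) = (kr, vr) store the record <vr', k, d>?"""
    rec_vr, key, _data = record
    low, high = kr  # assumed half-open key interval [low, high)
    return low <= key < high and intersects(parent, vr, rec_vr)


parent = {"v2": "v1", "v3": "v1"}
page_kr, page_vr = ("k1", "k4"), ("v2", set())  # hypothetical page on branch b1
```

With this page, <(v1, {v3}),k2,d2> is stored because it is alive at v2. The branch-b2 record <(v3, { }),k2,d’2> is not stored.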

Compact record representation
•  To store records in data pages we use the compact record representation:
<(v1, {v2}),k1,d1> <(v2, { }),k1,d’1> <(v1, {v3}),k2,d2> <(v3, { }),k2,d’2> <(v1, { }),k3,d3>
•  Deletions do not cause loss of content; they are recorded by means of compact null records
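A stabbing query over compactly stored records only needs the chain of ancestors of the queried version. The sketch below is illustrative only. `Record`, `stabbing_query` and the encoding of a null record as `data=None` are assumptions. The deletion of k3 at a version v4 on branch b1 is a hypothetical addition to the three-version example.

```python
from typing import NamedTuple, Optional


class Record(NamedTuple):
    start: str
    end: frozenset       # end versions ("stop signs"), not part of the range
    key: str
    data: Optional[str]  # None marks a compact null record (a deletion)


def at_or_before(parent: dict, v: str) -> set:
    """All versions x with x <= v (v and its ancestors)."""
    result = {v}
    while v in parent:
        v = parent[v]
        result.add(v)
    return result


def stabbing_query(parent: dict, records: list, v: str) -> dict:
    """Find all data alive at version v, returned as {key: data}.

    A record is alive at v when v lies in its version range; a null record
    alive at v means its key was deleted, so that key is not reported.
    """
    prefix = at_or_before(parent, v)
    return {r.key: r.data for r in records
            if r.start in prefix and not (r.end & prefix) and r.data is not None}


parent = {"v2": "v1", "v3": "v1", "v4": "v2"}  # v4 is hypothetical
records = [
    Record("v1", frozenset({"v2"}), "k1", "d1"),
    Record("v2", frozenset(), "k1", "d'1"),
    Record("v1", frozenset({"v3"}), "k2", "d2"),
    Record("v3", frozenset(), "k2", "d'2"),
    Record("v1", frozenset({"v4"}), "k3", "d3"),  # hypothetical: k3 deleted at v4
    Record("v4", frozenset(), "k3", None),         # null record for that deletion
]
```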



Looking for efficiency
•  To make the stabbing query efficient, a substantial percentage of the records in an accessed page must be alive at the queried version v
•  The page splitting policy
–  When a page P becomes full, P must be version split (using the current version vn)
–  A new page P’ is allocated with VR(P’) = (vn, ∅)
–  Records from P can be moved or copied to P’

Page splitting policy
•  Non-null records created by vn are moved from P to P’
•  Non-null records whose version ranges intersect VR(P) ∩ VR(P’) are copied to P’
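The moving and copying rules of a version split might look like the sketch below. It rests on one assumption: because the split happens at the current version vn, no descendants of vn exist yet. Under that assumption, a record's range intersects VR(P’) = (vn, ∅) exactly when the record is alive at vn. How VR(P) itself is adjusted is not modelled, and names such as `version_split` are hypothetical.

```python
from typing import NamedTuple, Optional


class Record(NamedTuple):
    start: str
    end: frozenset
    key: str
    data: Optional[str]  # None marks a null record


def alive_at(parent: dict, r: Record, v: str) -> bool:
    """True if v lies in the record's version range."""
    prefix = {v}
    while v in parent:
        v = parent[v]
        prefix.add(v)
    return r.start in prefix and not (r.end & prefix)


def version_split(parent: dict, page: list, vn: str) -> tuple:
    """Version-split a full page P at the current version vn.

    Returns (records kept in P, records of the new page P').
    """
    kept, new_page = [], []
    for r in page:
        if r.data is not None and r.start == vn:
            new_page.append(r)            # created by vn: moved to P'
        elif r.data is not None and alive_at(parent, r, vn):
            new_page.append(r)            # alive at vn: copied to P'
            kept.append(r)                # ... and kept in P
        else:
            kept.append(r)                # null or not alive at vn: stays in P
    return kept, new_page


parent = {"v2": "v1", "v3": "v1"}
page = [
    Record("v1", frozenset({"v2"}), "k1", "d1"),
    Record("v2", frozenset(), "k1", "d'1"),
    Record("v1", frozenset({"v3"}), "k2", "d2"),
    Record("v3", frozenset(), "k2", "d'2"),
    Record("v1", frozenset(), "k3", "d3"),
]
kept, new = version_split(parent, page, "v3")
```

Splitting at v3 moves d’2 (created by v3) and copies d1 and d3 (alive at v3). P keeps its history for v1 and v2.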

Page splitting policy
•  Some kinds of key splits are allowed in our framework (similar to B-tree page splits)
–  After a version split, if the new page has more than a threshold Tk of records (we call this a version-and-key split)
–  When a full page has version range (current_version, ∅) (we call this a restricted-key split)

•  Pure key splits cannot guarantee a minimum number of records alive for a given version
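The choice between split kinds can be written as a small decision function, followed by a B-tree-style key split. This sketch makes two assumptions. First, `new_page_size` counts the records that a version split would place in P’. Second, the keys in a page being key-split are distinct. All names are hypothetical.

```python
def choose_split(page_start, page_end, current_version, new_page_size, tk):
    """Pick a split kind for a full page P with VR(P) = (page_start, page_end)."""
    if page_start == current_version and not page_end:
        return "restricted-key"       # VR(P) = (current_version, empty set)
    if new_page_size > tk:
        return "version-and-key"      # version split, then key-split P'
    return "version"


def key_split(records):
    """B-tree-style key split of (key, data) pairs with distinct keys.

    Returns the lower half, the upper half and the separator key
    (the smallest key of the upper half).
    """
    if len(records) < 2:
        raise ValueError("need at least two records to key-split")
    ordered = sorted(records, key=lambda r: r[0])
    mid = len(ordered) // 2
    return ordered[:mid], ordered[mid:], ordered[mid][0]
```

A pure key split is deliberately absent: as noted above, it cannot guarantee a minimum number of records alive for a given version.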

Consolidation
•  Delete operations may degrade stabbing query efficiency
•  When the number of records alive in P at vn falls below a threshold Tc, a consolidation process is triggered
•  A sparse page and an appropriate sibling are both current-version split, and the results are combined into one page
•  Transactions with a large number of deletions may generate ghost pages
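Consolidation can be sketched on top of a current-version split. Here we assume that a current-version split of a page keeps exactly its non-null records alive at vn. The choice of sibling, the resulting key range and the fate of the original pages are not modelled, and the sample records are hypothetical.

```python
from typing import NamedTuple, Optional


class Record(NamedTuple):
    start: str
    end: frozenset
    key: str
    data: Optional[str]  # None marks a null record (a deletion)


def alive_at(parent: dict, r: Record, v: str) -> bool:
    prefix = {v}
    while v in parent:
        v = parent[v]
        prefix.add(v)
    return r.start in prefix and not (r.end & prefix)


def current_version_split(parent: dict, page: list, vn: str) -> list:
    """Assumed here: a current-version split keeps the non-null records alive at vn."""
    return [r for r in page if r.data is not None and alive_at(parent, r, vn)]


def needs_consolidation(parent: dict, page: list, vn: str, tc: int) -> bool:
    """Trigger: fewer than Tc records of the page are alive at vn."""
    return len(current_version_split(parent, page, vn)) < tc


def consolidate(parent: dict, sparse: list, sibling: list, vn: str) -> list:
    """Current-version split both pages and combine the results in one page."""
    combined = (current_version_split(parent, sparse, vn)
                + current_version_split(parent, sibling, vn))
    return sorted(combined, key=lambda r: r.key)


parent = {"v2": "v1"}
sparse = [
    Record("v1", frozenset({"v2"}), "k1", "d1"),
    Record("v2", frozenset(), "k1", None),   # k1 deleted at v2
    Record("v1", frozenset(), "k2", "d2"),
]
sibling = [Record("v1", frozenset(), "k5", "d5")]
```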

Efficiency guarantee
•  We start with a page D at version v1 having n alive records
•  Our framework guarantees a minimum number of records alive at v in any data page D accessed when answering a stabbing query for version v (v ∈ VR(D)), under different scenarios

Efficiency guarantee
Assertions:
1.  No deletes and only version splits: at least n
2.  No deletes and only current-version, version-and-key or restricted-key splits: at least min(n, Tk/2)
3.  Any kind of transactions, with version splits, version-and-key splits, restricted-key splits and node consolidation: at least min(Tc, n)
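The three bounds can be tabulated in a few lines, together with a quick check of where Tk/2 comes from. The check reflects our own reading, not a proof from the paper. A version-and-key split only key-splits a page that holds more than Tk records, and a median split of such a page leaves at least Tk/2 records in each half. The function names are hypothetical.

```python
def min_alive_guarantee(scenario: int, n: int, tk: int, tc: int) -> float:
    """Lower bound on records alive at v in a page accessed by a stabbing query.

    Scenario numbers follow the three assertions above.
    """
    if scenario == 1:
        return n
    if scenario == 2:
        return min(n, tk / 2)
    if scenario == 3:
        return min(tc, n)
    raise ValueError("scenario must be 1, 2 or 3")


def halves_after_key_split(size: int) -> tuple:
    """Sizes of the two halves produced by a median key split of `size` records."""
    return size // 2, size - size // 2


# Our reading of the Tk/2 term: key-splitting any page with more than Tk
# records leaves at least Tk/2 records in the smaller half.
for tk in range(1, 50):
    for size in range(tk + 1, 3 * tk + 1):
        assert min(halves_after_key_split(size)) >= tk / 2
```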

Index pages
•  Index pages and data pages together form a DAG
•  Index pages also correspond to key-version ranges
•  Index page entries contain for every child C:
•  Index page splits and consolidations follow the same policy as for data pages
•  Additional details about the properties and treatment of index pages can be found in the paper

Conclusions
•  Versioned data are not trivial to deal with
•  Our framework
–  helps in understanding the implications of managing and retrieving versioned data
–  gives clear cues for representing this kind of data in a compact and robust way
–  supports realistic assumptions about transactions
