DISTRIBUTED DATABASE SYSTEMS Alternatively: Database or Electric Disgrace? Angus Macdonald [email protected]

13/12/2010

Angus Macdonald


For this talk…
• I will explain current trends in distributed database design
• The demands of speed and scale
• From small scale to internet scale

Speed
• How do we make databases faster?
  • Specialization
    • Removing un-needed features
    • Focusing on what’s needed
• Create databases for specific tasks
  • Data Warehousing
  • On-line Transaction Processing

Specialization

DATA WAREHOUSING


Specialization: Data Warehousing • Many reads, few writes • So, optimize for reads


Data Warehouse Example
• Most databases store tables by row

Name    Age  Address        Favourite Colour
Judith  21   North Street   Blue
Sonny   21   Market Street  Red
Lisa    21   South Street   Green
Jon     52   The Scores     Red

Column Storage
• Data stored by column

Name    Age  Address        Favourite Colour
Judith  21   North Street   Blue
Sonny   21   Market Street  Red
Lisa    21   South Street   Green
Jon     52   The Scores     Red

• Avoid un-needed data
• More opportunity for compression
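The difference between the two layouts can be sketched in a few lines of Python, using the example table from the slides. The `rows` and `columns` structures and the `run_length_encode` helper are illustrative assumptions, not a description of how any particular product stores its data.

```python
# Example table from the slides, stored row by row.
rows = [
    ("Judith", 21, "North Street", "Blue"),
    ("Sonny", 21, "Market Street", "Red"),
    ("Lisa", 21, "South Street", "Green"),
    ("Jon", 52, "The Scores", "Red"),
]
FIELDS = ("name", "age", "address", "favourite_colour")

# The same table stored column by column: one list per attribute.
columns = {field: [row[i] for row in rows] for i, field in enumerate(FIELDS)}


def average_age_row_store(table):
    """Every full row is visited even though only the age is needed."""
    return sum(row[1] for row in table) / len(table)


def average_age_column_store(cols):
    """Only the age column is read; the other columns are never touched."""
    ages = cols["age"]
    return sum(ages) / len(ages)


def run_length_encode(values):
    """Collapse runs of equal adjacent values into [value, count] pairs."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1][1] += 1
        else:
            encoded.append([value, 1])
    return encoded
```

Calling `run_length_encode(columns["age"])` collapses the three adjacent 21s into a single pair, which is the kind of compression opportunity a column layout opens up.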

Everything’s Distributed
• To provide higher availability all of the table data is replicated

[Figure: three identical copies of the table]

Name    Age  Address        Favourite Colour
Judith  21   North Street   Blue
Sonny   21   Market Street  Red
Lisa    21   South Street   Green
Jon     52   The Scores     Red

Projections
• Copies of tables can be stored together in different groups, sorted in different forms

Name    Age
Jon     52
Judith  21
Lisa    21
Sonny   21

Name    Address        Favourite Colour
Judith  North Street   Blue
Lisa    South Street   Green
Sonny   Market Street  Red
Jon     The Scores     Red

• Queries take advantage of the most appropriate projection
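One way to picture "the most appropriate projection" is as a lookup over the stored column groups. The sketch below is only an illustration: the `Projection` class, the `choose_projection` function, and the rule of picking the narrowest projection that covers the query are all assumptions, and the sort keys are guesses that the slide does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Projection:
    """A stored copy of some of the table's columns, sorted on one of them."""
    columns: tuple
    sort_key: str


# The two column groups shown on the slide. The sort keys are guesses
# made for this sketch only.
PROJECTIONS = [
    Projection(columns=("name", "age"), sort_key="age"),
    Projection(columns=("name", "address", "favourite_colour"),
               sort_key="favourite_colour"),
]


def choose_projection(needed_columns, projections=PROJECTIONS):
    """Return the narrowest projection containing every needed column."""
    candidates = [p for p in projections
                  if set(needed_columns) <= set(p.columns)]
    if not candidates:
        raise LookupError(f"no projection covers {sorted(needed_columns)}")
    return min(candidates, key=lambda p: len(p.columns))
```

With these two projections, a query that needs only `age` is served from the (name, age) copy, while a query needing both `age` and `address` finds no single covering projection and raises `LookupError`.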

The Result
• A product called Vertica
  • Formerly C-Store
• Extremely fast reads
• Reasonably fast writes
• Size Comparison*
  • C-Store (1.9 GB), Row Store (4.4 GB)
  • With ‘projections’: C-Store (1.9 GB), Row Store (11.9 GB)
• Speed Comparison*
  • 164x faster than a conventional row store
  • 21x faster than a conventional column store

* C-Store: A Column-oriented DBMS

Our Second Specialization

TRANSACTION PROCESSING


Specialization: Transaction Processing
• What features are actually needed to support transaction processing?
  • ACID semantics, fault tolerance
• What percentage of instructions in a transaction come from each database component?

[Chart: share of instructions per component. Legend: Disk Buffer Pool, Crash Recovery, Multi-threading, Locking, Other]

What’s needed?
• What functionality can we afford to lose?

[Diagram: component stacks for a “Traditional Database” and a “New Database”. Labels: JDBC, Stored Procedures, Query Parsing, Locking, Latching, Disk Buffer Pool, Disk, Memory]

The Result
• A product called VoltDB
  • Formerly H-Store
• 100x performance improvement over MySQL (single-node)*
• 15x faster than a memcached-MySQL combination*

* http://voltdb.com/content/mike-stonebraker-sql-urban-myths-webinar-recording

Speed and…

SCALE


Scale • Big web companies need to store petabytes of data • Fighting against the limits of distribution


The limits of distribution
• All distributed systems face a fundamental problem
• CAP Theorem
  • You can have at most two of these three properties:
    • Consistency
    • Availability
    • Partition Tolerance

Consistency and Partition Tolerance

[Diagram: Consistency, Availability, Partition Tolerance; nodes labelled A, A]

Availability and Partition Tolerance

[Diagram: Consistency, Availability, Partition Tolerance; nodes labelled A, A`, A]

Consistency and Availability

[Diagram: Consistency, Availability, Partition Tolerance; nodes labelled A, A]
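The trade-off in the last three slides can be made concrete with a toy model. The sketch below assumes just two replicas of a single value and a network partition that can be switched on by hand; the `Cluster` class, its `mode` flag and the `Unavailable` exception are names invented for this illustration.

```python
class Unavailable(Exception):
    """Raised when a consistent cluster refuses a request during a partition."""


class Cluster:
    """Two replicas of a single value, with a switchable network partition."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = ["A", "A"]  # both replicas start with the value A
        self.partitioned = False

    def write(self, replica, value):
        if not self.partitioned:
            # Healthy network: the write reaches both replicas.
            self.replicas = [value, value]
        elif self.mode == "CP":
            # Consistency is kept by refusing the write (availability is lost).
            raise Unavailable("cannot reach the other replica")
        else:
            # Availability is kept by accepting the write locally,
            # so the two replicas now disagree.
            self.replicas[replica] = value

    def read(self, replica):
        return self.replicas[replica]
```

With no partition, both modes behave the same, which is the "Consistency and Availability" case. Once `partitioned` is set, the "CP" cluster rejects writes while the "AP" cluster accepts them and lets one replica hold A` while the other still holds A.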

Eventual Consistency
• Most large-scale databases sacrifice consistency
  • Network partitions happen, and availability is a top priority
• VoltDB sacrifices partition tolerance
  • Network partitions in a cluster are rare
• Examples of eventually consistent databases:
  • Dynamo (Amazon)
  • Cassandra

Dynamo
• Aims for Availability and Partition Tolerance
  • Runs across many data centres
• Key-Value Interface
  • Partitioned and replicated using consistent hashing
• 0.00057% of requests see two versions
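Consistent hashing can be sketched as a ring of node positions, where each key is stored on the next few nodes found clockwise from the key's own position. The code below is a minimal sketch under stated assumptions: the `HashRing` class and `ring_position` helper are hypothetical names, there are no virtual nodes, and the replica count of 3 is chosen only for the example. It is not a description of Dynamo's actual implementation.

```python
import bisect
import hashlib


def ring_position(key):
    """Map a string to a position on the ring (a 128-bit integer via MD5)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes, replicas=3):
        self.replicas = replicas
        # Sorted (position, node) pairs form the ring.
        self.ring = sorted((ring_position(n), n) for n in nodes)

    def add_node(self, node):
        """Insert a node at its hashed position, keeping the ring sorted."""
        bisect.insort(self.ring, (ring_position(node), node))

    def nodes_for(self, key):
        """Walk clockwise from the key's position, collecting replica holders."""
        if not self.ring:
            return []
        positions = [pos for pos, _ in self.ring]
        start = bisect.bisect_right(positions, ring_position(key))
        count = min(self.replicas, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1]
                for i in range(count)]
```

The useful property is that adding a node only takes over keys from its neighbour on the ring: every other key keeps the same first holder, so little data has to move when the set of machines changes.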

MY OWN RESEARCH


Speed, Scale, and…
• …efficiency?

[Chart: Power and Energy Efficiency (y-axis: power usage relative to peak, 0% to 100%) against Utilization (x-axis, 0% to 100%)]
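A rough way to reason about a chart of this kind is a linear power model. The sketch below is an assumption made for illustration, not data from the slide: it supposes a machine draws a fixed `idle_fraction` of its peak power when idle (0.5 is a placeholder value) and that energy efficiency is utilization divided by relative power.

```python
def relative_power(utilization, idle_fraction=0.5):
    """Power draw as a fraction of peak, under an assumed linear model."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return idle_fraction + (1.0 - idle_fraction) * utilization


def energy_efficiency(utilization, idle_fraction=0.5):
    """Useful work per unit of power, normalised so that full load gives 1.0."""
    power = relative_power(utilization, idle_fraction)
    return utilization / power if power else 0.0
```

Under these assumed numbers, a machine at 20% utilization still draws 60% of its peak power, so it is only about a third as efficient as the same machine at full load.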

Use Case (for the Organisation)
• A massive amount of resources in the enterprise goes unused
  • 10,000 machines at the University of St Andrews alone
• Could we make use of these resources to run a database system?

Use Case (for the user)
• You want to create a database
• How would you make that database available to someone else?
• How would you ensure the database is available, when machines can and will fail?
• How will you scale up to handle increased load?

Requirements
We need something which is:
• Resource-aware
• Adaptive
• Highly-available
• Autonomic

[Diagrams: machines A, B and C annotated with CPU Utilization, HD Space and Memory figures; further labels: Resources, Latency, Coherency, Replication]

The Result
• A research project called H2O
  • Sacrifices Availability
  • Aims for ACID semantics
  • Assumes partitions are rare
• Able to run on a constantly changing set of machines
• Currently running evaluations on its performance
