Comparing SSD-placement strategies to scale a Database-in-the-Cloud ∗

Yingyi Bu
University of California, Irvine

Hongrae Lee
Google Inc.

Jayant Madhavan
Google Inc.

ABSTRACT

Flash memory solid state drives (SSDs) have increasingly been advocated and adopted as a means of speeding up and scaling up data-driven applications. However, given the layered software architecture of cloud-based services, there are a number of options available for placing SSDs. In this work, we studied the trade-offs involved in different SSD placement strategies, their impact on response time and throughput, and ultimately their potential for achieving scalability in Google Fusion Tables (GFT), a cloud-based service for data management and visualization [1].

1. GFT ARCHITECTURE

The GFT system is built on top of cloud storage layers such as Colossus (a distributed file system, DFS) and Bigtable (a key-value store) that provide persistent storage and transparent replication. Our frontend servers have very stringent requirements on response time, e.g., 100 milliseconds, to support interactive visualizations. To meet this tight latency bound, GFT uses in-memory column-oriented query execution servers (QESs). Datasets are entirely loaded and indexed on demand into the QES column store. This architecture, though simple, limits our ability to scale to (a) large individual datasets and (b) large numbers of simultaneously active datasets. Our goal is to use SSDs as a means to address both these challenges.

2. SSD PLACEMENT STRATEGIES

We explored the following placement strategies:

• KS1. Bigtable on SSD: The log and data files of Bigtable are stored on SSD-powered DFS.
• KS2. Bigtable cache on SSD-powered local file system (LFS): The internal cache of a Bigtable is on locally attached SSDs.
• QS3. QES column store on SSD-powered LFS: Column arrays in QES are placed on local SSDs.
• QS4. QES column store on SSD-powered DFS: Similar to QS3, except the column arrays are placed in SSD-powered DFS.
• QS5. QES column store on SSD-backed Bigtable: The table content and column indices for the QES are loaded into a Bigtable backed by SSD-powered DFS, such that column store accesses correspond to Bigtable reads/writes.

We identified experiments that isolated the benefits (and downsides) of each configuration. Our findings are summarized in Table 1. KS1 and KS2 enable faster loading of datasets into the QES, but do not help with scaling. QS4 and QS5 offer opportunities for sharing column arrays between different QESs, but incur higher response times than QS3.

SSD strategy                   KS1      KS2      QS3   QS4   QS5
Improves loading throughput    Yes      Yes      No    No    No
Speeds up data loading         limited  limited  No    No    No
Scales up to larger datasets   No       No       Yes   Yes   Yes
Offers cache reliability       N/A      N/A      No    Yes   Yes
Cache response time            N/A      N/A      low   high  medium

Table 1: A comparison of SSD placement strategies

∗Work done at Google Inc.

3. CONCLUSION

We found that to meet our latency needs in GFT, the QS3 strategy was the most suitable. We further explored changes that were needed to our column store to better realize the potential of the locally placed SSDs. Our observations and guidelines, though made in the context of GFT, are largely applicable to many cloud-based data management services in general.
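The way QS3 falls out of Table 1 can be illustrated with a small sketch. It only re-encodes the qualitative entries of Table 1 as data; the class name, field names, helper functions, and the numeric ranking of "low", "medium", and "high" are assumptions made for illustration and are not part of GFT.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Strategy:
    """One column of Table 1 (field names are hypothetical)."""
    name: str
    improves_loading_throughput: bool
    speeds_up_loading: str               # "limited" or "no", as in Table 1
    scales_to_larger_datasets: bool
    cache_reliability: Optional[bool]    # None encodes "N/A"
    cache_response_time: Optional[str]   # "low", "medium", "high"; None encodes "N/A"


# Qualitative entries copied from Table 1.
TABLE_1 = [
    Strategy("KS1", True, "limited", False, None, None),
    Strategy("KS2", True, "limited", False, None, None),
    Strategy("QS3", False, "no", True, False, "low"),
    Strategy("QS4", False, "no", True, True, "high"),
    Strategy("QS5", False, "no", True, True, "medium"),
]

# Assumed ordering of the qualitative labels; Table 1 gives no numbers.
RESPONSE_TIME_RANK = {"low": 0, "medium": 1, "high": 2}


def lowest_latency_scalable(strategies: List[Strategy]) -> Strategy:
    """Among strategies that scale to larger datasets, pick the lowest response time."""
    candidates = [
        s for s in strategies
        if s.scales_to_larger_datasets and s.cache_response_time is not None
    ]
    return min(candidates, key=lambda s: RESPONSE_TIME_RANK[s.cache_response_time])


def reliable_scalable(strategies: List[Strategy]) -> List[str]:
    """Names of strategies that both scale and offer cache reliability."""
    return [
        s.name for s in strategies
        if s.scales_to_larger_datasets and s.cache_reliability
    ]
```

Under these assumptions, `lowest_latency_scalable(TABLE_1).name` is `"QS3"`, while `reliable_scalable(TABLE_1)` lists QS4 and QS5. This mirrors the trade-off in Table 1: the strategies that offer cache reliability are the ones with higher response times.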

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author. Copyright is held by the owner/author(s). SoCC’13, 1–3 Oct. 2013, Santa Clara, California, USA. ACM 978-1-4503-2428-1. http://dx.doi.org/10.1145/2523616.2525949


4. REFERENCES

[1] H. Gonzalez, A. Y. Halevy, C. S. Jensen, A. Langen, J. Madhavan, R. Shapley, and W. Shen. Google Fusion Tables: data management, integration and collaboration in the cloud. In SoCC, pages 175–180, 2010.

May 14, 1993 - Center for Research on Concepts and Cognition ... Each time a concept is used by the system, only some of its relations are ... Let's call them.