Flash-Friendly Buffer Replacement Algorithm for Improving Performance and Lifetime of NAND Flash Storages ∗

Dong Hyun Kang†, Changwoo Min†∗, Young Ik Eom†
† Sungkyunkwan University, Korea
∗ Samsung Electronics, Korea
{kkangsu, multics69, yieom}@skku.edu

1 Introduction

Buffer replacement algorithms have been actively researched for decades because the buffer cache receives I/O requests directly from applications and transforms them into I/O patterns that suit the storage device. Traditional replacement algorithms such as LRU and CLOCK exploit temporal locality to hide disk-seek latency and evict the pages that are least likely to be accessed in the future, minimizing the number of slow I/O operations. However, traditional algorithms are inappropriate for NAND flash storages, such as eMMCs, microSD cards, and SSDs, because the performance characteristics of NAND flash storages are quite different from those of HDDs. First, limited program/erase (P/E) cycles: a multi-level cell (MLC) endures roughly 3K P/E cycles. Second, asymmetric read and write cost: write operations are much slower than read operations. Third, no in-place update: the flash translation layer (FTL) takes a log-structured approach because a flash block must be erased before a page that belongs to the block can be written again, and this involves garbage collection (GC) operations that move valid pages to new blocks. Finally, performance variability across write patterns: random writes are significantly slower than sequential writes because they cause more fragmentation in the FTL, which degrades both the performance and the lifetime of the NAND flash storage by increasing the write amplification factor (WAF).

Previous flash-aware buffer replacement algorithms [2, 3, 4] can be largely classified into two groups. First, CFLRU [4] exploits the asymmetric read and write cost and prefers to evict clean pages over dirty pages to reduce the more expensive write operations. Second, other schemes [2, 3] exploit the performance variability across write patterns. FAB [2] selects the block that holds the largest number of pages and evicts all pages that belong to that block to generate sequential write patterns. The recently proposed Sp.Clock [3] modifies the Clock algorithm to maintain pages by their sector numbers rather than in recency order and evicts pages in the order of their sector numbers.

We propose a novel buffer replacement algorithm, called TS-CLOCK (Temporal and Spatial locality-aware CLOCK), to improve the performance and lifetime of NAND flash storages.

∗ This work was supported by the IT R&D program of MKE/KEIT. [10041244, SmartTV 2.0 Software Platform]

2 Our Approach: TS-CLOCK

To improve the performance and lifetime of NAND flash storages, we designed the TS-CLOCK algorithm to reduce the GC cost caused by fragmentation in the FTL. We propose three techniques that exploit temporal and spatial locality:

Cache hit ratio: we extend the Clock algorithm, which exploits temporal locality, to achieve a high cache hit ratio. Though Sp.Clock [3] shows a cache hit ratio comparable to that of the Clock algorithm in mobile workloads, it shows a low hit ratio in server workloads because Sp.Clock distorts recency by sorting pages in order of their sector numbers (Figure 1a). In contrast, since TS-CLOCK maintains pages in recency order, it shows a high cache hit ratio under various workloads.

Clean-first eviction: TS-CLOCK prefers to evict clean pages over dirty pages since evicting dirty pages generates slow write operations and hurts the lifetime of NAND flash storages. CFLRU [4] also adopts a clean-first policy. However, it can cause fragmentation in the FTL, which leads to performance degradation, because it evicts dirty pages in LRU order, which produces random writes.

Flash-friendly eviction: to solve the fragmentation problem, we shape evicted dirty pages into flash-friendly write patterns. FAB [2] and Sp.Clock [3] also mitigate the fragmentation. However, FAB [2] causes unnecessary operations because it performs eviction at the granularity of a block, and Sp.Clock [3] suffers from high GC cost for workloads with wide I/O ranges. TS-CLOCK selects a block with the largest number of dirty pages, which are least likely to be accessed, and then sequentially evicts pages in that block, generating flash-friendly write patterns.

TS-CLOCK follows the basic rule of the traditional Clock algorithm with three differences. First, TS-CLOCK manages a sorted list of dirty pages for each block number. Second, TS-CLOCK maintains a reference count per page rather than a reference bit. If an accessed page is clean, TS-CLOCK always sets its reference count to one. Otherwise, its reference count is determined by its update likelihood. For all dirty pages in the same block, the update likelihood is calculated as the ratio of the number of dirty pages whose reference count is zero to the total number of pages per block (i.e., a dirty page that belongs to a block with many dirty pages is considered likely to be updated). The higher the update likelihood, the larger the reference count that TS-CLOCK sets, so that pages with a higher update likelihood stay in the cache longer. For 25%, 50%, 75%, and 100% update likelihoods, TS-CLOCK sets the reference count to 1, 2, 3, and 4, respectively. Third, TS-CLOCK manages two hands, t-hand and s-hand, to select a victim page. When there is no free space, t-hand scans pages in a circular manner and checks the reference count of each page. If the reference count is greater than zero, it is decreased by one. Otherwise, TS-CLOCK considers the page pointed to by t-hand as a victim candidate. If the victim candidate is clean, it is immediately evicted. Otherwise, s-hand scans the dirty pages belonging to a block in order of sector number and selects a dirty page to evict instead of the victim candidate, for flash-friendly eviction. If s-hand is not set or reaches the end of the block during the scan, s-hand is set to the first dirty page of the block that includes the victim candidate pointed to by t-hand. Finally, TS-CLOCK evicts the page selected by s-hand and inserts a new page at the position of t-hand. TS-CLOCK can significantly improve the performance and lifetime of NAND flash storage without any hardware support by maintaining a high cache hit ratio and evicting pages in flash-friendly write patterns.

3 Evaluation

For evaluation, we implemented a prototype of TS-CLOCK and four other buffer replacement algorithms, Clock, CFLRU [4], FAB [2], and Sp.Clock [3], on Linux. For comparison, we follow the evaluation methodology used for Sp.Clock: (1) We collect the before-cache traces, which consist of the I/O requests issued to the buffer cache, by running the Dbench [1] benchmark, which emulates server workloads. (2) For trace-driven simulations, we use the before-cache traces as the input to our cache simulator and collect after-cache traces, which are the I/O requests generated by each replacement algorithm. (3) Finally, we replay the after-cache traces on an SSD manufactured by Samsung with the O_DIRECT option.

Figure 1 presents the simulated cache hit ratios as well as the elapsed times measured on the Samsung SSD. Although TS-CLOCK shapes evicted pages into flash-friendly write patterns, the cache hit ratios in Figure 1a clearly show that it maintains a higher cache hit ratio than other algorithms. On the other hand, as expected, FAB [2] shows a significantly low hit ratio due to its block-level eviction. Figure 1b shows that TS-CLOCK outperforms the other algorithms. In particular, TS-CLOCK significantly outperforms the state-of-the-art algorithm, Sp.Clock, by up to 22.7%. This is because TS-CLOCK prefers to evict clean pages to minimize write operations and evicts dirty pages in a flash-friendly order to reduce GC cost.

[Figure 1: Cache hit ratio and elapsed time on SSD. Panel (a): cache hit ratio versus cache size (32, 64, 128, 256, 512 MB). Panel (b): elapsed time (sec.) versus cache size (32, 64, 128, 256, 512 MB). Series: Clock, CFLRU, FAB, Sp.Clock, TS-CLOCK.]

We also implemented an FTL simulator, which supports a page-level FTL scheme, to investigate how each replacement policy affects the lifetime of NAND flash storages. Figure 2 compares the erase counts of each replacement algorithm. Our experimental results clearly show that TS-CLOCK can extend the lifetime of NAND flash storages by up to 40.8%.

[Figure 2: Comparison of erase count on page-level FTL. Number of erases versus cache size (32, 64, 128, 256, 512 MB). Series: Clock, CFLRU, FAB, Sp.Clock, TS-CLOCK.]
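To make the simulated policy concrete, the victim selection described in Section 2 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the prototype evaluated above. The names TSClock, Page, ref_count_for, and PAGES_PER_BLOCK are hypothetical, and a four-page block is assumed for readability. The text does not say how likelihoods between the listed percentages are handled, whether s-hand skips recently referenced dirty pages, or how the clock list is stored, so the sketch rounds the likelihood up with a minimum count of one, lets s-hand take the next dirty page regardless of its reference count, and keeps the clock as a Python list in which the new page is placed in front of a still-resident victim candidate.

```python
import math
from bisect import bisect_left, insort
from dataclasses import dataclass

PAGES_PER_BLOCK = 4  # assumed block size in pages (not given in the text)


@dataclass
class Page:
    dirty: bool = False
    ref: int = 0


def ref_count_for(likelihood):
    """25% -> 1, 50% -> 2, 75% -> 3, 100% -> 4 (rounding and clamping assumed)."""
    return max(1, min(4, math.ceil(likelihood * 4)))


class TSClock:
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = {}      # sector -> Page
        self.ring = []       # sectors in clock order, scanned by t-hand
        self.dirty = {}      # block number -> sorted list of dirty sectors
        self.t_hand = 0
        self.s_block = None  # block that s-hand is currently draining
        self.s_next = 0      # s-hand resumes at the first dirty sector >= s_next
        self.writes = []     # evicted dirty sectors (the after-cache writes)

    def access(self, sector, write=False):
        hit = sector in self.pages
        if not hit:
            self._insert(sector)
        page, block = self.pages[sector], sector // PAGES_PER_BLOCK
        if write and not page.dirty:
            page.dirty = True
            insort(self.dirty.setdefault(block, []), sector)
        if page.dirty:
            # Update likelihood: dirty pages with reference count zero in this
            # block, divided by the number of pages per block.
            zero = sum(self.pages[s].ref == 0 for s in self.dirty[block])
            page.ref = ref_count_for(zero / PAGES_PER_BLOCK)
        else:
            page.ref = 1
        return hit

    def _insert(self, sector):
        if len(self.ring) < self.capacity:
            self.ring.append(sector)
        else:
            # t-hand: decrement counts until a page with count zero is found.
            while self.pages[self.ring[self.t_hand]].ref > 0:
                self.pages[self.ring[self.t_hand]].ref -= 1
                self.t_hand = (self.t_hand + 1) % len(self.ring)
            victim = self.ring[self.t_hand]
            if self.pages[victim].dirty:
                victim = self._s_hand_pick(victim)
            self._evict(victim)
            self.ring.insert(self.t_hand, sector)  # new page at t-hand position
            self.t_hand = (self.t_hand + 1) % len(self.ring)
        self.pages[sector] = Page()

    def _s_hand_pick(self, candidate):
        sectors = self.dirty.get(self.s_block, [])
        i = bisect_left(sectors, self.s_next)
        if i == len(sectors):  # s-hand is unset or reached the end of its block
            self.s_block = candidate // PAGES_PER_BLOCK
            sectors, i = self.dirty[self.s_block], 0
        self.s_next = sectors[i] + 1
        return sectors[i]

    def _evict(self, victim):
        page = self.pages.pop(victim)
        if page.dirty:
            block = victim // PAGES_PER_BLOCK
            self.dirty[block].remove(victim)
            if not self.dirty[block]:
                del self.dirty[block]
            self.writes.append(victim)
        i = self.ring.index(victim)
        self.ring.pop(i)
        if i < self.t_hand:
            self.t_hand -= 1
```

With a four-page cache, writing sectors 5, 7, 4, and 6 and then reading four unrelated sectors through access() leaves writes equal to [4, 5, 6, 7]: the dirty pages leave the cache in sector order within their block even though they were written out of order. The list-based clock makes each eviction linear in the cache size, which is acceptable for illustration but not for a real buffer cache.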

References

[1] The DBENCH web pages. http://dbench.samba.org/.
[2] Jo, H., Kang, J.-U., Park, S.-Y., Kim, J.-S., and Lee, J. FAB: flash-aware buffer management policy for portable media players. IEEE Transactions on Consumer Electronics 52, 2 (May 2006), 485–493.
[3] Kim, H., Ryu, M., and Ramachandran, U. What is a good buffer cache replacement scheme for mobile flash storage? In Proc. of SIGMETRICS'12 (2012), ACM, pp. 235–246.
[4] Park, S.-Y., Jung, D., Kang, J.-U., Kim, J.-S., and Lee, J. CFLRU: A Replacement Algorithm for Flash Memory. In Proc. of CASES'06 (2006), ACM, pp. 234–241.
