Early Experience and Evaluation of File Systems on SSD with Database Applications
Yongkun WANG, Kazuo GODA, Miyuki NAKANO, Masaru KITSUREGAWA
The University of Tokyo
Outline
• Motivation
• Flash SSD
• Basic Performance Study
• Performance Evaluation by TPC-C Benchmark
• Conclusion and Future Work
Motivation
• Flash SSDs are likely to be used in enterprise storage platforms to achieve high performance in data-intensive applications
• IO path management techniques should be evaluated carefully
  – Existing systems are designed for traditional hard disks
  – The IO performance characteristics of flash SSDs differ from those of hard disks
• For better utilization of SSDs in a DBMS
  – Evaluate the basic performance of SSDs
  – Evaluate the performance of the IO path in a conventional DBMS
    • With different file systems and IO schedulers
Flash SSD
• Flash SSD (Solid State Drive)
  – A package of multiple flash memory chips
  – The FTL (Flash Translation Layer) provides block device emulation
• Performance properties of flash memory (Samsung K9XXG08UXM)
  – READ (4KB) takes 25us
  – PROGRAM (4KB) takes 200us
  – ERASE (256KB) takes 1500us
• The erase-before-program design can lead to poor performance in a normal in-place-write system (see the worked example below)

[Figure: internal architecture of a flash SSD: a controller chip running the FTL, an SDRAM buffer, and multiple NAND flash memory chips attached to the flash memory bus, exposed to the host through a SATA port]
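To see why, here is a back-of-the-envelope calculation from the chip timings above. It assumes the worst case, where updating a single 4KB page in place forces the surrounding 256KB erase block (64 pages of 4KB) to be copied out, erased, and reprogrammed, with no FTL optimization such as log-block remapping:

```latex
% Worst-case in-place update of one 4KB page in a 256KB erase block:
% read the 63 untouched pages, erase the block, program all 64 pages back.
\begin{align*}
T_{\mathrm{update}} &= 63 \cdot T_{\mathrm{READ}} + T_{\mathrm{ERASE}} + 64 \cdot T_{\mathrm{PROGRAM}} \\
                    &= 63 \cdot 25\,\mu\mathrm{s} + 1500\,\mu\mathrm{s} + 64 \cdot 200\,\mu\mathrm{s}
                     \approx 15.9\,\mathrm{ms}
\end{align*}
```

That is roughly 80 times the 200us of a single out-of-place PROGRAM, which is why FTLs redirect writes rather than update pages in place.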
Outline
• Motivation
• Flash SSD
• Basic Performance Study
• Performance Evaluation by TPC-C Benchmark
• Conclusion and Future Work
Purpose of Basic Performance Study
• Clarify the performance difference between the SSDs and the HDD
• Clarify the performance differences among the SSDs
• Clarify the erase problem on SSDs
Experimental System
• Dell Precision™ 390 Workstation: dual-core Intel Core 2 Duo 1.86GHz, 2GB memory, SATA 3.0Gbps controller, CentOS 5.2 64-bit, kernel 2.6.18
• Hard disk (HDD): Hitachi HDS72107, 3.5", 7200RPM, 32MB cache, 750GB
• Flash SSD: Mtron PRO 7500, SLC, 3.5", 32GB
• Flash SSD: Intel X25-E, SLC, 2.5", 64GB
• Flash SSD: OCZ VERTEX EX, SLC, 2.5", 120GB
• Inside each device, read-ahead prefetching and write-back caching are enabled
Micro Benchmark
• One million requests for each case
• Request size: 512B to 256KB
• Access patterns
  – Sequential read/write
  – Random read/write
  – Mixed random (50% read plus 50% write)
• Number of outstanding IOs
  – One outstanding IO: submit one IO request at a time
  – 30 outstanding IOs: submit 30 IO requests at a time
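As a concrete illustration of the protocol, the sketch below issues random 4KB reads against a raw device with one outstanding IO at a time. The device path and request count are placeholders, and this is a reconstruction of the methodology, not the actual harness used in the study. O_DIRECT bypasses the OS page cache so that the device itself is measured; it requires a sector-aligned buffer, which an anonymous mmap provides.

```python
import mmap, os, random, time

DEV = "/dev/sdb"       # placeholder: raw device under test
IO_SIZE = 4096         # 4KB requests
NUM_REQS = 100_000     # scaled down from the one million used in the study

fd = os.open(DEV, os.O_RDONLY | os.O_DIRECT)
dev_bytes = os.lseek(fd, 0, os.SEEK_END)
buf = mmap.mmap(-1, IO_SIZE)   # page-aligned, as O_DIRECT requires

start = time.perf_counter()
for _ in range(NUM_REQS):
    # random access: pick a block-aligned offset anywhere on the device
    off = random.randrange(dev_bytes // IO_SIZE) * IO_SIZE
    os.preadv(fd, [buf], off)  # one outstanding IO: wait for completion
elapsed = time.perf_counter() - start
os.close(fd)

print(f"{NUM_REQS / elapsed / 1000:.1f} K IOPS, "
      f"{NUM_REQS * IO_SIZE / elapsed / 1e6:.1f} MB/s")
```

Measuring 30 outstanding IOs would require asynchronous submission (for example libaio, or 30 such threads), which this single-threaded sketch deliberately omits.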
Basic Performance of Flash SSDs ~ Sequential Access ~

[Figure: sequential IO throughput [MB/s], 0 to 300, versus IO size (512B to 256KB) for HDD, Mtron, Intel, and OCZ; curves for 100% read and 100% write]

• The read throughput of the Intel and OCZ SSDs is much higher than that of the other devices
• The write throughput of the Intel SSD is higher
• The write throughput of the Intel SSD drops sharply once the request size exceeds 32KB
• The gap between read and write throughput is large on the OCZ SSD
Basic Performance of Flash SSDs ~ Random Access (Single Outstanding IO) ~

[Figure: random IO throughput [K IOPS], 0 to 20, versus IO size (512B to 256KB) for HDD, Mtron, Intel, and OCZ; curves for 100% read, 100% write, and the 50% read / 50% write mix]

• The read IOPS of the SSDs is much higher than that of the HDD
• Random write performance drops drastically on the Mtron and OCZ SSDs
• Mixed-access performance also drops drastically on the Mtron and OCZ SSDs ("bathtub effect", Freitas, FAST 2010 tutorial)
Basic Performance of Flash SSDs ~ Random Access (30 Outstanding IOs) ~

[Figure: random read throughput [K IOPS], 0 to 60, versus IO size for HDD, Mtron, Intel, and OCZ; 100% read with 30 outstanding IOs compared against one outstanding IO]

• With 30 outstanding IOs, the read throughput is improved on the Intel and OCZ SSDs
Basic Performance of Flash SSDs ~ Response Time Distribution of 4KB Random Access ~

[Figure: cumulative frequency [%] of response times, 1us to 1,000,000us, for HDD, Mtron, Intel, and OCZ; curves for 100% read, 100% write, and the 50% read / 50% write mix]

• Random read (blue line): on the SSDs, most random reads complete within a very narrow range of response times
• Random write (red line): random write behavior differs among the three SSDs
Outline
• Motivation
• Flash SSD
• Basic Performance Study
• Performance Evaluation by TPC-C Benchmark
• Conclusion and Future Work
Purpose of Evaluation by TPC-C
• Evaluate the IO behavior of SSDs running an actual database application
  – Two file systems, two DBMSs, and four IO schedulers
• Investigate the detailed behavior of the IO path
Experimental System
• Dell Precision™ 390 Workstation: dual-core Intel Core 2 Duo 1.86GHz, 2GB memory, SATA 3.0Gbps controller, CentOS 5.2 64-bit, kernel 2.6.18
• Hard disk (HDD): Hitachi HDS72107, 3.5", 7200RPM, 32MB cache, 750GB
• Flash SSD: Mtron PRO 7500, SLC, 3.5", 32GB
• Flash SSD: Intel X25-E, SLC, 2.5", 64GB
• Flash SSD: OCZ VERTEX EX, SLC, 2.5", 120GB
• Inside each device, read-ahead prefetching and write-back caching are enabled
System Configuration
• TPC-C benchmark 5.10
• Database settings
  – MySQL: InnoDB
  – Commercial DBMS
• File system options
  – Ext2fs (ext2)
  – Nilfs2
• IO schedulers
  – No operation (Noop)
  – Anticipatory
  – Deadline
  – Completely Fair Queuing (CFQ)

[Figure: software stack: Database Application (TPC-C Benchmark) on top of the DBMS (MySQL, commercial DBMS), on top of the OS kernel, which contains the file system (ext2fs, nilfs2) with a kernel tracer, the IO schedulers, and the SATA device driver; one disk holds the OS, while the HDD and the flash SSDs hold the database]
Configuration of TPC-C Benchmark
• 30 warehouses, with 30 virtual users
• "Key and Think" time was 0
• DBMS configuration for the TPC-C benchmark:

                        Commercial DBMS                  MySQL (InnoDB)
  Data buffer size      8MB                              4MB
  Log buffer size       5MB                              2MB
  Data block size       4KB                              16KB
  Data file             fixed, 5.5GB (database size is 2.7GB)
  Synchronous IO        Yes                              Yes
  Log flushing method   flush the log at transaction commit
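To make the driver setup concrete, here is a minimal sketch of the benchmark loop under these settings. run_transaction() is a hypothetical stand-in for issuing the SQL of one transaction, and the percentages are the standard TPC-C mix; the "normal" workload on a later slide uses nearly the same values.

```python
import random, threading

# Standard TPC-C transaction mix (percent of all transactions).
MIX = {"New Order": 45, "Payment": 43, "Order Status": 4,
       "Delivery": 4, "Stock Level": 4}

def run_transaction(txn_type: str, warehouse: int) -> None:
    """Hypothetical stand-in: execute one transaction against the DBMS."""
    pass

def virtual_user(warehouse: int, num_txns: int) -> None:
    for _ in range(num_txns):
        txn_type = random.choices(list(MIX), weights=list(MIX.values()))[0]
        run_transaction(txn_type, warehouse)
        # "Key and Think" time is 0: the next transaction starts immediately.

# 30 warehouses, 30 virtual users (one per warehouse).
users = [threading.Thread(target=virtual_user, args=(w, 10_000))
         for w in range(1, 31)]
for u in users: u.start()
for u in users: u.join()
```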
File Systems
• Ext2fs (ext2)
  – In-place update
  – Read: seek, then read the data page
  – Write: seek, then update the data page in place
• Nilfs2
  – An example of a log-structured file system (LFS)
  – Read: seek, then read
  – Write: random writes become sequential writes; updated pages are appended to the log and the old data pages become obsolete

[Figure: buffered pages a, b, c, d written back to disk; ext2fs overwrites each page at its original location, while nilfs2 appends the new versions a', b', c', d' sequentially, leaving the old data pages obsolete]
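The two write paths can be contrasted in a few lines. This is a toy sketch of the idea, not of either file system's actual code: the in-place path overwrites a page at its fixed offset, while the log-structured path appends the new version at the log tail and remembers where it went.

```python
import os

PAGE = 4096

def inplace_write(fd: int, page_no: int, data: bytes) -> None:
    """ext2-style update: seek to the page's home location and overwrite.
    On flash, the FTL must absorb the overwrite (erase-before-program)."""
    os.pwrite(fd, data, page_no * PAGE)

class ToyLog:
    """LFS-style update: every write goes to the log tail, sequentially."""
    def __init__(self, fd: int):
        self.fd, self.tail = fd, 0
        self.index: dict[int, int] = {}   # page_no -> current log offset
    def write(self, page_no: int, data: bytes) -> None:
        os.pwrite(self.fd, data, self.tail)
        self.index[page_no] = self.tail   # the old copy becomes obsolete
        self.tail += PAGE                 # random writes become sequential
    def read(self, page_no: int) -> bytes:
        return os.pread(self.fd, PAGE, self.index[page_no])
```

The sequential log is why nilfs2's physical writes are large, and why a cleaner must eventually reclaim the obsolete copies; both effects show up in the evaluation below.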
Experimental Study
• Transaction Throughput
• IO Throughput
• Buffer Size
• Workload Property
• IO Scheduler
Transaction Throughput
• The Intel SSD is better than the HDD.
• The Mtron SSD is better than the HDD with LFS (nilfs2).
• The OCZ SSD is better than the HDD with ext2fs.
• The performance differences are caused by the combination of SSD and file system.

[Figure: transaction throughput [tpm], 0 to 14,000, for ext2fs and nilfs2 on HDD, Mtron, Intel, and OCZ, under the commercial DBMS and MySQL]
IO Path Investigation
• Logical IO is captured at the system call level, where the DBMS calls the service routines of the OS kernel.
• Physical IO is captured at the device driver level, where the IO requests have already been sorted and merged and are ready to be served by the device. (See the measurement sketch below.)

[Figure: the software stack with the two capture points: Logical IO at the system call boundary between the DBMS and the file system, Physical IO at the SATA device driver beneath the IO schedulers]
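One simple way to observe the two levels on Linux is sketched below, assuming the database device is /dev/sdb: count the bytes the application submits (logical IO) and compare them with the sector counters the block layer reports in /proc/diskstats (physical IO). The study itself used a kernel tracer; this stands in for it.

```python
import os

def sectors_written(dev: str = "sdb") -> int:
    """In /proc/diskstats, the tenth field of a device's line is the
    cumulative number of 512-byte sectors written to that device."""
    with open("/proc/diskstats") as f:
        for line in f:
            fields = line.split()
            if fields[2] == dev:
                return int(fields[9])
    raise ValueError(f"device {dev} not found")

before = sectors_written()

# Logical IO: what the application hands to the kernel.
fd = os.open("/mnt/ssd/testfile", os.O_WRONLY | os.O_CREAT | os.O_SYNC)
logical = sum(os.pwrite(fd, b"x" * 4096, i * 4096) for i in range(1000))
os.close(fd)

# Physical IO: what actually reached the device (includes metadata writes).
physical = (sectors_written() - before) * 512
print(f"logical {logical} bytes, physical {physical} bytes")
```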
Logical IO Throughput
• The transaction throughput follows the results of the logical IO throughput.

[Figure: transaction throughput [tpm] side by side with logical IO throughput, i.e., the read/write rate issued by the DBMS [MB/s], for ext2fs and nilfs2 on HDD, Mtron, Intel, and OCZ, under the commercial DBMS and MySQL]

Physical IO Throughput

[Figure: logical IO throughput (read/write rate by the DBMS [MB/s]) compared with physical IO throughput (read/write rate to the device [MB/s]), read and write, for the same configurations]
Physical IO Throughput (Read)
• A large fraction of the reads is absorbed by the file system buffer cache.

[Figure: logical versus physical IO throughput with the read portion highlighted, for ext2fs and nilfs2 on HDD, Mtron, Intel, and OCZ, under the commercial DBMS and MySQL]
Physical IO Throughput (Write, ext2fs)
• A large fraction of the reads is absorbed by the file system buffer cache.
• For ext2fs, the write throughput is almost the same at the logical and physical levels (synchronous IO).

[Figure: the same logical-versus-physical comparison with the ext2fs write portion highlighted]
Physical IO Throughput (Write, nilfs2)
• A large fraction of the reads is absorbed by the file system buffer cache.
• For ext2fs, the write throughput is almost the same at the logical and physical levels (synchronous IO).
• LFS (nilfs2) produces additional writes at the physical IO layer, which has a serious impact on the overall transaction throughput.

[Figure: the same logical-versus-physical comparison with the nilfs2 write portion highlighted]
Physical IO Size
• The average request size of physical IO

[Figure: average read/write size [bytes], 0 to 60,000, for ext2fs and nilfs2 on HDD, Mtron, Intel, and OCZ, under the commercial DBMS and MySQL; the nilfs2 write sizes, annotated as 186,352 / 181,115 / 180,392 / 107,423 bytes, exceed the chart scale]
Physical IO Size (HDD, Mtron)
• The average request size of physical IO
• The average write size of LFS is much larger than that of ext2fs, which is beneficial for the hard disk and some SSDs such as the Mtron SSD.

[Figure: the average read/write size chart alongside the sequential throughput curves of HDD and Mtron, whose throughput keeps rising with IO size up to 256KB]
Physical IO Size (Intel, OCZ)
• The average request size of physical IO
• The average write size of LFS is much larger than that of ext2fs.
• A large write size is not beneficial on the Intel and OCZ SSDs, as shown in the basic performance study; this helps to explain the inferior transaction throughput on nilfs2.

[Figure: the average read/write size chart alongside the sequential throughput curves of Intel and OCZ, whose write throughput levels off or drops at large IO sizes]
Database Buffer Size (Mtron)
• The transaction throughput improves as the database buffer size increases.

[Figure: transaction throughput [tpm] versus buffer size for ext2fs and nilfs2 on the Mtron SSD; 8MB to 1GB for the commercial DBMS (up to 18,000 tpm), 4MB to 1GB for MySQL (up to 4,000 tpm)]
Workload Property
• Measured with three types of workloads
• The speedup of nilfs2 over ext2fs increases as the percentage of read-write transactions increases

Transaction mix [% of mix]:

  Transaction Type   IO Property   read intensive   normal   write intensive
  New Order          Read-Write    4.35             43.48    96.00
  Payment            Read-Write    4.35             43.48    1.00
  Delivery           Read-Write    4.35             4.35     1.00
  Stock Level        Read-Only     43.48            4.35     1.00
  Order Status       Read-Only     43.48            4.35     1.00

[Figure: transaction throughput [tpm] of ext2fs and nilfs2, with the nilfs2-over-ext2fs speedup curve (right axis, up to 12), for the read-intensive, normal, and write-intensive workloads, under the commercial DBMS and MySQL]
IO Schedulers
• Noop
  – No operation
• Anticipatory
  – Merges IO requests and re-orders them in an elevator manner
• Deadline
  – Imposes a deadline on each request
• Completely Fair Queuing (CFQ)
  – Balances the IO service time among processes
The scheduler is selected per block device; see the sketch below.
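On the 2.6.18 kernel used here, the active scheduler is chosen per block device through sysfs. A minimal sketch, assuming the database device is /dev/sdb and root privileges:

```python
SCHED_PATH = "/sys/block/sdb/queue/scheduler"   # assumed device name

def current_scheduler() -> str:
    """The file lists all schedulers with the active one in brackets,
    e.g. 'noop anticipatory deadline [cfq]'."""
    text = open(SCHED_PATH).read()
    return text[text.index("[") + 1 : text.index("]")]

def set_scheduler(name: str) -> None:
    # Equivalent to: echo noop > /sys/block/sdb/queue/scheduler
    with open(SCHED_PATH, "w") as f:
        f.write(name)

set_scheduler("noop")
print(current_scheduler())   # -> noop
```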
Transaction Throughput with IO Schedulers
• IO scheduling does not greatly affect the transaction throughput.

[Figure: transaction throughput [tpm], 0 to 25,000, under Noop, Anticipatory, Deadline, and CFQ for ext2fs and nilfs2 on Mtron, Intel, and OCZ, with the commercial DBMS and MySQL]
Conclusion and Future Work
• We studied the basic performance characteristics of flash SSDs.
• We measured and analyzed the application performance and the IO behavior of three flash SSDs and two file systems with the TPC-C benchmark:
  – Transaction throughput
  – Logical IO throughput
  – Physical IO throughput
• We plan to study IO path management techniques for database applications running on flash SSDs.
Q&A
Thank you very much!