Understanding Manycore Scalability of File Systems
Changwoo Min, Sanidhya Kashyap, Steffen Maass, Woonhak Kang, and Taesoo Kim

Applications must parallelize I/O operations
● Death of single-core CPU scaling
  – CPU clock frequency: 3 ~ 3.8 GHz
  – # of physical cores: up to 24 (Xeon E7 v4)
● From mechanical HDD to flash SSD
  – IOPS of a commodity SSD: 900K
  – Non-volatile memory (e.g., 3D XPoint): 1,000x ↑
● But file systems become a scalability bottleneck

Problem: lack of understanding of internal scalability behavior
● Exim mail server on RAMDISK (messages/sec vs. #core; btrfs, ext4, F2FS, XFS)
  – An embarrassingly parallel application!
  – Observed behaviors: 1. saturated, 2. collapsed, 3. never scales
● Test machine: Intel 80-core machine (8-socket, 10-core Xeon E7-8870), 512GB RAM, 1TB SSD, 7200 RPM HDD

Even on a slower storage medium, the file system becomes a bottleneck
● Exim email server at 80 cores (messages/sec) on RAMDISK, SSD, and HDD for btrfs, ext4, F2FS, and XFS

Outline
● Background
● FxMark design
  – A file system benchmark suite for manycore scalability
● Analysis of five Linux file systems
● Pilot solution
● Related work
● Summary

Research questions
● What file system operations are not scalable?
● Why are they not scalable?
● Is it a problem of implementation or design?

Technical challenges
● Applications are usually stuck with a few bottlenecks
  → cannot see the next level of bottlenecks before resolving them
  → difficult to understand overall scalability behavior
● How to systematically stress file systems to understand their scalability behavior?

FxMark: evaluate & analyze manycore scalability of file systems
● Workloads: 19 micro-benchmarks and 3 applications
● File systems: tmpfs (memory FS), ext4 and XFS (journaling FS; ext4 also tested without a journal, ext4NJ), btrfs (CoW FS), F2FS (log FS)
● Storage media: RAMDISK, SSD, HDD
● # cores: 1, 2, 4, 10, 20, 30, 40, 50, 60, 70, 80
● More than 4,700 experiments in total


Microbenchmarks: unveil hidden scalability bottlenecks
● Each operation (e.g., data block read) is exercised at three sharing levels:
  – Low: each process operates on a block in its own file
  – Medium: processes operate on different blocks of a shared file
  – High: all processes operate on the same block of a shared file
● Legend: R = read operation; sharing is varied across process, file, and block
● Goal: stress different file system components with various sharing levels (see the sketch below)
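To make the sharing levels concrete, here is a minimal userspace sketch of the data-block-read pattern (illustrative only, not FxMark's actual code; the directory layout, function name, and iteration count are assumptions):

/* Minimal sketch of the data-block-read microbenchmarks (illustrative only,
 * not FxMark's code). Each worker repeatedly pread()s one 4KB block; the
 * sharing level decides whether the file and the block offset are private
 * or shared among workers. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define BLOCK_SIZE 4096

enum sharing { LOW, MEDIUM, HIGH };   /* DRBL, DRBM, DRBH */

static void read_worker(enum sharing level, int worker_id, long iters)
{
    char path[64], buf[BLOCK_SIZE];

    /* Low: a private file per worker; medium/high: one shared file. */
    if (level == LOW)
        snprintf(path, sizeof(path), "testdir/private-%d", worker_id);
    else
        snprintf(path, sizeof(path), "testdir/shared");

    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); exit(1); }

    /* High: every worker reads the same block; low/medium: distinct blocks. */
    off_t off = (level == HIGH) ? 0 : (off_t)worker_id * BLOCK_SIZE;

    for (long i = 0; i < iters; i++) {
        if (pread(fd, buf, BLOCK_SIZE, off) != BLOCK_SIZE) {
            perror("pread");
            exit(1);
        }
    }
    close(fd);
}

Even the high-sharing case is read-only, which is why the collapse observed later on DRBH is surprising.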

Evaluation
● Data block read, low sharing level: all file systems scale linearly (M ops/sec vs. #core, up to 80 cores)
● Legend: btrfs, ext4, ext4NJ, F2FS, tmpfs, XFS

Outline
● Background
● FxMark design
● Analysis of five Linux file systems
  – What are the scalability bottlenecks?
● Pilot solution
● Related work
● Summary

Summary of results: file systems are not scalable
● Figure: throughput vs. #core (0-80) for all microbenchmarks and applications on every file system
  – Microbenchmarks (M ops/sec): DRBL, DRBM, DRBH, DWOL, DWOM, DWAL, DWTL, DWSL, MRPL, MRPM, MRPH, MRDL, MRDM, MWCL, MWCM, MWUL, MWUM, MWRL, MWRM, plus O_DIRECT variants of DRBL, DRBM, DWOL, and DWOM
  – Applications: Exim (messages/sec), RocksDB (ops/sec), DBENCH (GB/sec)
  – Legend: btrfs, ext4, ext4NJ, F2FS, tmpfs, XFS

Data block read
● Low sharing (DRBL): all file systems scale linearly
● Medium sharing (DRBM): XFS shows performance collapse
● High sharing (DRBH): all file systems show performance collapse

Page cache is maintained for efficient access of file data
● Read path on a cache miss: 1. read a file block → 2. look up the page cache → 3. cache miss → 4. read the page from disk → 5. copy the page

Page cache hit
● Read path on a cache hit: 1. read a file block → 2. look up the page cache → 3. cache hit → 4. copy the page
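As a concrete illustration of the two paths above, here is a minimal userspace analogue (a sketch under simplifying assumptions: a direct-mapped cache array stands in for the kernel's per-inode radix tree, and eviction, locking, and reference counting are left out):

/* Userspace analogue of the page-cache read path (illustrative only). */
#include <string.h>
#include <unistd.h>

#define BLOCK_SIZE  4096
#define CACHE_SLOTS 1024

struct cache_entry { long index; int valid; char data[BLOCK_SIZE]; };
static struct cache_entry page_cache[CACHE_SLOTS];

/* 1. read a file block -> 2. look up the cache -> 3. hit or miss ->
 * 4. on a miss, read the block from disk into the cache ->
 * 5. copy the cached page to the caller. */
static int read_file_block(int fd, long index, char *out)
{
    struct cache_entry *e = &page_cache[index % CACHE_SLOTS];

    if (!(e->valid && e->index == index)) {                  /* cache miss */
        if (pread(fd, e->data, BLOCK_SIZE, index * BLOCK_SIZE) != BLOCK_SIZE)
            return -1;                                       /* read from disk */
        e->index = index;
        e->valid = 1;
    }
    memcpy(out, e->data, BLOCK_SIZE);                        /* copy page */
    return 0;
}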

Page cache can be evicted to secure free memory

… but a page can be evicted only when it is not being accessed
● Reference counting is used to track the number of tasks accessing a page:
  access_a_page(...) {
      atomic_inc(&page->_count);
      ...
      atomic_dec(&page->_count);
  }

Reference counting becomes a scalability bottleneck
● Many readers (R ... R) access the same page, and each executes:
  access_a_page(...) {
      atomic_inc(&page->_count);
      ...
      atomic_dec(&page->_count);
  }
● DRBH: throughput collapses as cores are added; the CPI (cycles-per-instruction) annotations on the graph grow from 20 to 100

● High contention on a page reference counter → huge memory stall
● Many more: directory entry cache, XFS inode, etc.
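The memory-stall effect can be reproduced outside the kernel with a minimal sketch (an analogue, not kernel code; the thread count, loop structure, and iteration count are assumptions): every thread atomically increments and decrements one shared counter, mimicking page->_count being bumped on each access to the same page.

/* Sketch of the shared-reference-counter bottleneck (illustrative only).
 * Build with: cc -O2 -pthread refcount.c */
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

#define ITERS 10000000L

static atomic_long refcount;             /* stands in for page->_count */

static void *worker(void *arg)
{
    (void)arg;
    for (long i = 0; i < ITERS; i++) {
        atomic_fetch_add(&refcount, 1);  /* "get" the page */
        atomic_fetch_sub(&refcount, 1);  /* "put" the page */
    }
    return NULL;
}

int main(int argc, char **argv)
{
    int n = (argc > 1) ? atoi(argv[1]) : 1;
    pthread_t *tid = malloc(sizeof(*tid) * n);

    for (int i = 0; i < n; i++)
        pthread_create(&tid[i], NULL, worker, NULL);
    for (int i = 0; i < n; i++)
        pthread_join(tid[i], NULL);

    /* As n grows, per-thread throughput drops sharply: the counter's cache
     * line ping-pongs between cores, and CPI rises, just as on DRBH. */
    free(tid);
    return 0;
}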

Lessons learned
● High locality can cause performance collapse
● Cache hits should be scalable
  → when cache hits are dominant, the scalability of the cache-hit path is what matters

Data block overwrite
● Low sharing (DWOL): ext4, F2FS, and btrfs show performance collapse
● Medium sharing (DWOM): all file systems degrade gradually

Btrfs is a copy-on-write (CoW) file system
● Directs every write to a new copy of the block
  → never overwrites the block in place
  → maintains multiple versions of the file system image
● CoW triggers a disk block allocation for every write
  → disk block allocation becomes a bottleneck
● Analogous bottlenecks: ext4 → journaling, F2FS → checkpointing
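A toy model of the difference (a hedged sketch, not btrfs code; disk, file_map, and alloc_block() are invented names): the CoW path must run a shared block allocator on every overwrite, whereas the in-place path simply reuses the existing block.

/* Toy contrast between in-place overwrite and copy-on-write overwrite
 * (illustrative only; real file systems use far richer structures). */
#include <string.h>

#define BLOCK_SIZE 4096
#define NR_BLOCKS  1024

static char disk[NR_BLOCKS][BLOCK_SIZE];  /* the "storage medium"          */
static long next_free;                    /* shared block-allocator cursor */
static long file_map[16];                 /* logical block -> disk block   */

static long alloc_block(void)             /* contended by every CoW writer */
{
    return __atomic_fetch_add(&next_free, 1, __ATOMIC_SEQ_CST) % NR_BLOCKS;
}

/* In-place update (e.g., ext4's data path): reuse the existing block. */
static void overwrite_in_place(int lblk, const char *buf)
{
    memcpy(disk[file_map[lblk]], buf, BLOCK_SIZE);
}

/* Copy-on-write update (e.g., btrfs): never touch the old copy. */
static void overwrite_cow(int lblk, const char *buf)
{
    long new_blk = alloc_block();            /* block allocation per write */
    memcpy(disk[new_blk], buf, BLOCK_SIZE);  /* write the new copy         */
    file_map[lblk] = new_blk;                /* publish the new version    */
}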

Lessons learned
● Overwriting can be as expensive as appending
  → critical for log-structured file systems (F2FS) and CoW file systems (btrfs)
● Consistency guarantee mechanisms should be scalable
  → scalable journaling
  → scalable CoW index structures
  → parallel log-structured writing

Entire file is locked regardless of update range
● All tested file systems hold the inode mutex for write operations
  – Range-based locking is not implemented
  ***_file_write_iter(...) {
      mutex_lock(&inode->i_mutex);
      ...
      mutex_unlock(&inode->i_mutex);
  }

Lessons learned
● A file cannot be concurrently updated
  – Critical for VMs and DBMSs, which manage large files
● Need to consider techniques used in parallel file systems
  → e.g., range-based locking (a sketch follows below)
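As a rough picture of the range-based locking mentioned above (a simplified sketch; none of the tested file systems implement this, and a fixed stripe table stands in for a real interval tree), writers touching disjoint regions could take different locks instead of the single inode mutex:

/* Striped write locking as a stand-in for range-based locking
 * (illustrative only; struct my_inode and its fields are invented). */
#include <pthread.h>

#define BLOCK_SIZE 4096
#define NR_STRIPES 64

struct my_inode {
    pthread_mutex_t i_mutex;                 /* today: one lock per file   */
    pthread_mutex_t range_lock[NR_STRIPES];  /* idea: one lock per region  */
};

/* Current behavior: writers to different offsets still serialize. */
static void write_with_file_lock(struct my_inode *inode)
{
    pthread_mutex_lock(&inode->i_mutex);
    /* ... perform the write ... */
    pthread_mutex_unlock(&inode->i_mutex);
}

/* Range-based idea: writes to disjoint regions proceed in parallel
 * (assumes a single-block write; multi-block writes would need to take
 * every stripe they touch, in a fixed order to avoid deadlock). */
static void write_with_range_lock(struct my_inode *inode, long offset)
{
    pthread_mutex_t *l = &inode->range_lock[(offset / BLOCK_SIZE) % NR_STRIPES];
    pthread_mutex_lock(l);
    /* ... perform the write to the block containing offset ... */
    pthread_mutex_unlock(l);
}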

Summary of findings
● High locality can cause performance collapse
● Overwriting can be as expensive as appending
● A file cannot be concurrently updated
● All directory operations are sequential
● Renaming is system-wide sequential
● Metadata changes are not scalable
● Non-scalability often means wasting CPU cycles
● Scalability is not portable
Many of these findings are unexpected and counter-intuitive; the common root cause is contention at the file system level to maintain data dependencies. (See our paper for details.)

Outline
● Background
● FxMark design
● Analysis of five Linux file systems
● Pilot solution
  – If we remove contention inside a file system, does it become scalable?
● Related work
● Summary

RocksDB on a 60-partitioned RAMDISK scales better
● Tested workload: DB_BENCH overwrite (ops/sec vs. #core; btrfs, ext4, F2FS, tmpfs, XFS)
● A single-partitioned RAMDISK vs. a 60-partitioned RAMDISK: up to 2.1x higher throughput with 60 partitions

→ Reducing contention in the file system helps improve performance and scalability

But partitioning makes performance worse on HDD
● Tested workload: DB_BENCH overwrite (ops/sec vs. #core, up to 20 cores; btrfs, ext4, F2FS, XFS)
● A single-partitioned HDD vs. a 60-partitioned HDD: up to 2.7x lower throughput with 60 partitions

→ Reduced spatial locality degrades performance on HDD
→ Medium-specific characteristics (e.g., spatial locality) should be considered

Related work
● Scaling operating systems
  – Mostly use a memory file system to factor out the effect of I/O operations
● Scaling file systems
  – Scalable file system journaling: ScaleFS [MIT:MSThesis'14], SpanFS [ATC'15]
  – Parallel log-structured writing on NVRAM: NOVA [FAST'16]

Summary
● Comprehensive analysis of the manycore scalability of five widely-used file systems using FxMark
● Manycore scalability should be of utmost importance in file system design
● New challenges in scalable file system design
  – Minimizing contention, scalable consistency guarantees, spatial locality, etc.
● FxMark is open source: https://github.com/sslab-gatech/fxmark
