PLP: Page Latch-free Shared-everything OLTP
Ippokratis Pandis† · Pınar Tözün‡ · Ryan Johnson* · Anastasia Ailamaki‡
†IBM Almaden Research Center  ‡École Polytechnique Fédérale de Lausanne  *University of Toronto
OLTP on Modern Hardware

[Chart: HW contexts per chip (1 to 256, log scale) vs. year (Oct-93 to Apr-11) for Sun UltraSPARC, IBM POWER, Intel Pentium/Itanium/Core (Nehalem), and AMD Opteron — hardware contexts per chip keep multiplying.]

[Chart: throughput per HW context vs. #HW contexts (1 to 64) for TATP GetSubData on a Sun Niagara T2 — measured per-context throughput falls increasingly short of linear scalability as contexts are added.]

More HW contexts != Higher Throughput
Shared-Everything

[Diagram: requests from clients (natassa, ippokratis, ryan, pınar) are dispatched to a pool of workers that all operate on the same logical and physical database.]

Contention on shared data objects
Shared-Nothing – Physically Partitioned

[Diagram: each worker owns its own physical database partition; requests (natassa, ippokratis, ryan, pınar, …) are routed to the owning partition.]

• Explicit contention control
• Distributed transactions
• Load imbalances (e.g. a burst of requests — yannis, miguel, thomas, renata, danica — hits one partition)
• High repartitioning cost

Great for some workloads, not all
Logically Partitioned

[Diagram: a routing table maps key ranges (A–H, I–N, O–S, T–Z) to workers (natassa, ippokratis, pınar, ryan); the logical layer is partitioned, but all workers still share the same physical database.]

Contention at the physical layer
Physiological Partitioning
• Extends logical partitioning to the physical layer
  – Multi-rooted Btree
  – Alternative heap page designs

[Diagram: routing table maps ranges A–H, I–N, O–S, T–Z to workers; each range owns its own index sub-tree and heap pages.]

Contention eliminated at both logical & physical layers
Fast repartitioning
Outline
• Introduction
• Types of Critical Sections
• Physiological Partitioning (PLP)
• Results
• Conclusion
Critical Sections
• Unscalable — e.g. locking, latching: all cores contend on the same critical section, so cost grows with core count
• Fixed — e.g. point-to-point communication: cost does not grow with the number of cores
• Composable — e.g. logging: multiple requests can be combined into a single critical-section entry

Goal: turn Unscalable → Fixed / Composable
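A minimal sketch of why the distinction matters (pure illustration, not from the talk): a single latch shared by every thread serializes all of them, while partitioning the data gives each thread a critical section that no other thread ever enters:

```python
import threading

N_THREADS, OPS = 4, 10_000

# Unscalable: every thread contends on ONE latch around ONE shared counter.
shared_count = 0
shared_latch = threading.Lock()

def bump_shared():
    global shared_count
    for _ in range(OPS):
        with shared_latch:            # all N_THREADS serialize here
            shared_count += 1

# Partitioned: each thread owns its own counter and latch, so this
# critical section is never contended -- the effect PLP aims for.
part_counts = [0] * N_THREADS
part_latches = [threading.Lock() for _ in range(N_THREADS)]

def bump_partition(i):
    for _ in range(OPS):
        with part_latches[i]:         # only thread i ever takes this latch
            part_counts[i] += 1

def run(target, pass_index):
    ts = [threading.Thread(target=target, args=(i,) if pass_index else ())
          for i in range(N_THREADS)]
    for t in ts: t.start()
    for t in ts: t.join()

run(bump_shared, False)
run(bump_partition, True)
print(shared_count, sum(part_counts))  # both 40000; only the second scales
```

Both versions compute the same result; only the partitioned one avoids cross-core contention as threads are added.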
Breakdown of the Critical Sections
…and its impact on performance

Workload: probe one customer, update balance (4-socket quad-core AMD).

[Bar chart: critical sections per transaction (0 to 60), broken down into Fixed, Composable, Locking, Latching, and Other unscalable, for Conventional, Logically Partitioned, PLP-Regular, and PLP-Leaf.]

[Line chart: throughput (Ktps, 0 to 700) vs. #HW contexts (0 to 15) for Conventional and Logically Partitioned — logical partitioning gains 31% by eliminating locking-related critical sections.]

Latching-related CSs remain with logical partitioning
Physical Conflicts

[Diagram: routing table maps ranges A–M and N–Z to two workers; the logical layer is partitioned, but both workers still access the same index and heap pages, so their accesses collide at the physical layer.]

Conflicts on both index & heap pages
Physiological Partitioning (PLP)
• Multi-rooted Btree
  – Routing table is the root

[Diagram: routing table maps R1: A–M and R2: N–Z to workers; each range has its own index sub-tree (R1, R2) over its own part of the heap.]

Both logical & physical partitioning
Reduces contention on index root
Parallel structure modification operations
Fast index repartitioning

No need to latch index pages
Still need to latch heap pages
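One way to picture the multi-rooted design (a sketch under my own assumptions, not Shore-MT's actual code): the routing table replaces the single B-tree root, each worker descends only into its own sub-tree, and repartitioning the index is just an edit to the routing table plus a split of one sub-tree. Each sub-tree is modeled here as a plain dict; the real structure is a B+tree:

```python
class MultiRootedIndex:
    """Hypothetical multi-rooted index: the routing table maps each key
    range to the root of an independent sub-tree, so there is no shared
    root page -- and hence no root latch shared between workers."""

    def __init__(self, boundaries):
        # boundaries like ["n"]: two ranges (keys below "n", keys from
        # "n" up), each with its own root.
        self.boundaries = boundaries
        self.subtrees = [dict() for _ in range(len(boundaries) + 1)]

    def _range_of(self, key):
        for i, b in enumerate(self.boundaries):
            if key < b:
                return i
        return len(self.boundaries)

    def insert(self, key, value):
        self.subtrees[self._range_of(key)][key] = value

    def split_range(self, i, new_boundary):
        # Fast index repartitioning: split sub-tree i at new_boundary
        # and register the new boundary in the routing table.
        old = self.subtrees[i]
        lo = {k: v for k, v in old.items() if k < new_boundary}
        hi = {k: v for k, v in old.items() if k >= new_boundary}
        self.boundaries.insert(i, new_boundary)
        self.subtrees[i:i + 1] = [lo, hi]

idx = MultiRootedIndex(["n"])
idx.insert("ailamaki", 1); idx.insert("pandis", 2); idx.insert("tozun", 3)
idx.split_range(1, "t")    # the n-z range becomes n-s and t-z
print([sorted(t) for t in idx.subtrees])
```

Since two workers never descend through a common root, index probes and even structure modifications in different ranges proceed in parallel without index-page latches.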
Heap Pages: Alternatives

PLP-Partition
[Diagram: each sub-tree (R1, R2) owns its own contiguous set of heap pages.]
– Two-step record inserts
– Repartitioning worst case: scan entire partition

PLP-Leaf
[Diagram: each index leaf owns the heap pages its records live on.]
– Two-step record inserts
– Fragmentation
– Repartitioning worst case: scan a few pages

Latch-free OLTP
Setup
• All prototypes built on top of Shore-MT
  – State-of-the-art open-source DBMS
• Machines used
  – Sun Niagara T2, 64 HW ctxs (in-order, 1.4GHz, 64GB RAM)
  – 4-socket quad-core AMD Opteron, 16 HW ctxs (OoO, 2.4GHz, 64GB RAM)
• #Partitions = #HW contexts available
Breakdown of the Critical Sections
Workload: probe one customer, update balance.

[Bar chart: critical sections per transaction (0 to 60), broken down into Fixed, Composable, Other unscalable, Latching, and Locking, for Conventional, Logically Partitioned, PLP-Regular, and PLP-Leaf — logical partitioning removes the locking CSs, and PLP additionally removes the latching CSs.]

PLP eliminates the majority of the unscalable CSs
Performance on Multicores
TATP – GetSubData

[Line charts: throughput (Ktps) vs. #HW contexts for PLP, Logically Partitioned, and Conventional. Left: Sun Niagara T2 (64 HW ctxs, in-order, 1.4GHz), y-axis 0 to 400. Right: 4-socket quad-core AMD (16 HW ctxs, OoO, 2.8GHz), y-axis 0 to 700. PLP delivers the highest throughput on both machines.]

Benefits increase with faster hardware
Heap Latch Contention
Contention for heap pages; not-tuned (unpadded) TPC-B on Sun Niagara T2.

[Stacked bar chart: time breakdown per transaction (contention for heap pages vs. Other) at 16, 32, 48, and 64 HW contexts, for Conventional, Logically Partitioned, PLP-Regular, and PLP-Leaf — PLP-Leaf shows the least heap-latch contention as contexts grow.]

Avoids heap false-sharing problems
Repartitioning Cost
Scenario: splitting one partition into two (466MB); 1 primary and 1 secondary index; 8KB pages, 100B records, 32B keys, 3-level B+tree.

                  Heap moved   Primary & secondary index records moved
                               (updates, inserts, deletes)
  Shared-nothing  233MB        2.4 million inserts + 2.4 million deletes
  PLP-Partition   233MB        2.4 million updates
  PLP-Leaf        8.3KB        85 updates

PLP-Leaf: low repartitioning cost + latch-free
Conclusion
• Multicores expose the bottlenecks of DBMSs
• Understanding the critical sections is crucial
  – Identify the harmful ones and eliminate them
• Physiological partitioning
  – Applies the right partitioning at both logical & physical layers
  – Thread-local locks & latch-free data accesses
  – Eliminates the majority of unscalable critical sections
  – Benefits of shared-nothing with easy repartitioning

Thank you!