Andrew Newell (Purdue University), Gabriel Kliot (Google), Ishai Menache (Microsoft), Aditya Gopalan (Indian Institute of Science), Soramichi Akiyama (University of Tokyo), and Mark Silberstein (Technion)
• Dynamic: state changes at runtime
• Interactive: users demand fast responses
• Run at large scale
Example: a chatroom
• Interactive: users are actively chatting
• Dynamic: state is always changing

Other examples, all at the scale of millions of users:
• Social networks
• Online gaming
• Internet of Things
[Diagram: a server hosting actors; user clients Bob, Sue, and Jon send "Hi" messages to a chatroom actor.]

Scaling to millions of users requires:
• CPU to handle requests
• Memory to store state

State is kept by actors. Upon receiving a message, an actor can (sketched below):
§ Update its state
§ Send messages to other actors
§ Create actors
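A minimal sketch of that actor contract, with hypothetical names (this is not the Orleans API):

```python
# Sketch of an actor: private state plus a message handler that can update
# state, message other actors, and create new actors. Names are illustrative.

class ChatroomActor:
    def __init__(self, runtime):
        self.runtime = runtime   # assumed helper for sending/creating actors
        self.members = set()     # private state kept by this actor

    def receive(self, msg):
        if msg["type"] == "join":
            self.members.add(msg["user"])                       # update state
        elif msg["type"] == "chat":
            for user in self.members:                           # send messages
                self.runtime.send(user, {"type": "deliver",
                                         "text": msg["text"]})
        elif msg["type"] == "split":
            self.runtime.create(ChatroomActor)                  # create actors
```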
[Diagram: actors distributed across many servers.]

• Examples: Orleans, Erlang, Akka
• They eliminate the cost of development at scale:
  § Add enough servers to handle load
  § Fault tolerance and correctness
• But latency suffers
Scaling dynamic interactive services raises two latency problems:
• Inter-server: messaging overhead
• Intra-server: resource allocation
Outline:
• Inter-server messaging problem & solution
• Intra-server resource allocation problem & solution
• Evaluation on Orleans
[Diagram: actors spread over several servers, with messages crossing between them.]

At scale, many messages cross server boundaries.
Profile of request latency (typical workload on multiple Orleans servers):
• Sender: 25%
• Network: 1%
• Receiver: 32%
• Worker: 32%
• Other: 10%

Over half of latency is due to inter-server message processing.
Goal: reduce remote messaging with better actor placement.
Existing placement strategies:
• Random placement: balances load well, but remote messaging is high on both static and dynamic workloads.
• Colocation placement (place an actor with its first caller): low remote messaging on static workloads, but load balancing suffers and remote messaging climbs on dynamic workloads.

Remote messaging is always high on dynamic workloads.
Formulate placement as balanced graph partitioning (sketch below):
§ Vertices: actors
§ Edge weights: messaging
§ Partitions: servers

[Diagram: example messaging graphs partitioned into groups of 4, 3, and 3 vertices with 2 cut edges.]

Messaging graphs:
§ A reasonable partition exists
§ But the graph dynamically changes

Cost constraints:
§ Must scale with the number of actors and servers
§ Minimize actor movements
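To make the objective concrete, here is a small sketch (illustrative names and data, not the paper's code) of the two quantities the partitioner trades off: the messaging weight cut by the partition and the balance of actors across servers.

```python
# Sketch of the partitioning objective: minimize messaging weight across
# servers while keeping actor counts balanced. Data below is made up.

def cut_weight(edges, placement):
    """edges: {(actor_a, actor_b): msgs_per_sec}; placement: {actor: server}."""
    return sum(w for (a, b), w in edges.items() if placement[a] != placement[b])

def imbalance(placement, num_servers):
    """Largest server's actor count relative to a perfectly even split."""
    counts = [0] * num_servers
    for server in placement.values():
        counts[server] += 1
    return max(counts) / (len(placement) / num_servers)

# Toy graph: 7 actors on 2 servers; this placement cuts 2 edges.
edges = {("A", "B"): 5, ("B", "C"): 4, ("C", "D"): 1, ("C", "E"): 1,
         ("D", "E"): 6, ("E", "F"): 3, ("F", "G"): 2}
placement = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1, "F": 1, "G": 1}
print(cut_weight(edges, placement), imbalance(placement, 2))  # 2, ~1.14
```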
The distributed partitioning protocol, run by every server:
1. Decentrally find a good partner
   § One swap at a time
   § Cooldown timer between retries
2. Perform the swap protocol, which aims to
   § Improve balance
   § Reduce messaging
3. Repeat

[Diagram: servers exchanging swap requests over successive rounds.]
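A rough sketch of step 1 under my own assumptions about the mechanics (the partner choice and timer value are illustrative): each server repeatedly pairs with the server it exchanges the most messages with, performs at most one swap, then puts that partner on a cooldown so the same pair is not retried immediately.

```python
import time

COOLDOWN_SECS = 60  # assumed value; the real timer is a tuning knob

def pick_partner(remote_msgs, cooldown_until):
    """remote_msgs: {server: msgs/sec we exchange with it}. Choose the
    busiest partner that is not currently cooling down."""
    now = time.time()
    eligible = {s: m for s, m in remote_msgs.items()
                if cooldown_until.get(s, 0) <= now}
    return max(eligible, key=eligible.get) if eligible else None

def run_round(remote_msgs, cooldown_until, do_swap_protocol):
    """One round: at most one swap, then start the partner's cooldown."""
    partner = pick_partner(remote_msgs, cooldown_until)
    if partner is not None:
        do_swap_protocol(partner)
        cooldown_until[partner] = time.time() + COOLDOWN_SECS
```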
The swap protocol between servers A and B:
1. A identifies a candidate actor set and sends it to B
2. B selects its own candidate set
3. B picks the swapping subsets so as to
   § Improve balance
   § Reduce remote messaging
4. B responds with the swap decision

[Diagram: Server A and Server B (roughly 18-20 actors each) exchanging actor subsets.]
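A sketch of the acceptance test behind steps 3 and 4, under the two stated goals; the names are mine, and edges inside the swapped subsets are ignored for simplicity:

```python
# Sketch of B's swap decision: accept an exchange of actor subsets only if
# it lowers remote-messaging weight without worsening the actor balance.
# msgs_to_a[x] / msgs_to_b[x]: messages/sec actor x exchanges with server A/B.

def exchange_gain(subset_a, subset_b, msgs_to_a, msgs_to_b):
    """Drop in remote messaging if subset_a moves A->B and subset_b moves
    B->A (edges within the swapped subsets are ignored for simplicity)."""
    before = (sum(msgs_to_b[x] for x in subset_a)
              + sum(msgs_to_a[x] for x in subset_b))
    after = (sum(msgs_to_a[x] for x in subset_a)
             + sum(msgs_to_b[x] for x in subset_b))
    return before - after  # positive means remote messaging goes down

def accept_swap(subset_a, subset_b, count_a, count_b, msgs_to_a, msgs_to_b):
    new_a = count_a - len(subset_a) + len(subset_b)
    new_b = count_b - len(subset_b) + len(subset_a)
    balance_ok = abs(new_a - new_b) <= abs(count_a - count_b)
    return balance_ok and exchange_gain(subset_a, subset_b,
                                        msgs_to_a, msgs_to_b) > 0
```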
Outline:
• Inter-server messaging problem & solution
• Intra-server resource allocation problem & solution
• Evaluation on Orleans
Inside an Orleans server, messages flow through a Staged Event Driven Architecture (SEDA):
• Stages: Receive → Worker → Send
• Each stage has an individual thread pool
• Orleans default: one thread per core per stage

How to allocate threads, and does it matter?
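A compact sketch of that stage structure (illustrative Python, not Orleans' implementation): three stages connected by queues, each served by its own thread pool whose size is the knob being studied.

```python
import queue
import threading
import time

def stage(in_q, out_q, handler, threads):
    """Run `threads` workers that drain in_q, apply handler, feed out_q."""
    def loop():
        while True:
            item = in_q.get()
            result = handler(item)
            if out_q is not None:
                out_q.put(result)
    for _ in range(threads):
        threading.Thread(target=loop, daemon=True).start()

receive_q, work_q, send_q = queue.Queue(), queue.Queue(), queue.Queue()

# Thread counts per stage are the allocation under study.
stage(receive_q, work_q, lambda m: m, threads=2)             # Receive: deserialize
stage(work_q, send_q, lambda m: m.upper(), threads=2)        # Worker: run actor code
stage(send_q, None, lambda m: print("sent:", m), threads=2)  # Send: ship out

receive_q.put("hello")
time.sleep(0.1)  # let the pipeline drain
```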
64 thread-allocation runs, average latency (ms) collected for each:

[Heatmap: worker threads (1-8) vs. sender threads (1-8). Latency ranges from about 10 ms with just enough threads up to 50 ms, or even unbounded (∞), when a stage is starved; Orleans' default allocation sits well above the best cells.]

Reducing to just enough threads per stage yields a 3x latency reduction.
Too few threads has huge repercussions.
Each stage can be characterized by measurements: its arrival rate, service rate, and processor usage.

• Existing work: allocate, check, repeat
• Our solution: take those measurements as input and directly find the globally optimal threads-per-stage allocation, across all stages
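A sketch of "measure and directly find the global optimum" under my own modeling assumptions: treat each stage as a simple queue, score every allocation that fits a fixed thread budget, and keep the best. The delay formula here is a stand-in, not the paper's model.

```python
from itertools import product

def stage_delay(arrival, service, threads):
    """Crude queueing approximation: delay explodes as a stage saturates."""
    capacity = service * threads          # msgs/sec the stage can absorb
    if arrival >= capacity:
        return float("inf")               # unstable: queue grows without bound
    return 1.0 / (capacity - arrival)

def best_allocation(stages, budget):
    """stages: [(arrival_rate, service_rate_per_thread)]; budget: total threads."""
    best, best_delay = None, float("inf")
    for alloc in product(range(1, budget + 1), repeat=len(stages)):
        if sum(alloc) > budget:
            continue
        delay = sum(stage_delay(a, s, t) for (a, s), t in zip(stages, alloc))
        if delay < best_delay:
            best, best_delay = alloc, delay
    return best

# Hypothetical measurements (msgs/sec) for receive, worker, send stages:
print(best_allocation([(900, 500), (400, 500), (900, 500)], budget=8))
```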
Outline:
• Inter-server messaging problem & solution
• Intra-server resource allocation problem & solution
• Evaluation on Orleans
• Implemented in Orleans
  § Distributed balanced graph partitioning
  § Dynamic thread allocation
• Mimics the production Halo Presence workload
  § Maintains stats of players in real time
  § Clients query for the stats of all players in some game
  § Both dynamic and interactive
[Diagram: clients issuing in-game queries for game stats against 10 Orleans servers.]

100k players, 12.5k games, dynamically changing at runtime:
• Players start playing
• Players move between games
• Players stop playing
[Plots: remote messaging (%) and actor movements per minute over time. Starting from random placement at 90% remote messaging, the partitioner converges to 12%; actor movements peak in the first minutes (on the order of thousands per minute) and fall off as placement converges.]

It takes under 10 minutes to go from 90% to 12% remote messaging.
Thread allocations chosen at the two operating points:
• At 90% remote messaging: Send 1 thread, Receive 5 threads, Worker 2 threads
• At 12% remote messaging: Send 1 thread, Receive 6 threads, Worker 1 thread
[Bar charts: median and 99th-percentile latency (ms) for Baseline, Actor Partitioning, and Actor Partitioning w/ Thread Allocation.]

• Median latency: actor partitioning cuts roughly 20% vs. baseline; thread allocation cuts roughly 40% more.
• 99th-percentile latency: actor partitioning cuts roughly 70%; thread allocation cuts roughly 20% more.
• Demonstrated latency problems and solutions for distributed actor systems
  § Actor placement
  § Thread management (in Orleans' open source)
• Better resource management in distributed actor systems gives
  § Easy development at scale
  § Low latency → interactive services
• These techniques can apply beyond actor systems
• Orleans, open source: https://github.com/dotnet/orleans
• E-mails:
  § [email protected]
  § [email protected]
  § [email protected]
  § [email protected]
  § [email protected]
  § [email protected]