Andrew Newell (Purdue University), Gabriel Kliot (Google), Ishai Menache (Microsoft), Aditya Gopalan (Indian Institute of Science), Soramichi Akiyama (University of Tokyo), and Mark Silberstein (Technion)

• Dynamic: state changes at runtime
• Interactive: users demand fast responses
• Run at large scale

Chatroom
• Interactive: users actively chatting
• Dynamic: state is always changing
• Millions of users

Other examples
• Social networks
• Online gaming
• Internet of Things

Figure: user clients Bob, Sue and Jon each send "Hi" to actors on a server.

• State kept by actors
• Upon receiving message
  • Update state
  • Send message to actors
  • Create actors

Scaling to millions of users
• CPU to handle requests
• Memory to store state
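The actor behavior listed on this slide (actors keep state and, on receiving a message, update state, send messages to other actors, or create actors) can be illustrated with a minimal single-process sketch. This is not the Orleans API: `ActorSystem`, `Chatroom`, `User` and the message tuples are hypothetical names chosen for the chatroom example.

```python
from collections import deque


class ActorSystem:
    """Single-process stand-in for a server hosting actors (hypothetical)."""

    def __init__(self):
        self.actors = {}
        self.mailbox = deque()  # pending (target_name, message) pairs

    def create(self, name, actor):
        actor.system = self
        self.actors[name] = actor

    def send(self, target, message):
        self.mailbox.append((target, message))

    def run(self):
        # Deliver messages one at a time until none remain.
        while self.mailbox:
            target, message = self.mailbox.popleft()
            self.actors[target].receive(message)


class User:
    """Actor holding one user's state: the chat lines it has seen."""

    def __init__(self):
        self.system = None
        self.seen = []

    def receive(self, message):
        self.seen.append(message)  # update state


class Chatroom:
    """Actor holding the member list and fanning out chat messages."""

    def __init__(self):
        self.system = None
        self.members = []

    def receive(self, message):
        kind, user, text = message
        if kind == "join":
            # Create an actor for the new user and update own state.
            self.system.create(user, User())
            self.members.append(user)
        elif kind == "say":
            # Send a message to every member actor.
            for member in self.members:
                self.system.send(member, f"{user}: {text}")


system = ActorSystem()
system.create("room", Chatroom())
for name in ("Bob", "Sue", "Jon"):
    system.send("room", ("join", name, ""))
system.send("room", ("say", "Bob", "Hi"))
system.run()
print(system.actors["Sue"].seen)
```

In a distributed runtime the `send` call may cross a server boundary, which is where the messaging cost discussed later comes from.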

Figure: actors spread across many servers.

• Examples: Orleans, Erlang, Akka
• Eliminate cost of development at scale
  • Add enough servers to handle load
  • Fault tolerance and correctness
• Latency suffers

• Inter-server: messaging overhead
• Intra-server: resource allocation
• Scaling dynamic interactive services

• Inter-server messaging problem & solution
• Intra-server resource allocation problem & solution
• Evaluation on Orleans

Figure (servers hosting actors): at scale, many messages cross server boundaries.

Profile of request latency (typical workload on multiple Orleans servers)
• Receiver: 32%
• Worker: 32%
• Sender: 25%
• Other: 10%
• Network: 1%

Over half of latency is due to inter-server message processing (receiver 32% + sender 25% = 57%).

Goal: reduce remote messaging with better actor placement

Figure: comparison of random placement and colocation placement (first call) on load balancing and remote messaging, for a static workload and a dynamic workload.

Remote messaging always high on dynamic workloads

• Balanced graph partitioning
  • Vertices: actors
  • Edge weights: messaging
  • Partitions: servers
• Messaging graphs
  • Reasonable partition exists
  • Dynamically changes
• Cost constraints
  • Scales with actors and servers
  • Minimize actor movements

Figure: example graph split into partitions of 4, 3 and 3 vertices, with 2 cut edges.
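To make the graph formulation concrete (vertices are actors, edge weights are messaging, partitions are servers), the two quantities being traded off can be computed directly. The sketch below uses a hypothetical 10-actor messaging graph and hypothetical helper names `cut_weight` and `imbalance`; it only evaluates a given placement and does not search for one.

```python
from collections import Counter


def cut_weight(weights, placement):
    """Total weight of edges whose two actors sit on different servers."""
    return sum(w for (u, v), w in weights.items() if placement[u] != placement[v])


def imbalance(placement):
    """Gap between the most and least loaded servers that host any actor."""
    sizes = Counter(placement.values())
    return max(sizes.values()) - min(sizes.values())


# Hypothetical messaging graph: 10 actors in three chatty groups,
# joined by two light cross-group edges.
weights = {
    ("a1", "a2"): 5, ("a2", "a3"): 5, ("a3", "a4"): 5,
    ("b1", "b2"): 5, ("b2", "b3"): 5,
    ("c1", "c2"): 5, ("c2", "c3"): 5,
    ("a4", "b1"): 1, ("b3", "c1"): 1,
}

# Placement that follows the groups: partitions of 4, 3 and 3 actors.
grouped = {actor: "S" + actor[0] for edge in weights for actor in edge}

# Placement that ignores messaging: deal actors round-robin to servers.
actors = sorted(grouped)
scattered = {actor: "S" + "abc"[i % 3] for i, actor in enumerate(actors)}

print(cut_weight(weights, grouped), imbalance(grouped))
print(cut_weight(weights, scattered), imbalance(scattered))
```

Both placements are equally balanced, but only the grouped one keeps the heavy edges local, which is the property a good partition is meant to capture.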

1. Decentrally find a good partner
   1. 1 swap at a time
   2. Cooldown timer
2. Perform swap protocol
   1. Improve balance
   2. Reduce messaging
3. Repeat

Figure: servers exchange swap requests over rounds 1, 2 and 3.

1. A identifies and sends candidate actor set to B
2. B selects candidate set
3. B picks swapping subsets
   1. Improve balance
   2. Reduce remote messaging
4. B responds with swap decision

Figure: Server A goes from 20 to 19 actors and Server B from 18 to 19 actors.
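The four protocol steps can be sketched for one pair of servers as follows. The slides do not specify how candidates are scored or how subsets are picked, so the scoring rule (`move_gain`), the greedy selection, and the `max_skew` balance tolerance are all assumptions made for illustration; the one-swap-at-a-time rule and cooldown timer are not modeled.

```python
def move_gain(actor, src, dst, weights, placement):
    """Weight that becomes local minus weight that becomes remote
    if `actor` moves from server `src` to server `dst`."""
    gain = 0
    for (u, v), w in weights.items():
        if actor == u:
            peer = v
        elif actor == v:
            peer = u
        else:
            continue
        if placement[peer] == dst:
            gain += w
        elif placement[peer] == src:
            gain -= w
    return gain


def candidates(src, dst, weights, placement):
    """Actors on `src` that would message less remotely if hosted on `dst`."""
    return [a for a, s in placement.items()
            if s == src and move_gain(a, src, dst, weights, placement) > 0]


def decide_swap(a, b, from_a, weights, placement, max_skew=1):
    """B's side: merge A's candidates with its own, then greedily accept
    moves that cut remote messaging without hurting balance (assumed rule)."""
    placement = dict(placement)  # plan on a copy; the caller applies the moves
    size = {s: sum(1 for p in placement.values() if p == s) for s in (a, b)}
    pool = [(x, a, b) for x in from_a]
    pool += [(x, b, a) for x in candidates(b, a, weights, placement)]
    moves = []
    while True:
        best = None
        for x, s, d in pool:
            gain = move_gain(x, s, d, weights, placement)  # re-scored each round
            skew_now = abs(size[s] - size[d])
            skew_after = abs((size[s] - 1) - (size[d] + 1))
            if gain > 0 and skew_after <= max(max_skew, skew_now):
                if best is None or gain > best[0]:
                    best = (gain, x, s, d)
        if best is None:
            return moves
        _, x, s, d = best
        placement[x] = d
        size[s] -= 1
        size[d] += 1
        pool.remove((x, s, d))
        moves.append((x, s, d))


# Hypothetical example: actor x lives on A but mostly talks to actors on B.
weights = {("x", "b1"): 5, ("x", "b2"): 5, ("x", "a1"): 1,
           ("a1", "a2"): 3, ("a2", "a3"): 3, ("b1", "b2"): 3}
placement = {"a1": "A", "a2": "A", "a3": "A", "x": "A", "b1": "B", "b2": "B"}

offer = candidates("A", "B", weights, placement)         # step 1, on server A
plan = decide_swap("A", "B", offer, weights, placement)  # steps 2 to 4, on server B
print(offer, plan)
```

Re-scoring after every accepted move matters here: once `x` has moved to B, the actors on B no longer gain anything by moving to A, so they drop out of the plan.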

• Inter-server messaging problem & solution
• Intra-server resource allocation problem & solution
• Evaluation on Orleans

Figure: two Orleans servers, each hosting actors and running a Staged Event Driven Architecture (SEDA) pipeline of Receive, Worker and Send stages.

• Individual thread pools
• Orleans default: thread per core per stage

How to allocate threads and does it matter?

Figure: heatmap of average latency collected from 64 different thread allocation runs; axes are worker threads (1 to 8) and sender threads (1 to 8). Recoverable cell values run from 9.9 up to 50 and ∞.

• Orleans default thread allocation
• 3x reduction by reducing to just enough threads per stage
• Too few threads has huge repercussions

Figure: stages, each measured by arrival rate, service rate and processor usage.

• Existing work
  • Allocate, check, repeat
• Our solution
  • Measure and directly find global optimum among all stages

Measurement input -> Threads per stage
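The idea of going from measurements to a thread count per stage in one step, rather than allocate-check-repeat, can be sketched as a small search over a latency model. The slides do not give the model, so everything here is an assumption: each stage is treated as an M/M/1-style queue whose capacity scales linearly with its threads, a hypothetical `switch_cost` penalizes every extra thread, and the rates in `stages` are made-up numbers.

```python
from itertools import product


def stage_latency(arrival, service, threads):
    """Mean time in a stage modeled as one queue with capacity service * threads."""
    capacity = service * threads
    return float("inf") if capacity <= arrival else 1.0 / (capacity - arrival)


def best_allocation(stages, max_threads=8, switch_cost=0.001):
    """Try every threads-per-stage combination and keep the lowest modeled latency."""
    names = list(stages)
    best = None
    for counts in product(range(1, max_threads + 1), repeat=len(names)):
        latency = sum(stage_latency(*stages[n], t) for n, t in zip(names, counts))
        latency += switch_cost * sum(counts)  # assumed penalty for extra threads
        if best is None or latency < best[0]:
            best = (latency, dict(zip(names, counts)))
    return best


# Hypothetical measurements: (arrival rate, per-thread service rate) in events/s.
stages = {
    "receive": (900.0, 1000.0),
    "worker": (900.0, 250.0),
    "send": (900.0, 1000.0),
}

latency, allocation = best_allocation(stages)
print(allocation, latency)
```

Under this model a stage with too few threads has unbounded latency, while threads beyond "just enough" only add overhead, which is the shape of the trade-off shown in the thread allocation heatmap.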

• Inter-server messaging problem & solution
• Intra-server resource allocation problem & solution
• Evaluation on Orleans

• Implemented in Orleans
  • Distributed balanced graph partitioning
  • Dynamic thread allocation
• Mimic production Halo Presence workload
  • Maintains stats of players in real-time
  • Clients query for stats of all players in some game
  • Both dynamic and interactive

Figure: clients (in-game players, and queries for stats of a game) run against 10 Orleans servers hosting 100k players and 12.5k games.

Dynamically changing at runtime
• Players start playing
• Players move between games
• Players stop playing

Figure: two plots over time (minutes).
• Remote messaging %: random placement starts at 90%, converges at 12%
• Actor movements per minute (axis 0 to 8000)

<10 minutes to go from 90% -> 12%
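The 90% starting point is what one would expect from random placement on 10 servers, assuming each actor is placed independently and uniformly at random: two communicating actors then land on the same server with probability 1/10. The helper name `expected_remote_fraction` is hypothetical.

```python
def expected_remote_fraction(servers):
    """Chance that a message between two independently, uniformly placed
    actors crosses a server boundary."""
    return 1 - 1 / servers


print(expected_remote_fraction(10))
```

The same formula shows why random placement gets worse as servers are added: the remote fraction approaches 100% as the server count grows.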

Figure: thread allocation chosen per stage (Send, Receive, Worker).
• At 90% remote messaging: 1 thread, 5 threads, 2 threads
• At 12% remote messaging: 1 thread, 6 threads, 1 thread

Figure: two bar charts comparing Baseline, Actor Partitioning, and w/ Thread Allocation.
• Median latency (ms), axis 0 to 50: annotated reductions of -20% and -40%
• 99th percentile latency (ms), axis 0 to 800: annotated reductions of -70% and -20%

• Demonstrated latency problems and solutions for distributed actor systems
  • Actor placement
  • Thread management (in Orleans open source)
• Better resource management in distributed actor systems
  • Easy to develop at scale
  • Low latency -> interactive services
• Techniques can apply beyond actor systems

• Orleans, open source: https://github.com/dotnet/orleans
