Cluster management at Google with the system we internally call Borg

2015-06 dotScale
John Wilkes, Principal Software Engineer
Derived from EuroSys'15 paper (http://goo.gl/1C4nuo)

Image by Connie Zhou

User view

job hello_world = {
  runtime = { cell = 'ic' }                // Cell (cluster) to run in
  binary = '.../hello_world_webserver'     // Program to run
  args = { port = '%port%' }               // Command line parameters
  requirements = {                         // Resource requirements
    ram = 100M
    disk = 100M                            // (optional)
    cpu = 0.1
  }
  replicas = 510000                        // Number of tasks
}
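To make the shape of this configuration concrete, here is a minimal Python sketch that mirrors the fields on the slide and totals what the job asks the cell for. The class names, the reading of `100M` as mebibytes, and the replica count of 1000 are assumptions made for illustration; none of this is Borg's actual API.

```python
from dataclasses import dataclass, field

MIB = 1024 * 1024  # assumption: the slide's "100M" is read as 100 MiB


@dataclass(frozen=True)
class Requirements:
    ram_bytes: int
    disk_bytes: int
    cpu_cores: float


@dataclass(frozen=True)
class JobSpec:
    """Hypothetical in-memory mirror of the job config on the slide."""
    name: str
    cell: str
    binary: str
    requirements: Requirements
    replicas: int
    args: dict = field(default_factory=dict)

    def total_request(self) -> Requirements:
        """Aggregate request: per-task requirements times the task count."""
        r = self.requirements
        return Requirements(
            ram_bytes=r.ram_bytes * self.replicas,
            disk_bytes=r.disk_bytes * self.replicas,
            cpu_cores=r.cpu_cores * self.replicas,
        )


hello_world = JobSpec(
    name="hello_world",
    cell="ic",
    binary=".../hello_world_webserver",  # path is elided on the slide
    requirements=Requirements(ram_bytes=100 * MIB, disk_bytes=100 * MIB, cpu_cores=0.1),
    replicas=1000,  # illustrative value, not the slide's
    args={"port": "%port%"},
)

total = hello_world.total_request()
print(total.ram_bytes // MIB, "MiB RAM,", round(total.cpu_cores, 3), "cores")
```

The point of the declarative form is that the user states per-task needs and a count; how the aggregate is spread over machines is left to the scheduler.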

User view

Binary

User view: What just happened?

[Diagram: a config file is submitted through borgcfg (web browsers are the other client) to a cell. Inside the cell: replicated BorgMaster instances with UI/read shards, link shards, and a Paxos-based persistent store; a scheduler; and Borglets.]

User view

[Figure: "Hello world!" repeated many times]

Image by Connie Zhou

User view

Failures

[Figure: task-eviction rates and causes]

Failures

A 2000-machine service will have >10 task exits per day.
This is not a problem: it's normal.
Images by Connie Zhou
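The arithmetic behind that claim is worth spelling out. The sketch below assumes one task per machine and independent, Poisson-distributed exits, neither of which the slide states; it only shows how a low per-task rate still produces exits every day at this scale.

```python
import math

machines = 2000      # from the slide
exits_per_day = 10   # the slide says ">10", so this is a lower bound

# Assumption: one task per machine, so tasks == machines.
rate_per_task_day = exits_per_day / machines      # exits per task per day
mean_days_between_exits = 1 / rate_per_task_day   # for any single task

# Assumption: exits are independent (Poisson), so the chance of a day with
# no exits anywhere in the service is exp(-expected exits per day).
p_quiet_day = math.exp(-exits_per_day)

print(f"per-task rate: {rate_per_task_day:.3f} exits/day")
print(f"mean days between exits for one task: {mean_days_between_exits:.0f}")
print(f"probability of an exit-free day: {p_quiet_day:.1e}")
```

Under these assumptions any one task runs for months between exits, yet the service as a whole almost never sees a day without one, which is why exits have to be treated as routine.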

Efficiency: Advanced binpacking algorithms

[Figure: experimental placement of production VM workload, July 2014. Labels: one machine, available resources, stranded resources.]
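Stranded resources are ones that cannot be used because another resource on the same machine is already fully allocated. The sketch below illustrates the idea with a generic best-fit placement over two dimensions; the slide does not say which algorithm produced the figure, so the scoring rule, the class names, and the percent-of-a-machine units are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    cpu_free: int  # percent of one machine's CPU still unallocated
    ram_free: int  # percent of one machine's RAM still unallocated


@dataclass(frozen=True)
class Task:
    name: str
    cpu: int
    ram: int


def best_fit(machines, tasks):
    """Place each task on the feasible machine left with the least free capacity."""
    placement = {}
    for t in tasks:
        feasible = [m for m in machines if m.cpu_free >= t.cpu and m.ram_free >= t.ram]
        if not feasible:
            placement[t.name] = None  # task stays pending
            continue
        target = min(feasible, key=lambda m: (m.cpu_free - t.cpu) + (m.ram_free - t.ram))
        target.cpu_free -= t.cpu
        target.ram_free -= t.ram
        placement[t.name] = target.name
    return placement


def stranded(machine, min_cpu, min_ram):
    """Free capacity that cannot host even the smallest task (min_cpu, min_ram)."""
    if machine.cpu_free >= min_cpu and machine.ram_free >= min_ram:
        return (0, 0)  # still usable, so nothing is stranded
    return (machine.cpu_free, machine.ram_free)


machines = [Machine("m1", 100, 100), Machine("m2", 100, 100)]
tasks = [Task("t1", 60, 20), Task("t2", 40, 20), Task("t3", 30, 50), Task("t4", 30, 40)]

placement = best_fit(machines, tasks)
for m in machines:
    print(m.name, "free:", (m.cpu_free, m.ram_free), "stranded:", stranded(m, 10, 10))
```

In this toy run the first machine ends with no CPU left but a large share of its RAM free; that RAM is stranded because no further task can land there.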

Efficiency: Multiple applications per machine

[Figure: tasks per machine. Source: CPI^2 paper, EuroSys 2013.]

Efficiency: Sharing clusters between prod/batch helps

Segregating them would need more machines.

[Figure: # machines. Labels: shared cell (original), shared cell (compacted), non-prod load (compacted), prod-only load (compacted), overhead.]

Efficiency: Sharing clusters between prod/batch helps

Segregating them would need more machines.

[Figure: Waste. 15 production cells from a larger pool, omitting small ones (<5000 machines).]
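A toy calculation shows why segregation can cost machines. The sketch below packs a hypothetical workload with first-fit, once into a shared pool and once into separate prod and batch pools. The task sizes are invented so that the two classes have complementary demands, and first-fit is a stand-in; this is not the method behind the measurements on the slides.

```python
def machines_needed(tasks, cpu_cap=100, ram_cap=100):
    """First-fit: count the machines needed to host every (cpu, ram) task."""
    machines = []  # each entry is [cpu_free, ram_free]
    for cpu, ram in tasks:
        if cpu > cpu_cap or ram > ram_cap:
            raise ValueError("task larger than a whole machine")
        for m in machines:
            if m[0] >= cpu and m[1] >= ram:
                m[0] -= cpu
                m[1] -= ram
                break
        else:
            # No existing machine fits, so open a new one.
            machines.append([cpu_cap - cpu, ram_cap - ram])
    return len(machines)


prod = [(60, 20)] * 4   # hypothetical CPU-heavy production tasks
batch = [(20, 60)] * 4  # hypothetical RAM-heavy batch tasks

shared = machines_needed(prod + batch)
segregated = machines_needed(prod) + machines_needed(batch)
print("shared:", shared, "segregated:", segregated)
```

With these made-up sizes each machine can host one prod task and one batch task together, but never two of the same kind, so the segregated pools need twice as many machines.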

Efficiency: Resource reclamation

limit: amount of resource requested
reservation: estimate of future usage
usage: actual resource consumption

[Figure: limit, reservation, and usage plotted against time, with a region labeled "potentially reusable resources".]
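One way to read the three quantities: the reservation sits somewhere between usage and limit, and the gap between limit and reservation is what could be handed to other work. The sketch below uses a deliberately simple estimator (recent peak usage plus a fixed 20% margin, capped at the limit); the estimator, the margin, and the function names are assumptions, not the production algorithm.

```python
def estimate_reservation(limit, recent_usage, margin_pct=20):
    """Reservation = recent peak usage plus a safety margin, never above the limit."""
    if not recent_usage:
        return limit  # no data yet, so reserve the full request
    peak = max(recent_usage)
    return min(limit, peak + peak * margin_pct // 100)


def reclaimable(limit, reservation):
    """Resources requested but not expected to be used."""
    return limit - reservation


limit = 1000                # e.g. MiB of RAM requested by the task
usage = [300, 350, 320]     # hypothetical recent usage samples

reservation = estimate_reservation(limit, usage)
print("reservation:", reservation, "reclaimable:", reclaimable(limit, reservation))

# A usage spike close to the limit pushes the reservation back up to the limit.
spiky = estimate_reservation(limit, [300, 990])
print("reservation after spike:", spiky, "reclaimable:", reclaimable(limit, spiky))
```

A larger margin reclaims less but makes it less likely that a task needs resources that were already given away; a smaller margin is the "more aggressive" direction.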

Efficiency: Resource reclamation could be more aggressive

[Figure: data from Nov/Dec 2013.]

A few other moving parts

[Diagram: the cell architecture again: config file, borgcfg, and web browsers as clients; BorgMaster replicas with UI/read shards, link shards, and a Paxos-based persistent store; a scheduler; Borglets.]

[Diagram, built up over several slides. Core: job config, master, agent, app. Surrounding systems: system config, security, accounting/planning (accounting/billing in the final build), storage, monitoring, binaries + data distribution. Diagram from an original by Cody Smith.]

Kubernetes

κυβερνήτης: pilot or helmsman of a ship
http://kubernetes.io

Kubernetes: Direct Borg analogues

● Borg containers => Docker containers
● alloc (task group) => pod (container group)
● Borglet => Kubelet
● persistent, declarative specs
● reconciliation loops
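The last two items work together: a declarative spec says what should be running, and a reconciliation loop repeatedly compares that with what is observed and issues corrective actions. A minimal sketch of the pattern follows; the job names, the action tuples, and the in-memory "cluster" are hypothetical, and this is not Borg's or Kubernetes' code.

```python
def reconcile(desired, observed):
    """Return the actions that move observed replica counts toward the desired spec."""
    actions = []
    for job, want in desired.items():
        have = observed.get(job, 0)
        if have < want:
            actions.append(("start", job, want - have))
        elif have > want:
            actions.append(("stop", job, have - want))
    # Anything running that the spec no longer mentions should be stopped.
    for job, have in observed.items():
        if job not in desired and have > 0:
            actions.append(("stop", job, have))
    return actions


def apply_actions(observed, actions):
    """Stand-in for the cluster: apply the actions to the observed state."""
    for verb, job, n in actions:
        delta = n if verb == "start" else -n
        observed[job] = observed.get(job, 0) + delta


desired = {"web": 3, "cache": 2}   # the declarative spec
observed = {"web": 1, "batch": 4}  # what is actually running

while True:
    actions = reconcile(desired, observed)
    if not actions:
        break  # converged: observed state matches the spec
    apply_actions(observed, actions)

print(observed)
```

Because the loop works from the current observed state rather than from a log of past commands, it recovers in the same way from a crash, a lost message, or a manual change.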

Kubernetes: New / improved

● labels + label queries
● service abstraction
● composable microservices
● IP per pod
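Labels are key/value pairs attached to objects, and a label query selects the objects whose labels match. The sketch below covers only equality matching, the simplest form; the pod names and label values are hypothetical, and the functions are illustrative rather than any Kubernetes client API.

```python
def matches(labels, selector):
    """True if every key/value pair in the selector is present in labels."""
    return all(labels.get(k) == v for k, v in selector.items())


def select(pods, selector):
    """Names of the pods whose labels satisfy the selector."""
    return [name for name, labels in pods.items() if matches(labels, selector)]


pods = {
    "web-1": {"app": "web", "tier": "frontend", "env": "prod"},
    "web-2": {"app": "web", "tier": "frontend", "env": "test"},
    "db-1": {"app": "db", "tier": "backend", "env": "prod"},
}

print(select(pods, {"env": "prod"}))
print(select(pods, {"app": "web", "env": "prod"}))
```

A service can then be thought of as a stable name in front of whichever pods a selector currently matches, so pods can come and go without clients noticing.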

Observations:

1. Resiliency is achieved only by ruthless attention to detail
   a. ubiquitous software fault tolerance
   b. persistent, declarative specs
2. We get efficiency by:
   a. sharing resources
   b. reclaiming unused allocations
3. Containers make users more productive

http://kubernetes.io
http://goo.gl/1C4nuo (Borg paper)
Images by Connie Zhou
