BTRPLACE

An extensible VM manager to face up to SLA expectations in a cloud

Fabien Hermenier [email protected]

Research Team OASIS, INRIA/I3S SLA Management in Cloud @ Compas, 01/15/2013

HOSTING PLATFORMS
Operators are looking for:
• manageability
• security
• efficient resource usage
• ...


VIRTUAL APPLIANCE
Clients are looking for:
• performance
• reliability
• isolation
• ...


PLACEMENT CONSTRAINTS
• VM-host affinity (DRS 4.1)
• Dedicated instances (EC2)
• MaxVMsPerServer (DRS 5.1)
• The constraint I needed in 2012

Timeline marks: apr. 2011, mar. 2012, sep. 2012, ?? 2013

• SLAs at the infrastructure level
• an unfinished story in which users are not the heroes
• current algorithms are not extensible by design

A CUSTOMIZABLE PLACEMENT ALGORITHM ?
Some problems:
• constraints expressed by non-expert users
• numerous specific placement constraints
• concurrent placement constraints

One proposition:
• an extensible library of high-level placement constraints
• a composable VM placement algorithm

BTRPLACE
A customizable VM placement algorithm
✓ configurable
✓ composable

CONFIGURATION SCRIPTS

    namespace datacenter;
    $servers = @N[1..12];
    $racks = {@N[1..4],@N[5..8],@N[9..12]};
    export $racks to *;

    namespace sysadmin;
    import datacenter;
    import client.*;
    vmBtrplace: large;
    fence(vmBtrplace, @N1);
    lonely(vmBtrplace);
    ban($clients, @N5);

    namespace clients.app1;
    import datacenter;
    VM[1..7]: small;
    VM[8..10]: large;
    $T1 = {VM1, VM2, VM3};
    $T2 = VM[4..7];
    $T3 = VM[8,10];
    for $t in $T[1..3] {
        spread($t);
    }
    among($T3,$racks);
    export $me to sysadmin;

• provide datacenter and appliances descriptions
• human friendly definition of a viable datacenter
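The scripts rely on a range notation such as `@N[1..12]` or `VM[4..7]`. As a minimal sketch of how that notation can be read, the Python helper below expands a range into individual identifiers. The function name `expand` is hypothetical, and inclusive bounds are an assumption inferred from the three racks of four servers declared in the datacenter script; this is not the Btrplace parser.

```python
import re


def expand(pattern: str) -> list:
    """Expand 'VM[4..7]' into ['VM4', 'VM5', 'VM6', 'VM7'].

    Bounds are assumed to be inclusive. Anything that is not a range
    (for example '@N1') is returned unchanged in a one-element list.
    """
    match = re.fullmatch(r"(@?[A-Za-z]+)\[(\d+)\.\.(\d+)\]", pattern)
    if match is None:
        return [pattern]
    prefix = match.group(1)
    low, high = int(match.group(2)), int(match.group(3))
    return [f"{prefix}{i}" for i in range(low, high + 1)]


# The datacenter description from the script above.
servers = expand("@N[1..12]")
racks = [expand("@N[1..4]"), expand("@N[5..8]"), expand("@N[9..12]")]
```

Under this reading, the three racks cover the twelve servers exactly once, which is what makes `among($T3,$racks)` a choice between three disjoint groups of nodes.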

SAMPLE RECONFIGURATION
The constraints given to Btrplace:

    spread({VM3,VM2});
    preserve({VM1},'ucpu', 3);
    offline(@N4);

The reconfiguration plan:

    0'00 to 0'02: relocate(VM2,N2)
    0'00 to 0'04: relocate(VM6,N2)
    0'02 to 0'05: relocate(VM4,N1)
    0'04 to 0'08: shutdown(N4)
    0'05 to 0'06: allocate(VM1,'cpu',3)
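A reconfiguration plan is a set of timed actions, some of which overlap. The Python sketch below encodes the sample plan and derives its total duration and the actions in progress at a given moment. The `Action` class and the helper names are hypothetical, and reading `0'04` as 0 minutes 04 seconds is an assumption about the slide's time notation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    start: int  # seconds since the beginning of the plan
    end: int    # seconds since the beginning of the plan
    what: str


def mmss(stamp: str) -> int:
    """Convert a "minutes'seconds" stamp such as "0'04" into seconds."""
    minutes, seconds = stamp.split("'")
    return int(minutes) * 60 + int(seconds)


PLAN = [
    Action(mmss("0'00"), mmss("0'02"), "relocate(VM2,N2)"),
    Action(mmss("0'00"), mmss("0'04"), "relocate(VM6,N2)"),
    Action(mmss("0'02"), mmss("0'05"), "relocate(VM4,N1)"),
    Action(mmss("0'04"), mmss("0'08"), "shutdown(N4)"),
    Action(mmss("0'05"), mmss("0'06"), "allocate(VM1,'cpu',3)"),
]


def makespan(plan):
    """Time at which the last action of the plan terminates."""
    return max(action.end for action in plan)


def running_at(plan, t):
    """Actions in progress at second t (start included, end excluded)."""
    return [a.what for a in plan if a.start <= t < a.end]
```

With these definitions, `makespan(PLAN)` is 8 seconds and `running_at(PLAN, 0)` lists the two relocations that start together.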

IMPLEMENTATION
• the core reconfiguration problem (core-RP) models the VMs placement wrt. their resource usage
• placement constraints are interpreted to specialize the core-RP
• an implementation based on constraint programming:
  • deterministic composition
  • high expressivity
  • the model is the implementation

MODELING CORE-RP
• actions are modeled wrt. their impact on resources using slices
• to place the d-slices: 2 bin-packing constraints
• to schedule the slices: a home-made cumulatives constraint
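To illustrate what the bin-packing side of the core-RP has to guarantee, here is a minimal Python sketch that checks a candidate destination placement against node capacities. It is only a checker, not the constraint-programming model. The function name is hypothetical, and treating the two bin-packing constraints as two resource dimensions (for example CPU and memory) is an assumption.

```python
def overloaded_nodes(placement, demand, capacity):
    """Return the sorted list of nodes whose capacity is exceeded.

    placement: VM name -> node name
    demand:    VM name -> tuple of per-dimension demands
    capacity:  node name -> tuple of per-dimension capacities
    """
    used = {node: [0] * len(cap) for node, cap in capacity.items()}
    for vm, node in placement.items():
        for dim, amount in enumerate(demand[vm]):
            used[node][dim] += amount
    return sorted(
        node
        for node, usage in used.items()
        if any(u > c for u, c in zip(usage, capacity[node]))
    )


# Hypothetical figures: (cpu, memory) per node and per VM.
capacity = {"N1": (4, 8), "N2": (4, 8)}
demand = {"VM1": (3, 2), "VM2": (2, 2), "VM3": (1, 4)}

bad = overloaded_nodes({"VM1": "N1", "VM2": "N1", "VM3": "N2"}, demand, capacity)
good = overloaded_nodes({"VM1": "N1", "VM2": "N2", "VM3": "N2"}, demand, capacity)
```

A placement is viable with respect to resources when the returned list is empty. Scheduling the actions so that capacities also hold during the migrations is the role of the cumulatives part and is not covered by this sketch.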

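MODELING THE PLACEMENT CONSTRAINTS
Using variables of the core-RP:

spread({VM1,VM2}):

$$\mathrm{allDifferent}(d_1^{host}, d_2^{host}) \;\wedge\; \left(d_1^{host} = c_2^{host} \rightarrow d_1^{st} \geq c_2^{ed}\right) \;\wedge\; \left(d_2^{host} = c_1^{host} \rightarrow d_2^{st} \geq c_1^{ed}\right)$$

where, for VM i, $d_i^{host}$ and $c_i^{host}$ are its destination and current hosts, $d_i^{st}$ is the start of its d-slice and $c_i^{ed}$ is the end of its c-slice.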
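The spread constraint can also be read operationally: the VMs must end on distinct hosts, and a VM may only arrive on the current host of another one after that other VM has left. The Python sketch below checks this on a computed schedule. The function and argument names are hypothetical, and the "arrive no earlier than the other VM leaves" reading is an assumption about the constraint's semantics.

```python
def spread_ok(vms, cur_host, dst_host, d_start, c_end):
    """Check a continuous 'spread' over a schedule.

    cur_host / dst_host: VM -> current / destination host
    d_start: VM -> moment the VM starts using its destination host
    c_end:   VM -> moment the VM stops using its current host
    """
    vms = list(vms)
    for i, a in enumerate(vms):
        for b in vms[i + 1:]:
            # The destinations must be pairwise different.
            if dst_host[a] == dst_host[b]:
                return False
            # a may reach b's current host only once b has left it.
            if dst_host[a] == cur_host[b] and d_start[a] < c_end[b]:
                return False
            # Symmetric case: b reaching a's current host.
            if dst_host[b] == cur_host[a] and d_start[b] < c_end[a]:
                return False
    return True


cur = {"VM1": "N1", "VM2": "N2"}
dst = {"VM1": "N2", "VM2": "N3"}
c_end = {"VM1": 6, "VM2": 4}

late_enough = spread_ok(["VM1", "VM2"], cur, dst, {"VM1": 4, "VM2": 0}, c_end)
too_early = spread_ok(["VM1", "VM2"], cur, dst, {"VM1": 2, "VM2": 0}, c_end)
```

In the second schedule VM1 arrives on N2 while VM2 is still running there, so the two VMs would temporarily share a host even though the final placement is fine.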

THE CONSTRAINTS LIBRARY
Initially:
spread, gather, among, splitAmong, ban, fence, lonely, quarantine, capacity, preserve, root, offline, oversubscription, noIdles

Pending:
overbook, sequentialVMTransitions, maxOnlineNodes, singleRunningCapacity, singleResourceCapacity, onlines, cumulatedResourceCapacity, maxSpareResources, minSpareResources, ...

• multiple concerns: performance, isolation, reliability, administration, ...
• manipulate servers state, VM placement, resource allocation, action schedule
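To give an idea of how such a library composes, the Python sketch below models a few of the listed constraints as predicates over a final placement and combines them with a plain conjunction. The semantics are my reading of the constraint names (for instance, `lonely` meaning that the hosts of the given VMs run no other VM) and the code is an illustration, not the Btrplace implementation, which specializes a constraint-programming model instead of checking placements after the fact.

```python
def spread(vms):
    vms = set(vms)
    # All the VMs run on distinct nodes.
    return lambda p: len({p[v] for v in vms}) == len(vms)


def gather(vms):
    vms = set(vms)
    # All the VMs run on the same node.
    return lambda p: len({p[v] for v in vms}) <= 1


def ban(vms, nodes):
    vms, nodes = set(vms), set(nodes)
    # None of the VMs runs on the given nodes.
    return lambda p: all(p[v] not in nodes for v in vms)


def fence(vms, nodes):
    vms, nodes = set(vms), set(nodes)
    # All the VMs run on the given nodes.
    return lambda p: all(p[v] in nodes for v in vms)


def lonely(vms):
    vms = set(vms)

    def check(p):
        mine = {p[v] for v in vms}
        # No VM outside the set shares a node with the set.
        return all(node not in mine for vm, node in p.items() if vm not in vms)

    return check


def satisfied(placement, constraints):
    """A placement is viable when every constraint holds."""
    return all(check(placement) for check in constraints)


constraints = [
    spread({"VM1", "VM2"}),
    lonely({"VM7"}),
    ban({"VM1", "VM2", "VM3", "VM7"}, {"N8"}),
    fence({"VM1", "VM2", "VM3"}, {"N1", "N2", "N3", "N4"}),
]
viable = {"VM1": "N1", "VM2": "N2", "VM3": "N2", "VM7": "N3"}
not_viable = dict(viable, VM3="N3")  # VM3 now shares N3 with the lonely VM7
```

Because each constraint is an independent predicate, adding a new one does not require touching the others, which is the property the slides call composability.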

OPTIMIZING THROUGH FILTER

    spread({VM3,VM2,VM8});
    lonely({VM7});
    preserve({VM1},'ucpu', 3);
    offline(@N6);
    ban($ALL_VMS,@N8);
    fence(VM[1..7],@N[1..4]);
    fence(VM[8..12],@N[5..8]);

• focus only on supposedly mis-placed VMs
• provide RPs with fewer VMs to manage
• beware of under-estimations!
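The filter idea can be sketched as follows: each constraint reports the VMs that currently violate it, and only those VMs are handed to the solver. The Python code below is a hypothetical illustration with three of the constraints from the script above; the function names and the per-constraint notion of "offending VMs" are assumptions, not the Btrplace heuristic.

```python
def ban(vms, nodes):
    vms, nodes = list(vms), set(nodes)
    # Offenders: VMs running on a banned node.
    return lambda p: {v for v in vms if p[v] in nodes}


def fence(vms, nodes):
    vms, nodes = list(vms), set(nodes)
    # Offenders: VMs running outside their fence.
    return lambda p: {v for v in vms if p[v] not in nodes}


def spread(vms):
    vms = list(vms)

    def offenders(p):
        by_node = {}
        for v in vms:
            by_node.setdefault(p[v], set()).add(v)
        # Offenders: VMs of the set that share a node.
        return set().union(*(g for g in by_node.values() if len(g) > 1))

    return offenders


def misplaced_vms(placement, constraints):
    """Union of the VMs reported by the violated constraints."""
    result = set()
    for offenders in constraints:
        result |= offenders(placement)
    return result


placement = {"VM1": "N2", "VM2": "N1", "VM3": "N1", "VM8": "N8"}
constraints = [
    spread(["VM2", "VM3", "VM8"]),
    ban(list(placement), ["N8"]),
    fence(["VM1", "VM2", "VM3"], ["N1", "N2", "N3", "N4"]),
]
to_manage = misplaced_vms(placement, constraints)
```

Here VM1 is left out of the problem. One reading of the warning about under-estimations is that such a filter can be too optimistic: if a well-placed VM has to move to make room for a mis-placed one, a problem restricted to `to_manage` may have no solution.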

OPTIMIZING THROUGH PARTITIONING

    spread({VM3,VM2,VM8});
    lonely({VM7});
    preserve({VM1},'ucpu', 3);
    offline(@N6);
    ban($ALL_VMS,@N8);
    fence(VM[1..7],@N[1..4]);
    fence(VM[8..12],@N[5..8]);

• constraints may introduce independent RPs
• provide smaller RPs, solvable in parallel
• beware of resource fragmentation!
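In the script above, the two `fence` constraints split the datacenter into two groups of VMs and nodes that never interact. A minimal Python sketch of how such independent sub-problems could be derived is given below: fences that share a VM or a node are merged, and each remaining group is a candidate partition. The function name is hypothetical and the sketch only looks at fences; it ignores the other constraints, which a real partitioner would also have to split or keep together.

```python
def partitions(fences):
    """Merge fences that share a VM or a node.

    fences: iterable of (vms, nodes) pairs
    Returns a list of (vms, nodes) pairs of sets, pairwise disjoint.
    """
    parts = []
    for vms, nodes in fences:
        vms, nodes = set(vms), set(nodes)
        untouched = []
        for part_vms, part_nodes in parts:
            if part_vms & vms or part_nodes & nodes:
                # Overlap: absorb the existing part into the new one.
                vms |= part_vms
                nodes |= part_nodes
            else:
                untouched.append((part_vms, part_nodes))
        untouched.append((vms, nodes))
        parts = untouched
    return parts


fence_a = ({f"VM{i}" for i in range(1, 8)}, {f"N{i}" for i in range(1, 5)})
fence_b = ({f"VM{i}" for i in range(8, 13)}, {f"N{i}" for i in range(5, 9)})

independent = partitions([fence_a, fence_b])
# A hypothetical third fence straddling N4 and N5 glues everything together.
merged = partitions([fence_a, fence_b, ({"VM13"}, {"N4", "N5"})])
```

Each resulting partition can be solved separately, and in parallel. The price is the resource fragmentation mentioned above: free capacity in one partition cannot help a VM fenced into another.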

EVALUATION
• is Btrplace flexible in practice?
• does Btrplace make the VMs placement reliable?
• a complete approach for large problems, really?

EXPRESSIVITY
The current library:
• covers VMware DRS and EC2 placement constraints
• provides new relevant placement constraints

EXTENSIBILITY
Constraints implementation:
• concise: +/- 30 loc. per constraint
• «fast» to implement for an experienced user
• Fit4Green EU projects: inexperienced users of Btrplace

BTRPLACE EASES SERVER MAINTENANCE
8 servers run HA 3-tiers appliances

Time     Event                      Reconfiguration plan
2'10     +ban({WN8})                3 + 3 relocations in 0'42
4'30     +ban({WN4})                2 + 7 relocations in 1'02
7'05     -ban({WN4})                no reconfiguration
11'23    +ban({WN4})                no solution
11'43    -ban({WN8}) +ban({WN4})    2 relocations in 0'28

Btrplace prevented the mis-reconfigurations

SCALABILITY
A simulated datacenter:
• 5,000 servers
• up to 1,700 3-tiers appliances (30,000 VMs)
• a resource usage up to 73%

2 scenarios:
• Load Increase (LI): 10% of the applications ask for 30% more uCPU
• Network Rewiring (NR): 5% of the servers are turned off for a network maintenance
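For a sense of scale, the figures above can be turned into a few derived quantities. The Python lines below only do arithmetic on the numbers stated for the largest instance; the variable names are hypothetical, and nothing here comes from the experimental results.

```python
SERVERS = 5_000
APPLIANCES = 1_700
VMS = 30_000

# Average consolidation on the largest instance.
vms_per_server = VMS / SERVERS

# Load Increase: 10% of the applications ask for more uCPU.
li_appliances = APPLIANCES * 10 // 100

# Network Rewiring: 5% of the servers are turned off.
nr_servers = SERVERS * 5 // 100
```

So the largest instance averages 6 VMs per server, the LI scenario touches 170 appliances and the NR scenario takes 250 servers offline.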


THE FILTER OPTION
[Figure: two panels, "Solving duration" and "Reconfiguration duration". Each plots Time (sec) against Virtual machines (x 1,000), from 15 to 30, for the series LI, NR, LI-filter and NR-filter.]

• reduces the solving duration
• reduces the delay to start actions

THE PLACEMENT CONSTRAINTS
NR case
[Figure: two panels, "Solving duration" and "Reconfiguration duration". Each plots Time (sec) against Virtual machines (x 1,000), from 15 to 30, for the series 33%, 66% and 100%.]

• the core-RP resolution dominates the solving duration
• no impact on the reconfiguration plans

THE PLACEMENT CONSTRAINTS
LI case
[Figure: two panels, "Solving duration" and "Reconfiguration duration". Each plots Time (sec) against Virtual machines (x 1,000), from 15 to 30, for the series 33%, 66% and 100%.]

• no or negative overhead
• placement constraints simplify the core-RP resolution
• except during the phase transition, no impact on the plans

PARTITIONING
[Figure: two panels. "Solving duration" plots Time (sec) against Partition size (servers), from 0 to 5,000, for the series LI + filter and NR + filter. "Partitioning duration" plots Time (sec) against the number of partitions of 2,500 servers, annotated "60,000 servers, 360,000 VMs".]

• reduces the solving duration
• the number of slaves to solve sub-RPs limits the scalability
• no impact on the quality of the reconfiguration plans
• too small partitions may alter the solvability

GLOBAL AVAILABILITY
[Figure: Availability (%), on a scale from 99.0 to 100.0, plotted against Partition size (x 1k servers), from 1 to 5, and Virtual machines (x 1,000), from 15 to 30.]

The operator can establish a trade-off between:
• a high resource usage (big consolidation ratio)
• resource fragmentation (partitions size)

BTRPLACE
• a VM placement algorithm extensible by design
• declarative configuration scripts to state the constraints
• expressivity: constraints cover several concerns
• scalability through partitioning
• part of the open source OW2 - Entropy

The next BtrPlace
• new constraints, new concerns
• automatic, optimistic partitioning
• violatable constraints with context-aware penalties

ABOUT BTRPLACE
Online demo: http://btrp.inria.fr/sandbox

The Btrplace constraint catalog (draft): http://www-sop.inria.fr/members/Fabien.Hermenier/btrpcc/

Publications on my webpage: http://sites.google.com/site/hermenierfabien/

SOME PUBLICATIONS
The origins with Entropy:
• Entropy: a consolidation manager for clusters. F. Hermenier, X. Lorca, J.-M. Menaud, G. Muller, J. Lawall. In VEE 2009

Toward Btrplace through use cases:
• Fault tolerance: Dynamic Consolidation of Highly-Available Web Applications. F. Hermenier, J. Lawall, J.-M. Menaud, G. Muller. Research Report 2011
• An energy aware framework for VMs placement in cloud federated data centres. C. Dupont, G. Giuliani, F. Hermenier, T. Schulze, A. Somov. In e-Energy 2012

The theory behind Btrplace:
• Bin Repacking Scheduling in Virtualized Datacenters. F. Hermenier, S. Demassey, X. Lorca. In CP 2011

The no-longer cursed paper about Btrplace fundamentals (this talk):
• Btrplace: A Flexible Consolidation Manager for Highly Available Applications. F. Hermenier, J. Lawall, G. Muller. To appear in IEEE TDSC 2013
