Condor

(by University of Wisconsin-Madison)

High Throughput Distributed Computing System
Presented by Mark Silberstein, 2006

CDP 236370, Technion


Definitions

● Cluster [pool] – a group of interconnected computers (resources)
● Batch (job) – a self-contained piece of software for unattended execution
● Batch/queuing system – a system for automatic scheduling and invocation of jobs competing for a resource [multiple resources]
● High Performance System – optimized for low latency of every job
● High Throughput System – optimized to increase utilization of resources
  – Ex: printer queue


Batch system – take 1: multiple identical resources

[Diagram: job queue dispatching jobs to identical CPUs]

● Job Queue
● Invokes jobs and brings results back
● Job “babysitting”
  – Invoke only once
  – Job failures


Batch system – take 2: distributed heterogeneous resources

[Diagram: job queue dispatching jobs to heterogeneous CPUs]

● Job Queue
● Invokes jobs and brings results back
● Job “babysitting”
● Remote control
● Report resource characteristics (metadata)
● Job requirements
  – “I want a CPU of at least ...”


Batch system – take 3: distributed heterogeneous resources + multiple users

[Diagram: job queue with access control dispatching to heterogeneous CPUs]

● Job Queue – access control
● Invokes jobs and brings results back
● Job “babysitting”
● Remote control
● Job requirements
● Resource attributes (metadata)
● Security
● Resource sharing policies – QoS


Batch system – take 4: distributed heterogeneous resources + multiple users + non-dedicated resources (cycle stealing)

[Diagram: job queue with access control dispatching to non-dedicated CPUs]

● Job Queue + access control
● Invokes jobs and brings results back
● Job “babysitting”
● Remote control
● Job requirements
● Resource attributes (metadata) – periodic update
● Security
● Resource sharing policies – QoS
● On-demand job eviction
● Fault tolerance
● Respecting resource policies

Condor at a glance

[Diagram: submission hosts and execution hosts (CPUs) connected through the Matchmaker]

● Basic idea – “classified advertisement” matching
  – Resources publish their capabilities
  – Jobs publish their requirements
  – Matchmaker finds the best match


Condor architecture – Submission host: schedd and shadow

[Diagram: submission host running the schedd and one shadow per job, connected to the Matchmaker and the execution CPUs]

● Schedd – Job Queue
  – Holds a DB of all jobs submitted for execution (fault-tolerant)
  – Requests resources from the MM
  – Claiming logic
  – Ensures only-once semantics
● Shadow (per running job)
  – Remote invocation
  – Input/output staging
  – Job “babysitting” – failure identification
  – Sometimes works as an I/O proxy


Condor architecture – Execution host: startd and starter

[Diagram: execution host running the startd (execution gateway) and one starter per job, connected to the schedd, the shadow and the Matchmaker]

● Startd – resource manager
  – Monitoring
    ● Keeps track of resource usage
    ● Periodically sends resource attributes to the MM
  – Enforces local policies
  – Security
  – Spawns the starter
  – Communicates with the schedd
● Starter (per running job)
  – Communicates with the shadow (I/O)
  – Environment creation and cleanup
  – Controls job execution


Matchmaker

[Diagram: pool entities publish/subscribe classads to the Collector; the Negotiator pulls from the Collector and notifies matched parties]

● Collector
  – Central registry of the pool metadata
  – All pool entities send reports to the collector
● Negotiator – the Condor brain
  – Periodically pulls info from the collector
  – Attempts to match requests with resources
  – Notifies happy pairs
  – Maintains a fair share of resources between users


Condor description language: ClassAd

● Used to describe entities – resources, jobs, daemons, etc.
● Schema-less!!!
● Mapping of attribute names to expressions
● Both descriptive and functional
  – Expressions can contain attributes from other classads
  – Protocol for expression evaluation
● Simple examples:
  – Ex1: Simple
    [ CPU=200; RAM=30 ]
  – Ex2: Reference to local attributes
    [ MyCPU=200; RAM=20; Power=(RAM+MyCPU) ]
  – Ex3: Reference to another classad
    [ Type=job; Exec=test.exe; Requirements=other.RAM>200 ]
    [ Type=resource; RAM=300 ]


Matching constraints

● The matching process is symmetric:
  – Matched only if both the resource and the job requirement expressions are true

  [ Type=job; Exec=test.exe; Requirements=other.RAM>200 ]
  [ Type=resource; RAM=300; Requirements=(Exec==test.exe) ]
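Below is a minimal, illustrative sketch of this symmetric check in Python. Classads are modeled as plain dicts and Requirements as predicates over (my ad, other ad); this is a toy model of the idea, not Condor's actual ClassAd evaluation protocol.

# Toy model of symmetric matching: classads as dicts, Requirements as
# Python predicates over (my_ad, other_ad). Not Condor's real evaluator.

def symmetric_match(ad_a, ad_b):
    """Matched only if both Requirements expressions evaluate to True."""
    return ad_a["Requirements"](ad_a, ad_b) and ad_b["Requirements"](ad_b, ad_a)

job = {
    "Type": "job",
    "Exec": "test.exe",
    # job requirement: the other (resource) classad must advertise RAM > 200
    "Requirements": lambda my, other: other["RAM"] > 200,
}

resource = {
    "Type": "resource",
    "RAM": 300,
    # resource requirement: it only accepts jobs whose Exec is test.exe
    "Requirements": lambda my, other: other["Exec"] == "test.exe",
}

print(symmetric_match(job, resource))  # True – both requirement expressions hold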


Example of a resource classad

MyType = "Machine"
TargetType = "Job"
Name = "[email protected]"
Machine = "ds-i1.cs.technion.ac.il"
Rank = 0.000000
CpuBusy = ((LoadAvg - CondorLoadAvg) >= 0.500000)
CondorVersion = "$CondorVersion: 6.4.7 Jan 26 2003 $"
CondorPlatform = "$CondorPlatform: INTEL-LINUX-GLIBC22 $"
VirtualMemory = 1014294
Disk = 34126016
CondorLoadAvg = 0.000000
LoadAvg = 1.000000
KeyboardIdle = 26038
Arch = "INTEL"
OpSys = "LINUX"
UidDomain = "cs.technion.ac.il"
FileSystemDomain = "cs.technion.ac.il"
Subnet = "132.68.37"
HasIOProxy = TRUE
...
CpuBusyTime = 2109520
CpuIsBusy = TRUE
State = "Owner"
EnteredCurrentState = 1084352386
Activity = "Idle"
EnteredCurrentActivity = 1084352386
Start = (Scheduler =?= "[email protected]") || (((KeyboardIdle > 15 * 60) && (((LoadAvg - CondorLoadAvg) <= 0.300000 && ((CurrentTime - EnteredCurrentState) >= 60)) || (State != "Unclaimed" && State != "Owner"))))
Requirements = START


Matchmaker in detail

● Collector stores
  – All resources' classads
  – All schedds' classads
    ● Represent only the number of jobs and their owners, but not the jobs themselves
● Information is always outdated
● Periodically removes stale data
  – Soft registration


Idle state (periodic update and soft state)

[Diagram: schedd and startd periodically publish their classads to the Collector, which removes stale data (garbage collection)]

● Schedd classad: number of idle jobs in the queue, IP:port
● Startd classad: resource state, resource characteristics
● Collector: removes stale data (garbage collection)
● Important: this diagram is valid throughout the whole life of the schedd and the startd
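A small, self-contained sketch of the soft-state idea, assuming every daemon re-publishes its classad periodically and the collector drops ads that have not been refreshed recently; the timeout value and the sample classad contents are made up for illustration.

import time

# Illustrative soft-state registry: ads must be refreshed periodically,
# otherwise the collector garbage-collects them. The timeout is made up.
AD_TIMEOUT = 15 * 60  # seconds without an update before an ad is stale

class Collector:
    def __init__(self):
        self.ads = {}  # name -> (classad dict, last-update timestamp)

    def publish(self, name, classad):
        """Called periodically by every schedd/startd (soft registration)."""
        self.ads[name] = (classad, time.time())

    def garbage_collect(self):
        """Drop ads that were not refreshed in time (e.g. the daemon died)."""
        now = time.time()
        for name in [n for n, (_, ts) in self.ads.items() if now - ts > AD_TIMEOUT]:
            del self.ads[name]

collector = Collector()
collector.publish("schedd@submit-host", {"IdleJobs": 12, "Address": "1.2.3.4:9618"})
collector.publish("startd@exec-host", {"State": "Unclaimed", "Arch": "INTEL"})
collector.garbage_collect()  # nothing is stale yet, both ads survive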


Negotiator

● Periodic negotiation cycle
  1) Pulls all classads (once per cycle)
  2) Contacts each schedd according to priority and gets a job classad
  3) For each job, traverses all resources' classads and attempts to match each one
  4) If a match is found
     1) Chooses the best match according to global and local policies
     2) Notifies the matched parties
     3) Removes the matched classad
  5) If not found – tries the next job from the same schedd, or the next schedd
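A simplified, self-contained sketch of this cycle. Classads are plain dicts and the matching and ranking rules are toy stand-ins (minimum RAM, most RAM wins); this only mirrors the numbered steps above, not the negotiator's real logic.

# Simplified sketch of the negotiation cycle described above.
# Classads are dicts; matching and ranking rules are toy stand-ins.

def matches(job, resource):
    return resource["RAM"] >= job["MinRAM"]

def choose_best(job, candidates):
    # "best match according to policies" reduced to: most RAM wins
    return max(candidates, key=lambda r: r["RAM"])

def negotiation_cycle(schedds, resources):
    resources = list(resources)                       # 1) pulled once per cycle
    matched = []
    for schedd in schedds:                            # 2) schedds assumed pre-ordered by user priority
        for job in schedd["idle_jobs"]:               #    get the next job classad
            candidates = [r for r in resources if matches(job, r)]   # 3) try each resource
            if candidates:                            # 4) match found
                best = choose_best(job, candidates)   #    pick the best per policies
                matched.append((job, best))           #    "notify" both parties
                resources.remove(best)                #    remove the matched classad
            # 5) no match: fall through to the next job / next schedd
    return matched

schedds = [{"name": "schedd-A", "idle_jobs": [{"MinRAM": 200}, {"MinRAM": 900}]}]
resources = [{"Name": "cpu1", "RAM": 300}, {"Name": "cpu2", "RAM": 512}]
print(negotiation_cycle(schedds, resources))  # the 200 MB job matches cpu2, the 900 MB job does not match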


Claiming and running

[Sequence diagram: the Negotiator informs both the Schedd and the Startd of the match (startd IP, match/claim ID); the Schedd then talks to the Startd directly]

Schedd → Startd: Activate claim: are you available?
Startd → Schedd: Yes
Schedd → Startd: Run job
Startd ↔ Schedd: Alive / Alive (keepalives)
Startd → Schedd: Job finished, have more?
Schedd → Startd: Release claim: no, thanks


Negotiation state sequence diagram

[Sequence diagram; participants: Schedd, Negotiator, Collector, Startd]

Single negotiation cycle (repeated while there are idle jobs):
1. Negotiator → Collector: fetch all classads
2. Negotiator: get the next schedd with idle jobs
3. Negotiator → Schedd: get the next job classad
4. Schedd: chooses the next job to match, sends its job classad
5. Negotiator: performs matchmaking, assigns a Claim ID
6. Negotiator → Startd: “I am claimed” (Claim ID)
7. Negotiator → Schedd: Claim ID and address of the matched startd
8. Schedd → Startd: activate claim with the received Claim ID
9. Startd: validation of correct match
10. Startd: start new job ... job complete


Startd resource monitoring

● Periodic sampling of system resources
  – CPU utilization, hard disk, memory, ...
● User-defined attributes
● If a job is running – total running time, total load by the job, ...
● Published in the classad and can be matched against


Startd policies (support for cycle stealing)

● The resource owner can configure
  – When the resource is considered available
    ● Ex: only after the keyboard has been idle for 15 minutes
  – What to do when the owner is back
    ● Ex: suspend the job to RAM
  – How to evict the job
    ● Ex: the job should be killed at most 5 sec after “I want my resource back”
● The pool manager has no control over these policies
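A minimal sketch of how such owner policies could be evaluated locally, assuming the startd samples the machine state into a dict. The policy names, field names and thresholds mirror the examples on this slide and are illustrative only; this is not real Condor configuration syntax.

# Illustrative only: owner policies as predicates over the machine state
# the startd monitors. Names and thresholds echo the slide's examples;
# this is not actual Condor configuration.

owner_policy = {
    # resource is considered available only after 15 min of keyboard idleness
    "START":   lambda m: m["KeyboardIdle"] > 15 * 60,
    # when the owner is back (keyboard active again), suspend the job to RAM
    "SUSPEND": lambda m: m["KeyboardIdle"] < 60,
    # eviction: the job must be gone within ~5 s of the owner's reclaim request
    "KILL":    lambda m: m["OwnerWantsMachine"] and m["SecondsSinceEvictRequest"] > 5,
}

machine_state = {
    "KeyboardIdle": 20 * 60,          # owner away for 20 minutes
    "OwnerWantsMachine": False,
    "SecondsSinceEvictRequest": 0,
}

for name, expr in owner_policy.items():
    print(name, "=>", expr(machine_state))   # START => True, others => False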


Global resource sharing policies

● How should resources be shared between users?
● What happens without policies:
  – 1000 computers; user A starts 1000 jobs, each 5 hours long; user B will have to wait ;(((
● Solution – fair share
  – A user with a higher priority can preempt another user's job
  – Priorities change dynamically according to resource usage: more resources – worse priority

    Prio_user(t) = k * Prio_user(t - dt) + (1 - k) * (number of used resources),
    where k = 0.5^(dt / priority half-life)
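A small numeric sketch of this update rule; the half-life, update interval, initial priority and usage figures below are made up for illustration.

# Numeric sketch of the fair-share priority update from this slide:
#   prio(t) = k * prio(t - dt) + (1 - k) * used_resources,  k = 0.5 ** (dt / half_life)
# All concrete numbers below are made up.

def update_priority(prev_prio, used_resources, dt, half_life):
    k = 0.5 ** (dt / half_life)
    return k * prev_prio + (1 - k) * used_resources

prio = 0.5                # starting priority value
half_life = 3600.0        # one-hour priority half-life
for _ in range(6):        # the user keeps 100 machines busy for 6 x 20 minutes
    prio = update_priority(prio, used_resources=100, dt=1200.0, half_life=half_life)
    print(round(prio, 1))
# the value climbs toward 100: heavier usage => numerically worse priority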


Putting policies together: negotiation cycle revisited (Condor 6.6 series)

● Periodic negotiation cycle
  1) Pull all classads (once per cycle) and optimize for matching
  2) Order all schedd requests by user priorities // higher priority – served first
  3) For each user:
     While (the user's quota is not exceeded AND the user has more job requests) do   // NEW JOB
       1) Contact the schedd and get the next job classad
       2) Traverse all resources' classads and attempt to match them one by one
          1) If no match is found – notify the schedd; goto NEW JOB
          2) If a match is found – AssignWeights(), add it to the matched list
       3) ChooseBestMatch() and Notify()


Putting policies together: negotiation cycle revisited (cont.)

● Function AssignWeight()
  1) Assign a preemption weight:
     ● 2 – if the resource is idle
     ● 1 – if the resource is busy and prefers the new job over the current one (Resource Rank evaluation)
     ● 0 – if the resource is busy, the new job's user has a better priority than the current one, and the global policy permits preemption
  2) Evaluate job preferences (Job Rank evaluation)
● Function ChooseBestMatch(): lexicographic order
  – Sort according to job rank, pick the best one
  – Among all matches with the equal best rank – sort according to preemption weight
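A compact sketch of these two functions, again with classads modeled as dicts; the rank functions, state names and the toy usage example are illustrative, not the negotiator's real data structures.

# Sketch of AssignWeight() / ChooseBestMatch() from this slide.
# Classads are dicts; rank functions and state names are illustrative.

def assign_weight(job, resource, new_user_has_better_prio, preemption_allowed):
    """Preemption weight: 2 = idle, 1 = resource prefers the new job, 0 = priority preemption."""
    if resource["State"] == "Unclaimed":
        return 2
    if resource["Rank"](job) > resource["Rank"](resource["CurrentJob"]):
        return 1
    if new_user_has_better_prio and preemption_allowed:
        return 0
    return None  # a busy resource that may not be preempted: not a legal match

def choose_best_match(job, weighted_matches):
    """Lexicographic order: highest job rank first, preemption weight breaks ties."""
    return max(weighted_matches, key=lambda rw: (job["Rank"](rw[0]), rw[1]))

# tiny usage example with two candidate resources
job = {"Rank": lambda r: r["Mips"]}                  # the job prefers faster machines
idle = {"Name": "idle-box", "State": "Unclaimed", "Mips": 1000}
busy = {"Name": "busy-box", "State": "Claimed", "Mips": 1000,
        "Rank": lambda j: 0, "CurrentJob": {}}

matches = []
for r in (idle, busy):
    w = assign_weight(job, r, new_user_has_better_prio=True, preemption_allowed=True)
    if w is not None:
        matches.append((r, w))
print(choose_best_match(job, matches)[0]["Name"])    # idle-box: equal job rank, higher weight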

Condor and MPI parallel jobs

● Problem: MPI requires synchronous invocation and execution of multiple instances of a program.
● Why this is a problem:
  – The Negotiator matches only one job at a time
  – The Schedd knows how to invoke only one job at a time
  – Different failure semantics: a single instance failure IS a whole MPI job failure
  – The Startd might preempt a single job, but this would kill the whole MPI run


MPI Universe

● Each Startd capable of running MPI jobs publishes the attribute “DedicatedScheduler=”
● Each MPI sub-job has a requirement to run on a host with DedicatedScheduler defined
● The Negotiator matches all such hosts and passes them to the Schedd
● The dedicated Schedd is responsible for synchronous invocation and for the failure semantics
● The dedicated Schedd can preempt any job on such a host


Condor in the Technion

● Condor is deployed in DSL, SSDL and CBL (total ~200 CPUs)
● Gozal: R&D projects for Condor enhancements. Among them:
  – High availability
  – Distributed management and configuration
  – Resource sandbox
  – On the web: http://dsl.cs.technion.ac.il/projects/gozal/
● Superlink-online: genetic linkage analysis portal


References

● www.condorproject.org
  – Condor administration manual
  – Research papers
● Slides from the previous year's lecture

