Multi-Toroidal Interconnects For Tightly Coupled Supercomputers
Supplementary Material
Appendix I: Detailed Failure Model

Yariv Aridor, Tamar Domany, Oleg Goldshmidt, Member, IEEE, Yevgeny Kliteynik, Edi Shmueli, and José E. Moreira, Member, IEEE

Manuscript received March 20, 2006; revised November 29, 2006. Y. Aridor, T. Domany, O. Goldshmidt, Y. Kliteynik, and E. Shmueli are with the IBM Haifa Research Lab. J. Moreira is with the IBM Watson Research Center.

APPENDIX I
DETAILED FAILURE MODEL

We model failures at the coarse granularity level of system management (cf. Section III-A): if any component of an allocation unit (e.g., a CPU, a memory chip, or a network chip) fails, we treat it as a failure of the allocation unit as a whole. This is also consistent with the system management approach of Blue Gene/L [1]. We assume that all failures are independent of each other, so that it is enough to model failures of a single component.

For each component (an inter-switch link or an allocation unit) we model failures as a Poisson stochastic process: the probability that a failure will occur by the time $t$ since the component became operational is given by

$$P(t) = 1 - \exp(-\mu t), \tag{1}$$

where $\mu$ is the failure rate per unit time. The probability density function of failures is therefore given by

$$p(t) = \frac{dP(t)}{dt} = \mu \exp(-\mu t), \tag{2}$$

and hence the mean uptime of the component becomes

$$U = \int_0^\infty t\,p(t)\,dt = \frac{1}{\mu}. \tag{3}$$

We also assume that once a failure occurs it will take a constant downtime $D$ to repair the component. This assumption is based on the observation that large scalable systems are built from modules that can be easily substituted, so no complicated repairs are needed: in case of a failure the defective module will simply be swapped for a new one in constant time.

The main reliability parameter of a component, its mean time between failures, $\mathit{MTBF}$, is just the sum of mean uptime and mean downtime,

$$\mathit{MTBF} = U + D. \tag{4}$$
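The step from (2) to (3) can be verified with a single integration by parts. The short derivation below is added only as a worked illustration; it uses the symbols already defined in (1)-(3) and assumes $\mu > 0$, so that the boundary term vanishes.

```latex
\begin{align*}
U &= \int_0^\infty t\,\mu e^{-\mu t}\,dt
   = \Bigl[-t\,e^{-\mu t}\Bigr]_0^\infty + \int_0^\infty e^{-\mu t}\,dt \\
  &= 0 + \Bigl[-\tfrac{1}{\mu}\,e^{-\mu t}\Bigr]_0^\infty
   = \frac{1}{\mu}.
\end{align*}
```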

We assume that all inter-switch links and all allocation units are identical, so we only need to specify the mean uptimes $U_L$ and $U_{AU}$ and the mean downtimes $D_L$ and $D_{AU}$ for links and allocation units separately to fully specify the failure model.

Our simulator generates a random time to failure, $\mathit{TTF}$, from the probability distribution (1), where $\mu$ is related to the mean uptime $U$ via (3), for every link and every allocation unit after each failure and recovery. Thus, failures and recoveries become additional scheduling events, just like, e.g., job terminations. In fact, failed allocation units and links are a part of the specification of the machine state before each invocation of the partition allocation algorithm.

The short experience with operational Blue Gene/L systems and the very preliminary accumulated failure statistics suggest link uptimes of the order of a few years. This led us to experiment with $\mu_L = 0.004$, $\mu_L = 0.002$, and $\mu_L = 0.001$ failures/week, corresponding to average link uptimes $U_L$ of roughly 5, 10, and 20 years, respectively. Considering that an 8 × 8 × 8 machine with the line topology of Fig. 5(b) has 15 × 64 × 3 = 2880 links, we can compute the weekly link failure rate for the whole system (Table I). Comparing these values to the estimated failure rates for the other components of Blue Gene/L [2], we see that link failures are dominant for uptimes less than 15 years. This is reasonable, since a machine where nodes and switches are unstable is not viable, while there is some level of redundancy for links.
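One way to picture how failure and recovery events could be generated is the sketch below. It is not the authors' simulator: the function names `sample_ttf` and `link_events` are hypothetical, and the inverse-transform method is simply one standard way to draw from distribution (1). The sketch samples a time to failure, then adds the constant downtime to obtain the matching recovery event.

```python
import math
import random

def sample_ttf(mean_uptime, rng):
    """Draw a time to failure from P(t) = 1 - exp(-t / mean_uptime).

    Inverse transform: solve u = P(t) for t, with u uniform in [0, 1).
    """
    u = rng.random()
    return -mean_uptime * math.log(1.0 - u)

def link_events(mean_uptime, downtime, horizon, rng):
    """Return alternating (time, kind) events for one link up to `horizon`.

    Hypothetical helper: each failure is followed by a recovery after a
    constant `downtime`, and a new time to failure is drawn after recovery.
    """
    events = []
    t = 0.0
    while True:
        t += sample_ttf(mean_uptime, rng)
        if t >= horizon:
            break
        events.append((t, "failure"))
        t += downtime
        if t >= horizon:
            break
        events.append((t, "recovery"))
    return events

rng = random.Random(0)      # fixed seed so the run is repeatable
U_L = 1.0 / 0.004           # mean link uptime in weeks, from mu_L = 0.004
D_L = 1.0                   # constant link downtime in weeks (illustrative value)
for time, kind in link_events(U_L, D_L, horizon=2000.0, rng=rng):
    print(f"week {time:8.1f}: {kind}")
```

In a full simulation, events of this kind for every link would be merged with job arrivals and terminations into a single time-ordered event queue.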

TABLE I
LINK FAILURE RATES IN AN 8 × 8 × 8 SYSTEM FOR DIFFERENT VALUES OF LINK UPTIMES. THE LAST ROW PRESENTS THE FAILURE RATE FOR ALL THE OTHER COMPONENTS SCALED FROM THE ESTIMATES FOR BLUE GENE/L (FROM [2]).

| uptime per link (years) | failure rate per link per week | links in an 8×8×8 system | failure rate per system per week |
|---|---|---|---|
| 5 | 0.004 | 2880 | 11.52 |
| 10 | 0.002 | 2880 | 5.76 |
| 20 | 0.001 | 2880 | 2.88 |
| other components (scaled) | | | 3.57 |
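The arithmetic behind Table I can be reproduced with a few lines. The sketch below is illustrative only: the constant `WEEKS_PER_YEAR = 52` is an assumed conversion factor, and the names are hypothetical. It multiplies the per-link rate by the link count and estimates the link uptime at which link failures match the rate quoted for all other components.

```python
LINKS = 15 * 64 * 3        # 2880 inter-switch links, as counted in the text
WEEKS_PER_YEAR = 52.0      # assumed conversion factor, for illustration
OTHER_RATE = 3.57          # failures/week for all other components (Table I)

def system_rate(mu_link, links=LINKS):
    """Weekly failure rate of all links, given the per-link weekly rate."""
    return mu_link * links

for mu in (0.004, 0.002, 0.001):
    uptime_years = 1.0 / mu / WEEKS_PER_YEAR
    print(f"mu_L = {mu}: uptime ~{uptime_years:.1f} years, "
          f"system rate {system_rate(mu):.2f} failures/week")

# Link uptime at which the link failure rate equals that of the other components.
break_even_years = LINKS / OTHER_RATE / WEEKS_PER_YEAR
print(f"break-even link uptime ~{break_even_years:.1f} years")
```

Under these assumptions the break-even uptime comes out at roughly 15.5 years, in line with the statement that link failures dominate for uptimes below about 15 years.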

For simplicity, we assume that links are much less reliable than nodes and switches, and ignore the possibility of other components’ failures, i.e., we assume $U_{AU} = \infty$ and $D_{AU} = 0$. This serves our goal of investigating how the obviously higher link redundancy of the multi-toroidal architecture gives it an advantage over a 3D torus.

As for the constant repair downtime, we assume that spare hardware is not readily available, and that replacing a faulty custom communication link takes approximately a week, including delivery to the customer site, etc. Thus, in our simulations we adopt 1 week as the value of the average downtime $D_L$.

Note that one needs to be careful when simulating different loads, because changing the offered load is usually done by changing the ratio between the time scales of the jobs’ inter-arrival times and runtimes. With failures in the picture we have a third time scale, that of failures. In our simulations we scaled it similarly to the inter-arrival times to change the offered load. Since the absolute failure rate ($\mu_L$) should remain the same, this means that effectively we changed the offered load by scaling the jobs’ runtimes. In other words, two workloads that differed only by offered load consisted of jobs of the same sizes and shapes arriving at the same times, but in one of them the jobs were longer than in the other by a constant factor.

Also note that failures may occur in a “busy” partition, i.e., a partition in which a job is currently running. Therefore, we must also model the actions that the system will perform in such a case.
There are several options:
• the running job will be pushed to the head of the waiting queue and restarted from the beginning according to the normal scheduling rules;
• the running job will be pushed to the tail of the waiting queue and restarted from the beginning according to the normal scheduling rules;
• the running job is checkpointed at regular intervals while it runs, and on failure it will be pushed to the head of the waiting queue and restarted from the last checkpoint before the failure according to the normal scheduling rules.

In our simulations we choose the third option: we assume that every job is checkpointed frequently (with a period much shorter than the job’s runtime and much shorter than the mean link uptime and downtime), and if a failure occurs the job is inserted at the head of the queue and is restarted as soon as possible, essentially from the point of failure.

REFERENCES
[1] G. Almasi et al., “System Management in the Blue Gene/L Supercomputer,” in 3rd Workshop on Massively Parallel Processing, Nice, France, 2003.
[2] N. R. Adiga et al., “An Overview of the Blue Gene/L Supercomputer,” in Supercomputing, 2002.
