
Provisioning On-Chip Networks under Buffered RC Interconnect Delay Variations

Mosin Mondal, Rice University

Tamer Ragheb, Rice University

Xiang Wu, AMD

Abstract: A Network-on-Chip (NoC) replaces the point-to-point interconnects used for on-chip communication in a multi-core environment with a set of shared interconnects connected through programmable crosspoints. Since an NoC may provide several paths between a given source and destination, a manufacturing or runtime fault on one interconnect does not necessarily render the chip useless. It is partly because of this fault tolerance that NoCs have emerged as a viable alternative for implementing communication between functional units of a chip in the nanometer regime, where high defect rates are prevalent. In this paper, we quantify the fault tolerance offered by an NoC against process variations. Specifically, we develop an analytical model for the probability of failure in buffered global NoC links due to interconnect dishing and effective channel length variation. Using this probability model, we study the impact of link failure on the number of cycles required to establish communication in NoC applications.

I. INTRODUCTION

With CMOS technology scaling, hardware defects increase in both devices and interconnects, resulting in significant yield loss. Process variations in the nanometer regime can produce chips whose performance metrics differ significantly from the designed values. For instance, lithographic errors in printing nanometer-scale device geometries cause significant variation in the effective channel length and resistance of the devices [1]. Moreover, imperfect planarization of the chip surface by chemical mechanical polishing (CMP) causes dishing and erosion in the global interconnects, producing considerable variation in the interconnect parasitics [2]. Since the relative importance of interconnects to the performance of integrated circuits increases as feature size shrinks, the International Technology Roadmap for Semiconductors has predicted that traditional interconnects will be a major bottleneck for technology nodes beyond 45 nm [3]. In long global interconnects, which generally require repeaters to reduce delay, variations in the effective channel length and interconnect parasitics cause the performance metrics to deviate from the designed values, leading to yield loss in conventional designs based on point-to-point communication.

Network-on-Chip (NoC) has emerged as a vital technology for systems-on-chip (SoCs) and chip multiprocessors, establishing a controlled communication mechanism between blocks with minimal collisions and errors [4]. An NoC replaces dedicated point-to-point interconnects with an ensemble of links that work as shared resources through time-multiplexing. This ensemble of links is configured through programmable crosspoints, which gives the flexibility to implement the same communication with fewer interconnects and better reliability.

Adnan Aziz University of Texas

Yehia Massoud Rice University

The reliability originates from the alternative paths that exist between a source and a destination, which prevent the chip from failing in the presence of hardware faults. Therefore, an NoC can solve many of the problems associated with conventional interconnects and emerges as a viable solution for SoC interconnection. Moreover, NoC is a promising technique for establishing communication between different chips in 3-D integrated circuits as well.

In this paper, we develop an analytical model for approximating the probability of link failure caused by delay variation due to chemical mechanical polishing and effective channel length variation. The rest of this paper is organized as follows. In Section II, we discuss the impact of process variation on link failure. A high-level analytical model for link failure is developed in Section III, considering variations in both the interconnect and the devices. In Section IV, we examine the impact of interconnect defects on the ability of the NoC as a whole to implement the desired functionality. Specifically, we determine how many cycles it takes to perform LDPC decoding on mesh-structured NoCs as a function of the interconnect failure rate.

II. LINK FAILURE CAUSED BY PROCESS VARIATION

Process variation has become significant with decreasing feature size, and it impacts the performance of modern integrated circuits by causing large variations in delay, power and crosstalk noise. Process variation has a direct impact on yield, since the design constraints may not be met. In an NoC environment, delay faults caused by process variation are critical since they can produce packet collisions and increase latency. In this section, we analyze the effect of process variation on the links used for communication between different cores in NoC designs. Process variation shifts the delay of the links away from the nominal values and can thereby lead to delay violations.

The links used to transfer packets between cores span long distances, on the order of a few millimeters, on the global metal layers. Since the delay of such long interconnects is large, an optimal number of buffers of appropriate size is inserted to minimize the delay. The delay of a link is affected by process variations in both the interconnect and the buffers. The first source of variation is the chemical mechanical planarization (CMP) used in the back-end-of-line (BEOL) process step for modern copper interconnects. CMP causes surface imperfections in the interconnect wires because of dishing and erosion. The change in the top surface of the global interconnects due to dishing, as shown in Figure 1, changes the resistance of the wires considerably, whereas the capacitance values are not

Proceedings of the 8th International Symposium on Quality Electronic Design (ISQED'07) 0-7695-2795-7/07 $20.00 © 2007

Fig. 1. Dishing of global interconnects. (Labels: dielectric level before CMP, dielectric level after CMP, dishing, copper, dielectric.)

(Plot residue belonging to Fig. 2: probability versus delay (ps), with curves for Dishing Only, Leff Only, and Dishing and Leff.)

be affected much, since the height of the sidewall stays practically unaffected. This is also supported by [5], where the authors show that the variation in capacitance is around 3% and hence negligible.

The other source of delay variation in the links is the variation of the effective channel lengths of the transistors used in the buffers. The effective channel length ($L_{eff}$) varies due to lithographic errors in printing the geometries. This variation affects the buffer delay since the effective output resistance ($R_o$), the input capacitance ($C_{in}$) and the output capacitance ($C_o$) of the buffer depend on the effective channel length. In the analysis of the probability of link failure due to delay violation, we consider the effect of both dishing and effective channel length variation on delay.

To demonstrate the variation in the delay of a link due to process variation, let us consider a link of length 5 mm with dimensions typical of the 65 nm technology node. The delay of the link is optimized by inserting an optimal number of buffers of appropriate sizes. The nominal delay of the buffered link is 146.3 ps. Figure 2 depicts the variation in delay caused by the variation of the interconnect resistance, $R$, and the effective channel length, $L_{eff}$, when they vary individually and together. The cumulative distribution functions (CDFs) of the delay due to dishing, $L_{eff}$ variation, and both together are shown in the figure, assuming Gaussian distributions of the parameters. The mean and 3σ values of the delay distribution due to these parameter variations are shown in Table I. The mean in each case is almost the same as the nominal delay, whereas the 3σ values differ. For variation in the interconnect resistance due to dishing, the 3σ value is 12.72 ps, whereas it grows to 19.54 ps for variation in $L_{eff}$. When both variations are considered, the 3σ value is 23.37 ps. Although $L_{eff}$ has the bigger impact on the delay variation, both variations need to be considered for an accurate analysis of link failure due to process variation.

III. PROBABILITY OF LINK FAILURE

In NoC applications, each link has an associated nominal delay that becomes a distribution around a mean value (generally close to the nominal value), depending on the variations in the resistance of the interconnect, $R$, and in the effective transistor channel length, $L_{eff}$. In order to calculate the variation in delay, we derive the delay equation of the interconnect with buffers using the widely accepted Elmore delay model for

Fig. 2. Cumulative distribution functions (CDF) showing the effect of parameter variation on delay.

TABLE I. Mean and 3-sigma values of delay distribution due to parameter variations

| Variation due to      | Mean (ps) | 3-sigma (ps) |
|-----------------------|-----------|--------------|
| Dishing               | 146.52    | 12.72        |
| $L_{eff}$             | 146.17    | 19.54        |
| Dishing and $L_{eff}$ | 146.20    | 23.37        |

general RC tree delay. The Elmore delay is considered to have a high degree of fidelity, since an optimal or near-optimal design obtained using the Elmore delay is also near-optimal under more accurate delay models [6]. The Elmore delay model of an interconnect with associated buffers is shown in Figure 3, where $R_o$, $C_o$ and $C_{in}$ are the effective output resistance, output capacitance and input capacitance of the buffers, respectively. In addition, $C_s$ is the total interconnect self capacitance, $C_c$ is the total coupling capacitance to one neighbor, $R$ is the total interconnect resistance, $k$ is the number of buffers inserted in the interconnect, and $\eta$ is the Miller factor for the coupling capacitance, with worst-case coupling at $\eta = 2$. The Elmore delay for one section of the $k$-section interconnect can be found in the conventional manner as:

$$D_{sec} = 0.69 R_o \left( C_o + \frac{C_s}{k} + \frac{2 \eta C_c}{k} + C_{in} \right) + \frac{R}{k} \left( \frac{0.38 C_s}{k} + \frac{0.76 \eta C_c}{k} + 0.69 C_{in} \right) \qquad (1)$$
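As a rough illustration of Equation (1), the following Python sketch evaluates the delay of one section. The function name `section_delay` and all numeric values are hypothetical placeholders chosen only to exercise the formula; they are not the parasitics extracted for the link studied in this paper.

```python
def section_delay(R, Cs, Cc, Ro, Co, Cin, k, eta=2.0):
    """Elmore delay of one section of a k-section buffered link, Equation (1).

    R, Cs and Cc are totals for the whole link (ohms, farads);
    Ro, Co and Cin describe one buffer; eta is the Miller factor.
    """
    wire_cap = (Cs + 2.0 * eta * Cc) / k            # wire capacitance per section
    driver_term = 0.69 * Ro * (Co + wire_cap + Cin)
    wire_term = (R / k) * (0.38 * wire_cap + 0.69 * Cin)
    return driver_term + wire_term


# Hypothetical values, for illustration only.
params = dict(Cs=0.4e-12, Cc=0.2e-12, Ro=100.0, Co=50e-15, Cin=60e-15,
              k=3, eta=1.0)
nominal = section_delay(R=300.0, **params)
dished = section_delay(R=300.0 * 1.18, **params)    # resistance raised by 18%
print(f"nominal section delay: {nominal * 1e12:.2f} ps")
print(f"with +18% resistance:  {dished * 1e12:.2f} ps")
```

Only the second term of Equation (1) depends on $R$, so a change in wire resistance shifts the section delay linearly; the buffer-dependent first term is unaffected.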

For the actual computation of the Elmore delay, the values of the different resistances and capacitances need to be known. The resistance and capacitance values associated with the buffers can be found analytically. The effective output resistance of the buffer, $R_o$, depends on device parameters including the gate-to-source voltages ($V_{GS,n}$ and $V_{GS,p}$), which set the operating points, and the threshold voltages ($V_{Tn}$ and $V_{Tp}$). In this work, $R_o$ is calculated as the average of the resistances of the NMOS and PMOS transistors, assuming equal probabilities for logic '1' and '0', as given by Equation (2):

$$R_o = \frac{0.5 L_n / W_n}{\mu_n C_{ox} (V_{GS,n} - V_{Tn})} + \frac{0.5 L_p / W_p}{\mu_p C_{ox} (V_{GS,p} - |V_{Tp}|)} \qquad (2)$$

Fig. 3. Delay model for interconnect with buffers inserted.

Fig. 4. The product of normally distributed variables with positive means and small standard deviations closely follows a Gaussian distribution. (Plot: PDF of XY and its Gaussian approximation for X = N(10, 1) and Y = N(5, 0.5); axes: probability versus XY.)

where $W_n$ and $W_p$ are the transistor widths, $\mu_n$ and $\mu_p$ are the mobility values, and $C_{ox}$ is the oxide capacitance per unit area. The input capacitance of the buffer, $C_{in}$, which is the total gate capacitance, is given by:

$$C_{in} = C_{ox} (W_n L_n + W_p L_p) \qquad (3)$$

The output capacitance, or total drain capacitance, $C_o$, of the buffer is:

$$C_o = (W_n E_n + W_p E_p) C_j + 2 (W_n + E_n) C_{jsw} + 2 (W_p + E_p) C_{jsw} \qquad (4)$$

where $E_n$ and $E_p$ are the diffusion lengths, $C_j$ is the bottom-plate capacitance per unit area associated with the diffusion junction, and $C_{jsw}$ is the side-wall capacitance per unit length due to the perimeter of the junction. Once the delay of one section is found from the above equations, the delay of the $k$-section link can be expressed simply as:

$$D_{link} = k \cdot D_{sec} \qquad (5)$$
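The buffer parasitics of Equations (2) to (4) can be sketched in Python as below. The function name `buffer_parasitics` and every device value are assumptions made for illustration (SI units, with $V_{GS,p}$ taken as a magnitude); they are not the 65 nm parameters used in this work.

```python
def buffer_parasitics(Ln, Lp, Wn, Wp, En, Ep, mu_n, mu_p, Cox, Cj, Cjsw,
                      Vgs_n, Vgs_p, Vtn, Vtp):
    """Return (Ro, Cin, Co) of a buffer following Equations (2), (3) and (4)."""
    Ro = (0.5 * (Ln / Wn) / (mu_n * Cox * (Vgs_n - Vtn))
          + 0.5 * (Lp / Wp) / (mu_p * Cox * (Vgs_p - abs(Vtp))))
    Cin = Cox * (Wn * Ln + Wp * Lp)
    Co = ((Wn * En + Wp * Ep) * Cj
          + 2.0 * (Wn + En) * Cjsw
          + 2.0 * (Wp + Ep) * Cjsw)
    return Ro, Cin, Co


# Hypothetical device values, for illustration only.
device = dict(Wn=6.5e-6, Wp=13e-6, En=0.2e-6, Ep=0.2e-6,
              mu_n=0.03, mu_p=0.012, Cox=0.015, Cj=1e-3, Cjsw=1e-10,
              Vgs_n=1.0, Vgs_p=1.0, Vtn=0.3, Vtp=-0.3)
base = buffer_parasitics(Ln=65e-9, Lp=65e-9, **device)
longer = buffer_parasitics(Ln=65e-9 * 1.15, Lp=65e-9 * 1.15, **device)
print("Ro, Cin, Co at nominal length:", base)
print("Ro, Cin, Co at +15% length:   ", longer)
```

In these expressions $R_o$ and $C_{in}$ scale linearly with the channel lengths, while $C_o$ as written in Equation (4) does not contain them. Substituting into Equation (1) therefore yields products of at most two of the varying quantities.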

Since we are analyzing the impact of the variations in $R$ and in the $L_{eff}$ of the NMOS ($L_n$) and PMOS ($L_p$) transistors, the link delay needs to be expressed in terms of $R$, $L_n$ and $L_p$. Combining all the above equations, the total delay can be expressed as:

$$D_{link} = \alpha_1 R + \alpha_2 L_n R + \alpha_3 L_p R + \alpha_4 L_n + \alpha_5 L_n^2 + \alpha_6 L_n L_p + \alpha_7 L_p + \alpha_8 L_p^2 \qquad (6)$$

where the $\alpha_i$ terms represent the coefficients of the random variables. In the analysis of the delay variation, the variations in the interconnect resistance and the effective channel length are taken to be Gaussian, as in [7], since the Gaussian distribution realistically models the variations in resistance and channel length. The variations of the effective channel length in the PMOS and NMOS transistors are assumed to be independent, since the PMOS and NMOS are fabricated at different steps of the manufacturing process. Moreover, the variation of the interconnect resistance due to dishing is independent of the channel length variation due to lithographic and doping uncertainties.

From the expression of the total delay in Equation (6), it can be noted that the computation involves the addition of different product terms, each of which is the product of a Gaussian random variable with either a constant or another Gaussian variable. The product of a Gaussian random variable with a constant is Gaussian, but the product of two normal variables is, in general, not Gaussian. However, in our analysis the random variables ($R$ and $L_{eff}$) always have positive values and their 3σ values are much smaller than their means, which allows us to approximate the product terms as Gaussian variables. Ware and Lad [8] explored the problem of approximating the sum of products of Gaussian variables in detail, and we use their results in this work. Figure 4 depicts an example where the Gaussian approximation is acceptable.

For approximating the product of two such normally distributed random variables, we use the expressions for the mean and standard deviation from [8]. Let us consider two Gaussian variables $X_1$ and $X_2$ with means $\mu_1$ and $\mu_2$ and standard deviations $\sigma_1$ and $\sigma_2$. The random variable $X = X_1 X_2$ can be approximated as a Gaussian variable with mean $\mu$ and standard deviation $\sigma$ given by:

$$\mu = \mu_1 \mu_2 + \rho \sigma_1 \sigma_2 \qquad (7)$$

and

$$\sigma^2 = \mu_1^2 \sigma_2^2 + \mu_2^2 \sigma_1^2 + \sigma_1^2 \sigma_2^2 + 2 \rho \mu_1 \mu_2 \sigma_1 \sigma_2 + \rho^2 \sigma_1^2 \sigma_2^2 \qquad (8)$$

where $\rho$ is the correlation coefficient between $X_1$ and $X_2$. For example, $\rho$ is 0 for the product of $R$ and $L_p$, whereas it is 1 for $L_p \cdot L_p$. Once the product variables are determined, they are summed to obtain the final random variable for the link delay, $D_{link}$. However, some of the product terms are now correlated with each other. For example, $R$ is correlated with $R L_n$ and $R L_p$. If each term of Equation (6) is denoted as $X_i(\mu_i, \sigma_i)$, then the mean of the overall delay will be

$$\mu_{link} = \sum_{i=1}^{8} \mu_i \qquad (9)$$
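The product approximation of Equations (7) and (8) can be checked numerically with a short Python sketch. It uses the variables of Figure 4, under the assumption that the second argument of N(·, ·) there is a standard deviation; the function names and the sample count are our own choices.

```python
import math
import random


def product_moments(mu1, sigma1, mu2, sigma2, rho=0.0):
    """Mean and standard deviation of X1*X2 from Equations (7) and (8)."""
    mean = mu1 * mu2 + rho * sigma1 * sigma2
    var = (mu1**2 * sigma2**2 + mu2**2 * sigma1**2 + sigma1**2 * sigma2**2
           + 2.0 * rho * mu1 * mu2 * sigma1 * sigma2
           + rho**2 * sigma1**2 * sigma2**2)
    return mean, math.sqrt(var)


def sample_product(mu1, sigma1, mu2, sigma2, n=200_000, seed=1):
    """Empirical mean and standard deviation for independent draws (rho = 0)."""
    rng = random.Random(seed)
    xs = [rng.gauss(mu1, sigma1) * rng.gauss(mu2, sigma2) for _ in range(n)]
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return mean, math.sqrt(var)


analytic = product_moments(10.0, 1.0, 5.0, 0.5)
empirical = sample_product(10.0, 1.0, 5.0, 0.5)
print("analytic  (mean, std):", analytic)
print("empirical (mean, std):", empirical)
```

The sketch only compares the first two moments. Whether the shape of the product distribution is close enough to Gaussian still depends on the standard deviations being small relative to the means, as discussed above.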

Computation of the standard deviation needs to handle the correlation of the product terms properly. The overall standard deviation $\sigma_{link}$ can be found from

$$\sigma_{link}^2 = S K S^T \qquad (10)$$

where $S$ is the row vector $[\sigma_1, \sigma_2, \ldots, \sigma_8]$ and $K$ is the correlation matrix

$$K = \begin{bmatrix}
1 & 0 & 0 & \rho_{1,4} & \rho_{1,5} & 0 & 0 & 0 \\
0 & 1 & 0 & \rho_{2,4} & 0 & \rho_{2,6} & \rho_{2,7} & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & \rho_{3,7} & \rho_{3,8} \\
\rho_{1,4} & \rho_{2,4} & 0 & 1 & \rho_{4,5} & \rho_{4,6} & \rho_{4,7} & 0 \\
\rho_{1,5} & 0 & 0 & \rho_{4,5} & 1 & 0 & \rho_{5,7} & \rho_{5,8} \\
0 & \rho_{2,6} & 0 & \rho_{4,6} & 0 & 1 & \rho_{6,7} & 0 \\
0 & \rho_{2,7} & \rho_{3,7} & \rho_{4,7} & \rho_{5,7} & \rho_{6,7} & 1 & \rho_{7,8} \\
0 & 0 & \rho_{3,8} & 0 & \rho_{5,8} & 0 & \rho_{7,8} & 1
\end{bmatrix}$$

and the terms $\rho_{i,j}$ indicate the correlation coefficients between $X_i$ and $X_j$.

Once the mean and standard deviation are found, we can describe the overall link delay as a normal distribution with mean $\mu_{link}$ and standard deviation $\sigma_{link}$. If the link has a design margin $D_m$, then the probability of link failure is the probability of the delay exceeding $D_{nom} + D_m$:

$$P[D_{link} > D_{nom} + D_m] = 1 - P[D_{link} < D_{nom} + D_m] = Q\left( \frac{D_{nom} + D_m - \mu_{link}}{\sigma_{link}} \right) \qquad (11)$$

where $Q$ is the well-known Borjesson expression [9]:

$$Q(x) \approx \left[ \frac{1}{(1 - a) x + a \sqrt{x^2 + b}} \right] \frac{1}{\sqrt{2\pi}} e^{-x^2/2} \qquad (12)$$

with $a = 1/\pi$ and $b = 2\pi$.

A. Results

To verify the correctness of the analytical method developed in this paper, we considered a link of length 5 mm between two cores in an NoC environment. Each wire in the bus is 0.5 µm in width and 1.2 µm in height. The separation between adjacent lines is 0.5 µm. All these dimensions are typical of the global interconnects found in the 65 nm technology node. Three buffers, each 100X the size of the minimum-sized buffer in 65 nm technology, are inserted in each line to minimize delay. The random variation in $L_{eff}$ is taken as 5% around the mean value of 65 nm, meaning that the effective transistor channel length varies with 3σ variations of up to 15% [7]. The dishing effect on the resistance of the interconnect is taken to cause 3σ variations of up to 18% [10].

SPICE simulation revealed that the combined nominal delay is 146.3 ps. Monte Carlo simulations were performed on the design, and the distribution of the delay had a mean value of 146.2 ps with 3σ points at 122.82 ps and 169 ps. The values of the mean and 3σ points computed by our method were found to be 155.3 ps, 136.4 ps and 174.2 ps, respectively. Since our method is based on the Elmore delay model, which gives an upper bound on the actual delay, it yields values greater than the SPICE values. However, the error in the mean value is 6.15% and the errors at the 3σ points are 11.4% and 2.7%, respectively.

To verify the statistical correctness of our method, we performed numerical Monte Carlo analysis on the delay Equation (6), for which we generated independent random Gaussian vectors corresponding to the interconnect resistance ($R$) and the effective channel lengths ($L_n$ for the NMOS and $L_p$ for the PMOS). Figure 5 shows the cumulative distribution function (CDF) obtained by numerical Monte Carlo simulation compared with the Gaussian CDF generated using the mean and standard deviation from our method. The two CDFs match very closely, which supports the accuracy of the statistical methods used in this work.

Finally, using the analytical values of the mean and sigma, we can find the probability of link failure from Equation (11). For designing the packet scheduling in the NoC application, it is very useful to have the probability of link failure as a function of the delay slack (margin) with which the link is designed. If the slack is high, the link will be more tolerant to process variations, but at the cost of reduced performance. Figure 6 shows the probability of link failure as a function of the percentage delay slack for the link considered above. As expected, the failure probability decreases as the slack time increases.

Fig. 5. Comparison of cumulative distribution functions (CDF) of the link delay. (Axes: probability of delay versus delay (ps); curves: numerical Monte Carlo and our method.)

Fig. 6. Probability of link failure as a function of the percentage delay slack.

IV. IMPACT OF INTERCONNECT FAILURES ON THE NOC

In this section, we study the effects of interconnect failure on the ability of the NoC to implement the desired communication. Our assumption is that interconnect failures are identified by post-manufacturing test, and that the routes for the NoC are determined offline.


The benefits of an NoC are most pronounced in settings where there is a large amount of communication. Many DSP subsystems have this property. These systems are typically implemented as an ensemble of processing elements (PEs) operating in parallel, with a very large amount of communication between PEs. In our experiments we will focus on an NoC organized as a mesh, implementing the communication for Low-Density Parity Check (LDPC) decoding, which is computationally very challenging, and requires a great deal of communication [11].
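Before turning to scheduling, it may help to see how a per-link failure probability of the kind used in this section could be computed from Equations (10) to (12). The Python sketch below is only illustrative: the function names are ours, the mean is assumed equal to the nominal delay, and the standard deviation is taken from the 3σ value in Table I rather than from the full analytical model.

```python
import math


def q_borjesson(x, a=1.0 / math.pi, b=2.0 * math.pi):
    """Approximate Gaussian tail probability Q(x), Equation (12)."""
    if x < 0.0:
        # Equation (12) targets x >= 0; use symmetry for negative arguments.
        return 1.0 - q_borjesson(-x, a, b)
    denom = (1.0 - a) * x + a * math.sqrt(x * x + b)
    return math.exp(-0.5 * x * x) / (denom * math.sqrt(2.0 * math.pi))


def link_sigma(S, K):
    """Standard deviation of a sum of correlated terms, sqrt(S K S^T)."""
    n = len(S)
    var = sum(S[i] * K[i][j] * S[j] for i in range(n) for j in range(n))
    return math.sqrt(var)


def failure_probability(d_nom, slack_pct, mu_link, sigma_link):
    """P[D_link > D_nom + D_m], Equation (11), with D_m as a percentage of D_nom."""
    threshold = d_nom * (1.0 + slack_pct / 100.0)
    return q_borjesson((threshold - mu_link) / sigma_link)


# Illustrative numbers (ps): mean assumed equal to the nominal delay.
D_NOM, MU, SIGMA = 146.3, 146.3, 23.37 / 3.0
for slack in (0, 5, 10, 20):
    print(slack, failure_probability(D_NOM, slack, MU, SIGMA))
```

With these assumptions the failure probability starts at one half for zero slack and falls as the slack grows, which is the qualitative trend reported in Figure 6.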

A. Formalization

A Network-on-Chip is an ensemble of links and programmable crosspoints that connect a set of source nodes S to a set of sink nodes T [12]. As previously stated, each link is essentially a multi-bit bus. We represent the NoC as an undirected graph G = (V, E). Sources, sinks and intermediate crosspoints all correspond to vertices, and the interconnects are modeled as edges. We restrict our attention to fabrics with no internal buffering of packets, since such fabrics are considerably cheaper to implement. We will refer to a switch fabric and its graph interchangeably.

The fabric operates on fixed-size packets; segmentation and reassembly are assumed to be performed outside the fabric. A traffic matrix is an |S| × |T| matrix M, where $M_{ij}$ is a non-negative integer encoding the number of packets to be transferred from source i to sink j. Given a fabric and a matrix M, a schedule is a collection of configurations, where each configuration consists of choices for all programmable crosspoints. These choices result in a set of channels that connect a subset of S to a subset of T. Since the fabric does not buffer packets internally, for a configuration to be valid, no two channels can intersect each other. For each configuration, a fixed-duration cycle is allocated to program the fabric and transfer packets. During each cycle, the transfer is implemented by passing exactly one packet through each channel. A schedule Σ is said to complete the matrix M if, by following the procedure above for each configuration in Σ, we can transfer all packets encoded in M from S to T.

B. Scheduling algorithms

Given a traffic matrix M and an NoC represented by G, an optimum schedule is one that uses the least number of cycles to transfer all the packets encoded by M. Optimum scheduling is desirable because it leads to higher throughput. However, it is difficult to compute optimum schedules. Consider the example, adapted from Wu et al. [13], shown in Figures 7 and 8. For this example, G can transfer the 3 packets {a, c, f} in one cycle. After a simple enumeration, it is clear that no two packets from {b, d, e} can be transferred in one cycle, meaning that any schedule selecting {a, c, f} for a cycle will take at least 4 cycles. However, G can transfer the packets in M in 3 cycles, as shown in Figure 9, which demonstrates that greedy scheduling is suboptimum.

Fig. 7. A mesh-structured switch fabric G with vertices labeled 1 to 6. Each vertex can be either a source or a sink, but not both, in a cycle.

$$M = \begin{bmatrix}
1^a & 0 & 0 & 0 & 1^e & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
1^b & 0 & 1^c & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1^d & 0 & 1^f & 0
\end{bmatrix}$$

Fig. 8. Traffic matrix M for the fabric in Figure 7. The superscripts are packet identifiers, e.g., we will refer to the packet from source 1 to sink 2 as a.
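The notion of a valid configuration, and the effect of a defective link on the length of a schedule, can be sketched with a simple greedy scheduler in Python. This is an illustrative heuristic of our own, not the heuristics of [13]; the 2×3 mesh, its vertex numbering (0 to 5) and the packet list are hypothetical and do not reproduce Figures 7 and 8.

```python
from collections import deque


def find_channel(adj, src, dst, used):
    """Breadth-first search for a src-to-dst path avoiding vertices already in use."""
    if src in used or dst in used:
        return None
    parent = {src: None}
    queue = deque([src])
    while queue:
        v = queue.popleft()
        if v == dst:
            path = []
            while v is not None:
                path.append(v)
                v = parent[v]
            return path[::-1]
        for w in adj[v]:
            if w not in parent and w not in used:
                parent[w] = v
                queue.append(w)
    return None


def greedy_schedule(adj, packets):
    """Return a list of cycles (lists of (packet, channel)), or None if infeasible."""
    pending = list(packets)
    schedule = []
    while pending:
        used, cycle, remaining = set(), [], []
        for pkt in pending:
            path = find_channel(adj, pkt[0], pkt[1], used)
            if path is None:
                remaining.append(pkt)       # retry in a later cycle
            else:
                cycle.append((pkt, path))
                used.update(path)           # channels in a cycle may not intersect
        if not cycle:
            return None                     # some packet cannot be routed at all
        schedule.append(cycle)
        pending = remaining
    return schedule


def mesh_2x3(defective=()):
    """Adjacency lists of a 2x3 mesh with the given defective links removed."""
    edges = [(0, 1), (1, 2), (3, 4), (4, 5), (0, 3), (1, 4), (2, 5)]
    adj = {v: [] for v in range(6)}
    for a, b in edges:
        if (a, b) in defective or (b, a) in defective:
            continue
        adj[a].append(b)
        adj[b].append(a)
    return adj


packets = [(0, 2), (3, 5), (1, 4)]          # (source, sink) pairs
healthy = greedy_schedule(mesh_2x3(), packets)
faulty = greedy_schedule(mesh_2x3(defective={(1, 2)}), packets)
print("cycles without defects:", len(healthy))
print("cycles with link 1-2 defective:", len(faulty))
```

In this toy instance, removing a single link forces the first packet onto a longer channel that blocks the others, so the schedule needs more cycles. When a defect disconnects a source from its sink, the function returns `None`, corresponding to an instance for which no schedule exists.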

C. Experiments

In previous work [13], the authors proved that computing optimum schedules is NP-hard and developed heuristics for computing schedules. We apply the same heuristics to compute schedules for mesh-structured NoCs with defective interconnects implementing LDPC decoding. We assume interconnects are defective independently of each other, with a fixed probability p, and report the average number of cycles of the schedule as a function of p.

An LDPC code is a block code with C bits per block, which include D parity checks. It is most naturally represented as a bipartite graph on a set of C code nodes and D check nodes. The decoding algorithm [14] involves iterations of message passing back and forth between connected code and check nodes, and it is this communication that defines the traffic matrix.

Figure 10 shows, as a function of the defect probability p, the average number of cycles in the schedules computed by our heuristic. The NoC is organized as a 23×23 mesh, and the traffic matrix corresponds to the communication pattern for an LDPC code with 48 check nodes and 96 code nodes. For each p, we created 50 different NoCs and attempted to generate a schedule for each of them. The average is taken only over

Fig. 9. Greedily constructed and optimum schedules for G and M as presented in Figures 7 and 8, respectively. (Panels: Greedy Decomp, Optimum Decomp.)


Fig. 10. Number of cycles in schedule vs. link failure probability.

Fig. 12. Normalized total delay vs. link failure probability.

instances where we were able to generate a schedule. As p increases, we see more instances where our heuristic could not produce any schedule because the NoC became disconnected, as shown in Figure 11.

With increasing clock frequency, the design margin is reduced and the probability of link failure increases. Since the total time to implement the communication is the product of the cycle time and the number of cycles, the required total time is minimized at a particular value of the probability of link failure. The total time to implement the communication as a function of link failure probability is shown in Figure 12. The optimum time is achieved at p ≈ 0.02 for the LDPC decoding example.

Fig. 11. Number of infeasible instances vs. link failure probability.

V. CONCLUSIONS

In this paper, we investigated the impact of delay faults caused by dishing and effective channel length variation on NoC interconnect failures. We derived an analytical model approximating the probability of link failure as a function of cycle time. The accuracy of the analytical model was verified by comparison with Monte Carlo simulations. For an example of LDPC decoding, we explored the dependency of the number of cycles required to implement the communication on the link failure probability. We concluded that the probability of link failure increases with the clock frequency since the design margin is reduced, which in turn increases the number of cycles needed to implement the communication. We also showed that there exists a particular value of the link failure probability that minimizes the total time to implement the desired communication.

REFERENCES

[1] P. Gupta and F. Heng, “Towards a systematic-variation aware timing methodology,” Proc. DAC, June 2004.
[2] T. Tugbawa, T. Park, D. Boning, L. Camilletti, M. Brongo and P. Lefevre, “A Mathematical Model of Pattern Dependencies in Cu CMP processes,” Proc. CMP Symposium, Electrochemical Society Meeting, October 1999.
[3] SIA, International Technology Roadmap for Semiconductors, 2005.
[4] W. Dally and B. Towles, “Route Packets Not Wires: On chip Interconnection Networks,” in Design Automation Conference, June 2001.
[5] L. He, A. B. Kahng, K. H. Tam and J. Xiong, “Design of integrated-circuit interconnects with accurate modeling of chemical-mechanical planarization,” Proc. SPIE Microlithography, March 2005.
[6] R. Gupta, B. Krauter, B. Tutuianu, J. Willis and L. T. Pileggi, “The Elmore delay as bound for RC trees with generalized input signals,” Proc. DAC, June 1995.
[7] L. He, A. B. Kahng, K. H. Tam and J. Xiong, “Simultaneous Buffer Insertion and Wire Sizing Considering Systematic CMP Variation and Random Leff Variation,” Proc. ISPD, April 2005.
[8] R. Ware and F. Lad, “Approximating the distribution for Sums of Products of Normal variables,” University of Canterbury, New Zealand.
[9] P. Borjesson and C. Sundberg, “Simple Approximation of the Error Function Q(x) for Communications Applications,” IEEE Transactions on Communication, Mar. 1979.
[10] S. Nassif, “Modeling and analysis of manufacturing variations,” Proceedings of the Custom Integrated Circuits Conference, May 2001.
[11] A. J. Blanksby and C. J. Howland, “A 690-mW 1-Gb/s 1024-b, Rate-1/2 Low-density Parity-check Decoder,” IEEE Journal of Solid-State Circuits, Mar. 2002.
[12] J. Turner and N. Yamanaka, “Architectural Choices in Large Scale ATM Switches,” IEICE Transactions, 1998.
[13] X. Wu, M. Mohiyuddin, A. Prakash and A. Aziz, “Scheduling Traffic Matrices On General Switch Fabrics,” IEEE Hot Interconnects, Aug. 2006.
[14] R. G. Gallager, “Low-density parity-check codes,” MIT, Cambridge, MA, 1962.
