End-to-End Error Correction and Online Diagnosis for On-Chip Networks

CDSC Annual Review, May 31st-Jun 1st, 2011

Saeed Shamshiri, Amirali Ghofrani, Kwang-Ting (Tim) Cheng

UC Santa Barbara Department of ECE

[Figure panels not reproduced: Online Diagnosis; Experimental Results. Surviving labels: XY Routing, Expected Values.]

Error Correction Codes

Switch-to-Switch (S2S)

- Interleaved SEC Hamming(21,16).
- 4-degree interleaving provides 4-bit burst correction.
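The burst-correction claim can be illustrated with a minimal sketch. It assumes bit-level interleaving in which the wires carry bit 0 of every codeword, then bit 1 of every codeword, and so on; the poster does not spell out the bit ordering, and the function names below are hypothetical.

```python
def interleave(codewords):
    """Emit bit 0 of every codeword, then bit 1 of every codeword, and so on."""
    length = len(codewords[0])
    return [cw[i] for i in range(length) for cw in codewords]


def deinterleave(wire_bits, degree):
    """Recover the individual codewords from the interleaved bit sequence."""
    return [wire_bits[k::degree] for k in range(degree)]


def errors_per_codeword(n, degree, burst_start, burst_len):
    """Flip burst_len adjacent wire bits and count the errors in each codeword.

    All-zero codewords are used, so every 1 seen after deinterleaving is an error.
    """
    wire = interleave([[0] * n for _ in range(degree)])
    for i in range(burst_start, burst_start + burst_len):
        wire[i] ^= 1
    return [sum(cw) for cw in deinterleave(wire, degree)]


# Four interleaved 21-bit codewords, burst of 4 adjacent wire errors:
print(errors_per_codeword(21, 4, 10, 4))   # [1, 1, 1, 1]
# Four interleaved 26-bit codewords, burst of 16 adjacent wire errors:
print(errors_per_codeword(26, 4, 7, 16))   # [4, 4, 4, 4]
```

With a single-error-correcting code per codeword, one error per codeword is correctable, which gives the 4-bit burst figure. A code that corrects 4 adjacent bits per codeword would cover a 16-bit burst under the same layout.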


End-to-End (E2E)


- Interleaved error-locality-aware 2G4L(26,16).
- 4-degree interleaving provides 16-bit burst correction.
- E2E approach is four times cheaper than S2S.

[Figures not reproduced. Surviving labels: Expected Value; a. Right-going links; Min Routing; XY-Route, Hybrid-Route, Min-Route; Number of Corrections; Number of Packets; E2E Defect Observation Escape Rate; S3 to S8; B(l), B(2,l) to B(24,l).]

Parity check matrix of 2G4L(26,16)
(one row per bit position; p1 to p10 give the parity bit coverage; Decimal reads the coverage as a binary number with p1 as the least significant bit)

Bit  Name  p1 p2 p3 p4 p5 p6 p7 p8 p9 p10  Decimal
  1  p1     1  0  0  0  0  0  0  0  0   0        1
  2  p2     0  1  0  0  0  0  0  0  0   0        2
  3  p3     0  0  1  0  0  0  0  0  0   0        4
  4  p4     0  0  0  1  0  0  0  0  0   0        8
  5  p5     0  0  0  0  1  0  0  0  0   0       16
  6  p6     0  0  0  0  0  1  0  0  0   0       32
  7  p7     0  0  0  0  0  0  1  0  0   0       64
  8  p8     0  0  0  0  0  0  0  1  0   0      128
  9  d1     1  1  0  0  1  1  0  0  0   0       51
 10  d2     1  0  1  0  1  0  1  0  0   0       85
 11  d3     1  1  1  1  1  1  1  0  0   0      127
 12  d4     1  0  1  1  0  1  0  1  0   0      173
 13  p9     0  0  0  0  0  0  0  0  1   0      256
 14  d5     1  1  0  1  1  0  0  0  1   0      283
 15  d6     1  0  1  0  1  1  0  0  1   0      309
 16  d7     1  1  1  0  0  0  1  0  1   0      327
 17  d8     1  0  1  1  1  0  0  1  1   0      413
 18  d9     0  0  1  0  0  1  0  1  1   0      420
 19  p10    0  0  0  0  0  0  0  0  0   1      512
 20  d10    1  1  0  0  0  0  1  0  0   1      579
 21  d11    1  0  1  1  1  0  0  0  0   1      541
 22  d12    1  1  1  0  0  1  0  1  0   1      679
 23  d13    0  1  1  1  0  0  0  1  1   1      910
 24  d14    1  1  0  1  0  1  1  1  1   1     1003
 25  d15    1  0  0  1  0  0  1  0  1   1      841
 26  d16    0  0  0  1  0  1  1  1  0   1      744
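As an illustration of how such a matrix is used, the sketch below encodes 16 data bits and computes syndromes directly from the Decimal values of the 2G4L(26,16) matrix. It is a minimal sketch, not the authors' decoder: it only locates single-bit errors by matching the syndrome against one column, whereas a full 2G4L decoder would also need entries for double and adjacent-bit error patterns. The function names are hypothetical.

```python
# Decimal value of each matrix column, bit positions 1..26 (index 0 = position 1).
# p1 is the least significant bit of each value.
H_COLUMNS = [1, 2, 4, 8, 16, 32, 64, 128, 51, 85, 127, 173, 256,
             283, 309, 327, 413, 420, 512, 579, 541, 679, 910, 1003, 841, 744]

# Columns that are a power of two cover exactly one parity row: the parity bits.
PARITY_POS = [i for i, c in enumerate(H_COLUMNS) if (c & (c - 1)) == 0]
DATA_POS = [i for i in range(len(H_COLUMNS)) if i not in PARITY_POS]


def syndrome(word):
    """XOR together the columns of all bit positions that hold a 1."""
    s = 0
    for bit, col in zip(word, H_COLUMNS):
        if bit:
            s ^= col
    return s


def encode(data_bits):
    """Place 16 data bits, then set each parity bit so the syndrome becomes 0."""
    if len(data_bits) != len(DATA_POS):
        raise ValueError("expected 16 data bits")
    word = [0] * len(H_COLUMNS)
    for pos, bit in zip(DATA_POS, data_bits):
        word[pos] = bit
    s = syndrome(word)  # contribution of the data bits only
    for pos in PARITY_POS:
        word[pos] = 1 if s & H_COLUMNS[pos] else 0
    return word


def correct_single(word):
    """Flip the bit whose column equals the syndrome (single-bit errors only)."""
    s = syndrome(word)
    fixed = list(word)
    if s != 0 and s in H_COLUMNS:
        fixed[H_COLUMNS.index(s)] ^= 1
    return fixed
```

A received word with syndrome 0 is accepted as is; a nonzero syndrome equal to a column value points at the flipped bit position.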

[Figure not reproduced: Decoder of 2G4L(26,16).]

[Figures not reproduced: mesh diagrams with nodes labeled S, D, A, and B. Surviving labels: Expected Value, Suspicion Value.]

Synthesis Results: BCH(26,16) and 2G4L(26,16)
(each cell holds two values, one per code; the extracted text does not preserve which value belongs to which code)

Metric        Encoder          Decoder
Area (um^2)   2872, 2744       21043, 22173
Power (mW)    2.5107, 2.2365   13.758, 14.0756
Latency (ns)  0.9, 0.78        3.4, 3.35

[Plot not reproduced. Surviving label: Usage Probability.]

Implementations

Parity check matrix of BCH(26,16)
(one row per bit position; p1 to p10 give the parity bit coverage; Decimal reads the coverage as a binary number with p1 as the least significant bit)

Bit  Name  p1 p2 p3 p4 p5 p6 p7 p8 p9 p10  Decimal
  1  p1     1  0  0  0  0  0  0  0  0   0        1
  2  p2     0  1  0  0  0  0  0  0  0   0        2
  3  p3     0  0  1  0  0  0  0  0  0   0        4
  4  p4     0  0  0  1  0  0  0  0  0   0        8
  5  p5     0  0  0  0  1  0  0  0  0   0       16
  6  p6     0  0  0  0  0  1  0  0  0   0       32
  7  p7     0  0  0  0  0  0  1  0  0   0       64
  8  p8     0  0  0  0  0  0  0  1  0   0      128
  9  p9     0  0  0  0  0  0  0  0  1   0      256
 10  p10    0  0  0  0  0  0  0  0  0   1      512
 11  d1     1  0  0  1  0  1  1  0  1   1      873
 12  d2     1  1  0  1  1  1  0  1  1   0      443
 13  d3     0  1  1  0  1  1  1  0  1   1      886
 14  d4     1  0  1  0  0  0  0  1  1   0      389
 15  d5     0  1  0  1  0  0  0  0  1   1      778
 16  d6     1  0  1  1  1  1  1  0  1   0      381
 17  d7     0  1  0  1  1  1  1  1  0   1      762
 18  d8     1  0  1  1  1  0  0  1  0   1      669
 19  d9     1  1  0  0  1  0  1  0  0   1      595
 20  d10    1  1  1  1  0  0  1  1  1   1      975
 21  d11    1  1  1  0  1  1  1  1  0   0      247
 22  d12    0  1  1  1  0  1  1  1  1   0      494
 23  d13    0  0  1  1  1  0  1  1  1   1      988
 24  d14    1  0  0  0  1  0  1  1  0   0      209
 25  d15    0  1  0  0  0  1  0  1  1   0      418
 26  d16    0  0  1  0  0  0  1  0  1   1      836

[Figure not reproduced: Encoder of 2G4L(26,16).]

[Plots not reproduced. Surviving labels: Number of Routes; Number of Syndromes; Accuracy; Accuracy of Diagnosis; 0% Noise, 20% Noise, 50% Noise; Mesh; b. Up-going links.]

Error-locality-aware Codes

- Designing codes for burst (local) and random (global) errors
- BurstCode(26,20):
  - 25% higher code rate than BCH(26,16)
  - Only reliable against adjacent errors
- 2G4L(26,16):
  - Same cost as BCH(26,16)
  - More reliable
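The 25% figure follows from the code parameters alone. As a quick check, taking code rate to be data bits divided by total bits (the usual definition, assumed here):

```latex
\[
R_{\mathrm{BurstCode}(26,20)} = \frac{20}{26} \approx 0.769, \qquad
R_{\mathrm{BCH}(26,16)} = \frac{16}{26} \approx 0.615, \qquad
\frac{R_{\mathrm{BurstCode}(26,20)}}{R_{\mathrm{BCH}(26,16)}} = \frac{20}{16} = 1.25 .
\]
```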


Conclusion

- A comprehensive end-to-end solution for error correction, data collection, and defect diagnosis and replacement for on-chip networks has been proposed.
- Four interleaved 2G4L(26,16) codes provide correction of two random errors and up to 16 adjacent-bit errors per flit.
- E2E error pattern information is gathered by centralized software on the host processor and used to diagnose defective wires.
- Under heavy noise, a high escape rate, uncertainty about routing, and many other harmful effects, the collected data are still accurate enough for diagnosis.
- The collected data can also be used for other purposes, such as diagnosing defective routers, locating intermittent faults, and making many other system observations.
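The centralized data gathering described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' algorithm: it assumes deterministic XY routing on a mesh, and it assumes that each packet needing an E2E correction spreads one unit of suspicion equally over the links on its route. The function names and the equal-split weighting are hypothetical.

```python
from collections import defaultdict


def xy_route(src, dst):
    """Links visited by XY routing on a mesh: travel along x first, then along y."""
    x, y = src
    dst_x, dst_y = dst
    links = []
    while x != dst_x:
        next_x = x + (1 if dst_x > x else -1)
        links.append(((x, y), (next_x, y)))
        x = next_x
    while y != dst_y:
        next_y = y + (1 if dst_y > y else -1)
        links.append(((x, y), (x, next_y)))
        y = next_y
    return links


def suspicion(corrected_packets):
    """Accumulate a per-link score from (src, dst) pairs of corrected packets.

    Assumption: a correction adds 1/len(route) to every link on the route.
    """
    score = defaultdict(float)
    for src, dst in corrected_packets:
        route = xy_route(src, dst)
        for link in route:
            score[link] += 1.0 / len(route)
    return score


# Three corrected packets whose routes all cross the link (1,0) -> (2,0).
reports = [((0, 0), (2, 0)), ((1, 0), (2, 1)), ((1, 0), (3, 0))]
scores = suspicion(reports)
print(max(scores, key=scores.get))  # ((1, 0), (2, 0))
```

The link shared by every reported route accumulates the highest score, which is the intuition behind locating a defective wire from end-to-end observations only.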
