End-to-End Error Correction and Online Diagnosis for On-Chip Networks

CDSC Annual Review, May 31st-Jun 1st, 2011

Saeed Shamshiri, Amirali Ghofrani, Kwang-Ting (Tim) Cheng

UC Santa Barbara, Department of ECE

Error Correction Codes

Online Diagnosis

Switch-to-Switch (S2S)

Experimental Results

• Interleaved SEC Hamming(21,16).
• 4-degree interleaving provides 4-bit burst correction.

End-to-End (E2E)

• Interleaved error-locality-aware 2G4L(26,16).
• 4-degree interleaving provides 16-bit burst correction.
• E2E approach is four times cheaper than S2S.
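Both burst-correction figures follow from bit interleaving. The sketch below assumes a simple layout that the poster does not spell out: with degree-4 interleaving, wire w of the link carries bit w // 4 of codeword w mod 4. Under that assumption, a burst on adjacent wires is split evenly across the four codewords. A 4-wire burst therefore costs each Hamming(21,16) word one bit, and a 16-wire burst costs each 2G4L(26,16) word four adjacent bits. `burst_spread` is a hypothetical helper and is not part of the original design.

```python
def burst_spread(start, length, degree, codeword_len):
    """Map a burst on wires [start, start + length) of a degree-way
    bit-interleaved link to the bit positions it hits in each codeword.

    Assumed (hypothetical) layout: wire w carries bit w // degree of
    codeword w % degree.
    """
    num_wires = degree * codeword_len
    if start < 0 or start + length > num_wires:
        raise ValueError("burst falls outside the link")
    hits = {c: [] for c in range(degree)}
    for w in range(start, start + length):
        hits[w % degree].append(w // degree)
    return hits


# S2S: four interleaved Hamming(21,16) words on 84 wires; a 4-wire burst
# touches each codeword exactly once, which a single-error-correcting code fixes.
print(burst_spread(start=10, length=4, degree=4, codeword_len=21))

# E2E: four interleaved 2G4L(26,16) words on 104 wires; a 16-wire burst
# leaves exactly four adjacent erroneous bits in each codeword.
print(burst_spread(start=37, length=16, degree=4, codeword_len=26))
```

Any run of 16 consecutive wires contains exactly four wires of each residue mod 4, and their bit indices w // 4 are consecutive. This is why the burst length scales with the interleaving degree.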

[Figure: Expected value of each link in the mesh under XY routing. a. Right-going links.]

[Figure: Expected value of each link in the mesh under minimal routing.]

[Figure: E2E defect observation escape rate. Recovered labels: Number of Packets; Number of Corrections; XY-Route, Hybrid-Route, Min-Route.]

Parity check matrix of 2G4L(26,16)

Columns: bit position; encoded data bit; parity bit coverage (p1 to p10); decimal value of the coverage, with p1 as the least significant bit.

Pos  Bit   p1 p2 p3 p4 p5 p6 p7 p8 p9 p10  Decimal
  1  p1    1  0  0  0  0  0  0  0  0  0          1
  2  p2    0  1  0  0  0  0  0  0  0  0          2
  3  p3    0  0  1  0  0  0  0  0  0  0          4
  4  p4    0  0  0  1  0  0  0  0  0  0          8
  5  p5    0  0  0  0  1  0  0  0  0  0         16
  6  p6    0  0  0  0  0  1  0  0  0  0         32
  7  p7    0  0  0  0  0  0  1  0  0  0         64
  8  p8    0  0  0  0  0  0  0  1  0  0        128
  9  d1    1  1  0  0  1  1  0  0  0  0         51
 10  d2    1  0  1  0  1  0  1  0  0  0         85
 11  d3    1  1  1  1  1  1  1  0  0  0        127
 12  d4    1  0  1  1  0  1  0  1  0  0        173
 13  p9    0  0  0  0  0  0  0  0  1  0        256
 14  d5    1  1  0  1  1  0  0  0  1  0        283
 15  d6    1  0  1  0  1  1  0  0  1  0        309
 16  d7    1  1  1  0  0  0  1  0  1  0        327
 17  d8    1  0  1  1  1  0  0  1  1  0        413
 18  d9    0  0  1  0  0  1  0  1  1  0        420
 19  p10   0  0  0  0  0  0  0  0  0  1        512
 20  d10   1  1  0  0  0  0  1  0  0  1        579
 21  d11   1  0  1  1  1  0  0  0  0  1        541
 22  d12   1  1  1  0  0  1  0  1  0  1        679
 23  d13   0  1  1  1  0  0  0  1  1  1        910
 24  d14   1  1  0  1  0  1  1  1  1  1       1003
 25  d15   1  0  0  1  0  0  1  0  1  1        841
 26  d16   0  0  0  1  0  1  1  1  0  1        744
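The decimal column makes the matrix easy to use in software. Below is a minimal syndrome-decoding sketch. It assumes bit positions are numbered as in the table and that the lowest-weight error pattern wins when syndromes collide.

- `encode` sets the check bits so that the XOR of the selected columns is zero.
- `decode` looks the syndrome up in a table of single- and double-bit error patterns.

The sketch covers only the random (global) part of the code; it does not model the 4-bit local patterns or the poster's hardware decoder. All names are hypothetical.

```python
from itertools import combinations

# Columns of the 2G4L(26,16) parity-check matrix, copied from the "Decimal"
# column above (bit k of each value is the coverage by check bit p(k+1)).
H = [1, 2, 4, 8, 16, 32, 64, 128, 51, 85, 127, 173, 256, 283, 309, 327,
     413, 420, 512, 579, 541, 679, 910, 1003, 841, 744]
N, K, R = 26, 16, 10

# Check bit p(k+1) sits where the column value 1 << k appears; the rest carry data.
PARITY_POS = [H.index(1 << k) for k in range(R)]
DATA_POS = [j for j in range(N) if j not in PARITY_POS]


def syndrome(word):
    """XOR together the H columns of every position that holds a 1."""
    s = 0
    for j, bit in enumerate(word):
        if bit:
            s ^= H[j]
    return s


def encode(data):
    """Systematic encoding: place the 16 data bits, then choose the check
    bits so that the syndrome of the full codeword is zero."""
    if len(data) != K:
        raise ValueError("expected 16 data bits")
    word = [0] * N
    for j, bit in zip(DATA_POS, data):
        word[j] = bit
    s = syndrome(word)                 # contribution of the data bits only
    for k, j in enumerate(PARITY_POS):
        word[j] = (s >> k) & 1         # p(k+1) cancels bit k of the syndrome
    return word


def build_table(max_weight=2):
    """Syndrome -> error positions, keeping the lowest-weight pattern on a clash."""
    table = {}
    for w in range(1, max_weight + 1):
        for positions in combinations(range(N), w):
            s = 0
            for j in positions:
                s ^= H[j]
            table.setdefault(s, positions)
    return table


TABLE = build_table()


def decode(word):
    """Return (corrected word, ok); ok is False for a detected but
    uncorrectable syndrome."""
    s = syndrome(word)
    if s == 0:
        return list(word), True
    positions = TABLE.get(s)
    if positions is None:
        return list(word), False
    fixed = list(word)
    for j in positions:
        fixed[j] ^= 1
    return fixed, True


codeword = encode([1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1])
received = list(codeword)
received[20] ^= 1                      # flip one bit in transit
corrected, ok = decode(received)
print(ok, corrected == codeword)       # True True
```

Single-bit errors are always corrected, because all 26 columns are distinct and nonzero. Double errors are corrected only if their syndromes differ from each other and from the single-error syndromes, which this sketch does not check.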

[Figure: Example of the expected value and suspicion value of each link on the routes from a source S to a destination D.]
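The expected-value and suspicion-value example can be approximated by counting routes. The sketch below assumes minimal routing in which every minimal route between source and destination is equally likely. The usage probability of a link is then the fraction of those routes that cross it. This yields fractions such as 1/2, 1/3 and 1/6, consistent with the 0.5, 0.33 and 0.17 values that appear in the example. Under XY routing, the links of the single route simply get probability 1.

The accumulation rule in `accumulate_suspicion` is a hypothetical reading of "suspicion value", not the authors' confirmed diagnosis algorithm: it adds a link's usage probability whenever the packet's E2E decoder reports an error. Both function names are hypothetical.

```python
from collections import defaultdict
from math import comb


def link_usage(dx, dy):
    """Fraction of minimal routes from (0, 0) to (dx, dy), with dx, dy >= 0,
    that cross each link, assuming all minimal routes are equally likely."""
    total = comb(dx + dy, dx)
    usage = {}
    for x in range(dx + 1):
        for y in range(dy + 1):
            into = comb(x + y, x)  # minimal routes from the source to (x, y)
            if x < dx:  # right-going link (x, y) -> (x + 1, y)
                out = comb((dx - x - 1) + (dy - y), dy - y)
                usage[((x, y), (x + 1, y))] = into * out / total
            if y < dy:  # up-going link (x, y) -> (x, y + 1)
                out = comb((dx - x) + (dy - y - 1), dx - x)
                usage[((x, y), (x, y + 1))] = into * out / total
    return usage


def accumulate_suspicion(reports):
    """Hypothetical rule: for every packet whose E2E decoder reported an
    error, add each link's usage probability to that link's suspicion.

    reports: iterable of (src, dst, error_seen) with src/dst as (x, y) tiles
    and dst up and/or to the right of src (other directions are mirror cases).
    """
    suspicion = defaultdict(float)
    for (sx, sy), (tx, ty), error_seen in reports:
        if tx < sx or ty < sy:
            raise ValueError("sketch only handles right/up-going routes")
        if not error_seen:
            continue
        for ((ax, ay), (bx, by)), p in link_usage(tx - sx, ty - sy).items():
            suspicion[((ax + sx, ay + sy), (bx + sx, by + sy))] += p
    return dict(suspicion)


# Usage probabilities for a destination two hops right and two hops up.
for link, p in sorted(link_usage(2, 2).items()):
    print(link, round(p, 2))
```

A link shared by many error-reporting packets accumulates the highest suspicion, even though no single packet identifies it on its own.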

[Figure: Decoder of 2G4L(26,16).]

Synthesis Results

                             BCH(26,16)   2G4L(26,16)
Area (µm²)      Encoder           2872          2744
                Decoder          21043         22173
Power (mW)      Encoder         2.5107        2.2365
                Decoder         13.758       14.0756
Latency (ns)    Encoder            0.9          0.78
                Decoder            3.4          3.35
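XOR gate counts give one way to see why the two encoders cost about the same. In a systematic code, each check bit is the XOR of the data bits whose columns have a 1 in that row. A naive encoder therefore needs (row weight minus 1) two-input XOR gates per check bit. The sketch below applies that count to the data columns of the BCH(26,16) and 2G4L(26,16) parity-check matrices on this poster. It ignores gate sharing and other synthesis optimizations, so it is only a rough proxy for area and power. `xor2_count` is a hypothetical helper.

```python
# Data-bit columns (decimal coverage values) of the two parity-check matrices.
BCH_DATA = [873, 443, 886, 389, 778, 381, 762, 669,
            595, 975, 247, 494, 988, 209, 418, 836]
CODE_2G4L_DATA = [51, 85, 127, 173, 283, 309, 327, 413,
                  420, 579, 541, 679, 910, 1003, 841, 744]


def xor2_count(data_columns, num_checks=10):
    """Two-input XOR gates for an encoder that builds every check bit as its
    own XOR tree over the data bits it covers (no sharing between check bits)."""
    total = 0
    for k in range(num_checks):
        fan_in = sum((col >> k) & 1 for col in data_columns)  # row weight of p(k+1)
        total += max(fan_in - 1, 0)
    return total


print("BCH(26,16) encoder:", xor2_count(BCH_DATA), "XOR2 gates")
print("2G4L(26,16) encoder:", xor2_count(CODE_2G4L_DATA), "XOR2 gates")
```

With these columns, the count is 84 gates for BCH(26,16) and 74 for 2G4L(26,16). The sparser 2G4L columns make its encoder slightly lighter.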

[Figure: Usage probability and number of routes.]

[Figure: Accuracy of diagnosis. Recovered labels: Number of Syndromes; 0% Noise, 20% Noise, 50% Noise.]

[Figure: Expected value of each link. b. Up-going links.]

[Figure: Encoder of 2G4L(26,16).]

Parity check matrix of BCH(26,16)

Columns: bit position; encoded data bit; parity bit coverage (p1 to p10); decimal value of the coverage, with p1 as the least significant bit.

Pos  Bit   p1 p2 p3 p4 p5 p6 p7 p8 p9 p10  Decimal
  1  p1    1  0  0  0  0  0  0  0  0  0          1
  2  p2    0  1  0  0  0  0  0  0  0  0          2
  3  p3    0  0  1  0  0  0  0  0  0  0          4
  4  p4    0  0  0  1  0  0  0  0  0  0          8
  5  p5    0  0  0  0  1  0  0  0  0  0         16
  6  p6    0  0  0  0  0  1  0  0  0  0         32
  7  p7    0  0  0  0  0  0  1  0  0  0         64
  8  p8    0  0  0  0  0  0  0  1  0  0        128
  9  p9    0  0  0  0  0  0  0  0  1  0        256
 10  p10   0  0  0  0  0  0  0  0  0  1        512
 11  d1    1  0  0  1  0  1  1  0  1  1        873
 12  d2    1  1  0  1  1  1  0  1  1  0        443
 13  d3    0  1  1  0  1  1  1  0  1  1        886
 14  d4    1  0  1  0  0  0  0  1  1  0        389
 15  d5    0  1  0  1  0  0  0  0  1  1        778
 16  d6    1  0  1  1  1  1  1  0  1  0        381
 17  d7    0  1  0  1  1  1  1  1  0  1        762
 18  d8    1  0  1  1  1  0  0  1  0  1        669
 19  d9    1  1  0  0  1  0  1  0  0  1        595
 20  d10   1  1  1  1  0  0  1  1  1  1        975
 21  d11   1  1  1  0  1  1  1  1  0  0        247
 22  d12   0  1  1  1  0  1  1  1  1  0        494
 23  d13   0  0  1  1  1  0  1  1  1  1        988
 24  d14   1  0  0  0  1  0  1  1  0  0        209
 25  d15   0  1  0  0  0  1  0  1  1  0        418
 26  d16   0  0  1  0  0  0  1  0  1  1        836

Error-locality-aware Codes

• Designing codes for burst (local) and random (global) errors.
• BurstCode(26,20):
    - 25% higher code rate than BCH(26,16)
    - Only reliable against adjacent errors
• 2G4L(26,16):
    - Same cost as BCH(26,16)
    - More reliable
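The 25% code-rate figure is the ratio of data bits per 26-bit codeword, with code rate R = k/n:

```latex
R_{\mathrm{BCH}(26,16)} = \frac{16}{26} \approx 0.615, \qquad
R_{\mathrm{BurstCode}(26,20)} = \frac{20}{26} \approx 0.769, \qquad
\frac{R_{\mathrm{BurstCode}(26,20)}}{R_{\mathrm{BCH}(26,16)}} = \frac{20}{16} = 1.25
```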


Conclusion

• A comprehensive end-to-end solution for error correction, data collection, and defect diagnosis and replacement for on-chip networks has been proposed.
• Four interleaved 2G4L(26,16) codes provide correction of two random errors and up to 16 adjacent-bit errors per flit.
• E2E error-pattern information is gathered by centralized software on the host processor and used to diagnose defective wires.
• Even under heavy noise, a high escape rate, uncertainty about routing, and other harmful effects, the collected data remain accurate enough for diagnosis.
• The collected data can also serve other purposes, such as diagnosing defective routers, locating intermittent faults, and other system-level observations.
