Offline Compression for On-Chip RAM
Nathan Cooprider and John Regehr
University of Utah

Microcontrollers (MCUs)
● 10 billion units / year
● $12.5 billion market in 2006
● Products with embedded computing are usually…
  − Low cost, high volume
  − Very sensitive to unit cost
● Bottom line: Pressure to use cheap MCUs

On-Chip RAM is Small
● Kilobytes, not megabytes or gigabytes
● Atmel AVR (8-bit RISC) examples
  − mega48 – 0.5 KB RAM – $1.50
  − mega128 – 4 KB RAM – $8.75
  − mega256 – 8 KB RAM – $10.66
● SRAM can dominate power consumption of a sleeping chip

Out of RAM – What Next?
● Remove application features?
● Buy MCUs with more RAM?
● Manually reduce RAM usage?

A Closer Look
● Is RAM used efficiently?
  − Performed value profiling for embedded apps
    ● Apps already heavily tuned for RAM usage
  − Result: Average byte stores four values!
● MCUs are Harvard architecture
  − Data in SRAM – 6 transistors / bit
  − Code in Flash “ROM” – 1 transistor / bit
  − 4-32x more ROM than RAM

RAM Compression
● Automated sub-word packing for statically allocated scalars, pointers, structs, arrays
  − No heap on most MCUs
● Driven by whole-program dataflow analysis
  − Sound for interrupt-driven concurrency
● Compression level can be tuned

Method
$x$ ≝ variable that occupies $n$ bits
$V_x$ ≝ conservative estimate of value set
$\log_2 |V_x| < n$ ⇒ RAM compression possible
$C_x$ ≝ another set such that $|C_x| = |V_x|$
$f_x$ ≝ bijection from $V_x$ to $C_x$
$n - \log_2 |C_x|$ ⇒ bits saved through compression of $x$

Example Compression
void (*function_queue[8])(void);
$x$ ≝ each element of function_queue
$n$ = size of a function pointer = 16 bits

Example Compression
$V_x$ = { &function_A, &function_B, &function_C, NULL }
$n$ = 16 bits, $|V_x| = 4$
$\log_2 |V_x| < n$: 2 < 16



Example Compression
$C_x$ = { 0, 1, 2, 3 }
$f_x$ : $V_x \to C_x$ ≝ compression
$f_x^{-1}$ : $C_x \to V_x$ ≝ decompression

Example Compression
ROM: $V_x$ = { &function_A, &function_B, &function_C, NULL }
$C_x$ = { 0, 1, 2, 3 }
$f_x$ ≝ compression table scan
$f_x^{-1}$ ≝ decompression table lookup
128 bits reduced to 16 bits
112 bits of RAM saved
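
A hand-written sketch of what the transformed function_queue might look like follows. It is not CComp output. The handler bodies, the names `value_table`, `packed_queue`, `queue_read`, and `queue_write`, and the choice to pack four 2-bit codes per byte are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*handler_t)(void);

/* Hypothetical handlers standing in for function_A..function_C. */
static int calls[3];
static void function_A(void) { calls[0]++; }
static void function_B(void) { calls[1]++; }
static void function_C(void) { calls[2]++; }

/* The value set as a constant table; on an AVR this would be placed
 * in flash ("ROM") rather than SRAM. */
static const handler_t value_table[4] = {
    function_A, function_B, function_C, NULL
};

/* Compressed storage: 8 slots x 2 bits = 16 bits of RAM.
 * The rewritten initializer sets every slot to code 3 (NULL). */
static uint8_t packed_queue[2] = { 0xFF, 0xFF };

/* Compression: scan the table for the index of the value. */
static uint8_t compress(handler_t h)
{
    for (uint8_t i = 0; i < 4; i++)
        if (value_table[i] == h)
            return i;
    assert(!"value outside the analyzed value set");
    return 0;
}

/* Decompression: a single table lookup. */
static handler_t decompress(uint8_t code)
{
    return value_table[code & 3u];
}

/* Rewritten write: function_queue[slot] = h */
static void queue_write(unsigned slot, handler_t h)
{
    assert(slot < 8);
    unsigned shift = (slot % 4u) * 2u;
    uint8_t *byte = &packed_queue[slot / 4u];
    *byte = (uint8_t)((*byte & ~(3u << shift)) |
                      ((unsigned)compress(h) << shift));
}

/* Rewritten read: function_queue[slot] */
static handler_t queue_read(unsigned slot)
{
    assert(slot < 8);
    unsigned shift = (slot % 4u) * 2u;
    return decompress((uint8_t)(packed_queue[slot / 4u] >> shift));
}
```

Every read now costs a shift, a mask, and a table lookup, and every write costs a table scan. That is the ROM and CPU price paid for the RAM saved.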

Implementation
● Source-to-source transformation for C
  − Rewrite declaration, initializer, reads, writes
● What about value sets of size 1?
  − Constant propagation (then dead code elimination and dead data elimination)

Optimizations
  − Avoid table-driven compression funcs when possible
  − Align hot compressed values on word boundaries
  − Merge redundant compression tables
  − Compile-time compression when storing constants
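
One case where a table seems avoidable is a value set that forms a contiguous range, so that compression is a subtraction and decompression an addition. The slides do not say which cases CComp recognizes, so the sketch below is an assumption. The range 100..103 and all names are hypothetical. The macro shows the related idea of compressing a constant at compile time.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical value set: the contiguous range 100..103 (2 bits). */
enum { RANGE_BASE = 100, RANGE_COUNT = 4 };

/* Compression without a table: subtract the base of the range. */
static uint8_t compress_range(uint8_t v)
{
    assert(v >= RANGE_BASE && v < RANGE_BASE + RANGE_COUNT);
    return (uint8_t)(v - RANGE_BASE);
}

/* Decompression without a table: add the base back. */
static uint8_t decompress_range(uint8_t code)
{
    assert(code < RANGE_COUNT);
    return (uint8_t)(code + RANGE_BASE);
}

/* When the stored value is a literal, the code is a constant
 * expression, so no compression work is left for run time. */
#define COMPRESS_CONST(v) ((uint8_t)((v) - RANGE_BASE))
```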

RAM Compression Results
[Figure: results chart; ⇒ simulator unavailable]
Constant Prop / DCE
  − 10% RAM reduction
  − 20% ROM reduction
  − 5.9% duty cycle reduction
Compression
  − 22% RAM reduction
  − 3.6% ROM reduction
  − 29% duty cycle increase

Tuning RAM Compression
● Can elect to not compress some compressible variables
  − But which ones?
● For each compressible variable compute a cost / benefit ratio
  − Cost – estimated penalty in ROM or CPU cycles
  − Benefit – RAM savings
● Sort compressible variables by ratio and compress until some threshold is met
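
The sort-then-select step can be sketched as a greedy loop. This is a minimal illustration, not CComp's implementation. The `struct candidate` layout is hypothetical, and the threshold is assumed to be a target number of RAM bits to save, although the slides leave the form of the threshold open.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-variable record; field names are illustrative. */
struct candidate {
    const char *name;
    double cost;          /* estimated penalty in ROM or CPU cycles */
    unsigned bits_saved;  /* benefit: RAM bits saved, must be > 0 */
    int selected;         /* set by select_candidates() */
};

/* Order candidates by ascending cost/benefit ratio. */
static int by_ratio(const void *a, const void *b)
{
    const struct candidate *x = a, *y = b;
    double rx = x->cost / x->bits_saved;
    double ry = y->cost / y->bits_saved;
    return (rx > ry) - (rx < ry);
}

/* Sort by ratio, then mark variables for compression until at least
 * target_bits of RAM have been saved. Returns the bits actually saved. */
static unsigned select_candidates(struct candidate *c, size_t n,
                                  unsigned target_bits)
{
    unsigned saved = 0;
    qsort(c, n, sizeof *c, by_ratio);
    for (size_t i = 0; i < n; i++) {
        c[i].selected = saved < target_bits;
        if (c[i].selected)
            saved += c[i].bits_saved;
    }
    return saved;
}
```

Variables that are cheap to compress relative to the RAM they free are taken first, so lowering the target sheds the most expensive compressions before the cheap ones.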

Cost/Benefit Ratio

$\dfrac{\sum_i C_i (A_i + B_i V)}{S_u - S_c}$

$C$ ≝ access profile
$A, B$ ≝ platform-specific costs
$V$ ≝ cardinality of value set
$S_u$ ≝ original size
$S_c$ ≝ compressed size

Turning the RAM Knob
[Figure series: knob stepped from 0% to 100% in 10% increments, then set to 95%]

Compression Spectrum
[Figure: compression spectrum, with the 95% setting marked]

Conclusion
● RAM likely to remain scarce in low-cost, low-power systems
● RAM is used inefficiently
● Manually tweaking data sizes (even among char, short, long, etc.) is unpleasant
● Useful to trade haves for have-nots
● CComp implements RAM compression
  − http://www.cs.utah.edu/~coop/research/ccomp/

Analysis Times

Benchmark          HH:MM:SS
drive1             00:00:13
drive2             00:00:22
osc                00:00:44
genericbase        00:01:17
ap                 00:01:18
rfmtoleds          00:01:23
cnttoledsandrfm    00:01:32
testdrip           00:01:42
sensetorfm         00:01:42
testtimestamping   00:02:09
ident              00:03:13
surge              00:03:42
hfs                00:06:06
testtinysec        00:12:34
tinydb             01:58:45

No RAM Compression
[Figure; ⇒ simulator unavailable]

Full RAM Compression
[Figure; ⇒ simulator unavailable]
