MBZip: A Case for Compressing Multiple Data Blocks
Raghavendra K, Biswabandan Panda and Madhu Mutyam
Department of CSE, IIT Madras

Problem Definition

To propose a framework that can compress multiple data blocks into a single block (a zip) at the LLC and at the DRAM.

Our Solution: MBZip, in which multiple consecutive blocks that share a common data pattern are compressed together into a zipped block using BDI [1], and thus need only one set of encoding bits and one common base in total.
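The idea of one shared base for several blocks can be sketched in a few lines. This is a simplified base+delta check, not the full BDI scheme of [1] (which also handles zero and repeated values and several base sizes). The 8-byte word size, the choice of the first word as the base, and all function names are assumptions made for illustration.

```python
WORD_BYTES = 8    # assumed value granularity
BLOCK_BYTES = 64  # cache block size used in the evaluation

def words(block):
    """Split a block into little-endian unsigned words."""
    return [int.from_bytes(block[i:i + WORD_BYTES], "little")
            for i in range(0, len(block), WORD_BYTES)]

def delta_width(values, base):
    """Smallest signed delta width (bytes) that fits every value, or None."""
    for width in (1, 2, 4):
        lo = -(1 << (8 * width - 1))
        hi = (1 << (8 * width - 1)) - 1
        if all(lo <= v - base <= hi for v in values):
            return width
    return None

def zipped_size(blocks):
    """Bytes needed for one shared base plus one delta per word, or None."""
    values = [w for block in blocks for w in words(block)]
    base = values[0]  # assumption: the first word serves as the common base
    width = delta_width(values, base)
    if width is None:
        return None
    return WORD_BYTES + width * len(values)

def can_zip(blocks):
    """True if the consecutive blocks fit into a single block when zipped."""
    size = zipped_size(blocks)
    return size is not None and size <= BLOCK_BYTES

def make_block(start, stride=8):
    """Hypothetical test data: a block of evenly spaced 8-byte values."""
    return b"".join((start + stride * i).to_bytes(WORD_BYTES, "little")
                    for i in range(BLOCK_BYTES // WORD_BYTES))

base = 0x7FFF00001000
two = [make_block(base), make_block(base + 64)]
print(zipped_size(two))  # 24: one 8-byte base plus sixteen 1-byte deltas
```

In this sketch, two or three such blocks fit into one 64-byte zipped block, while four do not, because the deltas then need 2 bytes each.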

Figure: address mapping. Generic address: tag | index | offset, where the index field is log2(# sets) bits wide and the offset field is log2(block size) bits wide. Zipped address: tag | index | zip | offset, with log2(# sets) index bits.

• On average, more than 30% of the columns residing in a single page, when grouped together in groups of two to six columns, can be compressed into a single column.

Figure panels: (b) Compressed cache (BDI); (c) Zipped cache (MBZip). Data entries are 64 bytes.

Figure: Zipped DRAM Page (MBZip), 8kB page, with columns labelled b3 - b8, b4 - b8, b5 - b8, ...

Benchmarks: bwaves, GemsFDTD, h264ref, mesa, zeusmp, calculix, gromacs, sjeng; bzip2, soplex, omnetpp, ammp, galgel; leslie3d, mgrid, twolf, vortex2; hmmer, lbm, mcf, milc

1. A block of data is stored in either uncompressed or zipped format (at most 6 consecutive columns zipped into a single column).

2. For each column, 8 bits of metadata (3 encoding bits and 5 valid bits) are stored in a reserved DRAM space. A metadata cache stores the metadata of frequently used rows.
3. With MBZip-M, multiple block requests can be serviced with a single read, which improves performance.
4. The same block of data might be present in up to 6 different columns. This replication does not change the generic DRAM address mapping, apart from reserving space for metadata.
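The per-column metadata fits in one byte, which can be illustrated as follows. The bit layout (encoding bits in the upper 3 bits, valid bits in the lower 5) and the function names are assumptions; the poster gives only the field widths. The overhead calculation assumes 64-byte columns in an 8KB page, as in the evaluated configuration.

```python
ENC_BITS = 3    # encoding bits per column (from the poster)
VALID_BITS = 5  # valid bits per column (from the poster)

def pack_metadata(encoding, valid_mask):
    """Pack both fields into one byte (assumed layout: encoding on top)."""
    if not 0 <= encoding < (1 << ENC_BITS):
        raise ValueError("encoding must fit in 3 bits")
    if not 0 <= valid_mask < (1 << VALID_BITS):
        raise ValueError("valid mask must fit in 5 bits")
    return (encoding << VALID_BITS) | valid_mask

def unpack_metadata(byte):
    """Return (encoding, valid_mask) from a packed metadata byte."""
    return byte >> VALID_BITS, byte & ((1 << VALID_BITS) - 1)

def metadata_bytes_per_page(page_bytes=8 * 1024, column_bytes=64):
    """One metadata byte per column of the page."""
    return page_bytes // column_bytes

meta = pack_metadata(0b101, 0b10011)
print(unpack_metadata(meta))      # (5, 19)
print(metadata_bytes_per_page())  # 128
```

Under these assumptions an 8KB page has 128 columns, so its metadata occupies 128 bytes of the reserved space.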

Benchmark categories: ZF and CS; Neither ZF nor CS.

• ZF: if more than 20% of the blocks can be zipped into a single block.
• CS: if performance improves by more than 10% when the LLC grows from 1MB to 2MB.
• 70 4-core and 25 8-core workload mixes.
• gem5 simulator; LLC of 4MB/8MB for 4-/8-core systems; DDR3 with 8KB page size; cache block size: 64B.
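The two classification rules can be written as a small helper. The thresholds (20% and 10%) come from the definitions above; the function names, the choice of performance measure, and the sample numbers are hypothetical.

```python
ZF_THRESHOLD = 0.20  # fraction of blocks that can be zipped
CS_THRESHOLD = 0.10  # relative performance gain from 1MB to 2MB LLC

def is_zip_friendly(zippable_fraction):
    """ZF: more than 20% of the blocks can be zipped into a single block."""
    return zippable_fraction > ZF_THRESHOLD

def is_cache_sensitive(perf_1mb, perf_2mb):
    """CS: performance improves by more than 10% going from 1MB to 2MB."""
    return (perf_2mb - perf_1mb) / perf_1mb > CS_THRESHOLD

def classify(zippable_fraction, perf_1mb, perf_2mb):
    """Return the category label for one benchmark."""
    zf = is_zip_friendly(zippable_fraction)
    cs = is_cache_sensitive(perf_1mb, perf_2mb)
    if zf and cs:
        return "ZF and CS"
    if zf:
        return "ZF"
    if cs:
        return "CS"
    return "Neither ZF nor CS"

# Hypothetical benchmark: 35% zippable blocks, 15% faster with a 2MB LLC.
print(classify(0.35, perf_1mb=1.00, perf_2mb=1.15))  # ZF and CS
```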

Results


• Harmonic Speedup (HS), compared to a system with no compression, for 4-core workloads: 15.4% and 21.9% improvement with MBZip-C and MBZip-CM, respectively.
• Bandwidth reduction for 4-core workloads (in terms of DRAM reads): 29.6% and 39.7% with MBZip-C and MBZip-CM, respectively.

Schemes compared in the figures: BDI, MBZip-C, MBZip-CM.
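The poster reports WS and HS without defining them. The sketch below uses the definitions commonly applied to multiprogrammed workloads (weighted speedup as the sum of per-program speedups, harmonic speedup as their harmonic mean); that these are the exact formulas used in the evaluation is an assumption, and the IPC values are made up.

```python
def weighted_speedup(ipc_shared, ipc_alone):
    """WS: sum over programs of IPC when sharing / IPC when running alone."""
    return sum(s / a for s, a in zip(ipc_shared, ipc_alone))

def harmonic_speedup(ipc_shared, ipc_alone):
    """HS: number of programs divided by the sum of per-program slowdowns."""
    return len(ipc_shared) / sum(a / s for s, a in zip(ipc_shared, ipc_alone))

def normalized(metric_scheme, metric_baseline):
    """Metric of a scheme relative to the no-compression baseline."""
    return metric_scheme / metric_baseline

# Hypothetical 2-program mix: each program runs at half its standalone IPC.
ipc_alone = [2.0, 1.0]
ipc_shared = [1.0, 0.5]
print(weighted_speedup(ipc_shared, ipc_alone))  # 1.0
print(harmonic_speedup(ipc_shared, ipc_alone))  # 0.5
```

A normalized HS of 1.154 would correspond to the reported 15.4% improvement.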


MBZip life cycle (MBZip-CM)

Figure label: Zipped/uncompressed block response.

Figure x-axis (benchmarks): ammp, bwaves, bzip2, calculix, galgel, GemsFDTD, gromacs, h264ref, hmmer, lbm, leslie3d, mcf, mesa, mgrid, milc, omnetpp, sjeng, soplex, twolf, vortex2, zeusmp, average.

Figure: Generic DRAM Page.

Opportunity at the DRAM


Zipping at the DRAM (MBZip-M)


• On average, around 25% of the cache blocks, when grouped together in groups of two to eight blocks, can be compressed into a single cache block.


1. Similar to BDI, MBZip doubles the number of tags per set. In the zipped index function, the index bits are shifted by log2(# zb) bits, where # zb is the maximum number of blocks a zipped block is allowed to hold.
2. Half of the tags retain the generic index function, whereas the remaining half employ the zipped index function.
3. Each block in a zipped block has its own set of coherence bits.
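The two index functions can be sketched as address splits. The field order (tag | index | offset and tag | index | zip | offset) follows the address-mapping figure; the number of sets is a hypothetical value, and # zb = 8 is assumed from the "two to eight blocks" grouping at the LLC.

```python
BLOCK_SIZE = 64        # bytes, as in the evaluation
NUM_SETS = 4096        # hypothetical number of LLC sets
MAX_ZIPPED_BLOCKS = 8  # "# zb": assumed maximum blocks per zipped block

def log2_exact(n):
    """log2 of a power of two; raises ValueError otherwise."""
    if n <= 0 or n & (n - 1):
        raise ValueError("expected a power of two")
    return n.bit_length() - 1

OFFSET_BITS = log2_exact(BLOCK_SIZE)
ZIP_BITS = log2_exact(MAX_ZIPPED_BLOCKS)
INDEX_BITS = log2_exact(NUM_SETS)

def generic_fields(addr):
    """Split an address as tag | index | offset."""
    offset = addr & (BLOCK_SIZE - 1)
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

def zipped_fields(addr):
    """Split an address as tag | index | zip | offset."""
    offset = addr & (BLOCK_SIZE - 1)
    zip_id = (addr >> OFFSET_BITS) & (MAX_ZIPPED_BLOCKS - 1)
    index = (addr >> (OFFSET_BITS + ZIP_BITS)) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + ZIP_BITS + INDEX_BITS)
    return tag, index, zip_id, offset

# Eight consecutive blocks starting at an address aligned to 8 blocks.
base = 0x12340000
addrs = [base + BLOCK_SIZE * i for i in range(MAX_ZIPPED_BLOCKS)]
print(len({generic_fields(a)[1] for a in addrs}))  # 8 distinct generic sets
print(len({zipped_fields(a)[1] for a in addrs}))   # 1 shared zipped set
```

With the shifted index, all blocks that may share a zipped block map to the same set and tag, and the zip field selects the block within it.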

Figure panel: (a) Generic cache. In the zipped address, the zip field is log2(# zb) bits wide and the offset field is log2(block size) bits wide.

Zip Friendly (ZF)

Cache Sensitive (CS)

Figure legend: t: generic tag; T: zipped tag.

Opportunity at the LLC

Zipping at the LLC (MBZip-C)


Evaluation


1. Handling multiple types of cache blocks: uncompressible (64B), compressible (< 64B), and zipped (8B to 64B).
2. Accessing these blocks (including the ones residing in a zipped block) without incurring additional latency.
3. Mapping fixed-size virtual pages to variable-size compressed DRAM pages.

Motivation

1. Existing compression techniques, such as BDI [1] and LCP [2], compress each cache block or DRAM column independently.
2. Applications exhibit data locality that spreads across multiple consecutive data blocks.


Challenges


Figure: MBZip life cycle. Components: LLC, write buffer, DRAM read queue, DRAM write queue, and DRAM. Labelled flows: request, zipped/uncompressed dirty block, valid bits update, and pages arriving from secondary storage.

1. A page brought into DRAM is stored in the zipped format.
2. Either an uncompressed block or a zipped block is transferred to the cache along with the corresponding valid bits, and a generic or zipped indexing function is chosen accordingly.
3. When a zipped block containing dirty data is evicted from the cache, the entire block is written to the write buffer and from there to the write queue. (The zipped block might contain other clean blocks.)
4. This dirty zipped block is written back to the DRAM and its valid bits are updated. The valid bits of the previous five columns are also updated so that stale data is not serviced.
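Step 4 can be illustrated with a toy model of the valid bits. The layout is an assumption, not taken from the poster: column p is modelled as possibly holding copies of the five following blocks p+1 .. p+5, with bit k of its 5-bit mask covering block p+k+1. Page boundaries are ignored and all names are hypothetical.

```python
VALID_BITS = 5  # valid bits per column (from the poster)

def invalidate_stale_copies(valid_bits, dirty_block):
    """Clear the valid bit for `dirty_block` in the previous five columns.

    valid_bits[p] is the 5-bit mask of column p; bit k is assumed to cover
    the copy of block p + k + 1 held in column p.
    """
    first = max(0, dirty_block - VALID_BITS)
    for column in range(first, dirty_block):
        k = dirty_block - column - 1
        valid_bits[column] &= ~(1 << k)
    return valid_bits

# Toy page of 16 columns in which every replicated copy starts out valid.
page = [0b11111] * 16
invalidate_stale_copies(page, dirty_block=7)
print([format(m, "05b") for m in page[1:8]])
# ['11111', '01111', '10111', '11011', '11101', '11110', '11111']
```

In this model, columns 2 to 6 lose their copy of block 7, while column 1 and column 7 itself are unaffected, so a later read cannot be serviced from a stale replica.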

Figure: Normalized WS and HS for the 4-core system.

Figure: Normalized WS and HS for the 8-core system.

Reference

[1] Pekhimenko et al., “Base-delta-immediate compression: practical data compression for on-chip caches”, PACT 2012, pp. 377-388.
[2] Pekhimenko et al., “Linearly compressed pages: a low-complexity, low-latency main memory compression framework”, MICRO 2013, pp. 172-184.
