Computer Architecture, Semester 1, 2017
Assignment: RISC-V RV32I ISS, Stage 2

Your task for Stage 2 of this assignment is to add simulation of cache memories to the ISS from Stage 1. The organization of the simulated computer system is shown in the accompanying diagram. There are separate level-1 instruction and data caches attached directly to the CPU. Instruction fetches are served by the instruction cache, and data loads and stores are served by the data cache. The level-2 cache is optional. Misses in each of the level-1 caches are served by the level-2 unified cache if it is present, or by main memory otherwise. Misses in the level-2 cache, if present, are served by main memory.
[Figure: RISC-V CPU, Level-1 Instruction Cache, Level-1 Data Cache, Level-2 Unified Cache, Main Memory]

The parameters of the caches are specified using command line arguments:
• -b1 value: block size in bytes for each of the level-1 caches.
• -s1 value: total cache size in Kbytes for each of the level-1 caches.
• -a1 value: associativity for each of the level-1 caches; may be the word “full” for full associativity.
• -b2 value: block size in bytes for the level-2 cache.
• -s2 value: total cache size in Kbytes for the level-2 cache.
• -a2 value: associativity for the level-2 cache; may be the word “full” for full associativity.

If all of the options for the level-2 cache are omitted, the level-2 cache is not present. All of the values must be powers of two. The block size for the level-1 caches must be no larger than that for the level-2 cache.

Additional cache requirements are:
• The level-1 instruction cache is read-only, with LRU replacement.
• The level-1 data cache is write-through/write-around, with LRU replacement.
• The level-2 cache is write-back/write-allocate, with LRU replacement.

In Stage 2, there are two new ISS commands to be implemented:

Command       Operation performed
pi address    Probe the instruction cache hierarchy (level-1 instruction cache, level-2 cache, main memory) to check whether the block containing the address is in each cache, and if so, display the block contents.
pd address    Probe the data cache hierarchy (level-1 data cache, level-2 cache, main memory) to check whether the block containing the address is in each cache, and if so, display the block contents.
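The block size, total size, and associativity options determine how an address maps to a set and a tag. The following is a minimal sketch of that mapping, not the skeleton's code: CacheGeometry, make_geometry, set_index, and tag_of are hypothetical names, and using 0 to stand for the word “full” is an assumption made only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type; not part of the assignment skeleton.
struct CacheGeometry {
    uint32_t block_bytes;  // value of -b1 or -b2
    uint32_t num_sets;     // derived from size, block size and associativity
    uint32_t ways;         // entries per set
};

static bool is_power_of_two(uint32_t v) { return v != 0 && (v & (v - 1)) == 0; }

// size_kbytes: value of -s1 or -s2; block_bytes: value of -b1 or -b2.
// associativity: value of -a1 or -a2; 0 is used here (an assumption of this
// sketch) to represent the word "full".
CacheGeometry make_geometry(uint32_t size_kbytes, uint32_t block_bytes,
                            uint32_t associativity) {
    assert(is_power_of_two(size_kbytes) && is_power_of_two(block_bytes));
    uint32_t num_blocks = size_kbytes * 1024u / block_bytes;
    // Fully associative: every block of the cache lives in one set.
    uint32_t ways = (associativity == 0) ? num_blocks : associativity;
    assert(is_power_of_two(ways) && ways <= num_blocks);
    return CacheGeometry{block_bytes, num_blocks / ways, ways};
}

// Block number: the address with the byte-within-block offset removed.
uint32_t block_number(const CacheGeometry& g, uint32_t address) {
    return address / g.block_bytes;
}

// Set index: the low-order bits of the block number.
uint32_t set_index(const CacheGeometry& g, uint32_t address) {
    return block_number(g, address) % g.num_sets;
}

// Tag: the remaining high-order bits of the block number.
uint32_t tag_of(const CacheGeometry& g, uint32_t address) {
    return block_number(g, address) / g.num_sets;
}
```

For instance, a 16-Kbyte direct-mapped cache with 16-byte blocks has 1024 sets, so address 2000 (hex) maps to set 200 and address 891a2b50 maps to set 2b5. That configuration is only an assumption that happens to reproduce the set numbers in the handout's pi and pd examples; the handout does not state which configuration produced them.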
Examples of these commands are:

    pi 2000
    Level-1 instruction cache: hit, set = 200, entry = 0: 80000137 00010113 00000413 00001537
    Level-2 cache: hit, set = 200, entry = 0: 80000137 00010113 00000413 00001537
    Memory: 80000137

    pd 891a2b50
    Level-1 data cache: miss, set = 2b5
    Level-2 cache: hit, set = 2b5, entry = 0, dirty: 891a2b50 00000000 00000000 00000000
    Memory: 00000000
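The probe output reports a set, an entry within that set, and, for the write-back cache, a dirty flag. One way to model a single set with LRU replacement is sketched below. It is an illustration under assumptions rather than the skeleton's design: Entry and LruSet are hypothetical names, recency is tracked with a simple counter, and a probe is assumed not to disturb the LRU order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of one cache set; names are illustrative only.
struct Entry {
    bool valid = false;
    bool dirty = false;
    uint32_t tag = 0;
    uint64_t last_used = 0;  // counter value at the most recent access
};

class LruSet {
public:
    explicit LruSet(unsigned ways) : entries_(ways), clock_(0) { assert(ways > 0); }

    // Returns the entry index on a hit, or -1 on a miss.
    // Does not modify LRU state (assumed behaviour for pi/pd probes).
    int probe(uint32_t tag) const {
        for (std::size_t i = 0; i < entries_.size(); ++i)
            if (entries_[i].valid && entries_[i].tag == tag) return static_cast<int>(i);
        return -1;
    }

    // Mark an entry as the most recently used (call on every real access hit).
    void touch(int index) { entries_[index].last_used = ++clock_; }

    // Entry to fill on a miss: the first invalid entry, else the least recently used.
    int victim() const {
        int lru = 0;
        for (std::size_t i = 0; i < entries_.size(); ++i) {
            if (!entries_[i].valid) return static_cast<int>(i);
            if (entries_[i].last_used < entries_[lru].last_used) lru = static_cast<int>(i);
        }
        return lru;
    }

    // Install a block and mark it most recently used.
    // Returns true if the block it replaced was valid and dirty.
    bool fill(int index, uint32_t tag, bool dirty) {
        Entry& e = entries_[index];
        bool replaced_dirty = e.valid && e.dirty;
        e.valid = true;
        e.dirty = dirty;
        e.tag = tag;
        e.last_used = ++clock_;
        return replaced_dirty;
    }

private:
    std::vector<Entry> entries_;
    uint64_t clock_;
};
```

In a write-back cache, the value returned by fill is the point at which a dirty replacement would be counted and the old block written to the next level.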
Probing a level-1 cache checks for a hit or miss, and in the case of a hit, displays the block contents. It then probes the level-2 cache in the same way, followed by probing memory to display the stored value of the word.

The skeleton program provided for Stage 1 already includes all the command-line option and ISS command processing. In the memory_accessible class, there are pure virtual methods for reading and writing a block of data words, and for probing at an address. There are corresponding methods that you should implement in the memory class. In the processor class, you should implement the methods to probe instruction and data addresses. These are called by the command interpreter for the pi and pd commands, respectively.

The skeleton also includes the cache class; you need to implement the methods in the cache.cpp file. The rvsim program instantiates a cache class object for each of the level-1 and level-2 caches, and provides constructor arguments to configure each object according to the command line options and requirements described above. For the undergraduate course, the cycle_reporting argument will be set to false, and the address_cycles and data_cycles arguments can be ignored. For the postgraduate course, please see the postgraduate requirements for details of these arguments.

Each cache object is required to count the number of accesses, the number of misses, and for write-back caches, the number of times a replaced block is dirty. The report_accesses method is called at completion of a simulation to report values on the standard output, as illustrated by the following example:

    Level-1 instruction cache access count: 22600035
    Level-1 instruction cache miss rate: 0.000001
    Level-1 data cache access count: 10800018
    Level-1 data cache miss rate: 0.074291
    Level-2 cache access count: 3001586
    Level-2 cache miss rate: 0.266771
    Level-2 cache dirty on replacement rate: 0.500300
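The miss rate in such a report is the miss count divided by the access count. The sketch below shows that calculation together with six-decimal formatting, which matches the appearance of the example values. CacheStats, miss_rate, and format_rate are hypothetical names, and returning 0 when there have been no accesses is an assumption. The handout does not state the denominator of the dirty on replacement rate, so that rate is left out.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical per-cache counters; not the skeleton's data members.
struct CacheStats {
    uint64_t accesses = 0;
    uint64_t misses = 0;
};

// Miss rate = misses / accesses; 0 when nothing was accessed (assumption).
double miss_rate(const CacheStats& s) {
    if (s.accesses == 0) return 0.0;
    return static_cast<double>(s.misses) / static_cast<double>(s.accesses);
}

// Format a rate with six digits after the decimal point (printf's default for %f).
std::string format_rate(double rate) {
    char buf[32];
    std::snprintf(buf, sizeof buf, "%f", rate);
    return std::string(buf);
}
```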
You must develop your program as an extension to your existing RISC-V ISS in the 2017/s1/ca/rvsim directory in your SVN repository. If your ISS for Stage 1 did not pass all the tests for that assignment, you may need to correct some errors in order to pass test cases for cache behaviour, as the cache tests will require correct execution of some instructions. Details of the cache test suite will be made available later.

I will provide a web submission script that checks out your SVN directory, makes your ISS, and runs it with the cache test cases. Compliance with this development process will count toward the assessment of the assignment. The script will compare your output with our expected output using the “diff -iw” command (differences ignoring case and white-space).

Your work will be assessed on the following criteria:
• Program builds and runs using web submission script: 100 points
• Correct cache behaviour and reporting of results: 1300 points
• Program efficiency, based on run time not exceeding a limit: 100 points

The marks for this assignment will comprise 15% of your final assessment for the course. The deadline for submission is 11:59pm Sunday 11 June 2017.
Postgraduate requirements

If you are enrolled in the postgraduate course (COMP SCI 7026), you must implement the following additional requirements for analyzing the performance effect of memory stalls on the CPU.

If the -c command line option is specified, the rvsim program will set cycle_reporting to true, and will provide the values of the following additional options to the level-2 cache and memory objects:
• -ca2 value: Number of cycles for the level-2 cache to look up a block address.
• -cd2 value: Number of cycles for the level-2 cache to access and transfer a word from level-2 cache storage to the requesting level-1 cache.
• -cam value: Number of cycles for the main memory to look up an address.
• -cdm value: Number of cycles for the main memory to access and transfer a word from main memory storage to the requesting level-2 cache.

These values are only used for read accesses from the level-2 cache and main memory; they are the cycles during which the CPU is stalled. It is assumed that all write accesses are fully buffered and do not delay the CPU. The effects of contention between buffered writes and subsequent reads are ignored.

For the level-1 caches, the rvsim program sets the cycle_reporting argument to false and the address_cycles and data_cycles arguments to 0. It is assumed that a hit in a level-1 cache does not stall the CPU. If a level-1 cache misses and the block is read from the level-2 cache, the level-2 cache counts the number of cycles required for it to provide the block to the level-1 cache, since the CPU is stalled for those cycles. Additionally, if the level-2 cache misses and the block is read from main memory, the memory counts the number of cycles required for it to provide the block to the level-2 cache, since the CPU is stalled for those additional cycles.
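The cycle options give one cost for looking up an address and one cost per word transferred. A plausible reading, which is an assumption of this sketch and not confirmed by the handout, is that reading a block costs the address cycles once plus the data cycles for every 4-byte word in the requested block. ReadCycleCounter and charge_block_read are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical accumulator for the read-stall cycles of one level
// (the level-2 cache or main memory). Writes are never charged, since the
// handout assumes they are fully buffered.
class ReadCycleCounter {
public:
    ReadCycleCounter(uint32_t address_cycles, uint32_t data_cycles)
        : address_cycles_(address_cycles), data_cycles_(data_cycles), total_(0) {}

    // Charge one block read of block_bytes bytes.
    // Assumed cost model: address_cycles + data_cycles * (words in block).
    void charge_block_read(uint32_t block_bytes) {
        assert(block_bytes % 4 == 0);  // RV32I words are 4 bytes
        uint64_t words = block_bytes / 4;
        total_ += address_cycles_ + static_cast<uint64_t>(data_cycles_) * words;
    }

    uint64_t total() const { return total_; }

private:
    uint32_t address_cycles_;
    uint32_t data_cycles_;
    uint64_t total_;
};
```

Under this model, a level-1 miss that hits in the level-2 cache charges only the level-2 counter, while a level-1 miss that also misses in the level-2 cache charges both the level-2 counter and the memory counter.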
On completion of a simulation, if the -c option is specified on the command line, the rvsim program uses the accumulated read-cycle counts from the level-2 cache and the main memory to calculate the actual CPI for the code running on the CPU, to contrast it with the ideal CPI without memory stalls. An example of the output is:

    Instructions executed: 22600035
    Level-1 instruction cache access count: 22600035
    Level-1 instruction cache miss rate: 0.000001
    Level-1 data cache access count: 10800018
    Level-1 data cache miss rate: 0.074291
    Level-2 cache access count: 3001586
    Level-2 cache miss rate: 0.266771
    Level-2 cache dirty on replacement rate: 0.500300
    CPU cycle count: 24600040
    Stall cycles accessing level-2 cache: 4015780
    Stall cycles accessing memory: 19217688
    Average CPI (ideal): 1.088496
    Average CPI (actual): 2.116524
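The two CPI figures in the example are consistent with dividing the CPU cycle count by the number of instructions executed (ideal), and dividing the CPU cycle count plus both stall-cycle totals by the number of instructions executed (actual). The sketch below reproduces that arithmetic. The formulas are inferred from the example numbers rather than taken from the rvsim source, and ideal_cpi and actual_cpi are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Ideal CPI: cycles the CPU would take with no memory stalls, per instruction.
double ideal_cpi(uint64_t cpu_cycles, uint64_t instructions) {
    assert(instructions > 0);
    return static_cast<double>(cpu_cycles) / static_cast<double>(instructions);
}

// Actual CPI: CPU cycles plus read-stall cycles from the level-2 cache and
// main memory, per instruction.
double actual_cpi(uint64_t cpu_cycles, uint64_t l2_stall_cycles,
                  uint64_t memory_stall_cycles, uint64_t instructions) {
    assert(instructions > 0);
    uint64_t total = cpu_cycles + l2_stall_cycles + memory_stall_cycles;
    return static_cast<double>(total) / static_cast<double>(instructions);
}
```

With the example values, 24600040 / 22600035 rounds to 1.088496, and (24600040 + 4015780 + 19217688) / 22600035 = 47833508 / 22600035 rounds to 2.116524.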