Homework 3 -- LRU Approximations
Overview
Due: EOD Mar. 3rd, 2026
The goals of this homework are:
- Get experience hacking on part of gem5.
- Practice considering trade-offs in cache design.
- Work through simulation with more complex workloads (useful for projects!).
- As usual, you may work with a group.
Setup
In this lab, we’ll investigate different replacement policies, some of which use fewer bits of information per set than full LRU.
Step 1: LRU Approximations
Check out prior LRU approximations and see how they work:
- Tree PLRU: gem5 already implements a pseudo-LRU replacement policy based on trees (tree_plru_rp.hh).
- NRU (comparison baseline): Replace a block that has not been used recently, using 1 bit per block to remember whether the block has been used in the recent past. See this paper for more details.
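To build intuition for how tree-based pseudo-LRU works before reading tree_plru_rp.hh, here is a small standalone model of the idea (this is a sketch of the concept, not gem5's actual code or class names): an 8-way set keeps only 7 tree bits, each bit pointing toward the "less recently used" subtree.

```python
# Tree-PLRU sketch for one set: ways-1 bits arranged as a binary tree.
# Convention used here: bit == 0 means the LRU side is the left subtree,
# bit == 1 means it is the right subtree.
class TreePLRUSet:
    def __init__(self, ways=8):
        assert ways & (ways - 1) == 0, "ways must be a power of two"
        self.ways = ways
        self.bits = [0] * (ways - 1)  # heap-ordered internal tree nodes

    def touch(self, way):
        """On an access, flip each bit on the root-to-leaf path so it
        points *away* from the accessed way."""
        node, lo, hi = 0, 0, self.ways
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if way < mid:                # accessed way is in left subtree
                self.bits[node] = 1      # so the LRU side is now the right
                node, hi = 2 * node + 1, mid
            else:
                self.bits[node] = 0      # LRU side is now the left
                node, lo = 2 * node + 2, mid

    def victim(self):
        """Follow the bits from the root to find the pseudo-LRU way."""
        node, lo, hi = 0, 0, self.ways
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if self.bits[node] == 0:
                node, hi = 2 * node + 1, mid
            else:
                node, lo = 2 * node + 2, mid
        return lo
```

Note the storage cost: 7 bits per 8-way set, versus the bits needed to maintain a full recency ordering for true LRU.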
Your required implementation for this step is NMRU: replace a random block that is not the most recently used. This tutorial will tell you how to accomplish this in gem5.
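The NMRU state per set is tiny: you only need to remember which way is most recently used, then pick a victim uniformly at random among the others. A minimal standalone model of that selection logic (the class and method names here are illustrative, not gem5's API) might look like:

```python
import random

# NMRU sketch: remember only the MRU way per set; evict a uniformly
# random way that is not the MRU one.
class NMRUSet:
    def __init__(self, ways=8, seed=None):
        self.ways = ways
        self.mru = 0
        self.rng = random.Random(seed)

    def touch(self, way):
        self.mru = way  # only the most recently used way is tracked

    def victim(self):
        # Draw from ways-1 candidates, then skip over the MRU index
        # so the result is uniform over all non-MRU ways.
        way = self.rng.randrange(self.ways - 1)
        return way if way < self.mru else way + 1
```

In gem5 you would express this inside a replacement-policy subclass, using the existing random policy (random_rp) and the MRU-tracking policies as starting points.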
Step 2: Beyond LRU
LRU is not the best policy for all workloads. One example is when the working set is slightly larger than cache capacity: imagine a simple case where you're streaming through a vector many times. LRU always keeps the most recently used data, so by the time you wrap around to the beginning of the vector again, its first elements have already been evicted.
Look through the available cache replacement policies in gem5 (gem5/src/mem/cache/replacement_policies/), and select one that is thrash-resistant. We'll call that policy THRASH in the rest of this homework.
Step 3: Benchmarking
You will mainly benchmark with SPEC 2017. Follow the directions here to run SPEC 2017. Don’t wait until the last minute to figure this part out; it is probably the hardest part. Sometimes it’s hard to get every workload from SPEC working. If you can get at least 5 workloads running, that is enough.
Second, create a benchmark that you know will have thrashing behavior at some cache level (you can pick). Include this in your benchmark set to check whether THRASH actually delivers on thrash resistance.
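One simple recipe for such a microbenchmark is to stream repeatedly through a buffer slightly larger than the cache level you target, so each pass evicts the data the next pass needs. The sizes below are illustrative (~80KB against a 64KB L1D), and in practice you would likely write this in C so interpreter overhead doesn't obscure the access pattern:

```python
import array

# Thrashing microbenchmark sketch: the working set (~80KB of doubles)
# slightly exceeds a 64KB L1D, and we stream through it sequentially
# many times. Under LRU, every pass re-misses the whole buffer.
L1D_BYTES = 64 * 1024
buf = array.array('d', range((L1D_BYTES + 16 * 1024) // 8))

total = 0.0
for _ in range(100):      # many passes over the same working set
    for x in buf:         # sequential streaming access
        total += x
print(total)
```

Accumulating into `total` keeps the loads from being optimized away in a compiled version; confirm the thrashing with the L1D miss-rate counters in stats.txt.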
Step 4: Architectural Exploration
Suppose you are designing the L1 data (L1D) and L2 caches of a new processor based on gem5's out-of-order O3CPU. You're trying to decide whether it is worth implementing a more complex replacement policy, higher associativity, or both.
Run simulations of the policies discussed above (NMRU, NRU, PLRU, LRU, THRASH) on the SPEC workloads. Use a 64KB L1D and a 2MB L2, with the same replacement policy in both, and 8-way set associativity by default. Use these additional parameters: the L1I uses LRU, the frequency is 2GHz, and the core is 8-issue OoO.
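In a gem5 configuration script, the cache parameters above can be sketched roughly as follows. Parameter names follow gem5's classic cache model (src/mem/cache/Cache.py in recent releases; check your version), and the latency/MSHR values here are placeholders, not requirements of the assignment:

```python
# Hedged sketch of the Step 4 cache configuration using gem5's classic
# Cache SimObject. Latencies and MSHR counts are illustrative only.
from m5.objects import Cache, LRURP, TreePLRURP

class L1DCache(Cache):
    size = '64kB'
    assoc = 8
    tag_latency = 2            # placeholder latencies
    data_latency = 2
    response_latency = 2
    mshrs = 16
    tgts_per_mshr = 20
    replacement_policy = LRURP()   # swap in TreePLRURP(), your NMRU, etc.

class L2Cache(Cache):
    size = '2MB'
    assoc = 8
    tag_latency = 10
    data_latency = 10
    response_latency = 10
    mshrs = 32
    tgts_per_mshr = 12
    replacement_policy = LRURP()   # keep identical to the L1D policy
```

Swap the `replacement_policy` assignment (and, for the associativity question, `assoc`) between runs, keeping everything else fixed so the comparisons are fair.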
Now answer the following questions:
- Q1. How well do the three pseudo-LRU policies approximate LRU? Do they ever do better by accident?
- Q2. Is the thrash-resistant policy ever useful? Can you explain when and why? (Remember your microbenchmark.)
- Q3. What matters more: higher associativity or a more sophisticated policy?
- Q4. What’s your overall recommendation? Performance is most important, and storage cost is a secondary consideration.
Your report should describe the configurations you simulated and use charts/graphs to back up your answers and analysis. Feel free to choose the experiments as appropriate, or use microbenchmarks if you like.
What to Hand In
Please turn in to Bruin Learn separately:
- A PDF of the report.
- A zip file containing the config.ini and stats.txt for the gem5 simulations.
How We Will Grade This
50 points for completing the implementation/experiments, and 50 points for the analysis.
Citation
This homework is based on gem5 101 homework 5, but we use cooler workloads now. Thanks Zhengrong Wang!