Course Project

The final component of this course is a project; there is no final exam.

Quick Facts

Teams

You may work in a team of 1-4 people. I think 2-3 generally works best for projects of moderate size. A team of 4 needs justification: will there really be enough work to keep everyone engaged? (Are there genuinely 4 parallel tasks at any point in the project?) A solo team can work well if the student is confident about what the work requires and doesn’t mind the extra effort, though even one teammate can bring extra creativity.

Project Content

The goal of the project is to give you an opportunity to get hands-on experience with architecture experimentation and maybe even research. There are two main options:

  1. Hacksim: Hack an architecture simulator to implement a new architecture or microarchitecture idea, and evaluate it on some meaningful workloads. Given the limited time for this class, it is completely okay to re-implement an existing technique. The homeworks have prepared you for using gem5 [1], so the natural tool to use will be gem5.

  2. Open-ended: Propose a research idea and evaluate it using any means you like. You are free to combine with ongoing research from your own studies, or with another course, provided the scope of the project implementation submitted for this course is sufficient. Please see me if this is unclear.

Project Proposal

The project proposal is a short document that briefly explains what problem you are trying to address, how you’ll address it, and what your goals are. It is not graded, but it is required; I will use it to give you quick feedback on whether your project looks feasible and sufficient.

Please include:

  1. Topic: The problem and your basic idea to address the problem
  2. Method: Simulator, workloads, metrics, basic experiments
  3. Plan: Stages of implementation/evaluation, plus a backup plan if you can’t finish everything

Hacksim Ideas:

The following is a short list of ideas.

In all of these projects, you may cut as many corners as you like: think about the minimum implementation needed to understand the phenomenon or technique you are studying.

Improving gem5’s OOO Core:

  1. Value Prediction: Implement value prediction in gem5. Maybe something simple, or even something more complicated like VTAGE?

  2. Practical Large-windows? Apple M1 has a HUGE reorder buffer (ROB). Do they implement something like a waiting-instruction buffer (WIB) or CFP to make this practical/worthwhile? (Remember, WIB takes dependent instructions out of the reservation stations).
    Is this what makes them so good? ¯_(ツ)_/¯ Model this in gem5 and characterize some workloads and get some insights (something like SPEC 2017?). You could even start with a very simplified model: mark instructions as “not-in-IQ” if they are dependent on a long-latency load, and don’t count them towards the queue size.

  3. Selective Replay: Implement selective replay in gem5 (AFAIK gem5 replays all instructions after a misspeculation).

  4. Is there anything left to do in branch prediction? Branch prediction has been crucial to out-of-order processors’ success, and was an area of significant improvement for several decades. This one is tricky because there are already some advanced predictors available in gem5; check to see what’s there first.
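
The very simplified model suggested in idea 2 (mark instructions as “not-in-IQ” if they depend on a long-latency load, and don’t count them towards the queue size) can be prototyped outside gem5 before touching the simulator. The following is only a minimal sketch under stated assumptions: the `Inst` record, the register-name trace format, and the function names are all hypothetical and are not gem5 data structures, and "long-latency load" is simply a flag you set yourself (e.g. for loads you assume miss in the cache).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Inst:
    """One instruction in a toy trace (hypothetical format, not gem5's)."""
    dest: Optional[str]              # destination register name, or None
    srcs: Tuple[str, ...] = ()       # source register names
    long_latency_load: bool = False  # assumption: flagged by the user


def mark_not_in_iq(window: List[Inst]) -> List[bool]:
    """Return one flag per instruction, in program order.

    A flag is True if the instruction depends, directly or transitively,
    on an earlier long-latency load in the same window.
    """
    poisoned = set()  # registers whose value derives from a long-latency load
    flags = []
    for inst in window:
        dependent = any(src in poisoned for src in inst.srcs)
        flags.append(dependent)
        if inst.dest is not None:
            if dependent or inst.long_latency_load:
                poisoned.add(inst.dest)
            else:
                # Register is overwritten with a value unrelated to the load.
                poisoned.discard(inst.dest)
    return flags


def iq_occupancy(window: List[Inst]) -> int:
    """Count only the instructions that still occupy an IQ entry."""
    return sum(1 for flag in mark_not_in_iq(window) if not flag)


window = [
    Inst("r1", ("r0",), long_latency_load=True),  # load, assumed to miss
    Inst("r2", ("r1",)),        # depends on the load
    Inst("r3", ("r4",)),        # independent
    Inst("r5", ("r2", "r3")),   # transitively dependent
    Inst("r1", ("r4",)),        # overwrites r1 with a clean value
    Inst("r6", ("r1",)),        # independent again
]
print(mark_not_in_iq(window))  # [False, True, False, True, False, False]
print(iq_occupancy(window))    # 4
```

In this sketch the load itself still counts towards the queue size; only its dependents are discounted. It also ignores timing entirely (dependents never return to the queue), so it only bounds how much queue pressure such a scheme could remove for a given window.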

Understanding Microarchitecture Performance

  1. Simplified Performance Counters: Dick Sites recently proposed a set of four universal and simple performance counters (https://www.sigarch.org/performance-counters-id-like-to-see-part-i/). Implement them in gem5. Characterize some workloads using these counters to evaluate how helpful they are for programmers, or maybe propose your own set of counters. This is mostly a microarchitecture project, but it would be nice to expose your new counters through some instructions in gem5.

  2. gem5 Considered Harmful: Configure gem5 to be as similar as possible to a CPU and memory system that you have access to. Write or gather some microbenchmarks and figure out in what ways gem5 “screws things up”.

Going Vertical (ideas spanning ISAs/microarchitecture/compilers):

  1. Tensor-Cores for CPUs: Propose a few extensions to x86 (or some easier-to-modify ISA like RISC-V) for implementing faster dense linear algebra. Evaluate on matrix convolutions/multiplications from AlexNet/VGG/ResNet, or even just simpler DSP kernels.
    Figure out whether the instruction extensions were worth it, or whether existing SIMD is simply enough.

  2. Hash-table Accelerator: Propose an ISA extension for accelerating hash tables. HTA can be a source of inspiration. Take this as an example of other simple functionality that can be added through specialized instructions.

Report guidelines:

The final project report should document the work that you have performed and your findings. You should strive to make the presentation of your report the same quality as the papers you have read during the quarter, even if it ends up being much shorter. The paper should also stand alone: the concepts, including the relationship to existing work, should be understandable by your classmates without having to read additional papers. Please format the paper nicely, and organize it with good structure. Finally, please include a statement of work which describes how each student contributed to the project.

As you may have noticed, papers typically follow one of a few canonical structures. The one below is a reasonable approach.

Benchmark Suggestions:

If you have a simulator-hacking project, you’ll need to choose a set of benchmarks that are appropriate for evaluation of your idea.

SPEC CPU 2017: (Download here – Licensed for UCLA only, technically should be used on tetracosa): This is a fairly broad benchmark suite for single-core applications.

A few “simpler” ones:

Footnotes:

[1] I am kidding, of course; nothing can actually prepare you or anyone for modifying gem5.