Homework 2 -- Magic Instructions

Logistics

Due Date: 11:59pm Monday, Feb. 17th

Short Summary

Implement a “magic” instruction in gem5: override an existing instruction with whatever functionality you like, then write a very short writeup on what it does and show a trace or log of it working.
If this already makes sense to you, you may skip to “What to hand in” – the rest is an optional step-by-step guide.

Overview

Purpose: The purpose of this assignment is to help you become familiar with gem5’s language for describing instruction sets.
Here you will go through the ISA files in src/arch/x86/isa and understand how instructions are decoded and broken down into micro-ops which are ultimately executed. At a minimum this will give you more experience playing with gem5 source code, and this could be a useful introduction for some projects.

The background for this assignment is that, for many years, there has existed an instruction which no one has bothered to implement in gem5. This instruction is within the “x87” extension to the x86 ISA, which is a mostly-abandoned extension for performing floating point (FP) operations using a stack-based register interface. (SSE instructions are much more commonly used by compilers today for FP operations.) As of 2025, it also appears that even simple x87 programs fail to run in gem5.

The instruction in question is FSUBR, which performs “reverse subtract”.
As you can guess, it does a subtraction of FP values, but its operands are reversed. FSUBR is an obscure instruction within an abandoned and broken extension, which compilers wouldn’t bother to use even if they were compiling to x87.

Your job is to make this instruction useful (though technically I don’t care if you override some other instruction).

Verify that the instruction is unimplemented in gem5

First, check that gem5’s FSUBR is unimplemented by writing a short program that calls the fsubr instruction and checks the result. To make sure that FSUBR is used for the subtraction, we need to invoke it explicitly through GCC’s inline assembly feature.
Here is an example function you can use; it subtracts the two single-precision floating point input values and returns the result.

  float reverse_subtract(float in1, float in2)
  {
    float ret = 0.0;
    asm ("fsubr %2, %0" : "=&t" (ret) : "%0" (in1), "u" (in2));
    return ret;
  }

Just to give some more context, assembly instructions are written inline with the rest of the code using the ‘asm’ code block. This code block contains two portions: the instruction portion and the constraint portion. The instruction portion is a string containing the assembly instructions. The GNU C compiler does not check this string for correctness, so anything is allowed. The constraint portion specifies what GCC can or cannot do with the input and output operands, and which registers or memory are affected by the instruction portion. Documentation is available from GCC and other sources, or just ask ChatGPT.

So in this example, we use the “fsubr” mnemonic in the ‘asm’ code block. In our case the instruction string is fsubr %2, %0. After that we specify constraints on the output operand, which we ask to be the variable ret. We then specify the two input operands: in1 and in2. The letters t and u specify the top and the second-from-top registers of the x87 stack.

Now, write a simple C program that uses this function. It should actually work and provide the right results on an x86 machine… though that’s irrelevant for us now. : ) Feel free to check the assembly code generated by the compiler using gcc’s -S flag, or use “objdump -D binary”.

Now simulate this program using gem5 (use any core model and parameter settings you like), and verify that gem5 does not support the instruction.

Getting the hang of gem5’s decoder

Remember that in x86 microarchitectures (including our simulator), instructions are decoded into micro-ops.
A regular instruction is typically referred to as a macro-op, while the smaller parts are referred to as micro-ops. To implement an instruction in gem5, we first provide the ISA decoder with the information on the macro-op, then we provide an implementation of the macro-op in terms of micro-ops. Finally, we implement the micro-ops that are not already implemented.

Let’s go through the file one_byte_opcodes.isa (within ./gem5/src/arch/x86/isa/decoder) to understand how gem5 decodes instructions from the x86 ISA. The file is written in a language designed specifically to express instruction sets. The contents of the file are ultimately converted to a (big) C++ switch statement. At the top of the file you’ll see:

'X86ISA::OneByteOpcode': decode OPCODE_OP_TOP5 {
    format Inst {
        0x00: decode OPCODE_OP_BOTTOM3 {
            0x6: decode MODE_SUBMODE {
                0x0: UD2();
                default: PUSH(sEv);
            }

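We first decode the topmost 5 bits (most significant) of the opcode byte. Each subsequent nesting level decodes another set of bits: here the bottom 3 bits of the opcode, then the mode. Altogether, this means that the opcode 0x06 (top 5 bits 0x00, bottom 3 bits 0x6) is some kind of PUSH, except when MODE_SUBMODE is 0x0, where it decodes to UD2 instead… Cool! There is a little bit more documentation here.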
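To make the bit slicing concrete, here is a minimal sketch in C of how an opcode byte splits into the two fields the decoder switches on. The function names opcode_top5 and opcode_bottom3 are made up for illustration; they are not gem5 identifiers, they only mirror the OPCODE_OP_TOP5 and OPCODE_OP_BOTTOM3 fields named in the snippet above.

```c
#include <stdint.h>

/* Hypothetical helper: the top five bits of an opcode byte, i.e. the value
   the outermost "decode OPCODE_OP_TOP5" level switches on. */
uint8_t opcode_top5(uint8_t opcode)
{
    return (uint8_t)(opcode >> 3);
}

/* Hypothetical helper: the bottom three bits of an opcode byte, i.e. the
   value the nested "decode OPCODE_OP_BOTTOM3" level switches on. */
uint8_t opcode_bottom3(uint8_t opcode)
{
    return (uint8_t)(opcode & 0x7);
}
```

For the opcode 0x06, opcode_top5 gives 0x00 and opcode_bottom3 gives 0x6, which are exactly the two case labels that lead to PUSH in the decoder snippet.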

Finding the Unimplemented instruction

First, we have to find and replace the unimplemented macro-op in the decoder. To do that, read the entire AMD ISA manual, or find the relevant portion for fsubr.

Presuming you’ve done the above, we learn that we can locate the fsubr instruction here in the decoder, specifically in x87.isa:

0x1B: decode OPCODE_OP_BOTTOM3 default Inst::UD2() {
      0x0: decode MODRM_REG {

All x87 instructions begin with an opcode byte in the range 0xD8 to 0xDF. Therefore the topmost 5 bits are always 0x1B. That’s why the case label at the start of the snippet is 0x1B: this file only covers those opcodes.
After that, it decodes the bottom 3 bits (e.g. all zeroes in the snippet above).

You can take a look at Table A-15 (page 443) in the manual mentioned above to see which instructions are represented by the different cases for the bottom three bits. For example, FSUB and FSUBR both appear under opcodes 0xD8 and 0xDC, i.e. the cases 0x0 and 0x4. To distinguish between the functionality provided by these different opcodes for the same instruction, you could read about the ModRM field of x86 in the linked manual.
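The arithmetic behind these case labels can be checked with a small sketch in C. The helper names below are hypothetical (they are not taken from gem5), and the ModRM byte 0xE9 is used only as an illustrative value. The field layout assumed is the standard x86 one: mod in bits 7..6, reg in bits 5..3, rm in bits 2..0.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if the opcode byte lies in the x87 range
   0xD8..0xDF, which is the same as its top five bits being 0x1B. */
int is_x87_opcode(uint8_t opcode)
{
    return (opcode >> 3) == 0x1B;
}

/* Hypothetical helper: bottom three bits of the opcode byte. */
uint8_t x87_bottom3(uint8_t opcode)
{
    return (uint8_t)(opcode & 0x7);
}

/* Standard ModRM layout: mod = bits 7..6, reg = bits 5..3, rm = bits 2..0.
   The decoder's MODRM_REG level switches on the reg field. */
uint8_t modrm_mod(uint8_t modrm) { return (uint8_t)(modrm >> 6); }
uint8_t modrm_reg(uint8_t modrm) { return (uint8_t)((modrm >> 3) & 0x7); }
uint8_t modrm_rm(uint8_t modrm)  { return (uint8_t)(modrm & 0x7); }
```

With these helpers, 0xD8 and 0xDC both pass is_x87_opcode and give bottom-three-bit values 0x0 and 0x4, matching the two cases mentioned above. The illustrative ModRM byte 0xE9 splits into mod 3, reg 5, rm 1, so it would fall into a 0x5 case at the MODRM_REG level.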

In the file x87.isa, you can check that FSUB appears in the case statements for 0x0 and 0x4. You can also observe that FSUBR’s implementation is missing (what we mean here is that there is a function call to fsubr(), which is not implemented; we’ll need to replace that…).

Replacing the Unimplemented instruction

In the older versions of this homework, you would actually implement the fsubr instruction as a new macro-op. If you’d like to see how that works, you can check it out here. That can be useful if you want to use the operands etc.

Presuming you don’t care what your instruction does, you can decode your instruction into a dummy instruction, called “BasicOperate”.

0x5: BasicOperate::magicInst({ {/*code to execute*/} }, IsMagic);

You can put some code in the “code to execute” portion above, which would get executed when the dynamic instruction is actually executed in the simulator, but we can also leave it blank.

Notice there’s a new flag we added called “IsMagic”.
Flags describe properties of static instructions – e.g. is this for control flow. For us, it just helps us identify our special instruction. You’ll need to define this flag in StaticInstFlags.py.

         "IsHtmCancel",  # Explicitely aborts a HTM transaction
+        "IsMagic", # magic instructions cool!
         "IsInvalid",  # An invalid instruction

You’ll also need to add a way to access that property in the static instruction definition in src/cpu/static_inst.hh.

     bool isMicroop() const { return flags[IsMicroop]; }
+    bool isMagic() const { return flags[IsMagic]; }
     bool isDelayedCommit() const { return flags[IsDelayedCommit]; }

Do something useful

At this point, gem5 should compile, and you have a magic instruction that you can do anything with. Your assignment is to do anything you want that can be construed as useful.

A very simple idea that meets the minimum requirements is to print a message when the instruction reaches fetch, dispatch, rename, etc. Another idea is to have the instruction start/stop a timer so you can time a region of interest in the program. But feel free to do whatever you like.
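As one way to picture the start/stop timer idea, here is a standalone sketch in C of the toggle logic. It is not gem5 code: the RoiTimer type and roi_timer_toggle function are hypothetical names, and the current tick is passed in as a plain argument. Inside gem5 you would presumably obtain it from the simulator (for example via curTick()) and call the equivalent logic wherever your magic instruction is detected.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical state for a region-of-interest timer. */
typedef struct {
    int running;           /* 1 while a region is open, 0 otherwise */
    uint64_t start_tick;   /* tick at which the open region started */
    uint64_t total_ticks;  /* sum of all closed regions so far */
} RoiTimer;

/* Call once per executed magic instruction; `now` is the current simulated
   tick. The first call opens a region and returns 0. The next call closes
   it, prints and returns its length, and adds it to the running total. */
uint64_t roi_timer_toggle(RoiTimer *t, uint64_t now)
{
    if (!t->running) {
        t->running = 1;
        t->start_tick = now;
        return 0;
    }
    uint64_t elapsed = now - t->start_tick;
    t->running = 0;
    t->total_ticks += elapsed;
    printf("magic: region took %llu ticks\n", (unsigned long long)elapsed);
    return elapsed;
}
```

The test program would then execute the magic instruction once before and once after the code being measured. Toggling on a single instruction keeps the decoder change to one opcode, at the cost of not being able to tell a "start" from a "stop" if the program executes an odd number of them.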

If you want to mess around with the out-of-order core, you’ll probably want access to the flag from the dynamic instruction, which you can get by providing a new interface in src/cpu/o3/dyn_inst.hh:

     bool isSerializing()  const { return staticInst->isSerializing(); }
+    bool isMagic()  const { return staticInst->isMagic(); }
     bool isSerializeBefore() const

What to Hand In to Canvas

A report PDF containing:
- A couple sentences on what you did with your magic instruction.
- Some evidence of the magic instruction doing what you said it should do (a trace, a log, or some stats)
- Source of any program used to test your instruction
A patch file containing the changes made to gem5, e.g. with git diff.

Please turn in the PDF separately from the patch.

How we will grade this:

70 points for completing the assignment, and 30 points for the report.

CS251a