Homework 2 -- Implementing a Useless Instruction

Logistics

Due Date: 11:59pm Friday, Feb. 17th

Expected Duration: A few hours only (more if you try to understand gem5’s operation in detail)

Note: As usual, you may work in pairs on this assignment.

Overview

Purpose: The purpose of this assignment is to help you become familiar with gem5’s language for describing instruction sets.
Here you will go through the ISA files in src/arch/x86/isa and learn how instructions are decoded and broken down into the micro-ops that are ultimately executed. At a minimum this will give you more experience playing with gem5 source code, and it could be a useful introduction for some projects.

The background for this assignment is that, for many years, there has existed an instruction which no one has bothered to implement in gem5. This instruction is part of the “x87” extension to the x86 ISA, a mostly-abandoned extension for performing floating point (FP) operations using a stack-based register interface. (Compilers today much more commonly use SSE instructions for FP operations.) The instruction in question is FSUBR, which performs a “reverse subtract”. As you can guess, it subtracts FP values, but with its operands reversed. FSUBR is an obscure instruction within an abandoned extension: it is so worthless, in fact, that compilers don’t generally bother to emit it, and it is so uncommon that it has no implementation in gem5, even though implementing it would be absolutely trivial.

This trivial task is your assignment.

Verify that Gem5 is Broken

First, check that gem5’s FSUBR implementation is still broken by writing a short program that executes the fsubr instruction and checks the result. Compilers will not normally emit FSUBR on their own, so to make sure it is used for the subtraction we must invoke it explicitly with GCC’s inline assembly feature.
Here is an example piece of code you can use. It subtracts the two single-precision floating point input values using FSUBR (computing in2 - in1) and returns the result.

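  /* Computes in2 - in1 with the x87 FSUBR instruction.
     "=&t": ret comes back in st(0), the top of the x87 stack.
     "0":   in1 starts in the same register as operand 0, i.e. st(0).
     "u":   in2 starts in st(1), the second register of the stack.
     fsubr %st(1), %st then computes st(0) = st(1) - st(0). */
  float reverse_subtract(float in1, float in2)
  {
    float ret = 0.0f;
    asm ("fsubr %2, %0" : "=&t" (ret) : "0" (in1), "u" (in2));
    return ret;
  }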

For some more context, the ‘asm’ statement lets you write assembly instructions inline with the rest of your C code. It has two portions: the instruction portion and the constraint portion. The instruction portion is a string containing the assembly instructions. GCC does not check this string for correctness. It copies the string into its assembly output, so mistakes are caught only by the assembler, or not at all. The constraint portion specifies what GCC can or cannot do with the input and output operands, and which registers or memory are affected by the instructions. Documentation is available from GCC and other sources. Or just ask ChatGPT, I guess.

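So in this example, we use the “fsubr” mnemonic in the ‘asm’ statement, and the instruction string is fsubr %2, %0. Operands are numbered in the order they appear in the constraint portion: %0 is the output ret, %1 is in1, and %2 is in2. After the instruction string, we specify constraints on the output operand, which we ask to be the variable ret. We then specify the two input operands, in1 and in2. The letter t specifies the top register of the x87 stack and u specifies the second-to-top register. The digit 0 in in1’s constraint ties it to the same register as operand 0, which is the top of the stack.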

Now, write a simple C program that compares the result of reverse subtract (FSUBR) and subtract (FSUB). Verify that the instruction works correctly by compiling it and running it with some inputs on your host x86 machine. You may also look at the assembly code generated by the compiler using the -S flag, or use “objdump -D binary”.

Now simulate this program using gem5 (use any core model and parameter settings you like), and verify that the answer is wrong. You’ll get a warning that gem5 does not support the instruction.
IMPORTANT: Please make sure to test with a variety of inputs. In particular, test a case where the correct answer is not one of the inputs, because returning one of the inputs is exactly the default behavior of an unimplemented instruction. : _ )
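A small helper for choosing such inputs and judging results is sketched below. The names is_informative_case, classify_result, and the result_kind enum are hypothetical. The sketch assumes the semantics of the example function above, where the correct answer is in2 - in1. A pair of inputs is informative only if that answer differs from both inputs.

```c
/* Hypothetical classification of a result returned by reverse_subtract(in1, in2). */
enum result_kind {
    RESULT_CORRECT,       /* equals in2 - in1 */
    RESULT_EQUALS_INPUT,  /* wrong and equal to an input: looks unimplemented */
    RESULT_WRONG          /* wrong for some other reason */
};

/* Returns 1 if the correct answer in2 - in1 differs from both inputs, so an
   instruction that just leaves an input in place cannot pass by accident. */
int is_informative_case(float in1, float in2)
{
    float expected = in2 - in1;
    return expected != in1 && expected != in2;
}

/* Compares an observed result against the expected in2 - in1. */
enum result_kind classify_result(float in1, float in2, float result)
{
    float expected = in2 - in1;
    if (result == expected)
        return RESULT_CORRECT;
    if (result == in1 || result == in2)
        return RESULT_EQUALS_INPUT;
    return RESULT_WRONG;
}
```

For instance, (in1, in2) = (2, 4) is a poor test because the answer 2 equals in1. By contrast, (1, 4), whose answer is 3, is a good one.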

Notes:

Background on Decoding in Gem5

Remember that in x86 microarchitectures (including our simulator), instructions are decoded into micro-ops.
A regular instruction is typically referred to as a macro-op, while the smaller parts it is broken into are referred to as micro-ops. To implement an instruction in gem5, we first provide the ISA decoder with information about the macro-op. We then provide an implementation of the macro-op in terms of micro-ops. Finally, we implement any micro-ops that do not already exist. We will carry out these steps for the FSUBR instruction. Our implementation of FSUBR will mirror that of FSUB, which is already implemented in gem5. The fastest and most appropriate way to do this assignment is basically pattern matching and copy-and-paste, without needing to fully understand how each piece works, since gem5’s decoder is somewhat annoyingly complex. : )

There are many ways in which instructions are encoded in the x86 ISA; we will focus on the x87 subset. You can read more about instruction encoding in a manual provided by AMD. Let’s go through the file one_byte_opcodes.isa (within ./gem5/src/arch/x86/isa/decoder) to understand how gem5 decodes instructions from the x86 ISA. The file is written in a language designed specifically to express instruction sets, and its contents are ultimately converted into a C++ switch statement (a big one). At the top of the file you’ll see:

'X86ISA::OneByteOpcode': decode OPCODE_OP_TOP5 {
    format Inst {
        0x00: decode OPCODE_OP_BOTTOM3 {
            0x6: decode MODE_SUBMODE {
                0x0: UD2();
                default: PUSH(sEv);
            }

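We first decode the topmost (most significant) 5 bits of the opcode byte. Each subsequent nesting level decodes another set of bits. Here OPCODE_OP_BOTTOM3 is the lowest 3 bits of the opcode byte, and MODE_SUBMODE reflects the processor’s operating mode rather than a field of the instruction. Taken together, this means that opcode 0x06 (top five bits 0x00, bottom three bits 0x6) is some kind of PUSH, except in submode 0x0, where it decodes to UD2 (an undefined instruction).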
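To make the nesting concrete, here is a hypothetical C rendering of just the snippet above, roughly the shape of the switch statement the ISA description is turned into. It is not gem5’s generated code. The function and enum names are made up, the submode is passed in as a plain integer, and every case outside the snippet is collapsed into INSN_UNKNOWN.

```c
#include <stdint.h>

/* Illustrative decode results; not gem5 types. */
enum insn { INSN_UNKNOWN, INSN_UD2, INSN_PUSH };

unsigned opcode_top5(uint8_t opcode)    { return opcode >> 3; }   /* bits 7..3 */
unsigned opcode_bottom3(uint8_t opcode) { return opcode & 0x7; }  /* bits 2..0 */

/* Mirrors: decode OPCODE_OP_TOP5 -> OPCODE_OP_BOTTOM3 -> MODE_SUBMODE */
enum insn decode_one_byte(uint8_t opcode, unsigned submode)
{
    switch (opcode_top5(opcode)) {          /* decode OPCODE_OP_TOP5 */
    case 0x00:
        switch (opcode_bottom3(opcode)) {   /* decode OPCODE_OP_BOTTOM3 */
        case 0x6:                           /* decode MODE_SUBMODE */
            return submode == 0x0 ? INSN_UD2 : INSN_PUSH;
        default:
            return INSN_UNKNOWN;            /* other cases elided */
        }
    default:
        return INSN_UNKNOWN;                /* other cases elided */
    }
}
```

With this model, decode_one_byte(0x06, 1) returns INSN_PUSH and decode_one_byte(0x06, 0) returns INSN_UD2.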

There is a little bit more documentation here.


Now let’s look at where we decode the fsubr instruction, specifically in x87.isa.

0x1B: decode OPCODE_OP_BOTTOM3 default Inst::UD2() {
      0x0: decode MODRM_REG {

All x87 instructions begin with an opcode byte in the range 0xD8 to 0xDF, so their topmost 5 bits are always 0x1B (binary 11011). That’s why the case label at the top of this snippet is 0x1B: this file only covers those opcodes.
After that, it decodes the bottom 3 bits (e.g., all zeroes in the snippet above, i.e., case 0x0, which corresponds to opcode 0xD8).
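The arithmetic is easy to check with a few lines of C. In this sketch, which relies only on the bit layout just described, x87_opcode rebuilds an opcode byte from the 0x1B prefix and a bottom-three-bit case, and is_x87_opcode tests the top five bits. Both names are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Top five bits shared by every x87 opcode byte: 0xD8..0xDF = 11011xxx. */
#define X87_TOP5 0x1Bu

/* Returns 1 if the opcode byte is one of 0xD8..0xDF. */
int is_x87_opcode(uint8_t opcode)
{
    return (unsigned)(opcode >> 3) == X87_TOP5;
}

/* Rebuilds the opcode byte for a given OPCODE_OP_BOTTOM3 case (0..7). */
uint8_t x87_opcode(unsigned bottom3)
{
    return (uint8_t)((X87_TOP5 << 3) | (bottom3 & 0x7u));
}

/* Prints which opcode byte each bottom-three-bit case in x87.isa stands for. */
void print_x87_cases(void)
{
    for (unsigned c = 0; c < 8; c++)
        printf("case 0x%X -> opcode 0x%02X\n", c, (unsigned)x87_opcode(c));
}
```

For example, case 0x0 is opcode 0xD8 and case 0x4 is opcode 0xDC.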

You can take a look at Table A-15 (page 443) in the AMD manual mentioned above for the instructions represented by the different cases of the bottom three bits. For example, FSUB and FSUBR both appear under opcodes 0xD8 and 0xDC, i.e., cases 0x0 and 0x4. To distinguish between the functionality provided by these different opcodes for the same instruction, you will have to understand the meaning of the instruction’s ModRM byte. Read about it in the manual linked above.
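The sketch below can serve as a concrete aid. It splits a ModRM byte into its three fields (mod in bits 7..6, reg in bits 5..3, rm in bits 2..0) and applies them to opcode 0xD8 only. It assumes the standard x87 encoding, in which reg = 4 selects FSUB and reg = 5 selects FSUBR under 0xD8. A mod of 3 means the second operand is the stack register st(rm), and any other mod means a 32-bit float in memory. The 0xDC forms are deliberately left out, so check those against the manual’s table. All names here are hypothetical.

```c
#include <stdint.h>

/* ModRM layout: mod = bits 7..6, reg = bits 5..3, rm = bits 2..0. */
struct modrm {
    unsigned mod, reg, rm;
};

struct modrm decode_modrm(uint8_t byte)
{
    struct modrm m = { byte >> 6, (byte >> 3) & 0x7u, byte & 0x7u };
    return m;
}

/* Hypothetical classification of the subtract forms under opcode 0xD8. */
enum d8_kind { D8_OTHER, D8_FSUB_REG, D8_FSUB_MEM, D8_FSUBR_REG, D8_FSUBR_MEM };

enum d8_kind classify_d8(uint8_t modrm_byte)
{
    struct modrm m = decode_modrm(modrm_byte);
    int is_reg = (m.mod == 3);  /* mod 3: operand is st(rm); else memory */

    switch (m.reg) {
    case 4:  return is_reg ? D8_FSUB_REG : D8_FSUB_MEM;
    case 5:  return is_reg ? D8_FSUBR_REG : D8_FSUBR_MEM;
    default: return D8_OTHER;   /* FADD, FMUL, etc. */
    }
}
```

For example, the byte 0xE9 decodes to mod = 3, reg = 5, rm = 1, so D8 E9 is FSUBR with st(0) and st(1).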

Hacking the Decoder for FSUBR

In the file x87.isa, you can check that FSUB appears in the case statements for 0x0 and 0x4. You can also observe that FSUBR’s implementation is missing: in its place there is a call to fsubr(), which is only a placeholder, not a real implementation. We’ll need to replace that…

  1. As a first step, look up the required functionality for your particular FSUBR. The opcode could be either 0xD8 or 0xDC. Note that, correspondingly, there are two versions of FSUB in gem5:
    FSUB1 and FSUB2. Then look up in the manual what that instruction should do. (It’s a bit cryptic; feel free to guess and check.) Remember that x87 has a stack of operands; “st(0)” is the top of the stack.

  2. Now, replace the placeholder fsubr() in the file x87.isa with something similar to what is specified for FSUB in the same file, i.e., something like Inst::FSUBR1(…) or Inst::FSUBR2(…). By doing so, you are asking for that instruction to be used instead of the placeholder, which simply prints a warning that the instruction is not implemented.

  3. Now, we need to provide an implementation of the new macro-op in terms of micro-ops. Again, we will mirror the implementation of FSUB1 or FSUB2 (or implement both if you like!). Go to the directory src/arch/x86/isa/insts/x87/arithmetic/, which holds the definitions of the x87 arithmetic instructions in terms of micro-ops. Take a look at how the FSUB instruction has been implemented using micro-ops. FSUB1 and FSUB2 correspond to the two different opcodes mentioned before. For each one, we have to provide three different implementations. The first (_R) only uses registers. The second (_M) reads one of the operands from memory using the address provided in the instruction. The third (_P) reads the operand from memory at an address relative to the instruction pointer. The micro-ops used in the three implementations should be straightforward to understand (just kidding; at least the parts you need to modify are fairly straightforward).
    • Note that the way gem5’s instruction parser works requires us to define all three implementations for the FSUBR instruction.
    • There is a little more documentation on gem5 microops here.
  4. Lastly, we need to provide an implementation of the micro-op subfp … but we already have one. You can check that the implementation is already available in the file src/arch/x86/isa/microops/fpop.isa, so you don’t need to do anything for this step. If you’d like to, you can create a reverse version. :p

  5. Compile gem5 for the x86 ISA to check that you did not make any mistakes in the implementation.

  6. Use your test program to verify that your new instruction and micro-op work. There should be no more warnings!
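When guessing and checking operand order in step 1, it can help to have a reference model of what each register form should compute. The sketch below is a simplified C model, not gem5 code. It uses double in place of the x87’s 80-bit registers and ignores stack pops and status flags. Its subfp function is only a stand-in for the micro-op of the same name and is assumed to compute its first operand minus its second. The model encodes the manual’s rule that FSUB computes DEST − SRC while FSUBR computes SRC − DEST. FSUBR is therefore the same subtraction with its operands swapped.

```c
/* Simplified model: st[0] is st(0), the top of the x87 register stack. */
struct x87 {
    double st[8];
};

/* Stand-in for the subfp micro-op, assumed here to compute a - b. */
double subfp(double a, double b)
{
    return a - b;
}

/* FSUB st(0), st(i):  st(0) <- st(0) - st(i) */
void fsub_st0_sti(struct x87 *fpu, int i)
{
    fpu->st[0] = subfp(fpu->st[0], fpu->st[i]);
}

/* FSUBR st(0), st(i): st(0) <- st(i) - st(0), same subtraction, operands swapped */
void fsubr_st0_sti(struct x87 *fpu, int i)
{
    fpu->st[0] = subfp(fpu->st[i], fpu->st[0]);
}

/* FSUB st(i), st(0):  st(i) <- st(i) - st(0) */
void fsub_sti_st0(struct x87 *fpu, int i)
{
    fpu->st[i] = subfp(fpu->st[i], fpu->st[0]);
}

/* FSUBR st(i), st(0): st(i) <- st(0) - st(i) */
void fsubr_sti_st0(struct x87 *fpu, int i)
{
    fpu->st[i] = subfp(fpu->st[0], fpu->st[i]);
}
```

With st(0) = 1 and st(1) = 4, FSUBR st(0), st(1) leaves 3 in st(0), while FSUB st(0), st(1) leaves -3. The test program above should observe that same difference once the new macro-op is in place.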

What to Hand In to Canvas

  1. A report PDF containing:
    • A short paragraph on your experience with gem5’s instruction decoding, or what you like/don’t like about it.
    • Source code of your test program, and output from gem5 from running it.
  2. A patch file containing the changes made to src/arch/x86/isa/insts/x87/arithmetic/ and src/arch/x86/isa/decoder/x87.isa. You can generate the patch using the command ‘git diff src/arch/x86/isa > /tmp/changes.patch’.

Please turn in the PDF separately from the patch.

How we will grade this:

80 points for completing the assignment, and 20 points for question 2.

Attribution

This is based on a gem5 101 exercise, but with added snark.