Brief Note on Simulating SPEC2017 on GEM5
This is a brief note on how to simulate SPEC2017, a standard (if complicated) CPU benchmark, on gem5.
Options
SPEC2017 has complicated compile scripts, so we provide two paths forward: an easy one that uses a prebuilt copy and a provided script, and a manual one.
Option 1: Use Prebuilt SPEC 2017 and Use Provided Script
We provide a prebuilt version of SPEC2017 and a script to run the simulation in /usr/eda/CS251A/zhengrong/spec2017. The following commands copy the folder to your docker root directory (assuming you followed the docker instructions and mounted /usr/eda/CS251A/your_username to /root in docker) and start the simulation.
# log into tetracosa first, and then copy.
cp -r /usr/eda/CS251A/zhengrong/spec2017 /usr/eda/CS251A/<YOUR USERNAME>/
# log into docker.
docker exec -ti cs251a-<YOUR USERNAME> /bin/bash
# there should be a spec2017 in /root
cd /root/spec2017
# build all benchmarks
make buildall
# simulate with gem5.
make simall
After the simulation you should see something like:
Exiting @ tick 159318111500 because a thread reached the max instruction count
This is because we limit the number of simulated instructions (otherwise it takes forever to finish). The default simall takes about 3 hours to finish on tetracosa. You can modify the variable SIM_INSTS in spec2017/Makefile to adjust the simulation time (currently 50 million instructions).
Notes:
- The simulation results are located in the spec2017 folder. For example, for lbm_s, the results can be found at:
  cd /root/spec2017/benchspec/CPU/619.lbm_s/run/run_base_refspeed_mytest-m64.0000/m5out
- You will probably want to read through spec2017/Makefile to understand what’s going on and modify the simulation options to do your own architectural exploration.
- By default this script is set up to fast-forward, i.e. use a simpler CPU to quickly skip some of the less important instructions. This can cause errors if you set CPU parameters (e.g. issue width) directly on system.cpu[i] in se.py. For example, gem5 will complain that the CPU is an AtomicSimpleCPU and has no issue width to set. The reason is that the SPEC2017 script uses fast-forwarding to skip the initialization phase, and that phase runs on an AtomicSimpleCPU, which is what system.cpu[i] refers to. The actual CPU (O3CPU) is created in Simulation.py. Specifically, in run() there is switch_cpus[i], which is the actual O3CPU; set the issue width and other parameters there. You can always print() in the Python script to verify that you have set the parameter correctly.
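The last note can be sketched as a small helper that sets parameters on a list of CPU objects and prints each value back. This is only an illustration: set_cpu_params, FakeAtomicCPU and FakeO3CPU are hypothetical names, and the two classes are stand-ins so the snippet runs without gem5. issueWidth and fetchWidth are used as examples of O3 parameters; check the O3 CPU definition in your gem5 version for the exact names.

```python
def set_cpu_params(cpus, **params):
    """Set every keyword parameter on every CPU object and echo the result."""
    for i, cpu in enumerate(cpus):
        for name, value in params.items():
            try:
                setattr(cpu, name, value)
            except AttributeError as err:
                # Typical cause: the list holds the fast-forward CPU,
                # not the detailed CPU that owns this parameter.
                raise AttributeError(
                    f"cpu[{i}] ({type(cpu).__name__}) rejected {name!r}; "
                    "is this the fast-forward CPU?"
                ) from err
            print(f"cpu[{i}] {type(cpu).__name__}.{name} = {getattr(cpu, name)}")


# Stand-in classes (NOT gem5 classes). __slots__ makes unknown attributes
# raise AttributeError, loosely mimicking a CPU model that lacks a parameter.
class FakeAtomicCPU:
    __slots__ = ()


class FakeO3CPU:
    __slots__ = ("issueWidth", "fetchWidth")


# In Simulation.py this list would be the real switch_cpus built in run().
switch_cpus = [FakeO3CPU()]
set_cpu_params(switch_cpus, issueWidth=4, fetchWidth=4)
```

Inside run() in Simulation.py, the equivalent call would presumably go right after switch_cpus is built, for example set_cpu_params(switch_cpus, issueWidth=4). The exact location depends on your gem5 version, so treat it as a starting point rather than a recipe.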
Option 2: Download yourself and install
If you want to manually do this without a script, we provide the following more detailed steps.
First, download and install SPEC 2017.
(SPEC 2017 Download – Licensed for UCLA only, technically for tetracosa)
From here, the basic workflow is to compile it, do a fake run to get the arguments for the binary, and finally simulate it in gem5. These are by no means the official instructions, nor are they guaranteed to work on your machine. You can also follow the instructions on the official SPEC2017 website.
Compile SPEC2017
First go to the SPEC2017 folder and set up the environment. This gives you many useful commands to navigate through SPEC2017, compile it, and run it.
> source shrc
Here we use lbm_s as an example. Other workloads should be similar. Now let’s go to where lbm_s is:
> go lbm_s
The first thing to do is a fake run. This lets the build system set up all the folders, inputs and so on. You can also do a full run, which takes much longer to finish, but it’s a good way to verify that your SPEC2017 installation works on your native machine.
# Remove any existing build (-f avoids an error if there is none)
> rm -rf build
> runcpu --fake --config gcc-linux-x86 lbm_s
This should create the build and run folders. Now let’s compile the program:
> cd build/build_base_mytest-m64.0000
> specmake
This should compile the benchmark and produce an lbm_s binary in the folder.
Simulate it in GEM5
First we need to get the arguments to run the binary. Go to the run directory.
> go lbm_s
> cd run/run_base_refspeed_mytest-m64.0000
> specinvoke -n
../run_base_refspeed_mytest-m64.0000/lbm_s_base.mytest-m64 2000 reference.dat 0 0 200_200_260_ldc.of > lbm.out 2>> lbm.err
This gives us the command line arguments to run lbm_s (everything between the binary name and the output redirections):
2000 reference.dat 0 0 200_200_260_ldc.of
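For other workloads it can help to extract the arguments mechanically. The following is a minimal sketch, assuming each command printed by specinvoke -n sits on one line and uses space-separated shell redirections as in the lbm_s output above. parse_specinvoke_line is a hypothetical helper, not part of SPEC or gem5.

```python
import re
import shlex

# A redirection operator standing alone as a token: >, >>, <, 2>, 2>> ...
_REDIRECT = re.compile(r"\d*(>>|>|<)")


def parse_specinvoke_line(line):
    """Split one command line into (binary, args), dropping redirections."""
    kept = []
    skip_target = False
    for tok in shlex.split(line):
        if skip_target:
            # This token is the file a redirection points at; drop it.
            skip_target = False
        elif _REDIRECT.fullmatch(tok):
            skip_target = True
        else:
            kept.append(tok)
    if not kept:
        raise ValueError("no command found in line")
    return kept[0], kept[1:]


line = (
    "../run_base_refspeed_mytest-m64.0000/lbm_s_base.mytest-m64 "
    "2000 reference.dat 0 0 200_200_260_ldc.of > lbm.out 2>> lbm.err"
)
binary, args = parse_specinvoke_line(line)
print(binary)
print(" ".join(args))  # the string to pass to se.py via --options
```

The sketch simply discards redirections. A workload that reads its input through a stdin redirection would need that file passed to the simulator some other way, which is not handled here.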
Now simulate it in gem5. This command will start to simulate lbm_s using AtomicSimpleCPU. You can also specify other CPU types and add caches, and there are detailed instructions here.
> /where/gem5/is/build/X86/gem5.opt \
> /where/gem5/is/configs/example/se.py \
> --cmd=../../build/build_base_mytest-m64.0000/lbm_s \
> --options="2000 reference.dat 0 0 200_200_260_ldc.of" \
> --mem-size=8GB
For example, to simulate using the O3 CPU:
> /where/gem5/is/build/X86/gem5.opt \
> /where/gem5/is/configs/example/se.py \
> --cmd=../../build/build_base_mytest-m64.0000/lbm_s \
> --options="2000 reference.dat 0 0 200_200_260_ldc.of" \
> --mem-size=8GB \
> --cpu-type=DerivO3CPU \
> --caches --l2cache \
> --l1d_size=32kB --l1i_size=32kB --l2_size=512kB
Finally, here is how you can fast-forward using AtomicSimpleCPU for 1 million instructions and then switch to DerivO3CPU:
> /where/gem5/is/build/X86/gem5.opt \
> /where/gem5/is/configs/example/se.py \
> --cmd=../../build/build_base_mytest-m64.0000/lbm_s \
> --options="2000 reference.dat 0 0 200_200_260_ldc.of" \
> --mem-size=8GB \
> --cpu-type=DerivO3CPU \
> --caches --l2cache \
> --l1d_size=32kB --l1i_size=32kB --l2_size=512kB \
> --fast-forward=1000000
The same warning about changing parameters with fast-forwarding and se.py from the previous section applies here as well.
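The three invocations above differ only in a few flags, so when sweeping configurations it may be convenient to assemble the command in a script. Below is a minimal sketch that uses only the flags already shown in this note. build_se_command and its keyword names are hypothetical, and /where/gem5/is is the same placeholder path as above.

```python
import shlex


def build_se_command(gem5_root, binary, options, cpu_type=None,
                     caches=False, fast_forward=None, mem_size="8GB"):
    """Return the gem5 se.py command as an argument list."""
    cmd = [
        f"{gem5_root}/build/X86/gem5.opt",
        f"{gem5_root}/configs/example/se.py",
        f"--cmd={binary}",
        f"--options={options}",
        f"--mem-size={mem_size}",
    ]
    if cpu_type:
        cmd.append(f"--cpu-type={cpu_type}")
    if caches:
        # Same cache sizes as in the examples above.
        cmd += ["--caches", "--l2cache",
                "--l1d_size=32kB", "--l1i_size=32kB", "--l2_size=512kB"]
    if fast_forward is not None:
        cmd.append(f"--fast-forward={fast_forward}")
    return cmd


cmd = build_se_command(
    "/where/gem5/is",
    "../../build/build_base_mytest-m64.0000/lbm_s",
    "2000 reference.dat 0 0 200_200_260_ldc.of",
    cpu_type="DerivO3CPU",
    caches=True,
    fast_forward=1_000_000,
)
# Print a shell-quoted version for inspection (shlex.join needs Python 3.8+).
print(shlex.join(cmd))
```

To actually launch the simulation from the run directory, the list can be handed to subprocess.run(cmd, check=True). Keeping the command as a list avoids quoting problems with the space-separated --options value.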