Thread Lab

Due: Friday, Dec 2nd, 2022, 11:55 PM PST

Notes

Introduction

In this lab, you will implement several versions of a code which computes a histogram of a set of random numbers. Histogram is a simple kernel that demonstrates the perils of parallel programming, while also having a few interesting optimizations. Also, this is a key kernel in a variety of algorithms (e.g., radix sort).

You will be optimizing for two test cases, each in their own function:

  1. a set of 100 million numbers, and a histogram of 8 buckets
  2. a set of 25 million numbers, and a histogram of 16 million buckets

The cache access behavior for each of these is quite different, so optimizations may affect them differently.

Downloading and running the Lab

A few things to try out

Here are a few things you can try out to speed up your code while ensuring its correctness (be careful of race conditions). Some of these approaches will work better than others. We don’t want to tell you exactly which ones are the best, so that you suffer through at least a little experimentation. Some approaches may work better for certain test cases. In no particular order:

As you may expect, this list only captures some optimization techniques that are helpful in general, but it is by no means complete: there are other optimizations for you to discover that may work well for particular test case(s)!

Grading Policy

Extra Credit

For this lab, you will get 10% extra credit if your code achieves a geomean speed up of 4x. Have fun!!

No competition this time, the scoreboard is only for fun (and likely many solutions should approximately tie, since the design space of possible algorithms/implementations is not that high).