Solid Angle: Magical missteps and the memory wall

Friday, December 26, 2008


Two links today:

Magical Wasteland: Serial Missteps on the Parallel Road

IEEE Spectrum: Multicore is Bad News for Supercomputers (via ArsTechnica)

I've spent a lot of time thinking about where multicore hardware is going and how that is going to affect software architectures. My current thinking is that designing software for UMA (uniform memory access) is not going to scale. This is hardly a revolutionary statement.

Let's start with the IEEE Spectrum article. It details a study by Sandia National Labs showing that, for certain applications, the performance of general-purpose multicore processors starts to deteriorate beyond 8 to 16 cores because of the memory wall. The "memory wall" is the fact that while the number of transistors we can cram onto a die keeps increasing, memory bandwidth grows at a piddling rate. Eventually, you've got a large number of cores starving for data.
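A toy model makes the "starving cores" point concrete. Assume, hypothetically, that each core needs a fixed amount of memory bandwidth to run at full speed, and that all cores share one fixed-size memory bus. Speedup then grows with core count only until the bus saturates. The function name and the numbers are made up for illustration. They are not taken from the Sandia study, and the model ignores caches and latency entirely.

```python
def bandwidth_bound_speedup(cores: int, total_bw_gbs: float, per_core_bw_gbs: float) -> float:
    """Toy model: speedup over one core when all cores share a fixed memory bus.

    Assumes each core needs `per_core_bw_gbs` to run at full speed and the
    memory system delivers at most `total_bw_gbs` in aggregate. Ignores
    caches, latency, and contention effects.
    """
    if cores < 1:
        raise ValueError("need at least one core")
    # How many cores the bus can keep fully fed (may be fractional).
    cores_fed = total_bw_gbs / per_core_bw_gbs
    # Throughput is capped by the bus; normalize by the single-core case.
    return min(cores, cores_fed) / min(1, cores_fed)


if __name__ == "__main__":
    # Hypothetical machine: 20 GB/s total, 2 GB/s needed per core,
    # so the bus saturates at 10 cores and extra cores just wait.
    for n in (1, 4, 8, 16, 32, 64):
        print(n, bandwidth_bound_speedup(n, 20.0, 2.0))
```

Under these assumptions, going from 16 to 64 cores buys nothing at all. Real chips degrade less abruptly, but the shape of the curve is the point.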

Now, an important point is that these results only apply to certain types of workloads. For instance, simulating a nuclear bomb does not hit the memory wall, because the data and calculations can be partitioned so that each core works on a small amount of memory corresponding to a small spatial region. You don't use much memory bandwidth, because all of your data is in cache and you are happy.
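Spatial partitioning of this kind can be sketched as splitting a 2D grid into square tiles small enough that one tile's data fits in a per-core cache budget, then processing each tile on its own. The 32 KB budget, the 8-byte cells, and all function names are hypothetical. The per-tile "work" is just a sum standing in for a real simulation step. A real solver would also need halo (ghost) cells to exchange boundary data between neighboring tiles, which this omits.

```python
import math


def tile_side(cache_bytes: int, bytes_per_cell: int) -> int:
    """Largest square tile side whose cells fit in `cache_bytes`."""
    return max(1, math.isqrt(cache_bytes // bytes_per_cell))


def tiles(width: int, height: int, side: int):
    """Yield (x0, y0, x1, y1) bounds of tiles covering a width x height grid."""
    for y0 in range(0, height, side):
        for x0 in range(0, width, side):
            yield x0, y0, min(x0 + side, width), min(y0 + side, height)


def tiled_sum(grid, side):
    """Process a row-major grid one tile at a time, returning per-tile results."""
    height, width = len(grid), len(grid[0])
    partials = []
    for x0, y0, x1, y1 in tiles(width, height, side):
        # Only this tile's cells are touched here, so its working set stays small.
        partials.append(sum(grid[y][x] for y in range(y0, y1) for x in range(x0, x1)))
    return partials


if __name__ == "__main__":
    # Hypothetical 32 KB per-core budget with 8-byte cells gives 64x64 tiles.
    print(tile_side(32 * 1024, 8))
    grid = [[x + 10 * y for x in range(10)] for y in range(7)]
    print(tiled_sum(grid, 3))
```

Because each tile is independent, tiles can be handed to different cores, and no core ever needs more than one tile's worth of data at a time.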

What the study says is that for applications which traverse massive amounts of data that cannot be sufficiently partitioned, performance goes out the window. The article then discusses fancy hardware (stacked memory) that may fix the problem. This hardware is far enough in the future that I'm not going to bank on it.

The challenge for game programmers with future multicore designs will be to architect our systems and engines around well-partitioned, locally coherent memory access. This will become increasingly important as the number of cores increases. Right now, with systems such as the Xbox 360 having only 6 hardware threads (on 3 cores), or consumer PCs having only 4 cores, memory bottlenecks, while troublesome, are not as severe as they will be with 16, 32, or 64 cores.
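One widely used way engines get locally coherent access (a technique not named in the post) is to store hot fields contiguously, as a structure of arrays (SoA), instead of interleaving them with cold fields in an array of structures (AoS). The sketch below just counts the distinct cache lines a position-update pass would touch under each layout. The 64-byte cache line, 64-byte entity record, and 12-byte position (three 4-byte floats) are all assumptions for illustration.

```python
LINE = 64  # assumed cache line size in bytes


def lines_touched(byte_ranges):
    """Count distinct cache lines covered by a list of (offset, length) byte ranges."""
    lines = set()
    for offset, length in byte_ranges:
        first = offset // LINE
        last = (offset + length - 1) // LINE
        lines.update(range(first, last + 1))
    return len(lines)


def aos_position_reads(n, entity_size=64, pos_size=12):
    """Array of structures: each position sits at the start of its own entity record."""
    return [(i * entity_size, pos_size) for i in range(n)]


def soa_position_reads(n, pos_size=12):
    """Structure of arrays: all positions are packed back to back."""
    return [(i * pos_size, pos_size) for i in range(n)]


if __name__ == "__main__":
    # Updating 1000 positions: AoS pulls in a whole line per entity,
    # SoA streams through a contiguous block of packed positions.
    print(lines_touched(aos_position_reads(1000)))  # 1000
    print(lines_touched(soa_position_reads(1000)))  # 188
```

Fewer lines touched means less bus traffic per unit of useful work, which is exactly the resource that gets scarcer as cores are added.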

Which brings me back to why designing for UMA is troublesome. UMA makes the promise that you can access any memory you need at any time, which is true, but it makes no promise about how fast that access will be. Cache logic is not going to be sufficient to hide all the latency of poorly designed algorithms and systems. Obviously, this has been true for many years even with single-core designs, but the point is that the problem is about to get many times worse.
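The standard average-memory-access-time (AMAT) formula shows why caches alone cannot rescue a poorly laid-out algorithm. The miss penalty is multiplied by the miss rate m, so a memory-hostile access pattern quickly dominates the cost. The hit time of 4 cycles and miss penalty of 200 cycles below are illustrative assumptions, not measurements of any particular chip.

```latex
\begin{aligned}
\text{AMAT} &= t_{\text{hit}} + m \, t_{\text{miss}} \\
m = 0.02:\quad \text{AMAT} &= 4 + 0.02 \times 200 = 8 \text{ cycles} \\
m = 0.20:\quad \text{AMAT} &= 4 + 0.20 \times 200 = 44 \text{ cycles}
\end{aligned}
```

Under these assumptions, raising the miss rate from 2% to 20% makes the average access five and a half times slower. With more cores contending for the same memory, the effective miss penalty tends to grow as well.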

If you design algorithms so that they can run on processors with a small local memory, they will also run well on UMA designs. The reverse is not true. Local memory designs force you to make good decisions about memory access, and they also prepare you for architectures that are not UMA, which, for all we know, the next crop of consoles may very well be.
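Designing for a small local memory roughly means three steps: copy a bounded chunk of input into a local buffer, compute only on that buffer, and copy the results back out, never reaching into main memory mid-computation. The sketch below mimics that discipline on an ordinary machine. The function name, `local_capacity`, and the `kernel` callback are hypothetical. The explicit slice copies stand in for the DMA transfers a Cell SPU would issue, and real code would usually double-buffer to overlap transfers with computation.

```python
from typing import Callable, List, Sequence


def run_with_local_store(
    data: Sequence[float],
    local_capacity: int,
    kernel: Callable[[List[float]], List[float]],
) -> List[float]:
    """Process `data` in chunks that never exceed `local_capacity` elements.

    `kernel` only ever sees the local buffer, so it cannot wander off into
    the rest of main memory; a local-store machine enforces this in hardware.
    """
    if local_capacity < 1:
        raise ValueError("local_capacity must be positive")
    out: List[float] = []
    for start in range(0, len(data), local_capacity):
        # "DMA in": copy one bounded chunk into the local buffer.
        local = list(data[start:start + local_capacity])
        # Compute entirely on local data.
        result = kernel(local)
        if len(result) != len(local):
            raise ValueError("kernel must return one output per input")
        # "DMA out": write the results back to main memory.
        out.extend(result)
    return out


if __name__ == "__main__":
    def scale(buf: List[float]) -> List[float]:
        return [2.0 * x for x in buf]

    print(run_with_local_store([1.0, 2.0, 3.0, 4.0, 5.0], 2, scale))
    # [2.0, 4.0, 6.0, 8.0, 10.0]
```

Code written this way runs unchanged on a UMA machine. There, the chunking simply keeps the working set cache-sized.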

So what does all of this have to do with the Magical Wasteland article on the PS3? Due to NDAs, I won't comment on the article beyond saying I think its characterization of the PS3 is fair. But I will say this: it may very well be that five or ten years from now, we realize the Cell architecture was ahead of its time. Maybe not in its specific implementation, but in keeping alive the idea that, hey, maybe touching every cache line in the system in a garbage collection algorithm isn't the best of ideas. Because sooner or later, we'll all be programming with a local memory model, even if we don't actually have local memory.
