Tuesday, December 30, 2008

XNA Studio 3.0: First impressions

I've been doing some graphics prototyping in my spare time lately, and I decided to see if there was a better way to go about it.

I've used RenderMonkey in the past, and while it certainly has its uses, ultimately it left me disappointed. For a straightforward shader it's fine, but when you start getting into more complicated techniques it starts to break down. After clicking through a billion buttons to get my render targets and passes set up the way I needed, I really wished I could just write some code. The lack of any ability to compute anything on the CPU side is what really frustrated me. What I ended up doing was computing a lot of things shader-side that in a real application would be done on the CPU, and it needlessly complicated the shaders.

Another avenue I've pursued is using the DirectX sample framework or an OpenGL sample, and working from there. Even in these simplified environments I find you end up doing a lot more bookkeeping than actual coding. Additionally, OpenGL seems very unstable on my laptop -- even vanilla samples are crashing in the Nvidia DLLs.

The last couple of days I've been playing around with Microsoft's XNA Studio 3.0 as a graphics prototyping tool, and my impression so far is very favorable. The API is pretty straightforward, and for the most part the abstractions seem to be in the right place. Porting my current project from C++ to C# took no time at all, and so far I've spent much more time writing meaningful code than building scaffolding.

The GameComponent architecture they have is interesting -- for example, I found myself needing an orbit camera. Rather than write one myself, I just grabbed a component someone else had written. It was one of those rare times where the code just dropped in. The only drawback is it doesn't take input from the 360 controller, but that's easy enough for me to write and a lot less involved than doing the whole thing.
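For reference, the core of an orbit camera is just a bit of spherical-coordinate math. Here's a rough sketch in C++ rather than the component's actual C# -- the names and conventions are mine, not the component's API:

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

// Hypothetical orbit-camera helper (not the real component's code):
// yaw/pitch/distance around a target point, converted to an eye position
// you would then feed into a standard look-at matrix.
Vec3 OrbitEye(const Vec3& target, float yaw, float pitch, float distance)
{
    Vec3 eye;
    eye.x = target.x + distance * std::cos(pitch) * std::sin(yaw);
    eye.y = target.y + distance * std::sin(pitch);
    eye.z = target.z + distance * std::cos(pitch) * std::cos(yaw);
    return eye;
}
```

Hooking up the 360 controller is then mostly a matter of mapping thumbstick deltas onto yaw, pitch, and distance each frame.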

I was also surprised how easy it was to get my little project up and running on my Xbox 360. I didn't have to make any code changes, and it all Just Works. The debugging is solid, but experienced Xbox 360 developers will miss all the nice tools you get with the real SDK. It'd be really nice if Microsoft released a lightweight version of PIX for XNA that worked with the 360, but I guess you can't have everything.

There have been some minor annoyances. Some of the C# syntax for dealing with vertex arrays can be cumbersome -- new'ing a Vector3 never feels right to me. You also lose access to some hardware features. For example, for some reason it thinks I can't create a floating point depth buffer on my laptop, when I'm very certain the GPU I have can handle that. 

On the 360 side, they simplify a lot of the hardware details, but this has limitations. I can't find any way to resolve the 360's depth buffer to a texture in XNA. While these limitations are understandable given the intended audience of XNA, they are still somewhat annoying.

All in all, I think it is a pretty good framework so far. I don't think you are going to max out the hardware with it, but for a large category of games it will work really well. 

Monday, December 29, 2008

Interesting GDC Talk #2: Hitting 60Hz with the Unreal Engine

Hitting 60Hz with the Unreal Engine: Inside the Tech of MORTAL KOMBAT vs DC UNIVERSE

This is another talk I will be attending. I work at the same studio as Jon and have had more than a few conversations with him about this very topic, and he has given internal presentations on the subject. I don't want to give too much of the talk away, but it is a very interesting case study in taking a graphics pipeline designed entirely for a 30Hz game and modifying it to meet a game's performance goals without sacrificing a whole bunch of functionality. Just one example: when we originally got our hands on Unreal, the sum of all post processing took nearly half of a 60Hz frame. While Epic has optimized this over time, it gives an idea of some of the challenges on the GPU. It is easy enough to remove functionality until it runs fast enough, but the real trick is to preserve quite a bit of that functionality and still get it under time.

The section on porting the Unreal particle system to the SPU is definitely worth paying attention to - this was a collaborative effort done by a few tech group guys at Chicago. It underscores the difficulty of dealing with legacy code bases that are not designed for NUMA architectures, and the ingenuity and hard work required to overcome that hurdle. This talk should definitely be on your list.

Sunday, December 28, 2008

Interesting GDC talk #1: Light Pre-pass Renderer

The Light Pre-Pass Renderer: Renderer Design for Multiple Lights

GDC is fast approaching and I'll be attending this year. One of the talks I'm looking forward to is Wolfgang Engel's talk on a novel renderer design. I've been following his various entries on this topic for the last year. Additionally, if you have access to the PS3 dev site, the Uncharted guys gave a talk on a very similar renderer.

The technique is a hybrid of forward rendering and deferred rendering. In a forward renderer you evaluate material and lighting properties simultaneously; in a deferred renderer you evaluate material properties into a G-Buffer, and then evaluate each light by sampling that buffer.

In the light pre-pass renderer, this is broken down further -- evaluate the bare minimum material properties needed for the lighting equation, then evaluate the lights into an accumulation buffer, then apply the lighting in a final geometry pass. While you still have one more pass than in a typical deferred setup, the light pre-pass renderer gives you a lot more flexibility for materials.

I like this approach over pure forward or deferred rendering because I think it fits the box today's consoles lay out very nicely. Deferred rendering requires a huge number of render targets. Forward rendering requires either huge, complicated shaders or more rendering passes, and neither of those scales well as you add more lights or more material types. The light pre-pass approach gives you the linear scalability of deferred rendering while allowing the material variety of forward rendering.

When it comes down to it, what is most interesting is the array of choices you have when implementing the technique. The technique can work well with statically computed lighting such as lightmaps if you choose to go that way. You have different choices in how to approximate specular and whether to accumulate it in a separate or combined RT. You have a variety of choices in how fully you support MSAA. Particularly when you have a cross-platform renderer, having this sort of flexibility without massive rearchitecture on each platform can make things a lot easier. I plan to explore this technique further.

Friday, December 26, 2008

Magical missteps and the memory wall

Two links today: an IEEE Spectrum article on the multicore memory wall, and a Magical Wasteland article on the PS3.

I've spent a lot of time thinking about where multicore hardware is going and how that is going to affect software architectures. My current thinking is that designing software for UMA (uniform memory access) is not going to scale. This is hardly a revolutionary statement.

Let's start with the IEEE Spectrum article first. It details a study by Sandia National Labs showing that the performance of general-purpose multicore chips really starts to deteriorate beyond 8-16 cores for certain applications, due to the memory wall. The "memory wall" refers to the fact that while our ability to cram transistors onto a die keeps ever-increasing, memory bandwidth grows at a piddling rate. Eventually, you've got a large number of cores starving for data.

Now, an important caveat is that these results only apply to certain types of workloads. For instance, simulating a nuclear bomb does not hit the memory wall, because the data and calculations can be partitioned into a small amount of memory corresponding to a small spatial area. You don't use much memory bandwidth because all of your data is in-cache, and you are happy.

What the study says is that for certain types of applications which involve traversing massive amounts of data and cannot be sufficiently partitioned, your performance goes out the window. It then goes on to talk about fancy hardware (stacked memory) that may fix the problem. This hardware is sufficiently far in the future that I'm not going to bank on it.

The challenge for game programmers with future multicore designs will be to architect our systems and engines to use well-partitioned and locally coherent memory access. This will become increasingly important as the number of cores increases. Right now, with systems such as the 360 having only 3 cores (6 hardware threads), or consumer PCs only having 4 cores, memory bottlenecks, while troublesome, are not at the severity they will be with 16, 32, or 64 cores.

Which brings me back to why designing for UMA is troublesome. UMA makes the promise that you can access any memory you need at any time -- which is true, but it doesn't make any promises about how fast that access will be. Cache logic is not going to be sufficient to hide all latencies of poorly designed algorithms and systems. Obviously, this has been true for many years even with single core designs, but the point is the problem is about to get many times worse. 

If you design algorithms such that they can run on processors with a small local memory, they will also run well on UMA designs. The reverse is not true. Local memory designs force you to make good decisions about memory access, and also prepare you for architectures which are not UMA -- which, for all we know right now, the next crop of consoles may very well be.
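As a toy illustration of the local-memory style, here's a C++ sketch that streams data through a small scratch buffer -- a stand-in for an SPU local store or a per-core slice of cache. The names and sizes are made up; the point is that all the actual work happens inside a fixed, small footprint:

```cpp
#include <algorithm>
#include <cstring>

const int kChunk = 256; // size of our pretend "local store", in elements

// Scale an arbitrarily large array, but only ever touch kChunk elements
// at a time: copy in, compute locally, copy out. On a real SPU the two
// memcpys would be DMA transfers (and typically double-buffered).
void ScaleStreamed(const float* in, float* out, int count, float s)
{
    float local[kChunk];
    for (int base = 0; base < count; base += kChunk)
    {
        int n = std::min(kChunk, count - base);
        std::memcpy(local, in + base, n * sizeof(float));  // "DMA in"
        for (int i = 0; i < n; ++i)
            local[i] *= s;                                 // compute locally
        std::memcpy(out + base, local, n * sizeof(float)); // "DMA out"
    }
}
```

An algorithm written this way runs fine on a UMA machine (the chunks are trivially cache-resident), but a pointer-chasing traversal of a big shared heap does not port the other direction.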

So what does all of this have to do with the Magical Wasteland article on the PS3? Due to NDAs, I won't comment on the article beyond saying I think its characterization of the PS3 is fair. But I will say this: it may very well be the case that five or ten years from now we realize that the Cell architecture was ahead of its time. Maybe not in the specific implementation, but just keeping the idea alive that, hey, maybe touching every cache line in the system in a garbage collection algorithm isn't the best of ideas. Because sooner or later, we'll all be programming with a local memory model, even if we don't actually have local memory.

Thursday, December 25, 2008

Game Architect on Good Middleware

An excellent article on what makes good middleware was posted on Game Architect. Kyle, the author, hits the major points -- anyone who's used bad middleware has certainly run into these issues. 

I would make a distinction between middleware and engines. Something like Unreal is an engine. It isn't something you integrate into your application, it forms the framework of your application. Because it is a framework, it has made some architectural decisions for you: memory, I/O, the data pipeline, etc. 

Because an engine enforces architecture, the questions used to evaluate one are different (and probably the subject of another blog entry). The advantage of a complete engine is tight integration and a lot less work to get up and running, particularly on the tool side of things. The disadvantage is the engine may have made architectural decisions that don't fit your game. Like most engineering decisions, it is a tradeoff everyone has to weigh for themselves.


Welcome to Solid Angle, a blog about game programming. 

I've been programming in the game industry for about nine years, and I still love it. 

It's been a while since I ran a blog. My last one was less focused, and had topics ranging from politics to programming to how to shuffle poker chips (one of my more popular entries). This one will be strictly about game development.

Initially, my entries will probably just be links to and comments on other game development blogs. I've found quite a lot of very interesting blogs on game development and have always wanted a place that aggregated them. After looking around for one, I decided to make one (I'm sure there are others, I just haven't found them). I may branch out into more original content later as I have things to write about.

Anyway, I hope you stick around.