Sunday, June 21, 2009

Leaky abstractions in XNA

So continuing my exploration of XNA, this weekend I did some more work on my little toy project.

The first thing I did was get it running on 360. I was happy to see that XNA seems to be able to figure out how to deal with my various render targets, including one MRT, without too much trouble, and the performance was far superior on the 360 than on my laptop. I get about 200 fps on the 360 vs 60 on the laptop.

There was one issue worth noting.

First, some background on the deferred lighting approach I am using:

  1. Render normal + depth into a G buffer for all primitives. Depth writes and tests are enabled in this step.
  2. Render the lights into a lighting buffer using the G buffer. Depth writes and tests are disabled for this pass.
  3. Apply the lighting to each primitive using the lighting from step 2 while computing albedo and (eventually) other material properties on the fly. Depth tests are enabled but not writes.

So the first problem on the 360 is XNA blows away the depth buffer I lay down in step 1 by the time I get to step 3. After some searching on the internets, I discovered this is expected behavior.

I tried setting my render targets to PreserveContents, which does work, but is completely wasteful since I don't give a hoot about restoring the actual color contents of any of these buffers. This dipped performance down to 150fps.

My next attempt was to restore the depth buffer manually from my G Buffer. But this was exhibiting z-fighting, possibly due to slightly different methods of Z calculation for my G-Buffer vs the depth-buffer leading to small differences in the computed Z values. I didn't feel that messing around with z biasing would be a robust solution, so I abandoned this effort.

The solution I ended up choosing was to just clear the z buffer again and reconstruct it during step #3. Since my scenes are so simple this gets me back to just slightly under 200 fps.

It's not an ideal solution, since I had in mind some uses for a stencil buffer laid down in step #1 that would accelerate step #2 (mainly, masking off unlit pixels for the skybox).

XNA's EDRAM handling is a great example of a leaky abstraction. Only having a 10 MB EDRAM buffer does make render target management trickier, but in Microsoft's attempt to completely hide it from XNA programmers, I think they've just made things more frustrating. The concept of a limited buffer for render targets is not that hard to get your head around. You have to understand EDRAM anyway since techniques in XNA that work perfectly on Windows (like what I was doing) will break on the 360. Even worse, you have no real good idea *why* it's breaking unless you understand the limitations of EDRAM and take a guess at what Microsoft is doing under the hood. So what is really being saved here? Just let me deal with EDRAM myself.

Sunday, June 14, 2009

Adventures in XNA continued

This weekend I played around in XNA a little bit more (completely personal stuff, nothing to do with work, opinions are my own, etc). I'm still find it very fun for the most part but the lack of access to the metal can be frustrating at times.

For the most part I've just been experimenting with deferred lighting. As far as what I'm trying to accomplish, I view this stuff like a musician doing scales. Good practice, but the goal is to get familiar with the techniques rather than produce anything "real".

I'd already built up a quick and dirty deferred lighting implementation a couple months before. This weekend I removed some of the hacks I had, added HDR + bloom, threw in some simple terrain, played around with LogLuv encoding, and fixed some artifacts from my first pass implementation.

I suppose that sounds a lot but the nice thing about XNA is there are a bazillion samples out there. The deferred lighting is the thing I'm really concentrating on, so for the other stuff I just grabbed anything I could find. Terrain, and HDR and bloom came pretty much as-is from samples/demos, as did a FPS counter.

As far as the deferred lighting goes, I finally got the half texel offset stuff cleared up. In Direct3D 9, pixel coordinates and texture coordinates don't line up, so when doing something like sampling a normal buffer or lighting buffer, if you don't offset them properly you'll be sampling the lighting from the wrong texel. This entry by Wolfgang Engel was a big help here.

Reading Engel's ShaderX7 article, I also understood why the specular lighting has to be multiplied by n dot l, and fixed up some artifacts I would have due to that (mainly specular appearing on backfaces).

My first pass at HDR used FP16 render targets for everything. I changed the final apply lighting pass to encode into LogLUV, and then implemented the encoding for the lighting buffer suggested by Pat Wilson's article in ShaderX7. A side effect of the very simple material model I'm using allowed me to use a 8:8:8:8 buffer for this and still allow for high range when accumulating lighting. I currently don't have separate diffuse and specular albedo, so when I apply lighting the equation looks like this:

albedo*(diffuseLighting  + specularLighting)

This is part of the joy of a small demo - no artists telling me they have to have a separate specular albedo :). Anyway, I realized that I can just add those together before writing to the lighting buffer, and just do a straightforward encoding of the result in LogLUV space. I do want to put control of the glossiness in the material model, but that will require encoding the depth into 24 bits of a render target and then including an 8 bit specular power in the remainder. (I have to render depth to a separate target because XNA gives no facility for sampling the depth buffer).

In the process of all of this I wasn't quite getting the performance I wanted. I'm doing this on my Dell M1330 laptop which while no slouch, has trouble running World of Warcraft at a decent frame rate. But given such a simple scene, I was just shy of 60 fps so I decided to see if I could fire up NVPerfHUD and see what was going on. You can run NVPerfHUD with XNA apps, but a side effect I discovered is all vertex processing is done in software.

This is a bummer since it greatly throws off timings (a draw call for one mesh took 5 ms on the CPU for a unbelievably simple vertex shader), but I was able to find some GPU hotspots, some of which I at improved by pulling stuff out of the shader and on to the CPU.

Anyway, not sure how much I'll be working on this stuff but when I do I'll try to put the odd post up here. I haven't tried running my latest incarnation on the 360, which will probably be my next step. I think I've got the render targets set up so it should work, assuming the XNA runtime isn't doing anything retarded under the hood. But without PIX on the 360 it'll be hard to really dig into that.