Sunday, August 23, 2009
I'm not sure why, but Microsoft seems intent on crippling XNA for the 360. Perhaps they want to sell more dev kits.
I recently had some more time to work on my little toy project, and I've now got a deferred lighting implementation running on the PC.
For the lighting buffer construction, at first I was using a tiled approach similar to Uncharted's, which did not require blending during the lighting stage. It worked for the most part, and allowed me to use LogLuv for encoding the lighting information, which was faster. But it had issues: I didn't have any lighting-target ping-ponging set up, so I was stuck with a fixed limit of seven lights per tile. Also, even with smallish tiles, you end up doing a lot of work on pixels not actually affected by the lights in question. So I wanted to compare it to a straightforward blending approach, switched back to an FP16 target, and rendered the light volumes directly (using the stencil approach detailed in ShaderX7's Light Pre-Pass article).
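For flavor, here is a rough sketch of one common variant of the stencil light-volume technique, in XNA 3.1 terms. DrawLightVolume() is a placeholder for drawing the light's convex bounding mesh; the winding is assumed clockwise-front, XNA's default.

```csharp
RenderState rs = device.RenderState;
device.Clear(ClearOptions.Stencil, Color.Black, 1.0f, 0);

// Pass 1: mark pixels where scene geometry sits in front of the volume.
// Draw front faces with color writes off; bump stencil on depth-test fail.
rs.ColorWriteChannels = ColorWriteChannels.None;
rs.DepthBufferWriteEnable = false;
rs.StencilEnable = true;
rs.StencilFunction = CompareFunction.Always;
rs.StencilDepthBufferFail = StencilOperation.Increment;
rs.CullMode = CullMode.CullCounterClockwiseFace;   // front faces
DrawLightVolume();

// Pass 2: draw back faces where back-face depth >= scene depth and the
// stencil is still zero -- i.e. pixels actually inside the volume -- and
// blend the light additively into the FP16 lighting buffer.
rs.ColorWriteChannels = ColorWriteChannels.All;
rs.CullMode = CullMode.CullClockwiseFace;          // back faces
rs.DepthBufferFunction = CompareFunction.GreaterEqual;
rs.StencilFunction = CompareFunction.Equal;
rs.ReferenceStencil = 0;
rs.StencilDepthBufferFail = StencilOperation.Keep;
rs.AlphaBlendEnable = true;
rs.SourceBlend = Blend.One;                        // additive accumulation
rs.DestinationBlend = Blend.One;
DrawLightVolume();
// (restore DepthBufferFunction, CullMode, and blending afterwards)
```

Note the usual caveat: when the camera is inside the volume, the front faces get clipped by the near plane and this needs special-casing.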
So this all worked great, and my little toy is rendering 100 lights. Of course, on the 360, there's a problem. Microsoft, in its infinite wisdom, decided that the FP10 buffer format on the 360 would blow people's minds, so it is not supported in XNA. Instead, XNA uses an actual FP16 target, which does not support blending.
So I guess it is going to be back to alternate lighting-buffer encoding schemes, bucketing, and render-target ping-ponging for me. It's not a huge deal, but it is frustrating.
It is a real shame that XNA gives the impression that the 360 GPU is crippled, when in reality it is anything but. Couple the lack of FP10 support with the inability to sample the z-buffer directly and the lack of control over XNA's use of EDRAM, and they've managed to turn the 360 into a very weak, very old PC.
Least common denominator approaches generally haven't fared that well over the years. An XBLA title implemented in XNA is going to be at a fundamental disadvantage -- I don't think you are going to see anything approaching the richness of Shadow Complex, for example.
At the end of the day, Microsoft needs to figure out where they are going with XNA. If they are going to dumb it down and keep it as a toy for people who can't afford a real development kit (people who've been bumping into these low ceilings much longer than me), then they should keep on their current path.
The potential for XNA is really much more, though. Today I wrote a pretty decent menu system in about 45 minutes, one that handles gamepad, keyboard, and mouse input seamlessly. I don't think I could write that anywhere near as fast in C++/DirectX. If you start looking down the road to future generations of hardware, I'm not worried about the overhead of C# being fundamentally limiting. Games today already use much less efficient scripting languages than C#, and while you are limited to the heavy lifting Microsoft has chosen to implement for you today, who is to say that a future version of XNA couldn't allow shelling out to C++ for really performance-intensive stuff?
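A minimal sketch of the kind of input unification that makes this so quick in XNA. MenuAction and MenuInput are my own hypothetical names; only the GamePad/Keyboard state calls are real XNA API. A real version would also compare against last frame's state for edge-triggering, and hit-test the mouse position against menu items.

```csharp
using Microsoft.Xna.Framework;
using Microsoft.Xna.Framework.Input;

enum MenuAction { None, Up, Down, Select, Back }

static class MenuInput
{
    // Poll both devices and collapse them into one logical menu action.
    public static MenuAction Poll()
    {
        KeyboardState kb = Keyboard.GetState();
        GamePadState pad = GamePad.GetState(PlayerIndex.One);

        if (kb.IsKeyDown(Keys.Up) || pad.DPad.Up == ButtonState.Pressed)
            return MenuAction.Up;
        if (kb.IsKeyDown(Keys.Down) || pad.DPad.Down == ButtonState.Pressed)
            return MenuAction.Down;
        if (kb.IsKeyDown(Keys.Enter) || pad.Buttons.A == ButtonState.Pressed)
            return MenuAction.Select;
        if (kb.IsKeyDown(Keys.Escape) || pad.Buttons.B == ButtonState.Pressed)
            return MenuAction.Back;
        return MenuAction.None;
    }
}
```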
XNA has a chance to become something really great that would be very powerful for a large class of games. It remains to be seen if Microsoft will let it.
Wednesday, August 19, 2009
One has to have had inflated expectations to experience disillusionment
A colleague sent along this item, which asks if Transactional Memory is beyond the "trough of disillusionment".
I've never had any expectations that STM would be some silver-bullet solution to concurrency; from the get-go I viewed it as just another tool in the toolbox. Granted, it is a technique that I haven't had much practical experience with yet -- it's on my TODO list. Others might disagree with me, but I'm not even sure how much of a major factor it is going to be in writing games. Of course, if some major piece of middleware is built around it, I suppose a lot of people will end up using STM, but that doesn't necessarily make it a good idea.
The latest piece of evidence against STM as a silver bullet comes from conversations I've had with colleagues and friends who have a lot of experience building highly-scalable web or network servers. STM advocates hail transactions as a technique with decades of research, implementation, and use. About this they are correct. The programming model is stable, and the problems are well known. But what has struck me is how often my colleagues with much more experience in highly-scalable network servers try to avoid traditional transactional databases. If data can be stored outside of a database reliably, they do so. There are large swaths of open source software devoted to avoiding transactions with the database. The main thrust is to keep each layer independent and simple, and talk to a database as little as possible. The reasons? Scalability and cost. Transactional databases are costly to operate and very costly to scale to high load.
I found the link above a little too dismissive of the costs of STM, particularly with memory bandwidth. I've already discussed the memory wall before, but I see this as a serious problem down the road. We're already in a situation where memory access is a much more serious cost to performance than the actual computation we're doing, and that's with a small number of cores. I don't see this situation improving when we have 16 or more general-purpose cores.
A digression about GPUs. GPUs are often brought up as a counter-argument to the memory wall since they already have a very large number of cores. But GPUs also have a very specialized memory access pattern that allows for this kind of scalability: for any given operation (i.e., a draw call), they generally have a huge amount of read-only data and, compared to the read set, a relatively small amount of data they write to. Those two data areas are not the same within a draw call. With no contention between reads and writes, they avoid the memory issues that a more general-purpose processor would have.
STM does not follow this memory access model, and I do not dismiss the concerns about having to do multiple reads and writes per transaction. Again, we are today in a situation where even a single read or write is already hideously slow. If your memory access patterns are already bad, spreading them across more cores and doubling or tripling the memory bandwidth demand isn't really going to help. Unlike people building scalable servers, we can't just spend some money on hardware -- we've got a fixed platform and have to use it the best we can.
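To make that concrete, here is a toy C# illustration (my own, not an STM library) of the optimistic read-compute-validate-retry shape a transaction takes, for a single word. An STM transaction does this bookkeeping across its entire read and write set, which is where the extra memory traffic comes from.

```csharp
using System.Threading;

static class Wallet
{
    static int gold;

    public static void Spend(int cost)
    {
        while (true)
        {
            int seen = gold;          // "read set": observe current value
            int want = seen - cost;   // compute the tentative result
            // Commit only if nothing changed underneath us; else retry.
            if (Interlocked.CompareExchange(ref gold, want, seen) == seen)
                return;
        }
    }
}
```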
I don't think that STM should be ignored -- some problems are simpler to express with transactions than with alternatives (functional programming, stream processing, message passing, traditional locks). But I wouldn't design a game architecture around the idea that all game code will use STM for all of its concurrency problems. To be fair, Sweeney isn't proposing that either, as he proposes a layered design that uses multiple techniques for different types of calculations.
What I worry about, though, is that games are often written in a top-down fashion, with the needs at the gameplay level dictating the system support required. If at that high level the only tool on offer is STM, with the expectation that it is always appropriate, it will be easy to end up in a situation where refactoring that code to use other methods -- for performance or fragility reasons -- is far more difficult and expensive than if the problem had been tackled with a more general toolbox in the first place.
Concurrency is hard, and day to day I'm still dealing with the problems of the now, rather than four or five years down the road. So I will admit I have no fully thought out alternative to offer.
The one thing I think we underestimate is the ability of programmers to grow and tackle new challenges. The problems we deal with today are much harder and much more complex than those of just a decade ago. Yes, the tools for dealing with those problems are better, but the current set of tools for dealing with concurrency is weak.
That means we need to write better tools -- and more importantly, a better toolbox. Writing a lock-free single-writer/single-reader (sw/sr) queue is much harder than using one (see the sketch below). What I want is a bigger toolbox that includes a wide array of solutions for tackling concurrency (including STM), not a fruitless search for a silver bullet that I don't think exists, and not a rigid definition of what tools are appropriate for different types of game problems.
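For reference, a minimal sw/sr ring buffer sketch in C#, assuming exactly one producer thread and one consumer thread. Each volatile index is written by only one thread, which is what makes this safe without locks; the volatile write to tail publishes the element written just before it.

```csharp
class SwSrQueue<T>
{
    private readonly T[] buffer;
    private readonly int mask;
    private volatile int head;   // advanced only by the consumer
    private volatile int tail;   // advanced only by the producer

    public SwSrQueue(int capacityPowerOfTwo)
    {
        buffer = new T[capacityPowerOfTwo];
        mask = capacityPowerOfTwo - 1;
    }

    public bool TryEnqueue(T item)          // producer thread only
    {
        if (tail - head > mask) return false;   // full
        buffer[tail & mask] = item;
        tail = tail + 1;                    // publish after the element write
        return true;
    }

    public bool TryDequeue(out T item)      // consumer thread only
    {
        if (head == tail) { item = default(T); return false; }  // empty
        item = buffer[head & mask];
        head = head + 1;
        return true;
    }
}
```

Using it is trivial -- TryEnqueue on the producer, TryDequeue on the consumer -- which is exactly the asymmetry I mean: the ten minutes it takes to use hides the care it takes to write.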
Thursday, August 13, 2009
Diminishing returns
One thing that has been on my mind since SIGGRAPH is the problem diminishing returns poses: when do you abandon an approach, algorithm, or model because the gains still to be had from it are shrinking?
The specific thing that has got me thinking about this is the rapid approach of fully programmable GPUs. So far this is not looking like another evolutionary change to the venerable D3D/OpenGL programming model; it will in fact be a radical change in the way we program graphics. Which is just another way of saying it will be a change in the way we *think* about graphics.
At SIGGRAPH there was a panel of various industry and academic luminaries discussing the ramifications -- is the OpenGL/D3D model dead? (not yet), what will be the model that replaces it? (no one knows), is this an interesting time to be a graphics programmer? (yes). A colleague pointed out that the panel lacked a key constituency: a representative from a game studio that's just trying to make a game without a huge graphics programming team. The old model is on its last legs, and the new world is so open that to call it a "model" would be an insult to programming models. If you're an academic or an engine maker, this doesn't present a problem; in fact, it is a huge opportunity -- back to the old-school, software-renderer days. Anything's possible!
But for your average game developer, it could mean you are one poor middleware choice away from disaster. You don't have the resources of the engine creators, so being ripped asunder from the warm embrace of the familiar D3D/OpenGL model can be a little terrifying. To put it another way: the beauty of a model like D3D/OpenGL is that no matter what engine or middleware you use, when it comes to the renderer, there is a lot of commonality. In this new world, there are a bunch of competing models or approaches -- that's part of the point. Engine creators will have a bevy of approaches to choose from -- but if you're just trying to get a game done, and you find your engine's choice of approach doesn't match what you need to do, well, you've got a lot of work all of a sudden.
But we face these choices in software development all the time: when to abandon an algorithm or model because of diminishing returns. Change too soon and you've done a lot of extra work you could have avoided by just refining the existing code. Change too late and you miss opportunities that could differentiate your offerings. We like to pretend that doing cost/benefit analysis on this kind of stuff is easy, as if we were comparing a Volvo against a Toyota at the car dealer. But often the issues can be quite complex, and the fallout quite unexpected.
It's a cliché, but we live in interesting times.
Sunday, August 9, 2009
Fresh from SIGGRAPH
You don't know what heat is until you spend a week in New Orleans in August.
Here's a quick list of my favorites:
Favorite course:
Advances in Real-Time Rendering.
Favorite technical paper:
Inferred Lighting
Favorite somewhat technical talk:
Immersive and Impressive: The Impressionistic Look of Flower on the PS3
Favorite non-technical talk:
Making Pixar's "Partly Cloudy": A Director's Vision
Wednesday, July 8, 2009
Things to do when writing a tutorial
Something I've learned over the years is that if you are writing a tutorial for a tool you've written, it pays dividends to actually perform the steps of the tutorial yourself as you write it.
This seems obvious, but often while developing a tool you end up developing a set of test data as you go. Often features or changes you add later can break functionality that you only used early on. It can be tempting to try to plow through writing the tutorial, since you know how all the features work -- why bother actually doing them?
If you actually perform the operations without having any existing data, you can uncover a lot of bugs or features that don't work particularly well. Lately, I've gone even further: take your fully developed test data, tear it down, and then build it back up again. This tests both the creation code paths and the destruction code paths.
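The shape of that pass, as a sketch. Everything here is a hypothetical tool API (ToolDocument, ImportMesh, Verify, and so on) -- the point is the round trip, not the names.

```csharp
static void RoundTripRegressionTest()
{
    // Creation paths, starting from no existing data.
    ToolDocument doc = ToolDocument.CreateNew();
    MeshAsset crate = doc.ImportMesh("test_crate.fbx");
    doc.Verify();                        // sanity check after build-up

    // Destruction paths: tear the data back down.
    doc.DeleteMesh(crate);
    doc.Verify();                        // nothing left dangling?

    // ...and build it back up again.
    crate = doc.ImportMesh("test_crate.fbx");
    doc.Verify();
}
```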
If nothing else, doing the above saves you the embarrassment of releasing a tool to your artists and designers and having it crash the first time they try the most basic of things, because you haven't exercised that code path in a week or two. One last regression test is worth the extra effort.
Sunday, June 21, 2009
Leaky abstractions in XNA
So continuing my exploration of XNA, this weekend I did some more work on my little toy project.
The first thing I did was get it running on the 360. I was happy to see that XNA seems to be able to figure out how to deal with my various render targets, including one MRT, without too much trouble, and performance was far better on the 360 than on my laptop: about 200 fps on the 360 versus 60 on the laptop.
There was one issue worth noting.
First, some background on the deferred lighting approach I am using (a code sketch follows the list):
- Render normal + depth into a G-buffer for all primitives. Depth writes and tests are enabled in this step.
- Render the lights into a lighting buffer using the G-buffer. Depth writes and tests are disabled for this pass.
- Apply the lighting to each primitive using the lighting from step 2, while computing albedo and (eventually) other material properties on the fly. Depth tests are enabled but not writes.
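A rough XNA 3.1 sketch of those three steps. The render targets and the Draw* helpers are placeholders for my own code; only GraphicsDevice, SetRenderTarget, Clear, and RenderState are XNA API.

```csharp
device.SetRenderTarget(0, normalTarget);            // step 1: G-buffer (MRT)
device.SetRenderTarget(1, depthTarget);             // linear depth in a color
                                                    // target, since XNA can't
                                                    // sample the z-buffer
device.Clear(ClearOptions.Target | ClearOptions.DepthBuffer, Color.Black, 1.0f, 0);
device.RenderState.DepthBufferEnable = true;
device.RenderState.DepthBufferWriteEnable = true;
DrawSceneGeometry(gBufferEffect);

device.SetRenderTarget(1, null);                    // step 2: lighting buffer
device.SetRenderTarget(0, lightTarget);
device.RenderState.DepthBufferEnable = false;
device.RenderState.DepthBufferWriteEnable = false;
DrawLights(normalTarget.GetTexture(), depthTarget.GetTexture());

device.SetRenderTarget(0, null);                    // step 3: apply lighting
device.RenderState.DepthBufferEnable = true;        // tests on...
device.RenderState.DepthBufferWriteEnable = false;  // ...writes off
DrawSceneGeometry(applyLightingEffect, lightTarget.GetTexture());
```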
So the first problem on the 360 is that XNA blows away the depth buffer I lay down in step 1 by the time I get to step 3. After some searching on the internets, I discovered this is expected behavior.
I tried setting my render targets to PreserveContents, which does work, but is completely wasteful since I don't give a hoot about restoring the actual color contents of any of these buffers. This dipped performance down to 150 fps.
My next attempt was to restore the depth buffer manually from my G-buffer. But this exhibited z-fighting, probably because the Z values computed for my G-buffer differ slightly from those in the depth buffer. I didn't feel that messing around with z biasing would be a robust solution, so I abandoned this effort.
The solution I ended up choosing was to just clear the z-buffer again and reconstruct it during step #3. Since my scenes are so simple, this gets me back to just slightly under 200 fps.
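The workaround in code form (a sketch, names as in the pass above): dump the depth buffer and let step 3 re-lay it while applying lighting.

```csharp
device.SetRenderTarget(0, null);                    // back to the backbuffer
device.Clear(ClearOptions.DepthBuffer, Color.Black, 1.0f, 0);
device.RenderState.DepthBufferEnable = true;
device.RenderState.DepthBufferWriteEnable = true;   // writes back on, so the
                                                    // apply pass rebuilds depth
DrawSceneGeometry(applyLightingEffect, lightTarget.GetTexture());
```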
It's not an ideal solution, since I had in mind some uses for a stencil buffer laid down in step #1 that would accelerate step #2 (mainly, masking off unlit pixels for the skybox).
XNA's EDRAM handling is a great example of a leaky abstraction. Only having a 10 MB EDRAM buffer does make render target management trickier, but in Microsoft's attempt to completely hide it from XNA programmers, I think they've just made things more frustrating. The concept of a limited buffer for render targets is not that hard to get your head around. You have to understand EDRAM anyway since techniques in XNA that work perfectly on Windows (like what I was doing) will break on the 360. Even worse, you have no real good idea *why* it's breaking unless you understand the limitations of EDRAM and take a guess at what Microsoft is doing under the hood. So what is really being saved here? Just let me deal with EDRAM myself.
Sunday, June 14, 2009
Adventures in XNA continued
This weekend I played around in XNA a little bit more (completely personal stuff, nothing to do with work, opinions are my own, etc). I still find it very fun for the most part, but the lack of access to the metal can be frustrating at times.
For the most part I've just been experimenting with deferred lighting. As far as what I'm trying to accomplish, I view this stuff like a musician doing scales. Good practice, but the goal is to get familiar with the techniques rather than produce anything "real".
I'd already built up a quick and dirty deferred lighting implementation a couple months before. This weekend I removed some of the hacks I had, added HDR + bloom, threw in some simple terrain, played around with LogLuv encoding, and fixed some artifacts from my first pass implementation.
I suppose that sounds like a lot, but the nice thing about XNA is there are a bazillion samples out there. The deferred lighting is the thing I'm really concentrating on, so for the other stuff I just grabbed anything I could find. Terrain, HDR, and bloom came pretty much as-is from samples/demos, as did an FPS counter.
As far as the deferred lighting goes, I finally got the half texel offset stuff cleared up. In Direct3D 9, pixel coordinates and texture coordinates don't line up, so when doing something like sampling a normal buffer or lighting buffer, if you don't offset them properly you'll be sampling the lighting from the wrong texel. This entry by Wolfgang Engel was a big help here.
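The standard Direct3D 9 fix, on the CPU side: feed a half-texel offset to the effect and add it to the screen-space UV in the shader. "HalfPixel" is just what I happen to call the parameter in the .fx file.

```csharp
// Half a texel in UV space for the current backbuffer/render target size.
Vector2 halfPixel = new Vector2(
    0.5f / device.Viewport.Width,
    0.5f / device.Viewport.Height);
effect.Parameters["HalfPixel"].SetValue(halfPixel);
// Vertex shader side: uv = pos.xy * float2(0.5, -0.5) + 0.5 + HalfPixel;
```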
Reading Engel's ShaderX7 article, I also understood why the specular lighting has to be multiplied by n dot l, and fixed some artifacts that resulted from not doing so (mainly specular appearing on backfaces).
My first pass at HDR used FP16 render targets for everything. I changed the final apply-lighting pass to encode into LogLuv, and then implemented the encoding for the lighting buffer suggested by Pat Wilson's article in ShaderX7. A side effect of the very simple material model I'm using allowed me to use an 8:8:8:8 buffer for this and still allow for high range when accumulating lighting. I currently don't have separate diffuse and specular albedo, so when I apply lighting the equation looks like this:
albedo * (diffuseLighting + specularLighting)
This is part of the joy of a small demo - no artists telling me they have to have a separate specular albedo :). Anyway, I realized that I can just add those together before writing to the lighting buffer, and do a straightforward encoding of the result in LogLuv space. I do want to put control of the glossiness in the material model, but that will require encoding the depth into 24 bits of a render target and then including an 8-bit specular power in the remainder. (I have to render depth to a separate target because XNA gives no facility for sampling the depth buffer.)
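For reference, a CPU-side C# transcription of the widely circulated D3D9-era LogLuv encode, just to show the idea; in the demo this lives in the pixel shader, and the exact constants in my shader may differ slightly.

```csharp
using System;
using Microsoft.Xna.Framework;

static class LogLuv
{
    // Pack HDR RGB into four 8-bit channels: chroma in xy, log luminance
    // split across zw.
    public static Vector4 Encode(Vector3 rgb)
    {
        // RGB -> modified CIE basis (rows of the usual matrix).
        float x = 0.2209f * rgb.X + 0.3390f * rgb.Y + 0.4184f * rgb.Z;
        float y = 0.1138f * rgb.X + 0.6780f * rgb.Y + 0.7319f * rgb.Z;
        float d = 0.0102f * rgb.X + 0.1130f * rgb.Y + 0.2969f * rgb.Z;
        y = Math.Max(y, 1e-6f);
        d = Math.Max(d, 1e-6f);

        float le = 2f * (float)Math.Log(y, 2.0) + 127f;   // biased log2 luminance
        float w = le - (float)Math.Floor(le);             // low 8 bits
        float z = (le - (float)Math.Floor(w * 255f) / 255f) / 255f; // high 8 bits
        return new Vector4(x / d, y / d, z, w);
    }
}
```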
In the process of all this I wasn't quite getting the performance I wanted. I'm doing this on my Dell M1330 laptop, which, while no slouch, has trouble running World of Warcraft at a decent frame rate. But given such a simple scene, I was just shy of 60 fps, so I decided to fire up NVPerfHUD and see what was going on. You can run NVPerfHUD with XNA apps, but a side effect I discovered is that all vertex processing is done in software.
This is a bummer since it greatly throws off timings (a draw call for one mesh took 5 ms on the CPU for an unbelievably simple vertex shader), but I was able to find some GPU hotspots, some of which I improved by pulling work out of the shader and onto the CPU.
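The sort of thing I mean by pulling work onto the CPU: build the composite transform once per draw instead of multiplying three matrices per vertex in the shader. "WorldViewProj" here is whatever the effect parameter happens to be named.

```csharp
// One matrix concatenation on the CPU per draw call...
Matrix worldViewProj = world * view * projection;
effect.Parameters["WorldViewProj"].SetValue(worldViewProj);
// ...instead of mul(mul(mul(pos, World), View), Projection) per vertex.
```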
Anyway, I'm not sure how much I'll be working on this stuff, but when I do I'll try to put the odd post up here. I haven't tried running my latest incarnation on the 360, which will probably be my next step. I think I've got the render targets set up so it should work, assuming the XNA runtime isn't doing anything silly under the hood. But without PIX on the 360 it'll be hard to really dig into that.