Tuesday, December 22, 2009

More Stencil States for Light Volume Rendering

A while back I wrote a short entry on stencil states for light volumes. The method I posted works but relies on a zfail stencil operation. Shortly afterwards I discovered that it ran considerably slower on ATI cards than on the Nvidia card I had been developing on, and I have been meaning to post an update ever since.

On certain hardware (specifically ATI PC hardware), using anything but Keep for the zfail operation can disable early stencil, and this caused quite a slowdown.

The solution I figured out (and I'm sure others have) was to switch to a method which only relies on stencil pass operations:

AlphaBlendEnable = false
StencilEnable = true

ColorWriteChannels None

DepthBufferEnable = true

StencilDepthBufferFail Keep

// render frontfaces so that any pixel in back of them has its stencil incremented
CullMode CounterClockwise
StencilFunction Always

// If a pixel is in back of the volume frontface, then it is potentially inside the volume
StencilPass Increment

// render volume

// render backfaces so that only pixels in back of the backface have stencil decremented
CullMode Clockwise
// pass stencil test if reference value < buffer, so we only process pixels marked above.
// Reference value is 0. This is not strictly necessary but an optimization
StencilFunction Less
// If a pixel is in back of the volume backface, then it is outside of the volume, and should not be considered
StencilPass Decrement

// render volume

AlphaBlendEnable = true
ColorWriteChannels RGB
// only process pixels with 0 < buffer
StencilFunction Less
// zero out pixels so we don't need a separate clear for the next volume
StencilPass Zero

// render a screen-space rectangle scissored to the projection of the light volume
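To make the counting logic of the three passes concrete, here is a rough CPU-side model of what happens to a single pixel. It is only a sketch: the `Pixel` class and all function names are made up for illustration, depths are assumed to grow away from the camera, the stencil buffer is assumed to be cleared to 0, the back-face pass is assumed to decrement (as its comment describes), and the final rectangle is assumed not to be rejected by the depth test.

```python
from dataclasses import dataclass


@dataclass
class Pixel:
    """Hypothetical per-pixel model: one depth value and one 8-bit stencil value."""
    scene_depth: float  # depth already in the depth buffer (smaller = closer)
    stencil: int = 0    # assumed cleared to 0 before the volume is drawn


def depth_pass(face_depth: float, pixel: Pixel) -> bool:
    # Assumed depth test: a volume face passes when it is closer than the scene.
    return face_depth < pixel.scene_depth


def front_face_pass(pixel: Pixel, front_depth: float) -> None:
    # StencilFunction Always, StencilPass Increment (wraps at 8 bits).
    if depth_pass(front_depth, pixel):
        pixel.stencil = (pixel.stencil + 1) & 0xFF


def back_face_pass(pixel: Pixel, back_depth: float) -> None:
    # StencilFunction Less with reference 0 (0 < buffer), decrement on depth pass.
    if 0 < pixel.stencil and depth_pass(back_depth, pixel):
        pixel.stencil = (pixel.stencil - 1) & 0xFF


def lighting_pass(pixel: Pixel) -> bool:
    # StencilFunction Less (0 < buffer), StencilPass Zero.
    # Returns True when the lighting shader would run for this pixel.
    if pixel.stencil == 0:
        return False
    pixel.stencil = 0
    return True


def is_lit(scene_depth: float, front_depth: float, back_depth: float) -> bool:
    pixel = Pixel(scene_depth)
    front_face_pass(pixel, front_depth)
    back_face_pass(pixel, back_depth)
    return lighting_pass(pixel)


if __name__ == "__main__":
    # The volume spans depths 5..10 along this pixel's view ray.
    print(is_lit(2.0, 5.0, 10.0))   # scene in front of the volume -> False
    print(is_lit(7.0, 5.0, 10.0))   # scene inside the volume -> True
    print(is_lit(12.0, 5.0, 10.0))  # scene behind the volume -> False
```

Only the pixel whose scene depth falls between the front and back faces survives to the lighting pass, and that pass leaves the stencil at zero for the next volume.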

There is a problem with this method -- if the light volume intersects the near plane, it won't work, because the front faces that lie in front of the near plane are clipped away and never increment the stencil buffer.
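The failure is easy to see in a one-pixel model. The sketch below is illustrative only: `zpass_stencil` is a made-up helper, depths are assumed to grow away from the camera, and a face is treated as clipped whenever its depth is smaller than the near plane distance.

```python
def zpass_stencil(scene_depth, front_depth, back_depth, near_plane):
    """Hypothetical one-pixel model of the stencil-pass method (smaller depth = closer)."""
    stencil = 0
    # A face fragment only exists if it survives near-plane clipping.
    front_visible = front_depth >= near_plane
    back_visible = back_depth >= near_plane
    # Front-face pass: increment where the front face is closer than the scene.
    if front_visible and front_depth < scene_depth:
        stencil += 1
    # Back-face pass: decrement marked pixels where the back face is also closer.
    if stencil > 0 and back_visible and back_depth < scene_depth:
        stencil -= 1
    return stencil


# Volume spans 5..10 and the scene is at 7: the pixel is marked.
print(zpass_stencil(7.0, 5.0, 10.0, near_plane=1.0))   # 1
# Camera inside the volume: the front face is at -3 (behind the camera) and is clipped,
# so the pixel is never marked even though it lies inside the volume.
print(zpass_stencil(7.0, -3.0, 10.0, near_plane=1.0))  # 0
```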

My solution to this was pretty simple: if the light volume intersects the near plane, I use the zfail method from the earlier post; otherwise, I use the stencil pass method above. For most lights, that puts us on the fastest path on both of the major brands of cards. I briefly scanned some papers and articles on shadow volumes (a very similar problem), hoping to find an alternate way to cap volumes intersecting the near plane, but didn't see anything that looked particularly easy to implement or likely to perform well. With this hybrid approach, performance on ATI and Nvidia cards is mostly on par.

What about two-sided stencil? This is a mode in DX9 where you can render both backfaces and frontfaces in one pass, with separate stencil operations on each. Because the stencil increment/decrement operations wrap around (i.e. decrementing 0 becomes 255, incrementing 255 becomes 0), ordering doesn't really matter (although you have to set StencilFunction to Always on both). I did some quick tests using two-sided stencil, and my initial results showed it was actually slower than rendering both passes separately. I didn't spend much time on it, so it is possible that I simply screwed something up. I plan to revisit it at some point.
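The reason ordering doesn't matter can be checked with plain 8-bit arithmetic. This is just a sketch with made-up helper names; the saturating decrement is included only for contrast, to show why the wrapping variants are the ones to use here.

```python
def increment_wrap(value: int) -> int:
    # 8-bit wrapping increment: 255 + 1 -> 0
    return (value + 1) & 0xFF


def decrement_wrap(value: int) -> int:
    # 8-bit wrapping decrement: 0 - 1 -> 255
    return (value - 1) & 0xFF


def decrement_saturate(value: int) -> int:
    # Clamping decrement, for contrast: 0 - 1 -> 0
    return max(value - 1, 0)


# Front face first, then back face: 0 -> 1 -> 0
front_then_back = decrement_wrap(increment_wrap(0))
# Back face first, then front face: 0 -> 255 -> 0
back_then_front = increment_wrap(decrement_wrap(0))
# With a clamping decrement the back-first order leaves a stray 1: 0 -> 0 -> 1
back_then_front_clamped = increment_wrap(decrement_saturate(0))

print(front_then_back, back_then_front, back_then_front_clamped)  # 0 0 1
```

This is also why both sides need StencilFunction Always: the back face may be rasterized before the front face has marked the pixel, so it cannot be gated on a non-zero stencil value.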
