On certain hardware, specifically ATI PC hardware, using anything but Keep for the stencil zfail operation can disable early stencil, and this caused quite a slowdown.
The solution I figured out (and I'm sure others have) was to switch to a method which only relies on stencil pass operations:
AlphaBlendEnable = false
StencilEnable = true
ColorWriteChannels = None
DepthBufferEnable = true
StencilDepthBufferFail = Keep
// render frontfaces so that any pixel in back of them has stencil incremented
CullMode = CounterClockwise
StencilFunction = Always
// If a pixel is in back of the volume frontface, then it is potentially inside the volume
StencilPass = Increment
// render volume
// render backfaces so that only pixels in back of the backface have stencil decremented
CullMode = Clockwise
// pass stencil test if reference value < buffer, so we only process pixels marked above.
// Reference value is 0. This is not strictly necessary but an optimization
StencilFunction = Less
// If a pixel is in back of the volume backface, then it is outside of the volume, and should not be considered
StencilPass = Decrement
// render volume
AlphaBlendEnable = true
ColorWriteChannels = RGB
// only process pixels with 0 < buffer
StencilFunction = Less
// zero out pixels so we don't need a separate clear for the next volume
StencilPass = Zero
// render a screen space rectangle scissored to the projection of the light volume
There is a problem with this method -- if the light volume intersects the near plane, it won't work, because the portion of the light volume in front of the near plane will never increment the stencil buffer.
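To make the marking passes and the near-plane failure concrete, here is a minimal per-pixel sketch in Python. It is not renderer code: the function name and the depth values are made up for illustration, and it assumes a less-or-equal depth test with depth writes off while the volume is rendered (neither is stated in the listing above).

```python
def stencil_after_marking(scene_depth, back_depth, front_depth=None):
    """Stencil value one pixel ends up with after the two marking passes.

    scene_depth: depth already in the depth buffer at this pixel.
    back_depth:  depth of the light volume's backface at this pixel.
    front_depth: depth of the volume's frontface, or None when the frontface
                 is clipped away by the near plane (hypothetical convention).
    Assumes a less-or-equal depth test and no depth writes.
    """
    stencil = 0

    # Pass 1: frontfaces, StencilFunction = Always, StencilPass = Increment.
    # The frontface fragment passes the depth test when the scene is behind it.
    if front_depth is not None and front_depth <= scene_depth:
        stencil += 1

    # Pass 2: backfaces, StencilFunction = Less (0 < buffer), StencilPass = Decrement.
    # Only pixels marked in pass 1 are touched; those behind the backface are unmarked.
    if stencil > 0 and back_depth <= scene_depth:
        stencil -= 1

    return stencil


# Volume spans depths 0.4 .. 0.6 at this pixel.
in_front = stencil_after_marking(0.2, back_depth=0.6, front_depth=0.4)
inside = stencil_after_marking(0.5, back_depth=0.6, front_depth=0.4)
behind = stencil_after_marking(0.8, back_depth=0.6, front_depth=0.4)

# Frontface clipped by the near plane: the pixel is inside the volume,
# but nothing ever increments the stencil, so it is never lit.
clipped = stencil_after_marking(0.3, back_depth=0.6, front_depth=None)
```

Only the `inside` pixel is left with a nonzero stencil value for the final lighting pass. The `clipped` pixel should be lit as well but comes out as zero, which is exactly the failure described above.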
My solution to this was pretty simple -- if the light volume intersects the near plane, I use the zfail method from the earlier post. Otherwise, I use the stencil pass method above. For most lights, this means we're on the fastest path on both of the major brands of cards. I briefly scanned some papers and articles on shadow volumes (a very similar problem), hoping to find an alternate way to cap volumes intersecting the near plane, but didn't see anything that looked particularly easy to implement or that would necessarily perform well. With this fallback, performance on ATI and Nvidia cards came out mostly on par.
What about two-sided stencil? This is a mode in DX9 where you can render both backfaces and frontfaces in one pass, with separate stencil operations on each. Because the stencil increment/decrement operations wrap around (i.e. decrementing 0 becomes 255, incrementing 255 becomes 0), ordering doesn't really matter (although you have to make the StencilFunction Always on both). I did some quick tests using two-sided stencil and my initial results showed it was actually slower than rendering both passes separately. I didn't spend much time on it, so it is possible that I simply screwed something up, and I plan to revisit it at some point.
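The reason wrapping makes the ordering irrelevant can be checked with a few lines of Python. This is only an arithmetic sketch assuming an 8-bit stencil buffer. The helper names are made up, and the saturating variants are included purely for contrast.

```python
STENCIL_MAX = 0xFF  # assumes an 8-bit stencil buffer


def incr_wrap(value):
    # Incrementing 255 wraps to 0.
    return (value + 1) & STENCIL_MAX


def decr_wrap(value):
    # Decrementing 0 wraps to 255.
    return (value - 1) & STENCIL_MAX


def incr_sat(value):
    # Saturating increment clamps at 255.
    return min(value + 1, STENCIL_MAX)


def decr_sat(value):
    # Saturating decrement clamps at 0.
    return max(value - 1, 0)


# A pixel behind the whole volume is hit by both a frontface (increment)
# and a backface (decrement); in one pass the order is not under our control.
back_then_front_wrap = incr_wrap(decr_wrap(0))  # 0 -> 255 -> 0
front_then_back_wrap = decr_wrap(incr_wrap(0))  # 0 -> 1 -> 0

# With saturating operations the result depends on the order.
back_then_front_sat = incr_sat(decr_sat(0))     # 0 -> 0 -> 1
front_then_back_sat = decr_sat(incr_sat(0))     # 0 -> 1 -> 0
```

With wrapping, both orders return the pixel to 0. With saturation, the backface-first order leaves a stray 1 and the pixel would be lit incorrectly.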