Sunday, January 25, 2009

C0DE517E on understanding your choices

Link (Too Young)

And when something does not look right, do not fix it, but find out first why it does not.

If it's too saturated, the solution is not to add a desaturation constant, but first to check out why. Are we taking a color to an exponent? What sense that operation has? Why we do that? Is the specular too bright? Why? Is our material normalized, or does it reflect more light than the one it receives? Think.
The entry I'm linking today wanders around a bit but eventually lands in a good spot.

Ultimately with games we're just trying to make something fun, and being visually interesting is part of that. We're not in the business of shipping perfectly accurate simulations of light, nor is that possible anyway. It may not be desirable depending on your art style -- I've always felt the ongoing arguments about "graphics not being important" in games is more about "photorealism is not the end all be all." 

Photorealism in games is a discussion for another entry some other day. Back to the linked article, if I were to sum it up I would say it is about understanding your choices. A well placed hack is often necessary, but do you understand what the hack does and (often more important) why it is needed?

In rendering we are almost always approximating some ideal, be it the behavior of reflected light on various surfaces or the behavior of skin over muscle, fat, and bone. The ideal may not be something that exists in the real world -- it may be a stylized alternate reality created only in the minds of your artists. Even these alternate realities have consistent rules, ones often based in the physical realities of the real world, or more importantly, based on how humans perceive those physical realities. If you go back to old Walt Disney cartoons, there is a consistency of action and reaction in any given movie, a set of unwritten rules that the animators provided for their audience. 

So as the author suggests, when presented with something that doesn't look right, a little thought into why it doesn't look right can go a long way. What ideal are you trying to approximate? Why does your approximation produce wrong results to the naked eye? How can you better improve the approximation to produce the desired results? Some times, you may find a bug that can be fixed, or a better technique for fixing the issue.

It may be the case that the answer is to add a simple constant hack. If you go through the process of determining what is going wrong, you will at least have a much better understanding of why that hack is necessary, and how it interacts with the rest of the system. This is true beyond just rendering; understanding the choices you make is key to good software engineering in general. 

Wednesday, January 21, 2009

The world is... the world

(This is the third and final part of a three part series. Part 1. Part 2. )

We've established that a generic tree implementation is not a good choice for implementing the SceneTree data structure and algorithms. This begs the question: then why call this thing a tree?

A group of people I work with obsess over naming classes, systems, functions, member variables, etc. We view it as a very important part of programming -- the name should describe what it does. If I have trouble coming up with a good name for something, then maybe it isn't clear to me what it does, and if it is not clear to me, how is it going to be clear to others?

The best documentation of the code is always going to be the code itself. Obviously, you want to comment your code, but if you pick clear and concise names for things, there is less you need to communicate via comments. If a user comes across a function named DrawCircle, it is a pretty good bet that function will draw a circle. If it doesn't draw a circle, or it draws a circle and formats the hard drive, that would be quite a surprise.

The name SceneTree implies both an interface and an implementation that is based on trees. We've seen from the previous entries that we don't need or want either. So naming it SceneTree and delivering something else would be a case of bad naming.

I don't have an alternate suggestion off the top of my head. To be honest, I'm not absolutely sure we need a separate name for transform update. The important concept is that we separate transform update from the renderer. I've worked in frameworks where this transform update was part of the World system.

In summary, a generic tree design and implementation is not a good choice for world object transform update and hierarchy composition. This is due to many of the same reasons that make a scene graph a bad choice, and due to the way that gameplay code tends to access transforms and bones. Given that a generic tree is a bad choice, the name SceneTree is a bad name.

Tuesday, January 13, 2009

Characters are a special sort of tree but not a SceneTree

(This is the second part of a three part series, here is the first part. )

At this point we've established for the vast majority of objects in our world, a SceneTree implemented as a straightforward tree structure is not a good fit. But what about characters or other objects that are driven by skeletal animation?

At first glance, a straightforward tree structure seems like a really good fit. What is a skeleton if not if bones connected by joints, where each bone may have many children but only one parent? Each bone's transform is relative to its parent. To compose the world transforms of each bone in the skeleton, we merely traverse the tree, multiplying each bone's relative transform with the world transform

If we have a straightforward tree data structure, then we have a base SceneTreeNode class, and anything that we need to be in the tree derives from that node. Well, our bones are a tree, so it makes sense to make a bone a node, right?
class Bone : public SceneTreeNode
{
};
I'm going to describe why the above is not the best decision on a number of fronts . It certainly works, it is simple, tons of commercial and noncommercial projects have used it, what could possibly be the problem?

Let's think about this in terms of our gameplay code, the high level code that makes our game different than every other game out there. We want this code to be as easy to write as is possible. One part of accomplishing this is to hide any complexity the gameplay code doesn't need to know about from it.

Gameplay code doesn't need to know about the hierarchy, and in fact, is going to be much happier if it doesn't know about it. It usually just wants to retrieve a small handful of specific bones' transforms, or attaches other objects to a bone. With Bones as full fledged citizens in the SceneTree, and a straightforward tree structure as the implementation, how would gameplay code go about this? It would need to traverse the SceneTree to find the bone it is interested in and retrieve the cached world transform. This is not a very convenient, and we'd probably add a GetBoneTransform helper to the Character node to hide these details.

We've still got an implementation of GetBoneTransform that hops around a lot of different pieces of memory, causing cache misses all along the way. Maybe this is a performance bottleneck, so we decide to use some kind of efficient indexed lookup to cache the bones at the character node level. We implement GetBoneTransform in terms of this efficient lookup. Attachments can be handled similarly -- rather than use the built-in tree traversal mechanisms, most likely the code will end up caching that list of attachments somewhere else for easy access in gameplay code

If we're going to abstract the tree away from gameplay code, then what is the advantage to making bones full-fledged citizens in the tree? In fact, there are significant design and performance disadvantages.

Bone hierarchies are mostly static, most of the time. I say mostly because sometimes a game may swap different body parts in, or switch level of detail on the skeletons. Practically, though, the hierarchy doesn't really change in any sort of significant fashion. Given this knowledge, a much better implementation is to lay out our bones in a flat array, with a simple index to their parent. This index may be in a separate array depending on cache usage patterns. The array can be laid out in such a way that compositing relative transforms we get from the animation system into absolute transforms is a simple run through an array. There are tricks you can use to make this more efficient, of course, but the point is an array traversal is going to be much better than hopping all around memory calling virtual functions on a base SceneTreeNode class. The approach also lends itself much better to offloading to another CPU due to better memory locality, or easier DMA if the CPU has NUMA.

Do we need a tree at all? Not that I can see -- from the previous entry we've got a list of root level objects, some of which may have an array of tightly packed bones, and an array of attachments. Their attachments may also have bones and attachments, so yes, conceptually, we've got a tree that we can traverse. But gameplay code never needs to know anything about this for the most part, as it deals with bones and attachments directly and isn't very concerned about the internals of the hierarchy.

We don't need a base SceneTreeNode class that is exposed for all -- the internals of how we update attachments and bones and objects are just that: internal. As we've shown, a straightforward tree structure doesn't really fit what we want to do very well as an implementation. From experience, you can spend a fair amount of time optimizing your updates of objects. The ability to special case, offload work to other CPUs, or any number of other abstractions makes it very easy to do so without breaking gameplay code. A generic tree structure does not provide the API we need for gameplay code, nor does it provide a good implementation at the level.

Tomorrow I will conclude with some thoughts on why naming matters.

Monday, January 12, 2009

The world is not a SceneTree

I wanted to expand on something I thought of while writing my thoughts on the SceneTree. Before I made the argument that transform update and animation should not be tightly coupled with rendering. Now I want to touch on something else: it may be useful to think of the SceneTree as conceptually a tree, but the API and implementation should not be implemented as a global tree data structure.

I jumped the gun on posting this entry, so those of you with RSS feeds may have seen an earlier draft go by. It was pointed out to me that perhaps I was trying to tackle too much in one entry, so I decided to split it up. The win for the reader is (hopefully) less jumbled thoughts from me. The win for me is three days' worth of entries. Win-win!

The title of this entry was shamefully stolen from Tom Forsyth's article on scene graphs (just say no):
The world is not a big tree - a coffee mug on a desk is not a child of the desk, or a sibling of any other mug, or a child of the house it's in or the parent of the coffee it contains or anything - it's just a mug. It sits there, bolted to nothing. Putting the entire world into a big tree is not an obvious thing to do as far as the real world is concerned.
Very succinctly put. Tom was talking about scene graphs, but the same criticism applies to the SceneTree introduced in the previous entry. Summing it up, we replace the scene graph with three systems: the SpatialGraph, the RenderQueue, and the SceneTree. The SpatialGraph deals with visibility determination. The RenderQueue is constructed from the SpatialGraph and efficiently issues render commands. The SceneTree handles animation and transform update. It is certainly a better breakdown that scene graphs, which try to do it all. Unfortunately, Tom's criticism of scene graphs still applies to the idea of a SceneTree.

I have no idea if the original author of the SceneTree concept is using a global tree structure. I just want to demonstrate that a straightforward implementation of SceneTree which used such a structure suffers from many of the same problems that a scene graph does. Given the amount of scene graphs I've seen over time, it is obvious that some people might be tempted to choose a global tree structure to implement a SceneTree.

First, let's take a step back and think about what we really need to accomplish with the SceneTree:
  • We have a number of objects that are in the world.
  • These objects have transforms describing where they are in the world.
  • For a relatively small number of objects, their transforms may be relative to some other frame of reference (i.e. a gun may be attached to a character).
  • There may be different types of attachment -- you may have situations where you want translation to be relative and rotation absolute, or vice versa. Some attachments may be accomplished through physical constraints simulated in the physics system rather than perfect constraints in the update system.
  • When a transform changes due to animation, physics, or other means, we want those changes to propagate to relative objects in the most efficient manner without doing redundant work.
  • We want to have a large number of objects in the world with a large number of characters receiving animation.
What you'll find is while we have the concept of a global hierarchy, in reality we have a mostly boring common case and a relatively small number of objects that actually have hierarchical relationships.

The boring case is what Tom describes. Look around the room you are in right now, and you will find the vast majority of objects are just "there". They do not have any sort of child, parent, or sibling relationship with each other. While most will have physical constraints (gravity pushing down, whatever they are on pushing up), this is due to physical interaction, not a familial relationship. In most games, this sort of interaction is either static or simulated by a physics system.

Having one root node for the entire world and having all these objects have be children of that root node makes no sense, or at the very least conveys no useful information. What does it mean to be a child of "the world" ? In a game, most of these things are going to be static -- they are the literal immovable object. Why clutter up our SceneTree with such things? Sure we might want to attach something to them (destructible lamp on a wall, for example), but that's the same as attaching it to the root frame of reference if the wall doesn't move. There may be some other special cases specific to a game (the wall can not move but it can disappear), but that's just it -- they are special cases that can be handled at a higher level.

There may be no static object at all to put in a tree. Many games treat all the static objects in an atomically streamable unit as one giant blob. There's no reason to deal with individual objects if you can just submit a precomputed command list to the renderer, or an efficiently packed polygon soup to the physics system. 

Most of the same arguments apply with dynamic things -- the familial relationship with "the world" is not useful. What we've really got is a lot of potential root nodes for attachment -- a list or array of conceptual trees. But if nothing is attached to an object, does it need a hierarchical update? Probably not. Now, for code simplicity, it may be simpler to just treat those as a 1-node hierarchy, but if your number of objects which have attachments is relatively low to everything else, you may be better off using a separate data structure which only holds things that have an actual hierarchy. 

Dynamic objects provide other interesting opportunities for optimization which do not map well to a generic tree structure. For objects that are physically simulated, the code can take advantage of the physics system's knowledge of what actually moved that frame, and only update those objects. This is considerably more efficient than traversing an entire SceneTree and checking if an object needs to update. Obviously none of this precludes having a generic tree structure, it just brings into question what good it does.

We've established that for the grand majority of our objects, we do not need a generic tree representation to process transform updates. We still have cases that do indeed have hierarchical relationships such as characters or attachments. Tomorrow, I will discuss why a generic tree structure is not a good choice for characters and bones.

Friday, January 9, 2009

Joe Duffy on CAS operations and performance

Interesting results in a blog entry from Joe Duffy about the scalability of compare-and-swap instructions across x86 architectures. I found this via the gdalgorithms list. The money graph:

The most common occurrence of a CAS is upon lock entrance and exit. Although a lock can be built with a single CAS operation, CLR monitors use two (one for Enter and another for Exit). Lock-free algorithms often use CAS in place of locks, but due to memory reordering such algorithms often need explicit fences that are typically encoded as CAS instructions. Although locks are evil, most good developers know to keep lock hold times small. As a result, one of the nastiest impediments to performance and scaling has nothing to do with locks at all; it has to do with the number, frequency, and locality of CAS operations.

One thing that has been talked about on the recent gdalgorithms thread is understanding that your only options are not between lock-free and heavy operating system lock primitives. Some operating systems provide lighter weight synchronization primitives (critical sections on Win32 with a spincount come to mind). In the past on some consoles I've come up with faster mutex implementations than what the operating system provides, although this is not something I'd recommend to the weak of heart -- it is unbelievably easy to get this stuff wrong, even for very experienced developers.

I think the lesson from this entry is to always measure. Maybe for a specific application a fancy lock-free algorithm may be faster, but maybe not -- as we see from above it can depend on a lot of factors. The other thing to avoid is premature optimization -- get your algorithm correct and without races first using locks, and then evaluate whether a lock-free algorithm is appropriate and faster.

Sunday, January 4, 2009

Animation and physics sitting in a tree

Over at Diary of a Graphics Programmer, Wolfgang Engel points out a gamedev.net post introducing some nomenclature for the systems involved with rendering scene geometry. 

Instead of a scene graph approach (just say no), they propose the SpatialGraph (for visibility determination), the SceneTree (for animation and transform update), and the RenderQueue (filled from the SpatialGraph, just draws stuff fast). It is a division that makes much more sense than a scene graph, which tries to handle all of the above. 

One of my biggest dislikes about scene graphs is how they misled us all to think that animation and transform update belongs in the renderer in an engine design. The renderer just deals with the results of these operations -- it doesn't much care how the transforms were created, it just needs them. Coupling these two things doesn't really make much sense.

If anything, animation/transform update is much more tightly coupled with physics and collision. Conceptually animation is saying "here's where I would like to go" and the physics system says "here's where you can go." It is even more intertwined than that, because the physics system has complete control over things like rigid bodies and ragdolls -- saying both where they would like to go and where they can go. 

If you are doing any sort of ragdoll blending between animation and ragdoll simulation, you have a feedback loop. The animation system figures out its desired pose, and the ragdoll figures out its desired pose. There is a blending step but its not always obvious where that should go. Traditionally the animation system is responsible for blending transforms, but there's an argument that the physics simulation should do it because it knows the constraints and limitations on where the bones can be physically placed. 

I haven't gotten into other interesting issues such as vehicles, which can be a blend of physical simulation (the car moving) and animation (the door being opened) similar to ragdolls, and when you add attachments to the mix (gun turret), keeping the animation and physics systems in sync can be a challenge. 

I'm starting to think that animation and physics are two sides of the same coin. I'm calling the combined thing "simulation." Obviously different games are going to have different complexity of physics, and I'm not sure coupling these two things so tightly is a one size fits all thing. What I do know is that coupling animation/transform update with the renderer is almost never the right thing to do, even though there are still a large number of scene graph based rendering libraries available. 

Saturday, January 3, 2009

Good Middleware revisited

I was having a conversation with a good friend about the Good Middleware entry, and he made a very interesting point. He pointed out that in the future, good middleware is going to allow you to hook their multicore job scheduling just like you can hook memory allocation.

Let's start out with a quick review of why we hook memory allocation, as it will help us understand why we're going to want to hook job scheduling.
  1. Consoles are fixed memory environments. It is critical to balance overall memory utilization for your title. If you can provide five more megabytes for game data over the built-in system allocator, then that allows you to have more stuff in your game, which generally can lead to a better game.
  2. Different titles have different memory needs. A 2D side-scroller is going to have different memory requirements than a world game. Some types of games lend themselves to pre-allocation strategies, where you know ahead of time how many assets of what type you are going to have. Every game I've worked on has spent time tuning the memory allocator to squeeze out as much memory as possible, and what works in one title may not be applicable to another. 
  3. The system allocator may perform badly, particularly when accessed from multiple threads. Often, console operating systems will have poor allocator implementations that have a lot of overhead. Sometimes they will use allocators that are designed for a much wider range of application than games, or designed for systems with virtual memory. Others explicitly punt on providing anything but systems designed for large, bulk allocations and expect you to handle smaller allocations on your own. Finally, no matter how good an allocator may be, it does not have knowledge of your application's memory needs or usage patterns, which are things you can use to optimize your allocator
  4. The system allocator may not provide good leak detection or heap debugging tools. It may provide none at all. If you're doing a console game, I know of no standalone leak detection software such as BoundsChecker that is available. Often you have to provide these facilities yourself.

 CPU core usage can be thought of as a system resource that is arbitrated just like memory. Most middleware that does use jobs spread across multiple cores I've seen so far usually has its own internal scheduler running on top of a couple hardware threads. In the early days of multicore middleware, you couldn't even set the CPU affinity of these vendor threads without recompiling the library, something which is key on the 360. Increasingly, I think games are going to want to own their job scheduling and not just leave it up to the operating system to arbitrate CPU scheduling at the thread level. 
  1. Consoles have a fixed number of cores. Goes without saying but worth touching on. You have a fixed amount of CPU processing you can do in a 30hz/60hz frame. Ideally you want to maximize the amount of CPU time spent on real game work and minimize the amount spent on bookkeeping/scheduling. 
  2. You know more than the operating system does about your application. One approach to scheduling jobs would be to just create a thread per job and let them go. You could do that, but you would then watch your performance tank. Forget the fact that creating/destroying operating system threads is not a cheap task, and focus on the cost of context switches. Thread context switches are very expensive and with CPU architectures adding more and more registers, they are not going to get any cheaper. A lot of game work are actually small tasks more amenable to an architecture where a thread pool sits and just executes these jobs one after another. In this architecture, you do not have any thread context switches between jobs or while jobs are running, which means you are spending more time doing real game work and less time on scheduling.
  3. It is difficult to balance between competing job schedulers. Imagine a world with 16 to 32 cores, and both your middleware and your game need to schedule numerous jobs to a thread pool. It is going to be very difficult to avoid stalls or idle time due to the fact that you have two job schedulers in your system -- one from the middleware and one from your game. Either you are "doubling up" thread allocation on cores and paying the cost of context switches from one thread pool to the next, or you allocate each thread pool to a different set of cores and hope that the work balances out. Unfortunately, I don't think the latter is going to scale very well and you will end up with a lot of unused CPU capacity.
The third reason is the clincher as to why good middleware in the future will allow you to replace their job scheduler with your own. It's the only solution I can think of that will allow you to balance workloads across many, many cores. It is possible in the future that operating systems will provde good enough thread pool/microjob APIs that we won't have to be writing our own, but even in that case, you'd still want the hooks for debugging or profiling purposes. 

I haven't yet seen any middleware that really allows you to hook this stuff easily. Have you?



Friday, January 2, 2009

RTCD on Particle Rendering

Optimizing the rendering of a particle system.

Real Time Collision Detection has an entry on optimizing particle rendering which is a good survey of many techniques you can use to make your particles fly. Don't forget to read the comments, there are some good suggestions in there, too!

Over the last five+ years or so people have been a lot more open about this kind of stuff, and I like it. There's just too much you have to deal with these days to write a good game to have to figure it out on your own. I don't know about other people's experiences but earlier in my career there seemed to be much more of a secretive culture in game development than other types of software.

Whether it is entries like the above or things like Insomniac's Nocturnal initiative, people are much more open about ideas these days. Ultimately, we all have to execute, and doing that well is hard enough on its own. Being more open is healthy and ultimately will lead to better games across the board.