Monday, March 30, 2009

Please Get Your Physics Off My GPU

I hope to have some more substantial thoughts on GDC, but one nice trend was a number of talks focused on moving graphics work off the GPU and onto CPUs (in this case, SPUs on the PS3).

For the last couple of years there has been a major push by the graphics card manufacturers to get non-graphic-y things onto the GPU. Cloth, physics, heck I'm sure there are even some GPU AI examples out there somewhere. These are things that console game developers I know don't particularly want or need.

The lion's share of console games are GPU bound. The last thing I want to do is put more stuff on the GPU. So even if your cloth or physics solution runs really fast on the GPU, I'm not going to use it, because there is no room at the inn. Even if a CPU solution is slower, it won't matter: I've got spare processing capacity while waiting on the GPU, or processing elements that sit unused during a frame.

What I want to do is offload as much as possible to the CPU, since most games still probably are not maxing out the CPU capabilities of the PS3 or 360. It was nice to see some talks focusing on doing hybrid GPU/CPU solutions to things such as lighting or post processing, and I imagine this trend will continue.

Monday, March 23, 2009

The Problem with GDC

It always seems like I'll have a day that has 4 or 5 sessions booked at the same time that I want to see, and then on another day have some time slots with nothing that I want to see. Obviously, you can't please all of the people all of the time, but Wednesday definitely seems like the busy day.

Anyway, here's what my current schedule is looking like:

Session Title | Date | Start Time | End Time
Discovering New Development Opportunities | 03-25-2009 | 9:00 AM | 10:00 AM
Hitting 60Hz with the Unreal Engine: Inside the Tech of MORTAL KOMBAT vs DC UNIVERSE | 03-25-2009 | 10:30 AM | 11:30 AM
Next-Gen Tech, but Last-Gen Looks? Tips to Make your Game Look Better - That Don't Include Bloom and Motion Blur. | 03-25-2009 | 12:00 PM | 1:00 PM
Out of Order: Making In-Order Processors Play Nicely | 03-25-2009 | 2:30 PM | 3:30 PM
Deferred Lighting and Post Processing on PlayStation(R)3 | 03-25-2009 | 4:00 PM | 5:00 PM
The Unique Lighting in MIRROR'S EDGE: Experiences with Illuminate Labs Lighting Tools | 03-26-2009 | 9:00 AM | 10:00 AM
From Pipe Dream to Open World: The Terraforming of FAR CRY 2 | 03-26-2009 | 1:30 PM | 2:30 PM
The Cruise Director of AZEROTH: Directed Gameplay within WORLD OF WARCRAFT | 03-26-2009 | 3:00 PM | 4:00 PM
Morpheme & PhysX: A New Approach to Combining Character Animation and Simulation | 03-26-2009 | 4:30 PM | 5:30 PM
Fast GPU Histogram Analysis and Scene Post-Processing | 03-27-2009 | 9:00 AM | 9:20 AM
Mixed Resolution Rendering | 03-27-2009 | 10:30 AM | 10:50 AM
Rendering Techniques in GEARS OF WAR 2 | 03-27-2009 | 2:30 PM | 3:30 PM
Dynamic Walking with Semi-Procedural Animation | 03-27-2009 | 4:00 PM | 5:00 PM

Thursday, March 19, 2009

Changes

A short personal update:

Last Friday was my last day at Midway Games. After seven and a half years I decided it was time to move on, and I will soon be pursuing another opportunity. It was sad to leave in some ways, as there are many people there whom I have enjoyed working with, but it was the right time for a move.

I'll have updates about the new opportunity soon, and hopefully some more content.

Tuesday, March 10, 2009

What Will the Future Bring?

It's been pretty obvious for a while that boxed retail in games will die someday, but the recent news that Amazon will buy and sell used games definitely seems like one of the nails in the coffin.

I can't blame retailers such as Amazon or Best Buy for entering the used game market, but I really wonder when the major players in the game industry will wake up and realize that the day when all games will be downloadable is sooner rather than later.

Another way to phrase the question: when will a major console manufacturer release a console that does not contain any sort of removable DVD/Blu-ray drive?

Consumers are already used to downloadable games in many other forms - cell phones, the iPhone, web games, Steam, Gametap, XBLA, PSN store, etc. There are even rumors that the next version of the PSP will not contain a UMD drive.

So why not cut the cord? Imagine a console that comes out in 2012 with no optical drive at all, just a large hard drive. All games, not just ones deemed small enough for a special "arcade" section, are downloadable.

There would be issues. Currently US broadband adoption rates are at 59% of households, well behind Japan or Europe. Even optimistic projections estimate that by 2012 only 77% of US households will have broadband, although it would be interesting to know what percentage of console gamers will have broadband. Still, it would be a definite leap of faith to exclude such a large percentage of households from buying your product.

As far as the user experience goes, I don't think there would be many problems. Even if games took multiple hours to download, I don't see how that is any worse than getting a game from GameFly or Amazon now. Steam has experimented with allowing users to "predownload" popular titles before their release date, and a similar model could be used for the users that just gotta have it the day of release.

Another advantage of this approach is some savings on cost of goods for the consoles themselves -- for example, the Blu-ray drive in the PS3 is probably a big driver of the total cost of that system.

The only question is how retailers would react. They could threaten to not sell the console hardware, but a colleague of mine had an idea about that: prepaid download codes. Retailers could sell these along with the hardware. It won't be as lucrative as current boxed retail sales, but then again, by pushing used game sales so hard, the retailers are eventually going to force game publishers and console manufacturers' hands.

Sunday, January 25, 2009

C0DE517E on understanding your choices

Link (Too Young)

And when something does not look right, do not fix it, but find out first why it does not.

If it's too saturated, the solution is not to add a desaturation constant, but first to check out why. Are we taking a color to an exponent? What sense that operation has? Why we do that? Is the specular too bright? Why? Is our material normalized, or does it reflect more light than the one it receives? Think.

The entry I'm linking today wanders around a bit but eventually lands in a good spot.

Ultimately with games we're just trying to make something fun, and being visually interesting is part of that. We're not in the business of shipping perfectly accurate simulations of light, nor is that possible anyway. It may not even be desirable, depending on your art style -- I've always felt the ongoing argument about "graphics not being important" in games is really about "photorealism is not the be-all and end-all."

Photorealism in games is a discussion for another entry some other day. Back to the linked article: if I were to sum it up, I would say it is about understanding your choices. A well-placed hack is often necessary, but do you understand what the hack does and (often more important) why it is needed?

In rendering we are almost always approximating some ideal, be it the behavior of reflected light on various surfaces or the behavior of skin over muscle, fat, and bone. The ideal may not be something that exists in the real world -- it may be a stylized alternate reality created only in the minds of your artists. Even these alternate realities have consistent rules, ones often based in the physical realities of the real world, or more importantly, based on how humans perceive those physical realities. If you go back to old Walt Disney cartoons, there is a consistency of action and reaction in any given movie, a set of unwritten rules that the animators provided for their audience. 

So as the author suggests, when presented with something that doesn't look right, a little thought into why it doesn't look right can go a long way. What ideal are you trying to approximate? Why does your approximation produce wrong results to the naked eye? How can you improve the approximation to produce the desired results? Sometimes you may find a bug that can be fixed, or a better technique that fixes the issue.

It may be the case that the answer is to add a simple constant hack. If you go through the process of determining what is going wrong, you will at least have a much better understanding of why that hack is necessary, and how it interacts with the rest of the system. This is true beyond just rendering; understanding the choices you make is key to good software engineering in general. 

Wednesday, January 21, 2009

The world is... the world

(This is the third and final part of a three part series. Part 1. Part 2. )

We've established that a generic tree implementation is not a good choice for implementing the SceneTree data structure and algorithms. This raises the question: then why call this thing a tree?

A group of people I work with obsess over naming classes, systems, functions, member variables, etc. We view it as a very important part of programming -- the name should describe what it does. If I have trouble coming up with a good name for something, then maybe it isn't clear to me what it does, and if it is not clear to me, how is it going to be clear to others?

The best documentation of the code is always going to be the code itself. Obviously, you want to comment your code, but if you pick clear and concise names for things, there is less you need to communicate via comments. If a user comes across a function named DrawCircle, it is a pretty good bet that function will draw a circle. If it doesn't draw a circle, or it draws a circle and formats the hard drive, that would be quite a surprise.

The name SceneTree implies both an interface and an implementation that is based on trees. We've seen from the previous entries that we don't need or want either. So naming it SceneTree and delivering something else would be a case of bad naming.

I don't have an alternate suggestion off the top of my head. To be honest, I'm not absolutely sure we need a separate name for transform update. The important concept is that we separate transform update from the renderer. I've worked in frameworks where this transform update was part of the World system.

In summary, a generic tree design and implementation is not a good choice for world object transform update and hierarchy composition. This is due to many of the same reasons that make a scene graph a bad choice, and due to the way that gameplay code tends to access transforms and bones. Given that a generic tree is a bad choice, the name SceneTree is a bad name.

Tuesday, January 13, 2009

Characters are a special sort of tree but not a SceneTree

(This is the second part of a three part series, here is the first part. )

At this point we've established that for the vast majority of objects in our world, a SceneTree implemented as a straightforward tree structure is not a good fit. But what about characters or other objects that are driven by skeletal animation?

At first glance, a straightforward tree structure seems like a really good fit. What is a skeleton if not bones connected by joints, where each bone may have many children but only one parent? Each bone's transform is relative to its parent. To compose the world transforms of each bone in the skeleton, we merely traverse the tree, multiplying each bone's relative transform with its parent's world transform.

If we have a straightforward tree data structure, then we have a base SceneTreeNode class, and anything that we need to be in the tree derives from that node. Well, our bones are a tree, so it makes sense to make a bone a node, right?
class Bone : public SceneTreeNode
{
};
I'm going to describe why the above is not the best decision on a number of fronts. It certainly works, it is simple, and tons of commercial and noncommercial projects have used it -- what could possibly be the problem?

Let's think about this in terms of our gameplay code, the high-level code that makes our game different from every other game out there. We want this code to be as easy to write as possible. One part of accomplishing this is to hide from it any complexity the gameplay code doesn't need to know about.

Gameplay code doesn't need to know about the hierarchy, and in fact is going to be much happier if it doesn't. It usually just wants to retrieve a small handful of specific bones' transforms, or attach other objects to a bone. With Bones as full-fledged citizens in the SceneTree, and a straightforward tree structure as the implementation, how would gameplay code go about this? It would need to traverse the SceneTree to find the bone it is interested in and retrieve the cached world transform. This is not very convenient, and we'd probably add a GetBoneTransform helper to the Character node to hide these details.

We've still got an implementation of GetBoneTransform that hops around a lot of different pieces of memory, causing cache misses all along the way. Maybe this is a performance bottleneck, so we decide to use some kind of efficient indexed lookup to cache the bones at the Character node level. We implement GetBoneTransform in terms of this efficient lookup. Attachments can be handled similarly -- rather than use the built-in tree traversal mechanisms, the code will most likely end up caching that list of attachments somewhere else for easy access in gameplay code.

If we're going to abstract the tree away from gameplay code, then what is the advantage of making bones full-fledged citizens in the tree? In fact, there are significant design and performance disadvantages.

Bone hierarchies are mostly static, most of the time. I say mostly because sometimes a game may swap different body parts in, or switch level of detail on the skeletons. Practically, though, the hierarchy doesn't really change in any sort of significant fashion. Given this knowledge, a much better implementation is to lay out our bones in a flat array, with a simple index to their parent. This index may be in a separate array depending on cache usage patterns. The array can be laid out in such a way that compositing relative transforms we get from the animation system into absolute transforms is a simple run through an array. There are tricks you can use to make this more efficient, of course, but the point is an array traversal is going to be much better than hopping all around memory calling virtual functions on a base SceneTreeNode class. The approach also lends itself much better to offloading to another CPU due to better memory locality, or easier DMA if the CPU has NUMA.
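As a minimal sketch of that layout (names are hypothetical, and a single float stands in for a real matrix or quaternion-plus-translation, so composition is addition rather than multiplication): bones live in a flat array sorted so every parent precedes its children, with a parallel parent-index array, and composing world transforms is one linear pass.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Skeleton {
    // Sorted so a parent always appears before its children; parent[i]
    // indexes into the same arrays, or is -1 for a root bone.
    std::vector<int>   parent;
    std::vector<float> local;   // relative transform from the animation system
    std::vector<float> world;   // composed absolute transform, cached per frame
};

// One cache-friendly linear pass: no pointer chasing, no virtual calls,
// and the contiguous data is easy to hand off to another processor.
void ComposeWorldTransforms(Skeleton& s) {
    s.world.resize(s.local.size());
    for (std::size_t i = 0; i < s.local.size(); ++i) {
        const int p = s.parent[i];
        s.world[i] = (p < 0) ? s.local[i] : s.world[p] + s.local[i];
    }
}
```

Because parents come first, `world[p]` is always finished before any child reads it, which is what makes the single forward pass valid.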

Do we need a tree at all? Not that I can see -- from the previous entry we've got a list of root level objects, some of which may have an array of tightly packed bones, and an array of attachments. Those attachments may also have bones and attachments, so yes, conceptually we've got a tree that we can traverse. But for the most part gameplay code never needs to know anything about this, as it deals with bones and attachments directly and isn't concerned with the internals of the hierarchy.

We don't need a base SceneTreeNode class that is exposed to all -- the internals of how we update attachments and bones and objects are just that: internal. As we've shown, a straightforward tree structure doesn't fit what we want to do very well as an implementation. From experience, you can spend a fair amount of time optimizing your updates of objects. Hiding the internals makes it easy to special case, to offload work to other CPUs, or to apply any number of other optimizations without breaking gameplay code. A generic tree structure does not provide the API we need for gameplay code, nor does it provide a good implementation at that level.

Tomorrow I will conclude with some thoughts on why naming matters.

Monday, January 12, 2009

The world is not a SceneTree

I wanted to expand on something I thought of while writing my thoughts on the SceneTree. Earlier I made the argument that transform update and animation should not be tightly coupled with rendering. Now I want to touch on something else: it may be useful to think of the SceneTree as conceptually a tree, but the API and implementation should not be built around a global tree data structure.

I jumped the gun on posting this entry, so those of you with RSS feeds may have seen an earlier draft go by. It was pointed out to me that perhaps I was trying to tackle too much in one entry, so I decided to split it up. The win for the reader is (hopefully) less jumbled thoughts from me. The win for me is three days' worth of entries. Win-win!

The title of this entry was shamelessly stolen from Tom Forsyth's article on scene graphs (just say no):
The world is not a big tree - a coffee mug on a desk is not a child of the desk, or a sibling of any other mug, or a child of the house it's in or the parent of the coffee it contains or anything - it's just a mug. It sits there, bolted to nothing. Putting the entire world into a big tree is not an obvious thing to do as far as the real world is concerned.
Very succinctly put. Tom was talking about scene graphs, but the same criticism applies to the SceneTree introduced in the previous entry. Summing it up, we replace the scene graph with three systems: the SpatialGraph, the RenderQueue, and the SceneTree. The SpatialGraph deals with visibility determination. The RenderQueue is constructed from the SpatialGraph and efficiently issues render commands. The SceneTree handles animation and transform update. It is certainly a better breakdown than scene graphs, which try to do it all. Unfortunately, Tom's criticism of scene graphs still applies to the idea of a SceneTree.

I have no idea if the original author of the SceneTree concept is using a global tree structure. I just want to demonstrate that a straightforward implementation of SceneTree using such a structure suffers from many of the same problems a scene graph does. Given the number of scene graphs I've seen over time, it is obvious that some people will be tempted to choose a global tree structure to implement a SceneTree.

First, let's take a step back and think about what we really need to accomplish with the SceneTree:
  • We have a number of objects that are in the world.
  • These objects have transforms describing where they are in the world.
  • For a relatively small number of objects, their transforms may be relative to some other frame of reference (e.g. a gun may be attached to a character).
  • There may be different types of attachment -- you may have situations where you want translation to be relative and rotation absolute, or vice versa. Some attachments may be accomplished through physical constraints simulated in the physics system rather than perfect constraints in the update system.
  • When a transform changes due to animation, physics, or other means, we want those changes to propagate to relative objects in the most efficient manner without doing redundant work.
  • We want to have a large number of objects in the world with a large number of characters receiving animation.
What you'll find is that while we have the concept of a global hierarchy, in reality we have a mostly boring common case and a relatively small number of objects that actually have hierarchical relationships.
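To make the different-types-of-attachment bullet concrete, here's a toy sketch (names are hypothetical, and a one-axis position plus a rotation angle stand in for full transforms) where translation and rotation can independently follow the parent or stay absolute in world space:

```cpp
#include <cassert>

// Toy frame: one-axis position and a rotation angle.
struct Frame { float pos; float rot; };

// Each component of an attachment can independently be relative to the
// parent frame or absolute in world space.
struct AttachMode { bool relativePos; bool relativeRot; };

Frame ComposeAttachment(const Frame& parentWorld, const Frame& local,
                        const AttachMode& mode) {
    Frame out;
    out.pos = mode.relativePos ? parentWorld.pos + local.pos : local.pos;
    out.rot = mode.relativeRot ? parentWorld.rot + local.rot : local.rot;
    return out;
}
```

A gun bolted to a hand would use relative everything; a name tag that follows a character but stays upright would use relative translation and absolute rotation.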

The boring case is what Tom describes. Look around the room you are in right now, and you will find the vast majority of objects are just "there". They do not have any sort of child, parent, or sibling relationship with each other. While most will have physical constraints (gravity pushing down, whatever they are on pushing up), this is due to physical interaction, not a familial relationship. In most games, this sort of interaction is either static or simulated by a physics system.

Having one root node for the entire world and making all these objects children of that root node makes no sense, or at the very least conveys no useful information. What does it mean to be a child of "the world"? In a game, most of these things are going to be static -- they are the literal immovable object. Why clutter up our SceneTree with such things? Sure, we might want to attach something to them (a destructible lamp on a wall, for example), but that's the same as attaching it to the root frame of reference if the wall doesn't move. There may be some other special cases specific to a game (the wall cannot move but it can disappear), but that's just it -- they are special cases that can be handled at a higher level.

There may be no static object at all to put in a tree. Many games treat all the static objects in an atomically streamable unit as one giant blob. There's no reason to deal with individual objects if you can just submit a precomputed command list to the renderer, or an efficiently packed polygon soup to the physics system. 

Most of the same arguments apply to dynamic things -- the familial relationship with "the world" is not useful. What we've really got is a lot of potential root nodes for attachment -- a list or array of conceptual trees. But if nothing is attached to an object, does it need a hierarchical update? Probably not. Now, for code simplicity, it may be simpler to treat those as one-node hierarchies, but if the number of objects with attachments is low relative to everything else, you may be better off with a separate data structure that holds only things with an actual hierarchy.

Dynamic objects provide other interesting opportunities for optimization which do not map well to a generic tree structure. For objects that are physically simulated, the code can take advantage of the physics system's knowledge of what actually moved that frame, and only update those objects. This is considerably more efficient than traversing an entire SceneTree and checking if an object needs to update. Obviously none of this precludes having a generic tree structure, it just brings into question what good it does.
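A sketch of that physics-driven update (names hypothetical, a single float standing in for a transform): instead of traversing every object asking "did you move?", the game consumes the physics system's report of which bodies actually moved this frame.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Body { float transform; bool dirty; };

// Apply only the bodies the physics system says moved this frame;
// untouched bodies are never visited at all.
void ApplyMovedBodies(std::vector<Body>& bodies,
                      const std::vector<int>& movedIndices,
                      const std::vector<float>& newTransforms) {
    for (std::size_t i = 0; i < movedIndices.size(); ++i) {
        Body& b = bodies[movedIndices[i]];
        b.transform = newTransforms[i];
        b.dirty = true;  // downstream systems re-read only dirty bodies
    }
}
```

The cost is proportional to what moved, not to the size of the world, which is the whole point of the argument above.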

We've established that for the grand majority of our objects, we do not need a generic tree representation to process transform updates. We still have cases that do indeed have hierarchical relationships such as characters or attachments. Tomorrow, I will discuss why a generic tree structure is not a good choice for characters and bones.

Friday, January 9, 2009

Joe Duffy on CAS operations and performance

Interesting results in a blog entry from Joe Duffy about the scalability of compare-and-swap instructions across x86 architectures. I found this via the gdalgorithms list. The money quote:

The most common occurrence of a CAS is upon lock entrance and exit. Although a lock can be built with a single CAS operation, CLR monitors use two (one for Enter and another for Exit). Lock-free algorithms often use CAS in place of locks, but due to memory reordering such algorithms often need explicit fences that are typically encoded as CAS instructions. Although locks are evil, most good developers know to keep lock hold times small. As a result, one of the nastiest impediments to performance and scaling has nothing to do with locks at all; it has to do with the number, frequency, and locality of CAS operations.

One thing that has been talked about on the recent gdalgorithms thread is that your only options are not lock-free algorithms versus heavyweight operating system lock primitives. Some operating systems provide lighter weight synchronization primitives (critical sections with a spin count on Win32 come to mind). In the past on some consoles I've come up with faster mutex implementations than what the operating system provides, although this is not something I'd recommend to the faint of heart -- it is unbelievably easy to get this stuff wrong, even for very experienced developers.
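As an illustration of the idea behind a Win32 critical section with a spin count, here is a portable spin-then-yield lock built on std::atomic_flag. This is a sketch only -- as noted above, real lock implementations are unbelievably easy to get wrong, and a production version would tune the spin count and back off more carefully:

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Spin briefly in user space hoping the holder releases soon (the cheap
// path when hold times are short), then yield the timeslice instead of
// burning the core indefinitely.
class SpinThenYieldLock {
public:
    void lock() {
        int spins = 0;
        while (flag_.test_and_set(std::memory_order_acquire)) {
            if (++spins >= kSpinCount) {
                std::this_thread::yield();  // stop hogging the core
                spins = 0;
            }
        }
    }
    void unlock() { flag_.clear(std::memory_order_release); }

private:
    static const int kSpinCount = 4000;
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};
```

The acquire/release orderings are what make writes done inside the critical section visible to the next thread that takes the lock.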

I think the lesson from this entry is to always measure. Maybe for a specific application a fancy lock-free algorithm may be faster, but maybe not -- as we see from above it can depend on a lot of factors. The other thing to avoid is premature optimization -- get your algorithm correct and without races first using locks, and then evaluate whether a lock-free algorithm is appropriate and faster.

Sunday, January 4, 2009

Animation and physics sitting in a tree

Over at Diary of a Graphics Programmer, Wolfgang Engel points out a gamedev.net post introducing some nomenclature for the systems involved with rendering scene geometry. 

Instead of a scene graph approach (just say no), they propose the SpatialGraph (for visibility determination), the SceneTree (for animation and transform update), and the RenderQueue (filled from the SpatialGraph, just draws stuff fast). It is a division that makes much more sense than a scene graph, which tries to handle all of the above. 

One of my biggest dislikes about scene graphs is how they misled us all into thinking that animation and transform update belong in the renderer. The renderer just deals with the results of these operations -- it doesn't much care how the transforms were created, it just needs them. Coupling these two things doesn't really make much sense.

If anything, animation/transform update is much more tightly coupled with physics and collision. Conceptually animation is saying "here's where I would like to go" and the physics system says "here's where you can go." It is even more intertwined than that, because the physics system has complete control over things like rigid bodies and ragdolls -- saying both where they would like to go and where they can go. 

If you are doing any sort of ragdoll blending between animation and ragdoll simulation, you have a feedback loop. The animation system figures out its desired pose, and the ragdoll figures out its desired pose. There is a blending step, but it's not always obvious where that should go. Traditionally the animation system is responsible for blending transforms, but there's an argument that the physics simulation should do it, because it knows the constraints and limitations on where the bones can be physically placed.
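The blend step itself is simple; the design question above is about who owns it. A toy per-pose blend (names hypothetical, one float standing in for each bone transform -- real code would slerp the rotation part rather than lerp raw values) might look like:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Weight 0 = pure animation pose, weight 1 = pure ragdoll pose.
float BlendBone(float animPose, float ragdollPose, float ragdollWeight) {
    return animPose + (ragdollPose - animPose) * ragdollWeight;
}

// Blend a whole skeleton's worth of bones at one global weight; a real
// system would likely want per-bone weights as well.
void BlendPose(const std::vector<float>& anim,
               const std::vector<float>& ragdoll,
               float ragdollWeight,
               std::vector<float>& out) {
    out.resize(anim.size());
    for (std::size_t i = 0; i < anim.size(); ++i)
        out[i] = BlendBone(anim[i], ragdoll[i], ragdollWeight);
}
```

Whether the weight ramps up on impact inside the animation system or is dictated by the physics solver's constraints is exactly the ownership question.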

I haven't gotten into other interesting issues such as vehicles, which can be a blend of physical simulation (the car moving) and animation (the door being opened) similar to ragdolls, and when you add attachments to the mix (gun turret), keeping the animation and physics systems in sync can be a challenge. 

I'm starting to think that animation and physics are two sides of the same coin. I'm calling the combined thing "simulation." Obviously different games are going to have different complexity of physics, and I'm not sure coupling these two things so tightly is a one size fits all thing. What I do know is that coupling animation/transform update with the renderer is almost never the right thing to do, even though there are still a large number of scene graph based rendering libraries available. 

Saturday, January 3, 2009

Good Middleware revisited

I was having a conversation with a good friend about the Good Middleware entry, and he made a very interesting point. He pointed out that in the future, good middleware is going to allow you to hook their multicore job scheduling just like you can hook memory allocation.

Let's start out with a quick review of why we hook memory allocation, as it will help us understand why we're going to want to hook job scheduling.
  1. Consoles are fixed memory environments. It is critical to balance overall memory utilization for your title. If you can provide five more megabytes for game data over the built-in system allocator, then that allows you to have more stuff in your game, which generally can lead to a better game.
  2. Different titles have different memory needs. A 2D side-scroller is going to have different memory requirements than an open world game. Some types of games lend themselves to pre-allocation strategies, where you know ahead of time how many assets of what type you are going to have. Every game I've worked on has spent time tuning the memory allocator to squeeze out as much memory as possible, and what works in one title may not be applicable to another.
  3. The system allocator may perform badly, particularly when accessed from multiple threads. Often, console operating systems will have poor allocator implementations that have a lot of overhead. Sometimes they will use allocators that are designed for a much wider range of applications than games, or designed for systems with virtual memory. Others explicitly punt on providing anything but systems designed for large, bulk allocations and expect you to handle smaller allocations on your own. Finally, no matter how good an allocator may be, it does not have knowledge of your application's memory needs or usage patterns, which are things you can use to optimize your allocator.
  4. The system allocator may not provide good leak detection or heap debugging tools. It may provide none at all. For console games, I know of no standalone leak detection software such as BoundsChecker being available. Often you have to provide these facilities yourself.
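The allocation hook the list above argues for usually amounts to a pair of user-installable callbacks. A sketch of the shape of such a hook (MiddlewareSetAllocator and every other name here is hypothetical, not any real SDK's API):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical middleware-style hook: the library routes every allocation
// through user-installed callbacks, defaulting to malloc/free.
typedef void* (*AllocFn)(std::size_t size);
typedef void  (*FreeFn)(void* ptr);

static AllocFn g_alloc = std::malloc;
static FreeFn  g_free  = std::free;

void MiddlewareSetAllocator(AllocFn a, FreeFn f) { g_alloc = a; g_free = f; }

// Inside the library, all memory traffic goes through the hooks.
void* MiddlewareAlloc(std::size_t size) { return g_alloc(size); }
void  MiddlewareFree(void* p)           { g_free(p); }

// Game side: route middleware memory into a tracked heap -- here just a
// byte counter standing in for a real budgeted/leak-checked allocator.
static std::size_t g_bytesRequested = 0;
void* TrackedAlloc(std::size_t size) { g_bytesRequested += size; return std::malloc(size); }
void  TrackedFree(void* p)           { std::free(p); }
```

Once installed, middleware allocations count against the game's budget and show up in its leak reports, which is the whole point of the hook.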

 CPU core usage can be thought of as a system resource that is arbitrated just like memory. Most middleware I've seen that spreads jobs across multiple cores has its own internal scheduler running on top of a couple of hardware threads. In the early days of multicore middleware, you couldn't even set the CPU affinity of these vendor threads without recompiling the library, something which is key on the 360. Increasingly, I think games are going to want to own their job scheduling and not just leave it up to the operating system to arbitrate CPU scheduling at the thread level.
  1. Consoles have a fixed number of cores. Goes without saying but worth touching on. You have a fixed amount of CPU processing you can do in a 30hz/60hz frame. Ideally you want to maximize the amount of CPU time spent on real game work and minimize the amount spent on bookkeeping/scheduling. 
  2. You know more than the operating system does about your application. One approach to scheduling jobs would be to just create a thread per job and let them go. You could do that, but you would then watch your performance tank. Set aside the fact that creating and destroying operating system threads is not cheap, and focus on the cost of context switches. Thread context switches are very expensive, and with CPU architectures adding more and more registers, they are not going to get any cheaper. A lot of game work actually consists of small tasks more amenable to an architecture where a thread pool sits and executes these jobs one after another. In this architecture, you do not have any thread context switches between jobs or while jobs are running, which means you are spending more time doing real game work and less time on scheduling.
  3. It is difficult to balance between competing job schedulers. Imagine a world with 16 to 32 cores, and both your middleware and your game need to schedule numerous jobs to a thread pool. It is going to be very difficult to avoid stalls or idle time due to the fact that you have two job schedulers in your system -- one from the middleware and one from your game. Either you are "doubling up" thread allocation on cores and paying the cost of context switches from one thread pool to the next, or you allocate each thread pool to a different set of cores and hope that the work balances out. Unfortunately, I don't think the latter is going to scale very well and you will end up with a lot of unused CPU capacity.
The third reason is the clincher as to why good middleware in the future will allow you to replace its job scheduler with your own. It's the only solution I can think of that will allow you to balance workloads across many, many cores. It is possible that in the future operating systems will provide good enough thread pool/microjob APIs that we won't have to write our own, but even then, you'd still want the hooks for debugging or profiling purposes.
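A job-submission hook could mirror the allocator hook: the middleware never spins up its own threads, and instead hands every job to a game-installed callback. All names here are hypothetical, and the game-side "scheduler" is drained manually for the sketch where a real one would run worker threads:

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <utility>

// Hypothetical hook: middleware submits jobs through a callback the game
// installs, so one scheduler arbitrates all cores.
using Job = std::function<void()>;
using SubmitFn = std::function<void(Job)>;

static SubmitFn g_submit = [](Job j) { j(); };  // default: run inline

void MiddlewareSetJobSubmitter(SubmitFn s) { g_submit = std::move(s); }
void MiddlewareRunJob(Job j) { g_submit(std::move(j)); }

// Trivial game-side scheduler: queue jobs, drain them on demand. A real
// implementation would feed a thread pool pinned to specific cores.
struct GameScheduler {
    std::queue<Job> jobs;
    void Submit(Job j) { jobs.push(std::move(j)); }
    void DrainAll() {
        while (!jobs.empty()) { jobs.front()(); jobs.pop(); }
    }
};
```

With the hook installed, middleware jobs land in the same queue as game jobs, so there is a single scheduler to balance instead of two competing ones.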

I haven't yet seen any middleware that really allows you to hook this stuff easily. Have you?
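To make the idea concrete, here is one shape such a hook could take. This is purely hypothetical -- no middleware I know of exposes exactly this interface -- but it shows the principle: the middleware owns no worker threads and submits all of its jobs through an interface the game can replace with its own scheduler.

```cpp
#include <cassert>
#include <functional>

// Hypothetical hook: the middleware submits every job through this
// interface instead of owning a thread pool of its own.
struct IJobScheduler {
    virtual ~IJobScheduler() {}
    virtual void Submit(std::function<void()> job) = 0;
};

// Trivial default: run jobs immediately on the calling thread. A real
// game would install an adapter that forwards to its own thread pool,
// so middleware jobs and game jobs share one set of worker threads.
struct InlineScheduler : IJobScheduler {
    void Submit(std::function<void()> job) override { job(); }
};

static IJobScheduler* g_scheduler = nullptr;

void SetJobScheduler(IJobScheduler* s) { g_scheduler = s; }

// Somewhere inside the middleware (illustrative cloth update):
void MiddlewareUpdateCloth(float* vertices, int count) {
    g_scheduler->Submit([=] {
        for (int i = 0; i < count; ++i)
            vertices[i] += 1.0f;  // stand-in for real cloth work
    });
}
```

With one scheduler behind the interface, there is a single place to balance all work across cores -- and a single place to add profiling or debugging instrumentation.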



Friday, January 2, 2009

RTCD on Particle Rendering

Optimizing the rendering of a particle system.

Real Time Collision Detection has an entry on optimizing particle rendering that is a good survey of techniques you can use to make your particles fly. Don't forget to read the comments; there are some good suggestions in there, too!

Over the last five years or so, people have become a lot more open about this kind of stuff, and I like it. There's just too much involved in writing a good game these days to figure it all out on your own. I don't know about other people's experiences, but earlier in my career there seemed to be a much more secretive culture in game development than in other types of software.

Whether it is entries like the above or things like Insomniac's Nocturnal initiative, people are much more open about ideas these days. Ultimately, we all have to execute, and doing that well is hard enough on its own. Being more open is healthy and ultimately will lead to better games across the board.

Tuesday, December 30, 2008

XNA Studio 3.0: First impressions

I've been doing some graphics prototyping in my spare time lately, and I decided to see if there was a better way to go about it.

I've used RenderMonkey in the past, and while it certainly has its uses, ultimately it left me disappointed. For a straightforward shader it's fine, but it starts to break down when you get into more complicated techniques. After clicking a billion buttons to get my render targets and passes set up the way I needed, I really wished I could just write some code. What really frustrated me was the lack of any ability to compute anything on the CPU side. I ended up computing a lot of things shader-side that in a real application would be done on the CPU, and it needlessly complicated the shaders.

Another avenue I've pursued is starting from the DirectX sample framework or an OpenGL sample and working from there. Even in these simplified environments, I find you end up doing a lot more bookkeeping than actual coding. Additionally, OpenGL seems very unstable on my laptop -- even vanilla samples crash in the Nvidia DLLs.

The last couple days I've been playing around with Microsoft's XNA Studio 3.0 as a graphics prototyping tool, and my impression so far is very favorable. The API is pretty straightforward, and for the most part the abstractions seem in the right place. Porting the current thing I'm working on from C++ to C# took no time at all, and so far I've spent much more time writing meaningful code rather than working on scaffolding. 

The GameComponent architecture they have is interesting -- for example, I found myself needing an orbit camera. Rather than write one myself, I just grabbed a component someone else has written. It was one of those few times where the code just dropped in. The only drawback is they haven't implemented taking input from the 360 controller, but that's easy enough for me to write and a lot less involved than doing the whole thing.

I was also surprised how easy it was to get my little project up and running on my Xbox 360. I didn't have to make any code changes, and it all Just Works. The debugging is solid, but experienced Xbox 360 developers will miss all the nice tools you get with the real SDK. It'd be really nice if Microsoft released a lightweight version of PIX for XNA that worked with the 360, but I guess you can't have everything.

There have been some minor annoyances. Some of the C# syntax for dealing with vertex arrays can be cumbersome -- new'ing a Vector3 never feels right to me. You also lose access to some hardware features. For example, for some reason it thinks I can't create a floating-point depth buffer on my laptop, when I'm very certain my GPU can handle that.

On the 360 side, they simplify a lot of the hardware details, but this has limitations. I can't find any way to resolve the depth buffer on the 360 to a texture in XNA. While these limitations are understandable given the intended audience of XNA, it is somewhat annoying.

All in all, I think it is a pretty good framework so far. I don't think you are going to max out the hardware with it, but for a large category of games it will work really well. 

Monday, December 29, 2008

Interesting GDC Talk #2: Hitting 60hz with the Unreal Engine

Hitting 60Hz with the Unreal Engine: Inside the Tech of MORTAL KOMBAT vs DC UNIVERSE

This is another talk I will be attending. I work at the same studio as Jon and have had more than a few conversations with him about this very topic, and he has given internal presentations on the subject. I don't want to give too much about the talk away, but it is a very interesting case study in taking a graphics pipeline completely designed for a 30Hz game, and modifying it to meet a game's performance goals without sacrificing a whole bunch of functionality. Just one example: when we originally got our hands on Unreal, the sum of all post processing took nearly half of a 60Hz frame. While Epic has optimized this over time, this gives an idea of some of the challenges on the GPU. It is easy enough to remove functionality until it runs fast enough, but the real trick is to preserve quite a bit of that functionality and still get it under time.

The section on porting the Unreal particle system to the SPU is definitely worth paying attention to - this was a collaborative effort done by a few tech group guys at Chicago. It underscores the difficulty of dealing with legacy code bases that are not designed for NUMA architectures, and the ingenuity and hard work required to overcome that hurdle. This talk should definitely be on your list.

Sunday, December 28, 2008

Interesting GDC talk #1: Light Pre-pass Renderer

The Light Pre-Pass Renderer: Renderer Design for Multiple Lights

GDC is fast approaching and I'll be attending this year. One of the talks I'm looking forward to is Wolfgang Engel's talk on a novel renderer design. I've been following his various entries on this topic for the last year. Additionally, if you have access to the PS3 dev site, the Uncharted guys gave a talk on a very similar renderer.

The technique is a hybrid between forward rendering techniques and deferred rendering techniques. In a forward renderer you evaluate material and lighting properties simultaneously; in a deferred renderer you evaluate material properties into a G-Buffer and then evaluate each light sampling that buffer.

In the light pre-pass renderer, the work is broken down further: evaluate the bare minimum material properties needed for the lighting equation, then evaluate the lights into an accumulation buffer, then apply the lighting in a final geometry pass. While you still have one more pass than in a typical deferred setup, the light pre-pass renderer gives you a lot more flexibility for materials.
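The three stages can be illustrated as CPU-side arithmetic for a single pixel. In a real renderer all of this runs in shaders against render targets, and I'm using a plain Lambert diffuse term for simplicity; the function names and structure here are just illustrative:

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

static float Dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Stage 1 (geometry pre-pass): write only what lighting needs -- here
// just the surface normal; a real implementation might add depth and
// specular power.
//
// Stage 2 (light pass): each light adds its contribution into an
// accumulation buffer, sampling that minimal buffer. Cost scales
// linearly with light count, independent of material complexity.
Vec3 AccumulateLights(Vec3 normal, const Vec3* lightDirs,
                      const Vec3* lightColors, int numLights) {
    Vec3 accum = {0.0f, 0.0f, 0.0f};
    for (int i = 0; i < numLights; ++i) {
        float ndotl = Dot(normal, lightDirs[i]);
        if (ndotl < 0.0f) ndotl = 0.0f;  // clamp back-facing lights
        accum.x += lightColors[i].x * ndotl;
        accum.y += lightColors[i].y * ndotl;
        accum.z += lightColors[i].z * ndotl;
    }
    return accum;
}

// Stage 3 (material pass): geometry is drawn again with full material
// shaders, which read the accumulated lighting instead of looping
// over lights -- this is where the material flexibility comes from.
Vec3 ApplyMaterial(Vec3 albedo, Vec3 lightAccum) {
    return { albedo.x * lightAccum.x,
             albedo.y * lightAccum.y,
             albedo.z * lightAccum.z };
}
```

The key property is visible even in this toy version: the per-light work in stage 2 never touches material shaders, and the material work in stage 3 never loops over lights.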

I like this approach over forward rendering or deferred rendering because I think it fits the box today's consoles lay out very nicely. Deferred rendering requires a huge number of render targets. Forward rendering requires either huge complicated shaders or more rendering passes, and neither of these things scale very well as you add more lights or more material types. The light pre-pass approach gives you the linear scalability of deferred rendering while allowing the material variety of forward rendering. 

When it comes down to it, what is most interesting is the array of choices you have when implementing the technique. It can work well with statically computed lighting such as lightmaps, if you choose to go that way. You have different choices in how to approximate specular and whether to accumulate it in a separate or combined render target. You have a variety of choices in how far you want to take MSAA. Particularly with a cross-platform renderer, having this sort of flexibility without massive rearchitecture on each platform can make things a lot easier. I plan to explore this technique further.

Friday, December 26, 2008

Magical missteps and the memory wall

Two links today:



I've spent a lot of time thinking about where multicore hardware is going and how that is going to affect software architectures. My current thinking is that designing software for UMA (uniform memory access) is not going to scale. This is hardly a revolutionary statement.

Let's start with the IEEE Spectrum article first. The article details a study by Sandia National Labs showing that, for certain applications, the performance of general-purpose multicore systems really starts to deteriorate past 8-16 cores due to the memory wall. The "memory wall" is the fact that while our ability to cram transistors on a die keeps ever-increasing, memory bandwidth grows at a piddling rate. Eventually, you've got a large number of cores starving for data.

Now, an important caveat: these results only apply to certain types of workloads. For instance, simulating a nuclear bomb does not hit the memory wall, because the data and calculations can be partitioned into a small amount of memory corresponding to a small spatial area. Thus you don't use much memory bandwidth, because all of your data is in-cache and you are happy.

What the study says is that for certain types of applications, which involve traversing massive amounts of data that cannot be sufficiently partitioned, your performance goes out the window. It then goes on to talk about fancy hardware (stacked memory) that may fix the problem. That hardware is sufficiently far in the future that I'm not going to bank on it.

The challenge for game programmers on future multicore designs will be to architect our systems and engines around well-partitioned, locally coherent memory access. This will become increasingly important as the number of cores grows. Right now, with systems such as the 360 having only three cores (six hardware threads), or consumer PCs having four cores, memory bottlenecks, while troublesome, are not at the severity they will be with 16, 32, or 64 cores.

Which brings me back to why designing for UMA is troublesome. UMA promises that you can access any memory you need at any time -- which is true, but it makes no promises about how fast that access will be. Cache logic is not going to be sufficient to hide all the latencies of poorly designed algorithms and systems. Obviously, this has been true for many years even on single-core designs, but the point is that the problem is about to get many times worse.

If you design algorithms so that they can run on processors with a small local memory, they will run well on UMA designs. The reverse is not true. Local memory designs force you to make good decisions about memory accesses, and they also prepare you for architectures that are not UMA -- which, for all we know right now, the next crop of consoles may very well be.

So what does all of this have to do with the Magical Wasteland article on the PS3? Due to NDAs, I won't comment on the article beyond saying I think its characterization of the PS3 is fair. But I will say this: it may very well be the case that five or ten years from now we realize that the Cell architecture was ahead of its time. Maybe not in the specific implementation, but just keeping the idea alive that, hey, maybe touching every cache line in the system in a garbage collection algorithm isn't the best of ideas. Because sooner or later, we'll all be programming with a local memory model, even if we don't actually have local memory.

Thursday, December 25, 2008

Game Architect on Good Middleware


An excellent article on what makes good middleware was posted on Game Architect. Kyle, the author, hits the major points -- anyone who's used bad middleware has certainly run into these issues. 

I would make a distinction between middleware and engines. Something like Unreal is an engine. It isn't something you integrate into your application, it forms the framework of your application. Because it is a framework, it has made some architectural decisions for you: memory, I/O, the data pipeline, etc. 

Due to an engine enforcing architecture, the questions used to evaluate it are different (and probably the subject of another blog entry). The advantage of a complete engine is tight integration and a lot less work to get up and running, particularly on the tool side of things. The disadvantage is the engine may have made architectural decisions that don't fit your game. Like most engineering decisions, it is a tradeoff everyone has to decide for themselves.

Welcome

Welcome to Solid Angle, a blog about game programming. 

I've been programming in the game industry for about nine years, and I still love it. 

It's been a while since I ran a blog. My last one was less focused, and had topics ranging from politics to programming to how to shuffle poker chips (one of my more popular entries). This one will be strictly about game development.

Initially, my entries probably will just be links and comments to other game development blogs. I've found quite a lot of very interesting blogs on game development and have always wanted a place that aggregated them. After looking around for one, I decided to make one (I'm sure there are others, I just haven't found them). I may branch out into more original content later as I have things to write about.

Anyway, I hope you stick around.