<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-263327529581652713</id><updated>2011-09-01T08:40:20.629-04:00</updated><category term='GPU'/><category term='logging'/><category term='deferred lighting'/><category term='templates'/><category term='downloadable games'/><category term='nvperfhud'/><category term='particle systems'/><category term='game industry'/><category term='programming paradigms'/><category term='rogue programming'/><category term='memory wall'/><category term='HLSL'/><category term='Nvidia Fermi'/><category term='RTTI'/><category term='random musings'/><category term='new orleans'/><category term='software transactional memory'/><category term='XNA'/><category term='C++'/><category term='exceptions'/><category term='cache misses'/><category term='Nvidia'/><category term='Coders at Work'/><category term='code bloat'/><category term='big o notation'/><category term='boxed retail'/><category term='OpenGL'/><category term='rendering'/><category term='ps3'/><category term='research'/><category term='Virtual Addressing'/><category term='gdc09'/><category term='least common denominator approaches'/><category term='graphics'/><category term='leaky abstractions'/><category term='multicore'/><category term='objected-oriented design'/><category term='programming models'/><category term='Memory Management'/><category term='hdr'/><category term='concurrency'/><category term='game programming game programming'/><category term='SIGGRAPH'/><category term='Brad Fitzpatrick'/><category term='silver bullets'/><category term='game programming'/><category term='gdc'/><category term='welcome'/><category term='hacks'/><category 
term='STL'/><category term='data-oriented design'/><category term='optimization'/><category term='xbox 360'/><category term='random thoughts'/><category term='middleware'/><category term='XNA Studio'/><category term='architecture'/><category term='DirectX'/><category term='Intel'/><category term='profiling'/><category term='spherical harmonics'/><title type='text'>Solid Angle</title><subtitle type='html'>Random thoughts about game development.</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://solid-angle.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://solid-angle.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Vince</name><uri>http://www.blogger.com/profile/15315525780089168677</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>46</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-263327529581652713.post-205840599696581505</id><published>2011-02-12T14:00:00.003-05:00</published><updated>2011-02-12T15:07:54.232-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='game programming'/><category scheme='http://www.blogger.com/atom/ns#' term='Virtual Addressing'/><category scheme='http://www.blogger.com/atom/ns#' term='optimization'/><category scheme='http://www.blogger.com/atom/ns#' term='Memory Management'/><title type='text'>Virtual Addressing 101</title><content type='html'>If you haven't read Steven Tovey's excellent &lt;a 
href="http://altdevblogaday.com/42697297"&gt;article on alternatives to new and malloc&lt;/a&gt;, you should. I'll wait.&lt;br /&gt;&lt;br /&gt;All done? Good. One topic that was beyond the scope of that article is virtual addressing. Understanding virtual addressing is important to anyone implementing memory management on modern hardware. The PC and both next-gen consoles provide facilities for virtual address management, and it is important to understand the benefits and trade-offs of these facilities when doing memory management.&lt;br /&gt;&lt;br /&gt;I am going to simplify many of the details and present a more abstracted view of some made-up hardware. A full discussion of virtual address handling specific to an architecture would be beyond the scope of this entry. The specific details of hardware and OS virtual addressing vary between different architectures, and even different processor generations within the same architecture. In practice, it is always important to read your processor and OS manuals to understand the specific implementation you are working with.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Physical Addressing&lt;/span&gt;&lt;/b&gt;&lt;br /&gt;Often we like to think of memory in a machine as one big array, somewhat like this:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-i4CategV1BE/TVaYoIhLNiI/AAAAAAAAABg/Epv8WojlXhs/s1600/Untitleddrawing+%25281%2529.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="56" src="http://3.bp.blogspot.com/-i4CategV1BE/TVaYoIhLNiI/AAAAAAAAABg/Epv8WojlXhs/s320/Untitleddrawing+%25281%2529.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;This is the physical memory map of the Solid Angle PlayBox, a console so spectacularly&amp;nbsp;unsuccessful&amp;nbsp;you probably have never heard of it (or it may just be the fact I made it up). 
It has 256 MB of memory, physically addressed from 0x0 up to (but not including) 0x10000000.&lt;br /&gt;&lt;br /&gt;Real hardware doesn't necessarily have one big contiguous lump of physical address space, or may have different physical address ranges mapping to the same memory, with different cache behavior. But again, we're trying to simplify things here.&lt;br /&gt;&lt;br /&gt;So this seems great, but what are the problems? The problem is fragmentation.&amp;nbsp;There are actually two types of fragmentation, and it is important to know the difference.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;External Fragmentation&lt;/b&gt;&lt;br /&gt;&lt;b&gt;&lt;/b&gt;When you hear the unqualified word "fragmentation", most often what is being referred to is &lt;i&gt;external fragmentation&lt;/i&gt;. External fragmentation occurs when memory has been partitioned into small, non-contiguous chunks, such that while the total amount of free memory is large enough for a big allocation, you can't actually fit it anywhere.&lt;br /&gt;&lt;br /&gt;Here is a simple example, using a &lt;a href="http://www.memorymanagement.org/articles/alloc.html#first.fit"&gt;first-fit heap&lt;/a&gt;. Say someone wrote loading code and didn't really consider memory management while doing so (tsk tsk!). 
This loading code starts by allocating a large temporary buffer for streaming:&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-Y7_i6lRNQhA/TVaejoAjl0I/AAAAAAAAABo/0Lt_kLCf7ZY/s1600/Untitleddrawing+%25283%2529.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="56" src="http://2.bp.blogspot.com/-Y7_i6lRNQhA/TVaejoAjl0I/AAAAAAAAABo/0Lt_kLCf7ZY/s320/Untitleddrawing+%25283%2529.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;Then the loading code reads into the temp buffer, and creates a bunch of permanent data structures.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-wI0M6EgSDR4/TVae7MvMOaI/AAAAAAAAABs/BH0KsZM0BCA/s1600/Untitleddrawing+%25284%2529.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="56" src="http://3.bp.blogspot.com/-wI0M6EgSDR4/TVae7MvMOaI/AAAAAAAAABs/BH0KsZM0BCA/s320/Untitleddrawing+%25284%2529.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;The loading code then frees the temporary buffer&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-Bp3BJBySzrY/TVafL6vNKuI/AAAAAAAAABw/jToPTbzCnYQ/s1600/Memory.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="56" src="http://2.bp.blogspot.com/-Bp3BJBySzrY/TVafL6vNKuI/AAAAAAAAABw/jToPTbzCnYQ/s320/Memory.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;Repeated many times, with some varying temporary buffer sizes, we could end up with a heap like this:&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-sbKFStPUdxM/TVafnIMu4_I/AAAAAAAAAB0/SrtVJHkhFIo/s1600/Memory+%25281%2529.png" imageanchor="1" style="margin-left: 1em; 
margin-right: 1em;"&gt;&lt;img border="0" height="56" src="http://3.bp.blogspot.com/-sbKFStPUdxM/TVafnIMu4_I/AAAAAAAAAB0/SrtVJHkhFIo/s320/Memory+%25281%2529.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;Now a large allocation comes along, which we have enough memory for, but because memory is partitioned we do not have a large enough contiguous block to fit it. That is external fragmentation: it is fragmentation &lt;i&gt;external&lt;/i&gt;&amp;nbsp;to the allocations.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-_Pvp_ucy7a8/TVagmlKTAGI/AAAAAAAAAB4/NkbBQSFQflA/s1600/Memory+%25282%2529.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="87" src="http://3.bp.blogspot.com/-_Pvp_ucy7a8/TVagmlKTAGI/AAAAAAAAAB4/NkbBQSFQflA/s320/Memory+%25282%2529.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: left;"&gt;&lt;b&gt;Internal Fragmentation&lt;/b&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: left;"&gt;Internal fragmentation is the type of fragmentation you don't hear about much, or if you do, it is not usually described as fragmentation. Internal fragmentation occurs when the size of the memory manager's internal allocation is larger than what the application actually requested. This is fragmentation &lt;i&gt;internal&lt;/i&gt;&amp;nbsp;to the allocations.&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: left;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: left;"&gt;An example can be found with fixed-size block allocators. Often you can have a system that makes many allocations, all slightly varying in size. One solution to this is to use a fixed-size block allocator that uses a block size larger than any of your potential allocations. 
This can lead to a situation where a small amount of memory is unused in each allocation:&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: left;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-xvC7Mm19URQ/TVasUR5lMPI/AAAAAAAAAB8/Bz7aSL8jD98/s1600/Memory+%25283%2529.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="87" src="http://4.bp.blogspot.com/-xvC7Mm19URQ/TVasUR5lMPI/AAAAAAAAAB8/Bz7aSL8jD98/s320/Memory+%25283%2529.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: left;"&gt;Internal fragmentation can also occur with other allocators, such as the &lt;a href="http://en.wikipedia.org/wiki/Buddy_memory_allocation"&gt;buddy system&lt;/a&gt;.&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: left;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;b&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Virtual Addressing&lt;/span&gt;&lt;/b&gt;&lt;br /&gt;Most programmers at some point have heard the phrase "All problems in computer science can be solved by another level of indirection", attributed to &lt;a href="http://en.wikipedia.org/wiki/David_Wheeler_(computer_scientist)"&gt;David Wheeler&lt;/a&gt;. Many haven't heard the&amp;nbsp;corollary "...except for the problem of too many layers of indirection." This is a shame because I think both together describe the condition of the modern programmer.&lt;br /&gt;&lt;br /&gt;Virtual addressing is the direct application of this idea -- instead of accessing memory through its physical address, we add a level of indirection and access it through a virtual address. This indirection is performed in the hardware, so it is mostly transparent to the programmer, and fast, with caveats. 
Virtual addressing can mitigate many fragmentation issues.&lt;br /&gt;&lt;br /&gt;First, an important public service announcement.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Virtual Addressing != Paging to hard drive&lt;/b&gt;&lt;br /&gt;Do not confuse virtual addressing with virtual memory management systems that may page data to the hard drive (such as Windows or Linux).&amp;nbsp;I think these concepts sometimes become confused because many descriptions lump the two things together under a heading of "virtual memory." They are not the same thing -- paging systems are built on top of virtual addressing, but you do not need to page memory to the hard drive to reap the benefits of virtual addressing. You don't even need a hard drive!&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Virtual Address Space&lt;/b&gt;&lt;br /&gt;Virtual addressing implementations are very specific to CPU architecture and OS, but they all share some common properties.&lt;br /&gt;&lt;br /&gt;They all have the concept of a virtual address space. The address space may be much larger than the physical memory of the machine -- for example, in our hypothetical console, we may have only 256 MB of physical memory, but with 32-bit pointers we have a 4 GB address space. In practice, architectures and OSes may limit the address space available to applications, either reserving address space for the kernel, or using portions of the address space for different types of memory access (such as non-cached reads/writes).&amp;nbsp;On multi-process operating systems such as Windows or Linux, each process has its own address space.&lt;br /&gt;&lt;br /&gt;Address space is allocated independently from physical memory, and you do not have to have physical memory backing an address space allocation.&lt;br /&gt;&lt;br /&gt;The address space is divided into pages. Page sizes vary depending on architecture/OS, but common sizes are 4K, 64K, and 1 MB. 
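&lt;br /&gt;&lt;br /&gt;To get a feel for the numbers, here is a small sketch of how many pages a 4 GB address space divides into at each of those sizes. The names kAddressSpaceBytes and PageCount are made up for illustration and do not come from any real API.&lt;br /&gt;

```cpp
// Illustration only: every name here is hypothetical.
// Size of a 32-bit virtual address space: 2^32 bytes (4 GB).
constexpr unsigned long long kAddressSpaceBytes = 4294967296ull;

// Number of pages needed to cover the whole address space.
// Assumes pageSizeBytes is a non-zero power of two.
constexpr unsigned long long PageCount(unsigned long long pageSizeBytes) {
    return kAddressSpaceBytes / pageSizeBytes;
}
```

&lt;br /&gt;With 4K pages that comes to 1,048,576 pages, with 64K pages 65,536, and with 1 MB pages 4,096.&lt;br /&gt;&lt;br /&gt;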
Page sizes are always powers of two, as this simplifies the work of translating a virtual address into a physical one. A CPU/OS may only support a fixed page size, or may allow programmers to pick a page size when pages are allocated.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;The Page Table&lt;/b&gt;&lt;br /&gt;Virtual addresses are translated into physical addresses via a page table. A page table is a simple mapping between a virtual page and a physical page. Going back to our hypothetical console, which has a page size of 64KB, a page table might look like this (again, real world implementations vary):&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-P2TCJVssR5g/TVa7dCufPzI/AAAAAAAAACA/NlSIr6fAdJU/s1600/VirtualAddress.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="190" src="http://2.bp.blogspot.com/-P2TCJVssR5g/TVa7dCufPzI/AAAAAAAAACA/NlSIr6fAdJU/s320/VirtualAddress.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;Each entry in the page table maps a virtual address page to a physical address page. A virtual address allocation may span multiple contiguous address pages, but does &lt;b&gt;not&lt;/b&gt;&amp;nbsp;require contiguous physical pages.&lt;br /&gt;&lt;br /&gt;When the CPU encounters an instruction which accesses a memory address, it must translate the virtual address into a physical address to know where the data is located in physical memory. With a 64KB page size, the upper 16 bits of a 32 bit address specify the page number, and the lower 16 bits the offset into the page. This is why page sizes are a power of 2 -- determining the page number becomes a simple bit mask and shift. The CPU looks up the virtual page entry in the page table, and finds the corresponding physical page number. 
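&lt;br /&gt;&lt;br /&gt;As a rough sketch of that lookup on our made-up console, assume 32-bit addresses, 64KB pages, and a flat table with one entry per virtual page (real page tables are usually more elaborate, and the hardware does this work, not your code). Every name and table value below is hypothetical:&lt;br /&gt;

```cpp
// Illustration only: a toy model of address translation, not real MMU code.
// Assumes unsigned is 32 bits wide, as on mainstream platforms.
constexpr unsigned kPageShift = 16;      // log2 of the page size
constexpr unsigned kPageSize  = 65536u;  // 64 KB, i.e. 2^16

// Upper 16 bits of the address: the virtual page number.
constexpr unsigned PageNumber(unsigned virtualAddress) {
    return virtualAddress >> kPageShift;
}

// Lower 16 bits of the address: the offset into the page. For a power-of-two
// page size this modulo is the same as masking with (kPageSize - 1).
constexpr unsigned PageOffset(unsigned virtualAddress) {
    return virtualAddress % kPageSize;
}

// Toy page table: index is the virtual page number, value is the physical
// page number. Only four virtual pages are mapped, and there is no bounds
// or validity check.
constexpr unsigned kPageTable[4] = { 0x0012u, 0x0003u, 0x00A0u, 0x0001u };

// Swap the virtual page number for the physical one and keep the offset.
constexpr unsigned Translate(unsigned virtualAddress) {
    return kPageTable[PageNumber(virtualAddress)] * kPageSize
         + PageOffset(virtualAddress);
}
```

&lt;br /&gt;For example, virtual address 0x00011234 falls in virtual page 1, which this toy table maps to physical page 3, so Translate returns physical address 0x00031234.&lt;br /&gt;&lt;br /&gt;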
This is done for &lt;b&gt;every &lt;/b&gt;memory access.&lt;br /&gt;&lt;br /&gt;Because this operation happens for every memory access, it needs to be fast and implemented in hardware. There's only one problem: the page table is far too big to be stored on the CPU chip.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Translation Lookaside Buffers&lt;/b&gt;&lt;br /&gt;The solution is a special cache for address translation. Because the CPU cannot fit the entire page table in on-chip memory, it uses a &lt;a href="http://en.wikipedia.org/wiki/Translation_lookaside_buffer"&gt;translation lookaside buffer&lt;/a&gt; (TLB), which is a special cache that holds the most recently used page table entries. TLBs can often hold enough page entries for a large amount of address space, usually larger than the amount of memory the L1 or L2 caches can hold.&lt;br /&gt;&lt;br /&gt;Back to our memory access scenario, when the CPU must translate a virtual page into a physical page, it first looks in the TLB. If the page table entry is found, the address translation happens very quickly and the CPU continues with its work. If there is a TLB miss, this can often mean a TLB miss handler is invoked. This is actually a &lt;i&gt;software handler&lt;/i&gt;&amp;nbsp;provided by the operating system, as the entire page table is managed by the OS, not the CPU. Thus, TLB misses can be very expensive.&lt;br /&gt;&lt;br /&gt;On most modern processors, the TLB is multi-level, similar to how L1 and L2 caches work. &amp;nbsp;Thus the CPU may check a smaller, faster address translation cache before consulting the larger, slower TLB, before it resorts to the software handler of a full TLB miss.&lt;br /&gt;&lt;br /&gt;The expense of a TLB miss is another reason data locality is very important to performance. If you are hitting data structures willy-nilly in address space, aside from the cache misses you will incur, you may incur a lot of TLB misses, too. 
This is a double-whammy of not keeping data accesses local!&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Memory Protection&lt;/b&gt;&lt;br /&gt;Most CPUs also add the capability to specify what kind of access to a page is allowed. Page table entries can be constructed which disallow writes, or disallow code execution on some architectures. The former can be used to make sure application-level code does not overwrite kernel data structures, and the latter can be used to help protect against buffer overrun attacks by not making it possible for the CPU to jump into data-only memory. When invalid accesses occur, a HW exception is raised.&lt;br /&gt;&lt;br /&gt;You can often specify the memory protection for a page with API calls, which can sometimes be useful for debugging tricky memory overwrite problems, by protecting pages against writes and writing a custom HW exception handler.&lt;br /&gt;&lt;br /&gt;Memory protection is also how OSes implement demand-paging of memory from the hard drive. When the OS moves a physical page of memory to the hard drive, it modifies the virtual page table entry to prevent reads and writes. If that page is accessed, a HW exception occurs which the OS handles by loading the appropriate data from the hard drive into a physical page, and setting the page table entry to point to that physical page. Execution of the program then continues from where the exception was fired.&lt;br /&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;br /&gt;&lt;b&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Virtual Addressing-Aware Memory Management&lt;/span&gt;&lt;/b&gt;&lt;br /&gt;The presence of virtual addressing has a great impact on memory management. While it does not necessarily change the fundamental behavior of many allocator types, it is important to understand when physical memory is actually committed. 
Physical pages returned to the OS can be used to make up much larger, contiguous allocations, so at the system level, many problems with external fragmentation are greatly reduced.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Direct Page Allocation for Large Blocks&lt;/b&gt;&lt;br /&gt;For large allocations (&amp;gt; page size), the best memory allocation strategy is sometimes to allocate virtual address space and physical pages directly from the operating system. These types of allocations are usually rare, happening when loading data. The advantage is you will not suffer external fragmentation from this allocation strategy, as the OS can always remap physical pages to a contiguous virtual address space if they are available, even if they are not contiguous in physical address space.&lt;br /&gt;&lt;br /&gt;The&amp;nbsp;trade-off&amp;nbsp;for doing this is internal fragmentation. Your large allocation may not be an exact multiple of page size, leading to memory that is wasted. First, you want to pick a good threshold for when to do direct page allocation -- this is not a good strategy for things that are not much larger than the page size. Wasted memory can also be&amp;nbsp;mitigated&amp;nbsp;by choosing an appropriate page size for the allocation on architectures that allow this. For example, where waste would be a significant percentage of the allocation, you may want to choose 4K pages rather than 64K pages. The&amp;nbsp;trade-off&amp;nbsp;here is smaller pages mean many more TLB misses, which can hurt performance.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Stack Allocators&lt;/b&gt;&lt;br /&gt;One key thing with virtual addressing is you can allocate large regions of address space without committing physical memory to them. 
Stack allocators can be implemented by allocating a large region of address space, but only&amp;nbsp;committing&amp;nbsp;physical pages as the stack allocator pointer advances.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-ZVB7NVqh3So/TVbKRFHCXQI/AAAAAAAAACE/1Yzg4xLzVuw/s1600/Memory+%25284%2529.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="95" src="http://3.bp.blogspot.com/-ZVB7NVqh3So/TVbKRFHCXQI/AAAAAAAAACE/1Yzg4xLzVuw/s320/Memory+%25284%2529.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;The advantage here is you can choose a large maximum stack size without actually&amp;nbsp;committing&amp;nbsp;physical memory to it. If you do hit the peak, those physical pages must still come from somewhere, but this allows for situations where your peak occurs at a point when other systems have freed those pages (loading comes to mind).&lt;br /&gt;&lt;br /&gt;It should be noted that the C++/C call stack on Windows works exactly like this - when you specify a stack size for an application, you are specifying the size of the address space allocation, not the physical allocation. As the stack grows, the runtime allocates physical pages. This is done transparently with a special page called a guard page: accessing it triggers a HW exception, and the OS handler allocates physical memory for that page and sets the next virtual page as the guard page.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;b&gt;Pooled allocators&lt;/b&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;For small allocations, fixed-size pools are often a good solution. 
Virtual addressing can allow us to have multiple pools of different sizes without fragmenting overall memory.&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;Basically, we implement our fixed-size pool as a linked list of mini-pools, each some multiple of the page size. On our hypothetical console, 64KB may be a good mini-pool size. If a mini-pool is full, we allocate another set of pages from the OS. If a mini-pool becomes empty, we return the page to the OS. Again, because physical pages do not need to be contiguous when mapped to virtual address pages, these freed pages can be used for any size of allocation, from anywhere in the system.&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;b&gt;General Advice&lt;/b&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;When dealing with virtual allocations, the general rule of thumb is "return physical pages to the operating system whenever you can." If a physical page is allocated but not being used, the OS can not use it for some other, larger allocation that may need to occur. The days of allocating an entire console's memory space in one block and managing it yourself are largely gone, unless you wish to write your own page allocator (which can and has been done). 
There are some caveats to this, such as with page allocation thrashing, and allocations that are required to be physically contiguous (see below).&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Virtual Addressing Problems&lt;/span&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;b&gt;Physically Contiguous Requirements&lt;/b&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;Your particular platform may require certain allocations be performed in contiguous physical memory, such as GPU resources. This is often the case on consoles. Virtual addressing only mitigates external fragmentation for virtual allocations -- for these physical allocations, you still have to deal with fragmentation at the physical page level. Often the way to handle this is to set aside memory for physical resources up front in your application, and manage them separately from your virtual allocations.&amp;nbsp;&lt;/div&gt;&lt;br /&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;br /&gt;&lt;b&gt;Page Allocation Thrashing&lt;/b&gt;&lt;br /&gt;Allocating virtual address space and committing physical pages are not cheap operations. Particularly with stack allocators and pools, you want to avoid thrashing -- cases where a repeated pattern of allocs/frees cause pages to be allocated and freed in rapid succession. This can be worked around by thresholding when you free a physical page to the OS - for example, with a pool, you may require that some percentage of the previous physical page be free before freeing the next, totally free one. 
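&lt;br /&gt;&lt;br /&gt;A minimal sketch of that kind of threshold check, assuming the pool tracks the number of free slots per page. The function name and its parameters are hypothetical, and the right percentage is something you would tune per pool:&lt;br /&gt;

```cpp
// Illustration only: a hypothetical hysteresis check for a pooled allocator.
// A completely free page is handed back to the OS only when the page before
// it is itself at least thresholdPercent free. Otherwise we keep the empty
// page around so the next few allocations do not immediately re-commit it.
constexpr bool ShouldReleaseEmptyPage(unsigned freeSlotsInPreviousPage,
                                      unsigned slotsPerPage,
                                      unsigned thresholdPercent) {
    // Integer form of (free / total) >= (threshold / 100).
    return freeSlotsInPreviousPage * 100u >= slotsPerPage * thresholdPercent;
}
```

&lt;br /&gt;With 64 slots per page and a 25 percent threshold, the empty page is kept while the previous page has fewer than 16 free slots, and released once it has 16 or more.&lt;br /&gt;&lt;br /&gt;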
Another strategy is to free pages only at specific, known points where the performance hit is predictable.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Page Size and TLB Misses&lt;/b&gt;&lt;br /&gt;Page size can have a very important impact on performance. On platforms which allow you to choose page size when performing a virtual address space allocation, you want to pick the largest page size possible, as larger pages cause far fewer TLB misses. This is often a tricky balance between wasting memory due to internal fragmentation, and losing performance due to TLB misses. As always, data locality helps to reduce TLB misses.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Page Size and Physical Fragmentation&lt;/b&gt;&lt;br /&gt;On platforms with variable page sizes, you can run into problems where you cannot allocate a large page even though the memory is free. This is due to external fragmentation of the physical pages themselves - if you allocate a large number of 4K pages, free them, and try to allocate a 1 MB page, the system may not have enough contiguous physical memory to do so. I've even seen some platforms that will not coalesce smaller pages into larger ones even if they are contiguous (i.e. once you've allocated physical memory as a 64KB page, it will never be coalesced into a 1 MB page). This can be mitigated similarly to physical allocation restrictions -- allocate your large pages up front, and write your own page allocator that your other allocators work on top of.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Address Space Fragmentation&lt;/b&gt;&lt;br /&gt;It is possible to fragment virtual address space itself. One should be careful of reserving too much virtual address space for things like stack allocators, or leaking address space. 
While on console the address space is many times larger than the physical memory, and thus usually has enough slack to make up for carelessness, on PC, particularly when writing tools in 32 bit, you can run into situations where you fragment the virtual address space itself.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Summary&lt;/span&gt;&lt;/b&gt;&lt;br /&gt;My hope is anyone reading this who did not have a good understanding of virtual addressing now understands a little better what is going on under the hood with memory management, at least at a basic level. As always, platform details differ, and if you are doing any kind of memory management work, you really should read the CPU and OS docs on memory management, virtual addressing, and the TLB for your specific platform.&lt;br /&gt;&lt;br /&gt;Even programmers who are not writing custom memory managers can benefit from understanding how virtual addressing works. Almost every memory access performs address translation -- and this translation is another important reason to keep data accesses local when designing data structures.&lt;br /&gt;&lt;b&gt;&lt;/b&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/263327529581652713-205840599696581505?l=solid-angle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://solid-angle.blogspot.com/feeds/205840599696581505/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://solid-angle.blogspot.com/2011/02/virtual-addressing-101.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/205840599696581505'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/205840599696581505'/><link rel='alternate' 
type='text/html' href='http://solid-angle.blogspot.com/2011/02/virtual-addressing-101.html' title='Virtual Addressing 101'/><author><name>Vince</name><uri>http://www.blogger.com/profile/15315525780089168677</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/-i4CategV1BE/TVaYoIhLNiI/AAAAAAAAABg/Epv8WojlXhs/s72-c/Untitleddrawing+%25281%2529.png' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-263327529581652713.post-8283747808934684394</id><published>2011-02-06T13:23:00.000-05:00</published><updated>2011-02-06T13:23:25.602-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='game programming'/><category scheme='http://www.blogger.com/atom/ns#' term='logging'/><category scheme='http://www.blogger.com/atom/ns#' term='C++'/><title type='text'>Lazy Logging Parameter Evaluation With Variadic Macros</title><content type='html'>&lt;div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;This entry is not rocket science, and probably won't be that informative to experienced programmers, but I've seen commercial code bases get something as simple as this wrong. It requires compiler support for variadic macros, which have been in Visual C++ for a while and are also supported by later versions of GCC. &amp;nbsp;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;/div&gt;&lt;br /&gt;Most games have some sort of logging system. Debugging by printf is one of the first debugging tools most programmers learn. While there are many other tools in the debugging toolbox, this particular one is usually not that far out of reach. 
Some problems just lend themselves to being solved by logging.&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;We want to minimize the performance impact of logging code without having to limit the number of logging statements we place in code. We do not want to constantly recompile different configurations of the game with or without logging enabled.&amp;nbsp;While compile-time stripping of logging during development will have the least performance impact, there are many times when you may be at a tester's, designer's, or artist's desk and need to log key information. Providing them with a custom build is a productivity hit for everyone involved.&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;There are two main performance hits for logging:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;1. The cost of the logging itself (writing to the debug window, to a file, to a console, etc.)&lt;/div&gt;&lt;div&gt;2. The cost of parameter evaluation&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Anyone who has put a log statement in a piece of code executed many times a frame knows it can absolutely kill performance, simply because writing to a file or debug output is itself time-consuming. This first cost can be addressed with a channel system that selectively enables logging. Even beyond the performance cost, it is useful to enable different types of logging at different times. If you're debugging AI code, you are probably not interested in logging from the streaming system. &amp;nbsp;Log statements specify which channel they are on (say, by integer ID), and the logging code checks if that channel is enabled.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Where should this check occur? I've seen some code bases that do this in the logging function itself. 
This is a mistake, because even if you do not actually output anything, you are still paying the second cost, the cost of parameter evaluation.&lt;br /&gt;&lt;br /&gt;Logging, by nature, is very string-intensive. Often you will output human-readable debug names for various assets and entities when logging. Strings computed as parameters to log statements often incur performance penalties: memory allocations, string operations, and so on. In addition to string overhead, some information you may wish to log may not be particularly cheap to calculate.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;float&amp;nbsp;ThisFunctionIsVeryExpensiveToEvaluate();&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;LogPrintf(LOG_INFO,&amp;nbsp;"Expensive but precious debug info:&amp;nbsp;%g\n",&amp;nbsp;ThisFunctionIsVeryExpensiveToEvaluate());&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;What we want is for the expensive function to only be evaluated if the LOG_INFO channel is enabled.&lt;br /&gt;&lt;br /&gt;The way to do this is to put the channel check in the macro itself, and only call the log function if the check succeeds. 
Here's some sample code that accomplishes this using variadic macros:&lt;br /&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;//&amp;nbsp;Define&amp;nbsp;this&amp;nbsp;to&amp;nbsp;0&amp;nbsp;to&amp;nbsp;disable&amp;nbsp;logging&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;#define&amp;nbsp;ENABLE_LOGGING&amp;nbsp;1&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;const&amp;nbsp;int&amp;nbsp;LOG_ERROR=0x1;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br 
/&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;const&amp;nbsp;int&amp;nbsp;LOG_WARNING=0x2;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;const&amp;nbsp;int&amp;nbsp;LOG_INFO=0x4;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;#if&amp;nbsp;ENABLE_LOGGING&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&amp;nbsp;&amp;nbsp; 
//&amp;nbsp;Simple&amp;nbsp;channel&amp;nbsp;system&amp;nbsp;(you&amp;nbsp;want&amp;nbsp;IsLoggingChannelEnabled&amp;nbsp;to&amp;nbsp;be&amp;nbsp;cheap,&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&amp;nbsp;&amp;nbsp; //&amp;nbsp;but&amp;nbsp;there&amp;nbsp;are&amp;nbsp;other&amp;nbsp;ways&amp;nbsp;to&amp;nbsp;implement&amp;nbsp;something&amp;nbsp;like&amp;nbsp;this)&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&amp;nbsp;&amp;nbsp; static&amp;nbsp;int&amp;nbsp;GlobalEnabledLogChannels;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&amp;nbsp;&amp;nbsp; 
//&amp;nbsp;Make&amp;nbsp;sure&amp;nbsp;your&amp;nbsp;compiler&amp;nbsp;inlines&amp;nbsp;this&amp;nbsp;function,&amp;nbsp;as&amp;nbsp;it&amp;nbsp;will&amp;nbsp;be&amp;nbsp;called&amp;nbsp;many&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&amp;nbsp;&amp;nbsp; //&amp;nbsp;times&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&amp;nbsp;&amp;nbsp; //&amp;nbsp;You&amp;nbsp;may&amp;nbsp;want&amp;nbsp;to&amp;nbsp;force&amp;nbsp;it&amp;nbsp;to&amp;nbsp;be&amp;nbsp;inlined&amp;nbsp;using&amp;nbsp;whatever&amp;nbsp;compiler-specific&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&amp;nbsp;&amp;nbsp; //&amp;nbsp;syntax&amp;nbsp;is&amp;nbsp;available&amp;nbsp;to&amp;nbsp;you.&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; 
font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&amp;nbsp;&amp;nbsp; inline&amp;nbsp;bool&amp;nbsp;IsLoggingChannelEnabled(int&amp;nbsp;channel)&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; return&amp;nbsp;0&amp;nbsp;!=&amp;nbsp;(GlobalEnabledLogChannels&amp;nbsp;&amp;amp;&amp;nbsp;channel);&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; 
font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&amp;nbsp;&amp;nbsp; //&amp;nbsp;This&amp;nbsp;overload&amp;nbsp;is&amp;nbsp;present&amp;nbsp;to&amp;nbsp;handle&amp;nbsp;the&amp;nbsp;case&amp;nbsp;where&amp;nbsp;the&amp;nbsp;channel&amp;nbsp;argument&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&amp;nbsp;&amp;nbsp; //&amp;nbsp;is&amp;nbsp;optional&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&amp;nbsp;&amp;nbsp; inline&amp;nbsp;bool&amp;nbsp;IsLoggingChannelEnabled(const&amp;nbsp;char*)&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; 
-webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&amp;nbsp;&amp;nbsp; {&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; return&amp;nbsp;true;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&amp;nbsp;&amp;nbsp; //&amp;nbsp;Note:&amp;nbsp;I've&amp;nbsp;seen&amp;nbsp;many&amp;nbsp;logging&amp;nbsp;systems&amp;nbsp;which&amp;nbsp;make&amp;nbsp;the&amp;nbsp;log&amp;nbsp;channel&amp;nbsp;optional.&lt;/span&gt;&lt;span class="Apple-style-span" 
style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&amp;nbsp;&amp;nbsp; //&amp;nbsp;I'm&amp;nbsp;going&amp;nbsp;to&amp;nbsp;handle&amp;nbsp;this&amp;nbsp;case&amp;nbsp;to&amp;nbsp;show&amp;nbsp;how&amp;nbsp;it&amp;nbsp;is&amp;nbsp;done,&amp;nbsp;but&amp;nbsp;if&amp;nbsp;you&amp;nbsp;always&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&amp;nbsp;&amp;nbsp; //&amp;nbsp;require&amp;nbsp;a&amp;nbsp;log&amp;nbsp;channel,&amp;nbsp;this&amp;nbsp;code&amp;nbsp;becomes&amp;nbsp;simpler&amp;nbsp;(for&amp;nbsp;instance,&amp;nbsp;you&amp;nbsp;can&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&amp;nbsp;&amp;nbsp; //&amp;nbsp;make&amp;nbsp;format&amp;nbsp;a&amp;nbsp;required&amp;nbsp;argument&amp;nbsp;to&amp;nbsp;the&amp;nbsp;macro,&amp;nbsp;and&amp;nbsp;not&amp;nbsp;need&amp;nbsp;the&amp;nbsp;##&amp;nbsp;handling)&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; 
-webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&amp;nbsp;&amp;nbsp; void&amp;nbsp;MyLogPrintf(const&amp;nbsp;char*&amp;nbsp;format,&amp;nbsp;...);&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&amp;nbsp;&amp;nbsp; void&amp;nbsp;MyLogPrintf(int&amp;nbsp;channel,&amp;nbsp;const&amp;nbsp;char&amp;nbsp;*&amp;nbsp;format,...);&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&amp;nbsp;&amp;nbsp; //&amp;nbsp;The&amp;nbsp;##&amp;nbsp;is&amp;nbsp;some&amp;nbsp;syntax&amp;nbsp;magic&amp;nbsp;to&amp;nbsp;make&amp;nbsp;GCC&amp;nbsp;ignore&amp;nbsp;the&amp;nbsp;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: Georgia, 'Times New Roman', Times, serif;"&gt;&lt;span class="Apple-style-span" 
style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-size: 11px;"&gt;preceding&lt;/span&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&amp;nbsp;&amp;nbsp; //&amp;nbsp;comma&amp;nbsp;if&amp;nbsp;no&amp;nbsp;arguments&amp;nbsp;after&amp;nbsp;channel&amp;nbsp;are&amp;nbsp;present&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&amp;nbsp;&amp;nbsp; //&amp;nbsp;This&amp;nbsp;can&amp;nbsp;happen&amp;nbsp;if&amp;nbsp;no&amp;nbsp;channel&amp;nbsp;is&amp;nbsp;specified&amp;nbsp;in&amp;nbsp;the&amp;nbsp;log&amp;nbsp;print, as it is optional&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: Georgia, 'Times New Roman', Times, serif;"&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&amp;nbsp;&amp;nbsp; #define&amp;nbsp;LogPrintf(channel,&amp;nbsp;...)&amp;nbsp;\&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" 
style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp;if(!IsLoggingChannelEnabled(channel))&amp;nbsp;{}&amp;nbsp;else&amp;nbsp;MyLogPrintf(channel,&amp;nbsp;##__VA_ARGS__)&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;#else&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&amp;nbsp;&amp;nbsp; #if&amp;nbsp;_MSC_VER&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; 
-webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp;//&amp;nbsp;__noop&amp;nbsp;is&amp;nbsp;Visual&amp;nbsp;C++&amp;nbsp;specific&amp;nbsp;syntax&amp;nbsp;for&amp;nbsp;"do&amp;nbsp;nothing".&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp;#define&amp;nbsp;LogPrintf(...)&amp;nbsp;__noop&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&amp;nbsp;&amp;nbsp; #else&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp;// Compiler should strip this out - but 
always look at the disassembly to make sure!&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp;inline&amp;nbsp;void&amp;nbsp;Noop()&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; {&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; }&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; 
&amp;nbsp;#define&amp;nbsp;LogPrintf(...)&amp;nbsp;Noop()&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&amp;nbsp;&amp;nbsp; #endif&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;#endif&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;//&amp;nbsp;example&amp;nbsp;log&amp;nbsp;statements&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; 
font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;void&amp;nbsp;SomeFunction()&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;{&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;LogPrintf("Hello&amp;nbsp;world!\n");&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;LogPrintf(LOG_ERROR,&amp;nbsp;"You&amp;nbsp;should&amp;nbsp;see&amp;nbsp;this&amp;nbsp;very&amp;nbsp;important&amp;nbsp;error\n");&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; 
font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;LogPrintf(LOG_INFO,&amp;nbsp;"Expensive&amp;nbsp;info:&amp;nbsp;%s\n",&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;ThisFunctionIsVeryExpensiveToEvaluate().c_str());&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Hopefully Blogger didn't mangle the formatting of all that.&lt;br /&gt;&lt;br /&gt;The key concept is to call IsLoggingChannelEnabled() in the macro itself, so that the log arguments are only evaluated when the channel is enabled. The if syntax it uses is specially constructed so that it does not change the semantics of a surrounding if statement written without braces. 
For example:&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;if&amp;nbsp;(rand()%2&amp;nbsp;==&amp;nbsp;0)&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;LogPrintf(LOG_INFO,"Rand&amp;nbsp;was&amp;nbsp;even\n");&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;else&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;LogPrintf(LOG_INFO,&amp;nbsp;"Rand&amp;nbsp;was&amp;nbsp;odd\n");&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br 
/&gt;&lt;/span&gt;&lt;/div&gt;If we did something like this:&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;#define&amp;nbsp;LogPrintf(channel,&amp;nbsp;...)&amp;nbsp;\&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Georgia, 'Times New Roman', Times, serif; font-size: 11px;"&gt;if(IsLoggingChannelEnabled(channel))&amp;nbsp;MyLogPrintf(channel,&amp;nbsp;##__VA_ARGS__)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;that would change the meaning of the above if statement: the else would bind to the macro's if(IsLoggingChannelEnabled(channel)), not the original rand check!&lt;br /&gt;&lt;br /&gt;A note on why I made the channel optional: In a particular legacy code base I was dealing with, the channel argument was optional on logging statements, and I had to handle that case without changing every log statement in the application. I wanted to show how you could support something like that.&lt;br /&gt;&lt;br /&gt;The main drawbacks of this approach are an increase in executable size, because all the log channel checks are inlined, and the performance hit of the check itself on each log statement. 
It really depends on your particular game whether you are willing to pay these costs in your development/in-house build or not.&lt;br /&gt;&lt;br /&gt;&lt;div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/263327529581652713-8283747808934684394?l=solid-angle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://solid-angle.blogspot.com/feeds/8283747808934684394/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://solid-angle.blogspot.com/2011/02/lazy-logging-parameter-evaluation-with.html#comment-form' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/8283747808934684394'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/8283747808934684394'/><link rel='alternate' type='text/html' href='http://solid-angle.blogspot.com/2011/02/lazy-logging-parameter-evaluation-with.html' title='Lazy Logging Parameter Evaluation With Variadic Macros'/><author><name>Vince</name><uri>http://www.blogger.com/profile/15315525780089168677</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-263327529581652713.post-8329021290299540751</id><published>2010-12-04T09:53:00.000-05:00</published><updated>2010-12-04T09:53:50.731-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='programming paradigms'/><category scheme='http://www.blogger.com/atom/ns#' term='objected-oriented design'/><category scheme='http://www.blogger.com/atom/ns#' term='programming models'/><category scheme='http://www.blogger.com/atom/ns#' term='game programming game programming'/><category 
scheme='http://www.blogger.com/atom/ns#' term='data-oriented design'/><title type='text'>Is Data-Oriented Design a Paradigm?</title><content type='html'>Recently there has been quite the flurry of tweets about OOP (object-oriented programming) and DoD (data-oriented design). If you're unfamiliar with DoD, here's a nice &lt;a href="http://www.slideshare.net/DICEStudio/introduction-to-data-oriented-design"&gt;presentation&lt;/a&gt;. If you're unfamiliar with OOP, I'd like to know what cave you've been living in for the last few decades.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;DoD has caught on with game programmers because it puts a name to something anyone who has spent time optimizing a game already knew -- your data access patterns have a much bigger impact on your performance than the actual code you execute. I remember many an optimization session on &lt;i&gt;Stranglehold&lt;/i&gt;&amp;nbsp;where a reduction in L2 cache misses led to a perfectly correlated reduction in execution time.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;DoD goes farther in that it presents a set of guidelines for writing code up front that will run well given the reality of the memory-processor speed gap. This does set it apart from a simple optimization technique, as it is something you can use before the fact rather than after the fact. Follow these guidelines and your program will perform better.&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://dinodini.wordpress.com/2010/12/03/beam-me-up-scotty/"&gt;Dino Dini&lt;/a&gt; argues that this is nothing new, that game programmers have been doing this for decades. He's right. The underlying concepts are not that new, but giving them a name and a simple package of guidelines is new. &amp;nbsp;This has value, I think, because it helps educate programmers about these concepts. 
I am not discounting anyone's effort in this area, because I think a lot of programmers need to learn these concepts.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;That said, I don't think DoD approaches what one would call a programming paradigm. The consensus definition of programming paradigm is a "fundamental style of programming." It certainly is a style of programming, but I don't think it is fundamental.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;While I put on my flame retardant, let me explain what I mean. Structured/procedural and OOP are two programming paradigms that historically grew out of the need to manage software complexity. These are paradigms in which you could organize an entire code base. They contain methods for abstraction, and layered design.&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;DoD says nothing about code complexity. It does not describe how to organize your entire code base. No matter what happens with the processor-memory gap, code complexity is a huge problem for any large project. DoD offers no tools for managing this complexity.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I can imagine a code base completely organized around the structured paradigm (and many exist). The same with OOP. &amp;nbsp;Many real world code bases mix a little bit of both paradigms -- platform APIs tend to be structured, application architecture these days tends to be OOP.&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I can see how DoD fits into either of these paradigms. I don't know what a code base completely organized around DoD would look like. I don't think that's even a question that makes sense, as it is not tackling the same set of problems.&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;This is fine, and does not take away from DoD at all. 
In fact, I think it frees us to discuss the realities of writing software for today's hardware without having to waste time arguing about OOP vs DoD. They are apples and oranges.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/263327529581652713-8329021290299540751?l=solid-angle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://solid-angle.blogspot.com/feeds/8329021290299540751/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://solid-angle.blogspot.com/2010/12/is-data-oriented-design-paradigm.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/8329021290299540751'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/8329021290299540751'/><link rel='alternate' type='text/html' href='http://solid-angle.blogspot.com/2010/12/is-data-oriented-design-paradigm.html' title='Is Data-Oriented Design a Paradigm?'/><author><name>Vince</name><uri>http://www.blogger.com/profile/15315525780089168677</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-263327529581652713.post-1972978472168835433</id><published>2010-03-09T10:54:00.000-05:00</published><updated>2010-03-09T10:54:35.299-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='gdc'/><title type='text'>GDC</title><content type='html'>I'll be at GDC this week. 
My tentative session schedule is thus&lt;br /&gt;&lt;br /&gt;&lt;table&gt;&lt;tbody&gt;&lt;tr&gt;    &lt;td style="border: 1px solid grey; font-weight: bold;"&gt;Session Title&lt;/td&gt;    &lt;td style="border: 1px solid grey; font-weight: bold; text-align: center;"&gt;Date&lt;/td&gt;    &lt;td style="border: 1px solid grey; font-weight: bold; text-align: center;"&gt;Start Time&lt;/td&gt;    &lt;td style="border: 1px solid grey; font-weight: bold; text-align: center;"&gt;End Time&lt;/td&gt;    &lt;td style="border: 1px solid grey; font-weight: bold; text-align: center;"&gt;Location&lt;/td&gt;   &lt;/tr&gt;&lt;tr&gt;    &lt;td style="border: 1px solid grey;"&gt;Designing for Performance, Scalability &amp;amp; Reliability: StarCraft II's Approach&lt;/td&gt;    &lt;td style="border: 1px solid grey; text-align: center;"&gt;2010-03-11&lt;/td&gt;    &lt;td style="border: 1px solid grey; text-align: center;"&gt;09:00:00&lt;/td&gt;    &lt;td style="border: 1px solid grey; text-align: center;"&gt;10:00:00&lt;/td&gt;    &lt;td style="border: 1px solid grey; text-align: center;"&gt;Room 306, South Hall&lt;/td&gt;   &lt;/tr&gt;&lt;tr&gt;    &lt;td style="border: 1px solid grey;"&gt;Go With the Flow! 
Fluid and Particle Physics in PixelJunk Shooter&lt;/td&gt;    &lt;td style="border: 1px solid grey; text-align: center;"&gt;2010-03-11&lt;/td&gt;    &lt;td style="border: 1px solid grey; text-align: center;"&gt;15:00:00&lt;/td&gt;    &lt;td style="border: 1px solid grey; text-align: center;"&gt;16:00:00&lt;/td&gt;    &lt;td style="border: 1px solid grey; text-align: center;"&gt;Room 306, South Hall&lt;/td&gt;   &lt;/tr&gt;&lt;tr&gt;    &lt;td style="border: 1px solid grey;"&gt;God of War III: Shadows&lt;/td&gt;    &lt;td style="border: 1px solid grey; text-align: center;"&gt;2010-03-11&lt;/td&gt;    &lt;td style="border: 1px solid grey; text-align: center;"&gt;16:30:00&lt;/td&gt;    &lt;td style="border: 1px solid grey; text-align: center;"&gt;17:30:00&lt;/td&gt;    &lt;td style="border: 1px solid grey; text-align: center;"&gt;Room 304, South Hall&lt;/td&gt;   &lt;/tr&gt;&lt;tr&gt;    &lt;td style="border: 1px solid grey;"&gt;Code and Complexity: Managing EVE's Expanding Universe&lt;/td&gt;    &lt;td style="border: 1px solid grey; text-align: center;"&gt;2010-03-12&lt;/td&gt;    &lt;td style="border: 1px solid grey; text-align: center;"&gt;09:00:00&lt;/td&gt;    &lt;td style="border: 1px solid grey; text-align: center;"&gt;10:00:00&lt;/td&gt;    &lt;td style="border: 1px solid grey; text-align: center;"&gt;Room 130, North Hall&lt;/td&gt;   &lt;/tr&gt;&lt;tr&gt;    &lt;td style="border: 1px solid grey;"&gt;Taking Fluid Simulation Out of the Box: Particle Effects in Dark Void&lt;/td&gt;    &lt;td style="border: 1px solid grey; text-align: center;"&gt;2010-03-12&lt;/td&gt;    &lt;td style="border: 1px solid grey; text-align: center;"&gt;09:00:00&lt;/td&gt;    &lt;td style="border: 1px solid grey; text-align: center;"&gt;10:00:00&lt;/td&gt;    &lt;td style="border: 1px solid grey; text-align: center;"&gt;Room 304, South Hall&lt;/td&gt;   &lt;/tr&gt;&lt;tr&gt;    &lt;td style="border: 1px solid grey;"&gt;Light, Perception, and the Modern Shader&lt;/td&gt;    &lt;td 
style="border: 1px solid grey; text-align: center;"&gt;2010-03-12&lt;/td&gt;    &lt;td style="border: 1px solid grey; text-align: center;"&gt;12:00:00&lt;/td&gt;    &lt;td style="border: 1px solid grey; text-align: center;"&gt;13:00:00&lt;/td&gt;    &lt;td style="border: 1px solid grey; text-align: center;"&gt;Esplanade Lobby, South Hall&lt;/td&gt;   &lt;/tr&gt;&lt;tr&gt;    &lt;td style="border: 1px solid grey;"&gt;Creating the Active Cinematic Experience of Uncharted 2: Among Thieves&lt;/td&gt;    &lt;td style="border: 1px solid grey; text-align: center;"&gt;2010-03-12&lt;/td&gt;    &lt;td style="border: 1px solid grey; text-align: center;"&gt;13:30:00&lt;/td&gt;    &lt;td style="border: 1px solid grey; text-align: center;"&gt;14:30:00&lt;/td&gt;    &lt;td style="border: 1px solid grey; text-align: center;"&gt;Room 305, South Hall&lt;/td&gt;   &lt;/tr&gt;&lt;tr&gt;    &lt;td style="border: 1px solid grey;"&gt;The Next Generation of Fighting Games: Physics &amp;amp; Animation in UFC 2009 Undisputed&lt;/td&gt;    &lt;td style="border: 1px solid grey; text-align: center;"&gt;2010-03-12&lt;/td&gt;    &lt;td style="border: 1px solid grey; text-align: center;"&gt;15:00:00&lt;/td&gt;    &lt;td style="border: 1px solid grey; text-align: center;"&gt;16:00:00&lt;/td&gt;    &lt;td style="border: 1px solid grey; text-align: center;"&gt;Room 135, North Hall&lt;/td&gt;   &lt;/tr&gt;&lt;tr&gt;    &lt;td style="border: 1px solid grey;"&gt;APB: Creating a Powerful Customisation System for a Persistent Online Action Game&lt;/td&gt;    &lt;td style="border: 1px solid grey; text-align: center;"&gt;2010-03-12&lt;/td&gt;    &lt;td style="border: 1px solid grey; text-align: center;"&gt;16:30:00&lt;/td&gt;    &lt;td style="border: 1px solid grey; text-align: center;"&gt;17:30:00&lt;/td&gt;    &lt;td style="border: 1px solid grey; text-align: center;"&gt;Room 135, North Hall&lt;/td&gt;   &lt;/tr&gt;&lt;tr&gt;    &lt;td style="border: 1px solid grey;"&gt;Three Big Lies: Typical Design 
Failures in Game Programming&lt;/td&gt;    &lt;td style="border: 1px solid grey; text-align: center;"&gt;2010-03-13&lt;/td&gt;    &lt;td style="border: 1px solid grey; text-align: center;"&gt;09:00:00&lt;/td&gt;    &lt;td style="border: 1px solid grey; text-align: center;"&gt;10:00:00&lt;/td&gt;    &lt;td style="border: 1px solid grey; text-align: center;"&gt;Room 125, North Hall&lt;/td&gt;   &lt;/tr&gt;&lt;tr&gt;    &lt;td style="border: 1px solid grey;"&gt;Texture compression in real-time, using the GPU&lt;/td&gt;    &lt;td style="border: 1px solid grey; text-align: center;"&gt;2010-03-13&lt;/td&gt;    &lt;td style="border: 1px solid grey; text-align: center;"&gt;10:30:00&lt;/td&gt;    &lt;td style="border: 1px solid grey; text-align: center;"&gt;10:55:00&lt;/td&gt;    &lt;td style="border: 1px solid grey; text-align: center;"&gt;Room 132, North Hall&lt;/td&gt;   &lt;/tr&gt;&lt;tr&gt;    &lt;td style="border: 1px solid grey;"&gt;R-Trees -- Adapting out-of-core techniques to modern memory architectures&lt;/td&gt;    &lt;td style="border: 1px solid grey; text-align: center;"&gt;2010-03-13&lt;/td&gt;    &lt;td style="border: 1px solid grey; text-align: center;"&gt;11:05:00&lt;/td&gt;    &lt;td style="border: 1px solid grey; text-align: center;"&gt;11:30:00&lt;/td&gt;    &lt;td style="border: 1px solid grey; text-align: center;"&gt;Room 132, North Hall&lt;/td&gt;   &lt;/tr&gt;&lt;tr&gt;    &lt;td style="border: 1px solid grey;"&gt;The Rendering Tools and Techniques of Splinter Cell: Conviction&lt;/td&gt;    &lt;td style="border: 1px solid grey; text-align: center;"&gt;2010-03-13&lt;/td&gt;    &lt;td style="border: 1px solid grey; text-align: center;"&gt;13:30:00&lt;/td&gt;    &lt;td style="border: 1px solid grey; text-align: center;"&gt;14:30:00&lt;/td&gt;    &lt;td style="border: 1px solid grey; text-align: center;"&gt;Room 303, South Hall&lt;/td&gt;   &lt;/tr&gt;&lt;tr&gt;    &lt;td style="border: 1px solid grey;"&gt;Uncharted 2: HDR Lighting&lt;/td&gt;    &lt;td 
style="border: 1px solid grey; text-align: center;"&gt;2010-03-13&lt;/td&gt;    &lt;td style="border: 1px solid grey; text-align: center;"&gt;15:00:00&lt;/td&gt;    &lt;td style="border: 1px solid grey; text-align: center;"&gt;16:00:00&lt;/td&gt;    &lt;td style="border: 1px solid grey; text-align: center;"&gt;Room 305, South Hall&lt;/td&gt;   &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;br /&gt;&lt;div&gt;I believe Irrational folk will be in and out of the bar at the Marriott quite a bit in the evenings, so if you find yourself in the vicinity and see a big guy with glasses there, that's probably me, so stop by and say hi.&amp;nbsp;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/263327529581652713-1972978472168835433?l=solid-angle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://solid-angle.blogspot.com/feeds/1972978472168835433/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://solid-angle.blogspot.com/2010/03/gdc.html#comment-form' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/1972978472168835433'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/1972978472168835433'/><link rel='alternate' type='text/html' href='http://solid-angle.blogspot.com/2010/03/gdc.html' title='GDC'/><author><name>Vince</name><uri>http://www.blogger.com/profile/15315525780089168677</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-263327529581652713.post-3124033648478024588</id><published>2010-02-20T13:25:00.001-05:00</published><updated>2010-02-20T16:25:38.085-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='random musings'/><category scheme='http://www.blogger.com/atom/ns#' term='C++'/><category scheme='http://www.blogger.com/atom/ns#' term='data-oriented design'/><title type='text'>Musings on Data-Oriented Design</title><content type='html'>Lately there has been a lot on the interwebs about "Data-Oriented Design."&amp;nbsp;Mike Acton tackles the problems with textbook OOP with the provocative title &lt;a href="http://macton.smugmug.com/gallery/8936708_T6zQX#593426709_ZX4pZ"&gt;Typical C++ Bullshit&lt;/a&gt;, Sony has an excellent presentation titled &lt;a href="http://research.scee.net/files/presentations/gcapaustralia09/Pitfalls_of_Object_Oriented_Programming_GCAP_09.pdf"&gt;Pitfalls of Object Oriented Programming&lt;/a&gt;, and&amp;nbsp;Games from Within discusses the subject&amp;nbsp;&lt;a href="http://gamesfromwithin.com/data-oriented-design"&gt;here&lt;/a&gt;.&amp;nbsp;&amp;nbsp;For any programmer wishing to write code that performs well on today's processors, I highly recommend reading all three.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/_pOWyLHjoHD8/S4AXlFYHZjI/AAAAAAAAAAk/sMPl-ARz1i4/s1600-h/mem_cpu_gap.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://2.bp.blogspot.com/_pOWyLHjoHD8/S4AXlFYHZjI/AAAAAAAAAAk/sMPl-ARz1i4/s320/mem_cpu_gap.jpg" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: left;"&gt;&lt;br /&gt;&lt;/div&gt;The fundamental problem is pretty simple: C++ was designed during the early 80's, when the gap between processor performance and memory 
performance was small.&amp;nbsp;Now that gap is large.&amp;nbsp;Notice that the vertical scale on that graph is logarithmic -- the gap is nearly one thousand times larger than it was in the early 80's.&lt;br /&gt;&lt;br /&gt;It is understandable that textbook OOP, which came to be under such different hardware performance characteristics, would have performance problems with today's hardware.&lt;br /&gt;&lt;br /&gt;I've been thinking about this problem lately and my conclusion is we need better language and compiler support for the layout and access of data in systems languages. Whether that comes as modifications to C++ or as something new is a swamp I'm not going to wade into today.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Where we are&lt;/b&gt;&lt;br /&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;br /&gt;C itself is really just portable assembly language. It defines an abstract machine model but there is a pretty close mapping between C code and the assembly it generates. C++ kept this ability (as it is mostly a superset of C), but added in abstractions to help deal with large code bases. These abstractions necessarily came at a cost -- you can write C++ code that does not map very closely to the &lt;a href="http://www.rachelslabnotes.com/2009/10/the-hidden-cost-of-c/"&gt;assembly it generates&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;My proposition is that the data organization capabilities of both C and C++ are the equivalent of portable assembly language for data: a close mapping between the code and the data layout it generates. While the C++ standard does not fully specify a memory layout, the &lt;i&gt;de facto&lt;/i&gt;&amp;nbsp;standard in most compilers is that the declared members of structures or classes, minus some inserted vtable pointers, generally correspond 1-1 to how they are laid out in memory. 
Most operating system APIs depend on this fact, as you pass structures to them with strict memory layouts.&lt;br /&gt;&lt;br /&gt;To see why this is a problem, let me make an analogy with instruction scheduling. As processors became pipelined and then superscalar, the scheduling of instructions to keep all those pipelines full became a big problem. The early C and C++ compilers did a very poor job of it, and people resorted to either reorganizing their code or dropping down to assembly language to take proper advantage of the hardware. Compilers have gotten a lot better at scheduling instructions over time -- to the point that things like inline assembly hurt the ability of the compiler to reorder instructions. With the advent of compiler intrinsics, which the compiler understands and can schedule along with other instructions, you're better off sticking with C or C++ rather than using inline assembly these days. Even in C (which, again, is portable assembly language), you still run into code for which the compiler does not generate the machine instructions you'd like, but the tools to detect such problems are quite good and the mechanisms to fix them are usually localized to a particular function.&lt;br /&gt;&lt;br /&gt;Moving over to the data side, we are constantly stuck in a space equivalent to hand-scheduling instructions. I think the challenge of data-oriented techniques is that you are forced to be in a head space where you are spending a fair amount of time doing analysis of data access and rearranging code and data structures rather than solving the actual problem your code is intended to solve. 
I'm sure there are people for whom this comes quite naturally (I suspect Mike Acton is one), but for me, at least, this takes a considerable amount of mental effort.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Where we need to be&lt;/b&gt;&lt;br /&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;br /&gt;As I've thought about this more, I've realized that both C and C++ fail to offer any sort of tools to help the programmer tackle the problems of data organization. If the compiler is free to reschedule instructions, should we not let it be free to reorganize our data structures?&lt;br /&gt;&lt;br /&gt;Obviously, the compiler cannot do this alone. One recurring theme in these presentations is that textbook OOP tends to focus on singular entities. A class has a virtual function that deals with late dispatch on &lt;i&gt;one&lt;/i&gt;&amp;nbsp;object. A class defines the layout for &lt;i&gt;one &lt;/i&gt;object. Obviously, you don't &lt;i&gt;have&lt;/i&gt;&amp;nbsp;to design your classes this way -- and in fact, the above presentations argue you shouldn't. But if you find yourself fighting with or avoiding the language abstractions rather than using them, what have you gained? In that sense, C++'s abstractions hurt us because they lull us into writing code that will run horribly. We need better abstractions.&lt;br /&gt;&lt;br /&gt;Both of these presentations move away from the model of classes that deal with &lt;i&gt;one&lt;/i&gt;&amp;nbsp;thing and move to code that deals with &lt;i&gt;sets&lt;/i&gt;&amp;nbsp;of things. If you are going to do a sphere-in-frustum test, you're going to be doing it on many things, not just one. Even when sets are not homogeneous, we deal with that by sorting them by type and executing our operations in bulk on each type.&lt;br /&gt;&lt;br /&gt;We need more than sets, though, because different operations need different views on the data. 
Transform update may only be concerned with the matrix of a game entity, whereas higher-level AI code may have a completely different view. We want our data to be laid out optimally for some of our operations, which may mean different data is stored in different places, or we may even have multiple copies of some data in order to support different operations.&lt;br /&gt;&lt;br /&gt;One of those views is the view we use for debugging. In our head space, we tend to think about single entities in the game world -- &lt;i&gt;this&lt;/i&gt;&amp;nbsp;projectile, &lt;i&gt;this &lt;/i&gt;character, &lt;i&gt;this &lt;/i&gt;mesh.&amp;nbsp;Textbook OOP tends to couple class layout with this debugging head-space, and that is part of the attraction -- I don't have to care about what is going on with everything else in the program; I have everything I need to know about this mesh right here.&lt;br /&gt;&lt;br /&gt;The organization the computer needs is very different, though -- when doing frustum culling, for example, what we really want is just a big array of AABBs. When debugging why a specific mesh is being culled, though, it really helps to see all the data about that entity in one place. Otherwise, you spend a lot of time traversing data structures in the watch window, just to find out what state an object got into that caused it to flip its visible bit to false. So the view of the data that humans need is another important piece of the puzzle.&lt;br /&gt;&lt;br /&gt;This is the limit of my current musings. I want to write code that deals with sets of things as a natural part of the language and not just some templated container library. I want to be able to specify multiple views on my data, and have the compiler use this information to generate optimal data layout for certain operations. In the debugger, I want a debugging view which is similar to the textbook OOP view. 
I want a language that is designed to provide these things and that will tackle data layout as an optimization problem similar to register allocation, instruction scheduling, or inlining.&lt;br /&gt;&lt;br /&gt;Perhaps this is too radical a departure for a low-level language such as C or C++. &amp;nbsp;I would hope there are some research languages out there that do the kinds of things I am talking about -- other duties have prevented me from doing anything more than a cursory literature search. Given that the processor-memory gap is only likely to get worse, I'd certainly hope there are.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/263327529581652713-3124033648478024588?l=solid-angle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://solid-angle.blogspot.com/feeds/3124033648478024588/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://solid-angle.blogspot.com/2010/02/musings-on-data-oriented-design.html#comment-form' title='11 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/3124033648478024588'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/3124033648478024588'/><link rel='alternate' type='text/html' href='http://solid-angle.blogspot.com/2010/02/musings-on-data-oriented-design.html' title='Musings on Data-Oriented Design'/><author><name>Vince</name><uri>http://www.blogger.com/profile/15315525780089168677</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_pOWyLHjoHD8/S4AXlFYHZjI/AAAAAAAAAAk/sMPl-ARz1i4/s72-c/mem_cpu_gap.jpg' height='72' 
width='72'/><thr:total>11</thr:total></entry><entry><id>tag:blogger.com,1999:blog-263327529581652713.post-5286833991670027801</id><published>2009-12-22T10:30:00.001-05:00</published><updated>2009-12-22T10:31:54.897-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='game programming'/><category scheme='http://www.blogger.com/atom/ns#' term='deferred lighting'/><title type='text'>More Stencil States for Light Volume Rendering</title><content type='html'>A while back I wrote a short entry on &lt;a href="http://solid-angle.blogspot.com/2009/08/stencil-states-for-rendering-light.html"&gt;stencil states for light volumes&lt;/a&gt;. The method I posted works but relies on using a zfail stencil operation. Shortly after, I quickly discovered that it ran considerably slower on ATI cards than on the original Nvidia card I had been writing on, and have been meaning to post an update.&lt;br /&gt;&lt;br /&gt;On certain hardware, using anything but Keep in zfail can disable early stencil &amp;nbsp;-- specifically, &lt;a href="http://developer.amd.com/media/gpu_assets/Depth_in-depth.pdf"&gt;ATI PC hardware&lt;/a&gt;, and this caused quite a slowdown.&lt;br /&gt;&lt;br /&gt;The solution I figured out (and I'm sure others have) was to switch to a method which only relies on stencil pass operations:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: monospace; font-size: 12px;"&gt;&lt;span style="color: black;"&gt;AlphaBlendEnable&amp;nbsp;&lt;/span&gt;&lt;span style="color: blue;"&gt;= false&lt;br /&gt;&lt;/span&gt;&lt;span style="color: black;"&gt;StencilEnable&amp;nbsp;&lt;/span&gt;&lt;span style="color: blue;"&gt;= true&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="color: blue; font-family: monospace; font-size: small;"&gt;&lt;span style="font-size: 12px;"&gt;&lt;span style="color: black;"&gt;&lt;span style="color: black;"&gt;ColorWriteChannels&amp;nbsp;&lt;/span&gt;&lt;span style="color: blue;"&gt;=&amp;nbsp;&lt;/span&gt;&lt;span 
style="color: black;"&gt;None&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: monospace; font-size: small;"&gt;&lt;span style="font-size: 12px;"&gt;&lt;span style="color: black;"&gt;DepthBufferEnable&amp;nbsp;&lt;/span&gt;&lt;span style="color: blue;"&gt;= true&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="color: blue; font-family: monospace; font-size: small;"&gt;&lt;span style="font-size: 12px;"&gt;&lt;span style="color: black;"&gt;&lt;span style="color: black;"&gt;StencilDepthBufferFail&amp;nbsp;&lt;/span&gt;&lt;span style="color: blue;"&gt;=&amp;nbsp;&lt;/span&gt;&lt;span style="color: black;"&gt;Keep&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: monospace; font-size: 12px;"&gt;&lt;span style="color: green;"&gt;// render frontfaces so that any pixel in back of them has its stencil incremented&lt;br /&gt;&lt;/span&gt;&lt;span style="color: black;"&gt;CullMode&amp;nbsp;&lt;/span&gt;&lt;span style="color: blue;"&gt;=&amp;nbsp;&lt;/span&gt;&lt;span style="color: black;"&gt;CounterClockwise&lt;br /&gt;&lt;/span&gt;&lt;span style="color: green;"&gt;&lt;span style="color: black;"&gt;&lt;span style="color: black;"&gt;StencilFunction&amp;nbsp;&lt;/span&gt;&lt;span style="color: blue;"&gt;=&amp;nbsp;&lt;/span&gt;&lt;span style="color: black;"&gt;Always&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: monospace; font-size: 12px;"&gt;&lt;span style="color: green;"&gt;// If a pixel is in back of the volume frontface, then it is potentially inside the volume&lt;br /&gt;&lt;/span&gt;&lt;span style="color: black;"&gt;StencilPass&amp;nbsp;&lt;/span&gt;&lt;span style="color: blue;"&gt;=&amp;nbsp;&lt;/span&gt;&lt;span style="color: black;"&gt;Increment&lt;/span&gt;&lt;span style="color: grey;"&gt;;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="color: green; 
font-family: monospace; font-size: small;"&gt;&lt;span style="font-size: 12px;"&gt;&lt;span style="color: black;"&gt;&lt;span style="color: green;"&gt;// render volume&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="color: green; font-family: monospace; font-size: small;"&gt;&lt;span style="font-size: 12px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="color: green; font-family: monospace; font-size: 12px;"&gt;// render backfaces so that only pixels in back of the backface have stencil decremented&lt;/span&gt;&lt;br /&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;code style="font-size: 12px;"&gt;&lt;span style="color: black;"&gt;CullMode&amp;nbsp;&lt;/span&gt;&lt;span style="color: blue;"&gt;=&amp;nbsp;&lt;/span&gt;&lt;span style="color: black;"&gt;Clockwise&lt;br /&gt;&lt;/span&gt;&lt;span style="color: black;"&gt;&lt;span style="color: green;"&gt;// pass stencil test if reference value &amp;lt; buffer, so we only process pixels marked above.&lt;br /&gt;// Reference value is 0. 
This is not strictly necessary but an optimization&lt;br /&gt;&lt;/span&gt;&lt;span style="color: black;"&gt;StencilFunction&amp;nbsp;&lt;/span&gt;&lt;span style="color: blue;"&gt;=&amp;nbsp;&lt;/span&gt;&lt;span style="color: black;"&gt;Less&lt;br /&gt;&lt;/span&gt;&lt;span style="color: green;"&gt;// If a pixel is in back of the volume backface, then it is outside of the volume, and should not be considered&lt;br /&gt;&lt;/span&gt;StencilPass&amp;nbsp;&lt;/span&gt;&lt;span style="color: blue;"&gt;=&amp;nbsp;&lt;/span&gt;&lt;span style="color: black;"&gt;Decrement&lt;br /&gt;&lt;span style="color: green;"&gt;&lt;span style="color: black;"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="color: grey;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="color: green;"&gt;// render volume&lt;br /&gt;&lt;/span&gt;&lt;span style="color: black;"&gt;AlphaBlendEnable&amp;nbsp;&lt;/span&gt;&lt;span style="color: blue;"&gt;= true&lt;br /&gt;&lt;/span&gt;&lt;span style="color: black;"&gt;ColorWriteChannels&amp;nbsp;&lt;/span&gt;&lt;span style="color: blue;"&gt;=&amp;nbsp;&lt;/span&gt;&lt;span style="color: black;"&gt;RGB&lt;br /&gt;&lt;/span&gt;&lt;span style="color: green;"&gt;// only process pixels with 0 &amp;lt; buffer&lt;br /&gt;&lt;/span&gt;&lt;span style="color: black;"&gt;StencilFunction&amp;nbsp;&lt;/span&gt;&lt;span style="color: blue;"&gt;=&amp;nbsp;&lt;/span&gt;&lt;span style="color: black;"&gt;Less&lt;br /&gt;&lt;/span&gt;&lt;span style="color: green;"&gt;// zero out pixels so we don't need a separate clear for the next volume&lt;br /&gt;&lt;/span&gt;&lt;span style="color: black;"&gt;StencilPass&amp;nbsp;&lt;/span&gt;&lt;span style="color: blue;"&gt;=&amp;nbsp;&lt;/span&gt;&lt;span style="color: black;"&gt;Zero&lt;br /&gt;&lt;/span&gt;&lt;span style="color: black;"&gt;&lt;span style="color: green;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="color: green;"&gt;// render a screen space rectangle scissored to the projection of the light volume&lt;br 
/&gt;&lt;/span&gt;&lt;/code&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;There is a problem with this method -- if the light volume intersects the near plane, it won't work, because the portion of the light volume in front of the near plane will never increment the stencil buffer.&lt;br /&gt;&lt;br /&gt;My solution to this was pretty simple -- if the light volume intersects the near plane, I use the zfail method from the earlier post. Otherwise, I use the stencil pass operation. For most lights, we're using the fastest path on both the major brands of cards. I briefly scanned some papers and articles on shadow volumes (a very similar problem), hoping to find an alternate way to cap volumes intersecting the near plane, but didn't see anything that looked particularly easy to implement or would necessarily perform that well, and this method got performance on ATIs and Nvidias mostly on par.&lt;br /&gt;&lt;br /&gt;What about two-sided stencil? This is a mode in DX9 where you can render both backfaces and frontfaces in one pass, with separate stencil operations on each. Because the stencil increment/decrement operations wrap around &amp;nbsp;(i.e. decrementing 0 becomes 255, incrementing 255 becomes 0), ordering doesn't really matter (although you have to make the StencilFunction Always on both). I did some quick tests using two sided stencil and my initial results showed it was actually slower than rendering both passes separately. 
I didn't spend much time on it so it is possible that I simply screwed something up, and plan to revisit it at some point.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/263327529581652713-5286833991670027801?l=solid-angle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://solid-angle.blogspot.com/feeds/5286833991670027801/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://solid-angle.blogspot.com/2009/12/more-stencil-states-for-light-volume.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/5286833991670027801'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/5286833991670027801'/><link rel='alternate' type='text/html' href='http://solid-angle.blogspot.com/2009/12/more-stencil-states-for-light-volume.html' title='More Stencil States for Light Volume Rendering'/><author><name>Vince</name><uri>http://www.blogger.com/profile/15315525780089168677</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-263327529581652713.post-5109996791381487416</id><published>2009-12-12T17:11:00.001-05:00</published><updated>2009-12-12T17:11:43.072-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='XNA'/><category scheme='http://www.blogger.com/atom/ns#' term='graphics'/><category scheme='http://www.blogger.com/atom/ns#' term='spherical harmonics'/><category scheme='http://www.blogger.com/atom/ns#' term='deferred lighting'/><title type='text'>Screen Space Spherical Harmonic Lighting</title><content type='html'>A while ago Jon Greenberg brought 
up the idea of accumulating lighting in screen space using spherical harmonics, in a blog entry entitled "&lt;a href="http://deadvoxels.blogspot.com/2009/08/has-someone-tried-this-before.html"&gt;Has anyone tried this before?&lt;/a&gt;"&lt;br /&gt;&lt;br /&gt;I've been doing deferred lighting experiments in XNA, and decided to give this technique a try. Please note I'm not doing any antialiasing and all screenshots are the definition of "programmer art" cobbled together from various freely available assets.&lt;br /&gt;&lt;br /&gt;Screenshots show full resolution deferred lighting on top and the screen space SH technique at the bottom in the &lt;a href="http://hdri.cgtechniques.com/~sponza/files/"&gt;Sponza Atrium&lt;/a&gt; by Marko Dabrovic:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/_pOWyLHjoHD8/SyQQIx_q_wI/AAAAAAAAAAM/0hYnad_0IRA/s1600-h/compare.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://4.bp.blogspot.com/_pOWyLHjoHD8/SyQQIx_q_wI/AAAAAAAAAAM/0hYnad_0IRA/s400/compare.png" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: left;"&gt;&lt;br /&gt;&lt;/div&gt;The technique was pretty simple to get up and going, and produces some interesting results. The above images are with 27 point lights, 3 spot lights, and 1 directional. The directional is the only light evaluated at full resolution per-pixel, in the apply lighting stage.&lt;br /&gt;&lt;br /&gt;The basic idea is to use a quarter-sized lighting buffer (thus, in this case, 640x360) to accumulate 4-coefficient spherical harmonics. The nice thing is you only need the depth information to do so. I used 3 FP16 buffers to accumulate the SH constants. 
Points and spots are evaluated by rendering the light geometry into the scene and evaluating the SH coefficients for the light direction via cube map lookup, and then attenuating as normal. For the directional light, I evaluate that in the apply lighting shader. I'm not rendering any shadows.&lt;br /&gt;&lt;br /&gt;For diffuse lighting, it works pretty well, although due to the low number of SH coefficients, you will get some lighting wrapping around onto backfaces, which in practice just tends to give you "softer" lighting. That may or may not be desirable.&lt;br /&gt;&lt;br /&gt;Even though the lighting buffer is quarter-sized, you don't really lose any normal detail since SH accumulates the lighting from all directions. In my test scene, the earth models are the only ones with normal maps (deferred on the left, SH on the right)&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/_pOWyLHjoHD8/SyQRuYz3-OI/AAAAAAAAAAU/2mWTswN89nc/s1600-h/normals.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://3.bp.blogspot.com/_pOWyLHjoHD8/SyQRuYz3-OI/AAAAAAAAAAU/2mWTswN89nc/s400/normals.png" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;I found that when you upsample the lighting buffer during the apply lighting stage naively, you would get halos around the edges of objects. I fixed this using a bilateral filter aware of depth discontinuities.&lt;br /&gt;&lt;br /&gt;I was able to fake specular by extracting a dominant light direction from the SH, dotting that with the half vector, raising to the specular power, and multiplying that times the diffuse lighting result. It doesn't really give you great results, but it looks specular-ish. 
I tried using the lighting looked up at the reflected view vector but found that gave worse results.&lt;br /&gt;&lt;br /&gt;Performance-wise, in my little XNA program, which I'd hardly call optimized, the SH lighting is about the same as deferred lighting when I store specular luminance instead of a specular lighting color in the lighting buffer. Here are some screenshots showing 388 lights (384 points, 3 spots, and 1 directional):&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/_pOWyLHjoHD8/SyQTDxH9ruI/AAAAAAAAAAc/wiDXOY-WUgg/s1600-h/compare387.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://3.bp.blogspot.com/_pOWyLHjoHD8/SyQTDxH9ruI/AAAAAAAAAAc/wiDXOY-WUgg/s400/compare387.png" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: left;"&gt;Note that there is at least one major optimization that could be performed when I'm calculating the SH coefficients for a light. Currently, my SH lookup cube map is in world space, but my light vectors are calculated in view space for points and spots. This forces a matrix multiplication by the inverse view matrix in all the lighting shaders. This could probably be sped up quite a bit by calculating the SH lookup cube map in view space each frame.&lt;br /&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: left;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: left;"&gt;All in all, it is an interesting technique. I'm not very happy with the specular results at all, and the softness of the lighting could be a benefit or a drawback depending on the look you are going for. 
Jon also points out that the lighting calculations could easily be moved to the CPU on some platforms, since they only depend on depth information. I'm probably not going to explore the technique much further but thought I'd post what I'd found from the limited investigation I did.&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/263327529581652713-5109996791381487416?l=solid-angle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://solid-angle.blogspot.com/feeds/5109996791381487416/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://solid-angle.blogspot.com/2009/12/screen-space-spherical-harmonic.html#comment-form' title='14 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/5109996791381487416'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/5109996791381487416'/><link rel='alternate' type='text/html' href='http://solid-angle.blogspot.com/2009/12/screen-space-spherical-harmonic.html' title='Screen Space Spherical Harmonic Lighting'/><author><name>Vince</name><uri>http://www.blogger.com/profile/15315525780089168677</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_pOWyLHjoHD8/SyQQIx_q_wI/AAAAAAAAAAM/0hYnad_0IRA/s72-c/compare.png' height='72' width='72'/><thr:total>14</thr:total></entry><entry><id>tag:blogger.com,1999:blog-263327529581652713.post-1728463818760864735</id><published>2009-12-04T16:06:00.002-05:00</published><updated>2009-12-06T11:37:06.682-05:00</updated><title type='text'>A Production Irradiance Volume 
Implementation Described</title><content type='html'>&lt;span style="font-family: Arial; font-size: small;"&gt;&lt;span style="font-size: 13px;"&gt;&lt;span style="font-family: 'Times New Roman';"&gt;&lt;span style="font-size: medium;"&gt;&lt;span style="font-family: Arial; font-size: small;"&gt;&lt;span style="font-size: 13px;"&gt;&lt;span style="font-family: 'Times New Roman';"&gt;&lt;span style="font-size: medium;"&gt;On a previous title I worked on, the dynamic lighting system we had could best be described as "an emergency hack." We found ourselves approaching an E3 demo without a viable dynamic lighting system -- the one in the engine we were licensing required&amp;nbsp;re-rendering&amp;nbsp;geometry for each light. Even using a completely separate lighting rig for dynamic objects (with a much smaller number of lights),&amp;nbsp;this was not practical on the hardware and ran too slow.&amp;nbsp;The engine in question would eventually have a much better dynamic lighting system, but that would not come for some time, and we needed something that worked right away.&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The solution was to limit the number of lights that could affect dynamic objects and render 3 lights in a single pass. The three lights were chosen based on the strength of their contribution at the object's center point, and hysteresis was used to avoid light popping. Shadows darkened the scene instead of blocking a specific light, which is an old technique, but worked "well enough."&lt;br /&gt;&lt;br /&gt;I was never very happy with this solution, but it was good enough for us to ship with. It was too difficult for artists to light dynamic and static objects consistently due to the separate lighting rigs, and often the dynamic lighting would not match the static lighting very well. 
Dynamic lights did not take occlusion into account, so you could often get bleeding through walls, which would require painful light placement and light channel tweaking.&lt;br /&gt;&lt;br /&gt;After that project shipped, I very much wanted to make a better system that would solve most of the problems. I wanted consistent results between static and dynamic lighting, I wanted a single lighting rig, and I wanted a better shadowing solution.&lt;br /&gt;&lt;br /&gt;A colleague on another title at the same studio was getting some good results with spherical harmonic-based lighting, albeit in a completely different genre. I had also recently read Natalya Tatarchuk's &lt;i&gt;&lt;a href="http://developer.amd.com/media/gpu_assets/Tatarchuk_Irradiance_Volumes.pdf"&gt;Irradiance Volumes for Games&lt;/a&gt;&lt;/i&gt;&amp;nbsp;presentation, and I felt that this was a viable approach that would help achieve my goals.&lt;br /&gt;&lt;br /&gt;The way it worked was that artists placed arbitrary irradiance volumes in the map. An irradiance volume stores a point cloud of spherical harmonic samples describing incoming light. In the paper, they use an octree to store these samples, but I found that was not desirable since you had to subdivide in all three axes simultaneously -- thus if you needed more sampling detail in X and Z you were forced to also subdivide in Y. Our levels weren't very vertical, so those extra samples in Y were unnecessary and just took up memory.&lt;br /&gt;&lt;br /&gt;Instead, I used a kd-tree, which allowed me to stop subdividing an axis once it had reached an artist-specified minimum resolution.&lt;br /&gt;&lt;br /&gt;Another problem was what heuristic to use for choosing a sample set. The original paper used a GPU-based solution that rendered depth to determine if a cell contained geometry, and if so, subdivided. The idea is that places with geometry are going to have more lighting variation. 
The preexisting static lighting pipeline I was working in did not lend itself to a GPU-based solution, so I took a similar approach using a CPU-side geometry database to determine if cells contained geometry. In practice, it was pretty fast.&lt;br /&gt;&lt;br /&gt;I would subdivide in a breadth-first manner until either I hit an artist-controlled minimum sampling resolution or we hit the memory budget for that irradiance volume. This allowed me to have a fixed memory budget for my irradiance data, and basically the technique would produce as much detail as would fit in that budget for the volume. I also rendered a preview of the sampling points the heuristic would produce, allowing artists to visualize this before actually building lighting.&lt;br /&gt;&lt;br /&gt;Once I had a set of points, I sent it off to &lt;a href="http://www.illuminatelabs.com/products/beast"&gt;Beast&lt;/a&gt;&amp;nbsp;to calculate both direct and indirect lighting at each sample point. Once I had the initial SH dataset, I performed some postprocessing.&lt;br /&gt;&lt;br /&gt;The first step was to window the lighting samples to reduce ringing artifacts (see Peter-Pike Sloan's &lt;i&gt;&lt;a href="http://www.ppsloan.org/publications/StupidSH36.pdf"&gt;Stupid Spherical Harmonic Tricks&lt;/a&gt;&lt;/i&gt;). The amount of windowing was exposed to artists as a "smoothing parameter". I had set up the toolchain so that in the editor I stored both the original Beast-produced SH samples (which took a minute or so to generate) and the postprocessed values. This allowed the artists to change various postprocessing variables without recomputing the lighting, allowing for faster iteration.&lt;br /&gt;&lt;br /&gt;The next step was to remove redundant lighting samples. Within the KD-tree, the lighting samples are arranged as a series of 3D boxes -- finding the lighting at any arbitrary point within each box is done via trilinear interpolation. 
Because of the&amp;nbsp;hierarchical&amp;nbsp;nature of the KD-tree, each level split its box into two along one of the three axes. What I would do is compare the value at a "leaf" box point with the interpolated value from the parent box -- if the difference between these two SH coefficient sets was within a certain threshold, I would remove the leaf sample. After this process is done, we are only storing lighting samples for areas that actually have varying lighting.&lt;br /&gt;&lt;br /&gt;Samples were referenced by index into a sample array at each node of the KD-tree, which allowed me to further combine samples that were nearly identical. Finally, I encoded the sample coefficients as FP16s, to further save on memory. I was later going to revisit this encoding, as it had some decoding expense at runtime, and there probably were cheaper, better options out there.&lt;br /&gt;&lt;br /&gt;At runtime, each dynamically lit object would keep track of what irradiance volume it was in when it moved. Transitions between volumes were handled by having the artists make the volumes overlap when placing them -- since the sample data would essentially be the same in the areas of overlap, when you transitioned there would be no pop.&lt;br /&gt;&lt;br /&gt;A dynamically lit object would not just sample one point for lighting, but several. I would take the object's bounding box, shrink it by a fixed percentage, and sample the centers of each face. I would also sample the center point.&amp;nbsp;Dynamic lights would be added into the SH coefficient set analytically. I then extracted a dominant directional light from the SH set, and constructed a linear (4 coefficient) SH gradient + center sample. Rendering a directional light + a linear SH set achieves results similar to rendering a full 9 coefficient set, and is much faster on the GPU. 
Bungie used this same trick on Halo 3.&lt;br /&gt;&lt;br /&gt;The gradient allowed me to get a first-order approximation of changing lighting across the dynamic object, which was a big improvement in the quality of the lighting and really helped make the dynamic lighting consistent with the static lighting. Evaluating a 4 SH gradient + directional light was about the same cost as if I'd evaluated a full 9 coefficient SH on the GPU, but produced much higher quality.&lt;br /&gt;&lt;br /&gt;The SH set for a dynamic object was constructed on another thread, and only happened if the object moved or its set of dynamic lights changed. This allowed us to support rendering a large number of dynamic objects.&lt;br /&gt;&lt;br /&gt;Sometimes the kd-tree subdivision heuristic would not generate enough sampling detail for a specific area -- for these cases I allowed the artists to place "irradiance detail volumes", which let them override the sampling parameters for a specific area of the irradiance volume -- either forcing more detail, or using a smaller minimum sample resolution.&lt;br /&gt;&lt;br /&gt;Finally, for shadows, in outdoor areas we used a cascaded shadow map solution for the sun, and for interior areas, supported spotlights that cast shadows. The artists had to be careful placing these spotlights as we could not support a large number of shadow casting lights simultaneously. At the time we were rendering these lights as a separate geometry pass, but I had plans to support one shadow casting light + the SH lighting in a single pass.&lt;br /&gt;&lt;br /&gt;The end result was that for anything car-sized or smaller, with statically placed lights using the same lighting rig that produced the lightmaps, you would have a very difficult time telling which objects were static and which were dynamic. 
One interesting side effect that was technically a "bug" but actually helped produce good results was the fact that samples underneath the floor would almost always be black, since no light reached them. When constructing the gradient, these samples would usually be used for the bottom face of the bounding box. In practice, though, this just made the object gradually get a little darker toward the floor -- which was not unpleasant, helped ground the object in the scene, and was kind of fake AO. In ShaderX 7, the article about Crackdown's lighting describes a similar AO hack, although theirs was intentional. But we decided to keep the happy accident.&lt;br /&gt;&lt;br /&gt;The biggest issue with the system was it didn't deal very well with very large dynamic objects, since a single gradient is not enough if your object spans tens of irradiance volume cells. For that game this wasn't a huge problem, but it might be for other games. Additionally, it still didn't solve the problem of things like muzzle flashes requiring multiple passes of geometry for statically lit items, and at the time I was starting to look to deferred lighting approaches to use for transient, high-frequency dynamic lights in general.&lt;br /&gt;&lt;br /&gt;The artists were very happy with the lighting, particularly on characters, and we were producing good results. But at about this time, the plug on the project was pulled and I was shifted off to other duties, and eventually the company would go bankrupt and I would move on to 2K Boston. 
But I felt that lighting approach was viable in a production environment, and I've since seen other games making presentations on various irradiance volume systems.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/263327529581652713-1728463818760864735?l=solid-angle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://solid-angle.blogspot.com/feeds/1728463818760864735/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://solid-angle.blogspot.com/2009/12/production-irradiance-volume.html#comment-form' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/1728463818760864735'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/1728463818760864735'/><link rel='alternate' type='text/html' href='http://solid-angle.blogspot.com/2009/12/production-irradiance-volume.html' title='A Production Irradiance Volume Implementation Described'/><author><name>Vince</name><uri>http://www.blogger.com/profile/15315525780089168677</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-263327529581652713.post-8534638579237030032</id><published>2009-10-17T13:00:00.001-04:00</published><updated>2009-10-17T14:23:35.777-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='game programming'/><category scheme='http://www.blogger.com/atom/ns#' term='research'/><category scheme='http://www.blogger.com/atom/ns#' term='STL'/><category scheme='http://www.blogger.com/atom/ns#' term='architecture'/><title type='text'>Where is the game architecture research?</title><content type='html'>I was 
reading this paper on &lt;a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2271.html"&gt;EA's internal STL implementation&lt;/a&gt;, and it got me thinking -- where is the game architecture research?&lt;br /&gt;&lt;br /&gt;There is a large amount of academic research poured into real-time graphics, &lt;a href="http://www.etc.cmu.edu/projects/currentprojects.php"&gt;experimental gameplay and entertainment&lt;/a&gt;, AI, and even &lt;a href="http://www.etc.cmu.edu/projects/darkstar/"&gt;MMO server design&lt;/a&gt;. But I find there are a number of architecture issues unique to games that are lacking in any sort of research. I've done searches and not come up with a whole lot; maybe I'm just not using the correct keywords. &lt;br /&gt;&lt;br /&gt;&lt;b&gt;Memory is not a solved problem for game consoles&lt;/b&gt;&lt;br /&gt;Most if not all garbage collection research is focused on desktop or server-based memory usage patterns. They assume virtual memory paging. Many GC algorithms are impractical for a fixed memory environment where utilization needs to be close to 100%. While some game engines use garbage collection, the algorithms are primitive compared to the state-of-the-art generational collectors found in desktop environments, and the waste of memory resources is often 10-20% of total memory. Games generally cannot afford large mark or sweep phases as they must execute at a smooth frame rate. Fragmentation can still be an issue in a fixed memory environment, although in this case many allocator strategies exist to combat this.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Multicore architectures for games&amp;nbsp;&lt;/b&gt;&lt;br /&gt;While this is still an active area of research for desktop and server applications, too, I've found exactly &lt;a href="http://research.microsoft.com/pubs/70655/tr-2008-152.pdf"&gt;one paper&lt;/a&gt; that attempts some research in this area for game architectures. 
This is a particularly fruitful area for research since there are many competing ideas out there (message passing! software transactional memory! functional programming!), but very few researchers testing any of them in the context of building a game. It is difficult enough to make a game by itself, let alone test multiple techniques for exploiting multiple cores. I find this somewhat interesting because aside from servers and scientific processing, games are pushing the state of the art in multicore programming more than anything else.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Automated testing&lt;/b&gt;&lt;br /&gt;This is something the EA STL paper brings up -- traditional automated testing techniques break down pretty quickly beyond unit testing lower level libraries. So much of the end result of game code is subjective and emergent that determining how to test even basic functionality automatically is a huge unsolved problem. This results in a large amount of manpower being used for game testing, particularly in the area of regression testing.&lt;br /&gt;&lt;br /&gt;This research is being done as a part of production by many companies inside the industry. But it is always going to be the case that in a production environment, you just aren't going to have the time and resources to, say, try three different approaches to multicore architecture and compare them. Generally you make an educated guess and hope for the best. 
Additionally, because much of this research is done as part of product development, rarely are the results published, which means we're all out there doing the same work over and over.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/263327529581652713-8534638579237030032?l=solid-angle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://solid-angle.blogspot.com/feeds/8534638579237030032/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://solid-angle.blogspot.com/2009/10/where-is-game-architecture-research.html#comment-form' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/8534638579237030032'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/8534638579237030032'/><link rel='alternate' type='text/html' href='http://solid-angle.blogspot.com/2009/10/where-is-game-architecture-research.html' title='Where is the game architecture research?'/><author><name>Vince</name><uri>http://www.blogger.com/profile/15315525780089168677</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-263327529581652713.post-6127775002116462118</id><published>2009-10-04T11:02:00.001-04:00</published><updated>2009-10-04T11:08:08.666-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='game programming'/><category scheme='http://www.blogger.com/atom/ns#' term='HLSL'/><category scheme='http://www.blogger.com/atom/ns#' term='Nvidia Fermi'/><category scheme='http://www.blogger.com/atom/ns#' term='Nvidia'/><category scheme='http://www.blogger.com/atom/ns#' term='OpenGL'/><category 
scheme='http://www.blogger.com/atom/ns#' term='C++'/><category scheme='http://www.blogger.com/atom/ns#' term='GPU'/><category scheme='http://www.blogger.com/atom/ns#' term='DirectX'/><category scheme='http://www.blogger.com/atom/ns#' term='Intel'/><category scheme='http://www.blogger.com/atom/ns#' term='concurrency'/><title type='text'>An Ode to the GPU. Hopefully not an Epitaph.</title><content type='html'>&lt;div&gt;The &lt;a href="http://solid-angle.blogspot.com/2009/10/im-afraid-grass-is-not-greener.html"&gt;last entry&lt;/a&gt; got me thinking about one area of game programming that has gotten unequivocally better over the last ten or fifteen years: graphics programming. From the advent of the GPU to programmable pipelines to the debugging and profiling tools available, things are for the most part way easier today than they were even five years ago.&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I am not a graphics programmer. I'm a generalist who often finds himself programming graphics. So there are certainly gaps in the last ten or fifteen years where I wasn't really writing anything significant in graphics. There's a large gap between fixed-function gpus and when HLSL was introduced -- I don't think I've ever done assembly-level pixel shaders, for example. &lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;While I do remember doing a lot of OpenGL in the early days of fixed-function, I didn't do much multipass rendering on fixed function hardware, where companies like Id essentially faked a programmable pixel pipeline with texture and blend ops. 
Frankly, I thought during that era it was more about fighting the hardware than about interesting techniques -- the amount of bs you had to put up with made the area unattractive to me at the time.&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Languages like HLSL and Cg piqued my interest in graphics again, and, when you think about it, they are a pretty impressive feat. They allow a programmer to harness massively parallel hardware without having to think about the parallelism much at all, and the last few years have been more about interesting algorithms and more efficient operations than about fighting hardware capabilities. Sure, you still run up against the remaining fixed-function parts of the pipeline (namely, blending and texture filtering), but those can be worked around.&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The tools have improved year over year. On the PC, things like PerfHUD have slowly gotten better, with more tools like it being made all the time. The gold standard still remains PIX on the 360 -- so much so that many programmers I know will do an implementation of a new graphics technique first on the 360 just because it is so easy to debug when things go wrong. &lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;So let me just praise the GPU engineers, tool makers, and language and API designers who have done such a good job of taking a hard problem and making it continually easier to deal with. I think it is rare to get such productivity gains for programmers in any area, and we shouldn't take it for granted when it happens.&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;This is also why the dawn of fully programmable graphics hardware &lt;a href="http://solid-angle.blogspot.com/2009/08/diminishing-returns.html"&gt;makes me nervous&lt;/a&gt;. 
Nvidia recently &lt;a href="http://www.pcmag.com/article2/0,2817,2353608,00.asp"&gt;announced the Fermi architecture&lt;/a&gt;, which will allow the use of C++ on the GPU. Nvidia, AMD/ATI, and Intel are all converging on GPU architectures that allow more and more general computing, but is C++ really the answer here? &lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;HLSL and its ilk make concurrent programming easy. The same cannot be said for C++. While an architecture where the underlying threading architecture of a GPU is more open certainly will allow for a wider array of approaches, what is the cost? Are we blinded so much by the possibilities that we forget that the DirectX/OpenGL model is one of the few successes at hiding concurrency from programmers?&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I have not really done much with CUDA or compute shaders, so perhaps I am being hasty in judgement. But when I see Intel or Nvidia touting that you can use C++ on their GPUs, I get a little worried. I am not sure that this will make things better, and in fact, it may make things very much worse. 
&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Am I just paranoid?&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/263327529581652713-6127775002116462118?l=solid-angle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://solid-angle.blogspot.com/feeds/6127775002116462118/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://solid-angle.blogspot.com/2009/10/ode-to-gpu-hopefully-not-epitaph.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/6127775002116462118'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/6127775002116462118'/><link rel='alternate' type='text/html' href='http://solid-angle.blogspot.com/2009/10/ode-to-gpu-hopefully-not-epitaph.html' title='An Ode to the GPU. 
Hopefully not an Epitaph.'/><author><name>Vince</name><uri>http://www.blogger.com/profile/15315525780089168677</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-263327529581652713.post-7914025132624988442</id><published>2009-10-03T15:22:00.003-04:00</published><updated>2009-10-03T15:41:53.354-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='game programming'/><category scheme='http://www.blogger.com/atom/ns#' term='Coders at Work'/><category scheme='http://www.blogger.com/atom/ns#' term='Brad Fitzpatrick'/><title type='text'>I'm Afraid the Grass is not Greener</title><content type='html'>I started reading &lt;a href="http://www.codersatwork.com/"&gt;Coders At Work&lt;/a&gt;, and wow, it's rare that you run across a book about programming that's a page-turner, but this is it. I'm not very far into it, but a quote from Brad Fitzpatrick (LiveJournal, memcached, PerlBal) caught my attention. The context is that he is bemoaning how computers seem worse than they were ten years ago, that they feel slower even though under the hood they are faster, etc. Then this question and answer comes up:&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;blockquote&gt;&lt;b&gt;Seibel&lt;/b&gt;: So maybe things are not as fast as they ought to be given the speed of computers. But ten years ago there was no way to do what people, as users, can do today with Google.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Fitzpatrick:&lt;/b&gt; Yeah. So some people are writing efficient code and making use of it. I don't play any games, but occasionally I'll see someone playing something and I'm like, "Holy shit, that's possible?" It just blows me away. Obviously, some people are doing it right. &lt;/blockquote&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;We are? 
The funny thing is, I'm not sure a lot of game programmers would feel that we are doing things right. We work with imperfect middleware and engines, with hacks upon hacks piled upon them, all until the game we are working on is no longer broken and actually fun to play. We have code in our games we would describe as "shit" or "crap" or "I can't believe we shipped with that." When I was just starting out, I thought maybe it was just the games I was working on that had this problem, but any time you talk to anyone at other companies, it is the same story -- from the most successful games to the smallest ones, we can all list a huge litany of problems in the code bases in which we work or have written. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;It's interesting reading this book because at least the first two programmers I've read are in totally different worlds than game development. Not better or worse, just different. The problems and constraints they have are somewhat alien to me. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I've often heard game developers say things like "game development is five to ten years behind the state of the art in 'straight' programming", referring to process or, to use my least favorite term, "software engineering." I may have even said it myself. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The game industry often does a lot of navel gazing (like this entry!). We are constantly comparing ourselves to movies, theme parks, or how other programmers work. Maybe we've got it all wrong. Maybe all along we've been figuring out how programming for &lt;i&gt;games&lt;/i&gt; needs to work. If the world that Brad Fitzpatrick lives in feels alien to me and vice versa, then why would we ever think that the processes or techniques that work in one are automatically going to work for the other? 
&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Food for thought.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/263327529581652713-7914025132624988442?l=solid-angle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://solid-angle.blogspot.com/feeds/7914025132624988442/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://solid-angle.blogspot.com/2009/10/im-afraid-grass-is-not-greener.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/7914025132624988442'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/7914025132624988442'/><link rel='alternate' type='text/html' href='http://solid-angle.blogspot.com/2009/10/im-afraid-grass-is-not-greener.html' title='I&apos;m Afraid the Grass is not Greener'/><author><name>Vince</name><uri>http://www.blogger.com/profile/15315525780089168677</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-263327529581652713.post-7076524405337072405</id><published>2009-09-23T23:38:00.006-04:00</published><updated>2009-09-24T00:27:35.655-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='game programming'/><category scheme='http://www.blogger.com/atom/ns#' term='STL'/><category scheme='http://www.blogger.com/atom/ns#' term='RTTI'/><category scheme='http://www.blogger.com/atom/ns#' term='templates'/><category scheme='http://www.blogger.com/atom/ns#' term='C++'/><category scheme='http://www.blogger.com/atom/ns#' term='exceptions'/><title type='text'>Safety</title><content type='html'>I 
recently came across two articles that tangentially talk about the same thing -- technologies that are safe. Safe as in usable and not likely to get yourself in trouble.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The first was &lt;a href="http://www.dadhacker.com/blog/?p=1132"&gt;30 years of C&lt;/a&gt; over at DadHacker. The second is a Joel on Software article (nice to see him actually writing about technology instead of pimping FogBugz or whatever he's selling these days) called &lt;a href="http://www.joelonsoftware.com/items/2009/09/23.html"&gt;The Duct Tape Programmer&lt;/a&gt;. &lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Anyway, I thought I'd write some of my opinions of the language features mentioned in these two articles. For those of you who've known me a while, it may surprise you how my thoughts have evolved over the years.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Let's cover the C++ features:&lt;/div&gt;&lt;div&gt;&lt;b&gt;&lt;span class="Apple-style-span" style="font-weight: normal;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/b&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;Exceptions&lt;/b&gt; - While I have no problems with exceptions in a language like Java or C#, in C++ they just don't work well. In games we turn them off for code size and performance reasons, but I would tend to avoid them in C++ even if there were zero hit in either area. It is just too difficult to write exception-safe code in C++. You have to do extra work to do it, and the things that can break are sometimes very subtle. Most importantly, the entire culture and ecosystem built around the language is not exception-friendly. In my experience, rare is the library that is exception-safe. So just say no.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;RTTI&lt;/b&gt; - Not very useful in practice. Again, there are overhead concerns in games, although most games I've seen end up rolling their own. 
But the base implementation is rather inflexible -- it is reflection of only the most basic sort, and often in the places you do need run-time type information, you need a lot more than just class ids. It's a feature with its heart in the right place but it just doesn't come together very well. I think part of the problem is its all-or-nothing nature -- usually only portions of my architecture need any sort of reflection, and I don't want to pay for it on all the other classes. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;Operator Overloading&lt;/b&gt; - Rarely useful outside of math libraries. I'm not even a huge fan of the iostreams model, to tell the truth. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;Multiple inheritance&lt;/b&gt; - Only with pure virtual interfaces, and even then it should be used rarely and avoided if possible. Sharing implementation via inheritance goes awry enough in single inheritance; adding more base class chains just makes the problem worse.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;Templates&lt;/b&gt; - The big one. I'll admit to having a love affair with templates in my twenties. What can I say? They were fun and a shiny new toy. I sure had some excesses, but even my worst one (a cross-platform file system library) shipped in multiple products. Even then I hid them all behind a straight-C API, so only programmers who had to either debug or extend the library innards had to deal with the templates. If I had to do it again, I'd probably do it differently, but I could say that about any code I've written in my career, whatever the language. I do know that it was an improvement over the previous file system library that was in use, because the new one actually worked.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I can say with a degree of certainty that template metaprogramming is a bust for practical use. 
There are a few major problems with it: the language isn't really built for it (it's more a clever side effect than anything), there is no good way to debug it, and functional programming isn't very ingrained in the game development culture. Ironically, I think the last part is going to have to change as parallel programming creeps into larger and larger sections of the architecture, but that won't make template metaprogramming practical.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;In any case, these days templates are just a tool in the toolbox, and not one I reach for that often. The code bases I've been working in recently all roll their own template container libraries* (provided for us by external vendors), and they do the job. My experience with code sharing via templates is that more often than not it isn't worth the trouble, but sometimes it is. Like anything we do, it is a tradeoff, and one I don't necessarily feel particularly passionate about either way.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;*A somewhat amusing side note: I've done performance and code generation tests with one of the hand-rolled template container libraries I've encountered versus STL. STL came out on top for a lot of simple things like loop iteration overhead or sorting, on all the platforms I was interested in. Of course, I'm not about to rewrite hundreds of thousands of lines of code to use STL, and STL still is horrible for memory management. But I suppose that underscores the point "30 years of C" made -- even something as simple as a container library is hard to get right, even for experts. 
Which library I'm talking about shall remain anonymous for its own protection.&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/263327529581652713-7076524405337072405?l=solid-angle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://solid-angle.blogspot.com/feeds/7076524405337072405/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://solid-angle.blogspot.com/2009/09/safety.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/7076524405337072405'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/7076524405337072405'/><link rel='alternate' type='text/html' href='http://solid-angle.blogspot.com/2009/09/safety.html' title='Safety'/><author><name>Vince</name><uri>http://www.blogger.com/profile/15315525780089168677</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-263327529581652713.post-1007346165114339835</id><published>2009-09-23T09:14:00.004-04:00</published><updated>2009-09-23T09:43:46.162-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='game programming'/><category scheme='http://www.blogger.com/atom/ns#' term='code bloat'/><title type='text'>The Other Cost of Code Bloat</title><content type='html'>The other day I almost wrote a redundant version of the exact same class that someone else on my project had written. In fact, if I hadn't asked this person a couple general C# questions, and he hadn't put two and two together, I probably would have written that redundant class. 
Good detective work on his part, and shame on me for not doing a search of the code base to see if someone else had already tackled this problem. While I've got a pretty good feel for the C++ which makes up the majority of code in our engine/tools, I haven't looked at the C# side as much as I probably should have.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;As the code bases we write get larger and larger, and the team sizes we deal with get larger and larger, the question of how to avoid this scenario becomes an important one. Ideally you hire programmers who perform the necessary code archeology to get a feel for where things are in the code base, or who will ask questions of people more familiar with the code when unsure. Getting a code base of a million or more lines "in your head" takes time, though. I've been working with our licensed engine for about four years now, and there are still nooks and crannies that are unfamiliar to me. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Better documentation should help, but in practice it is rarely read if it even exists. This is because such documentation is usually either nonexistent or, if it does exist, horribly out of date. With a licensed engine, you are at the mercy of the little documentation you are provided, and at the end of the day, the code itself is the best documentation.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;A sensible architecture with clear delineation of what should go where is often a bigger help. Knowing [where to look] is half the battle, said a Saturday morning cartoon show. Again, with a licensed engine, you are at the mercy of what you are provided. Finding existing functionality usually comes down to experience with the code base and code archeology skills. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Recently, Adrian Stone has been writing an excellent series on &lt;a href="http://gameangst.com/?p=46"&gt;minimizing code bloat&lt;/a&gt;. 
Now while the techniques he describes are less about eliminating actual source code than about eliminating redundant generated and compiled code, the mindset is the same when you are removing actual lines of code. Aside from the important compile time, link time, and executable size benefits, there is another benefit to removing as much code as you possibly can -- the code will occupy less "head space." &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Unused or dead code makes it that much harder to do code archeology. Dead code certainly can make it more difficult to make lower level changes to the engine or architecture, as it is one more support burden and implementation difficulty. In the past, removing large legacy systems (whether written internally or externally) has had unexpected benefits in simplifying the overall architecture -- often there are lower level features that only exist to support that one dormant system. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;One of my favorite things to do is delete a lot of code without the end result of the tools or game losing any functionality. It's not only cleaning out the actual lines of code, but also freeing up the corresponding head space, that is a wonderful feeling -- "I will never have to think about X again." 
With the scale of the code bases we deal with today, we don't have the brain power to spare over things we don't need.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/263327529581652713-1007346165114339835?l=solid-angle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://solid-angle.blogspot.com/feeds/1007346165114339835/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://solid-angle.blogspot.com/2009/09/other-cost-of-code-bloat.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/1007346165114339835'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/1007346165114339835'/><link rel='alternate' type='text/html' href='http://solid-angle.blogspot.com/2009/09/other-cost-of-code-bloat.html' title='The Other Cost of Code Bloat'/><author><name>Vince</name><uri>http://www.blogger.com/profile/15315525780089168677</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-263327529581652713.post-1317056110271871137</id><published>2009-09-21T09:43:00.003-04:00</published><updated>2009-09-21T10:15:49.392-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='rogue programming'/><category scheme='http://www.blogger.com/atom/ns#' term='game programming'/><title type='text'>Rogue Programming</title><content type='html'>Gamasutra had an interesting article today titled &lt;a href="http://www.gamasutra.com/view/feature/4133/gaming_the_system_how_to_really_.php"&gt;Gaming the System: How to Really Get Ahead in the Game Industry&lt;/a&gt;. 
I found it had more to say about the political dysfunction that can often accompany game development than about how to be successful. To put it another way: if you find yourself having to follow the sneakier guidelines in this article too much, then you might want to consider a change in where you work.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The programming section is titled "Just Do It" and does have some truth to it. One of my leads and I came up with the term "rogue programming" for what he describes, which was half-joke, half-serious. Here's a quote:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;blockquote&gt;As a programmer, it's not uncommon to see problems that you think should be fixed, or to see an opportunity to improve some piece of code, or speed up a process that takes a lot of time. It's also not uncommon for your suggestion to be ignored, or dismissed with an "it's not broke, so let's not fix it" response...&lt;br /&gt;&lt;br /&gt;What should you do? You should just do it -- on your own time. &lt;/blockquote&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;This advice is fraught with risk, because here's a hard-earned lesson for you: you don't always know best. I know, I know, you're a superstar hotshot programmer, and you see something that is broken, so it must be fixed. Sure it is not in the schedule, but it'll just take a few hours, what's the harm? The code base will be so much better when you're done, or the artists and designers will have a feature they didn't have before. How can that &lt;i&gt;not&lt;/i&gt; make the project better?&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Let me give a cold-water splash of reality: when it is all said and done at the end of the project, you're going to ship with a lot of broken code. 
I'm not talking about obvious bugs in the shipped project, I just mean nasty, hack-filled, just-get-it-out-the-door brokenness in the code base, and some of that code will be code that you wrote. If this weren't true, then a long-lived project like the Linux kernel wouldn't still have thousands of developers contributing to it -- obviously, there is still stuff that is "broken" and can be improved!&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;So in the big picture, a single section of brokenness is not going to make or break your project, and usually, there are bigger fish to fry on any given day, and it's best to fry them. Because if your project is cancelled because a major feature was late, will it matter that you cleaned up the way you calculated checksums for identifiers?&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;That said, if after all of this, you still think something is worth doing, let me tell you how to successfully rogue program:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;First, and most importantly, let's define "on your own time." On your own time means you are hitting your scheduled work on schedule, and that will not change if you fix or implement this one thing. If you're behind on your scheduled work, then you really shouldn't be doing any rogue programming. Whether not impacting your schedule means you work a Saturday, do some exploration at home, or just have some slack in your schedule you'd like to exploit, if you don't complete the tasks you are supposed to be working on, you've done more damage than whatever improvement you're making on the side could offset.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Additionally, you need co-conspirators. These days, programming is a very collaborative process, and for the most part, the cowboy mentality is a dying thing. 
If you talk to your lead or other engineers about a problem ("Hey, X is pretty f'd up") and no one else agrees, or you can't make the case, then hey, maybe X really isn't that important! You really want to be working with a group of people that you can convince with sound arguments that something is a problem, and a lot of the time a little discussion about a problem can turn into a scheduled task -- and no rogue programming. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Often you'll be faced with something that everybody agrees *should* be done, but there's no time to do it. In these cases, I've found with good leads (which I've been blessed with the last few years), you can get tacit approval to do something "on your own time." This often takes trust -- I wouldn't recommend going off and "just doing it" until you've earned that trust. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;If you've gotten to this point, you're in pretty good shape to go off and do some "rogue programming" -- because at this point (and this is where the joke comes in), it really isn't rogue at all. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Now if you're at a company where you constantly feel like you need to "go around people" to "just do things," then maybe you really do need a change of venue, because that is not a healthy team. 
I happen to know &lt;a href="http://www.2kboston.com/jobs.php"&gt;someone who is hiring&lt;/a&gt;.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/263327529581652713-1317056110271871137?l=solid-angle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://solid-angle.blogspot.com/feeds/1317056110271871137/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://solid-angle.blogspot.com/2009/09/rogue-programming.html#comment-form' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/1317056110271871137'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/1317056110271871137'/><link rel='alternate' type='text/html' href='http://solid-angle.blogspot.com/2009/09/rogue-programming.html' title='Rogue Programming'/><author><name>Vince</name><uri>http://www.blogger.com/profile/15315525780089168677</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-263327529581652713.post-3665832314358530294</id><published>2009-08-27T09:36:00.004-04:00</published><updated>2009-08-27T11:04:42.181-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='game programming'/><category scheme='http://www.blogger.com/atom/ns#' term='big o notation'/><category scheme='http://www.blogger.com/atom/ns#' term='cache misses'/><category scheme='http://www.blogger.com/atom/ns#' term='optimization'/><category scheme='http://www.blogger.com/atom/ns#' term='profiling'/><title type='text'>Big O doesn't always matter</title><content type='html'>The other day I was optimizing a bit of debug code 
which verifies the integrity of objects in memory. The details aren't super-important, but the gist is that it is a function that runs periodically and makes sure that objects that should have been garbage collected were indeed purged from memory.&lt;br /&gt;&lt;br /&gt;I don't make a general habit of optimizing debug code, but this was a special case -- before, this process only ran in the debug build. Artists and designers run a "development" build, which is an optimized build that still includes assertions and many other development checks.&lt;br /&gt;&lt;br /&gt;We recently ran into a bug that would have been detected much earlier if this process had been running in the development build. While programmers run the debug build almost exclusively, we tend to stick to simpler test levels. Trying to debug an issue on a quick-loading, small level is much easier than on a full-blown one.&lt;br /&gt;&lt;br /&gt;The algorithm is pretty simple -- objects have somewhat of a tree structure, but for various reasons they only have parent links and not child links. For objects at the top level of the tree, we know for sure whether they should be in memory or not. Objects at the bottom of the tree keep all their parents in memory if they are connected to the reference graph. So the debug check looks at each object and verifies that it is not parented (via an arbitrarily long parent chain) to an object which should have been purged.&lt;br /&gt;&lt;br /&gt;The first thing I did was measure how long the process was taking and do some lower-level profiling to get an idea of where time was spent. Most importantly, I also saw where I was running into cache misses.&lt;br /&gt;&lt;br /&gt;In the first pass of optimization, I found that the original loop was doing a lot of work per object that was simply unnecessary. 
This was because it was using a generalized iterator that had more functionality than needed for this specific case -- for most operations, particularly at editor time, this extra overhead is not a big deal. Removing this extra work sped up the process, which now took about 90% of the time of the original.&lt;br /&gt;&lt;br /&gt;I then tried two high-level optimizations. First, the inner loop linearly checked each high-level object against an unsorted array of objects we know should be purged, so I replaced this with a hash table from our container library. Second, I realized that a &lt;a href="http://en.wikipedia.org/wiki/Memoization"&gt;memoizing&lt;/a&gt; approach should help here -- since I'm dealing with a tree, I could use a bit array to remember if I've already processed a parent object and deemed it OK. This would allow me to cut off traversal of the parent chain, which should eliminate a lot of work. Or so I thought.&lt;br /&gt;&lt;br /&gt;The new algorithm was faster, but not by much - only 85% of the original running time. The additional complexity was not worth 5% of running time, so I went back to the simpler approach. This isn't unusual in optimization -- you can often try something you think will be a big help that turns out not to matter much. I've made mistakes in the past where I stuck with the more complicated implementation for a marginal gain -- but it was not worth it, and it made other optimizations that may have a bigger impact harder to do.&lt;br /&gt;&lt;br /&gt;As for why the gain wasn't that much: The unsorted array was relatively small (a handful of elements), so a linear search was faster because it was simpler and had better cache behavior than the hash table implementation I was using. 
The tree structure of the objects was broad but not deep, so it's obvious in hindsight why memoization would not be a win.&lt;br /&gt;&lt;br /&gt;Now, one thing that is nice to have is a decent container and algorithm library. I have that at my disposal, so implementing these two changes was a matter of minutes instead of hours. With that kind of arsenal, it is easy to try out algorithmic changes, even if they end up not working out.&lt;br /&gt;&lt;br /&gt;At this point, I took another look at the cache behavior from my profiling tools. I tried something incredibly simple -- prefetching the next object into the cache while I was processing the current one. This resulted in the process now running at 50% of the time of the original -- a 2X speedup, and likely fast enough for me to enable this in the development build. I'm going to measure again, and see if there are any other easy wins like this to be had.&lt;br /&gt;&lt;br /&gt;The processors we use are fast -- incredibly fast, and even with branch penalties on the in-order processors of today's consoles, they can still do a lot of work in the time it takes to retrieve data from main memory. So while on paper, I'm using "slow" algorithms with worse big-O complexity, in practice, your memory access patterns can easily drown out any extra calculation. 
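&lt;br /&gt;&lt;br /&gt;As a rough illustration of the parent-chain check and the memoization experiment described above, here is a minimal Python sketch. It is not the engine code: the function name &lt;code&gt;find_leaked&lt;/code&gt;, the &lt;code&gt;parents&lt;/code&gt; and &lt;code&gt;purged&lt;/code&gt; lists, and the &lt;code&gt;use_memo&lt;/code&gt; flag are all hypothetical, and Python is used only to keep the logic short. It shows the shape of the algorithm, not its cache behavior.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;
```python
# Hypothetical sketch: names and data layout are assumptions for illustration.
# parents[i] is the index of object i's parent, or None for a top-level object.
# purged is a small, unsorted list of top-level indices that should be gone.


def find_leaked(parents, purged, use_memo=False):
    """Return indices of objects whose parent chain ends at a purged object.

    Assumes the parent links form a forest (no cycles).
    """
    known_ok = [False] * len(parents)  # memo: chain from here up is clean
    leaked = []
    for obj in range(len(parents)):
        chain = []
        node = obj
        clean = False
        # Walk up the parent chain until we reach a top-level object.
        while node is not None:
            if use_memo and known_ok[node]:
                clean = True  # an ancestor was already verified; stop early
                break
            chain.append(node)
            node = parents[node]
        if not clean:
            root = chain[-1]
            # Linear search: fine when purged holds only a handful of entries.
            clean = root not in purged
        if clean:
            for visited in chain:
                known_ok[visited] = True
        else:
            leaked.append(obj)
    return leaked
```
&lt;/pre&gt;With a small purge list and a broad, shallow tree, the memo would likely save only a step or two per object, which fits the modest gain measured above.&lt;br /&gt;&lt;br /&gt;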
The key, as always, is to measure and test your theories, and not just assume that any given approach will make something faster.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/263327529581652713-3665832314358530294?l=solid-angle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://solid-angle.blogspot.com/feeds/3665832314358530294/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://solid-angle.blogspot.com/2009/08/big-o-doesnt-always-matter.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/3665832314358530294'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/3665832314358530294'/><link rel='alternate' type='text/html' href='http://solid-angle.blogspot.com/2009/08/big-o-doesnt-always-matter.html' title='Big O doesn&apos;t always matter'/><author><name>Vince</name><uri>http://www.blogger.com/profile/15315525780089168677</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-263327529581652713.post-2993877221875244482</id><published>2009-08-24T01:41:00.005-04:00</published><updated>2009-08-26T21:02:07.004-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='game programming'/><category scheme='http://www.blogger.com/atom/ns#' term='deferred lighting'/><title type='text'>Stencil states for rendering light volumes</title><content type='html'>In the ShaderX 7 article "Designing a Renderer for Multiple Lights: The Light Pre-Pass Renderer", the author describes a number of approaches for rendering the lights into the lighting buffer. 
These are all pretty standard approaches for any deferred technique, but I thought the description of using stencil does not explain how to set up the stencil states very clearly. This was probably due to space constraints.&lt;br /&gt;&lt;br /&gt;The way it is worded implies that you still need to change the depth comparison function. This is not the case, and is most of the point of the technique. As the article points out, changing the depth test makes many GPUs take their early-Z rejection and go home.&lt;br /&gt;&lt;br /&gt;I'm sure you can find this detail elsewhere on the net, but my cursory searches did not find anything, and hopefully this will save at least one person some time. Standard caveats apply: I haven't extensively tested this stuff. &lt;br /&gt;&lt;br /&gt;Assuming convex light volumes, this is what I found worked well:&lt;br /&gt;&lt;br /&gt;&lt;code style="font-size: 12px;"&gt;&lt;span style="color:black"&gt;&lt;br&gt;&lt;/span&gt;&lt;span style="color:green"&gt;// render backfaces so that only pixels in front of the backface have stencil incremented&lt;br&gt;&lt;/span&gt;&lt;span style="color:black"&gt;AlphaBlendEnable &lt;/span&gt;&lt;span style="color:blue"&gt;= false&lt;br&gt;&lt;/span&gt;&lt;span style="color:black"&gt;StencilEnable &lt;/span&gt;&lt;span style="color:blue"&gt;= true&lt;br&gt;&lt;/span&gt;&lt;span style="color:black"&gt;ColorWriteChannels &lt;/span&gt;&lt;span style="color:blue"&gt;= &lt;/span&gt;&lt;span style="color:black"&gt;None&lt;br&gt;CullMode &lt;/span&gt;&lt;span style="color:blue"&gt;= &lt;/span&gt;&lt;span style="color:black"&gt;Clockwise&lt;br&gt;DepthBufferEnable &lt;/span&gt;&lt;span style="color:blue"&gt;= true&lt;br&gt;&lt;/span&gt;&lt;span style="color:black"&gt;StencilFunction &lt;/span&gt;&lt;span style="color:blue"&gt;= &lt;/span&gt;&lt;span style="color:black"&gt;Always&lt;br&gt;StencilPass &lt;/span&gt;&lt;span style="color:blue"&gt;= &lt;/span&gt;&lt;span 
style="color:black"&gt;Keep&lt;br&gt;&lt;/span&gt;&lt;span style="color:green"&gt;// If a pixel is in front of the volume backface, then we want it lit&lt;br&gt;&lt;/span&gt;&lt;span style="color:black"&gt;StencilDepthBufferFail &lt;/span&gt;&lt;span style="color:blue"&gt;= &lt;/span&gt;&lt;span style="color:black"&gt;Increment&lt;br&gt;&lt;br&gt;&lt;/span&gt;&lt;span style="color:green"&gt;// render volume&lt;br&gt;&lt;br&gt;// render frontfaces so that any pixel in front of them has stencil decremented&lt;br&gt;&lt;/span&gt;&lt;span style="color:black"&gt;CullMode &lt;/span&gt;&lt;span style="color:blue"&gt;= &lt;/span&gt;&lt;span style="color:black"&gt;CounterClockwise&lt;br&gt;&lt;/span&gt;&lt;span style="color:green"&gt;// pass stencil test if reference value &amp;lt; buffer, so we only process pixels marked above. &lt;br&gt;// Reference value is 0. This is not strictly necessary but is an optimization&lt;br&gt;&lt;/span&gt;&lt;span style="color:black"&gt;StencilFunction &lt;/span&gt;&lt;span style="color:blue"&gt;= &lt;/span&gt;&lt;span style="color:black"&gt;Less&lt;br&gt;&lt;/span&gt;&lt;span style="color:green"&gt;// If a pixel is in front of the volume frontface, then it is not inside the volume&lt;br&gt;&lt;/span&gt;&lt;span style="color:black"&gt;StencilDepthBufferFail &lt;/span&gt;&lt;span style="color:blue"&gt;= &lt;/span&gt;&lt;span style="color:black"&gt;Decrement&lt;/span&gt;&lt;span style="color:gray"&gt;&lt;br&gt;&lt;br&gt;&lt;/span&gt;&lt;span style="color:green"&gt;// render volume&lt;br&gt;&lt;br&gt;&lt;/span&gt;&lt;span style="color:black"&gt;AlphaBlendEnable &lt;/span&gt;&lt;span style="color:blue"&gt;= true&lt;br&gt;&lt;/span&gt;&lt;span style="color:black"&gt;ColorWriteChannels &lt;/span&gt;&lt;span style="color:blue"&gt;= &lt;/span&gt;&lt;span style="color:black"&gt;RGB&lt;br&gt;&lt;/span&gt;&lt;span style="color:green"&gt;// only process pixels with 0 &amp;lt; buffer&lt;br&gt;&lt;/span&gt;&lt;span style="color:black"&gt;StencilFunction 
&lt;/span&gt;&lt;span style="color:blue"&gt;= &lt;/span&gt;&lt;span style="color:black"&gt;Less&lt;br&gt;&lt;/span&gt;&lt;span style="color:green"&gt;// zero out pixels so we don't need a separate clear for the next volume&lt;br&gt;&lt;/span&gt;&lt;span style="color:black"&gt;StencilPass &lt;/span&gt;&lt;span style="color:blue"&gt;= &lt;/span&gt;&lt;span style="color:black"&gt;Zero&lt;br&gt;&lt;/span&gt;&lt;span style="color:green"&gt;// don't want to do anything if we fail the depth test&lt;br&gt;&lt;/span&gt;&lt;span style="color:black"&gt;StencilDepthBufferFail &lt;/span&gt;&lt;span style="color:blue"&gt;= &lt;/span&gt;&lt;span style="color:black"&gt;Keep&lt;br&gt;&lt;br&gt;&lt;/span&gt;&lt;span style="color:green"&gt;// render a screen space rectangle scissored to the projection of the light volume&lt;br&gt;&lt;/span&gt;&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;Note that unlike shadow volumes, the light volume intersecting the near plane is not a concern here. We are rendering the frontfaces to find pixels that are in front of the light volume -- if parts of the light volume are in front of the near plane, by definition any pixels we're rendering are in back of those parts. So there is no need to render a cap in this case.&lt;br /&gt;&lt;br /&gt;The light volume intersecting the far plane is a concern. One way to handle this case is to use a &lt;a href="http://www.terathon.com/gdc07_lengyel.ppt"&gt;projection matrix with an infinite far plane&lt;/a&gt;, like shadow volumes do. Another way to handle it would be to detect this case and not use the stencil approach at all, instead rendering a screen space rectangle scissored to the light bounds. &lt;br /&gt;&lt;br /&gt;Finally, I've had better luck switching to rendering the backfaces without depth testing when the camera is inside the light volume, instead of using a screen space rectangle. 
But I think this has more to do with a bug in my scissoring code than with any fundamental problem!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/263327529581652713-2993877221875244482?l=solid-angle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://solid-angle.blogspot.com/feeds/2993877221875244482/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://solid-angle.blogspot.com/2009/08/stencil-states-for-rendering-light.html#comment-form' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/2993877221875244482'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/2993877221875244482'/><link rel='alternate' type='text/html' href='http://solid-angle.blogspot.com/2009/08/stencil-states-for-rendering-light.html' title='Stencil states for rendering light volumes'/><author><name>Vince</name><uri>http://www.blogger.com/profile/15315525780089168677</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-263327529581652713.post-2243224640245841030</id><published>2009-08-24T00:28:00.006-04:00</published><updated>2009-08-24T01:40:46.824-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='xbox 360'/><category scheme='http://www.blogger.com/atom/ns#' term='rendering'/><category scheme='http://www.blogger.com/atom/ns#' term='game programming'/><category scheme='http://www.blogger.com/atom/ns#' term='XNA Studio'/><category scheme='http://www.blogger.com/atom/ns#' term='deferred lighting'/><category scheme='http://www.blogger.com/atom/ns#' term='least common 
denominator approaches'/><category scheme='http://www.blogger.com/atom/ns#' term='leaky abstractions'/><title type='text'>XNA: at times awesome, at times frustrating</title><content type='html'>I'm not sure why, but Microsoft seems intent on crippling XNA for the 360. Perhaps they want to sell more dev kits.&lt;br /&gt;&lt;br /&gt;I recently had some more time to work on my &lt;a href="http://solid-angle.blogspot.com/2009/06/leaky-abstractions-in-xna.html"&gt;little toy project&lt;/a&gt;. After some work, I've now got a deferred lighting implementation on the PC.&lt;br /&gt;&lt;br /&gt;For the lighting buffer construction, at first I was using a tiled approach similar to Uncharted, which did not require blending during the lighting stage. It did work for the most part, and allowed me to use LogLUV for encoding the lighting information, which was faster. But it had issues: I didn't have any lighting target ping-ponging set up, so I was stuck with a fixed limit of seven lights per tile. Also, even with smallish tiles, you end up doing a lot of work on pixels not actually affected by the lights in question. So I wanted to compare it to a straightforward blending approach, and switched back to an FP16 target, rendering the light volumes directly (using the stencil approach detailed in ShaderX7's Light Pre-Pass article).&lt;br /&gt;&lt;br /&gt;So this all worked great and my little toy is rendering 100 lights. Of course, on the 360, there's a problem. Microsoft, in its infinite wisdom, decided that the FP10 buffer format on 360 would blow people's minds and &lt;a href="https://connect.microsoft.com/feedback/ViewFeedback.aspx?FeedbackID=343887&amp;SiteID=226"&gt;it is not supported in XNA&lt;/a&gt;. They are using an actual FP16 target, which does not support blending. &lt;br /&gt;&lt;br /&gt;So I guess it is going to be back to alternate lighting buffer encoding schemes, bucketing, and render target ping-ponging for me. 
It's not a huge deal, but it is frustrating.&lt;br /&gt;&lt;br /&gt;It is a real shame that XNA gives the impression that the 360 GPU is crippled, when in reality it is anything but. Couple the lack of FP10 support with the inability to sample the z-buffer directly and the lack of control over XNA's use of EDRAM, and they've managed to turn the 360 into a very weak, very old PC.&lt;br /&gt;&lt;br /&gt;Least common denominator approaches generally haven't fared that well over the years. An XBLA title implemented in XNA is going to be at a fundamental disadvantage -- I don't think you are going to see anything approaching the richness of Shadow Complex, for example. &lt;br /&gt;&lt;br /&gt;At the end of the day, Microsoft needs to figure out where they are going with XNA. If they are going to dumb it down and keep it as a toy for people who can't afford a real development kit (people who've been &lt;a href="http://mynameismjp.wordpress.com/category/xna/page/2/"&gt;bumping into these low ceilings much longer than me&lt;/a&gt;), then they should keep on their current path. &lt;br /&gt;&lt;br /&gt;The potential for XNA is really much more, though. Today I wrote a pretty decent menu system in about 45 minutes that handles gamepad, keyboard, and mouse input seamlessly. I don't think I could write that in C++/DirectX anywhere near as fast. If you start looking down the road to future generations of hardware, I'm not worried about the overhead of C# being fundamentally limiting. Games today already use much less efficient scripting languages than C#, and while you are limited to the heavy lifting Microsoft has chosen to implement for you today, who is to say that a future version of XNA couldn't allow shelling out to C++ for really performance-intensive stuff? &lt;br /&gt;&lt;br /&gt;XNA has a chance to become something really great that would be very powerful for a large class of games. 
It remains to be seen if Microsoft will let it.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/263327529581652713-2243224640245841030?l=solid-angle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://solid-angle.blogspot.com/feeds/2243224640245841030/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://solid-angle.blogspot.com/2009/08/xna-at-times-awesome-at-times.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/2243224640245841030'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/2243224640245841030'/><link rel='alternate' type='text/html' href='http://solid-angle.blogspot.com/2009/08/xna-at-times-awesome-at-times.html' title='XNA: at times awesome, at times frustrating'/><author><name>Vince</name><uri>http://www.blogger.com/profile/15315525780089168677</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-263327529581652713.post-2373843273425445088</id><published>2009-08-19T10:01:00.003-04:00</published><updated>2009-08-19T10:52:20.680-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='silver bullets'/><category scheme='http://www.blogger.com/atom/ns#' term='software transactional memory'/><category scheme='http://www.blogger.com/atom/ns#' term='game programming'/><category scheme='http://www.blogger.com/atom/ns#' term='memory wall'/><category scheme='http://www.blogger.com/atom/ns#' term='concurrency'/><title type='text'>One has to have had inflated expectations to experience disillusionment</title><content type='html'>A 
colleague sent along this item, which asks if &lt;a href="http://blogs.msdn.com/stmteam/archive/2009/07/24/are-we-beyond-the-trough-of-disillusionment.aspx"&gt;Transactional Memory is beyond the "trough of disillusionment"&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;I've never had any expectations that STM would be some silver-bullet solution to concurrency, and from the get-go viewed it as just another tool in the toolbox. Granted, it is a technique that I haven't had much practical experience with yet -- it's on my TODO list. &lt;a href="http://graphics.cs.williams.edu/archive/SweeneyHPG2009/"&gt;Others&lt;/a&gt; might disagree with me, but I'm not even sure how major a factor it is going to be in writing games. Of course, if some major piece of middleware is built around it, I suppose a lot of people will end up using STM, but that doesn't necessarily make it a good idea.&lt;br /&gt;&lt;br /&gt;The latest piece of evidence against STM as a silver bullet comes from conversations I've had with colleagues and friends who have a lot of experience building highly-scalable web or network servers. STM advocates hail transactions as a technique with decades of research, implementation, and use. About this they are correct. The programming model is stable, and the problems are well known. But what has struck me is how often my colleagues with much more experience in highly-scalable network servers try to avoid traditional transactional databases. If data can be stored outside of a database reliably, they do so. There are large swaths of open source software devoted to avoiding transactions with the database. The main thrust is to keep each layer independent and simple, and talk to a database as little as possible. The reasons? Scalability and cost. Transactional databases are costly to operate and very costly to scale to high load. &lt;br /&gt;&lt;br /&gt;I found the link above a little too dismissive of the costs of STM, particularly with memory bandwidth. 
I've discussed the &lt;a href="http://solid-angle.blogspot.com/2008/12/magical-missteps-and-memory-wall.html"&gt;memory wall&lt;/a&gt; before, but I see this as a serious problem down the road. We're already in a situation where memory access is a much more serious cost to performance than the actual computation we're doing, and that's with a small number of cores. I don't see this situation improving when we have 16 or more general-purpose cores.&lt;br /&gt;&lt;br /&gt;A digression about GPUs. GPUs are often brought up as a counter-argument to the memory wall as they already have a very large number of cores. GPUs also have a very specialized memory access pattern that allows for this kind of scalability - for any given operation (i.e. draw call), they generally have a huge amount of read-only data and a relatively small amount of data they write to compared to the read set. Those two data areas are not the same within a draw call. With no contention between reads and writes, they avoid the memory issues that a more general purpose processor would have.&lt;br /&gt;&lt;br /&gt;STM does not follow this memory access model, and I do not dismiss the concerns of having to do multiple reads and writes for a transaction. Again, we are today in a situation where just a single read or write is already hideously slow. If your memory access patterns are already bad, spreading them out over more cores and doubling or tripling the memory bandwidth isn't really going to help. Unlike people building scalable servers, we can't just spend some money on hardware -- we've got a fixed platform and have to use it the best we can.&lt;br /&gt;&lt;br /&gt;I don't think that STM should be ignored -- some problems are simpler to express with transactions than with alternatives (functional programming, stream processing, message passing, traditional locks). But I wouldn't design a game architecture around the idea that all game code will use STM for all of its concurrency problems. 
To be fair, Sweeney isn't proposing that either, as he proposes a layered design that uses multiple techniques for different types of calculations. &lt;br /&gt;&lt;br /&gt;What I worry about, though, is that games are often written in a top-down fashion, with the needs at the gameplay level dictating the system support required. If at that high level the only tool being offered is STM with the expectation that it is always appropriate, I think it will be easy to find yourself in a situation where refactoring that code to use other methods for performance or fragility reasons may be much more difficult and expensive than if the problem had been tackled with a more general toolbox in the first place.&lt;br /&gt;&lt;br /&gt;Concurrency is hard, and day to day I'm still dealing with the problems of the now, rather than four or five years down the road. So I will admit I have no fully thought out alternative to offer. &lt;br /&gt;&lt;br /&gt;The one thing I think we underestimate is the ability of programmers to grow and tackle new challenges. The problems we deal with today are much harder and much more complex than those of just a decade ago. Yes, the tools are better for dealing with those problems, and the current set of tools for dealing with concurrency is weak. &lt;br /&gt;&lt;br /&gt;That means we need to write better tools -- and more importantly, a better toolbox. Writing a lock-free sw/sr queue is much harder than using one. 
What I want is a bigger toolbox that includes a wide array of solutions for tackling concurrency (including STM), not a fruitless search for a silver bullet that I don't think exists, and not a rigid definition of what tools are appropriate for different types of game problems.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/263327529581652713-2373843273425445088?l=solid-angle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://solid-angle.blogspot.com/feeds/2373843273425445088/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://solid-angle.blogspot.com/2009/08/one-has-to-have-had-inflated.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/2373843273425445088'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/2373843273425445088'/><link rel='alternate' type='text/html' href='http://solid-angle.blogspot.com/2009/08/one-has-to-have-had-inflated.html' title='One has to have had inflated expectations to experience disillusionment'/><author><name>Vince</name><uri>http://www.blogger.com/profile/15315525780089168677</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-263327529581652713.post-6693634624289271115</id><published>2009-08-13T09:52:00.004-04:00</published><updated>2009-08-13T10:11:15.020-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='programming models'/><category scheme='http://www.blogger.com/atom/ns#' term='SIGGRAPH'/><category scheme='http://www.blogger.com/atom/ns#' term='random musings'/><title 
type='text'>Diminishing returns</title><content type='html'>One thing that has been on my mind since SIGGRAPH is the problem diminishing returns poses: when do you switch from an approach, algorithm, or model because any gains to be had are increasingly diminishing?&lt;br /&gt;&lt;br /&gt;The specific thing that has got me thinking about this is the rapid approach of fully programmable GPUs. So far this is not looking like it will be another evolutionary change to the venerable D3D/OpenGL programming model, and will in fact be a radical change in the way we program graphics. Which is just another way of saying it will be a change in the way we *think* about graphics.&lt;br /&gt;&lt;br /&gt;At SIGGRAPH there was a panel of various industry and academic luminaries discussing the ramifications -- is the OpenGL/D3D model dead? (not yet), what will be the model that replaces it? (no one knows), is this an interesting time to be a graphics programmer? (yes). A colleague pointed out that the members of the panel lacked a key constituency -- a representative from a game studio that's just trying to make a game without a huge graphics programming team. The old model is on its last legs, and the new world is so open that to call it a "model" would be an insult to programming models. If you're an academic or an engine maker, this doesn't present a problem; in fact, it is a huge opportunity -- back to the old-school, software renderer days. Anything's possible!&lt;br /&gt;&lt;br /&gt;But for your average game developer, it could mean you are one poor middleware choice away from disaster. You don't have the resources of the engine creators, so being ripped asunder from the warm embrace of the familiar D3D/OpenGL model can be a little terrifying. To put it another way: the beauty of a model like D3D/OpenGL is that no matter what engine or middleware you use, when it comes to the renderer, there is a lot of commonality. 
In this new world, there are a bunch of competing models or approaches -- that's part of the point. Engine creators will have a bevy of approaches to choose from -- but if you're just trying to get a game done, and you find your engine's choice of approach doesn't match what you need to do, well, you've got a lot of work all of a sudden.&lt;br /&gt;&lt;br /&gt;But we face these choices in software development all the time: when to abandon an algorithm or model because of diminishing returns. Change too soon and you've done a lot of extra work you could have avoided by just refining the existing code. Change too late and you miss opportunities that could differentiate your offerings. We like to pretend that doing cost/benefit analysis on this kind of stuff is easy, as if we were comparing a Volvo against a Toyota at the car dealer. But often the issues can be quite complex, and the fallout quite unexpected. &lt;br /&gt;&lt;br /&gt;It's cliche, but we live in interesting times.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/263327529581652713-6693634624289271115?l=solid-angle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://solid-angle.blogspot.com/feeds/6693634624289271115/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://solid-angle.blogspot.com/2009/08/diminishing-returns.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/6693634624289271115'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/6693634624289271115'/><link rel='alternate' type='text/html' href='http://solid-angle.blogspot.com/2009/08/diminishing-returns.html' title='Diminishing 
returns'/><author><name>Vince</name><uri>http://www.blogger.com/profile/15315525780089168677</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-263327529581652713.post-15667628712700979</id><published>2009-08-09T20:48:00.006-04:00</published><updated>2009-08-09T21:24:38.322-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='new orleans'/><category scheme='http://www.blogger.com/atom/ns#' term='SIGGRAPH'/><title type='text'>Fresh from SIGGRAPH</title><content type='html'>You don't know what heat is until you spend a week in New Orleans in August.&lt;br /&gt;&lt;br /&gt;Here's a quick list of my favorites:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Favorite course:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.siggraph.org/s2009/sessions/courses/details/?id=12"&gt;Advances in Real-Time Rendering&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Favorite technical paper:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="Inferred Lighting: Fast Dynamic Lighting and Shadows for Opaque and Translucent Objects"&gt;Inferred Lighting&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Favorite somewhat technical talk:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.siggraph.org/s2009/sessions/talks/details/?type=talk&amp;id=149"&gt;Immersive and Impressive: The Impressionistic Look of Flower on the PS3&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Favorite non-technical talk:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.siggraph.org/s2009/sessions/talks/details/?type=talk&amp;id=148"&gt;Making Pixar's "Partly Cloudy": A Director's Vision&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' 
src='https://blogger.googleusercontent.com/tracker/263327529581652713-15667628712700979?l=solid-angle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://solid-angle.blogspot.com/feeds/15667628712700979/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://solid-angle.blogspot.com/2009/08/fresh-from-siggraph.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/15667628712700979'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/15667628712700979'/><link rel='alternate' type='text/html' href='http://solid-angle.blogspot.com/2009/08/fresh-from-siggraph.html' title='Fresh from SIGGRAPH'/><author><name>Vince</name><uri>http://www.blogger.com/profile/15315525780089168677</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-263327529581652713.post-5047404465849972372</id><published>2009-07-08T20:48:00.005-04:00</published><updated>2009-07-08T21:13:12.117-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='game programming'/><title type='text'>Things to do when writing a tutorial</title><content type='html'>Something I've learned over the years is that if you are writing a tutorial for a tool you've written, it pays dividends to actually perform the steps of the tutorial yourself as you write it.&lt;br /&gt;&lt;br /&gt;This seems obvious, but often while developing a tool you end up developing a set of test data as you go. Often features or changes you add later can break functionality that you only used early on. 
It can be tempting to try to plow through writing the tutorial, since you know how all the features work -- why bother actually doing them?&lt;br /&gt;&lt;br /&gt;If you actually perform the operations without having any existing data, you can uncover a lot of bugs, or features that don't work particularly well. Lately, I've gone even further: Take your fully developed test data, tear it down, and then build it back up again. This tests both the creation code paths and the destruction code paths.&lt;br /&gt;&lt;br /&gt;If nothing else, doing the above saves the embarrassment of releasing a tool to your artists and designers, and the first time they try the most basic of things, it crashes, because you haven't exercised that code path for a week or two. One last regression test is worth the extra effort.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/263327529581652713-5047404465849972372?l=solid-angle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://solid-angle.blogspot.com/feeds/5047404465849972372/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://solid-angle.blogspot.com/2009/07/things-to-do-when-writing-tutorial.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/5047404465849972372'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/5047404465849972372'/><link rel='alternate' type='text/html' href='http://solid-angle.blogspot.com/2009/07/things-to-do-when-writing-tutorial.html' title='Things to do when writing a tutorial'/><author><name>Vince</name><uri>http://www.blogger.com/profile/15315525780089168677</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-263327529581652713.post-7543164995425973583</id><published>2009-06-21T21:54:00.009-04:00</published><updated>2009-06-22T00:11:47.937-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='xbox 360'/><category scheme='http://www.blogger.com/atom/ns#' term='game programming'/><category scheme='http://www.blogger.com/atom/ns#' term='XNA Studio'/><category scheme='http://www.blogger.com/atom/ns#' term='deferred lighting'/><category scheme='http://www.blogger.com/atom/ns#' term='leaky abstractions'/><title type='text'>Leaky abstractions in XNA</title><content type='html'>So continuing my &lt;a href="http://solid-angle.blogspot.com/2009/06/adventures-in-xna-continued.html"&gt;exploration of XNA&lt;/a&gt;, this weekend I did some more work on my little toy project.&lt;br /&gt;&lt;br /&gt;The first thing I did was get it running on 360. I was happy to see that XNA seems to be able to figure out how to deal with my various render targets, including one MRT, without too much trouble, and the performance was far better on the 360 than on my laptop. I get about 200 fps on the 360 vs 60 on the laptop.&lt;br /&gt;&lt;br /&gt;There was one issue worth noting.&lt;br /&gt;&lt;br /&gt;First, some background on the deferred lighting approach I am using:&lt;br /&gt;&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Render normal + depth into a G buffer for all primitives. Depth writes and tests are enabled in this step.&lt;/li&gt;&lt;li&gt;Render the lights into a lighting buffer using the G buffer. Depth writes and tests are disabled for this pass.&lt;/li&gt;&lt;li&gt;Apply the lighting to each primitive using the lighting from step 2 while computing albedo and (eventually) other material properties on the fly. 
Depth tests are enabled but not writes.&lt;/li&gt;&lt;/ol&gt;&lt;br /&gt;&lt;br /&gt;So the first problem on the 360 is XNA blows away the depth buffer I lay down in step 1 by the time I get to step 3. After some searching on the internets, I discovered this is &lt;a href="http://blogs.msdn.com/shawnhar/archive/2007/11/21/rendertarget-changes-in-xna-game-studio-2-0.aspx"&gt;expected behavior&lt;/a&gt;. &lt;br /&gt;&lt;br /&gt;I tried setting my render targets to PreserveContents, which does work, but is completely wasteful since I don't give a hoot about restoring the actual color contents of any of these buffers. This dipped performance down to 150fps.&lt;br /&gt;&lt;br /&gt;My next attempt was to restore the depth buffer manually from my G Buffer. But this was exhibiting z-fighting, possibly due to slightly different methods of Z calculation for my G-Buffer vs the depth-buffer leading to small differences in the computed Z values. I didn't feel that messing around with z biasing would be a robust solution, so I abandoned this effort.&lt;br /&gt;&lt;br /&gt;The solution I ended up choosing was to just clear the z buffer again and reconstruct it during step #3. Since my scenes are so simple this gets me back to just slightly under 200 fps. &lt;br /&gt;&lt;br /&gt;It's not an ideal solution, since I had in mind some uses for a stencil buffer laid down in step #1 that would accelerate step #2 (mainly, masking off unlit pixels for the skybox). &lt;br /&gt;&lt;br /&gt;XNA's EDRAM handling is a great example of a &lt;a href="http://www.joelonsoftware.com/articles/LeakyAbstractions.html"&gt;leaky abstraction&lt;/a&gt;. Only having a 10 MB EDRAM buffer does make render target management trickier, but in Microsoft's attempt to completely hide it from XNA programmers, I think they've just made things more frustrating. The concept of a limited buffer for render targets is not that hard to get your head around. 
You have to understand EDRAM anyway since techniques in XNA that work perfectly on Windows (like what I was doing) will break on the 360. Even worse, you have no real good idea *why* it's breaking unless you understand the limitations of EDRAM and take a guess at what Microsoft is doing under the hood. So what is really being saved here? Just let me deal with EDRAM myself.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/263327529581652713-7543164995425973583?l=solid-angle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://solid-angle.blogspot.com/feeds/7543164995425973583/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://solid-angle.blogspot.com/2009/06/leaky-abstractions-in-xna.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/7543164995425973583'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/7543164995425973583'/><link rel='alternate' type='text/html' href='http://solid-angle.blogspot.com/2009/06/leaky-abstractions-in-xna.html' title='Leaky abstractions in XNA'/><author><name>Vince</name><uri>http://www.blogger.com/profile/15315525780089168677</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-263327529581652713.post-7277370913699756332</id><published>2009-06-14T23:07:00.004-04:00</published><updated>2009-06-14T23:46:07.681-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='game programming'/><category scheme='http://www.blogger.com/atom/ns#' term='hdr'/><category scheme='http://www.blogger.com/atom/ns#' term='XNA 
Studio'/><category scheme='http://www.blogger.com/atom/ns#' term='deferred lighting'/><category scheme='http://www.blogger.com/atom/ns#' term='nvperfhud'/><title type='text'>Adventures in XNA continued</title><content type='html'>This weekend I played around in XNA a little bit more (completely personal stuff, nothing to do with work, opinions are my own, etc). I still find it very fun for the most part, but the lack of access to the metal can be frustrating at times.&lt;br /&gt;&lt;br /&gt;For the most part I've just been experimenting with &lt;a href="http://www.realtimerendering.com/blog/deferred-lighting-approaches/"&gt;deferred lighting&lt;/a&gt;. As far as what I'm trying to accomplish, I view this stuff like a musician doing scales. Good practice, but the goal is to get familiar with the techniques rather than produce anything "real".&lt;br /&gt;&lt;br /&gt;I'd already built up a quick and dirty deferred lighting implementation a couple months before. This weekend I removed some of the hacks I had, added HDR + bloom, threw in some simple terrain, played around with LogLuv encoding, and fixed some artifacts from my first pass implementation.&lt;br /&gt;&lt;br /&gt;I suppose that sounds like a lot, but the nice thing about XNA is there are a bazillion samples out there. The deferred lighting is the thing I'm really concentrating on, so for the other stuff I just grabbed anything I could find. &lt;a href="http://creators.xna.com/en-US/sample/collision3dheightmapnormals"&gt;Terrain&lt;/a&gt;, and &lt;a href="http://mynameismjp.wordpress.com/samples-tutorials-tools/xna-hdr/"&gt;HDR and bloom&lt;/a&gt; came pretty much as-is from samples/demos, as did an &lt;a href="http://blogs.msdn.com/shawnhar/archive/2007/06/08/displaying-the-framerate.aspx"&gt;FPS counter&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;As far as the deferred lighting goes, I finally got the half texel offset stuff cleared up. 
In Direct3D 9, pixel coordinates and texture coordinates don't line up, so when doing something like sampling a normal buffer or lighting buffer, if you don't offset them properly you'll be sampling the lighting from the wrong texel. &lt;a href="http://diaryofagraphicsprogrammer.blogspot.com/2008/09/calculating-screen-space-texture.html"&gt;This entry&lt;/a&gt; by Wolfgang Engel was a big help here.&lt;br /&gt;&lt;br /&gt;Reading Engel's ShaderX7 article, I also understood why the specular lighting has to be multiplied by n dot l, and fixed up some artifacts I was getting because of that (mainly specular appearing on backfaces). &lt;br /&gt;&lt;br /&gt;My first pass at HDR used FP16 render targets for everything. I changed the final apply lighting pass to encode into LogLuv, and then implemented the encoding for the lighting buffer suggested by Pat Wilson's article in ShaderX7. A side effect of the very simple material model I'm using allowed me to use an 8:8:8:8 buffer for this and still allow for high range when accumulating lighting. I currently don't have separate diffuse and specular albedo, so when I apply lighting the equation looks like this:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;albedo*(diffuseLighting  + specularLighting)&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;This is part of the joy of a small demo - no artists telling me they have to have a separate specular albedo :). Anyway, I realized that I can just add those together before writing to the lighting buffer, and just do a straightforward encoding of the result in LogLuv space. I do want to put control of the glossiness in the material model, but that will require encoding the depth into 24 bits of a render target and then including an 8 bit specular power in the remainder. (I have to render depth to a separate target because XNA gives no facility for sampling the depth buffer).&lt;br /&gt;&lt;br /&gt;In the process of all of this I wasn't quite getting the performance I wanted. 
I'm doing this on my Dell M1330 laptop which, while no slouch, has trouble running World of Warcraft at a decent frame rate. But given such a simple scene, I was just shy of 60 fps so I decided to see if I could fire up NVPerfHUD and see what was going on. You can &lt;a href="http://forums.xna.com/forums/p/5830/30661.aspx#30661"&gt;run NVPerfHUD with XNA apps&lt;/a&gt;, but a side effect I discovered is all &lt;a href="http://forums.xna.com/forums/t/33184.aspx"&gt;vertex processing is done in software&lt;/a&gt;. &lt;br /&gt;&lt;br /&gt;This is a bummer since it greatly throws off timings (a draw call for one mesh took 5 ms on the CPU for an unbelievably simple vertex shader), but I was able to find some GPU hotspots, some of which I improved by pulling stuff out of the shader and on to the CPU. &lt;br /&gt;&lt;br /&gt;Anyway, not sure how much I'll be working on this stuff but when I do I'll try to put the odd post up here. I haven't tried running my latest incarnation on the 360, which will probably be my next step. I think I've got the render targets set up so it should work, assuming the XNA runtime isn't doing anything unexpected under the hood. 
But without PIX on the 360 it'll be hard to really dig into that.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/263327529581652713-7277370913699756332?l=solid-angle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://solid-angle.blogspot.com/feeds/7277370913699756332/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://solid-angle.blogspot.com/2009/06/adventures-in-xna-continued.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/7277370913699756332'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/7277370913699756332'/><link rel='alternate' type='text/html' href='http://solid-angle.blogspot.com/2009/06/adventures-in-xna-continued.html' title='Adventures in XNA continued'/><author><name>Vince</name><uri>http://www.blogger.com/profile/15315525780089168677</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-263327529581652713.post-1749167576507573887</id><published>2009-05-23T13:08:00.003-04:00</published><updated>2009-05-23T13:24:44.712-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='game programming'/><title type='text'>Enjoy being a beginner</title><content type='html'>That's a quote that stuck with me from an interview I read about Nolan Bushnell, and I've always thought it was something to aspire to. Because I find as I get older, it can be easier to just stick to what you already know professionally. 
Inertia can settle in if you're not careful, and if a particular task is already in your wheelhouse, you tend to gravitate toward it.&lt;br /&gt;&lt;br /&gt;The last year at Midway I was exclusively doing graphics tasks, and it was nice to completely focus on one area for an extended period after having hopped around to whatever systems fire was highest priority on Stranglehold. Even then, I'd done a fair amount of graphics work over my career. Some of the things I'd worked on were areas I wasn't familiar with at the time (spherical harmonics), and in the process I learned a lot.&lt;br /&gt;&lt;br /&gt;Now, though, I'm doing something completely different at the new gig. Unfortunately the project is still top secret and I can't get into any detail lest I inadvertently give something away. But unlike graphics, this is an area where aside from a handful of toy projects over the years, I haven't done anything before. &lt;br /&gt;&lt;br /&gt;These kinds of opportunities are a big reason why I like working on games -- the breadth of the work available is really wide, from the lowest level shaders to the highest level tools. I'm sure there are some other programming gigs that have this kind of range, but I can't think of many. 
Sure, the expectations in terms of what can be done, how long it takes, and how many people it will take to do it can be pretty high, but every once in a while it pays to step back and remember that the work can be very rewarding in its scope and variety.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/263327529581652713-1749167576507573887?l=solid-angle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://solid-angle.blogspot.com/feeds/1749167576507573887/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://solid-angle.blogspot.com/2009/05/enjoy-being-beginner.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/1749167576507573887'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/1749167576507573887'/><link rel='alternate' type='text/html' href='http://solid-angle.blogspot.com/2009/05/enjoy-being-beginner.html' title='Enjoy being a beginner'/><author><name>Vince</name><uri>http://www.blogger.com/profile/15315525780089168677</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-263327529581652713.post-4919725615078062936</id><published>2009-04-14T20:47:00.001-04:00</published><updated>2009-04-14T20:49:02.105-04:00</updated><title type='text'>Changes, continued</title><content type='html'>I just started my new gig at 2K Boston. 
So far, so good!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/263327529581652713-4919725615078062936?l=solid-angle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://solid-angle.blogspot.com/feeds/4919725615078062936/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://solid-angle.blogspot.com/2009/04/changes-continued.html#comment-form' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/4919725615078062936'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/4919725615078062936'/><link rel='alternate' type='text/html' href='http://solid-angle.blogspot.com/2009/04/changes-continued.html' title='Changes, continued'/><author><name>Vince</name><uri>http://www.blogger.com/profile/15315525780089168677</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-263327529581652713.post-6302460736224771591</id><published>2009-03-31T14:09:00.002-04:00</published><updated>2009-03-31T14:16:10.291-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='downloadable games'/><category scheme='http://www.blogger.com/atom/ns#' term='game industry'/><title type='text'>The Future is Now</title><content type='html'>Engadget brings us news of the &lt;a href="http://www.engadget.com/2008/11/12/tectoys-zeebo-gaming-console-to-run-on-3g-network-in-brazil/"&gt;Zeebo&lt;/a&gt; -- a game console that only offers games available for download. 
Sound &lt;a href="http://solid-angle.blogspot.com/2009/03/what-will-future-bring.html"&gt;familiar&lt;/a&gt;?&lt;br /&gt;&lt;br /&gt;It is supposed to be released in Brazil in 2009. It is not intended to compete with the front-line consoles such as the PS3, 360 or Wii, and is targeted at emerging markets. Now this may be vapor, and there is some confusion as to the retail price - $599 seems pretty damn steep for anywhere, let alone emerging markets. &lt;br /&gt;&lt;br /&gt;It is probably not surprising that the innovation in abandoning the boxed retail model is coming not from the big console manufacturers, but from smaller, more nimble players. I hope for Microsoft, Sony, and Nintendo's own sakes that they are not too entrenched in the boxed retail mindset to realize that it is only a matter of when, not if, boxed retail dies.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/263327529581652713-6302460736224771591?l=solid-angle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://solid-angle.blogspot.com/feeds/6302460736224771591/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://solid-angle.blogspot.com/2009/03/future-is-now.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/6302460736224771591'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/6302460736224771591'/><link rel='alternate' type='text/html' href='http://solid-angle.blogspot.com/2009/03/future-is-now.html' title='The Future is Now'/><author><name>Vince</name><uri>http://www.blogger.com/profile/15315525780089168677</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-263327529581652713.post-5043990617312807905</id><published>2009-03-30T03:27:00.002-04:00</published><updated>2009-03-30T03:41:35.571-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='middleware'/><category scheme='http://www.blogger.com/atom/ns#' term='GPU'/><category scheme='http://www.blogger.com/atom/ns#' term='gdc09'/><title type='text'>Please Get Your Physics Off My GPU</title><content type='html'>I hope to have some more substantial thoughts on GDC, but one nice trend was a number of talks that focused on moving graphics work off of the GPU and onto CPUs (in this case, SPUs on the PS3). &lt;br /&gt;&lt;br /&gt;For the last couple of years there has been a major push by the graphics card manufacturers to get non-graphic-y things onto the GPU. Cloth, physics, heck, I'm sure there are even some GPU AI examples out there somewhere. These are things that console game developers I know don't particularly want or need.&lt;br /&gt;&lt;br /&gt;The lion's share of console games are GPU bound. The last thing I want to do is put more stuff on the GPU. So even if your cloth or physics solution runs really fast on the GPU, I'm not going to use it because there is no room at the inn. Even if a CPU solution is slower, it won't matter since I've got the spare processing capacity due to having to wait on the GPU, or have processing elements that are not used during a frame. &lt;br /&gt;&lt;br /&gt;What I want to do is offload as much as possible to the CPU, since most games still probably are not maxing out the CPU capabilities of the PS3 or 360. 
It was nice to see some talks focusing on doing hybrid GPU/CPU solutions to things such as lighting or post processing, and I imagine this trend will continue.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/263327529581652713-5043990617312807905?l=solid-angle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://solid-angle.blogspot.com/feeds/5043990617312807905/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://solid-angle.blogspot.com/2009/03/please-get-your-physics-off-my-gpu.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/5043990617312807905'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/5043990617312807905'/><link rel='alternate' type='text/html' href='http://solid-angle.blogspot.com/2009/03/please-get-your-physics-off-my-gpu.html' title='Please Get Your Physics Off My GPU'/><author><name>Vince</name><uri>http://www.blogger.com/profile/15315525780089168677</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-263327529581652713.post-38640088052239249</id><published>2009-03-23T02:12:00.004-04:00</published><updated>2009-03-23T02:21:39.218-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='gdc'/><title type='text'>The Problem with GDC</title><content type='html'>It always seems like I'll have a day that has 4 or 5 sessions booked at the same time that I want to see, and then on another day have some time slots with nothing that I want to see. 
Obviously, you can't please all of the people all of the time, but Wednesday definitely seems like the busy day.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Anyway, here's what my current schedule is looking like (I am not sure why blogger is inserting all this whitespace):&lt;br /&gt;&lt;table width=100%&gt; &lt;br /&gt;&lt;tr&gt; &lt;br /&gt;&lt;td style="border: 1px solid grey; font-weight: bold;"&gt;Session Title&lt;/td&gt; &lt;br /&gt;&lt;td style="border: 1px solid grey; font-weight: bold; text-align: center;"&gt;Date&lt;/td&gt; &lt;br /&gt;&lt;td style="border: 1px solid grey; font-weight: bold; text-align: center;"&gt;Start Time&lt;/td&gt; &lt;br /&gt;&lt;td style="border: 1px solid grey; font-weight: bold; text-align: center;"&gt;End Time&lt;/td&gt;&lt;/tr&gt; &lt;br /&gt;&lt;tr&gt;&lt;td style="border: 1px solid grey;"&gt;Discovering New Development Opportunities&lt;/td&gt; &lt;br /&gt;&lt;td style="border: 1px solid grey; text-align: center;"&gt;03-25-2009&lt;/td&gt; &lt;br /&gt;&lt;td style="border: 1px solid grey; text-align: center;"&gt;9:00 AM&lt;/td&gt; &lt;br /&gt;&lt;td style="border: 1px solid grey; text-align: center;"&gt;10:00 AM&lt;/td&gt;&lt;/tr&gt; &lt;br /&gt;&lt;tr&gt;&lt;td style="border: 1px solid grey;"&gt;Hitting 60Hz with the Unreal Engine: Inside the Tech of MORTAL KOMBAT vs DC UNIVERSE&lt;/td&gt; &lt;br /&gt;&lt;td style="border: 1px solid grey; text-align: center;"&gt;03-25-2009&lt;/td&gt; &lt;br /&gt;&lt;td style="border: 1px solid grey; text-align: center;"&gt;10:30 AM&lt;/td&gt; &lt;br /&gt;&lt;td style="border: 1px solid grey; text-align: center;"&gt;11:30 AM&lt;/td&gt;&lt;/tr&gt; &lt;br /&gt;&lt;tr&gt;&lt;td style="border: 1px solid grey;"&gt;Next-Gen Tech, but Last-Gen Looks?  
Tips to Make your Game Look Better - That Don't Include Bloom and Motion Blur.&lt;/td&gt; &lt;br /&gt;&lt;td style="border: 1px solid grey; text-align: center;"&gt;03-25-2009&lt;/td&gt; &lt;br /&gt;&lt;td style="border: 1px solid grey; text-align: center;"&gt;12:00 PM&lt;/td&gt; &lt;br /&gt;&lt;td style="border: 1px solid grey; text-align: center;"&gt;1:00 PM&lt;/td&gt;&lt;/tr&gt; &lt;br /&gt;&lt;tr&gt;&lt;td style="border: 1px solid grey;"&gt;Out of Order:  Making In-Order Processors Play Nicely&lt;/td&gt; &lt;br /&gt;&lt;td style="border: 1px solid grey; text-align: center;"&gt;03-25-2009&lt;/td&gt; &lt;br /&gt;&lt;td style="border: 1px solid grey; text-align: center;"&gt;2:30 PM&lt;/td&gt; &lt;br /&gt;&lt;td style="border: 1px solid grey; text-align: center;"&gt;3:30 PM&lt;/td&gt;&lt;/tr&gt; &lt;br /&gt;&lt;tr&gt;&lt;td style="border: 1px solid grey;"&gt;Deferred Lighting and Post Processing on PlayStation(R)3&lt;/td&gt; &lt;br /&gt;&lt;td style="border: 1px solid grey; text-align: center;"&gt;03-25-2009&lt;/td&gt; &lt;br /&gt;&lt;td style="border: 1px solid grey; text-align: center;"&gt;4:00 PM&lt;/td&gt; &lt;br /&gt;&lt;td style="border: 1px solid grey; text-align: center;"&gt;5:00 PM&lt;/td&gt;&lt;/tr&gt; &lt;br /&gt;&lt;tr&gt;&lt;td style="border: 1px solid grey;"&gt;The Unique Lighting in MIRROR'S EDGE: Experiences with Illuminate Labs Lighting Tools&lt;/td&gt; &lt;br /&gt;&lt;td style="border: 1px solid grey; text-align: center;"&gt;03-26-2009&lt;/td&gt; &lt;br /&gt;&lt;td style="border: 1px solid grey; text-align: center;"&gt;9:00 AM&lt;/td&gt; &lt;br /&gt;&lt;td style="border: 1px solid grey; text-align: center;"&gt;10:00 AM&lt;/td&gt;&lt;/tr&gt; &lt;br /&gt;&lt;tr&gt;&lt;td style="border: 1px solid grey;"&gt;From Pipe Dream to Open World: The Terraforming of FAR CRY 2&lt;/td&gt; &lt;br /&gt;&lt;td style="border: 1px solid grey; text-align: center;"&gt;03-26-2009&lt;/td&gt; &lt;br /&gt;&lt;td style="border: 1px solid grey; text-align: center;"&gt;1:30 
PM&lt;/td&gt; &lt;br /&gt;&lt;td style="border: 1px solid grey; text-align: center;"&gt;2:30 PM&lt;/td&gt;&lt;/tr&gt; &lt;br /&gt;&lt;tr&gt;&lt;td style="border: 1px solid grey;"&gt;Morpheme &amp; PhysX: A New Approach to Combining Character Animation and Simulation&lt;/td&gt; &lt;br /&gt;&lt;td style="border: 1px solid grey; text-align: center;"&gt;03-26-2009&lt;/td&gt; &lt;br /&gt;&lt;td style="border: 1px solid grey; text-align: center;"&gt;4:30 PM&lt;/td&gt; &lt;br /&gt;&lt;td style="border: 1px solid grey; text-align: center;"&gt;5:30 PM&lt;/td&gt;&lt;/tr&gt; &lt;br /&gt;&lt;tr&gt;&lt;td style="border: 1px solid grey;"&gt;The Cruise Director of AZEROTH: Directed Gameplay within WORLD OF WARCRAFT&lt;/td&gt; &lt;br /&gt;&lt;td style="border: 1px solid grey; text-align: center;"&gt;03-26-2009&lt;/td&gt; &lt;br /&gt;&lt;td style="border: 1px solid grey; text-align: center;"&gt;3:00 PM&lt;/td&gt; &lt;br /&gt;&lt;td style="border: 1px solid grey; text-align: center;"&gt;4:00 PM&lt;/td&gt;&lt;/tr&gt; &lt;br /&gt;&lt;tr&gt;&lt;td style="border: 1px solid grey;"&gt;Fast GPU Histogram Analysis and Scene Post-Processing&lt;/td&gt; &lt;br /&gt;&lt;td style="border: 1px solid grey; text-align: center;"&gt;03-27-2009&lt;/td&gt; &lt;br /&gt;&lt;td style="border: 1px solid grey; text-align: center;"&gt;9:00 AM&lt;/td&gt; &lt;br /&gt;&lt;td style="border: 1px solid grey; text-align: center;"&gt;9:20 AM&lt;/td&gt;&lt;/tr&gt; &lt;br /&gt;&lt;tr&gt;&lt;td style="border: 1px solid grey;"&gt;Mixed Resolution Rendering&lt;/td&gt; &lt;br /&gt;&lt;td style="border: 1px solid grey; text-align: center;"&gt;03-27-2009&lt;/td&gt; &lt;br /&gt;&lt;td style="border: 1px solid grey; text-align: center;"&gt;10:30 AM&lt;/td&gt; &lt;br /&gt;&lt;td style="border: 1px solid grey; text-align: center;"&gt;10:50 AM&lt;/td&gt;&lt;/tr&gt; &lt;br /&gt;&lt;tr&gt;&lt;td style="border: 1px solid grey;"&gt;Rendering Techniques in GEARS OF WAR 2&lt;/td&gt; &lt;br /&gt;&lt;td style="border: 1px solid grey; 
text-align: center;"&gt;03-27-2009&lt;/td&gt; &lt;br /&gt;&lt;td style="border: 1px solid grey; text-align: center;"&gt;2:30 PM&lt;/td&gt; &lt;br /&gt;&lt;td style="border: 1px solid grey; text-align: center;"&gt;3:30 PM&lt;/td&gt;&lt;/tr&gt; &lt;br /&gt;&lt;tr&gt;&lt;td style="border: 1px solid grey;"&gt;Dynamic Walking with Semi-Procedural Animation&lt;/td&gt; &lt;br /&gt;&lt;td style="border: 1px solid grey; text-align: center;"&gt;03-27-2009&lt;/td&gt; &lt;br /&gt;&lt;td style="border: 1px solid grey; text-align: center;"&gt;4:00 PM&lt;/td&gt; &lt;br /&gt;&lt;td style="border: 1px solid grey; text-align: center;"&gt;5:00 PM&lt;/td&gt;&lt;/tr&gt; &lt;br /&gt;&lt;/table&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/263327529581652713-38640088052239249?l=solid-angle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://solid-angle.blogspot.com/feeds/38640088052239249/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://solid-angle.blogspot.com/2009/03/problem-with-gdc.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/38640088052239249'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/38640088052239249'/><link rel='alternate' type='text/html' href='http://solid-angle.blogspot.com/2009/03/problem-with-gdc.html' title='The Problem with GDC'/><author><name>Vince</name><uri>http://www.blogger.com/profile/15315525780089168677</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-263327529581652713.post-4005792128003949516</id><published>2009-03-20T00:34:00.003-04:00</published><updated>2009-03-20T00:36:22.483-04:00</updated><title type='text'>Changes</title><content type='html'>A short personal update:&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Last Friday was my last day at Midway Games. After seven and a half years I decided it was time for me to move on and will soon be pursuing another opportunity. It was sad to leave in some ways, as there are many people there who I have enjoyed working with, but it was the right time for a move.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I'll have updates about the new opportunity soon, and hopefully some more content.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/263327529581652713-4005792128003949516?l=solid-angle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://solid-angle.blogspot.com/feeds/4005792128003949516/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://solid-angle.blogspot.com/2009/03/changes.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/4005792128003949516'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/4005792128003949516'/><link rel='alternate' type='text/html' href='http://solid-angle.blogspot.com/2009/03/changes.html' title='Changes'/><author><name>Vince</name><uri>http://www.blogger.com/profile/15315525780089168677</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-263327529581652713.post-5155747219467250145</id><published>2009-03-12T00:14:00.001-04:00</published><updated>2009-03-12T00:16:26.126-04:00</updated><title type='text'>Should Have Known Better</title><content type='html'>Of course, one day later, &lt;a href="http://www.penny-arcade.com/comic/2009/3/11/new-policy/"&gt;Penny Arcade&lt;/a&gt; makes the point &lt;a href="http://solid-angle.blogspot.com/2009/03/what-will-future-bring.html"&gt;I was trying to make&lt;/a&gt; but &lt;span class="Apple-style-span" style="font-style: italic;"&gt;much better&lt;/span&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/263327529581652713-5155747219467250145?l=solid-angle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://solid-angle.blogspot.com/feeds/5155747219467250145/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://solid-angle.blogspot.com/2009/03/should-have-known-better.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/5155747219467250145'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/5155747219467250145'/><link rel='alternate' type='text/html' href='http://solid-angle.blogspot.com/2009/03/should-have-known-better.html' title='Should Have Known Better'/><author><name>Vince</name><uri>http://www.blogger.com/profile/15315525780089168677</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-263327529581652713.post-4977181198785159868</id><published>2009-03-10T19:28:00.002-04:00</published><updated>2009-03-10T20:10:21.405-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='downloadable games'/><category scheme='http://www.blogger.com/atom/ns#' term='game industry'/><category scheme='http://www.blogger.com/atom/ns#' term='boxed retail'/><category scheme='http://www.blogger.com/atom/ns#' term='random thoughts'/><title type='text'>What Will the Future Bring?</title><content type='html'>&lt;div&gt;It's been pretty obvious for a while that boxed retail in games will die someday, but recent news that &lt;a href="http://blogs.barrons.com/techtraderdaily/2009/03/05/gamestop-whacked-amazon-enters-used-game-business/"&gt;Amazon will buy and sell used games&lt;/a&gt; definitely seems like one of the nails in the coffin.&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I can't blame retailers such as Amazon or Best Buy for entering the used game market, but I really wonder when the major players in the game industry will wake up and realize that the day when all games are downloadable will come sooner rather than later.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Another way to phrase the question: when will a major console manufacturer release a console that does &lt;span class="Apple-style-span" style="font-style: italic;"&gt;not&lt;/span&gt; contain any sort of removable DVD/Blu-ray drive?&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Consumers are already used to downloadable games in many other forms - cell phones, the iPhone, web games, Steam, GameTap, XBLA, PSN store, etc. 
There are even rumors that the &lt;a href="http://www.joystiq.com/2009/02/26/sony-doesnt-say-no-to-umd-less-psp2/"&gt;next version of the PSP will not contain a UMD drive&lt;/a&gt;.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;So why not cut the cord? Imagine a console that comes out in 2012 with no optical drive of any sort, just a large hard drive. All games, not just ones deemed small enough for a special "arcade" section, are downloadable. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;There would be issues. US broadband adoption currently stands at &lt;a href="http://bits.blogs.nytimes.com/2009/03/10/the-broadband-gap-why-is-theirs-faster/?hp"&gt;59% of households,&lt;/a&gt; well behind Japan or Europe. Even optimistic projections estimate that &lt;a href="http://www.webwire.com/ViewPressRel.asp?aId=89487"&gt;by 2012 only 77% of US households&lt;/a&gt; will have broadband, although it would be interesting to know what percentage of console gamers will have broadband. Still, it would be a definite leap of faith to exclude such a large percentage of households from buying your product.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;As far as the user experience goes, I don't think there would be many problems. Even if games took multiple hours to download, I don't see how that is any worse than getting a game from GameFly or Amazon is now. Steam has experimented with allowing users to "predownload" popular titles before their release date, and a similar model could be used for the users who just gotta have it on the day of release. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Another advantage of this approach is some savings on cost-of-goods on the consoles themselves -- for example, the Blu-ray drive in the PS3 is probably a big driver of the total cost of that system. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The only question is how retailers would react. 
They could threaten to not sell the console hardware, but a colleague of mine had an idea about that: prepaid download codes. Retailers could sell these along with the hardware. It won't be as lucrative as current boxed retail sales, but then again, by pushing used game sales so hard, the retailers are eventually going to force game publishers and console manufacturers' hands.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/263327529581652713-4977181198785159868?l=solid-angle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://solid-angle.blogspot.com/feeds/4977181198785159868/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://solid-angle.blogspot.com/2009/03/what-will-future-bring.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/4977181198785159868'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/4977181198785159868'/><link rel='alternate' type='text/html' href='http://solid-angle.blogspot.com/2009/03/what-will-future-bring.html' title='What Will the Future Bring?'/><author><name>Vince</name><uri>http://www.blogger.com/profile/15315525780089168677</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-263327529581652713.post-8061280018105980490</id><published>2009-01-26T21:35:00.003-05:00</published><updated>2009-01-26T21:37:16.587-05:00</updated><title type='text'>I get your fail</title><content type='html'>&lt;div&gt;&lt;a href="http://igetyourfail.blogspot.com/"&gt;I get your 
fail&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;via&lt;a href="http://beautifulpixels.blogspot.com/2009/01/submitting-our-fails-or-how-i-love.html"&gt; Beautiful Pixels&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/263327529581652713-8061280018105980490?l=solid-angle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://solid-angle.blogspot.com/feeds/8061280018105980490/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://solid-angle.blogspot.com/2009/01/i-get-your-fail.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/8061280018105980490'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/8061280018105980490'/><link rel='alternate' type='text/html' href='http://solid-angle.blogspot.com/2009/01/i-get-your-fail.html' title='I get your fail'/><author><name>Vince</name><uri>http://www.blogger.com/profile/15315525780089168677</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-263327529581652713.post-2485710658204066389</id><published>2009-01-25T12:55:00.005-05:00</published><updated>2009-01-25T13:38:46.848-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='rendering'/><category scheme='http://www.blogger.com/atom/ns#' term='game programming'/><category scheme='http://www.blogger.com/atom/ns#' term='hacks'/><title type='text'>C0DE517E on understanding your choices</title><content type='html'>&lt;a 
href="http://c0de517e.blogspot.com/2009/01/too-young.html"&gt;Link (Too Young)&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;And when something does not look right, do not fix it, but find out first why it does not.&lt;br /&gt;&lt;br /&gt;If it's too saturated, the solution is not to add a desaturation constant, but first to check out why. Are we taking a color to an exponent? What sense that operation has? Why we do that? Is the specular too bright? Why? Is our material normalized, or does it reflect more light than the one it receives? Think.&lt;/span&gt;&lt;/blockquote&gt;The entry I'm linking today wanders around a bit but eventually lands in a good spot.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Ultimately with games we're just trying to make something fun, and being visually interesting is part of that. We're not in the business of shipping perfectly accurate simulations of light, nor is that possible anyway. It may not even be desirable, depending on your art style -- I've always felt the ongoing arguments about "graphics not being important" in games are really about "photorealism is not the end-all, be-all." &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Photorealism in games is a discussion for another entry some other day. Back to the linked article: if I were to sum it up, I would say it is about understanding your choices. A well-placed hack is often necessary, but do you understand what the hack does and (often more important) why it is needed?&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;In rendering we are almost always approximating some ideal, be it the behavior of reflected light on various surfaces or the behavior of skin over muscle, fat, and bone. The ideal may not be something that exists in the real world -- it may be a stylized alternate reality created only in the minds of your artists. 
Even these alternate realities have consistent rules, ones often based in the physical realities of the real world, or more importantly, based on how humans &lt;span class="Apple-style-span" style="font-style: italic;"&gt;perceive&lt;/span&gt; those physical realities. If you go back to old Walt Disney cartoons, there is a consistency of action and reaction in any given movie, a set of unwritten rules that the animators provided for their audience. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;So as the author suggests, when presented with something that doesn't look right, a little thought into why it doesn't look right can go a long way. What ideal are you trying to approximate? Why does your approximation produce wrong results to the naked eye? How can you improve the approximation to produce the desired results? Sometimes you may find a bug that can be fixed, or a better technique for fixing the issue.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The answer may well be to add a simple constant hack. If you go through the process of determining what is going wrong, you will at least have a much better understanding of why that hack is necessary, and how it interacts with the rest of the system. This is true beyond just rendering; understanding the choices you make is key to good software engineering in general. 
&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/263327529581652713-2485710658204066389?l=solid-angle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://solid-angle.blogspot.com/feeds/2485710658204066389/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://solid-angle.blogspot.com/2009/01/c0de517e-on-understanding-your-choices.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/2485710658204066389'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/2485710658204066389'/><link rel='alternate' type='text/html' href='http://solid-angle.blogspot.com/2009/01/c0de517e-on-understanding-your-choices.html' title='C0DE517E on understanding your choices'/><author><name>Vince</name><uri>http://www.blogger.com/profile/15315525780089168677</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-263327529581652713.post-5533012890220381034</id><published>2009-01-21T19:44:00.003-05:00</published><updated>2009-01-21T20:03:39.142-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='game programming'/><category scheme='http://www.blogger.com/atom/ns#' term='architecture'/><title type='text'>The world is... the world</title><content type='html'>(This is the third and final part of a three part series. &lt;a href="http://solid-angle.blogspot.com/2009/01/world-is-not-scenetree.html"&gt;Part 1&lt;/a&gt;. 
&lt;a href="http://solid-angle.blogspot.com/2009/01/characters-are-special-sort-of-tree-but.html"&gt;Part 2&lt;/a&gt;.)&lt;br /&gt;&lt;br /&gt;We've established that a generic tree implementation is not a good choice for implementing the SceneTree data structure and algorithms. This raises the question: then why call this thing a tree?&lt;br /&gt;&lt;br /&gt;A group of people I work with obsess over naming classes, systems, functions, member variables, etc. We view it as a very important part of programming -- a name should describe what the thing it names does. If I have trouble coming up with a good name for something, then maybe it isn't clear to me what it does, and if it is not clear to me, how is it going to be clear to others? &lt;br /&gt;&lt;br /&gt;The best documentation of the code is always going to be the code itself. Obviously, you want to comment your code, but if you pick clear and concise names for things, there is less you need to communicate via comments. If a user comes across a function named DrawCircle, it is a pretty good bet that function will draw a circle. If it doesn't draw a circle, or it draws a circle and formats the hard drive, that would be quite a surprise. &lt;br /&gt;&lt;br /&gt;The name SceneTree implies both an interface and an implementation that are based on trees. We've seen from the previous entries that we don't need or want either. So naming it SceneTree and delivering something else would be a case of bad naming.&lt;br /&gt;&lt;br /&gt;I don't have an alternative suggestion off the top of my head. To be honest, I'm not absolutely sure we need a separate name for transform update. The important concept is that we separate transform update from the renderer. I've worked in frameworks where this transform update was part of the World system. &lt;br /&gt;&lt;br /&gt;In summary, a generic tree design and implementation is not a good choice for world object transform update and hierarchy composition. 
This is due to many of the same reasons that make a scene graph a bad choice, and due to the way that gameplay code tends to access transforms and bones. Given that a generic tree is a bad choice, the name SceneTree is a bad name.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/263327529581652713-5533012890220381034?l=solid-angle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://solid-angle.blogspot.com/feeds/5533012890220381034/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://solid-angle.blogspot.com/2009/01/world-is-world.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/5533012890220381034'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/5533012890220381034'/><link rel='alternate' type='text/html' href='http://solid-angle.blogspot.com/2009/01/world-is-world.html' title='The world is... 
the world'/><author><name>Vince</name><uri>http://www.blogger.com/profile/15315525780089168677</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-263327529581652713.post-777778951914448127</id><published>2009-01-13T23:29:00.009-05:00</published><updated>2009-01-14T00:29:08.776-05:00</updated><title type='text'>Characters are a special sort of tree but not a SceneTree</title><content type='html'>&lt;div&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;(This is the second part of a three-part series; here is the &lt;/span&gt;&lt;/span&gt;&lt;a href="http://solid-angle.blogspot.com/2009/01/world-is-not-scenetree.html"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;first part.&lt;/span&gt;&lt;/span&gt;&lt;/a&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;At this point we've established that for the vast majority of objects in our world, a SceneTree implemented as a straightforward tree structure is not a good fit. But what about characters or other objects that are driven by skeletal animation?&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;At first glance, a straightforward tree structure seems like a really good fit. What is a skeleton if not bones connected by joints, where each bone may have many children but only one parent? Each bone's transform is relative to its parent. 
To compose the world transforms of each bone in the skeleton, we merely traverse the tree, multiplying each bone's relative transform with the world transform of its parent.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;If we have a straightforward tree data structure, then we have a base SceneTreeNode class, and anything that we need to be in the tree derives from that node. Well, our bones are a tree, so it makes sense to make a bone a node, right?&lt;/div&gt;&lt;div&gt;&lt;pre    style="  background-border: 1px dashed #999999;line-height: 14px;padding: 5px; overflow: auto; width: 100%;font-family:Andale Mono, Lucida Console, Monaco, fixed, monospace;font-size:12px;color:#000000;"&gt;&lt;code&gt;class Bone : public SceneTreeNode&lt;br /&gt;{&lt;br /&gt;};&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;span class="Apple-style-span" style="border-collapse: collapse; "&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;I'm going to describe why the above is not the best decision on a number of fronts. It certainly works, it is simple, and tons of commercial and noncommercial projects have used it. What could possibly be the problem?&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;Let's think about this in terms of our gameplay code, the high-level code that makes our game different from every other game out there. We want this code to be as easy to write as possible. One part of accomplishing this is to hide from the gameplay code any complexity it doesn't need to know about. 
&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="border-collapse: collapse;  ;"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;Gameplay code doesn't need to know about the hierarchy, and in fact is going to be much happier if it doesn't know about it. It usually just wants to retrieve a small handful of specific bones' transforms, or attach other objects to a bone. With Bones as full-fledged citizens in the SceneTree, and a straightforward tree structure as the implementation, how would gameplay code go about this? It would need to traverse the SceneTree to find the bone it is interested in and retrieve the cached world transform. This is not very convenient, and we'd probably add a GetBoneTransform helper to the Character node to hide these details. &lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="border-collapse: collapse;"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;We've still got an implementation of GetBoneTransform that hops around a lot of different pieces of memory, causing cache misses all along the way. Maybe this is a performance bottleneck, so we decide to use some kind of efficient indexed lookup to cache the bones at the character node level. We implement GetBoneTransform in terms of this efficient lookup. 
Attachments can be handled similarly -- rather than use the built-in tree traversal mechanisms, most likely the code will end up caching that list of attachments somewhere else for easy access in gameplay code.&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="border-collapse: collapse;"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="border-collapse: collapse;  "&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;If we're going to abstract the tree away from gameplay code, then what is the advantage of making bones full-fledged citizens in the tree? In fact, there are significant design and performance disadvantages.&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="border-collapse: collapse;  "&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;&lt;br /&gt;Bone hierarchies are mostly static, most of the time. I say mostly because sometimes a game may swap different body parts in, or switch level of detail on the skeletons. Practically, though, the hierarchy doesn't really change in any sort of significant fashion.  Given this knowledge, a much better implementation is to lay out our bones in a flat array, each with a simple index to its parent. This index may be in a separate array depending on cache usage patterns. The array can be laid out in such a way (parents ahead of their children, for example) that compositing relative transforms we get from the animation system into absolute transforms is a simple run through an array. There are tricks you can use to make this more efficient, of course, but the point is an array traversal is going to be much better than hopping all around memory calling virtual functions on a base SceneTreeNode class. 
The approach also lends itself much better to offloading to another CPU due to better memory locality, or easier DMA if the CPU has NUMA.&lt;br /&gt;&lt;br /&gt;Do we need a tree at all? Not that I can see -- from the previous entry we've got a list of root level objects, some of which may have an array of tightly packed bones, and an array of attachments. Their attachments may also have bones and attachments, so yes, conceptually, we've got a tree that we can traverse. But gameplay code never needs to know anything about this for the most part, as it deals with bones and attachments directly and isn't very concerned about the internals of the hierarchy.&lt;br /&gt;&lt;br /&gt;We don't need a base SceneTreeNode class that is exposed for all  -- the internals of how we update attachments and bones and objects are just that: internal. As we've shown, a straightforward tree structure doesn't really fit what we want to do very well as an implementation. From experience, you can spend a fair amount of time optimizing your updates of objects. The ability to special case, offload work to other CPUs, or any number of other abstractions makes it very easy to do so without breaking gameplay code. A generic tree structure does not provide the API we need for gameplay code, nor does it provide a good implementation at the level. 
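&lt;br /&gt;&lt;br /&gt;As a rough illustration of the flat bone array described earlier in this entry, here is a minimal sketch. It assumes the bones are sorted so that every parent is stored before its children, and it reduces a transform to a plain translation to keep the composition step short. All of the names (Transform, Compose, ComposeWorldTransforms, kNoParent) are hypothetical and are not taken from any particular engine.

```cpp
// Sketch only: every name here is hypothetical, and a transform is reduced
// to a translation so the composition step stays easy to follow.
struct Transform
{
    float x, y, z;
};

// Combine a parent's world transform with a child's parent-relative transform.
// A real engine would multiply matrices or rotation/translation pairs here.
Transform Compose(Transform parentWorld, Transform local)
{
    return Transform{ parentWorld.x + local.x,
                      parentWorld.y + local.y,
                      parentWorld.z + local.z };
}

// Marks a root bone, which has no parent.
const int kNoParent = -1;

// Assumes bones are sorted so every parent is stored before its children.
// One forward pass over the arrays then yields all world transforms.
void ComposeWorldTransforms(const int* parentIndex,
                            const Transform* local,
                            Transform* world,
                            int boneCount)
{
    for (int i = 0; i != boneCount; ++i)
    {
        const int parent = parentIndex[i];
        world[i] = (parent == kNoParent)
            ? local[i]
            : Compose(world[parent], local[i]);
    }
}
```

&lt;br /&gt;&lt;br /&gt;With a layout like this, a GetBoneTransform helper could simply index into the world array, and the parent indices never have to be visible to gameplay code.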
&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="border-collapse: collapse; "&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="border-collapse: collapse; "&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;Tomorrow I will conclude with some thoughts on why naming matters.&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="border-collapse: collapse;  font-size:13px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/263327529581652713-777778951914448127?l=solid-angle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://solid-angle.blogspot.com/feeds/777778951914448127/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://solid-angle.blogspot.com/2009/01/characters-are-special-sort-of-tree-but.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/777778951914448127'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/777778951914448127'/><link rel='alternate' type='text/html' href='http://solid-angle.blogspot.com/2009/01/characters-are-special-sort-of-tree-but.html' title='Characters are a special sort of tree but not a SceneTree'/><author><name>Vince</name><uri>http://www.blogger.com/profile/15315525780089168677</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-263327529581652713.post-2985118745475742378</id><published>2009-01-12T19:30:00.008-05:00</published><updated>2009-01-13T11:14:39.099-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='game programming'/><category scheme='http://www.blogger.com/atom/ns#' term='architecture'/><title type='text'>The world is not a SceneTree</title><content type='html'>&lt;span class="Apple-style-span" style="border-collapse: collapse;   "&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;I wanted to expand on something I thought of while writing my thoughts on the &lt;/span&gt;&lt;a href="http://solid-angle.blogspot.com/2009/01/animation-and-physics-sitting-in-tree.html" target="_blank" style="color: rgb(42, 93, 176); "&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;SceneTree&lt;/span&gt;&lt;/a&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;. Before I made the argument that transform update and animation should not be tightly coupled with rendering. Now I want to touch on something else: it may be useful to think of the SceneTree as conceptually a tree, but the API and implementation should not be implemented as a global tree data structure.&lt;/span&gt;&lt;/span&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="border-collapse: collapse;  "&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="border-collapse: collapse;  "&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;I jumped the gun on posting this entry, so those of you with RSS feeds may have seen an earlier draft go by. It was pointed out to me that perhaps I was trying to tackle too much in one entry, so I decided to split it up. 
The win for the reader is (hopefully) less jumbled thoughts from me. The win for me is three days' worth of entries. Win-win!&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="border-collapse: collapse;  "&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;The title of this entry was shamelessly stolen from Tom Forsyth's article on scene graphs (&lt;/span&gt;&lt;a href="http://home.comcast.net/~tom_forsyth/blog.wiki.html#%5B%5BScene%20Graphs%20-%20just%20say%20no%5D%5D"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;just say no&lt;/span&gt;&lt;/a&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;):&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="border-collapse: collapse;  "&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;&lt;span class="Apple-style-span" style="border-collapse: separate;  line-height: 16px; "&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;blockquote&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="border-collapse: collapse;  "&gt;&lt;span class="Apple-style-span" style="border-collapse: separate;  line-height: 16px; "&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;The world is not a big tree - a coffee mug on a desk is not a child of the desk, or a sibling of any other mug, or a child of the house it's in or the parent of the coffee it contains or anything - it's just a mug. It sits there, bolted to nothing. 
Putting the entire world into a big tree is not an obvious thing to do as far as the real world is concerned.&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;blockquote&gt;&lt;/blockquote&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;&lt;span class="Apple-style-span" style="border-collapse: collapse;  "&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;/blockquote&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="border-collapse: collapse;  "&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;Very succinctly put. Tom was talking about scene graphs, but the same criticism applies to the SceneTree introduced in the previous entry. Summing it up, we replace the scene graph with three systems: the SpatialGraph, the RenderQueue, and the SceneTree. The SpatialGraph deals with visibility determination. The RenderQueue is constructed from the SpatialGraph and efficiently issues render commands. The SceneTree handles animation and transform update. It is certainly a better breakdown than a scene graph, which tries to do it all. Unfortunately, Tom's criticism of scene graphs still applies to the idea of a SceneTree.&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="border-collapse: collapse;  "&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="border-collapse: collapse;  "&gt;&lt;span class="Apple-style-span" style="border-collapse: separate;   "&gt;&lt;span class="Apple-style-span" style="border-collapse: collapse;   "&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;I have no idea if the original author of the SceneTree concept is using a global tree structure. 
I just want to demonstrate that a straightforward implementation of SceneTree that uses such a structure suffers from many of the same problems that a scene graph does. Given the number of scene graphs I've seen over the years, some people will clearly be tempted to choose a global tree structure to implement a SceneTree.&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="border-collapse: collapse;  "&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"   style="border-collapse: collapse;  font-family:arial;font-size:13px;"&gt;&lt;span class="Apple-style-span"  style="font-family:georgia;"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;First, let's take a step back and think about what we really need to accomplish with the SceneTree:&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;ul&gt;&lt;li style="margin-left: 15px; "&gt;&lt;span class="Apple-style-span"  style="font-family:georgia;"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;We have a number of objects that are in the world.&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;&lt;li style="margin-left: 15px; "&gt;&lt;span class="Apple-style-span"  style="font-family:georgia;"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;These objects have transforms describing where they are in the world.&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;&lt;li style="margin-left: 15px; "&gt;&lt;span class="Apple-style-span"  style="font-family:georgia;"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;For a relatively small number of objects, their transforms may be relative to some other frame of reference (e.g. 
a gun may be attached to a character).&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;&lt;li style="margin-left: 15px; "&gt;&lt;span class="Apple-style-span"  style="font-family:georgia;"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;There may be different types of attachment -- you may have situations where you want translation to be relative and rotation absolute, or vice versa. Some attachments may be accomplished through physical constraints simulated in the physics system rather than perfect constraints in the update system.&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;&lt;li style="margin-left: 15px; "&gt;&lt;span class="Apple-style-span"  style="font-family:georgia;"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;When a transform changes due to animation, physics, or other means, we want those changes to propagate to relative objects in the most efficient manner without doing redundant work.&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;&lt;li style="margin-left: 15px; "&gt;&lt;span class="Apple-style-span"  style="font-family:georgia;"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;We want to have a large number of objects in the world with a large number of characters receiving animation.&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:georgia;"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;What you'll find is while we have the concept of a global hierarchy, in reality we have a mostly boring common case and a relatively small number of objects that actually have hierarchical relationships.&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:georgia;"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;The boring case is what Tom describes. 
Look around the room you are in right now, and you will find the vast majority of objects are just "there". They do not have any sort of child, parent, or sibling relationship with each other. While most will have physical constraints (gravity pushing down, whatever they are on pushing up), this is due to physical interaction, not a familial relationship. In most games, this sort of interaction is either static or simulated by a physics system.&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:georgia;"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;&lt;br /&gt;Having one root node for the entire world and having all these objects be children of that root node makes no sense, or at the very least conveys no useful information. What does it mean to be a child of "the world"? In a game, most of these things are going to be static -- they are the literal immovable object. Why clutter up our SceneTree with such things? Sure, we might want to attach something to them (destructible lamp on a wall, for example), but that's the same as attaching it to the root frame of reference if the wall doesn't move. There may be some other special cases specific to a game (the wall cannot move but it can disappear), but that's just it -- they are special cases that can be handled at a higher level.&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:georgia;"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:georgia;"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;There may be no static objects at all to put in a tree. Many games treat all the static objects in an atomically streamable unit as one giant blob. 
There's no reason to deal with individual objects if you can just submit a precomputed command list to the renderer, or an efficiently packed polygon soup to the physics system. &lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:georgia;"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;&lt;br /&gt;Most of the same arguments apply with dynamic things -- the familial relationship with "the world" is not useful. What we've really got is a lot of potential root nodes for attachment -- a list or array of conceptual trees. But if nothing is attached to an object, does it need a hierarchical update? Probably not. Now, it may be simpler in code to just treat those as a 1-node hierarchy, but if the number of objects with attachments is low relative to everything else, you may be better off using a separate data structure which only holds things that have an actual hierarchy. &lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:georgia;"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:georgia;"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;Dynamic objects provide other interesting opportunities for optimization which do not map well to a generic tree structure. For objects that are physically simulated, the code can take advantage of the physics system's knowledge of what actually moved that frame, and only update those objects. This is considerably more efficient than traversing an entire SceneTree and checking if an object needs to update. 
Obviously none of this precludes having a generic tree structure, it just brings into question what good it does.&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:georgia;"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:georgia;"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;We've established that for the grand majority of our objects, we do not need a generic tree representation to process transform updates. We still have cases that do indeed have hierarchical relationships such as characters or attachments. Tomorrow, I will discuss why a generic tree structure is not a good choice for characters and bones.&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;/span&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/263327529581652713-2985118745475742378?l=solid-angle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://solid-angle.blogspot.com/feeds/2985118745475742378/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://solid-angle.blogspot.com/2009/01/world-is-not-scenetree.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/2985118745475742378'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/2985118745475742378'/><link rel='alternate' type='text/html' href='http://solid-angle.blogspot.com/2009/01/world-is-not-scenetree.html' title='The world is not a SceneTree'/><author><name>Vince</name><uri>http://www.blogger.com/profile/15315525780089168677</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-263327529581652713.post-1970442138381085558</id><published>2009-01-09T12:46:00.007-05:00</published><updated>2009-01-09T13:04:51.089-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='multicore'/><category scheme='http://www.blogger.com/atom/ns#' term='game programming'/><title type='text'>Joe Duffy on CAS operations and performance</title><content type='html'>&lt;span style="font-size:100%;"&gt;Interesting results in a &lt;a href="http://www.bluebytesoftware.com/blog/PermaLink,guid,08770cb2-a8fa-467d-9e38-93e2abadfadc.aspx"&gt;blog entry&lt;/a&gt; from Joe Duffy about the scalability of compare-and-swap instructions across x86 architectures. I found this via the &lt;a href="https://lists.sourceforge.net/lists/listinfo/gdalgorithms-list"&gt;gdalgorithms list&lt;/a&gt;. The money graph:&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;blockquote  style="font-style: italic;font-family:georgia;"&gt;&lt;span style="font-size:100%;"&gt;The most common occurrence of a CAS is upon lock entrance and exit.  Although a lock can be built with a single CAS operation, CLR monitors use two (one for Enter and another for Exit).  Lock-free algorithms often use CAS in place of locks, but due to memory reordering such algorithms often need explicit fences that are typically encoded as CAS instructions.  Although locks are evil, most good developers know to keep lock hold times small.  
As a result, one of the nastiest impediments to performance and scaling has nothing to do with locks at all; it has to do with the number, frequency, and locality of CAS operations.&lt;/span&gt;&lt;/blockquote&gt;&lt;span style="font-size:100%;"&gt;&lt;br /&gt;One thing that has been talked about on the recent gdalgorithms thread is understanding that your only options are not between lock-free and heavy operating system lock primitives. Some operating systems provide lighter weight synchronization primitives (critical sections on Win32 with a spincount come to mind). In the past on some consoles I've come up with faster mutex implementations than what the operating system provides, although this is not something I'd recommend to the weak of heart -- it is unbelievably easy to get this stuff wrong, even for very experienced developers.&lt;br /&gt;&lt;br /&gt;I think the lesson from this entry is to always measure. Maybe for a specific application a fancy lock-free algorithm may be faster, but maybe not -- as we see from above it can depend on a lot of factors. 
The other thing to avoid is premature optimization -- get your algorithm correct and without races first using locks, and then evaluate whether a lock-free algorithm is appropriate and faster.&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/263327529581652713-1970442138381085558?l=solid-angle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://solid-angle.blogspot.com/feeds/1970442138381085558/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://solid-angle.blogspot.com/2009/01/joe-duffy-on-cas-operations-and.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/1970442138381085558'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/1970442138381085558'/><link rel='alternate' type='text/html' href='http://solid-angle.blogspot.com/2009/01/joe-duffy-on-cas-operations-and.html' title='Joe Duffy on CAS operations and performance'/><author><name>Vince</name><uri>http://www.blogger.com/profile/15315525780089168677</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-263327529581652713.post-8744222185202355925</id><published>2009-01-04T15:02:00.004-05:00</published><updated>2009-01-04T15:29:23.657-05:00</updated><title type='text'>Animation and physics sitting in a tree</title><content type='html'>Over at &lt;a href="http://diaryofagraphicsprogrammer.blogspot.com/2009/01/handling-scene-geometry.html"&gt;Diary of a Graphics Programmer&lt;/a&gt;, Wolfgang Engel points out a &lt;a 
href="http://www.gamedev.net/community/forums/topic.asp?topic_id=515082&amp;amp;whichpage=1&amp;amp;#3351070"&gt;gamedev.net post&lt;/a&gt; introducing some nomenclature for the systems involved with rendering scene geometry. &lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Instead of a scene graph approach (&lt;a href="http://home.comcast.net/~tom_forsyth/blog.wiki.html#%5B%5BScene%20Graphs%20-%20just%20say%20no%5D%5D"&gt;just say no&lt;/a&gt;), they propose the SpatialGraph (for visibility determination), the SceneTree (for animation and transform update), and the RenderQueue (filled from the SpatialGraph, just draws stuff fast). It is a division that makes much more sense than a scene graph, which tries to handle all of the above. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;One of my biggest dislikes about scene graphs is how they misled us all to think that animation and transform update belong in the renderer in an engine design. The renderer just deals with the results of these operations -- it doesn't much care how the transforms were created, it just needs them. Coupling these two things doesn't really make much sense.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;If anything, animation/transform update is much more tightly coupled with physics and collision. Conceptually, animation says "here's where I would like to go" and the physics system says "here's where you can go." It is even more intertwined than that, because the physics system has complete control over things like rigid bodies and ragdolls -- saying both where they would like to go and where they can go. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;If you are doing any sort of ragdoll blending between animation and ragdoll simulation, you have a feedback loop. The animation system figures out its desired pose, and the ragdoll figures out its desired pose. There is a blending step, but it's not always obvious where that should go. 
Traditionally the animation system is responsible for blending transforms, but there's an argument that the physics simulation should do it because it knows the constraints and limitations on where the bones can be physically placed. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I haven't gotten into other interesting issues such as vehicles, which can be a blend of physical simulation (the car moving) and animation (the door being opened) similar to ragdolls, and when you add attachments to the mix (gun turret), keeping the animation and physics systems in sync can be a challenge. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I'm starting to think that animation and physics are two sides of the same coin. I'm calling the combined thing "simulation." Obviously different games are going to have different complexity of physics, and I'm not sure coupling these two things so tightly is a one size fits all thing. What I do know is that coupling animation/transform update with the renderer is almost never the right thing to do, even though there are still a large number of scene graph based rendering libraries available. 
&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/263327529581652713-8744222185202355925?l=solid-angle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://solid-angle.blogspot.com/feeds/8744222185202355925/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://solid-angle.blogspot.com/2009/01/animation-and-physics-sitting-in-tree.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/8744222185202355925'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/8744222185202355925'/><link rel='alternate' type='text/html' href='http://solid-angle.blogspot.com/2009/01/animation-and-physics-sitting-in-tree.html' title='Animation and physics sitting in a tree'/><author><name>Vince</name><uri>http://www.blogger.com/profile/15315525780089168677</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-263327529581652713.post-1315040873886249949</id><published>2009-01-03T13:47:00.004-05:00</published><updated>2009-01-03T14:26:58.326-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='multicore'/><category scheme='http://www.blogger.com/atom/ns#' term='game programming'/><category scheme='http://www.blogger.com/atom/ns#' term='middleware'/><title type='text'>Good Middleware revisited</title><content type='html'>&lt;span class="Apple-style-span"  style="font-size:small;"&gt;I was having a conversation with a good friend about the &lt;/span&gt;&lt;a href="http://solid-angle.blogspot.com/2008/12/game-architect-on-good-middleware.html"&gt;&lt;span 
class="Apple-style-span"  style="font-size:small;"&gt;Good Middleware&lt;/span&gt;&lt;/a&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt; entry, and he made a very interesting point. He pointed out that in the future, good middleware is going to allow you to hook their multicore job scheduling just like you can hook memory allocation.&lt;/span&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;Let's start out with a quick review of why we hook memory allocation, as it will help us understand why we're going to want to hook job scheduling.&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;ol&gt;&lt;li&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;Consoles are fixed memory environments.&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt; It is critical to balance overall memory utilization for your title. If you can provide five more megabytes for game data over the built-in system allocator, then that allows you to have more stuff in your game, which generally can lead to a better game.&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;Different titles have different memory needs.&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt; A 2D side-scroller is going to have different memory requirements than a world game. Some types of games lend themselves to pre-allocation strategies, where you know ahead of time how many assets of what type you are going to have. Every game I've worked on has spent time tuning the memory allocator to squeeze out as much memory as possible, and what works in one title may not be applicable to another. 
&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;The system allocator may perform badly, particularly when accessed from multiple threads&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;. Often, console operating systems will have poor allocator implementations that have a lot of overhead. Sometimes they will use allocators that are designed for a much wider range of applications than games, or designed for systems with virtual memory. Others explicitly punt on providing anything but systems designed for large, bulk allocations and expect you to handle smaller allocations on your own. Finally, no matter how good an allocator may be, it does not have knowledge of your application's memory needs or usage patterns, which are things you can use to optimize your allocator.&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;The system allocator may not provide good leak detection or heap debugging tools. It may provide none at all.&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt; If you're doing a console game, I know of no standalone leak detection software such as BoundsChecker that is available. Often you have to provide these facilities yourself.&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt; CPU core usage can be thought of as a system resource that is arbitrated just like memory. Most middleware I've seen so far that spreads jobs across multiple cores has its own internal scheduler running on top of a couple of hardware threads. 
In the early days of multicore middleware, you couldn't even set the CPU affinity of these vendor threads without recompiling the library, something which is key on the 360. Increasingly, I think games are going to want to own their job scheduling and not just leave it up to the operating system to arbitrate CPU scheduling at the thread level. &lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;ol&gt;&lt;li&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;Consoles have a fixed number of cores.&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt; Goes without saying but worth touching on. You have a fixed amount of CPU processing you can do in a 30 Hz/60 Hz frame. Ideally you want to maximize the amount of CPU time spent on real game work and minimize the amount spent on bookkeeping/scheduling. &lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;You know more than the operating system does about your application.&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt; One approach to scheduling jobs would be to just create a thread per job and let them go. You could do that, but you would then watch your performance tank. Forget the fact that creating/destroying operating system threads is not a cheap task, and focus on the cost of context switches. Thread context switches are very expensive and with CPU architectures adding more and more registers, they are not going to get any cheaper. A lot of game work is actually small tasks more amenable to an architecture where a thread pool sits and just executes these jobs one after another. 
In this architecture, you do not have any thread context switches between jobs or while jobs are running, which means you are spending more time doing real game work and less time on scheduling.&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;It is difficult to balance between competing job schedulers.&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt; Imagine a world with 16 to 32 cores, where both your middleware and your game need to schedule numerous jobs to a thread pool. It is going to be very difficult to avoid stalls or idle time because you have two job schedulers in your system -- one from the middleware and one from your game. Either you are "doubling up" thread allocation on cores and paying the cost of context switches from one thread pool to the next, or you allocate each thread pool to a different set of cores and hope that the work balances out. Unfortunately, I don't think the latter is going to scale very well and you will end up with a lot of unused CPU capacity.&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;The third reason is the clincher as to why good middleware in the future will allow you to replace its job scheduler with your own. It's the only solution I can think of that will allow you to balance workloads across many, many cores. It is possible that in the future operating systems will provide good enough thread pool/microjob APIs that we won't have to write our own, but even in that case, you'd still want the hooks for debugging or profiling purposes. 
&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-size:13px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-size:13px;"&gt;I haven't yet seen any middleware that really allows you to hook this stuff easily. Have you?&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-size:13px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/263327529581652713-1315040873886249949?l=solid-angle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://solid-angle.blogspot.com/feeds/1315040873886249949/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://solid-angle.blogspot.com/2009/01/good-middleware-revisited.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/1315040873886249949'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/1315040873886249949'/><link rel='alternate' type='text/html' href='http://solid-angle.blogspot.com/2009/01/good-middleware-revisited.html' title='Good Middleware revisited'/><author><name>Vince</name><uri>http://www.blogger.com/profile/15315525780089168677</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-263327529581652713.post-4806139494730030890</id><published>2009-01-02T18:21:00.004-05:00</published><updated>2009-01-02T18:33:26.161-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='particle systems'/><category scheme='http://www.blogger.com/atom/ns#' term='game programming'/><category scheme='http://www.blogger.com/atom/ns#' term='optimization'/><title type='text'>RTCD on Particle Rendering</title><content type='html'>&lt;a href="http://realtimecollisiondetection.net/blog/?p=91"&gt;Optimizing the rendering of a particle system&lt;/a&gt;.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Real Time Collision Detection has an entry on optimizing particle rendering which is a good survey of many techniques you can use to make your particles fly. Don't forget to read the comments, there are some good suggestions in there, too!&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Over the last five+ years or so people have been a lot more open about this kind of stuff, and I like it. There's just too much you have to deal with these days to write a good game to have to figure it out on your own. I don't know about other people's experiences but earlier in my career there seemed to be much more of a secretive culture in game development than other types of software.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Whether it is entries like the above or things like Insomniac's &lt;a href="http://nocturnal.insomniacgames.com/index.php/Main_Page"&gt;Nocturnal&lt;/a&gt; initiative, people are much more open about ideas these days. Ultimately, we all have to execute, and doing that well is hard enough on its own. 
Being more open is healthy and ultimately will lead to better games across the board.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/263327529581652713-4806139494730030890?l=solid-angle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://solid-angle.blogspot.com/feeds/4806139494730030890/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://solid-angle.blogspot.com/2009/01/rtcd-on-particle-rendering.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/4806139494730030890'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/4806139494730030890'/><link rel='alternate' type='text/html' href='http://solid-angle.blogspot.com/2009/01/rtcd-on-particle-rendering.html' title='RTCD on Particle Rendering'/><author><name>Vince</name><uri>http://www.blogger.com/profile/15315525780089168677</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-263327529581652713.post-6094345971468330236</id><published>2008-12-30T23:28:00.002-05:00</published><updated>2008-12-30T23:50:17.567-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='xbox 360'/><category scheme='http://www.blogger.com/atom/ns#' term='game programming'/><category scheme='http://www.blogger.com/atom/ns#' term='XNA Studio'/><title type='text'>XNA Studio 3.0: First impressions</title><content type='html'>&lt;span class="Apple-style-span" style="font-size: small;"&gt;I've been doing some graphics prototyping in my spare time lately, and I decided to see if there was a better way to 
go about it.&lt;/span&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;I've used RenderMonkey in the past, and while it certainly has its uses, ultimately it left me disappointed. For a straightforward shader it's fine, but when you start getting into more complicated techniques it starts to break down. After clicking on a billion buttons to get my render targets and passes set up the way I needed, I really wished I could just write some code. The lack of any ability to compute anything on the CPU side is what really frustrated me. What I ended up doing was computing a lot of things shader-side that in a real application would be done on the CPU, and it needlessly complicated the shaders.&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;Another avenue I've pursued is using the DirectX sample framework or an OpenGL sample, and working from there. Even in these simplified environments I find you end up doing a lot more bookkeeping than writing actual code. Additionally, OpenGL seems very unstable on my laptop -- even vanilla samples are crashing in the Nvidia DLLs. &lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;The last couple of days I've been playing around with Microsoft's XNA Studio 3.0 as a graphics prototyping tool, and my impression so far is very favorable. The API is pretty straightforward, and for the most part the abstractions seem to be in the right place. 
Porting the current thing I'm working on from C++ to C# took no time at all, and so far I've spent much more time writing meaningful code than working on scaffolding. &lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;The GameComponent architecture they have is interesting -- for example, I found myself needing an orbit camera. Rather than write one myself, I just grabbed a component &lt;a href="http://www.dhpoware.com/demos/xnaCamera.html"&gt;someone else has written&lt;/a&gt;. It was one of those few times where the code just dropped in. The only drawback is they haven't implemented taking input from the 360 controller, but that's easy enough for me to write and a lot less involved than doing the whole thing.&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;I was also surprised how easy it was to get my little project up and running on my Xbox 360. I didn't have to do any code changes, and it all Just Works. The debugging is solid, but experienced Xbox 360 developers will miss all the nice tools you get with the real SDK. It'd be really nice if Microsoft released a lightweight version of PIX for XNA that worked with the 360, but I guess you can't have everything.&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;There have been some minor annoyances. Some of the C# syntax for dealing with vertex arrays can be cumbersome -- new'ing a Vector3 never feels right to me. You also lose access to some hardware features. For example, for some reason it thinks I can't create a floating point depth buffer on my laptop, when I'm very certain the GPU I have can handle that. 
&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;On the 360 side, they simplify a lot of the hardware details, but this has limitations. I can't find any way to resolve the depth buffer on the 360 to a texture in XNA. While these limitations are understandable given the intended audience of XNA, it is somewhat annoying.&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;All in all, I think it is a pretty good framework so far. I don't think you are going to max out the hardware with it, but for a large category of games it will work really well. &lt;/span&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/263327529581652713-6094345971468330236?l=solid-angle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://solid-angle.blogspot.com/feeds/6094345971468330236/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://solid-angle.blogspot.com/2008/12/xna-studio-30-first-impressions.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/6094345971468330236'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/6094345971468330236'/><link rel='alternate' type='text/html' href='http://solid-angle.blogspot.com/2008/12/xna-studio-30-first-impressions.html' title='XNA Studio 3.0: First impressions'/><author><name>Vince</name><uri>http://www.blogger.com/profile/15315525780089168677</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-263327529581652713.post-1996472362511847265</id><published>2008-12-29T11:56:00.006-05:00</published><updated>2008-12-29T13:16:32.092-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='game programming'/><category scheme='http://www.blogger.com/atom/ns#' term='gdc'/><title type='text'>Interesting GDC Talk #2: Hitting 60hz with the Unreal Engine</title><content type='html'>&lt;span class="Apple-style-span" style="  ;"&gt;&lt;a href="https://www.cmpevents.com/GD09/a.asp?option=C&amp;amp;V=11&amp;amp;SessID=8529"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;Hitting 60Hz with the Unreal Engine: Inside the Tech of MORTAL KOMBAT vs DC UNIVERSE&lt;/span&gt;&lt;/a&gt;&lt;/span&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=" ;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=" ;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;This is another talk I will be attending. I work at the same studio as Jon and have had more than a few conversations with him about this very topic, and he has given internal presentations on the subject. I don't want to give too much about the talk away, but it is a very interesting case study in taking a graphics pipeline completely designed for a 30hz game, and modifying it to meet a game's performance goals without sacrificing a whole bunch of functionality. Just one example: when we originally got our hands on Unreal, the sum of all post processing took nearly half of a 60hz frame. While Epic has optimized this over time, this gives an idea of some of the challenges on the GPU. 
It is easy enough to remove functionality until it runs fast enough, but the real trick is to preserve quite a bit of that functionality and still get it under time.&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=" ;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=" ;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;The section on porting the Unreal particle system to the SPU is definitely worth paying attention to - this was a collaborative effort done by a few tech group guys at Chicago. It underscores the difficulty of dealing with legacy code bases that are not designed for NUMA architectures, and the ingenuity and hard work required to overcome that hurdle. This talk should definitely be on your list.&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"   style=" ;font-family:Verdana;font-size:13px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/263327529581652713-1996472362511847265?l=solid-angle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://solid-angle.blogspot.com/feeds/1996472362511847265/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://solid-angle.blogspot.com/2008/12/interesting-gdc-talk-2-hitting-60hz.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/1996472362511847265'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/1996472362511847265'/><link rel='alternate' type='text/html' href='http://solid-angle.blogspot.com/2008/12/interesting-gdc-talk-2-hitting-60hz.html' 
title='Interesting GDC Talk #2: Hitting 60hz with the Unreal Engine'/><author><name>Vince</name><uri>http://www.blogger.com/profile/15315525780089168677</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-263327529581652713.post-7850676649237629669</id><published>2008-12-28T14:09:00.004-05:00</published><updated>2008-12-28T14:55:49.803-05:00</updated><title type='text'>Interesting GDC talk #1: Light Pre-pass Renderer</title><content type='html'>&lt;a href="https://www.cmpevents.com/GD09/a.asp?option=C&amp;amp;V=11&amp;amp;SessID=8544"&gt;The Light Pre-Pass Renderer: Renderer Design for Multiple Lights&lt;/a&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;GDC is fast approaching and I'll be attending this year. One of the talks I'm looking forward to is Wolfgang Engel's talk on a novel renderer design. I've been following his &lt;a href="http://diaryofagraphicsprogrammer.blogspot.com/2008/03/light-pre-pass-renderer.html"&gt;various&lt;/a&gt; &lt;a href="http://diaryofagraphicsprogrammer.blogspot.com/2008/07/light-pre-pass-first-blood.html"&gt;entries&lt;/a&gt; on this &lt;a href="http://diaryofagraphicsprogrammer.blogspot.com/2008/09/light-pre-pass-more-blood.html"&gt;topic&lt;/a&gt; for the last year. Additionally, if you have access to the PS3 dev site, the Uncharted guys gave a talk on a very similar renderer.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The technique is a hybrid between forward rendering techniques and deferred rendering techniques. 
In a forward renderer you are evaluating material and lighting properties simultaneously, and in a deferred renderer you evaluate material properties into a G-Buffer and then evaluate each light, sampling that buffer.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;In the light pre-pass renderer, it is broken down further: evaluate the bare minimum material properties needed for the lighting equation, then evaluate the lights into an accumulation buffer, then apply the lighting in a final pass of geometry. While you still have one more pass than in a typical deferred setup, the light pre-pass renderer does give you a lot more flexibility for materials. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I like this approach over forward rendering or deferred rendering because I think it fits the box today's consoles lay out very nicely. Deferred rendering requires a huge number of render targets. Forward rendering requires either huge complicated shaders or more rendering passes, and neither of these things scales very well as you add more lights or more material types. The light pre-pass approach gives you the linear scalability of deferred rendering while allowing the material variety of forward rendering. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;When it comes down to it, what is most interesting is the array of choices you have when implementing the technique. The technique can work well with statically computed lighting such as lightmaps if you choose to go that way. You have different choices in how to approximate specular and accumulate it in either a separate or combined RT. You have a variety of choices of how far you want to implement MSAA. Particularly when you have a cross-platform renderer, having this sort of flexibility without massive rearchitecture on each platform can make things a lot easier. 
I plan to explore this technique further.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/263327529581652713-7850676649237629669?l=solid-angle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://solid-angle.blogspot.com/feeds/7850676649237629669/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://solid-angle.blogspot.com/2008/12/interesting-gdc-talk-1-light-pre-pass.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/7850676649237629669'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/7850676649237629669'/><link rel='alternate' type='text/html' href='http://solid-angle.blogspot.com/2008/12/interesting-gdc-talk-1-light-pre-pass.html' title='Interesting GDC talk #1: Light Pre-pass Renderer'/><author><name>Vince</name><uri>http://www.blogger.com/profile/15315525780089168677</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-263327529581652713.post-6886683232333803469</id><published>2008-12-26T16:06:00.006-05:00</published><updated>2008-12-26T16:46:54.791-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='multicore'/><category scheme='http://www.blogger.com/atom/ns#' term='ps3'/><category scheme='http://www.blogger.com/atom/ns#' term='game programming'/><category scheme='http://www.blogger.com/atom/ns#' term='memory wall'/><title type='text'>Magical missteps and the memory wall</title><content type='html'>Two links today:&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a 
href="http://www.magicalwasteland.com/2008/12/serial_missteps_on_the_paralle.htm"&gt;Magical Wasteland: Serial Missteps on the Parallel Road&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://www.spectrum.ieee.org/nov08/6912"&gt;IEEE Spectrum: Multicore is Bad News for Supercomputers&lt;/a&gt; (via &lt;a href="http://arstechnica.com/news.ars/post/20081207-analysis-more-than-16-cores-may-well-be-pointless.html"&gt;ArsTechnica&lt;/a&gt;)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I've spent a lot of time thinking about where multicore hardware is going and how that is going to affect software architectures. My current thinking is that designing software for UMA (uniform memory access) is not going to scale. This is hardly a revolutionary statement.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Let's start with the IEEE Spectrum article first. The article details a study by Sandia National Labs showing that performance of general-purpose multiple cores really starts to deteriorate for certain applications after 8-16 cores due to the memory wall. The "memory wall" is the fact that while our ability to cram transistors on a die keeps ever-increasing, memory bandwidth grows at a piddling rate. Eventually, you've got a large number of cores starving for data.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Now, an important piece of information is these results only apply to certain types of workloads. For instance, simulating a nuclear bomb does not hit the memory wall because the data and calculations can be partitioned to a small amount of memory corresponding to a small spatial area. 
Thus you don't use a lot of memory bandwidth, because all of your data is in cache and you are happy.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;What the study says is that for certain types of applications that involve traversing massive amounts of data and cannot be sufficiently partitioned, your performance goes out the window. It then goes on to talk about fancy hardware (stacked memory) that may fix the problem. This hardware is sufficiently far in the future that I'm not going to bank on it.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The challenge for game programmers with future multicore designs will be to architect our systems and engines to use well-partitioned and locally coherent memory access. This will become increasingly important as the number of cores increases. Right now, with systems such as the 360 having only 3 cores (6 hardware threads), or consumer PCs only having 4 cores, memory bottlenecks, while troublesome, are not at the severity they will be with 16, 32, or 64 cores. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Which brings me back to why designing for UMA is troublesome. UMA makes the promise that you can access any memory you need at any time -- which is true, but it doesn't make any promises about &lt;span class="Apple-style-span" style="font-style: italic;"&gt;how fast&lt;/span&gt; that access will be. Cache logic is not going to be sufficient to hide all latencies of poorly designed algorithms and systems. Obviously, this has been true for many years even with single core designs, but the point is the problem is about to get many times worse. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;If you design algorithms such that they can be run on processors with a small local memory, those will run well on UMA designs. The reverse is not true. 
Local memory designs force you to make good decisions about memory accesses, and also prepare you for architectures which are not UMA, which for all we know right now the next crop of consoles may very well be.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;So what does all of this have to do with the Magical Wasteland article on the PS3? Due to NDAs, I won't comment on the article beyond saying I think its characterization of the PS3 is fair. But I will say this: it may very well be the case that five or ten years from now we realize that the Cell architecture was ahead of its time. Maybe not in the specific implementation, but just keeping the idea alive that, hey, maybe touching every cache line in the system in a garbage collection algorithm isn't the best of ideas. Because sooner or later, we'll all be programming with a local memory model, even if we don't actually have local memory.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/263327529581652713-6886683232333803469?l=solid-angle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://solid-angle.blogspot.com/feeds/6886683232333803469/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://solid-angle.blogspot.com/2008/12/magical-missteps-and-memory-wall.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/6886683232333803469'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/6886683232333803469'/><link rel='alternate' type='text/html' href='http://solid-angle.blogspot.com/2008/12/magical-missteps-and-memory-wall.html' title='Magical missteps and the memory 
wall'/><author><name>Vince</name><uri>http://www.blogger.com/profile/15315525780089168677</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-263327529581652713.post-3483758361416099097</id><published>2008-12-25T16:59:00.004-05:00</published><updated>2008-12-25T17:29:09.976-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='middleware'/><title type='text'>Game Architect on Good Middleware</title><content type='html'>&lt;div&gt;&lt;a href="http://gamearchitect.net/2008/09/19/good-middleware/"&gt;Good Middleware&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;An excellent article on what makes good middleware was posted on Game Architect. Kyle, the author, hits the major points -- anyone who's used &lt;span class="Apple-style-span" style="font-style: italic;"&gt;bad&lt;/span&gt; middleware has certainly run into these issues. &lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I would make a distinction between middleware and engines. Something like Unreal is an engine. It isn't something you integrate into your application, it forms the framework of your application. Because it is a framework, it has made some architectural decisions for you: memory, I/O, the data pipeline, etc. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Due to an engine enforcing architecture, the questions used to evaluate it are different (and probably the subject of another blog entry). The advantage of a complete engine is tight integration and a lot less work to get up and running, particularly on the tool side of things. The disadvantage is the engine may have made architectural decisions that don't fit your game. 
Like most engineering decisions, it is a tradeoff everyone has to decide for themselves.&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/263327529581652713-3483758361416099097?l=solid-angle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://solid-angle.blogspot.com/feeds/3483758361416099097/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://solid-angle.blogspot.com/2008/12/game-architect-on-good-middleware.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/3483758361416099097'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/3483758361416099097'/><link rel='alternate' type='text/html' href='http://solid-angle.blogspot.com/2008/12/game-architect-on-good-middleware.html' title='Game Architect on Good Middleware'/><author><name>Vince</name><uri>http://www.blogger.com/profile/15315525780089168677</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-263327529581652713.post-2541860898716396190</id><published>2008-12-25T16:42:00.000-05:00</published><updated>2008-12-25T16:50:29.061-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='game programming'/><category scheme='http://www.blogger.com/atom/ns#' term='welcome'/><title type='text'>Welcome</title><content type='html'>Welcome to Solid Angle, a blog about game programming. &lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I've been programming in the game industry for about nine years, and I still love it. 
&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;It's been a while since I ran a blog. My last one was less focused, and had topics ranging from politics to programming to how to shuffle poker chips (one of my more popular entries). This one will be strictly about game development.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Initially, my entries probably will just be links and comments to other game development blogs. I've found quite a lot of very interesting blogs on game development and have always wanted a place that aggregated them. After looking around for one, I decided to make one (I'm sure there are others, I just haven't found them). I may branch out into more original content later as I have things to write about.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Anyway I hope you stick around.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/263327529581652713-2541860898716396190?l=solid-angle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://solid-angle.blogspot.com/feeds/2541860898716396190/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://solid-angle.blogspot.com/2008/12/welcome.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/2541860898716396190'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/263327529581652713/posts/default/2541860898716396190'/><link rel='alternate' type='text/html' href='http://solid-angle.blogspot.com/2008/12/welcome.html' title='Welcome'/><author><name>Vince</name><uri>http://www.blogger.com/profile/15315525780089168677</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry></feed>
