Saturday, August 15, 2015

Tips for Navigating Large Game Code Bases

(Author's Note: This article is about navigating large game code bases, which can run to 2 million lines of code or more and are usually mostly C++. I'm sure these tips apply to other industries with comparably sized projects, but you write what you know.)

Freshly hired, you sit down at your desk at your hard-won job at Big Game Studio. You're making games! Big ones! Excited to get started, you've gone through the orientation, pulled the source tree to your machine, and fired up the IDE. You wait for it to load... and wait... and wait... just how large is this project? Your producer has already given you a simple task for their editor -- make the FooBaz window remember its position and size between runs of the editor. You have no idea where to start. A sense of panic comes over you as you realize little in your previous experience has prepared you for dealing with a code base this big.

It is not uncommon these days to land a game programming gig and have to deal with a large legacy code base. It could be all in-house code, it could be a licensed engine, or it could incorporate a lot of open source software. Unless you are working on a small game, more than likely the first thing you are going to have to learn is how to navigate this beast.



Don't Panic

The first thing to realize is all of this code was written by people like you. They are smart, they are experienced, but hey you are pretty smart too -- if you weren't, you wouldn't have the gig. It may have taken years or even over a decade to build this code base up. At the end of the day every single person started out as a beginner in this code base, even the person who wrote the first lines of code for it. There's no magic, it's just programming like you've done in previous smaller projects but on an industrial scale.

Building

The first thing to do is figure out how to build the thing. Every large code base I've dealt with has a certain amount of lore for setting up the build environment. If you're really lucky this setup is automated, but I've never encountered this because new hires are rare enough it is usually not worth automating.

Hopefully it is documented somewhere. That document is likely out of date. A lot of places have the new programmer update the document as they go through the setup. If this document doesn't exist, volunteer to write it as you go through the process of setting things up -- this will impress your lead.

The build process is something that is often very home-brewed and custom, so it's difficult to have many general tips. But here are a few:

Make sure you have the correct versions of everything installed. Depending on the organization, you may have had everything set up for you by IT, but in some companies programmers prefer to just have IT install the basics and set up their tools and environment themselves. Either way, you may have just happened to start a week after they updated their compiler or an SDK -- everyone got the email but you weren't here to get it. Ask someone (your mentor, the person at the next desk, etc.) if anything in the build environment has changed recently that may not be documented.

If it's not building, it's almost always something you missed setting up your environment. It's unlikely the checked-in build is broken -- if it were, you'd probably see some Senior Engineers stomping around looking like T-Rex on the hunt for fresh meat. Here are just a handful of things that may have been omitted from any setup guide: environment variables, compiler updates, SDK updates, or, on Windows, a redistributable installer that sets up some .NET component.
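A quick sanity check of the environment can turn "it's broken?" into a specific question. The following is a minimal sketch, not anyone's actual build tooling: the environment variable names and tool names are hypothetical placeholders for whatever your studio's setup guide lists.

```python
import os
import shutil

# Hypothetical requirements: replace with the names from your own setup guide.
REQUIRED_ENV_VARS = ["GAME_SDK_ROOT", "GAME_BUILD_CONFIG"]
REQUIRED_TOOLS = ["git", "cmake"]


def find_missing(env_vars, tools, environ=None):
    """Return (missing_env_vars, missing_tools) for this machine."""
    environ = os.environ if environ is None else environ
    # A variable that is unset or set to an empty string counts as missing.
    missing_vars = [name for name in env_vars if not environ.get(name)]
    # shutil.which returns None when the executable is not found on PATH.
    missing_tools = [tool for tool in tools if shutil.which(tool) is None]
    return missing_vars, missing_tools
```

Calling `find_missing(REQUIRED_ENV_VARS, REQUIRED_TOOLS)` and printing the two lists gives you something concrete to bring to your neighbor, such as "GAME_SDK_ROOT isn't set, what should it point to?"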

At all points it is ok to ask someone about build setup problems -- no one expects someone off the street to know all this lore. People do appreciate some effort to figure things out on your own, and in general it is a good idea to do so, because you can ask specific questions ("it seems like I don't have OpenEXR installed, where do I get that?") rather than general ones ("it's broken?"). Specific questions get you an answer faster and take up less time for the person answering. A mistake new programmers often make is transitioning from asking about the build setup (which is expected) to asking about every little thing (which is not).

Finding Things on Your Own

You've got it building, now you need to find this FooBaz window your task is about.

You are going to be tempted to ask where to find every little thing in the code. After all, when the senior engineers aren't stomping around like T-Rex hunting down build breaks, they seem to have an encyclopedic knowledge of the code base. They could answer something in five seconds that might take you five minutes to find.

Resist this temptation. Senior Engineer has things to do, and helping people find which file a function is in is not one of them. What they should be doing is educating you on how to find things yourself (hence this entry, or an article my co-founder Steve Ellmore wrote on learning how to learn). With enough practice, you'll be the one with the seemingly encyclopedic knowledge of the code base.

Because that's the real trick -- Senior Engineers do not have the entire code base in their head. That's impossible beyond a certain amount of code. What they do know is how to look for things.

I'm not suggesting you never ask any questions or seek help, but learning how to investigate things on your own is a valuable skill that will make you a better programmer. If nothing else, such investigation will lead you to ask better and more specific questions which give you quicker answers.

Your Weapons

The project may have some documentation describing the high level structure of the code. Again, expect it to be out of date. Still, out-of-date documentation can often be better than none when wrapping your head around the code base -- large code bases are like big ships, they steer slowly and some of the information is bound to still be relevant.

Top-down searching will only get you so far, though. For example, you may think to search for the main function (WinMain on Windows). You can do this, but realize that this (by definition) is going to lead you to *all* the code. Startup and shutdown code can often be messy, particularly on cross-platform projects. The main game loop may or may not be clean and easy to understand. In modern code bases which spread work across multiple cores, there may not be a main game loop at all.

Find in Files

Your number one weapon for finding things is going to be (in Visual Studio) Find in Files. For the Unix inclined, grep. People love to use fancy tools like Understand, and IDEs have all sorts of built in source browsing functionality, but at the end of the day Find in Files or its equivalent is going to help you the most. Figuring out what to search for is the tough part.

Often I prefer to search for things bottom up, because while the high level details of the code can vary from code base to code base, low level things like platform APIs and system calls do not. For example, at the end of the day, everyone's renderer is using some graphics API. Searching for specific D3D or OpenGL calls will inevitably give you a starting point for understanding the renderer. You may need to dive down into some open source graphics wrapper or licensed engine code, but you can always work your way back up from that starting point to get the bigger picture.

Another approach is searching for common terms in a specific area. To find the animation system, I'd search for things like "animation", "anim", "bone", "skeleton", "skin", "skinning", etc. For physics, I'd try "rigidbody", "rigid", "force", "mass", etc.

You want to avoid search terms that are too generic -- "matrix" or "transform" would likely get you hits just about everywhere. 
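If you are not in Visual Studio and don't have grep handy, the same idea is easy to script. This is a rough sketch of a case-insensitive find in files; the extension list and function names are my own choices for illustration, not part of any particular tool.

```python
import os
import re
from collections import Counter

# Assumed set of C/C++ source extensions; adjust for your code base.
SOURCE_EXTENSIONS = (".h", ".hpp", ".inl", ".c", ".cpp")


def find_in_files(root, terms, extensions=SOURCE_EXTENSIONS):
    """Yield (path, line_number, line) for each source line containing any term."""
    # Escape each term so it is matched literally, then OR them together.
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if not filename.lower().endswith(extensions):
                continue
            path = os.path.join(dirpath, filename)
            # errors="replace" keeps the search going past odd encodings.
            with open(path, encoding="utf-8", errors="replace") as handle:
                for line_number, line in enumerate(handle, start=1):
                    if pattern.search(line):
                        yield path, line_number, line.rstrip()


def hits_per_directory(root, terms):
    """Count matching lines per directory to see where a term clusters."""
    return Counter(os.path.dirname(path) for path, _n, _line in find_in_files(root, terms))
```

The per-directory count is a cheap way to judge a search term: if "skinning" clusters in two or three directories you have probably found the animation system, while a term that lights up every directory is too generic to help.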

Editor and UI code

One trick for searching any kind of editor or even in-game UI code is to search on the terms that show up to the user. Because the editor has non-programmer users, documentation on it can often be in much better shape than what is available to programmers, particularly on licensed engines. Learning how to navigate and use the editor and game itself can help you understand the underlying code.

Going back to our FooBaz example, I'd fire up the editor and find the FooBaz window. I'd then look for some UI string - a menu name or menu item. I'd then go back to the code and find in files on that menu item name. It may be in a string table, but those are usually key-value pairs so you search again on the key. Now you've found the FooBaz window's code.
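The two-step lookup (UI string to string table key, key to code) can be sketched as below. Everything about the string table here is an assumption for illustration: I'm pretending it is a text file of `KEY=Value` lines, and the extensions and function names are made up. Real projects keep string tables in all sorts of formats.

```python
import os


def iter_lines(root, extensions):
    """Yield (path, line_number, line) for each line of files with the given extensions."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if filename.lower().endswith(extensions):
                path = os.path.join(dirpath, filename)
                with open(path, encoding="utf-8", errors="replace") as handle:
                    for line_number, line in enumerate(handle, start=1):
                        yield path, line_number, line.rstrip("\n")


def keys_for_ui_text(root, ui_text, table_extensions=(".txt",)):
    """Return string table keys whose value equals ui_text (assumes KEY=Value lines)."""
    keys = []
    for _path, _n, line in iter_lines(root, table_extensions):
        key, separator, value = line.partition("=")
        if separator and value.strip() == ui_text:
            keys.append(key.strip())
    return keys


def code_references(root, needle, source_extensions=(".h", ".cpp")):
    """Return (path, line_number) for each source line that mentions needle."""
    return [(p, n) for p, n, line in iter_lines(root, source_extensions) if needle in line]


def find_ui_code(root, ui_text):
    """Look for the literal UI text in code, then for any string table key that maps to it."""
    hits = code_references(root, ui_text)
    for key in keys_for_ui_text(root, ui_text):
        hits.extend(code_references(root, key))
    return hits
```

With a hypothetical menu item named "FooBaz Browser", `find_ui_code(source_root, "FooBaz Browser")` would return the source locations that either embed the string directly or load it through its key.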

Player-visible names may not be the names in the code

One thing to be aware of is that what a feature is called in the game is not always what it is called in the code. Early on in feature development it is quite common for an internal-only name to be used when writing the initial code, just because the official name hasn't been thought of yet. For example, in BioShock Infinite what eventually became known as Skylines were originally called something else. Very little of the code referenced the word "skyline".

So if you can't find something after searching for a while, don't get frustrated -- this may just be a piece of lore you do not have yet, and a good question to ask about.

Code Navigation Tools

In day to day C++, a useful code navigation tool is being able to flip between definitions and declarations quickly. I primarily use Visual Studio so I'm familiar with the tools there. IntelliSense is "ok" when it works and when it doesn't spontaneously decide to hang your editor, but I find Visual Assist to be much more reliable and fast with larger code bases.

Tools that generate class inheritance diagrams and other such navigation aids may be useful to you, particularly when just starting out in a code base, but I've never really found them that helpful. Same with doxygen-generated docs -- they are always out of date and you find yourself cross-referencing the source anyway.

Source Control History

When trying to understand a specific part of the code, a great tool is source control history. Any source control system contains a detailed history of the code base. You can find out what changes were made, when, and by whom. You can trace back to when a feature was added and see all the related files that needed to change -- this can be very helpful when making a similar change, to make sure you catch all the cases you need to modify. Git has the blame tool. Perforce in particular has a great feature called "Time-Lapse View" which allows you to interactively go back in time in any given source file.
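For Git, the history questions above map onto a handful of standard commands. The sketch below just builds and runs them from a script; the helper names are mine, and it assumes `git` is on your PATH and that you run it inside a checkout.

```python
import subprocess


def blame_command(path, start_line, end_line):
    """Who last touched lines start_line..end_line of path?"""
    return ["git", "blame", "-L", f"{start_line},{end_line}", "--", path]


def file_history_command(path):
    """Commits that touched path, following renames, with the files changed in each."""
    return ["git", "log", "--follow", "--stat", "--", path]


def pickaxe_command(text):
    """Commits whose changes added or removed an occurrence of text."""
    return ["git", "log", f"-S{text}", "--oneline"]


def run(command, repo_dir):
    """Run a git command in repo_dir and return its output as text."""
    result = subprocess.run(command, cwd=repo_dir, capture_output=True, text=True, check=True)
    return result.stdout
```

The pickaxe search is the one I'd reach for when tracing a feature: searching history for the name of a function or setting shows the commit that introduced it, and that commit shows every file that had to change alongside it.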

Assume It's Been Done Before

One common mistake I see programmers new to a large code base make is reinventing the wheel. Tasked with implementing a feature, they need some utility routine, let's say a line-plane intersection. Excited to get their feature working, they blaze ahead and write a line-plane intersection function and then implement their feature. When they go to check it in, the code reviewer says "why didn't you use the existing line-plane intersection function in MathFoo.h?" and they have to go back and update the code. In the process they've wasted some time reinventing the wheel. Even worse, sometimes the code review does not catch the duplication, and now the code base is worse for it.

With a big enough code base, you need to start from a position of assuming someone has already tackled this problem or a problem similar to it before. It's on you to search the code and try and find it before going off and writing a bunch of duplicated functionality.

Returning to our FooBaz window example, I would open other windows in the editor and see if they remember their position and size between runs. If they do, I'd use the UI string trick to find their code, and read the code to figure out how they save and restore this position. I might step through the code in the debugger if I can't figure out how it's done on inspection. 

Coding Style

Everyone has a house coding style, and the rule of thumb when dealing with existing code is to stick to the style that's already there. Don't be that person injecting your personal style and creating style mismatches. 

Coding style guidelines vary in formality. You may have something as comprehensive as the Google C++ Style Guide, or it may be a short page with some general guidelines. Whatever there is, read it, understand it, and embrace it. You can use your personal style at home, but on the job stick to what's there.

The Map is not the Territory

Putting aside compiler bugs or platform bugs, what the source actually does is the final word. Documents may be out of date. Comments may not match what the code actually does. Someone you ask about the code may have a misunderstanding of what it actually does.

It's an obvious concept, but the answers to "What? How? Where? When?" can always be found in the source code itself. The only thing that can't always be sussed out from the code alone is "Why?" - there you are going to have to track down Senior Engineer and ask them.

Practice Makes Perfect

One advantage new programmers have today is that you can find examples of larger game code bases online. Unreal Engine 4 has its entire source code available to just about anyone, and if you learn to navigate that beast, you can navigate just about anything. Id has a history of open sourcing many of their older games (don't forget to navigate the tools, too). There are also open source game engines you can find and dive into.

With enough practice of navigating large code bases, you can parachute into any project and be productive right away.
