The Next Generation of Graphics APIs
How different things are in just a year’s time. PC gaming is undergoing a revolution that will completely change the landscape, especially between consoles and PCs. A year ago, we had just two graphics APIs available to us (Direct3D 11 and OpenGL). Now, not only is Mantle a reality, but DirectX 12 is right around the corner. Both share core concepts that radically rethink the purpose and structure of a graphics API, opening the door to whole new worlds of performance as modern hardware design has far surpassed the aging, bloated APIs of the past. Both, in our opinion, are necessary for the evolution of PC gaming.
What exactly is wrong with our current APIs? There are two main problems. The first is that today’s APIs are fragile. They might be great for a small demo, but trying to ship a complex modern game on one is difficult. End users mostly expect to have to download the latest drivers for their game to run, a clear sign that APIs are not achieving their goal of abstraction. The second problem is that APIs and drivers are generally inefficient and very poor at threading. In D3D11’s case, threading was attempted but failed to achieve decent scalability, whereas OpenGL hasn’t even proposed true threading capabilities. The trend toward consumer CPUs with eight cores and beyond is clear. If we are to utilize our consumers’ hardware, then true threading is a must.
We get a great number of questions asking why we need a new API, and why the best option isn’t to push for an open industry standard that could fix these problems once and for all. We flip this question on its head: Why do we need an API at all? Really, if GPUs are so programmable, if they are truly just as general as the CPU, then why do we need an API? We don’t use an API to program a CPU, after all. There is never a need to download a new CPU driver so that the latest spreadsheet application can work, nor would anyone ever tolerate it. If our software doesn’t work, we don’t blame Intel or AMD. We fix it ourselves and move on.
Sadly, business realities dictate that we cannot get rid of graphics APIs entirely. Despite the screaming of project managers, this is largely a business model problem rather than a technical one. The hardware innovation cycles on GPUs are much shorter than those of CPUs (though radical GPU innovation has been slowing down over the last few years), and thus there are quite often fundamental rethinks about architecture. Having a software driver also allows a certain number of bugs and idiosyncrasies to be hidden. GPUs are far more finicky chips than CPUs.
The ugly reality is that graphics APIs and drivers are intrinsically problematic for developers. Software is difficult to write, and drivers are huge and complicated. Part of this is due to hardware differences, but as GPUs have become more programmable, we now realize that the majority of driver complexity is due to API features that have little to do with hardware. Instead, they have more to do with managing resources, allocating memory, and reconstructing information about the frame that the API interfaces prevent us from providing.
This is where Mantle and DX12 come in. These are the first of a new crop of minimalistic APIs: APIs that do the least amount possible and still abstract hardware. We shouldn’t forget that a big part of the elegance of these new APIs is that hardware is far more general than it used to be. There are far fewer corner and edge cases that we need to deal with.
DX12 and Mantle have many advantages. One is that they should prove far more reliable than the older driver models. This is largely a function of being simpler, especially with regard to asynchronous management (e.g., the GPU is a processor running in parallel with the CPU, and the handling of multiple threads is now left to the application). Many people believe that once you write to an API such as D3D11 your job is done. Hardly. When you routinely have things in your code like a lock called D3DIsNotActuallyThreadSafe, or a handler DelayDeletionOpenGL, to work around crashes, you start to question the value-add of the API. These are literally the things you have to do to ship on both OpenGL and D3D11.
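To make the delayed-deletion workaround concrete, here is a minimal C++ sketch of the general idea. It is an illustration only, not code from any shipping engine or driver. It assumes the application can learn which frame the GPU has most recently finished (in practice that would come from some fence or query mechanism), and the names DeferredDeleter, retire, and collect are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical helper: holds destroy callbacks until the GPU has finished
// the last frame that referenced the resource.
class DeferredDeleter {
public:
    // Queue `destroy` to run once the GPU has completed `lastUsedFrame`.
    void retire(std::uint64_t lastUsedFrame, std::function<void()> destroy) {
        // Resources are retired in frame order, so the queue stays sorted.
        assert(pending_.empty() || pending_.back().frame <= lastUsedFrame);
        pending_.push_back({lastUsedFrame, std::move(destroy)});
    }

    // Call once per frame with the newest frame the GPU has completed.
    // Destroys everything that is now safe and returns how many were freed.
    std::size_t collect(std::uint64_t completedFrame) {
        std::size_t destroyed = 0;
        while (!pending_.empty() && pending_.front().frame <= completedFrame) {
            pending_.front().destroy();
            pending_.pop_front();
            ++destroyed;
        }
        return destroyed;
    }

    std::size_t pendingCount() const { return pending_.size(); }

private:
    struct Entry {
        std::uint64_t frame;            // last frame that used the resource
        std::function<void()> destroy;  // how to release it
    };
    std::deque<Entry> pending_;
};
```

The point of the sketch is that the bookkeeping is small and entirely under the application's control: the engine already knows which frame last touched a resource, so it can decide when deletion is safe without the driver having to guess.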
Being simpler is a big advantage for other reasons, including one that is not particularly obvious to many: It allows reliable drivers to be written with far fewer resources. Something often not considered when designing an API is the cost of implementing the specification. OpenGL, for example, is a gargantuan API. It is so complex that it requires hundreds of engineers to write and support. And even when the resources are spent, vendors still find the performance insufficient and so they add extensions which make everyone’s code yet more complex. Oxide would rather engineering resources be spent on hardware innovation, not on features we don’t need or want.
A simpler API also makes things more predictable. For example, not only are Mantle’s performance characteristics excellent, but the API also makes it easy for us to calculate the expected amount of work. We can directly correlate an action taken by our engine to the cost in the driver. This might sound simple, but with D3D11 and OpenGL it is quite difficult to ascertain where driver performance is being eaten. This must be determined by the hardware vendors and usually involves a good deal of guesswork. A simpler API also makes the environment our games run in more predictable, which has real benefits in QA and bug tracking.
One argument is that all APIs start simple, but get complex over time. We call this creep. While this may happen, it is our belief that the bulk of problems are not due to creep. Oxide’s study into driver overhead and reliability concerns concluded that the vast majority of serious driver bugs and overhead were caused by memory management issues and synchronization issues. Neither had very much to do with hardware abstraction, and our engine was already perfectly capable of handling both of these services (and in many cases already did). This was the same conclusion that many other developers came to as well.
For this reason, many of the most experienced developers, Oxide included, had for years advocated a lighter, simpler API that did the absolute minimum that it could get away with. We believed we needed a teardown of the entire API rather than some modifications of current APIs. Admittedly, this was after advocating no API at all caused the hardware architects’ faces to pale a bit too much. But if we were to build something evil, at least we could make it the least evil possible.
It was this group of advocates who, with AMD, pioneered the development of Mantle. Mantle was not an API birthed by a hardware vendor, but rather a child born of developers and AMD to create a completely different class of API. AMD selected a small but expert group of developers to help advance it. The intention was not to develop the end-all solution for every developer, but rather to build something that didn’t block our studios from maximizing the very capable GPUs that AMD was building.
This group spent quite a bit of time with AMD going over and helping shape the API. Many of the features and structure of Mantle came from developers, not from AMD itself. For example, we could show that nearly every batch required at least some small data payload, so we built in a specialized fast path just for it.
Much of it was a learning experience for AMD, showing just how many things we as developers really don’t need an API to do. For example, loading textures into GPU memory has to be an asynchronous process, with a few GPU commands called to copy it into place and prepare it before we use it. As it turns out, this task was once owned by the driver, but our engine was perfectly capable of doing it both faster and more reliably than any driver could.
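As a rough illustration of how an engine can own the texture upload step itself, the C++ sketch below shows one possible handoff between loader threads and the render thread. It is a sketch under stated assumptions, not Mantle's API and not code from Nitrous: StagedTexture, UploadQueue, push, and drain are hypothetical names, the per-frame byte budget is an assumed policy, and the actual GPU copy commands are left to whatever graphics API is in use.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical record of a texture whose pixels are already in CPU memory.
struct StagedTexture {
    std::uint32_t id;
    std::vector<std::uint8_t> pixels;
};

// Loader threads push staged textures; the render thread drains a bounded
// amount each frame and records the GPU copy commands for them itself.
class UploadQueue {
public:
    // Safe to call from any thread.
    void push(StagedTexture tex) {
        assert(!tex.pixels.empty());
        std::lock_guard<std::mutex> lock(mutex_);
        staged_.push_back(std::move(tex));
    }

    // Removes textures in arrival order until `byteBudget` would be exceeded.
    // The first texture is always taken so an oversized one cannot starve.
    std::vector<StagedTexture> drain(std::size_t byteBudget) {
        std::lock_guard<std::mutex> lock(mutex_);
        std::size_t used = 0;
        std::size_t taken = 0;
        while (taken < staged_.size()) {
            const std::size_t size = staged_[taken].pixels.size();
            if (taken > 0 && used + size > byteBudget) break;
            used += size;
            ++taken;
        }
        const auto end = staged_.begin() + static_cast<std::ptrdiff_t>(taken);
        std::vector<StagedTexture> out(std::make_move_iterator(staged_.begin()),
                                       std::make_move_iterator(end));
        staged_.erase(staged_.begin(), end);
        return out;
    }

private:
    std::mutex mutex_;
    std::vector<StagedTexture> staged_;
};
```

Each frame, the render thread would call drain with its budget, record a copy for every returned texture, and mark the texture usable once the GPU has finished that work. Because the engine chooses the budget, it also chooses how much upload cost any single frame pays.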
Oxide still remembers the day we did our first tests. We watched as driver overhead, once the dominant chunk of our frame execution time, practically disappeared. We watched as the thick driver threads that often polluted our cache and stole our CPUs disappeared. We watched as the little driver overhead we had linearly scaled across our cores. We saw this, in spite of the fact that Mantle was a very new API, competing against established and optimized APIs. There are still many optimizations that both we and AMD have yet to make!
We heard nothing of the development of a new version of D3D until sometime after the early tests of Mantle were indicating that this radically different API model could work, and work well (data that we shared with Microsoft). What prompted Microsoft to redesign DX is difficult to pin down, but we’d like to think that it would never have happened if Mantle hadn’t changed the game. We are delighted by DX12’s architectural similarities to Mantle, and see it as a validation of the pioneering work that Oxide was privileged to be part of.
Does D3D12 eliminate the need for Mantle? Not at all. Though it is difficult to determine without more complete information on D3D12, our expectation is that Mantle will have the edge over D3D12 in terms of CPU performance. This is because the GCN architecture by AMD is the most general of all the current GPUs, and therefore the driver needs to do a bit less.
Additionally, Oxide has a strong interest in supporting platforms beyond Windows. Our hope is that Mantle will be a call to arms to bring an industrial-strength API to such platforms as SteamOS, Linux, Android and MacOS. The biggest problem for us in moving to other platforms is the relative weakness of the graphics software on those platforms. Added to this, we have as yet no word on whether we can have D3D12 on Windows 7. From a business standpoint, it makes little sense to rely exclusively on Microsoft doing the right thing.
Finally, there is a myth that graphics APIs must be particularly expensive or difficult to write versions for. Mantle is, for us, quite cheap to support. The shader translation is automated; the graphics interface layer in Nitrous is already written. These represented only a few man-months of work, and have required shockingly little maintenance. The expensive part of porting to another graphics platform isn’t the initial code that must be written, but rather logistical issues like maintaining your shaders and working around the huge matrix of bugs and performance problems. Small, simple APIs are actually quite easy to support.
Oxide’s door is still open to other forward-looking APIs. Should Nvidia or Intel create an API that is as functional and straightforward to use as Mantle, we’d be more than happy to support it. Oxide doesn’t play favorites here. We recognize the passion of PC gamers. We recognize the amount of time and money players invest in their PC, and are committed to doing whatever we can to get the most out of whatever hardware our fans are running.