Gnomes per second in Vulkan and OpenGL ES

It’s been a while since we first showed off our Vulkan* driver for PowerVR Rogue GPUs. Since then, our PowerVR driver and graphics demo teams have been working hard to synchronize with the spec as it evolves towards its final form.

Today we are excited to show you a new demo we have been working on that better highlights the specific benefits we believe this API should bring to developers and devices.

Vulkan and OpenGL ES in Gnome Horde

This new demo is called Gnome Horde and runs under Android on the Intel-based Nexus Player, a consumer device integrating a PowerVR G6430 GPU; it uses the latest prototype Vulkan API driver for PowerVR GPUs (final performance may differ).

On the left-hand side of the video, we are showing Vulkan and on the right we have OpenGL® ES 3.0. We have attempted to ensure both versions run equivalent code and both run without extensions. The demos are not using instancing either: each draw call could be a different piece of geometry with a different material or texture, and the CPU performance would be very similar.

Before reading any further, please note that this is an exaggerated scenario that is intended to highlight and amplify Vulkan’s strengths. It is not intended to show OpenGL ES in a bad light – we are deliberately using OpenGL ES in a way that it was not designed for. We are also aiming to be GPU bound using the Vulkan API; this means the GPU and CPU are being used as effectively as possible, which is a great thing for developers and vendors alike.

The implementation details

Using Vulkan, we batch draw calls into tiles and render a tile at a time. Each time a tile goes out of view, comes into view or changes its level of detail, we regenerate a command buffer (more on this later). Because the command buffers of unchanged tiles are left untouched, overall CPU usage is significantly lower than with OpenGL ES.

This is explained in more detail below.

Tiled rendering

In OpenGL ES, all draw calls are submitted dynamically according to the tiles in view, with no opportunity to cache draw calls that have already been executed.
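The per-tile caching described above can be sketched in plain C++. This is only an illustration under stated assumptions, not the demo's actual code: `TileState`, `RecordedTile` and `TileCommandCache` are hypothetical names, no real Vulkan calls are made, and "recording" a command buffer is reduced to storing the level of detail and draw count it was built for.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a recorded command buffer. A real one would hold
// GPU commands; here we only remember what it was recorded for.
struct RecordedTile {
    int lod;        // level of detail the recording was made for
    int drawCount;  // number of draw calls captured in the recording
};

// Hypothetical per-frame description of a tile.
struct TileState {
    int id;
    bool visible;
    int lod;
};

class TileCommandCache {
public:
    explicit TileCommandCache(int drawsPerTile) : drawsPerTile_(drawsPerTile) {
        assert(drawsPerTile > 0);
    }

    // Brings the cache in line with this frame's tiles and returns how many
    // tiles had to be re-recorded. Unchanged visible tiles cost nothing.
    int update(const std::vector<TileState>& tiles) {
        int rerecorded = 0;
        for (const TileState& tile : tiles) {
            auto found = cache_.find(tile.id);
            if (!tile.visible) {
                // Out of view: drop the recording so it does not use memory.
                if (found != cache_.end()) cache_.erase(found);
                continue;
            }
            // Newly visible, or level of detail changed: record again.
            if (found == cache_.end() || found->second.lod != tile.lod) {
                cache_[tile.id] = RecordedTile{tile.lod, drawsPerTile_};
                ++rerecorded;
            }
        }
        return rerecorded;
    }

    std::size_t cachedTiles() const { return cache_.size(); }

private:
    int drawsPerTile_;
    std::unordered_map<int, RecordedTile> cache_;
};
```

In this model, calling `update` twice with the same tile list re-records nothing the second time, which corresponds to the static-camera case. An OpenGL ES style renderer would instead pay for every visible tile's draw calls on every frame.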

Lower CPU usage

As you can see from the CPU usage graph in the bottom left of the video, CPU usage is very low for this many draw calls in the first mode. In the highest zoom level we are drawing around 400,000 gnomes (and other objects) per second. Each object has a different transformation, and there are many different materials, textures, blend modes and shaders being used.

The OpenGL ES API struggles with these tasks because it requires many calls into kernel mode to change the state of the driver, along with validating that state and any extra work that goes on behind the scenes, all during an application’s render loop.

This is in contrast to Vulkan where we can pre-generate these commands. Executing pre-generated commands in Vulkan is very fast, with little CPU overhead and no need for the driver to validate or compile anything inside the render loop. These pre-generated commands are called command buffers.

Vulkan CPU usage (left) and OpenGL ES CPU usage (right) for Gnome Horde

The lower line is the process CPU usage and the top line represents system CPU usage. Both are reduced in Vulkan due to the ability to process command buffers before submission.

Command buffer re-use

Being able to re-use command buffers proves useful in some circumstances. This feature will not be a panacea, but it will be possible to use it to a great extent in many games and applications. In this specific instance we decided that being able to re-use command buffers for each tile would reduce the overall CPU usage.

Before drawing in either API, the driver needs to compile a set of commands for the GPU to execute, validate those commands, and do other work – all before actually starting the GPU. With OpenGL ES, this needs to be performed for each draw call, during the render loop. In Vulkan we can compile and validate this list of commands ahead of time, and then have the GPU execute these pre-generated commands.

Vulkan in action: Gnome Horde demo screenshot

In this screenshot there are 300 tiles with a total of 13,500 draw calls being run at roughly 30fps with very little CPU usage. That works out at roughly 405,000 draw calls per second, approaching half a million, without instancing.
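For readers who want to check the numbers, the arithmetic behind the screenshot figures can be written out as a few compile-time constants. The constant and function names are ours, and "roughly 30fps" is treated as exactly 30 for the purpose of the calculation.

```cpp
#include <cassert>

// Figures quoted for the screenshot.
constexpr int kTiles = 300;
constexpr int kDrawCallsPerFrame = 13500;
constexpr int kFramesPerSecond = 30;  // "roughly 30fps", assumed exact here

// Draw calls recorded into each tile's command buffer.
constexpr int drawCallsPerTile() { return kDrawCallsPerFrame / kTiles; }

// Draw calls executed per second across all tiles.
constexpr int drawCallsPerSecond() { return kDrawCallsPerFrame * kFramesPerSecond; }

// Checked by the compiler, so the figures cannot drift apart unnoticed.
static_assert(drawCallsPerTile() == 45, "13,500 draw calls over 300 tiles");
static_assert(drawCallsPerSecond() == 405000, "13,500 draw calls at 30fps");
```

The 45 draw calls per tile matches the per-command-buffer figure given for the parallel mode, and 405,000 per second is in line with the "around 400,000 gnomes (and other objects) per second" mentioned earlier.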

Parallel command buffer generation

In the next demo modes, the CPU graph shows usage going from very little to nearly the whole of every CPU core. What’s happening here is that the camera is moving much faster, so command buffers need to be regenerated more frequently (a slightly unrealistic use case). In OpenGL ES we are CPU bound and cannot feed the GPU with enough commands. With Vulkan, however, we have the opportunity to distribute the regeneration of the tiles’ command buffers across different threads. This is not possible with OpenGL ES, which was designed before multi-threading was widely available. In a real application, the workload will be somewhere between the two extremes of dynamic draw calls and static draw calls.

In this case we are spending CPU time to save memory. We could store all of the command buffers for the entire scene in memory. However, on mobile devices memory is often limited, so we only store the command buffers for tiles inside the view frustum. With Vulkan we are purposefully bound by GPU performance, which shows that we are using the CPU effectively and feeding the GPU with enough commands.

Vulkan CPU vs OpenGL ES CPU: Note how OpenGL ES cannot do multi-threading

For equivalent performance, the Vulkan demo could have the CPU run at a much lower clock frequency, increasing efficiency and battery life compared to OpenGL ES.

In this mode there are roughly 80 command buffers being re-created each frame, distributed between the cores of the CPU. Each command buffer has 45 draw calls and other state setting information. With all this work going on, it is good to see the frame rate stays the same as in the previous mode.
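A minimal sketch of how this kind of work can be spread across cores, using only the C++ standard library. It is an illustration under assumptions rather than the demo's implementation: `RecordedBuffer`, `recordTile` and `regenerateParallel` are hypothetical names and recording is mocked. Each worker writes only to its own output list, which mirrors the rule in the released Vulkan specification that a command pool must not be used from several threads at once without external synchronisation, so real code would typically give each thread its own pool.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a freshly recorded command buffer.
struct RecordedBuffer {
    int tileId;
    int drawCount;
};

// Mock of the expensive step: building one tile's command buffer.
RecordedBuffer recordTile(int tileId, int drawsPerTile) {
    return RecordedBuffer{tileId, drawsPerTile};
}

// Re-records every dirty tile, sharing the work between workerCount threads.
std::vector<RecordedBuffer> regenerateParallel(const std::vector<int>& dirtyTiles,
                                               int drawsPerTile,
                                               unsigned workerCount) {
    assert(drawsPerTile > 0);
    if (workerCount == 0) workerCount = 1;

    // One private output list per worker: no locking needed while recording.
    std::vector<std::vector<RecordedBuffer>> perWorker(workerCount);
    std::atomic<std::size_t> next{0};  // index of the next unclaimed tile

    std::vector<std::thread> workers;
    workers.reserve(workerCount);
    for (unsigned w = 0; w < workerCount; ++w) {
        workers.emplace_back([&, w] {
            for (;;) {
                const std::size_t i = next.fetch_add(1);
                if (i >= dirtyTiles.size()) break;
                perWorker[w].push_back(recordTile(dirtyTiles[i], drawsPerTile));
            }
        });
    }
    for (std::thread& worker : workers) worker.join();

    // Gather the results on the calling thread, ready for submission.
    std::vector<RecordedBuffer> all;
    for (const auto& list : perWorker) all.insert(all.end(), list.begin(), list.end());
    return all;
}
```

With the figures quoted above (80 tiles, 45 draw calls each), a call such as `regenerateParallel(tiles, 45, 4)` returns 80 buffers covering 3,600 draw calls per frame. The order of the results depends on thread scheduling, so code that needs a fixed submission order should sort by tile ID afterwards. Depending on the toolchain, the program may need to be built with thread support enabled (for example `-pthread` with GCC or Clang).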

Memory allocation strategies

One advantage of Vulkan over OpenGL ES is that the developer has more visibility of the memory that needs to be allocated. With OpenGL ES, the driver handles most of the allocation and hides it from the developer. With Vulkan, the driver allocates very little memory itself and the developer can choose between different memory allocation strategies. For example, if an image is not in use by the GPU, the developer could decide to use that memory for other purposes, such as uploading a texture.
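One possible strategy, shown here purely as an example, is to reserve a single large block up front and hand out aligned sub-ranges from it, recycling the whole block once the GPU no longer needs its contents. The sketch below models only the offset bookkeeping; `LinearSubAllocator` and `kOutOfSpace` are hypothetical names, and no real device memory is allocated.

```cpp
#include <cassert>
#include <cstdint>

// Sentinel returned when a request does not fit in the block.
constexpr std::uint64_t kOutOfSpace = ~static_cast<std::uint64_t>(0);

// Hands out aligned offsets inside one large, pre-allocated block.
class LinearSubAllocator {
public:
    explicit LinearSubAllocator(std::uint64_t capacity) : capacity_(capacity) {}

    // Returns the offset of the new sub-allocation, or kOutOfSpace.
    std::uint64_t allocate(std::uint64_t size, std::uint64_t alignment) {
        assert(alignment > 0);
        // Round the current head up to the next multiple of the alignment.
        const std::uint64_t start = (head_ + alignment - 1) / alignment * alignment;
        if (start > capacity_ || size > capacity_ - start) return kOutOfSpace;
        head_ = start + size;
        return start;
    }

    // Makes the whole block available again, for example once the GPU has
    // finished with everything placed in it.
    void reset() { head_ = 0; }

    std::uint64_t used() const { return head_; }

private:
    std::uint64_t capacity_;
    std::uint64_t head_ = 0;
};
```

For a 1,024-byte block, two 100-byte requests with 256-byte alignment land at offsets 0 and 256. A following 700-byte request fails because only 512 bytes remain after alignment. After `reset()`, the full block can be reused, for instance for a texture upload. A production allocator would also track individual frees and memory types, which this sketch deliberately leaves out.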

Render pass – pixel local storage

In Vulkan there is a structure called a render pass; each render pass has one or more subpasses. These subpasses can be exploited to utilise pixel local storage to store intermediate values for shaders between subpasses.

Being a tile-based deferred renderer, PowerVR can execute multiple shaders for the same pixel in an FBO effectively using fast on-chip memory. This is useful in rendering techniques such as deferred rendering. The benefit of doing this is that it avoids wastefully writing intermediate values back to main memory, saving bandwidth and therefore power. However, this functionality is an extension in OpenGL ES, requiring more code to check if the extension exists.

In Vulkan this functionality is a core feature that will benefit battery life and the efficiency of applications and devices. Vulkan also allows the driver to handle out-of-memory issues gracefully with respect to deferred renderers and the transient memory they use.

Finally

All of the features above require implementation in code, so the use of Vulkan does come with added code complexity compared to OpenGL ES. However, Imagination is committed to continuing full support for OpenGL ES for a long time to come alongside developing a new Vulkan API driver for PowerVR Rogue GPUs.

Devices with the new Vulkan API should bring new optimisation opportunities and increased efficiency to application developers.

If you are heading to SIGGRAPH 2015 this week, drop by the Khronos Group BoF meeting on Wednesday to see this demo in action and get an explanation of what is going on.

Stay tuned to our blog as we will bring you more details after the BoF.

Remember to also follow us on Twitter (@ImaginationPR, @PowerVRInsider) for the latest news and announcements from the PowerVR Insider team.

Editor’s Note

* The prototype Vulkan driver for PowerVR Rogue GPUs is based on an internal draft Khronos Specification, which may change prior to final release. Conformance criteria for this Specification have not yet been established.

PowerVR Rogue GPUs are based on published Khronos specifications, and are expected to pass the Khronos Conformance Testing Process. Multiple PowerVR Rogue GPU cores have already achieved OpenGL ES conformance. Current conformance status can be found at www.khronos.org/conformance.

OpenGL is a registered trademark and the OpenGL ES logo is a trademark of Silicon Graphics Inc. used by permission by Khronos.
