Trying out the new Vulkan graphics API on PowerVR GPUs

Vulkan™ is a next-generation, high-performance graphics and compute API developed by the Khronos Group. Previously known as glNext, Vulkan has been designed to address some of the shortcomings of the original OpenGL® API, which was introduced 22 years ago.

Here is a summary of Vulkan extracted from the official press release:

  • Ground-up redesign of the API: enables high-efficiency access to graphics and compute on modern GPUs
  • Explicit: the application has direct, predictable control over the operation of the GPU.

Introducing one of the first demos using the Vulkan API

Imagination is a promoting member of the Khronos Group and has been working on developing a proof-of-concept driver for Vulkan for our PowerVR Rogue GPUs. Our PowerVR demo team has also spent the last two months porting one of our new OpenGL ES 3.0 demos to the new API and today we are able to show you a snapshot of our work.

The Library demo was originally created using the OpenGL ES 3.0 API and we worked on porting it to the Vulkan API at the same time as the API was being designed. We needed to remove some of the effects compared to the OpenGL ES 3.0 version because of time constraints but the demo still maintains a lot of features implemented in the original app. Here is a summary of what you can see in the video below:

  • High-quality, physically-based shading
  • HDR (High dynamic range) rendering
  • 20 unique 2K PVRTC textures
  • 2 GiB of texture data compressed to 266 MiB using Imagination’s PVRTC texture compression standard
  • 4 x MSAA (Multi-sample anti-aliasing)
  • 16 x Anisotropic texture filtering
  • Physically-correct material parameters
  • Low CPU usage, very efficient GPU usage
  • Correct specular reflections on reflective materials
  • More than 250,000 triangles
  • Post processing effects: saturation, exposure and tone mapping


Please note that this is an alpha driver and performance is not representative of the final product.

Less CPU work

The new Vulkan interface is designed to be as close to the architecture of modern GPUs as possible. This means that both the code size of the Vulkan driver and the amount of work it does in user and kernel space are very small, so it should be more efficient than an OpenGL ES driver.

For example, there are no glUniform*() equivalent entry points in Vulkan; instead, writing to GPU memory is the only way to pass data to shaders.

When you call glUniform*(), the OpenGL ES driver typically needs to allocate a driver-managed buffer and copy data to it, and managing that buffer incurs CPU overhead. In Vulkan, you simply map the memory and write to that location directly.
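
As a rough illustration of the difference, the C++ sketch below models mapped memory as a plain byte buffer. `MappedMemory`, `PostProcessUniforms`, `writeUniforms` and `readUniforms` are hypothetical names invented for this example and are not part of the Vulkan API; a real application would get the pointer from the driver rather than from a `std::vector`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a block of GPU-visible memory. In a real
// application the pointer would come from the driver's map call.
struct MappedMemory {
    std::vector<unsigned char> bytes;
    explicit MappedMemory(std::size_t size) : bytes(size, 0) {}
    void* map() { return bytes.data(); }  // CPU-writable pointer
};

// Shader constants, laid out the way the shader is assumed to read them.
struct PostProcessUniforms {
    float saturation;
    float exposure;
};

// Write the uniforms straight into mapped memory: a single copy, with no
// driver-managed staging buffer in between.
inline void writeUniforms(MappedMemory& memory, const PostProcessUniforms& u) {
    assert(memory.bytes.size() >= sizeof(PostProcessUniforms));
    std::memcpy(memory.map(), &u, sizeof(u));
}

// Read the values back, standing in for what the shader would see.
inline PostProcessUniforms readUniforms(MappedMemory& memory) {
    PostProcessUniforms u{};
    std::memcpy(&u, memory.map(), sizeof(u));
    return u;
}
```

The point of the sketch is that the application owns the write: there is no hidden allocation or copy for the driver to manage.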

Here is a chart showing the difference in CPU usage between Vulkan and OpenGL ES 3.0 for our Library demo.

CPU usage: Vulkan vs OpenGL ES

Leaner, more explicit driver

Designing an API around the hardware means that the number of instructions in the front-end portion of the driver is significantly reduced compared to OpenGL ES. This reduction in complexity enables developers to issue more draw calls, while hardware vendors can achieve better stability and quicker driver bring-up.

Even though the driver used here is an alpha (i.e. pre-release) version, we expect Vulkan drivers to eventually be very stable because there is less code to go wrong.

In Vulkan, high-level management of the GPU (e.g. resource lifetimes) needs to be performed by the application. The driver is almost completely hands-off and does what the application tells it to. Whilst this results in greater complexity in the application, that cost should be offset by no longer needing to work around the driver (e.g. shader pre-warming in OpenGL ES).

If your application is using an engine to do the rendering, the engine will probably already be managing this anyway, and Vulkan can provide an almost free speedup.

The way that Vulkan is designed resembles modern command buffer-based APIs so this work should be easier to do if the application or framework has been ported to these types of programming interfaces already.

More consistent performance

People might say that the main advantage to this API is that less CPU-relevant work needs to be done when submitting a draw command – and this is true.

However, the main benefit I see is that the API will make programming 3D graphics much more predictable. For example, when you call glBlendFunc() in OpenGL ES, different things can happen depending on the underlying graphics architecture running that code.

Some GPUs could delay setting up the blending until the first time the bound shader is used; others might not. This makes achieving consistent performance across different GPU vendors very difficult.

Vulkan makes solving this problem easier because the entry points to the API are designed to allow the driver to do work in consistent places.

When you fill in a struct describing some state using Vulkan, you know that there is no driver work going on; the code is all application code. The API is designed to fit all GPU vendors’ architectures as well as it can, so there are fewer opportunities for unknown performance hiccups.

The glBlendFunc() problem goes away because the blend function is specified in a struct during pipeline setup. The driver work happens early, when the function to create the pipeline is called, instead of at some point during rendering, where it could cause a stutter.
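
A toy model can make the timing argument concrete. In the C++ sketch below, `BlendState`, `Pipeline` and `ToyDriver` are hypothetical types made up for illustration (they are not Vulkan structures). The sketch assumes the expensive state resolution happens exactly once, inside `createPipeline`, which is the behaviour described above.

```cpp
#include <cassert>

enum class BlendFactor { One, Zero, SrcAlpha, OneMinusSrcAlpha };

// Plain application data: filling this in involves no driver work.
struct BlendState {
    bool enabled = false;
    BlendFactor src = BlendFactor::One;
    BlendFactor dst = BlendFactor::Zero;
};

// The blend state is baked into the pipeline when it is created.
struct Pipeline {
    BlendState blend;
};

struct ToyDriver {
    int expensiveSetups = 0;  // counts simulated validation/compilation work
    int draws = 0;

    // All costly work is assumed to happen here, at load time.
    Pipeline createPipeline(const BlendState& blend) {
        ++expensiveSetups;
        return Pipeline{blend};
    }

    // Nothing is left to resolve at draw time, so the cost stays flat.
    void draw(const Pipeline&) { ++draws; }
};
```

However many times `draw` is called, `expensiveSetups` stays at one per pipeline, which is the property that removes mid-frame stutter in this model.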

In fact, a lot of the Vulkan API is aimed at being able to specify everything up-front if possible. For example, you can record a list of render commands and state-setting commands into a command buffer and replay it every frame with just one call. The driver has more opportunities to optimise this use case because it knows it can do more work when creating the command buffer, rather than when executing it.
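
The record-once, replay-many pattern can be sketched in a few lines of C++. `ToyCommandBuffer` and `ToyQueue` are hypothetical types for this example only; they store commands as callables, which is a simplification of what a real driver would build.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// A recorded list of commands. Recording happens once, up front.
struct ToyCommandBuffer {
    std::vector<std::function<void()>> commands;

    void record(std::function<void()> command) {
        commands.push_back(std::move(command));
    }
};

// Replays a whole command buffer with a single call per frame.
struct ToyQueue {
    void submit(const ToyCommandBuffer& buffer) const {
        for (const auto& command : buffer.commands) {
            command();
        }
    }
};
```

A frame loop then reduces to one `submit` call per frame, while the per-command cost is paid only when the buffer is recorded.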

Another consequence of the explicit nature of Vulkan is that there is no resource renaming (or ghosting) behind the application’s back – multi-buffering needs to be performed explicitly. Multi-buffering is the process whereby a graphics driver may have a number of frames being processed at the same time.

The data attached to those frames (e.g. uniform data and attached textures) needs to be kept around until the frame it is attached to has finished; this will need to be performed by the application. On the plus side, the data that you know will not be modified between frames (e.g. brightness or contrast) can be specified as const for possible optimisations.
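
One common way for an application to do this is a small ring of per-frame slots. The C++ sketch below is a minimal version under simplifying assumptions: `FrameSlot` and `FrameRing` are hypothetical names, and the `inFlight` flag stands in for whatever synchronisation primitive reports that the GPU has finished a frame.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

struct FrameSlot {
    float exposure = 0.0f;  // per-frame data the GPU reads
    bool inFlight = false;  // stand-in for a fence that has not signalled yet
};

template <std::size_t N>
class FrameRing {
public:
    // Returns the slot for the next frame, or nullptr if the GPU is still
    // using it. Nothing is renamed behind the application's back.
    FrameSlot* acquire() {
        FrameSlot& slot = slots_[next_];
        if (slot.inFlight) {
            return nullptr;  // the application must wait or do other work
        }
        next_ = (next_ + 1) % N;
        return &slot;
    }

    void submit(FrameSlot& slot) { slot.inFlight = true; }
    void gpuFinished(FrameSlot& slot) { slot.inFlight = false; }

private:
    std::array<FrameSlot, N> slots_{};
    std::size_t next_ = 0;
};
```

With `FrameRing<2>` the application can prepare one frame while the GPU consumes another; a third `acquire` fails until the oldest frame has finished.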

PowerVR GPUs are first-class citizens

A key feature added to Vulkan is the render pass, which redefines how well an application can control our hardware, and reduces the amount of work we have to do implicitly without the application necessarily knowing about it.

A render pass consists of framebuffer state (other than actual render target addresses) and a description of how render targets should be loaded into and stored out of the GPU at the start and end of each render. This structure is the key object that allows tiled architectures like PowerVR to run at extremely high efficiency.

In OpenGL ES, several things can cause implicit flushes of tile buffers to main memory during rendering, a bandwidth-heavy operation that is usually unnecessary. Our OpenGL ES drivers spend a lot of effort trying to figure out what the application is doing in order to avoid these flushes, and to avoid having to flush all render targets to main memory. In Vulkan, the only time such a flush can happen is between render passes, making it obvious to both the application and the driver. More importantly, the render pass tells the GPU exactly what an application wants to do with each render target.
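
To see why declaring intent per render target matters on a tiled GPU, consider a deliberately simplified bandwidth model in C++. The `LoadOp`, `StoreOp` and `AttachmentOps` names are hypothetical, and the model assumes one full read of the target when it is loaded and one full write when it is stored, ignoring compression and multisample resolves.

```cpp
#include <cassert>
#include <cstdint>

enum class LoadOp { Load, Clear, DontCare };
enum class StoreOp { Store, DontCare };

// What the application says it wants at the start and end of the pass.
struct AttachmentOps {
    LoadOp load;
    StoreOp store;
};

// Bytes moved between tile memory and main memory for one render target.
inline std::uint64_t mainMemoryTraffic(std::uint32_t width,
                                       std::uint32_t height,
                                       std::uint32_t bytesPerPixel,
                                       AttachmentOps ops) {
    const std::uint64_t size = std::uint64_t{width} * height * bytesPerPixel;
    std::uint64_t traffic = 0;
    if (ops.load == LoadOp::Load) {
        traffic += size;  // previous contents are read into the tiles
    }
    if (ops.store == StoreOp::Store) {
        traffic += size;  // results are written back out
    }
    return traffic;
}
```

In this model a depth buffer that is cleared at the start and never needed afterwards (`Clear`, `DontCare`) costs no main-memory traffic at all, whereas a target that is loaded and stored costs two full passes over it.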

Render commands can be created in parallel

Command buffers can be created on a different thread to the thread they are submitted on. This means rendering commands could be created on all cores of a CPU.

There is no extra work or locking required to do this – a feature that was not previously possible with OpenGL ES. This may be of use to games which need to recreate their render commands a lot (e.g. Minecraft).
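
The following C++ sketch shows the shape of this idea using standard threads. It is a toy: a `ToyCommandBuffer` is just a list of strings, and `recordChunk` and `recordInParallel` are hypothetical helpers. The property being illustrated is that each thread writes only to its own buffer, so no locks are needed. Depending on the toolchain, building it may require a threading flag such as `-pthread`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

using ToyCommandBuffer = std::vector<std::string>;

// Record the draw commands for one chunk of the scene. Each call touches
// only the command buffer it was given.
inline void recordChunk(int chunkIndex, int drawCount, ToyCommandBuffer& out) {
    for (int i = 0; i < drawCount; ++i) {
        out.push_back("draw chunk " + std::to_string(chunkIndex) +
                      " mesh " + std::to_string(i));
    }
}

// Record one command buffer per chunk, each on its own thread.
inline std::vector<ToyCommandBuffer> recordInParallel(int chunkCount,
                                                      int drawsPerChunk) {
    std::vector<ToyCommandBuffer> buffers(static_cast<std::size_t>(chunkCount));
    std::vector<std::thread> workers;
    for (int c = 0; c < chunkCount; ++c) {
        workers.emplace_back(recordChunk, c, drawsPerChunk,
                             std::ref(buffers[static_cast<std::size_t>(c)]));
    }
    for (auto& worker : workers) {
        worker.join();
    }
    return buffers;  // submitted later, in order, from a single thread
}
```

In the Minecraft example, only the chunk whose block changed would need its buffer re-recorded, and that work could run on a worker thread while the others are reused.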

More intuitive design

Vulkan gives you the advantage of knowing exactly the state that you are setting. Take for example the glActiveTexture() function in OpenGL ES: it is not obvious whether this function will change the state globally for all shaders or maybe change the state just for the current shader program.

In Vulkan, this is explicitly defined: when you bind your resources, you know that you are changing the state of the command buffer you pass in, because that is the first parameter to the function.

A consistent idiom in Vulkan is to have the first parameter to all entry points be the representation of the state that you are going to change with the function call. For example:

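// Signatures as used by the pre-release API at the time of writing.
// Records into cmdBuffer:
vkCmdBindDescriptorSet(cmdBuffer, VK_PIPELINE_BIND_POINT_GRAPHICS, textureDescriptorSet[0], 0);
// Operates on graphicsQueue:
vkQueueSubmit(graphicsQueue, 1, &cmdBuffer, 0, 0, fence);
// Operates on staticUniformBufferMemory:
vkMapMemory(staticUniformBufferMemory, 0, (void **)&data);
// ...
vkUnmapMemory(staticUniformBufferMemory);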

Explicit memory management

When you call glTexStorage2D() in OpenGL, the driver has to allocate memory for a two-dimensional or one-dimensional array texture. The function and the memory allocation process represent a black box.

In Vulkan, however, the memory allocation is done by the application. This means that the application knows more about what type of memory it is using and, more importantly, how much memory it is using, which should be useful for applications that are memory-constrained. This is in contrast to receiving an “out of memory” error in OpenGL ES and needing to reduce resource usage by an unknown amount.

Explicit memory management in Vulkan allows applications to use custom allocation strategies, for example allocating all memory up-front and avoiding any allocations during rendering.
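
One simple up-front strategy is a linear (bump) allocator that hands out offsets into a single large block reserved at start-up. The C++ sketch below is a minimal version; `LinearAllocator` and `kOutOfMemory` are hypothetical names, and the block itself is represented only by its capacity, since how the memory is obtained depends on the API.

```cpp
#include <cassert>
#include <cstddef>

// Sentinel returned when the pool cannot satisfy a request.
constexpr std::size_t kOutOfMemory = static_cast<std::size_t>(-1);

class LinearAllocator {
public:
    explicit LinearAllocator(std::size_t capacity) : capacity_(capacity) {}

    // Returns the offset of the sub-allocation inside the pool, or
    // kOutOfMemory if it does not fit. alignment must be a power of two.
    std::size_t allocate(std::size_t size, std::size_t alignment) {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        const std::size_t aligned = (used_ + alignment - 1) & ~(alignment - 1);
        if (aligned > capacity_ || size > capacity_ - aligned) {
            return kOutOfMemory;  // the application knows exactly how short it is
        }
        used_ = aligned + size;
        return aligned;
    }

    std::size_t used() const { return used_; }
    std::size_t remaining() const { return capacity_ - used_; }

private:
    std::size_t capacity_;
    std::size_t used_ = 0;
};
```

Because the application tracks `used()` and `remaining()` itself, it can decide precisely which resources to shrink or drop, rather than reacting to an opaque out-of-memory error.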

Extra details

Imagination is working to give you more information on the Vulkan API as it becomes more mature and will release example source code in the near future.

 

Editor’s Note

* PowerVR Rogue GPUs are based on published Khronos specifications, and are expected to pass the Khronos Conformance Testing Process. Previous generation PowerVR GPU cores have already achieved OpenGL conformance. Current conformance status can be found at www.khronos.org/conformance.

OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos.

OpenGL is a registered trademark and the OpenGL ES logo is a trademark of Silicon Graphics Inc. used by permission by Khronos.

Mojang © 2009-2015. Minecraft is a trademark of Mojang AB.

  • Mike Lothian

    Great article – thanks for sharing this

  • AyresRocket

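    Smith san…domo.
    Nihon de..niju nen kan ni sundeimashita.
    Tokyo, Hamamastu ,Nagano mo.
    Nihon wa ee yo ne 🙂
    [Translation: Mr. Smith, thank you. I lived in Japan for twenty years: Tokyo, Hamamatsu, and Nagano too. Japan is great, isn’t it?]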

  • Sean Lumly

    Very nice! I think I’m most impressed that there is finally a single API to unify a host of devices, rather than different versions tied to a legacy API — I’m that Vulkan will support desktop, console, and mobile equally.

    I’m crazy about an intermediate bytecode compared to storing shaders in plaintext. As a method of obfuscation, this should make many developers feel more at ease, and improve performance in the bargain. Storing thousands of shaders should no longer be a problem opening up a very interesting point of optimization trade-off — calculation values can be “streamed” in with code, rather than looked up from memory, potentially suppressing random-reads, using a load/store, and making better use of cache. Of course, this can be done today, (load all shaders at the outset), but I would imagine it would be more convenient with a bytecode.

    I’m also really glad about the way that memory is handled. At times it is frustrating having to work within the confines of the GLES pipeline, shaping technology to this model. Passing pointers is much more liberating.

    Lastly, I’m stoked for the efficiency gains. Being able to batch in a thread-safe way is a pretty useful feature, and I’m glad that all of this work can be uploaded for the driver to optimize the order-of-execution. I would imagine that it would make things far easier to both the developer and the driver engineers!

    I really hope that Vulkan finds its way to devices soon! While GLES will still be a central focus for a long time (for consumer apps. I can’t expect most hardware in circulation supports GLES3.1 nor will get the latest Vulkan drivers), Vulkan is a very fresh start for proprietary apps, or those targeting the cutting edge.

    I’m looking forward to reading more about Vulkan (specifically about the facilities around building command queues)!

  • Sean Lumly

    In the demo, when you mention ‘more than 250,000 triangles’ are you referring to the overall scene, or the triangles computed in each frame?

  • Hi Sean, I’m glad you are excited, I am too. Having used the Vulkan API I can say that in some ways it is easier to understand than OpenGL ES. Command buffers are definitely one of the most important parts of Vulkan. It lets you decide what tradeoffs you want to make. As for the 250000 triangles, that is the total number of triangles in the scene if frustum culling is turned off.

  • Hi Ayres, I think we should keep to English here, but I am jealous you have spent so much time in Japan

  • Sean Lumly

    Thanks! I very much appreciate the information and the post! 🙂

    I look forward to reading much, much more about Vulkan! Exciting times are indeed upon us.

  • Bogdan

    Congrats … too bad you are so FOSS unfriendly …

  • “vkQueueSubmit(graphicsQueue, 1, &cmdBuffer, 0, 0, fence);”
    congratulations for reinventing the same cancer we have already seen in opengl. there is no need for 3d apis any more, we are happy with software rendering. die.

  • Thanks Sean, I’ve just featured your comment.

  • Sean Lumly

    Wow, thank you! That is surprisingly flattering.

    Thanks for producing such great GPUs and great blog content — it is very appreciated. 🙂

  • Ancurio

    Hm, why does the function “vkCmdBindDescriptorSet” contain “bind”? Please don’t tell me this is setting some global state that later calls reference?

  • Hi, any vkCmd* command is changing a command buffer. There is no global state if my memory serves me. There should hopefully be some more in-depth articles around soon.

  • IonutCava

    Great article. Thank you!

  • Derek

    when I was drunk last night, I suddenly spoke fluent Cantonese. So I can’t use that gift?

  • But… but… The entirety of Vulkan is FOSS…

  • AyresRocket

    LOL,no problemo Ash.
    Figured you’d enjoy the romajii.

  • Daniel Jo

    Command buffers make me remember display lists — but they seem infinitely better. I’m really excited to see the API and spec.

  • LDM

    Hi Alex,

    Today Kishonti has announced the GFXBench 5.0 which will support Vulkan.
    It will be available at end of this year or maybe sometime in 2016.

    One of the main debates around benchmark software has been the “cheating” by some suppliers, who reported frame rates that never matched the reality of games.

    In your opinion, will using a lower-level API such as Vulkan in the benchmark software help to remove cheating in the tests, or would it be the other way around?

    Cheers

    L

  • Secret Library

    What exactly do you mean by this:

    “This may be of use to games which need to recreate their render commands a lot (e.g. Minecraft).”

    Doesn’t… every game need to recreate its render commands a lot? (Like, every frame)

  • there is no need for 3D apis and IGP-s any more. just look at this api – just as bad as the original opengl. software rendering is a better solution: the cpu-s already have enough performance to run it, any rendering pipeline can be programmed freely in a monolithic-smp-neumann convention on CPU, and there are no driver bugs while using it on various systems. the same applies to gpgpu, which only works in theory, and in manufacturer tech-demos. the concept can very rarely be used in the real world. igp-s must be thrown out of the processor, and the die area should be used to add additional CPU cores.

  • Hi,

    Yes, every game needs to recreate its render commands if it is using frustum culling. Minecraft is an interesting example because it may not be using frustum culling on every block. The render commands may only need to be recreated when a block is changed within a chunk. This may be a heavy operation, so it could be offloaded to another thread.

    Hope that helps,
    Ash

  • Although you are correct that some high-end multi-threaded CPUs might be able to handle software rendering for tasks like ray tracing, that is simply not possible in mobile today. Embedded GPUs and graphics APIs were created exactly because we needed a way to boost performance for mobile devices.

    Regarding GPU compute, many real-world applications today run image processing filters using RenderScript or OpenCL kernels on the graphics core (although it is not necessarily public knowledge).

    Regards,
    Alex.

  • Some scenes go up to 250,000 triangles.

  • What you are referring to is not directly linked to APIs but more to disabling DVFS for short bursts which enables the hardware to run faster than it would normally do.

    A low level API will give the programmer more direct control over the application – used correctly, it can boost performance significantly, especially when doing a lot of draw calls.

  • yeah, maybe we can say that today most phones have too slow a cpu to run ray tracing, or other complex software rendering, at good frame rates in enjoyable screen resolutions.

    but today mobile IGP-s are too slow to run any complex graphics at all, compared to PC. not only the cpu is weak, when we compare it to pc high-end.

    maybe we can say that after ~5 years, we will have enough of a performance boost in mobiles to run games and renderers like nowadays on PC. but that will also need aggressive development of the CPU cores. this means that the cpu-s will gain the performance required to achieve somewhat nice software rendering on the platform.

    with the current 8 core arm cpus, we already can use software renderers smoothly, but the ray tracing itself will need a 10x speed-up.

    i personally switched to software rendering 3 years ago, because i was sick of the buggy drivers from the n+1 manufacturer, and the n+1 graphics api. but the most shocking is that some drivers just crash.

    for example, on mobiles i just use SDL to draw my frame buffer (because i was too lazy to learn the android apis). and on 20-30% of the mobiles, the SDL init is basically just hanging from the so-quality opengl-ES drivers.

    a serious developer cant afford to build software that will maybe not run properly on the 99% of the users. an AAA game developer can afford this, because for him, only hardcore gamers are the priority, and they have decent hardware usually. but for someone, who develops for the mass-market, or some subcultural markets, using graphics acceleration is not an option. my old 3d engine, that was for windows and linux, only properly ran for 20-30% of users, because of the shoddy 3d drivers. this is now maybe decreased to 10-15%, thanks to the new bugs in the 3d drivers. this also created me a massive loss in income.

    so i just switched to software rendering, which is far more ugly, buggy, stutters, but at least runs everywhere. even if igp’s would get a theoretical 1000x performance up from today to tomorrow, i would not use them, and i would tell my users to buy a device with stronger cpu to run it on bigger resolution.

    both as a developer, and as an user, i dont really see any future in igp-s.

  • Sean Lumly

    Thanks Alex! That’s an impressive number of triangles on mobile, and it’s funny to think that modern high-end PVR GPUs can handle much, much more!

  • Nanook

    Microsoft’s buyout of Minecraft has me guessing whether or not Minecraft will ever see Vulkan rendering.

  • Is the demo video running on an android device? I’m just wondering because of the buttons on the bottom. 🙂

  • Nathaniel Lewis

    I seriously hope Apple adopts Vulkan support in iOS 10, or some point release of iOS 9, and OS X. As much as I like Metal, I would hate it if they stopped supporting open standards for graphics APIs. This blog post shows that the company they source their GPU from has a driver supporting Vulkan. The lack would indicate it was their decision alone which prevented it.

  • Yes, it is.

  • EM87

    Yeah, Metal is great but Vulkan is looking even better. I’m really hoping it takes off, having Apple and Nintendo (with the NX devices) on board would be great.

  • blobjim

    Also depends on when LWJGL (graphics, etc. bindings for Java) will create Vulkan bindings. But yes, Microsoft may force them to use DirectX only (especially since they are now creating a new Minecraft version in C++ using DirectX)

  • johnBas5

    Thanks! Very promising API!

    Does the explicit memory management of Vulkan mean there are functions to tell you how much memory there is on the device?
    Not just how much the application is using but how much memory and if it’s dedicated, type and if the memory is shared with CPU or not.

  • johnBas5

    Vulkan itself is a specification, not a piece of software.

  • It sure is. I wonder why I said it like that. My point is that it will be an open spec once released. 😛

  • Hi John,
    There are plans to expose more detailed statistics about the GPU. One of those allows you to investigate the heaps that are present. So yes I believe so.

  • Nanook

    They’ve got a whole generation of kids to squeeze into Microsoft loyalty.

  • blobjim

    Indeed they do : We’re gonna have to do something about it if we don’t want to see Minecraft’s soul die.

  • AlexByrth

    This new version with C++/DirectX is new to me. I guess they would go for C# / ANGLE.
    MSTech is committed to ANGLE, as a bridge for OpenGL ES 2.0 and DirectX.