Ever since I bought an Oculus Quest, I felt it lacked a proper Doom-like action shooter, so I decided to make one. I reused an earlier idea of an action shooter set in a futuristic city with incredibly tall skyscrapers, with all exterior action taking place on catwalks around them. I chose Unreal Engine for its high performance, good platform support and use of C++, my favourite programming language.
I created a scene in the editor based on what I envisioned. It looked good and ran relatively well on my high-end laptop… but it was absolutely impossible to run on Oculus Quest. The scene had around 400K polygons, thousands of mesh draw calls, high shader complexity, and used reflection captures whose packaging was broken for Android (causing runtime errors on startup). Oculus recommends 80K polygons and 80 draw calls. That was a long way to go. This article describes the various tricks I used to bring the scene to Oculus Quest without sacrificing too much.
Tweaking the settings
Because of the packaging bug, I had to remove all reflection captures. They don't seem to be particularly heavy, but with them the packaged game didn't run at all.
I had to use level streaming to get the level to load; otherwise I was stuck at a black screen.
It ran at about 20 FPS – way too low. Some easy improvements could be made simply by tweaking Unreal's settings. Most of the recommendations, even the official ones, did nothing or almost nothing. Here are the ones that actually had an effect.
Disabling static lighting and using automatic mesh instancing
Previously I worked with Ogre3D, which has really good automatic instancing: it batches large numbers of draw calls together very well and does much of it automatically. I was always bound by geometry rather than by mesh draw call count, so I wasn't used to caring about this particular parameter. Unreal, however, is far from being as good at instancing as Ogre3D. Instead of instancing, Unreal seems more focused on baking objects and textures together (which is probably responsible for games being 100 GiB large). This could not be used in my scene because of the scale of some objects (skyscrapers).
Fortunately, Unreal had introduced automatic mesh instancing not long before. For mobile devices, it has to be enabled manually with configuration keys in DefaultEngine.ini. This didn't seem to work at first, but it turned out that I had to disable static lighting to get it running.
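For reference, a minimal sketch of the DefaultEngine.ini entries involved. The console variable names below are my best recollection for the engine version I used and may differ in yours, so verify them against your engine's renderer settings:

```ini
[/Script/Engine.RendererSettings]
; Dynamic instancing of mesh draw commands
r.MeshDrawCommands.DynamicInstancing=1
; GPUScene support is what enables auto-instancing on the mobile renderer
r.Mobile.SupportGPUScene=1
; Auto-instancing would not work for me until static lighting was disabled
r.AllowStaticLighting=False
```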
Static lighting didn't work well with the scene anyway, because the tall buildings shadowed almost everything. Real glass-covered buildings are quite reflective, so the lighting around them comes from many directions. Disabling it also added a few precious frames per second.
Switching to Vulkan
Although the editor caused me a lot of headaches with numerous crashes when using Vulkan, there was no reason not to use it for the game itself – unlike the editor, the game doesn't reload materials, keep lots of old stuff cached, work with raw resources, etc. Switching to Vulkan Mobile surprisingly relaxed the limitations.
I don't know what the exact limits are, but I was able to get over 60 FPS with 110K triangles and 130 mesh draw calls.
The requirements on low instruction counts in the pixel shader are unchanged; there are no tricks to get around those.
Reducing mesh draw call count
With automatic instancing, the number of draw calls is the number of materials on each LoD level of each mesh type.
Reducing the number of LoD levels obviously helps a lot here, but it sharply increases the triangle count or requires extremely low poly objects. This is quite a tradeoff; a rule of thumb seems to be that a LoD level is worth having if it removes at least 1000 triangles in total across the whole scene.
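As a back-of-the-envelope model (my own simplification, not anything from Unreal's API): under automatic instancing, every combination of mesh type, LoD level and material slot that is visible costs roughly one draw call, no matter how many instances share it.

```cpp
#include <cassert>

// Rough draw call estimate under automatic instancing: one draw call
// per (mesh type, visible LoD level, material slot) combination,
// regardless of the number of instances. Names are illustrative.
int estimateDrawCalls(int meshTypes, int materialsPerMesh, int visibleLodLevels) {
    return meshTypes * materialsPerMesh * visibleLodLevels;
}
```

For example, 20 mesh types with 2 material slots each, with instances spread over 2 LoD levels at a typical moment, already sit right at the recommended budget of 80 draw calls – which is why both material count and LoD count matter so much.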
This part was easy, but the rest required quite a lot of effort.
I had a bad habit of combining multiple materials on an object to avoid painting textures for each variation, which is tedious and produces needlessly many textures. This turned out to be a really bad idea performance-wise.
Unreal has a tool that can join materials, but it usually bakes their textures together, creating a needlessly large amount of texture data. It also doesn't work well on more complex materials.
This can be done manually quite well. Objects are typically made of two or three materials, differing mainly in colour and surface structure. Often, one has a structure (rough metal, concrete, wood) and the other doesn't (polished metal, plastic, painted metal). Vertex colour can be used to choose the right one: for example, alpha to choose between structured and structureless (texture or plain colour), and the colour channels to choose the actual colour.
If there are more structured parts, a larger texture can contain both, with the UV map selecting between them. This is not feasible for large surfaces where the texture is repeated. It is doable by recomputing the UV coordinates in the pixel shader, but that is not recommended because it's inefficient on mobile GPUs.
The vertex colours themselves can be used for purposes other than actual colour: metallicity, roughness, etc. If colour is also needed, it's possible to use a single colour channel to pick from a range between two colours, or even to use multiple vertex colour levels. Vertex colours are part of the geometry, not a texture, and thus they are fast.
A drawback of this approach is that it's computed for each pixel on screen, so there can't be too many calculations. Also, all variants are computed: choosing between two textures based on vertex colour means that both textures are sampled every time, which may be a performance issue. On the other hand, this can be exploited to make a smooth transition between the quasi-materials.
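To make the idea concrete, here is a CPU sketch of the per-pixel logic described above. In Unreal this would be a material graph rather than C++, and all names here are illustrative, not engine API:

```cpp
#include <cmath>

// Plain RGB colour, standing in for a material graph's float3.
struct Color { float r, g, b; };

// Linear interpolation, the shader "lerp" node.
static Color lerp(Color a, Color b, float t) {
    return { a.r + (b.r - a.r) * t,
             a.g + (b.g - a.g) * t,
             a.b + (b.b - a.b) * t };
}

// The vertex alpha selects between the structured variant (a texture
// sample) and the structureless one (a plain colour); the RGB vertex
// colour then tints the result. Note that both inputs are always
// evaluated, just as both textures are always sampled in the real
// shader -- intermediate alpha values give a smooth transition.
Color shadePixel(Color textureSample, Color plainColor,
                 Color vertexRGB, float vertexAlpha) {
    Color base = lerp(plainColor, textureSample, vertexAlpha);
    return { base.r * vertexRGB.r, base.g * vertexRGB.g, base.b * vertexRGB.b };
}
```

The same trick extends to non-colour channels: a spare vertex colour channel can feed roughness or metallicity instead of tint.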
Another problem is that masked materials are inefficient, and a material can't be partially masked and partially opaque – it has to be masked everywhere.
A good example of how not to join materials is the gun in Unreal's starter content. It uses a texture to choose among the outputs of five materials. The same could be done by using the texture's colour channels or vertex colours to set parameters of a single material, with close to a five-fold decrease in shader instruction count. Placing this gun in front of the camera in VR can cause the frame rate to drop by half. This complicated my benchmarking, because the position of the gun (attached to my hand) affected the frame time more than the actual scene did.
A larger mesh can be used to batch more draw calls together. This is useless if it's made by joining meshes with different materials, unless the number of merged meshes is sufficiently high. Merging the meshes' materials beforehand certainly helps.
This comes with many disadvantages. The geometry complexity suffers, because the whole big mesh shares one LoD level and hidden parts still add to the triangle count. It's also tedious to work with, being a sort of poor replacement for a prefab tool.
The default settings of the object merging tool need to be changed to do this efficiently. The merge materials option has to be disabled and the reuse materials option enabled instead. However, that option isn't available unless merging a specific LoD (typically LoD 0), so it might be necessary to adjust LoD 0 on the source meshes to the detail desired in the base LoD of the result, and then generate LoDs for the result.
I found this most useful for merging all the large, low-geometry objects in the scenery (typically buildings) into one huge object. It had multiple materials, but that didn't matter much because it appeared only once. Each of its materials should be as simple as possible, because they occupy most of the screen area.
Reducing triangle count
The first step is not to make the meshes too complex in the first place. Using smooth shading to generate normals can make many surfaces appear round even if they are actually quite low poly. The edges of the object will not be rounded, though, so there are limits.
Also, it's possible to set some amount of geometry reduction even for LoD 0, so Unreal automatically removes some polygons that wouldn't be missed too badly. This combines well with manual reduction, because removing some polygons manually may be inconvenient, and it may not be obvious which ones are the worst offenders.
For small objects, it's always feasible to set a maximum render distance.
Larger objects can be replaced by stand-ins with extremely basic geometry, like cubes or pairs of intersecting squares. These can be added as LoDs, but they can also all be merged into one big object (in that case, they must fit strictly inside the actual objects, otherwise they would poke through them).
Fundamentally complex geometries like scaffoldings or railings can be replaced by several layers of masked material (part fully transparent, part fully opaque, nothing in between). With normal mapping, this can create a very good illusion of complex geometry – to the point that one has to search for imperfections to spot it even when standing only a few metres away – allowing the real mesh to have a pretty low maximum view distance. However, this should not be done where they take up too much screen area, because every non-opaque surface between the camera and an opaque surface has to be processed in the pixel shader.
The gun from the starter content is again an example of how not to do geometry. It has 10K triangles, most of which are useless, because they are visible only by carefully positioning the gun so that it overlaps another object. In VR, the pixels are quite large and it's nearly impossible to see anything smaller than 2 millimetres.
Reducing pixel shader complexity
The pixel shader is the code executed to determine the colour of each pixel. Because its cost scales with screen coverage, it's affordable to make it complex for objects taking little screen space, while making it as simple as possible for objects taking up a lot of screen space can greatly improve performance.
It appears that Unreal generates fairly efficient shaders. A large object (one that occupies a lot of screen space) can afford two textures used for variable colour, roughness, metallicity, specular and normals.
Of course, having no metallicity (nothing connected to the metallic input) will improve performance. No normal map will improve performance (vertex normals used for smooth shading are unrelated to this). Making the object fully rough will improve performance. Fewer texture samplers will improve performance.
A huge inefficiency is when the material is not opaque. Any masked or transparent materials between the camera and the first opaque material will be computed in the pixel shader, so having 4 surfaces of a masked material behind each other makes it take four times as long. It may be even worse for transparent materials. However, because this is computed per pixel, it's fine if these objects don't occupy large parts of the screen.
Some calculations can be pushed into the vertex shader, so that the value is interpolated between the vertices for individual pixels. There's a feature called customised UVs, which allows computing 2D coordinates in each vertex and then using these coordinates in each pixel. This is most useful for computing texture coordinates at runtime, for example to make one texture continue seamlessly from one object to another.
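A minimal CPU sketch of the idea, assuming a simple planar projection (the names are mine, not Unreal's): the UV is computed once per vertex from the world position, the hardware interpolates it across the triangle, and the pixel shader only does the texture fetch. Because the UV depends on world position rather than on each mesh's own UV layout, a tiled texture continues seamlessly across object boundaries.

```cpp
// Minimal stand-ins for a vertex position and a texture coordinate.
struct Vec3 { float x, y, z; };
struct UV   { float u, v; };

// Runs "per vertex": derives texture coordinates from world position.
// A projection onto the XY plane; any two world axes work for a
// wall or a floor. tilesPerMetre controls the texture repeat rate.
UV customisedUV(Vec3 worldPos, float tilesPerMetre) {
    return { worldPos.x * tilesPerMetre, worldPos.y * tilesPerMetre };
}
```

Two adjacent objects that share a plane get continuous coordinates automatically, since the input is the world position rather than anything object-specific.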
Support for reflections does increase the frame time, but reflections greatly improve the visual experience. The elementary technique is screen space reflections; they, however, do not work very well on mobile. They can be improved a lot using static reflection captures, but those can create various oddities, and their packaging for Android is bugged, usually causing assertion failures at startup. There is an option for high quality reflections, which looks quite good and does not cause a large additional increase in frame time.
Alternative to reflections
It is possible to use a texture that fakes reflections. The texture can be generated by placing a reflection probe and saving its captured image. Used as emissive colour, it looks just like a reflection. The main difficulty of the approach is calculating the mapping correctly, so that it behaves like a reflection and not like a picture painted on the object.
The texture itself is mapped using spherical coordinates, but projected onto a cube.
The mapping can be placed on objects using environment mapping, but it produces visual artefacts – the reflection does not shrink as the camera moves away from the object. This is easy to spot, and very visible if there are multiple parallel planes at different distances from the camera – the reflections will have visibly the same scale, and may even continue seamlessly from one plane into another.
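For clarity, a sketch of classic environment mapping against a spherical (equirectangular) capture, written as plain C++ with illustrative names: reflect the view direction about the surface normal, then convert the reflected direction to texture coordinates. The artefact above follows directly from the maths – the UV depends only on direction, never on distance.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

static float dot(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

// Reflect the incoming view vector v about the unit surface normal n.
Vec3 reflect(Vec3 v, Vec3 n) {
    float d = 2.0f * dot(v, n);
    return { v.x - d*n.x, v.y - d*n.y, v.z - d*n.z };
}

// Map a unit direction to equirectangular UV in [0,1]x[0,1].
// Note there is no camera distance anywhere in this function --
// that is exactly why the fake reflection never zooms out.
void sphericalUV(Vec3 dir, float& u, float& v) {
    const float pi = 3.14159265f;
    u = std::atan2(dir.x, dir.z) / (2.0f * pi) + 0.5f;
    v = std::acos(dir.y) / pi;
}
```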
I derived a formula describing the relation between the texture coordinates, the camera angle and the camera distance, but it turned out to be useless: the coordinate was the root of a quartic (fourth-degree polynomial) equation. Solving it iteratively in a shader is not feasible. In theory, it is solvable analytically using Ferrari's method, but that is impractical due to accumulation of floating-point errors, and it yields 4 roots, some of which may be complex. Although there might be a heuristic for finding the correct root, and the floating-point precision might not be too bad, implementing this on a GPU was an area I didn't want to delve into.
However, I didn’t need an exact result. All I needed was a transformation that meets these conditions:
- No spherical distortion
- The reflection zooms out when the camera moves away perpendicularly
- If the camera moves in parallel with the reflective surface, the reflection moves in the same direction
It was necessary that the image coordinates didn't depend only on the camera angle, but also on the object coordinates. I found one such function, but it was complicated and needed an arcsine (a fast approximation of arcsine produced no visual artefacts), so it couldn't be used in the pixel shader without unacceptable performance issues. I have not tested whether it can be done in the vertex shader through customised UVs without spherical distortion, because I suppose the reflection captures will be fixed at some point.
This trick could be useful also to create an illusion of seeing the interiors of buildings through windows at night.
Foliage
I have not found a good solution for foliage. A ball with a leaf texture on a trunk performs well, but it doesn't look good.
Usually, foliage is done using many masked layers, where the space between leaves is transparent. This looks good and doesn't need too much geometry, but it requires several layers of masked material, which is a problem on mobile devices. If the foliage takes up a significant portion of the screen, the pixel shader runs too many times per pixel, resulting in an unacceptable frame rate. This is fine if the trees are only an interior decoration and the player isn't going to look at them from too close. An additional problem is that Unreal does not offer dynamic shadows on mobile devices, so the outer layers of leaves do not shadow the ones behind them, which looks a bit odd.
A tree with leaves made without masked materials, using geometry only, had about a million triangles, which is way too much for almost any application.
Conclusion
This was a description of the many optimisations I used to render a scene with mediocre PC graphics on Oculus Quest at 60 frames per second. Although they may not be applicable to other scenes, especially outside urban environments, I have written this in the hope that it will help someone.
Many of these optimisations sacrifice one parameter for another. Finding the right balance between triangle count, draw call count and pixel shader complexity may require runtime testing. Even with the same objects, the proper balance may differ between far away objects and close ones.