Error codes are far slower than exceptions
C is considered to be the fastest programming language. C++ adds features on top of it: some only make C more convenient with no effect on performance, while others do impact performance. Because they greatly improve code quality, the latter are often used anyway. Runtime polymorphism is virtually ubiquitous, exceptions less so.
A completely valid reason not to use exceptions is when the executable’s size is, or is expected to be, tightly constrained by the platform’s limitations. A more questionable reason not to use them is performance, as it’s unlikely that completely new functionality comes without any compromises. Also, using exceptions in the wrong places can completely ruin performance, because handling a thrown exception is known to be very expensive.
But how significant is the performance impact? On most modern 64-bit platforms, exceptions are implemented in a way that minimises their cost as long as they are not thrown: the generated functions contain no checks for exceptions, and execution only switches to special functions and special data when an exception is actually handled. However, not using exceptions is not free either, because rare errors still have to be handled somehow. One possibility is to have the program simply abort, leaving any broken state on the disk, which makes for a very annoying user experience (this is done, for example, in Unreal Engine and Unity, where incorrect API usage in code causes the editor to crash and keep crashing until the incorrect binaries are manually erased). The other alternative is error codes: functions report that they failed and the calling code is supposed to react appropriately. This is less convenient for the programmer and requires an additional check after every function call, but it’s often done for performance reasons.
So how do these approaches actually affect performance? I have tested this on realistic examples that simulate use cases typical for video games.
Reminder – where not to use exceptions?
An exception, as its name suggests, is supposed to deal with exceptional cases. An exception is a case when a rule doesn’t apply. In software, that means something isn’t going as intended. Not a part of a use case. A failure. Invalid user input, connection failure, corrupted data, invalid packet, failure to initialise a device, missing file, programmer errors…
In many of these cases, the program shouldn’t just abort. Invalid user input stopping the program is super annoying because it causes all unsaved data to be lost and forces the user to wait until the program restarts. Connection failure is a very recoverable problem, usually solvable by simply reconnecting. An invalid packet causing a program to crash is an open door to sabotage, as anyone can send invalid packets to bring the program down. All of that can be solved by exceptions. Throwing them is slow, but the code does not need to be optimised for what isn’t its use case.
Examples of incorrect use of exceptions are cases where they’re thrown while everything works as it should: breaking from a double loop, handling the end of a container, checking if a number can be deserialised in order to use a default value otherwise…
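The last case, for instance, is better served by a parsing function that reports failure through its return value; a minimal sketch using `std::from_chars` (the wrapper name is mine):

```cpp
#include <charconv>
#include <string_view>

// Parse an integer, falling back to a default value. Text that isn't a
// number is an expected input here, so no exception is involved.
int parseOr(std::string_view text, int fallback) {
    int value = 0;
    auto result = std::from_chars(text.data(), text.data() + text.size(), value);
    return result.ec == std::errc() ? value : fallback;
}
```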
Modern 64-bit architectures use a model called zero-cost exceptions that optimises exception-based error handling strongly in favour of the happy path where no exception is thrown, at the cost of very bad performance when an exception actually is thrown.
In other words, exceptions should be thrown rarely enough that it remains practical to run the program in a debugger with its stop-on-exception feature enabled.
Although not all error handling can be done efficiently with exceptions, error codes can handle all of it. The question is: should they?
Test 1 – XML parsing
For the purpose of this test, I have written an XML parser. I chose to write a parser because it can fail at many locations and does not depend on I/O. It’s definitely not standard-compliant or guaranteed to fail on every possible invalid input, but it can parse a usual XML configuration file and should end with an error in most cases where the file is syntactically incorrect. The code is quite low-level and should be relatively fast (about 150 MiB/s), but I did not optimise it and used STL containers to make it convenient to use (as opposed to in-situ parsing). I wrote it with a lot of `#ifdef` checks to switch between exceptions, error codes and abort-on-error just with compiler arguments, which ensures that the only differences between the variants are those necessary for the different error handling.
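The switching could look roughly like this (a simplified sketch with invented names, not the actual test code):

```cpp
#include <cstdio>
#include <cstdlib>

// ParseError and Error::SYNTAX are hypothetical; the variant is chosen
// purely by compiler arguments (-DUSE_EXCEPTIONS etc.).
#if defined(USE_EXCEPTIONS)
    #define PARSE_ERROR(message) throw ParseError(message)
#elif defined(USE_ERROR_CODES)
    #define PARSE_ERROR(message) return Error::SYNTAX // every caller checks the result
#else // abort on error
    #define PARSE_ERROR(message) \
        do { std::fputs(message, stderr); std::abort(); } while (0)
#endif
```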
I benchmarked it with an XML file that imitates the configuration of a video game. The file is 32 kiB in size and is loaded into memory before the benchmarks start. The parsing was repeated 10,000 times and the duration averaged; this whole process was repeated 10 times to verify that the imprecision stayed below 1%.
The code was compiled with GCC 9 on Ubuntu 20.04 and ran on an Intel i7-9750H processor with a maximum single-threaded frequency of 4.5 GHz. I ran all experiments that I wanted to compare close together in time, without doing anything in between, in order to equalise the influence of other programs occupying the cache. Even so, there were still outliers that took noticeably longer than average; I removed these.
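The measurement itself was straightforward; a sketch of the kind of harness this implies (not the actual benchmark code):

```cpp
#include <chrono>

// Run the tested code many times and return the average duration in
// microseconds; the callable stands in for one parse of the in-memory file.
template <typename Callable>
double averageMicroseconds(Callable&& tested, int repetitions = 10000) {
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < repetitions; i++)
        tested();
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::micro>(end - start).count()
           / repetitions;
}
```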
The version that aborted on error was as fast as the version with exceptions. The version with error codes was 5% slower.
Curiously, if failures were handled by a special function that printed the error and exited the program, it was for some reason slightly (about 1%) slower than the version with exceptions. I had to use a macro instead to make it comparable in speed to the code using exceptions. This behaviour repeated itself in the other tests.
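The difference between the two variants, in a hypothetical sketch (names are mine):

```cpp
#include <cstdio>
#include <cstdlib>

// Slightly slower in my tests: every failure site calls a function.
[[noreturn]] void failParse(const char* message) {
    std::fputs(message, stderr);
    std::exit(1);
}

// Comparable to exceptions: the failure path is expanded in place.
#define FAIL_PARSE(message) \
    do { std::fputs(message, stderr); std::exit(1); } while (0)
```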
Test 2 – filling classes with the parsed XML
For this test, I’ve written several classes meant to represent the structures in the XML file, plus code that fills them with data from the parsed XML structure. This part was about 10 times faster, probably because there was much less dynamic allocation.
The error margins of the code with exceptions and the code with no proper error handling overlapped, but the times were 0.6% higher for exceptions. In the case of error codes, the program was 4% slower. I achieved a similar slowdown by forgetting to use move semantics.
Test 3 – Updating with data from a binary stream
This test imitates the usage of an asynchronous API for reading data from a TCP socket (such as Boost Asio or Unix sockets). These APIs are used in a pattern where a certain number of bytes is read from the stream, processed, and then more data is read. For faster processing and reduced bandwidth, the data are in binary form. Because network data in video games are streamed continuously, waiting for the end of the transmission is not feasible.
The communication is represented by three message types that identify different possible updates. Because the messages have different lengths, it’s not possible to know in advance whether a whole message is already available, so the function that identifies the message and calls the appropriate parsing code will fail often even when everything is running correctly – so exceptions cannot be used to handle this type of failure. Other failures, like unidentifiable message types, wrong identification of objects or large sudden changes of values (either cheating or data corruption), are still handled by exceptions (in the variant that uses them).
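The dispatching function therefore reports the absence of a complete message through its return value and keeps exceptions for genuine protocol violations; roughly like this (a sketch with invented names, not the benchmark code):

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Returns the number of bytes consumed, or 0 if the buffer does not yet
// hold a complete message -- a perfectly normal situation, so no throw.
std::size_t tryParseMessage(const std::vector<std::uint8_t>& buffer) {
    if (buffer.size() < 2)
        return 0;                           // not even the header has arrived
    std::uint8_t type = buffer[0];
    std::uint8_t length = buffer[1];
    if (type > 2)
        throw std::runtime_error("unknown message type"); // genuine failure
    if (buffer.size() < std::size_t(2) + length)
        return 0;                           // message still incomplete
    // ... parse the payload, throwing on corrupted or implausible values ...
    return std::size_t(2) + length;
}
```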
The data were read from memory in order to prevent networking from influencing the tests. The data were generated by this script.
The result was similar to the previous tests – the code using exceptions for error handling was 0.8% slower than the code that aborted on error, which was within the margin of error, while the code using error codes was 6% slower.
The results
The times taken by the benchmarks are summarised in the following table, scaled so that the time needed by the version that aborts when an error happens is 100%.
| Test | Abort | Exception | Error code |
|---|---|---|---|
| Parsing | 100% | 100% | 106.2% |
| Filling | 100% | 100.6% | 104.2% |
| Updating | 100% | 100.8% | 106.2% |
The imprecision was around 1%, so the version using exceptions might not really be slower – the difference might be the result of chance or of invisible compiler decisions, like inlining. The time needed by the version using error codes was consistently higher.
The entire source code is here.
Error handling and clean code
When an exception is not handled in a block, execution automatically exits the block and keeps unwinding until it reaches code that can catch it. No other type of error handling does this; all of them require writing additional logic for every failure, even though in almost all cases the appropriate reaction is simply to abort the operation the program is performing (the test with reading from a stream is an example where this does not apply). This can significantly lengthen the code even when the only reaction to any failure in a called function is to return the error code to the caller’s caller.
This is a line from the member initialiser list of a constructor in test 2:
```cpp
animation(*source.getChild("animation")),
```
It passes the child XML tag called `animation` of its argument to the constructor of a member called `animation`. The constructor may fail due to incorrect content of the XML tag, or the `getChild` function can fail because the entire tag is missing. Either failure aborts the creation of the structure, or whatever other process is wrapped in the enclosing `catch` block.
If the errors were announced through return values (or output arguments), the code would have to look more like this:
```cpp
std::shared_ptr<Xml> animationTag;
auto problem = source.getChild("animation", animationTag);
if (problem)
    return problem;
problem = animation.fromTag(animationTag);
if (problem)
    return problem;
```
This can be shortened with macros (usually, lambdas can replace macros, but not this one):
```cpp
std::shared_ptr<Xml> animationTag;
PROPAGATE_ERROR(source.getChild("animation", animationTag));
PROPAGATE_ERROR(animation.fromTag(animationTag));
```
That is three times more code even if the macros hide the more repetitive parts!
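For completeness, the macro might be defined along these lines (a sketch, not the exact code from the tests); a `return` inside a lambda would only return from the lambda, which is why a macro is needed here:

```cpp
// Evaluate the expression and bail out of the *enclosing* function
// if it reports a problem.
#define PROPAGATE_ERROR(expression)       \
    do {                                  \
        auto maybeProblem = (expression); \
        if (maybeProblem)                 \
            return maybeProblem;          \
    } while (0)
```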
In addition to needing more code, it also makes RAII less usable, because constructors cannot return information about whether they succeeded, which requires either separate initialisation functions or special methods that report whether construction went well. This further complicates the code.
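Concretely, the error-code style pushes classes towards a pattern like this (an illustrative sketch, names invented):

```cpp
struct Xml; // stands in for the parsed XML tag type
enum class Error { OK, MISSING_TAG, BAD_VALUE };

class Animation {
public:
    Animation() = default; // cannot report failure, so it does no real work
    // Two-phase initialisation: the object sits in a half-valid state until
    // this succeeds, and every caller has to remember to check the result.
    Error fromTag(const Xml&) { /* read fields, validate them */ return Error::OK; }
    // With exceptions, a constructor taking the tag could do all the work:
    // the object would be either fully initialised or never exist at all.
};
```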
There is no benefit in writing the code this way – the extra lines complicate the logic with additional definitions, output arguments and macros; errors can be accidentally ignored; RAII can’t be used properly; and the code has to be exception safe anyway due to the number of early returns.
Other results
In debug mode, the performance cost of using exceptions was noticeable. It stayed around 2%, so it was still faster than error codes.
Adding many additional (unnecessary) `try` blocks had a negative effect on performance. It looked as if the `try` block was responsible for the possibly slightly lower performance of the version using exception handling, but experiments didn’t confirm it.
Disabling RTTI while using exceptions did not seem to have any effect on the speed of `try` blocks (in that case, the exception object has to be caught as `catch(...)`, is not accessible, and the error message has to be stored in a `thread_local` variable).
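The workaround mentioned in the parentheses might look like this (a sketch of the pattern, not the test code):

```cpp
#include <cstdio>

thread_local const char* lastError = nullptr;

struct Failure {}; // the exception object itself carries no data

void parseSomething() {
    lastError = "unexpected end of tag"; // the message travels on the side
    throw Failure();
}

int main() {
    try {
        parseSomething();
    } catch (...) { // the thrown object cannot be inspected here
        std::printf("error: %s\n", lastError ? lastError : "unknown error");
    }
}
```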
If an exception is actually thrown, the numbers become very different. Handling an exception takes about 2 microseconds per function it exits. That isn’t much by itself, but it’s equivalent to the slowdown caused by roughly tens of thousands of function calls that check error codes – at, say, 0.2 ns of checking overhead per call, the ~2 µs cost of one unwinding step breaks even at about one throw per 10,000 calls. Thus, exceptions are inefficient if the probability that they are thrown rises roughly above 0.01%. This shouldn’t have much influence on where exceptions are used – the goal is to make the program run fast when it’s used correctly. Although it should fail gracefully, it does not need to be optimised for failure.
Exceptions do increase the size of executables: compiled without them, the benchmarking program’s executable shrank from 74.5 kiB to 64 kiB, and also disabling RTTI reduced the size to 54.8 kiB. I didn’t test how much of the increase is additive and how much is multiplicative, but it should serve as a warning that it might be necessary to disable exceptions on some more limited embedded platforms.
I have also analysed the generated assembly using Compiler Explorer. Enabling exceptions did not alter the function body (that is, no branching or additional return values), but functions ended with a block of exception handling code that was normally unreachable (`try` blocks also produced pieces of normally unreachable code). This code called destructors and returned control to the exception handling functions. Although never executed, it occupies cache (similarly to an early return). It was not generated for functions that did not stack-allocate anything with destructors, nor for functions marked as `noexcept`. Thus, performance-critical code that does not need to handle errors can be optimised with `noexcept`, and if it uses something that can throw, it can be optimised by avoiding stack-allocated objects with destructors (however, I haven’t tested whether C++ used as C with exceptions is faster than plain C).
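To see this in Compiler Explorer, a minimal example along these lines is enough (illustrative, not code from the benchmarks):

```cpp
#include <string>

// noexcept and no destructible locals: no unwinding block is emitted.
int lengthPlusOne(const std::string& text) noexcept {
    return int(text.size()) + 1;
}

// A destructible local plus potentially throwing calls: the compiler appends
// a normally unreachable block that destroys `copy` during stack unwinding.
int makeAndMeasure(const char* text) {
    std::string copy(text);
    return int(copy.size());
}
```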
Conclusion
Exceptions are the most convenient way of handling errors, and I have tested their performance implications on realistic benchmarks. On 64-bit architectures, the cost of having exceptions enabled is less than 1% compared to aborting when an error happens. The usual alternative for handling recoverable errors, error codes, reduces performance by about 5%. Thus, disabling exceptions on PC architectures is not only inconvenient, but probably also harmful to performance.
Interesting read, just one clarification: exceptions are a trade-off between CPU and memory, depending on the compiler and platform. On x86 you can check your binary size – compiled with -fno-exceptions, the binary is 30% smaller than the one compiled with exceptions. On embedded systems, the implementation is usually tuned towards reducing memory usage, and so there is increased CPU usage. I haven’t looked at what iOS or Android do; I’m guessing they have enough storage that they are tuned towards increased storage consumption, but I wonder.
I had a similar result with the size of the binaries. I have mentioned it somewhere, but I didn’t go very deep into it, because I don’t think executable size matters much outside of embedded platforms.