The idea of representing JSON objects internally as vectors instead of hashtables is likely to provoke reactions like "what the hell?" or "are you mad?", but it turns out to be surprisingly feasible.
The use case
I was tasked with switching the underlying container of nlohmann's JSON library's objects from its original hashtable to tsl::ordered_map: another, highly sophisticated hashtable implementation that has slightly better performance and preserves insertion order when iterating, as long as erasing is done through a method with linear complexity (preserving insertion order is quite useful when reading JSON in a text editor). Because this component was used in many parts of the software (configuration and RPC), I decided to investigate it more deeply.
Among other things, I investigated using std::vector with a façade that made its external interface similar to a hashtable's. In theory, it could be fast for small objects, because there is much less dynamic allocation, no hashing, no integer division and so on.
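The façade idea can be sketched in a few lines. This is a minimal illustration (the class and method names are mine, not the actual implementation): a vector of key-value pairs behind a map-like interface, where lookup is a plain linear scan.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Minimal sketch of a vector-backed map façade. Lookup is a linear scan
// over contiguous memory: no hashing, no integer division, no buckets.
template <typename Key, typename Value>
class VectorMap {
    std::vector<std::pair<Key, Value>> data_;
public:
    Value& operator[](const Key& key) {
        for (auto& entry : data_)
            if (entry.first == key)
                return entry.second;       // found: one cache-friendly scan
        data_.emplace_back(key, Value{});  // not found: append, which also
        return data_.back().second;        // preserves insertion order for free
    }
    std::size_t size() const { return data_.size(); }
};
```

Note that insertion order is preserved as a side effect of appending, which is exactly the property tsl::ordered_map was chosen for.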
So I made a benchmark of a typical use case: create the object, fill in a certain number of keys, access them all and destroy the object; each step was repeated with various groups of keys a million times before proceeding to the next step. The size was assumed to be known in advance, to avoid rehashes. This matches most of the JSON use cases I have worked on.
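The shape of such a benchmark looks roughly like this (a hedged sketch with illustrative names, not the original harness): build an object whose size is known in advance, read every key back, let it be destroyed, and repeat.

```cpp
#include <chrono>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Sketch of the create / fill / access / destroy loop. Works with any
// map-like type that supports reserve() and operator[].
template <typename Map>
double measure(const std::vector<std::string>& keys, int repetitions) {
    using clock = std::chrono::steady_clock;
    static volatile long long sink;       // keeps the work from being optimised away
    long long checksum = 0;
    const auto start = clock::now();
    for (int r = 0; r < repetitions; ++r) {
        Map object;
        object.reserve(keys.size());      // size known in advance: no rehashes
        for (std::size_t i = 0; i < keys.size(); ++i)
            object[keys[i]] = static_cast<int>(i);   // fill
        for (const auto& key : keys)
            checksum += object[key];                 // access every key
    }                                                // object destroyed here
    sink = checksum;
    return std::chrono::duration<double>(clock::now() - start).count();
}
```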
The keys in each group were randomly picked from a pool of manually written strings imitating keys in JSON objects I remember using. There were several groups of keys sharing the same prefixes (which is realistic and unfavourable for vector).
Here are the results (median values of several runs):
It should be noted that a linear increase in total time means constant time per insertion, because the benchmark itself has linear complexity.
This is surprising. I expected the hashtable to take the lead much sooner. Because most JSON objects I have used had far fewer than 30 elements, this means I have been using hashtables in places where they were not necessary.
I wanted to know how the trend continued. I could not write hundreds of strings by hand, so I wrote a piece of code that generated the keys. Every key had at least 5 letters, about half of them started with the same 5 letters, and there were also longer strings with longer identical prefixes. Example:
qqpbi mvvdt fguxt cdbkn xkeox xkeoxmvygw xkeoxmvygwqnpvv wtqxf knyia ymvxv ymvxvjmiyk ymvxvjmiykliwbg ymvxvjmiykliwbgugxej ymvxvjmiykliwbgkvuqv ymvxvjmiykrpfkb ymvxvjmiykrpfkbvcgcn
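A generator with these properties can be reconstructed roughly as follows (the parameters and structure here are my guesses, not the original code): every key gets 5 random letters, and about half of the keys extend a previously generated key, so long shared prefixes appear.

```cpp
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Guessed reconstruction of the key generator: each key is either 5 fresh
// random letters, or an earlier key extended by 5 more letters (which is
// what produces the long identical prefixes, unfavourable for the vector).
std::vector<std::string> generateKeys(std::size_t count, unsigned seed = 42) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> letter('a', 'z');
    std::uniform_int_distribution<int> coin(0, 1);
    std::vector<std::string> keys;
    std::string previous;
    while (keys.size() < count) {
        std::string key;
        if (!previous.empty() && coin(rng) == 1)
            key = previous;                        // extend an earlier key: shared prefix
        for (int i = 0; i < 5; ++i)
            key += static_cast<char>(letter(rng));
        keys.push_back(key);
        previous = key;
    }
    return keys;  // may rarely contain duplicates; good enough for a sketch
}
```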
I tested sizes up to 500. The results were more in line with expectations: the time the hashtable needed to complete the benchmark increased linearly with size, while the time the vector needed increased quadratically. However, the coefficient was far lower than I expected.
This means that placing values into a vector instead of a hashtable is faster until the number of elements reaches 200! Considering that it also consumes much less memory, I have been needlessly using std::unordered_map so many times!
The breakpoint may lie at even greater sizes with primitive key types that can be compared with specialised instructions and never refer to dynamically allocated memory.
The tests were run using GCC 9.2.1 on an i7-9750H CPU clocked at around 4.1 GHz, on Ubuntu 19.10.
I also ran everything on a different machine, an i7-7700 (4.2 GHz turbo boost) running Windows 10. Everything took about 10% longer.
When I used the MSVC 2015 compiler, the results were very different:
The breakpoint was between 22 and 23 elements. The overall speed was lower (by about 25% with the hashtable) and the vector did not shine as much: at sizes around 200, it needed about 250% of the time the hashtable needed.
Warning, this part is speculative.
std::vector has better memory locality. Because of the short string optimisation in std::string, most keys do not need any dynamic allocation, which allows much more of the object to be cached together.
std::unordered_map performs many operations to access a single element, including an integer division (necessary for the modulo that selects a bucket). It also has to handle collisions with linked lists, which are very slow to traverse.
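The arithmetic behind a single access can be illustrated like this (a simplification; a real implementation also walks a collision chain afterwards, and the function name here is mine):

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Simplified picture of where a hashtable lookup spends its time before any
// comparison happens: hash the whole key, then reduce it modulo the bucket
// count, which costs an integer division.
std::size_t bucketIndex(const std::string& key, std::size_t bucketCount) {
    std::size_t hash = std::hash<std::string>{}(key);  // touches every character
    return hash % bucketCount;                         // integer division/modulo
}
```

Only after this does the table start comparing keys within the chosen bucket, following linked-list nodes on collision.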
Also, the large number of independent checks the vector-based implementation performs can be optimised very well by vectorisation and speculative execution: the CPU can attempt dozens of iterations of the loop at once. In the hashtable's case, there are no independent loops except the string comparison. Each step of hashing the string depends on the hash of the previous letters, and when a collision occurs, each step through the linked list requires knowing the address from the previous node. MSVC 2015 is clearly bad at exploiting this difference.
I created a class named FakeMap that acts like std::unordered_map but actually wraps a std::vector. For string-indexed tables, it is faster with sizes below 200 elements and takes much less memory (with older compilers, the breakpoint might be as low as 20).
I am not suggesting that anyone stop using std::unordered_map; I only want to point out that, in some cases, better asymptotic complexity does not mean better speed.