Super compact serialisation of C++ classes
In an earlier post, I wrote about a way to serialise C++ classes with only one short function call dealing with both serialisation and deserialisation. However, the variables are still declared in one place and serialised in another. This made me wonder: would it be possible to define a variable, initialise it and write the code for its serialisation and deserialisation on a single line of code?
Because the format has to stay human readable and compatibility must not break when new members are added, raw memory copies and boost::serialization are not suitable. Scripts to generate the code are also not considered because of the nuisance they add to the build.
The idea
The idea was to somehow abuse the in class default member initializers (since C++11) in a way similar to this:
struct Device : SerialisableBrief {
    int timeout = key("timeout") = 1000;
    std::string address = key("address") = "192.168.32.28";
    bool enabled = key("enabled") = false;
};
That looks nice, but how can it be implemented? There doesn’t seem to be a way for a parent method to find out where the magical key() function was called from.
The tricks to get it working
The rough idea is to use key() to capture the key, the return type and the return address, which together are all the information needed to create a vector of functors. An override of the serialisation() method then calls these functors, and each of them calls the synch() method of Serialisable with the right arguments.
However, the code above still looks impossible to compile, and wrong in other ways too. So how can the SerialisableBrief class be implemented so that it compiles and works?
Trick 1: Calling code from class initialisation
The initialisation written after a member declaration looks like the initialisation of a local object, but it is actually performed by the constructor. The first declaration below behaves like the second:
struct Connection : SerialisableBrief {
    int timeout = 1000;
};

struct Connection : SerialisableBrief {
    int timeout;
    Connection() : SerialisableBrief(), timeout(1000) { }
};
The expression on the right side of the = sign can be any expression and can contain function calls, including calls to member functions.
The parent class is always initialised before the member variables. A member variable that is not explicitly initialised in the constructor's initialiser list is initialised from the expression on the right side of the ‘assignment’, as if it were given as a single argument to the member's constructor (this does not apply to implicitly generated copy or move constructors, which copy or move each member instead).
This means that methods of the parent class can be called from the member initialisers at the time of the object’s construction, after the parent class has already been initialised. Thus, the key() function can be a method of the parent class that has access to this. It is used to save information about the function call into the parent class.
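A minimal sketch of this trick in isolation may help. The Recorder class and its note() method are hypothetical names made up for the illustration and are not part of SerialisableBrief; the only point is that a parent method runs once per member during construction and can store what it sees.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the parent class (not the real SerialisableBrief).
struct Recorder {
    std::vector<std::string> seen;  // names passed to note(), in call order

    // Called from the default member initialisers of the derived class.
    // Recorder is already fully constructed at that point, so using
    // `seen` here is safe.
    int note(const std::string& name, int value) {
        seen.push_back(name);
        return value;
    }
};

struct Connection : Recorder {
    int timeout = note("timeout", 1000);
    int retries = note("retries", 3);
};

// Small self-check: members are initialised in declaration order,
// and the parent has recorded each call.
inline void demoRecorder() {
    Connection c;
    assert(c.timeout == 1000);
    assert(c.seen.size() == 2);
    assert(c.seen[0] == "timeout");
}
```

Each member initialiser calls into the parent exactly once, in declaration order, which is what the later tricks rely on.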
Trick 2: Overloading by return value
The key() method is supposed to be able to return values of multiple types. It cannot be done simply this way, because functions cannot be overloaded on the return type alone:
int key(const std::string&);
std::string key(const std::string&);
//...
A function template does not help either, because T cannot be deduced from the return type:
template <typename T> T key(const std::string&) { //...
But it’s possible to return an auxiliary object:
class Mark {
public:
    SerialisableBrief* parent;
    std::string key;
    operator int();
    operator std::string();
};

Mark key(const std::string& name) {
    return { this, name };
}
The auxiliary object can have its own operator=
overloaded in order to accept the actual initialisation value on its right.
To support any type supported by Serialisable
, a generic conversion operator needs to be used:
class Mark {
    std::string key;
public:
    template <typename T>
    operator T() { //...
As written, this won’t compile on (at least some) MSVC compilers. They report an ambiguous function call because the class is convertible both to std::string and to const char*, which is itself convertible to std::string (the direct conversion should take precedence). The conversion to const char* has to be disabled using SFINAE.
The disadvantage of this approach is that the return value cannot be stored in an auto variable (auto would deduce the auxiliary type instead of the intended one), but a non-static member variable cannot be declared auto anyway.
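To show how the pieces of this trick fit together, here is a simplified sketch. It is not the real Mark class: Key, Proxy and the free function key() are hypothetical names, there is no parent pointer, and excluding all pointer types with std::enable_if_t is only one possible way to remove the const char* route.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical helper: holds the key name and the initial value.
template <typename V>
struct Proxy {
    std::string name;
    V value;

    // Generic conversion operator. Pointer targets are excluded, so the
    // object converts to std::string but never to const char*.
    template <typename T,
              typename = std::enable_if_t<!std::is_pointer<T>::value &&
                                          std::is_constructible<T, V>::value>>
    operator T() const { return T(value); }
};

// Hypothetical helper returned by key(); its operator= accepts the value.
struct Key {
    std::string name;

    template <typename V>
    Proxy<V> operator=(V value) const { return {name, std::move(value)}; }
};

inline Key key(const std::string& name) { return Key{name}; }

struct Device {
    int timeout = key("timeout") = 1000;
    std::string address = key("address") = "192.168.32.28";
    bool enabled = key("enabled") = false;
};

static_assert(!std::is_convertible<Proxy<const char*>, const char*>::value,
              "conversion to a raw pointer is disabled");

inline void demoProxy() {
    Device d;
    assert(d.timeout == 1000);
    assert(d.address == "192.168.32.28");
}
```

The target type is never named in key(); it is picked by the member being initialised, which is the "overloading by return value" effect.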
Now, the type can be inferred and matched with the name, the value is actually initialised, but how to determine the address of the member variable?
Trick 3: Layout prediction
The = sign in the default member initialiser isn’t an = operator and cannot be overloaded to obtain the address of the other operand. The part on the right is fed as an argument to the constructor of the part on the left.
However, it happens that the memory layout of a C/C++ structure is predictable. It is fixed by the platform ABI rather than by the language standard, but unless a nonstandard compiler directive is used, the rules on common platforms are as follows:
- The members are in the order they are written down (this means that a structure containing float, double, float is larger than a structure containing double, float, float)
- A primitive type of size 1 will be on the address right after the end of the previous member
- A primitive type of size 2 will be on the nearest address after the end of the previous member that is divisible by 2
- A primitive type of size 4 will be on the nearest address after the end of the previous member that is divisible by 4
- A primitive type of size 8 or more will be on the nearest address after the end of the previous member that is divisible by 8 (except on 32-bit architectures)
Composite types do not count as larger; only the primitive types they contain are placed according to these rules. A class like std::array<uint8_t, 8> will start at the nearest unused byte. However, pointers and the pointer to the vtable have the maximal size, so any class with virtual functions or pointer members (which is pretty much any class, including anything derived from Serialisable, except std::array) is aligned to the pointer size.
The first member is at offset sizeof(SerialisableBrief) from the start of the object.
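The rules above can be sketched as a small offset predictor. OffsetPredictor is a hypothetical helper written for this illustration, and it assumes that alignof(T) describes the alignment the compiler uses inside a structure, which holds on common 64-bit platforms but is not guaranteed everywhere.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: predicts member offsets in declaration order.
struct OffsetPredictor {
    std::size_t next = 0;  // first byte that is not occupied yet

    template <typename T>
    std::size_t add() {
        const std::size_t align = alignof(T);
        // Round up to the nearest multiple of the alignment.
        const std::size_t offset = (next + align - 1) / align * align;
        next = offset + sizeof(T);
        return offset;
    }
};

// The two orderings mentioned in the first rule.
struct FloatDoubleFloat { float a; double b; float c; };
struct DoubleFloatFloat { double a; float b; float c; };

// Compares the prediction with what the compiler actually did.
inline void demoPrediction() {
    OffsetPredictor p;
    const std::size_t first = p.add<float>();
    const std::size_t second = p.add<double>();
    const std::size_t third = p.add<float>();
    assert(first == offsetof(FloatDoubleFloat, a));
    assert(second == offsetof(FloatDoubleFloat, b));
    assert(third == offsetof(FloatDoubleFloat, c));
}
```

For a class derived from the serialisation parent, the starting value of next would be the size of the parent instead of zero.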
But there is a little problem. All members must be known, including those that aren’t serialised. They can be marked using something like:
int lastIndex = skip();
Forgetting this can have bad consequences, because the predicted offsets of the following members will be wrong. Although it’s advisable not to give the class any functionality besides holding data, someone might disregard that advice. This can lead to segfaults.
Bonus trick: Getting the return address
The previous part assumed that it’s impossible to obtain the address where the return value is saved. I lied. Somewhat. It actually can be obtained, it’s just unreliable. It’s useless if the code is supposed to be reliable, but it can be used to detect some mistakes.
The trick lies in copy elision. If an object is to be returned, it can be created directly at the address where it’s supposed to be returned and manipulated there. In practice, this happens for anything that isn’t a primitive type.
std::string makeValue(int newValue) { // illustrative name and parameter
    std::string returned = std::to_string(newValue);
    std::cout << "Return address is " << &returned << std::endl;
    return returned;
}
Cases when copy elision does not happen as expected can be detected:
- If the target of returning is on the heap, the difference between its address and the stack address of the local object is very large
- If the target of returning is on the stack, the return location is at a lower address than the local object
Thus, copy elision happened as expected if the address of the object to be returned is slightly higher than this. This makes it possible to check whether an object’s offset was predicted correctly.
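A small sketch of the check itself, under the assumption that the compiler applies named return value optimisation (it is allowed to, but not required to). The names makeString, lastLocalAddress and wasConstructedInPlace are made up for the example.

```cpp
#include <cassert>
#include <string>

// Address of the local object inside makeString(), kept for comparison.
const void* lastLocalAddress = nullptr;

std::string makeString(int value) {
    std::string returned = std::to_string(value);
    lastLocalAddress = &returned;  // where the local object lives
    return returned;               // may be elided, but this is not guaranteed
}

// True if the local object was built directly in `target`.
inline bool wasConstructedInPlace(const std::string& target) {
    return lastLocalAddress == static_cast<const void*>(&target);
}

inline void demoElision() {
    std::string s = makeString(42);
    assert(s == "42");  // the value is correct either way
    // Only the address comparison tells whether elision happened.
    const bool inPlace = wasConstructedInPlace(s);
    (void)inPlace;
}
```

The value is correct whether or not the copy was elided, which is why the result of the comparison can only serve as a sanity check and not as the source of the offset.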
Using the gathered information
This part is more or less straightforward. Setting up the functors to be used in serialisation()
:
template <typename T>
operator T() {
    T returned = T(argument); // Initialise the actual value
    if (_serialised) { // Not just a skip() call
        int position = _parent->addElementToOffset<T>(uint64_t(&returned) - uint64_t(_parent));
        _parent->_serialisers.push_back(
            [position, name = _name] (SerialisableBrief* self) {
                T& reference = *reinterpret_cast<T*>(reinterpret_cast<uint64_t>(self) + position);
                self->synch(name, reference);
            });
    } else { // Just a skip() call
        _parent->addElementToOffset<T>(uint64_t(&returned) - uint64_t(_parent));
    }
    return returned;
}
When the vector of functors is filled, the serialisation()
method needed by Serialisable
is implemented by SerialisableBrief
:
void serialisation() final override {
    for (auto& it : _serialisers)
        it(this);
}
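The reason for storing offsets instead of pointers can be seen in a stripped-down sketch. MiniBase, registerMember() and serialise() are hypothetical names, members are registered explicitly in the constructor rather than through key(), and the output format is invented for the example. Like the real code, it relies on pointer arithmetic that works on common compilers but is not blessed by the standard.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical miniature of the mechanism (not the real SerialisableBrief).
struct MiniBase {
    std::vector<std::function<void(MiniBase*, std::ostream&)>> serialisers;

    // Stores a functor that finds the member through its offset from `this`.
    template <typename T>
    void registerMember(const std::string& name, T& member) {
        const std::ptrdiff_t offset =
            reinterpret_cast<char*>(&member) - reinterpret_cast<char*>(this);
        serialisers.push_back([name, offset](MiniBase* self, std::ostream& out) {
            T& reference = *reinterpret_cast<T*>(reinterpret_cast<char*>(self) + offset);
            out << name << "=" << reference << ";";
        });
    }

    void serialise(std::ostream& out) {
        for (auto& it : serialisers) it(this, out);
    }
};

struct Settings : MiniBase {
    int timeout = 1000;
    bool enabled = true;

    Settings() {
        registerMember("timeout", timeout);
        registerMember("enabled", enabled);
    }
};

inline void demoOffsets() {
    Settings original;
    Settings copy = original;  // the functors are copied with the object
    copy.timeout = 5;
    std::ostringstream out;
    copy.serialise(out);
    assert(out.str() == "timeout=5;enabled=1;");
}
```

Because each functor only knows an offset, the copied functors read the members of the copy and not those of the original object.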
The code is on GitHub. The actual code is slightly different because the serialisation parts are cached and not stored individually for each instance. It also contains additional code for actually initialising the values, which was largely skipped here.
Conclusion
At the cost of runtime inefficiency and possibility of causing errors by incorrect use, it’s possible to make a C++ class serialisable into a human readable format without a single extra line of code per class. Although its cost may be too high to justify the deduplication of information, the individual tricks may be useful elsewhere.