Super compact serialisation of C++ classes

How about defining the variable, initialising it and writing code for serialisation and deserialisation on a single line of code?

In an earlier post, I wrote about a way to serialise C++ classes with only one short function call that deals with both serialisation and deserialisation. However, the variables are still declared in one place and serialised in another. This made me wonder whether a variable could be defined, initialised and given its serialisation and deserialisation code on a single line.

Because the format has to stay human readable and must not break compatibility when new members are added, raw memory copies and boost::serialization are not suitable. Scripts that generate the code are not considered either, because of the nuisance they add to the build.

The idea

The idea was to somehow abuse in-class default member initialisers (available since C++11) in a way similar to this:

struct Device : SerialisableBrief {
	int timeout = key("timeout") = 1000;
	std::string address = key("address") = "192.168.32.28";
	bool enabled = key("enabled") = false;
};

That looks nice, but how to implement that? There doesn’t seem to be a way for a parent method to capture where the magical key() function was called.

The tricks to get it working

The rough idea is to use key() to capture the key, the return type and the return address. That is all the information needed to build a vector of functors. An override of the serialisation() method then calls these functors, and each of them calls the synch() method of Serialisable with the right arguments.

However, the code above still looks impossible to compile and wrong also in other ways. So, how is it possible to implement the SerialisableBrief class in a way that would compile and work?

Trick 1: Calling code from class initialisation

A default member initialiser in the class declaration works much like the initialisation of a local object. The following two definitions are equivalent:

struct Connection : SerialisableBrief {
	int timeout = 1000;
};

// ...behaves like:
struct Connection : SerialisableBrief {
	int timeout;
	Connection() : SerialisableBrief(), timeout(1000) {
	}
};

The expression on the right side of the = sign can be any expression and can contain function calls, including calls to member functions.

The parent class is always initialised before member variables. Member variables, if not explicitly initialised in the constructor, are initialised using the value at the right side of the ‘assignment’ as a single argument given to the constructor (does not apply to copy or move constructors).

This means that methods of the parent class can be called from default member initialisers during the object’s construction, when the parent class is already initialised. Thus, key() can be a method of the parent class with access to this, and it can use that access to record the details of each call in the parent class.
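To make this concrete, here is a minimal sketch of a base class method being called from default member initialisers. Recorder, note() and Settings are hypothetical names used only for illustration; they are not part of the actual library.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for SerialisableBrief: it only records which keys were requested.
struct Recorder {
    std::vector<std::string> seenKeys;

    // Called from the default member initialisers of derived classes.
    // `this` is usable here because the base class is already constructed.
    int note(const std::string& name, int value) {
        seenKeys.push_back(name);
        return value;
    }
};

struct Settings : Recorder {
    int timeout = note("timeout", 1000);
    int retries = note("retries", 3);
};
```

After constructing a Settings object, seenKeys holds "timeout" and "retries" in declaration order, and both members carry their initial values.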

Trick 2: Overloading by return value

The key() method is supposed to be able to return values of multiple types. It cannot be done simply this way:

int key(const std::string&);
std::string key(const std::string&);
//...

A template does not help either, because the template argument cannot be deduced from the return type:

template <typename T>
T key(const std::string&) {
//...

But it’s possible to return an auxiliary object:

struct Mark {
	SerialisableBrief* parent;
	std::string key;
	operator int();
	operator std::string();
};

Mark key(const std::string& name) {
	return { this, name };
}

The auxiliary object can have its own operator= overloaded in order to accept the actual initialisation value on its right.
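As a rough illustration, the following simplified sketch handles only int. IntMark and intKey are illustrative names, and the sketch leaves out the parent pointer and everything related to serialisation; it only shows how the operator= and the conversion operator cooperate.

```cpp
#include <string>

// Simplified, int-only version of the auxiliary object (names are illustrative).
struct IntMark {
    std::string name;
    int initial = 0;

    explicit IntMark(const std::string& n) : name(n) {}

    // Receives the value written to the right of intKey("...") =
    IntMark& operator=(int value) {
        initial = value;
        return *this;
    }

    // Hands the stored value to the member being initialised
    operator int() const { return initial; }
};

inline IntMark intKey(const std::string& name) {
    return IntMark(name);
}

struct Device {
    // The temporary IntMark lives until the end of the full expression,
    // so it is still alive when it is converted to int.
    int timeout = intKey("timeout") = 1000;
};
```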

To support any type supported by Serialisable, a generic conversion operator needs to be used:

struct Mark {
	std::string key;
	template <typename T>
	operator T() {
	//...

The actual implementation does not compile with at least some MSVC versions, which report an ambiguous function call: the class is convertible both to std::string and to const char*, and const char* is itself convertible to std::string (the direct conversion should take precedence). This has to be prevented using SFINAE.
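One possible shape of such a restriction is sketched below. It is not the code from the actual implementation: Flexible and flexible() are made-up names, and the set of allowed types (arithmetic types and std::string) is an assumption chosen to keep the example short. Because the conversion operators are disabled for const char*, only the direct conversion to std::string remains.

```cpp
#include <string>
#include <type_traits>

// Hypothetical object that converts to several types, but not to const char*.
class Flexible {
    int value_;
public:
    explicit Flexible(int value) : value_(value) {}

    // Enabled for arithmetic targets such as int or double
    template <typename T,
              typename std::enable_if<std::is_arithmetic<T>::value, int>::type = 0>
    operator T() const {
        return static_cast<T>(value_);
    }

    // Enabled for std::string only
    template <typename T,
              typename std::enable_if<std::is_same<T, std::string>::value, int>::type = 0>
    operator T() const {
        return std::to_string(value_);
    }
};

inline Flexible flexible() {
    return Flexible(42);
}
```

With this in place, `int i = flexible();`, `double d = flexible();` and `std::string s = flexible();` each pick the conversion that matches the declared type.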

The disadvantage of this approach is that the return type cannot be saved in an auto variable, but that’s not possible for a member variable anyway.

Now, the type can be inferred and matched with the name, the value is actually initialised, but how to determine the address of the member variable?

Trick 3: Layout prediction

The = sign in the default member initialiser isn’t a = operator and cannot be overloaded to obtain the address of the other operand. The part on the right is fed as an argument to the constructor of the part on the left.

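However, it happens that the memory layout of a C/C++ structure is predictable in practice. It is fixed by the platform ABI rather than by the language standard, but unless a nonstandard compiler directive is used, the rules on common platforms are as follows: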

  • The members are in the order they are written down (this means that a structure containing float, double, float is larger than a structure containing double, float, float)
  • A primitive type of size 1 will be on the address right after the end of the previous member
  • A primitive type of size 2 will be on the nearest address after the end of the previous member that is divisible by 2
  • A primitive type of size 4 will be on the nearest address after the end of the previous member that is divisible by 4
  • A primitive type of size 8 or more will be on the nearest address after the end of the previous member that is divisible by 8 (except on 32-bit architectures)

Composite types do not count as larger; only the primitive types they contain are placed according to these rules. A class like std::array<uint8_t, 8> will start on the nearest unused byte. However, pointers and the pointer to the vtable have the maximal size, so any class with virtual functions or pointers (which is pretty much any class, including anything derived from Serialisable, except std::array) is aligned like the largest primitive type.

The first member is at offset sizeof(SerialisableBrief) from the start of the object.
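The placement rules above boil down to rounding the end of the previous member up to the alignment of the next one. The sketch below predicts offsets that way and lets them be compared with what the compiler reports through offsetof. alignUp and Sample are illustrative names, and alignof is used in place of the hard-coded sizes from the list so that the comparison also holds on platforms with different alignment rules.

```cpp
#include <cstddef>
#include <cstdint>

// Round `end` up to the next multiple of `alignment`
constexpr std::size_t alignUp(std::size_t end, std::size_t alignment) {
    return (end + alignment - 1) / alignment * alignment;
}

// Example structure whose layout is being predicted
struct Sample {
    std::uint8_t flag;
    std::uint16_t port;
    std::uint32_t timeout;
    double ratio;
};

// Predicted offsets, each computed from the end of the previous member
constexpr std::size_t flagOffset = 0;
constexpr std::size_t portOffset =
    alignUp(flagOffset + sizeof(std::uint8_t), alignof(std::uint16_t));
constexpr std::size_t timeoutOffset =
    alignUp(portOffset + sizeof(std::uint16_t), alignof(std::uint32_t));
constexpr std::size_t ratioOffset =
    alignUp(timeoutOffset + sizeof(std::uint32_t), alignof(double));
```

On mainstream compilers the predicted values should match offsetof(Sample, port), offsetof(Sample, timeout) and offsetof(Sample, ratio).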

But there is a little problem. All members must be known, including those that aren’t serialised. They can be marked using something like:

	int lastIndex = skip();

Forgetting this shifts every offset predicted after the unmarked member, so the serialisation code reads and writes the wrong memory, which can lead to segfaults. It’s advisable not to give the class any functionality besides holding data, but someone might disregard that advice.

Bonus trick: Getting the return address

The previous part assumed that it’s impossible to obtain the address where the return value is saved. I lied. Somewhat. It actually can be obtained, it’s just unreliable. It’s useless if the code is supposed to be reliable, but it can be used to detect some mistakes.

The trick lies in copy elision. If an object is to be returned, it can be created directly at the address where it’s supposed to be returned and manipulated there. In practice, this happens for anything that isn’t a primitive type.

std::string convert(int newValue) { // illustrative signature
	std::string returned = std::to_string(newValue);
	std::cout << "Return address is " << &returned << std::endl;
	return returned;
}

Cases when copy elision does not happen as expected can be detected:

  • If the target of the return is on the heap, the difference between its address and the stack address of the local object is very large
  • If the target of the return is on the stack, the return location is at a lower address than the local object

Thus, copy elision did happen as expected if the object to be returned is at an address slightly higher than the this pointer of the object under construction. This makes it possible to check whether an object’s offset was predicted correctly.
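The effect can be observed with a small experiment like the one below. makeText, lastLocalAddress and constructedInPlace are hypothetical helpers written for this illustration. Eliding the copy of a named local object is permitted but not required, so the check may legitimately return false with some compilers or settings.

```cpp
#include <cstdint>
#include <string>

// Address of the local object inside the most recent makeText() call
std::uintptr_t lastLocalAddress = 0;

std::string makeText(int value) {
    std::string returned = std::to_string(value);
    // Remember where the local object lives before returning it
    lastLocalAddress = reinterpret_cast<std::uintptr_t>(&returned);
    return returned;
}

// True if `target` occupies the address the local object had, which means
// the local object was constructed directly in the caller's variable.
bool constructedInPlace(const std::string& target) {
    return lastLocalAddress == reinterpret_cast<std::uintptr_t>(&target);
}
```

A typical use is `std::string text = makeText(5);` followed by `constructedInPlace(text)`.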

Using the gathered information

This part is more or less straightforward. Setting up the functors to be used in serialisation():

template <typename T>
operator T() {
	T returned = T(argument); // Initialise the actual value

	if (_serialised) {
		// Not just a skip() call
		int position = _parent->addElementToOffset<T>(uint64_t(&returned) - uint64_t(_parent));
		_parent->_serialisers.push_back( [position, name = _name] (SerialisableBrief* self) {
			T& reference = *reinterpret_cast<T*>(reinterpret_cast<uint64_t>(self) + position);
			self->synch(name, reference);
		});
	}
	else {
		// Just a skip() call
		_parent->addElementToOffset<T>(uint64_t(&returned) - uint64_t(_parent));
	}
	return returned;
}

When the vector of functors is filled, the serialisation() method needed by Serialisable is implemented by SerialisableBrief:

void serialisation() final override {
	for (auto& it : _serialisers)
		it(this);
}
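To show how the pieces fit together, here is a heavily simplified, self-contained sketch. It is not the code from the repository: BriefSketch is a made-up class that supports only int members, takes the initial value as a second argument instead of using an auxiliary object, predicts offsets purely from the layout rules, and uses a std::map in place of a real serialised document. It relies on the same layout assumptions as the rest of the article, which the language standard does not guarantee.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical, int-only stand-in for SerialisableBrief.
class BriefSketch {
public:
    using Storage = std::map<std::string, int>;

    // Copy every registered member into `data`
    void save(Storage& data) { run(data, true); }
    // Overwrite every registered member that has an entry in `data`
    void load(Storage& data) { run(data, false); }

protected:
    // Registers a member called `name` and returns its initial value
    int key(const std::string& name, int initial) {
        // Layout prediction: round the running offset up to the alignment of int
        std::size_t offset = (_nextOffset + alignof(int) - 1) / alignof(int) * alignof(int);
        _nextOffset = offset + sizeof(int);
        _serialisers.push_back([offset, name](BriefSketch* self, Storage& data, bool saving) {
            // Reach the member through its predicted offset from the object's start
            int& member = *reinterpret_cast<int*>(reinterpret_cast<char*>(self) + offset);
            if (saving)
                data[name] = member;
            else if (data.count(name))
                member = data[name];
        });
        return initial;
    }

private:
    void run(Storage& data, bool saving) {
        for (auto& serialiser : _serialisers)
            serialiser(this, data, saving);
    }

    // Assumption: the first derived member starts right after the base class
    std::size_t _nextOffset = sizeof(BriefSketch);
    std::vector<std::function<void(BriefSketch*, Storage&, bool)>> _serialisers;
};

struct Device : BriefSketch {
    int timeout = key("timeout", 1000);
    int retries = key("retries", 3);
};
```

Calling save() on a fresh Device fills the map with both members, and load() writes changed map entries back into the object.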

The code is on GitHub. The actual code is slightly different because the serialisation parts are cached and not stored individually for each instance. It also contains additional code for actually initialising the values, which was largely skipped here.

Conclusion

At the cost of runtime inefficiency and possibility of causing errors by incorrect use, it’s possible to make a C++ class serialisable into a human readable format without a single extra line of code per class. Although its cost may be too high to justify the deduplication of information, the individual tricks may be useful elsewhere.
