Lords Of Tech

Stop saying C/C++. There is C and then there is C++.

Ever been criticised for using totally working C code in a C++ codebase just because it wasn’t proper C++? Here’s why you shouldn’t have done it.

C is a minimalistic programming language that can be compiled to nearly anything and its functions are used in nearly any modern programming language.

C++ is a general purpose object oriented language designed to combine the power of low level languages with the convenience of high level languages.

A major selling point of C++ was the possibility to use the C ecosystem in it. And this led to a belief that these two are somehow a single language, C/C++. Libraries are advertised with it. Job offers ask for it. Some YouTube videos even defend it. This draws a lot of ire from C++ developers.

And it deserves all of it.

First, what is it as a language?

By definition, it’s a hybrid between C and C++. Using any C++ requires C++ compilers, because C compilers cannot compile C++, but any C++ compiler can compile C (this is easy to say when there are de facto only 3 C++ compilers).

C/C++ is mostly produced by C programmers when they use a C++ compiler. C++ has a few low hanging fruits that allow using it as a more convenient form of C. These are typically:

  • struct Something {}; instead of typedef struct {} Something;
  • Handy functions, like the stuff in std::filesystem
  • Array size as part of function argument
  • Giving multiple functions the same name, because one runs out of good names quickly
  • Using references to avoid having to write all those ->s
  • Using std::string, because who the hell would want to do all the allocation code
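A minimal sketch of what this style tends to look like, using made-up names (Sample, describe, total) that do not come from any real codebase. It is C in spirit, with a handful of the conveniences listed above:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// No typedef needed: the struct name is already a type name in C++.
struct Sample {
	int id;
	float weight;
};

// Overloading: two functions share one name and differ by parameter type.
std::string describe(int id) {
	return "id " + std::to_string(id);
}

// A reference instead of a pointer, so no -> is needed.
std::string describe(const Sample& sample) {
	return describe(sample.id) + ", weight " + std::to_string(sample.weight);
}

// The array size is part of the parameter type and gets deduced.
template <std::size_t Size>
float total(const Sample (&samples)[Size]) {
	float sum = 0;
	for (std::size_t i = 0; i < Size; i++)
		sum += samples[i].weight;
	return sum;
}
```

None of this is wrong on its own; the trouble starts when it is combined with C-style memory management.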

Alternatively, it’s based on the simplification by HR people (and probably also managers) that because C code is valid C++ code, programmers who know C also know C++. Although both of these statements are incorrect, the latter is more incorrect. Haphazard usage of C code in C++ is utter shitcode. Slow to write, hard to read, prone to errors. But AFAIK universities often don’t teach C++, just some Python as an introduction to programming, C for low level understanding and Java for OOP; so people who know C are more abundant and useful for filling the recruitment quotas.

If not done carefully, this leads to a huge number of problems.

Random mixture of C and C++ is full of traps

Consider this code:

struct Stuff {
	float size;
	std::string name;
};

Stuff* makeStuff(int amount, char** names) {
	Stuff* made = (Stuff*)malloc(sizeof(Stuff) * amount);
	for (int i = 0; i < amount; i++) {
		made[i].name = names[i];
	}
	return made;
}

void freeStuff(Stuff* freed) {
	free(freed);
}

It compiles. And superficially appears to do what it’s supposed to do. After better testing (possibly pushing it and running it in a less isolated scenario), it will be revealed that it leaks memory and possibly can do a double free as well. If this was more convoluted, like a realistic scenario would be, this would be rather hard to spot.

If you don’t see it, the problem is that the constructor and destructor of Stuff are never called and thus the constructor and destructor of std::string are also never called. Because a std::string with zeroes on all of its bytes typically means it has zero size, zero capacity and null pointer for allocated data, the assignment usually won’t try to deallocate any previous data. A short std::string with less than 15 bytes (plus terminating zero) typically doesn’t hold any dynamically allocated data, so skipping its destructor will not leak anything if it was tested with some "Hello world!" or "abc123".
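To make the missing calls visible, here is a small sketch with a hypothetical Tracked type that counts its own constructions and destructions. new[] and delete[] run them for every element, which is exactly the work that malloc() and free() skip:

```cpp
#include <cassert>
#include <string>

// Hypothetical type for illustration: counts how many instances were
// constructed and destroyed.
struct Tracked {
	static inline int constructed = 0;
	static inline int destroyed = 0;
	std::string name; // Needs a real constructor and destructor, like Stuff

	Tracked() { constructed++; }
	~Tracked() { destroyed++; }
};

// Creates and destroys an array the C++ way. Returns the number of
// constructor calls if each was matched by a destructor call, -1 otherwise.
int countPairedLifetimes(int amount) {
	assert(amount >= 0);
	int constructedBefore = Tracked::constructed;
	int destroyedBefore = Tracked::destroyed;

	Tracked* made = new Tracked[amount]; // Runs amount constructors
	for (int i = 0; i < amount; i++)
		made[i].name = "a name long enough to need a heap allocation";
	delete[] made; // Runs amount destructors, then frees the memory

	int newlyConstructed = Tracked::constructed - constructedBefore;
	int newlyDestroyed = Tracked::destroyed - destroyedBefore;
	return newlyConstructed == newlyDestroyed ? newlyConstructed : -1;
}
```

With malloc() and free() in place of new[] and delete[], both counters would stay untouched and the strings would never be set up or released properly.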

Okay, so let’s remove that std::string:

struct Stuff {
	float size;
	const char* name;
};

Stuff* makeStuff(int amount, char** names) {
	Stuff* made = (Stuff*)malloc(sizeof(Stuff) * amount);
	for (int i = 0; i < amount; i++) {
		made[i].name = names[i];
	}
	return made;
}

void freeStuff(Stuff* freed) {
	free(freed);
}

This will probably work. Well, probably. A typical compiler knows you’re making a common mistake and will be merciful. But C++ mandates that all objects except char* are created with constructors, casting from char* or by memset() (or something that uses these internally, like smart pointers). Taking the void* returned by malloc() and using it as a pointer to a struct type is undefined behaviour (C++20 relaxed this for simple aggregate types like this one, but earlier standards give no such guarantee). The compiler can assume in the rest of the code that it never happened and optimise it so hard that it won’t work. But what can possibly happen here?

Well, it is free to assume that undefined behaviour never happens. Because Stuff* made = (Stuff*)malloc(sizeof(Stuff) * amount); is undefined behaviour, it can assume the function will never be called, thus any branches that lead to it being called can be removed, and it can retroactively assume the variables have values for which those branches are not taken:

float useStuff(int amount, char** names, float (*worker)(Stuff*)) {
	std::cout << "Making " << amount << " instances" << std::endl;
	
	Stuff* made;
	if (amount != 0)
		made = makeStuff(amount, names);
		
	float result = 0;
	for (int i = 0; i < amount; i++)
		result += worker(&made[i]);
	
	if (amount != 0)
		freeStuff(made);
		
	return result;
}

If the compiler knows the definition of makeStuff(), it can assume it will never be called. The assumption that it will never be called implies that the variable amount will always be equal to 0. Thus, it can simply unconditionally print Making 0 instances, ignore the for loop and everything else in the function and return 0.

As a result, this code is at the compiler’s mercy that it will understand that the casting after malloc was meant to actually happen. Of course, if the codebase is larger and the cases are more complex, then good luck finding out why -O3 causes some conditions to evaluate wrongly.

This can be fixed by changing that one line to:

Stuff* made = (Stuff*)(char*)malloc(sizeof(Stuff) * amount);

Casting to char* is correct and casting from char* is correct. This is for accessing raw bytes as structured data and vice versa, which C++ needs to support. So in this case, the use of malloc() is okay. But is the code okay? Well, it still isn’t.
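Another way to keep malloc() while leaving no doubt about object lifetimes is to construct the objects explicitly in the raw storage. The sketch below uses the standard std::uninitialized_value_construct_n and std::destroy_n from <memory> (C++17). It is a different technique from the cast shown above, and the extra amount parameter of freeStuff() is an assumption made for the illustration:

```cpp
#include <cassert>
#include <cstdlib>
#include <memory>
#include <new>

struct Stuff {
	float size;
	const char* name;
};

// Allocates raw storage with malloc() and then starts the lifetime of each
// Stuff explicitly by value-initialising it in place.
Stuff* makeStuff(int amount, const char* const* names) {
	assert(amount > 0);
	void* storage = std::malloc(sizeof(Stuff) * amount);
	if (!storage)
		throw std::bad_alloc();
	Stuff* made = static_cast<Stuff*>(storage);
	std::uninitialized_value_construct_n(made, amount); // size = 0, name = nullptr
	for (int i = 0; i < amount; i++)
		made[i].name = names[i];
	return made;
}

// Ends the lifetimes (nothing to do for this trivial type, but correct for
// any type) and only then releases the storage.
void freeStuff(Stuff* freed, int amount) {
	std::destroy_n(freed, amount);
	std::free(freed);
}
```

The price is that the caller has to remember the element count until cleanup, which is one more thing a container would track automatically.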

The worker function is a free C++ function. It can contain typical C++ code like this:

float greatWorker(Stuff* stuff) {
	std::string checked = stuff->name; // Work on a copy of the name
	for (char& letter : checked)
		if (letter == ',') // Get rid of decimal commas
			letter = '.';
	stuff->size = std::stof(checked); // Throws if no conversion can be done
	return stuff->size;
}

This would sometimes leak memory. What is the problem? It’s quite subtle. The reference will reveal that std::stof will throw an exception if the conversion could not be done. The exception would propagate through useStuff(), skip the call to freeStuff(), leak memory and reach a catch block in the calling code that may handle it correctly. If this was proper C++ code, there would be a class whose destructor would free the memory (which can be done for the sole reason of being tired of tracking the origins of leaked memory due to forgotten cleanups). If that isn’t done for some reason, worker has to be declared as float (*worker)(Stuff*) noexcept.
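A sketch of the destructor-based cleanup, assuming the array still has to come from the C allocator: a std::unique_ptr with a custom deleter calls free() on every exit path, including the exceptional one. The names FreeDeleter, failingWorker and freeCount are invented for the illustration, and this useStuff() is simplified compared to the one above:

```cpp
#include <cassert>
#include <cstdlib>
#include <memory>
#include <new>
#include <stdexcept>

struct Stuff {
	float size;
	const char* name;
};

int freeCount = 0; // Only here so that the cleanup can be observed

// Deleter that releases memory obtained from the C allocator.
struct FreeDeleter {
	void operator()(Stuff* freed) const noexcept {
		std::free(freed);
		freeCount++;
	}
};

// Stand-in for a worker whose conversion fails.
float failingWorker(Stuff*) {
	throw std::invalid_argument("not a number");
}

float useStuff(int amount, float (*worker)(Stuff*)) {
	assert(amount > 0);
	// calloc() zero-fills the storage
	std::unique_ptr<Stuff, FreeDeleter> made(
			static_cast<Stuff*>(std::calloc(amount, sizeof(Stuff))));
	if (!made)
		throw std::bad_alloc();

	float result = 0;
	for (int i = 0; i < amount; i++)
		result += worker(&made.get()[i]); // May throw; made is freed anyway
	return result;
}
```

Calling useStuff(3, failingWorker) lets the exception reach the caller, and freeCount still ends up at 1 because the deleter runs during stack unwinding.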

Let us assume that at this point, someone is fed up with all the problems caused by this code and rewrites it to modern idiomatic C++ (assuming the function that uses it is altered accordingly):

struct Stuff {
	float size = 0;
	std::string_view name;
};

std::unique_ptr<Stuff[]> makeStuff(std::span<std::string_view> names) {
	auto made = std::make_unique<Stuff[]>(names.size());
	for (size_t i = 0; i < names.size(); i++) {
		made[i].name = names[i];
	}
	return made;
}

The code is shorter, reliable and, maybe surprisingly, slightly faster. It’s because the C version of the code contained a slight inefficiency. std::string_view contains the size of the string, so strlen() is called at most once. Actually, the cause of the infamously long loading times in the GTA V game was a strlen() call on every loop while iterating through a string.
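The strlen() pattern can be sketched as follows. Both hypothetical functions count commas; the first re-measures the string on every iteration unless the compiler manages to hoist the call, while the second carries the length along in a std::string_view:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string_view>

// C style: strlen() walks the whole string each time the loop condition is
// evaluated, which makes the loop quadratic if the call is not optimised out.
std::size_t countCommasSlow(const char* text) {
	assert(text != nullptr);
	std::size_t commas = 0;
	for (std::size_t i = 0; i < std::strlen(text); i++)
		if (text[i] == ',')
			commas++;
	return commas;
}

// C++ style: the view already knows its length, nothing is measured again.
std::size_t countCommasFast(std::string_view text) {
	std::size_t commas = 0;
	for (char letter : text)
		if (letter == ',')
			commas++;
	return commas;
}
```

Both return the same count; only the amount of work differs, and the difference grows with the length of the input.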

Even small pockets of mixing C and C++ code can spaghettify an entire codebase

As pointed out earlier, any functions that pass through C code have to be noexcept. Any code between an unprotected malloc and free has to be either noexcept or placed in a catch block that catches all exceptions. This is hard to verify and often ends up in forbidding exceptions in the entire codebase, often ending with codebases that explicitly handle even the least probable of errors on every level like this:

Stuff1 stuff1;
Error error = something.getStuff1(someArgument, stuff1);
if (error != Okay) {
	return error;
}

Stuff2 stuff2;
error = stuff1.getStuff2(stuff2);
if (error != Okay) {
	return error;
}

Stuff3 stuff3;
error = findStuff3(stuff2, stuff3);
if (error != Okay) {
	return error;
}

With better error handling it could look like this:

Stuff3 stuff3 = findStuff3(something.getStuff1(someArgument).getStuff2());
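To show what makes such a one-liner possible, here is a sketch with invented stand-ins for Stuff1, Stuff2, Stuff3 and Something. Each step returns its result directly and throws on failure, so the caller decides where (and whether) to handle the error:

```cpp
#include <cassert>
#include <stdexcept>

// All types, values and failure conditions below are hypothetical.
struct Stuff3 { int value; };
struct Stuff2 { int value; };

struct Stuff1 {
	int value;
	Stuff2 getStuff2() const {
		if (value < 0)
			throw std::runtime_error("Stuff1 holds no Stuff2");
		return Stuff2{value + 1};
	}
};

struct Something {
	Stuff1 getStuff1(int argument) const { return Stuff1{argument}; }
};

Stuff3 findStuff3(const Stuff2& from) { return Stuff3{from.value * 2}; }

// One catch block at the level that can react replaces a check per call.
int lookUp(const Something& something, int argument) {
	try {
		return findStuff3(something.getStuff1(argument).getStuff2()).value;
	} catch (const std::runtime_error&) {
		return -1; // Fallback chosen arbitrarily for the sketch
	}
}
```

If no level can do anything useful about the failure, the try block can be dropped entirely and the exception simply travels further up.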

Proper C code might have the same brevity using longjmp() for error handling, but it skips C++ destructors which renders it almost unusable with C++.

Thus, mixing of the two can increase the line count even tenfold compared to either of them. As a bonus, it probably won’t be possible to make the debugger interrupt on error.

Note: I am not bashing error codes here. This is improper usage of error codes. An error code should be handled immediately after the call and an appropriate action should be chosen, which is not done here (it obviously isn’t a use case for that). Handling every failed allocation or disk write is needed only by kernel code and similar stuff.

Larger codebases tend to be OOP and C++ properly supports it

The main advantage of OOP is that it’s usually easy to read and understand. If a codebase grows beyond several hundred lines, it becomes hard to keep all the functions in one’s head and the probability of more people working on it increases. It’s really practical to write the name of a variable, add a dot and let the IDE show the list of member functions it has and the exact types they need to be used correctly. It’s so useful if helper classes ensure calls that have to be done in pairs are really done in pairs. It’s convenient to avoid duplicating code for every shared functionality two classes have.
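A helper class that enforces paired calls can be as small as the following sketch. Recorder and ScopedRecording are made-up names; the pattern is the same one std::lock_guard uses for lock() and unlock():

```cpp
#include <cassert>

// Hypothetical resource whose begin() and end() calls must come in pairs.
struct Recorder {
	int depth = 0;
	void begin() { depth++; }
	void end() { depth--; }
};

// Helper class: begin() in the constructor, end() in the destructor.
class ScopedRecording {
public:
	explicit ScopedRecording(Recorder& recorder) : recorder_(recorder) {
		recorder_.begin();
	}
	~ScopedRecording() { recorder_.end(); }

	// Copying would call end() twice for a single begin()
	ScopedRecording(const ScopedRecording&) = delete;
	ScopedRecording& operator=(const ScopedRecording&) = delete;

private:
	Recorder& recorder_;
};

int depthWhileRecording(Recorder& recorder) {
	ScopedRecording recording(recorder);
	return recorder.depth; // end() still runs after the value is read
}
```

The function has no explicit end() call to forget, and an early return or an exception cannot skip it either.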

I have seen C codebases develop into some sort of pseudo-OOP, with inheritance done with composition and polymorphism with various structs containing the same nested struct with pointers to functions that cast it to the parent type and work with it, or child type identifying variables and massive switches for all variants. I have seen this done in C++ projects because… reasons. Depending on the situation, it might not even be faster, and I strongly doubt the authors benchmarked it.
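For comparison, a compact sketch of both approaches with made-up shape types. The first half is the hand-rolled pattern (a nested base struct holding a function pointer that casts back to the child type); the second half is what the language provides for the same job:

```cpp
#include <cassert>

// --- Hand-rolled polymorphism, C style ---
struct CShape {
	float (*area)(const CShape* self); // Manual "virtual function"
};

struct CSquare {
	CShape base; // Manual "inheritance": must stay the first member
	float side;
};

float cSquareArea(const CShape* self) {
	// Relies on base being the first member of CSquare
	const CSquare* square = reinterpret_cast<const CSquare*>(self);
	return square->side * square->side;
}

// --- The same thing with language support ---
struct Shape {
	virtual ~Shape() = default;
	virtual float area() const = 0;
};

struct Square : Shape {
	explicit Square(float side) : side(side) {}
	float area() const override { return side * side; }
	float side;
};

float areaOf(const CShape& shape) { return shape.area(&shape); }
float areaOf(const Shape& shape) { return shape.area(); }
```

Both versions do one indirect call per area computation, but only the second one has the compiler checking that the cast and the function pointer actually match the type.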

Although OOP is not a silver bullet that solves all design problems, it should be used when there is no reason not to use it, rather than only when there is a reason to use it. C++ has even more OOP features than Java has (virtual inheritance!), so if it can be used, then why avoid it?

Actually, using C probably won’t make it faster

There’s plenty of C++ features that can’t make the code slower (but can make it faster by allowing the compiler to make more assumptions when optimising):

  • Classes
  • Inheritance
  • Templates (though they can hide large volumes of code, like macros)
  • Unique pointers (unless the compiler can’t determine whether it will be null)

These are typically directly translatable to identical C code. For example, a non-virtual member function is just syntax sugar over a normal function that takes this as its first argument, a template is just a smarter macro, non-virtual inheritance is just fancy composition… It’s convenience for no cost.
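A short sketch of two of these equivalences, using invented names. The member function and the free function do the same work, and the template does the job of the macro without the macro’s habit of evaluating arguments twice:

```cpp
#include <cassert>

struct Counter {
	int count = 0;
	void add(int amount) { count += amount; } // Member function
};

// What the member function roughly amounts to: a free function that
// receives the object explicitly.
void counterAdd(Counter* self, int amount) {
	assert(self != nullptr);
	self->count += amount;
}

// A macro substitutes text, so an argument may be evaluated twice.
#define LARGER_MACRO(a, b) ((a) > (b) ? (a) : (b))

// A template is type checked and evaluates each argument exactly once.
template <typename T>
T larger(T a, T b) {
	return a > b ? a : b;
}
```

For instance, LARGER_MACRO(i++, 0) with a positive i increments i twice, while larger(i++, 0) increments it once.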

Some other C++ features might be slower or faster than doing the same without them, depending on the situation:

  • Polymorphism (overhead, but can avoid branch misprediction)
  • Exceptions (can avoid branching after return, but the cost outweighs the gain if the probability of failure is above roughly 0.5%)
  • Coroutines

Of course, C++ code will be slower with lots of invisible std::string copying, wild std::map usage (std::unordered_map is less bad, but still not great) and std::function all over the code. These are still super convenient and thus very handy in code that isn’t performance critical.
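The invisible copying is easy to write by accident. In this sketch (function names are made up), the two functions differ only in a few characters of their signatures and loop headers, yet the first one duplicates every string it touches:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Copies each string once when the vector is passed by value from an lvalue,
// and once more into the loop variable.
std::size_t totalLengthCopying(std::vector<std::string> names) {
	std::size_t total = 0;
	for (std::string name : names)
		total += name.size();
	return total;
}

// Same result without copying any characters.
std::size_t totalLength(const std::vector<std::string>& names) {
	std::size_t total = 0;
	for (std::string_view name : names)
		total += name.size();
	return total;
}
```

In code that runs once at startup the difference is irrelevant; in a hot loop it can dominate the profile.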

In the world of embedded systems, polymorphism can also be useful for optimising the code for size, because it allows the same code to use different objects. Exceptions can be done without dynamic allocation and RTTI, but they still have other problems.

C++ isn’t really a superset of C

I first learned C and then I learned C++. This meant that I always had a relatively good idea what was going on under the hood, but it didn’t come without any hiccups. One of the problems was that C++ does not support arrays of runtime-determined size on stack.

int blabla(int size) {
	int data[size] = {}; // Size known only at runtime: not valid standard C++
	// ...
}

GCC accepts this as an extension to allow compiling C libraries with C++ code (rather than compiling them as C and then linking them with C++ code). But it was never in C++ and even C discourages using it. It complicates the layout of the stack and isn’t really needed, because dynamic allocations that are well confined and small enough can use alloca() (or _alloca() on Windows) instead of malloc() to perform the dynamic allocation on the stack, avoiding almost all of its cost. However, I didn’t know of this and used it occasionally until I needed to port some of my code to MSVC and it didn’t work.
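For code that has to build on every compiler, the portable replacement is a std::vector, at the price of a heap allocation that alloca() would avoid. A minimal sketch with a hypothetical function that previously might have used int data[size] = {};:

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Fills a runtime-sized, zero-initialised buffer and sums its contents.
int sumOfSquares(int size) {
	assert(size >= 0);
	std::vector<int> data(static_cast<std::size_t>(size)); // All zeroes
	for (int i = 0; i < size; i++)
		data[i] = i * i;
	return std::accumulate(data.begin(), data.end(), 0);
}
```

If the size has a small known upper bound, a std::array of that maximum size keeps the data on the stack without any extension.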

Wikipedia has an entire page about incompatibilities between C and C++, from C syntax that is not allowed in C++ to constructs with different behaviour.
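A few well-known examples of such incompatibilities, written from the C++ side (the function name makeInts is invented for the sketch):

```cpp
#include <cassert>
#include <cstdlib>

// In C, a character literal has type int; in C++, it has type char.
static_assert(sizeof('a') == 1, "would be sizeof(int) in C");

// In C, void* converts to any object pointer implicitly; C++ wants a cast.
int* makeInts(std::size_t amount) {
	// int* made = calloc(amount, sizeof(int)); // Valid C, error in C++
	return static_cast<int*>(std::calloc(amount, sizeof(int)));
}

// In C, class and new are ordinary identifiers; in C++, they are keywords,
// so a declaration like `int class = 0;` compiles only as C.
```

The first item is the nastiest kind, because the same source compiles in both languages and silently means something different.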

So, how about saying C#/C++ instead?

Because of ridiculous similarities between names of languages, people with no knowledge of either can merge even less related languages. I have seen people mentioning Java/JavaScript or C++/C#.

However, is it so wrong?

  • C++ is a statically and strongly typed general purpose programming language supporting object oriented programming and generic programming, with type inference and coroutines
  • C# is a statically and strongly typed general purpose programming language supporting object oriented programming and generic programming with type inference and coroutines

A C# programmer doing C++ would probably run into some problems due to syntactic and functional differences, but would probably adapt to the different syntax and to smart pointers, and in the end would produce cleaner object-oriented code than a C programmer whose code somehow works and who doesn’t have a clear motivation to use proper C++.

We need C#/C++ programmers more than C/C++ programmers.
