C++:  More Bang, More Bucks, But No Blank Check

 

Copyright 2012 Walter William Karas

Copy freely, do not remove copyright notice, and take credit/blame for any changes.

Version 1.2

 

It’s common to hear software people call C++ a “high overhead” language.  But rarely do you hear a specific explanation of why.  It’s true that, if you call a member function in C++, it may not translate into a “call immediate” instruction.  Whereas in C, a function call (that doesn’t use a function pointer) could typically be expected to translate to a “call immediate” machine instruction.  So does that mean C++ has high overhead?  Consider a car.  You have to stop occasionally and buy fuel to put in your car.  You don’t have to do that with a bicycle.  But people (at least in the US) are not in the habit of calling their cars “high overhead”.  If A only does as much for you as B, but A takes more of your time, money or other resources than B, then A can clearly be called high overhead.  But if A also does more for you than B, then things are not so clear.

 

Back in the early 1980’s, the buzzword “object-oriented” seem to be most often associated with the Smalltalk programming language.  A major goal of OO (or my sect of the OO religion at least) is greater reusability of code.  Inheritance was one way code could be reused.  A new type B could be create by taking an existing type A and adding stuff to it, and existing code for A could automatically be used for B.  But more importantly, if two types X and Y had some common set of  operations (with the same names, parameters and “meanings”), then code that used only this common set of operations could be used with objects (also known as variables, also known as instances) of either type X or type Y.  But this flexibility came with a performance price – in Smalltalk, when a function is called, its address has to be looked up in a hash table.  C++ (following the example of Simula) did not want to pay this price, so it made a compromise.  It limited itself to the first sort of code reuse, using inheritance, to eliminate the need for looking up called functions in a hash table.  You can tell a goal is truly important when big sacrifices are made to reach it, so this shows you that minimizing overhead is an important goal in the design of C++.

 

(When you can write code that will work for any type sharing an implicit common interface, as in Smalltalk, the implicit common interface is often called a “duck type”.  The term comes from the expression “if it quacks like a duck, it is one”.  Because the effective definition of the implicit interface to the type that is required in order to work with the reused code is that the type works with the reused code.  C++ mostly recuperated the ability to code to an implicit interface when templates were added to the language.  But templates have issues and complexities that languages like Smalltalk don’t have.)

 

In C++, if (non-template) code is going to work on objects of two different types X and Y, then X and Y need to have a common “base” class Z.  (“Base” in quotes because Z could be either X or Y.)  But this is made much less limiting by the mechanism of virtual member functions.  The common code that works for both X and Y is interfacing formally with an object of type Z.  But when this common code invokes one of Z’s member functions/operations, this can result in code specific to the “real” type of the object, X or Y, being executed.

 

C++, unlike Smalltalk, allows for multiple base classes (multiple inheritance).  This has many advantages, because like many good things, inheritance is a good thing that’s often worth doing more than once.  But the crucial need for multiple inheritance is to allow a class to have more shared interfaces.  Multiple inheritance does add a lot of picayune complexity to C++.  That’s probably why Java chose a very limited form of multiple inheritance – one inherited class, but  many inherited “interfaces”.  A Java interface is a class that’s all hat and no cattle.  It has member function prototypes, but no function bodies and no member variables.  This does eliminate some language complexity (Java does not have virtual base classes).  But it also limits the use of inheritance as a mechanism for reusing internal class code, in addition to code that is not in the class but uses it.

 

Consider this small sample of C++ code:

 

class A

  {

  private:

    int m, n;

 

  public:

    void f();

    virtual void g(int i);

  };

 

class B : public A

  {

  private:

    int s;

 

  public:

    virtual void g(int i);

    virtual void h(void);

  };

 

void foo(A *bp)

  {

    B b;

 

    b.f();

    b.g(5);

    bp->g(5);

  }

 

The object code that would result from this code would have similarity to the object code that would result from this C code:

 

struct A

  {

    int m, n;

 

    const void *v_struct_p;

  };

 

void A_f(A *a);

void A_g(A *a, int i);

 

struct A_v_struct

  {

    void (*g)(int i);

  };

 

const struct A_v_struct A_v_st = { A_g };

 

struct B

  {

    struct A a;

 

    int s;

  };

 

void B_g(A *a, int i);

void B_h(void);

 

struct B_v_struct

  {

    struct A_v_struct as;

 

    void (*h)();

  };

 

const struct B_v_struct B_v_st = { { B_g }, B_h };

 

void foo(struct A *ap)

  {

    struct B b;

 

    b.a.v_struct = &B_v_st;

 

    A_f(&b.a);

    B_g(&b.a, 5);

    (((const struct A_v_struct *) (ap->v_struct_p))->g)(ap, 5);  // *

  }

 

The line of the code with the  // * comment illustrates the cost of a call to a virtual member function.  Notice there is no hash table to be found.  The cost is more than for a “normal” function call but it’s not huge.  And note that the cost is only incurred when the virtual function is called through a pointer (or reference).  It’s not incurred for the first call to the member function B::g().  Every base class (that is not also a derived class) with virtual member functions (and/or virtual base classes) has the overhead of a hidden pointer to the appropriate “V struct” (V for virtual).  The “V struct” gets bigger with more virtual functions, but there is (at most) one “V struct” instance needed per class.  The expression of C++ features in object code can be clever and tricky.  But it’s not that hard to get your head around the rough relative cost , in space and execution time, of using a feature in a particular case.  Using a C++ feature never gives the compiler a blank check to insert  unpredictable quantities of slow object code, as some seem to fear.

 

(Side note:  you will hear the rule of thumb “always make destructors virtual”.  I think there are even some compilers that will issue a warning if a destructor isn’t virtual.  This is because, if an object is allocated using new, then freed by delete with a pointer to a base class, the base class’s destructor will be called, not the destructor for the full derived class.  So if there isn’t already something virtual in the class, think whether you really need, or want for future safety, to make the destructor virtual and thus incur the overhead of a hidden pointer in every instance of the class.)

 

And then of course there are templates.  There are lots of good reasons to be terrified of templates, but using templates does not by itself cause any performance penalty.  None, zero, zip.  The C macro processor (in Standard C) conceptually runs after tokenization but before parsing.  Templates are like macros that run after parsing but before object code generation.  They are generally much more convenient to use than equivalent macros.  But often one encounters some esoteric and picayune rules when using templates, and even more so when writing them.  C++ compilers have very very painful error messages for mistakes involving templates.  A C++ compiler is doing well if its template-related diagnostics are merely very paintul.

 

In C, code is often made reusable by using void pointers.  But this leads to a lot of casting, and thus to errors that the compiler can’t detect, and that are hard to diagnose at runtime.  You can rewrite this code as a template, and all the casting will go away.  But, problem is, the old C version also resulted in common object code as well as common source code.  Each time you invoke a template, like a macro, it effectively takes the code in the template definition, does the appropriate token substitution, and feeds it all to the code generator.  Thus there is no sharing of object code.  Often, it turns out that the best compromise is to use a template or set of templates to create a thin wrapper for the original C code with the void pointers.  The wrapper will contain all the ugly, error-prone casting.  The big advantage being, you only have to get it right once.

 

But perhaps the most fear-provoking feature in C++  is exceptions.  I personally have never used them.  It is possible, however, to implement exceptions in a way that there is no performance penalty if an exception is not thrown.  The compiler can construct a hash table where the keys are all the possible return adresses in the executable.  When an exception is thrown, the destructors of the objects in each stack frame must be called, up to the stack frame where the exception is caught.  Each stack frame contains the return address within the calling function, which is looked up in the aforementioned table.  The result of the table lookup is a pointer to code that calls the destructors of the objects that exist in the calling function at the point of the return address.  So clearly, using exceptions could potentially bulk up the size of an executable a great deal, but  not necessarily the execution time.  I think it’s the case that GNU C++ uses this approach to exceptions.  I’ve heard that Microsoft C++ uses an imlementation that does have a significant preformance cost even if you never throw, but this information may be out of date.