Saturday, January 7, 2012

The C++ Object Model

I've got a bunch of books lying around that I've never had a chance to thoroughly read through; every once in a while, I pick one up and read a chapter. Sometimes, like with "General Lattice Theory" by George Grätzer, I try to sprint through a chapter or two just to get a general introductory feel for a subject that I have not yet had much exposure to - after all, one cannot really hope to fully come to terms with an abstract subject matter without dedicating a reasonable amount of time to reading it and tirelessly working through some exercises. I feel that it's good to survey areas of science and engineering with which one has little familiarity, if only to receive a first taste for the material, to have something to contemplate and internalize for a few days, and most importantly, to humble oneself.

Two years ago, I acquired "Inside the C++ Object Model" by Stanley Lippman. The book is a little dated (it was published in 1996), but it remains fairly relevant nonetheless. Back in 2009, the book was more than a little beyond my ability to take in, but I have since had much experience with C++ and many of its bizarre features: multiple inheritance, virtual base classes, pointers-to-member-functions, etc. Thoroughly exhausted with abstract mathematics (at least for a few days), I found myself wanting something a bit more concrete and decided to try my hand at reading the book once more.

I have so far read only through the first chapter out of seven, but it has certainly been a pleasure. The book elucidates the truth behind several misconceptions about C++ and is ruthlessly, but necessarily, precise in its description of C++'s memory model for object-oriented programming. Lippman even takes the time to discuss the pros and cons behind alternative memory models which could achieve similar goals, and rationalizes why C++'s model is the way that it is.

C++ is something of an anomaly as a programming language. It's a systems language (yes it is, Steven!). It's an object-oriented language. It's a high-level language. It's a low-level language. It's procedural. It's generic. Hell, thanks to template-metaprogramming, it can even be functional! And yet every one of these adjectives comes with a caveat.

   It's a systems language...but it usually winds up calling lower-level C code in practice.
   It's object-oriented...but it doesn't have to be.
   It's high-level...but you could write an operating system in it.
   It's low-level...but you could write a video game engine in it.
   It's procedural...but you can have full classes and inheritance.
   It's generic...yeah, sort of. But really you're just instantiating concrete functions and classes.
   It's functional...I guess, but only as a sublanguage of a procedural language.

But most of all...

   It's fast and memory-efficient...but only if you understand how your code is compiled.

C++'s reputation for being bulky, slow, bloated, and convoluted derives primarily from an abundance of users not willing to take the time to understand how it works "under the covers". Virtual tables. Template instantiations. What is done at compile time. What is done at runtime. The biggest favor that an aspiring C++ developer can do for himself is to learn how C++ handles classes and inheritance.

For those looking to obtain a deeper understanding of how C++ handles object-oriented programming, I have provided below a list of good questions to begin with. Don't be afraid to look them up online, but be sure to, where applicable, test out what you learn in your favorite IDE (read: Visual Studio) as well!

How is an object laid out in memory?
How is ordinary inheritance implemented?
What is the effect of qualifying inheritance with an access modifier, and why is this desirable?
What is a virtual function, and how are virtual functions handled in memory?
What is a pure virtual function?
How are abstract classes implemented in C++?
Why are virtual destructors essential?
When is multiple inheritance appropriate (if ever)?
How is the diamond problem solved in C++?
What consequences does all of this yield on the layout of code in memory?
Why should one avoid virtual functions and inheritance when possible?
How can one obtain a function pointer to a member function of a class?
Why is a function pointer to a member function different from an ordinary function pointer?
How are static data members and static member functions stored differently from non-static member data and functions?
How is the const property of constant data typically guaranteed by the compiler?
How does the struct keyword differ in C++ from its usage in C?
What is the difference between using class and struct to declare an abstract data type?

That ought to be a good start. Write some toy classes, play around, and Google your heart out...Let me know if you've got any questions!