“inline” Is Yet Another Word For “Premature Optimization”

The fact that some C++ developers use the ‘inline’ keyword so much has always been a conundrum to me — I’ve never liked it. Why? First and foremost because it clutters up header files and exposes implementation details to the users of a class.

Most likely, inline aficionados believe that these disadvantages are more than compensated for by the fact that inlining gives them faster code, but this is not necessarily the case: according to the C++ standard (ISO/IEC 14882:2014), the compiler is allowed to silently ignore the ‘inline’ keyword:

“An implementation is not required to perform this inline substitution at the point of call”

Believing is not knowing, as the old saying goes. This is another reason why I don’t like the ‘inline’ keyword: it doesn’t guarantee you anything.

But let’s attack the ‘inline’ keyword from another angle. Even if we knew that declaring a method inline made it faster, shouldn’t we have to ask ourselves first if there is actually a performance case? Without profiling, without a proven need, any optimization is premature optimization, which — according to Donald Knuth — is the root of all evil. The fact that an optimization gives a local improvement doesn’t justify it sufficiently — it’s the overall improvement of the major use cases that matters. Otherwise we would implement all of our functions with inline assembly, wouldn’t we?

In the old days of C programming, developers used the ‘register’ keyword as a hint to tell the compiler what variables should be kept in registers for performance reasons. Nowadays, every C compiler is much better at allocating variables to registers than any human being. Consequently, the ‘register’ keyword has been deprecated in C11.

By the same token, today’s C++ compilers do a much better job of figuring out which functions should be inlined than we are able to do. Therefore, instead of giving hints to the compiler we should rather rely on automated, transparent inlinining that doesn’t clutter up class interfaces.

As an example, at optimization level -O2, the g++ compiler automatically inlines all functions that are small or called only once. Specifying -finline-functions (enabled by default at -O3) uses a heuristic to determine if its worthwhile to inline a function or not — without the need for any developer intervention.

To me, it’s about time that ‘inline’ goes the way of the ‘register’ keyword.

Counting Down Correctly in C

The countdown for the New Year is near to its end, so I want to take this opportunity to discuss how to implement loops that count down from an upper boundary to a lower boundary. I know it sounds mundane, but I will present a technique that is — at least in my experience — not widely known, not even amongst seasoned C coders (with the notable exception of Chuck Norris, of course).

But first, please take a moment to look at the following routine that employs a countdown for-loop and decide if it works correctly or not:

This code appears to be fine, but it has a flaw that shows only when the ‘lower’ index is 0: ‘size_t’ is an unsigned type, and when ‘i’ becomes 0, subtracting 1 yields a very large positive number (due to integer wrap-around) which in turn causes an out-of-bounds access to the given ‘array’. So what do we need to change such that the code works as expected, even for a lower bound of 0?

Most developer’s knee-jerk reaction is to change the type of the indices to a signed type, like ‘int’, but this is unfortunate, as it limits (at least halves) the available value range. As often in life, the proper solution is not to fight the enemy but to turn him into a friend: Let’s use unsigned wrap-around to our advantage:

Instead of using the greater-than operator, we now use the not-equals operator and instead of comparing against ‘lower’ we now compare against one less than ‘lower’. If ‘lower’ happens to be 0, ‘lower’ – 1 (again, due to integer wrap-around) will yield the maximum possible value representable by type ‘size_t’. The same will happen to the loop counter ‘i’ when it has a value of 0 and is decremented once more. As a consequence, the expression ‘i != lower – 1’ becomes false and the loop terminates — as desired.

A Happy New Year to all of my faithful readers!

Optimizing for Simplicity

In Jeet Kune-Do, one does not accumulate but eliminate. It is not daily increase but daily decrease. The height of cultivation always runs to simplicity. […] The height of cultivation is really nothing special. It is merely simplicity; the ability to express the utmost with the minimum. It is the halfway cultivation that leads to ornamentation.
— Bruce Lee

The old maxim “Keep It Simple, Stupid” is widely known, but unfortunately one of the most frequently violated best practices in software development. Why is simplicity so important?

Most developers assume that the reason why they shall keep their code simple is to get features out quicker, they should rather work on things that the customer urgently needs, things that provide immediate business value instead of letting him wait for the gold-plated version.

While there is nothing wrong with this interpretation, it doesn’t go far enough. Developers must strive for simplicity to achieve a high degree of maintainability: software development is an investment and software must be built such that not only today’s requirements but also future requirements can be implemented in an economic way. Don’t just think about a particular software product — think about how software evolves into a family of related products and versions over time. The most important reason for simplicity is to ensure that software can evolve with as little cost (aka. pain) as possible. Contrast this to the olden days of software development where developers added lots of flexibility up-front — flexibility that in most cases was never needed (YAGNI). These days, we rather keep it simple and adapt quickly, when the need arises.

Correctness comes first, no doubt, but the next priority is simplicity. Code must be simple such that it is easy to read and understand. Only code that is easy to comprehend can be maintained with little effort. Code that is complex (maybe because it is littered with unjustified optimizations and hard-to-grasp language features) makes changes hard and risky. Complex code is not an asset; it’s a liability that contributes to the overall technical debt.

One of today’s biggest challenges in software development is managing complexity. While there is little we can do about the essential (intrinsic) complexity of a software product, we constantly have to fight non-essential complexity; that is, complexity that arises as a side-effect, from the way we construct a software product.

So while developing code, constantly reflect and ask yourself what your code looks like, especially from another developer’s point of view. Is it easy to understand, does it even look mundane? Great! Resist the temptation to write clever code. Instead, take pride in being able to write the clearest, simplest code.

Don’t get me wrong. It’s OK to try out more complex designs and advanced language features — honing one’s skills is imperative. But always view such activities just as learning activities, like “playgrounding” and send your changes straight to /dev/null once you’re done. If you don’t have the heart to zap them, keep them on a private branch. But unless there is a compelling, justified reason you should take the high road and show true software development mastery by not integrating them with the code base.

Dangerously Confusing Interfaces III

confused.jpgJust like the other “Dangerously Confusing Interfaces” posts, this one was also inspired by a real-world blunder that I made.

Here’s the background: usually, routines that accept data via a pointer from the caller either execute synchronously or copy the data into their own internal data structures for later processing. Take the venerable ‘fwrite’ from the C standard library as an example:

‘fwrite’ blocks until the data has been written, either to disk or to an internal buffer. In either case, once ‘fwrite’ returns, it doesn’t care about the original data anymore. That’s why it’s safe (and common practice) to pass a pointer to a local buffer on the stack:

All standard library and POSIX APIs behave like ‘fwrite’, which is both, safe and convenient. However, with embedded systems, the story is different: in some cases, memory is so tight that additional buffers/internal storage can’t be afforded. Such functions don’t copy the provided data but only store a pointer to your data and expect the memory pointed-to by this pointer to be still valid long after the function call has returned. Here is an example from the AUTOSAR standard, which is used by almost all embedded automotive products:

‘NvM_WriteBlock’ is used to store data to a given non-volatile memory block. However, what this function does is only enqueue a request for the given block ID together with the data pointer (not a copy of your data). This is done for the sake of efficiency, because there can be multiple write requests in parallel. The queue is later processed in another task, long after any local buffer would have been removed from the stack.

Passing a pointer to a buffer with automatic storage is an easy mistake to make, especially since such “non-copy” interfaces are so rarely encountered. How can “write-like” interfaces that don’t make a copy of the provided data be made safer, such that misuse is less likely? Obviously, just adding documentation is not enough — nobody reads documentation, especially in the heat of the moment.

In my view, the root of the problem is that such functions accept just about any pointer. What if the caller was forced to explicitly cast the pointer to another type? A type with a cunningly chosen typename, one that reminded the caller of the potential pitfall? Here is my approach:

Whenever a pointer is passed to this function, developers have to write something like this to make the compiler happy:

Typing ‘uncopied_memory’ should shake up even the most focused developers and remind them to double-check what they are passing into this function.

Of course, within ‘SomeWritelikeFunction’, the provided pointer needs to be cast back into something more useful, like a ‘const uint8_t*’. Further, note that the ‘dummy’ member within ‘uncopied_memory’ must not be used; it only exists to make sure that the cast to ‘uncopied_memory*’ in the calling function is safe: a pointer to a struct is aligned such that it is compatible with the struct’s most-aligned member, which is ‘void*’ and ‘void*’ is by definition compatible with any other pointer type.

Cutting-up The Perfect Sausage

In a previous post I wrote about the benefits of frequent local commits and publishing them as a perfect sausage; that is, a single, perfect commit. Today, I want to slightly adjust my advice.

It seems obvious that perfect sausages make it easier to track and review changes to a code base, but that’s not necessarily always the case. Let me explain.

Imagine a typical scenario. Developer Joe needs to implement a small feature. While coding away, he suddenly realizes that the existing code is not really prepared for the new functionality: some rework on the class hierarchy is necessary. Being this clean coder that he is, he also renames quite a few really bad identifier names, factors out some common functionality and improves upon some missing comments here and there. Eventually, he implements the feature, codes a little, tests a little, refactors even more and finally releases everything (squashed together) as a perfect sausage.

Imagine further that busy developer Sue has been given the task to peer-review what Joe implemented. Naturally, she’s got a lot of other work already on her plate and when she takes a look at Joe’s changeset, she is shocked: to implement this little feature, Joe changed 800+ lines of code, much more than the 50 lines she’d expected. Sue has to learn the new class hierarchy and skim many editorial changes. When she finally reaches the meat of Joe’s change, her attention will have decreased to a level where she might overlook that Joe introduced that nasty off-by-one error (both in the implementation and the unit tests, of course).

I believe it would have been much easier for Sue if Joe had released his initial refactorings (possibly distributed over many small local commits) as an atomic “refactoring sausage” first, followed by a smaller atomic “functionality sausage”. Then, she could have focused on either change at a time, or even skip the refactoring altogether and only review the functional modifications. Other developers could learn about the new class hierarchy by just looking at the refactoring commit — without the burden of all the changes that were necessary for the new feature.

Another benefit of doing larger refactorings in a discrete step is that the overall risk is reduced because a) any regressions will likely be found by the continuous integration machinery and b) developers working in the same area are less likely to get merge conflicts.

Differentiating between refactoring and functional commits requires discipline, though. If you find out during feature development, that some refactoring is necessary first, you just stash your current changes, do the refactoring, deploy the refactored code, pop your original changes back and continue working on the feature. But in my view, the overall benefit outweighs these little inconveniences.

Some practical advice: First, don’t get religious about it! If only trivial refactorings are required (like changing a name/comment here and there), it’s probably not worth the effort. Instead, do it as-you-go, or, for extra credit, as a “refactoring sausage” after you have released the “functional sausage”. Second, it might be useful to establish a convention that makes it clear from the commit message what category a particular commit belongs to. Here is an example that discriminates at a fine level of granularity:

Category Tag Meaning
Editorial EDI Editorial changes, executable stays binary identical
Refactoring REF Structural changes, but no change in behavior
Functional FUN Functional changes
Test TST Changes in unit tests
Tuning TUN Improved efficiency, no change in behavior

With these tags, a commit log could look like this:

The take-away is this: big commits that mix major refactorings and functionality are often detrimental to traceability and make reviews harder. As a general rule, don’t mix significant functional and non-functional changes — a sausage is easier to digest if it’s consumed in smaller pieces.

Using the C Preprocessor to Perform Compile-time Computations

Sometimes, it is desirable to perform computations already at compile-time — either for efficiency or to avoid redundancy. Alas, what a compiler can compute at compile-time is rather limited — mostly just a combination of unary and binary operators. What if you need to do something more complex?

For the sake of illustration, I chose computing the (floored) log2 of an integer as an example, but the techniques presented below can be easily adapted to other use cases:

In C++ — provided you’re brave enough — you can always resort to template metaprogramming:

But since template metaprogramming was not deliberately built into C++ but rather discovered by accident, template metaprogramming code is far from pleasant to look at. If you are lucky and your compiler supports C++11 (or rather C++11’s constexpr feature), you have a better option:

It’s still recursive, but at least this solution is using real functions and not structs to achive its goal — much easier on the eyes!

But what if you code at the other end of the spectrum? What if you are limited to plain C?

Many years ago, I discovered the following technique that has proven useful in quite a few situations; it is used like this:

The “argument” is passed via a symbolic constant (STATIC_LOG2_ARG), the computation is done by “calling the function” (by including static_log2.h) and the “return value” is stored in another symbolic constant (STATIC_LOG2_VALUE).

Here’s an excerpt of what’s contained in the static_log2.h header file:

In the C++ examples, iteration is done using recursion, but here everything is unrolled/inlined.

For another case where this approach is employed, checkout this post. You probably don’t need this technique very often, but it’s good to have it in your bag of macro tricks.