Cutting-up The Perfect Sausage

In a previous post I wrote about the benefits of frequent local commits and publishing them as a perfect sausage; that is, a single, perfect commit. Today, I want to slightly adjust my advice.

It seems obvious that perfect sausages make it easier to track and review changes to a code base, but that’s not necessarily always the case. Let me explain.

Imagine a typical scenario. Developer Joe needs to implement a small feature. While coding away, he suddenly realizes that the existing code is not really prepared for the new functionality: some rework on the class hierarchy is necessary. Being this clean coder that he is, he also renames quite a few really bad identifier names, factors out some common functionality and improves upon some missing comments here and there. Eventually, he implements the feature, codes a little, tests a little, refactors even more and finally releases everything (squashed together) as a perfect sausage.

Imagine further that busy developer Sue has been given the task to peer-review what Joe implemented. Naturally, she’s got a lot of other work already on her plate and when she takes a look at Joe’s changeset, she is shocked: to implement this little feature, Joe changed 800+ lines of code, much more than the 50 lines she’d expected. Sue has to learn the new class hierarchy and skim many editorial changes. When she finally reaches the meat of Joe’s change, her attention will have decreased to a level where she might overlook that Joe introduced that nasty off-by-one error (both in the implementation and the unit tests, of course).

I believe it would have been much easier for Sue if Joe had released his initial refactorings (possibly distributed over many small local commits) as an atomic “refactoring sausage” first, followed by a smaller atomic “functionality sausage”. Then, she could have focused on either change at a time, or even skip the refactoring altogether and only review the functional modifications. Other developers could learn about the new class hierarchy by just looking at the refactoring commit — without the burden of all the changes that were necessary for the new feature.

Another benefit of doing larger refactorings in a discrete step is that the overall risk is reduced because a) any regressions will likely be found by the continuous integration machinery and b) developers working in the same area are less likely to get merge conflicts.

Differentiating between refactoring and functional commits requires discipline, though. If you find out during feature development, that some refactoring is necessary first, you just stash your current changes, do the refactoring, deploy the refactored code, pop your original changes back and continue working on the feature. But in my view, the overall benefit outweighs these little inconveniences.

Some practical advice: First, don’t get religious about it! If only trivial refactorings are required (like changing a name/comment here and there), it’s probably not worth the effort. Instead, do it as-you-go, or, for extra credit, as a “refactoring sausage” after you have released the “functional sausage”. Second, it might be useful to establish a convention that makes it clear from the commit message what category a particular commit belongs to. Here is an example that discriminates at a fine level of granularity:

Category Tag Meaning
Editorial EDI Editorial changes, executable stays binary identical
Refactoring REF Structural changes, but no change in behavior
Functional FUN Functional changes
Test TST Changes in unit tests
Tuning TUN Improved efficiency, no change in behavior

With these tags, a commit log could look like this:

The take-away is this: big commits that mix major refactorings and functionality are often detrimental to traceability and make reviews harder. As a general rule, don’t mix significant functional and non-functional changes — a sausage is easier to digest if it’s consumed in smaller pieces.

Using the C Preprocessor to Perform Compile-time Computations

Sometimes, it is desirable to perform computations already at compile-time — either for efficiency or to avoid redundancy. Alas, what a compiler can compute at compile-time is rather limited — mostly just a combination of unary and binary operators. What if you need to do something more complex?

For the sake of illustration, I chose computing the (floored) log2 of an integer as an example, but the techniques presented below can be easily adapted to other use cases:

In C++ — provided you’re brave enough — you can always resort to template metaprogramming:

But since template metaprogramming was not deliberately built into C++ but rather discovered by accident, template metaprogramming code is far from pleasant to look at. If you are lucky and your compiler supports C++11 (or rather C++11’s constexpr feature), you have a better option:

It’s still recursive, but at least this solution is using real functions and not structs to achive its goal — much easier on the eyes!

But what if you code at the other end of the spectrum? What if you are limited to plain C?

Many years ago, I discovered the following technique that has proven useful in quite a few situations; it is used like this:

The “argument” is passed via a symbolic constant (STATIC_LOG2_ARG), the computation is done by “calling the function” (by including static_log2.h) and the “return value” is stored in another symbolic constant (STATIC_LOG2_VALUE).

Here’s an excerpt of what’s contained in the static_log2.h header file:

In the C++ examples, iteration is done using recursion, but here everything is unrolled/inlined.

For another case where this approach is employed, checkout this post. You probably don’t need this technique very often, but it’s good to have it in your bag of macro tricks.

Why I don’t Use Apple Products

Every now and then, people ask me why I steer clear of iThings. Explaining my view over and over again is tedious, so I’ve decided to present my reasons in writing. When this topic comes up next time I just need to point people to this post, which will save a lot of time on both sides.

Open-source veteran Richard Stallman, founder of GNU and the Free Software Foundation (FSF) has already contributed his share to the topic. I do agree with most of his arguments but I still want to tell the story from my point of view.

First of all, let me emphasize that it has not always been like that. As a matter of fact, I used to be an Apple fan myself, even though this was in the early 1980s. My first home computer was an Apple II clone and I badly wanted to own a real Apple II, I can tell you. Alas, it was too expensive for an eighth grader’s measly monthly allowance.
»Read More

Bug Hunting Adventures #11: Bad Weather

“It is only in sorrow bad weather masters us;
in joy we face the storm and defy it”
— Amelia Barr

Imagine a weather monitoring system where environmental data is collected by various sensors and distributed via messages to other components for further processing.

In the code below, produce_env_measurement() represents a task that constantly produces messages containing various environmental measurements while another task (represented by process_env_measurement()) consumes them. To ensure data integrity, a Fletcher-16 checksum is appended to every message, but the application nevertheless doesn’t work reliably. Where’s the bug?


Distrusting Experts

“You don’t know what you don’t know until you know it.”
— anon

I recently wrote a post about Scott Meyers, one of my programming heroes. The one I’m writing about today ranks even higher: Michael Abrash. For decades, I have admired his ability to craft awesome, non-obvious, super-tight code. But he’s not just an excellent performance programmer; he is also an execellent writer and storyteller.

It’s no coincidence that he’s also the author of one of my favorite programming books: “ZEN of Code Optimization” [1]. Released in 1994, it predates the Internet age (lo and behold, it comes with a 3.5″ floppy disk), but most of the content is timeless advice for every developer who cares about performance. (We all should. Michael’s rule #1: “From the user’s perspective, performance is fundamental”.)

In chapter 10, “Patient Coding, Faster Code”, he explains that just doing micro optimizations (or recoding in assembly language) usually doesn’t cut it; all you end up with is “fast slow code”. For best results, you need key insights; that is, view the problem from a different angle and use an offbeat approach to solve it.

He presents the venerable “greatest common divisor of two given numbers” problem as a case in point. After toying with a brute-force approach (incrementing a variable and checking if it divides both numbers) he employs Euclid’s key insight, which improves the run-time by orders of magnitude:

gcd(int1, int2) ≡ gcd(int2, int1 % int2)

Or in plain English: the GCD of a larger positive integer ‘int1’ and a smaller positive integer ‘int2’ is the same as the GCD of ‘int2’ and the remainder of ‘int1’ divided by ‘int2’.

This is his original code:

Needless to say, being that assembly wizard that he is, he goes on and reimplements this routine in assembly language to make it even faster. But as for C, this is probably as fast as it can get. Or is it?

I’ve used this code blindly and almost literally for many, many years. After all, it’s from Michael Abrash, and to me, his code has always been beyond any doubt. Definitely not the right attitude, but let me explain.

I’m a big fan of code katas, little exercises that you do (and redo time and again, sometimes in different programming languages) to flex your programming muscle. I don’t know why, but when I did the GCD code kata this time, I stopped and scratched my head. What the heck is this int swapping business about? Why is this code necessary, as the comment says?

Just for the fun of it, I took it out and reran the unit tests — they all passed.

Actually, it’s not that hard to see why it’s not necessary. Let’s assume ‘int1’ is 366 and ‘int2’ is 60. Since ‘int1’ is greater than ‘int2’, no swapping is necessary. Within the ‘for’ loop, the variables take on the following values:

iteration temp int1 int2 return
1 366 % 60 60 6
2 60 % 6 60 6 6

Now if ‘int1’ is 60 and ‘int2’ is 366, the code as it is would swap their values, but let’s pretend there was no swapping code:

iteration temp int1 int2 return
1 60 % 366 366 60
2 366 % 60 60 6
3 60 % 6 60 6 6

Aha! There is one more iteration and this additional iteration (actually the very first iteration) swaps ‘int1’ and ‘int2’. The ‘for’ loop by itself takes care of rearranging the arguments, there is no need for the extra swap test at the beginning of the ‘gcd’ routine. I was puzzled. Is it really possible that I have discovered redundant code in Michael Abrash’s sacred code? I felt quite uneasy because I wanted my hero to be infallible.

A couple of days ago, I sent him an email about my findings. His reply was short and to the point:

“Believe me, I never claimed to be immune to cycle-wasting! And anyway, There Ain’t No Such Thing As The Fastest Code :)”

There you have it. It’s that simple. I found unnecessary code in a 20+ years old routine. That’s what I call delayed gratification!

The basic problem was that I blindly trusted what I saw, probably because it came from an expert. However, nobody is perfect, every once in a while even the best make blunders. Have a flexible mind, assume nothing and question everything — this is the right attitude for every software developer and proclaimed in many of Michael’s writings [2].

[1] “ZEN of Code Optimization” is available online as part I of “Michael Abrash’s Graphics Programming Black Book, Special Edition”.

[2] Some years ago, I gave a presentation titled “The Art of Writing Efficient Software”. It includes a lot of Michael Abrash’s advice plus more. Check it out here. I truly believe that efficiency is still (and forever will be) of utmost importance for end users.

A Micro Logger for Constrained Systems

Drei Notizzettel vor HolzhintergrundLogging is one of the oldest and most widely used software debugging tools: for decades, developers have put printf-like statements in their code to find out what their code is really doing, in situ. Even today, logging is still crucial for finding bugs in software, especially sporadic problems in concurrent and event-driven systems.

Fortunately, most programming environments come with powerful logging facilities (like Android’s Log) — but what if you are working on a highly-constrained system that simply can’t afford them?

Let’s have a look at an example that uses printf-like logging:

Traditional (printf-like) logging is expensive because:

  1. Log strings consume memory (ROM), which can easily increase the overall size of the executable by 10% to 30%.
  2. printf-based logging is extremely flexible, but the parsing of format strings can incur a significant run-time overhead.
  3. The amount of log data produced at run-time is quite large because all the log strings need to be transmitted (or stored locally, in case you keep logging data in a file), usually accompanied with timestamps and other bookkeeping information.

I recently had a crazy idea how to compress a log string of (almost) arbitrary size to a single byte: Instead of using real (read: expensive) C strings, why not define a constant whose symbol name serves as the “log string” and only log it’s address?

On the receiving side, you can reconstruct the original log strings from the map file produced by the linker.

Given an appropriate set of magic macros, the example above looks like this:

It may be hard to believe, but these three “log strings” really only consume three bytes of memory. To send out their addresses, four bytes are needed (assuming 32-bit addresses are used) and ‘sampledValue’ requires the transmission of another four bytes. Let’s compare these two approaches assuming that ‘sampledValue’ is 42 (two digits) and taking into account that the size of a C string is the number of characters plus one for a trailing ‘\0’:

footprint (bytes) transmission (bytes)
traditional logging
16 + 37 + 15
16 + 37 + 15
MLOG logging
1 + 1 + 1
4 + 4 + 4 + 4
savings %

That’s what I call an improvement!

I’ve implemented this idea for the GCC toolchain. The log server emits MLOG traces via UDP, which are received by a little tool that converts them back into human-readable log messages by exchanging MLOG symbol addresses with (beautified) MLOG symbol names from the linker map file. Extracting the MLOG symbols from the map file is the most GCC-dependent part. I believe it is a straightforward task to port MLOG to other platforms and toolchains.

Check it out at GitHub.

This is my 100th blog post (actually 101st, damn)! Thanks, faithful reader, for bearing with me through all these years.
— Yours Truly