« Posts under Code

Circular Adventures VII: A Ring Buffer Implementation

“The Limited Circle Is Pure”
— Franz Kafka

In systems programming, ring buffers are ubiquitous. Like regular queues, they’re first-in first-out data structures but contrary to classic queues they’re fixed in size and don’t grow dynamically. This is especially important in real-time contexts where time and space determinism is paramount.

In today’s circular episode I want to show you my go-to ring buffer implementation. I like it, because it’s short and decently efficient. I deliberately used only basic C++ syntax, so it should compile fine with even compilers from the previous millennium (i. e. C++98):

You can specify the data type of ring buffer elements via template parameter T and the total number of elements to track via template parameter N, respectively. Here’s a little usage scenario:

A common problem that you face when implementing a ring buffer is discriminating between empty and full states, as in both cases the head and tail index point to the same location. I solved it by allocating one extra element, which might waste a little bit of space but makes the code easier on the eye and probably faster compared to the alternative approach which requires you have to maintain an extra boolean flag.

When the buffer becomes full during an add operation (head == tail), the tail index is immediately advanced by one, thus dropping the oldest element. As this all happens within the add method, from a ring_buffer user’s point of view the head index is only ever equal to the tail index when the ring buffer is empty.

Contemporary compilers will replace the potentially costly modulo operation in the advance helper method with an AND operation for buffer sizes that are base-2 numbers. However, keep in mind that the advance method uses the internal buffer size (BUFSIZE), which is one greater than the requested ring buffer size (N), so this is most likely an efficient ring buffer:

while this isn’t:

More circular adventures…

The Happy Path to Modern C++ (C++11 done!)

I’ve finally found time and motivation to resume work on the “Happy Path to Modern C++” project. I’m sure it now contains enough code snippets to give you a good overview of C++11. As a matter of fact, I can’t think of an easier way to learn C++11, provided you’re already well-versed in C++ and C++98, in particular.

Here’s the project on GitHub.

“Dèyè mòn, gen mòn”, as the Haitians say: “Behind mountains there are more mountains”. The next goal is adding plenty of C++14 snippets. Again, contributions are highly appreciated!

Dangerously Confusing Interfaces V: The Erroneous ERRORLEVEL

“Design interfaces that are easy to use correctly and hard to use incorrectly.”
— Scott Meyers

Dangerously confusing interfaces can lurk anywhere, even in the venerable (yuck!) DOS batch scripting language. Some time ago, I burnt my fingers when I made a tiny tweak to an existing batch file, deploy.bat, which was part of a larger build script:

Because we had seen the ‘copy’ command fail in the past, I tried to improve things a little by adding an ‘if’ statement to ensure that we would get a clear error message in such events:

Alas, it didn’t work. There still was no error message produced in case the copy command failed. Worse yet, the outer build script happily continued to run. Puzzled, I opened a DOS box and did some experiments:

Hmm. Everything worked as expected. Why didn’t it work in deploy.bat? Next, I changed deploy.bat to output the exit code:

And tried again:

What? The copy command failed and yet the exit code was zero? How can this be? After some head scratching, I vaguely remembered that there was another (arcane) way of checking the exit code, namely ERRORLEVEL (without the percentage signs), so I tried it out:

I never really liked this style of checking the exit code, because ‘ERRORLEVEL n’ actually doesn’t test whether the last exit code was n; it rather checks if the last exit code was at least n. Thus, this statement

doesn’t check if the exit code is zero (ie. no error occurred). What it really does is check if the exit code is greater to or equal to zero, which is more or less always true, no matter the value of the exit code. That’s pretty confusing, if you’d ask me.

Anyway, for some reason, it seemed to work nicely in deploy.bat:

I hardly couldn’t believe my eyes. The copy command obviously failed, %ERRORLEVEL% was obviously zero, still the if statement detected a non-zero exit code. What was going on? I delved deeply into the documentation of the DOS batch language. After some searching I found this paragraph:

%ERRORLEVEL% will expand into a string representation of the current value of ERRORLEVEL, provided that there is not already an environment variable with the name ERRORLEVEL, in which case you will get its value instead.

Whoa! There are two kinds of ERRORLEVEL, who knew? One (the one whose value you can query with %ERRORLEVEL%) will be set to the value of the former, provided there is no variable named ERRORLEVEL already. Now I had a suspision what was going on. I opened the parent batch file and came across the following:

In an attempt to clear the error level, some unlucky developer introduced a variable named ERRORLEVEL which shadowed the value %ERRORLEVEL% from this point on. This can be easily verified:

Once the problem was understood, it was easy to fix: clear the error level in an “accepted way” (yuck, again!) instead of wrongly tying it to zero:

Even though the interface to DOS exit codes is dangerously confusing (and disgusting as well), it facilitates a nice practical joke: next time a colleague leaves the room without locking the screen, open Windows control panel, create a new global environment variable called ERRORLEVEL and set it to 0.

Grokking Integer Overflow

Please visit xkcd.comEven experienced C/C++ programmers often mix-up the terms integer overflow and wrap-around. Likewise, they are confused about the ramifications. This post attempts to clear things up a little.

OVERFLOW AND UNDERFLOW

In C (and C++), integer types (like the signed and unsigned versions of char, short, and int) have a fixed bit-size. Due to this fact, integer types can only support certain value ranges. For unsigned int, this range is 0 to UINT_MAX, for (signed) int INT_MIN to INT_MAX. On a typical platform, these constants have the following values:

The actual values depend on many factors, for instance, the native word size of a platform and whether 2’s complement represantation is used for negative values (which is almost universally the case), consult your compiler’s limits.h header file for details.

Overflow happens when an expression’s value is larger than the largest value supported by a type; conversely, underflow occurs if an expression yields a value that it is smaller than the smallest value representable by a type. For instance:

It’s common among programmers to use the term overflow for both, overrun and underrun of a type’s value range. And so shall I, for the rest of this discussion.

WRAP-AROUND

Now that we know what overflow is, we can tackle the question what happens on overflow. One possibility is what is conventionally referred to as wrap-around. Wrap-around denotes that an integer type behaves like a circle; that is, it has no beginning and no end. If you add one to the largest value, you arrive at the smallest; if you subtract one from the smallest value, you get the largest.

Wrap-around is, however, only one way to handle integer overflow. Other possibilities exist, ranging from saturation (the overflowing value is set to the largest/smallest value and stays there), to raising an exception, to doing whatever an implementation fancies.

AND THE C LANGUAGE SAYS…

If you want to find out how C (and C++) handles integer overflow, you have to take a look at chapter 6.7.5 “Types”, the following sentence in particular:

“A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type”

Which means in plain English:

0. Apparently, there is overflow (as defined above) and C-overflow (as used by the standard). C-overflow is more like an error condition that occurs if overflow behavior is not defined for a type.

1. Unsigned integer types wrap-around on overflow, “reduced module the number that is one greater than the largest value” is just a fancy name for it. Thus, unsigned integer overflow is well defined and not called overflow by the language standard.

2. Nothing is explicitly said about signed integer types. There are, however, various hints in the standard that signed integer overflow is undefined, for instance:

3.4.3 Undefined Behavior: An example of undefined behavior is the behavior on integer overflow.
J.2 Undefined behavior: The value of the result of an integer arithmetic or conversion function cannot be represented.

To sum it up: on overflow, unsigned integers wrap-around whereas signed integers “overflow” into the realm of undefined behavior (contrary to Java and C#, BTW, where signed integers are guaranteed to wrap around).

SIGNED OVERFLOW

You might have believed (and observed) that in C signed integers also wrap around. For instance, these asserts will hold on many platforms:

Both asserts hold when I compiled this code on my machine with gcc 7.4; the following only if optimizations are disabled (-O0):

From -O2 on, gcc enables the option -fstrict-overflow, which means that it assumes that signed integer expressions cannot overflow. Thus, the expression i + 42 < i is considered false, regardless of the value i. You can control signed integer overflow in gcc, check out the options -fstrict-overflow, -fwrapv, and -ftrapv. For maximum portability, however, you should always stay clear of signed integer overflow and never assume wrap-around.

SIGNED OVERFLOW THAT ISN’T

What about this code? Does this summon up undefined behavior, too? Doesn’t the resulting sum overflow the value range of the short type?

The short (pun intended!) answer is: it depends!

It depends because before adding x and y, a conforming C compiler promotes both operands to int. Thus, to the compiler, the code looks like this:

Adding two integers that hold a value of SHRT_MAX doesn’t overflow, unless — and that’s why it depends — you are hacking away on an ancient 16-bit platform where sizeof(short) == sizeof(int).

But even on a typical 32- or 64-bit platform, what about the assignment of the large integer result to the short variable sum. This surely overflows, doesn’t it? Doesn’t this yield undefined behavior? The answer in this case is a clear ‘no’. It’s rather ‘implementation specified’. Let’s see.

SIGNED INTEGER TYPE CONVERSIONS

In the previous example, a larger signed type is converted into a smaller signed type. This is what the C99 standard has to say about it:

6.3.1.3 Signed and unsigned integers
When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.

Otherwise, if the new type is unsigned […]

Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.

What an implementation will choose to do, in practice, is wrap-around.

Why does a compiler behaves like this? You can find an explanation by Linus Torvalds himself:

“Bit-for-bit copy of a 2’s complement value. Anything else would be basically impossible for an optimizing compiler to do unless it actively _tried_ to screw the user over.”

To sum it up:

1. Unsigned integers wrap around on overflow. 100 percent guaranteed.
2. Signed integer overflow means undefined behavior. Don’t rely on wrap-around.
3. Type conversions to smaller signed types will very likely wrap-around. Let’s call this “Torvalds-defined behavior”.

The Thinking That Cost Me The Fish

“The fishermen know that the sea is dangerous and the storm terrible, but they have never found these dangers sufficient reason for remaining ashore.”
— Vincent Van Gogh

Last December, I had a lot of fun working on “Linux Magazin’s” programming contest. I was both, surprised and excited when the editors contacted me in January and told me that I got second place and would even receive a price! But when I found out about the details, my initial joy quickly subsided—partly because of them, but much more because of me. Let met explain.

THE CHALLENGE

Since “Linux Magazin” is only available in German, I will briefly summarize what the competition was all about.

Imagine there’s a fisherman (or fisherwoman, or fisherperson, whatever you prefer) sitting in a boat on a pond. The pond is rectangular in shape and divided into M x N squares. The initial position of the boat is the square at column 0, row 0. Thus, using x and y as column and row coordinates, the boat’s initial position corresponds to x = 0 and y = 0. The fisherman can steer the boat through the pond by incrementing (or decrementing) either x or y by one with every move he makes.

Each of the squares in the pond may contain a fish. The aim is to write a Python script that picks up all the fish using a minimum number of moves.

Here’s an example of a 7 x 5 pond with the boat ‘B’ at the initial position and three fish, denoted by asterisks:

One straightforward solution is to scan the whole pond, column by column, row by row:

Obviously, this exhaustive scan wouldn’t win you a prize as it requires too many moves (M multiplied by N, to be precise).

GO FOR GOLD, ER, FISH

After scratching their heads for a while, most developers will realize that this is a variant of the traveling salesman (or saleswoman, or salesperson, whatever you prefer) problem. The difference is that the fisherman, unlike the traveling salesman, doesn’t have to return to his origin after having visited all other nodes [1].

After some more head-scratching, some developers will even remember that this is an NP-complete problem and finding the optimal solution (by trying out all possible routes) requires time proportional to O(n!), where n is the number of fish in the pond.

Due to the exponential nature of this problem, a better (even if not optimal) idea is needed. A reasonable choice to start with is the “next nearest neighbor” strategy, which always visits the node (fish) that is closest to the current position next, and so on. This is what I tried out first, it required only 30 lines of code and allowed me to implement some initial unit tests [2].

Most developers have learned the hard way that it’s essential to find out about the typical use-cases before optimizing a system. Otherwise, you will end up with what is referred-to as “fast slow code”. Unfortunately, there are no boundaries in the pond problem: the pond could consist of millions of squares and there could be millions of fish. How can you differentiate yourself from all the other contestants who will most likely also employ a nearest neighbor search?

IMPROVEMENTS

I wasn’t satisfied with my nearest neighbor approach. All kinds of problems and corner cases came to my mind. Consider the following pond set-up:

According to the nearest neighbor strategy, the fisherman would first visit x = 1, y = 0 and then process the whole first row by repeatedly moving one square to the right. After reaching x = 6, y = 0, he would move back all the way to x = 0, y = 2. What a waste!

Even though I knew about this problem’s exponential nature, I though about doing an exhaustive O(n!) search at least for smaller ponds with little fish. This approach would yield the optimal solution (which would start with picking up the fish at x = 0, y = 2 first, in the example above). When the problem size crossed a certain threshold, I would fall back to the nearest neighbor strategy.

Alas, my measurements showed discouraging results. It took only a couple of seconds to solve ponds with up to ten fish. However, it took hours to find the optimal solution if there were more than 15 fish in the pond [3]. Since I assumed that the ponds against which the submissions are tested were large and contained lots of fish, I ditched this idea immediately.

Then followed a period of days of frustration, with many dead-end experiments. Two days before the submission date, I decided to try out one final idea. I knew from my previous testing that visiting all permutations was not feasible. But what I could do instead of always aiming towards the closest square containing a fish with my initial move, I could try out all cells containing fish as the initial target square and use the nearest neighbor strategy only from this square onward. Of all the experiments, I would pick the route that required the least number of moves.

This was easy to implement and increased the overall run-time only by time proportional to n, the total number of fish. Even though this approach wasn’t able to always find the optimal route, it solved my degenerate pond example nicely, even for large N, M, and n, so I submitted it.

AND THE WINNER IS… NOT ME!

When the editors contacted me about a month later, I couldn’t wait to find out why I didn’t win, what the actual test data was and so on. What I saw, however, was quite frustrating: neither of the 5 ponds they used on the submitted scripts were large or complex. In fact, all of them were laid-out in such a way that a simple nearest neighbor search would yield optimal solutions, so the improvement that I just explained to you had no effect, whatsoever. When I fed my degenerate pond to the winner’s script, it required 14 moves, while mine only needed 10.

But I lost anyway. I lost, because one pond was set up such that at some point, there where multiple, equally near fish to choose from. While my naive implementation simply picked the first one, the winning algorithm explored all choices and selected the one that resulted in the shortest overall path, which is not only smart, but also easy to implement.

LESSONS LEARNED

There’s no point in sugar-coating it: I was quite disappointed. Why were the ponds so small and simple? I could have used the sluggish O(n!) search for almost all of the ponds and won easily. But I also beat myself up for not even thinking about the possibility of multiple nearest neighbors, which would have naturally led to the improvement done by the winner.

A couple of days later, I finally got over it. After all, I came in second, had a lot of fun, and something to post about. But the biggest benefit is the valuable lesson that I (re-)learned: when working on optimization problems, assume nothing!

I assumed that the ponds where big, that’s why I didn’t include the O(n!) search, even though I had already implemented it. I assumed that the ponds where obscenely complex, which led me to implement the improvement mentioned above. I was so focused on tricky ponds that the problem of simple ponds with multiple nearest neighbors didn’t even occur to me—that’s target fixation at it’s best (or worst, if you prefer).

Once again, the journey was the real reward. What I got for a prize, you wonder? A 500+ pages “Java for Beginners” tome, in German. ‘Nuff said!

________________________________

[1] Computer scientists would call this a Hamiltonian path with a fixed start point, but no fixed end point.

[2] Why unit tests for a programming contest? Because a fast solution that is incorrect is worth nothing! Some of my first tests revealed that a fish at the initial boat position (x = 0, y = 0) wasn’t picked up. Another test showed that I forgot to update the pond after picking up the fish, which obviously makes a difference when you come across a cells containing a fish twice. Unit tests are always necessary and always good.

[3] Python + O(n!) == no-no

Pointers in C: Part VII, Being Relaxed About The Strict Aliasing Rule

“I am free, no matter what rules surround me. If I find them tolerable, I tolerate them; if I find them too obnoxious, I break them. I am free because I know that I alone am morally responsible for everything I do.”
― Robert A. Heinlein

The largely unknown “Strict Aliasing Rule” (SAR) has the potential to send tears to the eyes of even the most seasoned C/C++ developers. Why? Because of it, a lot of the code they have written over the years belongs to the realm of “undefined behavior”.

Despite its name, the term “undefined behavior” itself is well-defined by the C language standard: it’s “behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements. Which means anything can happen: your program could randomly crash or even send suggestive emails to your boss.

THE PROBLEM

Let’s start with the code snippet that I used in my original post on SAR:

Here, data that has been received into a buffer (‘data’) is converted into a high-level data structure (‘measurements’). From the compiler’s point of view, what ‘data’ refers to is just a single ‘uint8_t’ but we access it through a pointer to type ‘struct measurements_t’. What we’ve got here is a clear violation of SAR, which entails undefined behavior.

SAFE ALTERNATIVES

“But, Ralf”, you might respond, “this can’t be true. I write code like this every day and it works flawlessly, even in safety-critical systems like medical devices!”

This doesn’t surprise me in the least. “Undefined behavior” can — get this — also mean “works flawlessly”. But there are no guarantees, whatsoever. It might work on one platform, with a particular compiler or compiler version, but might fail on another platform, or with a different compiler version. Hence, to err on the truly safe side (which you should, especially if you work on safety-critical systems), you should use truly safe alternatives.

One obvious and time-proven approach is to do such so-called type punning through unions. It works by storing data via a member of one type and reading it via another member of a different type:

The receiving function would store byte-wise into the ‘receive_buffer.data’ array, while high-level functions would use the ‘receive_buffer.measurements’ member. This will work reliably in any version of C, but it might fail in C++.

Bulletproof type-punning, one that works in both, C and C++, uses ‘memcpy’. ‘memcpy’!? ‘memcpy’, that’s right:

Believe it or not, there’s a high probability that your compiler will optimize-out the call to ‘memcpy’. I’ve observed this, among others, with ‘gcc’ and ‘clang’, but I’ve also seen compilers always call ‘memcpy’, even for the smallest amounts of data copied, regardless of the optimization level used (Texas Instruments ARM C/C++ compiler 19.6, for instance). Nevertheless, this is my go-to type-punning technique these days, unless performance is paramount. (You first have to prove that your code really impacts overall performance by profiling. Otherwise, your optimizations are like buying Dwayne Johnson an expensive hair brush — it doesn’t really harm, but it’s not of much use, either.)

BUT I REEELLY, REEELLY MUST USE CASTS

Sometimes, you have to use SAR-breaking casts, if only to maintain social peace in your team. So how likely is it that your compiler will do something obscene?

VERY unlikely, at least in this example. Let me explain.

First of all, compiler vendors know that most developers either haven’t heard about SAR or at least don’t give a foo about it. Therefore, they usually don’t aggressively optimize such instances. This is particularly true for compilers that are part of toolchains used in deeply (bare-metal) embedded systems. However, ‘gcc’ as well as ‘clang’, which are used in all kinds of systems, take advantage of SAR from optimization level 2 on. (You can explicitly disable SAR-related optimizations regardless of the optimization level by passing the ‘-fno-strict-aliasing’ option.)

Second, what ‘convert’ is doing is pretty much well-behaved. Sure, it aliases the ‘data’ and ‘measurements’ pointers, but it never accesses them concurrently. Once the ‘measurements’ pointer has been created, the ‘data’ pointer is not used anymore. If the caller (or the whole call-chain) are equally well-behaved, I don’t see a problem (don’t trust me!).

Third, there’s no aliased read/write access. Even if ‘data’ and ‘measurements’ were used concurrently, it wouldn’t be a problem, as long as both are only used for reading data (don’t trust me on this one, either!). By contrast, this I consider harmful:

To the compiler ‘data’ and ‘measurements’ are two totally unrelated pointers to unrelated memory areas. The original value of ‘data[0]’ might be cached in a register and not refetched from memory, hence the ‘assert’ might fail. In general, this is what will most likely happen when SAR is violated in contexts where it does matter: instead of suggestive emails being sent to your boss, you are much more likely got get stale values (which of course could lead to crashes later on).

NO PUN INTENDED

Let’s get real about SAR. Here are some relaxed, pragmatic rules on how to deal with the Strict Aliasing Rule:

0. Fully understand SAR
1. Try hard to adhere to SAR
2. Type-pun using ‘memcpy’
3. If you can’t, disable SAR-related compiler optimizations
4. If you can’t, avoid concurrent, aliased read/write access

But don’t assume that just because you didn’t get a ticket for speeding in the past, you will never ever get a ticket for speeding. What you’re doing is against the law. If you get busted someday, don’t whine and don’t complain I didn’t warn you. Rather own your failures and move on.

Breakin’ rocks in the hot sun
I fought the law and the law won
I fought the law and the law won
I USED SOME CASTS FOR TYPE PUN
I fought the law and the law won
I fought the law and the law won

(with apologies to Sonny Curtis)