« Posts under C/C++/Embedded

Code Kata 5: Reversing The Bits

“Boredom is just the reverse side of fascination: both depend on being outside rather than inside a situation, and one leads to the other.”
— Arthur Schopenhauer

Winter is definitely approaching. The days are getting shorter and temperatures are dropping. Isn’t this a wonderful time for doing code katas? Sitting in your easy chair, in front of a fireplace, serenely coding away in your favorite editor.

Today, we dip once more into the realm of low-level embedded programming. Your primary job is to write code that reverses the bits of an integer. For instance, reversing the 8-bit integer value 139 (10001011) yields a new value of 209 (11010001).

Sounds simple, doesn’t it? But it’s not just about coding a bit-reversing routine, that wouldn’t teach you enough lessons. You’ll also practice unit testing, code tuning and — if you like — assembly language programming, cross-compilation, and remote debugging along the way.

The goal is not to find the best/fastest algorithm possible. Instead, the emphasis is on exercising your programming muscle and learning new things. Don’t rush this kata, rather consider it a mini project. Do the research, be persistent. Try and fail and become better. Take your time, I did it over a period of three weeks. Now, without further ado, let’s get (gently) started.

1. Implement a bit reversing function in a dynamic programming language (eg. Perl, Ruby, Python).

a) Decide on a suitable interface, eg. in Python, it could look like this:

b) Implement a dummy version of ‘reversebits’ that simply returns 0.

c) Implement a couple of test cases, run them against the dummy version and watch them fail, eg.

d) Implement a straightforward solution (‘reversebits1’) that passes the unit tests.

e) Add more unit tests to gain better confidence in your solution.

f) Measure the execution time of ‘reversebits1’.

g) Be creative, find other solutions. Optimize for speed, optimize for shortest code (in terms of characters typed). Measure and compare the execution times.

h) What if ‘max_bits’ is smaller than the total number of bits already taken-up by ‘value’? For instance

In this case, only the first ‘max_bits’ of ‘value’ shall be reversed and the remaining upper bits shall be discarded:

Adapt your code accordingly. Probably, some of your solutions already behave correctly and need no modifications.

2. Port your code and unit tests to C/C++

a) Decide on a suitable interface. It’s OK if you use a fixed-bit unsigned integer data type for the value to be reversed, as long as it supports at least 32 bits, eg.

b) Measure execution times and compare them against the execution times from 1.

c) Play with various compiler optimization settings. How do they impact the timings?

d) Bonus: Implement a bit reversal function in x86 assembly language.

3. Port your code and tests from 1. and 2. to a Raspberry Pi

a) Don’t have a Pi? Find out how to emulate one on your PC.

b) Measure and compare execution times using various optimization settings.

c) Bonus: Implement a bit reversal function in ARM assembly language.

d) Bonus: Want to learn even more? Build your own cross toolchain (hint: ‘crosstool-ng’) and build the Pi code on a regular PC. Debug the code remotely over a TCP connection (hint: ‘gdbserver’, ‘gdb-multiarch’).

I certainly learned a lot by doing this kata. Far more, than I can possible tell here without risking to bore you and/or spoil all the fun. Nevertheless, I uploaded my work to GitHub. Have a peek in case you get stuck, need some inspiration or are just curious.

‘static’ With Array Parameters? Oh My!

There’s a rather obscure feature in C99 which allows you to put the ‘static’ keyword between the brackets when declaring array-like parameters to functions, as in

What this means, in a nutshell, is that the caller of this function is expected to pass an array of at least 42 uint8_t’s. Calling ‘transform’ like this would be perfectly OK:

whereas this would constitute a breach of contract:

Likewise, passing a NULL pointer is not allowed, as it — by definition — doesn’t point to anything, not even a single element:

THE QUEST FOR MEANING

While the semantics of this feature are pretty clear, the purpose isn’t. The C99 language standard doesn’t give a hint, either.

Even when ‘static’ is used, the called function ‘transform’ doesn’t know the exact size of the array. It only knows that it comprises at least 42 elements, so it can only reliably utilize 42 elements. There’s nothing ‘transform’ can do with any (potential) extra elements, because there might be none. So what’s different to this?

This question haunted me for days (if not weeks) and I just couldn’t find a satisfying answer. But let’s put it aside for a moment, as I still have a lot to critisize regarding the technicalities.

THE SYNTAX

First off, why on earth ‘static’? In C, the keyword ‘static’ has already overloaded meanings: a) module scope (an object or function declared ‘static’ is private to a translation unit) and b) lifetime (a static object defined within a function retains its value across invocations). Thanks to C99, ‘static’ now can also mean “at least”. Yuck!

Had I been forced to specify this feature, I would have used the ‘>=’ operator instead of ‘static’:

THE DANGERS

One of the biggest and most infamous blunders in engineering is when you don’t have a single point of truth; that is, the same information is stored in more than one place. Over time, inconsistencies arise in the copies which can lead to all kinds of havoc. All software engineers are reminded again and again to honor the DRY principle.

According to the grammar, ‘static’ can appear in function declarations as well as function definitions. Nobody prevents you from declaring ‘transform’ like this in your header file:

and like so in a C file:

The C99 standard says nothing about such inconsistencies, whether it constitutes an error, which one takes precedence, whether the compiler has to emit a warning message.

Another problem is that C99 also introduced variable-length arrays (VLAs) and hence an array’s size is not required to be a compile-time constant, like it used to be in C89 and like it still is in C++. Therefore, the C99 grammar blesses such abominations:

What’s the contract now? At least 42 elements (the value of ‘my_global’ at program start-up) or the value of ‘my_global’ when ‘transform’ is actually called:

Again, if you’d forced me to specify this feature, I’d required the size specifier to be a compile-time constant:

AN EPIPHANY

While ranting about this feature, I suddenly had an epiphany. Let’s revisit the question I asked above: What’s really the difference between

and

since my doubts regarding the usefulness of this feature are 100% founded in this question. Previously, I believed that in the second case, there are at least 42 elements, while in the first case there are exactly 42 elements. Not a big difference, because what benefit would a compiler be able to derive from this subtle difference?

But I obviously fell into a trap that I warned my readers a long time ago: believing that array parameters are really arrays, which is mistaken, of course, because arrays cannot be passed directly to functions. From the C language’s point of view, the former interface of ‘transform’ is equivalent to

which means that ‘buffer’ is a pointer to a single ‘uint8_t’, nothing more, nothing less. The compiler can’t make any assumptions beyond that, it’s obliged to ignore the fact that you indicated that there are 42 elements. Specifically, all these declarations are equivalent and effectively declare a function taking a pointer to a ‘uint8_t’:

By contrast, the version of ‘transform’ that uses ‘static’ tells the compiler that a pointer to the first of at least 42 elements is passed. Armed with this knowledge, the compiler can now perform certain optimizations, the most likely one being accessing memory in units much larger than a ‘uint8_t’, to save memory access times. Ah!

ADVICE

Even though we now understand the motivation behind this feature, I still wouldn’t recommend using it. Just like it’s the case with many other so-called “optimization” features (e. g. ‘inline‘ and ‘restrict‘) you, as a developer, have to keep a promise but get nothing dependable back from the compiler. As usual, my advice is this: if you think something needs to be optimized, prove it first; if evidence shows that optimization is necessary, optimize in a dependable way — don’t rely on something that might change from one compiler version to another, especially if it comes with an hideous interface that invites inconsistencies.

ONE MORE THING

Thinking about this odd feature once again, what it all boils down to is that ‘static’ with function parameters actually tries to deliver what regular array-like parameters can’t, namely inform the compiler that the pointer that is passed is really an array, not just a pointer to a single element. But there’s already a feature in the C language (available since C89) that achieves that, namely pointers to arrays:

Contrary to ‘static’ with array parameters, pointers to arrays is a clean, proven, and type-safe concept. Since the compiler knows how many elements the array comprises, it can apply the same optimizations that the ‘static’ with array parameters feature promises, that is, accessing multiple elements at once when traversing the array. There’s really no reason to put ‘static’ between your array brackets, trust me.

Alternative Representations in C and C++

When you go back to C/C++ after a week of coding in Python, funny things can happen. I already described such a case in this post.

Recently, I had another surprising incident. I was scanning my code changes in order to find a bug that had pestered me for quite a while. I rubbed my eyes in disbelief when I came across this ‘if’ statement:

How could this even compile? Instead of ‘&&’ I inadvertently typed ‘and’, the Python logical ‘and’ operator. At first, I thought that this must either be a g++ extension or some clever guy* had defined

somewhere in a header file that I pulled. Being biased towards the second theory, I did an experiment and undefined ‘and’ just before my ‘if’ statement:

but it didn’t compile. Luckily, the compiler message that I got from g++ unraveled the mystery:

Whoa! ‘and’ is an operator (a token) in C++ that hitherto I’d never heard of. In fact, there’s a whole bunch of what the C++ standard calles “alternative tokens” (or sometimes “alternative representations”):

Alternative Token Equivalent To
and &&
and_eq &=
bitand &
bitor |
compl ~
not !
not_eq !=
or ||
or_eq |=
xor ^
xor_eq ^=

It didn’t take long and the feeling of surprise was replaced with a feeling of disappointment. Some of the bit-wise operators have a ‘bit’ prefix while others don’t. ‘and’ and ‘bitand’, for instance, but ‘and_eq’ should actually be ‘bitand_eq’, because ‘&=’ is a bitwise operator — sigh! The same goes for ‘or’, ‘bitor’ and ‘or_eq’. By the same token (pun intended!), ‘xor’ and ‘xor_eq’ should really be ‘bitxor’ and ‘bitxor_eq’, if you’d ask me.

Not so confusing, actually quite beneficial, are the operators ‘and’, ‘or’, and ‘not’ by themselves. Consider:

While using these newly discovered keywords is certainly not a huge thing, I do believe that alternative operators are more readable than their cryptic, traditional counterparts. Habitually used, they even avoid classic blunders like forgetting to put a second ‘&’ or ‘|’ in logic expressions. I think I’ll give them a try — but this time as a deliberate act, instead of a Freudian slip like last time.

Note, however, that in C, these alternative representations don’t exist as keywords. Instead, they are defined in iso646.h as macros — just like the “clever guy” would have done it :-)

________________________________

*) Once, such a “clever guy” made fun of our team’s coding standard by defining this macro (the letter O) in a global header file:
#define O (1 – 1)
and using ‘O’ in his code whenever he needed a literal ‘0’ (zero):
for (int i = O; i < max; ++i) {

}
When I asked him why on earth he did such an insane thing, he replied, “Well, our coding standard says that we MUST not use octal literals, but 0 is an octal literal since it starts with a ‘0’. Smartass!

The Happy Path to Modern C++

“Show the readers everything, tell them nothing.”
― Ernest Hemingway

Considering all the C++ job offers that I’ve received over the last couple of months, it looks like modern C++ (any C++ release beyond C++03) is picking up momentum, especially in embedded systems. Why embedded systems? I think it’s because C++ has always put a lot of emphasis on efficiency and thus the features offered by C++11 and C++14 not only simplify a programmer’s life but also stand a good chance of yielding more performant code. Now that C++11 and C++14 are supported by almost all contemporary compilers, there’s no reason to hold back anymore.

Alas, reading standard documents or even books on the subject is no fun. While the idea behind new features is usually easy to grasp, there are often intricacies that dim a feature’s shine. Clearly, studying books and learning about subtleties is necessary at some point, but isn’t there an accelerated way to get started (or motivated)?

Now there is, at least that’s my goal with this new project that I’ve created: The Happy Path to Modern C++. The idea is to tackle modern C++ by example not by text; that is, by code snippets that demonstrate the beauty of language features, not their nitty-gritty details.

Every code snippet can be compiled and run in a debugger to explore its behavior. Thus familiarizing (and experimenting) with modern C++ language features should be painless (maybe even fun?).

Let me show you what I mean. The following snippet (which is part of cpp11/01_core_features/core_features.cpp) introduces range-based for-loops:

Just go to cpp11/01_core_features and execute

to compile the code and

to run it.

With this set-up, it’s easy to toy around: just uncomment the line in ‘test_range_based_for_loops’ that constitutes a syntax error (the line that attempts to modify the read-only variable), rerun make and see what happens. Contrast this hands-on approach with the (fine) documentation on cppreference.com, which is exact and comprehensive (and ten times more readable than the official language standardese), but nevertheless intimidating for beginners.

Be aware that you are looking at work in progress. My aim of this first release was to cover most of C++11 and C++14 core language features as well features pertaining to classes. Obviously, a lot is still missing, like

– Const expressions
– Variadic templates
– Lambdas
– Concurrency
– Move semantics
– Regexps

Contributions to this project are more than welcome, just send me your pull requests. Please keep in mind that the overall goal is to have crisp code examples that capture the essence of features, not the ugly corner cases. When in doubt, leave it out, keep it simple, keep it beautiful.

Pointers in C: Part VI, Faking ‘restrict’

Pointers in C

“”But then acting is all about faking. We’re all very good at faking things that we have no competence with.”
— John Cleese

As so often in life, people love to do the exact opposite of what you advise. In my previous post I claimed that the ‘restrict’ feature isn’t that all-important and guess what? I received questions on whether it was possible to achieve the (promised) effects of ‘restrict’ even if a particular C dialect doesn’t support it (in case you are using C++ or some C version predating C99, for example).

YES WE CAN!

We just need to creatively combine the wisdom from the previous “Pointers in C” installments. In particular:

  1. Pointer access involving multiple pointers can be optimized if the pointers point to incompatible types.
  2. structs with different tag names constitute different, incompatible types, regardless of whether the struct members are identical or not.
  3. A pointer can be converted to a pointer to a different type as long as the resulting pointer is suitably aligned for the target type.
  4. A pointer to struct points to its initial member.

Do you ‘C’ it?

SPOILER ALERT!

Again, I use the ‘silly’ example to demonstrate the technique. The idea is to wrap the original types (‘int’) in structs with different tag names thus yielding different (incompatible) types:

Believe it or not — it works:

Why? To the compiler, the pointer types are different (item 2) and hence it doesn’t assume that they point to the same memory (item 1). Because of item 3 and 4 in the list above, the calling code can still pass plain ‘int’ arrays/pointers, just like in the original code. However, nasty casts are required when calling ‘silly4’ from C++ code:

Contrary to C++, C is only weakly typed so such casts are not necessary officially, unless you want to get rid of compiler warnings, which you always want to, don’t you?

You could argue that these strange ‘struct int’ pointers that are used in the signature of ‘silly4’ kind of document the fact that the passed pointers are “restricted” and mustn’t overlap. However, I prefer to keep the original ‘int*’ interface and hide this struct business from callers altogether:

The generated code is identical to the one derived from ‘silly4’.

I still stick with my original recommendation that the ‘restrict’ feature is not that useful as it’s a legally binding contract for the caller while the compiler is free to ignore this plea for performance. If you want to use ‘restrict’ regardless of my advice, even if your compiler doesn’t support it, you now know how to emulate it in a portable fashion.

Pointers in C, Part V: The ‘restrict’ Qualifier

Pointers in C

“Le vrai est trop simple, il faut y arriver toujours par le compliqué.”
(“The truth is too simple: one must always get there by a complicated route.”)
― George Sand, Letter to Armand Barbès, 12 May 1867”

Exactly one year ago, I started this series on pointers, but what I really wanted to blog about originally was a rather arcane and rarely used keyword that first appeared in the C99 language standard: the ‘restrict’ qualifier. But after trying to digest the formal definition in chapter 6.7.3.1 I decided that taking a little detour would make my and my reader’s life much easier.

Let me set the stage for ‘restrict’ by summarizing what I wrote in episode 3 about the “strict aliasing rule”:

1. The compiler might optimize code involving multiple pointers, provided the pointers are not aliased; that is, they don’t point to the same object or memory.

2. The compiler assumes that pointers to incompatible types never alias.

3. The compiler assumes that pointers to compatible types (same types, apart from CV-qualification and signedness) potentially alias.

Therefore, a function with this signature is eligible for compiler optimization:

whereas this one is not:

This is unfortunate, because most likely, the arrays passed to the second version of ‘transform’ are in completely different, non-overlapping memory regions. But the compiler doesn’t know and hence stubbornly adheres to the strict aliasing rule.

The ‘restrict’ qualifier, which — contrary to the ‘const’ and ‘volatile’ qualifiers — can only be applied to pointers, is a promise given by the programmer to the compiler that pointers don’t alias even though they point to objects of the same type. Therefore, this version of ‘transform’ can be optimized by the compiler:

Let’s put this to the test with the ‘silly’ example from episode 3:

Before knowing about the strict aliasing rule, we were surprised to see that the memory access to ‘x’ in the return statement was not replaced with a simple ‘return 0’. After having learned about the strict alias rule, it’s clear: since ‘x and ‘y’ point to the same type, the compiler must assume that they may point to the same memory location and hence it loads the value pointed to by ‘x’ from memory afresh:

Now, if we tell the compiler that ‘x’ and ‘y’ never point to the same memory location, optimization is possible:

Nice, isn’t it?

If you use the ‘restrict’ qualifier on a pointer, you promise that — at least for the lifetime of the restricted pointer — the object pointed to is only accessed through this pointer. Break that promise and you get undefined behavior. (In the ‘silly3’ example, the lifetime of the pointers ‘x’ and ‘y’ end once the call to ‘silly3’ returns.)

In the C99 language standard, many functions from the standard library have been revised and now make use of the ‘restrict’ keyword. Take ‘memcpy’, for instance:

As everybody knows, ‘memcpy’ can only copy non-overlapping blocks of memory and this fact is nicely highlighted by the use of the ‘restrict’ keyword: during the call to ‘memcpy’ the memory regions src[0] to src[n] as well as dst[0] to dst[n] are exclusively owned and may not be accessed by other pointers. Since ‘memmove’ can copy overlapping blocks of memory (with a little speed penalty, of course), ‘memmove’ consequently doesn’t declare restricted pointers:

Please be aware that ‘restrict’ is not supported by the C++ language standard and it’s unclear whether it ever will be. If you mix C99 and C++ code, you might have to strip the ‘restrict’ keyword from C99 headers to avoid compilation errors:

In general, I’m not a big fan of optimization features that the compiler is free to ignore. If utmost performance is important, you want dependable performance. Most likely, your routine is not on the performance critical path, anyway. If you think it is, carefully profile your code and after you proved that it is, you’d better code that part in assembly language. Without such evidence, sprinkling your code with ‘restrict’ is little short of premature optimization. (I complained here about the unnecessarily overused ‘inline’ keyword for the same reason.)

What I do like about the ‘restrict’ keyword, though, is that by unraveling it, we’ve made a beautiful journey through important everyday programming topics like “pointers vs. arrays”, “type qualifiers”, “pointer conversion rules”, and the “strict aliasing rule”. The journey was the destination.