Alternative Representations in C and C++

When you go back to C/C++ after a week of coding in Python, funny things can happen. I already described such a case in this post.

Recently, I had another surprising incident. I was scanning my code changes in order to find a bug that had pestered me for quite a while. I rubbed my eyes in disbelief when I came across this ‘if’ statement:

How could this even compile? Instead of ‘&&’ I inadvertently typed ‘and’, the Python logical ‘and’ operator. At first, I thought that this must either be a g++ extension or some clever guy* had defined

somewhere in a header file that I pulled. Being biased towards the second theory, I did an experiment and undefined ‘and’ just before my ‘if’ statement:

but it didn’t compile. Luckily, the compiler message that I got from g++ unraveled the mystery:

Whoa! ‘and’ is an operator (a token) in C++ that hitherto I’d never heard of. In fact, there’s a whole bunch of what the C++ standard calles “alternative tokens” (or sometimes “alternative representations”):

Alternative Token Equivalent To
and &&
and_eq &=
bitand &
bitor |
compl ~
not !
not_eq !=
or ||
or_eq |=
xor ^
xor_eq ^=

It didn’t take long and the feeling of surprise was replaced with a feeling of disappointment. Some of the bit-wise operators have a ‘bit’ prefix while others don’t. ‘and’ and ‘bitand’, for instance, but ‘and_eq’ should actually be ‘bitand_eq’, because ‘&=’ is a bitwise operator — sigh! The same goes for ‘or’, ‘bitor’ and ‘or_eq’. By the same token (pun intended!), ‘xor’ and ‘xor_eq’ should really be ‘bitxor’ and ‘bitxor_eq’, if you’d ask me.

Not so confusing, actually quite beneficial, are the operators ‘and’, ‘or’, and ‘not’ by themselves. Consider:

While using these newly discovered keywords is certainly not a huge thing, I do believe that alternative operators are more readable than their cryptic, traditional counterparts. Habitually used, they even avoid classic blunders like forgetting to put a second ‘&’ or ‘|’ in logic expressions. I think I’ll give them a try — but this time as a deliberate act, instead of a Freudian slip like last time.

Note, however, that in C, these alternative representations don’t exist as keywords. Instead, they are defined in iso646.h as macros — just like the “clever guy” would have done it :-)

________________________________

*) Once, such a “clever guy” made fun of our team’s coding standard by defining this macro (the letter O) in a global header file:
#define O (1 – 1)
and using ‘O’ in his code whenever he needed a literal ‘0’ (zero):
for (int i = O; i < max; ++i) {

}
When I asked him why on earth he did such an insane thing, he replied, “Well, our coding standard says that we MUST not use octal literals, but 0 is an octal literal since it starts with a ‘0’. Smartass!

The Happy Path to Modern C++

“Show the readers everything, tell them nothing.”
― Ernest Hemingway

Considering all the C++ job offers that I’ve received over the last couple of months, it looks like modern C++ (any C++ release beyond C++03) is picking up momentum, especially in embedded systems. Why embedded systems? I think it’s because C++ has always put a lot of emphasis on efficiency and thus the features offered by C++11 and C++14 not only simplify a programmer’s life but also stand a good chance of yielding more performant code. Now that C++11 and C++14 are supported by almost all contemporary compilers, there’s no reason to hold back anymore.

Alas, reading standard documents or even books on the subject is no fun. While the idea behind new features is usually easy to grasp, there are often intricacies that dim a feature’s shine. Clearly, studying books and learning about subtleties is necessary at some point, but isn’t there an accelerated way to get started (or motivated)?

Now there is, at least that’s my goal with this new project that I’ve created: The Happy Path to Modern C++. The idea is to tackle modern C++ by example not by text; that is, by code snippets that demonstrate the beauty of language features, not their nitty-gritty details.

Every code snippet can be compiled and run in a debugger to explore its behavior. Thus familiarizing (and experimenting) with modern C++ language features should be painless (maybe even fun?).

Let me show you what I mean. The following snippet (which is part of cpp11/01_core_features/core_features.cpp) introduces range-based for-loops:

Just go to cpp11/01_core_features and execute

to compile the code and

to run it.

With this set-up, it’s easy to toy around: just uncomment the line in ‘test_range_based_for_loops’ that constitutes a syntax error (the line that attempts to modify the read-only variable), rerun make and see what happens. Contrast this hands-on approach with the (fine) documentation on cppreference.com, which is exact and comprehensive (and ten times more readable than the official language standardese), but nevertheless intimidating for beginners.

Be aware that you are looking at work in progress. My aim of this first release was to cover most of C++11 and C++14 core language features as well features pertaining to classes. Obviously, a lot is still missing, like

– Const expressions
– Variadic templates
– Lambdas
– Concurrency
– Move semantics
– Regexps

Contributions to this project are more than welcome, just send me your pull requests. Please keep in mind that the overall goal is to have crisp code examples that capture the essence of features, not the ugly corner cases. When in doubt, leave it out, keep it simple, keep it beautiful.

How To Hire Great Developers, Part I

“When there is a will, there is a way.”
— Unknown

Hiring great software developers is certainly not an easy task. To find out if a candidate is a good fit, companies usually focus on assessing an applicant’s knowledge, social skills, and intelligence.

THE STATE OF THE ART

Knowledge means that a person has a profound command of the domain, technology, and tools required for his/her job. If, for instance, you need a C++ developer for the development of an ECU (electronic control unit) for a car, you probably want somebody who is not only proficient in C++, but also has expert knowledge of embedded systems development, Autosar (an automotive middleware), CAN/FlexRay/MOST (some automotive communication buses), and probably a lot more technologies and tools. Determining a candidate’s level of knowledge is straightforward: you just need to ask enough technical questions.

Since today’s projects are usually done by teams, not individuals, social skills are very much sought after by every company. In general — and contrary to what is portrayed in Hollywood movies — software developers aren’t those pizza-eating and Red-Bull-drinking loners that reverently stare at their screens all day (and night). While such stereotypical coders do exist, they are the rare exception. These days, most professional software developers spend a significant share of their time communicating with peers, managers, and sometimes even customers. Developers need to successfully explain, present, and negotiate. Being able to do this in an open and respectful manner is not just highly desirable — it’s an essential survival skill. How to find out if a person has social skills? You ask questions and pay attention to how things are said and presented. Have candidates talk a little bit about themselves, their past projects, let them explain a difficult topic.

Many traditional companies would stop here. However, most successful, especially software-centric companies have recognized that while knowledge is important, it’s not sufficient. They want smart developers; that is, developers that come up with creative, efficient solutions and require little time to implement them. Thus, they also check a candidate’s intelligence by torturing them with riddles and logic puzzles, including those who look like they’re impossible to solve (“How Would You Move Mount Fuji?“). Others ask candidates to do online programming tests that challenge the candidate to find clever solutions to tough programming problems. The reasoning is this: intelligence ensures that knowledge is put to good use.

By now you know whether your candidate is a great developer, don’t you? If a candidate passes all the aforementioned tests, (s)he certainly has a lot of ability. But does this also imply that such a person will get the job done? Sadly, it’s a common fallacy to confuse ability with willingness.

MY HERO

More than fifteen years ago, I worked on project where the development team received insanely formatted error log files from the testing department. Information that would have helped us debug the problem was buried deeply in ASCII noise. Every time we received a bug ticket, we had to spend quite some time on figuring out what exactly went wrong. One day, a coworker gleefully came into my office and told me that he’d written a filter program that extracted and reformatted all the relevant data from the error log files. What a neat idea and what a time-saver! Full of pride, he showed me his code. What I was faced with was many pages of C code, written over the course of three days, full of low-level ‘getc’-based text parsing — definitely not easy on the eyes! Finally, I gave him a smile and thanked him for this great little tool. Half an hour later I went to his office and showed him a 30-line Perl listing that was functionally equivalent to his C code.

He didn’t know about regular expressions and neither Perl nor any other scripting language. All he knew was C. He clearly lacked knowledge. He was a lousy software developer, right?

As a matter of fact, I think he was a very fine developer. None of the other team members (including myself) — despite knowing about regular expressions and Perl — could be bothered implementing such a productivity boosting tool. Instead of lamenting, he just did it, even though it must have been painful for him. He didn’t waste three days because of his incompetence. His persistence saved hundredths of developer-days down the road. He did it because it was the right thing to do.

Of course, he was somewhat embarrassed at first. I told him that I reimplemented his program to demonstrate the power of regular expressions and dynamic languages, not to insult or humiliate him. Guess what? A couple of days later, he told me that he did a Perl tutorial in his spare time, because he was sure it would save him lots of time and effort in the future. He never defended himself and he didn’t make lame excuses — he acted like a pro.

This is a guy you certainly want to hire! He possesses two traits that are much more valuable than knowledge: persistence and professionalism.

PROSPECTING FOR GOLD

To me, persistence and professionalism are not fundamental personality traits; I strongly believe that they’re the by-products of a root trait that I call “professional software passion” (PSP). Developers who possess a high degree of PSP are self-motivated and in general enthusiastic about their craft, they commit to life-long learning, are able to take a lot of hardship and always work towards fulfilling the business goals of their employers and clients — not just the short-term goals — especially the long-term goals.

Unfortunately, typical hiring processes completely ignore this critical trait. A certain level of intelligence and social skills is crucial, no doubt about it. Knowledge, however, can and will be obtained as long as PSP exists in a developer. PSP ensures that the job gets done. I would pick PSP over knowledge any day.

I will explore the topic of “professional software passion” as well as strategies on how to detect it in candidates in future posts. Stay tuned.

Pointers in C: Part VI, Faking ‘restrict’

Pointers in C

“”But then acting is all about faking. We’re all very good at faking things that we have no competence with.”
— John Cleese

As so often in life, people love to do the exact opposite of what you advise. In my previous post I claimed that the ‘restrict’ feature isn’t that all-important and guess what? I received questions on whether it was possible to achieve the (promised) effects of ‘restrict’ even if a particular C dialect doesn’t support it (in case you are using C++ or some C version predating C99, for example).

YES WE CAN!

We just need to creatively combine the wisdom from the previous “Pointers in C” installments. In particular:

  1. Pointer access involving multiple pointers can be optimized if the pointers point to incompatible types.
  2. structs with different tag names constitute different, incompatible types, regardless of whether the struct members are identical or not.
  3. A pointer can be converted to a pointer to a different type as long as the resulting pointer is suitably aligned for the target type.
  4. A pointer to struct points to its initial member.

Do you ‘C’ it?

SPOILER ALERT!

Again, I use the ‘silly’ example to demonstrate the technique. The idea is to wrap the original types (‘int’) in structs with different tag names thus yielding different (incompatible) types:

Believe it or not — it works:

Why? To the compiler, the pointer types are different (item 2) and hence it doesn’t assume that they point to the same memory (item 1). Because of item 3 and 4 in the list above, the calling code can still pass plain ‘int’ arrays/pointers, just like in the original code. However, nasty casts are required when calling ‘silly4’ from C++ code:

Contrary to C++, C is only weakly typed so such casts are not necessary officially, unless you want to get rid of compiler warnings, which you always want to, don’t you?

You could argue that these strange ‘struct int’ pointers that are used in the signature of ‘silly4’ kind of document the fact that the passed pointers are “restricted” and mustn’t overlap. However, I prefer to keep the original ‘int*’ interface and hide this struct business from callers altogether:

The generated code is identical to the one derived from ‘silly4’.

I still stick with my original recommendation that the ‘restrict’ feature is not that useful as it’s a legally binding contract for the caller while the compiler is free to ignore this plea for performance. If you want to use ‘restrict’ regardless of my advice, even if your compiler doesn’t support it, you now know how to emulate it in a portable fashion.

Pointers in C, Part V: The ‘restrict’ Qualifier

Pointers in C

“Le vrai est trop simple, il faut y arriver toujours par le compliqué.”
(“The truth is too simple: one must always get there by a complicated route.”)
― George Sand, Letter to Armand Barbès, 12 May 1867”

Exactly one year ago, I started this series on pointers, but what I really wanted to blog about originally was a rather arcane and rarely used keyword that first appeared in the C99 language standard: the ‘restrict’ qualifier. But after trying to digest the formal definition in chapter 6.7.3.1 I decided that taking a little detour would make my and my reader’s life much easier.

Let me set the stage for ‘restrict’ by summarizing what I wrote in episode 3 about the “strict aliasing rule”:

1. The compiler might optimize code involving multiple pointers, provided the pointers are not aliased; that is, they don’t point to the same object or memory.

2. The compiler assumes that pointers to incompatible types never alias.

3. The compiler assumes that pointers to compatible types (same types, apart from CV-qualification and signedness) potentially alias.

Therefore, a function with this signature is eligible for compiler optimization:

whereas this one is not:

This is unfortunate, because most likely, the arrays passed to the second version of ‘transform’ are in completely different, non-overlapping memory regions. But the compiler doesn’t know and hence stubbornly adheres to the strict aliasing rule.

The ‘restrict’ qualifier, which — contrary to the ‘const’ and ‘volatile’ qualifiers — can only be applied to pointers, is a promise given by the programmer to the compiler that pointers don’t alias even though they point to objects of the same type. Therefore, this version of ‘transform’ can be optimized by the compiler:

Let’s put this to the test with the ‘silly’ example from episode 3:

Before knowing about the strict aliasing rule, we were surprised to see that the memory access to ‘x’ in the return statement was not replaced with a simple ‘return 0’. After having learned about the strict alias rule, it’s clear: since ‘x and ‘y’ point to the same type, the compiler must assume that they may point to the same memory location and hence it loads the value pointed to by ‘x’ from memory afresh:

Now, if we tell the compiler that ‘x’ and ‘y’ never point to the same memory location, optimization is possible:

Nice, isn’t it?

If you use the ‘restrict’ qualifier on a pointer, you promise that — at least for the lifetime of the restricted pointer — the object pointed to is only accessed through this pointer. Break that promise and you get undefined behavior. (In the ‘silly3’ example, the lifetime of the pointers ‘x’ and ‘y’ end once the call to ‘silly3’ returns.)

In the C99 language standard, many functions from the standard library have been revised and now make use of the ‘restrict’ keyword. Take ‘memcpy’, for instance:

As everybody knows, ‘memcpy’ can only copy non-overlapping blocks of memory and this fact is nicely highlighted by the use of the ‘restrict’ keyword: during the call to ‘memcpy’ the memory regions src[0] to src[n] as well as dst[0] to dst[n] are exclusively owned and may not be accessed by other pointers. Since ‘memmove’ can copy overlapping blocks of memory (with a little speed penalty, of course), ‘memmove’ consequently doesn’t declare restricted pointers:

Please be aware that ‘restrict’ is not supported by the C++ language standard and it’s unclear whether it ever will be. If you mix C99 and C++ code, you might have to strip the ‘restrict’ keyword from C99 headers to avoid compilation errors:

In general, I’m not a big fan of optimization features that the compiler is free to ignore. If utmost performance is important, you want dependable performance. Most likely, your routine is not on the performance critical path, anyway. If you think it is, carefully profile your code and after you proved that it is, you’d better code that part in assembly language. Without such evidence, sprinkling your code with ‘restrict’ is little short of premature optimization. (I complained here about the unnecessarily overused ‘inline’ keyword for the same reason.)

What I do like about the ‘restrict’ keyword, though, is that by unraveling it, we’ve made a beautiful journey through important everyday programming topics like “pointers vs. arrays”, “type qualifiers”, “pointer conversion rules”, and the “strict aliasing rule”. The journey was the destination.

Boolean Arguments are BOOL-Shit

“All generalizations are false, including this one.”
— Mark Twain

In programming, little is as obscure as code that passes boolean literals to functions.

As an example, consider the following invocation of a method that sets logging options on a logger object:

What does this really mean? Even looking at the declaration of ‘setOutputControlMode’ doesn’t give a satisfying answer:

You need to delve into the implementation of the ‘Logger’ class to really see the intent:

There you go! After calling ‘setOutputControlMode’ with ‘true’ as the first argument, subsequent logging will additionally go to ‘stderr’; passing ‘true’ as second argument will automatically append a newline character after every log statement.

While you could argue that the wishy-washy name of method ‘setOutputControlMode’ was badly chosen, it was probably done so to accommodate even more options in the future. I’m convinced that sooner or later another boolean parameter will pop up — ‘buffered’, to specify whether logging should be buffered or not.

How should you deal with such obviously inferior interfaces? The strategy depends on whether you own the offending code or not. Let’s start with the first case, where you’re using some 3rd party library that you can’t directly modify.

HIDING THE DIRT

First of all, make it a habit to never feed these boolean monsters with boolean literals. Don’t even do this if the method is part of a standard library, like Java’s ‘Component.setVisible(boolean b)’ method. Stand your ground and shame the original author by working around it. But how?

One easy solution is to define a couple of boolean constants in your code:

At the expense of a little bit more typing, code that calls ‘setOutputControlMode’ becomes much more expressive:

The only risk that’s not mitigated is that one can still confuse the order of arguments. After all, the compiler will happily accept this slip:

Which is, of course, not what the developer had in mind.

A safer approach is to use wrapper functions. This is particularly useful if you don’t need all flag combinations in practice. Let’s assume you always want to emit newlines at the end of every log entry and only sometimes want to enable/disable echo to ‘stderr’. All you need to do is define two straightforward helper functions in your own code base:

Using these wrapper functions is both, highly readable and safe:

AVOIDING THE DIRT

If you can modify the source code, you have even better options. Let’s assume that a colleague of yours developed a GUI framework that has the same weakness as Java’s java.awt.Component class; that is, it provides a method which takes a boolean argument:

Now, there’s probably tons of code that uses this method, as ugly as it may be. In order to maintain backwards compatibility, I would designate ‘setVisible’ as deprecated and add two wrapper functions to your colleague’s class which clearly communicate their intend:

If backwards compatibility is not an issue or if you’re designing a completely new method, you should do it the right way, and the right way is to use enums instead of booleans:

In exchange for roughly one additional minute of work for the developer, everyone gets a readable interface:

It’s a given fact of programming, that when there are two states or modes, it won’t be long until a third one pops up. No problem, if you used enum parameters from the outset:

In the visibility example, we have mutually exclusive modes. In the ‘setOutputControlMode’ example, however, options can be combined. Still, enums are a pragmatic solution:

Notice that I added the suffix ‘FLAGS’ as a hint to the reader that these options can be freely mixed with the binary OR operator:

This, of course, only works in plain C. A C++ compiler will complain that the integer result that operator ‘|’ yields cannot be converted back to an enum type, so in order to please your C++ compiler you have to apply a ‘static_cast’:

Such casts are never pretty, but when they get the job done, they’re acceptable.

Let me bring this post to a close by driving home the following guidelines:

1. Never pass boolean literals (‘true’/’false’) to functions. Define meaningful symbolic constants instead.

2. Alternatively, consider writing wrappers around functions taking boolean arguments. Remember that the best (most readable) functions take zero arguments:

“The ideal numbers of arguments for a function is zero (niladic). Next comes one (monadic), followed closely by two (dyadic). Three arguments (triadic) should be avoided where possible. More than three (polyadic) requires very special justification ‐ and then shouldn’t be used anyway.”
— Uncle Bob (“Clean Code”)

3. If you are the author of a method that takes options, spend the effort and code dedicated, niladic setters. If that doesn’t feel right, use enum parameters. But on no account should you ever resort to boolean arguments.