Posts by Ralf

Using PC-Lint in a Linux Environment

PC-Lint is my favorite non-FLOSS tool. Not only does it find bugs and portability issues early in the development cycle: by using it regularly and listening to its words developers can significantly improve their C/C++ programming skills.

This post details how to run PC-Lint (which is normally intended for DOS/Windows environments) in Linux, saving developers from having to buy FlexeLint, the much more expensive Unix/Linux version.

Dear Compiler, What’s My Size?

Sometimes, when you work with classes or structs, your code is not only dependent on particular fields (attributes, member variables) but on all of them at the same time.

The copy constructor, assignment operator, and the equality operator are prominent examples: if you add a field to your class, these functions need to be maintained in parallel; that is, all of them potentially need to be updated to support this new field. And we all know that it is so easy to forget to update one of them…

I got bitten by this when I used a class that lacked an operator to stream out all of its members. Since I didn’t own the source code of the class (it was part of a library), I couldn’t modify it directly. (I actually could have modified it, but this would have been outright stupid). Instead, I decided to add the missing functionality in my own code. Here is a simplified example:

Here’s what I added in my code:
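A minimal sketch of such a setup might look as follows. The layout of `Fruit` (a fixed-size `name` and a `color`), the output format, and the helper `describe` are assumptions made for illustration; they are not taken from the actual library.

```cpp
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for the library class (not under our control).
struct Fruit {
    char name[12];
    std::int32_t color;
};

// Added in client code: streams out every attribute of Fruit.
// Must be updated by hand whenever the library adds a field.
std::ostream& operator<<(std::ostream& os, const Fruit& f) {
    return os << "Fruit{name=" << f.name << ", color=" << f.color << "}";
}

// Demonstration helper: renders a Fruit as a string via operator<<.
std::string describe(const Fruit& f) {
    std::ostringstream oss;
    oss << f;
    return oss.str();
}
```

Nothing in this code notices when `Fruit` gains a member, which is exactly the weakness described next.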

It wasn’t long until I got a new version of the library in which “Fruit” was extended by another attribute, “weight”. Since my code didn’t support this new attribute yet, it was broken. Even worse: by (unconsciously) ignoring the new attribute my code failed silently at run-time! Just in case you didn’t know: If there is one thing I absolutely can’t stand it’s if CODE FAILS SILENTLY AT RUN-TIME.

Something had to be done to prevent this from happening again. After a while I came up with the idea of checking the size of class Fruit at compile-time against a hard-coded value. Even though this wasn’t perfect (you can easily think of cases where you would get false positives) it gave me a good chance of catching similar modifications next time I rebuilt my own code:
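As a sketch, such a check could look like this. The `Fruit` layout is again a hypothetical stand-in, and the value 24 only holds for this stand-in on common platforms; `kReviewedFruitSize` is a name invented for the example.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the library class, including the new attribute.
struct Fruit {
    char name[12];
    std::int32_t color;
    double weight;
};

// Reference size recorded when the client code was last reviewed.
// Assumption: 24 is correct for this stand-in, not a universal value.
constexpr std::size_t kReviewedFruitSize = 24;

// Breaks the build as soon as the size of Fruit changes.
static_assert(sizeof(Fruit) == kReviewedFruitSize,
              "Fruit changed size: review its attributes and update client code");
```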

(Note: static assertions are like plain assertions but checked by the compiler at compile-time, not at run-time. I’m using a C++11 feature here, but you can implement static assertions yourself if your compiler lacks support for them. Read more here.)
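One common way to build a static assertion by hand relies on the fact that an array type of negative size is ill-formed. The sketch below is one such variant and not necessarily the one the linked article describes; `STATIC_ASSERT` and its helper macros are hypothetical names.

```cpp
// Pre-C++11 fallback: the typedef only compiles when the condition is true,
// because an array of size -1 is ill-formed.
#define STATIC_ASSERT_JOIN2(a, b) a##b
#define STATIC_ASSERT_JOIN(a, b) STATIC_ASSERT_JOIN2(a, b)
#define STATIC_ASSERT(cond) \
    typedef char STATIC_ASSERT_JOIN(static_assertion_at_line_, __LINE__)[(cond) ? 1 : -1]

STATIC_ASSERT(sizeof(char) == 1);              // compiles: always true
STATIC_ASSERT(sizeof(int) >= sizeof(short));   // compiles: guaranteed ordering
// STATIC_ASSERT(sizeof(char) == 2);           // would fail to compile
```

The line number is pasted into the typedef name so that several assertions can share one scope.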

Now, if the compiler flagged a failed assertion, I would review the public interface of class Fruit, look for any new attributes, and update my code along with the reference value of the size; then everything would be fine. But how would I get the reference size value (i.e., 24) in the first place?

Calculating the size of a class or struct by hand is not trivial. There can be hundreds of members, including union members, padding bytes, hidden pointers to vtables and the like. A viable solution is to add a temporary print statement that writes sizeof(Fruit) to stdout, but that would require linking and running the program. And what if you are working on an embedded system where there is no stdout or where it is impossible to inspect the value in a debugger?

No, these approaches are too slow and cumbersome. If the compiler knows the size of Fruit it should be able to tell me. But how?

This question haunted me for quite some time. I devised a cunning plan: why not add a statement containing a syntax error that somehow forced the compiler to spit out the size of my class. Among other things, I tried this:

This hack exploits the fact that in C/C++ it is illegal to declare an array of negative size. I had hoped to get a message like this:

Alas, all compilers that I checked just gave a worthless error message along these lines:

There was no clue as to what the actual size was. Too bad. The compiler knows the answer but it doesn’t want to tell me!

I was just about to give up when I suddenly had another idea: what the compiler would always tell me is the line on which an error occurred. Why not attempt to create many arrays, where the size is decremented, starting from sizeof(Fruit):

I would just include this header file and the compiler would compile line after line until it hits a syntax error due to a negative array size. When it does, it would report the corresponding line number which in turn corresponds to the size of Fruit. G-r-e-a-t!

Generating this include file (getsize.h) was straightforward: I cobbled together a simple Bash for-loop:
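The Bash loop itself is not reproduced here. As a stand-in, the following C++ sketch builds the text of a comparable header. It is a variant under stated assumptions: line n declares an array type that is valid while n is smaller than the size and invalid from n equal to the size onwards, and `GETSIZE_TYPE` is a hypothetical macro that the including file defines.

```cpp
#include <sstream>
#include <string>

// Builds the text of a hypothetical getsize.h with max_size lines.
// Line n is valid while n < sizeof(GETSIZE_TYPE) and ill-formed (array of
// size -1) from n == sizeof(GETSIZE_TYPE) onwards, so the first rejected
// line number equals the size of the type.
std::string make_getsize_header(int max_size) {
    std::ostringstream out;
    for (int n = 1; n <= max_size; ++n) {
        out << "typedef char getsize_line_" << n
            << "[(" << n << " < sizeof(GETSIZE_TYPE)) ? 1 : -1];\n";
    }
    return out.str();
}
```

Written to a file named getsize.h, the result would be used with two lines such as `#define GETSIZE_TYPE Fruit` followed by `#include "getsize.h"`; `max_size` has to exceed the largest size you expect.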

I used it like this:

Compiling this code yields the following error messages (the exact messages may vary, depending on the compiler that you use):

The first line that is reported as erroneous is the size of class Fruit, 24 in our example. Once you know the size you just need to update the reference value and comment out the two lines to have them ready when you need them again.

You might find all of this a bit hacky, but I happen to like it very much. It serves a good purpose and nicely solved my problem. On top of that, it gave me a really good time. What more can one expect?

[Update 2015-12-06: Bernd Hanebaum sent me a nice implementation of this idea that is based on template metaprogramming — many thanks, Bernd!]
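Bernd's implementation is not reproduced here. A well-known template-based variant of the idea, shown only as a sketch with hypothetical names (`ReportSize`, `SHOW_FRUIT_SIZE`, and the stand-in `Fruit`), makes the size part of a type name so that it tends to show up in the error message itself:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the library class.
struct Fruit {
    char name[12];
    std::int32_t color;
    double weight;
};

// Declared but never defined: defining an object of this type is an error,
// and the diagnostic names the type together with its template argument.
template <std::size_t Size>
struct ReportSize;

#ifdef SHOW_FRUIT_SIZE
// Build with -DSHOW_FRUIT_SIZE to provoke the error on purpose; compilers
// typically spell out the argument value in the message.
ReportSize<sizeof(Fruit)> fruit_size_probe;
#endif
```

Without the macro the file compiles normally, so the probe can stay in the code base until it is needed again.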

Documenting is a Team Sport, Too!

Everyone likes good documentation, unless they have to write it themselves, right?

One reason for this is that writing good documentation is hard, very hard in fact. It took Joseph Heller eight years to complete “Catch-22” and many other novels took even longer to write. As a countermeasure, some authors use a pipelined approach to writing (see Gerald M. Weinberg, “Weinberg on Writing: The Fieldstone Method”) that nevertheless allows them to release in shorter time-frames by working on many projects in parallel.

Speaking as a developer, documentation gets in the way of doing other, more enjoyable things, like, well, coding, for instance. I’m writing (!) this article in defense of the poor chap who has been given the thankless job of writing version 1 of a document.

Imagine this situation. One of your team members, let’s call him Jack, is given the task of finding out how to set up a new development environment for some embedded Linux development board. After a week of trial and error he finally gets everything to work properly. Now, of course, he is expected to document what he did so that everyone else on the team can set up their own boards, too. Being a professional developer he sits down and types away; an hour later, he is finished.

What happens next is typical: Harry, the first guy who tries out Jack’s HOWTO, runs into problems. Not one — many. In some cases essential steps are missing, while others are confusing or just plain wrong.

Harry is upset. He runs about and whines how bad the documentation is, what a poor job Jack did and how unfair life is in general…

For sure, in a perfect world, Jack would have written a perfect document that lays out the shortest route from A to B; it would be instructive, entertaining, a work of great pedagogical value. In real life, Jack is exhausted. He had been a pioneer for an extended period of time, tried out many things that didn’t work, suffered hours of frustration and got stuck in a rut many times. Most likely he operated under time pressure and even more likely he doesn’t exactly remember what he did (and did not). Isn’t it a bit too much to expect that he now takes the perspective of the uninitiated developer and writes the perfect manual?

In my view, Harry shouldn’t complain — he should rather spend his energy on improving the document. He benefits tremendously from Jack’s pioneering work and I think it is only fair if he contributes a share. And what he can contribute is something that the original author can’t: When he reads Jack’s document his mind is fresh and clear, without any assumptions, so he is the best person to tune the document for the same kind of audience. And Jack is always there to support him — provided Harry didn’t insult him for not doing his job properly…

But even the next guy after Harry might spot mistakes or inconsistencies; and many months later people will discover parts that are obsolete because the environment has changed in the meantime. Then, it is their job to clean up; they are again the best persons to do it.

Writing good documentation takes both a different mindset and time; and as the writer’s saying goes: “All good writing is rewriting”. Especially in an agile environment it is a bit too much to expect to get everything from a single person. XPers have long been used to this mode of software development through the principles of collective ownership and refactoring. I believe these principles apply to writing documentation as well.

Where Richard Feynman Was Wrong

I’ve always been a great admirer of Richard Feynman. To me, his intelligence combined with his ability to explain even the most complicated facts in easy-to-grasp words is unparalleled.
When he was asked what his recipe for solving problems was, he gave the following advice, which has become known as the “Feynman approach to problem solving”:

1. Define the problem.
2. Sit down and think hard about the problem.
3. Write down the solution.

This is a good example of why I like him so much: he was a joker, a prankster, a guy who never took himself and life too seriously.

Alas, according to what we know about how our brain works, his advice doesn’t work, at least not for really hard problems.

While focusing on the topic and tormenting your brains works for many problems (logic problems, like solving typical math problems or Sudokus), solving hard problems requires just the opposite: complete detachment from the problem.

The reason for this counterintuitive approach is that the part of our brain that solves hard problems (the creative part) is not only slow, but also works asynchronously. In fact, thinking hard about a problem is more than useless: it actually disturbs the creative part and often prevents it from doing its job.

Does this mean you shouldn’t think about the problem at all? By no means! You should try to gather all kinds of information and facts about a problem, without paying attention to possible solutions. Just load your brains with information and then get away from the problem. Go for a walk, take a nap, or have a beer. Don’t stare at the screen for hours. Relax, even if it is hard. I know, this is the hardest part about solving hard problems.

Habits for Safer Coding

If you have been coding in C and C++ for some time, you know that it is easy to introduce bugs: type in a single wrong character and you wreak havoc. It is therefore a good idea to avoid using certain (dangerous) language features or at least using them in a way that is safer.

My favorite example is always putting braces around blocks, even if the block only consists of a single statement:
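A small illustration of the habit; the function and its parameters are hypothetical and serve only to give the `if` some context.

```cpp
// Returns how many retries remain, never less than zero.
// The block consists of a single statement and is braced anyway.
int remaining_retries(int retries, int max_retries) {
    int remaining = max_retries - retries;
    if (remaining < 0) {
        remaining = 0;
    }
    return remaining;
}
```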

I’ve witnessed cases where really experienced programmers omitted these two “redundant” characters for brevity only to be later faced with this classic bug:
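A sketch of how this bug typically looks, using a hypothetical function: the second statement was added later (or arrived through a merge) and only appears to belong to the `if`.

```cpp
// Counts the actions taken; intended result is 2 on error and 0 otherwise.
int count_actions(bool error) {
    int actions = 0;
    if (error)
        ++actions;  // guarded by the if
        ++actions;  // NOT guarded: runs unconditionally despite the indentation
    return actions;
}
```

With `error == false` the function returns 1 instead of 0. Recent GCC releases can flag this pattern through `-Wmisleading-indentation`, but the code still compiles.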

It is so easy to make this mistake — especially during merging — that MISRA (Guidelines for the Use of the C++ Language in Critical Systems) made curly braces around if/else/do/while blocks a “required rule”. Note that by following this rule, you are also protected from “dangling else” mistakes:
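To illustrate the dangling else, here is a sketch with two hypothetical functions that differ only in their braces. Without braces, the `else` binds to the nearest `if`, whatever the indentation suggests.

```cpp
#include <string>

// Unbraced: the else belongs to the inner if.
std::string classify_unbraced(bool outer, bool inner) {
    std::string result = "untouched";
    if (outer)
        if (inner)
            result = "both";
    else
        result = "outer is false?";  // actually runs when outer && !inner
    return result;
}

// Braced: the else belongs to the outer if, as intended.
std::string classify_braced(bool outer, bool inner) {
    std::string result = "untouched";
    if (outer) {
        if (inner) {
            result = "both";
        }
    } else {
        result = "outer is false";
    }
    return result;
}
```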

In general, I follow such “best practices” if they meet these criteria:

– They must solve a real problem
– They must not require any thinking

The “always-put-braces-around-blocks” rule does solve a real problem: I’m not aware of any compiler (not even Sun’s Java compiler) that warns you about the mistake shown above. (Of course, static code analysis tools like PC-Lint do check for these errors, and many, many more, but unfortunately only a few developers use them.)

Does it require any thinking? I don’t think so. The only thing I have to remember is this: whenever I use ‘if’, ‘else’, ‘for’, ‘while’, ‘do’ I put curly braces around my code. That’s all!

Maybe you have come across this convention:
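The convention in question puts the constant on the left-hand side of the comparison. A sketch, assuming `MAX_RETRIES` is a compile-time constant with a made-up value:

```cpp
// Hypothetical limit used for illustration.
const int MAX_RETRIES = 5;

// "Constant first" style: the constant sits on the left of ==.
bool retries_exhausted(int retries) {
    if (MAX_RETRIES == retries) {
        return true;
    }
    // A mistyped "if (MAX_RETRIES = retries)" would not compile,
    // because a const object cannot be assigned to.
    return false;
}
```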

Advocates of this style reason as follows: in C/C++ you can easily — accidentally — use an assignment operator when you actually wanted to use an equality operator. Since you cannot assign to a constant, the compiler will refuse to compile your buggy code:

I must admit that I’ve never liked this convention. One reason is that it looks unnatural. Another is that it lacks symmetry; that is, it is not consistent with other comparison operators. What if you need to change the comparison some day to also include values greater than MAX_RETRIES?

Or what if MAX_RETRIES is changed from a compile-time constant to a “normal” variable whose value is obtained from a config file? Do you swap then? In fact, this convention doesn’t provide any help for mistakes in variable/variable comparisons, which also occur very frequently in day-to-day programming:
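A sketch of such a variable/variable mistake, with hypothetical function names. Both operands are assignable, so the order in which they are written makes no difference.

```cpp
// Intended to report whether the two values are equal.
// The extra parentheses only silence the compiler warning so that the
// sketch builds cleanly; the bug is the single equal sign.
bool same_buggy(int actual, int expected) {
    if ((actual = expected)) {  // BUG: assigns, then tests the result
        return true;
    }
    return false;
}

// Correct version with a real equality comparison.
bool same_fixed(int actual, int expected) {
    return actual == expected;
}
```

The buggy version reports 1 and 2 as equal and 0 and 0 as different.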

For my taste, the utility of this convention is rather limited and it requires too much thinking: “If I compare a compile-time constant (or a literal) against a variable using the equality operator, I put the constant first”. Isn’t this a clear hindsight rule? It’s a bit like attaching a tag saying “always lock the door” to your key.

Why not, instead, turn this into a habit: “If I do an equality comparison, I double check that there are two equal signs”. That’s not any more difficult to adopt, doesn’t impair readability and has the additional benefit of working in all cases, not just when comparing against constants.

I’m prepared to follow even conventions that look funny if they solve a real problem but I don’t think that this convention does: today, most compilers produce warnings when you make this kind of mistake (for GCC it’s -Wparentheses, which is part of -Wall).

In C, there are numerous ways to make silly mistakes (in C++ there are many, many more). Instead of using questionable practices that address one tiny area, it is much better to enable all compiler warnings or — even better — use static code checkers like PC-Lint. In my opinion, that’s the most worthwhile habit for safer coding of all.

When bytes bite

Not long ago I had a discussion with a friend who had just bought a new 2 TB hard disk. He complained that those hard disk manufacturers always cheat: they sell you 2 TB but in the end the drives only have 1.8 TB of memory available. For them, a kilobyte comprises only 1000 bytes and not 1024, as for the rest of mankind.

We use the term “byte” every day but it is surprising how many developers don’t know exactly what a byte is. “Byte”, and especially byte quantifiers like kilo, mega, and giga, seem to be surrounded by many misuses and misconceptions.

Traditionally, a byte is a collection of bits used to encode a single character for a system. It could be 4, 7, 8, 9, or any other number that the designer of a system happens to choose. This is the main reason for the CHAR_BIT symbolic constant in ISO C’s limits.h: it specifies precisely how many bits there are in a character (char).
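The constant can be queried directly. In this sketch, `bits_per_byte` and `bits_in` are hypothetical helper names; the only thing taken from the standard library is `CHAR_BIT` itself.

```cpp
#include <climits>

// Number of bits in a char, and thus in a byte as C and C++ define it.
int bits_per_byte() {
    return CHAR_BIT;
}

// Total number of bits occupied by an object of type T.
template <typename T>
int bits_in() {
    return static_cast<int>(sizeof(T)) * CHAR_BIT;
}
```

The language standards only guarantee that `CHAR_BIT` is at least 8, which is why portable code multiplies by the constant instead of a literal 8.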

Today, of course, we can safely assume that a byte comprises exactly 8 bits, but it is important to note that this is no universal, standardized definition. That’s why the term “octet” is used in RFCs and ASN.1: an octet is defined to be always 8 bits.

But what the heck is a kilobyte? Is it 1024 bytes? Or 1000? We use byte quantifiers so frequently, but not always correctly.

Some folks use the well known SI prefixes to mean powers of 1024:

While hard disk manufacturers usually have a different definition:

Which makes quite a difference, especially as sizes grow larger: for a 1 TB hard disk you get about 9% less than you thought you’d paid for…
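The size of the gap is easy to work out. The following sketch compares 10^12 bytes with 2^40 bytes; all names in it are invented for this example.

```cpp
#include <cstdint>

// Bytes in a decimal terabyte (10^12) and in a binary "terabyte" (2^40).
const std::uint64_t kDecimalTera = 1000ULL * 1000 * 1000 * 1000;
const std::uint64_t kBinaryTera = 1ULL << 40;

// Capacity of a drive sold as decimal_tb TB, expressed in units of 2^40 bytes.
double in_binary_tera(double decimal_tb) {
    return decimal_tb * static_cast<double>(kDecimalTera) /
           static_cast<double>(kBinaryTera);
}

// Shortfall in percent relative to the power-of-1024 expectation.
double shortfall_percent() {
    return 100.0 * (1.0 - static_cast<double>(kDecimalTera) /
                              static_cast<double>(kBinaryTera));
}
```

For a 2 TB drive this gives roughly 1.82 units of 2^40 bytes, a shortfall of a little over 9%.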

But don’t blame it on the hard disk manufacturers. They’re right, of course. Using SI prefixes to mean ‘powers of 1024’ is strongly discouraged by the SI institute.

Still, it is sometimes useful to use powers of 1024 but how should this be done without annoying the SI institute, or — more importantly — without confusing people?

Fortunately, there is an ISO standard (with roots back to work done by IEC in 1998) that addresses this problem: ISO/IEC 80000-13:2008.

According to this standard, you use binary prefixes like “kibi” and its friends, where kibi is short for “kilo binary”; and instead of using SI’s “k” prefix you use “Ki”. Have a look at this table:

Binary prefixes can (and should) be applied to any other unit when the value is based on 1024: 1 Mib/s is 1024^2 bits per second, while 1 Mb/s is 1000^2 bits per second.
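As a sketch of how binary prefixes might be applied in code, the hypothetical helper `format_binary` below divides a byte count by 1024 per step and appends the matching prefix. The two-decimal output format is a choice made for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Formats a byte count with a binary prefix (KiB, MiB, ...).
std::string format_binary(std::uint64_t bytes) {
    static const char* const units[] = {"B", "KiB", "MiB", "GiB",
                                        "TiB", "PiB", "EiB"};
    const std::size_t unit_count = sizeof(units) / sizeof(units[0]);
    double value = static_cast<double>(bytes);
    std::size_t unit = 0;
    // Move to the next larger unit while the value has four or more digits.
    while (value >= 1024.0 && unit + 1 < unit_count) {
        value /= 1024.0;
        ++unit;
    }
    char buffer[32];
    std::snprintf(buffer, sizeof(buffer), "%.2f %s", value, units[unit]);
    return buffer;
}
```

Called with 2000000000000 (a drive sold as 2 TB), it yields "1.82 TiB".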

The international system of units is a wonderful achievement of the 20th century. We, especially as developers, should honor it and finally stop misusing its well-defined prefixes and instead use ISO binary prefixes.