« Posts under PC-lint related

A Small Matter Of Interface Design, Redux

“Bad design is simply great imagination without wisdom”
— Onur Mustak Cobanli

In my previous post on this subject, I introduced Scott Meyer’s most important interface design rule: “Make interfaces easy to use correctly and hard to use incorrectly”. As a case in point, we looked at a function which returns a temperature reading from a sensor plus a status code. The temperature sensor might be temporarily or permanently unavailable, so the temperature value can only be relied upon if the returned status is “good”. “Making interfaces hard to use incorrectly” means in this case “making it hard to ignore the returned status”. I claimed that this version of getTemperature is a move in the right direction:

Still, it’s not perfect for at least four reasons. First of all, the [[nodiscard]] attribute is only available if your compiler supports the C++17 language standard. Second, like I explained in my previous post, the C++17 language standard only “recommends” that a compiler issues a warning. Third, even if your compiler does issue a warning, a programmer might disable this/some/all warnings for a particular project, module, or function. But even if none of these reasons apply, a programmer might still dodge checking the status value.

The easiest way to suppress a warning resulting from an unchecked [[nodiscard]] return value is by casting the returned value to void:

You probably think that such developers are stupid and that they are putting their head on the block, that they deserve the pain caused by their deliberate misuse. I certainly agree, but in safety-critical systems, somebody other than the developer might actually suffer the pain, either physically or financially.

In other, more insidious cases, bypassing of [[nodiscard]] can happen accidentally:

Here, the return value is accessed (for logging), so no warning is issued by the compiler. Nevertheless, the value is not checked and thus the code is still buggy. What we need is a status code that enforces that its value is checked. Enter class Checked.

All but the most trivial template code can look intimidating, but Checked is actually quite simple. It acts as a wrapper around a type (Status, in our case) and stores a value of that type. At construction time, a flag called checked is set to false. If this flag is still false when the object is destructed, std::terminate() will be called. The only way to set the flag to true is by invoking the comparison operators == and !=. Here are some examples:

Applied to our getTemperature function, we get

Even though the status value is accessed for logging, std::terminate will still be called because we failed to actually check the status value. The correct usage looks like this:

Did you notice that I tagged the whole Checked class with the [[nodiscard]] attribute? As a consequence, all your functions that return Checked values are implicitly declared [[nodiscard]]. Less typing, less risk of forgetting to add it. Cool!

By using Checked you explicitly communicate to users of your code that they must check the value. The first line of defense is the [[nodiscard]] check at compile-time. If callers cast the returned value to (void) or otherwise fail to check/compare the returned value, they’ll get busted by std::terminate. The upshot of using Checked is that your interfaces are much harder to use incorrectly than by just using [[nodiscard]]. Let’s hope Scott Meyers is satisfied now.

A Small Matter Of Interface Design

“Bad design shouts at you.
Good design is the silent seller.”
— Anon

Retired C++ hero Scott Meyers once wrote an article for the renowned IEEE Software magazine titled “The Most Important Design Guideline?“. According to him, the most important design rule is this:

“Make interfaces easy to use correctly and hard to use incorrectly”

I couldn’t agree more. While this rule is applicable to any system, today I want to present it in the light of software interface (API) design. As an illustrative example, I’m going to borrow a method from a former article of mine:

From what you’ve been given, it’s pretty easy to conclude what this routine does and how it’s supposed to be used: ‘getTemperature’ returns a temperature value (let’s say in degrees centigrade) and a status as out parameter. Even without further documentation, it’s not hard to guess that the temperature returned can only be trusted if the status is ‘good’. Most likely, this method reads the temperature value from a hardware sensor that sometimes is busy or dysfunctional.

If you ask me, this design easily passes another important design rule, the so-called “Principle of Least Astonishment“. However, it fails miserably when we apply Scott Meyer’s interface design rule. Why? Because it seduces programmers into using it incorrectly:

As you can see, it’s possible for programmers to get what they want (the temperature) without checking the status value.

The compiler usually doesn’t warn you if you forget to inspect the status variable, because it doesn’t know what ‘getTemperature’ is doing: it might pass it on to another function or store the pointer in a static/global variable.

How can we change the interface such that this mistake is impossible to happen?

Often in such or similar situations, experienced programmers shout “exceptions to the rescue!”. But will exceptions really save us? Let’s see:

Now, ‘getTemperature’ doesn’t provide a status any longer, it rather throws an exception if the temperature value is unreliable or not available. But again, the compiler cannot enforce that programmers provide the try/catch block. Unlike Java, C++ doesn’t have checked exceptions (where checked means that the compiler enforces that exceptions are either caught or declared) and for good reason. So an inconsiderate programmers might naively write:

Even worse, the fact that there is no status parameter any more doesn’t remind them or reviewers at a later point in time that a necessary check has been omitted. Depending on the reliability of the sensor, such code can run for years without problems until it suddenly crashes due to an uncaught exception.

If exceptions don’t help, what does? How about swapping the return value with the status code?

While some would argue that this version is a bit harder to use incorrectly, it’s still possible to do so:

Again, the programmer doesn’t check the status, this time by ignoring the return value. No compiler I’m aware of warns you if you forget to check a function’s return value. Static analysis tools (like PC-lint) do, but who uses them, anyway? And of those good people who do, who enables this particular warning? There are many functions whose return values you almost never care about, like ‘printf’ for example, which returns the number of characters printed, or ‘memcpy’ which returns a pointer to the destination. Therefore, such warnings are usually turned off, even if they are supported.

If you are so lucky and use a C++17 compiler, you can mark a return value as ‘nodiscard’:

Should you now dare to ignore the returned status value, your compiler will report an error. Or will it? Unfortunately not! I don’t know why the people who devised the C++17 standard didn’t go for a compile-time error. Instead, if the return value is ignored by programmers, “implementations should issue a warning in such cases”, they wrote. Arrgh! They didn’t even have the decency to demand a warning. As usual, “should” doesn’t guarantee anything.

Still, I’ve never come across a C++17 conforming compiler that didn’t issue a warning, irrespective of the warning level (unless you force the compiler to suppress all warnings). Therefore, I think that this last version of ‘getTemperature’ is the first that is likely to receive Scott’s blessings.

In a future installment, we’ll explore other alternative versions of ‘getTemperature’ and determine if they comply with the most important interface design rule, which is so important that it bears repeating: “Make interfaces easy to use correctly and hard to use incorrectly”. Stay tuned!

Introduction to PC-Lint

I’ve uploaded my presentation “An Introduction to PC-lint” to Slideshare. Share and Enjoy!

The True Value of PC-lint

nobugs

“An ounce of prevention is worth a pound of cure.”

— Benjamin Franklin

When you ask folks who sell static analysis tools why you should buy their expensive products, they all will tell you “to find bugs, of course!”. Very likely, they will show you a diagram that displays the exponential cost growth of fixing a bug, depending on the stage where it is detected. Also very likely, they will brag that — compared to PC-lint — they have a very low “false positive” rate.

Fair enough, but the story doesn’t end here. Take this code, for instance:

The code, as it stands, is 100% error-free. Yet, PC-lint will not like it for several good reasons. For instance:

  1. There are no virtual functions in Base, so what’s the point deriving from it? Was it an oversight?
  2. There is no default constructor, and hence you cannot put objects of type Base in standard library containers
  3. The Base constructor is not explicit, so plain integers might get silently converted into Base objects
  4. The add method could be declared const; adding const-correctness later is usually difficult
  5. The ADD macro is not parenthesized and neither are its parameters; this gives rise to all sorts of operator precedence problems
  6. Base’s destructor is not virtual, which means that the behavior is undefined when somebody deletes Derived objects through a pointer to Base
  7. The iostream header file is not used and hence not needed; removing the #include statement improves clarity and compilation times

So there are no bugs. But are these issues flagged by PC-lint really false positives?

Too me, the reported warnings are a sign of the poor quality of this code; this code is full of latent bugs, just waiting to become alive in the future or in somebody else’s hands. Shady code like this increases technical debt, makes maintenance harder, riskier, and more expensive.

I want such issues reported and resolved before I check in my code, actually, before I even execute it. Right after a successful compile I pay attention to PC-lint’s feedback and resolve real bugs, latent bugs and any bad coding practices. I don’t want to get a ticket three weeks later from a software QA guy, after I’m done with debugging and when the mental distance has become large. So large  that it would take a lot of effort to recall what I was thinking at the time of writing. I want quick and easy desktop checking such that my bad code is never seen by anyone but me.

Finding bugs and code smells at the earliest possible time and thus keeping maintenance cost lost; not just focusing on finding bugs, but preventing bugs — today and tomorrow — in the first place. That’s the true value of PC-lint.

 

Bitten by the Python

pythonIt’s amazing how one can get used to something nice, up to a point where you don’t recognize it anymore. Sometimes, you don’t recognize it even when its not there anymore — at least initially.

After a night of Python coding, I was back at my embedded C++ project, when I wrote code similar to this:

The expression that I used to check whether the item index is in range is valid in both, Python and C++. Alas, only in Python it does what it should; in C++ it’s a clear bug. Obviously, the if statement should have looked like this:

Duh! Of course the comparison operator doesn’t work like this in Java/C#/C/C++, but while Java and C# compilers reject this code, WindRiver’s Diab C/C++ compiler happily accepts it.

I wasn’t so much annoyed about the fact that the Diab compiler accepted it, since it is syntactically correct according to the rules of C and C++; what upset me was the “happily” — it didn’t produce any warning about my nonsense!

There is, as I found out later, a warning level that would have reported my mistake, but this warning level was not enabled in our makefile, probably because it would have generated many false positives.

Even though I always pay attention to compiler warnings before flashing my code to the target, I had to find this bug the hard way: by using a debugger. I guess that I wasted more than one hour in total to track it down.

What I expect from compiler vendors is that (a) by default, warnings are on and that (b) this default warning level includes all warnings that are easy to detect and are almost always bugs or latent bugs, including (but not limited to):

– Use of unitinitalized variables
– Test of assignment, eg. if (a = b)
– Expressions with no side-effects, eg. a == b;
– Returning addresses of automatic variables
– Forgetting to return a value from a non-void function
– My stupid comparison mistake

I don’t expect all C/C++ compilers to be as thorough as PC-Lint. Ironically, most of today’s compilers are already able to report the aforementioned issues (and many more), but they do this only at higher warning levels that nobody usually enables.

Dangerously Confusing Interfaces II

confused.jpgLast week was a sad week for me. A bug in my code made it into the final version that was shipped to an important customer.

When something like this happens, it is almost always due to fact that there is a higher-level “bug” in the software development process, but I don’t want to go there. Instead, I want to focus on the technicalities. Once more, I got bitten by another instance of a dangerously confusing interface. Let me explain.

I’ve always had a liking for interfaces that are self-evident; that is, one knows immediately what’s going on by just looking at how the interface is used — without having to consult the documentation. Let me give you a counter example, an interface that is far from what I desire:

The interface to ‘sort_temperatures’ is not at all self-explanatory. What is clear is that it sorts ‘values_count’ values, which are obviously temperature values, but what the heck does the ‘true’ argument stand for? In order to find out, you have to look at the declaration of ‘sort_temperatures’ and/or its API documentation:

Now it is clear what the boolean parameter is for, but at the cost of having to take a detour. Maybe you got so distracted by this detour that you forgot what you originally were about to do.

Programmers often make this mistake when designing interfaces. They use boolean parameters when they have two mutual exclusive modes of operation. This is easy for the implementer of the routine, but confusing to the ones who have to use it. Contrast this with this alternative:

By replacing the boolean parameter with symbolic constants you not only make the code more readable, you also open it up for future extension: adding more modes becomes straightforward.

Now have a look at this code and try to guess what it does:

That’s fairly simple: ‘calc_hash’ calculates a SHA-256 checksum over ‘mydata’ (which is ‘mydata_len’ bytes in size) and stores it into the provided buffer ‘hash’. The length of the hash is stored to ‘hash_len’ via call-by-reference.

But is this code correct? The answer is — you guessed it — no. If you can’t spot the bug, you are in good company. You can’t see it with just the information given. ‘calc_hash’ interface is not self-evident.

I wrote this code more or less one year ago. It contains a bug that remained dormant until the product was in the hands of our customer. And it is there because of a silly interface.

The last (pointer) parameter ‘hash_len’ actually serves as both, an input AND output parameter. When you call ‘calc_hash’ it is expected that ‘*hash_len’ contains the size of the provided ‘hash’ buffer; on return ‘*hash_len’ will contain the actual number of bytes used by the hash algorithm; that is,the size (or length) of the SHA-256 checksum stored in ‘hash’. The whole idea behind this is that ‘calc_hash’ (or rather its author) wants to offer protection against buffer overruns — for cases where the provided ‘hash’buffer is not large enough to accommodate the checksum.

So the problem here is that ‘hash_len’ (being a stack variable) is not properly initialized to ‘HASH_SHA256_LEN’; it’s value is more or less arbitrary. If it is by chance greater or equal to 32 (the value of the ‘HASH_SHA256_LEN’ symbolic constant) everything is fine and the checksum is correctly calculated. If it is not, ‘calc_hash’ returns ‘false’ and an error is reported.

For as long as a year — by sheer coincidence — ‘*hashLen’ was never below 32 (which is not that unlikely, given that ‘size_t’ can accommodate values ranging from 0 to 4,294,967,296); but in the hands of the customer — and very much in line with Murphy’s Law — it happened.

OK, accuse me of not having initialized ‘*hashLen’ properly, accuse me of not having read the API documentation carefully. Maybe I did read the API documentation and then I was interrupted. I don’t know. But what I know for sure is that this bug would have never happened if the interface had been clearer.

The first problem with ‘calc_hash’ is that ‘hash_len’ is an IN/OUT parameter, which is unusual. I’m not aware of any function in the C/C++ standard library (or the POSIX libraries) which makes use of IN/OUT parameters. Since the input value (not just the output value) is passed by reference, neither the compiler nor static analysis tools like PC-Lint are able to detect its uninitialized state. One obvious improvement is to pass the length of the buffer by value:

Granted, there is now one more argument to pass (‘hash_buf_len’), but if the unlucky programmer ever forgets to initialize it, the compiler will issue a warning.

But let’s not stop here. I’d like to pose the following question: what good is the hash buffer length check, anyway?

In my view, it is not at all necessary. The length of a particular checksum is constant and known a priori — that’s why the hash buffer is allocated statically like this:

What additional benefit does a developer get by providing the same information again to the ‘calc_hash’? Isn’t this redundancy that just begs for consistency errors?

And what use is the output value that tells how long the checksum is? Again, this should, no it MUST be known a priori, there should never be a mismatch between what the caller of ‘calc_hash’ expects and what is returned. Of course, there can be error conditions but if ‘calc_hash’ fails, it should return ‘false’ and not a length different to what the caller expects.

Note that ‘calc_hash’ is not at all comparable to functions in the C API that add an additional output buffer length parameter like ‘strncpy’ or ‘snprintf’. These functions carry the length parameter for completely legitimate safety reasons because the total length of the output is usually not known a priori (for instance, as some input may stem from a human user and one has little control over how many characters (s)he will enter).

Based on these arguments, I dismiss the ‘hash_len’ parameter altogether and propose the following simplified interface:

and would use it like this:

Easy to read, hard to get wrong. It is impossible to forget to initialize a variable that just isn’t there.