
Bug Hunting Adventures #16: Lame Surveillance

“Under observation, we act less free, which means we effectively are less free.”
― Edward Snowden

Imagine a distributed surveillance system where recorded video files are uploaded to a central server at regular intervals.

Due to limitations of the transport protocol, video files must be split into chunks and no chunk may exceed 1 GB (10^9 bytes). On top of that, in high-load scenarios, the server might shorten a chunk even more, in which case instead of N bytes only K bytes are transmitted. Naturally, the N-K bytes that were not transmitted need to be sent with the next chunk upload.

Everything works fine and all unit and system tests pass. Once deployed, however, sysadmins from the central server team started lamenting that the video files were arriving at a glacial pace. What’s wrong with this code?

Solution

Do They Treat You Like A Superuser?

“A good workman is known by his tools”
— proverb

The process of getting admin rights as a corporate software developer is definitely on a spectrum. Over the last 20+ years, I’ve written code for more than ten companies and boy, do their policies differ!

In one case, I had full admin rights from day one. In more typical cases, I had to start a workflow to request admin rights, which would arrive within hours to days. In one extreme case, I had to do an online training about the dangers of working with admin rights before I could start the workflow. After I passed the exam and once my request was approved (7 days later), I would be granted admin rights only for a limited amount of time (180 days at most). Even worse: the online training course would need to be taken again as well!

Let’s meditate a little bit on this latter case. To me, it’s an utter catastrophe. As software developers, we constantly need to maintain and tweak our PC, our beloved toolbox. We need to install or upgrade development tools, device drivers and the like, sometimes just for the purpose of experimentation and learning. What if I wanted to switch to a newer version of g++ one day only to find out that my ‘sudo’ rights had expired? Sure, I could start the workflow again and wait a couple of days for approval, but why? Such processes are nothing but a nuisance that breaks developers’ flow and inspiration without adding any real security.

A software developer is not a regular user; a software developer is a superuser, literally. If a company has to make its software developers take training courses to ensure that they don’t work in a root shell all day, those developers should not have been hired in the first place. Doesn’t it border on insulting to learn in such a training that you should not open email attachments from unknown senders, especially while logged in as root? You don’t say!

If a company doesn’t give you unlimited superuser rights within a couple of hours, you’re definitely not treated like a superuser. You’re rather treated like a regular office worker who has no clue about how computers work, let alone computer security.

It’s not just about wasted time. It’s about lack of empowerment and trust. But it’s mainly about a missing software culture: are you viewed as precious human capital that develops top-notch software products which will make the company thrive, or are you rather viewed as a schmuck who poses a severe risk to the company?

A company with good software culture understands the chief need of creative makers: working on interesting projects in a frictionless, libertarian environment where they can spend most of their time doing what they love most, crafting exciting software.

Restricting software developers in terms of admin rights is just one problem of companies lacking good software culture, but it’s symptomatic. While such shops might manage to lure in great creators, they will certainly not be able to retain them in the long run.

Why We Count From Zero

“A zero itself is nothing, but without a zero you cannot count anything; therefore, a zero is something, yet zero.”
— Dalai Lama

If you do a Google search for why programmers typically start counting from zero, you’ll likely find two reasons. Today, I’m going to add another one. But let’s start with the usual explanations.

DIJKSTRA’S HALF-OPEN RANGES

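On August 11, 1982, Edsger Dijkstra wrote a short paper on why numbering should start at zero. Using the sequence 2, 3, ..., 12 as an example, he first compares four ways of denoting a range and demonstrates that half-open ranges with an excluded upper bound (convention a) are superior to the other alternatives:

a) 2 ≤ i < 13
b) 1 < i ≤ 12
c) 2 ≤ i ≤ 12
d) 1 < i < 13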

Here’s a quick summary of his reasoning, in case you don’t want to read it yourself. The main advantages are that you can easily represent empty ranges (i.e., lower equals upper) and that you can compute the number of elements in a range by subtracting the lower bound from the upper bound.
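Both advantages are easy to see in code. The following C++ sketch models a half-open range [lower, upper) with a hypothetical Range struct; the split helper illustrates a further property Dijkstra points out, namely that two adjacent ranges share a bound (the upper bound of one equals the lower bound of the other).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical half-open range [lower, upper): lower is included, upper is not.
struct Range {
    std::size_t lower;
    std::size_t upper;  // one past the last element
};

// Empty ranges: simply a range whose bounds are equal.
bool isEmpty(const Range& r) { return r.lower == r.upper; }

// Length: upper - lower, with no "+ 1" correction needed.
std::size_t length(const Range& r) { return r.upper - r.lower; }

// Splitting at mid yields two adjacent ranges that share the bound mid,
// with neither overlap nor gap between them.
void split(const Range& r, std::size_t mid, Range& left, Range& right) {
    assert(r.lower <= mid && mid <= r.upper);
    left = Range{r.lower, mid};
    right = Range{mid, r.upper};
}
```

For Dijkstra’s example sequence 2, 3, ..., 12, the range is Range{2, 13}, and length yields 11.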

Based on ranges where the upper bound is excluded, Dijkstra goes on to show that for sequences of N elements, there are only two ways of indexing: 1 ≤ i < N + 1 (starting at one) or 0 ≤ i < N (starting at zero).

Obviously, the latter is much more elegant, and hence we see loops like for (i = 0; i < N; ++i) everywhere.

BASE-RELATIVE ADDRESSING

Sequences of homogeneous data in contiguous memory are almost universally laid out in the same simple way.

A sequence starts with the first element at some base address, the second follows sizeof(elem) bytes after the first element, and so on. Computing the start address of the element with index n can be done using this formula: address(n) = base + n * sizeof(elem).

However, this formula only applies if you index your elements from 0 to N – 1. If instead you choose to number indices from 1 to N, the formula would need to be adapted: address(n) = base + (n - 1) * sizeof(elem).

This alternative is not only less pleasant to look at, but because of the additional subtraction also harder for the CPU to compute. Consequently, the fathers of C employed the zero-based array access syntax that we are all so familiar with, elem[i], which is really just a shorthand notation for *(elem + i).

If i is 0, you get the element elem refers to; if i is positive, its i-th successor; and if i is negative, its |i|-th predecessor (as long as the result stays within the same array). The latter fact often surprises developers because they either assume that negative offsets are illegal in the first place or that they yield elements from the end of the array, as in Python.

Incidentally, there’s another secret to C array indexing: since the addition operation is commutative, you can equally well write i[elem], which means *(i + elem) and is therefore the same as *(elem + i).

As obvious as this is in hindsight, it’s not well known amongst C programmers and a good opportunity to show off at parties. However, I strongly advise against writing code like this for production use.
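The following C++ sketch ties this section together: it computes an element’s address from the base address with the zero-based formula, and it checks that elem[i], *(elem + i) and i[elem] all denote the same element, including for a negative offset. The function names are illustrative only. Note that the negative offset in the test is valid solely because mid points into the middle of the array; stepping before the first element would be undefined behavior.

```cpp
#include <cassert>
#include <cstddef>

// Zero-based address formula: base + n * sizeof(T), computed on raw bytes.
template <typename T>
const T* addressOf(const T* base, std::size_t n) {
    const char* bytes = reinterpret_cast<const char*>(base);
    return reinterpret_cast<const T*>(bytes + n * sizeof(T));
}

// Three spellings of the same access; all are legal C and C++.
int viaSubscript(const int* elem, int i) { return elem[i]; }      // the familiar form
int viaPointer(const int* elem, int i)   { return *(elem + i); }  // what elem[i] means
int viaSwapped(const int* elem, int i)   { return i[elem]; }      // *(i + elem): not for production
```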

MODULAR ARITHMETIC

As you know, applying the mod operator to a non-negative integer x, as in x % N, yields values ranging from 0 to N – 1. This dovetails nicely with zero-based indices into sequences.

Take hash maps for example. To determine the position of an element in a hash map containing N slots, just apply a hash function and take the result modulo N to get the index. That’s it!
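As a minimal sketch, here is the slot computation for a hash map with numSlots slots, using std::hash from the C++ standard library. The function name slotIndex is hypothetical, and collision handling is left out entirely. Because std::size_t is unsigned, the remainder is guaranteed to be a valid zero-based index.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Maps a key to one of numSlots slots: hash it, then reduce modulo numSlots.
// The result always lies in [0, numSlots - 1], exactly the range of
// zero-based slot indices, so no "+ 1" or "- 1" adjustment is needed.
std::size_t slotIndex(const std::string& key, std::size_t numSlots) {
    assert(numSlots > 0);
    return std::hash<std::string>{}(key) % numSlots;
}
```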

Another use case is the ring buffer, one of my favorite containers: to advance an index into a ring buffer, just add the desired offset and apply the mod operator to get wrap-around — no need for extra if/else logic. Again, starting indices at 1 instead of 0 would entail extra additions and subtractions.
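A minimal fixed-capacity ring buffer along these lines might look like the following C++ sketch (class and member names are hypothetical). The static advance helper is the “add the offset, then apply the mod operator” step; both the read and the write position are derived from it, without any if/else wrap-around logic.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Minimal fixed-capacity ring buffer; all index wrap-around is done via %.
template <typename T, std::size_t N>
class RingBuffer {
    static_assert(N > 0, "capacity must be positive");

public:
    // Advance a zero-based index by offset, wrapping around at N.
    static std::size_t advance(std::size_t index, std::size_t offset) {
        return (index + offset) % N;
    }

    // Appends value; returns false if the buffer is full.
    bool push(const T& value) {
        if (count_ == N) return false;
        data_[advance(head_, count_)] = value;  // slot after the newest element
        ++count_;
        return true;
    }

    // Removes the oldest element into out; returns false if the buffer is empty.
    bool pop(T& out) {
        if (count_ == 0) return false;
        out = data_[head_];
        head_ = advance(head_, 1);
        --count_;
        return true;
    }

    std::size_t size() const { return count_; }

private:
    std::array<T, N> data_{};
    std::size_t head_ = 0;   // index of the oldest element
    std::size_t count_ = 0;  // number of stored elements
};
```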

There you have it — one more reason to start numbering from zero. (As if you still needed to be convinced…)

Technical Debt is Not Bad!

“There once was a Master Programmer who wrote unstructured programs. A novice programmer, seeking to imitate him, also began to write unstructured programs. When the novice asked the Master to evaluate his progress, the Master criticized him for writing unstructured programs, saying, ‘What is appropriate for the Master is not appropriate for the novice. You must understand Tao before transcending structure.’”
The Tao of Programming, Book 3, Verse 2.

I consider Ward Cunningham’s “technical debt” analogy one of the most important metaphors in software development. According to him, he invented it to explain to his manager why his team needed to do refactorings: like financial debt, technical debt tends to grow over time, and the burden gets bigger and bigger due to compound interest.

During the last decade, quality-minded developers have done a great job of educating their managers about technical debt, but it seems to me that there is now a notion that all technical debt is bad and must be avoided. But Ward Cunningham never said such a thing, and rightly so.

In business, not all debt is considered bad. There’s a huge difference between someone taking out a 200k loan to buy a flashy sports car and the owner of a construction company taking out the same 200k loan to buy a bulldozer. The sports car is a pure liability that makes its owner poorer, whereas the bulldozer is an investment that will generate money in the future.

The same holds for technical debt. If you initially implement a limited version of a feature (e.g., slow, bloated, lacking proper error handling) just to enable other teammates to carry on with their own work, that’s good technical debt.

So when does technical debt qualify as good technical debt? To me, good technical debt is

1. taken on consciously
2. managed
3. repaid timely

By contrast, bad technical debt arises out of laziness or incompetence; it lurks in the code base and never gets repaid. Rather, like credit card debt in real life, more debt is accumulated over time and new loans are taken out just to pay the interest on other loans. This is the kind of technical debt that is to be avoided in the first place.

You can use a knife to kill somebody, but you can also use a knife to whittle and prepare food. Surgeons even use knives to save people’s lives. Thus, a knife is neither good nor bad per se, and the same is true of technical debt.

A Small Matter Of Interface Design

“Bad design shouts at you.
Good design is the silent seller.”
— Anon

Retired C++ hero Scott Meyers once wrote an article for the renowned IEEE Software magazine titled “The Most Important Design Guideline?”. According to him, the most important design rule is this:

“Make interfaces easy to use correctly and hard to use incorrectly”

I couldn’t agree more. While this rule is applicable to any system, today I want to present it in the light of software interface (API) design. As an illustrative example, I’m going to borrow a method from a former article of mine:

From what you’ve been given, it’s pretty easy to conclude what this routine does and how it’s supposed to be used: ‘getTemperature’ returns a temperature value (let’s say in degrees centigrade) and a status as an out parameter. Even without further documentation, it’s not hard to guess that the temperature returned can only be trusted if the status is ‘good’. Most likely, this method reads the temperature value from a hardware sensor that is sometimes busy or dysfunctional.

If you ask me, this design easily passes another important design rule, the so-called “Principle of Least Astonishment”. However, it fails miserably when we apply Scott Meyers’ interface design rule. Why? Because it seduces programmers into using it incorrectly:

As you can see, it’s possible for programmers to get what they want (the temperature) without checking the status value.
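The following C++ sketch illustrates the problem under a few assumptions: the Status enum and its values other than good are hypothetical, the status is passed as a pointer, and a hard-coded reading stands in for real sensor access. Both callers compile cleanly, yet only the first one checks the status.

```cpp
#include <cassert>

// Hypothetical status type; only Status::good means the reading can be trusted.
enum class Status { good, busy, defective };

// Out-parameter interface. The body is a hypothetical stub:
// a real implementation would query a hardware sensor.
double getTemperature(Status* status) {
    *status = Status::good;
    return 21.5;  // degrees centigrade
}

// Correct use: the temperature is only handed out if the status is good.
bool readCorrectly(double& out) {
    Status status;
    double temp = getTemperature(&status);
    if (status != Status::good) return false;
    out = temp;
    return true;
}

// Incorrect use: compiles without complaint, but the status is never inspected.
double readCarelessly() {
    Status status;
    return getTemperature(&status);
}
```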

The compiler usually doesn’t warn you if you forget to inspect the status variable, because it doesn’t know what ‘getTemperature’ is doing: it might pass it on to another function or store the pointer in a static/global variable.

How can we change the interface so that this mistake cannot happen?

Often in such or similar situations, experienced programmers shout “exceptions to the rescue!”. But will exceptions really save us? Let’s see:

Now, ‘getTemperature’ doesn’t provide a status any longer; instead, it throws an exception if the temperature value is unreliable or not available. But again, the compiler cannot enforce that programmers provide the try/catch block. Unlike Java, C++ doesn’t have checked exceptions (where checked means that the compiler enforces that exceptions are either caught or declared), and for good reason. So an inconsiderate programmer might naively write:

Even worse, the fact that there is no status parameter any more doesn’t remind them or reviewers at a later point in time that a necessary check has been omitted. Depending on the reliability of the sensor, such code can run for years without problems until it suddenly crashes due to an uncaught exception.
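Here is a C++ sketch of the exception-based variant, with hypothetical names (SensorError, g_sensorBusy) and a hard-coded reading in place of real hardware access. readNaively compiles without any complaint; the only sign of trouble is the exception that escapes once the sensor reports that it is busy.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical exception type for unreliable or unavailable readings.
struct SensorError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical switch to simulate a busy sensor.
bool g_sensorBusy = false;

// Exception-based variant: no status parameter any more; failures are thrown.
double getTemperature() {
    if (g_sensorBusy) throw SensorError("sensor busy");
    return 21.5;  // degrees centigrade, stand-in for a real sensor reading
}

// Naive caller: no try/catch, and nothing in the language forces one.
// If the sensor is ever busy, the exception escapes; if nobody up the
// call chain catches it, std::terminate is called.
double readNaively() { return getTemperature(); }
```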

If exceptions don’t help, what does? How about swapping the return value with the status code?

While some would argue that this version is a bit harder to use incorrectly, it’s still possible to do so:

Again, the programmer doesn’t check the status, this time by ignoring the return value. No compiler I’m aware of warns you if you forget to check a function’s return value. Static analysis tools (like PC-lint) do, but who uses them, anyway? And of those good people who do, who enables this particular warning? There are many functions whose return values you almost never care about, like ‘printf’ for example, which returns the number of characters printed, or ‘memcpy’ which returns a pointer to the destination. Therefore, such warnings are usually turned off, even if they are supported.
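A C++ sketch of the swapped variant, again with a hypothetical Status enum and a hard-coded reading instead of real sensor access, shows how quietly the returned status can be dropped.

```cpp
#include <cassert>

// Hypothetical status type; only Status::good means the reading can be trusted.
enum class Status { good, busy, defective };

// Swapped variant: the status is returned, the temperature is an out parameter.
// The body is a hypothetical stub standing in for real sensor access.
Status getTemperature(double* temperature) {
    *temperature = 21.5;  // degrees centigrade
    return Status::good;
}

// Misuse: the returned status is silently discarded.
double readCarelessly() {
    double temp = 0.0;
    getTemperature(&temp);  // no diagnostic by default on common compilers
    return temp;
}
```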

If you are lucky enough to use a C++17 compiler, you can mark a return value as ‘nodiscard’:

Should you now dare to ignore the returned status value, your compiler will report an error. Or will it? Unfortunately not! I don’t know why the people who devised the C++17 standard didn’t go for a compile-time error. Instead, if the return value is ignored by programmers, “implementations should issue a warning in such cases”, they wrote. Arrgh! They didn’t even have the decency to demand a warning. As usual, “should” doesn’t guarantee anything.

Still, I’ve never come across a C++17 conforming compiler that didn’t issue a warning, irrespective of the warning level (unless you force the compiler to suppress all warnings). Therefore, I think that this last version of ‘getTemperature’ is the first that is likely to receive Scott’s blessing.
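A C++17 sketch of the nodiscard-annotated version follows, with a hypothetical Status enum and a hard-coded reading standing in for the sensor. The careless caller is left commented out because it is exactly the code that should now draw a diagnostic.

```cpp
#include <cassert>

// Hypothetical status type; only Status::good means the reading can be trusted.
enum class Status { good, busy, defective };

// C++17: callers that discard the returned Status get a diagnostic
// (in practice a warning; the standard does not require an error).
[[nodiscard]] Status getTemperature(double* temperature) {
    *temperature = 21.5;  // hypothetical stub instead of real sensor access
    return Status::good;
}

// Intended use: keep the status and check it before trusting the value.
bool readTemperature(double& out) {
    double temp = 0.0;
    const Status status = getTemperature(&temp);
    if (status != Status::good) return false;
    out = temp;
    return true;
}

// double readCarelessly() {
//     double temp = 0.0;
//     getTemperature(&temp);  // would trigger the nodiscard diagnostic
//     return temp;
// }
```

If a caller really wants to ignore the status, casting the call to void, as in static_cast<void>(getTemperature(&temp)), documents that intent and is the sanctioned way to silence the diagnostic.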

In a future installment, we’ll explore other alternative versions of ‘getTemperature’ and determine if they comply with the most important interface design rule, which is so important that it bears repeating: “Make interfaces easy to use correctly and hard to use incorrectly”. Stay tuned!

Taming Unwieldy Initialization Routines

“Please forgive the long letter; I didn’t have time to write a short one.”
― Blaise Pascal

Sometimes, you need to write a component initialization method that does many things: initialize hardware (e.g., various sensors), multiple libraries, and so on. Usually, some initialization steps are dependent on the successful execution of previous steps, so if one step fails, you want to skip the rest of the initialization procedure to avoid crashes.

Such initialization methods can easily become 100+ lines long, which is a far cry from the 5 to 10 lines that Uncle Bob and followers of the Clean Code school of thought recommend. That’s why it’s often a good idea to put the initialization steps into methods of their own, each consisting of only 5 to 10 lines of code:
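A minimal C++ sketch of such a nested-if cascade with three steps follows. The step functions and the g_failingStep and g_stepsRun globals are hypothetical; they merely simulate success or failure so that the skipping behavior can be observed.

```cpp
#include <cassert>

// Hypothetical failure injection: the step with this number fails (0 = none).
int g_failingStep = 0;
int g_stepsRun = 0;  // counts how many steps actually executed

bool runStep(int step) {
    ++g_stepsRun;
    return step != g_failingStep;
}

// Hypothetical initialization steps, each a short method of its own.
bool initSensors()   { return runStep(1); }
bool initNetwork()   { return runStep(2); }
bool initLibraries() { return runStep(3); }

// Each step runs only if all previous steps succeeded, hence the nesting.
bool initComponent() {
    if (initSensors()) {
        if (initNetwork()) {
            if (initLibraries()) {
                return true;
            }
        }
    }
    return false;
}
```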

While this achieves the primary goal of reducing the length of the initialization method and works if only three steps are involved, a cascade of ten nested if-statements is not at all pleasant to look at. Clean Coders would certainly frown upon it.

Of course, C++ coders (clean or unclean) have a well-known solution for such cases: exceptions. Put a try/catch block around your code and throw exceptions if an initialization step fails:
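Sketched in C++ with hypothetical step functions and a simulated failure (g_failingStep), the exception-based version might look like this. The first step that throws skips all remaining ones, and the call sequence itself stays flat.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical failure injection: the step with this number fails (0 = none).
int g_failingStep = 0;
int g_stepsRun = 0;  // counts how many steps actually executed

// Each step throws instead of returning a success flag.
void runStep(int step) {
    ++g_stepsRun;
    if (step == g_failingStep) throw std::runtime_error("initialization step failed");
}

// Hypothetical initialization steps.
void initSensors()   { runStep(1); }
void initNetwork()   { runStep(2); }
void initLibraries() { runStep(3); }

// Flat sequence: the first step that throws skips all remaining steps.
bool initComponent() {
    try {
        initSensors();
        initNetwork();
        initLibraries();
        return true;
    } catch (const std::exception&) {
        return false;  // real code would log the details here
    }
}
```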

But maybe you are not allowed to use exceptions in your project. How come? Some embedded folks abhor the idea of exceptions (for many, sometimes unjustified reasons) and even disable support for exceptions via compiler switches. Or, your language doesn’t support exceptions in the first place, like good ol’ C. So what do you do if you want to have cleaner code, anyway?

How about this technique:
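Sketched in C++, the technique might look like the following. The three step functions and the g_failingStep and g_stepsRun globals are hypothetical and only simulate success or failure; each step skips its work if the flag is already false and clears the flag only when it fails itself.

```cpp
#include <cassert>

// Hypothetical failure injection: the step with this number fails (0 = none).
int g_failingStep = 0;
int g_stepsRun = 0;  // counts how many steps actually executed

bool runStep(int step) {
    ++g_stepsRun;
    return step != g_failingStep;
}

// Each step does nothing if a previous step already failed; otherwise it
// runs and sets the flag to false only if it fails itself.
void initSensors(bool& ok) {
    if (!ok) return;
    if (!runStep(1)) ok = false;
}

void initNetwork(bool& ok) {
    if (!ok) return;
    if (!runStep(2)) ok = false;
}

void initLibraries(bool& ok) {
    if (!ok) return;
    if (!runStep(3)) ok = false;
}

// Flat sequence without nesting or exceptions.
bool initComponent() {
    bool ok = true;
    initSensors(ok);
    initNetwork(ok);
    initLibraries(ok);
    return ok;
}
```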

This code is definitely shorter and more readable than the original code as well as the exception-based variant. But how does it work?

You simply pass a reference to a boolean flag (or pointer, if you prefer) to every initialization step routine. If the flag is already false, you skip executing the initialization code; otherwise you execute it and only set the flag to false in case of an error:

This approach flattens the nested-if cascade by moving the check for the successful execution of a previous step inside the method implementing the next step.

Usually, I try to avoid detailed error codes as much as possible. I prefer binary (boolean) results which only tell if something executed without failure. Should an error occur, I send detailed error information to a logger. If that’s not sufficient for you, you can use an enum instead of a boolean flag:

Instead of passing a reference to a boolean flag to your methods, you would pass a reference to an instance of enum Error. The rest of the code should stay mostly the same.
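A C++ sketch of the enum variant, with hypothetical error names and simulated steps, could look like this. The first failing step records its specific error, and because every later step checks for Error::none first, that error is not overwritten.

```cpp
#include <cassert>

// Hypothetical error codes; Error::none plays the role of 'true' in the boolean version.
enum class Error { none, sensor, network, library };

// Hypothetical failure injection: the step with this number fails (0 = none).
int g_failingStep = 0;

bool runStep(int step) { return step != g_failingStep; }

// Skip if an earlier step failed; otherwise run and record a specific error code.
void initSensors(Error& err) {
    if (err != Error::none) return;
    if (!runStep(1)) err = Error::sensor;
}

void initNetwork(Error& err) {
    if (err != Error::none) return;
    if (!runStep(2)) err = Error::network;
}

void initLibraries(Error& err) {
    if (err != Error::none) return;
    if (!runStep(3)) err = Error::library;
}

Error initComponent() {
    Error err = Error::none;
    initSensors(err);
    initNetwork(err);
    initLibraries(err);
    return err;
}
```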