Why We Count From Zero

“A zero itself is nothing, but without a zero you cannot count anything; therefore, a zero is something, yet zero.”
— Dalai Lama

If you do a Google search for why programmers typically start counting from zero, you’ll likely find two reasons. Today, I’m going to add another one. But let’s start with the usual explanations.

DIJKSTRA’S HALF-OPEN RANGES

On August 11, 1982, Edsger Dijkstra wrote a short paper on why numbering should start at zero. He first demonstrates that half-open ranges with an excluded upper bound are superior to other alternatives:

Here’s a quick summary of his reasoning, in case you don’t want to read it yourself. The main advantages are that a) you can easily represent empty ranges (i. e. lower equals upper) and b) compute the number of elements in a range by subtracting the lower bound from the upper bound.

Based on ranges where the upper bound is excluded, Dijkstra goes on to show that for sequences of N elements, there are only two ways of indexing:

Obviously, the latter is much more elegant and hence we see code like this everywhere:

BASE-RELATIVE ADDRESSING

Sequences of heterogeneous data in contiguous memory are almost universally laid-out like this:

A sequence starts with the first element at some base address, the second follows sizeof(elem) bytes after the first element and so on. Computing the start address of the n-th element can be done using this formula:

However, this formula only applies if you index your elements from 0 to N – 1. If instead you chose to number indices from 1 to N, the formula would need to be adapted:

This alternative is not only less pleasant to look at, but because of the additional subtraction also harder for the CPU to compute. Consequently, the fathers of C employed the zero-based array access syntax that we are all so familiar with:

which is really just a shorthand notation for

If i is 0, you get the first element, if i is positive, the i-th successor of elem, and if i is negative, the i-th predecessor of elem. The latter fact often surprises developers because they either assume that negative offsets are illegal in the first place or yield elements from the end of the array, like in Python.

Incidentally, there’s another secret to C array indexing: since the addition operation is commutative, you can equally well write

As obvious as this is in hindsight, it’s not well known amongst C programmers and a good opportunity to show off at parties. However, I strongly advise against writing code like this for production use.

MODULAR ARITHMETIC

As you know, applying the mod operator like this

yields values ranging from 0 to N – 1. This dovetails nicely with zero-based indices into sequences.

Take hash maps for example. To determine the position of an element in a hash map containing N slots, just apply a hash function and take the result modulo N to get the index. That’s it!

Another use case is the ring buffer, one of my favorite containers: to advance an index into a ring buffer, just add the desired offset and apply the mod operator to get wrap-around — no need for extra if/else logic. Again, starting indices at 1 instead of 0 would entail extra additions and subtractions.

There you have it — one more reason to start numbering from zero. (As if you still needed to be convinced…)

A Neverending Story

“Nothing is lost. Everything is transformed.”
― Michael Ende, The Neverending Story

I explained in this post that I don’t view technical debt as something that is bad per se. Rather, I believe that “good technical debt” — at times — should be employed for strategic reasons. As an example, when developing a feature, you might take shortcuts in order to unblock stalling teammates who need your changes in order to carry on with their own work.

Let me reiterate: to qualify as good technical debt, it must be

1. taken on consciously
2. managed
3. repaid timely

How can this be achieved in practice?

Once you’ve made a deliberate decision and considered the pros and cons, you implement your makeshift solution. But before you mark your task as done, you create another task which has the goal of removing the technical debt you just introduced. But where should you put this task?

Don’t hide it in the product backlog as it a) contains usually many issues already b) is more concerned with externally observable features and c) is actually owned by the product owner. If you did, most likely technical debt issues would be delayed (or rather ignored) in favor of “real” stories, which would grossly violate good technical debt requirement #3: “repaid timely”. Instead, put it under a story of the current sprint titled “Repay technical debt”.

An immediate advantage of this approach is that it makes your shortcuts visible to the whole team. Further, everybody, including you, can pick up this task and just start working on it. However, in typical cases, such clean-up tasks won’t be done in the current sprint. So what happens with the “repay” story and all attached tasks at the end of a sprint?

You move it to the next sprint, of course! Consider it a story that never ends, which nicely mirrors a well-known software engineering truth: the fight against software entropy goes on forever. Since the story doesn’t go away, it’s a great reminder for the whole team that repaying technical debt (or constant improving of internal software quality) is of super-high importance.

Having this story in place allows you to manage technical debt, as stipulated by good technical debt requirement #2. There are too many technical debt tasks in this story? Maybe it’s time for a “technical debt sprint” where everyone focuses on getting the list shorter instead of adding new functionality. Are there no technical debt tasks at all? Maybe the team isn’t really tracking their technical debt. Another possibility is that the team doesn’t use “good technical debt” as a strategic tool and instead make other people wait for their gold-plated implementation.

In my view, a neverending “payback technical debt” story is a great tool. It’s a quality backlog maintained by developers which puts dirt right in front of everybody’s noses. I believe that this drastically mitigates the risk of creating a maintenance nightmare while still allowing for occasional shortcuts.

Working the Bash Shell Like a Pro, Service Pack 1

“Man is a tool-using animal. Without tools he is nothing, with tools he is all.”
— Thomas Carlyle

“A fool with a tool is still a fool.”
— anonymous

“A fool with a tool is a dangerous fool.”
— anonymous

Five years ago, I wrote a popular post called “Working the Bash Shell Like a Pro“, a short collection of essential tips and tricks for working efficiently with the Bash shell.

Like everyone, during my daily interaction with Bash I use many “tricks”, however, only few of them qualify for being added to my original list. Now, the time has come for a small update.

1. The Bash Curse

As you already know from the previous post, ‘!#’ selects the “entire command-line typed so far”, ‘:1’ takes the first argument in the “command-line typed so far” and ‘:r’ strips the extension from it. So changing the file extension of a file can be done with a fairly short command-line:

which is — depending on the path length — a lot shorter than

Instead of picking the first argument by employing the ‘:1’ word designator you could equally well take the last argument by specifying ‘:$’. Why? There is only one argument typed so far (“some/long/path/to/file.c”), so the first argument is the last argument. Consequently, this achieves the same effect as the previous command:

You probably think that this hasn’t gained us much since the number of characters to type hasn’t changed. That’s true but Bash has a special shortcut for ‘!#:$’ which is ‘!#$’. This saves us the colon and hence one character:

Since ‘!#$’ looks like a grawlix, I call this character sequence the “Bash curse”. I often use it many times a day, whenever I need to reuse (parts) of the argument preceding the argument I’m currently typing.

2. Skipping History

All of the mentioned tricks depend heavily on Bash’s history feature. After all, we want to save time by reusing something we typed earlier. Sometimes, however, we would rather not mess up our history. Take this command-line, for example:

Let’s further assume that you need to execute this regularly, many times a day. If you haven’t created an alias for it, you’ll probably pull the CTRL-R stunt*, something which I also explained in my previous post. So you hit CTRL-R and start typing “conan install”. Immediately, the desired command-line shows up and all you have to do is hit “Enter” to execute it.

So far so good. Sometimes, however, you want to fine-tune the command-line, for instance, to do a release build:

After you executed this command, your next attempt to retrieve the original command-line via CTRL-R will first find the one containing ‘-s build_type=Release’ which is not what you want as you only rarely want to do release builds. So it would have been better if the release build command-line had never been recorded in Bash’s history.

Another example of when you don’t want something to be entered in you Bash history is when you provide passwords/credentials on the command-line, as in

So how do you avoid that something is remembered in Bash? It’s easy. Just put a single space before the command you are about to execute:

This concludes service pack 1. Don’t expect service pack 2 anytime soon…

*) If I had to give up all the Bash tricks and only keep one, I’d would keep — hands down — the CTRL-R trick [back]

Why I Still Use std::lock_guard

“Better a lively old epigram than a deadly new one”
— Helen Rowland

If you’re programming in C++17 or above, you can choose among three different RAII-like mutex wrappers: std::lock_guard, std::scoped_lock, and std::unique_lock. But which one should you pick?

Most C++ programmers today advise you to use std::scoped_lock by default, unless you have to use std::unique_lock (if you want to move locks or if an API (like std::condition_variable) forces you to do so). One answer on StackOverflow goes as far as saying that

“… scoped_lock is a strictly superior version of lock_guard that locks an arbitrary number of mutexes all at once (using the same deadlock-avoidance algorithm as std::lock). In new code, you should only ever use scoped_lock.”

Despite of such advice, good ‘ol std::lock_guard is still my go to mutex wrapper. Why? Because most of the time, I

– only need to lock a single lock at a time
– don’t need to move or swap locks
– rarely need to manually lock/unlock the mutex once it’s wrapped
– prefer using features from earlier C++ standards, provided they fulfill my needs
– avoid code that violates Scott Meyers’ “most important design guideline”

Let me elaborate on the last point. Consider this code

This code looks like a typical case for a RAII-wrapper: You want to automatically lock and unlock (even if an exception is thrown, for instance) and reduce the time the lock is held to a minimum — hence the extra curly braces. Code like this gets written. I wrote code like this myself. I also saw similar code when reviewing somebody else’s code.

The problem with this code, however, is that it’s dead wrong. In the heat of the moment, the poor programmer forgot to pass a mutex to the std::scope_lock’s constructor, as in

In the original code sample, nothing is locked. Since the variadic constructor can be called without providing arguments, the compiler is happy. The application might actually work for quiet some time until it suddenly produces “strange results”. Try this with std::lock_guard and the compiler will rightfully reject your code, as std::lock_guard doesn’t have a default constructor.

In my view, the fact that std::scoped_lock’s constructor can be called without arguments is a gross violation of Scott Meyers’ “most important design guideline”, a guideline that I already wrote about recently: “Make interfaces easy to use correctly and hard to use incorrectly”.

To Pluralize, Or Not To Pluralize, That Is The Question

“Consciousness is a singular for which there is no plural”
— Erwin Schrödinger

When it comes to naming directories and files, some folks seem to insist on adding plural ‘s’ letters, as in

docs/, tests/, srcs/, recipes.txt

People like this have a collection-oriented view of the world. To them, “docs” is a label for a folder holding documents, just like a label on a box saying “screws” denotes that there’s a collection of screws in it.

When “pluralists” come across a directory named “doc”, it causes them grief. Why do people do that, they whine, there’s clearly an ‘s’ missing — is it just a typo or was it done deliberately, to save typing?

Let me tell you this: it’s usually done deliberately, but not to save a measly character. It’s done by individuals who have an identity-oriented view of the world and don’t care about containment and multiplicity. To such people, a folder named “doc” is the documentation of a project. It may hold a single text file, multiple PDFs and even some markdown documents. Likewise, “src” is the “source code” and “test” is the corresponding test in its entirety.

So there you have it: the reason why there is no ‘s’ in a name is just a coincidence in cases where the abbreviated name of what something is looks like the singular form of an item of a collection. “doc”, mind you, is not the abbreviation of “document” — it’s meant to be the abbreviation of “documentation”. (Incidentally, the Linux project avoids this confusion by keeping all the documentation in a folder named “Documentation“.)

My general advice is to strive hard to name something after what it is for the sake of better abstraction. Clearly, “essay” is more meaningful than “characters” and by the same token, I prefer “cookbook.txt” over “recipes.txt”. Only when a container has no higher-level purpose other than containment it should be given the plural form of items it contains, but this should rarely be the case.

A Small Matter Of Interface Design, Redux

“Bad design is simply great imagination without wisdom”
— Onur Mustak Cobanli

In my previous post on this subject, I introduced Scott Meyer’s most important interface design rule: “Make interfaces easy to use correctly and hard to use incorrectly”. As a case in point, we looked at a function which returns a temperature reading from a sensor plus a status code. The temperature sensor might be temporarily or permanently unavailable, so the temperature value can only be relied upon if the returned status is “good”. “Making interfaces hard to use incorrectly” means in this case “making it hard to ignore the returned status”. I claimed that this version of getTemperature is a move in the right direction:

Still, it’s not perfect for at least four reasons. First of all, the [[nodiscard]] attribute is only available if your compiler supports the C++17 language standard. Second, like I explained in my previous post, the C++17 language standard only “recommends” that a compiler issues a warning. Third, even if your compiler does issue a warning, a programmer might disable this/some/all warnings for a particular project, module, or function. But even if none of these reasons apply, a programmer might still dodge checking the status value.

The easiest way to suppress a warning resulting from an unchecked [[nodiscard]] return value is by casting the returned value to void:

You probably think that such developers are stupid and that they are putting their head on the block, that they deserve the pain caused by their deliberate misuse. I certainly agree, but in safety-critical systems, somebody other than the developer might actually suffer the pain, either physically or financially.

In other, more insidious cases, bypassing of [[nodiscard]] can happen accidentally:

Here, the return value is accessed (for logging), so no warning is issued by the compiler. Nevertheless, the value is not checked and thus the code is still buggy. What we need is a status code that enforces that its value is checked. Enter class Checked.

All but the most trivial template code can look intimidating, but Checked is actually quite simple. It acts as a wrapper around a type (Status, in our case) and stores a value of that type. At construction time, a flag called checked is set to false. If this flag is still false when the object is destructed, std::terminate() will be called. The only way to set the flag to true is by invoking the comparison operators == and !=. Here are some examples:

Applied to our getTemperature function, we get

Even though the status value is accessed for logging, std::terminate will still be called because we failed to actually check the status value. The correct usage looks like this:

Did you notice that I tagged the whole Checked class with the [[nodiscard]] attribute? As a consequence, all your functions that return Checked values are implicitly declared [[nodiscard]]. Less typing, less risk of forgetting to add it. Cool!

By using Checked you explicitly communicate to users of your code that they must check the value. The first line of defense is the [[nodiscard]] check at compile-time. If callers cast the returned value to (void) or otherwise fail to check/compare the returned value, they’ll get busted by std::terminate. The upshot of using Checked is that your interfaces are much harder to use incorrectly than by just using [[nodiscard]]. Let’s hope Scott Meyers is satisfied now.