Playgrounds Revamped

“Play is the highest form of research.”
— Albert Einstein

Many years ago, I wrote about the importance of having playgrounds, that is, easy-to-access try-out areas for carrying out programming-related experiments with the overall goal of exploring and learning.

Recently, I’ve reworked my C++ playground and uploaded it to GitHub. Compared to my previous C++ playground, the new one comes with the following major advantages:

  1. Shared access to playgrounds from multiple computers — since it is based on a Git repository.
  2. Every experiment has its own subdirectory — the top-level playground directory stays clean and clearly arranged.
  3. Unit test support through Google Test — running ‘make’ not just builds the experiment but also executes contained unit tests.

Once cloned and installed, you can start a new experiment is this:

‘pg-setup’ will create a directory called ‘init_within_loop_body’ along with a ‘Makefile’ and a ‘init_within_loop_body.cpp’ source file. Plus, if you have defined your ‘EDITOR’ environment variable properly, it will open ‘init_within_loop_body.cpp’ in your favorite editor for you. All that’s left to do is add your experiment’s code to the testcase template:

Now, just type/execute ‘make’ (either from within your editor or from the command-line) and your code will be compiled and run:

Pointers in C, Part II: CV-Qualifiers

“A teacher is never a giver of truth; he is a guide, a pointer to the truth that each student must find for himself.”
— Bruce Lee

In part I of this series, I explained what pointers are in general, how they are similar to arrays, and — more importantly — where, when, and why they are different to arrays. Today, I’ll shed some light on the so-called ‘cv qualifiers’ which are frequently encountered in pointer contexts.

CV-QUALIFIER BASICS

CV-qualifiers allow you to supplement a type declaration with the keywords ‘const’ or ‘volatile’ in order to give a type (or rather an object of a certain type) special treatment. Take ‘const’, for instance:

‘const’ is a guarantee that a value isn’t (inadvertently) changed by a developer. On top of that, it gives the compiler some leeway to perform certain optimizations, like placing ‘const’ objects in ROM/non-volatile memory instead of (expensive) RAM, or even not storing the object at all and instead ‘inline’ the literal value whenever it’s needed.

‘volatile’, on the other hand, prevents optimizations. It’s a hint to the compiler that the value of an object can change in ways not known by the compiler and thus the value must never be cached in a processor register (or inlined) but instead always loaded from memory. Apart from this ‘don’t optimize’ behavior, there’s little that ‘volatile’ guarantees. In particular — and contrary to common belief — it’s no cure for typical race condition problems — It’s mostly used in signal handlers and to access memory-mapped hardware devices.

Even if it sounds silly at first, it’s possible to combine ‘const’ and ‘volatile’. The following code declares a constant that shall not be inlined/optimized:

Using both ‘const’ and ‘volatile’ together makes sense when you want to ensure that developers can’t change the value of a constant and at the same time retain the possibility to update the value through some other means, later. In such a setting, you would place ‘MAX_SENSORS’ in a dedicated non-volatile memory section (ie. flash or EEPROM) that is independent of the code, eg. a section that only hosts configuration values*. By combining ‘const’ and ‘volatile’ you ensure that the latest configuration values are used and that these configuration values cannot be altered by the programmer (ie. from within the software).

To sum it up, ‘const’ means “not modifiable by the programmer” whereas ‘volatile’ denotes “modifiable in unforeseeable ways”.

CV-QUALIFIERS COMBINED WITH POINTERS

Like I stated in the intro, cv-qualifiers often appear in pointer declarations. However, this poses a problem because we have to differentiate between cv-qualifying the pointer and cv-qualifying the pointed-to object. There are “pointers to ‘const'” and “‘const’ pointers”, two terms that are often confused. Here’s code involving a pointer to a constant value:

Since the pointer is declared as pointing to ‘const’, no changes through this pointer are possible, even if it points to a mutable object in reality.

Constant pointers, on the other hand, behave differently. Have a look at this example:

The takeaway is this: if the ‘const’ keyword appears to the left of the ‘*’, the pointed-to value is ‘const’ and hence we are dealing with a pointer to ‘const’; if the ‘const’ keyword is to the right of the ‘*’, the pointer itself is ‘const’. Of course, it’s possible to have the ‘const’ qualifier on both sides at the same time:

The same goes for multi-level pointers:

Here, ‘v’ is a regular (non-‘const’) pointer to ‘const’ pointer to a pointer to a ‘const’ integer.

Yuck! Sometimes, I really wish the inventors of C had used ‘<-‘ instead of ‘*’ for pointer declarations — the resulting code would have been easier on the eyes! Consider:

versus

So

would read from right to left as “v is a POINTER TO const POINTER TO const int”. Life would be some much simpler… but let’s face reality and stop day-dreaming!

Everything I said about ‘const’ equally applies to pointers to ‘volatile’ and ‘volatile’ pointers: pointers to ‘volatile’ ensure that the pointed-to value is always loaded from memory whenever a pointer is dereferenced; with ‘volatile’ pointers, the pointer itself is always loaded from memory (and never kept in registers).

Things really get complicated when there is a free mix of ‘volatile’ and ‘const’ keywords with pointers involving more than two levels of indirection:

Let’s better not go there! If you are in multi-level pointer trouble, remember that there’s a little tool called ‘cdecl‘ which I showcased in the previous episode. But now let’s move on to the topic of how and when cv-qualified pointers can be assigned to each other.

ASSIGNMENT COMPATIBILITY I

Pointers are assignable if the pointer on the left hand side of the ‘=’ sign is not more capable than the pointer on the right hand side. In other words: you can assign a less constrained pointer to a more constrained pointer, but not vice versa. If you could, the promise made by the constrained pointer would be broken:

If the previous statement was legal, a programmer could suddenly get write access to a read-only variable:

Again, the same restrictions hold for pointers to ‘volatile’. In general, pointers to cv-qualified objects are more constrained than their non-qualified counterparts and hence may not appear on the right hand side of an assignment expression. By the same token, this is not legal:

ASSIGNMENT COMPATIBILITY II

The rule which requires that the right hand side must not be more constrained than the left hand side might lead you to the conclusion that the following code is perfectly kosher:

However, it’s not, and for good reason, as I will explain shortly. But it’s far from obvious and it’s a conundrum to most — even seasoned — C developers. Why is it possible to assign a pointer to non-const to a pointer to ‘const’:

but not a pointer to a pointer to non-const to a pointer to a pointer to ‘const’?

Here is why. Imagine this example:

Graphically, our situation is this. ‘ppc’ points to ‘p’ which in turn points to some random memory location, as it hasn’t been initialized yet:

Now, when we dereference ‘ppc’ one time, we get to our pointer ‘p’. Let’s point it to ‘VALUE’:

It shouldn’t surprise you that this assignment is valid: the right hand side (pointer to const int) is not less constrained than the left hand side (also pointer to const int). The resulting picture is this:

Everything looks safe. If we attempt to update ‘VALUE’, we won’t succeed:

But we are far from safe. Remember that we also (indirectly) updated ‘p’ which was declared as pointing to a non-const int and ‘p’ was declared as pointing to non-const? The compiler would happily accept the following assignment:

which leads to undefined behavior, as the C language standard calls it.

This example should have convinced you that it’s a good thing that the compiler rejects the assignment from ‘int**’ to ‘const int**’: it would open-up a backdoor for granting write access to more constrained objects. Finding the corresponding words in the C language standard is not so easy, however and requires some digging. If you feel “qualified” enough (sorry for the pun), look at chapter “6.5.16.1 Simple assignment”, which states the rules of objects assignability. You probably also need to have a look at “6.7.5.1 Pointer declarators” which details pointer type compatibility as well as “6.7.3 Type qualifiers” which specifies compatibility of qualified types. Putting this all into a cohesive picture is left as an exercise to the diligent reader.

________________________________
*) Separating code from configuration values is generally a good idea in embedded context as it allows you to replace either of them independently.

The Right to Choose

Shame on me! For the first time in a decade, I forgot to celebrate Towel Day. The fact that my towel proved so incredibly useful a couple of days earlier makes me feel even guiltier.

Douglas Adams is admired by most of his fans because he was not just tremendously funny, but also a shrewd freethinker who loved to show people how limited and simple-minded their views and beliefs are (anyone remember the Total Perspective Vortex?). But actually, he himself had his eyes opened one day by another person: in an interview with BBC, Douglas Adams disclosed that one of the most influential, eye-opening books he had read was “The Blind Watchmaker” by Richard Dawkins.

In his book, Richard Dawkins demonstrates that by Darwin’s Theory of Evolution and natural selection enormous complexity can arise out of even the simplest building blocks and that there really is no need for a Creator. Douglas Adams and Richard Dawkins were so like-minded that it’s no wonder they later became close friends.

At the end of one of his talks, Richard Dawkins presents a slide titled “The Illogic of Default”, which demonstrates the ill-reasoning of many Creationists:

1. We have theory A and theory B
2. Theory A is supported by loads of evidence
3. Theory B is supported by no evidence at all
4. I can’t understand how theory A explains X
5. Therefore theory B must be right

This exactly describes what happend to me one day when a muslim taxi driver tried to “prove” to me that there indeed must be a god. “See”, he said, “the theory of evolution just can’t be right. If we humans are really decedents of animals like apes and dogs, why are there still apes and dogs around?”.

I refrained from arguing with my taxi driver, even though it was obvious that he didn’t understand the Theory of Evolution at all. How could I resist the temptation?

I absolutely admire Richard Dawkins for his wit and I recommend that you watch all of his Youtube talks and videos, but there is a problem with atheists like him: sometimes, they are just as annoying as followers of any other conviction when they try to foist their views on others. Even if we know that Creationist’s arguments are utterly wrong from a logical and scientific point of view, it doesn’t help: it’s a documented fact that a significant part of humankind has a strong desire for spirituality. And, as we all know, whenever strong desires are involved, appealing to logic doesn’t work: drug addicts do know that drugs ultimately kill them but they nevertheless don’t quit.

Likewise, deep in their hearts, many believers in god assume that there is no god, at least not one like the one that is depicted in the holy scriptures, but they need a god for their mental well-being, anyway. Trying to prove to them that there is no Creator is a) futile and b) hurts them — not physically but emotionally. This is why I didn’t argue with the taxi driver. I strive to follow the Golden Rule, which — incidentally — also appears in many holy scriptures: always treat others the way you want to be treated.

So why I and many other people prefer to have their eyes opened, others don’t. Everybody should be free to choose the color of the pill they take.

Pointers in C, Part I: Pointers vs. Arrays

“Remember, When You Point a Finger at Someone, There Are Three More Pointing Back at You”
— Unknown

It’s easy to meet even long-time C programmers who don’t fully grok pointers, let alone beginners. Because of this and the fact that pointers play such a crucial role in the C programming language, I’ve decided to launch a new series of blog posts on pointers. I want to start off with an episode that sheds some light on similarities and — more importantly — differences between pointers and arrays.

POINTERS AND ARRAYS: THE BASICS

An array is a sequence of same-sized objects, integers, for instance:

On a big-endian machine, ‘array’ could be stored like this (that it starts at memory address 0xB00010 is just an example):

The compiler (or rather the linker) places the array at a fixed memory location. Thus, When you think array, think memory.

By contrast, a pointer is an object that holds a memory address. Pointers are used to refer to memory where an object of a specific type (like ‘int’) resides.

Pointers are used for flexibility: you can refer to another object at run-time by changing the memory address stored inside the pointer variable:

A pointer introduces a level of indirection: in order to access the actual object it refers to (and not the pointer variable itself), you dereference it:

DIRECT ACCESS VS. INDIRECT ACCESS

The crucial difference between pointers and arrays is how memory is accessed. For instance, when you retrieve the first array element:

the compiler generates code along these lines:

1. Load address of beginning of array into register A
2. Load data at address stored in A into register B

Whereas when you fetch the first array element via a pointer pointing to it:

The generated code will access memory indirectly very much like this:

1. Load address of pointer into register X
2. Load data at address in register X into register Y
3. Load data at address in register Y into register B

So as you can see, pointers and arrays use different ways to access memory and hence are fundamentally different beasts.

WHEN POINTERS LOOK LIKE ARRAYS AND VICE VERSA

Nevertheless, there are cases where pointers and arrays appear to be same thing.

The C language comes with a little bit of syntactic sugar. In certain situations you can use an array like you would use a pointer:

This looks like you are dereferencing a pointer named ‘array’, but looks can be deceiving. What this really compiles to is this:

Why? According to the C language standard, in expressions, the name of an array acts as a pointer to the first array element. Hence, the compiler really sees this:

which is equivalent to

Similarly, you can dereference pointers not just by using the ‘*’ operator but also by using the subscript operator [], which is another form of syntactic sugar — one that makes you believe you are accessing an array instead of a pointer:

All this syntactic sugar makes C code involving pointers and arrays easier on the eyes — the compiler will do some access magic behind the scenes. The downside is, that it deludes people into believing that pointers and arrays are the same, which is not the case: arrays employ direct access, pointers indirect access.

Contrary to expressions, such syntactic sugar is not available in declarations. If you define an array in one translation unit (file):

and foolishly attempt to import it into another translation unit via this forward declaration:

you risk a crash because dereferencing ‘VALUES’ will indirectly access memory when a direct access was required. Let’s assume that the array is stored like this, as defined in the first translation unit:

Now, dereferencing ‘VALUES’ declared as a pointer will lead to these steps:

1. Load address of pointer ‘VALUES into register X (X = 0x00B00210)
2. Load data at address in register X into register Y (Y = 0x00001111)
3. Load data at address in register Y into register B (B = ???)

What this means in practice depends on whether the address 0x00001111 is a valid address or not. If it is, arbitrary data will be read; otherwise, the memory management unit (MMU) will raise an exception. Therefore, make sure that your array declarations exactly match your definitions:

PASSING ARRAYS TO FUNCTIONS

So far so good (or bad). Another source of confusion is the fact that arrays are the only objects in C that are implicitly passed by reference:* You always provide a pointer to the first array element to get an array into a function:

At the caller’s site, the code looks like this:

TYPE-SAFETY THAT ISN’T

Sometimes, you want to ensure at compile-time, that only arrays of certain sizes can enter your function. Imagine you have a function that builds a 128-bit random value in an array of eight bytes:

‘get_random’ assumes that it is passed the address of eight bytes of memory, but nobody prevents the caller from passing an array that is not big enough:

Which will — of course — lead to a dreaded buffer overrun.

Is it possible to make ‘get_random’ type-safe, such that arrays with a length different to eight lead to compile-time errors?

One (ill-fated) approach is to employ a C feature that allows you to declare arguments using array-like notation:

However, this doesn’t give you any extra type safety. To the compiler, ‘random’ is still a pointer to a ‘uint8_t’ and if you ask for the size of ‘random’ (via sizeof(random)) in the body of the function, you will still get the value returned by sizeof(uint8_t*). Few developers are aware of this fact. To me, it’s a source of nasty bugs.

Since this array-ish syntax fools people into believing that a real array was passed to a function (by value) I don’t recommend using it.

TYPE-SAFETY DONE RIGHT

You can get real type-safety for your “array” arguments through so-called “pointers to arrays”. Alas, this C feature tends to confuse the heck out of programmers.

In the previous examples, we passed an array (conceptually) by passing a pointer to the first element:

The real type of the array and the size of the array is lost in this process; the called function only sees a pointer to a ‘uint8_t’. By contrast, the following syntax allows you to obtain a pointer to an array that preserves the full type information:

This ‘pointer’ is completely type-safe:

To add type-safety to our ‘get_random’ function, we could define it like this:

With this change, ‘get_random_type_safe’ only accepts pointers to 8 element arrays of uint8_t’s. Passing any other kind of pointer will result in a compile-time error.

We know that in expressions, using an array’s name like ‘array’ is short for “pointer to first element in array” but that doesn’t mean that ‘&array’ is a pointer to a pointer to the first element — the ‘&’ operator doesn’t create another level of indirection, even though it looks like it did. In the previous example, the value stored in ‘pointer’ is still the address of the first element of the array. Hence, this assertion holds:

Since the actual pointer values are the same, you can still use legacy APIs that only accept pointers to ‘uint8_t’s (like the original ‘get_random’ function), if you apply type casts:

You don’t need typedefs like ‘RANDVAL’ if you want to employ pointers to arrays. I mainly used it to avoid overwhelming you with the hideous pointer-to-array syntax. Without typedefs, you would need to type in things like this:

The syntax to declare pointers to arrays is similar to the syntax to declare pointers to functions and takes a little getting used to. If in doubt, ask the Linux tool ‘cdecl’ which is also available online:

Do I recommend using pointers to arrays? No, at least not in general. It confuses way too many developers and leads to ugly casts in order to access plain pointer interfaces. Still, pointers to arrays make sense every now and then and it’s always good to know your options.

This concludes my first installment on pointers. There is more to come. Stay tuned!

________________________________

*) The language designers of C believed that passing an array by value (e. g. as a copy via the stack) would be extremely inefficient and dangerous (think: stack overflow), so there is no direct way to do it. However, they were not so fearful regarding structs (which can also get quite large and overflow the stack), so you could pass an array by value if you wrapped it inside a struct:

Bug Hunting Adventures #12: String Limits

“The limits of my language mean the limits of my world.”
— Ludwig Wittgenstein

The aim of the routine below (‘reduce_string’) is to limit a given ‘string’ to at most ‘max_len’ characters. If the length of ‘string’ exceeds ‘max_len’, characters are removed from around the middle and filled with an ‘ellipsis’ string. Here are some examples that demonstrate what ‘reduce_string’ is supposed to do:

But as always in this series, a bug slipped in. Can you find it?

Solution

Working the Bash Shell Like a Pro

“People drive cars with steering wheels and gas pedals. Does that mean you don’t need wrenches?”
— Rob Pike

I’ve always preferred command-line interfaces (CLI) over GUIs. If I use GUIs at all then it’s mostly for browsing the web. Luckily, there is a plugin for my web browser that allows me to do most of my surfing using vi keystrokes. Yes, I try to avoid the mouse as much as I can.

I believe that most people who prefer GUIs either are bad at typing or haven’t taken the time to learn to use a CLI in an idiomatic way. With this post, I want to share some Bash CLI idioms that can significantly improve your efficiency. I don’t aim for a complete list — I rather present a compilation of the most frequently “not-known” techniques that I’ve encountered when I observed people working with Bash.

THE BASICS

First of all, make sure that you have enabled the command-line editing mode that suits you best. You are much more productive when your favorite shortcuts are available:

Often, we need to do something that we’ve already done before. You can easily re-execute the previous command with this shortcut:

Courtesy of this operator, forgetting to put ‘sudo’ in front of a command is not a big deal:

If you want to re-execute “one of the previous commands” instead of the last one, you could use the following abbreviations:

However, I don’t find these history expansions particularly useful. Often, going through the history by pressing ‘Arrow Up’ is equally fast.

The real game changer, however, is CTRL-R. Press CTRL-R and start typing any characters from one of your previous commands. This will start a backwards search through your command-line history and present you with the most recent command containing the characters you typed (or continue to type). Pressing CTRL-R again will find the next most recent command and so on. If you only partly need what you typed in the past, no problem — you can always edit the command-line that CTRL-R found before executing it.

MORE FUN WITH WORD DESIGNATORS

If you want to rename a file, please don’t do it like this:

Even with TAB completion, this requires too much typing. Why not reuse the path of the first argument?

‘!#’ is a shortcut for “the command-line typed so far” ‘:1’ selects the first argument and ‘:h’ strips off the last component (ie. “oldfile”).

In some cases, you don’t want to rename the file entirely but only change the extension. This can be achieved in a similar fashion:

You guessed it: ‘:r’ removes the file extension.

What if you did a mistake and wanted to undo this change? Again, that’s quite easy if you know the trick:

Which translates to “do another move but swap the arguments from the previous command”.

Sometimes, my fingers get ahead of me and I type ‘vm’ instead of ‘mv’:

Of course, you can always edit the last command be pressing ‘Arrow Up’ and change ‘vm’ to ‘mv’, but the following is much easier to type:

‘!*’ is a placeholder for “all arguments of the previous command”.

The word designator that I use the most — by far — is ‘!$’; it expands to the last argument of the last command:

Many times, people gratuitously reach for the mouse to copy the output of a previous command in order to use it as an argument for another command. Why not use ‘$()’ to capture command output?

SOME EXTERNAL HELP

If I was asked to name my favorite standard command-line tool, no doubt I would pick ‘xargs‘. Even though it is largely useless by itself, it’s the superglue that allowes you to build powerful command-lines. It takes the output of a command and uses it as arguments for another one.

Here’s an example that uses ‘xargs’ to build a tar archive containing all the C files that contain C++ comments:

In rare cases, when I have to do work that involves complicated selection/filtering, I reach out for TUI (usually ncurses-based) applications like ‘tig‘, ‘vifm‘, or ‘mc‘ that run in the shell and can be fully operated by the keyboard. Nevertheless, I first try to get by with the simpler ‘menucmd‘ tool. Here’s an example that builds a menu from all shell script files found in a directory. If the user selects an item, the specified action (‘cp -t /tmp’) is executed on it.

There you go. Even if this bag of tricks is not complete I hope it will serve you well. As always, in order to use a tool efficiently, you have to invest in learning how to use it idiomatically*. But this investment pays nice dividends in the medium-to-long term.

*) What’s the idiomatic way for vi users to underline a headline? 1. Yank the headline (‘yy’). 2. Paste the yanked headline under the exiting headline (‘p’). 3. Select the second headline (‘V’). 4. Replace every character in selected line with an underscore (‘r-‘) — that’s only six keystrokes! Awesome!