Working the Bash Shell Like a Pro

“People drive cars with steering wheels and gas pedals. Does that mean you don’t need wrenches?”
— Rob Pike

I’ve always preferred command-line interfaces (CLI) over GUIs. If I use GUIs at all then it’s mostly for browsing the web. Luckily, there is a plugin for my web browser that allows me to do most of my surfing using vi keystrokes. Yes, I try to avoid the mouse as much as I can.

I believe that most people who prefer GUIs either are bad at typing or haven’t taken the time to learn to use a CLI in an idiomatic way. With this post, I want to share some Bash CLI idioms that can significantly improve your efficiency. I don’t aim for a complete list — I rather present a compilation of the most frequently “not-known” techniques that I’ve encountered when I observed people working with Bash.

THE BASICS

First of all, make sure that you have enabled the command-line editing mode that suits you best. You are much more productive when your favorite shortcuts are available:

Often, we need to do something that we’ve already done before. You can easily re-execute the previous command with this shortcut:

Courtesy of this operator, forgetting to put ‘sudo’ in front of a command is not a big deal:

If you want to re-execute “one of the previous commands” instead of the last one, you could use the following abbreviations:

However, I don’t find these history expansions particularly useful. Often, going through the history by pressing ‘Arrow Up’ is equally fast.

The real game changer, however, is CTRL-R. Press CTRL-R and start typing any characters from one of your previous commands. This will start a backwards search through your command-line history and present you with the most recent command containing the characters you typed (or continue to type). Pressing CTRL-R again will find the next most recent command and so on. If you only partly need what you typed in the past, no problem — you can always edit the command-line that CTRL-R found before executing it.

MORE FUN WITH WORD DESIGNATORS

If you want to rename a file, please don’t do it like this:

Even with TAB completion, this requires too much typing. Why not reuse the path of the first argument?

‘!#’ is a shortcut for “the command-line typed so far” ‘:1’ selects the first argument and ‘:h’ strips off the last component (ie. “oldfile”).

In some cases, you don’t want to rename the file entirely but only change the extension. This can be achieved in a similar fashion:

You guessed it: ‘:r’ removes the file extension.

What if you did a mistake and wanted to undo this change? Again, that’s quite easy if you know the trick:

Which translates to “do another move but swap the arguments from the previous command”.

Sometimes, my fingers get ahead of me and I type ‘vm’ instead of ‘mv’:

Of course, you can always edit the last command be pressing ‘Arrow Up’ and change ‘vm’ to ‘mv’, but the following is much easier to type:

‘!*’ is a placeholder for “all arguments of the previous command”.

The word designator that I use the most — by far — is ‘!$’; it expands to the last argument of the last command:

Many times, people gratuitously reach for the mouse to copy the output of a previous command in order to use it as an argument for another command. Why not use ‘$()’ to capture command output?

SOME EXTERNAL HELP

If I was asked to name my favorite standard command-line tool, no doubt I would pick ‘xargs‘. Even though it is largely useless by itself, it’s the superglue that allowes you to build powerful command-lines. It takes the output of a command and uses it as arguments for another one.

Here’s an example that uses ‘xargs’ to build a tar archive containing all the C files that contain C++ comments:

In rare cases, when I have to do work that involves complicated selection/filtering, I reach out for TUI (usually ncurses-based) applications like ‘tig‘, ‘vifm‘, or ‘mc‘ that run in the shell and can be fully operated by the keyboard. Nevertheless, I first try to get by with the simpler ‘menucmd‘ tool. Here’s an example that builds a menu from all shell script files found in a directory. If the user selects an item, the specified action (‘cp -t /tmp’) is executed on it.

There you go. Even if this bag of tricks is not complete I hope it will serve you well. As always, in order to use a tool efficiently, you have to invest in learning how to use it idiomatically*. But this investment pays nice dividends in the medium-to-long term.

*) What’s the idiomatic way for vi users to underline a headline? 1. Yank the headline (‘yy’). 2. Paste the yanked headline under the exiting headline (‘p’). 3. Select the second headline (‘V’). 4. Replace every character in selected line with an underscore (‘r-‘) — that’s only six keystrokes! Awesome!

Random Casting

Recently, a security-related bug slipped into libcurl 7.52.0.

For those of you who don’t know, libcurl is a popular open source library that supports many protocols and greatly simplifies data transfer over the Internet; an uncountable number of open- and closed-source projects depend on it.

Because of the bug, this particular version of libcurl doesn’t use random numbers when it should, which is really bad for security:

Since all the surrounding code is stripped away it is pretty easy to see what went wrong, right?

Within ‘randit’ there is an attempt to obtain a random number by calling ‘Curl_ssl_random’. However, ‘Curl_ssl_random’ is not passed the pointer ‘rnd’, but instead a pointer to ‘rnd’. Hence, the memory pointed to by ‘rnd’ is not filled with a random number but rather the pointer ‘rnd’ will point to a random memory location.

How did this bug come about? I’m pretty sure that — initially — the unlucky developer had accidentally typed this:

When (s)he compiled the code with gcc, the following error message was produced:

Which exactly explains the problem, but most likely, the developer only skimmed the error message and jumped to the wrong conclusion; that is, (s)he thought that a cast was needed because of a simple pointer incompatibility (unsigned int* vs. unsigned char*) when in fact there is a severe pointer incompatibility (pointer to pointer vs. pointer).

I’ve seen this many times before: developers apply casts to get rid of warnings from the compiler (or a static analysis tool) without a second thought. Don’t do this. Be very considerate when your compiler speaks to you. Casting, on the other hand, will silence it forever.

“inline” Is Yet Another Word For “Premature Optimization”

The fact that some C++ developers use the ‘inline’ keyword so much has always been a conundrum to me — I’ve never liked it. Why? First and foremost because it clutters up header files and exposes implementation details to the users of a class.

Most likely, inline aficionados believe that these disadvantages are more than compensated for by the fact that inlining gives them faster code, but this is not necessarily the case: according to the C++ standard (ISO/IEC 14882:2014), the compiler is allowed to silently ignore the ‘inline’ keyword:

“An implementation is not required to perform this inline substitution at the point of call”

Believing is not knowing, as the old saying goes. This is another reason why I don’t like the ‘inline’ keyword: it doesn’t guarantee you anything.

But let’s attack the ‘inline’ keyword from another angle. Even if we knew that declaring a method inline made it faster, shouldn’t we have to ask ourselves first if there is actually a performance case? Without profiling, without a proven need, any optimization is premature optimization, which — according to Donald Knuth — is the root of all evil. The fact that an optimization gives a local improvement doesn’t justify it sufficiently — it’s the overall improvement of the major use cases that matters. Otherwise we would implement all of our functions with inline assembly, wouldn’t we?

In the old days of C programming, developers used the ‘register’ keyword as a hint to tell the compiler what variables should be kept in registers for performance reasons. Nowadays, every C compiler is much better at allocating variables to registers than any human being. Consequently, the ‘register’ keyword has been deprecated in C11.

By the same token, today’s C++ compilers do a much better job of figuring out which functions should be inlined than we are able to do. Therefore, instead of giving hints to the compiler we should rather rely on automated, transparent inlinining that doesn’t clutter up class interfaces.

As an example, at optimization level -O2, the g++ compiler automatically inlines all functions that are small or called only once. Specifying -finline-functions (enabled by default at -O3) uses a heuristic to determine if its worthwhile to inline a function or not — without the need for any developer intervention.

To me, it’s about time that ‘inline’ goes the way of the ‘register’ keyword.

Counting Down Correctly in C

The countdown for the New Year is near to its end, so I want to take this opportunity to discuss how to implement loops that count down from an upper boundary to a lower boundary. I know it sounds mundane, but I will present a technique that is — at least in my experience — not widely known, not even amongst seasoned C coders (with the notable exception of Chuck Norris, of course).

But first, please take a moment to look at the following routine that employs a countdown for-loop and decide if it works correctly or not:

This code appears to be fine, but it has a flaw that shows only when the ‘lower’ index is 0: ‘size_t’ is an unsigned type, and when ‘i’ becomes 0, subtracting 1 yields a very large positive number (due to integer wrap-around) which in turn causes an out-of-bounds access to the given ‘array’. So what do we need to change such that the code works as expected, even for a lower bound of 0?

Most developer’s knee-jerk reaction is to change the type of the indices to a signed type, like ‘int’, but this is unfortunate, as it limits (at least halves) the available value range. As often in life, the proper solution is not to fight the enemy but to turn him into a friend: Let’s use unsigned wrap-around to our advantage:

Instead of using the greater-than operator, we now use the not-equals operator and instead of comparing against ‘lower’ we now compare against one less than ‘lower’. If ‘lower’ happens to be 0, ‘lower’ – 1 (again, due to integer wrap-around) will yield the maximum possible value representable by type ‘size_t’. The same will happen to the loop counter ‘i’ when it has a value of 0 and is decremented once more. As a consequence, the expression ‘i != lower – 1’ becomes false and the loop terminates — as desired.

A Happy New Year to all of my faithful readers!

Optimizing for Simplicity

In Jeet Kune-Do, one does not accumulate but eliminate. It is not daily increase but daily decrease. The height of cultivation always runs to simplicity. […] The height of cultivation is really nothing special. It is merely simplicity; the ability to express the utmost with the minimum. It is the halfway cultivation that leads to ornamentation.
— Bruce Lee

The old maxim “Keep It Simple, Stupid” is widely known, but unfortunately one of the most frequently violated best practices in software development. Why is simplicity so important?

Most developers assume that the reason why they shall keep their code simple is to get features out quicker, they should rather work on things that the customer urgently needs, things that provide immediate business value instead of letting him wait for the gold-plated version.

While there is nothing wrong with this interpretation, it doesn’t go far enough. Developers must strive for simplicity to achieve a high degree of maintainability: software development is an investment and software must be built such that not only today’s requirements but also future requirements can be implemented in an economic way. Don’t just think about a particular software product — think about how software evolves into a family of related products and versions over time. The most important reason for simplicity is to ensure that software can evolve with as little cost (aka. pain) as possible. Contrast this to the olden days of software development where developers added lots of flexibility up-front — flexibility that in most cases was never needed (YAGNI). These days, we rather keep it simple and adapt quickly, when the need arises.

Correctness comes first, no doubt, but the next priority is simplicity. Code must be simple such that it is easy to read and understand. Only code that is easy to comprehend can be maintained with little effort. Code that is complex (maybe because it is littered with unjustified optimizations and hard-to-grasp language features) makes changes hard and risky. Complex code is not an asset; it’s a liability that contributes to the overall technical debt.

One of today’s biggest challenges in software development is managing complexity. While there is little we can do about the essential (intrinsic) complexity of a software product, we constantly have to fight non-essential complexity; that is, complexity that arises as a side-effect, from the way we construct a software product.

So while developing code, constantly reflect and ask yourself what your code looks like, especially from another developer’s point of view. Is it easy to understand, does it even look mundane? Great! Resist the temptation to write clever code. Instead, take pride in being able to write the clearest, simplest code.

Don’t get me wrong. It’s OK to try out more complex designs and advanced language features — honing one’s skills is imperative. But always view such activities just as learning activities, like “playgrounding” and send your changes straight to /dev/null once you’re done. If you don’t have the heart to zap them, keep them on a private branch. But unless there is a compelling, justified reason you should take the high road and show true software development mastery by not integrating them with the code base.

Dangerously Confusing Interfaces III

confused.jpgJust like the other “Dangerously Confusing Interfaces” posts, this one was also inspired by a real-world blunder that I made.

Here’s the background: usually, routines that accept data via a pointer from the caller either execute synchronously or copy the data into their own internal data structures for later processing. Take the venerable ‘fwrite’ from the C standard library as an example:

‘fwrite’ blocks until the data has been written, either to disk or to an internal buffer. In either case, once ‘fwrite’ returns, it doesn’t care about the original data anymore. That’s why it’s safe (and common practice) to pass a pointer to a local buffer on the stack:

All standard library and POSIX APIs behave like ‘fwrite’, which is both, safe and convenient. However, with embedded systems, the story is different: in some cases, memory is so tight that additional buffers/internal storage can’t be afforded. Such functions don’t copy the provided data but only store a pointer to your data and expect the memory pointed-to by this pointer to be still valid long after the function call has returned. Here is an example from the AUTOSAR standard, which is used by almost all embedded automotive products:

‘NvM_WriteBlock’ is used to store data to a given non-volatile memory block. However, what this function does is only enqueue a request for the given block ID together with the data pointer (not a copy of your data). This is done for the sake of efficiency, because there can be multiple write requests in parallel. The queue is later processed in another task, long after any local buffer would have been removed from the stack.

Passing a pointer to a buffer with automatic storage is an easy mistake to make, especially since such “non-copy” interfaces are so rarely encountered. How can “write-like” interfaces that don’t make a copy of the provided data be made safer, such that misuse is less likely? Obviously, just adding documentation is not enough — nobody reads documentation, especially in the heat of the moment.

In my view, the root of the problem is that such functions accept just about any pointer. What if the caller was forced to explicitly cast the pointer to another type? A type with a cunningly chosen typename, one that reminded the caller of the potential pitfall? Here is my approach:

Whenever a pointer is passed to this function, developers have to write something like this to make the compiler happy:

Typing ‘uncopied_memory’ should shake up even the most focused developers and remind them to double-check what they are passing into this function.

Of course, within ‘SomeWritelikeFunction’, the provided pointer needs to be cast back into something more useful, like a ‘const uint8_t*’. Further, note that the ‘dummy’ member within ‘uncopied_memory’ must not be used; it only exists to make sure that the cast to ‘uncopied_memory*’ in the calling function is safe: a pointer to a struct is aligned such that it is compatible with the struct’s most-aligned member, which is ‘void*’ and ‘void*’ is by definition compatible with any other pointer type.