PPSD: The Attaboy

“The better you feel about yourself, the less you feel the need to show off.”
― Robert Hand

In his famous book, Code Complete, Steve McConnell tells the story of a maintenance programmer who was called out of bed one night to fix a critical bug. The original author had long left the company and the poor maintenance programmer had never worked on the program before. There were no comments in the code, except six letters on a line of assembly code:

After working with the program through the night and puzzling over the comment, the programmer made a successful patch and went home to get some sleep. Months later, he met the program’s original author at a conference. “What does the comment R. I. P. L. V. B. stand for?” he asked, to which the original author replied: “‘Rest in peace, Ludwig van Beethoven.’ Beethoven died in 1827 (decimal), which is 723 (hexadecimal).”

Ladies and Gentlemen, such conduct is the mark of an Attaboy!

An Attaboy is a developer who craves admiration for being smart. To satisfy his needs, he regularly pulls coding stunts that get him the attention of his coworkers. Attaboys usually follow this pattern:

1. Bury an outlandish “nugget” in the code base.
2. Patiently wait until it’s discovered by an unsuspecting victim.
3. When the horrified victim demands answers, smugly explain.
4. Watch the victim’s jaw drop.
5. Savor the attention.

Many years back, I was a victim of an Attaboy myself, even though I hadn’t coined the term yet. I was reviewing a coworker’s code when a gut feeling told me that something wasn’t quite right. At first, I couldn’t really explain it, but then, all of a sudden, I knew: there was something weird about his zeros. Whenever he needed a decimal 0, he put the letter ‘O’ instead, as in

instead of

Since code like this wouldn’t normally compile, I suspected that he had defined a preprocessor macro somewhere like this:

but a regex grep across the code base didn’t yield any matches.

Finally, it dawned on me: he must have done it in the Makefile and so it was: he predefined O through a command-line argument to the compiler, like the following:
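The original snippets are not reproduced here. As a hedged reconstruction of the trick, assuming a GCC- or Clang-style `-D` option and a made-up function name, the code could have looked roughly like this:

```c
#include <assert.h>
#include <stddef.h>

/* In the real project, O would come from the build, e.g. a Makefile line
 * such as `CFLAGS += -DO=0` (assumed; the original line is not shown).
 * The fallback below only exists so that this sketch compiles on its own. */
#ifndef O
#define O 0
#endif

/* Hypothetical example: the letter 'O' stands in wherever a zero is needed. */
int count_leading_blanks(const char* s) {
    assert(s != NULL);
    int count = O;              /* looks like a typo, but expands to 0 */
    while (s[count] == ' ') {
        ++count;
    }
    return count;
}
```

Because the macro is injected on the compiler command line, a search through the source files alone never finds its definition.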

Arrgh! Even though this explained why his code compiled, it didn’t explain why he had done something this insane in the first place. It was time to confront this schmuck! But little did I know that this was all part of a carefully premeditated game.

“Well,” he said, with a smirk on his face, “our coding standard says, we’re not allowed to use octal constants. According to the C programming language, any number starting with a zero is an octal constant, so 0 is by definition also an octal constant, which according to our coding standard we shouldn’t use.”

Touché!

The proper way to handle this, of course, would have been to ask our software architect to make a tiny adjustment to our coding standard. Not so for the Attaboy, who saw this as a unique opportunity to show off.

APPEARANCE

Attaboys are usually fresh out of college and lack professional experience. The ones that I’ve met looked like stereotypical nerds, but I don’t think that you can generally discern them by their looks or the way they dress. Attaboys are all about their wits.

PERSONALITY TRAITS

Attaboys are not just ordinary pranksters—they desire praise and validation and pranks are just a means to attain it. However, contrary to pathological cases of attention-seeking personality disorders, their behavior is transitory and not rooted in either getting too much or too little attention from their parents during childhood.

Instead of getting praise from their colleagues for their outstanding “achievements,” Attaboys are often met with contempt and, in rare cases, even hostility. While such negative attention is far from ideal for Attaboys, it’s nevertheless proof of their intellectual prowess. And as every aging actress knows: any attention is better than being ignored.

These antics notwithstanding, Attaboys are usually productive, write decent code, and get along well with others.

RATING

According to the Q²S² framework, an Attaboy’s rating is 4/4/3/3.

TOOLING

Since tools can also be a means of getting attention, they play a significant role in Attaboys’ lives. Attaboys prefer tools, techniques, and programming languages that their teammates consider unusual. This, of course, depends on the context: in an environment where everyone uses Visual Studio to write their code, an Attaboy might use Vi or Emacs as an editor. In C++ projects, Attaboys make heavy use of C++ template meta-programming, yielding code that is illegible to almost everyone, often including themselves. Or, they use programming languages that are illegible by design, like Brainfuck. Highly readable mainstream languages like Python are only used if absolutely necessary, but even then, Attaboys find rarely used or recently added features that baffle their peers.

Regarding the selection of tools, an Attaboy is almost indistinguishable from a Programming Hipster. But while a Programming Hipster’s main motivation is being different, an Attaboy’s main motivation is being admired.

CONCLUSION

Attaboys are tech-savvy rookies who are still wet behind the ears. Despite being an occasional nuisance, Attaboys are not much of a problem. On the contrary, their productivity is above average, and they definitely care about their craft. Sometimes, you can even learn something from an Attaboy’s highbrow pranks, even if it’s just the year of death of a German composer, or that zero is an octal constant. While most would refer to an Attaboy as a smartass, I would like to add that an Attaboy is a benign smartass. Actually, I tend to think of an Attaboy as a diamond in the rough. Over time, the attention-seeking behavior will disappear and what’s left over will be a rock-solid developer.

Pointers in C: Part VII, Being Relaxed About The Strict Aliasing Rule

“I am free, no matter what rules surround me. If I find them tolerable, I tolerate them; if I find them too obnoxious, I break them. I am free because I know that I alone am morally responsible for everything I do.”
― Robert A. Heinlein

The largely unknown “Strict Aliasing Rule” (SAR) has the potential to send tears to the eyes of even the most seasoned C/C++ developers. Why? Because of it, a lot of the code they have written over the years belongs to the realm of “undefined behavior”.

Despite its name, the term “undefined behavior” itself is well-defined by the C language standard: it’s “behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements.” This means that anything can happen: your program could randomly crash or even send suggestive emails to your boss.

THE PROBLEM

Let’s start with the code snippet that I used in my original post on SAR:

Here, data that has been received into a buffer (‘data’) is converted into a high-level data structure (‘measurements’). From the compiler’s point of view, what ‘data’ refers to is just a single ‘uint8_t’ but we access it through a pointer to type ‘struct measurements_t’. What we’ve got here is a clear violation of SAR, which entails undefined behavior.

SAFE ALTERNATIVES

“But, Ralf”, you might respond, “this can’t be true. I write code like this every day and it works flawlessly, even in safety-critical systems like medical devices!”

This doesn’t surprise me in the least. “Undefined behavior” can — get this — also mean “works flawlessly”. But there are no guarantees, whatsoever. It might work on one platform, with a particular compiler or compiler version, but might fail on another platform, or with a different compiler version. Hence, to err on the truly safe side (which you should, especially if you work on safety-critical systems), you should use truly safe alternatives.

One obvious and time-proven approach is to do such so-called type punning through unions. It works by storing data via a member of one type and reading it via another member of a different type:

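The receiving function would store byte-wise into the ‘receive_buffer.data’ array, while high-level functions would use the ‘receive_buffer.measurements’ member. This works reliably in C (the standard has explicitly sanctioned it since C99), but it might fail in C++, where reading a union member other than the one last written is undefined behavior.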
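Since the original listing is not shown, here is a minimal sketch of the union approach. The member names ‘data’ and ‘measurements’ come from the text; the fields of ‘struct measurements_t’ and the two function names are assumptions made purely for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout; the real struct is not shown in this post. */
struct measurements_t {
    uint32_t timestamp;
    uint16_t temperature;
    uint16_t humidity;
};

/* One piece of storage, viewed either as raw bytes or as the struct. */
static union {
    uint8_t data[sizeof(struct measurements_t)];
    struct measurements_t measurements;
} receive_buffer;

/* Low-level side: stores incoming bytes one by one via the 'data' member. */
void receive_bytes(const uint8_t* src, size_t len) {
    assert(src != NULL && len <= sizeof receive_buffer.data);
    for (size_t i = 0; i < len; ++i) {
        receive_buffer.data[i] = src[i];
    }
}

/* High-level side: reads the same storage via the 'measurements' member. */
struct measurements_t current_measurements(void) {
    return receive_buffer.measurements;
}
```

Returning the struct by value keeps every access going through the union object itself, which is the form of type punning that C compilers document as supported.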

Bulletproof type-punning, one that works in both C and C++, uses ‘memcpy’. ‘memcpy’!? ‘memcpy’, that’s right:

Believe it or not, there’s a high probability that your compiler will optimize out the call to ‘memcpy’. I’ve observed this, among others, with ‘gcc’ and ‘clang’, but I’ve also seen compilers always call ‘memcpy’, even for the smallest amounts of data copied, regardless of the optimization level used (Texas Instruments ARM C/C++ compiler 19.6, for instance). Nevertheless, this is my go-to type-punning technique these days, unless performance is paramount. (You first have to prove that your code really impacts overall performance by profiling. Otherwise, your optimizations are like buying Dwayne Johnson an expensive hair brush: it doesn’t really do any harm, but it’s not of much use, either.)
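As the original listing is missing here, the following is only a sketch of what a ‘memcpy’-based ‘convert’ could look like. The signature, the return convention, and the fields of ‘struct measurements_t’ are assumptions, not the author’s code:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout; the real struct is not shown in this post. */
struct measurements_t {
    uint32_t timestamp;
    uint16_t temperature;
    uint16_t humidity;
};

/* Copies the raw bytes into a properly typed object instead of casting
 * the buffer pointer. Returns 0 on success, -1 if the buffer is too short. */
int convert(const uint8_t* data, size_t len, struct measurements_t* out) {
    assert(data != NULL && out != NULL);
    if (len < sizeof *out) {
        return -1;
    }
    memcpy(out, data, sizeof *out);   /* no aliasing: 'out' is a real struct */
    return 0;
}
```

The caller owns a genuine ‘struct measurements_t’ object, so no pointer of one type is ever dereferenced to access an object of another type.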

BUT I REEELLY, REEELLY MUST USE CASTS

Sometimes, you have to use SAR-breaking casts, if only to maintain social peace in your team. So how likely is it that your compiler will do something obscene?

VERY unlikely, at least in this example. Let me explain.

First of all, compiler vendors know that most developers either haven’t heard about SAR or at least don’t give a foo about it. Therefore, they usually don’t aggressively optimize such instances. This is particularly true for compilers that are part of toolchains used in deeply (bare-metal) embedded systems. However, ‘gcc’ as well as ‘clang’, which are used in all kinds of systems, take advantage of SAR from optimization level 2 on. (You can explicitly disable SAR-related optimizations regardless of the optimization level by passing the ‘-fno-strict-aliasing’ option.)

Second, what ‘convert’ is doing is pretty much well-behaved. Sure, it aliases the ‘data’ and ‘measurements’ pointers, but it never accesses them concurrently. Once the ‘measurements’ pointer has been created, the ‘data’ pointer is not used anymore. If the caller (or the whole call-chain) is equally well-behaved, I don’t see a problem (don’t trust me!).

Third, there’s no aliased read/write access. Even if ‘data’ and ‘measurements’ were used concurrently, it wouldn’t be a problem, as long as both are only used for reading data (don’t trust me on this one, either!). By contrast, this I consider harmful:

To the compiler, ‘data’ and ‘measurements’ are two totally unrelated pointers to unrelated memory areas. The original value of ‘data[0]’ might be cached in a register and not refetched from memory, hence the ‘assert’ might fail. In general, this is what will most likely happen when SAR is violated in contexts where it does matter: instead of suggestive emails being sent to your boss, you are much more likely to get stale values (which of course could lead to crashes later on).

NO PUN INTENDED

Let’s get real about SAR. Here are some relaxed, pragmatic rules on how to deal with the Strict Aliasing Rule:

0. Fully understand SAR
1. Try hard to adhere to SAR
2. Type-pun using ‘memcpy’
3. If you can’t, disable SAR-related compiler optimizations
4. If you can’t, avoid concurrent, aliased read/write access

But don’t assume that just because you didn’t get a ticket for speeding in the past, you will never ever get a ticket for speeding. What you’re doing is against the law. If you get busted someday, don’t whine and don’t complain I didn’t warn you. Rather own your failures and move on.

Breakin’ rocks in the hot sun
I fought the law and the law won
I fought the law and the law won
I USED SOME CASTS FOR TYPE PUN
I fought the law and the law won
I fought the law and the law won

(with apologies to Sonny Curtis)

Bug Hunting Adventures #14: Bitmap [BM]adness (Solution)

It’s a given fact of life that something that’s deemed totally safe in one environment may be totally unsafe in another. Every German who has ever used an American sauna knows what I’m talking about.

Similar (but far less embarrassing!) traps lurk in situations where you reuse perfectly working C++ code in a C environment. Some time ago, I integrated a little home-grown C++ library into a plain C project. However, instead of the expected, proven functionality I got plenty of core dumps. After some assembly-level debugging, I came to the conclusion that I had found a compiler bug. Code along these lines

was compiled to this:

Why the heck did the compiler insert an offset of 4 instead of 1?

The answer to this question, which is also the answer to our bug hunting adventure, can be found here.
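Without restating the linked explanation, one well-known difference between C and C++ fits the symptom described above: in C, a character constant such as ‘B’ has type ‘int’, whereas in C++ it has type ‘char’. Whether this is exactly what bit the original code is an assumption on my part; the function names in this small sketch are made up:

```c
#include <assert.h>
#include <stddef.h>

/* In C, 'B' is an int, so this yields sizeof(int) (typically 4). */
size_t char_constant_size(void) {
    return sizeof('B');
}

/* Demonstrates the difference between a character constant and a char. */
void check_char_sizes(void) {
    assert(sizeof('B') == sizeof(int));   /* holds in C, not in C++ */
    assert(sizeof(char) == 1);            /* holds in both languages */
}
```

Compiled as C++, the first assertion would only hold on a platform where ‘sizeof(int)’ is 1.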

Bug Hunting Adventures #14: Bitmap [BM]adness

“What’s the meaning of goodness if there isn’t a little badness to overcome?”
― Anne Revere

The code below is part of a C graphics processing library, which parses data in the venerable bitmap (BMP) file format. A bitmap file consists of two parts: a header and the pixel data block. More specifically, a bitmap file is laid out like this:

Offset  Size  Content
0       1     Character ‘B’
1       1     Character ‘M’
2       4     Size of the bitmap file
6       4     Reserved
10      4     Offset to the first byte of the pixel data (ofs)
14      n     Info block
ofs     m     Pixel data

All multi-byte integer values (like the bitmap file size and the offset to the pixel data) are stored in little-endian format.
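To illustrate what reading such a little-endian field involves, here is a small helper. It is a sketch only, and neither the puzzle’s code nor its solution; the names ‘read_le32’, ‘BMP_OFS_FILE_SIZE’, and ‘BMP_OFS_PIXEL_DATA’ are made up, while the offsets come from the table above:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offsets taken from the layout table above. */
enum {
    BMP_OFS_FILE_SIZE  = 2,
    BMP_OFS_PIXEL_DATA = 10
};

/* Assembles a 32-bit value from four little-endian bytes. Going byte by
 * byte makes the result independent of host endianness and alignment. */
uint32_t read_le32(const uint8_t* p) {
    assert(p != NULL);
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

For a header whose bytes at offset 10 are 0x36 0x00 0x00 0x00, ‘read_le32(bmp + BMP_OFS_PIXEL_DATA)’ yields 54.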

The function ‘bmp_pixel_data’ takes a pointer to bitmap file data and returns a pointer to the bitmap’s pixel data area within the bitmap. The size of the pixel data area is returned via the ‘size’ out parameter. In case the provided bitmap file data is malformed, a NULL pointer is returned and the ‘size’ out parameter is set to zero.

As always, the code compiles cleanly without warnings (at ‘-W -Wall’), but when the function ‘bmp_pixel_data’ was put to use, it failed miserably. Where did the programmer goof?

Solution

Simplified: The Weasel That Teaches Evolution

“Weaseling out of things is important to learn. It’s what separates us from the animals… except the weasel.”
— Matt Groening

It’s Towel Day again! What a great opportunity to reflect upon Life, the Universe, and Everything. In this installment of the “Simplified” series, I want to tackle nothing less than Darwin’s theory of evolution. Why? Because I seriously believe that the world would be a better place if people finally understood it.

One common misconception, for example, is that evolution occurs in a linear fashion, from single-celled organisms to homo sapiens. People who believe this, like the taxi driver that I mentioned in a previous post, think that they can smash the theory of evolution by posing this cunning question:

“If we humans are really descendants of animals like apes and dogs, why are there still apes and dogs around?”

Just in case this reasoning sounds conclusive to you as well: In reality, there is no single line of evolution; it’s rather a tree with many, many branches. On these branches, the universe performs experiments and decides which species survive, through natural selection. Consequently, humans did not evolve from apes; they are rather cousins whose common ancestor was neither ape nor human.

Another fallacy is Fred Hoyle’s famous Boeing 747 analogy. Hoyle was a brilliant scientist, no doubt, yet he refused to accept that complex life-forms can emerge by chance:

“A junkyard contains all the bits and pieces of a Boeing-747, dismembered and in disarray. A whirlwind happens to blow through the yard. What is the chance that after its passage a fully assembled 747, ready to fly, will be found standing there?”

A variant of this argument borrows the monkey from the infinite monkey theorem: a monkey typing random letters at a typewriter will never, in any realistic amount of time, be able to produce Shakespeare’s works.

In 1986, evolutionary biologist Richard Dawkins set out to dispel such misunderstandings by implementing a computer program that works—in Dawkins’ own words—like this:

“It […] begins by choosing a random sequence of 28 letters, […] it duplicates it repeatedly, but with a certain chance of random error – ‘mutation’ – in the copying. The computer examines the mutant nonsense phrases, the ‘progeny’ of the original phrase, and chooses the one which, however slightly, most resembles the target phrase, METHINKS IT IS LIKE A WEASEL.”

Here’s a more detailed explanation of his weasel program: it starts with a string of 28 random letters. Next, it creates N offspring strings by copying the original 28 random letters N times. When being copied, the chance of a letter being changed into another (random) letter is P. Now that we have a set of N new strings, we compare each string letter by letter against “METHINKS IT IS LIKE A WEASEL” (a quote from Shakespeare’s Hamlet, by the way) and pick the one with the most character matches (the highest match score). This one is deemed the “fittest” and kept as the survivor of the first generation; the other N-1 strings are discarded. We repeat the whole process by creating again N offspring from the survivor, then pick a new survivor and so on until the survivor finally matches “METHINKS IT IS LIKE A WEASEL”.

Let’s choose N to be 100 (one hundred offspring per generation) and P to be 5%, as in Dawkins’ original experiment. How many generations would it take until we finally reach “METHINKS IT IS LIKE A WEASEL”? Just guess a number, I’ll wait here…

Answer: about 50 generations! To me, this number was so ridiculously small that I decided to implement the weasel program myself. Please check it out here. Through command-line parameters, you can change the values of N and P, as well as the target phrase. Toying around with this program is so eye-opening, you suddenly realize that evolution is possible, that complexity can emerge by mutation and survival of the fittest.
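Independent of the program linked above, the algorithm as described can be sketched in a few lines of C. This is not the author’s implementation; the function name ‘weasel’, its parameters, the 27-character alphabet (A to Z plus space), and the generation cap are all assumptions:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LEN 256

static const char ALPHABET[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ ";

/* Picks a random letter; it may happen to equal the letter it replaces. */
static char random_letter(void) {
    return ALPHABET[rand() % (int)(sizeof ALPHABET - 1)];
}

/* Counts the positions at which candidate and target agree. */
static size_t match_score(const char* candidate, const char* target, size_t len) {
    size_t score = 0;
    for (size_t i = 0; i < len; ++i) {
        score += (candidate[i] == target[i]);
    }
    return score;
}

/* Runs the weasel algorithm and returns the number of generations used.
 * Stops early after max_generations; the last survivor goes to 'result',
 * which must hold at least strlen(target) + 1 characters. */
unsigned long weasel(const char* target, unsigned offspring, double mutation_prob,
                     unsigned long max_generations, char* result) {
    size_t len = strlen(target);
    char survivor[MAX_LEN], child[MAX_LEN], best[MAX_LEN];
    unsigned long generation = 0;
    assert(len > 0 && len < MAX_LEN && offspring > 0);

    for (size_t i = 0; i < len; ++i) {
        survivor[i] = random_letter();
    }
    while (match_score(survivor, target, len) < len && generation < max_generations) {
        size_t best_score = 0;
        int have_best = 0;
        for (unsigned n = 0; n < offspring; ++n) {
            /* Copy the survivor, mutating each letter with probability P. */
            for (size_t i = 0; i < len; ++i) {
                int mutate = (double)rand() / ((double)RAND_MAX + 1.0) < mutation_prob;
                child[i] = mutate ? random_letter() : survivor[i];
            }
            size_t score = match_score(child, target, len);
            if (!have_best || score > best_score) {
                best_score = score;
                memcpy(best, child, len);
                have_best = 1;
            }
        }
        memcpy(survivor, best, len);   /* the fittest child becomes the parent */
        ++generation;
    }
    memcpy(result, survivor, len);
    result[len] = '\0';
    return generation;
}
```

A call such as ‘weasel("METHINKS IT IS LIKE A WEASEL", 100, 0.05, 100000UL, buffer)’ after seeding with ‘srand’ corresponds to the N = 100, P = 5% experiment; the exact generation count varies with the seed and the platform’s ‘rand’.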

Interestingly, if you set P to a high value (say 20%), you are likely to never arrive at the target phrase. A high value of P simulates an early universe, or any environment with a high natural radioactivity level. In one experiment where I set P to 20%, it took 800 generations, while a P value of 1% reached the target phrase after only 90 generations.

I also played with much longer target phrases with a length of more than 150 letters. What I’ve found is that the longer the phrase gets, the more detrimental a high mutation probability becomes. If you think about it, that’s not surprising, as the likelihood that already matching letters are replaced again with non-matching letters increases. Conclusion: the evolution of complex life-forms requires an environment that provides the right mutation probability—neither too low, nor too high. But while a low mutation probability can always be compensated by time (evolution will just progress slower), a high mutation probability stifles evolution entirely.
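A rough back-of-the-envelope calculation makes this effect tangible. It is an added illustration, assuming that each letter mutates independently with probability $P$ and that a mutated letter no longer matches. The chance that an offspring keeps all $k$ letters that already match is then:

```latex
\Pr[\text{all } k \text{ matches survive}] = (1 - P)^{k}
\qquad
\begin{aligned}
k = 28,\  P = 5\%  &: \quad 0.95^{28}  \approx 0.24    \\
k = 28,\  P = 20\% &: \quad 0.80^{28}  \approx 0.002   \\
k = 150,\ P = 5\%  &: \quad 0.95^{150} \approx 0.0005
\end{aligned}
```

With 100 offspring per generation, a nearly complete 28-letter phrase almost always has an undamaged copy among its children at P = 5%, but rarely at P = 20%, and a nearly complete 150-letter phrase rarely has one even at 5%.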

Anyway, while the weasel program is a blatant simplification of real life, I think it’s a great tool for demonstrating that random variation combined with non-random cumulative selection can produce complex structures. This is what evolution is all about; not monkeys hacking away at keyboards and winds blowing through airplane junkyards.

Share and Enjoy!

Coding Horrors: Sometimes They Come Back

“Within its gates I heard the sound
Of winds in cypress caverns caught
Of huddling trees that moaned, and sought
To whisper what their roots had found.”
― George Sterling, A Dream of Fear

This month’s post is a guest post by Giles Payne, a teammate from a project at Giesecke & Devrient Mobile Security almost two decades ago. At the time, we were trying to teach good coding practices by distributing so-called “Coding Horrors”, bad code examples, that showed what not to do. The other day, Giles sent me such a “Coding Horror” that he came across recently. I couldn’t help but talk him into blogging about it. Giles, take it away!

Thank you, Ralf! Let’s dive right into the code. For fun, please take a minute to try to work out what this little piece of Java is doing.

(5-minute pause for head scratching.)

Were you able to work it out? No, I didn’t think so. So what is going on?

This code comes from a kind of simple browser-like application that needs to handle a stack of displayed pages. In some cases it will manage a history of the pages, in other cases it doesn’t care about the history — hence the “USE_PAGE_HISTORY” descriptor.

But what exactly is it doing? Well, what this particular code does is change the meaning of the equals comparator for PageIds to always return true when USE_PAGE_HISTORY is false. (Face palm.)

In an attempt to pinpoint exactly what is so horrendous about this I’ve written up a few basic “programming principles” that I feel have been grievously violated here.

Principle #1: Don’t be a language snob.

If you’re programming Java, then program Java; if you’re programming Visual Basic, then program Visual Basic. The person that wrote this code was obviously heavily into functional programming languages. So totally into functional programming languages, that the thought of writing code in a non-functional programming language like Java drove him or her to distraction, or more exactly, to use the “Functional Java” library. “Functional Java” is basically an attempt to turn Java into a functional programming language, which it doesn’t really manage to do. What it does do is turn your code into cryptic, impenetrable gobbledygook like this.

Principle #2: Don’t bury important logic deep in non-obvious abstractions.

The usePageHistory moniker is pretty easy to understand. If I see code like

it’s pretty easy to follow. If I had lots of places with the same if-else logic, I might separate the logic into two classes like PageStackBasic and PageStackWithHistory. Hiding logic such as whether you’re using history or not inside the equals comparison operator is a recipe for disaster because nobody is ever going to think of looking there (see next principle).

Principle #3: Don’t override standard operators in non-standard ways.

Some languages like C++ let you override arithmetic/boolean operators; others don’t. There is a lot of debate among language designers about whether allowing operator overriding is a good thing or a bad thing. On the plus side, there are some very cool things you can do: for example, if you have a class representing matrices, then you can override the * operator to do matrix multiplication. On the minus side, it lets you write stuff like this. While this code probably works now, it’s the kind of code that probably will stop working one day, and when it does, it will take out an entire country’s air-traffic control system*. While it probably seemed like a clever or elegant piece of code to the person that wrote it, it’s a maintenance programmer’s nightmare.

Principle #3 is actually picked up on in the highly amusing “How to write unmaintainable code”. It’s basically a back-to-front version of “Coding Horrors” where bad coding practices are encouraged as a means to achieving a job for life. (Ralf’s note: Oh my gosh, this is a Fake Surgeon’s gold mine!)

________________________________

*) Relax — this code is not part of an air-traffic control system. My point was simply that when bad code like this blows up, the impact usually tends to be massive.