Growing a Solid Software Company

Isn’t it a shame that so many software development managers don’t code anymore?

Since I am in a malicious mood today, I claim that they didn’t even write much code when they were still developers. But does it always have to be like this?

After years of deep contemplation I’ve come up with this rule:

Managers, regardless of their position in a hierarchy, should spend at least one-third of their time doing the work of their immediate subordinates.

Not only would this mean that every manager is productive and actively contributes to a project; it would also mean that managers stay current from a technological point of view. The latter, especially, would ensure that their strategic decisions rest on much firmer ground.

Wouldn’t performance appraisals (and promotions) suddenly become more objective and fair? Wouldn’t it be much easier for managers to hire new people since they would know — from first-hand experience — what exactly to look for?

If this recursive rule were applied, even the software manager’s boss would write small parts of the software himself. His boss, in turn, would probably not code that much but maybe do some code reviews or check the nightly build for compiler warnings.

Wouldn’t the code quality be much higher if developers knew that someone way up the corporate ladder scrutinized their work and gave feedback? Wouldn’t everyone feel much better knowing that their bosses really cared about what they do?

This is a rule for building up a hierarchy of software craftsmen, a rule that yields what I call a “Solid Software Company”: a company where everyone is a developer (at least to some extent), where everyone understands software’s true nature and developers’ needs.

Imagine you could travel back in time, to the early days of a once hip, now bureaucratic, politics-laden, inefficient, dreadful-to-work-for monster of a software company. You would arrive at the point at which they suddenly started to promote or hire people to be “just managers”.

So, viewed from a different, more negative angle, my rule can be rephrased like this:

The long and slow demise of a young, aspiring software company begins when its software development managers cease to write code.

Bug Hunting Adventures #2: Monitoring Temperature Sensors

Imagine a distributed control system that relies on correct and timely temperature measurements. Various temperature controllers distribute their temperature readings periodically over a bus for other controllers to consume.

To ensure that the temperature sensors work as expected, a ‘TemperatureMonitor’ was implemented. It listens for messages from the temperature controllers and checks whether they arrive in time (at the very latest every 20 ms) and whether their values are within the valid range.

This checking of temperature messages upon arrival obviously doesn’t catch cases where temperature controllers don’t send messages at all (or with a delay that would cause the timer to overflow multiple times), so there is an additional cyclic task that caters for situations like these. This cyclic task is executed every 100 ms, and it additionally takes care of the so-called ‘idle period’.

During the ‘idle period’, which is 500 ms, ‘TemperatureMonitor’ is lenient and doesn’t report problems (if any) to allow for an undisturbed start-up of the whole system.

Once the idle period is over and ‘TemperatureMonitor’ detects abnormal conditions, it notifies the global ‘ErrorManager’, which will decide how to handle problems based on its current error handling policy (e.g. just log the error, reset or disable a temperature controller).

Since the cyclic task and the message handler (the one that is invoked upon the reception of a temperature sensor message) may run in parallel, access to shared data is protected by a simple (but sufficient) synchronization scheme based on enabling/disabling interrupts.

Time measurement is implemented based on a wrap-around 32-bit tick counter that counts raw clock cycles. These clock cycles are converted to a more convenient unit (i.e. milliseconds).
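Before you dive in, here is a minimal Python sketch of what such a wrap-around-safe time check typically looks like. The 32-bit counter and the 20 ms deadline come from the description above; the clock rate and all names are merely my illustration:

TICK_MASK = 0xFFFFFFFF    # counter wraps around at 32 bits (see above)
TICKS_PER_MS = 100000     # assumed clock rate of 100 MHz; not from the post

def elapsed_ms(now_ticks, then_ticks):
    # Unsigned 32-bit subtraction yields the correct difference across a
    # single wrap-around; multiple wrap-arounds between two samples are
    # undetectable here, which is exactly why the cyclic task is needed.
    return ((now_ticks - then_ticks) & TICK_MASK) // TICKS_PER_MS

# A monitor would then flag a sensor whose last message is too old:
# if elapsed_ms(now, last_msg_ticks) > 20: report_to_error_manager()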


Bug Hunting Adventures #1: Logging Binary Data

Normally, loggers are used to track down bugs, but today, I present a Logger class method that contains a bug itself — isn’t it ironic?

Logger::logbuf() takes ‘len’ bytes of binary data from memory pointed to by ‘buf’ and converts it into printable, zero-terminated hex strings à la “AA 01 B3 C4…”.

For efficiency and readability, the hex string is broken up into smaller parts. Every hex string part is fed to the existing Logger::log() method, which is capable of outputting arbitrary, zero-terminated strings (a newline character is appended automatically).
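To make the intended behavior concrete, here is a Python sketch of what Logger::logbuf() is supposed to do. This is not the buggy C++ code itself, and the chunk size of 16 bytes per line is my assumption:

def logbuf(log, buf, chunk_size=16):
    # Convert binary data into printable hex strings like "AA 01 B3 C4",
    # feeding one chunk per line to the existing log() method.
    for i in range(0, len(buf), chunk_size):
        chunk = buf[i:i + chunk_size]
        log(' '.join('%02X' % b for b in chunk))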

Happy bug hunting! I will post the solution in two weeks’ time.


Bug Hunting Adventures

I’ve always loved to find bugs in code through mental debugging — especially other people’s code. Besides being fun, bug hunting improves product quality as well as one’s programming and problem-solving skills.

One of the books I enjoyed most is “Find the Bug” by Adam Barr. It contains many examples of buggy code in various programming languages, ranging from assembly language to Python. I heartily recommend it to everyone who shares my passion.

What a pity that there aren’t more books like this on the market!

To relieve our misery, I’ve decided to set up a new series: at irregular intervals I will post a bug-afflicted piece of code and challenge you to spot the problem. I don’t want to limit this sport to particular programming languages or defect categories; some bugs will be straightforward while others may be intricate, potentially only showing under certain, favorable circumstances.

Anyway, I hope you enjoy this new series as much as I do — stay tuned, the hunting season is open!

God’s Equation for Perfect Highballs

“Mix a stranger a good drink and no longer will he be a stranger”
– Yours truly

I’ve never been a fan of pure mathematics. Even though I must admit that pure maths bears a lot of elegance, it’s real-world applicability that makes mathematical discoveries shine.

Boy oh boy, how time flies! It’s summertime again, at least in the northern hemisphere, on this planet, and what a great time it is for drinking plenty of highballs. For those of you who don’t know: highballs are simple long drinks that are made up of two ingredients: liquor (e.g. gin, whisk(e)y, rum) and a non-alcoholic beverage (e.g. soda, juice, cola).

Even though there are usually only two ingredients, it is not that easy to make a perfect highball. It all depends on the right mixture. Some like their highballs stronger, others lighter. So how do you determine the perfect ratio? The task is complicated by the fact that the strength of the liquor used might vary (for instance, rums come with alcoholic strengths ranging from 35% to 80%).

So I sat down for a while and did a little math — to solve a real-world problem once and for all. What I’ve come up with is this:

Vn / Vl = R = (Sl / Sh) - 1

In this equation, Vn is the volume (amount) of non-alcoholic beverage, Vl is the volume (amount) of liquor, R is the mix ratio, Sl is the alcoholic strength of the liquor used and Sh is the desired target strength of the resulting highball. Isn’t this wonderful?

As an example, I happen to like my highballs with a strength of 10% and most liquor in Europe is sold with a strength of 40%. What’s the recipe for a perfect Gin and Tonic? Answer: 40% divided by 10% is four, minus one is three. Therefore, you would need three parts of tonic water for one part of Gin to make me smile.

If you prefer a 5% Gin and Tonic (are you a wimp, by any chance?), note that you will need seven parts of tonic water, not six, as a naive would-be drinks mixer would expect.
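If you want to experiment with other strengths, the formula is trivial to turn into code (a throwaway Python sketch; the function name and arguments are mine):

def mix_ratio(liquor_strength, target_strength):
    # R = (Sl / Sh) - 1: parts of non-alcoholic beverage per part of liquor.
    return liquor_strength / target_strength - 1

print(mix_ratio(0.40, 0.10))   # 3.0 -> three parts tonic, one part gin
print(mix_ratio(0.40, 0.05))   # 7.0 -> seven parts, not six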

There you have it — the formula for perfect, repeatable quality drinks. People in the same spirit (pun intended!) — let’s raise our glasses to this discovery!


I know, I know. Some of you smug, mathematically inclined folks with an IQ of 180+ would probably have known this right away. I don’t mind if you point out that my pathetic “discovery” is obviously just a special case of something more profound. Congratulations! I will drink a very special toast to you tonight.

And — just in case you haven’t done the maths yourself in the meantime — here is how I derived it:

Strength of liquor Sl is volume of alcohol in liquor Val divided by the total volume of liquor Vl:

Val / Vl = Sl         Eq (1)
Val = Sl * Vl         Eq (2)

Total volume of highball Vh is volume of liquor Vl + volume of non-alcoholic beverage Vn:

Vh = Vl + Vn          Eq (3)

Strength of highball Sh is volume of alcohol from liquor Val divided by the total volume of highball:

Sh = Val / Vh         Eq (4)

Inserting (2) in (4):

Sh = (Sl * Vl) / Vh   Eq (5)

Inserting (3) in (5):

Sh = (Sl * Vl) / (Vl + Vn)
Sh * (Vl + Vn) = Sl * Vl
Sh * Vl + Sh * Vn = Sl * Vl
Vl * Sh - Vl * Sl = - Vn * Sh
Vl * (Sh - Sl) = -Vn * Sh
Vl / Vn = - Sh / (Sh - Sl)
Vn / Vl = (Sl - Sh) / Sh
Vn / Vl = (Sl / Sh) - 1


A Tale of Two Qualities


“Here then, as I lay down the pen and proceed to seal up my confession, I bring the life of that unhappy Henry Jekyll to an end.”

– Robert Louis Stevenson, The Strange Case of Dr. Jekyll and Mr. Hyde

The old saying “What gets measured gets done” makes immediate sense: before you can improve anything, you first have to gain insight into it. If you act, you want to see the consequences as soon as possible, compare them to the desired results, and adapt accordingly. Further, measurements not only give you a picture of the past and present — through extrapolation you get an impression of the future as well.

In modern software development, we constantly measure the quality of our product right from the beginning. We check for build breakers and failing test cases, code coverage, memory consumption and execution times on a check-in basis, as part of our continuous integration process. We always know the quality of our product — there won’t be any big surprises at major milestones or the end of the project. Running a software project like this removes the chance factor — this is quite the opposite of what happens when you follow a waterfall model with its dreaded big-bang integration phases.

But let’s face it: we mainly focus on controlling external quality; that is, everything that is visible to the customer.

The level of external quality is usually (relatively) easy to determine since external requirements are specified such that they are unambiguously verifiable (at least they should be). Thus, developers and testers can implement automated tests that will reveal any deviations. Because external quality is directly visible to customers and directly influences whether they buy (or return) a product, it is not difficult to get proper funding for people and tools. In this sense, external quality really lives on the sunny side of software life.

Sadly, external quality’s brother, internal quality, lives an unhappy life in the dark: internal quality requirements are usually not nailed down precisely; in fact, they are often fuzzy and neglected. Sure, there are coding guidelines that mandate a certain indentation style and whether to use tabs or spaces (among other things), but are there actually checks against violations? And what about compiler warnings, bad coding practices, the size of classes/methods, cyclomatic complexity, coupling of classes, missing API documentation and so on? And who is willing to invest in work the customer doesn’t see, anyway?

In most cases the answer is a blatant ‘No’. Internal quality requirements are neither stated in a precise and verifiable way, nor given high priority, and they are almost never automatically checked as part of the continuous integration process. Since internal quality is not observable by the user, you can get away without paying attention. But then you have to pay for something else: an ever-increasing amount of technical debt and the compound interest that ensues.

Why does this all matter? In order to compete, a software product undergoes a multitude of major and minor modifications over many years, often carried out by developers other than the original authors. Once the first release is shipped, maintainability becomes a crucial factor, and internal quality determines whether future changes will be easy (cheap), hard (expensive), or outright impossible. Don’t underestimate the impact of even the smallest things; even a single broken window may be responsible for decay and a rising crime rate in a town district.

What we need is a shift in mindset: first, internal quality requirements must get the same priority as external quality requirements; second, internal quality must be perpetually tracked (and acted upon) as part of the continuous integration process.

It is of utmost importance to keep track of internal quality right from the beginning, when the mental distance is low and issues can be corrected with the least amount of work. Further, the whole team immediately benefits from the improved quality, and the learning effect ensures that the amount of effort that needs to be spent rapidly decreases. Postponing internal quality work to late phases of the project is a costly and frustrating experience.

For some internal quality metrics it is necessary to build special-purpose analysis tools that cannot be bought off the shelf, but dynamic/scripting languages and heuristic regex-based parsing can go a long way. And even if you need more precision, there are often free/open-source tools and frameworks at your disposal: I know of one C++ project that implemented several clang plugins to ensure that (among other things) identifier names were in line with the coding conventions.
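To give you an idea of how little code such a heuristic check requires, here is a toy Python sketch that flags C++ class names not starting with an upper-case letter; the convention and the regex are made up for illustration:

import re
import sys

CLASS_DECL = re.compile(r'\bclass\s+([A-Za-z_]\w*)')

for path in sys.argv[1:]:
    with open(path) as source:
        for lineno, line in enumerate(source, 1):
            for name in CLASS_DECL.findall(line):
                # Heuristic, not a real parser -- good enough for a CI gate.
                if not name[0].isupper():
                    print('%s:%d: class %s violates naming convention'
                          % (path, lineno, name))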

Every developer has different standards as to what good-enough internal quality is. To make matters worse, day-to-day routine and especially schedule pressure will lure developers into accepting the status quo of their code once testing has proved that external quality criteria are met. But the converse is also true: without indisputable measurements, developers might spend more time than needed, endlessly polishing their beloved code.

We should give both sides of the quality coin equal consideration. While external quality is the necessary basis for the short-term success of a product, it is internal quality that ultimately determines the long-term success of a software company. Hence, we should remove the chance factor of internal quality, too.

Introduction to PC-Lint

I’ve uploaded my presentation “An Introduction to PC-Lint” to Slideshare. Share and Enjoy!

Free-Riding the Team Train

I usually start my day with a thirty-minute run in the morning. Since I prefer running through the countryside, it is not unlikely that I meet people walking their dogs. Sometimes — for reasons I cannot fathom — these folks walk their dogs on the wrong side of the road (or rather country lane).

But let’s be fair. Even in Germany there is no law that requires pedestrians to walk on a certain side of a country lane; but what drives me nuts is that when they see me coming toward them, most of them don’t dodge, let alone change to the other side of the lane — they just stay where they are. Almost all of them expect me to move, to get around the wrong-way dog walkers.

Sometimes, just for the fun of it, I refuse to obey and stop right in front of them, which completely baffles them. Then, we stand there, deadlocked for a couple of seconds until I finally decide to give in.

I believe that what the dog walker subconsciously assumes is this: “the other one is faster, so he should move; it’s less effort for him than it is for me”. Even if this might be true in some cases, it is by no means generally true. Just because one is going faster doesn’t imply that doing something extra is easier for this person than for anyone else.

Probably this is just all-too-human behavior: getting by while spending as little (effort) as possible. Especially members of sufficiently advanced societies expect the strong to protect and support (or at least be considerate of) the weak: “those who can should support those who can’t”.

While I fully subscribe to this principle, at least in general, I have observed that this argument is often misinterpreted (misused) by slackers to mean “those who can should support those who don’t want to”, an attitude that is totally unacceptable to me, not in a society, not in a company, and least of all in a team.

As another case in point: in 1913, French agricultural engineer Maximilien Ringelmann discovered that people — when working collaboratively on a given task (like pulling a rope) — exert a lot less effort than when acting alone. The bigger the group, the bigger the tendency to hide behind others, to free-ride, to prefer taking over giving. Today, social psychologists call this phenomenon “social loafing”, and it not only takes place in societies, but also within companies and teams.

In a company, the “Ringelmann effect” does not only impact day-to-day project work. For instance, if you call for a meeting, the more people you invite, the fewer will come prepared, and even fewer will make substantial contributions. Some “strategists” even exploit this fact by inviting a large party to a meeting where only a few protagonists actually make the decisions. The large size of the group then gives an illusion of quality and broad acceptance of the decision, when in fact most people were just daydreaming. If there is one lesson to be (re-)learned from the Ringelmann effect, it is this: “less is more”.

Here is my definition of teamwork: a group of self-motivated, self-directed individuals who share a common vision work together to achieve a common goal. If somebody needs support from a team member, it should never be because they don’t want to do a task — only because they are temporarily unable to do it themselves. Further, it is the responsibility of the supported person to keep the support to an absolute minimum and to learn and grow from it — not only to become independent of others, but to be able to help others in need some day. The ultimate state of any team member is neither dependence nor independence, but rather interdependence.

Making up for slackers, on the other hand, is not only a waste of time: sooner or later it drains the motivation and morale of even the most self-motivated people and encourages them to loaf as well. While I see a lot of value in so-called “B players”, I have zero tolerance for free-riders.

Surprised Again, In and Out!


“No tears in the writer, no tears in the reader.
No surprise in the writer, no surprise in the reader.”
– Robert Frost

My first real programming language was Pascal, actually Turbo Pascal.

Turbo Pascal was not just Pascal — it was Pascal on steroids: small, fast, and yet affordable to everyone, including students and hobbyists. While the version numbers increased from 1.0 to 6.0, I wrote dozens of little tools, games and applications. For a long time, I saw no need to switch to a crude, low-level language like C; Pascal had so many advanced features, like a strong type system and sets.

One of the features I really loved was the ‘in’ operator, which allows you to test for set membership:

    type
        days = (Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday);
    var
        workday : set of days;
        today : days;
    begin
        workday := [Monday .. Friday];
        today := Monday;
        if today in workday then
            writeln('I have to go to work today...')
    end.

Having an ‘in’ operator at your disposal means that there is no need to write lengthy if-then-else or switch-case abominations — you can express your intentions with ease and elegance.

I got so accustomed to Pascal’s sets and especially the ‘in’ operator that I missed it terribly in almost every programming language that I learned in the years to follow, including C, C++, Java, even Perl. I was really pleased to see that my next programming language — Python — had it.

But what was my point, again? Ah, yes…

The folks at E.S.R.Labs do an internal coding competition every once in a while. The current assignment has to do with finding the best strategies for knocking out your opponent in a Tron-like game. The game framework itself is based on Node.js/JavaScript, but since I’m much more familiar with Python, I decided to do my algorithm prototyping in Python and later port everything back to JavaScript.

Actually, the transformation from Python to JavaScript was a lot easier than I thought; still, my JavaScript version didn’t work as expected. After some (read: way too much) debugging, I found the problem; later, I decided that I had to write a blog post about it, as part of my therapy to get over my trauma.

Let’s first have a look at a snippet of the original Python code. I used the ‘in’ operator to check whether the value of ‘field’ at position ‘x’ and ‘y’ was either ‘FIELD_FREE’ or ‘FIELD_OTHER’:

# Go down.
x = me_x + 1
y = me_y
if x < len(field) and field[x][y] in [FIELD_FREE, FIELD_OTHER]:
    if first_round:

Now for the JavaScript version. As you will surely agree, it is almost identical:

// Go down.
x = me_x + 1;
y = me_y;
if (x < field.length && field[x][y] in [Field.FREE, Field.OTHER]) {
    if (first_round) {

But as we all know, looks can be deceiving. With regard to the ‘in’ operator, the designers of the JavaScript language grossly violated the “Principle of Least Surprise”: everyone (not just people with a Pascal background) would assume that the ‘in’ operator tests whether a particular value is contained in a given set. But not so in JavaScript — not at all! In JavaScript, the ‘in’ operator does what you would expect least.

In JavaScript the ‘in’ operator tests whether its left-hand argument is a valid index into the right-hand argument. That is, the expression

field[x][y] in [Field.FREE, Field.OTHER]

would evaluate to true if the value at position (x, y) is an integer value in the range [0..1], because only 0 and 1 are valid indices into a set (actually an array) containing just two elements. If the value at position (x, y) equaled the value of ‘Field.FREE’ or ‘Field.OTHER’, the result would be false, unless — by sheer coincidence and hard luck — the symbolic constants ‘Field.FREE’ and ‘Field.OTHER’ happened to have an integral value of either 0 or 1. Which they had in my case, at least in part:

Field = {
    FREE: 0,
    YOU: 1,
    OTHER: 2,
    WALL: 3
};

Hence, the JavaScript version didn’t fail loudly and obviously, but in very subtle ways.

The main reason why it took me so long to track this bug down was that the last thing I questioned was the behavior of the ‘in’ operator. When I finally proved by debugging that ‘in’ worked differently from my expectations, my initial thought was that there must be a bug in my version of Node.js. But deep in my heart I knew that this was extremely unlikely. So I did a search on the Internet and found out about the shocking truth.

Why on earth did the designers of JavaScript implement the ‘in’ operator the way they did? It works kind of as expected with dictionaries (aka associative arrays), but using a dictionary would have been an ugly and unnatural choice in my case. For regular arrays, the ‘in’ operator is worse than useless: it deludes programmers into using it but then doesn’t live up to their expectations.

Then, I would rather have no ‘in’ operator at all.

Password Sins of Omission — Will They Ever Learn?

“Where have all the passwords gone, long time passing?
Where have all the passwords gone, long time ago?
Where have all the passwords gone?
Bad admins stole them, every one.
Oh, when will they ever learn?
Oh, when will they ever learn?”

(With apologies to Peter, Paul, and Mary)

It’s been quite a while, but yesterday it happened again: after I had registered with a new electricity provider, I was sent an email showing my password — right next to the words “Your Password”. There was also a link that I was supposed to click to verify my personal data, including name, address, date of birth and a lot more things that I certainly don’t want to share with the world.

What amazes me most about this incident is that even today some companies are completely oblivious to how easy it is to automatically scan emails for keywords on their way from sender to receiver.

But even if the probability of someone sending around your credentials in plaintext is a lot lower today than ten years ago, the chances of being faced with poor password systems are still quite high. Here is my personal list of the grossest password-related blunders.

1. Storing user passwords. This is probably the greatest of all password sins, especially since it may spawn many other password-related follies. Don’t store passwords, not even encrypted. Instead, salt the password with a random salt and then hash it with a strong hash function before you save it (see the sketch after this list). This way, no rogue admin can run away with your users’ passwords and no crazy customer service team is able to send around mails disclosing credentials.

2. Gratuitously limiting the password character set. I’ve come across password systems that only accept numbers (i.e. PINs), others only a mix of characters and numbers; some would accept special characters like %, &, and ? but not the dollar sign. I don’t see any reason why it shouldn’t be possible to use the whole Unicode character set; at the very least, all printable ASCII characters should be considered valid. If I wanted to attack a site, the very first thing I would do would be to set up an account and find out about their password character limitations in order to prune my search space.

3. Gratuitously limiting the password length. Some folks don’t seem to realize that with passwords, size matters: longer is stronger. An easy-to-type passphrase like “theluckgreensunsdontkiss” is much more secure than the complex, ugly, it’s-so-hard-to-remember-that-I-need-to-write-it-down “B!A7n4$”. While there is good reason to set a limit on the lower bound (I personally would require a minimum of eight characters), it is outright dangerous (read: stupid) to place an upper bound on the length. Be careful: if a site demands a password length of 6 to 8 characters, it is a clear sign that they take security lightly. Maybe they even store your passwords, and shorter passwords consume less space on server hard disks. Yikes!

4. Not telling users about limitations. Limiting the password length and character set is by itself a horrible practice, only made worse by not telling users beforehand what the limits are. It is quite annoying if you attempt to set a new password only to be rejected with an error message saying that your password is not in line with the password policy. This has happened to me many times, and in one case the link to the policy was even broken.

5. No “show password” option. I vividly remember the day when I had to choose a password for a smartcard-based access system at a company I worked for. At the company’s registration office was a terminal that asked me to select a new password for my card. Since there were quite a few (gratuitous) limitations (at least one number, a single capital letter, no number at the beginning, no underscore, just to name a few, and they were not displayed beforehand, of course), I had to try several times, making minor variations to my favorite password, until it was finally accepted. Later, when I had to use it for the first time, I couldn’t remember it anymore. I’m not sure, but I guess that if I had been able to see my weird password in plaintext after I set it, my chances of remembering it would have greatly increased. Normally, I know if somebody is staring at my screen (and this is rarely the case), so why not add a “show password” check box that I can tick? This would also help me in cases where caps lock was on or when I inadvertently switched to a foreign keyboard layout.

6. Setting constraints on the login name. Why does the login have to be an email address (which an attacker is likely to know or which might change in the future), or a number, or an eight-character word in all uppercase? One of my web hosters forces me to sign in with the 12-digit customer ID that they assigned to me years ago; I always have to look it up and I always curse the web hoster. Users should have absolute freedom regarding their choice of login name — not only when they create the account, but also at any point later in time.

7. Not encrypting passwords before sending them to the server. There are still some sites (and email providers) out there that ask for your credentials without establishing a secure channel first. What that means is that your login name and password are transferred in plaintext to the server, and every random wiretapper is able to use them to place adverts for all kinds of shady products in your name. I don’t know how many bloggers are out there using WordPress, but it must be hundreds of thousands; and almost all of them log in through a login page that doesn’t use any form of encryption. Even if the NSA is able to read your SSL-protected traffic anyway, it doesn’t mean that you want everybody else to read it, too.

8. Not (or wrongly) using brute-force attack countermeasures. To prevent brute-force attacks, a good password system should track the number of failed login attempts. My bank does this, but in a rather stupid way: after three unsuccessful tries, your account will be locked and you have to go to your local branch office in person to request a password reset. My suggestion: from the third unsuccessful attempt on, have the user wait 10 minutes before every further attempt; additionally, present a Captcha to defend against automated denial-of-service attacks.
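As for the sketch promised in point 1, here is a minimal salt-and-hash example in Python using PBKDF2; the iteration count and salt length are reasonable defaults, not requirements:

import hashlib
import hmac
import os

def hash_password(password, salt=None, iterations=100000):
    # Never store the password itself; store (salt, digest) instead.
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac('sha256', password.encode('utf-8'),
                                 salt, iterations)
    return salt, digest

def verify_password(password, salt, stored_digest):
    _, digest = hash_password(password, salt)
    # Constant-time comparison avoids timing side channels.
    return hmac.compare_digest(digest, stored_digest)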

So here you have it, and this is just my personal tip of the iceberg. In order to get real security, password systems need to be both secure from a technical point of view and user-friendly; at the very least, they shouldn’t haphazardly limit freedom. The best password policy is worthless (actually dangerous) if it requires users to invent next-to-impossible-to-remember passwords that will end up written down on sticky notes.

Since there are so many sites out there that deal carelessly with passwords and personal user data, I think it is about time for a law that requires companies to follow certain minimum standards. We just can’t rely on good will and common sense. We have done that for quite some time now, and it demonstrably hasn’t worked out.