Bug Hunting Adventures #5: False Diagnosis

Today’s cars host dozens of ECUs (electronic control units) that exchange tens of thousands of messages and signals on various buses (e.g., CAN, MOST, FlexRay, Ethernet). On top of that, ECUs provide hundreds of diagnostic services, which allow factories, maintenance personnel and even devices in the car to obtain (and change) various parameters, including vehicle speed, diagnostic trouble codes, and sensor/actuator data. By using diagnostic services, you can literally remote-control your car, provided you get past the security mechanisms, of course.

Diagnostic commands are exchanged as request/response pairs based on the UDS (Unified Diagnostic Services) protocol, as specified in ISO 14229-1:2013. Requests and responses are coded like this:

Request: SA, DA, SID1, ..., SIDn
Response (OK): SA, DA, SID1|0x40, SID2, ..., SIDn, RESP-PAYLOAD
Response (error): SA, DA, 0x7F, SID1, ERROR-CODE

Every ECU in a car has a unique 1-byte diagnostic address (signified as source address SA or destination address DA, depending on whether the device acts as sender or receiver), and each diagnostic service of an ECU is addressed by a service ID (SID) that is 1 to n bytes long. (Note that in reality, the n-byte “SID” consists of the real SID and a sub-level function ID, but this detail is of no importance for our considerations.)

If, for example, a device with SA = 0x20 wants to read all diagnostic trouble codes from an ECU with DA = 0x60, it would issue this request:

0x20 0x60 0x19 0x0A

where 0x19 0x0A is the two-byte service ID for the “READ DIAGNOSTIC TROUBLE CODES” service.

The positive response of the targeted ECU might look like this:

0x60 0x20 0x59 0x0A 0xFF 0x02 0x40 0x00 0x50 0x02 ...

Notice that the values of SA and DA are swapped: the replying ECU is now the sender and the requesting ECU the receiver. 0x59 is the first byte of the service ID (SID1) OR’ed with the “response OK” flag 0x40 (0x19 | 0x40 = 0x59), and 0x0A is the echoed second SID byte. The response payload (the requested information, that is, the actual trouble codes) starts with 0xFF 0x02….

If an ECU cannot fulfill a request, it sends a negative response indicator, followed by an error code that gives details on what went wrong:

0x60 0x20 0x7F 0x19 0x11

Here, 0x7F denotes ‘error’, 0x19 is the first SID byte of the original request (this time without the “response OK” flag) and 0x11 is the error code for “SERVICE NOT SUPPORTED”.
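To make the simplified framing concrete, here is a minimal Python sketch that decodes a response against the request that triggered it. It assumes exactly the simplified byte layout given at the top of this post (SA, DA, then SID bytes), not any real transport framing, and the names `decode_response` and `UdsResponse` are hypothetical; this is not the puzzle's code.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

POSITIVE_RESPONSE_FLAG = 0x40  # OR'ed into SID1 of a positive response
NEGATIVE_RESPONSE_ID = 0x7F    # first byte after SA/DA of a negative response


@dataclass
class UdsResponse:
    sender: int                       # SA of the response (the replying ECU)
    receiver: int                     # DA of the response (the requester)
    positive: bool
    payload: bytes = b""              # RESP-PAYLOAD of a positive response
    error_code: Optional[int] = None  # ERROR-CODE of a negative response


def decode_response(request: Sequence[int], response: Sequence[int]) -> UdsResponse:
    """Decode `response` using the simplified layout from the text."""
    req_sa, req_da, *req_sid = request
    if len(response) < 3:
        raise ValueError("response too short")
    sa, da = response[0], response[1]
    # The replying ECU swaps the addresses: its SA is the request's DA.
    if (sa, da) != (req_da, req_sa):
        raise ValueError("response does not belong to this request")
    if response[2] == NEGATIVE_RESPONSE_ID:
        # Negative: SA, DA, 0x7F, SID1, ERROR-CODE
        if len(response) != 5 or response[3] != req_sid[0]:
            raise ValueError("malformed negative response")
        return UdsResponse(sa, da, positive=False, error_code=response[4])
    # Positive: SA, DA, SID1|0x40, SID2, ..., SIDn, RESP-PAYLOAD
    n = len(req_sid)
    expected = [req_sid[0] | POSITIVE_RESPONSE_FLAG, *req_sid[1:]]
    if list(response[2:2 + n]) != expected:
        raise ValueError("positive response does not echo the request's SID")
    return UdsResponse(sa, da, positive=True, payload=bytes(response[2 + n:]))
```

Fed with the request 0x20 0x60 0x19 0x0A and the negative response 0x60 0x20 0x7F 0x19 0x11 from above, it returns a negative result with error code 0x11.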

Two error code values usually require special treatment: error code 0x21 means that the ECU is currently busy and unable to process the request; error code 0x78 means that it is able to process the request but needs more time to complete it.
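One plausible way to give these two codes their special treatment is sketched below in Python. It is an illustration under assumptions, not the puzzle's code: `send` and `receive` are hypothetical transport hooks, the retry limits are arbitrary, and a real client would also wait before repeating a request and enforce the timeouts the standard defines. The key distinction is that after 0x21 the request was not processed, so it is sent again, whereas after 0x78 the ECU is still working on it, so the client keeps waiting for the final response without resending.

```python
from typing import Callable, Sequence

BUSY_REPEAT_REQUEST = 0x21  # ECU busy: request not processed
RESPONSE_PENDING = 0x78     # ECU accepted the request but needs more time


def request_with_retries(send: Callable[[], None],
                         receive: Callable[[], Sequence[int]],
                         max_busy_retries: int = 3,
                         max_pending: int = 10) -> Sequence[int]:
    """Send a request and return the final response, handling 0x21 and 0x78."""
    busy_retries = 0
    pending = 0
    send()
    while True:
        resp = receive()  # blocks until the next response arrives
        is_negative = len(resp) >= 5 and resp[2] == 0x7F
        if not is_negative:
            return resp  # positive response: done
        code = resp[4]
        if code == RESPONSE_PENDING:
            # Keep waiting for the final answer; do NOT resend the request.
            pending += 1
            if pending > max_pending:
                raise TimeoutError("too many 'response pending' messages")
            continue
        if code == BUSY_REPEAT_REQUEST:
            # The request was dropped, so it has to be sent again.
            busy_retries += 1
            if busy_retries > max_busy_retries:
                raise TimeoutError("ECU stayed busy")
            send()
            continue
        return resp  # any other negative response is final
```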

This description of the UDS protocol is of course an oversimplification, but it is enough for our bug-hunting purposes. Attached is an extract of an application that incorrectly processes the response to a diagnostic service request. Can you diagnose the problem?

Hero or Zero?

Andrea: “Unhappy is the land that breeds no hero.”
Galileo: “No, Andrea: Unhappy is the land that needs a hero.”

– Bert Brecht, “The Life of Galileo”

If you ask children what they want to be when they grow up, chances are you will get answers from this list: police officer, firefighter, astronaut, rock star. Children nearly always choose professions that come with acclaim, jobs where they can shine, where they can be big.

Even as adults, most of us still want to be heroes (or heroines), at least to some extent. This desire is deeply rooted in all human cultures and societies: Those who act, especially at the peril of their own lives, get praise and reward.

Strangely enough, the yearning for recognition is sometimes so strong that people who should actually help in hazardous situations deliberately create them, just to get a chance to prove their bravery. You have probably heard about cases where firefighters set buildings on fire just to be the first on the scene to extinguish them. This phenomenon is called “hero syndrome”; it is an extreme form of degenerated professionalism, but it nevertheless does happen.

There is a lot of heroism going on in software development as well. Many software projects, for lack of software engineering knowledge and discipline, rely on heroics: their successful outcome depends entirely on individuals who work crazy hours in an endless code-and-fix cycle. The game industry in particular has gained a sad reputation for such malpractice.

But sometimes it is not the companies which foist heroics upon their developers — sometimes, and much in line with the hero syndrome, developers deliberately create “hero situations” themselves.

One case in point is when developers accept impossible assignments in the hope of being praised by their bosses for making the impossible happen, in cases where a clear, professional “No” would have been the right answer. In his book “The Clean Coder”, Uncle Bob cites a programmer whose words say it all: “I hit SEND, lay back in my chair and with a smug grin began to fantasize how the company would run me up onto their shoulders and lead a procession down 42nd Street while I was crowned ‘Greatest Developer Ev-ar’”.

By accepting ridiculous schedules and working ridiculous hours, such “heroes” not only put their personal well-being at risk, they also put their company’s future at risk, for at least two reasons: first, if management is told that something highly improbable is doable, they might not look for alternatives or set up a life-saving plan B; second, such heroes leave behind rushed, brittle, unmaintainable quick-hack code that nobody understands or dares to change.

But I’ve also met developers who gold-plate their code, write overly complicated code, or fervently and constantly use the most obscure language features (C++ template metaprogramming immediately comes to mind) just to show everyone how clever they are. Again, the only thing such “heroes” usually achieve is that coworkers don’t dare touch their abominations; there is no praise at all, just head-shaking. This behavior is quite the opposite of what modern software development practices demand: software minimalism, that is, the simplest code that works, code that is easy for everyone to read, refactor, and extend. This is the basis for one of the most important software values: evolvability.

To me, people who behave in such a manner are no heroes at all; in fact, they are quite the opposite. What they do is not a selfless act to help others but rather an act of selfishness that almost always hurts others. A wise manager would advise these folks to consult a specialist to get over their inferiority complex and, if that doesn’t help, to seek their luck elsewhere: in the game industry, for instance.

By contrast, true heroes are invisible; they shun the limelight. They do things because they know that the overall benefit of their actions is far greater and more important than their own personal benefit. They are heroes because they do things that are right and not things that just shine bright.

There are firefighters out there who save lives every day, and some of them can rightly be called heroes. But the even bigger heroes are people whom hardly anybody knows: the people who work out fire protection standards. They must have saved hundreds of thousands of lives over the last century. Benjamin Franklin’s saying “An ounce of prevention is worth a pound of cure” is what they live by. Such heroes, modest people who strive to prevent, are the true role models of every software professional.

Bug Hunting Adventures #4: Casino Royale

On August 18, 1913, something very strange happened at the Monte Carlo Casino in Monaco: for the twentieth time in a row, the ball had fallen into a black pocket. More and more people gathered at the table and started betting like crazy on red; they were convinced that red was long overdue. Well, most of them lost a lot of money, because it took another seven spins of the wheel until red finally arrived.

Today, I present a little Groovy program that I wrote to gain insight into roulette probabilities and to save us from what is now known as the Monte Carlo fallacy.

When you call ‘runSimulation’, you can specify how many times you want to spin the wheel. With every spin, the method ‘spinWheel’ returns either 0 (for black), 1 (for red), or 2 (for Zero).

‘runSimulation’ keeps track of the lengths of same-color series in three associative arrays (i.e., maps), one for every color (including the color Zero). The key into these maps is the length of a series, and the associated value is the count of how often this series length was encountered during the experiment.

To make accessing these color length maps generic, their references are stored in a plain list (‘seriesMaps’). By using the color value returned from ‘spinWheel’ as an index, one can easily obtain the length map for a particular color. This list of length maps is returned at the end of the simulation. As an example, after 100 spins, the list might look like this:

[ [2:6, 3:3, 1:14, 5:1], [3:5, 1:8, 2:9, 4:1, 5:1, 7:1], [1:3] ]

In this simulation, for black, a series of length 2 was encountered 6 times, and a series of length 5 once. For red, there was one series of length 7, and our virtual ball landed on Zero three times, but there was never a series of Zeros (i.e., longer than 1). (Aside: in one experiment I did with the bug-fixed version, I spun the wheel more than 100 million times; I got a maximum series of 30 for black and, believe it or not, I once got a series of four Zeros in a row.)
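For readers who want to experiment, here is an independent Python sketch of the same bookkeeping. It is not the Groovy program from the puzzle; the names `spin_wheel`, `tally_series` and `run_simulation` merely mirror the description. It assumes a European wheel with 18 black, 18 red and one zero pocket and keeps the color encoding 0/1/2 from the text, so the returned list can be indexed by color just like ‘seriesMaps’.

```python
import random
from typing import Dict, Iterable, List, Optional

BLACK, RED, ZERO = 0, 1, 2  # same encoding as 'spinWheel' in the text


def spin_wheel(rng: random.Random) -> int:
    """One spin of a European wheel: P(black) = P(red) = 18/37, P(zero) = 1/37."""
    pocket = rng.randrange(37)  # 0..36
    if pocket == 0:
        return ZERO
    # Only the probabilities matter here, not the real pocket layout.
    return BLACK if pocket <= 18 else RED


def _record(series_maps: List[Dict[int, int]], color: int, length: int) -> None:
    lengths = series_maps[color]
    lengths[length] = lengths.get(length, 0) + 1


def tally_series(spins: Iterable[int]) -> List[Dict[int, int]]:
    """Per color, map series length -> number of series of that length."""
    series_maps: List[Dict[int, int]] = [{}, {}, {}]
    current: Optional[int] = None
    length = 0
    for color in spins:
        if color == current:
            length += 1
            continue
        if current is not None:
            _record(series_maps, current, length)  # the previous series ended
        current, length = color, 1
    if current is not None:
        _record(series_maps, current, length)  # the series still open at the end
    return series_maps


def run_simulation(spins: int, seed: int = 1) -> List[Dict[int, int]]:
    rng = random.Random(seed)
    return tally_series(spin_wheel(rng) for _ in range(spins))
```

Separating the tallying from the random spins makes it easy to check the bookkeeping on a hand-made sequence before trusting any large run.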

But the program, as it is presented to you, contains a bug. It just doesn’t work as it is supposed to. Can you spot it? (Note: the bug has nothing to do with any Groovy idiosyncrasies.)

Bug Hunting Adventures #3: Silent Threads

‘select’ isn’t broken
— The Pragmatic Programmers

Many moons ago, when I tried to familiarize myself with POSIX threads, I wrote a simple test program that was based on a textbook example.

My program sported two threads, one printing ‘+’ characters, the other one printing ‘-’ characters. Everything worked as expected: a mixed stream of ‘+’ and ‘-’ characters was emitted to stdout.

But everything happened so fast! Literally thousands of characters were output in the blink of an eye, so I added a little extra code that made the threads sleep for a specified amount of time before printing the next character.
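The setup can be approximated in Python with the standard `threading` module. This is only a rough analogue of the C/pthreads program: the names `emit` and `run` are my own, and the `delay` parameter plays the role of SLEEP_SECS. To keep it checkable, it collects the characters in a shared list instead of writing them to stdout, so it is not meant to reproduce the puzzle.

```python
import threading
import time
from typing import List


def emit(ch: str, count: int, delay: float, out: List[str], lock: threading.Lock) -> None:
    """Append `ch` to `out` `count` times, sleeping `delay` seconds before each one."""
    for _ in range(count):
        if delay > 0:
            time.sleep(delay)
        with lock:  # serialize access to the shared output list
            out.append(ch)


def run(count: int = 1000, delay: float = 0.0) -> str:
    """Start one '+' thread and one '-' thread, wait for both, return the mixed stream."""
    out: List[str] = []
    lock = threading.Lock()
    threads = [threading.Thread(target=emit, args=(ch, count, delay, out, lock))
               for ch in "+-"]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return "".join(out)
```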

Alas, when I set the delay (SLEEP_SECS) to 1 second (or in fact any nonzero value), nothing was printed at all! It looked like the threads had locked up completely. I came up with the weirdest theories about what had happened, including bugs in the pthreads library and in the implementation of ‘sleep’.

It wasn’t until the next morning that I realized my mistake. Once again, I had blamed the trusty tools, when the real problem was my own blind stupidity.

What was my mistake?

Growing a Solid Software Company — Update

Yesterday, on October 21, all Lufthansa pilots went on strike in Germany. Nevertheless, Lufthansa managed to operate half of its flights. They achieved this miracle with a twofold strategy: subcontractors and (lo and behold) their own managers, who, according to Lufthansa, work as part-time pilots in most cases anyway.

Pilots that manage, managers that fly. Exactly my words, friends, exactly my words…

Let’s sing a song together, shall we?

I see trees of green, red roses, too,
I see them bloom, for me and you
And I think to myself
What a wonderful world.

Growing a Solid Software Company

Isn’t it a shame that so many software development managers don’t code anymore?

Since I am in a malicious mood today, I claim that they didn’t even write much code when they were still developers. But does it always have to be like this?

After years of deep contemplation I’ve come up with this rule:

Managers, regardless of their position in a hierarchy, should spend at least one-third of their time doing the work of their immediate subordinates.

Not only would this mean that every manager is productive and actively contributes to a project; it would also mean that managers stay current from a technological point of view. The latter in particular would ensure that their strategic decisions rest on much firmer ground.

Wouldn’t performance appraisals (and promotions) suddenly become more objective and fair? Wouldn’t it be much easier for managers to hire new people since they would know — from first-hand experience — what exactly to look for?

If this recursive rule were applied, even the software manager’s boss would write small parts of the software himself. His boss, in turn, would probably not code that much but might do some code reviews or check the nightly build for compiler warnings.

Wouldn’t code quality be much higher if developers knew that someone way up the corporate ladder scrutinized their work and gave feedback? Wouldn’t everyone feel much better, knowing that their bosses really cared about what they do?

This is a rule for building up a hierarchy of software craftsmen, a rule that yields what I call a “Solid Software Company”: a company where everyone is a developer (at least to some extent), where everyone understands software’s true nature and developers’ needs.

Imagine you could travel back in time to the early days of a once hip, now bureaucratic, politics-laden, inefficient, dreadful-to-work-for monster of a software company. You would arrive at a point at which they suddenly started to promote or hire people to be “just managers”.

So, viewed from a different, more negative angle, my rule can be rephrased like this:

The long and slow demise of a young, aspiring software company begins when its software development managers cease to write code.

Bug Hunting Adventures #2: Monitoring Temperature Sensors

Imagine a distributed control system that relies on correct and timely temperature measurements. Various temperature controllers distribute their temperature readings periodically over a bus for other controllers to consume.

To ensure that the temperature sensors work as expected, a ‘TemperatureMonitor’ was implemented. It listens for messages from the temperature controllers and checks whether they arrive in time (at the very latest every 20 ms) and whether their values are within the valid range.

Checking messages only upon arrival obviously doesn’t catch cases where a temperature controller sends no messages at all (or sends them with a delay long enough for the timer to overflow multiple times), so an additional cyclic task caters for such situations. This cyclic task runs every 100 ms and also takes care of the so-called ‘idle period’.

During the ‘idle period’, which is 500 ms, ‘TemperatureMonitor’ is lenient and doesn’t report problems (if any) to allow for an undisturbed start-up of the whole system.

Once the idle period is over, if ‘TemperatureMonitor’ detects abnormal conditions, it notifies the global ‘ErrorManager’, which decides how to handle problems based on its current error handling policy (e.g., just log the error, or reset or disable a temperature controller).
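The checking rules described so far can be summarized in a small Python sketch. Timestamps are plain milliseconds here, the interrupt-based locking is left out, and the class name, the callback standing in for ‘ErrorManager’ and the valid temperature range are hypothetical; only the 20 ms, 100 ms and 500 ms figures come from the text. It is meant to show the intended behavior, not the puzzle's implementation.

```python
from typing import Callable, Dict, Iterable

MAX_INTERVAL_MS = 20          # from the text: a message at least every 20 ms
IDLE_PERIOD_MS = 500          # from the text: lenient during start-up
VALID_RANGE = (-40.0, 150.0)  # hypothetical valid temperature range


class TemperatureMonitorSketch:
    """Single-threaded sketch; the real monitor also guards shared data."""

    def __init__(self, controllers: Iterable[int],
                 report_error: Callable[[int, str], None], start_ms: int) -> None:
        self.controllers = list(controllers)  # controllers expected to send
        self.report_error = report_error      # stands in for 'ErrorManager'
        self.start_ms = start_ms
        self.last_seen: Dict[int, int] = {}   # controller -> last arrival (ms)

    def _idle(self, now_ms: int) -> bool:
        return now_ms - self.start_ms < IDLE_PERIOD_MS

    def on_message(self, controller: int, value: float, now_ms: int) -> None:
        """Message handler: check timeliness and range upon arrival."""
        prev = self.last_seen.get(controller)
        self.last_seen[controller] = now_ms
        if self._idle(now_ms):
            return
        if prev is not None and now_ms - prev > MAX_INTERVAL_MS:
            self.report_error(controller, "late")
        low, high = VALID_RANGE
        if not low <= value <= high:
            self.report_error(controller, "out of range")

    def cyclic_check(self, now_ms: int) -> None:
        """Cyclic task (every 100 ms): catch controllers that have gone silent."""
        if self._idle(now_ms):
            return
        for controller in self.controllers:
            # A controller never heard from counts as silent since start-up.
            last = self.last_seen.get(controller, self.start_ms)
            if now_ms - last > MAX_INTERVAL_MS:
                self.report_error(controller, "silent")
```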

Since the cyclic task and the message handler (the one that is invoked upon the reception of a temperature sensor message) may run in parallel, access to shared data is protected by a simple (but sufficient) synchronization scheme based on enabling/disabling interrupts.

Time measurement is based on a wrap-around 32-bit tick counter that counts raw clock cycles. These clock cycles are converted to a more convenient unit (i.e., milliseconds).
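A common way to measure an interval with such a counter is to subtract the two raw readings modulo 2^32 and only then convert the difference to milliseconds. The sketch below assumes a hypothetical 80 MHz tick clock (the text does not give the frequency); Python integers are unbounded, so the mask emulates the unsigned 32-bit arithmetic a C implementation gets for free. At 80 MHz the counter wraps roughly every 53.7 seconds, and a single difference cannot distinguish one elapsed counter period from several.

```python
TICK_MASK = 0xFFFFFFFF  # the tick counter is 32 bits wide
TICKS_PER_MS = 80_000   # hypothetical: an 80 MHz tick clock


def elapsed_ticks(now: int, then: int) -> int:
    """Ticks from `then` to `now`; correct across at most one wrap-around."""
    # Unsigned 32-bit subtraction: the modulo 2**32 result is the true
    # difference as long as less than one full counter period has passed.
    return (now - then) & TICK_MASK


def ticks_to_ms(ticks: int) -> int:
    """Convert a tick difference to whole milliseconds (truncating)."""
    return ticks // TICKS_PER_MS
```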

Bug Hunting Adventures #1: Logging Binary Data

Normally, loggers are used to track down bugs, but today, I present a Logger class method that contains a bug itself — isn’t it ironic?

Logger::logbuf() takes ‘len’ bytes of binary data from the memory pointed to by ‘buf’ and converts them into printable, zero-terminated hex strings à la “AA 01 B3 C4…”.

For efficiency and readability, the hex string is broken up into smaller parts. Every hex string part is fed to the existing Logger::log() method, which is capable of outputting arbitrary zero-terminated strings (a newline character is appended automatically).
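Leaving the puzzle's C++ aside, the intended output format can be pinned down with a short Python sketch. The chunk size of 16 bytes per part and the names `log_buf` and `log` are assumptions. A C or C++ version also has to size its buffer with care: each byte needs two hex digits plus a separator, so n bytes need 3n characters if the trailing separator is replaced by the terminating zero.

```python
from typing import Callable

BYTES_PER_PART = 16  # hypothetical: how many bytes go into one hex string part


def log_buf(buf: bytes, log: Callable[[str], None],
            bytes_per_part: int = BYTES_PER_PART) -> None:
    """Log `buf` as hex strings like 'AA 01 B3 C4', one part per log() call."""
    for start in range(0, len(buf), bytes_per_part):
        part = buf[start:start + bytes_per_part]
        log(" ".join(f"{b:02X}" for b in part))
```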

Happy bug hunting! I will post the solution in two weeks’ time.

Bug Hunting Adventures

I’ve always loved to find bugs in code through mental debugging — especially other people’s code. Besides being fun, bug hunting improves product quality as well as one’s programming and problem-solving skills.

One of the books I enjoyed most is “Find the Bug” by Adam Barr. It contains many examples of buggy code in various programming languages, ranging from assembly language to Python. I heartily recommend it to everyone who shares my passion.

What a pity that there aren’t more books like this on the market!

To relieve our misery, I’ve decided to set up a new series: at irregular intervals, I will post a bug-afflicted piece of code and challenge you to spot the problem. I don’t want to limit this sport to particular programming languages or defect categories; some bugs will be straightforward, while others may be intricate, potentially only showing under certain favorable circumstances.

Anyway, I hope you enjoy this new series as much as I do — stay tuned, the hunting season is open!

The God’s Equation for Perfect Highballs

“Mix a stranger a good drink and no longer will he be a stranger”
— Yours truly

I’ve never been a fan of pure mathematics. Even though I must admit that pure maths bears a lot of elegance, it’s real-world applicability that makes mathematical discoveries shine.

Boy oh boy, how time flies! It’s summertime again, at least in the northern hemisphere, on this planet, and what a great time it is for drinking plenty of highballs. For those of you who don’t know, highballs are simple long drinks made up of two ingredients: liquor (e.g., gin, whisk(e)y, rum) and a non-alcoholic beverage (e.g., soda, juice, cola).

Even though there are usually only two ingredients, it is not that easy to make a perfect highball. It all depends on the right mixture. Some like their highballs stronger, others lighter. So how do you determine the perfect ratio? The task is complicated by the fact that the strength of the liquor used might vary (for instance, rums come with alcoholic strengths ranging from 35% to 80%).

So I sat down for a while and did a little math — to solve a real-world problem once and for all. What I’ve come up with is this:

Vn / Vl = R = (Sl / Sh) - 1

In this equation, Vn is the volume (amount) of non-alcoholic beverage, Vl is the volume (amount) of liquor, R is the mix ratio, Sl is the alcoholic strength of the liquor used and Sh is the desired target strength of the resulting highball. Isn’t this wonderful?

As an example, I happen to like my highballs with a strength of 10% and most liquor in Europe is sold with a strength of 40%. What’s the recipe for a perfect Gin and Tonic? Answer: 40% divided by 10% is four, minus one is three. Therefore, you would need three parts of tonic water for one part of Gin to make me smile.

If you prefer a 5% Gin and Tonic (are you a wimp, by any chance?), note that you will need seven parts of tonic water, not six, as a naive would-be drinks mixer would expect.
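For the record, the formula fits into a few lines of Python. The function names are mine, strengths are given in percent, and the second function simply re-applies Eq (5) from the derivation in the epilog, with Eq (3) inserted, to check the result.

```python
def mix_ratio(liquor_strength: float, target_strength: float) -> float:
    """R = Vn / Vl = Sl / Sh - 1 (parts of mixer per part of liquor)."""
    if not 0 < target_strength <= liquor_strength:
        raise ValueError("target must be positive and no stronger than the liquor")
    return liquor_strength / target_strength - 1


def highball_strength(liquor_strength: float, liquor_parts: float,
                      mixer_parts: float) -> float:
    """Sh = Sl * Vl / (Vl + Vn), i.e. Eq (5) with Eq (3) inserted."""
    return liquor_strength * liquor_parts / (liquor_parts + mixer_parts)
```

`mix_ratio(40, 10)` returns 3.0 and `mix_ratio(40, 5)` returns 7.0, matching the two Gin and Tonic examples.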

There you have it — the formula for perfect, repeatable quality drinks. People in the same spirit (pun intended!) — let’s raise our glasses to this discovery!

Epilog

I know, I know. Some of you smug, mathematically inclined folks with an IQ of 180+ would probably have known this right away. I don’t mind if you point out that my pathetic “discovery” is obviously just a special case of something more profound. Congratulations! I will drink a very special toast to you tonight.

And — just in case you haven’t done the maths yourself in the meantime, here is how I derived it:

Strength of liquor Sl is volume of alcohol in liquor Val divided by the total volume of liquor Vl:

Val / Vl = Sl         Eq (1)
Val = Sl * Vl         Eq (2)

Total volume of highball Vh is volume of liquor Vl + volume of non-alcoholic beverage Vn:

Vh = Vl + Vn          Eq (3)

Strength of highball Sh is volume of alcohol from liquor Val divided by the total volume of highball:

Sh = Val / Vh         Eq (4)

Inserting (2) in (4):

Sh = (Sl * Vl) / Vh   Eq (5)

Inserting (3) in (5):

Sh = (Sl * Vl) / (Vl + Vn)
Sh * (Vl + Vn) = Sl * Vl
Sh * Vl + Sh * Vn = Sl * Vl
Vl * Sh - Vl * Sl = - Vn * Sh
Vl * (Sh - Sl) = -Vn * Sh
Vl / Vn = - Sh / (Sh - Sl)
Vn / Vl = (Sl - Sh) / Sh
Vn / Vl = (Sl / Sh) - 1

Cheers!