Category Archives: Code

Simplified II: The Weasel That Teaches Evolution

“Weaseling out of things is important to learn. It’s what separates us from the animals… except the weasel.”
— Matt Groening

It’s Towel Day again! What a great opportunity to reflect upon Life, the Universe, and Everything. In this installment of the “Simplified” series, I want to tackle nothing less than Darwin’s theory of evolution. Why? Because I seriously believe that the world would be a better place if people finally understood it.

One common misconception, for example, is that evolution occurs in a linear fashion, from single-celled organisms all the way to Homo sapiens. People who hold this belief, like the taxi driver that I mentioned in a previous post, think that they can smash the theory of evolution by posing this cunning question:

“If we humans are really descendants of animals like apes and dogs, why are there still apes and dogs around?”

Just in case this reasoning sounds conclusive to you as well: in reality, there is no single line of evolution; rather, it’s a tree with many, many branches. On these branches, the universe performs experiments and decides, through natural selection, which species survive. Consequently, humans did not evolve from apes; they are rather cousins whose common ancestor was neither ape nor human.

Another fallacy is Fred Hoyle’s famous Boeing 747 analogy. Hoyle was a brilliant scientist, no doubt, yet he refused to accept that complex life-forms can emerge by chance:

“A junkyard contains all the bits and pieces of a Boeing-747, dismembered and in disarray. A whirlwind happens to blow through the yard. What is the chance that after its passage a fully assembled 747, ready to fly, will be found standing there?”

A variant of this argument is the infinite monkey theorem: a monkey typing random letters at a typewriter will never (at least not within any realistic span of time) be able to produce Shakespeare’s works.

In 1986, evolutionary biologist Richard Dawkins set out to dispel such misunderstandings by implementing a computer program that works—in Dawkins’ own words—like this:

“It […] begins by choosing a random sequence of 28 letters, […] it duplicates it repeatedly, but with a certain chance of random error – ‘mutation’ – in the copying. The computer examines the mutant nonsense phrases, the ‘progeny’ of the original phrase, and chooses the one which, however slightly, most resembles the target phrase, METHINKS IT IS LIKE A WEASEL.”

Here’s a more detailed explanation of his weasel program: it starts with a string of 28 random letters. Next, it creates N offspring strings by copying the original 28 random letters N times. During copying, each letter has a probability P of being changed into another (random) letter. Now that we have a set of N new strings, we compare each string letter by letter against “METHINKS IT IS LIKE A WEASEL” (a quote from Shakespeare’s Hamlet, by the way) and pick the one with the most character matches (the highest match score). This one is deemed the “fittest” and kept as the survivor of the first generation; the other N-1 strings are discarded. We repeat the whole process by again creating N offspring from the survivor, then picking a new survivor, and so on, until the survivor finally matches “METHINKS IT IS LIKE A WEASEL”.

Let’s choose N to be 100 (one hundred offspring per generation) and P to be 5%, as in Dawkins’ original experiment. How many generations would it take until we finally reach “METHINKS IT IS LIKE A WEASEL”? Just guess a number; I’ll wait here…

Answer: about 50 generations! To me, this number was so ridiculously small that I decided to implement the weasel program myself. Please check it out here. Through command-line parameters, you can change the values of N and P, as well as the target phrase. Toying around with this program is eye-opening: you suddenly realize that evolution is possible, that complexity can emerge through mutation and survival of the fittest.
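
If you just want the gist without browsing the repository, here’s a minimal sketch of the algorithm in Python. It’s my own boiled-down version, not the actual program behind the link; the defaults merely mirror the numbers above:

    import random
    import string

    ALPHABET = string.ascii_uppercase + " "  # 26 capital letters plus the space

    def mutate(parent, p):
        """Copy 'parent'; each character mutates with probability 'p'."""
        return "".join(random.choice(ALPHABET) if random.random() < p else ch
                       for ch in parent)

    def fittest(candidates, target):
        """Pick the candidate with the most characters matching 'target'."""
        return max(candidates,
                   key=lambda s: sum(a == b for a, b in zip(s, target)))

    def weasel(target="METHINKS IT IS LIKE A WEASEL", n=100, p=0.05):
        survivor = "".join(random.choice(ALPHABET) for _ in target)
        generation = 0
        while survivor != target:
            generation += 1
            survivor = fittest([mutate(survivor, p) for _ in range(n)], target)
            print(generation, survivor)
        return generation

    if __name__ == "__main__":
        weasel()

Run it a couple of times and watch the survivor creep toward the target phrase.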

Interestingly, if you set P to a high value (say 20%), you may never arrive at the target phrase at all. A high value of P simulates an early universe, or any environment with a high level of natural radioactivity. In one experiment where I set P to 20%, it took 800 generations to reach the target phrase, while a P value of 1% got there after only 90 generations.

I also played with much longer target phrases, with lengths of more than 150 letters. What I’ve found is that the longer the phrase gets, the more detrimental a high mutation probability becomes. If you think about it, that’s not surprising: the likelihood increases that letters which already match are replaced again by non-matching letters. Conclusion: the evolution of complex life-forms requires an environment that provides the right mutation probability—neither too low, nor too high. But while a low mutation probability can always be compensated for by time (evolution will just progress more slowly), a high mutation probability stifles evolution entirely.

Anyway, while the weasel program is a blatant simplification of real life, I think it’s a great tool for demonstrating that random variation combined with non-random cumulative selection can produce complex structures. This is what evolution is all about, not monkeys hacking away at keyboards and winds blowing through airplane junkyards.

Share and Enjoy!

Coding Horrors: Sometimes They Come Back

“Within its gates I heard the sound
Of winds in cypress caverns caught
Of huddling trees that moaned, and sought
To whisper what their roots had found.”
― George Sterling, A Dream of Fear

This month’s post is a guest post by Giles Payne, a teammate from a project at Giesecke & Devrient Mobile Security almost two decades ago. At the time, we were trying to teach good coding practices by distributing so-called “Coding Horrors”: bad code examples that showed what not to do. The other day, Giles sent me such a “Coding Horror” that he’d come across recently. I couldn’t help but talk him into blogging about it. Giles, take it away!

Thank you, Ralf! Let’s dive right into the code — for fun please take a minute to try and work out what this little piece of Java is doing.

(5 minute pause for head scratching.)

Were you able to work it out? No, I didn’t think so. So what is going on?

This code comes from a simple, browser-like application that needs to handle a stack of displayed pages. In some cases it will manage a history of the pages; in other cases it doesn’t care about the history — hence the “USE_PAGE_HISTORY” descriptor.

But what exactly is it doing? Well, what this particular code does is change the meaning of the equals comparator for PageIds to always return true when USE_PAGE_HISTORY is false. (Face palm.)
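
To make the horror tangible without dragging in the Functional Java machinery, here is a plain-Java reconstruction of the idea. This is not the original code, just an illustration of the same sin, with names invented for the occasion:

    final class PageId {
        // When false, the application doesn't care about page history.
        static final boolean USE_PAGE_HISTORY = false;

        private final String id;

        PageId(String id) {
            this.id = id;
        }

        @Override
        public boolean equals(Object other) {
            // The horror: with history disabled, every PageId "equals"
            // every other PageId, so any code that asks "is this the
            // same page?" silently changes its behavior.
            if (!USE_PAGE_HISTORY) {
                return true;
            }
            return other instanceof PageId
                && id.equals(((PageId) other).id);
        }

        @Override
        public int hashCode() {
            // All-equal objects must share one hash code to keep the
            // equals/hashCode contract intact.
            return USE_PAGE_HISTORY ? id.hashCode() : 0;
        }
    }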

In an attempt to pinpoint exactly what is so horrendous about this, I’ve written up a few basic “programming principles” that I feel have been grievously violated here.

Principle #1: Don’t be a language snob.

If you’re programming Java, then program Java; if you’re programming Visual Basic, then program Visual Basic. The person who wrote this code was obviously heavily into functional programming languages. So totally into functional programming languages that the thought of writing code in a non-functional programming language like Java drove him or her to distraction — or, more exactly, to use the “Functional Java” library. “Functional Java” is basically an attempt to turn Java into a functional programming language, which it doesn’t really manage to do. What it does do is turn your code into cryptic, impenetrable gobbledygook like this.

Principle #2: Don’t bury important logic deep in non-obvious abstractions.

The usePageHistory moniker is pretty easy to understand. If I see code like this (a made-up snippet; the page-stack API is hypothetical)
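
    if (usePageHistory) {
        pageStack.push(newPage);        // keep the old page for "back"
    } else {
        pageStack.replaceTop(newPage);  // no history wanted: just swap
    }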

it’s pretty easy to follow. If I had lots of places with the same if-else logic, I might separate the logic into two classes like PageStackBasic and PageStackWithHistory. Hiding a decision like whether or not you’re using history inside the equals comparison operator is a recipe for disaster, because nobody is ever going to think of looking there (see next principle).

Principle #3: Don’t override standard operators in non-standard ways.

Some languages like C++ let you overload arithmetic/boolean operators — others don’t. There is a lot of debate among language designers about whether allowing operator overloading is a good thing or a bad thing. On the plus side, there are some very cool things you can do; for example, if you have a class representing matrices, you can overload the * operator to do matrix multiplication (see the sketch below). On the minus side, it lets you write stuff like this. While this code probably works now, it’s the kind of code that will stop working one day, and when it does, it will take out an entire country’s air-traffic control system*. While it probably seemed like a clever or elegant piece of code to the person who wrote it, it’s a maintenance programmer’s nightmare.
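
The benign case looks something like this in C++ (a toy 2×2 matrix, just to show the idea):

    struct Matrix2 {
        double m[2][2];
    };

    // Overloading operator* lets numeric code read like the math it implements.
    Matrix2 operator*(const Matrix2& a, const Matrix2& b) {
        Matrix2 r = {};
        for (int i = 0; i < 2; ++i)
            for (int j = 0; j < 2; ++j)
                for (int k = 0; k < 2; ++k)
                    r.m[i][j] += a.m[i][k] * b.m[k][j];
        return r;
    }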

Principle #3 is actually picked up in the highly amusing “How to Write Unmaintainable Code”. It’s basically a back-to-front version of “Coding Horrors”, where bad coding practices are encouraged as a means of achieving a job for life. (Oh my gosh, this is a Fake Surgeon’s gold mine! — Ralf’s note.)

________________________________

*) Relax — this code is not part of an air-traffic control system. My point was simply that when bad code like this blows up, the impact tends to be massive.

Bug Hunting Adventures #13: Prime Sums (Solution)

The challenge suffers from what I call a “chain of blunders”, where one blunder leads to another. Here are the exact details, in the traditional format.

The first to get close to the true nature of this bug was reader Shlomo, who commented directly on the post, but I held back his comment in order not to spoil the fun for others. (Unfortunately, I couldn’t tell him, because he used a bogus email address—boo!) Christian Hujer, hacker extraordinaire, gave the most precise and extensive account on LinkedIn. While many found the blunder in the Makefile (Joe Nelson was the first), it was apparently such a good smokescreen that many people didn’t look any further. To me, the root blunder that started the chain of blunders lies in the C language itself, which should never have allowed implicit zero-initialization of constants in the first place (something that was corrected in C++).

Some believed that the preincrement of the loop counter was the culprit, as it would skip the first prime, but that’s not the case. The expression after the second semicolon always gets evaluated at the end of the loop body:
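
    for (init; cond; <e>) {
        body;
    }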

is equivalent to
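
    init;
    while (cond) {
        body;
        <e>;  /* evaluated at the end of each iteration */
    }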

Substitute ++i or i++ for <e> — there’s no difference!

On a general note, guys, please register by entering your email address in the top right corner to ensure that you will get automatic notifications for new posts as soon as they’re published. I also (usually) announce new posts on LinkedIn, but mostly hours if not days later. Nevertheless, connecting with me on LinkedIn is always a good idea and highly encouraged. Your subscriptions, likes, praise, and criticism keep me motivated to carry on, so don’t hold back!

Bug Hunting Adventures #13: Prime Sums

“Why, yes; and not exactly that either. The fact is, we have all been a good deal puzzled because the affair is so simple, and yet baffles us altogether.”
― Edgar Allan Poe, The Purloined Letter

Below, you find a little C project that doesn’t do what it’s supposed to do, namely print the sum of the first 10 prime numbers. The program builds cleanly with gcc and clang; that is, without any warnings even when using -Wextra -Wall -pedantic -ansi as compiler options. It’s well-formed and doesn’t crash.

What’s the root cause of this bug? What’s the output of the program? Here are the files; you can also find them on GitHub:

prime_table.h:

prime_table.c:

prime_sum.c:

Makefile:

Solution

const_map, anyone?

Today, I want to share a little C++ container class with you that has proved useful over the years: const_map. I finally got around to turning it into something reusable, or so I hope. Special thanks go to John Hinke, who provided awesome feedback.

In simple words, const_map is an associative array, just like good ol’ std::map: a mapping from keys to values which allows you to look up values by keys, for instance like this (a sketch; the variable names are mine):
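
    // Illustrative sketch, inferred from the description in this post:
    const char* name = color_names[2];  // yields "green" with the table below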

Contrary to std::map, however, const_map is a read-only container. To set it up, you provide a key/value mapping table to the constructor, along these lines (again, my reconstruction of the interface):
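
    // The mapping table must be sorted by key in ascending order.
    // value_type is assumed to be a simple key/value pair (POD).
    static const const_map<int, const char*>::value_type COLOR_TABLE[] = {
        { 0, "black" },
        { 1, "red"   },
        { 2, "green" },
    };

    // Assumed constructor: two pointers delimiting the mapping table.
    const_map<int, const char*> color_names(COLOR_TABLE, COLOR_TABLE + 3);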

Once a const_map is instantiated, it’s not possible to add or remove key/value pairs anymore:
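
    // A read-only container: no insert(), no erase(). Something like the
    // following simply doesn't compile:
    //
    // color_names.insert(std::make_pair(3, "blue"));  // error!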

Just like std::map, const_map does lookups in O(log n) time. Contrary to std::map, however, the implementation isn’t based on binary trees but on binary search. Hence, the mapping table must be sorted by key in ascending order.

const_map doesn’t do any heap allocations. All it does is maintain two pointers: one to the beginning and another one to the end of the mapping table. Apart from these two pointers, const_map doesn’t need any other state or housekeeping data. Thus, on a 32-bit platform, const_map instances will cost you eight measly bytes of RAM.

Now, I know there are some embedded cheapskates out there to whom even eight bytes is sometimes too big a sacrifice. If you’re so tight on RAM and are willing to forgo the fancy STL-like interface and put up with a little more typing, you can use the static class member function ‘lookup’, which is completely stateless and doesn’t require an instance:
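
    // Hypothetical signature -- a stateless lookup straight over the table:
    const char* name =
        const_map<int, const char*>::lookup(COLOR_TABLE, COLOR_TABLE + 3, 2);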

I’ve mostly used const_map as a replacement for long, unwieldy switch-case abominations. Either to translate values from one data type to another or to determine which action to take based on a given value. In the latter case, my value type was a function pointer or functor.

Share and enjoy!

Code Kata 5: Reversing The Bits

“Boredom is just the reverse side of fascination: both depend on being outside rather than inside a situation, and one leads to the other.”
— Arthur Schopenhauer

Winter is definitely approaching. The days are getting shorter and temperatures are dropping. Isn’t this a wonderful time for doing code katas? Sitting in your easy chair, in front of a fireplace, serenely coding away in your favorite editor.

Today, we dip once more into the realm of low-level embedded programming. Your primary job is to write code that reverses the bits of an integer. For instance, reversing the 8-bit integer value 139 (10001011) yields a new value of 209 (11010001).

Sounds simple, doesn’t it? But it’s not just about coding a bit-reversing routine; that alone wouldn’t teach you enough lessons. You’ll also practice unit testing, code tuning and — if you like — assembly language programming, cross-compilation, and remote debugging along the way.

The goal is not to find the best/fastest algorithm possible. Instead, the emphasis is on exercising your programming muscle and learning new things. Don’t rush this kata; rather, consider it a mini project. Do the research, be persistent. Try and fail and become better. Take your time; I did it over a period of three weeks. Now, without further ado, let’s get (gently) started.

1. Implement a bit-reversing function in a dynamic programming language (e.g. Perl, Ruby, Python).

a) Decide on a suitable interface; e.g. in Python, it could look like this (one possible signature, using the names that appear throughout this kata):
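
    def reversebits(value, max_bits=32):
        """Return 'value' with its lowest 'max_bits' bits reversed."""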

b) Implement a dummy version of ‘reversebits’ that simply returns 0.

c) Implement a couple of test cases, run them against the dummy version, and watch them fail; e.g. (a sketch using Python’s unittest module; the imported module name is assumed):
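
    import unittest

    from reversebits import reversebits  # assuming step a)'s module name

    class ReverseBitsTest(unittest.TestCase):
        def test_8_bit_value(self):
            # 139 is 10001011; reversed within 8 bits: 11010001 = 209
            self.assertEqual(reversebits(139, 8), 209)

        def test_zero(self):
            self.assertEqual(reversebits(0, 8), 0)

    if __name__ == "__main__":
        unittest.main()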

d) Implement a straightforward solution (‘reversebits1’) that passes the unit tests.

e) Add more unit tests to gain better confidence in your solution.

f) Measure the execution time of ‘reversebits1’.

g) Be creative, find other solutions. Optimize for speed, optimize for shortest code (in terms of characters typed). Measure and compare the execution times.

h) What if ‘max_bits’ is smaller than the total number of bits already taken up by ‘value’? For instance:
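
    reversebits(139, 4)  # 139 (10001011) doesn't fit into 4 bits -- now what?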

In this case, only the first ‘max_bits’ bits of ‘value’ shall be reversed and the remaining upper bits shall be discarded:
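
    # Only the low 4 bits (1011) are kept and reversed: 1101 = 13
    assert reversebits(139, 4) == 13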

Adapt your code accordingly. Probably, some of your solutions already behave correctly and need no modifications.

2. Port your code and unit tests to C/C++

a) Decide on a suitable interface. It’s OK if you use a fixed-width unsigned integer data type for the value to be reversed, as long as it supports at least 32 bits; e.g.:
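
    #include <stdint.h>

    /* One possible C interface, mirroring the Python version: */
    uint32_t reversebits(uint32_t value, unsigned int max_bits);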

b) Measure execution times and compare them against the execution times from 1.

c) Play with various compiler optimization settings. How do they impact the timings?

d) Bonus: Implement a bit reversal function in x86 assembly language.

3. Port your code and tests from 1. and 2. to a Raspberry Pi

a) Don’t have a Pi? Find out how to emulate one on your PC.

b) Measure and compare execution times using various optimization settings.

c) Bonus: Implement a bit reversal function in ARM assembly language.

d) Bonus: Want to learn even more? Build your own cross toolchain (hint: ‘crosstool-ng’) and build the Pi code on a regular PC. Debug the code remotely over a TCP connection (hint: ‘gdbserver’, ‘gdb-multiarch’).

I certainly learned a lot by doing this kata. Far more than I can possibly tell here without boring you and/or spoiling all the fun. Nevertheless, I uploaded my work to GitHub. Have a peek in case you get stuck, need some inspiration, or are just curious.