« Posts under Code Katas

Code Kata 4: Struct Member Alignment

karate.jpg

“The whole is greater than the sum of its parts”
–Aristotle

How time flies! More than two years have passed since I posted my last kata, so it’s high time for a new one. Today’s kata is based on a real problem that I had to solve last week when I worked on a code generator that produced C/C++ code. I hope you enjoy this task as much as I did!

When you define a struct in C, the overall size of the struct is not necessarily the sum of its members. Much like in real life, the whole is greater than the sum of the parts. Consider this example:

Only novice programmers would assume that sizeof(Foo) equals 5; experienced programmers know that 8 is a much safer bet. How come?

Most computer architectures can access data only if it is properly aligned in memory, e. g. a 32-bit (4-byte) integer can only be accessed if it is stored at an address that is evenly divisible by 4. The compiler usually achieves this by inserting invisible padding bytes between the struct members. Thus, internally, struct Foo is likely to look like this:

As a first step, familiarize yourself with struct padding. Check out this must-read by legendary Eric S. Raymond, especially if you are a systems programmer.

Now that you have read it, you should understand why sometimes there is also trailing padding (but never leading padding) and hence why the size of the following struct is most likely 12:

Equipped with this knowledge we are ready to tackle our first programming task: assume that every primitive struct member of base-2 size is to be aligned on its base-2 boundary (a 2-byte integer on an address that is evenly divisible by 2, a 4-byte integer on an address that is evenly divisible by 4 and so on). Implement an algorithm that computes the overall size of a struct given an ordered list of its members. Instead of real types, provide a list of integer values where the values represent the sizes of the members. Here are examples for Foo and Bar (in Python):

One weakness of this approach is that you cannot differentiate between a unit32_t (size is 4, alignment is 4) and an array of 4 uint8_ts (size is 4, alignment is 1):

Extend your algorithm to accept a list of pairs, where the first pair member specifies the size and the second pair member specifies the alignment:

But there is one more feature we need to add before we’re done: support for arbitrarily nested structs:

How does a member that itself is a struct impact the alignment of the outer struct? Devise a suitable data structure for passing in nested struct specifications.

[Update 2016-03-12: I’ve uploaded a sample solution to GitHub]

Code Kata 3: The Perfect Match

karate.jpgIt was certainly a mix of emotions that I experienced when my then 13 year old daughter came by to talk about programming: a weird mix of pride, curiosity and — suspicion. Normally, she didn’t care at all about technical stuff and even less about what her father is doing for a living. So what the heck was she up to?

She told me about a cool “game” that she and her friends had been playing: calculating the match factor for two persons. Obviously, she wanted to try out many different person/person combinations and she had figured that this cried for some kind of automation.

I thought this was a nice opportunity to demonstrate that programming can be both: useful and fun, so we sat down and she explained how the algorithm worked.

1. Write down the names of the two persons, in capital letters, e. g.

2. Start at the first character, cross out all occurrences of this character and write down how many occurrences there are:

(since there is only one ‘H’)

3. Repeat this with the next character

(since there are two ‘A’s)

4. Repeat with the next (not yet crossed-out) character until all characters are crossed-out

(H:1, A:2, R:2, Y:2, S:1, L:2)

5. Now add up the number of occurrences. Add the first digit and the last digit and write down the sum; next the second and the last-but-one. Repeat until finished:

(1 + 2 = 3, 2 + 1 = 3, 2 + 2 = 4)

6. If the result (interpreted as a single value) is greater than 100, repeat step 5 until it is less or equal to 100.

(3 + 4 = 7, 3 + 0 = 3)

7. The result is the match value (73%) for the given two persons.

I implemented the first version in Perl but since I had so much fun, I decided to reimplement it in C++. Being that shell freak that I am I chose this user interface:

OK, so now it is your turn. Implement this algorithm in a programming language of your liking.

You’ll get the same advice that I gave to my daughter then: Don’t be sad if you don’t get the results that you wish for. Even Romeo and Juliet have a match factor of only 56% and yet they are together in eternity.

Epilogue

Almost three years have past since that day and this was the only case where my daughter showed some interest in programming. This leads me to think that she found out — already at such an early age — that no computer will ever solve the really important problems of human beings.

Code Kata 2: The “Average” Developer

karate.jpgYou might be tempted to think that calculating the average of two integers is a trivial thing. If you think that it is as easy as this:

you are wrong. But don’t be depressed — you are in good company. Code like this is written every day — not just by average developers.

(Task#1: Hold on for a minute. Can you spot any problems with this code?)

I’m not talking about the fact that the result is not precise — it can’t be, due to the limitations of the integer type. If ‘a’ is 5 and ‘b’ is 6, the mathematically correct result is 5.5. With integer division, however, it is 5, which is perfectly OK for our purposes; otherwise, we would have used ‘float’ or ‘double’. No, that’s not what I’m looking for…

The trouble is that if a developer looks at the signature, (s)he will probably think “Well, this function takes two integers and returns their average value. I can pass ANY unsigned integer to this function.”, which is of course not true. Why? If the integers are large enough, their sum will overflow the unsigned int range and hence the result of the division will be wrong. There you have it!

(Task#2: Write tests that show when and how overflow happens.)

You could use a larger type internally:

but maybe ‘long long’ is not available on your platform, or you are already using ‘long long’ and there is no bigger type, or you simply don’t want to sacrifice efficiency, especially by using a floating point type. (Note that ‘long’ is not sufficient in our example as on 32-bit platforms ‘int’ and ‘long’ are usually of the same size). So how can you solve this problem without resorting to a larger (more expensive) type?

Years ago I found a solution (I don’t claim that I’m the first who invented this, and I won’t be the last to reinvent it, either) by reasoning like this:

In pure mathematical terms, with a mathematical division operator, (a + b) / 2 is identical to a / 2 + b / 2. By applying this principle — and “wasting” another division — one could avoid the dreaded overflow in the sum. Unfortunately, integer division doesn’t exactly work like mathematical division. For instance, if ‘a’ is 5 and ‘b’ is 7, (5 + 7) / 2 is 6, but 5 / 2 + 7 / 2 = 2 + 3, which is 5. So it is possible that the result is one less than what we expect. How can we compensate for that?

Here is the insight: For both divisions, the integer division operator leaves a remainder behind and if the sum of the remainders (of both divisions) is equal to two, we have to add 1 to our result. Think of it this way: in such cases, the sum of ‘a’ and ‘b’ contains a 2 more than the parts of the sum themselves:

Here is an implementation that applies this correction:

Unfortunately, this is not only ugly, it is not very efficient, either. Now we have two divisions, two modulo operations (which are as expensive as division operations), two additions, an if-statement and an increment operation. But as every bit-fiddler knows, there is a simpler modulo operation for divisors that are a power of two:

Equipped with this knowledge we get:

Which is identical to:

In C, results of boolean operations are either 0 or 1 (in C++, they are ‘false’ or ‘true’, but ‘false’ and ‘true’ are converted implicitly to 0 or 1 in arithmetic expressions), so we can shorten our average function:

Cool isn’t it? But it’s not unlikely that your compiler will apply these optimizations for you automatically, in case you write your code like in the first, ugly solution. But it certainly won’t add the correction that is missing from the first implementation — trust me on this one.

(Task#3: Write a small program that uses this implementation of average() and test it.)
(Task#4: If you change the types used by average() from ‘unsigned int’ to ‘int’, will the code also work, specifically for negative values of ‘a’ and ‘b’? Why? Write some tests to proof your theory.)
(Task#5: Implement and test an average4() function that calculates the average of four integers.)

Code Kata 1: The Closer You Get

karate.jpgI’ve already written about Code Katas and how they can help you become a better programmer; this time, I’m getting serious about it.

Like all craftsmen, we need to practice a lot; eight hours of professional software development is not enough, especially if a great share of these hours is dedicated to email and meetings. A carpenter is not just building houses but also tries out new ideas on his workbench in the basement; painters sketch and explore new combinations of paint and color. Photographers do it, musicians do it: everyone who wants to become better at their craft has to practice in a safe and dry place.

Today’s kata is a simple one. Imagine you have a set of numeric values. If I give you another value (which may or may not be within the set), find a value from the set that matches the given value as close as possible. Don’t write any code yet! Proceed as indicated below.

  1. Think about the problem and draw a picture that illustrates the problem.
  2. What special cases do you see?
  3. Try to come up with a simple algorithm (use pseudo code, flowcharts, whatever you prefer).
  4. Write down a few good test cases and mentally check your algorithm against them. Select a programming language of your choice and code a dummy implementation that always returns the same (wrong) result.
  5. Code your test cases and execute them against your dummy implementation. You don’t have to use unit testing frameworks like JUnit; use the simplest way of comparing results to expected results and flag an error if there is no match.
  6. Watch your tests fail.
  7. Implement your algorithm.
  8. Execute your tests and debug your code until all tests pass.
  9. If you haven’t done it yet: single-step through every line of your code.

Bonus exercise:

  1. Turn your algorithm into a generic routine that is suitable for a library; that is, support different value types (floating point types, integer types).