October 2012 - Approxion

I’ve already written about the importance of daily practicing for software developers. One essential habit is Playgrounding, where developers practice in easily accessible try-out areas.

Playgrounds are great for tactical programming, when you want to explore a language feature or a certain technique. In order to try out strategic — or larger — ideas you need something else, something I call Masterpieces.

I borrowed the term Masterpiece from the guild system that was (and still is in many countries) responsible for the education of professional craftsmen. Journeymen who want to become masters not only have to take theoretical lessons and write exams; at the end they have to present a so-called ‘masterpiece’ which is meant to demonstrate that they possess the required skills of the trade.

In my opinion, developers should work on Masterpieces, too.

By my definition, a Masterpiece is a project that serves to improve development skills, provides real-world utility to others, and demonstrates the skill and maturity levels to potential clients and employers.

The last point should not be underestimated. As a hiring manager, would you rather hire developers who can demonstrate their abilities (and stamina) to create real-world software products or people who just claim how good they are? (Usually, for legal reasons, it is next to impossible to show production code from a previous (commercial) product that candidates have worked on, but with Masterpieces it is easy.)

Masterpieces don’t have to be true masterpieces according to the other generally accepted definition of the term: “works of art of outstanding quality and/or novelty”. If they are, that’s terrific, but the main focus of a Masterpiece is on skill-improvement and demonstrating the “can-and-willing-to-do” attitude of a developer. Nevertheless, it goes without saying that Masterpieces should be of decent quality. For instance, source code should be laid-out nicely, have good identifier names and there should be documentation, at least a README file, that shows how to use it. If there are automated unit tests, so much the better!

Here is an example of a very small Masterpiece. Some time ago, while trying to improve my Python skills, I wrote a tiny tool called “NoComment!” — a Python script that scans C/C++/Java source code and counts the total number of commented and uncommented lines. It also calculates a comment-density metric. Call it trivial, but it does have real-world utility. You can use it to easily find out if there is code in your project with too little documentation. Even though it is extremely simply, I added a README file, some regression tests and put it up on Sourceforge.

But just showing that you are a great coder is not enough these days — social and especially communication skills are at least as important.

That’s why Masterpieces are not limited to implementing software. You can improve, share, and demonstrate your skills by hosting blogs, giving lectures at conferences, publishing tutorials, podcasts, screencasts, and by contributing to Q&A sites like stackoverflow.com (who wouldn’t want to hire Jon Skeet for his next C# project?).

Masterpieces are a clear win-win habit: you can improve and demonstrate your skills, while others benefit from your knowledge and experience. Just like in the guild system, you cannot become or be a master without them.

“Distance has the same effect on the mind as on the eye.”
― Samuel Johnson, The History of Rasselas, Prince of Abissinia

In previous episodes of this series, I’ve only focused on circular problems involving a single index into a ring buffer. Today, I want to extend the scope to use-cases involving two indices.

At the end of part two of this series, I posted a problem where I wanted to keep track of the time when certain events occurred:


+---------------------------------+-----------+
| First type of event             | 353011    |
+---------------------------------+-----------+
| Second type of event            | 1643      |
+---------------------------------+-----------+
| Third type of event             | 4         |
+---------------------------------+-----------+
| ...                             | ...       |
+---------------------------------+-----------+
:                                 :           :

+---------------------------------+-----------+

| First type of event | 353011 |

+---------------------------------+-----------+

| Second type of event | 1643 |

+---------------------------------+-----------+

| Third type of event | 4 |

+---------------------------------+-----------+

| ... | ... |

+---------------------------------+-----------+

: : :

The first column of this table represents the event type, the second a time-stamp showing when the event took place. The time-stamp comes from a 32-bit timer which can wrap-around.

Our task is to find out in which order these events occurred, which basically boils down to finding a method that allows us to compare two time-stamps, taking wrap-around into account. Once we have such a method, we are able to sort the event list.

Viewed from a another angle, comparing two values of a wrap-around timer is exactly the same as comparing two indices into a ring buffer:


0 1 2 3 4 5 6 7 8 9 (Fig1)
      ^     ^ 
      a     b

0 1 2 3 4 5 6 7 8 9 (Fig1)

^ ^

a b

In this ring buffer of 10 elements, is a ahead of b or is it the other way around? Which one is larger and which one is smaller? Since we don’t know how many times a or b have wrapped around we are not able to give a meaningful answer to this question. So let’s agree on a rule, an invariant that at any time holds for our indices: a and b may wrap around (as many times as they like) but their real, absolute distance is always less than half the size of the buffer (i. e. N div 2). This effectively means that one index can never get ahead of the other by half the size of the buffer or more.

Under this assumption, it is clear that in Fig1 b is ahead of a and the true distance is 6 – 3 = 3. Because of our invariant, a cannot be ahead of b. Why? If a were ahead of b it would have wrapped around and the real, absolute distance between a and b would then be 7, thus violating our rule which requires that the real distance is less than 5.

Contrast this with the next example:


0 1 2 3 4 5 6 7 8 9 (Fig2)
  ^           ^ 
  a           b

0 1 2 3 4 5 6 7 8 9 (Fig2)

^ ^

a b

a must have wrapped around and it must be ahead of b by 4. Otherwise (if b were ahead of a), the distance would be 6 and this — according to our “less than half the buffer size” rule — would be illegal.

The next example is ill-formed: regardless of how you view it, the distance is always 5 so the invariant is violated and we cannot tell which index is ahead of which:


0 1 2 3 4 5 6 7 8 9 (Fig3)
  ^         ^ 
  a         b

0 1 2 3 4 5 6 7 8 9 (Fig3)

^ ^

a b

Just in case you wonder if there is a way to do without our cumbersome invariant: We could in fact utilize the full ring buffer range and not only less than the half, but this would require another invariant: one that requires that index a is always behind index b and there is no overtaking taking place (see part VI). Employing this invariant works in producer/consumer scenarios but turns out to be a lot less useful in the most typical circular use cases. It certainly would be of no help in our event-recording example above, as we cannot assume which event comes before or after another.

Based on our invariant, it was quite easy for us (as human beings) to determine which index is ahead of which. But how can we determine it in software? Simply calculating the linear distance b – a is not sufficient, since we are operating in a circle and not along a one-dimensional axis. In Fig2, calculating b – a gives 6, but the real distance is -4, since a is ahead of b by 4. If we calculate a – b, we get -6 still not -4. How can we get from linear differences to circular differences?

Ah, you must have guessed it! Here it goes again, the mod operator, our universal weapon for fighting circular problems!

Let me explain how the mod operator helps by using one of the most powerful analysis tools ever invented — a table:


b-a  (b-a) mod 10  leader  a < b    Note
-------------------------------------------------
 0         0         -     false    Distance is 0
 1         1         b     true
 2         2         b     true
 3         3         b     true
 4         4         b     true
 5         5         -     -        Illegal, Fig3
 6         6         a     false
 7         7         a     false
 8         8         a     false
 9         9         a     false
-1         9         a     false
-2         8         a     false
-3         7         a     false
-4         6         a     false
-5         5         -     -        Illegal, Fig3
-6         4         b     true
-7         3         b     true
-8         2         b     true
-9         1         b     true
-------------------------------------------------

b-a (b-a) mod 10 leader a < b Note

-------------------------------------------------

0 0 - false Distance is 0

1 1 b true

2 2 b true

3 3 b true

4 4 b true

5 5 - - Illegal, Fig3

6 6 a false

7 7 a false

8 8 a false

9 9 a false

-1 9 a false

-2 8 a false

-3 7 a false

-4 6 a false

-5 5 - - Illegal, Fig3

-6 4 b true

-7 3 b true

-8 2 b true

-9 1 b true

-------------------------------------------------

In the first column are all possible values for the linear distance of two indices; in the second column, this linear distance is taken modulo N, the size of the ring buffer (which happens to be 10 in our examples). In the third column, I put the leader, the index which is ahead of the other index. I did this manually, by drawing sketches and counting, just like I did in Fig2, when I claimed that a is ahead of b by 4. There are two discontinuities in the table: when the distance is exactly 5, that is, half the size of the ring buffer.

Ignoring these discontinuities and looking at column 2 and 3 we can observe that b is ahead of (or equal to) a if the linear distance taken modulo 10 is less than 5; otherwise a is ahead of b.

Based on this observation, we are now able to implement a less-than operator, which allows us to sort circular values. (As you may know, many sorting algorithms from standard libraries, like, for instance, the C++ standard library sort() algorithm, invoke the less operator ‘<‘ on values of a given range to do its job.) Here is an implementation of a circular less function in pseudo-code:


# Returns true if a is less than b in a circle of size N,
# false otherwise.
circular_less(a, b, N):
    if ((b - a) mod N < N div 2)
        return false
    return true

# Returns true if a is less than b in a circle of size N,

# false otherwise.

circular_less(a, b, N):

if ((b - a) mod N < N div 2)

return false

return true

Having had so much fun and success with the table, I added yet another column: the true circular distance, again based on sketching and counting. In Fig2, the true circular distance between a and b would be -4, since a is ahead of b. In Fig1, the circular distance is +3, since b is ahead of a:


b-a  (b-a) mod 10  leader  a < b   cd(a, b) 
-------------------------------------------
 0         0         -     false      0
 1         1         b     true       1
 2         2         b     true       2
 3         3         b     true       3
 4         4         b     true       4
 5         5         -     -          -
 6         6         a     false     -4
 7         7         a     false     -3
 8         8         a     false     -2
 9         9         a     false     -1
-1         9         a     false     -1
-2         8         a     false     -2
-3         7         a     false     -3
-4         6         a     false     -4
-5         5         -     -          -
-6         4         b     true       4
-7         3         b     true       3
-8         2         b     true       2
-9         1         b     true       1
-------------------------------------------

b-a (b-a) mod 10 leader a < b cd(a, b)

-------------------------------------------

0 0 - false 0

1 1 b true 1

2 2 b true 2

3 3 b true 3

4 4 b true 4

5 5 - - -

6 6 a false -4

7 7 a false -3

8 8 a false -2

9 9 a false -1

-1 9 a false -1

-2 8 a false -2

-3 7 a false -3

-4 6 a false -4

-5 5 - - -

-6 4 b true 4

-7 3 b true 3

-8 2 b true 2

-9 1 b true 1

-------------------------------------------

Comparing column 2 and 5 shows that the circular distance (cd) coincides with the linear distance mod 10 for values less than half the size of the circle. For other cases (again, ignoring the discontinuity cases) the circular distance can be obtained by subtracting 10 from the linear distance (mod 10). In pseudo-code:


circular_distance(a, b, N):
    dm = (b - a) mod N
    if (dm < N div 2)
        return dm
    return dm - N

circular_distance(a, b, N):

dm = (b - a) mod N

if (dm < N div 2)

return dm

return dm - N

Now this function is even more useful than circular_less. You can use it directly in sorting algorithms like qsort() from the C standard library, or, if you are using C++’s sort(), you can easily implement circular_less in terms of circular_distance:


circular_less(a, b, N):
    if (circular_distance(a, b, N) < 0)
        return true
    return false

circular_less(a, b, N):

if (circular_distance(a, b, N) < 0)

return true

return false

But the real benefit of circular_distance over circular_less is that you are now able to do real circular distance calculations. In our event table example above, you can determine how many ticks are between event 1 and event 2. Or, as another example, if you have a free running 16-bit timer (a counter with N = 65536) that is updated every millisecond, you could use circular_distance like this:


start_time = get_timer_value()
... some lengthy calculation
end_time = get_timer_value()

cd = circular_distance(start_time, end_time, 65536)
log("Lengthy calculation took " + cd + " milliseconds")

start_time = get_timer_value()

... some lengthy calculation

end_time = get_timer_value()

cd = circular_distance(start_time, end_time, 65536)

log("Lengthy calculation took " + cd + " milliseconds")

Even at the risk of repeating myself, please be aware that the mathematical mod operator and Java’s/C’s % operator are not the same. You can, however, apply the trick that I showed in part I: use ANDing instead of the mod operator for cases where N is a base-2 number; 65536 is a base-2 number so in C you would implement circular_distance like this:


inline int circular_distance_16bit(int a, int b) {
    const int N = 65536;
    int dm = (b - a) & (N - 1);
    if (dm < N / 2) {
        return dm;
    }
    return dm - N;
}

inline int circular_distance_16bit(int a, int b) {

const int N = 65536;

int dm = (b - a) & (N - 1);

if (dm < N / 2) {

return dm;

}

return dm - N;

}

In this installment, I’ve shown how to efficiently use the mod operator to solve circular problems involving more than one index. The solution is based on an invariant that states that the distance between two indices is always less than half the size of the value range. In the next episode, we will learn about another technique, one that is even more powerful than the mod operator: performing circular distance calculation in a two’s complement system.

More circular adventures…

Approxion

Code – People – Everything

Monthly Archives: October 2012

Masterpieces

Circular Adventures III: A Distance Apart