« Posts under Circular Adventures

Circular Adventures VII: A Ring Buffer Implementation

“The Limited Circle Is Pure”
— Franz Kafka

In systems programming, ring buffers are ubiquitous. Like regular queues, they’re first-in first-out data structures but contrary to classic queues they’re fixed in size and don’t grow dynamically. This is especially important in real-time contexts where time and space determinism is paramount.

In today’s circular episode I want to show you my go-to ring buffer implementation. I like it, because it’s short and decently efficient. I deliberately used only basic C++ syntax, so it should compile fine with even compilers from the previous millennium (i. e. C++98):

You can specify the data type of ring buffer elements via template parameter T and the total number of elements to track via template parameter N, respectively. Here’s a little usage scenario:

A common problem that you face when implementing a ring buffer is discriminating between empty and full states, as in both cases the head and tail index point to the same location. I solved it by allocating one extra element, which might waste a little bit of space but makes the code easier on the eye and probably faster compared to the alternative approach which requires you have to maintain an extra boolean flag.

When the buffer becomes full during an add operation (head == tail), the tail index is immediately advanced by one, thus dropping the oldest element. As this all happens within the add method, from a ring_buffer user’s point of view the head index is only ever equal to the tail index when the ring buffer is empty.

Contemporary compilers will replace the potentially costly modulo operation in the advance helper method with an AND operation for buffer sizes that are base-2 numbers. However, keep in mind that the advance method uses the internal buffer size (BUFSIZE), which is one greater than the requested ring buffer size (N), so this is most likely an efficient ring buffer:

while this isn’t:

More circular adventures…

Circular Adventures VI: When the Winner is Known Beforehand

“Girls have an unfair advantage over men: if they can’t get what they want by being smart, they can get it by being dumb.”
— Yul Brynner

In part III and IV I discussed solutions for a circular problem where two indices performed a race within a circular buffer; that is, either index could be ahead of the other.

Sometimes, however, life is simpler and it is always known which index is the leader, and thus the winner — from the outset:

In this example, provided we know that b must always be ahead of a, we can deduce that b has wrapped around and the distance between a and b is 4.

Either implementation (the one given in part III and the one given in part IV) of circular_distance will return the same result:

However, both will fail for this case:

Why? Under the premise that b is always ahead of a, the distance is +6 not -4. circular_distance computes the wrong result because it assumes that the leading index is less than half the circular buffer size ahead of the other index. This assumption was made (I called it ‘invariant’ at the time) to be able to compute the circular distance even if it is not known which index is ahead of which.

Based on the prerequisite that b is always ahead of a we can give a simplified version of the circular_distance function from part III:

I call this function circular_lead instead of circular_distance to emphasize that it returns how much b is ahead of a, which is always a positive number. As usual, all the restrictions (and optimizations) regarding the mod operator apply. In C, which lacks a true mod operator, a generic, portable implementation looks like this:

In situations where one index is known to be ahead of the other, circular_lead has an edge over circular_distance because it supports distances in the range 0 to N-1, whereas circular_distance only supports ranges from 0 to (N-1)/2. This is always the case in “monotonically increasing” scenarios, like run-time measurement, at least until the flux capacitor is invented. Hence, the last example of part IV can be rewritten like this:

If we replace the mod operator with a cast to uint32_t, circular_lead_32bit boils down to this:

[For the mathematically inclined: What we are really talking about here is residue class rings, a concept used by all contemporary computers. I might explore the number theory behind this and other circular topics in a future post.]

More circular adventures…

Circular Adventures V: The Christmas Edition

“And the Grinch, with his Grinch-feet ice cold in the snow,
stood puzzling and puzzling, how could it be so?
It came without ribbons. It came without tags.
It came without packages, boxes or bags.
And he puzzled and puzzled ’till his puzzler was sore.
Then the Grinch thought of something he hadn’t before.
What if Christmas, he thought, doesn’t come from a store?
What if Christmas, perhaps, means a little bit more?”

— Dr. Seuss
“How the Grinch Stole Christmas”

Oh well, oh well, it’s Christmas time again. Year after year, in this dark season, people contemplate the circle of life, watch reruns on TV, and rush to shopping malls to buy overpriced items for their loved ones; it is not unlikely, that — as soon as the shops are open again — these loved ones rush to return all the stuff that they don’t really need and buy something cool instead. You can also be sure that the level of madness increases every year — due to a well-known effect that economists call “inflation”.

Since so many things recur around Christmas, this is the perfect time for me to share more “circular” thoughts with you.

But first some background on the origins of Christmas. The reason why people celebrate Christmas on December 25 has little to do with the birth of Jesus, but rather with an ancient Roman cult called “Sol Invictus” (which means something like “unconquerable or invincible sun”). When the Julian calendar system was introduced around BC 45, the winter solstice occurred on December  25 (as opposed to today, where it happens on either December 21 or 22); this day was celebrated as the birth of the sun: on December 25, the sun would come back and rise to its full power over the next months.

It is believed that early Christians also took part in these festivities and celebrated together with the pagans by kindling lights. When the “Sol Invictus” cult was finally replaced by Christianity around AD 300, Christians decided to keep that special day but celebrate the birth of Jesus Christ, instead. (Today, it is assumed that Jesus Christ was born somewhere around March, BC 4.)

But now back to more technical stuff. You might have observed that the weekday of a particular date within a year gets advanced by one weekday in the following year. That is, if first of August is a Tuesday this year it will be a Wednesday next year. But how come?

Being a veteran circular adventurer by know, you should be able to come up with an answer yourself — at least you should try. Don’t cheat. Don’t read on. Think about it.

OK, here you go. A regular year (no leap year) has 365 days and a week comprises seven days. 365 mod 7 is 1, which means that after 52 7-day weeks, there is still this one day left in the old year to be filled with a weekday. You can think of it like this: the old year nibbles away one weekday from the new year and thus weekdays in the new year are “rotated left” by one. For leap years, the “rotate value” is two, since 366 mod 7 equals 2.

Regardless of your convictions, regardless of how strange they might look to adherents of other convictions, regardless of whether or what you celebrate, I wish all of you, dear readers of my blog, a Great Time and the best for the New Year.

More circular adventures…

Circular Adventures IV: A “Complementary” Gift

“Th’ hast spoken right, ’tis true.
The wheel is come full circle, I am here.”

— Shakespear, King Lear Act 5, scene 3, 171–175

In the previous part of this series, I’ve shown how to calculate the distance between two indices into a circular buffer using the mathematical mod operator. Today, I will give you a solution that doesn’t need the mod operator and is even more efficient.

We have always viewed our N element ring buffer like this:

There are N = 10 elements, indices start at 0 and end at N – 1. But let’s pretend for a moment that we had signed indices instead; indices that can become negative:

Like in the unsigned system, we assume that the borders are joined, that is, the successor of 4 is -5 and the predecessor of -5 is 4.

The arithmetic distance between a and b is, like before, b – a:

What is special about this new system is that if the distance gets larger than 4, it suddenly becomes negative. Remember that the successor of 4 is -5. So in this case

b – a is +6 which is, taken our special wrap-around mechanism into account, -4. This means that not b is ahead of a, but a is ahead of b by 4!

In fact, the wrap-around value of the arithmetic distance is exactly the circular distance that we calculated in part III using a different approach:

Note that our invariant which we established in the previous installment still applies: the true circular distance between two indices must be half the size of the ring buffer (N) or less. That’s why the circular distance for cases where b – a equals +/-5 is marked as undefined in the table.

This “unusual” range of index values from -N/2 – 1 to +N/2 is what’s called a 2’s complement system in computer science. It’s a means of supporting signed types in programming languages. The unsigned value range 0 to N is split in two halves, one for the negative values and one for the positive values. Even though it’s not stipulated by the C language standard, 2’s complement systems are the de facto standard for signed type implementations in all contemporary compilers (including Java and C#; contrary to C, both actually do stipulate 2’s complement).

In order to compute the circular distance in a 2’s complement system, it looks as if we need to take three steps: 1) convert our real-world unsigned indices into signed indices, 2) calculate their signed difference, and 3) wrap-around the signed difference if it lays outside the -N/2 – 1 to +N/2 value range, similarly to what the mod operator does for unsigned values.

Step 1 we can skip altogether, because to a CPU, there really is no difference between signed and unsigned numbers, it’s just a matter of how we (or the compiler) interpret the bit patterns of the operation’s result. Hence, we can proceed to step 2 and compute the unsigned difference and interpret it as a signed number in step 3:

Now, if N is a base-2 number that coincides with our signed integer types

Step 3 is as simple as a type-cast:

or even simpler:

Strictly speaking, casting an unsigned integer to a signed integer is “implementation defined”. Implementation defined is not to be confused with “undefined” and means for all practical purposes: interpret the unsigned value as a 2’s complement signed integer value. This achieves the intended 2’s complement wrap-around, just as we desire.

As a case in point, consider a 32-bit timer that keeps track of the system time. A timer interrupt service routine will update the timer value regularly behind the scenes. Based on what we have learned, you can easily measure the time between two events, even if the timer should overflow in between:

Most systems software (including embedded systems) is implemented in C, mainly because efficiency is at a premium. Due to the fact that C doesn’t come with a mathematical mod operator, it is sometimes sensible to “force” the size of timers/counters/ring buffers to the upper bound of an unsigned integer type. With this being the case, the aforementioned ‘cast-to-signed’ trick can be applied and calculating circular distances becomes both, easy and computationally cheap.

More circular adventures…

Circular Adventures III: A Distance Apart

“Distance has the same effect on the mind as on the eye.”
― Samuel Johnson, The History of Rasselas, Prince of Abissinia

In previous episodes of this series, I’ve only focused on circular problems involving a single index into a ring buffer. Today, I want to extend the scope to use-cases involving two indices.

At the end of part two of this series, I posted a problem where I wanted to keep track of the time when certain events occurred:

The first column of this table represents the event type, the second a time-stamp showing when the event took place. The time-stamp comes from a 32-bit timer which can wrap-around.

Our task is to find out in which order these events occurred, which basically boils down to finding a method that allows us to compare two time-stamps, taking wrap-around into account. Once we have such a method, we are able to sort the event list.

Viewed from a another angle, comparing two values of a wrap-around timer is exactly the same as comparing two indices into a ring buffer:

In this ring buffer of 10 elements, is a ahead of b or is it the other way around? Which one is larger and which one is smaller? Since we don’t know how many times a or b have wrapped around we are not able to give a meaningful answer to this question. So let’s agree on a rule, an invariant that at any time holds for our indices: a and b may wrap around (as many times as they like) but their real, absolute distance is always less than half the size of the buffer (i. e. N div 2). This effectively means that one index can never get ahead of the other by half the size of the buffer or more.

Under this assumption, it is clear that in Fig1 b is ahead of a and the true distance is 6 – 3 = 3. Because of our invariant, a cannot be ahead of b. Why? If a were ahead of b it would have wrapped around and the real, absolute distance between a and b would then be 7, thus violating our rule which requires that the real distance is less than 5.

Contrast this with the next example:

a must have wrapped around and it must be ahead of b by 4. Otherwise (if b were ahead of a), the distance would be 6 and this — according to our “less than half the buffer size” rule — would be illegal.

The next example is ill-formed: regardless of how you view it, the distance is always 5 so the invariant is violated and we cannot tell which index is ahead of which:

Just in case you wonder if there is a way to do without our cumbersome invariant: We could in fact utilize the full ring buffer range and not only less than the half, but this would require another invariant: one that requires that index a is always behind index b and there is no overtaking taking place (see part VI). Employing this invariant works in producer/consumer scenarios but turns out to be a lot less useful in the most typical circular use cases. It certainly would be of no help in our event-recording example above, as we cannot assume which event comes before or after another.

Based on our invariant, it was quite easy for us (as human beings) to determine which index is ahead of which. But how can we determine it in software? Simply calculating the linear distance b – a is not sufficient, since we are operating in a circle and not along a one-dimensional axis. In Fig2, calculating b – a gives 6, but the real distance is -4, since a is ahead of b by 4. If we calculate a – b, we get -6 still not -4. How can we get from linear differences to circular differences?

Ah, you must have guessed it! Here it goes again, the mod operator, our universal weapon for fighting circular problems!

Let me explain how the mod operator helps by using one of the most powerful analysis tools ever invented — a table:

In the first column are all possible values for the linear distance of two indices; in the second column, this linear distance is taken modulo N, the size of the ring buffer (which happens to be 10 in our examples). In the third column, I put the leader, the index which is ahead of the other index. I did this manually, by drawing sketches and counting, just like I did in Fig2, when I claimed that a is ahead of b by 4. There are two discontinuities in the table: when the distance is exactly 5, that is, half the size of the ring buffer.

Ignoring these discontinuities and looking at column 2 and 3 we can observe that b is ahead of (or equal to) a if the linear distance taken modulo 10 is less than 5; otherwise a is ahead of b.

Based on this observation, we are now able to implement a less-than operator, which allows us to sort circular values. (As you may know, many sorting algorithms from standard libraries, like, for instance, the C++ standard library sort() algorithm, invoke the less operator ‘<‘ on values of a given range to do its job.) Here is an implementation of a circular less function in pseudo-code:

Having had so much fun and success with the table, I added yet another column: the true circular distance, again based on sketching and counting. In Fig2, the true circular distance between a and b would be -4, since a is ahead of b. In Fig1, the circular distance is +3, since b is ahead of a:

Comparing column 2 and 5 shows that the circular distance (cd) coincides with the linear distance mod 10 for values less than half the size of the circle. For other cases (again, ignoring the discontinuity cases) the circular distance can be obtained by subtracting 10 from the linear distance (mod 10). In pseudo-code:

Now this function is even more useful than circular_less. You can use it directly in sorting algorithms like qsort() from the C standard library, or, if you are using C++’s sort(), you can easily implement circular_less in terms of circular_distance:

But the real benefit of circular_distance over circular_less is that you are now able to do real circular distance calculations. In our event table example above, you can determine how many ticks are between event 1 and event 2. Or, as another example, if you have a free running 16-bit timer (a counter with N = 65536) that is updated every millisecond, you could use circular_distance like this:

Even at the risk of repeating myself, please be aware that the mathematical mod operator and Java’s/C’s % operator are not the same. You can, however, apply the trick that I showed in part I: use ANDing instead of the mod operator for cases where N is a base-2 number; 65536 is a base-2 number so in C you would implement circular_distance like this:

In this installment, I’ve shown how to efficiently use the mod operator to solve circular problems involving more than one index. The solution is based on an invariant that states that the distance between two indices is always less than half the size of the value range. In the next episode, we will learn about another technique, one that is even more powerful than the mod operator: performing circular distance calculation in a two’s complement system.

More circular adventures…

Circular Adventures II: A Ring within a Ring

“Is all that we see or seem
But a dream within a dream?”
–Edgar Allan Poe

In part one of this series I’ve discussed the mod operator and demonstrated that it is a flexible tool for solving circular problems. I’ve also explained that the mod operator, as implemented by most (system) programming languages, is not able to handle problems involving negative dividends. As a remedy, I’ve introduced an efficient alternative of the mod operator for divisors that are base-2 numbers.

Today, I want to have a look at some first circular use-cases. Again, we meet our good old friend, the ring buffer.

Sometimes, you need to store data and keep previous versions of that data, but the memory available is limited. In such situations, an obvious approach is to use a ring buffer.

In this example, we have a ring buffer (again, pretend that the last and first element are joined) comprising eight (N = 8) fixed-size slots for storing information. Each slot contains a data set. We start by writing to slot 0, then slot 1 and so on; when the ring buffer is full (that is, slot 7 has been written to), we wrap around to slot 0 and overwrite what has been stored there previously, thus kicking out the oldest data set. By using this scheme, we not only have access to the most recent data set, but also to up to seven predecessors.

In addition to the actual ring buffer, we need a variable to keep track of the most-recently updated (“MRU”) slot number; that is, an index that points to the “newest” data set. Based on the current value of the MRU we can easily calculate which slot is to be updated upon the next write access:

Updating the ring buffer is just one part of the story, reading another. For instance, to get the predecessor of the current data set we need to calculate the slot index based on the previous value of MRU. Again, by using the mod operator, this can be achieved easily:

Dumping the whole ring buffer in reverse chronological order is achieved with this simple loop:

Note that for this to work, the mod operator has to be a mathematical modulo operator since the dividends may become negative. If you don’t have access to a true mathematical mod operator, you should make N a base-2 number and use the modulo replacement operation given in the previous installment (mod_base2).

Now let’s make our ring buffer example a bit more interesting. Let’s pretend our ring buffer must be stored in persistent, non-volatile memory (NVM), like flash. One problem with flash memory is that it wears out over time: the more often you update flash memory, the shorter the data retention time gets. (It’s like using a bucket with a crack to carry water from one place to another and the crack gets bigger with every lap you go.)

Therefore, storing the most-recently updated value in a single flash variable is not a good idea, since this memory location would wear-out quickly: every update to a ring buffer would require an update to the MRU variable, which means that the update load on the MRU variable is N times the update load on the individual slots.

A frequently used countermeasure is to have a “distributed MRU variable”, which is achieved by prefixing or suffixing every slot with a so-called “age counter” field.

The age counter starts at zero and is incremented with every write to a slot. So after three updates to an initially empty ring buffer, we would have this:

After ten updates, the ring buffer would look like this:

The slot containing the age counter with the highest value must be the MRU slot, slot 1 in our example. For obvious reasons, the value range of the age counter must be larger than the slot range of the ring buffer, but sooner or later, even the age counter will wrap around. After many writes to the ring buffer we might get this picture:

So which slot is the MRU slot now? The age counter is obviously an unsigned 8-bit type, since it wrapped around at value 255. Starting with 253 (the age counter value of slot 0) and taken modulo 256, all counter values increase by one up to 3. But the next value, 252, is not a successor of 3; it is too far ahead (or rather behind). This means that the slot containing age counter value 3 must be the most recently updated slot (which is 6).

The MRU is the basis for all ring buffer operations and it makes sense to maintain its value in a RAM variable. But how can one determine the MRU programmatically, for instance after a reset when all RAM content is lost? Here is a method to search for it:

It is a good exercise to convince yourself that this algorithm actually works. Think about the initial “empty” ring buffer case where all age counters are zero as well as the case where the “highest” age counter is stored in the last slot.

Luckily, the dividends in this algorithm cannot become negative, so if we wanted to code this in C, we could use the ‘%’ operator. But since we are doing a modulo 256 operation, we can apply the optimization given in the first installement, which is replacing the modulo 256 operation with a (uint8_t) cast:

Once we are in possession of the MRU, we can work with this ring buffer just like we did with the previous ring buffer. The only thing we have to do in addition is to memorize the highest (or current) age counter value for future updates to the ring buffer:

Finding the MRU in our second ring buffer was not particularly difficult — we just needed to compare the age counters until we found a “discontinuity”. We knew that the slots are updated, one after another, cyclically, with increasing age counter values. Things get more complicated when there is no such strict sequential ordering. Let me explain.

Imaginge you want to keep track of when certain events occur. For each kind of event, you would store a timestamp from a 32-bit timer source that indicates when this particular event occurred:

Further, assume that the 32-bit timer can wrap around (otherwise we wouldn’t have a circular problem), just like our 8-bit age counter wrapped around in our ring buffer example. How would you know which event occurred first or last? How would you calculate the relative time between two events? This and more is the topic of the next part of this series. Stay tuned…

More circular adventures…