« Posts under Circular Adventures

Circular Adventures IX: The Poor Ring Buffer That Had No Tail

“If you hold a cat by the tail, you learn things you cannot learn any other way.”
― Mark Twain

The typical use case for a ring buffer is a producer consumer scenario, where a producer sometimes produces data faster than the consumer can consume. Eventually, however, the consumer will catch up. This typical use case can be summarized as “short term buffering of bursts of data”.

But what happens if buffered data is not consumed fast enough and the ring buffer becomes full? Answer: adding another element will drop the oldest element, as can be seen by looking at the implementation of the ‘add’ method:

After the new element is added to the buffer, the head index is incremented (modulo ‘BUFSIZE’). If it now points at the tail index, the tail index is incremented (modulo ‘BUFSIZE’) as well, which effectively removes the oldest element from the ring buffer, making space for a new ring buffer element to be added. Here’s an example of this happening to a ring buffer that already holds 9 elements (why not 10? The head index is ‘exclusive‘ and points at the location where the next added element will go):

Calling ‘add’ in this situation advances the head and the tail index by 1:

This will be continue for every subsequent ‘add’. Here’s how the ring buffer looks like after six more calls to ‘add’:

Why am I telling you all this? Because there’s a ring buffer use case, where a producer endlessly produces data that is never consumed. I call this the “history buffer” use case where record is kept over the last N events, similar to how a flight recorder operates.

If a ring buffer is used as a history buffer, it is always fully loaded. Thus, the ‘add’ operation is unnecessarily inefficient, as the ‘if (head == tail)’ conditions is always true. So let’s get rid of it:

This naturally leads to another optimization: the incremented value of ‘head_’ is the current value of ‘tail_’. Thus, we just set the new ‘head_’ to the current value of ‘tail_’ and only increment ‘tail_’:

That’s even better, isn’t it? But there’s more! The tail is usually only used when consuming elements, which we don’t do for the history buffer use case. Why maintain it’s value at all? Plus, if we need it later (as we shall see shortly), we can always compute it from the head, since it is one element ahead of head (sounds a bit weird, doesn’t it? The tail is always ahead of the head):

Adding to the ring buffer has definitely become simpler and cheaper, but you probably ask yourself how to retrieve the history data stored in our ring buffer. As usual, there are many ways, but the one that I find most useful is through the standard library’s way of doing things, namely iterators. ‘begin()’ and ‘cbegin()’ simply return the tail, which is — as we already know — one element ahead of ‘head_’:

‘end()’ and ‘cend()’ just return ‘head_’.

In the simplest case, the iterators returned are forward iterators, which means that they only provide ‘operator++()’. As limited as forward operators are, they allow you to use many useful algorithms from the standard library, including:

But there’s probably another nagging question on your mind: what if the ring buffer is not fully loaded? What if, for instance, only 3 elements have been added so far to a 10-element ring buffer holding doubles? In this case, the 7 unused elements will hold default-initialized doubles, which can be easiliy skipped over:

Most of the time, this extra work is unnecessary in history buffer use cases, as a) the ring buffer is quickly filled and from this point on always fully loaded and b) default constructed elements don’t harm in typical history buffer use cases.

I like to call this particular variant of ring buffer “Manx buffer”, named after a peculiar breed of cat, called Manx, whose salient feature is the lack of tail.

Choosing a Manx buffer over a regular ring buffer may make sense in the following cases:

  • The elements added are never consumed but rather kept to have chronological, historical data.
  • The size of the element type is small, such that the cost of maintaining the tail is significant compared to the overall cost of adding an element.
  • Elements are added at a rapid rate.

An example of when a Manx buffer can prove useful is a data collection system for an aircraft that records the last 10 minutes of accelerometer readings for roll, pitch, and yaw as double values.

You can find my 70 lines implementation of a Manx buffer here.

Circular Adventures VIII: The Eternal Quest For Mod-Like Behavior

“A circle is a round straight line
with a hole in the middle.”
— Mark Twain

In circular situations, we often wish to employ the mod operator to achieve “wrap-around” at both ends of a linear system to mimic a circular system:

Achieving clockwise wrap-around (from n to 0) usually poses no problem:

The % operator is the compiler vendor’s approximation of the mod operator. It ensures that values which exceed n wrap-around nicely:

Alas, most operator % implementations are not so useful when it comes to counter-clockwise wrap-around (from 0 to n):

If index – 1 becomes negative, we likely get negative results — which is far from what we desire:

To readers of this column (part I in particular) this is of course old hat. They know that you can’t trust the % operator when negative operands are involved. Circular coders often work around this shortcoming like so:

Not nice, but no big deal either, you probably think. But what if you need a function that advances an index in either direction, depending on the sign of the offset:

Now this looks disgusting, this is a big deal! And all just because our wretched % operator doesn’t behave like it should. Remember, if it did, we could simply write:

Did you notice that our convoluted circular_advance function actually contains a bug? No? Here’s a hint. The bug is in this line:

What happens if the offset is larger than n? Answer: the expression n – offset will underflow producing a wrong result. That’s why we would have to write this, at least if we wanted to be 100% correct:

which makes the code even uglier. But do we really need to support such large offsets? If we don’t (which is almost always the case in circular scenarios) and can live with offsets whose value is less than or equal to n, we can simplify circular_advance significantly. Here’s how.

We know that adding a multiple of n to the dividend x doesn’t change the result of the modulo operation:

If we include an arbitrary offset, we get this:

This allows us to turn negative arguments to operator % into positive arguments by adding i*n, without changing the result. Further, if we ignore cases where the offset value is greater than n, we only have to add 1*n to enforce a positive argument to operator %. Here’s an example:

Since all operands are positive, we can use 6 % 8 to compute -2 mod 8. Equipped with this knowledge, we can eradicate the ugliness and circular_advance boils down to this:

If we apply the same technique to the ring buffer’s original ring_buffer::size method from the last installment:

we get a simple branch-free solution:

which can be optimized easily by the compiler, especially if BUFSIZE is a base-2 number.

So here’s the takeaway. To get efficient, branch-free mod-like behavior for positive divisors n:

If n is a base-2 number, use bitwise-AND instead of the % operator:

If there’s only clockwise wrap-around (ie. a >= 0), % works just fine

If a is either positive or negative, but |a| is <= n, add n to the operand before applying the % operator:

More circular adventures…

Circular Adventures VII: A Ring Buffer Implementation

“The Limited Circle Is Pure”
— Franz Kafka

In systems programming, ring buffers are ubiquitous. Like regular queues, they’re first-in first-out data structures but contrary to classic queues they’re fixed in size and don’t grow dynamically. This is especially important in real-time contexts where time and space determinism is paramount.

In today’s circular episode I want to show you my go-to ring buffer implementation. I like it, because it’s short and decently efficient. I deliberately used only basic C++ syntax, so it should compile fine with even compilers from the previous millennium (i. e. C++98):

You can specify the data type of ring buffer elements via template parameter T and the total number of elements to track via template parameter N, respectively. Here’s a little usage scenario:

A common problem that you face when implementing a ring buffer is discriminating between empty and full states, as in both cases the head and tail index point to the same location. I solved it by allocating one extra element, which might waste a little bit of space but makes the code easier on the eye and probably faster compared to the alternative approach which requires you have to maintain an extra boolean flag.

When the buffer becomes full during an add operation (head == tail), the tail index is immediately advanced by one, thus dropping the oldest element. As this all happens within the add method, from a ring_buffer user’s point of view the head index is only ever equal to the tail index when the ring buffer is empty.

Contemporary compilers will replace the potentially costly modulo operation in the advance helper method with an AND operation for buffer sizes that are base-2 numbers. However, keep in mind that the advance method uses the internal buffer size (BUFSIZE), which is one greater than the requested ring buffer size (N), so this is most likely an efficient ring buffer:

while this isn’t:

[Update 2020-05-29: Part VIII of this series shows how to optimize/simplify ring_buffer::size().]

More circular adventures…

Circular Adventures VI: When the Winner is Known Beforehand

“Girls have an unfair advantage over men: if they can’t get what they want by being smart, they can get it by being dumb.”
— Yul Brynner

In part III and IV I discussed solutions for a circular problem where two indices performed a race within a circular buffer; that is, either index could be ahead of the other.

Sometimes, however, life is simpler and it is always known which index is the leader, and thus the winner — from the outset:

In this example, provided we know that b must always be ahead of a, we can deduce that b has wrapped around and the distance between a and b is 4.

Either implementation (the one given in part III and the one given in part IV) of circular_distance will return the same result:

However, both will fail for this case:

Why? Under the premise that b is always ahead of a, the distance is +6 not -4. circular_distance computes the wrong result because it assumes that the leading index is less than half the circular buffer size ahead of the other index. This assumption was made (I called it ‘invariant’ at the time) to be able to compute the circular distance even if it is not known which index is ahead of which.

Based on the prerequisite that b is always ahead of a we can give a simplified version of the circular_distance function from part III:

I call this function circular_lead instead of circular_distance to emphasize that it returns how much b is ahead of a, which is always a positive number. As usual, all the restrictions (and optimizations) regarding the mod operator apply. In C, which lacks a true mod operator, a generic, portable implementation looks like this:

In situations where one index is known to be ahead of the other, circular_lead has an edge over circular_distance because it supports distances in the range 0 to N-1, whereas circular_distance only supports ranges from 0 to (N-1)/2. This is always the case in “monotonically increasing” scenarios, like run-time measurement, at least until the flux capacitor is invented. Hence, the last example of part IV can be rewritten like this:

If we replace the mod operator with a cast to uint32_t, circular_lead_32bit boils down to this:

[For the mathematically inclined: What we are really talking about here is residue class rings, a concept used by all contemporary computers. I might explore the number theory behind this and other circular topics in a future post.]

More circular adventures…

Circular Adventures V: The Christmas Edition

“And the Grinch, with his Grinch-feet ice cold in the snow,
stood puzzling and puzzling, how could it be so?
It came without ribbons. It came without tags.
It came without packages, boxes or bags.
And he puzzled and puzzled ’till his puzzler was sore.
Then the Grinch thought of something he hadn’t before.
What if Christmas, he thought, doesn’t come from a store?
What if Christmas, perhaps, means a little bit more?”

— Dr. Seuss
“How the Grinch Stole Christmas”

Oh well, oh well, it’s Christmas time again. Year after year, in this dark season, people contemplate the circle of life, watch reruns on TV, and rush to shopping malls to buy overpriced items for their loved ones; it is not unlikely, that — as soon as the shops are open again — these loved ones rush to return all the stuff that they don’t really need and buy something cool instead. You can also be sure that the level of madness increases every year — due to a well-known effect that economists call “inflation”.

Since so many things recur around Christmas, this is the perfect time for me to share more “circular” thoughts with you.

But first some background on the origins of Christmas. The reason why people celebrate Christmas on December 25 has little to do with the birth of Jesus, but rather with an ancient Roman cult called “Sol Invictus” (which means something like “unconquerable or invincible sun”). When the Julian calendar system was introduced around BC 45, the winter solstice occurred on December  25 (as opposed to today, where it happens on either December 21 or 22); this day was celebrated as the birth of the sun: on December 25, the sun would come back and rise to its full power over the next months.

It is believed that early Christians also took part in these festivities and celebrated together with the pagans by kindling lights. When the “Sol Invictus” cult was finally replaced by Christianity around AD 300, Christians decided to keep that special day but celebrate the birth of Jesus Christ, instead. (Today, it is assumed that Jesus Christ was born somewhere around March, BC 4.)

But now back to more technical stuff. You might have observed that the weekday of a particular date within a year gets advanced by one weekday in the following year. That is, if first of August is a Tuesday this year it will be a Wednesday next year. But how come?

Being a veteran circular adventurer by know, you should be able to come up with an answer yourself — at least you should try. Don’t cheat. Don’t read on. Think about it.

OK, here you go. A regular year (no leap year) has 365 days and a week comprises seven days. 365 mod 7 is 1, which means that after 52 7-day weeks, there is still this one day left in the old year to be filled with a weekday. You can think of it like this: the old year nibbles away one weekday from the new year and thus weekdays in the new year are “rotated left” by one. For leap years, the “rotate value” is two, since 366 mod 7 equals 2.

Regardless of your convictions, regardless of how strange they might look to adherents of other convictions, regardless of whether or what you celebrate, I wish all of you, dear readers of my blog, a Great Time and the best for the New Year.

More circular adventures…

Circular Adventures IV: A “Complementary” Gift


“Th’ hast spoken right, ’tis true.
The wheel is come full circle, I am here.”

— Shakespear, King Lear Act 5, scene 3, 171–175

In the previous part of this series, I’ve shown how to calculate the distance between two indices into a circular buffer using the mathematical mod operator. Today, I will give you a solution that doesn’t need the mod operator and is even more efficient.

We have always viewed our N element ring buffer like this:

There are N = 10 elements, indices start at 0 and end at N – 1. But let’s pretend for a moment that we had signed indices instead; indices that can become negative:

Like in the unsigned system, we assume that the borders are joined, that is, the successor of 4 is -5 and the predecessor of -5 is 4.

The arithmetic distance between a and b is, like before, b – a:

What is special about this new system is that if the distance gets larger than 4, it suddenly becomes negative. Remember that the successor of 4 is -5. So in this case

b – a is +6 which is, taken our special wrap-around mechanism into account, -4. This means that not b is ahead of a, but a is ahead of b by 4!

In fact, the wrap-around value of the arithmetic distance is exactly the circular distance that we calculated in part III using a different approach:

Note that our invariant which we established in the previous installment still applies: the true circular distance between two indices must be half the size of the ring buffer (N) or less. That’s why the circular distance for cases where b – a equals +/-5 is marked as undefined in the table.

This “unusual” range of index values from -N/2 – 1 to +N/2 is what’s called a 2’s complement system in computer science. It’s a means of supporting signed types in programming languages. The unsigned value range 0 to N is split in two halves, one for the negative values and one for the positive values. Even though it’s not stipulated by the C language standard, 2’s complement systems are the de facto standard for signed type implementations in all contemporary compilers (including Java and C#; contrary to C, both actually do stipulate 2’s complement).

In order to compute the circular distance in a 2’s complement system, it looks as if we need to take three steps: 1) convert our real-world unsigned indices into signed indices, 2) calculate their signed difference, and 3) wrap-around the signed difference if it lays outside the -N/2 – 1 to +N/2 value range, similarly to what the mod operator does for unsigned values.

Step 1 we can skip altogether, because to a CPU, there really is no difference between signed and unsigned numbers, it’s just a matter of how we (or the compiler) interpret the bit patterns of the operation’s result. Hence, we can proceed to step 2 and compute the unsigned difference and interpret it as a signed number in step 3:

Now, if N is a base-2 number that coincides with our signed integer types

Step 3 is as simple as a type-cast:

or even simpler:

Strictly speaking, casting an unsigned integer to a signed integer is “implementation defined”. Implementation defined is not to be confused with “undefined” and means for all practical purposes: interpret the unsigned value as a 2’s complement signed integer value. This achieves the intended 2’s complement wrap-around, just as we desire.

As a case in point, consider a 32-bit timer that keeps track of the system time. A timer interrupt service routine will update the timer value regularly behind the scenes. Based on what we have learned, you can easily measure the time between two events, even if the timer should overflow in between:

Most systems software (including embedded systems) is implemented in C, mainly because efficiency is at a premium. Due to the fact that C doesn’t come with a mathematical mod operator, it is sometimes sensible to “force” the size of timers/counters/ring buffers to the upper bound of an unsigned integer type. With this being the case, the aforementioned ‘cast-to-signed’ trick can be applied and calculating circular distances becomes both, easy and computationally cheap.

More circular adventures…