Category Archives: Circular Adventures

Circular Adventures III: A Distance Apart

“Distance has the same effect on the mind as on the eye.”
― Samuel Johnson, The History of Rasselas, Prince of Abissinia

In previous episodes of this series, I’ve only focused on circular problems involving a single index into a ring buffer. Today, I want to extend the scope to use-cases involving two indices.

At the end of part two of this series, I posted a problem where I wanted to keep track of the time when certain events occurred:


+---------------------------------+-----------+
| First type of event             | 353011    |
+---------------------------------+-----------+
| Second type of event            | 1643      |
+---------------------------------+-----------+
| Third type of event             | 4         |
+---------------------------------+-----------+
| ...                             | ...       |
+---------------------------------+-----------+
:                                 :           :

The first column of this table represents the event type, the second a time-stamp showing when the event took place. The time-stamp comes from a 32-bit timer which can wrap-around.

Our task is to find out in which order these events occurred, which basically boils down to finding a method that allows us to compare two time-stamps, taking wrap-around into account. Once we have such a method, we are able to sort the event list.

Viewed from another angle, comparing two values of a wrap-around timer is exactly the same as comparing two indices into a ring buffer:


0 1 2 3 4 5 6 7 8 9 (Fig1)
      ^     ^ 
      a     b

In this ring buffer of 10 elements, is a ahead of b or is it the other way around? Which one is larger and which one is smaller? Since we don’t know how many times a or b have wrapped around, we are not able to give a meaningful answer to this question. So let’s agree on a rule, an invariant that holds for our indices at all times: a and b may wrap around (as many times as they like), but their real, absolute distance is always less than half the size of the buffer (i.e., N div 2). This effectively means that one index can never get ahead of the other by half the size of the buffer or more.

Under this assumption, it is clear that in Fig1 b is ahead of a and the true distance is 6 – 3 = 3. Because of our invariant, a cannot be ahead of b. Why? If a were ahead of b it would have wrapped around and the real, absolute distance between a and b would then be 7, thus violating our rule which requires that the real distance is less than 5.

Contrast this with the next example:


0 1 2 3 4 5 6 7 8 9 (Fig2)
  ^           ^ 
  a           b

a must have wrapped around and it must be ahead of b by 4. Otherwise (if b were ahead of a), the distance would be 6 and this — according to our “less than half the buffer size” rule — would be illegal.

The next example is ill-formed: regardless of how you view it, the distance is always 5 so the invariant is violated and we cannot tell which index is ahead of which:


0 1 2 3 4 5 6 7 8 9 (Fig3)
  ^         ^ 
  a         b

Just in case you wonder whether there is a way to do without our cumbersome invariant: we could in fact utilize the full ring buffer range, not just less than half of it. However, this would require another invariant: index a must always stay behind index b, so that a never overtakes b (see part VI). This invariant works in producer/consumer scenarios but turns out to be a lot less useful in most typical circular use cases. It certainly would be of no help in our event-recording example above, as we cannot assume which event comes before or after another.

Based on our invariant, it was quite easy for us (as human beings) to determine which index is ahead of which. But how can we determine it in software? Simply calculating the linear distance b – a is not sufficient, since we are operating in a circle and not along a one-dimensional axis. In Fig2, calculating b – a gives 6, but the real distance is -4, since a is ahead of b by 4. If we calculate a – b, we get -6, which is still not -4. How can we get from linear differences to circular differences?

Ah, you must have guessed it! Here it goes again, the mod operator, our universal weapon for fighting circular problems!

Let me explain how the mod operator helps by using one of the most powerful analysis tools ever invented — a table:


b-a  (b-a) mod 10  leader  a < b    Note
-------------------------------------------------
 0         0         -     false    Distance is 0
 1         1         b     true
 2         2         b     true
 3         3         b     true
 4         4         b     true
 5         5         -     -        Illegal, Fig3
 6         6         a     false
 7         7         a     false
 8         8         a     false
 9         9         a     false
-1         9         a     false
-2         8         a     false
-3         7         a     false
-4         6         a     false
-5         5         -     -        Illegal, Fig3
-6         4         b     true
-7         3         b     true
-8         2         b     true
-9         1         b     true
-------------------------------------------------

The first column contains all possible values for the linear distance of two indices. In the second column, this linear distance is taken modulo N, the size of the ring buffer (which happens to be 10 in our examples). In the third column, I put the leader, the index which is ahead of the other index. I determined this manually, by drawing sketches and counting, just like I did in Fig2, when I claimed that a is ahead of b by 4. There are two discontinuities in the table: the rows where the linear distance is exactly 5 or -5, that is, half the size of the ring buffer.

Ignoring these discontinuities and looking at columns 2 and 3, we can observe that b is ahead of (or equal to) a if the linear distance taken modulo 10 is less than 5; otherwise, a is ahead of b.

Based on this observation, we are now able to implement a less-than operator, which allows us to sort circular values. (As you may know, many sorting algorithms from standard libraries, like the C++ standard library sort() algorithm, invoke the less-than operator ‘<’ on values of a given range to do their job.) Here is an implementation of a circular less function in pseudo-code:


# Returns true if a is less than b in a circle of size N,
# false otherwise.
circular_less(a, b, N):
    dm = (b - a) mod N
    if (dm > 0 and dm < N div 2)
        return true
    return false

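If you want to try the table out in C, here is a minimal sketch of circular_less. It assumes an even N (as in all examples here) and the less-than-half invariant. C’s % is not the mathematical mod operator, so the sketch uses a small hypothetical helper, mod_floor(), that shifts negative remainders up by N.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: mathematical ("floored") modulo for n > 0.
   C's % truncates towards zero, so a negative remainder is shifted up by n. */
static int mod_floor(int a, int n) {
    int r = a % n;
    return r < 0 ? r + n : r;
}

/* Returns true if a is behind ("less than") b in a circle of size n.
   Assumes an even n and a true distance between a and b below n / 2. */
static bool circular_less(int a, int b, int n) {
    assert(n > 0 && n % 2 == 0);
    int dm = mod_floor(b - a, n);   /* (b - a) mod n */
    return dm > 0 && dm < n / 2;    /* b strictly ahead of a */
}
```

With the figures above, circular_less(3, 6, 10) (Fig1) is true. circular_less(1, 7, 10) (Fig2) is false, because a is ahead of b there.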

Having had so much fun and success with the table, I added yet another column: the true circular distance, again based on sketching and counting. In Fig2, the true circular distance between a and b would be -4, since a is ahead of b. In Fig1, the circular distance is +3, since b is ahead of a:


b-a  (b-a) mod 10  leader  a < b   cd(a, b) 
-------------------------------------------
 0         0         -     false      0
 1         1         b     true       1
 2         2         b     true       2
 3         3         b     true       3
 4         4         b     true       4
 5         5         -     -          -
 6         6         a     false     -4
 7         7         a     false     -3
 8         8         a     false     -2
 9         9         a     false     -1
-1         9         a     false     -1
-2         8         a     false     -2
-3         7         a     false     -3
-4         6         a     false     -4
-5         5         -     -          -
-6         4         b     true       4
-7         3         b     true       3
-8         2         b     true       2
-9         1         b     true       1
-------------------------------------------

Comparing columns 2 and 5 shows that the circular distance (cd) coincides with the linear distance mod 10 for values less than half the size of the circle. For the other cases (again, ignoring the discontinuities), the circular distance can be obtained by subtracting 10 from the linear distance taken modulo 10. In pseudo-code:


circular_distance(a, b, N):
    dm = (b - a) mod N
    if (dm < N div 2)
        return dm
    return dm - N

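The pseudo-code translates almost one-to-one into C. This sketch works under the same assumptions as before: an even N and a true distance below N div 2. mod_floor() is again a hypothetical helper standing in for the mathematical mod operator. The second assert() flags the "illegal" half-distance rows of the table in debug builds.

```c
#include <assert.h>

/* Hypothetical helper: mathematical ("floored") modulo for n > 0. */
static int mod_floor(int a, int n) {
    int r = a % n;              /* may be negative for negative a */
    return r < 0 ? r + n : r;
}

/* Signed circular distance from a to b in a circle of size n:
   positive if b is ahead of a, negative if a is ahead of b.
   Assumes an even n and a true distance of less than n / 2. */
static int circular_distance(int a, int b, int n) {
    assert(n > 0 && n % 2 == 0);
    int dm = mod_floor(b - a, n);
    assert(dm != n / 2);        /* exactly half the circle: ambiguous */
    if (dm < n / 2)
        return dm;
    return dm - n;
}
```

For example, circular_distance(1, 7, 10) yields -4, matching the Fig2 row of the table, and circular_distance(3, 6, 10) yields +3 (Fig1).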

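Now this function is even more useful than circular_less. You can build a comparison function for qsort() from the C standard library on top of it. Mind the sign, though: qsort() expects a negative result when the first argument sorts first, so pass the arguments to circular_distance in swapped order. If you are using C++’s sort(), you can easily implement circular_less in terms of circular_distance: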


circular_less(a, b, N):
    if (circular_distance(a, b, N) > 0)
        return true
    return false

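To sort the 32-bit time-stamps from the event table with qsort(), a comparator can be derived from the same rule. This is a minimal sketch that assumes all time-stamps lie within less than 2^31 ticks of each other, which is the less-than-half invariant for N = 2^32. compare_timestamps() and sort_timestamps() are hypothetical names. Unsigned subtraction in C wraps modulo 2^32, so it provides the mod operation for free.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical qsort() comparator for 32-bit wrap-around time-stamps.
   qsort() wants a negative result if *pa sorts before *pb. */
static int compare_timestamps(const void *pa, const void *pb) {
    uint32_t a = *(const uint32_t *)pa;
    uint32_t b = *(const uint32_t *)pb;
    uint32_t dm = b - a;                 /* (b - a) mod 2^32 */
    if (dm == 0)
        return 0;                        /* same time-stamp */
    if (dm < UINT32_C(0x80000000))
        return -1;                       /* b is ahead of a: a sorts first */
    return 1;                            /* a is ahead of b: b sorts first */
}

/* Sorts time-stamps into chronological order (hypothetical helper). */
static void sort_timestamps(uint32_t *ts, size_t count) {
    if (count < 2)
        return;                          /* nothing to sort */
    assert(ts != NULL);
    qsort(ts, count, sizeof ts[0], compare_timestamps);
}
```

Take the event table values plus a hypothetical time-stamp of 4294967000 taken just before the timer wrapped. The chronological order is then 4294967000, 4, 1643, 353011. If the invariant is violated, the comparisons are no longer consistent with each other and qsort()’s result is not meaningful.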

But the real benefit of circular_distance over circular_less is that you are now able to do real circular distance calculations. In our event table example above, you can determine how many ticks lie between event 1 and event 2. Or, as another example, if you have a free-running 16-bit timer (a counter with N = 65536) that is updated every millisecond, you could use circular_distance like this:


start_time = get_timer_value()
... some lengthy calculation
end_time = get_timer_value()

cd = circular_distance(start_time, end_time, 65536)
log("Lengthy calculation took " + cd + " milliseconds")

Even at the risk of repeating myself, please be aware that the mathematical mod operator and Java’s/C’s % operator are not the same. You can, however, apply the trick that I showed in part I and use ANDing instead of the mod operator whenever N is a base-2 number. 65536 is a base-2 number, so in C you could implement circular_distance like this:


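/* Circular distance for a 16-bit counter (N = 65536). Since N is a
   base-2 number, '& (N - 1)' computes the mathematical 'mod N'. */
static inline int circular_distance_16bit(int a, int b) {
    const int N = 65536;
    int dm = (b - a) & (N - 1);
    if (dm < N / 2) {
        return dm;
    }
    return dm - N;
}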

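The 16-bit trick does not carry over directly to the 32-bit timer of the event table, because N = 2^32 does not fit into an int. One way around this is sketched below as a hypothetical circular_distance_32bit(). It lets unsigned 32-bit subtraction do the mod 2^32 (the uint32_t variant of the cast trick from part I) and widens the result to int64_t.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical circular distance for a 32-bit counter (N = 2^32).
   Assumes the true distance between a and b is less than 2^31 ticks. */
static int64_t circular_distance_32bit(uint32_t a, uint32_t b) {
    const int64_t N = INT64_C(4294967296);     /* 2^32 */
    /* Unsigned arithmetic wraps, so this is (b - a) mod 2^32. */
    int64_t dm = (int64_t)(uint32_t)(b - a);
    assert(dm != N / 2);                       /* exactly half: ambiguous */
    if (dm < N / 2)
        return dm;
    return dm - N;
}
```

For example, circular_distance_32bit(1643, 353011) yields 351368, so the first event occurred 351368 ticks after the second one. A time-stamp of 4294967000 taken shortly before wrap-around is 300 ticks behind a time-stamp of 4.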

In this installment, I’ve shown how to use the mod operator efficiently to solve circular problems involving more than one index. The solution is based on an invariant stating that the distance between two indices is always less than half the size of the value range. In the next episode, we will learn about another technique, one that is even more powerful than the mod operator: performing circular distance calculations in a two’s complement system.

More circular adventures…

Circular Adventures II: A Ring within a Ring

“Is all that we see or seem
But a dream within a dream?”
–Edgar Allan Poe

In part one of this series, I discussed the mod operator and demonstrated that it is a flexible tool for solving circular problems. I also explained that the mod operator, as implemented by most (system) programming languages, is not able to handle problems involving negative dividends. As a remedy, I introduced an efficient alternative to the mod operator for divisors that are base-2 numbers.

Today, I want to look at our first circular use cases. Again, we meet our good old friend, the ring buffer.

Sometimes, you need to store data and keep previous versions of that data, but the memory available is limited. In such situations, an obvious approach is to use a ring buffer.


    +---------------------------------+
 0  | First data set                  | 
    +---------------------------------+ 
 1  | Second data set                 | 
    +---------------------------------+ 
 2  | ...                             | 
    +---------------------------------+ 
 3  | ...                             | 
    +---------------------------------+ 
 4  | ...                             | 
    +---------------------------------+ 
 5  | ...                             | 
    +---------------------------------+ 
 6  | ...                             | 
    +---------------------------------+ 
 7  | Eighth data set                 | 
    +---------------------------------+

In this example, we have a ring buffer (again, pretend that the last and first elements are joined) comprising eight (N = 8) fixed-size slots for storing information. Each slot contains a data set. We start by writing to slot 0, then slot 1 and so on. When the ring buffer is full (that is, slot 7 has been written to), we wrap around to slot 0 and overwrite what has been stored there previously, thus kicking out the oldest data set. With this scheme, we have access not only to the most recent data set but also to up to seven predecessors.

In addition to the actual ring buffer, we need a variable to keep track of the most-recently updated (“MRU”) slot number; that is, an index that points to the “newest” data set. Based on the current value of the MRU we can easily calculate which slot is to be updated upon the next write access:


mru = (mru + 1) mod N

Updating the ring buffer is just one part of the story; reading is another. For instance, to get the predecessor of the current data set, we need to calculate the slot index based on the previous value of MRU. Again, by using the mod operator, this can be achieved easily:


i = (mru - 1) mod N

Dumping the whole ring buffer in reverse chronological order is achieved with this simple loop:


for i in (0 .. N - 1) 
    print buffer[(mru - i) mod N]     
end

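Put together in C, the three pseudo-code snippets above (advance the MRU, read a predecessor, dump in reverse order) might look like the following minimal sketch. ring_buffer, rb_init(), rb_write(), rb_read(), and rb_dump() are hypothetical names, and an int stands in for each data set. N = 8 is a base-2 number, so "mod N" is done with "& (N - 1)" (mod_base2 from part I), which assumes two’s complement.

```c
#include <assert.h>
#include <stdio.h>

#define RB_SIZE 8            /* N, a base-2 number */

typedef struct {
    int slots[RB_SIZE];      /* stand-ins for the data sets */
    int mru;                 /* most-recently updated slot */
} ring_buffer;

static void rb_init(ring_buffer *rb) {
    for (int i = 0; i < RB_SIZE; ++i)
        rb->slots[i] = 0;
    rb->mru = RB_SIZE - 1;   /* the first write then goes to slot 0 */
}

/* mru = (mru + 1) mod N, then overwrite the slot there. */
static void rb_write(ring_buffer *rb, int value) {
    rb->mru = (rb->mru + 1) & (RB_SIZE - 1);
    rb->slots[rb->mru] = value;
}

/* Data set written 'back' writes ago (0 = newest): buffer[(mru - back) mod N].
   The dividend may be negative; '& (N - 1)' still yields 0..N-1. */
static int rb_read(const ring_buffer *rb, int back) {
    assert(back >= 0 && back < RB_SIZE);
    return rb->slots[(rb->mru - back) & (RB_SIZE - 1)];
}

/* Dumps the whole buffer in reverse chronological order. */
static void rb_dump(const ring_buffer *rb) {
    for (int i = 0; i < RB_SIZE; ++i)
        printf("%d\n", rb_read(rb, i));
}
```

After writing the values 1 to 10, the MRU is slot 1. rb_read(&rb, 0) returns 10, and rb_read(&rb, 7), the oldest surviving data set, returns 3.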

Note that for this to work, the mod operator has to be a mathematical modulo operator since the dividends may become negative. If you don’t have access to a true mathematical mod operator, you should make N a base-2 number and use the modulo replacement operation given in the previous installment (mod_base2).

Now let’s make our ring buffer example a bit more interesting. Let’s pretend our ring buffer must be stored in persistent, non-volatile memory (NVM), like flash. One problem with flash memory is that it wears out over time: the more often you update flash memory, the shorter the data retention time gets. (It’s like using a bucket with a crack to carry water from one place to another and the crack gets bigger with every lap you go.)

Therefore, storing the most-recently updated value in a single flash variable is not a good idea, since this memory location would wear-out quickly: every update to a ring buffer would require an update to the MRU variable, which means that the update load on the MRU variable is N times the update load on the individual slots.

A frequently used countermeasure is to have a “distributed MRU variable”, which is achieved by prefixing or suffixing every slot with a so-called “age counter” field.

The age counter starts at zero and is incremented with every write to a slot. So after three updates to an initially empty ring buffer, we would have this:


    +---------------------------------+-----+
 0  | First data set                  |   1 |
    +---------------------------------+-----+
 1  | Second data set                 |   2 |
    +---------------------------------+-----+
 2  | Third data set                  |   3 |
    +---------------------------------+-----+
 3  |                                 |   0 |
    +---------------------------------+-----+
 4  |                                 |   0 |
    +---------------------------------+-----+
 5  |                                 |   0 |
    +---------------------------------+-----+
 6  |                                 |   0 |
    +---------------------------------+-----+
 7  |                                 |   0 |
    +---------------------------------+-----+

After ten updates, the ring buffer would look like this:


    +---------------------------------+-----+
 0  | ...                             |   9 |
    +---------------------------------+-----+
 1  | ...                             |  10 |
    +---------------------------------+-----+
 2  | ...                             |   3 |
    +---------------------------------+-----+
 3  | ...                             |   4 |
    +---------------------------------+-----+
 4  | ...                             |   5 |
    +---------------------------------+-----+
 5  | ...                             |   6 |
    +---------------------------------+-----+
 6  | ...                             |   7 |
    +---------------------------------+-----+
 7  | ...                             |   8 |
    +---------------------------------+-----+

The slot containing the age counter with the highest value must be the MRU slot, slot 1 in our example. For obvious reasons, the value range of the age counter must be larger than the slot range of the ring buffer, but sooner or later, even the age counter will wrap around. After many writes to the ring buffer we might get this picture:


    +---------------------------------+-----+
 0  | ...                             | 253 |
    +---------------------------------+-----+
 1  | ...                             | 254 |
    +---------------------------------+-----+
 2  | ...                             | 255 |
    +---------------------------------+-----+
 3  | ...                             |   0 |
    +---------------------------------+-----+
 4  | ...                             |   1 |
    +---------------------------------+-----+
 5  | ...                             |   2 |
    +---------------------------------+-----+
 6  | ...                             |   3 |
    +---------------------------------+-----+
 7  | ...                             | 252 |
    +---------------------------------+-----+

So which slot is the MRU slot now? The age counter is obviously an unsigned 8-bit type, since it wrapped around at value 255. Starting with 253 (the age counter value of slot 0) and taken modulo 256, the counter values increase by one up to 3. But the next value, 252, is not a successor of 3; it is too far ahead (or rather behind). This means that the slot containing age counter value 3, slot 6, must be the most recently updated slot.

The MRU is the basis for all ring buffer operations and it makes sense to maintain its value in a RAM variable. But how can one determine the MRU programmatically, for instance after a reset when all RAM content is lost? Here is a method to search for it:


for mru in (0..N-2)
    if (buffer[mru].counter + 1) mod 256 != (buffer[mru + 1].counter)
        break
end

It is a good exercise to convince yourself that this algorithm actually works. Think about the initial “empty” ring buffer case where all age counters are zero as well as the case where the “highest” age counter is stored in the last slot.

Luckily, the dividends in this algorithm cannot become negative, so if we wanted to code this in C, we could use the ‘%’ operator. But since we are doing a modulo 256 operation, we can apply the optimization given in the first installment, which is replacing the modulo 256 operation with a (uint8_t) cast:


int mru;
for (mru = 0; mru <= N - 2; ++mru) {
    if ( (uint8_t) (buffer[mru].counter + 1) != (buffer[mru + 1].counter) )
        break;
}

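To check the search loop against the three snapshots above, it can be wrapped in a function. This minimal sketch models only the 8-bit age counter of each slot. The slot type and find_mru() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t counter;         /* 8-bit age counter; data set omitted */
} slot;

/* Returns the index of the most-recently updated slot: the last slot before
   the first break in the "+1 mod 256" sequence, or the last slot if none. */
static size_t find_mru(const slot *buffer, size_t n) {
    assert(buffer != NULL && n > 0);
    size_t mru;
    for (mru = 0; mru + 1 < n; ++mru) {
        /* The (uint8_t) cast performs the "mod 256". */
        if ((uint8_t)(buffer[mru].counter + 1) != buffer[mru + 1].counter)
            break;
    }
    return mru;
}
```

The sketch yields slot 2 after three updates, slot 1 after ten updates, and slot 6 for the wrapped snapshot. It also covers the two corner cases mentioned above: an all-zero buffer yields slot 0, and a buffer whose highest counter sits in the last slot yields slot 7.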

Once we are in possession of the MRU, we can work with this ring buffer just like we did with the previous ring buffer. The only thing we have to do in addition is to memorize the highest (or current) age counter value for future updates to the ring buffer:


# Determine current age counter.
age_counter = buffer[mru].counter
...
# Print most recent data set.
print buffer[mru].dataset
# Print oldest data set.
print buffer[(mru - (N - 1)) mod N].dataset
# Write a new data set.
mru = (mru + 1) mod N
age_counter = (age_counter + 1) mod 256
buffer[mru].counter = age_counter
buffer[mru].dataset = "some new data"

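In C, the bookkeeping above might look like the following sketch. The slot layout, NVM_SLOTS, nvm_ring, oldest_slot(), and nvm_write() are hypothetical. A real implementation would write to flash through a driver rather than to a RAM array. N is assumed to be a base-2 number so that "& (N - 1)" can stand in for "mod N".

```c
#include <assert.h>
#include <stdint.h>

#define NVM_SLOTS 8                  /* N, a base-2 number */

typedef struct {
    int dataset;                     /* stand-in for the real data set */
    uint8_t counter;                 /* 8-bit age counter */
} nvm_slot;

typedef struct {
    nvm_slot slots[NVM_SLOTS];       /* would live in flash */
    int mru;                         /* RAM copy, recovered after a reset */
    uint8_t age_counter;             /* current (highest) age counter */
} nvm_ring;

/* Oldest data set: (mru - (N - 1)) mod N, which equals (mru + 1) mod N. */
static int oldest_slot(const nvm_ring *r) {
    return (r->mru - (NVM_SLOTS - 1)) & (NVM_SLOTS - 1);
}

/* Writes a new data set into the slot after the MRU and stamps it with the
   next age counter value; the (uint8_t) cast performs the "mod 256". */
static void nvm_write(nvm_ring *r, int dataset) {
    assert(r->mru >= 0 && r->mru < NVM_SLOTS);
    r->mru = (r->mru + 1) & (NVM_SLOTS - 1);
    r->age_counter = (uint8_t)(r->age_counter + 1);
    r->slots[r->mru].counter = r->age_counter;
    r->slots[r->mru].dataset = dataset;
}
```

Starting from the wrapped snapshot (MRU in slot 6, age counter 3), the oldest data set is in slot 7. The next write goes to slot 7 with age counter 4, after which the oldest data set is the one in slot 0.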

Finding the MRU in our second ring buffer was not particularly difficult — we just needed to compare the age counters until we found a “discontinuity”. We knew that the slots are updated, one after another, cyclically, with increasing age counter values. Things get more complicated when there is no such strict sequential ordering. Let me explain.

Imagine you want to keep track of when certain events occur. For each kind of event, you would store a timestamp from a 32-bit timer source that indicates when this particular event occurred:


    +---------------------------------+-----------+
 0  | First type of event             | 353011    |
    +---------------------------------+-----------+
 1  | Second type of event            | 1643      |
    +---------------------------------+-----------+
 2  | Third type of event             | 4         |
    +---------------------------------+-----------+
 3  | ...                             | ...       |
    +---------------------------------+-----------+
    :                                 :           :

Further, assume that the 32-bit timer can wrap around (otherwise we wouldn’t have a circular problem), just like our 8-bit age counter wrapped around in our ring buffer example. How would you know which event occurred first or last? How would you calculate the relative time between two events? This and more is the topic of the next part of this series. Stay tuned…

More circular adventures…

Circular Adventures I: The Modulo Operation

“All things from eternity are of like forms and come round in a circle”
–Marcus Aurelius, AD 121-180

In computer science, just like in real life, many things go round in circles. There are ring buffers, circular lists, wrap-around counters and timers, just to name a few. In this multi-part article, I want to explore the theory behind circular behavior. Equipped with this knowledge, I attempt to provide solutions to “recurring” problems.

Let’s start with the very basics; that is, the movement of an index within a ring buffer.

Assume you have a ring buffer containing N = 10 elements and an index i:


0 1 2 3 4 5 6 7 8 9
      ^
      i

(For simplicity, I depict circular structures as flat arrays; just imagine that the first and last elements are joined to form a ring.)

Question: How do you advance the index by a signed offset n, taking wrap-around into account?
Answer: i = (i + n) mod N

Many circular problems, like ring buffer operations, can be elegantly expressed by using the ‘mod’ operator. Alas, only in theory: in practice, there are at least two problems surrounding the ‘mod’ operator, namely portability and efficiency.

The ‘mod’ operator, as it is used in the equation above, is the mathematical ‘mod’ operator — the one that is used in number theory. Programming languages, however, sport many different flavors of the ‘mod’ operator, which vary in the way they treat negative operands.

All of them obey this equation:


a mod N = a - ( a div N ) * N

Yet the results for negative operands depend on whether the ‘div’ operator rounds towards zero or negative infinity. If the ‘div’ operator rounds towards zero, the expression -5 div 2 yields -2; if it rounds towards negative infinity the result is -3.

The following example illustrates how this influences the calculation of -3 mod 8:


                          Round towards
                        zero  neg infinity
-3 div 8                  0      -1
(-3 div 8) * 8            0      -8
-3 - ((-3 div 8) * 8)    -3       5

This means that -3 mod 8 might be -3 or +5, depending on the style the implementers of your programming language have chosen. Some languages (like Ada) even have two modulo operators (rem and mod), while others let you switch the behavior with a pragma (Perl: ‘use integer’). Still others (like C90 and C++98) leave it ‘implementation-defined’; C99 and C++11 later settled on rounding towards zero. Have a look at this for a nice overview of how different programming languages implement the modulo operator.

(And just in case you haven’t guessed it already: C/C++/Java’s approximation of ‘mod’ is ‘%’ and ‘div’ is ‘/’.)

Now, if you only travel through a circle in the positive direction (by adding positive offsets), either rounding style will do. However, if you intend to travel backwards (or calculate differences between indices that yield negative values), only the ‘negative-infinity’ modulus operator will do the job, that is, wrap (in the 10-element ring buffer example) from -1, -2, -3 … to 9, 8, 7 …, respectively.
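In C99 and later, ‘/’ rounds towards zero, so -3 % 8 yields -3. A common way to obtain the ‘negative-infinity’ behavior is to shift a negative remainder up by N. The following minimal sketch uses the hypothetical helpers mod_floor() and advance() and assumes N > 0.

```c
#include <assert.h>

/* Hypothetical helper: mathematical ("round towards negative infinity")
   modulo for n > 0, built on C's truncating %. */
static int mod_floor(int a, int n) {
    assert(n > 0);
    int r = a % n;                 /* -3 % 8 == -3 in C99 and later */
    return r < 0 ? r + n : r;      /* shift into 0..n-1 */
}

/* Advances index i by a signed offset in a ring of size n:
   i = (i + offset) mod n. */
static int advance(int i, int offset, int n) {
    return mod_floor(i + offset, n);
}
```

mod_floor(-3, 8) gives 5, the ‘negative-infinity’ column of the table above. In the 10-element ring buffer, advance(3, -5, 10) wraps backwards to 8.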

How about efficiency? As can be seen in the table above, the calculation of the remainder is based on a division operation, which is, alas, a rather expensive operation on many platforms. On most processors, the ‘div’ operation is many times slower than other primitive operations. For example, some popular ARM processors don’t even have a ‘div’ instruction, so division is done through a software library which consumes up to 40 cycles. Compare this to the 5-cycle multiplication instruction and the other instructions that typically execute in a single cycle.

Due to these portability and efficiency issues, many developers shun the modulo operator in performance-critical code. Instead of


i = (i + n) mod N

they use the computationally equivalent


i = i + n
while i >= N
    i = i - N
while i < 0
    i = i + N

or, if it is guaranteed that our index is not more than N – 1 off the ends


if i >= N
    i = i - N
if i < 0
    i = i + N

But this approach is still not very efficient, since it uses branches and on modern multi-stage pipelined processors a branch might invalidate already prefetched and preprocessed instructions. Isn’t there a true mathematical ‘mod’ operator out there that is also efficient at the same time? You bet there is, but only if N happens to be a base-2 number.

If you are lucky and N is a base-2 number (like 64, 1024, or 4096), i mod N is computationally equivalent to


i and (N - 1)

where ‘and’ is the binary and operator (‘&’ in C, C++, and Java). This works even if i is negative, but requires that your environment stores negative numbers in two’s complement fashion, which is the case for pretty much all systems that you will ever program for.

As an example, consider -2 mod 16. -2 is 0xFFFFFFFE in 32-bit two’s complement notation and 16 – 1 corresponds to 0x0000000F:


  0xFFFFFFFE
& 0x0000000F
  ----------
  0x0000000E

which is 14, exactly what a mathematical ‘mod’ operator would yield.

The ‘and’ operation is easy to compute for any processor. In C/C++ we might define an optimized ‘mod’ function like this:


#include <assert.h>

static inline int mod_base2(int dividend, int divisor) {
    // Ensure divisor is positive (zero would pass the base-2 check).
    assert(divisor > 0);
    // Ensure divisor is a base-2 number.
    assert((divisor & (divisor - 1)) == 0);
    return dividend & (divisor - 1);
}

It is a good idea to use this optimization whenever possible. Sometimes it even makes sense to round up the size of circular structures to a base-2 boundary, just to be able to use this kind of optimization.

A variant of this theme is casting a (potentially negative) value to an unsigned type in C/C++. Casting x to a uint8_t is equivalent to calculating x mod 256. While most optimizing compilers will generate the same code for x & 0xFF and (uint8_t)x, there is a certain likelihood that the latter might be a bit faster. The obvious disadvantage of casting to unsigned is that this approach practically limits N to the value ranges of uint8_t, uint16_t, and uint32_t, which are 256, 65536, and 4294967296, respectively. Since the performance gain is only hypothetical, it is usually much wiser to go for the ‘mod_base2’ optimization, though.

This concludes my first installment, which is mainly about some of the many facets of the ‘mod’ operator. Just like the constant PI appears in all circular problems in mathematics, some variant of the ‘mod’ operator appears in all circular problems in computer science. Next time, I will explore what it means to have two indices into a circular structure, which turns out to be the foundation of many interesting circular use cases.

More circular adventures…

Approxion

Code – People – Everything
